Athena is a project originally built for HackRU Spring 2022. It algorithmically searches for the most important sentences in a YouTube video, and summarizes the video using these sentences.
Athena's algorithm works as follows:
- Athena converts the audio of a YouTube video (mp3, mp4, or wav) into a text transcript using a Google Cloud API.
- It then parses the transcript and calculates the frequency of each word, counting related word forms (play, plays, player, playing, etc.) as the same word.
- It then goes through these words and multiplies the frequency of each word by its correlation (from 0 to 1) to the main topic of the video, assigning this value as the 'score' of each word.
- Finally, Athena determines which sentences contain the highest-scoring words and outputs the top n for the user to read.
See the YouTube video in the project description for more info.
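
The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not Athena's actual code. It assumes the transcript is already available as a string, and every name in it is hypothetical. `stem` is a crude suffix stripper and `STOP_WORDS` is a tiny hand-written list, both standing in for what NLTK provides. `relevance` is any function returning a word's correlation to the video's topic (0 to 1). Summing word scores to rank a sentence is also an assumption.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in list; the real project relies on NLTK for this.
STOP_WORDS = {"the", "then", "before", "a", "an", "and", "of", "to", "is", "in", "it"}


def stem(word):
    """Crudely strip common suffixes so play/plays/player/playing share one key."""
    for suffix in ("ing", "ers", "er", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase the text, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z']+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def split_sentences(transcript):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", transcript)
    return [p.strip() for p in parts if p.strip()]


def score_words(transcript, relevance):
    """Score each stemmed word as frequency times topic relevance (0 to 1)."""
    counts = Counter(tokens(transcript))
    return {word: count * relevance(word) for word, count in counts.items()}


def top_sentences(transcript, relevance, n=3):
    """Return the n sentences whose words have the highest total score."""
    scores = score_words(transcript, relevance)

    def sentence_score(sentence):
        return sum(scores.get(t, 0.0) for t in tokens(sentence))

    ranked = sorted(split_sentences(transcript), key=sentence_score, reverse=True)
    return ranked[:n]


def toy_relevance(word):
    """Hypothetical relevance function for a video about games."""
    return 1.0 if word in {"play", "game"} else 0.1
```

Calling `top_sentences(transcript, toy_relevance, n=2)` on a short transcript returns its two highest-scoring sentences, best first. In practice the relevance function would come from a word-similarity measure rather than a hard-coded set.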
Athena seeks to help students who are on a time crunch during important study periods. Students are often given small 5-point assignments that ask them to summarize a video or form an opinion on it. Instead of spending 30-45 minutes watching these videos, students can use Athena to gain a basic understanding of the video.
- Google Cloud API - Offloaded the speech detection and storage to Google Cloud to increase efficiency
- NLTK - Used to calculate the correlation between various words in our algorithm. Also used to scan the transcript for garbage words (the, then, before) and mark them as negligible
- Python3
- Updated Framework using HTML
- Optimization of the scoring algorithm to decrease waiting time
- Chrome Extension to work directly in your browser
Won 'Best Usage of Google Cloud' in HackRU Spring 2022 @ Rutgers University