We created an interface with Flask for a client to search the transcripts of lectures on Coursera. The server ranks a collection of lectures based on the query and serves the updated webpage with the search results. There is also a content-based recommender system.
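To make the ranking step concrete, here is a minimal sketch of scoring transcripts against a query. It is not the project's implementation, which relies on metapy. It swaps in a plain term-frequency score with a smoothed IDF weight, using only the standard library. The names `tokenize` and `rank_lectures` and the sample lecture titles are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def rank_lectures(query, lectures, top_k=5):
    """Return up to top_k (title, score) pairs, best match first.

    `lectures` maps a (hypothetical) lecture title to its transcript text.
    Lectures that share no term with the query are left out.
    """
    docs = {title: Counter(tokenize(text)) for title, text in lectures.items()}
    n_docs = len(docs)
    # Document frequency: in how many transcripts each term appears.
    doc_freq = Counter(term for counts in docs.values() for term in counts)

    scores = {}
    for title, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Smoothed IDF; always positive because doc_freq <= n_docs.
                idf = math.log((n_docs + 1) / (doc_freq[term] + 0.5))
                score += counts[term] * idf
        if score > 0:
            scores[title] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


if __name__ == "__main__":
    sample = {
        "Lecture A": "vector space model for text retrieval",
        "Lecture B": "probabilistic retrieval model and bm25",
        "Lecture C": "introduction to text clustering",
    }
    for title, score in rank_lectures("vector space", sample):
        print(title, round(score, 3))
```

In a Flask view, a function like this would be called with the submitted query, and the returned titles would be passed to the results template.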
The dependency metapy only works on certain versions of Python 3, such as 3.5.10. It is recommended to set up a virtual environment with a version that works with metapy.
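A quick way to catch a mismatched interpreter before installing anything is a small version check, sketched below. The set `ASSUMED_SUPPORTED` and the function `is_supported` are hypothetical. Only Python 3.5 is taken from this README, so extend the set after verifying other versions yourself. The snippet avoids f-strings so that it also runs on 3.5.

```python
import sys

# Assumption: (major, minor) pairs that work with metapy in your setup.
# Only 3.5 is mentioned in this README; add others after testing them.
ASSUMED_SUPPORTED = {(3, 5)}


def is_supported(version_info=None, supported=ASSUMED_SUPPORTED):
    """Return True if the interpreter's major.minor is in `supported`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in supported


if __name__ == "__main__":
    if is_supported():
        print("Interpreter version is in the assumed-supported set.")
    else:
        print("Python {}.{} may not work with metapy; consider a virtual "
              "environment with another version.".format(*sys.version_info[:2]))
```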
- Clone the repository
git clone https://github.com/IEnjoyEatingCookies/CourseProject.git
- Install dependencies
pip install metapy pytoml flask coursera-dl
If the project already includes the dataset, the dataset steps below can be skipped.
- Enter your credentials in GetTranscript.py. To get your CAUTH, you must use Chrome. Go to Chrome Settings > Cookies and, in the dropdown, click https://www.coursera.org/. Then find the CAUTH cookie and copy its value.
- Download the raw dataset.
python GetTranscripts.py
- Build the dataset. Keep BuildDataset.py outside of the folder that contains the scraped data. Running the script creates allData.txt, which contains all of the video transcripts and text files from Coursera.
python BuildDataset.py
- Format the dataset.
python GetLessonTitle.py
- Run the Flask server.
flask run
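For readers curious about the dataset-building step above, the following is a rough sketch of merging scraped text files into a single allData.txt. It is not the contents of BuildDataset.py, which may work differently. The function name `build_dataset`, the `.txt` filter, and the one-line-per-file output layout are all assumptions made for illustration.

```python
import os


def build_dataset(data_dir, output_path="allData.txt", extensions=(".txt",)):
    """Merge every matching file under data_dir into output_path.

    Each source file becomes one line of the output (assumed layout).
    Returns the number of non-empty files that were merged.
    """
    merged = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(data_dir):
            dirs.sort()  # visit subfolders in a stable order
            for name in sorted(files):
                if not name.lower().endswith(extensions):
                    continue
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as src:
                    # Collapse newlines and repeated spaces into single spaces.
                    text = " ".join(src.read().split())
                if text:
                    out.write(text + "\n")
                    merged += 1
    return merged


if __name__ == "__main__":
    # "scraped_data" is a placeholder folder name, not the project's.
    print(build_dataset("scraped_data"), "files merged")
```

In this sketch the output file is written outside the scanned folder, so a second run does not pick up allData.txt as an input.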