FRIENDS quote prediction using FRIENDS embeddings (a Udacity Data Science Nanodegree project).
Apart from the common NumPy and pandas libraries, the project relies on Python 3.* and:
- Keras
- Scikit-learn
- FastText Common Crawl embeddings.
Data comes from here and contains more than 90K FRIENDS lines from all 10 seasons.
Starting from this data, conveniently curated and merged into a single CSV file here, the project uses a subset of the lines to build an LSTM-based language model for each FRIENDS character, starting from pre-trained FastText word embeddings. With these models in hand, a classification model is then created that is able to accurately predict which friend said a given line.
A more detailed description of the project can be found here.
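As a rough illustration of the "starting from pre-trained FastText word embeddings" step, the sketch below reads vectors in the FastText `.vec` text format (a header line with word count and dimension, then one word per line followed by its values) and builds an embedding matrix for a vocabulary. It is not taken from the project notebook: the function names `load_vec` and `build_embedding_matrix`, the `word_index` mapping, and the tiny in-memory sample are all assumptions made for the example.

```python
import io

import numpy as np


def load_vec(handle, vocab):
    """Read a FastText .vec stream, keeping only the words found in vocab."""
    header = handle.readline().split()
    dim = int(header[1])  # header line is "<word count> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word = parts[0]
        if word in vocab:
            vectors[word] = np.asarray(parts[1:], dtype="float32")
    return vectors, dim


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 is left for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:  # words without a pre-trained vector stay all-zero
            matrix[idx] = vec
    return matrix


# Tiny in-memory stand-in for a real .vec file (the values are made up).
sample = io.StringIO(
    "3 4\n"
    "how 0.1 0.2 0.3 0.4\n"
    "you 0.5 0.6 0.7 0.8\n"
    "doin 0.9 1.0 1.1 1.2\n"
)
word_index = {"how": 1, "you": 2, "pivot": 3}  # hypothetical tokenizer output
vectors, dim = load_vec(sample, set(word_index))
embedding_matrix = build_embedding_matrix(word_index, vectors, dim)
print(embedding_matrix.shape)
```

A matrix like this is commonly used to initialise the embedding layer of a Keras model placed in front of an LSTM; whether the notebook does exactly that is not stated here, so treat it as one plausible setup.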
- Pre-trained language models for each FRIENDS character trained with quotes from Seasons 1-8.
- Jupyter Notebook with code for downloading the data, creating the above-mentioned models, and training and testing a quote classification model on data from Seasons 9 and 10.
Pre-trained FRIENDS embeddings for each character are provided, but code is also available to train your own, for example with a different architecture. Running the entire notebook will download the FastText vectors, create the language models, and train the quote classifier.
Feel free to use the code here as you would like! Thank you, shilpibhattacharyya, for making the curated data available.