A SCRIpt-BAsed recommender system for movies. SCRiBa computes the similarity between movies according to their scripts. Each script is approximated by the English subtitles available on OpenSubtitles, which can be downloaded without restrictions through Tor.
Please refer to the report and to the notes for additional details on the algorithm, the pre-processing steps and the evaluation on Netflix data.
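As an illustration of how a subtitle file can stand in for a script, here is a minimal sketch that strips cue numbers, timing lines and inline markup from a subtitle string. It assumes the subtitles are in SubRip (`.srt`) format; the function name `srt_to_text` and the sample text are hypothetical, and the actual pre-processing steps of the project are the ones described in the report.

```python
import re

# Matches SubRip timing lines such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Matches inline markup such as <i>...</i>.
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt):
    """Return the dialogue of a SubRip subtitle string as one line of text.

    Hypothetical helper, not part of the project: assumes .srt input.
    """
    pieces = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank lines, cue counters and timing lines.
        # Caveat: a dialogue line made only of digits is skipped as well.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        pieces.append(TAG.sub("", line))
    return " ".join(pieces)


sample = """1
00:00:01,000 --> 00:00:04,000
<i>Hello</i> there.

2
00:00:05,000 --> 00:00:07,500
General Kenobi!
"""

print(srt_to_text(sample))  # prints: Hello there. General Kenobi!
```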
Status: Completed
Type: Academic project
Course: Data Mining
Development year(s): 2015-2016
Author(s): gcorsi, ShadowTemplate
Each script in the repository is needed to complete one pre-processing step. Please refer to the project report for information about the pipeline.
Clone the repository and install the required Python dependencies:
$ git clone https://github.com/ShadowTemplate/scriba.git
$ cd scriba/
$ pip install --user -r requirements.txt
- Python 3.4 - Programming language
- Python 2.7 - Programming language
- scikit-learn - TF-IDF feature extraction, linear kernel
- stem - Anonymous and parallel download with Tor
- Beautiful Soup - Web page scraping
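The similarity step named in the list above (TF-IDF features plus a linear kernel) can be sketched with scikit-learn as follows. This is an illustrative sketch and not the project's code: the movie titles, the toy texts, the `stop_words="english"` setting and the helper `most_similar` are assumptions made for the example. Because `TfidfVectorizer` L2-normalizes each row by default, the linear kernel of the TF-IDF matrix with itself equals the cosine similarity between documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-ins for subtitle texts (hypothetical data).
titles = ["Movie A", "Movie B", "Movie C"]
texts = [
    "the spaceship lands on a distant planet",
    "a spaceship crew explores a distant planet",
    "two friends open a bakery in paris",
]

# Build one TF-IDF row per movie; rows are L2-normalized by default.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(texts)

# Dot products of normalized rows, i.e. pairwise cosine similarities.
similarity = linear_kernel(tfidf, tfidf)


def most_similar(title, k=2):
    """Return the k movies most similar to `title`, best match first."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]  # indices by descending score
    return [(titles[i], float(similarity[idx][i])) for i in ranked if i != idx][:k]


for other, score in most_similar("Movie A"):
    print(other, round(score, 3))
```

Here "Movie B" ranks first for "Movie A" because the two texts share several terms, while "Movie C" shares none and scores zero.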
This project is not actively maintained, and issues or pull requests may be ignored.
This project is licensed under the GNU GPLv3 license. Please refer to the LICENSE.md file for details.
This README.md complies with this project template. Feel free to adopt it and reuse it.