CS109B - Final project: Predicting movie genres

Spring 2017, Harvard


  • Angela Ambroz
  • Keun-Hwi Lee
  • Johanna Ramos
  • Pranav Sidhwani

Final deliverables

Guide to our project

We consolidated all of the code into several notebooks, located here in the milestone_5 subdirectory. The notebooks from earlier milestones are in their respective directories.

The main update is in how we pre-processed and cleaned the data. We faced some limitations due to our AWS instance's memory constraints (memory, not storage), as well as setbacks with the multi-label vs. multi-class models.

Note: We did not use TensorBoard in the final write-up, but the code can be found here. Also, the data directory is git-ignored: the notebooks read their data from this directory, but its contents are not committed to source control.


Final dataset (.zip)

  • Pre-processed data for CNN
  • JSON format of the data, cleaned
  • Whole data set, used to create both the train and test sets
  • Original, uncleaned train set

This project's structure was based on the cookiecutter data science project template.