notebooks
src
.env.sample
.gitignore
CS109B_Project_Final_report_Team33.pdf
README.md
requirements.txt
test_environment.py


CS109B - Final project: Predicting movie genres

Spring 2017, Harvard

Team

  • Angela Ambroz
  • Keun-Hwi Lee
  • Johanna Ramos
  • Pranav Sidhwani

Final deliverables


Guide to our project

We consolidated all the code into several notebooks, which are located here, in the milestone_5 subdirectory. All of the previous milestone notebooks are in their respective directories.

The main update is in how we pre-processed and cleaned the data. We were limited by our AWS instance's memory (not storage) and ran into setbacks with the multi-label vs. multi-class models.
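To make the multi-label vs. multi-class distinction concrete, here is a minimal sketch of how the two targets differ for a single movie. The genre vocabulary, the function names, and the "first listed genre is the primary one" rule are assumptions made for illustration; they are not taken from the project notebooks.

```python
# Hypothetical genre vocabulary, for illustration only.
GENRES = ["Action", "Comedy", "Drama", "Romance"]


def multi_label_target(movie_genres, vocabulary=GENRES):
    """Multi-label: one 0/1 indicator per genre; several can be 1 at once."""
    present = set(movie_genres)
    return [1 if genre in present else 0 for genre in vocabulary]


def multi_class_target(movie_genres, vocabulary=GENRES):
    """Multi-class: exactly one class index per movie.

    Assumption: the first listed genre is treated as the primary one.
    """
    return vocabulary.index(movie_genres[0])


example = ["Comedy", "Romance"]
print(multi_label_target(example))  # [0, 1, 0, 1]
print(multi_class_target(example))  # 1
```

The multi-label target keeps every genre a movie belongs to, while the multi-class target has to discard all but one, which is one reason the two formulations are not interchangeable.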

Note: We did not use TensorBoard in the final write-up, but the code can be found here. Also, the data directory is git-ignored: the notebooks read their data from this directory, but its contents were not checked into source control.
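Because the data directory is not in source control, it has to be recreated locally before the notebooks can run. The sketch below shows one way to do that from a downloaded archive. The function name, the default `data` directory name, and the archive path are hypothetical, and the sketch assumes the archive's internal layout matches what the notebooks expect.

```python
import zipfile
from pathlib import Path


def ensure_data_dir(archive_path, data_dir="data"):
    """Unpack archive_path into data_dir if data_dir is missing or empty.

    Returns the sorted names of the entries found in data_dir afterwards.
    Both arguments are hypothetical placeholders, not paths from the repo.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # Only extract when the directory has no contents yet,
    # so existing local data is never overwritten.
    if not any(data_dir.iterdir()):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(data_dir)
    return sorted(p.name for p in data_dir.iterdir())
```

A call such as `ensure_data_dir("final_dataset.zip")` (file name assumed) would populate `data/` on first use and leave it untouched on later runs.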

Data

Final dataset (.zip)

Pre-processed data for CNN

JSON format of the data, cleaned

Whole data set, used to create both test and train

Original, uncleaned train set


This project template was based on the cookiecutter data science project template. #cookiecutterdatascience