Oscars Clustering and Prediction

Background

Many of us have watched the Oscars, but have you ever wondered whether there is a way to predict which movies will win ahead of time? That's our goal, and we tried a few different ways to do it.

To make this happen, we needed a fairly large dataset of all the Oscar nominees and winners so far. In case you haven't realized, that's a lot of films. Thankfully, we found a dataset on Kaggle with some interesting attributes for each entry, but not enough of them, so we had to scrape a good deal of additional data from IMDb ourselves.

Once we had collected our data, we had to combine the two datasets and clean them. Merging them was fairly straightforward, but cleaning was tricky. There were many subtleties (commas in large numbers, 1,000 abbreviated as 1K, special characters not translating correctly, etc.), but after some time we had a clean dataset.
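The number-format problems described above are easy to handle once they are spotted. Below is a minimal Python sketch of a helper that turns strings such as "1,234" or "1K" into integers. The function name `parse_count`, the extra "M" suffix, and returning `None` for unparseable values are assumptions for illustration, not the project's actual cleaning code.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Multipliers for abbreviated counts. "K" comes from the data described
# above; "M" is an assumed extra case added for illustration.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(raw: str) -> Optional[int]:
    """Convert strings such as '1,234', '1K' or '2.5M' to an int.

    Returns None for blank or unparseable values so they can be
    treated as missing data later in the pipeline.
    """
    # Drop surrounding whitespace and thousands separators.
    text = raw.strip().replace(",", "").upper()
    if not text:
        return None

    multiplier = 1
    if text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]

    try:
        # Decimal avoids float rounding (2.3 * 1000 as a float is
        # 2299.9999999999995); int() then truncates any fraction.
        return int(Decimal(text) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None
```

For example, `parse_count("1,234")` returns `1234`, `parse_count("2.3K")` returns `2300`, and `parse_count("n/a")` returns `None`.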

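With our clean dataset, we moved on to the fun part: clustering and predictive algorithms, including k-means clustering, Gaussian naive Bayes, decision trees, random forests, and support vector machines with RBF and sigmoid kernels.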

Source repository: Cameronwood611/datamining