
JackSnowWolf/EECS-E6690-final-project


EECS-E6690-final-project

Final project for EECS E6690 Statistical learning @ Columbia University

Introduction

This is our final project for EECS E6690 Statistical Learning. Our goals are to review a machine learning paper in depth, reproduce its results by reimplementing all of its models in R, explore new models and methods that can be applied to the same dataset, and document the project and our results. That documentation includes a comparison between our results and the original paper's, along with the reasons behind any differences.

Structure

All the main code is written in R. See the .Rmd and .R files for details.

.
├── data # data set storage
├── doc # documentation
├── figures # all figures in data visualization and analysis
├── methods # all methods applied in this project
├── LICENSE
└── README.md

Method

The original paper predicts the popularity of movies. Many attributes affect a movie's popularity, such as cast, genre, budget, production house, and rating. Social media platforms such as Twitter and YouTube are also major venues where people share their views about movies. The original paper uses Linear Regression and a J48 decision tree to make predictions from these two kinds of features: conventional features and social media features. We reproduce those results and then try other methods for predicting movie popularity, including Generalized Additive Models, Generalized Linear Regression, Support Vector Machines, Naive Bayes, LDA and QDA, Artificial Neural Networks, and Random Forests. Finally, we compare all of these methods and discuss the reasons behind the differences in their results.
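J48 is Weka's implementation of Quinlan's C4.5 decision tree. C4.5 ranks candidate splits by gain ratio, which is information gain divided by split information. The sketch below illustrates that criterion in Python rather than in the project's R code. It is a minimal, simplified version: it assumes categorical attributes only, and it ignores C4.5's handling of continuous thresholds, missing values, and pruning. The toy `popularity` and `sequel` lists are hypothetical and do not come from the dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, attribute):
    """Simplified C4.5-style gain ratio of splitting `labels` on a categorical `attribute`.

    `labels` and `attribute` are parallel lists: attribute[i] is the value of the
    candidate split feature for the example whose class is labels[i].
    """
    n = len(labels)
    # Group class labels by attribute value (one branch per distinct value).
    branches = {}
    for value, label in zip(attribute, labels):
        branches.setdefault(value, []).append(label)
    # Information gain: parent entropy minus size-weighted child entropy.
    weighted = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - weighted
    # Split information: entropy of the branch sizes; it penalises many-way splits.
    split_info = -sum(len(b) / n * log2(len(b) / n) for b in branches.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: popularity class vs. a made-up "is a sequel" attribute.
popularity = ["high", "high", "low", "low", "high", "low"]
sequel = ["yes", "yes", "no", "no", "no", "no"]
print(round(gain_ratio(popularity, sequel), 3))  # prints 0.5
```

A tree builder following this scheme would evaluate `gain_ratio` for every candidate attribute at a node, split on the best one, and recurse on each branch.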

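Here are some plots from the data visualization and analysis stages of our work. All figures are stored in the figures directory.

  • Data visualization: [2 figures not reproduced here]

  • Analysis: [2 figures not reproduced here]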

Evaluation and Comparison

Method                          Conventional features   Social media features
Linear Regression               0.358                   0.446
Generalized Additive Model      0.411                   0.393
Generalized Linear Regression   0.429                   0.357
LDA                             0.339                   0.303
QDA                             0.285                   0.393
Naive Bayesian                  0.362                   0.404
Support Vector Machine          0.436                   0.385
ANN                             0.630                   0.667
Random Forest                   0.625                   0.693
J48                             0.629                   0.685
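The table can be summarised with a few lines of code. The sketch below is in Python rather than the project's R code. It picks the best method for each feature set and counts how often social media features beat conventional features. It assumes the values are scores where higher is better, because the table does not name the metric. The helpers `best_method` and `social_media_wins` are hypothetical names introduced here for illustration.

```python
# Scores copied from the table: (conventional features, social media features).
# Assumption: higher is better (the metric is not named in the table).
SCORES = {
    "Linear Regression": (0.358, 0.446),
    "Generalized Additive Model": (0.411, 0.393),
    "Generalized Linear Regression": (0.429, 0.357),
    "LDA": (0.339, 0.303),
    "QDA": (0.285, 0.393),
    "Naive Bayesian": (0.362, 0.404),
    "Support Vector Machine": (0.436, 0.385),
    "ANN": (0.630, 0.667),
    "Random Forest": (0.625, 0.693),
    "J48": (0.629, 0.685),
}


def best_method(column):
    """Method with the highest score in column 0 (conventional) or 1 (social media)."""
    return max(SCORES, key=lambda name: SCORES[name][column])


def social_media_wins():
    """Methods whose social media score exceeds their conventional score."""
    return [name for name, (conv, social) in SCORES.items() if social > conv]


print(best_method(0))            # ANN
print(best_method(1))            # Random Forest
print(len(social_media_wins()))  # 6
```

With these numbers, ANN scores highest on conventional features and Random Forest scores highest on social media features. Six of the ten methods score higher with social media features than with conventional features.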


Documentation

Full details of the project are in our presentation slides and report. The doc directory holds the project documentation.

Citation

Contact
