EECS-E6690-final-project

Final project for EECS E6690 Statistical Learning at Columbia University

Introduction

This is our final project for EECS E6690 Statistical Learning. Our goals are to review a machine learning paper in depth, reproduce its results by reimplementing all of its models in R, explore new models or methods that can be applied to the same dataset, and document the project and our results, including a comparison between our results and the original paper's and the reasons for any differences.

Structure

All of the main code is written in R. See the .Rmd and .R files for details.

.
├── data # data set storage
├── doc # documentation
├── figures # all figures in data visualization and analysis
├── methods # all methods applied in this project
├── LICENSE
└── README.md

Method

The original paper predicts the popularity of movies. Many attributes, such as cast, genre, budget, production house, and rating, affect a movie's popularity, and social media platforms such as Twitter and YouTube are major venues where people share their views about movies. The original paper uses Linear Regression and a J48 decision tree to make predictions from these two kinds of features (conventional features and social media features). We reproduce its results and also try other methods for predicting movie popularity, including Generalized Additive Models, Generalized Linear Regression, Support Vector Machines, Naive Bayes, LDA and QDA, Artificial Neural Networks, and Random Forests. Finally, we compare all of these methods and discuss the reasons behind the results.
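J48 is Weka's implementation of the C4.5 decision tree, which chooses each split by gain ratio: the information gain of a split divided by its split information, so that attributes with many distinct values are not unfairly favored. The sketch below illustrates that criterion for one categorical attribute. It is written in Python rather than the R used in this project, the function names are hypothetical, and the genre and hit/flop labels in the usage example are made-up toy data, not the movie dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio for splitting `labels` on a categorical feature."""
    n = len(labels)
    # Group the class labels by the value of the splitting attribute.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus the size-weighted child entropy.
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - weighted
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    # A single-valued attribute has no split information, so it gets no credit.
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    genre = ["action", "action", "drama", "drama", "comedy", "comedy"]
    popular = ["hit", "hit", "flop", "flop", "hit", "flop"]
    print("gain ratio of genre:", gain_ratio(genre, popular))
```

In this toy example, genre separates "action" and "drama" perfectly but not "comedy", so the information gain is 2/3 bit, which is then divided by log2(3) because genre splits the data into three equal groups.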

Here are some of the plots we produced for data visualization and analysis (all figures are stored in the figures directory).

Data Visualization:

Analysis

Evaluation and Comparison

Method	Conventional Features	Social Media Features
Linear Regression	0.358	0.446
Generalized Additive Model	0.411	0.393
Generalized Linear Regression	0.429	0.357
LDA	0.339	0.303
QDA	0.285	0.393
Naive Bayesian	0.362	0.404
Support Vector Machine	0.436	0.385
ANN	0.630	0.667
Random Forest	0.625	0.693
J48	0.629	0.685
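To make the comparison in the table easy to re-check, the sketch below copies its scores into a Python dictionary and reports the top method for each feature set, plus the methods that scored higher with social media features than with conventional features. It assumes a higher score is better; this README does not name the metric, and the helper names are hypothetical.

```python
# Scores copied from the table above, as (conventional, social media).
# Assumption: higher is better; the README does not name the metric.
SCORES = {
    "Linear Regression": (0.358, 0.446),
    "Generalized Additive Model": (0.411, 0.393),
    "Generalized Linear Regression": (0.429, 0.357),
    "LDA": (0.339, 0.303),
    "QDA": (0.285, 0.393),
    "Naive Bayesian": (0.362, 0.404),
    "Support Vector Machine": (0.436, 0.385),
    "ANN": (0.630, 0.667),
    "Random Forest": (0.625, 0.693),
    "J48": (0.629, 0.685),
}

# Column indices into each (conventional, social media) tuple.
CONVENTIONAL, SOCIAL = 0, 1


def best_method(scores, column):
    """Name of the method with the highest score in the given column."""
    return max(scores, key=lambda m: scores[m][column])


def improved_by_social_media(scores):
    """Methods whose social media score exceeds their conventional score, sorted by name."""
    return sorted(m for m, (conv, social) in scores.items() if social > conv)


if __name__ == "__main__":
    print("Best (conventional):", best_method(SCORES, CONVENTIONAL))
    print("Best (social media):", best_method(SCORES, SOCIAL))
    print("Higher with social media:", improved_by_social_media(SCORES))
```

With these numbers, ANN has the highest conventional-feature score and Random Forest the highest social-media-feature score, while LDA, QDA, Naive Bayes, SVM, GAM, and GLM all stay well below the tree-based methods and ANN.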

Documentation

Full details of the project are given in our presentation slides and report (see the doc directory).

Citation

Contact

Chong Hu UNI: ch3467
Chen Wenjie UNI: wc2685
Zhang Haoran UNI: hz2619

Repository: JackSnowWolf/EECS-E6690-final-project

Folders and files

Latest commit

History

Repository files navigation

EECS-E6690-final-project

Introduction

Structure

Method

Evaluation ans Comparison

Documentation

Citation

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages