
Cherrue/predict-drama-rating


predictDramaRating

Abstract

Predict Korean drama TV ratings from user behavior on a social network, using Python.

Nationality : South Korea

TV rating - http://www.nielsenkorea.co.kr/

Drama Clip (comment data) - http://tv.naver.com/

Body

South Korea offers two favorable conditions for studying what influences TV ratings. First, it is a small country, which is why TV ratings can be measured comparatively accurately. Second, Koreans converge on NAVER.com for search and social activity.

I therefore hypothesize that NAVER users' behavior on drama clips is related to TV ratings. To test this hypothesis, I will use:

Statistical method : multiple linear regression

Artificial intelligence : Support Vector Machine, Deep Learning

Step

Parse DATA

Define a schema for the data. Write spiders in Python that parse Nielsen and Naver.

Libs : Selenium with chromedriver (virtual browser), ODBC manager

Data Cleansing

Check the data description, validate the data, and cleanse it.
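A cleansing pass of this kind might look like the sketch below. It is only an illustration: the column names (`drama`, `episode`, `rating`), the sample rows, and the 0 to 100 percent validity range are assumptions, not the project's actual schema or rules.

```python
import pandas as pd

# Hypothetical raw rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "drama": ["A", "A", "B", "B", "B"],
    "episode": [1, 1, 1, 2, 3],
    "rating": ["5.2", "5.2", "7.9", None, "120.0"],
})


def cleanse(df):
    """Drop duplicate rows, missing ratings, and ratings outside 0-100 percent."""
    out = df.drop_duplicates().copy()
    # Ratings arrive as text here; anything unparsable becomes NaN.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    out = out.dropna(subset=["rating"])
    # Assumed validity rule: a rating is a percentage.
    out = out[out["rating"].between(0, 100)]
    return out.reset_index(drop=True)


clean = cleanse(raw)
print(clean.describe())
```

Calling `describe()` on the result gives a quick check of count, mean, and range before the data is joined.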

Data retrieval

Join the data with SQL. Draw graphs (scatter, linear, ...) in MS Excel. Calculate the correlation between variables in MS Excel or Python.
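For the Python route, the join and the correlation could be sketched with pandas as below. The two tables, their column names, and the values are made up for illustration; in the project the rows would come from the MySQL database, and the join itself is done in SQL.

```python
import pandas as pd

# Hypothetical tables standing in for the Nielsen and Naver data.
ratings = pd.DataFrame({
    "drama": ["A", "A", "A", "B"],
    "episode": [1, 2, 3, 1],
    "rating": [5.0, 6.0, 7.0, 9.0],
})
clips = pd.DataFrame({
    "drama": ["A", "A", "A", "B"],
    "episode": [1, 2, 3, 1],
    "comment_count": [100, 120, 140, 300],
})

# Inner join on the shared keys, similar in effect to a SQL JOIN ... ON.
joined = ratings.merge(clips, on=["drama", "episode"], how="inner")

# Pearson correlation between the rating and the comment count.
corr = joined["rating"].corr(joined["comment_count"])
print(joined)
print(f"Pearson r = {corr:.3f}")
```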

Linear Regression

Multiple linear regression for predicting TV ratings from Naver data.

Libs : numpy, pandas, scikit-learn.linear_model -> develop linear regression

matplotlib.pyplot, seaborn -> data visualization
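A minimal sketch of the regression step with scikit-learn follows. The data is synthetic and noise-free, and the three features and their coefficients are invented for the example, so it only shows the shape of the workflow, not results from this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: 3 hypothetical features per episode.
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, size=(50, 3))
true_coef = np.array([0.05, 0.02, -0.01])
y = 3.0 + X @ true_coef  # noise-free, so the fit should recover the coefficients

# Ordinary least squares with an intercept term.
model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on the training data:", model.score(X, y))
```

With real data, the score should be computed on held-out episodes rather than on the training set.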

Support Vector Machine                          

The relationship between TV ratings and NAVER comment data can be treated as nonlinear, and the comment data has 7 dimensions. So I predict TV ratings with an SVM (kernel : 'rbf', i.e. Gaussian).

Reference : https://tensorflow.blog/%ED%8C%8C%EC%9D%B4%EC%8D%AC-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D/ http://scikit-learn.org/stable/
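As a rough sketch of this step, scikit-learn's `SVR` can be fitted with an RBF kernel on 7 features. Everything specific here is an assumption made for the example: the synthetic data, the nonlinear target, the `C` and `gamma` values, and the scaling step, none of which are taken from the project.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the 7-dimensional comment features.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 7))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2  # a deliberately nonlinear target

# Scale features first: the RBF kernel is sensitive to feature scale.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma="scale"))

# Fit on the first 150 rows and predict the remaining 50.
svr.fit(X[:150], y[:150])
pred = svr.predict(X[150:])
print("first predictions:", pred[:5])
```

In practice `C` and `gamma` would be tuned, for example by cross-validation, since the small amount of data makes overfitting likely.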

Neural Network - Word2Vec, Doc2Vec

I train models with gensim's word2vec and doc2vec, and test the direction of the comment vectors.
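One common way to compare the direction of two embedding vectors is cosine similarity. The sketch below assumes that this is the kind of test meant here, and uses hand-written vectors in place of vectors produced by a trained gensim model.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 same direction, -1 opposite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical comment vectors; real ones would come from word2vec or doc2vec.
v1 = np.array([1.0, 2.0, 3.0])
v2 = 2.0 * v1   # same direction, different length
v3 = -v1        # opposite direction

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Because the measure ignores vector length, two comments of very different sizes can still be compared by direction alone.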

Development environment

python 3.6

Anaconda3 4.3.1 (matplotlib 2.0.0, numpy 1.11.3, scikit-learn 0.18.1, seaborn 0.7.1)

Morpheme tagging : konlpy

gensim 2.3.0

Source IDE : Wing IDE 101

Visualize IDE : Jupyter notebook

MySQL 5.7.11 (local server)

MS Excel 2010

Problem

Adequate data is severely lacking, but this should improve as more data accumulates over time.
