A hub that stores data science and analytics done on movie related data. The techniques used include EDA, NLP topic analysis, Recommender System, and advanced visualization in Tableau

Olliang/All-About-Movie-Data

What Can You Do with Movie Data?


Introduction

As a movie and TV show enthusiast, I am intrigued by what data can reveal about user viewing behavior, and by how to deliver the right content to the right people from a user's standpoint. This repository is a playground for my data analysis on movie-related data.

Data

  1. MovieLens 20M data (https://grouplens.org/datasets/movielens/20m/)

  2. MovieLens 1M data (https://grouplens.org/datasets/movielens/1m/)

  3. Netflix Prize data on Kaggle (https://www.kaggle.com/netflix-inc/netflix-prize-data)


Data Science and Analysis

1. Exploratory Data Analysis (EDA)

File: MovieLens_EDA.ipynb - Jupyter notebook that contains the EDA on the MovieLens dataset

a. How is the ratingcount distributed across movies released in different years?

b. How are the ratingcount and usercount distributed across different genres?

c. What are the popular topics of movies distributed by different rating scores?

==> Techniques used: NLP analysis and wordcloud visualization
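
Question (a) above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the sample rows are hypothetical, and it assumes each title ends with its release year in parentheses, as in "Toy Story (1995)". The helper names `release_year` and `rating_count_by_year` are made up for this example.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the MovieLens movie and rating tables.
movies = {
    1: "Toy Story (1995)",
    2: "Jumanji (1995)",
    3: "Heat (1995)",
    4: "Titanic (1997)",
}
ratings = [  # (userId, movieId, rating)
    (1, 1, 4.0), (2, 1, 5.0), (3, 2, 3.0),
    (1, 4, 4.5), (2, 4, 2.0), (3, 4, 3.5),
]

# Assumption: the release year is the four-digit number in trailing parentheses.
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def release_year(title):
    """Return the trailing four-digit year in a title, or None if absent."""
    match = YEAR_PATTERN.search(title)
    return int(match.group(1)) if match else None


def rating_count_by_year(movies, ratings):
    """Count how many ratings were given to movies of each release year."""
    counts = Counter()
    for _user_id, movie_id, _rating in ratings:
        year = release_year(movies.get(movie_id, ""))
        if year is not None:
            counts[year] += 1
    return counts


counts = rating_count_by_year(movies, ratings)
for year in sorted(counts):
    print(year, counts[year])
```

Movies with no ratings (Heat in this sample) contribute nothing to the count, which is why the distribution reflects rating activity rather than the number of releases.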


2. Dashboard Visualization

Business Problem: The number of movies released grew exponentially from the 1880s to the 2010s. However, more ratings are seen on movies released between 1993 and 1996 than on those from other periods. While the number of movies released kept increasing exponentially after this period, the number of ratings on newer movies decreased exponentially. Is this a natural peak, because older movies have had longer to accumulate ratings? Did the MovieLens website become less popular after that period? Which movies caused this surge in popularity? Are those ratings mostly positive?

File: Movie Rating Dashboard Analysis.md - Markdown file recording the storyline of the analysis on the MovieLens data

dashboard v1



3. Building Recommender System

a. Recommender System with Memory-based Collaborative Filtering

File: Memory_Based_CF.ipynb - a notebook explaining and showcasing how memory-based collaborative filtering works
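
As a rough illustration of the idea (not the notebook's code), user-based memory-based collaborative filtering predicts a missing rating from the ratings of similar users. The sketch below assumes cosine similarity over co-rated movies and a similarity-weighted average; the users, ratings, and function names are all hypothetical.

```python
import math

# Hypothetical user -> {movie: rating} data.
ratings = {
    "alice": {"Heat": 5.0, "Titanic": 1.0, "Toy Story": 4.0},
    "bob": {"Heat": 4.0, "Titanic": 2.0, "Toy Story": 5.0, "Jumanji": 4.0},
    "carol": {"Heat": 1.0, "Titanic": 5.0, "Jumanji": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[movie]
        denominator += abs(sim)
    # None signals that no other user has rated the movie.
    return numerator / denominator if denominator else None


prediction = predict("alice", "Jumanji", ratings)
print(prediction)
```

Because alice's ratings resemble bob's far more than carol's, the prediction lands closer to bob's rating of Jumanji than to carol's. Nothing is trained here: the ratings are kept "in memory" and similarities are computed at prediction time.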

b. Recommender System with Matrix Factorization

File: Matrix_Factorization_CF.ipynb - a notebook explaining and showcasing how matrix factorization works using the SVD algorithm
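
To make the factorization idea concrete, here is a dependency-free sketch. Note the swap: rather than an exact singular value decomposition, it learns the user and movie factor matrices by stochastic gradient descent on the observed ratings (the approach often called Funk SVD), so it may differ from what the notebook does. The data, hyperparameters, and names (`P`, `Q`, `train`) are assumptions for illustration.

```python
import math
import random

# Hypothetical observed ratings: (user index, movie index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
    (3, 0, 1.0), (3, 3, 4.0),
    (4, 1, 1.0), (4, 2, 5.0), (4, 3, 4.0),
]
N_USERS, N_MOVIES, N_FACTORS = 5, 4, 2


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings only."""
    squared = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


def init_factors(rows, cols, rng):
    """Small positive random starting values for a factor matrix."""
    return [[rng.uniform(0.1, 0.5) for _ in range(cols)] for _ in range(rows)]


def train(P, Q, ratings, epochs=500, lr=0.01, reg=0.02):
    """Update P and Q in place with regularized SGD steps."""
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(len(P[u])):
                p_uf, q_if = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)


rng = random.Random(0)  # fixed seed so runs are repeatable
P = init_factors(N_USERS, N_FACTORS, rng)
Q = init_factors(N_MOVIES, N_FACTORS, rng)

initial_rmse = rmse(P, Q, ratings)
train(P, Q, ratings)
final_rmse = rmse(P, Q, ratings)

# Predicted rating for a pair that was never observed (user 1, movie 1).
predicted = dot(P[1], Q[1])
print(initial_rmse, final_rmse, predicted)
```

Each rating is approximated by the dot product of a user vector and a movie vector, so unobserved pairs get a prediction for free once the factors are learned.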

c. Content-based Recommender System

File: Content_based_Filtering.ipynb - a notebook explaining and showcasing how content-based filtering works using the TF-IDF algorithm
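
A minimal sketch of the TF-IDF idea, under stated assumptions: each movie is described by a short, hypothetical string of genre-like tags, term weights use the plain `tf * log(N / df)` variant, and movies are compared by cosine similarity. The notebook may use a different text field or a library implementation; the function names here are illustrative only.

```python
import math
from collections import Counter

# Hypothetical, simplified tag strings for each movie.
movies = {
    "Toy Story": "animation children comedy",
    "Jumanji": "adventure children fantasy",
    "Heat": "action crime thriller",
    "Casino": "crime drama",
}


def tfidf_vectors(docs):
    """Build a sparse {term: weight} TF-IDF vector for every document."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(
        term for tokens in tokenized.values() for term in set(tokens)
    )
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(title, vectors):
    """Return the other movie whose TF-IDF vector is closest to `title`."""
    scores = {
        other: cosine(vectors[title], vec)
        for other, vec in vectors.items()
        if other != title
    }
    return max(scores, key=scores.get)


vectors = tfidf_vectors(movies)
print(most_similar("Heat", vectors))
print(most_similar("Toy Story", vectors))
```

Unlike collaborative filtering, this needs no ratings at all: recommendations come purely from overlap in the movies' descriptions, with rarer shared terms counting for more.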



References:


https://github.com/adashofdata/nlp-in-python-tutorial
https://github.com/khanhnamle1994/movielens
