Simple data science code and blog post for Udacity's Data Science Nanodegree
This is a simple project that investigates a movies dataset and the impact of genres on voting and popularity. This kind of investigation can help in many cases. For example, it can guide which movies to include in offers or sales by favoring the genres with the highest popularity or average vote. The project aims to answer the following questions:
- What is the movie with the highest average vote? What are its genres? How many votes has it received?
- What is the movie with the highest vote count? What are its genres and its average vote?
- What are the five most common genres?
- Which genre has the highest average vote?
- Which genre has the highest popularity?
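The questions above can be answered with a few pandas operations. The sketch below is a minimal illustration, not the project's actual script: it assumes the CSV has columns named `original_title`, `genres` (several genres per movie, separated by `|`), `vote_average`, `vote_count` and `popularity`, and the helper names are hypothetical.

```python
import pandas as pd


def explode_genres(movies: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (movie, genre) pair.

    Assumes a pipe-delimited 'genres' column such as "Action|Adventure".
    Movies with no genre listed are dropped.
    """
    exploded = movies.dropna(subset=["genres"]).copy()
    exploded["genre"] = exploded["genres"].str.split("|")
    return exploded.explode("genre")


def top_movie(movies: pd.DataFrame, column: str) -> pd.Series:
    """Return the row of the movie with the largest value in `column`."""
    return movies.loc[movies[column].idxmax()]


def genre_summary(movies: pd.DataFrame) -> pd.DataFrame:
    """Count movies and average the vote and popularity per genre.

    The result is sorted from the most to the least common genre.
    """
    per_genre = explode_genres(movies).groupby("genre").agg(
        movie_count=("original_title", "size"),
        mean_vote=("vote_average", "mean"),
        mean_popularity=("popularity", "mean"),
    )
    return per_genre.sort_values("movie_count", ascending=False)
```

For example, `top_movie(movies, "vote_average")` and `top_movie(movies, "vote_count")` return the rows for the first two questions, `genre_summary(movies).head(5)` lists the five most common genres, and calling `idxmax()` on its `mean_vote` or `mean_popularity` column picks the top genre for the last two. Because each movie is split into one row per genre, a movie with several genres counts towards each of them.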
The only requirements are Python 3, pandas and matplotlib.
- IMDB 5000 Movie Dataset (21 columns and 10866 rows)
- Python script that runs a simple exploration and analysis of the dataset
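A loading step along these lines could precede the analysis. This is a hedged sketch: the column names in `REQUIRED_COLUMNS` are assumptions about the CSV header, and `clean_movies` / `load_movies` are hypothetical helpers rather than functions from the project's script.

```python
import pandas as pd

# Column names the analysis relies on. These are assumed names and may
# need adjusting to match the actual CSV header.
REQUIRED_COLUMNS = ["original_title", "genres", "vote_average", "vote_count", "popularity"]


def clean_movies(movies: pd.DataFrame) -> pd.DataFrame:
    """Check that the expected columns exist and drop rows missing any of them."""
    missing = [col for col in REQUIRED_COLUMNS if col not in movies.columns]
    if missing:
        raise ValueError(f"dataset is missing columns: {missing}")
    return movies.dropna(subset=REQUIRED_COLUMNS).reset_index(drop=True)


def load_movies(path: str) -> pd.DataFrame:
    """Read the CSV at `path`, report its shape and return the cleaned frame."""
    movies = pd.read_csv(path)
    rows, cols = movies.shape
    print(f"loaded {rows} rows x {cols} columns")
    return clean_movies(movies)
```

Dropping incomplete rows keeps movies with no genre or vote information out of the later group-by results; whether to drop or impute such rows is a judgment call for the actual data.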
- The movie with the highest average vote belongs to the Documentary genre.
- The movie with the highest vote count belongs to the 'Action', 'Adventure', 'Mystery', 'Science Fiction' and 'Thriller' genres.
- Drama is the most common genre, followed by Comedy, then Thriller.
- Documentary movies have a higher average vote than movies of any other genre.
- Adventure is the most popular genre, followed by Science Fiction and Fantasy.
- The least popular genres are Foreign and Documentary.
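A popularity ranking like the one above is easiest to read as a chart. The following is a minimal matplotlib sketch, assuming the mean popularity per genre is already available as a pandas Series indexed by genre name; the function name `plot_genre_popularity` and its `out_path` parameter are hypothetical, not taken from the project's script.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_genre_popularity(mean_popularity: pd.Series, out_path=None):
    """Draw a horizontal bar chart of mean popularity per genre.

    Bars are sorted ascending, so barh places the most popular genre at the top.
    The figure is saved only when `out_path` is given; it is always returned.
    """
    ordered = mean_popularity.sort_values()
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(list(ordered.index), ordered.values)
    ax.set_xlabel("Mean popularity")
    ax.set_ylabel("Genre")
    ax.set_title("Average popularity by genre")
    fig.tight_layout()
    if out_path is not None:
        fig.savefig(out_path)
    return fig
```

A horizontal layout keeps long genre names such as 'Science Fiction' readable without rotating tick labels.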
The dataset is available on Kaggle: https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset