Investigating_TMDb_Movie_Dataset

An investigation of data on a collection of movies from TMDb. This project was done as part of Udacity's Data Analyst Nanodegree.

The TMDb Movie dataset was selected for investigation using NumPy and pandas. It contains information on around 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.

  • Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters.
  • The final two columns, whose names end in “_adj”, show the budget and revenue of the associated movie in 2010 dollars, accounting for inflation over time.
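
As a minimal sketch of how the pipe-separated columns might be handled in pandas, the example below splits ‘genres’ into one row per genre. The two sample rows and the `original_title` column name are assumptions made for illustration; they are not taken from the dataset itself.

```python
import pandas as pd

# Hypothetical two-row sample; the real dataset has around 10,000 rows.
# "original_title" is an assumed column name used only for illustration.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "genres": ["Action|Adventure", "Drama"],
})

# Split each value on the pipe character, then give every genre its own row.
genres_long = (
    movies.assign(genre=movies["genres"].str.split("|"))
          .explode("genre")
          .drop(columns="genres")
          .reset_index(drop=True)
)

# Count how many movies fall under each genre.
genre_counts = genres_long["genre"].value_counts()
print(genres_long)
print(genre_counts)
```

The same pattern should apply to ‘cast’ or any other pipe-separated column.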

Outline of analysis

  • Introduction
  • Data Wrangling
  • Exploratory Data Analysis
  • Conclusions

Research Questions

  • How have release numbers changed over the years?
  • What are the top 10 highest-grossing movies? Is the list different from the movies with the biggest profit?
  • Which directors, production companies, and actors are associated with the most profitable movies?
  • Is a movie's popularity associated with its month of release?
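
The questions above could be approached along the following lines. This is a rough sketch, not the project's actual analysis: the sample rows are invented, and the column names (`original_title`, `release_date`, `popularity`, `budget_adj`, `revenue_adj`) and the ISO date format are assumptions about how the dataset is laid out.

```python
import pandas as pd

# Hypothetical sample rows; column names and date format are assumptions.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "release_date": ["2010-01-15", "2010-07-04", "2011-07-20", "2011-12-25"],
    "popularity": [1.0, 3.0, 5.0, 2.0],
    "budget_adj": [250.0, 10.0, 50.0, 20.0],
    "revenue_adj": [300.0, 150.0, 120.0, 25.0],
})
movies["release_date"] = pd.to_datetime(movies["release_date"])

# Number of releases per year.
releases_per_year = movies["release_date"].dt.year.value_counts().sort_index()

# Profit as inflation-adjusted revenue minus inflation-adjusted budget.
movies["profit_adj"] = movies["revenue_adj"] - movies["budget_adj"]

# Compare the top-grossing list with the most-profitable list
# (top 2 here; the real analysis would use 10).
top_by_revenue = movies.nlargest(2, "revenue_adj")["original_title"].tolist()
top_by_profit = movies.nlargest(2, "profit_adj")["original_title"].tolist()

# Mean popularity grouped by month of release.
popularity_by_month = (
    movies.groupby(movies["release_date"].dt.month)["popularity"].mean()
)

print(releases_per_year)
print(top_by_revenue, top_by_profit)
print(popularity_by_month)
```

In this toy sample the two top lists differ, because a high-grossing movie with a large budget can still have a small profit.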