This report mainly gives an overview of the data analysis process (the individual steps are not covered in detail), using data from The Movie Database (TMDb).

YashMotwani/TMDB-Movies-Dataset-Investigation-

TMDB 5000 Movie Dataset Analysis

Introduction

This dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.

  • Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters.
  • Some values in the ‘cast’ column contain odd characters. They do not need to be cleaned and can be left as is.
  • The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
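A minimal pandas sketch of how the pipe-separated and "_adj" columns described above might be handled. It uses a few hypothetical in-memory rows instead of the real file. The column names 'original_title', 'budget_adj', and 'revenue_adj' and the filename 'tmdb-movies.csv' are assumptions, not taken from this report.

```python
import io

import pandas as pd

# Hypothetical sample rows for illustration only. The real data would come from
# the TMDb CSV, e.g. pd.read_csv("tmdb-movies.csv") (filename assumed).
SAMPLE_CSV = """original_title,genres,cast,budget_adj,revenue_adj
Movie A,Action|Adventure|Science Fiction,Actor 1|Actor 2,100000000,250000000
Movie B,Drama|Romance,Actor 3,20000000,15000000
Movie C,Action|Drama,Actor 2|Actor 4,50000000,120000000
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Split each pipe-separated 'genres' string into a list, then give every
# (movie, genre) pair its own row so genres can be counted or grouped.
genres = df[["original_title", "genres"]].copy()
genres["genres"] = genres["genres"].str.split("|")
genres = genres.explode("genres")

genre_counts = genres["genres"].value_counts()
print(genre_counts)

# The *_adj columns are already expressed in 2010 dollars, so they can be
# compared across release years directly: inflation-adjusted profit per movie.
df["profit_adj"] = df["revenue_adj"] - df["budget_adj"]
print(df[["original_title", "profit_adj"]])
```

Because explode repeats the remaining columns for each list element, a movie with three genres is counted once under each of them. The same split-and-explode pattern applies to the 'cast' column.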

Objectives:

  • Know all the steps involved in a typical data analysis process
  • Be comfortable posing questions that can be answered with a given dataset and then answering those questions
  • Investigate problems in a dataset and wrangle the data into a format that can be used
  • Communicate the results of your analysis
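The objectives above (pose a question, wrangle the data, explore it, and communicate the answer) can be sketched end to end with pandas and Matplotlib. This is an illustration under stated assumptions, not the analysis from this report: the sample rows are hypothetical, the column names 'id', 'release_year', and 'budget_adj' are assumed, and treating a budget of 0 as a missing value is an assumed cleaning rule.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumed to match the TMDb CSV.
SAMPLE_CSV = """id,original_title,release_year,budget_adj
1,Movie A,2010,100000000
2,Movie B,2010,0
3,Movie C,2012,50000000
3,Movie C,2012,50000000
4,Movie D,2012,30000000
"""


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and treat a budget of 0 as missing (assumed rule)."""
    df = df.drop_duplicates()
    df = df.assign(budget_adj=df["budget_adj"].replace(0, np.nan))
    return df.dropna(subset=["budget_adj"])


def mean_budget_by_year(df: pd.DataFrame) -> pd.Series:
    """Answer the question: how does the mean adjusted budget vary by release year?"""
    return df.groupby("release_year")["budget_adj"].mean()


def plot_budget_by_year(series: pd.Series):
    """Communicate the result as a simple labelled bar chart."""
    fig, ax = plt.subplots()
    series.plot(kind="bar", ax=ax)
    ax.set_xlabel("Release year")
    ax.set_ylabel("Mean budget (2010 dollars)")
    ax.set_title("Mean inflation-adjusted budget by release year")
    fig.tight_layout()
    return fig


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
clean = wrangle(raw)
result = mean_budget_by_year(clean)
print(result)
fig = plot_budget_by_year(result)
```

Keeping each step in its own function makes it easy to rerun the analysis after changing a cleaning rule. To save the chart for a report, call fig.savefig("budget_by_year.png") on the returned figure (filename hypothetical).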

Software needed:

  • An installation of Python, plus the following libraries:
      1. pandas
      2. NumPy
      3. Matplotlib
      4. csv (part of the Python standard library, so it does not need to be installed separately)
  • A text editor, like VS Code or Atom.
  • A terminal application (Terminal on Mac and Linux, or Cygwin on Windows).
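Before starting, a short script like the following can confirm that the libraries listed above are importable and record their versions for reproducibility. It is a sketch, not part of the original repository, and the function name check_environment is hypothetical. The third-party libraries can typically be installed with pip install pandas numpy matplotlib; csv ships with Python.

```python
import csv  # standard library module: ships with Python, nothing to install
import importlib
import sys


def check_environment() -> dict:
    """Return the Python version and the installed version of each third-party library."""
    versions = {"python": sys.version.split()[0]}
    for name in ("pandas", "numpy", "matplotlib"):
        module = importlib.import_module(name)  # raises ImportError if the library is missing
        versions[name] = module.__version__
    return versions


if __name__ == "__main__":
    for lib, version in check_environment().items():
        print(f"{lib}: {version}")
    print("csv module available:", hasattr(csv, "reader"))
```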

Installation links for the software:

References:

  1. TMDB Movies Datasets
  2. Pandas Documentation
  3. Matplotlib Documentation
