This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
- Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters.
- There are some odd characters in the ‘cast’ column. Don’t worry about cleaning them. You can leave them as is.
- The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
- Know all the steps involved in a typical data analysis process
- Be comfortable posing questions that can be answered with a given dataset and then answering those questions
- Investigate problems in a dataset and wrangle the data into a format that can be used
- Communicating the results of your analysis
*You will need an installation of Python, plus the following libraries:
- pandas
- NumPy
- Matplotlib
- csv
- A text editor, like VS Code or Atom.
- A terminal application (Terminal on Mac and Linux or Cygwin on Windows).