# **Investigating Netflix Movies**

Netflix! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

**The data**

File : netflix_data.csv

Column | Description
-------|------------
show_id | The ID of the show
type | Type of show
title | Title of the show
director | Director of the show
cast | Cast of the show
country | Country of origin
date_added | Date added to Netflix
release_year | Year of Netflix release
duration | Duration of the show in minutes
description | Description of the show
genre | Show genre

Perform exploratory data analysis on the `netflix_data.csv` data to understand more about movies from the 1990s and 2000s. Answer each question in its own code cell, provided separately below.

1. What was the most frequent movie duration in the 1990s? Save an approximate answer as an integer called `duration`.
2. A movie is considered short if it is less than 90 minutes. Count the number of short action movies released in the 1990s and save this integer as `short_movie_count`.
3. How many comedy movies were released in Turkey in the 2000s? Count them and save the value as an integer variable `turkey_commedy_2000s`.
4. Which country has more movies from the 2000s: Nigeria or South Africa? Count the movies from each country, compare the counts, and assign the name of the country with more movies to `high_movie_country_2000s`.
5. Which five countries have the most short documentary movies from the 1990s? Save the names of the top five countries as a Python list in the variable `top5_countries_short_documentary`.
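
All five questions combine the same few steps: keep only movies, restrict `release_year` to a decade, optionally filter on `duration` or `genre`, then count or rank. The sketch below walks through those steps on a tiny made-up DataFrame, so the rows and the resulting numbers are illustrative only and say nothing about the real dataset. The helper `in_decade` and the `toy_*` names are hypothetical and not part of the assignment.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from netflix_data.csv.
toy_df = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
        "country": ["Turkey", "Nigeria", "Turkey", None, "Nigeria"],
        "release_year": [1994, 1999, 1995, 2003, 2008],
        "duration": [85, 120, 1, 94, 94],
        "genre": ["Action", "Action", "Dramas", "Comedies", "Comedies"],
    }
)


def in_decade(df, start_year):
    """Hypothetical helper: boolean mask for the ten years starting at start_year."""
    return df["release_year"].between(start_year, start_year + 9)


is_movie = toy_df["type"] == "Movie"

# Short action movies of the 1990s (strictly under 90 minutes).
short_action_90s = toy_df[
    is_movie
    & in_decade(toy_df, 1990)
    & (toy_df["duration"] < 90)
    & (toy_df["genre"] == "Action")
]
toy_short_count = len(short_action_90s)

# Most frequent duration among 2000s movies; mode() returns a Series.
movies_2000s = toy_df[is_movie & in_decade(toy_df, 2000)]
toy_mode_2000s = int(movies_2000s["duration"].mode()[0])

# Countries ranked by number of 2000s movies; value_counts() skips missing countries.
toy_top_countries = movies_2000s["country"].value_counts().head(5).index.tolist()

print(toy_short_count, toy_mode_2000s, toy_top_countries)
```

`Series.between` is inclusive on both ends by default, so `in_decade(toy_df, 1990)` covers 1990 through 1999.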

In [2]:
# General setup cell: run it before the code cells that answer each question
# Import pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Read in the Netflix CSV as a DataFrame
netflix_df = pd.read_csv("https://github.com/DataAnalyst21/DatasetsForDataAnalytics/blob/main/netflix_data.csv?raw=True")

print(netflix_df.info())

# Check for missing values
print(netflix_df.isnull().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7787 entries, 0 to 7786
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       7787 non-null   object
 1   type          7787 non-null   object
 2   title         7787 non-null   object
 3   director      5398 non-null   object
 4   cast          7069 non-null   object
 5   country       7280 non-null   object
 6   date_added    7777 non-null   object
 7   release_year  7787 non-null   int64 
 8   duration      7787 non-null   int64 
 9   description   7787 non-null   object
 10  genre         7787 non-null   object
dtypes: int64(2), object(9)
memory usage: 669.3+ KB
None
show_id            0
type               0
title              0
director        2389
cast             718
country          507
date_added        10
release_year       0
duration           0
description        0
genre              0
dtype: int64
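
The counts above show 507 rows with no `country`, which matters for any text-based country filter. A minimal sketch, using invented values rather than rows from the dataset, of how `str.contains` can be told to treat a missing country as "no match" so the resulting mask is purely boolean:

```python
import pandas as pd

# Invented example values, not rows from netflix_data.csv.
country = pd.Series(["Turkey", None, "Nigeria", "turkey"])

# na=False makes the handling of missing countries explicit: they count as "no match".
# case=False makes the match case-insensitive.
clean_mask = country.str.contains("Turkey", case=False, na=False)

# Number of entries that mention Turkey.
turkey_rows = int(clean_mask.sum())

print(clean_mask.tolist(), turkey_rows)
```

Passing `na=False` keeps the mask safe to combine with other conditions using `&`.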


In [18]:
#1. Code cell for question #1
# Keep only movies released from 1990 through 1999
movies_1990s = netflix_df[(netflix_df['type'] == 'Movie') &
                            (netflix_df['release_year'] >= 1990) &
                            (netflix_df['release_year'] <= 1999)]

# Most frequent duration; mode() returns a Series, so take its first value
duration = int(movies_1990s['duration'].mode()[0])


In [17]:
#2. Code cell for question #2
# Short action movies from the 1990s: under 90 minutes, with a genre that mentions "Action"
short_movies_1990s_action = netflix_df[(netflix_df['type'] == 'Movie') &
                                         (netflix_df['release_year'] >= 1990) &
                                         (netflix_df['release_year'] <= 1999) &
                                         (netflix_df['duration'] < 90) &
                                         (netflix_df['genre'].str.contains('Action', case=False))]

# Count the number of short action movies
short_movie_count = short_movies_1990s_action.shape[0]

# Display the result
print("Number of short action movies in the 1990s:", short_movie_count)


In [20]:
#3. Code cell for question #3
# Comedy movies from Turkey released in the 2000s.
# Assumption: the exact genre label is not shown in this notebook, so 'Comed'
# is used to match both "Comedy" and "Comedies".
turkey_commedy_2000s_df = netflix_df[(netflix_df['type'] == 'Movie') &
                                      (netflix_df['release_year'] >= 2000) &
                                      (netflix_df['release_year'] <= 2009) &
                                      (netflix_df['genre'].str.contains('Comed', case=False)) &
                                      (netflix_df['country'].str.contains('Turkey', case=False, na=False))]

# Count the number of comedy movies
turkey_commedy_2000s = turkey_commedy_2000s_df.shape[0]

# Display the result
print("Number of comedy movies released in Turkey in the 2000s:", turkey_commedy_2000s)


In [31]:
#4. Code cell for question #4
# Movies only, released from 2000 through 2009
movies_2000s = netflix_df[(netflix_df['type'] == 'Movie') &
                          (netflix_df['release_year'] >= 2000) &
                          (netflix_df['release_year'] < 2010)]

# Filter for movies from Nigeria and from South Africa in the 2000s
nigeria_2000s = movies_2000s[movies_2000s['country'] == 'Nigeria']
south_africa_2000s = movies_2000s[movies_2000s['country'] == 'South Africa']

# Count the number of movies for Nigeria and South Africa in the 2000s
nigeria_movie_count = nigeria_2000s.shape[0]
south_africa_movie_count = south_africa_2000s.shape[0]

# Compare the two countries and assign the country with more movies
if nigeria_movie_count > south_africa_movie_count:
    high_movie_country_2000s = 'Nigeria'
else:
    high_movie_country_2000s = 'South Africa'

# Output the result
print(high_movie_country_2000s)


In [38]:
#5. Code cell for question #5
# Short documentary movies from the 1990s.
# Assumption: the exact genre label is not shown in this notebook, so 'Documentar'
# is used to match both "Documentary" and "Documentaries".
short_documentary_1990s = netflix_df[(netflix_df['type'] == 'Movie') &
                                     (netflix_df['genre'].str.contains('Documentar', case=False)) &
                                     (netflix_df['duration'] < 90) &
                                     (netflix_df['release_year'] >= 1990) &
                                     (netflix_df['release_year'] < 2000)]

# Count movies per country; rows with a missing country are excluded by groupby
country_counts = short_documentary_1990s.groupby('country').size()

# Sort the results in descending order
sorted_country_counts = country_counts.sort_values(ascending=False)

# Get the top 5 countries
top5_countries_short_documentary = sorted_country_counts.head(5).index.tolist()

# Display the top 5 countries
print(top5_countries_short_documentary)
