**Netflix**! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

## The data
### **netflix_data.csv**
| Column | Description |
|--------|-------------|
| `show_id` | The ID of the show |
| `type` | Type of show |
| `title` | Title of the show |
| `director` | Director of the show |
| `cast` | Cast of the show |
| `country` | Country of origin |
| `date_added` | Date added to Netflix |
| `release_year` | Year the show was originally released |
| `duration` | Duration of the show in minutes |
| `description` | Description of the show |
| `genre` | Show genre |

In [3]:
import pandas as pd
import matplotlib.pyplot as plt

# Read in the Netflix CSV as a DataFrame, using the first column (show_id) as the index
netflix_df = pd.read_csv("C:\\Users\\pc\\Downloads\\netflix_data.csv", index_col=0)

In [4]:
print(netflix_df.head())

print(netflix_df.info())

            type  title           director  \
show_id                                      
s2         Movie   7:19  Jorge Michel Grau   
s3         Movie  23:59       Gilbert Chan   
s4         Movie      9        Shane Acker   
s5         Movie     21     Robert Luketic   
s6       TV Show     46        Serdar Akar   

                                                      cast        country  \
show_id                                                                     
s2       Demián Bichir, Héctor Bonilla, Oscar Serrano, ...         Mexico   
s3       Tedd Chan, Stella Chung, Henley Hii, Lawrence ...      Singapore   
s4       Elijah Wood, John C. Reilly, Jennifer Connelly...  United States   
s5       Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...  United States   
s6       Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...         Turkey   

                date_added  release_year  duration  \
show_id                                              
s2       December 23, 2016   

In [5]:
df_date = netflix_df['date_added']
print(df_date)

show_id
s2       December 23, 2016
s3       December 20, 2018
s4       November 16, 2017
s5         January 1, 2020
s6            July 1, 2017
               ...        
s7779     November 1, 2019
s7781         July 1, 2018
s7782     January 11, 2020
s7783     October 19, 2020
s7784        March 2, 2019
Name: date_added, Length: 4812, dtype: object


In [9]:
netflix_df['date_added'] = pd.to_datetime(df_date, errors='coerce')

print(netflix_df['date_added'].dtype)

datetime64[ns]
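
Because `errors='coerce'` silently turns unparseable values into `NaT`, it can be worth checking how many entries were lost in the conversion. The following is a minimal sketch on made-up toy values (not rows from `netflix_data.csv`); the explicit `format` string is an assumption based on the `Month D, YYYY` pattern visible in the output above.

```python
import pandas as pd

# Toy values for illustration only; the second and third entries are deliberately invalid
raw_dates = pd.Series(["December 23, 2016", "not a date", None, "July 1, 2017"])

# errors='coerce' converts anything that does not match the format into NaT
parsed = pd.to_datetime(raw_dates, format="%B %d, %Y", errors="coerce")

# Count the entries that could not be parsed
n_missing = int(parsed.isna().sum())
print("unparseable or missing dates:", n_missing)
```

Running the same `isna().sum()` check on `netflix_df['date_added']` would show whether any rows in the real data were coerced.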


In [11]:
movie_df = netflix_df[netflix_df['type'] == 'Movie']  # keep only the rows whose type is 'Movie'
print(movie_df.head())

          type  title           director  \
show_id                                    
s2       Movie   7:19  Jorge Michel Grau   
s3       Movie  23:59       Gilbert Chan   
s4       Movie      9        Shane Acker   
s5       Movie     21     Robert Luketic   
s7       Movie    122    Yasir Al Yasiri   

                                                      cast        country  \
show_id                                                                     
s2       Demián Bichir, Héctor Bonilla, Oscar Serrano, ...         Mexico   
s3       Tedd Chan, Stella Chung, Henley Hii, Lawrence ...      Singapore   
s4       Elijah Wood, John C. Reilly, Jennifer Connelly...  United States   
s5       Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...  United States   
s7       Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...          Egypt   

        date_added  release_year  duration  \
show_id                                      
s2      2016-12-23          2016        93   
s3      2018-

In [15]:
# Keep only the movies released in the 1990s (1990 through 1999)
df_movies_90s = movie_df[(movie_df['release_year'] >= 1990) & (movie_df['release_year'] < 2000)]
print(df_movies_90s.head())

          type                            title            director  \
show_id                                                               
s8       Movie                              187      Kevin Reynolds   
s167     Movie                A Dangerous Woman  Stephen Gyllenhaal   
s211     Movie           A Night at the Roxbury    John Fortenberry   
s239     Movie  A Thin Line Between Love & Hate     Martin Lawrence   
s274     Movie                     Aashik Awara         Umesh Mehra   

                                                      cast        country  \
show_id                                                                     
s8       Samuel L. Jackson, John Heard, Kelly Rowan, Cl...  United States   
s167     Debra Winger, Barbara Hershey, Gabriel Byrne, ...  United States   
s211     Will Ferrell, Chris Kattan, Dan Hedaya, Molly ...  United States   
s239     Martin Lawrence, Lynn Whitfield, Regina King, ...  United States   
s274     Saif Ali Khan, Mamta Kulkarni, 

In [16]:
total_movies_90s = len(df_movies_90s)
print('total movies in the 90s = ' , total_movies_90s)

total movies in the 90s =  183
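
The decade filter above can also be written with `Series.between`, which is inclusive on both ends by default. This is only an alternative sketch on toy data (the titles and years below are hypothetical, not taken from the dataset); it should select the same rows as the two-comparison mask.

```python
import pandas as pd

# Hypothetical toy frame for illustration only
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E"],
        "release_year": [1989, 1990, 1995, 1999, 2000],
    }
)

# Mask written the same way as in the notebook
mask_explicit = (toy["release_year"] >= 1990) & (toy["release_year"] < 2000)

# Equivalent mask using between(), inclusive of both 1990 and 1999
mask_between = toy["release_year"].between(1990, 1999)

nineties = toy[mask_between]
print(len(nineties))
```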


In [17]:
frequent_countries = df_movies_90s['country'].value_counts()
print('top ten countries for 90s movies\n', frequent_countries.head(10))

top ten countries for 90s movies
 country
United States     99
India             34
United Kingdom    17
Hong Kong         11
France             5
Australia          5
Mexico             3
Germany            2
Japan              2
Poland             1
Name: count, dtype: int64


In [21]:
frequent_directors = df_movies_90s['director'].value_counts()
print('most frequent directors of 90s movies\n', frequent_directors.head(10))

most frequent directors of 90s movies
 director
Johnnie To            4
Youssef Chahine       3
Umesh Mehra           3
Gregory Hoblit        3
Subhash Ghai          3
Mahesh Bhatt          3
Rajkumar Santoshi     3
Sooraj R. Barjatya    3
David Dhawan          2
Quentin Tarantino     2
Name: count, dtype: int64


In [22]:
most_frequent_duration = df_movies_90s['duration'].value_counts().head(5)
print(most_frequent_duration)  # the most frequent duration is 94 minutes

duration
94     7
101    6
108    5
93     5
96     5
Name: count, dtype: int64
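
Rather than reading the top value off the printed counts, the most frequent duration can be pulled out programmatically. A minimal sketch, using a small made-up Series in place of `df_movies_90s['duration']`:

```python
import pandas as pd

# Toy durations in minutes, for illustration only
durations = pd.Series([94, 94, 94, 101, 101, 108], name="duration")

# value_counts() sorts by frequency, so idxmax() returns the most common value
counts = durations.value_counts()
most_common = int(counts.idxmax())

# mode() returns every value tied for the highest frequency
modes = durations.mode().tolist()

print(most_common, modes)
```

`mode()` is the safer choice when ties are possible, since `idxmax()` reports only one of the tied values.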


In [23]:
# A movie counts as "short" here if it runs for less than 90 minutes
short_action_movies = df_movies_90s[(df_movies_90s['duration'] < 90) & (df_movies_90s['genre'].str.contains("Action", case=False, na=False))]
short_movie_count = len(short_action_movies)
print("number of short action movies: ", short_movie_count)

number of short action movies:  7
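
The `na=False` argument in the filter above matters: without it, a missing genre would produce a missing value in the mask instead of `False`. A small sketch on hypothetical toy rows (not from the dataset) shows how the two conditions combine:

```python
import pandas as pd

# Hypothetical toy rows; one genre is missing on purpose
toy = pd.DataFrame(
    {
        "duration": [85, 120, 70, 88],
        "genre": ["Action", "Action", None, "Dramas"],
    }
)

# na=False treats a missing genre as "not Action"
is_action = toy["genre"].str.contains("Action", case=False, na=False)

# Combine with the under-90-minutes condition
short_action = toy[(toy["duration"] < 90) & is_action]
print(len(short_action))
```

Only the first row passes both conditions: the second is too long, the third has no genre, and the fourth is not an action movie.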
