<h1> Exercise </h1>

The dataset "Netflix Titles" is a comprehensive compilation of movies and television series available on Netflix. It covers the title type, director, actors, country of production, release year, rating, duration, genres (listed_in), and a brief description. This dataset is useful for analyzing Netflix content trends, understanding genre popularity, and examining the distribution of content across regions and time periods.

Columns:  
- show_id: A unique identifier for each title.  
- type: The category of the title, which can be 'Movie' or 'TV Show'.  
- title: The name of the movie or television show.  
- director: The director(s) of the movie or television show. (Contains null values for some entries, particularly TV shows where this information may not be applicable.)  
- cast: The list of main actors in the title. (Some entries may not have this information.)  
- country: The country or countries where the movie or television show was produced.  
- date_added: The date the title was added to Netflix.  
- release_year: The original release year of the movie or television show.  
- rating: The age rating of the title.  
- duration: The duration of the title, in minutes for movies and in seasons for TV shows.  
- listed_in: The genres the title belongs to.  
- description: A brief summary of the title.

<h1> Statement </h1>

<h2> data cleaning & data modeling </h2>

<b> 1) Importing the file </b>

Try to import the file netflix_titles.csv. If you encounter an error, it is probably related to an encoding issue. Find the appropriate encoding to read the file correctly.

<b> 2) Creating a copy of the DataFrame </b>

Create a copy of your DataFrame in a new variable to keep the original data accessible.

<b> 3) Removing unnecessary columns </b>

Check that the columns named "Unnamed:..." at the end of the DataFrame contain no data, then delete them.

<b> 4) Setting a new index </b>

Make sure that the "show_id" column has no null values and no duplicates. If so, set show_id as the index of the DataFrame.

<b> 5) Correcting the column type </b>  
Check the type of the column "Date_added" and correct it to the appropriate type if necessary.

<b> 6) Managing the "Duration" column </b>

- Confirm that the column "Type" contains only the values 'Movie' and 'TV Show'.  
- Examine the nomenclature of the values in the "Duration" column.  
- Using groupby, check that durations are associated with minutes for movies and number of seasons for TV shows.  
- Create a new column 'duration (movies)' to isolate the number of minutes for movies. Ensure that this column is of type "float".  
- Create a column 'seasons (TV Show)' to isolate the number of seasons for series. Ensure that this column is of type "float".  
- Finally, delete the 'Duration' column from the dataset.

<b> 7) Creating auxiliary DataFrames for values separated by commas </b>

Some columns, such as country, cast, and listed_in, contain a series of comma-separated values for some titles. We want to create auxiliary tables that hold each value in its own row.

For the "Country" column:  
- Create a 'countries' column by transforming the values of the 'country' column into lists (using str.split).  
- Create a DataFrame 'countries_exploded' which generates one row per country using the .explode() method.  
- Keep only the 'countries' column in the 'countries_exploded' DataFrame (.explode() was not covered in class; see how it works in the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html)  
- Repeat this process for the columns 'categories' (based on 'listed_in') and 'cast', creating the DataFrames 'categories_exploded' and 'cast_list' respectively.

<b> 8) Removing transformed columns </b>

Remove the three transformed columns from the original DataFrame.

<b> 9) Creating temporal columns </b>

From 'date_added', create columns for the year, the month, and the day of the week on which each title was added.

<h2> Data analysis </h2>

Now try to answer the following questions:  

1) How many "shows" are present in this dataset?  
2) What is the distribution between the types 'Movie' and 'TV Show'?  
3) What is the distribution of additions by year?  
4) What are the top 5 most added show categories?  
5) Who are the top 5 most popular actors in the United States?  
6) What is the distribution of additions by day of the week?  
7) In which country are the most documentaries produced?  
8) On average, how many seasons do series have?  
9) What is the distribution of movie durations (quartiles)?  
10) How many shows have drug-related themes (presence of the word "drug" in the description)?

<h1> Data cleaning and data modeling </h1>

In [3]:
import pandas as pd

In [4]:
#1 Import the file, specifying the encoding explicitly

File = pd.read_csv('netflix_titles.csv', encoding='latin-1')
File.head(3)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,...,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,...,,,,,,,,,,
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,...,,,,,,,,,,
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,...,,,,,,,,,,
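Reading with `latin-1` never raises, because every byte maps to some character, but that does not guarantee correct text: later outputs show names such as `JosÃ© Luis Ucha` and `CÃ©dric Jimenez`. That pattern is typical of UTF-8 bytes decoded as Latin-1, which suggests the file is (at least mostly) UTF-8. Below is a minimal sketch that assumes this diagnosis and repairs such strings after loading. `fix_mojibake` and `repair_text_columns` are hypothetical helpers, not pandas functions.

```python
import pandas as pd


def fix_mojibake(text):
    """Re-decode text that was stored as UTF-8 but read as Latin-1.

    Non-strings (e.g. NaN) and strings whose round trip fails (text that was
    already decoded correctly, or that holds non-Latin-1 characters) are
    returned unchanged.
    """
    if not isinstance(text, str):
        return text
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return text


def repair_text_columns(df, columns):
    """Return a copy of df with fix_mojibake applied to the given columns."""
    repaired = df.copy()
    for col in columns:
        repaired[col] = repaired[col].map(fix_mojibake)
    return repaired


# Toy frame reproducing the garbled names seen in this notebook's output.
demo = pd.DataFrame({"director": ["JosÃ© Luis Ucha", "CÃ©dric Jimenez", "Kirsten Johnson"]})
print(repair_text_columns(demo, ["director"])["director"].tolist())
# ['José Luis Ucha', 'Cédric Jimenez', 'Kirsten Johnson']
```

On the notebook data, this could be applied as `data = repair_text_columns(data, ['title', 'director', 'cast'])`. It is a heuristic: a correctly decoded string that happens to form valid UTF-8 when re-encoded would be altered, although this is rare for ordinary names. If `pd.read_csv(..., encoding='utf-8')` fails only because of a few stray bytes, passing `encoding_errors='replace'` (pandas 1.3 and later) is another option.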


In [5]:
#2 Create a copy
data = File.copy()
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8809 entries, 0 to 8808
Data columns (total 26 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       8809 non-null   object 
 1   type          8809 non-null   object 
 2   title         8809 non-null   object 
 3   director      6175 non-null   object 
 4   cast          7984 non-null   object 
 5   country       7978 non-null   object 
 6   date_added    8799 non-null   object 
 7   release_year  8809 non-null   int64  
 8   rating        8805 non-null   object 
 9   duration      8806 non-null   object 
 10  listed_in     8809 non-null   object 
 11  description   8809 non-null   object 
 12  Unnamed: 12   0 non-null      float64
 13  Unnamed: 13   0 non-null      float64
 14  Unnamed: 14   0 non-null      float64
 15  Unnamed: 15   0 non-null      float64
 16  Unnamed: 16   0 non-null      float64
 17  Unnamed: 17   0 non-null      float64
 18  Unnamed: 18   0 non-null    

In [6]:
#3 Drop the 'Unnamed: 12' to 'Unnamed: 25' columns (0 non-null values in info())
unnamed_cols = [f'Unnamed: {i}' for i in range(12, 26)]
data.drop(columns=unnamed_cols, inplace=True)
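Step 3 asks to check that the 'Unnamed' columns are empty before deleting them, but the cell above drops a fixed list. Below is a minimal sketch that selects the columns by prefix and refuses to drop any column that holds a value. `drop_empty_unnamed` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def drop_empty_unnamed(df):
    """Drop 'Unnamed: ...' columns, but only after checking they hold no data."""
    unnamed = [col for col in df.columns if str(col).startswith("Unnamed")]
    # Fail loudly rather than silently discarding real values.
    non_empty = [col for col in unnamed if df[col].notna().any()]
    if non_empty:
        raise ValueError(f"Columns contain data: {non_empty}")
    return df.drop(columns=unnamed)


demo = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "Unnamed: 12": [float("nan"), float("nan")],
    "Unnamed: 13": [float("nan"), float("nan")],
})
print(drop_empty_unnamed(demo).columns.tolist())  # ['show_id']
```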

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8809 entries, 0 to 8808
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8809 non-null   object
 1   type          8809 non-null   object
 2   title         8809 non-null   object
 3   director      6175 non-null   object
 4   cast          7984 non-null   object
 5   country       7978 non-null   object
 6   date_added    8799 non-null   object
 7   release_year  8809 non-null   int64 
 8   rating        8805 non-null   object
 9   duration      8806 non-null   object
 10  listed_in     8809 non-null   object
 11  description   8809 non-null   object
dtypes: int64(1), object(11)
memory usage: 826.0+ KB


In [8]:
#4 Use show_id as the index
data.set_index('show_id', inplace=True)
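Step 4 asks to confirm that `show_id` has no nulls and no duplicates before making it the index. The cell above does not check this explicitly, although the earlier `info()` output does show 8809 non-null values. Below is a sketch of the check. `set_unique_index` is a hypothetical helper, while `verify_integrity` is a real `DataFrame.set_index` parameter that raises on a duplicated index.

```python
import pandas as pd


def set_unique_index(df, column):
    """Return df indexed by `column`, after checking it is a valid key."""
    has_nulls = bool(df[column].isna().any())
    has_duplicates = bool(df[column].duplicated().any())
    if has_nulls or has_duplicates:
        raise ValueError(
            f"{column!r} cannot be the index: nulls={has_nulls}, duplicates={has_duplicates}"
        )
    # verify_integrity=True makes pandas re-check uniqueness of the new index.
    return df.set_index(column, verify_integrity=True)


demo = pd.DataFrame({"show_id": ["s1", "s2", "s3"], "type": ["Movie", "TV Show", "TV Show"]})
indexed = set_unique_index(demo, "show_id")
print(indexed.index.tolist())  # ['s1', 's2', 's3']
```

In the notebook, `data = set_unique_index(data, 'show_id')` would replace the `inplace=True` call.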

In [9]:
#5 Convert date_added from text such as "September 25, 2021" to datetime (unparseable values become NaT)
data['date_added'] = pd.to_datetime(data['date_added'], format='%B %d, %Y', errors='coerce')
data['date_added'].info()

<class 'pandas.core.series.Series'>
Index: 8809 entries, s1 to s8809
Series name: date_added
Non-Null Count  Dtype         
--------------  -----         
8711 non-null   datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 137.6+ KB
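`errors='coerce'` turns unparseable strings into `NaT` without any warning. Here the non-null count drops from 8799 (before conversion) to 8711, so 88 dates that were present in the file failed to parse. Below is a sketch that reports those failures and retries after stripping surrounding whitespace. Treating whitespace as the cause is an assumption, so print the failures first to confirm it. `parse_dates_reporting_failures` is a hypothetical helper.

```python
import pandas as pd


def parse_dates_reporting_failures(raw, fmt="%B %d, %Y"):
    """Parse date strings, returning (parsed, failed_raw_values).

    Surrounding whitespace is stripped first (an assumed cause of failures);
    anything that still fails to parse is returned for inspection.
    """
    cleaned = raw.str.strip()
    parsed = pd.to_datetime(cleaned, format=fmt, errors="coerce")
    # Values present in the input that still became NaT after parsing.
    failed = raw[raw.notna() & parsed.isna()]
    return parsed, failed


raw = pd.Series(["September 25, 2021", " August 4, 2017", None, "not a date"])
parsed, failed = parse_dates_reporting_failures(raw)
print(parsed.notna().sum(), failed.tolist())  # 2 ['not a date']
```

On the notebook data, it would be run on the untouched strings, e.g. `parsed, failed = parse_dates_reporting_failures(File['date_added'])` followed by `print(failed)`.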


In [10]:
#6
data['type'].unique()

array(['Movie', 'TV Show'], dtype=object)

In [11]:
data['duration'].unique()

array(['90 min', '2 Seasons', '1 Season', '91 min', '125 min',
       '9 Seasons', '104 min', '127 min', '4 Seasons', '67 min', '94 min',
       '5 Seasons', '161 min', '61 min', '166 min', '147 min', '103 min',
       '97 min', '106 min', '111 min', '3 Seasons', '110 min', '105 min',
       '96 min', '124 min', '116 min', '98 min', '23 min', '115 min',
       '122 min', '99 min', '88 min', '100 min', '6 Seasons', '102 min',
       '93 min', '95 min', '85 min', '83 min', '113 min', '13 min',
       '182 min', '48 min', '145 min', '87 min', '92 min', '80 min',
       '117 min', '128 min', '119 min', '143 min', '114 min', '118 min',
       '108 min', '63 min', '121 min', '142 min', '154 min', '120 min',
       '82 min', '109 min', '101 min', '86 min', '229 min', '76 min',
       '89 min', '156 min', '112 min', '107 min', '129 min', '135 min',
       '136 min', '165 min', '150 min', '133 min', '70 min', '84 min',
       '140 min', '78 min', '7 Seasons', '64 min', '59 min', '139 min',
    

In [12]:
data.groupby('type')['duration'].unique()

type
Movie      [90 min, 91 min, 125 min, 104 min, 127 min, 67...
TV Show    [2 Seasons, 1 Season, 9 Seasons, 4 Seasons, 5 ...
Name: duration, dtype: object

In [13]:
# Movies only: remove the 'min' suffix and convert to float (TV shows become NaN)
data['duration (movies)'] = data['duration'].where(data['type'] == 'Movie').str.replace('min', '').astype(float)
data

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration (movies)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,
...,...,...,...,...,...,...,...,...,...,...,...,...
s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,2019-11-01,2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,88.0
s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2020-01-11,2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",88.0
s8807,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2019-03-02,2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,111.0
s8808,TV Show,Parasyte: The Grey,Yeon Sang-ho,"Shin Hyun-been, Jeon Yeo-bin, Goo Kyo-hwan",South Korea,2024-04-05,2024,TV-MA,1 Season,"Sci-fi, Horror, Action",A new breed of parasitic aliens arrive on Eart...,


In [14]:
# TV shows only (movies become NaN); the text is converted to numbers in the next two cells
data['duration (TV Show)'] = data['duration'].where(data['type'] == 'TV Show')
data.head(10)

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration (movies),duration (TV Show)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2 Seasons
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1 Season
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1 Season
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2 Seasons
s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,2021-09-24,2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...,,1 Season
s7,Movie,My Little Pony: A New Generation,"Robert Cullen, JosÃ© Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,91.0,
s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",125.0,
s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,2021-09-24,2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...,,9 Seasons
s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,104.0,


In [15]:
# '2 Seasons' -> '2 s', '1 Season' -> '1 '
data['duration (TV Show)'] = data['duration (TV Show)'].str.replace('Season', '')

In [16]:
# Remove the leftover plural 's' and convert to float
data['duration (TV Show)'] = data['duration (TV Show)'].str.replace('s', '').astype(float)
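The replace-based cells work because the values follow exactly two patterns, `'N min'` and `'N Season(s)'`. Below is a more compact sketch that relies on the same assumption. It uses `str.extract` to pull the leading number and the `type` column to route it. The output column names follow the statement (`'duration (movies)'`, `'seasons (TV Show)'`), whereas the notebook uses `'duration (TV Show)'`. `split_duration` is a hypothetical helper.

```python
import pandas as pd


def split_duration(df):
    """Return a copy of df with numeric movie minutes and TV-show seasons."""
    out = df.copy()
    # Leading integer of strings such as '90 min' or '2 Seasons'; NaN if absent.
    number = out["duration"].str.extract(r"^\s*(\d+)", expand=False).astype(float)
    out["duration (movies)"] = number.where(out["type"] == "Movie")
    out["seasons (TV Show)"] = number.where(out["type"] == "TV Show")
    return out.drop(columns="duration")


demo = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "1 Season", None],
})
result = split_duration(demo)
print(result)
```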

In [17]:
data.head()

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration (movies),duration (TV Show)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0


In [18]:
data.drop(columns='duration', inplace=True)
data.head(2)

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),duration (TV Show)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0


In [19]:
#7 Split 'country' on commas, then explode to one row per (show, country) pair
data['countries'] = data['country'].str.split(',')
countries_exploded = data.explode('countries')
countries_exploded = countries_exploded['countries']
countries_exploded.to_csv('countries_exploded.csv')

In [20]:
data['categories'] = data['listed_in'].str.split(',')
categories_exploded = data.explode('categories')
categories_exploded = categories_exploded['categories']

In [21]:
data['categories_cast'] = data['cast'].str.split(',')
cast_list = data.explode('categories_cast')
cast_list = cast_list['categories_cast']
cast_list.to_csv('cast_list.csv')
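Because the values are separated by `', '` (a comma plus a space), `str.split(',')` leaves a leading space on every item after the first. This explains why later outputs list `' International Movies'` and `' Fred Tatasciore'` with a leading blank. It also means the same name can be counted under two spellings, once with the space and once without. Below is a minimal sketch that strips each item after exploding. `explode_list_column` is a hypothetical helper, and counts computed from its output may differ from those shown further down.

```python
import pandas as pd


def explode_list_column(df, column, name):
    """One row per comma-separated value of df[column], whitespace stripped.

    Returns a Series named `name` that keeps df's index (here: show_id),
    so it can be joined back to df later. Missing values stay missing.
    """
    items = df[column].str.split(",").explode().str.strip()
    return items.rename(name)


demo = pd.DataFrame(
    {"listed_in": ["Documentaries", "International TV Shows, TV Dramas", None]},
    index=pd.Index(["s1", "s2", "s3"], name="show_id"),
)
categories = explode_list_column(demo, "listed_in", "categories")
print(categories.value_counts())
```

On the notebook data, this would be e.g. `countries_exploded = explode_list_column(data, 'country', 'countries')`, and likewise for `listed_in` and `cast`.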

In [22]:
categories_exploded

show_id
s1                Documentaries
s2       International TV Shows
s2                    TV Dramas
s2                 TV Mysteries
s3               Crime TV Shows
                  ...          
s8808                    Horror
s8808                    Action
s8809                     Drama
s8809                   Romance
s8809                  Thriller
Name: categories, Length: 19329, dtype: object

In [23]:
#8 Drop the list columns created in step 7
data.drop(columns=['countries', 'categories', 'categories_cast'], inplace=True)
data

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),duration (TV Show)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...
s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,2019-11-01,2009,R,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,88.0,
s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2020-01-11,2006,PG,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",88.0,
s8807,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2019-03-02,2015,TV-14,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,111.0,
s8808,TV Show,Parasyte: The Grey,Yeon Sang-ho,"Shin Hyun-been, Jeon Yeo-bin, Goo Kyo-hwan",South Korea,2024-04-05,2024,TV-MA,"Sci-fi, Horror, Action",A new breed of parasitic aliens arrive on Eart...,,1.0


In [24]:
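#9 Year, month and weekday of addition (weekday: Monday=0 to Sunday=6; NaN where date_added is NaT)
data['added Year'] = data['date_added'].dt.year
data['added month'] = data['date_added'].dt.month
data['added week day'] = data['date_added'].dt.weekday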
data

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),duration (TV Show),added Year,added month,added week day
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,,2021.0,9.0,5.0
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0,2021.0,9.0,4.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0,2021.0,9.0,4.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0,2021.0,9.0,4.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0,2021.0,9.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,2019-11-01,2009,R,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,88.0,,2019.0,11.0,4.0
s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2020-01-11,2006,PG,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",88.0,,2020.0,1.0,5.0
s8807,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2019-03-02,2015,TV-14,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,111.0,,2019.0,3.0,5.0
s8808,TV Show,Parasyte: The Grey,Yeon Sang-ho,"Shin Hyun-been, Jeon Yeo-bin, Goo Kyo-hwan",South Korea,2024-04-05,2024,TV-MA,"Sci-fi, Horror, Action",A new breed of parasitic aliens arrive on Eart...,,1.0,2024.0,4.0,4.0
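`dt.weekday` encodes Monday as 0 and Sunday as 6, so the most frequent value in question 6, `4.0`, corresponds to Friday. The new columns are floats because rows whose `date_added` is `NaT` yield `NaN`. Below is a sketch that keeps readable day names and nullable integers instead. The `Int64` dtype and the `added day name` column are choices made for this sketch and are not required by the statement.

```python
import pandas as pd

# Toy dates standing in for data['date_added'], including one missing value.
dates = pd.Series(pd.to_datetime(["2021-09-24", "2021-09-25", None]))

temporal = pd.DataFrame({
    "added year": dates.dt.year.astype("Int64"),    # nullable int: NaT -> <NA>
    "added month": dates.dt.month.astype("Int64"),
    "added day name": dates.dt.day_name(),           # 'Friday', 'Saturday', NaN
})
print(temporal)
```

On the notebook data, `data['date_added'].dt.day_name().value_counts()` would answer question 6 without having to decode the numbers.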


<h1> Data analysis </h1>

In [26]:
#1
data['title'].count()

8809

In [27]:
#2
data['type'].value_counts()

type
Movie      6132
TV Show    2677
Name: count, dtype: int64

In [28]:
#3
data['added Year'].value_counts()

added Year
2019.0    1999
2020.0    1878
2018.0    1625
2021.0    1498
2017.0    1164
2016.0     418
2015.0      73
2014.0      23
2011.0      13
2013.0      10
2012.0       3
2009.0       2
2008.0       2
2024.0       2
2010.0       1
Name: count, dtype: int64
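`value_counts()` orders the years by frequency, which hides the timeline. It also drops the titles whose `date_added` is missing: the counts above sum to 8711, so the 98 titles without a parsed date do not appear. Below is a sketch on a toy series standing in for `data['added Year']`. On the notebook data, the same two calls would apply to that column directly.

```python
import pandas as pd

# Toy stand-in for data['added Year'] (float, with NaN where date_added is NaT).
added_year = pd.Series([2019.0, 2020.0, 2019.0, 2021.0, float("nan")], name="added Year")

# Chronological order instead of most-frequent-first; dropna=False keeps the
# titles without a parsed date_added visible as a NaN bucket (sorted last).
by_year = added_year.value_counts(dropna=False).sort_index()

# Share of dated additions per year (NaN excluded from the denominator).
share = added_year.value_counts(normalize=True).sort_index()

print(by_year)
print(share)
```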

In [29]:
#4
categories_exploded.value_counts().head(5)

categories
 International Movies    2624
Dramas                   1600
Comedies                 1210
Action & Adventure        859
Documentaries             829
Name: count, dtype: int64

In [30]:
#5
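# Actors appearing most often in titles whose country list includes the United States
merge_cast = pd.merge(cast_list, data, how='inner', left_index=True, right_index=True)
us_cast = merge_cast[merge_cast['country'].str.contains('United States', na=False)]
us_cast['categories_cast'].value_counts().head(5)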

categories_cast
Adam Sandler        20
 Fred Tatasciore    19
 Molly Shannon      16
 Sean Astin         15
 Alfred Molina      15
Name: count, dtype: int64

In [31]:
#6
data['added week day'].value_counts()

added week day
4.0    2478
3.0    1387
2.0    1276
1.0    1182
0.0     845
5.0     803
6.0     740
Name: count, dtype: int64

In [74]:
#7
merge_list = pd.merge(categories_exploded, data, how='inner', left_index= True, right_index= True)
merge_list[merge_list['listed_in'].str.contains('Documentaries') == True]['country'].value_counts().head(1)

country
United States    584
Name: count, dtype: int64
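This cell's merge joins one row per category to the full show row. As a result, a show listed under several categories (e.g. `'Documentaries, International Movies'`) passes the `listed_in` filter once per category, and its country is counted several times. The raw `country` string also treats co-productions such as `'United States, Mexico'` as a separate value. Below is a sketch that counts each documentary once per producing country. `documentaries_by_country` is a hypothetical helper, and its totals will likely differ from the 584 shown above.

```python
import pandas as pd


def documentaries_by_country(df):
    """Count documentaries per producing country, each title once per country.

    Assumes df is indexed by a unique show_id and has the raw comma-separated
    'listed_in' and 'country' columns.
    """
    # One row per (show, category), with the space after each comma removed.
    categories = df["listed_in"].str.split(",").explode().str.strip()
    doc_ids = categories[categories == "Documentaries"].index.unique()
    # One row per (documentary, country); co-productions count for each country.
    countries = df.loc[doc_ids, "country"].str.split(",").explode().str.strip()
    return countries.value_counts()


demo = pd.DataFrame(
    {
        "listed_in": ["Documentaries, International Movies", "Dramas", "Documentaries", "Documentaries"],
        "country": ["United States, Mexico", "United States", "United States", None],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="show_id"),
)
counts = documentaries_by_country(demo)
print(counts)
```

On the notebook data, this would be `documentaries_by_country(data).head(1)`.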

In [33]:
#8
data['duration (TV Show)'].mean()

1.7646619350018677

In [34]:
#9
data['duration (movies)'].quantile([0.25,0.50,0.75])

0.25     87.0
0.50     98.0
0.75    114.0
Name: duration (movies), dtype: float64

In [35]:
#10A
data[data['description'].str.contains('drug')]

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),duration (TV Show),added Year,added month,added week day
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0,2021.0,9.0,4.0
s18,TV Show,Falsa identidad,,"Luis Ernesto Franco, Camila Sodi, Sergio Goyri...",Mexico,2021-09-22,2020,TV-MA,"Crime TV Shows, Spanish-Language TV Shows, TV ...",Strangers Diego and Isabel flee their home in ...,,2.0,2021.0,9.0,2.0
s37,Movie,The Stronghold,CÃ©dric Jimenez,"Gilles Lellouche, Karim Leklou, FranÃ§ois Civi...",,2021-09-17,2021,TV-MA,"Action & Adventure, Dramas, International Movies","Tired of the small-time grind, three Marseille...",105.0,,2021.0,9.0,4.0
s135,Movie,Clear and Present Danger,Phillip Noyce,"Harrison Ford, Willem Dafoe, Anne Archer, Joaq...","United States, Mexico",2021-09-01,1994,PG-13,"Action & Adventure, Dramas","When the president's friend is murdered, CIA D...",142.0,,2021.0,9.0,2.0
s151,Movie,In Too Deep,Michael Rymer,"Omar Epps, LL Cool J, Nia Long, Stanley Tucci,...",United States,2021-09-01,1999,R,Thrillers,Rookie cop Jeffrey Cole poses as a drug dealer...,97.0,,2021.0,9.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
s8587,Movie,Thumper,Jordan Ross,"Eliza Taylor, Pablo Schreiber, Daniel Webber, ...",United States,2018-02-05,2017,TV-MA,"Dramas, Thrillers",After moving to a hardscrabble suburban Califo...,93.0,,2018.0,2.0,0.0
s8637,Movie,True to the Game,Preston A. Whitmore II,"Columbus Short, Erica Peeples, Vivica A. Fox, ...",United States,2018-03-01,2017,R,Dramas,When a drug kingpin looking to go legit falls ...,108.0,,2018.0,3.0,3.0
s8649,Movie,Two Graves,Gary Young,"Cathy Tyson, Katie Jarvis, David Hayman, Josh ...",United Kingdom,2019-05-01,2018,TV-MA,Thrillers,A doctor and a drug addict kidnap the son of a...,80.0,,2019.0,5.0,2.0
s8749,Movie,Winter of Our Dreams,John Duigan,"Judy Davis, Bryan Brown, Cathy Downes, Baz Luh...",Australia,2016-11-01,1981,NR,"Classic Movies, Dramas","After the death of a long-ago lover, married p...",86.0,,2016.0,11.0,1.0


In [36]:
#10B
(data['title'].where(data['description'].str.contains('drug'))).value_counts()

title
Ganglands                   1
American Honey              1
Berlin Calling              1
Very Big Shot               1
Reggie Watts: Spatial       1
                           ..
Don                         1
The Last O.G.               1
La Reina del Sur            1
Narcoworld: Dope Stories    1
Winter's Bone               1
Name: count, Length: 158, dtype: int64
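Question 10 asks for a number. `value_counts()` on titles counts distinct titles, which would undercount if two shows shared a title. `str.contains` is also case-sensitive by default, so a description starting with `'Drug'` would be missed. Below is a sketch that counts matching rows directly on toy descriptions. A plain substring match also catches words such as `'drugstore'`. The `\bdrugs?\b` word-boundary pattern is one possible tightening and reflects an assumption about what counts as drug-related.

```python
import pandas as pd

# Toy descriptions (the first one is taken from the Ganglands row above).
descriptions = pd.Series([
    "To protect his family from a powerful drug lord",
    "Drug money changes everything for a small-town cop.",
    "A family runs a small drugstore.",
    "A talented batch of amateur bakers face off.",
    None,
])

substring_hits = descriptions.str.contains("drug", case=False, na=False)
word_hits = descriptions.str.contains(r"\bdrugs?\b", case=False, na=False, regex=True)
print(substring_hits.sum(), word_hits.sum())  # 3 2
```

On the notebook data, `data['description'].str.contains('drug', case=False, na=False).sum()` gives the row count directly.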