In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('netflix.csv')

# Understanding the Data

## Gist of the dataset

In [3]:
df.head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...


In [4]:
df.shape

(8807, 12)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


## Challenges with the data

### 1. Missing Values

In [6]:
na_variables = [ var for var in df.columns if df[var].isnull().mean() > 0 ]
na_variables

['director', 'cast', 'country', 'date_added', 'rating', 'duration']
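The same list can be built without a Python-level loop by combining `DataFrame.isna()` with `any()`. Below is a minimal sketch on a small made-up frame. The column names mirror this dataset, but the values are invented:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Netflix data (values are made up)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, "Y"],
    "rating": ["PG", "R", np.nan],
})

# isna().any() yields one boolean per column: True if it has at least one gap
na_variables = toy.columns[toy.isna().any()].tolist()
print(na_variables)  # ['director', 'rating']
```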

In [7]:
df[na_variables].isnull().mean()*100

director      29.908028
cast           9.367549
country        9.435676
date_added     0.113546
rating         0.045418
duration       0.034064
dtype: float64

In [8]:
df.isnull().sum()/len(df)*100

show_id          0.000000
type             0.000000
title            0.000000
director        29.908028
cast             9.367549
country          9.435676
date_added       0.113546
release_year     0.000000
rating           0.045418
duration         0.034064
listed_in        0.000000
description      0.000000
dtype: float64

#### The columns most affected by missing values are director, cast (9.4%) and country (9.4%). Director is by far the worst, with almost 30% of values missing.
#### date_added, rating and duration also have missing values, but each affects at most about 0.11% of rows. Those rows can therefore be removed from the dataset.
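The drop-versus-keep decision above can be written as an explicit rule. The sketch below assumes a cut-off of 1% missing. That threshold is hypothetical and was chosen only for illustration; the analysis does not state one. Under this rule, rows are dropped on columns below the cut-off, and columns above it are kept for imputation or further analysis:

```python
import numpy as np
import pandas as pd

def split_by_missing_share(frame: pd.DataFrame, threshold_pct: float = 1.0):
    """Return (columns to drop incomplete rows on, columns to keep/impute).

    threshold_pct is a hypothetical cut-off expressed in percent.
    """
    pct = frame.isna().mean() * 100
    pct = pct[pct > 0]  # only columns that actually have gaps
    drop_rows_on = pct[pct < threshold_pct].index.tolist()
    keep_for_imputation = pct[pct >= threshold_pct].index.tolist()
    return drop_rows_on, keep_for_imputation

# Toy data: 'director' is heavily missing, 'rating' only once in 200 rows
toy = pd.DataFrame({
    "director": [np.nan] * 60 + ["X"] * 140,
    "rating": [np.nan] + ["PG"] * 199,
})
drop_on, keep = split_by_missing_share(toy)
cleaned = toy.dropna(subset=drop_on)
print(drop_on, keep, len(cleaned))  # ['rating'] ['director'] 199
```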

#### a) Analysing and removing missing values for date_added, rating and duration

In [9]:
print('Percent of records which are null for the columns date_added, rating and duration: ', df[['date_added', 'rating', 'duration']].isna().any(axis=1).mean()*100)

Percent of records which are null for the columns date_added, rating and duration:  0.19302827296468716


In [10]:
df.dropna(subset = ['date_added', 'rating', 'duration'], inplace = True)

In [11]:
df.shape

(8790, 12)

#### 17 of the 8,807 records were removed, leaving 8,790 records.

#### b) Analysing missing "director" values

In [12]:
df[df['director'].isna()].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,s15,TV Show,Crime Stories: India Detectives,,,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...


In [13]:
df[df['director'].isna()].shape

(2621, 12)

In [14]:
print('Percentage of significant null values in director column:', round(df['director'].isna().mean()*100, 2))

Percentage of significant null values in director column: 29.82


#### c) Analysing missing "cast" values

In [15]:
df[df['cast'].isna()].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,s15,TV Show,Crime Stories: India Detectives,,,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...
16,s17,Movie,Europe's Most Dangerous Man: Otto Skorzeny in ...,"Pedro de Echave García, Pablo Azorín Williams",,,"September 22, 2021",2020,TV-MA,67 min,"Documentaries, International Movies",Declassified documents reveal the post-WWII li...


In [16]:
df[df['cast'].isna()].shape

(825, 12)

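#### Missing cast is expected for documentaries, which often have no cast, so rows whose listed_in is 'Documentaries' are excluded below. The comparison is an exact match, however. Titles listed under several genres that include Documentaries (e.g. s17) and TV documentaries labelled Docuseries are still counted.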

In [17]:
df[(df['listed_in'] != 'Documentaries') & df['cast'].isna()].shape

(642, 12)

In [18]:
df[(df['listed_in'] != 'Documentaries') & df['cast'].isna()].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,s15,TV Show,Crime Stories: India Detectives,,,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...
16,s17,Movie,Europe's Most Dangerous Man: Otto Skorzeny in ...,"Pedro de Echave García, Pablo Azorín Williams",,,"September 22, 2021",2020,TV-MA,67 min,"Documentaries, International Movies",Declassified documents reveal the post-WWII li...
20,s21,TV Show,Monsters Inside: The 24 Faces of Billy Milligan,Olivier Megaton,,,"September 22, 2021",2021,TV-14,1 Season,"Crime TV Shows, Docuseries, International TV S...","In the late 1970s, an accused serial rapist cl..."


In [19]:
print('Percentage of significant null values in cast column:', round(((df['listed_in'] != 'Documentaries') & df['cast'].isna()).mean()*100, 2))

Percentage of significant null values in cast column: 7.3
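Suppose the goal is to exclude every title whose genre list mentions Documentaries, not only the exact matches. A substring test is one way to do this. The sketch below uses toy rows and assumes that matching the word anywhere in `listed_in` is the desired rule:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4"],
    "cast": [np.nan, np.nan, np.nan, "A, B"],
    "listed_in": [
        "Documentaries",
        "Documentaries, International Movies",
        "Dramas",
        "Comedies",
    ],
})

# Exact match: only excludes rows whose genre list is exactly 'Documentaries'
exact = toy[(toy["listed_in"] != "Documentaries") & toy["cast"].isna()]

# Substring match: excludes any row whose genre list mentions Documentaries
is_doc = toy["listed_in"].str.contains("Documentaries", regex=False)
substring = toy[~is_doc & toy["cast"].isna()]

print(exact["show_id"].tolist())      # ['s2', 's3']
print(substring["show_id"].tolist())  # ['s3']
```

Whether Docuseries should be treated the same way is a separate judgment call. If so, it could be added as a second `str.contains` test.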


#### d) Analysing missing "country" values

In [20]:
df[df['country'].isna()].shape

(829, 12)

In [21]:
df[df['country'].isna()].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."


In [22]:
print('Percentage of significant null values in country column:', round(df['country'].isna().mean()*100, 2))

Percentage of significant null values in country column: 9.43


In [23]:
df.shape

(8790, 12)

### 2. Nested data issue

#### The columns director, cast, country and listed_in are nested: a single cell can hold a comma-separated list of values. Each of these columns has to be split so that every row holds exactly one value.

#### a) Unnesting director

In [24]:
df = df.assign(director = df['director'].str.split(', ')).explode('director')

In [25]:
df = df.reset_index()

In [26]:
df.drop('index', axis=1, inplace=True)

In [27]:
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
9590,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
9591,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
9592,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
9593,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


#### Column director has been unnested
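The split, explode, `reset_index` and `drop('index')` steps can be folded into one expression. `DataFrame.explode` accepts `ignore_index=True` (pandas 1.1 and later), which renumbers the result from 0. The sketch below uses a toy frame. It also shows that a missing value survives as a single NaN row, which is why NaN directors are still present after unnesting:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "show_id": ["s1", "s7", "s2"],
    "director": ["Kirsten Johnson", "Robert Cullen, José Luis Ucha", np.nan],
})

# Split each comma-separated string into a list, then give every element its own row
unnested = (
    toy.assign(director=toy["director"].str.split(", "))
       .explode("director", ignore_index=True)
)
print(unnested)
```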

#### b) Unnesting cast

In [28]:
df = df.assign(cast = df['cast'].str.split(', ')).explode('cast')

In [29]:
df = df.reset_index()

In [30]:
df.drop('index', axis=1, inplace=True)

In [31]:
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,Ama Qamata,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s2,TV Show,Blood & Water,,Khosi Ngema,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
3,s2,TV Show,Blood & Water,,Gail Mabalane,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
4,s2,TV Show,Blood & Water,,Thabang Molaba,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
...,...,...,...,...,...,...,...,...,...,...,...,...
70697,s8807,Movie,Zubaan,Mozez Singh,Manish Chaudhary,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
70698,s8807,Movie,Zubaan,Mozez Singh,Meghna Malik,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
70699,s8807,Movie,Zubaan,Mozez Singh,Malkeet Rauni,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
70700,s8807,Movie,Zubaan,Mozez Singh,Anita Shabdish,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...


#### Column cast has been unnested

#### c) Unnesting country

In [32]:
df = df.assign(country = df['country'].str.split(', ')).explode('country')

In [33]:
df = df.reset_index()

In [34]:
df.drop('index', axis=1, inplace=True)

In [35]:
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,Ama Qamata,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s2,TV Show,Blood & Water,,Khosi Ngema,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
3,s2,TV Show,Blood & Water,,Gail Mabalane,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
4,s2,TV Show,Blood & Water,,Thabang Molaba,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
...,...,...,...,...,...,...,...,...,...,...,...,...
89267,s8807,Movie,Zubaan,Mozez Singh,Manish Chaudhary,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
89268,s8807,Movie,Zubaan,Mozez Singh,Meghna Malik,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
89269,s8807,Movie,Zubaan,Mozez Singh,Malkeet Rauni,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
89270,s8807,Movie,Zubaan,Mozez Singh,Anita Shabdish,India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...


#### Column country has been unnested

#### d) Unnesting listed_in

In [36]:
df = df.assign(listed_in = df['listed_in'].str.split(', ')).explode('listed_in')

In [37]:
df = df.reset_index()

In [38]:
df.drop('index', axis=1, inplace=True)

In [39]:
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,Ama Qamata,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,International TV Shows,"After crossing paths at a party, a Cape Town t..."
2,s2,TV Show,Blood & Water,,Ama Qamata,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,TV Dramas,"After crossing paths at a party, a Cape Town t..."
3,s2,TV Show,Blood & Water,,Ama Qamata,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,TV Mysteries,"After crossing paths at a party, a Cape Town t..."
4,s2,TV Show,Blood & Water,,Khosi Ngema,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,International TV Shows,"After crossing paths at a party, a Cape Town t..."
...,...,...,...,...,...,...,...,...,...,...,...,...
201758,s8807,Movie,Zubaan,Mozez Singh,Anita Shabdish,India,"March 2, 2019",2015,TV-14,111 min,International Movies,A scrappy but poor boy worms his way into a ty...
201759,s8807,Movie,Zubaan,Mozez Singh,Anita Shabdish,India,"March 2, 2019",2015,TV-14,111 min,Music & Musicals,A scrappy but poor boy worms his way into a ty...
201760,s8807,Movie,Zubaan,Mozez Singh,Chittaranjan Tripathy,India,"March 2, 2019",2015,TV-14,111 min,Dramas,A scrappy but poor boy worms his way into a ty...
201761,s8807,Movie,Zubaan,Mozez Singh,Chittaranjan Tripathy,India,"March 2, 2019",2015,TV-14,111 min,International Movies,A scrappy but poor boy worms his way into a ty...
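Each column is exploded independently, so the row count grows multiplicatively. A title with d directors, c cast members, k countries and g genres ends up as d × c × k × g rows, with a missing value counting as one. This is how 8,790 titles became 201,763 rows. A toy illustration with two of the columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "show_id": ["s1"],
    "cast": ["A, B"],
    "listed_in": ["Dramas, Comedies, Thrillers"],
})

# Explode each nested column in turn, as the cells above do
for col in ["cast", "listed_in"]:
    toy = toy.assign(**{col: toy[col].str.split(", ")}).explode(col, ignore_index=True)

# 2 cast members x 3 genres -> 6 rows, but still a single title
print(len(toy), toy["show_id"].nunique())  # 6 1
```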


### Revisiting Missing values

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201763 entries, 0 to 201762
Data columns (total 12 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   show_id       201763 non-null  object
 1   type          201763 non-null  object
 2   title         201763 non-null  object
 3   director      151338 non-null  object
 4   cast          199617 non-null  object
 5   country       189869 non-null  object
 6   date_added    201763 non-null  object
 7   release_year  201763 non-null  int64 
 8   rating        201763 non-null  object
 9   duration      201763 non-null  object
 10  listed_in     201763 non-null  object
 11  description   201763 non-null  object
dtypes: int64(1), object(11)
memory usage: 18.5+ MB


In [41]:
df.isnull().sum()/len(df)*100

show_id          0.000000
type             0.000000
title            0.000000
director        24.992194
cast             1.063624
country          5.895035
date_added       0.000000
release_year     0.000000
rating           0.000000
duration         0.000000
listed_in        0.000000
description      0.000000
dtype: float64

In [42]:
na_variables = [ var for var in df.columns if df[var].isnull().mean() > 0 ]
na_variables

['director', 'cast', 'country']

In [43]:
df[na_variables].isnull().mean()*100

director    24.992194
cast         1.063624
country      5.895035
dtype: float64

In [44]:
df['director'].value_counts()

Martin Scorsese        419
Youssef Chahine        409
Cathy Garcia-Molina    356
Steven Spielberg       355
Lars von Trier         336
                      ... 
Jerry Kolber             1
Marty Callner            1
Helena Coan              1
Will Allen               1
Fabio Ock                1
Name: director, Length: 4991, dtype: int64
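On the exploded frame, `value_counts()` counts rows, not titles. A director's total is therefore inflated by the size of each title's cast, country and genre lists, so the 419 shown for Martin Scorsese is a row count. The same applies to `df['director'].mode()`. Counting distinct `show_id`s per director gives the number of titles instead. A sketch on made-up data:

```python
import pandas as pd

# s1 appears as 3 exploded rows; s2 and s3 as one row each (made-up data)
toy = pd.DataFrame({
    "show_id": ["s1", "s1", "s1", "s2", "s3"],
    "director": ["D1", "D1", "D1", "D2", "D2"],
})

rows_per_director = toy["director"].value_counts()
titles_per_director = (
    toy.groupby("director")["show_id"].nunique().sort_values(ascending=False)
)

print(rows_per_director.to_dict())    # {'D1': 3, 'D2': 2}
print(titles_per_director.to_dict())  # {'D2': 2, 'D1': 1}
```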

In [45]:
df['director'].mode()

0    Martin Scorsese
dtype: object

In [46]:
df[df['country'].isna() & df['director'].isna()]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
85,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,Docuseries,"Feuds, flirtations and toilet talk go down amo..."
86,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,Reality TV,"Feuds, flirtations and toilet talk go down amo..."
353,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,Crime TV Shows,"Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
354,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,Docuseries,"Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
355,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,International TV Shows,"Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
...,...,...,...,...,...,...,...,...,...,...,...,...
201196,s8786,TV Show,YOM,,Mayur Vyas,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
201197,s8786,TV Show,YOM,,Ketan Kava,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
201704,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,Kids' TV,"While living alone in a spooky town, a young g..."
201705,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,Korean TV Shows,"While living alone in a spooky town, a young g..."


In [47]:
len(df)

201763

In [48]:
df[df['listed_in']=='Docuseries']

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
85,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,Docuseries,"Feuds, flirtations and toilet talk go down amo..."
354,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,Docuseries,"Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
497,s15,TV Show,Crime Stories: India Detectives,,,,"September 22, 2021",2021,TV-MA,1 Season,Docuseries,Cameras following Bengaluru police on the job ...
589,s21,TV Show,Monsters Inside: The 24 Faces of Billy Milligan,Olivier Megaton,,,"September 22, 2021",2021,TV-14,1 Season,Docuseries,"In the late 1970s, an accused serial rapist cl..."
653,s26,TV Show,Love on the Spectrum,,Brooke Satchwell,Australia,"September 21, 2021",2021,TV-14,2 Seasons,Docuseries,Finding love can be hard for anyone. For young...
...,...,...,...,...,...,...,...,...,...,...,...,...
200167,s8742,TV Show,Wild Arabia,,Alexander Siddig,United Kingdom,"March 31, 2017",2013,TV-PG,1 Season,Docuseries,The widely varied geology and dramatic landsca...
200475,s8756,TV Show,Women Behind Bars,,,United States,"November 1, 2016",2010,TV-14,3 Seasons,Docuseries,This reality series recounts true stories of w...
200505,s8759,TV Show,World's Busiest Cities,,Anita Rani,United Kingdom,"February 1, 2019",2017,TV-PG,1 Season,Docuseries,"From Moscow to Mexico City, three BBC journali..."
200507,s8759,TV Show,World's Busiest Cities,,Ade Adepitan,United Kingdom,"February 1, 2019",2017,TV-PG,1 Season,Docuseries,"From Moscow to Mexico City, three BBC journali..."


In [49]:
(df['director'].isna() & df['country'].isna()).mean()*100

2.440487106159207

In [68]:
df.iloc[np.where((pd.isna(df['director'])==True) & (pd.isna(df['country'])==True) & (pd.isna(df['cast'])==False))]['director'].fillna(df.groupby(['cast'])['director'].agg(pd.Series.mode))

564      NaN
565      NaN
566      NaN
567      NaN
568      NaN
          ..
201193   NaN
201194   NaN
201195   NaN
201196   NaN
201197   NaN
Name: director, Length: 4740, dtype: float64
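The cell above returns only NaN (note the float64 dtype) because `fillna` with a Series aligns on the index. The filler is indexed by cast name, but the frame is indexed by row number, so no labels match. The result is also never assigned back. A further problem is that `pd.Series.mode` returns an empty array for cast members with no known director (e.g. `2 Chainz` in the groupby output below) and several values when there is a tie.

The sketch below is one way to map each row's cast member to a single most-frequent director. It uses toy data and assumes that taking the first mode on ties is acceptable. `first_mode` is a hypothetical helper, not a pandas function:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "cast": ["Mayur Vyas", "Mayur Vyas", "Mayur Vyas", "Solo Actor"],
    "director": ["Suhas Kadav", "Suhas Kadav", np.nan, np.nan],
})

def first_mode(s: pd.Series):
    """Hypothetical helper: most frequent non-null value, or NaN if none."""
    m = s.mode()  # mode() drops NaN by default and returns values sorted
    return m.iloc[0] if not m.empty else np.nan

# One director per cast member, indexed by cast name
director_by_cast = toy.groupby("cast")["director"].agg(first_mode)

# map() looks each row's cast name up in that index, so the labels line up
toy["director"] = toy["director"].fillna(toy["cast"].map(director_by_cast))
print(toy["director"].tolist())  # ['Suhas Kadav', 'Suhas Kadav', 'Suhas Kadav', nan]
```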

In [70]:
df[df['director'].isna() & df['country'].isna() & df['cast'].notna()]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
564,s20,TV Show,Jaguar,,Blanca Suárez,,"September 22, 2021",2021,TV-MA,1 Season,International TV Shows,"In the 1960s, a Holocaust survivor joins a gro..."
565,s20,TV Show,Jaguar,,Blanca Suárez,,"September 22, 2021",2021,TV-MA,1 Season,Spanish-Language TV Shows,"In the 1960s, a Holocaust survivor joins a gro..."
566,s20,TV Show,Jaguar,,Blanca Suárez,,"September 22, 2021",2021,TV-MA,1 Season,TV Action & Adventure,"In the 1960s, a Holocaust survivor joins a gro..."
567,s20,TV Show,Jaguar,,Iván Marcos,,"September 22, 2021",2021,TV-MA,1 Season,International TV Shows,"In the 1960s, a Holocaust survivor joins a gro..."
568,s20,TV Show,Jaguar,,Iván Marcos,,"September 22, 2021",2021,TV-MA,1 Season,Spanish-Language TV Shows,"In the 1960s, a Holocaust survivor joins a gro..."
...,...,...,...,...,...,...,...,...,...,...,...,...
201193,s8786,TV Show,YOM,,Sairaj,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
201194,s8786,TV Show,YOM,,Devyani Dagaonkar,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
201195,s8786,TV Show,YOM,,Ketan Singh,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
201196,s8786,TV Show,YOM,,Mayur Vyas,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."


In [71]:
df.groupby(['cast'])['director'].agg(pd.Series.mode)

cast
 Jr.                          Sam Macaroni
"Riley" Lakdhar Dridi    Rebecca Zlotowski
'Najite Dede                  Aniedi Anwah
2 Chainz                                []
2Mex                          Ava DuVernay
                               ...        
Şevket Çoruh                  Ozan Açıktan
Şinasi Yurtsever            Selçuk Aydemir
Şükran Ovalı                Yılmaz Erdoğan
Şükrü Özyıldız              Yılmaz Erdoğan
Ṣọpẹ́ Dìrísù                   Remi Weekes
Name: director, Length: 36392, dtype: object

In [None]:
df.groupby(['country'])['director'].agg(pd.Series.mode)

In [52]:
df[df['cast']=='Mayur Vyas']

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
23318,s941,Movie,Motu Patlu: Deep Sea Adventure,Suhas Kadav,Mayur Vyas,,"May 1, 2021",2014,TV-Y7,76 min,Children & Family Movies,Friends Motu and Patlu get more maritime excit...
23319,s941,Movie,Motu Patlu: Deep Sea Adventure,Suhas Kadav,Mayur Vyas,,"May 1, 2021",2014,TV-Y7,76 min,Comedies,Friends Motu and Patlu get more maritime excit...
24860,s1007,Movie,Motu Patlu Dino Invasion,Suhas Kadav,Mayur Vyas,,"April 20, 2021",2018,TV-Y7,80 min,Children & Family Movies,A time machine sends Motu and Patlu back to th...
24861,s1007,Movie,Motu Patlu Dino Invasion,Suhas Kadav,Mayur Vyas,,"April 20, 2021",2018,TV-Y7,80 min,Comedies,A time machine sends Motu and Patlu back to th...
24868,s1008,Movie,Motu Patlu in Octupus World,Suhas Kadav,Mayur Vyas,,"April 20, 2021",2017,TV-Y,81 min,Children & Family Movies,While returning a goldfish and an octopus from...
24869,s1008,Movie,Motu Patlu in Octupus World,Suhas Kadav,Mayur Vyas,,"April 20, 2021",2017,TV-Y,81 min,Comedies,While returning a goldfish and an octopus from...
26732,s1073,Movie,Motu Patlu in the City of Gold,Suhas Kadav,Mayur Vyas,,"April 13, 2021",2018,TV-Y7,77 min,Children & Family Movies,Defeated by the strong yet naive Motu in a rac...
26733,s1073,Movie,Motu Patlu in the City of Gold,Suhas Kadav,Mayur Vyas,,"April 13, 2021",2018,TV-Y7,77 min,Comedies,Defeated by the strong yet naive Motu in a rac...
26734,s1073,Movie,Motu Patlu in the City of Gold,Suhas Kadav,Mayur Vyas,,"April 13, 2021",2018,TV-Y7,77 min,Music & Musicals,Defeated by the strong yet naive Motu in a rac...
201196,s8786,TV Show,YOM,,Mayur Vyas,,"June 7, 2018",2016,TV-Y7,1 Season,Kids' TV,"With the mind of a human being, and the body o..."
