# Netflix Genre Trend Analysis 📺🎬

## loading data

In [388]:
import pandas as pd

In [389]:
data = pd.read_csv('./netflix_titles.csv')
data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


## understanding data

In [390]:
data.shape

(8807, 12)

In [391]:
data.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

In [392]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


In [393]:
data.isna().sum()

Unnamed: 0,0
show_id,0
type,0
title,0
director,2634
cast,825
country,831
date_added,10
release_year,0
rating,4
duration,3
listed_in,0
description,0


In [394]:
data.describe(include='object')

Unnamed: 0,show_id,type,title,director,cast,country,date_added,rating,duration,listed_in,description
count,8807,8807,8807,6173,7982,7976,8797,8803,8804,8807,8807
unique,8807,2,8807,4528,7692,748,1767,17,220,514,8775
top,s8807,Movie,Zubaan,Rajiv Chilaka,David Attenborough,United States,"January 1, 2020",TV-MA,1 Season,"Dramas, International Movies","Paranormal activity at a lush, abandoned prope..."
freq,1,6131,1,19,19,2818,109,3207,1793,362,4


In [395]:
data['title'].duplicated().sum()

np.int64(0)

## cleaning

In [396]:
data['date_added'] = pd.to_datetime(data['date_added'], errors='coerce')

In [397]:
data['year_added'] = data['date_added'].dt.year
data['month_added'] = data['date_added'].dt.month
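
The later `data.info()` output reports 8709 non-null `date_added` values, down from 8797 before parsing, so `errors='coerce'` turned 88 strings into `NaT`. One plausible cause, which is an assumption and not something the notebook confirms, is stray whitespace around some date strings. The sketch below uses a made-up three-row Series (not the real data) to show stripping before parsing with an explicit format, plus a nullable `Int64` year so the column does not fall back to floats such as `2021.0`.

```python
import pandas as pd

# Toy stand-in for data['date_added']; the values are illustrative only.
raw = pd.Series(["September 25, 2021", " August 4, 2017", None])

# Strip whitespace first, then parse with an explicit format so every row
# is interpreted the same way; missing or unparseable entries become NaT.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Nullable integer dtype keeps missing years as <NA> instead of forcing floats.
year_added = parsed.dt.year.astype("Int64")

print(parsed.isna().sum())
print(year_added.tolist())
```

Comparing `isna().sum()` before and after parsing is a cheap way to check how many rows were coerced.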

In [398]:
data['director'] = data['director'].fillna('Unknown')

In [399]:
data['country'] = data['country'].str.strip()
data['type'] = data['type'].str.strip()
data['rating'] = data['rating'].str.strip()

In [400]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   show_id       8807 non-null   object        
 1   type          8807 non-null   object        
 2   title         8807 non-null   object        
 3   director      8807 non-null   object        
 4   cast          7982 non-null   object        
 5   country       7976 non-null   object        
 6   date_added    8709 non-null   datetime64[ns]
 7   release_year  8807 non-null   int64         
 8   rating        8803 non-null   object        
 9   duration      8804 non-null   object        
 10  listed_in     8807 non-null   object        
 11  description   8807 non-null   object        
 12  year_added    8709 non-null   float64       
 13  month_added   8709 non-null   float64       
dtypes: datetime64[ns](1), float64(2), int64(1), object(10)
memory usage: 963.4+ KB


In [401]:
data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
1,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021.0,9.0
3,s4,TV Show,Jailbirds New Orleans,Unknown,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021.0,9.0
4,s5,TV Show,Kota Factory,Unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021.0,9.0


## exploring patterns

In [402]:
# distribution of content type
data['type'].value_counts().reset_index()

Unnamed: 0,type,count
0,Movie,6131
1,TV Show,2676


In [403]:
# top 10 countries producing most content
data['country'].value_counts().reset_index().head(10)

Unnamed: 0,country,count
0,United States,2818
1,India,972
2,United Kingdom,419
3,Japan,245
4,South Korea,199
5,Canada,181
6,Spain,145
7,France,124
8,Mexico,110
9,Egypt,106
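
These counts treat each distinct `country` string as one value. With 748 unique values in the column, some entries probably list several countries separated by commas, in which case a co-production is not credited to any of its individual countries. A minimal sketch of counting each country separately, on a made-up Series and assuming a comma separator:

```python
import pandas as pd

# Toy stand-in for data['country']; values are illustrative only.
country = pd.Series(
    ["United States", "United States, India", None, "India, United States", "Japan"]
)

per_country = (
    country.dropna()
    .str.split(",")      # one list of countries per title
    .explode()           # one row per (title, country) pair
    .str.strip()         # remove spaces left around the separator
)
per_country = per_country[per_country != ""]  # guard against trailing commas

print(per_country.value_counts())
```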


In [404]:
# titles added each year (trend over time)
data.groupby('year_added')['title'].count().reset_index()

Unnamed: 0,year_added,title
0,2008.0,2
1,2009.0,2
2,2010.0,1
3,2011.0,13
4,2012.0,3
5,2013.0,10
6,2014.0,23
7,2015.0,73
8,2016.0,418
9,2017.0,1164


In [405]:
# titles added each year, this time with value_counts instead of groupby
data['year_added'].value_counts().sort_index().reset_index()

Unnamed: 0,year_added,count
0,2008.0,2
1,2009.0,2
2,2010.0,1
3,2011.0,13
4,2012.0,3
5,2013.0,10
6,2014.0,23
7,2015.0,73
8,2016.0,418
9,2017.0,1164


In [406]:
# most common ratings
data['rating'].value_counts().reset_index().head(10)

Unnamed: 0,rating,count
0,TV-MA,3207
1,TV-14,2160
2,TV-PG,863
3,R,799
4,PG-13,490
5,TV-Y7,334
6,TV-Y,307
7,PG,287
8,TV-G,220
9,NR,80


In [407]:
# average movie duration in minutes (TV show durations are in seasons, so focus on movies)
movies = data[data['type'] == 'Movie'].copy()  # explicit copy, so the assignment below is not made on a slice of data
movies['duration'] = movies['duration'].str.replace(' min', '').astype(float)
movies['duration'].mean()

np.float64(99.57718668407311)

In [408]:
# number of unique directors (includes the 'Unknown' placeholder filled in during cleaning)
data['director'].nunique()

4529

In [409]:
# top 5 directors by number of titles ('Unknown' marks missing values, not a real director)
data['director'].value_counts().reset_index().head(5)

Unnamed: 0,director,count
0,Unknown,2634
1,Rajiv Chilaka,19
2,"Raúl Campos, Jan Suter",18
3,Suhas Kadav,16
4,Marcus Raboy,16


## more on exploring data

In [410]:
# movies vs. TV shows added each year (one row per year/type pair)
data.groupby(['year_added', 'type']).size().sort_index(ascending=True)

year_added,type,count
2008.0,Movie,1
2008.0,TV Show,1
2009.0,Movie,2
2010.0,Movie,1
2011.0,Movie,13
2012.0,Movie,3
2013.0,Movie,6
2013.0,TV Show,4
2014.0,Movie,19
2014.0,TV Show,4


In [411]:
# same counts, pivoted so each type becomes a column
data.groupby(['year_added', 'type']).size().unstack()

year_added,Movie,TV Show
2008.0,1.0,1.0
2009.0,2.0,
2010.0,1.0,
2011.0,13.0,
2012.0,3.0,
2013.0,6.0,4.0
2014.0,19.0,4.0
2015.0,56.0,17.0
2016.0,253.0,165.0
2017.0,839.0,325.0
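
The blanks in the TV Show column are year/type pairs that never occur. `unstack()` fills them with `NaN`, which also turns the counts into floats. A small sketch on made-up rows showing `unstack(fill_value=0)`, which keeps the counts as integers:

```python
import pandas as pd

# Made-up rows; only the two columns used by the groupby are needed.
toy = pd.DataFrame({
    "year_added": [2008, 2008, 2009, 2009, 2013],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
})

# Missing year/type combinations become 0 instead of NaN.
by_year_type = toy.groupby(["year_added", "type"]).size().unstack(fill_value=0)

print(by_year_type)
```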


In [412]:
# top 10 most featured actors or actresses
data_cast = data.dropna(subset=['cast'])
data_cast = data_cast.assign(cast=data_cast['cast'].str.split(', '))
df_cast = data_cast.explode('cast')
top_actors = df_cast['cast'].value_counts().reset_index().head(10)
top_actors

Unnamed: 0,cast,count
0,Anupam Kher,43
1,Shah Rukh Khan,35
2,Julie Tejwani,33
3,Takahiro Sakurai,32
4,Naseeruddin Shah,32
5,Rupa Bhimani,31
6,Om Puri,30
7,Akshay Kumar,30
8,Yuki Kaji,29
9,Amitabh Bachchan,28


In [413]:
# top 10 countries for TV Shows only
data[data['type'] == 'TV Show']['country'].value_counts().head(10).reset_index()

Unnamed: 0,country,count
0,United States,760
1,United Kingdom,213
2,Japan,169
3,South Korea,158
4,India,79
5,Taiwan,68
6,Canada,59
7,France,49
8,Australia,48
9,Spain,48


In [414]:
# number of titles added per month
data['month_added'].value_counts().sort_index().reset_index()

Unnamed: 0,month_added,count
0,1.0,727
1,2.0,557
2,3.0,734
3,4.0,759
4,5.0,626
5,6.0,724
6,7.0,819
7,8.0,749
8,9.0,765
9,10.0,755


In [415]:
# rating counts per type
data.groupby('type')['rating'].value_counts().unstack().fillna(0)

type,66 min,74 min,84 min,G,NC-17,NR,PG,PG-13,R,TV-14,TV-G,TV-MA,TV-PG,TV-Y,TV-Y7,TV-Y7-FV,UR
Movie,1.0,1.0,1.0,41.0,3.0,75.0,287.0,490.0,797.0,1427.0,126.0,2062.0,540.0,131.0,139.0,5.0,3.0
TV Show,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,2.0,733.0,94.0,1145.0,323.0,176.0,195.0,1.0,0.0
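
The `66 min`, `74 min` and `84 min` columns are durations, not ratings. Together with the three missing `duration` values reported earlier, this suggests that three movie rows have their duration stored in the `rating` column. That is an assumption and has not been verified against the file. A hedged sketch on made-up rows that moves such values back before counting:

```python
import pandas as pd

# Made-up rows; row "B" imitates a duration stored in the rating column.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "rating": ["TV-MA", "74 min", "PG"],
    "duration": ["90 min", None, "101 min"],
})

# Rows where rating looks like a duration and duration itself is missing.
misplaced = toy["rating"].str.fullmatch(r"\d+ min", na=False) & toy["duration"].isna()

toy.loc[misplaced, "duration"] = toy.loc[misplaced, "rating"]
toy.loc[misplaced, "rating"] = pd.NA   # the true rating is unknown

print(toy)
```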


In [416]:
# titles where the description contains the word “crime”
data[data['description'].str.contains('crime', case=False, na=False)][['title', 'description']].head(10)

Unnamed: 0,title,description
10,"Vendetta: Truth, Lies and The Mafia","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,Crime Stories: India Detectives,Cameras following Bengaluru police on the job ...
84,Omo Ghetto: the Saga,Twins are reunited as a good-hearted female ga...
122,In the Cut,After embarking on an affair with the cop prob...
133,Chappie,In a futuristic society where an indestructibl...
150,In Too Deep,Rookie cop Jeffrey Cole poses as a drug dealer...
166,Once Upon a Time in America,Director Sergio Leone's sprawling crime epic f...
208,Once Upon a Time in Mumbaai,Mumbai's top mob boss rules the underworld wit...
222,Clickbait,When family man Nick Brewer is abducted in a c...
255,Memories of a Murderer: The Nilsen Tapes,Serial killer Dennis Nilsen narrates his life ...


In [417]:
# longest movie on Netflix?
movies.loc[movies['duration'].idxmax()]

Unnamed: 0,4253
show_id,s4254
type,Movie
title,Black Mirror: Bandersnatch
director,Unknown
cast,"Fionn Whitehead, Will Poulter, Craig Parkinson..."
country,United States
date_added,2018-12-28 00:00:00
release_year,2018
rating,TV-MA
duration,312.0


## Which genres were most popular across different years?

In [418]:
data.head(2)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
1,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021.0,9.0


In [419]:
genre_data = data[['release_year', 'listed_in']].dropna().copy()
genre_data['listed_in'] = genre_data['listed_in'].str.split(', ')
genre_data = genre_data.explode('listed_in')

In [420]:
genre_data['listed_in'].unique()

array(['Documentaries', 'International TV Shows', 'TV Dramas',
       'TV Mysteries', 'Crime TV Shows', 'TV Action & Adventure',
       'Docuseries', 'Reality TV', 'Romantic TV Shows', 'TV Comedies',
       'TV Horror', 'Children & Family Movies', 'Dramas',
       'Independent Movies', 'International Movies', 'British TV Shows',
       'Comedies', 'Spanish-Language TV Shows', 'Thrillers',
       'Romantic Movies', 'Music & Musicals', 'Horror Movies',
       'Sci-Fi & Fantasy', 'TV Thrillers', "Kids' TV",
       'Action & Adventure', 'TV Sci-Fi & Fantasy', 'Classic Movies',
       'Anime Features', 'Sports Movies', 'Anime Series',
       'Korean TV Shows', 'Science & Nature TV', 'Teen TV Shows',
       'Cult Movies', 'TV Shows', 'Faith & Spirituality', 'LGBTQ Movies',
       'Stand-Up Comedy', 'Movies', 'Stand-Up Comedy & Talk Shows',
       'Classic & Cult TV'], dtype=object)

In [421]:
genre_map = {
    'TV Dramas': 'Drama',
    'Dramas': 'Drama',
    'Horror Movies': 'Horror',
    'TV Horror': 'Horror',
    'TV Comedies': 'Comedy',
    'Comedies': 'Comedy',
    'Action & Adventure': 'Action',
    'TV Action & Adventure': 'Action',
    'Thrillers': 'Thriller',
    'TV Thrillers': 'Thriller',
    'Romantic Movies': 'Romance',
    'Romantic TV Shows': 'Romance',
    'Sci-Fi & Fantasy': 'Sci-Fi',
    'TV Sci-Fi & Fantasy': 'Sci-Fi'
}

genre_data['listed_in'] = genre_data['listed_in'].replace(genre_map)
genre_data['listed_in'].unique()

array(['Documentaries', 'International TV Shows', 'Drama', 'TV Mysteries',
       'Crime TV Shows', 'Action', 'Docuseries', 'Reality TV', 'Romance',
       'Comedy', 'Horror', 'Children & Family Movies',
       'Independent Movies', 'International Movies', 'British TV Shows',
       'Spanish-Language TV Shows', 'Thriller', 'Music & Musicals',
       'Sci-Fi', "Kids' TV", 'Classic Movies', 'Anime Features',
       'Sports Movies', 'Anime Series', 'Korean TV Shows',
       'Science & Nature TV', 'Teen TV Shows', 'Cult Movies', 'TV Shows',
       'Faith & Spirituality', 'LGBTQ Movies', 'Stand-Up Comedy',
       'Movies', 'Stand-Up Comedy & Talk Shows', 'Classic & Cult TV'],
      dtype=object)

In [422]:
genre_data = genre_data.groupby(['release_year', 'listed_in']).size().unstack().fillna(0).sort_index(ascending=False)

In [423]:
genre_data.head(10)

release_year,Action,Anime Features,Anime Series,British TV Shows,Children & Family Movies,Classic & Cult TV,Classic Movies,Comedy,Crime TV Shows,Cult Movies,...,Sci-Fi,Science & Nature TV,Spanish-Language TV Shows,Sports Movies,Stand-Up Comedy,Stand-Up Comedy & Talk Shows,TV Mysteries,TV Shows,Teen TV Shows,Thriller
2021,65.0,6.0,23.0,17.0,40.0,0.0,0.0,142.0,47.0,0.0,...,14.0,10.0,32.0,15.0,12.0,7.0,14.0,2.0,8.0,42.0
2020,78.0,3.0,21.0,33.0,83.0,3.0,0.0,238.0,87.0,0.0,...,20.0,15.0,28.0,17.0,41.0,7.0,27.0,2.0,11.0,52.0
2019,79.0,6.0,18.0,26.0,82.0,3.0,0.0,234.0,92.0,0.0,...,34.0,8.0,31.0,25.0,49.0,8.0,16.0,2.0,14.0,87.0
2018,109.0,8.0,24.0,37.0,69.0,2.0,1.0,260.0,79.0,1.0,...,49.0,14.0,27.0,27.0,59.0,16.0,15.0,0.0,8.0,90.0
2017,97.0,6.0,10.0,34.0,55.0,1.0,0.0,221.0,54.0,1.0,...,27.0,7.0,12.0,29.0,58.0,10.0,9.0,2.0,5.0,71.0
2016,89.0,5.0,11.0,30.0,45.0,1.0,0.0,197.0,39.0,1.0,...,26.0,16.0,17.0,32.0,37.0,1.0,7.0,0.0,5.0,75.0
2015,58.0,2.0,11.0,22.0,23.0,2.0,0.0,131.0,25.0,0.0,...,21.0,7.0,8.0,15.0,17.0,3.0,6.0,1.0,4.0,40.0
2014,32.0,2.0,11.0,7.0,29.0,0.0,0.0,93.0,11.0,0.0,...,18.0,3.0,1.0,8.0,11.0,1.0,1.0,1.0,3.0,40.0
2013,31.0,4.0,5.0,10.0,34.0,1.0,0.0,79.0,9.0,2.0,...,8.0,2.0,3.0,5.0,9.0,1.0,1.0,0.0,0.0,16.0
2012,30.0,2.0,4.0,9.0,21.0,0.0,0.0,81.0,6.0,2.0,...,4.0,1.0,2.0,4.0,12.0,0.0,0.0,0.0,1.0,9.0


In [424]:
# Top 5 genres across all years
genre_data.sum().sort_values(ascending=False).head(5)

listed_in,count
Drama,3190.0
International Movies,2752.0
Comedy,2255.0
International TV Shows,1351.0
Action,1027.0


In [425]:
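# genre totals from 2019 back to the oldest release year: the index is sorted
# newest-first, so .loc[2019:] excludes 2020 and 2021 (use .loc[:2019] for 2019 onward)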
recent_genre = genre_data.loc[2019:]
recent_genre.sum().sort_values(ascending=False).head(10)

listed_in,count
Drama,2693.0
International Movies,2372.0
Comedy,1875.0
International TV Shows,988.0
Action,884.0
Romance,822.0
Documentaries,739.0
Independent Movies,694.0
Thriller,540.0
Children & Family Movies,518.0
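
Because `genre_data` was sorted with `sort_index(ascending=False)`, label slices run from newer to older years: `.loc[2019:]` starts at 2019 and continues to the oldest year. The totals above are consistent with that. For example, the overall Comedy total of 2255 minus the 238 (2020) and 142 (2021) titles shown in the head table gives 1875. A minimal sketch with made-up counts showing how to select 2019 and later instead:

```python
import pandas as pd

# Made-up counts, indexed newest-first like genre_data.
toy = pd.DataFrame(
    {"Drama": [5, 8, 9, 7, 6]},
    index=pd.Index([2021, 2020, 2019, 2018, 2017], name="release_year"),
)

older = toy.loc[2019:]                  # 2019 and everything before it
recent = toy.loc[:2019]                 # 2021 down to 2019
recent_mask = toy[toy.index >= 2019]    # same rows, independent of sort order

print(recent.sum())
```

The boolean mask is the safer form of the two, since it gives the same rows whether the index is ascending or descending.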


In [426]:
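# intended: genres that are declining over time
# the index is newest-first, so diff() yields older year minus newer year, and
# sort_values(ascending=False) lists the five negative sums closest to zero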
yearly_change = genre_data.diff()
declines_only = yearly_change[yearly_change < 0].fillna(0)
declines_only.sum().sort_values(ascending=False).head(5)

listed_in,summed_change
TV Shows,-9.0
Classic & Cult TV,-17.0
Stand-Up Comedy & Talk Shows,-19.0
Anime Features,-21.0
Teen TV Shows,-21.0
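
`diff()` subtracts the previous row, and `genre_data` is ordered newest-first, so the values in `yearly_change` are the older year minus the newer one. The sketch below, on made-up counts, sorts the years chronologically before differencing and then sorts ascending so the most negative totals come first. The names `chrono`, `yoy` and `total_decline` are illustrative.

```python
import pandas as pd

# Made-up counts, indexed newest-first like genre_data.
toy = pd.DataFrame(
    {"Drama": [20, 15, 12, 10], "Westerns": [2, 4, 7, 9], "Anime": [6, 4, 5, 3]},
    index=pd.Index([2021, 2020, 2019, 2018], name="release_year"),
)

chrono = toy.sort_index()      # oldest year first
yoy = chrono.diff()            # this year minus the previous year

# Keep only the negative changes, add them up per genre,
# and sort ascending so the largest total decline comes first.
total_decline = yoy.clip(upper=0).sum().sort_values()

print(total_decline.head(2))
```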


In [427]:
# most popular genre per release year (tail(10) shows the ten oldest years, since the index is newest-first)
genre_data.idxmax(axis=1).tail(10)

release_year,top_genre
1956,Classic Movies
1955,Classic Movies
1954,Classic Movies
1947,Classic Movies
1946,Classic Movies
1945,Classic Movies
1944,Classic Movies
1943,Documentaries
1942,Classic Movies
1925,TV Shows


In [428]:
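# largest single row-to-row change per genre; yearly_change is older year minus
# newer year, so these maxima are drops toward the newer year, not spikes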
yearly_change.max().sort_values(ascending=False).head(5)

listed_in,max_change
Drama,147.0
International Movies,98.0
Comedy,96.0
International TV Shows,65.0
Independent Movies,55.0
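
To name both the genre and the year of the largest jump, the chronological differences can be stacked into one long Series. A sketch on made-up counts follows. The real data has gaps between older release years (for example 1942 to 1925), so a row-to-row difference there does not always span exactly one year.

```python
import pandas as pd

# Made-up counts, already in chronological order.
toy = pd.DataFrame(
    {"Drama": [10, 12, 15, 20], "Anime": [3, 5, 4, 6]},
    index=pd.Index([2018, 2019, 2020, 2021], name="release_year"),
)

yoy = toy.sort_index().diff()          # change versus the previous year

# stack() gives one value per (year, genre); idxmax returns that pair.
year, genre = yoy.stack().idxmax()

print(genre, year, yoy.loc[year, genre])
```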


In [429]:
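# intended: the genres that declined most after 2020
# as written, .loc[2021:] on the newest-first index keeps every year and the mask
# uses yearly_change, so the result equals the all-years sums from In [426]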
decline_after_2020 = genre_data.loc[2021:]
change = decline_after_2020.diff()
decline_after_2020 = change[yearly_change < 0].fillna(0)
decline_after_2020.sum().sort_values(ascending=False).head(5)

listed_in,summed_change
TV Shows,-9.0
Classic & Cult TV,-17.0
Stand-Up Comedy & Talk Shows,-19.0
Anime Features,-21.0
Teen TV Shows,-21.0
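
For a decline after 2020, one direct option is to compare the 2021 row with the 2020 row. A sketch on made-up counts follows. The newest `date_added` values in the head of the data are from September 2021, so the 2021 release year is probably incomplete, and such a comparison would overstate real declines.

```python
import pandas as pd

# Made-up counts for two release years.
toy = pd.DataFrame(
    {"Drama": [200, 120], "Comedy": [238, 142], "Anime": [21, 23], "Horror": [40, 30]},
    index=pd.Index([2020, 2021], name="release_year"),
)

change = toy.loc[2021] - toy.loc[2020]   # negative means fewer titles in 2021

# Keep only genres that shrank, then take the three largest drops.
top3_declines = change[change < 0].nsmallest(3)

print(top3_declines)
```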


## 📊 Dataset Credit

The dataset used in this analysis is the publicly available [Netflix Titles Dataset on Kaggle](https://www.kaggle.com/datasets/shivamb/netflix-shows), provided by **Shivam Bansal**.

- Dataset last updated: 2021 (may not include newest Netflix content)