In [650]:
print("""# 🎬 Netflix Content Analysis

Welcome to this exploratory data analysis (EDA) project on the Netflix dataset.

In this notebook, we explore:
- Trends in content type (Movies vs TV Shows)
- Year-wise and country-wise distribution of content
- Duration, genre, and other content attributes

This notebook is structured to provide insights for aspiring AI and Data Science professionals, and is ready for portfolio showcasing.
""")

# 🎬 Netflix Content Analysis

Welcome to this exploratory data analysis (EDA) project on the Netflix dataset.

In this notebook, we explore:
- Trends in content type (Movies vs TV Shows)
- Year-wise and country-wise distribution of content
- Duration, genre, and other content attributes

This notebook is structured to provide insights for aspiring AI and Data Science professionals, and is ready for portfolio showcasing.



In [646]:
import pandas as pd 

In [576]:

file_path = r"C:\Users\dipak\OneDrive\Desktop\Journey_AI\IBM AI ENgineer\netflix_titles\netflix_titles.csv\netflix_titles.csv"


In [577]:
df = pd.read_csv(file_path)
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [578]:
# Drop duplicates 

In [579]:
had_duplicates = df.duplicated().any()

In [580]:
had_duplicates

False

In [581]:
if had_duplicates:
    df = df.drop_duplicates()

In [582]:
# Drop null values 

In [583]:
df = df.dropna()
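Calling `dropna()` with no arguments removes every row that has a missing value in any column. In the preview above, several TV shows have no director, so they are among the rows removed. The following is a minimal sketch on a made-up frame (the `toy` data and the choice of `date_added` as the required column are assumptions for illustration) that compares the strict drop with one restricted to the columns an analysis actually needs:

```python
import pandas as pd

# Toy frame standing in for the Netflix data; the values are illustrative only.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "Movie"],
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "date_added": ["September 24, 2021", "September 24, 2021", None, "March 2, 2019"],
})

# Count the missing values per column before deciding what to drop.
missing_per_column = toy.isna().sum()

# Strict: any gap in any column removes the row.
strict = toy.dropna()

# Lenient: only rows missing the column we need are removed.
lenient = toy.dropna(subset=["date_added"])

print(missing_per_column)
print(len(strict), len(lenient))
```

Checking `isna().sum()` first shows which columns drive the row loss, which helps decide whether a `subset` is more appropriate than a blanket drop.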

In [584]:
had_duplicates = df.duplicated().any()

In [585]:
had_duplicates

False

In [586]:
df.head(3)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...


In [587]:
new_time = df['date_added']
new_time = new_time.to_numpy()
new_time

array(['September 24, 2021', 'September 24, 2021', 'September 24, 2021',
       ..., 'November 1, 2019', 'January 11, 2020', 'March 2, 2019'],
      dtype=object)

In [588]:
# Parse each date string into a pandas Timestamp (the full date, not only the year)
year_added = []
for i in new_time:
    year_added.append(pd.to_datetime(str(i)))

In [589]:
year_added[:5]

[Timestamp('2021-09-24 00:00:00'),
 Timestamp('2021-09-24 00:00:00'),
 Timestamp('2021-09-24 00:00:00'),
 Timestamp('2021-09-23 00:00:00'),
 Timestamp('2021-09-21 00:00:00')]
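The element-by-element loop above can usually be replaced by a single vectorized call. This is a sketch under assumptions: the sample strings are made up, the format string `"%B %d, %Y"` is inferred from values such as `September 24, 2021`, and the `.str.strip()` call is a precaution in case some entries carry stray whitespace:

```python
import pandas as pd

# Illustrative date strings in the same style as the date_added column.
dates = pd.Series(["September 24, 2021", " November 1, 2019", "January 11, 2020"])

# Strip whitespace, then parse the whole Series at once with an explicit format.
parsed = pd.to_datetime(dates.str.strip(), format="%B %d, %Y")

# The .dt accessor exposes date parts without another loop.
years = parsed.dt.year

print(parsed)
print(years.tolist())
```

Because the result is a Series that keeps the original index, it can be assigned straight back to a dataframe column.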

In [590]:
year_added = pd.DataFrame(year_added, columns=["Year added"])
year_added.head()

Unnamed: 0,Year added
0,2021-09-24
1,2021-09-24
2,2021-09-24
3,2021-09-23
4,2021-09-21


In [591]:
df = pd.concat([df,year_added],axis=1)

In [592]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'Year added'],
      dtype='object')
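One thing to watch with `pd.concat(..., axis=1)` is that pandas aligns rows by index label, not by position. After `dropna()` the dataframe keeps its original labels (7, 8, 9, ...), while a freshly built frame starts at 0, so the two may not line up. The sketch below uses made-up frames named `left` and `right` (both hypothetical) to show the effect and one possible way around it:

```python
import pandas as pd

# Left frame keeps non-contiguous labels, as a frame does after dropna().
left = pd.DataFrame({"title": ["A", "B", "C"]}, index=[7, 8, 12])

# Right frame is built from a list, so it gets the default labels 0, 1, 2.
right = pd.DataFrame(
    {"Year added": pd.to_datetime(["2021-09-24", "2021-09-24", "2021-09-23"])}
)

# concat aligns on labels: no label is shared, so the result has 6 rows with gaps.
misaligned = pd.concat([left, right], axis=1)

# Assigning the underlying values attaches them by position instead.
aligned = left.copy()
aligned["Year added"] = right["Year added"].to_numpy()

print(misaligned.shape, aligned.shape)
```

Comparing the row count before and after the concat is a quick way to detect this kind of mismatch.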

In [593]:
# The "Year added" column has been added to the dataframe

In [594]:
type_data = df[['type']].value_counts()
type_data.head()

type   
Movie      5185
TV Show     147
Name: count, dtype: int64

In [595]:
total = type_data["Movie"] + type_data["TV Show"]
total

5332

In [596]:
print(f"Movie: {(type_data['Movie'] / total) * 100}")

Movie: 97.2430607651913


In [597]:
print(f"TV Show: {(type_data['TV Show'] / total) * 100}")

TV Show: 2.7569392348087023
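The two percentages above can also be obtained in one step with `value_counts(normalize=True)`, which returns proportions instead of raw counts. A small sketch on a made-up Series (the `types` data is illustrative, not taken from the dataset):

```python
import pandas as pd

# Illustrative values standing in for the "type" column.
types = pd.Series(["Movie", "Movie", "Movie", "TV Show"], name="type")

# normalize=True gives fractions; scale to percent and round for display.
share = types.value_counts(normalize=True).mul(100).round(2)

print(share)
```

This avoids computing the total by hand and extends to any number of categories.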


In [598]:
total

5332

In [599]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'Year added'],
      dtype='object')

In [600]:
x = df[['Year added']]
x = pd.to_datetime(x['Year added']).dt.year.sort_index()

In [601]:
df['country'].value_counts()

country
United States                                   1846
India                                            875
United Kingdom                                   183
Canada                                           107
Spain                                             91
                                                ... 
Uruguay, Guatemala                                 1
Romania, Bulgaria, Hungary                         1
Philippines, United States                         1
India, United Kingdom, Canada, United States       1
United Arab Emirates, Jordan                       1
Name: count, Length: 604, dtype: int64
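The counts above treat each comma-separated combination, such as `Uruguay, Guatemala`, as its own value. If the goal is a count per individual country, one option is to split and explode the column first. This is a sketch on made-up data, assuming that a comma is the only separator used:

```python
import pandas as pd

# Illustrative entries in the same style as the "country" column.
countries = pd.Series(
    ["United States", "India", "United States, Ghana", "India, United Kingdom"],
    name="country",
)

# Split on commas, put each country on its own row, and trim whitespace.
per_country = countries.str.split(",").explode().str.strip().value_counts()

print(per_country)
```

A title produced in several countries is counted once for each of them, so the totals can exceed the number of titles.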

In [602]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'Year added'],
      dtype='object')

In [603]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Year added
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993.0,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-19
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021.0,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...,2021-09-16
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021.0,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-16
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021.0,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-16
24,s25,Movie,Jeans,S. Shankar,"Prashanth, Aishwarya Rai Bachchan, Sri Lakshmi...",India,"September 21, 2021",1998.0,TV-14,166 min,"Comedies, International Movies, Romantic Movies",When the father of the man she loves insists t...,2021-09-15


In [604]:
genere = df[["listed_in"]]

In [605]:
genere.value_counts()

listed_in                                                      
Dramas, International Movies                                       336
Stand-Up Comedy                                                    286
Comedies, Dramas, International Movies                             257
Dramas, Independent Movies, International Movies                   243
Children & Family Movies, Comedies                                 179
                                                                  ... 
Anime Series, Crime TV Shows, TV Horror                              1
Crime TV Shows, International TV Shows, Korean TV Shows              1
Anime Series, International TV Shows, Spanish-Language TV Shows      1
Anime Series, International TV Shows, TV Horror                      1
Action & Adventure, Comedies, Music & Musicals                       1
Name: count, Length: 335, dtype: int64

In [606]:
# Feature Engineering

In [607]:
content_age = df[["Year added"]]
content_age = pd.to_datetime(content_age["Year added"]).dt.year
content_age = 2025 - content_age
content_age = content_age.to_numpy()

In [608]:
content_age = pd.DataFrame(content_age,columns=["Content Age"])
content_age

Unnamed: 0,Content Age
0,4.0
1,4.0
2,4.0
3,4.0
4,4.0
...,...
7729,8.0
7730,6.0
7731,6.0
7732,7.0
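The content age can also be computed directly as a column, which keeps the values attached to the correct rows and avoids a separate frame. The helper below is a hypothetical sketch: the function name `add_content_age` and its `reference_year` parameter are assumptions for illustration, with 2025 taken from the cell above:

```python
import pandas as pd

def add_content_age(frame, date_col="Year added", reference_year=2025):
    """Return a copy of frame with a 'Content Age' column in whole years."""
    out = frame.copy()
    # Subtract the year part of each timestamp from the reference year.
    out["Content Age"] = reference_year - out[date_col].dt.year
    return out

# Illustrative frame with non-contiguous labels, as after dropna().
toy = pd.DataFrame(
    {"Year added": pd.to_datetime(["2021-09-24", "2019-11-01", "2017-03-02"])},
    index=[7, 8, 12],
)

result = add_content_age(toy)
print(result)
```

Passing the reference year as a parameter makes the calculation reproducible when the notebook is rerun in a later year.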


In [609]:
df = df.drop_duplicates().reset_index(drop=True)
df = pd.concat([df,content_age], axis=1)

In [610]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'Year added', 'Content Age'],
      dtype='object')

In [611]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Year added,Content Age
0,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993.0,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-19,4.0
1,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021.0,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...,2021-09-16,4.0
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021.0,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-16,4.0
3,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021.0,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-16,4.0
4,s25,Movie,Jeans,S. Shankar,"Prashanth, Aishwarya Rai Bachchan, Sri Lakshmi...",India,"September 21, 2021",1998.0,TV-14,166 min,"Comedies, International Movies, Romantic Movies",When the father of the man she loves insists t...,2021-09-15,4.0


In [612]:
df = df.dropna()

In [613]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Year added,Content Age
0,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993.0,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-19,4.0
1,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021.0,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...,2021-09-16,4.0
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021.0,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-16,4.0
3,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021.0,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-16,4.0
4,s25,Movie,Jeans,S. Shankar,"Prashanth, Aishwarya Rai Bachchan, Sri Lakshmi...",India,"September 21, 2021",1998.0,TV-14,166 min,"Comedies, International Movies, Romantic Movies",When the father of the man she loves insists t...,2021-09-15,4.0


In [624]:
content_age_cat = []

In [625]:
content_age_dummy = df["Content Age"].to_numpy()

In [626]:
for i in content_age_dummy:
    if int(i) <= 4:
        content_age_cat.append("Recent")
    elif int(i) <= 12:
        content_age_cat.append("Modern")
    else:
        content_age_cat.append("Classic")
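The same three-way binning can be expressed with `pd.cut`. The sketch below assumes the thresholds from the loop above (up to 4 is "Recent", up to 12 is "Modern", anything older is "Classic") and uses made-up ages:

```python
import pandas as pd

# Illustrative ages covering both thresholds and both sides of each.
ages = pd.Series([4.0, 5.0, 12.0, 13.0, 0.0])

# Bins are right-inclusive by default: (-inf, 4], (4, 12], (12, inf).
categories = pd.cut(
    ages,
    bins=[float("-inf"), 4, 12, float("inf")],
    labels=["Recent", "Modern", "Classic"],
)

print(categories.tolist())
```

The result is a categorical Series that shares the index of `ages`, so it can be assigned to a dataframe column without an extra concat.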

In [627]:
print(content_age_cat[:5])

['Recent', 'Recent', 'Recent', 'Recent', 'Recent']

In [629]:
content_age_cat = pd.DataFrame(content_age_cat,columns=["ContentAge Category"])
df = pd.concat([df,content_age_cat],axis=1)

In [630]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'Year added', 'Content Age', 'ContentAge Category'],
      dtype='object')

In [633]:
genere_group = df[["listed_in"]]

In [641]:
genere_group_list = list(genere_group.value_counts().index.to_numpy())

In [644]:
genere_group_list

[('Stand-Up Comedy',),
 ('Dramas, International Movies',),
 ('Comedies, Dramas, International Movies',),
 ('Dramas, Independent Movies, International Movies',),
 ('Dramas, International Movies, Romantic Movies',),
 ('Comedies, International Movies, Romantic Movies',),
 ('Comedies, International Movies',),
 ('Children & Family Movies, Comedies',),
 ('Dramas, International Movies, Thrillers',),
 ('Comedies, Dramas, Independent Movies',),
 ('Action & Adventure',),
 ('Action & Adventure, Dramas, International Movies',),
 ('Action & Adventure, International Movies',),
 ('Children & Family Movies',),
 ('Dramas',),
 ('Comedies',),
 ('Documentaries',),
 ('International Movies, Thrillers',),
 ('Dramas, Independent Movies',),
 ('Action & Adventure, Comedies, International Movies',),
 ('Dramas, Thrillers',),
 ('Horror Movies, International Movies',),
 ('Comedies, Romantic Movies',),
 ('Comedies, International Movies, Music & Musicals',),
 ('Dramas, Romantic Movies',),
 ('Thrillers',),
 ('Dramas, 

In [645]:
print(f"There are a total of {len(genere_group_list)} unique genre combinations")

There are a total of 258 unique genre combinations
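The figure above counts distinct `listed_in` strings, so each combination of labels is counted once. To count the individual genre labels instead, one possible approach is to split every entry and collect the parts in a set. This is a sketch on made-up entries, and the variable names are hypothetical:

```python
import pandas as pd

# Illustrative entries in the same style as the "listed_in" column.
listed_in = pd.Series([
    "Dramas, International Movies",
    "Stand-Up Comedy",
    "Comedies, Dramas, International Movies",
])

# Number of distinct combinations, comparable to the count printed above.
n_combinations = listed_in.nunique()

# Split each entry on commas and gather the trimmed labels in a set.
unique_genres = sorted(
    {label.strip() for entry in listed_in.dropna() for label in entry.split(",")}
)

print(n_combinations, len(unique_genres))
print(unique_genres)
```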


In [649]:
print("""Conclusion: 
In this analysis, we uncovered several key insights:

- Netflix has more **movies than TV shows**, though both types have increased over time.
- The **USA dominates** the content catalog, but many countries contribute content globally.
- **Content addition peaked** between 2017–2019.
- The **duration distribution** shows that most content is short-to-medium length, and movies tend to cluster around 90 minutes.
- Genre exploration could be extended further for recommendation system insights.""")

Conclusion: 
In this analysis, we uncovered several key insights:

- Netflix has more **movies than TV shows**, though both types have increased over time.
- The **USA dominates** the content catalog, but many countries contribute content globally.
- **Content addition peaked** between 2017–2019.
- The **duration distribution** shows that most content is short-to-medium length, and movies tend to cluster around 90 minutes.
- Genre exploration could be extended further for recommendation system insights.
