# Analyzing Netflix Content: A Data Exploration

## Introduction

Netflix, the popular streaming platform, has revolutionized the way we consume entertainment content. With a vast library of Movies and TV Shows from around the world, Netflix offers a diverse range of genres, languages, and cultural perspectives.

In this project, we aim to explore and analyze the Netflix dataset to gain insights into the content available on the platform in the United States. By examining various aspects such as content types, genres, country distribution, and trends over time, we seek to understand the patterns and preferences of Netflix viewers.

## Objectives

- Explore the distribution of Movies and TV Shows on Netflix.
- Identify the top countries contributing to Netflix's content library.
- Analyze the most popular genres offered on Netflix.
- Investigate trends in content production over the years.
- Examine the distribution of TV ratings for Netflix content.
- Determine the average duration of Movies by genre.
- Calculate the average number of seasons for TV Shows by genre.
- Explore how genre preferences vary across different countries.

Through this analysis, we aim to provide valuable insights into the content landscape of Netflix and shed light on the viewing habits and preferences of its audience.


In [1]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

In [2]:
# reading in the Netflix dataset
filepath = r'netflix_titles_raw_data.csv'

netflix = pd.read_csv(filepath)
netflix.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,25-Sep-21,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,24-Sep-21,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,24-Sep-21,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,24-Sep-21,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,24-Sep-21,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [3]:
# looking at the info of netflix data
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


## Data Cleaning

In [4]:
print(f"{netflix.duplicated().sum()} duplicated rows found.")

0 duplicated rows found.


In [5]:
netflix['rating'].unique()

array(['PG-13', 'TV-MA', 'PG', 'TV-14', 'TV-PG', 'TV-Y', 'TV-Y7', 'R',
       'TV-G', 'G', 'NC-17', '74 min', '84 min', '66 min', 'NR', nan,
       'TV-Y7-FV', 'UR'], dtype=object)

Upon reviewing the ratings column in the dataset, it appears that some unusual values (e.g., "74 min", "84 min", "66 min") are present, suggesting that they might belong in the duration column instead.

In [6]:
# filter the df on rating = 74 min, 84 min, 66 min
netflix[netflix['rating'].isin(['74 min', '84 min', '66 min'])]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5541,s5542,Movie,Louis C.K. 2017,Louis C.K.,Louis C.K.,United States,4-Apr-17,2017,74 min,,Movies,"Louis C.K. muses on religion, eternal love, gi..."
5794,s5795,Movie,Louis C.K.: Hilarious,Louis C.K.,Louis C.K.,United States,16-Sep-16,2010,84 min,,Movies,Emmy-winning comedy writer Louis C.K. brings h...
5813,s5814,Movie,Louis C.K.: Live at the Comedy Store,Louis C.K.,Louis C.K.,United States,15-Aug-16,2015,66 min,,Movies,The comic puts his trademark hilarious/thought...


In [7]:
# gather the indices of the rows with mislabeled rating values
mishifted_rating_indices = netflix[netflix['rating'].isin(['74 min', '84 min', '66 min'])].index
mishifted_rating_indices

Int64Index([5541, 5794, 5813], dtype='int64')

Here, we move these values from the rating column into the duration column and set the rating to missing (NaN), since the true rating is not available.

In [8]:
netflix.loc[netflix['rating'].isin(['74 min', '84 min', '66 min']), 'duration'] = netflix['rating']
netflix.loc[netflix['rating'].isin(['74 min', '84 min', '66 min']), 'rating'] = np.nan


In [9]:
# the indices collected above are index labels, so select with label-based .loc
netflix.loc[mishifted_rating_indices]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5541,s5542,Movie,Louis C.K. 2017,Louis C.K.,Louis C.K.,United States,4-Apr-17,2017,,74 min,Movies,"Louis C.K. muses on religion, eternal love, gi..."
5794,s5795,Movie,Louis C.K.: Hilarious,Louis C.K.,Louis C.K.,United States,16-Sep-16,2010,,84 min,Movies,Emmy-winning comedy writer Louis C.K. brings h...
5813,s5814,Movie,Louis C.K.: Live at the Comedy Store,Louis C.K.,Louis C.K.,United States,15-Aug-16,2015,,66 min,Movies,The comic puts his trademark hilarious/thought...
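
The three misplaced values above were matched by hard-coded strings. A pattern-based check is one possible generalisation. The sketch below is illustrative only: it assumes a misplaced duration always looks like digits followed by ` min`, and it runs on a small hypothetical frame (`sample`) rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: row 1 has its duration in 'rating'.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "rating": ["PG-13", "74 min", "TV-MA"],
    "duration": ["90 min", np.nan, "2 Seasons"],
})

# Flag ratings that are really durations (assumed pattern: "<digits> min").
misplaced = sample["rating"].str.fullmatch(r"\d+ min", na=False)

# Copy the value into 'duration', then blank out the bogus rating.
sample.loc[misplaced, "duration"] = sample.loc[misplaced, "rating"]
sample.loc[misplaced, "rating"] = np.nan

print(sample)
```

Because the mask is computed once and reused, the second assignment does not depend on the values already changed by the first.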


In [10]:
# check the percentages of null values for each column
null_percentages = round(netflix.isnull().sum() / netflix.shape[0] * 100, 2)
null_percentages.sort_values(ascending=False)

director        29.91
country          9.44
cast             9.37
date_added       0.11
rating           0.08
show_id          0.00
type             0.00
title            0.00
release_year     0.00
duration         0.00
listed_in        0.00
description      0.00
dtype: float64

The 'director' column has the most missing values: about 30% of its rows are null. The 'country' and 'cast' columns are each missing in roughly 9.4% of rows, while 'date_added' and 'rating' are missing in only 0.11% and 0.08% of rows, respectively.
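
The percentages in the table are the share of rows that are null within each column, which is a different quantity from each column's share of all missing cells. The sketch below contrasts the two on a small hypothetical frame (`sample` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 4 rows and 3 missing cells in total.
sample = pd.DataFrame({
    "director": ["x", np.nan, np.nan, "y"],
    "country": ["US", "IN", np.nan, "UK"],
    "title": ["a", "b", "c", "d"],
})

# Share of rows that are null in each column (what the table above reports).
pct_rows_missing = sample.isna().mean().mul(100).round(2)

# Share of all missing cells that fall in each column (a different measure).
null_counts = sample.isna().sum()
pct_of_all_nulls = (null_counts / null_counts.sum() * 100).round(2)

print(pct_rows_missing)
print(pct_of_all_nulls)
```

Here `director` is null in 50% of rows but accounts for about 67% of all missing cells, so the two figures should not be used interchangeably.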

Because the 'director', 'country', 'cast', and 'rating' columns hold strings, we replace their null values with the string "unknown". The 'date_added' column is handled separately below.

In [11]:
# fill null_values with "unknown"
columns_with_nulls = ['director','country','cast','rating']
netflix[columns_with_nulls] = netflix[columns_with_nulls].fillna('unknown')

In [12]:
# fill null values of the date_added column with the previous non-null value (forward fill)
netflix['date_added'] = netflix['date_added'].ffill()

In [13]:
# Convert the 'date_added' column from strings to datetime objects to facilitate time-based analysis
netflix['date_added'] = pd.to_datetime(netflix['date_added'])
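
Passing an explicit format can make the conversion stricter and more predictable. The sketch below assumes the `day-abbreviated month-two digit year` layout seen in the preview (for example `25-Sep-21`) and an English locale for month names; the series `raw` is hypothetical, and `errors="coerce"` is an optional choice that turns unparseable strings into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical values in the layout shown in the data preview, plus one bad entry.
raw = pd.Series(["25-Sep-21", "4-Apr-17", "not a date"])

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
parsed = pd.to_datetime(raw, format="%d-%b-%y", errors="coerce")

print(parsed)
```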

In [14]:
netflix

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,unknown,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",unknown,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,unknown,unknown,unknown,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,2019-11-20,2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,unknown,unknown,unknown,2019-07-01,2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,2019-11-01,2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2020-01-11,2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [15]:
# check if there are 0 nulls left
print(f"{netflix.isna().sum().sum()} null values left.")

0 null values left.


Now, the netflix dataset is cleaned.

## Exploratory Data Analysis

### Content Types

In [16]:
df = netflix.copy()

types = df['type'].value_counts().reset_index()
types

Unnamed: 0,index,type
0,Movie,6131
1,TV Show,2676


In [17]:
print(f'There are {types.iloc[0]["type"]} Movies and {types.iloc[1]["type"]} TV Shows in this Dataset.')

There are 6131 Movies and 2676 TV Shows in this Dataset.


In [18]:
fig = px.pie(types, values='type', names='index', title='Distribution of Netflix Content Types')
colors = ['#E50914', 'black']  # Red and black, resembling Netflix's color scheme
fig.update_traces(marker=dict(colors=colors))

# before using this to write image, you must install kaleido using "pip install kaleido"
fig.write_image("netflix_content_types.png")
# fig.show()

![Netflix Content Types](netflix_content_types.png)

Upon examining the distribution of content types on Netflix, it is evident that the platform predominantly offers movies, which constitute 70% of the content, while TV shows make up the remaining 30%.
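
The rounded 70/30 split can be checked directly from the counts printed earlier. This is a minimal sketch that re-enters the two counts by hand; on the full DataFrame, `df['type'].value_counts(normalize=True)` would give the same proportions.

```python
import pandas as pd

# Counts copied from the output above.
counts = pd.Series({"Movie": 6131, "TV Show": 2676})

# Convert counts to percentages of the whole catalog.
share = (counts / counts.sum() * 100).round(1)

print(share)
```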

### Top 20 Countries

In [19]:
# Extract country and type columns from df
country_type = df[['country', 'type']].copy()
country_type

Unnamed: 0,country,type
0,United States,Movie
1,South Africa,TV Show
2,unknown,TV Show
3,unknown,TV Show
4,India,TV Show
...,...,...
8802,United States,Movie
8803,unknown,TV Show
8804,United States,Movie
8805,United States,Movie


In [20]:
country_type['country'] = country_type['country'].str.split(", ")
country_type

Unnamed: 0,country,type
0,[United States],Movie
1,[South Africa],TV Show
2,[unknown],TV Show
3,[unknown],TV Show
4,[India],TV Show
...,...,...
8802,[United States],Movie
8803,[unknown],TV Show
8804,[United States],Movie
8805,[United States],Movie


In [21]:
# give each country in a multi-country entry its own row; explode() repeats the original
# index for every country, as index 7 shows in the output below
country_type = country_type.explode('country')
country_type.head(20)

Unnamed: 0,country,type
0,United States,Movie
1,South Africa,TV Show
2,unknown,TV Show
3,unknown,TV Show
4,India,TV Show
5,unknown,TV Show
6,unknown,Movie
7,United States,Movie
7,Ghana,Movie
7,Burkina Faso,Movie


In [22]:
# Get the number of types for each country
country_counts = country_type.pivot_table(index='country', columns='type', aggfunc='size', fill_value=0)
country_counts

type,Movie,TV Show
country,Unnamed: 1_level_1,Unnamed: 2_level_1
,1,1
Afghanistan,1,0
Albania,1,0
Algeria,3,0
Angola,1,0
...,...,...
Venezuela,4,0
Vietnam,7,0
West Germany,3,2
Zimbabwe,3,0
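
The blank country name in the first row of the pivot output likely comes from entries with a stray leading or trailing comma, which leave an empty string after splitting. One possible cleanup, sketched on a hypothetical frame (`sample`), is to split on the bare comma, strip whitespace, and drop empty names before counting.

```python
import pandas as pd

# Hypothetical rows; the second has a stray leading comma.
sample = pd.DataFrame({
    "country": ["United States, Ghana", ", France", "India"],
    "type": ["Movie", "Movie", "TV Show"],
})

# Split on the comma alone, then give each country its own row.
exploded = sample.assign(country=sample["country"].str.split(",")).explode("country")

# Trim surrounding spaces and discard empty names left by stray commas.
exploded["country"] = exploded["country"].str.strip()
exploded = exploded[exploded["country"] != ""]

# Count titles per country and type.
counts = pd.crosstab(exploded["country"], exploded["type"])
print(counts)
```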


In [23]:
# Gather top 20 countries on Netflix excluding unknown

cc_index = country_counts[country_counts.index != 'unknown'].sum(axis=1).sort_values(ascending=False)[:20].index
cc_index

Index(['United States', 'India', 'United Kingdom', 'Canada', 'France', 'Japan',
       'Spain', 'South Korea', 'Germany', 'Mexico', 'China', 'Australia',
       'Egypt', 'Turkey', 'Hong Kong', 'Nigeria', 'Italy', 'Brazil',
       'Argentina', 'Indonesia'],
      dtype='object', name='country')

In [24]:
top_country_counts = country_counts.loc[cc_index]
top_country_counts

type,Movie,TV Show
country,Unnamed: 1_level_1,Unnamed: 2_level_1
United States,2751,938
India,962,84
United Kingdom,532,272
Canada,319,126
France,303,90
Japan,119,199
Spain,171,61
South Korea,61,170
Germany,182,44
Mexico,111,58


In [25]:
top_country_counts = top_country_counts.reset_index()
top_country_counts

type,country,Movie,TV Show
0,United States,2751,938
1,India,962,84
2,United Kingdom,532,272
3,Canada,319,126
4,France,303,90
5,Japan,119,199
6,Spain,171,61
7,South Korea,61,170
8,Germany,182,44
9,Mexico,111,58


In [26]:
# Create a stacked bar chart
fig = px.bar(top_country_counts, 
             x='country', 
             y=['Movie', 'TV Show'], 
             title="Top 20 Countries with Most Titles on Netflix", 
             labels={"country": "Countries", "value": "Counts", "variable": "Type"},
             barmode="stack",
             color_discrete_map={'Movie': '#E50914', 'TV Show': 'black'} # Netflix Red and Black colors
            )

# Update layout to match Netflix theme
fig.update_layout(
    plot_bgcolor='lightgray',  # Set plot background color to light gray
    paper_bgcolor='white',  # Set paper color to white
    font=dict(color='black'),  # Set font color to black for better contrast
    xaxis_tickangle=-45  # Rotate x-axis labels diagonally
)

fig.write_image("top_20_countries_netflix.png", width=800, height=600, scale=2)

# fig.show()


![Top 20 Countries on Netflix](top_20_countries_netflix.png)


Among the top 20 countries by title count, the United States, India, and the United Kingdom are the leading contributors; the United States alone is credited on 3,689 titles (2,751 Movies and 938 TV Shows). Most countries contribute more Movies than TV Shows, although Japan and South Korea show the opposite pattern. Because a co-production is counted once for each country it lists, these figures measure production credits rather than unique titles. The dataset describes the catalog rather than viewership, so it shows that Netflix draws content from many regions but does not by itself tell us how audiences in those regions engage with it.

### Top 20 Genres

In [27]:
# Split the comma-separated genre tags and give each tag its own row with explode()
top_genres = df['listed_in'].str.split(', ').explode()

# Count the titles per genre and keep the 20 most frequent
top_genres = top_genres.value_counts().head(20).reset_index()

# After resetting index, create col names for the 2 cols
top_genres.columns = ['Genre', 'Count']

top_genres

Unnamed: 0,Genre,Count
0,International Movies,2752
1,Dramas,2427
2,Comedies,1674
3,International TV Shows,1351
4,Documentaries,869
5,Action & Adventure,859
6,TV Dramas,763
7,Independent Movies,756
8,Children & Family Movies,641
9,Romantic Movies,616


In [28]:
fig = px.bar(data_frame = top_genres, x='Genre', y='Count')

fig.update_layout(
    title="Top 20 Genres on Netflix",
    xaxis_title="Genre",
    yaxis_title="Counts",
    width=700,
    height=430,
    xaxis_tickangle=-45,
    plot_bgcolor='black',  # Set background color to black
    paper_bgcolor='black',  # Set paper color to black
    font=dict(color='white')  # Set font color to white
)

fig.update_traces(marker_color='#E50914')
fig.write_image("top_20_genres_on_netflix.png", width=800, height=600, scale=2)

# fig.show()

![Top 20 Genres on Netflix](top_20_genres_on_netflix.png)


In our analysis of the top 20 genres on Netflix, International Movies, Dramas, and Comedies are the three most frequent genre tags. Because a single title can carry several tags (a film may be both an International Movie and a Drama), these counts overlap and describe how the catalog is labeled rather than what viewers watch most. The prominence of International Movies and International TV Shows is consistent with a catalog that draws heavily on content from around the world, while the high counts for Dramas and Comedies show that these two genres make up a large part of the library.



### Content Trend

We next examine how many titles were added to Netflix each year, based on the `date_added` column, to see how the volume of additions has changed over time.

In [29]:

# Extract year portion of the date_added column as its own column called 'year_added'
netflix['year_added'] = netflix['date_added'].dt.year

# Select 'year_added' and 'type' columns only
contents = netflix[['year_added', 'type']]

# Pivot the table to get counts of 'Movie' and 'TV Show' per year
contents_per_year = contents.pivot_table(index='year_added', columns='type', aggfunc='size', fill_value=0).reset_index()

# Create a line chart with Netflix theme
fig = px.line(contents_per_year, x='year_added', y=['Movie', 'TV Show'], 
              markers=True, color_discrete_map={'Movie': '#E50914', 'TV Show': '#221F1F'})

# Update layout to match Netflix theme
fig.update_layout(
    title='Content Trend over Years',
    xaxis_title='Year',
    yaxis_title='Count',
    legend_title='Type',
    width=700,
    height=380,
    plot_bgcolor='lightgray',  # Set plot background color to light gray
    paper_bgcolor='white',  # Set paper color to white
    font=dict(color='black'),  # Set font color to black
)

fig.write_image("content_trend_over_years.png", width=800, height=600, scale=2)

# fig.show()


![Content Trend over Years](content_trend_over_years.png)


Looking at the number of titles added each year, we see steady growth, with a sharp rise between 2015 and 2019 and a peak in 2019. The rise appears for both Movies and TV Shows. Note that these counts reflect when titles were added to Netflix (`date_added`), not when they were produced.
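
The peak year can also be read off the yearly counts programmatically instead of from the chart. The sketch below uses a tiny hypothetical frame (`toy`) in place of the real `year_added` and `type` columns.

```python
import pandas as pd

# Hypothetical additions: one row per title.
toy = pd.DataFrame({
    "year_added": [2018, 2018, 2019, 2019, 2019, 2020],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie"],
})

# Rows are years, columns are content types, cells are title counts.
per_year = pd.crosstab(toy["year_added"], toy["type"])

# Year with the most additions for each type, and across both types.
peak_by_type = per_year.idxmax()
peak_overall = per_year.sum(axis=1).idxmax()

print(peak_by_type)
print(peak_overall)
```

When two years tie, `idxmax` returns the first one, so ties are worth checking by hand.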



### Ratings

In [30]:
df = netflix.copy()

# Count the number of titles for each rating
rating = df.groupby(['rating']).size().reset_index(name = "counts")
rating = rating.sort_values(by = 'counts', ascending = False)

# Bar Chart 
fig = px.bar(rating, x = 'rating', y= 'counts')

#Layout
fig.update_layout(
    title = 'Rating Distribution',
    xaxis_title = 'Ratings',
    yaxis_title = 'Counts',
    width=700, 
    height=400,
    xaxis_tickangle=-45,
    plot_bgcolor='black',  # Set background color to black
    paper_bgcolor='black',  # Set paper color to black
    font=dict(color='white')  # Set font color to white
)

fig.update_traces(marker_color='#E50914')

fig.write_image("rating_distribution.png", width=800, height=600, scale=2)

# fig.show()


![Rating Distribution](rating_distribution.png)

TV-MA, TV-14, and TV-PG are the most common ratings on Netflix. The distribution spans a wide range of audiences, from family-friendly options to mature content.


### Average Duration of Movies

In [31]:
# Content Duration:
# Movie
movies = df[['duration','listed_in','type']]

movies = movies[movies['type'] == 'Movie'].copy()

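# keep the number of minutes only (raw string avoids an invalid-escape warning for \d;
# expand=False returns a Series rather than a one-column DataFrame)
movies['duration'] = movies['duration'].str.extract(r'(\d+)', expand=False).astype(float)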


# Split listed_in by ", ".
movies['listed_in'] = movies['listed_in'].str.split(', ')

# Explode listed_in
movies = movies.explode('listed_in')


# Calculate average duration of each genre.
avg_movie_duration = round(movies.groupby('listed_in')['duration'].mean(),2).sort_values(ascending = False)

# Convert it to DataFrame
avg_movie_duration = avg_movie_duration.reset_index()

# Rename columns
avg_movie_duration.columns = ['genre','mean duration']

# Plotting in bar chart.
fig = px.bar(avg_movie_duration, x = 'genre', y = 'mean duration')

#layout
fig.update_layout(
    title = 'Average Movie Duration By Genre',
    xaxis_title = 'Genre',
    yaxis_title = 'Minutes',
    width = 800,
    height = 400,
    xaxis_tickangle=-45,
    plot_bgcolor='black',  # Set background color to black
    paper_bgcolor='black',  # Set paper color to black
    font=dict(color='white')  # Set font color to white    
)

fig.update_traces(marker_color='#E50914')

fig.write_image("avg_movie_duration_by_genre.png", width=800, height=600, scale=2)

# fig.show()



![Average Movie Duration by Genre](avg_movie_duration_by_genre.png)
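
To make the per-genre averaging concrete, here is a small worked sketch on hypothetical data (`toy`). It follows the same steps as the cell above: extract the minutes, split and explode the genre tags, then average. A title with several genres contributes its duration to each of them.

```python
import pandas as pd

# Hypothetical movies; the first one belongs to two genres.
toy = pd.DataFrame({
    "duration": ["90 min", "120 min", "100 min"],
    "listed_in": ["Dramas, Comedies", "Dramas", "Comedies"],
})

# Pull the leading number out of strings such as "90 min".
toy["minutes"] = toy["duration"].str.extract(r"(\d+)", expand=False).astype(float)

# One row per (title, genre) pair, then the mean duration per genre.
per_genre = (
    toy.assign(genre=toy["listed_in"].str.split(", "))
       .explode("genre")
       .groupby("genre")["minutes"]
       .mean()
)

print(per_genre)
```

Dramas average (90 + 120) / 2 = 105 minutes and Comedies (90 + 100) / 2 = 95 minutes, with the 90-minute title counted in both.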


### Average Seasons in TV Shows

In [32]:
# TV Show average number of seasons

TV = df[['duration','listed_in','type']]

TV = TV[TV['type'] == 'TV Show'].copy()

# keep the number of seasons only (raw string avoids an invalid-escape warning for \d)

TV['seasons'] = TV['duration'].str.extract(r'(\d+)', expand=False).astype(float)

# Split listed_in column
TV['listed_in'] = TV['listed_in'].str.split(', ')

# Explode listed_in column to ensure each row has one genre.
TV = TV.explode('listed_in')

# Calculate the mean number of seasons for each genre.
avg_TV_duration = round(TV.groupby('listed_in')['seasons'].mean(),2).sort_values(ascending = False)

# Plotting in bar chart.
fig = px.bar(avg_TV_duration)

#layout
fig.update_layout(
    title = 'Average Number of TV seasons by genres',
    xaxis_title = 'genres',
    yaxis_title = 'seasons',
    width = 800,
    height = 600,
    xaxis_tickangle=-45,
    plot_bgcolor='black',  # Set background color to black
    paper_bgcolor='black',  # Set paper color to black
    font=dict(color='white')  # Set font color to white    
)

fig.update_traces(marker_color='#E50914')

fig.write_image("avg_num_of_tv_seasons_by_genres.png", width=800, height=600, scale=2)

# fig.show()

![Average Number of TV Seasons by Genre](avg_num_of_tv_seasons_by_genres.png)

### Top Genres by Country

In [33]:
# Extract country and listed_in columns
genres = df[['country','listed_in']].copy()

# Standardize the country column: one country per row
genres['country'] = genres['country'].str.split(", ")
genres = genres.explode('country')

# Standardize the listed_in column: one genre per row
genres['listed_in'] = genres['listed_in'].str.split(", ")
genres = genres.explode('listed_in')

# Top 5 country names (excluding unknown countries)
top_5_countries = top_country_counts[top_country_counts['country'] != 'unknown' ].head(5)['country']
top_5_countries

0     United States
1             India
2    United Kingdom
3            Canada
4            France
Name: country, dtype: object

In [34]:
def top_genres_for_country(country, n=10):
    """Return the n most frequent genres for one country as a genre/count table."""
    counts = genres[genres['country'] == country].groupby('listed_in').size()
    top = counts.sort_values(ascending = False).head(n).reset_index()
    top.columns = ['genre','count']
    return top

# Top 10 genres for each of the top 5 countries
US_genre = top_genres_for_country('United States')
India_genre = top_genres_for_country('India')
UK_genre = top_genres_for_country('United Kingdom')
Canada_genre = top_genres_for_country('Canada')
France_genre = top_genres_for_country('France')

In [35]:
fig = px.pie(US_genre, values = 'count', names = 'genre')

fig.update_layout(
    title = "US's Top 10 Genres",
    width = 600,
    height = 500,
    plot_bgcolor='black',
    paper_bgcolor='black',
    font=dict(color='white')
)

fig.write_image("us_top10.png", width=800, height=600, scale=2)

# fig.show()

![US's Top 10 Genres](us_top10.png)

In [36]:
fig = px.pie(India_genre, values = 'count', names = 'genre')

fig.update_layout(
    title = "India's Top 10 Genres",
    width = 600,
    height = 500,
    plot_bgcolor='black',
    paper_bgcolor='black',
    font=dict(color='white')
)

fig.write_image("india_top10.png", width=800, height=600, scale=2)

# fig.show()

![India's Top 10 Genres](india_top10.png)

In [37]:
fig = px.pie(UK_genre, values = 'count', names = 'genre')

fig.update_layout(
    title = "UK's Top 10 Genres",
    width = 600,
    height = 500,
    plot_bgcolor='black',
    paper_bgcolor='black',
    font=dict(color='white')    
)

fig.write_image("uk_top10.png", width=800, height=600, scale=2)

# fig.show()

![UK's Top 10 Genres](uk_top10.png)

In [38]:
fig = px.pie(Canada_genre, values = 'count', names = 'genre')

fig.update_layout(
    title = "Canada's Top 10 Genres",
    width = 600,
    height = 500,
    plot_bgcolor='black',
    paper_bgcolor='black',
    font=dict(color='white')
)

fig.write_image("canada_top10.png", width=800, height=600, scale=2)

# fig.show()

![Canada's Top 10 Genres](canada_top10.png)

In [39]:
fig = px.pie(France_genre, values = 'count', names = 'genre')

fig.update_layout(
    title = "France's Top 10 Genres",
    width = 600,
    height = 500,
    plot_bgcolor='black',
    paper_bgcolor='black',
    font=dict(color='white')
)

fig.write_image("france_top10.png", width=800, height=600, scale=2)

# fig.show()

![France's Top 10 Genres](france_top10.png)
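
The five per-country tables above follow the same recipe, so they could also be produced in one pass. The sketch below is one possible approach, not the notebook's method: `top_genres_by_country` is a hypothetical helper, and `toy` stands in for a frame that has already been exploded to one country and one genre per row.

```python
import pandas as pd

# Hypothetical stand-in for the exploded country/genre frame.
toy = pd.DataFrame({
    "country": ["United States", "United States", "United States", "India", "India"],
    "listed_in": ["Dramas", "Comedies", "Dramas", "Dramas", "International Movies"],
})

def top_genres_by_country(frame, countries, n=10):
    """Return the n most frequent genres for each requested country."""
    subset = frame[frame["country"].isin(countries)]
    # Count titles for every (country, genre) pair.
    counts = (
        subset.groupby(["country", "listed_in"])
              .size()
              .rename("count")
              .reset_index()
    )
    # Order genres by count within each country, then keep the first n per country.
    counts = counts.sort_values(["country", "count"], ascending=[True, False])
    return counts.groupby("country").head(n).reset_index(drop=True)

result = top_genres_by_country(toy, ["United States", "India"], n=1)
print(result)
```

The resulting long table has one row per country and genre, which could then be filtered per country for plotting.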

## Conclusion

In our exploration of Netflix content in the United States, we've uncovered valuable insights into the platform's content landscape and viewer preferences. Notably, we've observed:

- **Content Diversity:** Netflix offers a rich variety of content, with a notable emphasis on Movies alongside TV Shows.
- **Global Appeal:** The United States, India, and the United Kingdom emerge as key contributors to Netflix's content library, highlighting the platform's global reach and resonance.
- **Genre Preferences:** International Movies, Dramas, and Comedies stand out as the leading genres on Netflix, reflecting the diverse interests of viewers.
- **Trends Over Time:** We've observed a steady increase in the number of titles added each year, with a notable surge between 2015 and 2019 and a peak in 2019 across both Movies and TV Shows.
- **TV Ratings Diversity:** Netflix offers content spanning various audience preferences, with TV-MA, TV-14, and TV-PG ratings being the most common, catering to a wide range of viewer demographics.
- **Duration and Seasons:** While movie durations remain fairly consistent across genres, Classic & Cult TV shows exhibit high average season counts, which is consistent with that genre containing long-running series.
- **Geographical Genre Preferences:** By analyzing genre preferences across the top 5 countries, we've gained insights into how viewer tastes vary regionally.

These findings provide valuable insights into Netflix's content strategy and its ability to cater to diverse audience preferences. Content creators, marketers, and decision-makers in the entertainment industry can leverage these insights to better understand viewer demands and tailor content offerings accordingly.
