## Netflix Content Strategy Analysis
 
Content strategy analysis examines how content is created, released, distributed, and consumed in order to reach specific goals, such as maximizing audience engagement, viewership, brand reach, or revenue.

In [1]:
```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"
```



In [2]:
```python
# Read data
df = pd.read_csv("../../data/netflix_content_2023.csv")
df.head()
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie |


In [3]:
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24812 entries, 0 to 24811
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Title                24812 non-null  object
 1   Available Globally?  24812 non-null  object
 2   Release Date         8166 non-null   object
 3   Hours Viewed         24812 non-null  object
 4   Language Indicator   24812 non-null  object
 5   Content Type         24812 non-null  object
dtypes: object(6)
memory usage: 1.1+ MB
```


In [4]:
```python
# parse the release date and keep only the month number (missing dates become NaN)
df['Release Date'] = pd.to_datetime(df['Release Date']).dt.month
```

In [5]:
```python
df.head()
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 3.0 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 1.0 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 12.0 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 11.0 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 5.0 | 503000000 | English | Movie |
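Because `info()` reports only 8,166 non-null release dates out of 24,812 rows, about 16,646 titles end up with a `NaN` month after this conversion, and `.dt.month` also discards the year. A minimal sketch on a few hand-copied values shows both effects; the `raw_dates` series is illustrative and not the dataset itself:

```python
import pandas as pd

# Illustrative sample: three dates from the head() output above, plus a
# missing value standing in for the rows where 'Release Date' is empty
raw_dates = pd.Series(["2023-03-23", "2022-12-30", "2022-11-23", None])

parsed = pd.to_datetime(raw_dates)  # missing entries become NaT
months = parsed.dt.month            # NaT becomes NaN, so the dtype is float64
years = parsed.dt.year              # the year that .dt.month throws away

print(months.tolist())              # [3.0, 12.0, 11.0, nan]
print(years.tolist())               # [2023.0, 2022.0, 2022.0, nan]
print(months.isna().sum())          # 1
```

Since `groupby` drops `NaN` keys by default, any aggregation by release month covers only the dated titles.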


In [6]:
```python
# clean the 'Hours Viewed' column: strip thousands separators, then cast to float
df['Hours Viewed'] = df['Hours Viewed'].replace(',', '', regex=True).astype('float')
```

In [7]:
```python
df.head()
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 3.0 | 812100000.0 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 1.0 | 665100000.0 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 12.0 | 622800000.0 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 11.0 | 507700000.0 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 5.0 | 503000000.0 | English | Movie |


The “Hours Viewed” column has been successfully cleaned and converted to a numeric format.
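`astype('float')` raises an error on the first value it cannot parse. A slightly more defensive variant, sketched below, uses `pd.to_numeric(..., errors='coerce')` so that unparseable entries become `NaN` and can be inspected before aggregating. The sample strings are made up for illustration and assume the raw column uses commas as thousands separators, as the cleaning step implies:

```python
import pandas as pd

# Illustrative raw strings (not from the dataset), including one malformed entry
raw_hours = pd.Series(["812,100,000", "665,100,000", "n/a"])

# Remove separators literally (regex=False), then let to_numeric flag bad values
hours = pd.to_numeric(raw_hours.str.replace(",", "", regex=False), errors="coerce")

# Entries that failed to parse are NaN; look at them before summing
bad_rows = raw_hours[hours.isna()]
print(hours.tolist())     # [812100000.0, 665100000.0, nan]
print(bad_rows.tolist())  # ['n/a']
```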

In [13]:
```python
# aggregate viewership hours by content type
viewership_content = df.groupby('Content Type')['Hours Viewed'].sum()

viewership_content.head()
```

```
Content Type
Movie    5.063780e+10
Show     1.077641e+11
Name: Hours Viewed, dtype: float64
```

Let's visualize this.

In [12]:
```python
fig = go.Figure(data=[
    go.Bar(
        x=viewership_content.index,
        y=viewership_content.values,
        marker_color=['skyblue', 'salmon']
    )
])

fig.update_layout(
    title='Total Viewership Hours by Content Type',
    xaxis_title='Content Type',
    yaxis_title='Hours',
    xaxis_tickangle=0,
    height=500,
    width=800
)
fig.show()
```


Shows account for about 107.8 billion hours viewed on Netflix in 2023, versus about 50.6 billion for movies, roughly twice as much. This suggests that Netflix's content strategy leans heavily toward shows, which attract more watch hours overall.
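To put "roughly twice as much" in proportional terms, the two totals can be turned into percentage shares. A minimal sketch that re-enters the printed totals by hand (the variable names mirror the notebook but the values are copied, not recomputed):

```python
import pandas as pd

# Totals copied from the content-type groupby output above (hours viewed)
viewership_content = pd.Series({"Movie": 5.063780e10, "Show": 1.077641e11})

# Convert absolute totals into percentage shares of all hours viewed
share_pct = viewership_content / viewership_content.sum() * 100
print(share_pct.round(1))  # Movie: 32.0, Show: 68.0 (percent)
```

On the full DataFrame the same division can be applied directly to `viewership_content`.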

In [14]:
```python
# aggregate viewership hours by language
view_lang = df.groupby('Language Indicator')['Hours Viewed'].sum()
view_lang
```

```
Language Indicator
English        1.244417e+11
Hindi          9.261000e+08
Japanese       7.102000e+09
Korean         1.537840e+10
Non-English    1.043910e+10
Russian        1.146000e+08
Name: Hours Viewed, dtype: float64
```

Let's visualize this to make the comparison clearer.


In [15]:
```python
fig = go.Figure(data=[
    go.Bar(
        x=view_lang.index,
        y=view_lang.values,
        marker_color='lightcoral'
    )
])

fig.update_layout(
    title="Total Viewership Hours by Language",
    xaxis_title="Language",
    yaxis_title='Total hours',
    xaxis_tickangle=45,
    height=600,
    width=1000
)
fig.show()
```


English-language content dominates Netflix's viewership with about 124.4 billion hours, far ahead of Korean (about 15.4 billion), the catch-all "Non-English" label (about 10.4 billion) and Japanese (about 7.1 billion); Hindi and Russian each stay below 1 billion hours. Netflix's primary audience consumes English content, although non-English shows and movies still hold a considerable share of viewership, which reflects a diverse content strategy.
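Because the dataset mixes specific languages with a catch-all "Non-English" label, one way to weigh English against everything else is to collapse all other labels into a single bucket before computing shares. A minimal sketch using the printed totals; the bucket name `"Non-English (all)"` is a hypothetical label introduced here:

```python
import numpy as np
import pandas as pd

# Totals copied from the language groupby output above (hours viewed)
view_lang = pd.Series({
    "English": 1.244417e11, "Hindi": 9.261000e08, "Japanese": 7.102000e09,
    "Korean": 1.537840e10, "Non-English": 1.043910e10, "Russian": 1.146000e08,
})

# Map every label other than "English" to one hypothetical combined bucket
bucket = np.where(view_lang.index == "English", "English", "Non-English (all)")
by_bucket = view_lang.groupby(bucket).sum()

# Percentage share of total hours, largest first
lang_share_pct = (by_bucket / by_bucket.sum() * 100).sort_values(ascending=False)
print(lang_share_pct.round(1))  # English: 78.6, Non-English (all): 21.4
```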

In [18]:
```python
# aggregate viewership hours by release month
view_month = df.groupby('Release Date')['Hours Viewed'].sum()
view_month
```

```
Release Date
1.0     7.271600e+09
2.0     7.103700e+09
3.0     7.437100e+09
4.0     6.865700e+09
5.0     7.094600e+09
6.0     8.522000e+09
7.0     6.524800e+09
8.0     6.817800e+09
9.0     7.262200e+09
10.0    8.123200e+09
11.0    7.749500e+09
12.0    1.005580e+10
Name: Hours Viewed, dtype: float64
```

Let's visualize.

In [20]:
```python
fig = go.Figure(data=[
    go.Scatter(
        x=view_month.index,
        y=view_month.values,
        mode='lines+markers',
        marker=dict(color='blue'),
        line=dict(color='blue')
    )
])

fig.update_layout(
    title='Total Viewership Hours by Release Month (2023)',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height=600,
    width=1000
)
fig.show()
```


The graph shows total viewership hours by release month. Titles released in December have the highest total (about 10.1 billion hours), followed by June (about 8.5 billion) and October (about 8.1 billion), while July is the lowest (about 6.5 billion); the remaining months sit in a narrower band of roughly 6.5 to 7.7 billion hours. Because the x-axis is the month a title was released rather than the month it was watched, this suggests that December and June releases drew the most engagement, possibly due to strategic content releases, seasonal trends, or holidays.
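Since `.dt.month` drops the year, the December point also includes titles released in December 2022, such as The Glory (2022-12-30). A minimal sketch that keeps year and month together with `dt.to_period("M")`, shown on the five rows from the first `df.head()` output; on the real data it would need the original date strings, before they are replaced by month numbers:

```python
import pandas as pd

# Release dates and hours from the first df.head() output
sample = pd.DataFrame({
    "Release Date": ["2023-03-23", "2023-01-05", "2022-12-30", "2022-11-23", "2023-05-04"],
    "Hours Viewed": [812100000.0, 665100000.0, 622800000.0, 507700000.0, 503000000.0],
})

# A monthly Period keeps the year, so Dec 2022 and Dec 2023 stay separate
sample["Release Month"] = pd.to_datetime(sample["Release Date"]).dt.to_period("M")
by_year_month = sample.groupby("Release Month")["Hours Viewed"].sum().sort_index()
print(by_year_month)
```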

In [21]:
```python
# extract the top 5 titles based on viewership hours
top_five = df.nlargest(5, "Hours Viewed")

top_five[['Title', 'Hours Viewed', 'Language Indicator', 'Content Type', 'Release Date']]
```

| | Title | Hours Viewed | Language Indicator | Content Type | Release Date |
|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | 812100000.0 | English | Show | 3.0 |
| 1 | Ginny & Georgia: Season 2 | 665100000.0 | English | Show | 1.0 |
| 18227 | King the Land: Limited Series // 킹더랜드: 리미티드 시리즈 | 630200000.0 | Korean | Movie | 6.0 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 622800000.0 | Korean | Show | 12.0 |
| 18214 | ONE PIECE: Season 1 | 541900000.0 | English | Show | 8.0 |


#### The top 5 most-viewed titles on Netflix in 2023 are:
   - The Night Agent: Season 1 (English, Show) with 812.1 million hours viewed.
   - Ginny & Georgia: Season 2 (English, Show) with 665.1 million hours viewed.
   - King the Land: Limited Series (Korean, labeled as a Movie in the dataset) with 630.2 million hours viewed.
   - The Glory: Season 1 (Korean, Show) with 622.8 million hours viewed.
   - ONE PIECE: Season 1 (English, Show) with 541.9 million hours viewed.
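A single ranking mixes shows and movies, so the top five above contains only one entry with the Movie label. Ranking within each content type can be more informative; a minimal sketch using `sort_values` followed by `groupby(...).head(n)`, shown on the five rows above (on the full data, `df` would replace the `top` sample and `n` is an arbitrary illustrative choice):

```python
import pandas as pd

# The five rows from the top-five output above
top = pd.DataFrame({
    "Title": ["The Night Agent: Season 1", "Ginny & Georgia: Season 2",
              "King the Land: Limited Series", "The Glory: Season 1",
              "ONE PIECE: Season 1"],
    "Hours Viewed": [812100000.0, 665100000.0, 630200000.0, 622800000.0, 541900000.0],
    "Content Type": ["Show", "Show", "Movie", "Show", "Show"],
})

# Sort once by hours, then keep the first n rows within each content type
n = 2
top_per_type = (
    top.sort_values("Hours Viewed", ascending=False)
       .groupby("Content Type")
       .head(n)
)
print(top_per_type[["Content Type", "Title", "Hours Viewed"]])
```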


In [24]:
```python
# aggregate viewership hours by content type and release month
viewership_by_type = df.pivot_table(
    index='Release Date',
    columns='Content Type',
    values='Hours Viewed',
    aggfunc='sum'
)

viewership_by_type.head()
```

| Release Date | Movie | Show |
|---|---|---|
| 1.0 | 2275900000.0 | 4995700000.0 |
| 2.0 | 1654400000.0 | 5449300000.0 |
| 3.0 | 2109400000.0 | 5327700000.0 |
| 4.0 | 2757600000.0 | 4108100000.0 |
| 5.0 | 2520500000.0 | 4574100000.0 |


In [25]:
```python
fig = go.Figure()

# one line per content type (columns of the pivot table)
for content_type in viewership_by_type.columns:
    fig.add_trace(
        go.Scatter(
            x=viewership_by_type.index,
            y=viewership_by_type[content_type],
            mode='lines+markers',
            name=content_type
        )
    )

fig.update_layout(
    title='Viewership Trends by Content Type and Release Month (2023)',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height=600,
    width=1000,
    legend_title='Content Type'
)
fig.show()
```


The graph compares movie and show viewership by release month across 2023. Shows have higher viewership than movies in every month and peak in December, while movie viewership fluctuates more, with notable increases in June and October. This indicates that Netflix's audience engages more with shows throughout the year, while movie viewership sees occasional spikes, possibly linked to specific releases or events.
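Absolute lines move with overall volume, so they do not show directly whether the shows' lead is stable. Dividing each month by its row total gives the within-month mix instead. A minimal sketch on the first five months of the pivot table above:

```python
import pandas as pd

# First five months of the pivot table above (hours viewed)
viewership_by_type = pd.DataFrame(
    {"Movie": [2275900000.0, 1654400000.0, 2109400000.0, 2757600000.0, 2520500000.0],
     "Show": [4995700000.0, 5449300000.0, 5327700000.0, 4108100000.0, 4574100000.0]},
    index=pd.Index([1.0, 2.0, 3.0, 4.0, 5.0], name="Release Date"),
)

# Divide each row by its total so Movie + Show = 1 within every month
mix = viewership_by_type.div(viewership_by_type.sum(axis=1), axis=0)
print(mix.round(3))
```

On these five months, shows hold between roughly 60% (April) and 77% (February) of the hours viewed.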

In [26]:
```python
# define seasons based on release months; a missing month (NaN, for titles
# without a release date) returns None so it is not counted as 'Fall'
def get_season(month):
    if pd.isna(month):
        return None
    elif month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

# apply the season categorization to the dataset
df['Release Season'] = df['Release Date'].apply(get_season)
df.head()
```

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type | Release Season |
|---|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 3.0 | 812100000.0 | English | Show | Spring |
| 1 | Ginny & Georgia: Season 2 | Yes | 1.0 | 665100000.0 | English | Show | Winter |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 12.0 | 622800000.0 | Korean | Show | Winter |
| 3 | Wednesday: Season 1 | Yes | 11.0 | 507700000.0 | English | Show | Fall |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 5.0 | 503000000.0 | English | Movie | Spring |

In [30]:
```python
# aggregate viewership hours by release season (groupby drops rows with no season)
viewship_season = df.groupby('Release Season')['Hours Viewed'].sum()
viewship_season.head()
```

```
Release Season
Fall      2.313490e+10
Spring    2.139740e+10
Summer    2.186460e+10
Winter    2.443110e+10
Name: Hours Viewed, dtype: float64
```

In [31]:
```python
# order the seasons as 'Winter', 'Spring', 'Summer', 'Fall'
seasons_order = ['Winter', 'Spring', 'Summer', 'Fall']
viewship_season = viewship_season.reindex(seasons_order)
```

In [33]:
```python
fig = go.Figure(data=[
    go.Bar(
        x=viewship_season.index,
        y=viewship_season.values,
        marker_color='orange'
    )
])

fig.update_layout(
    title='Total Viewership Hours by Release Season (2023)',
    xaxis_title='Season',
    yaxis_title='Total Hours Viewed',
    xaxis_tickangle=0,
    height=500,
    width=800,
    xaxis=dict(
        categoryorder='array',
        categoryarray=seasons_order
    )
)
fig.show()
```

The graph shows that, once titles without a release date are excluded, viewership is spread fairly evenly across release seasons: Winter leads with about 24.4 billion hours (driven by December releases), followed by Fall (about 23.1 billion), Summer (about 21.9 billion) and Spring (about 21.4 billion). The missing dates matter here: without the `pd.isna` check, `get_season` sends the roughly 16,600 undated titles (about 67.6 billion hours) to its `else` branch, which inflates Fall to about 90.7 billion hours and makes it look like the peak season.
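An alternative to the `if`/`elif` chain is an explicit month-to-season dictionary passed to `Series.map`, which returns `NaN` for any value not in the dictionary, so undated titles cannot slip into a season by default. A minimal sketch using the monthly totals printed earlier plus one undated row whose hours are derived as the overall total minus the twelve dated months; `MONTH_TO_SEASON` and `monthly` are illustrative names:

```python
import numpy as np
import pandas as pd

# Explicit month -> season lookup; Series.map turns anything not in the dict
# (including NaN for titles without a release date) into NaN
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Monthly totals from the release-month output, plus one undated row holding
# the remaining hours (overall total minus the twelve dated months)
monthly = pd.DataFrame({
    "Release Date": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, np.nan],
    "Hours Viewed": [7.2716e9, 7.1037e9, 7.4371e9, 6.8657e9, 7.0946e9, 8.5220e9,
                     6.5248e9, 6.8178e9, 7.2622e9, 8.1232e9, 7.7495e9, 10.0558e9,
                     6.75739e10],
})
monthly["Release Season"] = monthly["Release Date"].map(MONTH_TO_SEASON)

# groupby drops NaN keys by default, so the undated hours are left out
season_totals = (
    monthly.groupby("Release Season")["Hours Viewed"].sum()
    .reindex(["Winter", "Spring", "Summer", "Fall"])
)
print(season_totals / 1e9)  # roughly 24.4, 21.4, 21.9, 23.1 (billions of hours)
```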

## Conclusion

Netflix's content strategy revolves around maximizing viewership through targeted release timing and content variety. Shows consistently outperform movies in viewership, and titles released in December and June draw the most hours, which may reflect strategic releases around these periods. With titles that lack a release date left out, viewership is spread fairly evenly across release seasons, with Winter slightly ahead thanks to December releases. Most content is released on Fridays, which aims to capture viewers right before the weekend, and viewership aligns strongly with this release pattern. While the number of releases is steady throughout the year, viewership varies, which suggests a focus on high-impact titles and optimal release timing over sheer volume.
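Checking the weekday and release-volume statements requires the full release dates, which are replaced by month numbers when `Release Date` is converted with `.dt.month`. A minimal sketch of how one might count releases per weekday and per calendar month from the raw date strings, shown on the five dates from the first `df.head()` output:

```python
import pandas as pd

# Raw release-date strings from the first df.head() output; on the real data
# these would be read before the column is overwritten with month numbers
dates = pd.to_datetime(pd.Series(
    ["2023-03-23", "2023-01-05", "2022-12-30", "2022-11-23", "2023-05-04"]
))

# Releases per weekday (missing dates would be NaT and are skipped by value_counts)
weekday_counts = dates.dt.day_name().value_counts()

# Releases per calendar month, keeping the year
monthly_release_counts = dates.dt.to_period("M").value_counts().sort_index()

print(weekday_counts)
print(monthly_release_counts)
```

On these five rows the result is three Thursdays, one Wednesday and one Friday, far too few to test the Friday claim; that would need all 8,166 dated titles.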