## **Netflix Content Strategy Analysis with Python**

Dataset and case study link:
https://statso.io/netflix-content-strategy-case-study/

## **Introduction**

Netflix has become one of the most popular streaming platforms worldwide, attracting millions of viewers with its vast library of shows and movies. To maintain its dominance in the market, Netflix relies on sophisticated algorithms to recommend content and enhance user experience. These algorithms help personalize viewing suggestions and drive engagement, playing a critical role in the platform's content strategy.

Content Strategy Analysis involves examining how content is produced, released, distributed, and consumed to achieve goals such as maximizing audience engagement, viewership, brand reach, or revenue. In this project, I will be analyzing Netflix's Content Strategy using Python.

To perform this analysis, we will need data on content titles, type (whether it's a show or movie), genre, language, and release details (such as date, day of the week, and season) to assess timing and content performance. Viewership metrics, like hours watched, are also crucial for evaluating audience engagement.

The objective of this analysis is to explore Netflix's content strategy by examining how different factors such as content type, language, release season, and timing influence viewership trends. By identifying the top-performing content and the optimal release timing, the goal is to gain insights into how Netflix effectively maximizes audience engagement year-round.

## **About the dataset**

The dataset covers Netflix viewership in 2023 and includes key information such as the title, release date, language, content type (show or movie), availability status, and hours viewed. Not every title was released in 2023: the sample rows below include releases from 2021 and 2022. This data offers valuable insights into Netflix’s content strategy by allowing for the exploration of viewership trends based on different attributes.

Specifically, the dataset helps analyze patterns in audience engagement by examining factors like content type (whether a show or movie), release season, language, and availability status (whether available globally or not). The viewership data, measured in hours viewed, acts as an indicator of the popularity of each title, allowing for an assessment of how these factors impact audience behavior and engagement. 

With this dataset, it’s possible to uncover insights into how Netflix times its releases and which content resonates most with its viewers.

In [1]:
# import the packages
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white" # make the plot bg white

#load the dataset
df = pd.read_csv('netflix_content_2023.csv')
df.head(10)

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,Yes,2023-03-23,812100000,English,Show
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000,English,Show
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000,Korean,Show
3,Wednesday: Season 1,Yes,2022-11-23,507700000,English,Show
4,Queen Charlotte: A Bridgerton Story,Yes,2023-05-04,503000000,English,Movie
5,You: Season 4,Yes,2023-02-09,440600000,English,Show
6,La Reina del Sur: Season 3,No,2022-12-30,429600000,English,Show
7,Outer Banks: Season 3,Yes,2023-02-23,402500000,English,Show
8,Ginny & Georgia: Season 1,Yes,2021-02-24,302100000,English,Show
9,FUBAR: Season 1,Yes,2023-05-25,266200000,English,Show


In [2]:
# check for datatype consistency
df.dtypes

Title                  object
Available Globally?    object
Release Date           object
Hours Viewed           object
Language Indicator     object
Content Type           object
dtype: object

Let me start with cleaning and preprocessing the data to prepare it for analysis:

In [3]:
# convert hours viewed to numeric value
df['Hours Viewed'] = df['Hours Viewed'].replace(',', '', regex=True).astype(float)

# convert release date into datetime format
df['Release Date'] = pd.to_datetime(df['Release Date'])

# extract month and weekdays
df['Release Month'] = df['Release Date'].dt.month
df['Release Day'] = df['Release Date'].dt.day_name()

df.head()

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type,Release Month,Release Day
0,The Night Agent: Season 1,Yes,2023-03-23,812100000.0,English,Show,3.0,Thursday
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000.0,English,Show,1.0,Thursday
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000.0,Korean,Show,12.0,Friday
3,Wednesday: Season 1,Yes,2022-11-23,507700000.0,English,Show,11.0,Wednesday
4,Queen Charlotte: A Bridgerton Story,Yes,2023-05-04,503000000.0,English,Movie,5.0,Thursday
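
The cleaning step can be checked on a few toy rows. The sketch below is illustrative only: it assumes the raw `Hours Viewed` strings use commas as thousands separators (which is what the `replace` call implies), the rows are made up, and `raw` and `missing_dates` are names introduced here. It also shows why `Release Month` appears as a float (`3.0`): a missing release date becomes `NaT`, and its month becomes `NaN`.

```python
import pandas as pd

# Toy rows standing in for the raw CSV (hypothetical values, not from the dataset)
raw = pd.DataFrame({
    "Hours Viewed": ["812,100,000", "1,000,000", "400,000"],
    "Release Date": ["2023-03-23", None, "2022-12-31"],
})

# Strip the thousands separators, then convert to float
raw["Hours Viewed"] = (
    raw["Hours Viewed"].str.replace(",", "", regex=False).astype(float)
)

# A missing date becomes NaT, so the derived month is NaN and the column is float
raw["Release Date"] = pd.to_datetime(raw["Release Date"])
raw["Release Month"] = raw["Release Date"].dt.month

# Count rows that cannot be assigned to any month, weekday, or season
missing_dates = raw["Release Date"].isna().sum()

print(raw.dtypes)
print("rows without a release date:", missing_dates)
```

Counting the missing dates early is useful because every later grouping by month, weekday, or season depends on this column.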


The “Hours Viewed” column has been successfully cleaned and converted to a numeric format. Now, I’ll analyze trends in content type to determine whether shows or movies dominate viewership. Let’s visualize the distribution of total viewership hours between Shows and Movies:

In [4]:
# aggregate viewership hrs by content type
content_type_viewership = df.groupby('Content Type')['Hours Viewed'].sum()

In [5]:
fig = go.Figure()

fig.add_trace(go.Bar(
        x = content_type_viewership.index,
        y = content_type_viewership.values,
        marker_color = ['skyblue', 'salmon']
    ))

fig.update_layout(
    title = 'Total Watch Hours by Content Type',
    yaxis_title = 'Total Hours Viewed (in Billions)',
    xaxis_title = 'Content Type',
    xaxis_tickangle = 0, # x axis tick's angle
    height = 500,
    width = 650
)

fig.show()

![plot1](img/newplot_1.png)

In case visualisation is not rendered, click [here](img/newplot_1.png) to see it.

The visualization reveals that shows account for the majority of total viewership hours on Netflix in 2023, surpassing movies. This indicates that Netflix’s content strategy is strongly focused on shows, as they tend to generate higher watch hours overall.
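
The chart's axis title says "in Billions", while the plotted values are raw hours. A minimal sketch of rescaling before plotting, and of expressing the split as a percentage, might look like this. The totals are placeholders rather than figures from the dataset, and `hours_in_billions` and `share_pct` are names introduced here.

```python
import pandas as pd

# Placeholder totals (hypothetical); in the notebook, use content_type_viewership
content_type_viewership = pd.Series(
    {"Movie": 25.0e9, "Show": 75.0e9}, name="Hours Viewed"
)

# Rescale so the plotted numbers match an "(in Billions)" axis title
hours_in_billions = content_type_viewership / 1e9

# Share of total hours per content type, in percent
share_pct = (
    content_type_viewership / content_type_viewership.sum() * 100
).round(1)

print(hours_in_billions)
print(share_pct)
```

Passing `hours_in_billions.values` as `y` would make the tick labels agree with the axis title.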

Next, we will examine the distribution of viewership across various languages to determine which languages contribute the most to Netflix’s content consumption.

In [6]:
# aggregate viewership hours by language
language_viewership = df.groupby('Language Indicator')['Hours Viewed'].sum().sort_values(ascending=False)
language_viewership

Language Indicator
English        1.244417e+11
Korean         1.537840e+10
Non-English    1.043910e+10
Japanese       7.102000e+09
Hindi          9.261000e+08
Russian        1.146000e+08
Name: Hours Viewed, dtype: float64

In [7]:
fig = go.Figure()

fig.add_trace(go.Bar(
    x = language_viewership.index,
    y = language_viewership.values,
    marker_color = 'cyan'
))

fig.update_layout(
    title = 'Total Watch Hours by Language',
    xaxis_title = 'Language',
    yaxis_title = 'Total Watch Hours (in Billions)',
    xaxis_tickangle = 0,
    height = 500,
    width = 800
)

fig.show()

![Total Watch Hours by Language](img/newplot_2.png)

The visualization shows that English-language content overwhelmingly leads in Netflix’s viewership, with Korean content following behind. This suggests that while Netflix’s main audience consumes English content, non-English shows and movies also capture a substantial share, reflecting a diverse content strategy.
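
To put numbers on "overwhelmingly", the totals printed above can be converted to percentage shares. This sketch copies the six values from the `language_viewership` output; `language_share` is a name introduced here.

```python
import pandas as pd

# Totals copied from the language_viewership output above
language_viewership = pd.Series({
    "English": 1.244417e11,
    "Korean": 1.537840e10,
    "Non-English": 1.043910e10,
    "Japanese": 7.102000e09,
    "Hindi": 9.261000e08,
    "Russian": 1.146000e08,
})

# Each language's share of all hours viewed, in percent
language_share = (
    language_viewership / language_viewership.sum() * 100
).round(1)

print(language_share)
```

On these figures, English accounts for a little under four fifths of all hours, and Korean for roughly a tenth.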

Next, I will analyze how viewership fluctuates based on release dates to uncover potential trends over time, such as seasonality or patterns around specific months.

In [8]:
# aggregate viewership by release month
monthly_viewership = df.groupby('Release Month')['Hours Viewed'].sum()

In [9]:
fig = go.Figure()

fig.add_trace(go.Scatter(
    x = monthly_viewership.index,
    y = monthly_viewership.values,
    mode = 'lines+markers',
    marker = dict(color = 'blue'),
    line = dict(color = 'blue')
))

fig.update_layout(
    title = 'Total Watch Hours by Release Month',
    xaxis_title = 'Months',
    yaxis_title = 'Total Watch Hours (in Billions)',
    xaxis = dict(
        tickmode = 'array',
        tickvals = list(range(1,13)),
        ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height = 600,
    width = 1000
)

fig.show()

![Total Watch Hours by Release Month](img/newplot_3.png)

The graph of total viewership hours by release month highlights a significant surge for titles released in June and an even sharper rise for December releases. This suggests that titles launched in these periods draw more audience engagement, likely due to strategic content releases, seasonal trends, or holiday-related viewing. In contrast, the other months show a steady but lower level of viewership.

To explore this further, we can look at the top-performing titles, both shows and movies, and consider which factors, such as genre or theme, may have contributed to their high viewership.

In [10]:
# extract top 5 titles based on viewership hours
top_5_titles = df.nlargest(5, 'Hours Viewed').reset_index()
top_5_titles[['Title', 'Hours Viewed', 'Language Indicator', 'Content Type']]

Unnamed: 0,Title,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,812100000.0,English,Show
1,Ginny & Georgia: Season 2,665100000.0,English,Show
2,King the Land: Limited Series // 킹더랜드: 리미티드 시리즈,630200000.0,Korean,Movie
3,The Glory: Season 1 // 더 글로리: 시즌 1,622800000.0,Korean,Show
4,ONE PIECE: Season 1,541900000.0,English,Show


The top 5 most-watched titles on Netflix in 2023 are:

1. **The Night Agent: Season 1** (English, Show) – 812.1 million hours viewed.
2. **Ginny & Georgia: Season 2** (English, Show) – 665.1 million hours viewed.
3. **King the Land: Limited Series** (Korean, Movie) – 630.2 million hours viewed.
4. **The Glory: Season 1** (Korean, Show) – 622.8 million hours viewed.
5. **ONE PIECE: Season 1** (English, Show) – 541.9 million hours viewed.

English-language shows dominate the top viewership rankings, but the strong presence of Korean content in the top titles highlights its global appeal.
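
Since the overall top 5 mixes shows and movies, it can also help to rank titles within each content type. The sketch below uses made-up rows (titles `A` to `E` are placeholders) and the notebook's column names; `top_per_type` is a name introduced here.

```python
import pandas as pd

# Placeholder rows (hypothetical); column names follow the notebook's df
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Content Type": ["Show", "Show", "Movie", "Movie", "Show"],
    "Hours Viewed": [500e6, 300e6, 200e6, 400e6, 100e6],
})

# Sort once by hours, then keep the first two rows of each content type
top_per_type = (
    toy.sort_values("Hours Viewed", ascending=False)
       .groupby("Content Type")
       .head(2)
       .reset_index(drop=True)
)

print(top_per_type)
```

Replacing `toy` with `df` and `head(2)` with `head(5)` would give a top 5 per content type.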

Next, let’s examine the viewership trends based on content type to gain further insights.

In [11]:
# aggregate viewership hours by content type and release month
monthly_viewership_by_type = df.pivot_table(index = 'Release Month',
                                            columns = 'Content Type',
                                            values = 'Hours Viewed',
                                            aggfunc = 'sum')
monthly_viewership_by_type

Content Type,Movie,Show
Release Month,Unnamed: 1_level_1,Unnamed: 2_level_1
1.0,2275900000.0,4995700000.0
2.0,1654400000.0,5449300000.0
3.0,2109400000.0,5327700000.0
4.0,2757600000.0,4108100000.0
5.0,2520500000.0,4574100000.0
6.0,3135800000.0,5386200000.0
7.0,1615700000.0,4909100000.0
8.0,2186400000.0,4631400000.0
9.0,2092300000.0,5169900000.0
10.0,3400400000.0,4722800000.0


In [12]:
fig = go.Figure()

for content_type in monthly_viewership_by_type.columns:
    fig.add_trace(go.Scatter(
        x = monthly_viewership_by_type.index,
        y = monthly_viewership_by_type[content_type],
        mode = 'lines+markers',
        name = content_type
    ))

fig.update_layout(
    title = 'Viewership Trends by Content Type and Release Month',
    xaxis_title = 'Month',
    yaxis_title = 'Total Hours Viewed (in Billions)',
    xaxis = dict(
        tickmode = 'array',
        tickvals = list(range(1,13)),
        ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height = 600,
    width = 1000,
    legend_title = 'Content Type'
)

fig.show()

![Viewership Trends by Content Type and Release Month](img/newplot_4.png)

The graph compares viewership trends between movies and shows throughout 2023, revealing that shows consistently attract higher viewership compared to movies, with a peak in December. In contrast, movies exhibit more fluctuating viewership, showing notable increases in June and October. This suggests that Netflix's audience engages more with shows throughout the year, while movie viewership experiences occasional spikes, likely associated with specific releases or events.
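
The movie spikes are easier to judge as a share of each month's total. This sketch normalizes each row of the pivot table, using three rows copied from the output above (January, June, and October); `share_by_month` is a name introduced here.

```python
import pandas as pd

# Three rows copied from the monthly_viewership_by_type output above
pivot = pd.DataFrame(
    {
        "Movie": [2275900000.0, 3135800000.0, 3400400000.0],
        "Show": [4995700000.0, 5386200000.0, 4722800000.0],
    },
    index=pd.Index([1, 6, 10], name="Release Month"),
)

# Divide each row by its own total to get percentage shares per month
share_by_month = pivot.div(pivot.sum(axis=1), axis=0).mul(100).round(1)

print(share_by_month)
```

In these three rows, movies take a larger share of the hours for October releases than for January releases, which matches the October spike noted above.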

Next, let's analyze the total viewership hours distributed across different release seasons.

In [13]:
# function to define seasons based on release months
def get_season(month):
    if month in [12,1,2]:
        return 'Winter'
    elif month in [3,4,5]:
        return 'Spring'
    elif month in [6,7,8]:
        return 'Summer'
    else:
        # note: a missing month (NaN) also falls through to this branch
        return 'Fall'

In [14]:
# apply seasons to the dataset
df['Release Season'] = df['Release Month'].apply(get_season)

# aggregate viewership hours by release season
seasonal_viewership = df.groupby('Release Season')['Hours Viewed'].sum()

# order the seasons (winter, spring, summer and fall)
seasons_order = ['Winter', 'Spring', 'Summer', 'Fall']
seasonal_viewership = seasonal_viewership.reindex(seasons_order)

In [15]:
fig = go.Figure()

fig.add_trace(go.Bar(
    x = seasonal_viewership.index,
    y = seasonal_viewership.values,
    marker_color = 'orange'
))

fig.update_layout(
    title = 'Total Watch Hours by Release Season',
    xaxis_title = 'Season',
    yaxis_title = 'Total Hours Viewed (in Billions)',
    yaxis_tickangle = 0,
    xaxis = dict(
        categoryorder = 'array',
        categoryarray = seasons_order
    ),
    height = 500,
    width = 800
)

fig.show()

![Total Watch Hours by Release Season](img/newplot_5.png)

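The graph shows that viewership hours peak notably in the Fall season, exceeding 80 billion hours viewed, while Winter, Spring, and Summer each maintain relatively stable and comparable viewership around the 20 billion mark. Read this result with caution: `get_season` assigns every month not matched by its first three branches to Fall, including titles whose release month is missing, so the Fall total likely combines September to November releases with undated titles. On the face of the chart, Netflix sees the highest audience engagement during the Fall, but the gap is probably overstated.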
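
Because `get_season` returns `'Fall'` from its `else` branch, a missing release month (`NaN`) is counted as Fall. A possible variant that keeps undated titles separate is sketched below; `get_season_safe` and the `'Unknown'` label are assumptions introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def get_season_safe(month):
    """Map a month number to a season; missing months get their own label."""
    if pd.isna(month):
        return "Unknown"  # hypothetical label for titles without a release date
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"

# Float months, as in the notebook, with one missing value
months = pd.Series([1.0, 4.0, 7.0, 10.0, np.nan])
seasons = months.apply(get_season_safe)

print(seasons.tolist())
```

With this variant, reindexing by `seasons_order` would drop the `Unknown` group, so the seasonal chart would compare only titles with a known release date.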

Next, let's examine the number of content releases and their corresponding viewership hours across different months.

In [16]:
monthly_releases = df['Release Month'].value_counts().sort_index()
monthly_viewership = df.groupby('Release Month')['Hours Viewed'].sum()

In [17]:
fig = go.Figure()

fig.add_trace(
    go.Bar(
        x = monthly_releases.index,
        y = monthly_releases.values,
        name = 'Number of Releases',
        marker_color = 'lightgreen',
        opacity = 1,
        yaxis = 'y1'
    )
)

fig.add_trace(
    go.Scatter(
        x = monthly_viewership.index,
        y = monthly_viewership.values,
        name = 'Viewership Hours',
        mode = 'lines+markers',
        marker = dict(color = 'red'),
        line = dict(color = 'red'),
        yaxis = 'y2'
    )
)

fig.update_layout(
    title = 'Monthly Release Patterns and Viewership Hours',
    xaxis = dict(
        title = 'Months',
        tickmode = 'array',
        tickvals = list(range(1,13)),
        ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    yaxis = dict(
        title = 'Number of Releases',
        showgrid = False,
        side = 'left'
    ),
    yaxis2 = dict(
        title = 'Total Hours Viewed (in Billions)',
        overlaying = 'y',
        side = 'right',
        showgrid = False
    ),
    legend = dict(
        x = 0.05,
        y = 1,
        orientation = 'v',
        xanchor = 'left'
    ),
    height = 600,
    width = 1000
)

fig.show()

![Monthly Release Patterns and Viewership Hours](img/newplot_6.png)

Although the number of releases remains relatively consistent throughout the year, viewership hours show a marked increase for June releases and a significant rise for December releases. This suggests that viewership is influenced more by the timing and appeal of specific content during these months than by the number of releases alone.
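
One way to separate volume from appeal is to divide total hours by the number of releases in each month. The sketch below uses made-up rows with the notebook's column names; `monthly` and `hours_per_release` are names introduced here.

```python
import pandas as pd

# Placeholder rows (hypothetical); column names follow the notebook's df
toy = pd.DataFrame({
    "Release Month": [6, 6, 6, 7, 7, 12],
    "Hours Viewed": [300e6, 100e6, 200e6, 50e6, 150e6, 900e6],
})

# Count releases and sum hours per month in a single pass
monthly = toy.groupby("Release Month")["Hours Viewed"].agg(
    releases="count", total_hours="sum"
)

# Average hours per released title
monthly["hours_per_release"] = monthly["total_hours"] / monthly["releases"]

print(monthly)
```

A month with few releases but a high `hours_per_release` points to a small number of high-impact titles, not to volume.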

Next, let’s investigate whether Netflix tends to release content on specific weekdays and how this may affect viewership patterns.

In [18]:

weekday_releases = df['Release Day'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

# aggregate viewership hours by day of the week
weekday_viewership = df.groupby('Release Day')['Hours Viewed'].sum().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

In [19]:
fig = go.Figure()

fig.add_trace(
    go.Bar(
        x = weekday_releases.index,
        y = weekday_releases.values,
        name = 'Number of Releases',
        marker_color = 'skyblue',
        opacity = 0.9,
        yaxis = 'y1'
    )
)

fig.add_trace(
    go.Scatter(
        x = weekday_viewership.index,
        y = weekday_viewership.values,
        name = 'Viewership Hours',
        mode = 'lines+markers',
        marker = dict(color = 'red'),
        line = dict(color = 'red'),
        yaxis = 'y2'
    )
)

fig.update_layout(
    title = 'Weekly Release Patterns and Viewership Hours',
    xaxis = dict(
        title = 'Day of the Week',
        categoryorder = 'array',
        categoryarray = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    ),
    yaxis = dict(
        title = 'Number of Releases',
        showgrid = False,
        side = 'left'
    ),
    yaxis2 = dict(
        title = 'Total Hours Viewed (in Billions)',
        overlaying = 'y',
        side = 'right',
        showgrid = False
    ),
    legend = dict(
        x = 0.05, # places legend in left side
        y = 1,
        orientation = 'v',
        xanchor = 'left'
    ),
    height = 600,
    width = 1000
)

fig.show()

![Weekly Release Patterns and Viewership Hours](img/newplot_7.png)

The graph reveals that the majority of content releases take place on Fridays, and titles released on Fridays also account for the largest share of viewership hours. This suggests that Netflix strategically schedules content releases for the start of the weekend to maximize audience engagement. Notably, the hours attributed to Saturday and Sunday releases are much lower, despite some releases on those days. Because the chart groups hours by release day, it shows how titles released on each day performed, not on which days viewers watched. Thus, Friday emerges as the most influential day for both releases and viewership.
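
Totals per weekday are driven by how many titles launch on each day, so a per-title statistic such as the median is a useful complement. The sketch below uses made-up rows and an ordered categorical as an alternative to `reindex` for keeping the days in calendar order; `day_order` and `by_day` are names introduced here.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Placeholder rows (hypothetical); column names follow the notebook's df
toy = pd.DataFrame({
    "Release Day": ["Friday", "Friday", "Friday", "Wednesday", "Saturday"],
    "Hours Viewed": [10e6, 20e6, 600e6, 50e6, 5e6],
})

# An ordered categorical keeps Monday..Sunday order without a later reindex
toy["Release Day"] = pd.Categorical(
    toy["Release Day"], categories=day_order, ordered=True
)

# observed=False keeps days with no releases in the result
by_day = toy.groupby("Release Day", observed=False)["Hours Viewed"].agg(
    ["count", "sum", "median"]
)

print(by_day)
```

In the toy data, Friday's total is dominated by a single title, which the median makes visible.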

To gain deeper insights into this strategy, let’s examine specific high-impact dates, such as holidays or major events, and their correlation with content releases.

In [20]:
# take a few significant holidays
important_dates = [
    '2023-01-01',  # new year's day
    '2023-02-14',  # valentine's day
    '2023-07-04',  # independence day (in US)
    '2023-10-31',  # halloween
    '2023-12-25'   # christmas day
]

# convert to datetime
important_dates = pd.to_datetime(important_dates)

# content releases close (3 days) to these holidays
holiday_releases = df[df['Release Date'].apply(
    lambda x: any((x - date).days in range(-3,4) for date in important_dates)
)]

#aggregate viewership hours for releases near significant holidays
holiday_viewership = holiday_releases.groupby('Release Date')['Hours Viewed'].sum()

holiday_releases[['Title', 'Release Date', 'Hours Viewed']].head(8)

Unnamed: 0,Title,Release Date,Hours Viewed
2,The Glory: Season 1 // 더 글로리: 시즌 1,2022-12-30,622800000.0
6,La Reina del Sur: Season 3,2022-12-30,429600000.0
11,Kaleidoscope: Limited Series,2023-01-01,252500000.0
29,Perfect Match: Season 1,2023-02-14,176800000.0
124,Lady Voyeur: Limited Series // Olhar Indiscret...,2022-12-31,86000000.0
126,The Law According to Lidia Poët: Season 1 // L...,2023-02-15,85000000.0
146,Alpha Males: Season 1 // Machos alfa: Season 1,2022-12-30,78200000.0
169,Red Rose: Season 1,2023-02-15,71100000.0


In [21]:
holiday_releases[['Title', 'Release Date', 'Hours Viewed']].tail()

Unnamed: 0,Title,Release Date,Hours Viewed
22324,The Romantics: Limited Series,2023-02-14,1000000.0
22327,Aggretsuko: Season 5 // アグレッシブ烈子: シーズン5,2023-02-16,900000.0
22966,The Lying Life of Adults: Limited Series // La...,2023-01-04,900000.0
22985,Community Squad: Season 1 // División Palermo:...,2023-02-17,800000.0
24187,Live to Lead: Limited Series,2022-12-31,400000.0


The data indicates that Netflix has strategically timed content releases around major holidays and events. Some notable releases include:

- **New Year’s Period**: *The Glory: Season 1* and *La Reina del Sur: Season 3* were launched two days before New Year’s Day and *Kaleidoscope: Limited Series* on the day itself, and all three recorded high viewership.

- **Valentine’s Day**: *Perfect Match: Season 1* and *The Romantics: Limited Series* were released on February 14th and *Red Rose: Season 1* a day later, timing that appears intended to capitalize on the holiday’s sentiment.
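
The holiday filter above checks each release against each date inside a Python lambda. A vectorized alternative using NumPy broadcasting is sketched below. It also records how many days each title sits from its nearest holiday. The four titles and dates are taken from the tables above; `offsets`, `Days From Holiday`, and `Near Holiday` are names introduced here.

```python
import numpy as np
import pandas as pd

important_dates = pd.to_datetime(
    ["2023-01-01", "2023-02-14", "2023-07-04", "2023-10-31", "2023-12-25"]
)

# Titles and release dates as listed in the outputs above
toy = pd.DataFrame({
    "Title": ["The Glory: Season 1", "Kaleidoscope: Limited Series",
              "Red Rose: Season 1", "FUBAR: Season 1"],
    "Release Date": pd.to_datetime(
        ["2022-12-30", "2023-01-01", "2023-02-15", "2023-05-25"]
    ),
})

# Signed day offsets: one row per release, one column per holiday
offsets = (
    toy["Release Date"].to_numpy()[:, None]
    - important_dates.to_numpy()[None, :]
) / np.timedelta64(1, "D")

# Pick the offset to the nearest holiday for each release
nearest = np.abs(offsets).argmin(axis=1)
toy["Days From Holiday"] = offsets[np.arange(len(toy)), nearest]

# Same window as the notebook: within 3 days on either side
toy["Near Holiday"] = toy["Days From Holiday"].abs() <= 3

print(toy)
```

A negative offset means the title was released before the holiday, so the New Year's and Valentine's Day releases can be told apart from those that landed on the day itself.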


## Conclusion

In summary, Netflix's content strategy appears designed to optimize viewership through strategic release timing and a diverse range of offerings. The data shows that shows consistently accumulate more viewing hours than movies, with notable spikes for titles released in December and June. The Fall season appears as the peak for audience engagement, although that total may be inflated because titles without a release date are also counted as Fall.

Content is predominantly released on Fridays, effectively capturing viewer attention just before the weekend, which reinforces the correlation between release timing and viewership patterns. Although the number of releases remains steady year-round, fluctuations in viewership indicate a strategic emphasis on high-impact titles and optimal timing over sheer volume. This approach allows Netflix to maximize audience engagement and maintain its competitive edge in the streaming landscape.