### Problem
The goal is to analyze Netflix’s content strategy to understand how various factors like content type, language, release season, and timing affect viewership patterns. By identifying the best-performing content and the timing of its release, the aim is to uncover insights into how Netflix maximizes audience engagement throughout the year.

In [23]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

In [5]:
data=pd.read_csv('netflix_content_2023.csv')


In [7]:
data.head()

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,Yes,2023-03-23,812100000,English,Show
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000,English,Show
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000,Korean,Show
3,Wednesday: Season 1,Yes,2022-11-23,507700000,English,Show
4,Queen Charlotte: A Bridgerton Story,Yes,2023-05-04,503000000,English,Movie


In [8]:
data.shape

(24812, 6)

In [11]:
data.isna().any()

Title                  False
Available Globally?    False
Release Date            True
Hours Viewed           False
Language Indicator     False
Content Type           False
dtype: bool

In [15]:
data['Release Date'].isnull().sum()

16646
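About 67% of the rows (16,646 of 24,812) have no release date, so every release-timing breakdown later in the notebook covers only a subset of titles. A minimal sketch of how that gap could be quantified in both rows and hours is given here; the `toy` frame and its values are made up for illustration and stand in for `data`.

```python
import pandas as pd

# Hypothetical stand-in for `data`; the numbers are illustrative only.
toy = pd.DataFrame({
    "Release Date": ["2023-03-23", None, "2022-12-30", None],
    "Hours Viewed": [800.0, 100.0, 600.0, 50.0],
})

# Boolean mask of rows without a release date.
missing = toy["Release Date"].isna()

# Share of rows and share of viewing hours that lack a release date.
share_rows = missing.mean()
share_hours = toy.loc[missing, "Hours Viewed"].sum() / toy["Hours Viewed"].sum()

print(f"rows without a date: {share_rows:.1%}")
print(f"hours without a date: {share_hours:.1%}")
```

Running the same two lines on `data` would show how much of the total viewership the date-based charts leave out.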

In [21]:
data['Hours Viewed']=data['Hours Viewed'].replace(',','',regex=True).astype(float)

In [22]:
data.head()

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,Yes,2023-03-23,812100000.0,English,Show
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000.0,English,Show
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000.0,Korean,Show
3,Wednesday: Season 1,Yes,2022-11-23,507700000.0,English,Show
4,Queen Charlotte: A Bridgerton Story,Yes,2023-05-04,503000000.0,English,Movie


In [37]:
data['Language Indicator'].nunique()

6

In [47]:
Language=data.groupby('Language Indicator')['Hours Viewed'].sum().sort_values(ascending=False)

fig=go.Figure(data=[
    go.Bar(x=Language.index,
          y=Language.values,
           marker_color='chocolate')
])
fig.update_layout(
    title='Total Viewership Hours by Language (2023)',
    xaxis_title='Language',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=45,
    height=500,
    width=800)
fig.show()

The bar chart above shows that English-language content dominates Netflix’s viewership, followed by Korean and Non-English. Netflix’s primary audience is consuming English content, although non-English shows and movies also hold a considerable share of viewership, which points to a diverse content strategy.
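The dominance claim is easier to judge as a percentage share than as raw totals. The sketch below shows one way to compute it, assuming the same `Language Indicator` and `Hours Viewed` columns; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for `data`; the numbers are illustrative only.
toy = pd.DataFrame({
    "Language Indicator": ["English", "English", "Korean", "Non-English"],
    "Hours Viewed": [600.0, 300.0, 200.0, 100.0],
})

# Total hours per language, then each language's share of all hours in percent.
totals = toy.groupby("Language Indicator")["Hours Viewed"].sum()
share = (totals / totals.sum() * 100).sort_values(ascending=False).round(1)

print(share)
```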

In [66]:
content_type=data.groupby('Content Type')['Hours Viewed'].sum()
fig=go.Figure(data=[
    go.Bar(x=content_type.index,
          y=content_type.values,
           marker_color=['coral','skyblue'])
])
fig.update_layout(
    title='Total Viewership Hours by Content Type (2023)',
    xaxis_title='Content Type',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=0,
    height=500,
    width=800)
fig.show()

In [60]:
data['Release Date'] = pd.to_datetime(data['Release Date'])
data['Release Month'] = data['Release Date'].dt.month
monthly_viewership = data.groupby('Release Month')['Hours Viewed'].sum()
fig=go.Figure(data=[
    go.Scatter(
        x=monthly_viewership.index,
        y=monthly_viewership.values,
        mode='lines+markers',
        marker=dict(color='blue'),
        line=dict(color='blue')
    )
])
fig.update_layout(
    title="Total Viewership Hours by Release Month (2023)",
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height=600,
    width=1000
)
fig.show()

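The graph above shows total hours viewed by release month: December is the top performer, followed by June and October. Titles without a release date (16,646 of 24,812 rows) are left out of this breakdown, because `groupby` drops missing keys by default.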

In [68]:

monthly_viewership_by_type = data.pivot_table(index='Release Month',
                                                      columns='Content Type',
                                                      values='Hours Viewed',
                                                      aggfunc='sum')
monthly_viewership_by_type

Release Month,Movie,Show
1.0,2275900000.0,4995700000.0
2.0,1654400000.0,5449300000.0
3.0,2109400000.0,5327700000.0
4.0,2757600000.0,4108100000.0
5.0,2520500000.0,4574100000.0
6.0,3135800000.0,5386200000.0
7.0,1615700000.0,4909100000.0
8.0,2186400000.0,4631400000.0
9.0,2092300000.0,5169900000.0
10.0,3400400000.0,4722800000.0


In [70]:
fig = go.Figure()

for content_type in monthly_viewership_by_type.columns:
    fig.add_trace(
        go.Scatter(
            x=monthly_viewership_by_type.index,
            y=monthly_viewership_by_type[content_type],
            mode='lines+markers',
            name=content_type
        )
    )

fig.update_layout(
    title='Viewership Trends by Content Type and Release Month (2023)',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    height=600,
    width=1000,
    legend_title='Content Type'
)

fig.show()

The graph above shows that movie viewership peaks in June and October, whereas shows reach their highest viewership in December and stay fairly steady in the other months.

In [65]:
top_5_titles = data.nlargest(5, 'Hours Viewed')
top_5_titles

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type,Release Month
0,The Night Agent: Season 1,Yes,2023-03-23,812100000.0,English,Show,3.0
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000.0,English,Show,1.0
18227,King the Land: Limited Series // 킹더랜드: 리미티드 시리즈,Yes,2023-06-17,630200000.0,Korean,Movie,6.0
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000.0,Korean,Show,12.0
18214,ONE PIECE: Season 1,Yes,2023-08-31,541900000.0,English,Show,8.0


Most of the top 5 titles are shows, and most are in English. Korean titles also have a notable presence among the top performers.

In [72]:
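def get_season(month):
    # Map a month number (1-12) to a meteorological season.
    # Note: a missing month (NaN) matches none of the lists below,
    # so it falls through to the else branch and is labelled 'Fall'.
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'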

data['Release Season']=data['Release Month'].apply(get_season)
seasonal_viewership = data.groupby('Release Season')['Hours Viewed'].sum()
seasons_order = ['Winter', 'Spring', 'Summer', 'Fall']
seasonal_viewership = seasonal_viewership.reindex(seasons_order)
seasonal_viewership

Release Season
Winter    2.443110e+10
Spring    2.139740e+10
Summer    2.186460e+10
Fall      9.070880e+10
Name: Hours Viewed, dtype: float64

In [84]:
fig=go.Figure(data=
             go.Bar(x=seasonal_viewership.index,
                   y=seasonal_viewership.values,
                    marker_color='cornflowerblue'
                   )
             )
fig.update_layout(
    title='Viewership Hours by Release Seasons (2023)',
    xaxis_title='Seasons',
    yaxis_title='Total Viewership Hours(in billions)',
    xaxis=dict(
        categoryorder='array',
        categoryarray=seasons_order
    )
)
fig.show()

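The chart shows Fall with the largest total, about 90 billion viewership hours, while the other seasons vary little (roughly 21 to 24 billion each). The Fall figure should be read with caution: `get_season` sends every value outside months 12 and 1 to 8 to its `else` branch, so the 16,646 titles with no release date are counted as Fall.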
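Since `get_season` labels a missing month as Fall, one way to keep undated titles out of the seasonal totals is to return a missing value for them. The sketch below is a possible variant, not the notebook's own method; `get_season_safe` and the `toy` frame are hypothetical names and values.

```python
import numpy as np
import pandas as pd

def get_season_safe(month):
    """Map a month number to a season; keep missing months missing."""
    if pd.isna(month):
        return np.nan
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"

# Hypothetical stand-in for `data`; the last two rows have no release month.
toy = pd.DataFrame({
    "Release Month": [1.0, 4.0, 7.0, 10.0, np.nan, np.nan],
    "Hours Viewed": [10.0, 20.0, 30.0, 40.0, 500.0, 500.0],
})

toy["Release Season"] = toy["Release Month"].apply(get_season_safe)

# groupby drops rows whose key is NaN, so undated titles no longer inflate Fall.
seasonal = (
    toy.groupby("Release Season")["Hours Viewed"]
    .sum()
    .reindex(["Winter", "Spring", "Summer", "Fall"])
)
print(seasonal)
```

Applied to `data`, this would give a Fall total built only from September to November releases, which makes the seasonal comparison like for like.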

In [92]:
No_of_release=data['Release Month'].value_counts().sort_index()
No_of_release

1.0     608
2.0     560
3.0     690
4.0     647
5.0     624
6.0     670
7.0     631
8.0     674
9.0     739
10.0    802
11.0    734
12.0    787
Name: Release Month, dtype: int64

In [98]:
fig = go.Figure()

fig.add_trace(
    go.Bar(
        x=No_of_release.index,
        y=No_of_release.values,
        name='Number of Releases',
        marker_color='darkseagreen', 
        opacity=0.7,
        yaxis='y1'
    )
)

fig.add_trace(
    go.Scatter(
        x=monthly_viewership.index,
        y=monthly_viewership.values,
        name='Viewership Hours',
        mode='lines+markers',
        marker=dict(color='red'),
        line=dict(color='red'),
        yaxis='y2'
    )
)
fig.update_layout(
    title='Monthly Release Patterns and Viewership Hours (2023)',
    xaxis=dict(
        title='Month',
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    yaxis=dict(
        title='Number of Releases',
        showgrid=False,
        side='left'
    ),
    yaxis2=dict(
        title='Total Hours Viewed (in billions)',
        overlaying='y',
        side='right',
        showgrid=False
    ),
    legend=dict(
        x=1.05,  
        y=1,
        orientation='v',
        xanchor='left'
    ),
    height=600,
    width=1000
)

fig.show()

The graph shows that viewership does not depend solely on the number of releases.
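The relationship between release counts and viewership can also be checked numerically. The sketch below computes a Pearson correlation and the average hours per release, using the January to October figures printed in the outputs above (the pivot output lists months 1 to 10 only, so November and December are left out). Variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Jan-Oct figures copied from the notebook outputs above.
months = list(range(1, 11))
releases = pd.Series([608, 560, 690, 647, 624, 670, 631, 674, 739, 802], index=months)
movie_hours = pd.Series(
    [2275.9, 1654.4, 2109.4, 2757.6, 2520.5, 3135.8, 1615.7, 2186.4, 2092.3, 3400.4],
    index=months,
) * 1e6
show_hours = pd.Series(
    [4995.7, 5449.3, 5327.7, 4108.1, 4574.1, 5386.2, 4909.1, 4631.4, 5169.9, 4722.8],
    index=months,
) * 1e6

# Total hours per release month (movies plus shows).
total_hours = movie_hours + show_hours

# Pearson correlation between number of releases and total hours.
r = np.corrcoef(releases.to_numpy(), total_hours.to_numpy())[0, 1]

# Average hours per released title in each month.
hours_per_release = total_hours / releases

print(f"correlation (Jan-Oct): {r:.2f}")
print(hours_per_release.round(0))
```

A correlation well below 1, together with an uneven hours-per-release column, would support the reading that release volume alone does not explain viewership.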

In [100]:
data['Release Day'] = data['Release Date'].dt.day_name()

weekday_releases =data['Release Day'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)
weekday_releases



Monday        436
Tuesday       995
Wednesday    1310
Thursday     1145
Friday       3863
Saturday      238
Sunday        179
Name: Release Day, dtype: int64

In [103]:
weekday_viewership =data.groupby('Release Day')['Hours Viewed'].sum().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)
weekday_viewership

Release Day
Monday       3.954500e+09
Tuesday      5.562300e+09
Wednesday    1.574410e+10
Thursday     2.029280e+10
Friday       3.821720e+10
Saturday     5.121800e+09
Sunday       1.935300e+09
Name: Hours Viewed, dtype: float64

In [107]:
fig = go.Figure()
fig.add_trace(
    go.Bar(
        x=weekday_releases.index,
        y=weekday_releases.values,
        name='Number of Releases',
        marker_color='orange',
        opacity=0.6,
        yaxis='y1'
    )
)
fig.add_trace(
    go.Scatter(
        x=weekday_viewership.index,
        y=weekday_viewership.values,
        name='Viewership Hours',
        mode='lines+markers',
        marker=dict(color='red'),
        line=dict(color='red'),
        yaxis='y2'
    )
)
fig.update_layout(
    title='Weekly Release Patterns and Viewership Hours(2023)',
    xaxis=dict(
        title='Day of the Week',
        categoryorder='array',
        categoryarray=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    ),
    yaxis=dict(
        title='Number of Releases',
        showgrid=False,
        side='left'
    ),
    yaxis2=dict(
        title='Total Hours Viewed (in billions)',
        overlaying='y',
        side='right',
        showgrid=False
    ),
    legend=dict(
        x=1.05,  
        y=1,
        orientation='v',
        xanchor='left'
    )
)    
fig.show()

The graph shows that most releases land on Friday, and titles released on Friday also account for the most viewership hours. Sunday has the fewest releases and the lowest viewership hours.
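Total hours by weekday are driven partly by how many titles are released on each day, so a per-title average is a useful complement. The sketch below divides the weekday totals by the weekday release counts, both copied from the outputs above; variable names are illustrative.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Release counts and total hours per weekday, copied from the outputs above.
releases = pd.Series([436, 995, 1310, 1145, 3863, 238, 179], index=days)
hours = pd.Series(
    [3.9545e9, 5.5623e9, 1.57441e10, 2.02928e10, 3.82172e10, 5.1218e9, 1.9353e9],
    index=days,
)

# Average hours viewed per title released on each weekday.
hours_per_release = (hours / releases).round(0)

print(hours_per_release.sort_values(ascending=False))
```

Friday's lead in total hours comes with far more releases than any other day, so the per-title ranking may differ from the ranking by totals.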

### Conclusion 
Netflix’s content strategy revolves around maximizing viewership hours through targeted release timing and content variety.
Shows account for roughly twice the viewership hours of movies, and viewership spikes for December and June releases, which suggests strategic releases around these periods. Fall shows the largest total in the seasonal breakdown, although that figure also absorbs titles with no release date and should be read with caution. Most content is released on Fridays, which aims to capture viewers right before the weekend, and total viewership hours align strongly with this release pattern.