<a href="https://colab.research.google.com/github/PERAMPRAKASH/Netflix-Content-Strategy-Data-Analysis/blob/main/Netflix_Content_Strategy_Analysis_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

netflix_data = pd.read_csv("/content/netflix_content_2023.csv")

netflix_data.head()

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed,Language Indicator,Content Type
0,The Night Agent: Season 1,Yes,2023-03-23,812100000,English,Show
1,Ginny & Georgia: Season 2,Yes,2023-01-05,665100000,English,Show
2,The Glory: Season 1 // 더 글로리: 시즌 1,Yes,2022-12-30,622800000,Korean,Show
3,Wednesday: Season 1,Yes,2022-11-23,507700000,English,Show
4,Queen Charlotte: A Bridgerton Story,Yes,2023-05-04,503000000,English,Movie


**Start with cleaning and preprocessing the “Hours Viewed” column to prepare it for analysis**

In [2]:
netflix_data['Hours Viewed'] = netflix_data['Hours Viewed'].replace(',', '', regex=True).astype(float)

netflix_data[['Title', 'Hours Viewed']].head()

Unnamed: 0,Title,Hours Viewed
0,The Night Agent: Season 1,812100000.0
1,Ginny & Georgia: Season 2,665100000.0
2,The Glory: Season 1 // 더 글로리: 시즌 1,622800000.0
3,Wednesday: Season 1,507700000.0
4,Queen Charlotte: A Bridgerton Story,503000000.0
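If the raw column contains entries that are not numeric, `astype(float)` raises an error. A more defensive variant is sketched below; it assumes the raw values are strings with comma thousands separators, and the small `sample` frame is made-up data for illustration, not the real dataset.

```python
import pandas as pd

# Hypothetical sample mimicking raw "Hours Viewed" strings (not the real dataset).
sample = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Hours Viewed": ["812,100,000", "665,100,000", "n/a"],
})

# Strip thousands separators, then convert. Unparseable entries become NaN
# instead of raising, so they can be inspected or dropped explicitly.
sample["Hours Viewed"] = pd.to_numeric(
    sample["Hours Viewed"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

n_missing = int(sample["Hours Viewed"].isna().sum())
print(sample)
print("unparseable rows:", n_missing)
```

Counting the coerced rows makes it easy to see how much data the conversion silently dropped.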


**Visualize the distribution of total viewership hours between Shows and Movies**

In [3]:
content_type_viewership = netflix_data.groupby('Content Type')['Hours Viewed'].sum()

fig = go.Figure(data=[
    go.Bar(
        x=content_type_viewership.index,
        y=content_type_viewership.values,
        marker_color=['skyblue', 'salmon']
    )
])

fig.update_layout(
    title='Total Viewership Hours by Content Type (2023)',
    xaxis_title='Content Type',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=0,
    height=500,
    width=800
)

fig.show()

The plot above indicates that Shows account for more total viewership hours on Netflix than Movies.
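A total can be driven by the number of titles as much as by how well each title performs, so it may help to look at counts, per-title averages, and shares alongside the sum. The sketch below uses a small made-up frame with the same column names as the dataset; the numbers are illustrative only.

```python
import pandas as pd

# Made-up rows using the dataset's column names (illustrative values only).
df = pd.DataFrame({
    "Content Type": ["Show", "Show", "Show", "Movie", "Movie"],
    "Hours Viewed": [400.0, 300.0, 100.0, 150.0, 50.0],
})

# Total hours, number of titles, and mean hours per title for each type.
summary = df.groupby("Content Type")["Hours Viewed"].agg(["sum", "count", "mean"])

# Share of overall viewership contributed by each content type, in percent.
summary["share_pct"] = summary["sum"] / summary["sum"].sum() * 100

print(summary)
```

On the real data, the same call would be applied to `netflix_data` instead of the sample frame.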

Now let's visualize viewership across different languages.

In [4]:
language_viwership=netflix_data.groupby('Language Indicator')['Hours Viewed'].sum().sort_values(ascending=False)
fig = go.Figure(data=[go.Bar(
    x=language_viwership.index,
    y=language_viwership.values,
    marker_color='lightcoral')])
# The lines above create a Figure object with plotly.graph_objects (Plotly is a Python data visualization library).

fig.update_layout(
    title='Total Viewership Hours by Language Indicator (2023)',
    xaxis_title='Language',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis_tickangle=45,   # Rotate the x-axis labels by 45 degrees for better readability.
    height=600,
    width=1000
    )
fig.show()

Next, visualize viewership by release month to look for seasonality or patterns around specific months.

In [5]:
netflix_data['Release Date'] = pd.to_datetime(netflix_data['Release Date'])
netflix_data['Month'] = netflix_data['Release Date'].dt.month

monthly_viwership = netflix_data.groupby('Month')['Hours Viewed'].sum()
fig=go.Figure(data=[
    go.Scatter(
        x=monthly_viwership.index,
        y=monthly_viwership.values,
        mode='lines+markers',
        marker=dict(color='blue'),
        line=dict(color='blue')
    )
])
fig.update_layout(
    title='Total Viewership Hours by Month (2023)',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed (in billions)',
    xaxis=dict(tickmode='array',
               tickvals=list(range(1, 13)),
               ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']),
    height=500,
    width=800
)
fig.show()
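Rows without a valid release date get no month and drop out of the monthly totals, so it can be worth checking how much viewership the dated rows actually cover. The sketch below uses a made-up frame; whether the real file has missing dates is an assumption to verify, not something shown here.

```python
import pandas as pd

# Made-up rows, including a missing and an unparseable date (illustrative only).
df = pd.DataFrame({
    "Release Date": ["2023-03-23", "2023-01-05", None, "not a date"],
    "Hours Viewed": [10.0, 20.0, 30.0, 40.0],
})

# Invalid or missing dates become NaT instead of raising an error.
df["Release Date"] = pd.to_datetime(df["Release Date"], errors="coerce")

# Fraction of total hours that belongs to rows with a usable date.
dated = df.dropna(subset=["Release Date"])
covered_share = dated["Hours Viewed"].sum() / df["Hours Viewed"].sum()

# Monthly totals over dated rows, with every month 1..12 present (0 if empty).
monthly = (
    dated.groupby(dated["Release Date"].dt.month)["Hours Viewed"]
    .sum()
    .reindex(range(1, 13), fill_value=0.0)
)

print(f"share of hours with a release date: {covered_share:.0%}")
print(monthly)
```

Reindexing to all twelve months keeps the line chart from skipping months that have no releases.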

Let's look at the top 5 titles by viewership hours.

In [9]:
top_5_titles=netflix_data.nlargest(5,'Hours Viewed')
top_5_titles[['Title' ,'Hours Viewed','Language Indicator','Content Type','Release Date']]

Unnamed: 0,Title,Hours Viewed,Language Indicator,Content Type,Release Date
0,The Night Agent: Season 1,812100000.0,English,Show,2023-03-23
1,Ginny & Georgia: Season 2,665100000.0,English,Show,2023-01-05
18227,King the Land: Limited Series // 킹더랜드: 리미티드 시리즈,630200000.0,Korean,Movie,2023-06-17
2,The Glory: Season 1 // 더 글로리: 시즌 1,622800000.0,Korean,Show,2022-12-30
18214,ONE PIECE: Season 1,541900000.0,English,Show,2023-08-31


Let's look at monthly viewership trends by Content Type.

In [16]:
monthly_viewership_by_type = netflix_data.pivot_table(index='Month',
                                                      columns='Content Type',
                                                      values='Hours Viewed',
                                                      aggfunc='sum')
fig=go.Figure()
for content_type in monthly_viewership_by_type.columns:
  fig.add_trace(
      go.Scatter(
          x=monthly_viewership_by_type.index,
          y=monthly_viewership_by_type[content_type],
          mode='lines+markers',
          name=content_type
          )
      )
fig.update_layout(
    title='Viewership Trends by Content Type and Release Month',
    xaxis_title='Month',
    yaxis_title='Total Hours Viewed',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1,13)),
        ticktext=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']),
    height=600,
    width=1000,
    legend_title='Content Type'
)
fig.show()

Next, compare total viewership hours across release seasons.

In [19]:
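def get_season(month):
  if month in [12,1,2]:
    return 'Winter'
  elif month in [3,4,5]:
    return 'Spring'
  elif month in [6,7,8]:
    return 'Summer'
  elif month in [9,10,11]:
    return 'Fall'
  else:
    # Missing or invalid month (e.g. NaN from a missing release date):
    # leave unassigned so it is not counted as 'Fall'.
    return None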
netflix_data['Release Season']= netflix_data['Month'].apply(get_season)
seasonal_viewership=netflix_data.groupby("Release Season")['Hours Viewed'].sum()
seasonal_order=['Winter','Spring','Summer','Fall']
seasonal_viewership=seasonal_viewership.reindex(seasonal_order)
fig=go.Figure(data=[
    go.Bar(
        x=seasonal_viewership.index,
        y=seasonal_viewership.values,
        marker_color='orange'
    )
])
fig.update_layout(
    title='Total Viewership Hours by Release Season',
    xaxis_title='Season',
    yaxis_title='Total Hours Viewed',
    xaxis_tickangle=0,
    height=500,
    width=800,
    xaxis=dict(
        categoryorder='array',
        categoryarray=seasonal_order
    )
)
fig.show()
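The month-to-season step can also be written as a dictionary lookup with `Series.map`. This is a sketch of an alternative rather than the notebook's own method: the name `SEASON_BY_MONTH` is introduced here for illustration, and any month that is not a key (for example a missing value) stays missing rather than being assigned to a season.

```python
import pandas as pd

# Hypothetical lookup table: calendar month -> meteorological season.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Example months; the trailing None stands in for a row with no release date.
months = pd.Series([1, 4, 7, 10, 12, None])

# Months without a matching key (including NaN) map to NaN.
seasons = months.map(SEASON_BY_MONTH)
print(seasons)
```

Because unmatched months remain NaN, a later `groupby` on the season column leaves those rows out of every season's total.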

Let's analyze the number of content releases and their viewership hours across months.

In [25]:
monthly_releases=netflix_data['Month'].value_counts().sort_index()
monthly_viewership=netflix_data.groupby('Month')['Hours Viewed'].sum()
fig=go.Figure()
fig.add_trace(
    go.Bar(
        x=monthly_releases.index,
        y=monthly_releases.values,
        name='Number of Releases',
        marker_color='goldenrod',
        opacity=0.7,
        yaxis='y1'
    )
)
fig.add_trace(
    go.Scatter(
        x=monthly_viewership.index,
        y=monthly_viewership.values,
        name='Viewership Hours',
        mode='lines+markers',
        marker=dict(color='red'),
        line=dict(color='red'),
        yaxis='y2'
    )
)
fig.update_layout(
    title='Monthly Release Patterns and Viewership Hours',
    xaxis=dict(
        title='Month',
        tickmode='array',
        tickvals=list(range(1,13)),
        ticktext=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
    ),
    yaxis=dict(
        title='Number of Releases',
        showgrid=False,
        side='left'
    ),
    yaxis2=dict(
        title='Total Hours Viewed (in billions)',
        overlaying='y',
        side='right',
        showgrid=False
    ),
    legend=dict(
        x=1.05,
        y=1,
        orientation='v',
        xanchor='left'
    ),
    height=600,
    width=1000
    )
fig.show()

**Visualizing weekly trends in the number of releases and total hours viewed**

In [29]:
netflix_data['Release Day'] = netflix_data['Release Date'].dt.day_name()
weekday_releases=netflix_data['Release Day'].value_counts().reindex(
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
)
weekday_viewership=netflix_data.groupby('Release Day')['Hours Viewed'].sum().reindex(
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
)
fig=go.Figure()
fig.add_trace(
    go.Bar(
        x=weekday_releases.index,
        y=weekday_releases.values,
        name='Number of Releases',
        marker_color='blue',
        opacity=0.6,
        yaxis='y1'
    )
)
fig.add_trace(
    go.Scatter(
        x=weekday_viewership.index,
        y=weekday_viewership.values,
        name='Viewership Hours',
        mode='lines+markers',
        marker=dict(color='red'),
        line=dict(color='red'),
        yaxis='y2'
    )
)
fig.update_layout(
    title='Weekly Release Patterns and Viewership Hours',
    xaxis=dict(
        title='Day of the week',
        categoryorder='array',
        categoryarray=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
    ),
    yaxis=dict(
        title='Number of Releases',
        showgrid=False,
        side='left'
    ),
    yaxis2=dict(
        title='Total Hours Viewed (in billions)',
        overlaying='y',
        side='right',
        showgrid=False
    ),
    legend=dict(
        x=1.05,
        y=1,
        orientation='v',
        xanchor='left'
    ),
    height=600,
    width=1000
)
fig.show()
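A day with many releases will tend to have high total hours simply because of volume. One way to separate the two effects is to divide total hours by the number of releases per weekday. The sketch below does this on a made-up frame; the output column names `releases`, `total_hours` and `hours_per_release` are chosen here for illustration.

```python
import pandas as pd

# Made-up rows (illustrative values only): two Thursdays, a Friday, a Saturday.
df = pd.DataFrame({
    "Release Date": pd.to_datetime(
        ["2023-03-23", "2023-01-05", "2023-05-05", "2023-06-17"]
    ),
    "Hours Viewed": [800.0, 600.0, 100.0, 300.0],
})
df["Release Day"] = df["Release Date"].dt.day_name()

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Count and total per weekday, in calendar order (days with no rows are NaN).
by_day = (
    df.groupby("Release Day")["Hours Viewed"]
    .agg(releases="count", total_hours="sum")
    .reindex(day_order)
)

# Average hours per release normalizes away differences in release volume.
by_day["hours_per_release"] = by_day["total_hours"] / by_day["releases"]
print(by_day)
```

Comparing `hours_per_release` with `total_hours` shows whether a weekday stands out because of stronger titles or only because more titles land on it.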

# **Conclusion**:
The content strategy of Netflix revolves around maximizing viewership through targeted release timing and content variety.

Shows consistently outperform movies in viewership, with significant spikes in December and June, indicating strategic releases around these periods.

The Fall season stands out as the peak time for audience engagement.

Most content is released on Fridays, which aims to capture viewers right before the weekend, and viewership aligns strongly with this release pattern.

While the number of releases is steady throughout the year, viewership varies, which suggests a focus on high-impact titles and optimal release timing over sheer volume.
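The claim that release counts are steady while viewership varies can be checked numerically. A minimal sketch, assuming monthly series shaped like `monthly_releases` and `monthly_viewership` above, compares their coefficients of variation and their correlation. The numbers below are made up for illustration, and `coefficient_of_variation` is a helper defined here, not a pandas function.

```python
import pandas as pd

# Made-up monthly series (illustrative values only), indexed by month number.
monthly_releases = pd.Series([100, 105, 98, 102], index=[1, 2, 3, 4])
monthly_hours = pd.Series([5.0, 9.0, 4.0, 10.0], index=[1, 2, 3, 4])


def coefficient_of_variation(series: pd.Series) -> float:
    """Standard deviation relative to the mean; higher means less steady."""
    return float(series.std() / series.mean())


cv_releases = coefficient_of_variation(monthly_releases)
cv_hours = coefficient_of_variation(monthly_hours)

# Pearson correlation between release volume and viewership across months.
corr = float(monthly_releases.corr(monthly_hours))

print(f"CV of releases: {cv_releases:.3f}")
print(f"CV of hours:    {cv_hours:.3f}")
print(f"correlation:    {corr:.3f}")
```

A much larger coefficient of variation for hours than for releases, together with a weak correlation, would support the reading that viewership depends on individual titles and timing more than on volume.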