In [None]:
# Load the Netflix titles dataset and preview the first rows
import pandas as pd

df = pd.read_csv('/kaggle/input/datasets/shivamb/netflix-shows/netflix_titles.csv')
df.head()




In [None]:
df.shape


In [None]:
df.columns


In [None]:
df.info()


In [None]:
df.isnull().sum()


In [None]:
df['type'].value_counts()


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(data=df, x='type')
plt.title('Count of Movies vs TV Shows on Netflix')
plt.show()


Quick Insight:
There are more Movies than TV Shows in this dataset.
This gives a first impression of how Netflix content is distributed.
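
To put a number on that first impression, the counts can be converted to shares. This is a minimal sketch on a made-up frame (`toy` is hypothetical and stands in for `df`), so the printed figures are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df; the notebook itself would use df['type'] directly.
toy = pd.DataFrame({'type': ['Movie', 'Movie', 'Movie', 'TV Show']})

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages.
type_share = toy['type'].value_counts(normalize=True).mul(100).round(1)
print(type_share)
```

On the real data, replacing `toy` with `df` should give the Movie and TV Show percentages behind the count plot.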


In [None]:
df['date_added'] = df['date_added'].str.strip()
df['date_added'] = pd.to_datetime(df['date_added'], format='%B %d, %Y', errors='coerce')

In [None]:
# A notebook cell only displays its last bare expression, so print both results.
print(df['date_added'].head())
print(df['date_added'].isna().sum())
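
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it can help to see which raw values were rejected. The sketch below assumes a few made-up sample strings (`raw_dates` is hypothetical, not taken from the dataset) and separates values that were missing from the start from values that failed the format.

```python
import pandas as pd

# Hypothetical sample values; 'not a date' does not match the expected format.
raw_dates = pd.Series([' September 25, 2021', 'August 1, 2019 ', 'not a date', None])

parsed = pd.to_datetime(raw_dates.str.strip(), format='%B %d, %Y', errors='coerce')

# Values present before parsing but NaT afterwards were rejected by the format.
rejected = raw_dates[parsed.isna() & raw_dates.notna()]
print(rejected.tolist())
```

In the notebook, the same check would need a copy of the original `date_added` column taken before it is overwritten.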


In [None]:
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
df['weekday_added'] = df['date_added'].dt.day_name()

In [None]:
df['year_added'].value_counts().sort_index()


In [None]:
df['month_added'].value_counts().sort_index()


In [None]:
df['weekday_added'].value_counts()


In [None]:
df['year_added'].value_counts().sort_index().plot(kind='bar', figsize=(10,4))

In [None]:
df['year_added'].value_counts().sort_index().plot(kind='bar', figsize=(12,4))

In [None]:
df['month_added'].value_counts().sort_index().plot(kind='bar', figsize=(10,4))


In [None]:
df['weekday_added'].value_counts().plot(kind='bar', figsize=(10,4))

In [None]:
df.groupby(['year_added', 'type']).size().unstack(fill_value=0)


In [None]:
(df.groupby('year_added')['type']
   .value_counts(normalize=True)
   .unstack()
   .fillna(0))


# 📊 Netflix Content Dashboard (2008–2021)

This dashboard summarizes key trends in Netflix's catalog, including growth over time, seasonal patterns, release-day behavior, and the balance between movies and TV shows.

---

## 📈 Titles Added Per Year

In [None]:
df['year_added'].value_counts().sort_index().plot(
    kind='bar',
    figsize=(12,4),
    title='Titles Added Per Year'
)

## 📅 Titles Added Per Month

In [None]:
df['month_added'].value_counts().sort_index().plot(
    kind='bar',
    figsize=(12,4),
    title='Titles Added Per Month'
)

## 📆 Titles Added by Weekday

In [None]:
df['weekday_added'].value_counts().plot(
    kind='bar',
    figsize=(12,4),
    title='Titles Added by Weekday'
)
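
`value_counts()` sorts by frequency, so the bars above appear from most to least common day. If calendar order is preferred, the counts can be reindexed. This is a sketch on made-up dates (`dates` is hypothetical and stands in for `df['date_added']`).

```python
import pandas as pd

# Hypothetical dates standing in for df['date_added'].
dates = pd.to_datetime(pd.Series(['2021-01-01', '2021-01-08', '2021-01-04', '2021-01-06']))

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# reindex puts the counts in calendar order; fill_value=0 keeps days with no titles.
weekday_counts = dates.dt.day_name().value_counts().reindex(weekday_order, fill_value=0)
print(weekday_counts)
```

Calling `.plot(kind='bar')` on `weekday_counts` would then draw Monday through Sunday from left to right.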


## 🎬 Movies vs TV Shows Added Per Year

In [None]:
df.groupby(['year_added', 'type']).size().unstack(fill_value=0).plot(
    kind='bar',
    figsize=(12,5),
    title='Movies vs TV Shows Added Per Year'
)


## 📊 Percent Share of Movies vs TV Shows Over Time

In [None]:
(df.groupby('year_added')['type']
   .value_counts(normalize=True)
   .unstack(fill_value=0)
   .mul(100)  # convert fractions to percentages to match the title
   .plot(kind='bar', figsize=(12,5),
         title='Percent Share of Movies vs TV Shows'))
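
An alternative way to build the same share table is `pd.crosstab` with `normalize='index'`, which makes each year's row sum to 1. The sketch below uses a made-up frame (`toy` is hypothetical), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df with the two columns the share table needs.
toy = pd.DataFrame({
    'year_added': [2019, 2019, 2019, 2019, 2020, 2020],
    'type': ['Movie', 'Movie', 'Movie', 'TV Show', 'Movie', 'TV Show'],
})

# normalize='index' divides each row by its total; scale to percentages.
share_pct = pd.crosstab(toy['year_added'], toy['type'], normalize='index').mul(100)
print(share_pct.round(1))
```

Plotting `share_pct` with `kind='bar', stacked=True` would give bars that each reach 100, which can make the shift in mix easier to read.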


## 🔢 Ratings Distribution

In [None]:
df['rating'].value_counts().plot(
    kind='bar',
    figsize=(12,4),
    title='Ratings Distribution'
)

## 🎭 Top Genres on Netflix

In [None]:
df['listed_in'].str.split(', ').explode().value_counts().head(15).plot(
    kind='bar',
    figsize=(12,4),
    title='Top 15 Genres'
)

## 🎞️ Movie Runtime Distribution

In [None]:
# expand=False returns a Series rather than a one-column DataFrame
movie_minutes = (df.loc[df['type'] == 'Movie', 'duration']
                   .str.extract(r'(\d+)', expand=False)
                   .astype(float))
movie_minutes.plot(
    kind='hist',
    bins=30,
    figsize=(12,4),
    title='Distribution of Movie Runtimes (Minutes)'
)

## 📺 TV Show Season Counts

In [None]:
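# expand=False returns a Series, so value_counts() yields a plain integer index
tv_seasons = (df.loc[df['type'] == 'TV Show', 'duration']
                .str.extract(r'(\d+)', expand=False)
                .dropna()  # astype(int) raises on missing values
                .astype(int))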
tv_seasons.value_counts().sort_index().plot(
    kind='bar',
    figsize=(12,4),
    title='Number of Seasons per TV Show'
)
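
The two cells above parse `duration` separately for movies and TV shows. One possible alternative is to split the column once into a number and a unit. This is a sketch on made-up rows (`toy`, `duration_value`, and `duration_unit` are hypothetical names), assuming durations look like `'90 min'` or `'2 Seasons'`.

```python
import pandas as pd

# Hypothetical rows; the last one has a missing duration.
toy = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'TV Show', 'Movie'],
    'duration': ['90 min', '125 min', '1 Season', '3 Seasons', None],
})

# Named groups give labeled columns; rows that do not match become NaN.
parts = toy['duration'].str.extract(r'(?P<value>\d+)\s*(?P<unit>min|Season)')
toy['duration_value'] = pd.to_numeric(parts['value'])
toy['duration_unit'] = parts['unit']

movie_minutes = toy.loc[toy['duration_unit'] == 'min', 'duration_value']
tv_seasons = toy.loc[toy['duration_unit'] == 'Season', 'duration_value'].astype(int)
print(movie_minutes.tolist(), tv_seasons.tolist())
```

Filtering on the parsed unit instead of on `type` also keeps rows with a missing or unexpected duration out of both series.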

## 📝 Key Insights

- Netflix's catalog grew rapidly from 2015 to 2020.
- July and December are peak months for new content.
- Fridays dominate as the primary release day.
- Movies remain the majority, but TV shows have grown to ~30% of yearly additions.
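
Claims like the ones above can be read off the count tables with `idxmax()` rather than by eye. The sketch below runs on made-up dates (`toy_dates` is hypothetical), so its output says nothing about the real catalog.

```python
import pandas as pd

# Hypothetical dates standing in for df['date_added'].
toy_dates = pd.to_datetime(pd.Series([
    '2019-07-05', '2019-07-12', '2019-12-06', '2020-07-03', '2020-02-10',
]))

# idxmax() returns the label (year, month, weekday) with the highest count.
peak_year = toy_dates.dt.year.value_counts().idxmax()
peak_month = toy_dates.dt.month.value_counts().idxmax()
peak_weekday = toy_dates.dt.day_name().value_counts().idxmax()
print(peak_year, peak_month, peak_weekday)
```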

# 🧠 Project Summary

This analysis explores Netflix's content catalog from 2008 to 2021, focusing on growth patterns, seasonal trends, and the balance between movies and TV shows. Key findings include:

### 📈 Growth Over Time
- Netflix's catalog expanded rapidly between 2015 and 2020.
- 2019 was the peak year for new additions.

### 📅 Seasonal Patterns
- July and December are the busiest months for new content.
- February has the fewest additions in the aggregated monthly counts.
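
Monthly counts aggregated over all years cannot show whether the same month is quietest in every year. A per-year breakdown is one way to check. This is a sketch on a made-up frame (`toy` is hypothetical), where the quietest month deliberately differs between the two years.

```python
import pandas as pd

# Hypothetical stand-in for df[['year_added', 'month_added']].
toy = pd.DataFrame({
    'year_added':  [2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020],
    'month_added': [2, 7, 7, 12, 12, 2, 2, 7, 12, 12],
})

# Rows are years, columns are months, cells are counts (0 where a month is absent).
by_year_month = pd.crosstab(toy['year_added'], toy['month_added'])

# idxmin(axis=1) returns the month with the fewest additions in each year.
quietest_month = by_year_month.idxmin(axis=1)
print(quietest_month)
```

If the real data returns the same month for most years, the seasonal claim holds year by year and not only in aggregate.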

### 📆 Release-Day Behavior
- Fridays dominate as Netflix's primary release day.
- Weekends have the fewest additions.

### 🎬 Movies vs TV Shows
- Movies remain the majority of yearly additions.
- TV shows grew significantly after 2015 and now represent ~30% of additions.

### 🔢 Ratings & Genres
- TV-MA is the most common rating.
- International content and documentaries are strongly represented.

This dashboard provides a clear overview of how Netflix has evolved its content strategy over time.