# What's On Netflix? A Visual Deep Dive Into 8,800 Titles

Netflix went from mailing DVDs to becoming the most influential content platform on the planet. But what does its library actually look like under the hood?

This notebook tears apart 8,807 titles spanning movies and TV shows to uncover patterns in genre, geography, ratings, duration, and how Netflix's content strategy has shifted over the years. We are going heavy on the visuals here because with a dataset this rich, the charts tell the story better than words can.

**Dataset:** Netflix Movies and TV Shows (Kaggle)  
**Records:** 8,807 | **Columns:** 12 | **Timeframe:** Content added 2008 through 2021


## 1. Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.patches as mpatches
import matplotlib.gridspec as gridspec
from matplotlib.lines import Line2D
from matplotlib import cm
import seaborn as sns
import squarify
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# ---- Light theme with custom Netflix-inspired palette ----
plt.rcParams.update({
    'figure.facecolor': '#FAFAFA',
    'axes.facecolor': '#FAFAFA',
    'axes.edgecolor': '#CCCCCC',
    'axes.grid': True,
    'grid.color': '#E8E8E8',
    'grid.linewidth': 0.5,
    'font.family': 'sans-serif',
    'font.size': 11,
    'text.color': '#2D2D2D',
    'axes.labelcolor': '#2D2D2D',
    'xtick.color': '#555555',
    'ytick.color': '#555555',
})

# Color palettes
NETFLIX_RED = '#E50914'
NETFLIX_DARK = '#221F1F'
PALETTE_MAIN = ['#E50914', '#1A1A2E', '#16213E', '#0F3460', '#533483', '#E94560']
PALETTE_WARM = ['#E50914', '#FF6B35', '#FFB563', '#FFC6AC', '#D4A5A5', '#9B4DCA']
PALETTE_GENRES = ['#E50914', '#FF6347', '#3D405B', '#81B29A', '#F2CC8F',
                  '#E07A5F', '#8338EC', '#06D6A0', '#118AB2', '#073B4C',
                  '#FFD166', '#EF476F', '#26547C', '#84A98C', '#B5838D']
TYPE_COLORS = {'Movie': '#E50914', 'TV Show': '#1A1A2E'}

def style_ax(ax, title='', xlabel='', ylabel='', title_size=15):
    ax.set_title(title, fontsize=title_size, fontweight='bold', pad=14, loc='left', color='#1A1A2E')
    ax.set_xlabel(xlabel, fontsize=11, color='#555555')
    ax.set_ylabel(ylabel, fontsize=11, color='#555555')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_linewidth(0.5)
    ax.spines['bottom'].set_linewidth(0.5)

def annotate_bars(ax, bars, fmt='{:.0f}', fontsize=8.5, offset=0, color='#333333', bold=True):
    for bar in bars:
        h = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2, h + offset,
                fmt.format(h), ha='center', va='bottom',
                fontsize=fontsize, fontweight='bold' if bold else 'normal', color=color)

print("All set. Let's dig in.")


## 2. Data Loading and Cleaning

In [None]:
df = pd.read_csv('/kaggle/input/netflix-dataset/netflix_titles.csv')

# Parse dates
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), format='%B %d, %Y', errors='coerce')
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
df['month_name'] = df['date_added'].dt.strftime('%b')

# Parse duration: the leading number is minutes for movies and a season count for TV shows
df['duration_min'] = df['duration'].str.extract(r'(\d+)', expand=False).astype(float)
df['is_movie'] = df['type'] == 'Movie'

# Primary country (first listed)
df['primary_country'] = df['country'].str.split(',').str[0].str.strip()

# Extract genres into a flat series for analysis
genre_series = df['listed_in'].str.split(', ').explode().str.strip()

print(f"Loaded {len(df):,} titles")
print(f"Movies: {df['is_movie'].sum():,} | TV Shows: {(~df['is_movie']).sum():,}")
print(f"Null values:\n{df[['director','cast','country','date_added','rating']].isnull().sum().to_string()}")
print(f"\nDate range: {df['date_added'].min().date()} to {df['date_added'].max().date()}")


## 3. The Netflix Content Explosion

Netflix started streaming in 2007 with about 1,000 titles. By 2021, its library had grown to nearly 9,000. But the growth was not linear. There was a massive acceleration starting around 2016 when Netflix pivoted hard into original and licensed international content. Let's visualize that trajectory.


In [None]:
# Stacked area chart: content additions by type over time
yearly = df.groupby(['year_added', 'type']).size().unstack(fill_value=0)
yearly = yearly.loc[2015:]  # Focus on meaningful years

fig, ax = plt.subplots(figsize=(15, 7))

ax.fill_between(yearly.index, 0, yearly['Movie'], alpha=0.85, color='#E50914', label='Movies')
ax.fill_between(yearly.index, yearly['Movie'], yearly['Movie'] + yearly['TV Show'],
                alpha=0.85, color='#1A1A2E', label='TV Shows')

# Annotate totals
for yr in yearly.index:
    total = yearly.loc[yr].sum()
    ax.text(yr, total + 30, f"{int(total)}", ha='center', va='bottom',
            fontsize=10, fontweight='bold', color='#333333')

# Annotate the peak
peak_yr = yearly.sum(axis=1).idxmax()
ax.annotate(f'Peak: {int(yearly.sum(axis=1).max())} titles added',
            xy=(peak_yr, yearly.sum(axis=1).max()),
            xytext=(peak_yr - 1.5, yearly.sum(axis=1).max() + 200),
            arrowprops=dict(arrowstyle='->', color='#E50914', lw=1.5),
            fontsize=11, fontweight='bold', color='#E50914')

ax.set_xlim(2015, 2021)
ax.set_xticks(range(2015, 2022))
style_ax(ax, 'Netflix Content Additions by Year', xlabel='Year', ylabel='Titles Added')
ax.legend(fontsize=12, loc='upper left', framealpha=0.9)

plt.tight_layout()
plt.show()


**Insight:** Netflix hit its content acquisition peak in 2019 with over 2,000 titles added in a single year. After that, additions declined, likely driven by a combination of COVID production shutdowns and Netflix's strategic shift toward fewer but higher quality originals. TV shows also hold a sizable share of additions: by 2021, they made up roughly a third of everything added that year. The next section breaks that mix down year by year.
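To make the peak and the TV share concrete, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical stand-ins for `df`; only the groupby/unstack pattern mirrors the chart cell above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `df` (values are made up)
toy = pd.DataFrame({
    'year_added': [2019, 2019, 2019, 2020, 2020, 2021],
    'type': ['Movie', 'Movie', 'TV Show', 'Movie', 'TV Show', 'TV Show'],
})

# Same groupby/unstack pattern as the chart cell: one row per year, one column per type
toy_yearly = toy.groupby(['year_added', 'type']).size().unstack(fill_value=0)

# Total additions per year and the year with the most additions
toy_totals = toy_yearly.sum(axis=1)
toy_peak_year = toy_totals.idxmax()

# TV share of each year's additions, in percent
toy_tv_share = (toy_yearly['TV Show'] / toy_totals * 100).round(1)

print(f"Peak year: {toy_peak_year}")
print(toy_tv_share.to_string())
```

Swapping `toy` for `df` should give the real figures behind the chart, assuming `year_added` has been derived as in the loading cell.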


## 4. The Strategic Shift: Movies vs TV Shows Over Time

Netflix clearly started as a movie platform. But is the ratio changing? Let's look at how the Movie to TV Show balance has evolved year by year.


In [None]:
# Content mix by year (left panel) and overall library split (right panel)
yearly_pct = yearly.div(yearly.sum(axis=1), axis=0) * 100

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6), gridspec_kw={'width_ratios': [2, 1]})

# Left: Stacked percentage area
ax1.fill_between(yearly_pct.index, 0, yearly_pct['Movie'], alpha=0.8, color='#E50914', label='Movies')
ax1.fill_between(yearly_pct.index, yearly_pct['Movie'], 100, alpha=0.8, color='#1A1A2E', label='TV Shows')

# Add percentage labels
for yr in yearly_pct.index:
    mv = yearly_pct.loc[yr, 'Movie']
    tv = yearly_pct.loc[yr, 'TV Show']
    ax1.text(yr, mv/2, f"{mv:.0f}%", ha='center', va='center', fontsize=9,
             fontweight='bold', color='white')
    ax1.text(yr, mv + tv/2, f"{tv:.0f}%", ha='center', va='center', fontsize=9,
             fontweight='bold', color='white')

ax1.set_xlim(2015, 2021)
ax1.set_xticks(range(2015, 2022))
ax1.set_ylim(0, 100)
ax1.yaxis.set_major_formatter(mticker.PercentFormatter())
style_ax(ax1, 'Content Mix Evolution (% of Additions)', xlabel='Year')
ax1.legend(fontsize=11, loc='upper right')

# Right: Overall library donut
sizes = df['type'].value_counts()
colors = [TYPE_COLORS[t] for t in sizes.index]
wedges, texts, autotexts = ax2.pie(sizes, labels=None, colors=colors,
                                     autopct='%1.1f%%', startangle=90,
                                     pctdistance=0.75, wedgeprops=dict(width=0.4, edgecolor='white', linewidth=2))
for autotext in autotexts:
    autotext.set_fontsize(13)
    autotext.set_fontweight('bold')
    autotext.set_color('white')

centre_circle = plt.Circle((0, 0), 0.55, fc='#FAFAFA')
ax2.add_artist(centre_circle)
ax2.text(0, 0.05, f"{len(df):,}", ha='center', va='center', fontsize=22, fontweight='bold', color='#1A1A2E')
ax2.text(0, -0.12, "Total Titles", ha='center', va='center', fontsize=10, color='#888888')
ax2.legend(sizes.index, loc='lower center', fontsize=11, ncol=2, frameon=False,
           bbox_to_anchor=(0.5, -0.05))
ax2.set_title('Full Library Split', fontsize=14, fontweight='bold', pad=12, loc='center', color='#1A1A2E')

plt.tight_layout()
plt.show()


**Insight:** The shift toward TV shows is real but modest when measured at the endpoints. In 2015, about 68% of new additions were movies. By 2021, that figure was roughly 66%, with TV shows making up about a third of new content. The overall library is still about 70/30 in favor of movies, so the percentage labels on the chart are the best guide to what happened in the years in between. Investing in serialized content makes strategic sense because TV shows drive longer engagement and subscription retention.


## 5. The Genre Landscape

Netflix tags each title with one or more genres. What does the overall genre map look like? A treemap gives us both the ranking and the relative scale in one shot.


In [None]:
# Treemap of top genres
genre_counts = genre_series.value_counts().head(18)

fig, ax = plt.subplots(figsize=(16, 9))

# Calculate sizes and create labels
labels = [f"{g}\n{c:,}" for g, c in zip(genre_counts.index, genre_counts.values)]
colors = PALETTE_GENRES + ['#D4A5A5', '#B8B8D1', '#95B8D1']

squarify.plot(sizes=genre_counts.values,
              label=labels,
              color=colors[:len(genre_counts)],
              alpha=0.88,
              text_kwargs={'fontsize': 10, 'fontweight': 'bold', 'color': 'white'},
              edgecolor='white', linewidth=2.5,
              ax=ax)

ax.set_title('Netflix Genre Landscape (Top 18 Genres)', fontsize=18, fontweight='bold',
             pad=15, loc='left', color='#1A1A2E')
ax.axis('off')

plt.tight_layout()
plt.show()


**Insight:** "International Movies" dominates the genre tags with over 2,700 titles, followed by Dramas (2,400+) and Comedies (1,670+). This tells us something important about Netflix's global strategy: international content is not a niche category. It is the backbone of the library. Also interesting is how "International TV Shows" ranks 4th, reinforcing the idea that Netflix bets heavily on non-US content across both formats. Stand-Up Comedy at around 340 titles is a surprisingly large standalone category, reflecting Netflix's well known investment in comedy specials.
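One caveat when reading these numbers: a title can carry several tags, so tag counts add up to more than the number of titles. A minimal sketch of the split-and-explode counting, using a hypothetical three-row `toy` frame rather than the real data:

```python
import pandas as pd

# Hypothetical toy data: each title carries one or more comma-separated tags
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'listed_in': [
        'Dramas, International Movies',
        'Comedies',
        'Dramas, Comedies, International Movies',
    ],
})

# One row per (title, tag) pair, then count how often each tag appears
toy_tags = toy['listed_in'].str.split(', ').explode().str.strip()
toy_tag_counts = toy_tags.value_counts()

print(toy_tag_counts.to_string())
print(f"{len(toy)} titles carry {toy_tag_counts.sum()} tags in total")
```

Because of this double counting, the treemap areas are best read as "how often a tag is used", not as a partition of the library.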


## 6. Genre Split: Movies vs TV Shows

Not all genres are created equal across formats. Some genres lean heavily toward movies, others toward TV. Let's use a diverging bar chart to compare.


In [None]:
genre_df = df.copy()
genre_df['genre'] = genre_df['listed_in'].str.split(', ')
genre_df = genre_df.explode('genre')
genre_df['genre'] = genre_df['genre'].str.strip()

top_genres = genre_df['genre'].value_counts().head(14).index
genre_type_counts = genre_df[genre_df['genre'].isin(top_genres)].groupby(['genre', 'type']).size().unstack(fill_value=0)
genre_type_counts = genre_type_counts.loc[top_genres]  # Keep the order

fig, ax = plt.subplots(figsize=(14, 8))

y_pos = np.arange(len(genre_type_counts))
movie_vals = genre_type_counts.get('Movie', 0)
tv_vals = genre_type_counts.get('TV Show', 0)

# Movies go right, TV shows go left
bars_m = ax.barh(y_pos, movie_vals, height=0.6, color='#E50914', alpha=0.85, label='Movies', edgecolor='white')
bars_t = ax.barh(y_pos, -tv_vals, height=0.6, color='#1A1A2E', alpha=0.85, label='TV Shows', edgecolor='white')

# Labels
for i, (m, t) in enumerate(zip(movie_vals, tv_vals)):
    if m > 0:
        ax.text(m + 15, i, f"{m}", va='center', fontsize=9, fontweight='bold', color='#E50914')
    if t > 0:
        ax.text(-t - 15, i, f"{t}", va='center', ha='right', fontsize=9, fontweight='bold', color='#1A1A2E')

ax.set_yticks(y_pos)
ax.set_yticklabels(genre_type_counts.index, fontsize=11)
ax.axvline(0, color='#999999', linewidth=0.8)
ax.set_xlabel('Number of Titles', fontsize=11)

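# Show absolute values on the x axis (TV counts are plotted as negatives).
# A formatter stays correct even if matplotlib recomputes the ticks at draw time.
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: f"{abs(int(v))}"))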

style_ax(ax, 'Genre Breakdown: Movies vs TV Shows', xlabel='Number of Titles')
ax.legend(fontsize=12, loc='lower right', framealpha=0.9)
ax.invert_yaxis()

plt.tight_layout()
plt.show()


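**Insight:** Most bars fall almost entirely on one side because the genre tags in this dataset are largely format-specific. Documentaries, Stand-Up Comedy, and Independent Movies are overwhelmingly film. Meanwhile, Crime TV Shows, Kids' TV, and Reality TV exist almost entirely in the TV space (naturally). International content is tagged separately by format too ("International Movies" versus "International TV Shows"), so the useful comparison is between counterpart tags rather than within a single tag. On that reading, international content is the theme Netflix populates most heavily across both movies and series.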
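A quick way to check how evenly a tag is split across formats is a cross-tabulation of tag against type. The sketch below uses a hypothetical four-row `toy` frame (the rows are made up); the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data: two movies and two TV shows with made-up tag combinations
toy = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'TV Show'],
    'listed_in': [
        'International Movies, Stand-Up Comedy',
        'International Movies',
        'International TV Shows, Stand-Up Comedy',
        'International TV Shows',
    ],
})

# One row per (title, tag) pair
toy_long = toy.assign(genre=toy['listed_in'].str.split(', ')).explode('genre')

# Rows are tags, columns are formats, cells are title counts
toy_ct = pd.crosstab(toy_long['genre'], toy_long['type'])

# Share of each tag's titles that are movies: 1.0 or 0.0 means the tag is format-specific
toy_movie_share = toy_ct['Movie'] / toy_ct.sum(axis=1)

print(toy_ct)
print(toy_movie_share.to_string())
```

Tags with a movie share of exactly 1.0 or 0.0 can only ever appear on one side of the diverging chart.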


## 7. Where Does Netflix Content Come From?

Netflix markets itself as a global platform. Let's see which countries actually produce the most content and how the split between movies and TV differs by geography.


In [None]:
# Top 15 countries -- lollipop chart with movie/tv split
country_counts = df['primary_country'].value_counts().head(15)
country_type = df[df['primary_country'].isin(country_counts.index)].groupby(['primary_country', 'type']).size().unstack(fill_value=0)
country_type = country_type.loc[country_counts.index]

fig, ax = plt.subplots(figsize=(15, 8))

y_pos = np.arange(len(country_type))[::-1]

# Stems
for i, (country, row) in enumerate(country_type.iterrows()):
    total = row.sum()
    pos = y_pos[i]
    ax.plot([0, total], [pos, pos], color='#E0E0E0', linewidth=2, zorder=1)
    # Movie dot
    ax.scatter(row.get('Movie', 0), pos, s=120, color='#E50914', zorder=3, edgecolor='white', linewidth=1)
    # TV dot
    ax.scatter(row.get('TV Show', 0), pos, s=120, color='#1A1A2E', zorder=3, edgecolor='white', linewidth=1)
    # Total dot
    ax.scatter(total, pos, s=160, color='#FFB563', zorder=3, edgecolor='white', linewidth=1.5, marker='D')
    # Total label
    ax.text(total + 30, pos, f"{total}", va='center', fontsize=10, fontweight='bold', color='#333333')

ax.set_yticks(y_pos)
ax.set_yticklabels(country_type.index, fontsize=11)
ax.set_xlim(-20, country_type.sum(axis=1).max() + 200)
style_ax(ax, 'Top 15 Content-Producing Countries', xlabel='Number of Titles')

legend_elements = [
    Line2D([0], [0], marker='o', color='w', markerfacecolor='#E50914', markersize=10, label='Movies'),
    Line2D([0], [0], marker='o', color='w', markerfacecolor='#1A1A2E', markersize=10, label='TV Shows'),
    Line2D([0], [0], marker='D', color='w', markerfacecolor='#FFB563', markersize=10, label='Total'),
]
ax.legend(handles=legend_elements, fontsize=11, loc='lower right', framealpha=0.9)

plt.tight_layout()
plt.show()


**Insight:** Counting only the first-listed country for each title, the US dominates with over 2,800 titles, nearly three times as many as India in second place. But India punches hard with almost 1,000 titles, the vast majority being movies (Bollywood and regional cinema). The UK, Japan, and South Korea round out the top 5 and each has a meaningful TV show presence. South Korea's inclusion here foreshadows the Squid Game era. What stands out is how top-heavy this distribution is: the US alone accounts for roughly a third of all Netflix content with a listed country.
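The "primary country" used above is just the first entry in the `country` field, which credits co-productions to a single country. A small sketch of how that differs from counting every listed country, on hypothetical rows (the `toy` frame and its values are made up):

```python
import pandas as pd

# Hypothetical toy data: some titles list several countries, one lists none
toy = pd.DataFrame({
    'country': [
        'United States, Canada',
        'India',
        None,
        ' United States',
        'India, United States',
    ],
})

# Primary country: first listed entry, whitespace stripped (same logic as the loading cell)
toy_primary = toy['country'].str.split(',').str[0].str.strip()

# Share among titles with a listed country (value_counts drops missing values by default)
toy_primary_share = toy_primary.value_counts(normalize=True)

# Alternative: credit every listed country, so co-productions count once per country
toy_any_counts = toy['country'].str.split(',').explode().str.strip().value_counts()

print(toy_primary_share.to_string())
print(toy_any_counts.to_string())
```

Neither convention is wrong, but they answer different questions, so it helps to state which one a ranking uses.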


## 8. How Long Are Netflix Movies?

Movie runtimes can vary wildly. Let's look at the distribution across different ratings to see if content targeted at different audiences runs longer or shorter.


In [None]:
# Duration ridge plot by rating category
movies = df[(df['type'] == 'Movie') & (df['duration_min'].notna())].copy()

# Group ratings into broader categories
rating_map = {
    'G': 'Kids/Family', 'TV-Y': 'Kids/Family', 'TV-Y7': 'Kids/Family',
    'TV-Y7-FV': 'Kids/Family', 'PG': 'Kids/Family', 'TV-G': 'Kids/Family',
    'PG-13': 'Teen/PG-13', 'TV-PG': 'Teen/PG-13', 'TV-14': 'Teen/PG-13',
    'R': 'Mature', 'TV-MA': 'Mature', 'NC-17': 'Mature',
    'NR': 'Not Rated', 'UR': 'Not Rated'
}
movies['rating_group'] = movies['rating'].map(rating_map)
movies = movies.dropna(subset=['rating_group'])

groups = ['Kids/Family', 'Teen/PG-13', 'Mature', 'Not Rated']
colors_ridge = ['#06D6A0', '#118AB2', '#E50914', '#888888']

fig, axes = plt.subplots(len(groups), 1, figsize=(14, 8), sharex=True)
fig.subplots_adjust(hspace=-0.15)

from scipy.stats import gaussian_kde

for i, (grp, color) in enumerate(zip(groups, colors_ridge)):
    subset = movies[movies['rating_group'] == grp]['duration_min'].dropna()
    ax = axes[i]

    if len(subset) > 2:
        kde = gaussian_kde(subset, bw_method=0.3)
        x_range = np.linspace(0, 250, 500)
        density = kde(x_range)

        ax.fill_between(x_range, density, alpha=0.7, color=color)
        ax.plot(x_range, density, color=color, linewidth=1.5)
        ax.axvline(subset.median(), color=color, linestyle='--', linewidth=1.2, alpha=0.7)

    ax.text(0.02, 0.65, f"{grp}  (n={len(subset):,}, median={subset.median():.0f} min)",
            transform=ax.transAxes, fontsize=11, fontweight='bold', color=color)

    ax.set_yticks([])
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    if i < len(groups) - 1:
        ax.spines['bottom'].set_visible(False)
        ax.set_xticks([])

axes[-1].set_xlabel('Duration (minutes)', fontsize=12)
axes[0].set_title('Movie Runtime Distribution by Rating Category', fontsize=16,
                   fontweight='bold', pad=15, loc='left', color='#1A1A2E')

plt.tight_layout()
plt.show()


**Insight:** Mature-rated movies have the widest runtime spread and the highest median (around 100 minutes). Kids/Family movies cluster tightly around 75 to 85 minutes, which makes sense since younger audiences have shorter attention spans. Teen/PG-13 content sits in between. The "Not Rated" category is small and scattered, likely representing older catalog titles or niche international content that was not formally rated.


## 9. The One-Season Problem: How Long Do TV Shows Last?

A common criticism of Netflix is that it cancels shows too quickly. What does the data say about how many seasons Netflix shows typically run?


In [None]:
tv = df[df['type'] == 'TV Show'].copy()
tv['seasons'] = tv['duration'].str.extract(r'(\d+)', expand=False).astype(float)

season_counts = tv['seasons'].value_counts().sort_index()
season_counts = season_counts[season_counts.index <= 10]  # Cap at 10 for readability

fig, ax = plt.subplots(figsize=(14, 6))

bars = ax.bar(season_counts.index, season_counts.values,
              color=['#E50914' if s == 1 else '#3D405B' for s in season_counts.index],
              edgecolor='white', linewidth=0.8, width=0.7)

# Percentage labels
total = tv['seasons'].notna().sum()  # all TV shows, not only those within the 10-season display cap
for bar, val in zip(bars, season_counts.values):
    pct = val / total * 100
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 10,
            f"{val}\n({pct:.1f}%)", ha='center', va='bottom', fontsize=9, fontweight='bold')

# Callout for 1-season
ax.annotate(f'{season_counts.iloc[0] / total * 100:.0f}% of all TV shows\nare single-season',
            xy=(1, season_counts.iloc[0]),
            xytext=(3.5, season_counts.iloc[0] * 0.85),
            arrowprops=dict(arrowstyle='->', color='#E50914', lw=2),
            fontsize=13, fontweight='bold', color='#E50914',
            bbox=dict(boxstyle='round,pad=0.4', facecolor='#FFF0F0', edgecolor='#E50914', alpha=0.9))

style_ax(ax, 'Number of Seasons per TV Show', xlabel='Seasons', ylabel='Number of Shows')
ax.set_xticks(season_counts.index)

plt.tight_layout()
plt.show()


**Insight:** The data confirms the "one and done" reputation. A staggering 67% of all TV shows on Netflix have just one season. About 16% have exactly two seasons, and it drops off a cliff from there. Shows with 5+ seasons are extremely rare on the platform. This could mean Netflix frequently cancels after season one, or it could mean they add a lot of limited series and docuseries that were always intended as single-season runs. Either way, if you start a Netflix show, the odds are not great that you will get a season two.
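"Has exactly N seasons" and "made it to at least N seasons" are different measures, and the bar chart shows the first. A minimal sketch of both on hypothetical data (the `toy` frame is made up; the `duration` format matches the dataset):

```python
import pandas as pd

# Hypothetical toy data in the dataset's "N Season(s)" format
toy = pd.DataFrame({
    'duration': ['1 Season', '1 Season', '2 Seasons', '3 Seasons', '1 Season', '2 Seasons'],
})

toy_seasons = toy['duration'].str.extract(r'(\d+)', expand=False).astype(int)

# Percentage of shows with exactly N seasons
toy_exactly = toy_seasons.value_counts(normalize=True).sort_index() * 100

# Percentage of shows with at least N seasons: cumulative sum taken from the top down
toy_at_least = toy_exactly[::-1].cumsum()[::-1]

print(toy_exactly.round(1).to_string())
print(toy_at_least.round(1).to_string())
```

On the real data, the "at least two seasons" figure is simply 100% minus the single-season share.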


## 10. Content Ratings: What Audience Is Netflix Building For?

Is Netflix a family-friendly platform or does it skew toward adult content? Let's look at how ratings break down, and how that has changed over time.


In [None]:
# Horizontal bar chart with clean design
rating_order = ['TV-MA', 'TV-14', 'TV-PG', 'R', 'PG-13', 'TV-Y7', 'TV-Y', 'PG', 'TV-G', 'NR', 'G']
rating_data = df['rating'].value_counts().reindex(rating_order).dropna().astype(int)  # reindex can turn counts into floats

# Color code by audience
rating_audience = {
    'TV-MA': '#E50914', 'R': '#E50914', 'NC-17': '#E50914',
    'TV-14': '#FF6B35', 'PG-13': '#FF6B35',
    'TV-PG': '#118AB2', 'PG': '#118AB2', 'TV-G': '#118AB2', 'G': '#118AB2',
    'TV-Y': '#06D6A0', 'TV-Y7': '#06D6A0', 'TV-Y7-FV': '#06D6A0',
    'NR': '#888888', 'UR': '#888888'
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7), gridspec_kw={'width_ratios': [1.5, 1]})

# Left: Horizontal bars
colors = [rating_audience.get(r, '#888888') for r in rating_data.index]
bars = ax1.barh(range(len(rating_data)), rating_data.values, color=colors,
                edgecolor='white', linewidth=0.8, height=0.65)

for i, (val, rating) in enumerate(zip(rating_data.values, rating_data.index)):
    pct = val / len(df) * 100
    ax1.text(val + 20, i, f"{val:,}  ({pct:.1f}%)", va='center', fontsize=9, fontweight='bold')

ax1.set_yticks(range(len(rating_data)))
ax1.set_yticklabels(rating_data.index, fontsize=11)
ax1.invert_yaxis()
style_ax(ax1, 'Content by Rating', xlabel='Number of Titles')

# Right: Pie of audience groups
audience_groups = {
    'Mature (TV-MA, R)': df['rating'].isin(['TV-MA', 'R', 'NC-17']).sum(),
    'Teen (TV-14, PG-13)': df['rating'].isin(['TV-14', 'PG-13']).sum(),
    'Family (PG, TV-PG, G, TV-G)': df['rating'].isin(['TV-PG', 'PG', 'TV-G', 'G']).sum(),
    'Kids (TV-Y, TV-Y7)': df['rating'].isin(['TV-Y', 'TV-Y7', 'TV-Y7-FV']).sum(),
}
ag_series = pd.Series(audience_groups)
ag_colors = ['#E50914', '#FF6B35', '#118AB2', '#06D6A0']

wedges, texts, autotexts = ax2.pie(ag_series, colors=ag_colors, autopct='%1.1f%%',
                                     startangle=90, pctdistance=0.78,
                                     wedgeprops=dict(width=0.45, edgecolor='white', linewidth=2))
for autotext in autotexts:
    autotext.set_fontsize(11)
    autotext.set_fontweight('bold')
    autotext.set_color('white')

centre = plt.Circle((0, 0), 0.52, fc='#FAFAFA')
ax2.add_artist(centre)
ax2.legend(ag_series.index, loc='lower center', fontsize=9, ncol=1, frameon=False,
           bbox_to_anchor=(0.5, -0.1))
ax2.set_title('Audience Breakdown', fontsize=14, fontweight='bold', pad=10, loc='center', color='#1A1A2E')

plt.tight_layout()
plt.show()


**Insight:** Netflix skews heavily toward adult content. TV-MA alone accounts for over 36% of all titles, and when combined with R-rated movies, mature content makes up nearly 46% of the library. Teen-appropriate content (TV-14 and PG-13) adds another 30%. Kid and family content combined is under 25%. This makes sense strategically since Netflix's core paying subscribers are adults, and mature content (think Narcos and Dark) tends to drive the most buzz and retention.
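The audience shares above come from collapsing individual ratings into broader groups. Here is a sketch of that mapping on hypothetical data; `toy` and `rating_to_group` are illustrative names, and the grouping follows the one used in the pie chart cell.

```python
import pandas as pd

# Hypothetical toy data, including a missing and a malformed rating
toy = pd.DataFrame({
    'rating': ['TV-MA', 'R', 'TV-14', 'PG', 'TV-Y', 'TV-MA', None, '74 min'],
})

# Illustrative mapping that follows the pie chart's audience groups
rating_to_group = {
    'TV-MA': 'Mature', 'R': 'Mature', 'NC-17': 'Mature',
    'TV-14': 'Teen', 'PG-13': 'Teen',
    'TV-PG': 'Family', 'PG': 'Family', 'TV-G': 'Family', 'G': 'Family',
    'TV-Y': 'Kids', 'TV-Y7': 'Kids', 'TV-Y7-FV': 'Kids',
}

# Ratings not in the mapping (missing or malformed) become NaN
toy_groups = toy['rating'].map(rating_to_group)
toy_unmapped = toy_groups.isna().sum()

# Shares among titles that fall into one of the four groups
toy_group_share = toy_groups.value_counts(normalize=True) * 100

print(toy_group_share.round(1).to_string())
print(f"Unmapped ratings: {toy_unmapped}")
```

Note the denominator: this sketch divides by mapped titles only, while the bar chart labels divide by `len(df)`, so the two sets of percentages can differ slightly.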


## 11. Old vs New: When Was Netflix Content Originally Made?

Netflix adds content that was released recently but also fills its library with older titles. How fresh is the catalog really?


In [None]:
fig, ax = plt.subplots(figsize=(15, 7))

# 2D histogram / heatmap of release_year vs year_added
subset = df.dropna(subset=['year_added'])
subset = subset[(subset['release_year'] >= 2000) & (subset['year_added'] >= 2015)]

# Hexbin for density
hb = ax.hexbin(subset['release_year'], subset['year_added'],
               gridsize=20, cmap='YlOrRd', mincnt=1, linewidths=0.3, edgecolors='white')

# Diagonal reference line (released = added same year)
ax.plot([2000, 2021], [2000, 2021], '--', color='#333333', linewidth=1.5, alpha=0.5, label='Released = Added same year')

cb = fig.colorbar(hb, ax=ax, label='Number of Titles', shrink=0.8)

style_ax(ax, 'When Content Was Made vs When Netflix Added It',
         xlabel='Original Release Year', ylabel='Year Added to Netflix')
ax.legend(fontsize=10, loc='upper left')
ax.set_xlim(2000, 2022)
ax.set_ylim(2014.5, 2021.5)

plt.tight_layout()
plt.show()


**Insight:** The densest cluster sits right along the diagonal, meaning most content is added to Netflix within a year or two of its original release. But there is a visible spread to the left of the diagonal, especially in the 2017 to 2019 rows, where Netflix was aggressively back-filling its catalog with movies and shows from 2005 to 2015. By 2020 and 2021, additions cluster much more tightly around recent releases, suggesting Netflix shifted from "fill the library" to "feature what is new."


## 12. When Does Netflix Drop New Content?

Is there a pattern to when Netflix adds titles throughout the year? Let's build a heatmap of additions by month and year.


In [None]:
# Monthly additions heatmap
monthly = df.dropna(subset=['year_added']).copy()
monthly['month_added'] = monthly['date_added'].dt.month
monthly = monthly[(monthly['year_added'] >= 2016) & (monthly['year_added'] <= 2021)]

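pivot = monthly.groupby(['year_added', 'month_added']).size().unstack(fill_value=0)
pivot = pivot.reindex(columns=range(1, 13), fill_value=0)  # guarantee all 12 months before renaming
pivot.columns = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']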
pivot.index = pivot.index.astype(int)

fig, ax = plt.subplots(figsize=(15, 5))

sns.heatmap(pivot, annot=True, fmt='d', cmap='YlOrRd', linewidths=1, linecolor='white',
            ax=ax, cbar_kws={'label': 'Titles Added', 'shrink': 0.8},
            annot_kws={'fontsize': 9})

ax.set_title('Netflix Monthly Content Additions (2016-2021)', fontsize=16,
             fontweight='bold', pad=15, loc='left', color='#1A1A2E')
ax.set_ylabel('')
ax.set_xlabel('')
ax.tick_params(labelsize=11)

plt.tight_layout()
plt.show()


**Insight:** A few patterns jump out. First, December and January tend to be heavy addition months across most years, likely to capture holiday viewership. There is also a visible uptick in October (spooky season releases). The pandemic impact shows clearly: 2020 sees a dip in the middle months when production pipelines dried up, followed by a recovery push. July 2021 stands out as an unusually heavy release month, with 231 titles added.
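The heatmap relies on a year-by-month pivot with one column per calendar month. A minimal sketch of building that grid on hypothetical dates follows (the `toy` frame is made up). It reindexes to all twelve months so that months with no additions still appear as zeros.

```python
import pandas as pd

# Hypothetical toy data: four titles added across two years
toy = pd.DataFrame({
    'date_added': pd.to_datetime(['2020-01-15', '2020-01-20', '2020-12-01', '2021-07-04']),
})
toy['year'] = toy['date_added'].dt.year
toy['month'] = toy['date_added'].dt.month

# Count titles per (year, month), then spread months into columns
toy_pivot = toy.groupby(['year', 'month']).size().unstack(fill_value=0)

# Make sure every month has a column, even if nothing was added in it
toy_pivot = toy_pivot.reindex(columns=range(1, 13), fill_value=0)
toy_pivot.columns = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

print(toy_pivot)
```

Without the reindex step, assigning twelve month names would fail whenever a month is absent from the filtered data.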


## 13. The Busiest Creators on Netflix

Which directors have the most titles on the platform? This gives us a sense of who Netflix keeps coming back to.


In [None]:
# Top directors horizontal lollipop
directors = df['director'].dropna().str.split(', ').explode().str.strip()
top_dirs = directors.value_counts().head(15)

fig, ax = plt.subplots(figsize=(14, 7))

y_pos = np.arange(len(top_dirs))[::-1]
colors_d = ['#E50914' if i < 3 else '#3D405B' for i in range(len(top_dirs))]

# Stems
for i, (director, count) in enumerate(top_dirs.items()):
    pos = y_pos[i]
    ax.plot([0, count], [pos, pos], color='#E0E0E0', linewidth=2, zorder=1)
    ax.scatter(count, pos, s=150, color=colors_d[i], zorder=3, edgecolor='white', linewidth=1.5)
    ax.text(count + 0.3, pos, f" {count}", va='center', fontsize=10, fontweight='bold', color=colors_d[i])

ax.set_yticks(y_pos)
ax.set_yticklabels(top_dirs.index, fontsize=11)
style_ax(ax, 'Most Prolific Directors on Netflix', xlabel='Number of Titles')
ax.set_xlim(0, top_dirs.max() + 3)

plt.tight_layout()
plt.show()


**Insight:** Rajiv Chilaka tops the list, known for the Chhota Bheem animated franchise, which appears in the catalog as many separate entries. Raul Campos and Jan Suter, who co-direct many titles together, both appear in the top spots. The list skews international, with Indian and Turkish directors well represented. This again underscores how much of Netflix's raw volume comes from international content. The household name directors (Scorsese, Spielberg, etc.) are not volume players on the platform.


## 14. How Old Is Netflix's Library?

Let's calculate the "content age" for each title: the gap between its original release year and when it was added to Netflix. Are they stocking fresh content or digging into the archives?


In [None]:
# Content age analysis
df_age = df.dropna(subset=['year_added']).copy()
df_age['content_age'] = df_age['year_added'] - df_age['release_year']

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Left: Overall content age distribution
ax1.hist(df_age['content_age'], bins=np.arange(-1, 40, 1),
         color='#E50914', alpha=0.8, edgecolor='white', linewidth=0.5)
ax1.axvline(df_age['content_age'].median(), color='#1A1A2E', linestyle='--', linewidth=2,
            label=f"Median: {df_age['content_age'].median():.0f} years")
style_ax(ax1, 'Content Age When Added to Netflix', xlabel='Years Since Release', ylabel='Number of Titles')
ax1.legend(fontsize=11)
ax1.set_xlim(-1, 35)

# Right: Median content age by year added (trend)
med_age = df_age[df_age['year_added'] >= 2015].groupby('year_added')['content_age'].median()

ax2.plot(med_age.index, med_age.values, 'o-', color='#E50914', linewidth=2.5, markersize=8)
for yr, val in med_age.items():
    ax2.text(yr, val + 0.15, f"{val:.0f}yr", ha='center', fontsize=9, fontweight='bold', color='#E50914')

style_ax(ax2, 'Median Content Age at Addition (by Year)', xlabel='Year Added', ylabel='Median Age (years)')
ax2.set_xticks(range(2015, 2022))
ax2.set_ylim(0, med_age.max() + 2)

plt.tight_layout()
plt.show()


**Insight:** Most Netflix content is added within 1 to 2 years of its original release. The median content age is about 2 years, and the distribution drops off sharply after 5 years. The trend over time is interesting: in 2016 and 2017, Netflix was adding slightly older content (median age around 3 to 4 years) as it built out the library. By 2020 and 2021, the median content age shrinks, reflecting the platform's shift toward originals and first-run licenses. Netflix is getting fresher.
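For reference, "content age" here is simply the year a title was added minus its release year. A small sketch on hypothetical rows (the `toy` frame and its values are made up; the date format matches the loading cell):

```python
import pandas as pd

# Hypothetical toy data: one title has no date_added, one has leading whitespace
toy = pd.DataFrame({
    'release_year': [2018, 2010, 2020, 2021, 2015],
    'date_added': ['September 1, 2019', 'January 5, 2017', 'March 3, 2020', None, ' June 9, 2016'],
})

# Same parsing as the loading cell: strip whitespace, coerce unparseable dates to NaT
toy_added = pd.to_datetime(toy['date_added'].str.strip(), format='%B %d, %Y', errors='coerce')

# Age in whole years at the time the title was added (NaN where the date is missing)
toy['year_added'] = toy_added.dt.year
toy['content_age'] = toy['year_added'] - toy['release_year']

toy_median_age = toy['content_age'].median()
toy_median_by_year = toy.dropna(subset=['year_added']).groupby('year_added')['content_age'].median()

print(f"Median content age: {toy_median_age:.0f} years")
print(toy_median_by_year.to_string())
```

Because only years are compared, a title released in December and added the following January counts as one year old.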


## 15. How Netflix's Genre Mix Has Evolved

This stacked area chart shows how the top genre tags have grown and shifted in absolute counts across annual additions. It gives a feel for which genres Netflix has doubled down on and which have stayed flat.


In [None]:
# Genre evolution stacked area (stream-style)
genre_yr = genre_df.dropna(subset=['year_added']).copy()
genre_yr = genre_yr[(genre_yr['year_added'] >= 2016) & (genre_yr['year_added'] <= 2021)]

top_8_genres = genre_yr['genre'].value_counts().head(8).index.tolist()
genre_yr_filtered = genre_yr[genre_yr['genre'].isin(top_8_genres)]

pivot_genre = genre_yr_filtered.groupby(['year_added', 'genre']).size().unstack(fill_value=0)
pivot_genre = pivot_genre[top_8_genres]  # Consistent order

fig, ax = plt.subplots(figsize=(15, 7))

colors_stream = ['#E50914', '#FF6347', '#3D405B', '#81B29A', '#F2CC8F',
                 '#8338EC', '#118AB2', '#06D6A0']

ax.stackplot(pivot_genre.index, pivot_genre.T,
             labels=pivot_genre.columns, colors=colors_stream, alpha=0.85,
             edgecolor='white', linewidth=0.5)

style_ax(ax, 'Genre Composition of Annual Netflix Additions (Top 8)',
         xlabel='Year', ylabel='Number of Genre Tags')
ax.legend(loc='upper left', fontsize=9, ncol=2, framealpha=0.9)
ax.set_xticks(range(2016, 2022))
ax.set_xlim(2016, 2021)

plt.tight_layout()
plt.show()


**Insight:** International Movies and Dramas have been the two dominant genre tags every year, and both expanded massively between 2016 and 2019. Comedies held steady as the third biggest category. The most interesting trend is International TV Shows, which barely registered in 2016 but grew significantly by 2021, mirroring the broader TV show investment pattern we saw earlier. The 2020 to 2021 pullback is visible across nearly all genres, consistent with the overall content slowdown.


## 16. Key Takeaways

After pulling apart 8,807 Netflix titles, here is what the data tells us:

**1. Content additions peaked in 2019.** Additions hit their highest level in the dataset that year, with over 2,000 titles. Since then, volume has declined, plausibly as a result of pandemic disruptions and a shift from quantity to quality.

**2. TV shows are gaining ground.** The library is still about 70% movies, but TV shows now make up roughly a third of annual additions. Serialized content drives retention, and Netflix knows it.

**3. International content is the backbone.** "International Movies" is the single largest genre tag. India, UK, Japan, and South Korea are all major contributors. Netflix is not just a US platform anymore.

**4. The one-season problem is real.** 67% of TV shows on Netflix have just one season. Whether that is cancellation or by design, it is a defining characteristic of the platform.

**5. Netflix skews mature.** Nearly half the library is rated TV-MA or R. Family and kids content is under 25%. The core audience is adults.

**6. The catalog is getting fresher.** Median content age at addition has been shrinking. Netflix is leaning into new releases and originals rather than back-catalog licensing.

**7. December and January are prime drop months.** The holiday season sees the biggest content pushes, likely timed to capture viewers with time off.

These patterns paint a picture of a platform in transition, moving from a "something for everyone" library play to a curated, international, originals-first strategy.
