# Netflix Movies & TV Shows Exploration
**Objective**:
To explore the Netflix Movies and TV Shows dataset to uncover insights such as content distribution over time, type proportions (Movie vs TV Show), genre popularity, rating categories, content-producing countries, and content durations. The goal is to understand Netflix's content strategy using exploratory data analysis (EDA).

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
%matplotlib inline


In [None]:
# Load Dataset
df = pd.read_csv("../data/Netflix_movies_and_tv_shows.csv")
df.head()

## Basic Overview

In [None]:
print("Dataset Shape:", df.shape)

In [None]:
print("\nData Types:\n", df.dtypes)


In [None]:
print("\nMissing Values:\n", df.isnull().sum())


In [None]:
print("Duplicate Rows:", df.duplicated().sum())


## Data Cleaning

In [None]:
# Remove leading/trailing whitespace from date_added
df['date_added'] = df['date_added'].str.strip()

# Then safely convert to datetime
df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')  # 'coerce' will turn invalid entries into NaT
df.head()

In [None]:
# Fill missing values with appropriate placeholders
# (assign back instead of inplace=True on a column, which is unreliable under pandas copy-on-write)
df['director'] = df['director'].fillna('Unknown')
df['cast'] = df['cast'].fillna('Unknown')
df['country'] = df['country'].fillna('Unknown')
df['rating'] = df['rating'].fillna('Unrated')


In [None]:
# Extract year from date_added (nullable Int64 keeps years as integers even where the date is NaT)
df['year_added'] = df['date_added'].dt.year.astype('Int64')

In [None]:
# Split duration into numeric and type (min/seasons)
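# In a raw string the digit class is \d; expand=False returns a Series rather than a one-column DataFrame
df['duration_num'] = df['duration'].str.extract(r'(\d+)', expand=False).astype(float)
# Keep missing durations as NaN instead of turning them into the string 'nan'
df['duration_type'] = df['duration'].str.extract(r'([a-zA-Z ]+)$', expand=False).str.strip()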
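
The duration split can be checked on a small hand-made frame before trusting it on the full dataset. The sketch below is only an illustration: the `sample` rows are invented, and it assumes `duration` holds strings such as `"90 min"` or `"2 Seasons"`.

```python
import pandas as pd

# Toy stand-in for the notebook's `duration` column (values are illustrative).
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "125 min", "1 Season", None],
})

# Leading digits become the numeric part; trailing letters become the unit.
sample["duration_num"] = (
    sample["duration"].str.extract(r"(\d+)", expand=False).astype(float)
)
sample["duration_type"] = (
    sample["duration"].str.extract(r"([a-zA-Z ]+)$", expand=False).str.strip()
)

# Summarize each unit separately, since minutes and seasons are not comparable.
duration_summary = sample.groupby("duration_type")["duration_num"].agg(["count", "median"])
print(duration_summary)
```

Note that `Season` and `Seasons` come out as separate units here, so they may need to be merged before any per-unit analysis.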

## Visualization & Insights

In [None]:
#  Donut Chart: Type of Content (Movie vs TV Show)
type_counts = df['type'].value_counts()
labels = type_counts.index
sizes = type_counts.values
colors = ['#ff9999', '#66b3ff']

plt.figure(figsize=(6, 6))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90,
        colors=colors, wedgeprops=dict(width=0.4))
plt.title('Distribution of Movies vs TV Shows')
plt.axis('equal')
plt.show()


**Insight:**
In this dataset, 69.1% of titles are Movies and 30.9% are TV Shows, indicating a catalog weighted toward film content.
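
The percentages shown on the chart can also be computed directly, which makes them easier to quote and check. This is a minimal sketch on invented data: `sample_type` is a hypothetical stand-in for `df['type']`, and its 7-to-3 split is not the real catalog proportion.

```python
import pandas as pd

# Toy stand-in for the notebook's `type` column (counts are illustrative only).
sample_type = pd.Series(["Movie"] * 7 + ["TV Show"] * 3, name="type")

# normalize=True returns fractions instead of raw counts.
type_share = (sample_type.value_counts(normalize=True) * 100).round(1)
print(type_share)
```

Running the same two lines on `df['type']` should reproduce the figures printed inside the donut chart.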

In [None]:
# Ratings Distribution
df['rating'].value_counts().head(10).plot(kind='bar', color='coral')
plt.title("Top 10 Content Ratings")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()


**Insight:**
TV-MA and TV-14 are the most frequent ratings, indicating that much of the catalog targets mature and teen audiences.
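
One way to see whether the rating mix differs between movies and TV shows is a cross-tabulation. The sketch below uses invented rows, with `sample` as a hypothetical stand-in for the `type` and `rating` columns.

```python
import pandas as pd

# Toy stand-in for the notebook's `type` and `rating` columns (illustrative values).
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["TV-MA", "PG-13", "TV-MA", "TV-14", "TV-14", "TV-MA"],
})

# Rows are ratings, columns are content types, cells are title counts.
rating_by_type = pd.crosstab(sample["rating"], sample["type"])
print(rating_by_type)
```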

In [None]:
# Top 10 Genres 
genres = df['listed_in'].str.split(', ').explode()
top_genres = pd.DataFrame(Counter(genres).most_common(10), columns=['Genre', 'Count'])

sns.barplot(data=top_genres, x='Count', y='Genre', hue='Genre', palette='Blues_d', legend=False)
plt.title("Top 10 Most Common Genres on Netflix")
plt.xlabel("Number of Titles")
plt.ylabel("Genre")
plt.show()


**Insight:**
The most common genres are International Movies, Dramas, Comedies, and Documentaries, which suggests the catalog caters to varied interests. A title can be listed under several genres, so these counts overlap.

In [None]:
# Content Release Over Years
sns.histplot(df['release_year'], bins=30, kde=False, color='green')
plt.title("Content Releases Over the Years")
plt.xlabel("Release Year")
plt.ylabel("Count")
plt.show()


**Insight:**
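Most content was released after 2000, with peaks around 2016–2020. Because this chart uses `release_year`, it shows when titles were produced, not when Netflix added them, so it indicates a catalog skewed toward recent releases rather than measuring Netflix's growth directly.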
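
To look at when titles joined the catalog rather than when they were released, the additions can be counted per year from `date_added`. This is a rough sketch on invented rows, and it assumes the dates follow the `"Month D, YYYY"` pattern; `sample` and its values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the notebook's `date_added` and `release_year` columns.
sample = pd.DataFrame({
    "date_added": ["September 25, 2021", " January 1, 2020", "March 3, 2020", None],
    "release_year": [2020, 1998, 2019, 2015],
})

# Strip stray whitespace, then parse; unparseable or missing entries become NaT.
parsed = pd.to_datetime(sample["date_added"].str.strip(), errors="coerce")
sample["year_added"] = parsed.dt.year.astype("Int64")

# Titles added per year; missing years are dropped by value_counts.
added_per_year = sample["year_added"].value_counts().sort_index()
print(added_per_year)
```

Plotting `added_per_year` next to the release-year histogram would show how far the two timelines diverge.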

In [None]:
# Top 10 Countries Producing Netflix Content
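# Exclude the 'Unknown' placeholder used for missing countries so it does not rank as a country
countries = df['country'].str.split(', ').explode()
top_countries = countries[countries != 'Unknown'].value_counts().head(10)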
top_countries.plot(kind='barh', color='skyblue')
plt.title("Top 10 Countries Producing Netflix Content")
plt.xlabel("Number of Titles")
plt.gca().invert_yaxis()
plt.show()


**Insight:**
The United States is the leading content producer, with India and the United Kingdom also major contributors, which suggests the catalog draws on productions from many regions.
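
Multi-country entries are easy to miscount if the separators are irregular or a placeholder sneaks in. The sketch below shows one defensive way to split them; the `sample_country` values are invented, and the `"Unknown"` placeholder is assumed to match the one used during cleaning.

```python
import pandas as pd

# Toy stand-in for the notebook's `country` column (illustrative values).
sample_country = pd.Series(
    ["United States, India", "India", "Unknown", "United Kingdom,  United States", None],
    name="country",
)

# Split on commas, put one country per row, and trim stray whitespace.
countries = sample_country.dropna().str.split(",").explode().str.strip()

# Drop empty fragments and the placeholder for missing countries.
countries = countries[(countries != "") & (countries != "Unknown")]

country_counts = countries.value_counts()
print(country_counts)
```

Each co-production counts once for every country involved, so the totals can exceed the number of titles.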

## Final Conclusion
The catalog is weighted toward movies, and TV-MA and TV-14 are the most frequent ratings, so much of the content targets mature and teen audiences. The United States produces the most titles, with India, the UK, and other countries also contributing substantially.

The most common genres are International Movies, Dramas, Comedies, and Documentaries. Most titles were released after 2000, with a peak around 2016–2020; because this is based on `release_year`, it describes when titles were produced rather than when Netflix added them.

This exploratory data analysis provides a clear picture of what kind of content Netflix offers around the world.