**This Netflix dataset provides detailed information about the movies and TV shows available on the streaming platform. It includes key attributes such as title, director, cast, country, release year, rating, duration, genre (listed_in), and the date each title was added to Netflix. This data helps reveal how the platform’s content is distributed across countries, genres, and years. Through exploratory data analysis, we can uncover insights about Netflix’s expansion trends and the content categories it offers most.**

**To load the dataset, we first need to import the necessary libraries.**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Loading the dataset.**

In [2]:
df = pd.read_csv('/kaggle/input/netflix-shows/netflix_titles.csv')

Let's see the first 5 rows of the dataset.

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.isnull().sum()
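Raw missing-value counts are easier to judge as a share of all rows. The sketch below uses a small made-up frame in place of the real CSV (its values are illustrative only); on the actual data, the same two lines would run on the loaded `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for netflix_titles.csv; values are illustrative only
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", "United States", np.nan, "India"],
})

# Percentage of missing values per column, largest first
missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct.round(1))
```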

**The dataset has missing values in several columns. We fill the missing text values with 'Unknown'.**

In [6]:
df['director'] = df['director'].fillna('Unknown')

In [7]:
df['cast'] = df['cast'].fillna('Unknown')

In [8]:
df['country'] = df['country'].fillna('Unknown')

In [9]:
# Missing date_added values are left as NaN here; pd.to_datetime turns them into NaT when the column is converted below

In [10]:
df['rating'] = df['rating'].fillna('Unknown')

In [11]:
df['duration'] = df['duration'].fillna('Unknown')
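The 'Unknown' fills above can also be written as a single `fillna` call that takes a column-to-value mapping. A minimal sketch on a small made-up frame (the rows are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in each text column
df = pd.DataFrame({
    "director": ["X", np.nan],
    "cast": [np.nan, "Z"],
    "country": [np.nan, "India"],
    "rating": ["TV-MA", np.nan],
    "duration": [np.nan, "90 min"],
})

# One call fills every listed column with the same placeholder
text_cols = ["director", "cast", "country", "rating", "duration"]
df = df.fillna({col: "Unknown" for col in text_cols})
print(df)
```

Keeping the column list in one place makes it easy to see which columns received the placeholder.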

In [None]:
df.isnull().sum()

In [None]:
df.dtypes

**Converting date_added to datetime (release_year is already stored as an integer, so it needs no conversion).**

In [14]:
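# Strip stray leading/trailing spaces first; anything that still cannot be parsed becomes NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')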
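Once date_added is a datetime, the year a title was added can be pulled out with the `.dt` accessor. The sketch below assumes every non-missing date is written like "September 25, 2021" and passes that format explicitly; the three strings are made up, and entries in any other format would silently become NaT, so they are worth inspecting.

```python
import numpy as np
import pandas as pd

# Made-up date_added strings; the second one carries a stray leading space
raw = pd.Series(["September 25, 2021", " August 4, 2017", np.nan])

# Strip whitespace, then parse with an explicit month-name format;
# unparseable or missing entries become NaT
added = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Year each title was added (NaN where the date is missing)
year_added = added.dt.year
print(year_added)
```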

From here on, the notebook turns to visualizations.

**How many movies and TV shows are recorded in the dataset?**

In [None]:
content_counts = df['type'].value_counts()
print(content_counts)
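Counts are often easier to compare alongside percentages. A small sketch using `value_counts(normalize=True)` on a hypothetical `type` column (four made-up rows, not the real data):

```python
import pandas as pd

# Hypothetical 'type' column
types = pd.Series(["Movie", "Movie", "Movie", "TV Show"], name="type")

counts = types.value_counts()
shares = types.value_counts(normalize=True).mul(100).round(1)

# Side-by-side table of absolute counts and percentages
summary = pd.DataFrame({"count": counts, "percent": shares})
print(summary)
```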

**How are movie and TV show releases distributed over the years?**

In [None]:
release_counts = df['release_year'].value_counts().sort_index()

plt.figure(figsize=(12, 6))
sns.lineplot(x=release_counts.index, y=release_counts.values, marker='o', color='b')
plt.title('Distribution of Movie/TV Show Releases Over Time', fontsize=16)
plt.xlabel('Release Year', fontsize=14)
plt.ylabel('Number of Releases', fontsize=14)
plt.xticks(ticks=release_counts.index[::5], rotation=45)

for x, y in zip(release_counts.index, release_counts.values):
    plt.text(x, y, str(y), color='black', ha='center', va='bottom')

plt.tight_layout()
plt.show()
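The line plot above pools movies and TV shows together. To see whether the mix between the two changes over time, the counts can be split by type with `pd.crosstab`. A sketch on a made-up frame (years and types are hypothetical):

```python
import pandas as pd

# Hypothetical titles with their type and release year
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2018, 2018, 2019, 2019, 2020],
})

# Rows: release year; columns: content type; cells: number of titles
by_type = pd.crosstab(df["release_year"], df["type"])

# Share of TV shows within each release year
tv_share = by_type["TV Show"] / by_type.sum(axis=1)
print(by_type)
print(tv_share)
```

On the real data, `by_type.plot(marker='o')` would draw one line per content type.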

**How are the ratings distributed by content type?**

In [None]:
plt.figure(figsize=(12, 6))
sns.countplot(x='type', hue='rating', data=df, palette='Set2')

plt.title('Rating Distribution by Content Type (Movie vs TV Show)', fontsize=16)
plt.xlabel('Content Type', fontsize=14)
plt.ylabel('Number of Shows/Movies', fontsize=14)

for container in plt.gca().containers:
    plt.bar_label(container, fmt='%d', label_type='edge', fontsize=10, padding=2)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
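Before reading too much into the rating bars, it can help to check that every value in `rating` really is a rating. The sketch below assumes that some rows might carry a duration such as "74 min" in the rating column; it detects such values with a regular expression, moves them into `duration`, and marks the rating as 'Unknown'. The rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; the third has a duration string in the rating column
df = pd.DataFrame({
    "rating": ["TV-MA", "PG-13", "74 min"],
    "duration": ["2 Seasons", "90 min", "Unknown"],
})

# Rating values that look like "<number> min" are durations, not ratings
misplaced = df["rating"].str.fullmatch(r"\d+ min")

# Copy them into duration first, then reset the rating
df.loc[misplaced, "duration"] = df.loc[misplaced, "rating"]
df.loc[misplaced, "rating"] = "Unknown"
print(df)
```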

**Which countries produce the most content on Netflix?**

In [None]:
country_counts = df.loc[df['country'] != 'Unknown', 'country'].value_counts().head(10)  # top 10 countries, excluding the 'Unknown' placeholder

plt.figure(figsize=(12, 6))
sns.barplot(x=country_counts.index, y=country_counts.values, palette='Blues_d')

plt.title('Top 10 Countries with the Most Content on Netflix', fontsize=16)
plt.xlabel('Country', fontsize=14)
plt.ylabel('Number of Shows/Movies', fontsize=14)

for i, value in enumerate(country_counts.values):
    plt.text(i, value + 1, str(value), ha='center', va='bottom', fontsize=12)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
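The country column can list several countries for a co-production (for example "United States, United Kingdom"), so `value_counts` on the raw strings treats each combination as its own country. A sketch that splits on commas, trims whitespace, and drops the 'Unknown' placeholder before counting; the strings are hypothetical:

```python
import pandas as pd

# Hypothetical country strings; co-productions list several countries
country = pd.Series([
    "United States",
    "India",
    "United States, United Kingdom",
    "Unknown",
    "United Kingdom, India",
])

# One entry per (title, country), whitespace trimmed, placeholder dropped
countries = country.str.split(",").explode().str.strip()
countries = countries[countries != "Unknown"]
top = countries.value_counts().head(10)
print(top)
```

With this approach each co-produced title counts once for every country it lists.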

**Do certain countries focus more on TV shows than on movies?**

In [None]:
country_type_counts = df.groupby(['country', 'type']).size().reset_index(name='count')
top_countries = df.loc[df['country'] != 'Unknown', 'country'].value_counts().head(10).index  # skip the 'Unknown' placeholder
country_type_counts = country_type_counts[country_type_counts['country'].isin(top_countries)]

plt.figure(figsize=(12, 6))
sns.barplot(x='country', y='count', hue='type', data=country_type_counts, palette='magma')

plt.title('Top 10 Countries: Movies vs TV Shows on Netflix', fontsize=16)
plt.xlabel('Country', fontsize=14)
plt.ylabel('Number of Titles', fontsize=14)

for container in plt.gca().containers:
    plt.bar_label(container, fmt='%d', label_type='edge', fontsize=10, padding=2)

plt.xticks(rotation=45)
plt.legend(title='Content Type')
plt.tight_layout()
plt.show()
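Raw counts favor the largest producers, so a country's focus is clearer as a proportion of its own titles. `pd.crosstab` with `normalize="index"` makes each country's row sum to 1. The two countries and their rows below are made up:

```python
import pandas as pd

# Hypothetical (country, type) pairs
df = pd.DataFrame({
    "country": ["India", "India", "India", "South Korea", "South Korea"],
    "type": ["Movie", "Movie", "TV Show", "TV Show", "TV Show"],
})

# normalize="index" turns each row into proportions that sum to 1
mix = pd.crosstab(df["country"], df["type"], normalize="index")
print(mix.round(2))
```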

**What are the top genres by country?**


In [None]:
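df2 = df[['country', 'listed_in']].dropna()

# Split the comma-separated genre list so each (title, genre) pair is its own row
df2 = df2.assign(listed_in=df2['listed_in'].str.split(',')).explode('listed_in')
df2['listed_in'] = df2['listed_in'].str.strip()

# Keep the five countries with the most genre entries, skipping the 'Unknown' placeholder
top_countries = df2.loc[df2['country'] != 'Unknown', 'country'].value_counts().head(5).index
df2 = df2[df2['country'].isin(top_countries)]

# Count titles per (country, genre), then keep the three largest genres per country
genre_country = df2.groupby(['country', 'listed_in']).size().reset_index(name='count')
top_genres = (genre_country.sort_values(['country', 'count'], ascending=[True, False])
              .groupby('country').head(3))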

plt.figure(figsize=(10, 6))
sns.barplot(x='country', y='count', hue='listed_in', data=top_genres, palette='Set1')
plt.title('Top Genres by Country on Netflix', fontsize=16)
plt.xlabel('Country', fontsize=13)
plt.ylabel('Number of Titles', fontsize=13)

for container in plt.gca().containers:
    plt.bar_label(container, fmt='%d', label_type='edge', fontsize=9, padding=2)

plt.xticks(rotation=45)
plt.legend(title='Genre', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
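The same split-and-explode step also gives genre counts across the whole catalog, independent of country. A sketch on three made-up `listed_in` strings:

```python
import pandas as pd

# Hypothetical listed_in values; each title can carry several genres
listed_in = pd.Series([
    "Dramas, International Movies",
    "Comedies, Dramas",
    "International TV Shows, TV Dramas",
])

# Split the comma-separated genre list into one entry per genre
genres = listed_in.str.split(",").explode().str.strip()
genre_counts = genres.value_counts()
print(genre_counts.head(10))
```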

**Which directors have the most titles on Netflix?**

In [None]:
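# Missing directors were filled with 'Unknown' earlier, so dropna() no longer removes them; filter the placeholder out instead
director_counts = df.loc[df['director'] != 'Unknown', 'director'].value_counts().head(10)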

plt.figure(figsize=(12, 6))
sns.barplot(x=director_counts.values, y=director_counts.index, palette='viridis')
plt.title('Top 10 Directors with the Most Titles on Netflix', fontsize=16)
plt.xlabel('Number of Titles', fontsize=14)
plt.ylabel('Director', fontsize=14)

for index, value in enumerate(director_counts.values):
    plt.text(value + 0.5, index, str(value), va='center', fontsize=10)

plt.tight_layout()
plt.show()
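Co-directed titles list several names in one director string, so each combination is counted separately above. A sketch that credits each named director individually, assuming names are comma-separated; the names are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical director strings; some titles credit more than one director
director = pd.Series([
    "Director A",
    "Director B, Director A",
    "Unknown",
    "Director C",
])

# One entry per (title, director), whitespace trimmed, placeholder dropped
directors = director.str.split(",").explode().str.strip()
directors = directors[directors != "Unknown"]
director_counts = directors.value_counts().head(10)
print(director_counts)
```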

**Conclusion:**
**The analysis shows that the number of titles per release year rises sharply in recent years, with a shift toward TV shows over that period. The United States, India, and the United Kingdom account for the most titles, while genres such as Dramas, Comedies, and International TV Shows are among the most common. Overall, the dataset offers useful insight into Netflix’s global content strategy and the diversity of its catalog.**