
📊 Netflix Data Analysis Project

Author: Pierre Lukozi

LinkedIn: https://www.linkedin.com/in/pierre-musili-65195a112/

🎯 Objective

This project performs an exploratory data analysis (EDA) of the Netflix dataset. The goal is to uncover key insights regarding:

- Types of content (Movies vs. TV Shows)
- Most frequent directors
- Top producing countries
- Distribution of age ratings
- Trends in content release and addition over time

📁 Project Structure

This notebook includes the following steps:

- Data loading and cleaning
- Exploratory Data Analysis (EDA)
- Visualization of main insights
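The analysis below works on a cleaned DataFrame called `df1` in which missing directors are marked "Unknown", but the loading and cleaning code is not shown. A minimal sketch of what that step could look like, assuming the raw data is the public Netflix titles CSV (the file name `netflix_titles.csv` is an assumption) and that missing `director` and `country` values are replaced by the placeholder "Unknown". `clean_netflix` is a hypothetical helper, not the notebook's own code:

```python
import pandas as pd

# Assumption: column names as in the public Netflix titles dataset.
PLACEHOLDER_COLUMNS = ["director", "country"]


def clean_netflix(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop exact duplicates, mark gaps as "Unknown"."""
    df = raw.drop_duplicates().copy()
    for col in PLACEHOLDER_COLUMNS:
        if col in df.columns:
            # Missing values become the "Unknown" label that the plots filter on.
            df[col] = df[col].fillna("Unknown")
    return df


# Usage with the real data, assuming the CSV file name:
# df1 = clean_netflix(pd.read_csv("netflix_titles.csv"))
raw = pd.DataFrame({
    "title": ["X", "Y", "Y"],
    "director": [None, "A", "A"],
    "country": ["US", "IN", "IN"],
})
cleaned = clean_netflix(raw)
print(cleaned)
```

Filling with a placeholder instead of dropping rows keeps every title in the type, rating and year counts, while the director plots can still filter the placeholder out.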

📈 Visual Results & Interpretations

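1. Distribution of Age Ratings

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Age ratings for shows in the dataset
plt.figure(figsize=(10, 5))
# Assumption: the plotting call is not shown in the original README;
# a countplot of df1["rating"], most frequent first, is one plausible version.
ax = sns.countplot(data=df1, x="rating", order=df1["rating"].value_counts().index)

# Write each bar's count just above it
for p in ax.patches:
    height = p.get_height()
    ax.annotate(f"{int(height)}", (p.get_x() + p.get_width() / 2, height),
                ha="center", va="bottom", fontsize=10)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

✅ Insight: The most common age ratings are TV-MA, TV-14, and TV-PG, suggesting that the catalog leans toward mature and teen audiences.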
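To put numbers on "most common", the share of each rating can be computed directly instead of being read off the bars. A small sketch on toy data; the helper name `rating_shares` and the sample values are illustrative and not taken from the dataset:

```python
import pandas as pd


def rating_shares(df: pd.DataFrame, top_n: int = 3) -> pd.Series:
    """Hypothetical helper: percentage of titles per rating, most common first."""
    shares = df["rating"].value_counts(normalize=True) * 100
    return shares.head(top_n).round(1)


# Toy data for illustration only, not the real catalog.
toy = pd.DataFrame({"rating": ["TV-MA", "TV-MA", "TV-14", "TV-PG", "TV-MA", "TV-14"]})
shares = rating_shares(toy)
print(shares)
```

On the real data, `rating_shares(df1)` would give the percentages behind the first chart.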

2. Top 10 Most Prolific Directors

```python
import matplotlib.pyplot as plt

# Top 10 directors in the dataset
plt.figure(figsize=(10, 5))

# Data for the top 10 directors (titles with an "Unknown" director are excluded)
top_directors = df1[df1["director"] != "Unknown"]["director"].value_counts().nlargest(10)

# Draw the chart
ax = top_directors.plot(kind="barh", color="skyblue")

# Add the count labels next to the bars
for i, v in enumerate(top_directors):
    ax.text(v + 1, i, str(v), color="black", va="center", fontsize=10)

plt.title("Top 10 directors by number of shows directed")
plt.xlabel("Number of shows")
plt.tight_layout()
plt.show()
```

... ✅ Insight: Directors like Raúl Campos and Jan Suter appear most frequently. They are often associated with local productions and documentaries.
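`value_counts()` counts each distinct string in the `director` column, so if co-directed titles store several names in one comma-separated cell (an assumption about the data), a pair such as "Raúl Campos, Jan Suter" is counted as a single director. A sketch that counts individual directors instead, by splitting and exploding the column; `top_individual_directors` is a hypothetical helper and the toy names are placeholders:

```python
import pandas as pd


def top_individual_directors(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Hypothetical helper: count titles per individual director.

    Assumes co-directors share one cell, separated by commas.
    """
    names = (
        df.loc[df["director"] != "Unknown", "director"]
        .str.split(",")   # one list of names per title
        .explode()        # one row per (title, director) pair
        .str.strip()      # drop the space after each comma
    )
    return names.value_counts().nlargest(n)


toy = pd.DataFrame({"director": ["A, B", "B", "Unknown", "C"]})
top = top_individual_directors(toy, n=2)
print(top)
```

Whether to count pairs or individuals is a design choice: pairs reflect how credits are stored, individuals reflect each person's total output.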

3. Age Rating Distribution for Top 10 Directors

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Rating distribution for the top 10 directors
# Keep only titles whose director is known
df_known = df1[df1["director"] != "Unknown"]

# Keep the 10 most frequent directors
top_directors = df_known["director"].value_counts().nlargest(10).index
df_top_directors = df_known[df_known["director"].isin(top_directors)]

# Plot
plt.figure(figsize=(12, 6))
ax = sns.countplot(data=df_top_directors, x="director", hue="rating", order=top_directors)
plt.title("Rating distribution for the top 10 directors")
plt.xlabel("Director")
plt.ylabel("Number of titles")
plt.xticks(rotation=45)
plt.legend(title="Rating")
plt.tight_layout()

# Annotate each non-empty bar with its count
for p in ax.patches:
    height = p.get_height()
    if height > 0:
        ax.annotate(f"{int(height)}", (p.get_x() + p.get_width() / 2.0, height),
                    ha="center", va="bottom", fontsize=8)

plt.show()
```

... ✅ Insight: Each director's titles span several ratings: some directors focus on mature content, while others lean toward family-friendly titles.
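With ten directors and many ratings, the grouped bars get crowded. The same counts can be shown as a table with `pd.crosstab`, one row per director and one column per rating. A sketch on toy data; `director_rating_table` is a hypothetical helper mirroring the filtering in the plot above:

```python
import pandas as pd


def director_rating_table(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: titles per rating for the n most frequent known directors."""
    known = df[df["director"] != "Unknown"]
    top = known["director"].value_counts().nlargest(n).index
    subset = known[known["director"].isin(top)]
    # Rows keep the "most frequent first" order used for the plot's x-axis.
    return pd.crosstab(subset["director"], subset["rating"]).reindex(top)


toy = pd.DataFrame({
    "director": ["A", "A", "B", "Unknown", "A"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-MA", "TV-MA"],
})
table = director_rating_table(toy, n=2)
print(table)
```

Ratings a director never used appear as explicit zeros, which the annotated plot hides by skipping empty bars.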

4. Rating Distribution by Country (Top 10)

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Rating distribution for the top 10 countries
top_countries = df1["country"].value_counts().nlargest(10).index
df_top_countries = df1[df1["country"].isin(top_countries)]

# Plot
plt.figure(figsize=(12, 6))
ax = sns.countplot(data=df_top_countries, x="country", hue="rating", order=top_countries)
plt.title("Rating distribution for the top 10 countries")
plt.xlabel("Country")
plt.ylabel("Number of titles")
plt.xticks(rotation=45)
plt.legend(title="Rating")
plt.tight_layout()

# Annotate each non-empty bar with its count
for p in ax.patches:
    height = p.get_height()
    if height > 0:
        ax.annotate(f"{int(height)}", (p.get_x() + p.get_width() / 2.0, height),
                    ha="center", va="bottom", fontsize=8)

plt.show()
```

... ✅ Insight: The United States dominates in volume, followed by India and the United Kingdom, likely reflecting their strong media production industries.
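Because the bars show raw counts, the largest producer dominates every rating, which makes it hard to compare the rating mix of smaller countries. Normalizing each country's row to percentages with `pd.crosstab(..., normalize="index")` separates "how much" from "what kind". A sketch on toy data; `country_rating_mix` is a hypothetical helper:

```python
import pandas as pd


def country_rating_mix(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: % of each country's titles per rating, top n countries."""
    top = df["country"].value_counts().nlargest(n).index
    subset = df[df["country"].isin(top)]
    # normalize="index" divides each row by its total, so every row sums to 1.
    mix = pd.crosstab(subset["country"], subset["rating"], normalize="index") * 100
    return mix.reindex(top).round(1)


toy = pd.DataFrame({
    "country": ["US", "US", "US", "US", "India", "India"],
    "rating": ["TV-MA", "TV-MA", "TV-MA", "TV-14", "TV-14", "TV-14"],
})
mix = country_rating_mix(toy, n=2)
print(mix)
```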

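5. Release Year Distribution

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of release years across the whole catalog
sns.histplot(df1["release_year"])
plt.show()
```

✅ Insight: Most content was released between 2010 and 2020, peaking in 2018–2019, likely due to Netflix’s global expansion.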
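The peak years and the "most content between 2010 and 2020" claim can be checked numerically rather than read from the histogram's bins. A sketch with two hypothetical helpers on toy data:

```python
import pandas as pd


def peak_release_years(df: pd.DataFrame, k: int = 2) -> list:
    """Hypothetical helper: the k release years with the most titles, ascending."""
    counts = df["release_year"].value_counts()
    return sorted(counts.nlargest(k).index.tolist())


def share_released_between(df: pd.DataFrame, start: int, end: int) -> float:
    """Fraction of titles whose release year falls in [start, end] (inclusive)."""
    return float(df["release_year"].between(start, end).mean())


# Toy data for illustration only.
toy = pd.DataFrame({"release_year": [2005, 2017, 2018, 2018, 2019, 2019, 2019, 2020]})
print(peak_release_years(toy), share_released_between(toy, 2010, 2020))
```

Applied to `df1`, these return the actual peak years and the fraction of the catalog released in 2010–2020.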

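6. Number of Shows Released Each Year Since 2008

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Number of shows released each year since 2008
order = list(range(2008, 2022))
plt.figure(figsize=(10, 5))

# Customize colours. A palette dict needs a key for every "type" value, otherwise
# seaborn raises an error; assuming the values are "Movie" and "TV Show",
# "salmon" for TV shows is an arbitrary choice.
palette_colors = {"Movie": "skyblue", "TV Show": "salmon"}
p = sns.countplot(x="release_year", data=df1, hue="type", order=order, palette=palette_colors)
plt.title("Number of shows released each year since 2008 that are on Netflix")
plt.xlabel("")

# Write each bar's count 10 points above its top
for bar in p.patches:
    p.annotate(format(bar.get_height(), ".0f"),
               (bar.get_x() + bar.get_width() / 2.0, bar.get_height()),
               ha="center", va="center", xytext=(0, 10), textcoords="offset points")

plt.show()
```

... ✅ Insight: The number of titles on Netflix by release year increases significantly after 2015, with a strong upward trend until 2020.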
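The countplot's `order` argument fixes the x-axis to 2008–2021. The same counts can be built as a table, one row per year and one column per type, which is easier to compare year over year than annotated bars. A sketch on toy data; `titles_per_year` is a hypothetical helper:

```python
import pandas as pd


def titles_per_year(df: pd.DataFrame, start: int = 2008, end: int = 2021) -> pd.DataFrame:
    """Hypothetical helper: titles per release year and type, one row per year."""
    years = list(range(start, end + 1))
    counts = pd.crosstab(df["release_year"], df["type"])
    # Years outside the range are dropped; years with no titles get zero rows.
    return counts.reindex(years, fill_value=0)


toy = pd.DataFrame({
    "release_year": [1999, 2008, 2008, 2010],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
})
table = titles_per_year(toy, start=2008, end=2010)
print(table)
```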

🧰 Technologies Used

- Python
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook

📌 Conclusion

This exploratory data analysis (EDA) revealed key trends in Netflix's catalog: the dominance of certain countries and directors, a strong push in recent years for more content, and a concentration in mature audience ratings.

These findings form a solid foundation for future work such as:

- Building a recommendation engine
- Genre and duration analysis
- NLP on descriptions and reviews

🖊️ Author

Pierre Lukozi 📎 LinkedIn Profile
