Netflix Data Analysis Project
Hey there!
This is a mini project where I explored a dataset of Netflix movies and TV shows. The goal was to practice data cleaning, analysis, and visualization while gaining insights into content trends on Netflix.
Data Cleaning & Preparation
- Checked for missing values and filled them with "Not found" where needed (e.g., missing directors or ratings).
- Created a new column `title_len` to store the length of each title for further analysis.
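The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy rows are made up, and the column names `title`, `director`, and `rating` are assumptions about how the dataset is laid out.

```python
import pandas as pd

# Toy rows standing in for the real dataset (illustrative only).
# Column names are assumed, not taken from the project.
df = pd.DataFrame({
    "title": ["Example Show", "Another Example Movie", "Short"],
    "director": ["Director A", None, None],
    "rating": ["TV-MA", None, "PG"],
})

# Count missing values per column before filling anything.
missing_before = df.isna().sum()

# Fill gaps in the text columns with the "Not found" placeholder.
df[["director", "rating"]] = df[["director", "rating"]].fillna("Not found")

# Derived column: number of characters in each title.
df["title_len"] = df["title"].str.len()

print(missing_before)
print(df)
```

Filling with a visible placeholder keeps every row in the dataset, but the placeholder then shows up as a value in later counts, so it usually needs to be filtered out when ranking directors or ratings.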
Exploration & Insights
- Compared the number of Movies vs TV Shows, overall and per year.
- Identified directors with the most content.
- Found the longest titles in the dataset.
- Explored content by country.
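One way these explorations might look in pandas is sketched below. The rows are placeholders, the column names (`type`, `title`, `director`, `country`, `release_year`) are assumptions, and grouping by `release_year` is a guess, since the per-year comparison could just as well use the date a title was added.

```python
import pandas as pd

# Placeholder rows (illustrative only); column names are assumed.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "title": ["Title A", "A Much Longer Title B", "Title C", "Title D"],
    "director": ["Director X", "Not found", "Director X", "Director Y"],
    "country": ["India", "United States", "India", "Not found"],
    "release_year": [2019, 2020, 2020, 2020],
})

# Movies vs TV Shows overall.
type_counts = df["type"].value_counts()

# Movies vs TV Shows per year: one row per year, one column per type.
per_year = df.groupby(["release_year", "type"]).size().unstack(fill_value=0)

# Directors with the most titles, skipping the "Not found" placeholder.
known_directors = df.loc[df["director"] != "Not found", "director"]
top_directors = known_directors.value_counts().head(10)

# Longest titles by character count.
longest = (
    df.assign(title_len=df["title"].str.len())
    .nlargest(3, "title_len")[["title", "title_len"]]
)

# Number of titles per country.
by_country = df["country"].value_counts()

print(per_year)
print(top_directors)
print(longest)
```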
Statistics
- Calculated mean, median, min, max, and percentile for title lengths using NumPy.
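As a rough sketch of the statistics step, the snippet below runs the NumPy functions on a small made-up array of title lengths. The numbers are illustrative, and the 90th percentile is an assumed choice, since the project does not say which percentile was used.

```python
import numpy as np

# Illustrative title lengths; in the project these would come from
# the title_len column.
title_len = np.array([5, 12, 21, 8, 14])

stats = {
    "mean": np.mean(title_len),
    "median": np.median(title_len),
    "min": np.min(title_len),
    "max": np.max(title_len),
    # Assumed percentile; the default method is linear interpolation.
    "p90": np.percentile(title_len, 90),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```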
Visualizations
- Trend of Movies vs TV Shows per year.
- Ratings distribution (side by side for Movies and TV Shows).
- Director analysis and longest titles.
- Used `tight_layout()` and proper labeling for clarity.
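The first two charts could be drawn along these lines. This is a sketch under assumptions: the rows are placeholders, the column names `type`, `release_year`, and `rating` are guesses, and the figure sizes and marker style are arbitrary choices rather than the project's settings.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows (illustrative only); column names are assumed.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
    "release_year": [2019, 2019, 2020, 2020, 2020],
    "rating": ["PG", "TV-MA", "PG", "R", "TV-MA"],
})

# Chart 1: trend of Movies vs TV Shows per year.
per_year = df.groupby(["release_year", "type"]).size().unstack(fill_value=0)
fig, ax = plt.subplots(figsize=(8, 4))
per_year.plot(ax=ax, marker="o")
ax.set_title("Movies vs TV Shows per year")
ax.set_xlabel("Release year")
ax.set_ylabel("Number of titles")
ax.legend(title="Type")
fig.tight_layout()

# Chart 2: ratings distribution, one panel per content type.
fig2, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for panel, kind in zip(axes, ["Movie", "TV Show"]):
    counts = df.loc[df["type"] == kind, "rating"].value_counts()
    panel.bar(list(counts.index), counts.values)
    panel.set_title(f"{kind} ratings")
    panel.set_xlabel("Rating")
axes[0].set_ylabel("Number of titles")
fig2.tight_layout()
```

Calling `tight_layout()` after all titles and labels are set lets Matplotlib adjust the spacing so the text does not overlap or get clipped. In a script, `plt.show()` would display both figures.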
Storytelling
- Added Markdown explanations for every chart and table to make insights clear.
- Focused on readability and presentation so that someone can understand the story behind the data.
Tools Used
- Python 3
- pandas
- NumPy
- Matplotlib
What I Learned
- Handling missing data and creating derived columns.
- Using pandas and NumPy for data exploration.
- Plotting clean, informative charts.
- Writing Markdown for storytelling and sharing insights.
How to Run
- Clone this repo:
git clone https://github.com/Mukesh2006-dev/Netflix_dataset_analysis