## 📊 Netflix Data Summary

This section provides an overview of the Netflix dataset as loaded from `netflix_titles.csv`. It covers the dataset's structure, key statistics, and missing values, offering a foundation for further analysis and visualisation.


In [9]:
import pandas as pd

# Example: Load your Netflix data into a DataFrame
# Replace the path below with the correct path to your data file
file_path = 'C:/Users/jmcde/OneDrive/Desktop/vscode-projects/Netflix-shows-movies/netflix_titles.csv'
try:
    df = pd.read_csv(file_path)
except FileNotFoundError:
    print(f"File not found: {file_path}")
    df = pd.DataFrame()  # create empty DataFrame as fallback

# Number of rows and columns
df.shape


(8807, 12)

In [10]:
# Display first 5 records
df.head()


| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


In [11]:
# Overview of data types and non-null counts
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
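
The `info()` output lists `date_added` as `object`, meaning the dates are stored as plain strings such as "September 25, 2021". The sketch below shows one possible way to convert such a column to a datetime type. It runs on a small hypothetical `sample` frame rather than the real file, and the leading space in one value is an assumption included only to show why stripping whitespace first can help.

```python
import pandas as pd

# Hypothetical stand-in for the real data: same column name, invented rows
sample = pd.DataFrame({
    "date_added": ["September 25, 2021", " August 4, 2017", None],
})

# Strip stray whitespace, then parse "Month day, Year" strings.
# errors="coerce" turns missing or unparseable entries into NaT.
sample["date_added"] = pd.to_datetime(
    sample["date_added"].str.strip(),
    format="%B %d, %Y",
    errors="coerce",
)

# Derive the year a title was added (NaN where the date is missing)
sample["year_added"] = sample["date_added"].dt.year

print(sample)
```

After a conversion like this, `date_added` would show up as `datetime64[ns]` in `info()` instead of `object`.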


In [12]:
# Check for missing values in each column
df.isnull().sum()


show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
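
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of a helper that reports both; `missing_summary` and the `sample` frame are hypothetical names introduced for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", "United States", np.nan, "Canada"],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, largest first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


summary = missing_summary(sample)
print(summary)
```

Applied to the full dataset, the same calculation would put `director` at roughly 30% missing (2634 of 8807 rows), with `country` and `cast` each a little under 10%.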

In [13]:
# Descriptive statistics for numerical columns
df.describe()


| | release_year |
|---|---|
| count | 8807.0 |
| mean | 2014.180198 |
| std | 8.819312 |
| min | 1925.0 |
| 25% | 2013.0 |
| 50% | 2017.0 |
| 75% | 2019.0 |
| max | 2021.0 |


In [14]:
# Number of unique values per feature
df.nunique()


show_id         8807
type               2
title           8807
director        4528
cast            7692
country          748
date_added      1767
release_year      74
rating            17
duration         220
listed_in        514
description     8775
dtype: int64

### 📋 Key Observations:

- **Total Records:** 8807  
- **Total Features:** 12  
- **Content Types:** Movies and TV Shows  
- **Most Recent Release Year:** 2021  
- **Oldest Release Year:** 1925  
- **Top Producing Country:** United States  
- **Most Common Genres:** Dramas, International Movies, Comedies  

The dataset contains a mix of Movies and TV Shows released between 1925 and 2021, with content originating from various countries, predominantly the United States. The most frequent genres include Dramas, International Movies, and Comedies. Missing values are concentrated in `director` (2634), `country` (831), and `cast` (825), with a handful in `date_added`, `rating`, and `duration`; these are addressed during the data cleaning process.
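
The sample rows show that `country` and `listed_in` can hold several comma-separated values in one cell, so counting whole strings would treat each combination as its own category. The sketch below shows one possible way to count individual countries and genres instead. It is not necessarily how the figures above were produced; `top_values` and the `sample` frame are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical rows mimicking the comma-separated layout of the real columns
sample = pd.DataFrame({
    "country": ["United States", "United States, India", None, "United States"],
    "listed_in": [
        "Dramas, Comedies",
        "Dramas, International Movies",
        "Docuseries",
        "Dramas",
    ],
})


def top_values(column: pd.Series, n: int = 3) -> pd.Series:
    """Count individual entries in a comma-separated text column."""
    return (
        column.dropna()        # ignore rows with no value
        .str.split(",")        # "A, B" -> ["A", " B"]
        .explode()             # one row per list element
        .str.strip()           # remove the space left after each comma
        .value_counts()
        .head(n)
    )


top_countries = top_values(sample["country"])
top_genres = top_values(sample["listed_in"])

print(top_countries)
print(top_genres)
```

Splitting before counting means a co-production is credited to every country listed, which is one reasonable convention among several.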


## 📌 Conclusion

This project successfully explored and analysed the Netflix dataset, uncovering key insights about the content available on the platform. After a thorough data cleaning process, missing values were handled, and the dataset was transformed into a suitable format for analysis.

**Key takeaways include:**
- Netflix's content library predominantly consists of **Movies**, with a smaller proportion of **TV Shows**.
- The **United States** contributes the largest share of Netflix content, followed by countries like India, the United Kingdom, and Canada.
- The most common genres on Netflix are **Dramas**, **International Movies**, and **Comedies**.
- Netflix experienced a noticeable increase in content releases between **2015 and 2020**, aligning with its global expansion strategy.
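
The type split and the release trend above can be checked with two `value_counts` calls. The following is a minimal sketch on a hypothetical `sample` frame that reuses the `type` and `release_year` column names from the dataset; the rows themselves are invented.

```python
import pandas as pd

# Hypothetical rows; only the column names match the real dataset
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "release_year": [2019, 2020, 2020, 2016],
})

# Share of each content type, as a percentage of all titles
type_share = sample["type"].value_counts(normalize=True).mul(100).round(1)

# Number of titles per release year, in chronological order
titles_per_year = sample["release_year"].value_counts().sort_index()

print(type_share)
print(titles_per_year)
```

Running the same two lines against the full `df` would give the proportions and yearly counts behind the first and last takeaways.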

**Visualisations** effectively communicated these findings through bar charts, pie charts, and distribution plots, providing a clear and accessible summary of the data.

This analysis provides valuable insights into Netflix's content distribution, production trends, and genre popularity. It forms a foundation for more advanced work in future projects, such as analysing viewer ratings and content popularity trends or building predictive models.
