# 📌 Feature Engineering — Netflix Dataset

**Objective:**  
Extend the cleaned Netflix dataset (`data_cleaned.csv`, the file loaded in the first cell) with derived features for use in later exploratory analysis.

---

## **Steps Covered in this Notebook**
1. **Import Libraries & Load Dataset**
2. **Extract `year_added` and `month_added` from `date_added`**
3. **Calculate `content_age`**
4. **Ensure proper types for `duration_value` and `duration_unit`**
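5. **Create binary `is_movie` flag**
6. **Count genres per title (`genre_count`)**
7. **Save the feature-engineered dataset**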


In [2]:

import pandas as pd

df = pd.read_csv("data_cleaned.csv", parse_dates=['date_added'])

df.head()


Unnamed: 0.1,Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_value,duration_unit
0,0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,Min
1,1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,Seasons
2,2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,Season
3,3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1.0,Season
4,4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2.0,Seasons
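
The `Unnamed: 0.1` and `Unnamed: 0` columns in the preview above look like row indexes written out by earlier `to_csv` calls, though that is an inference from their values rather than something the notebook states. A minimal sketch of one way to drop them, using a small hypothetical `sample` frame instead of the real data:

```python
import pandas as pd

# Toy frame imitating the stray index columns seen in the preview (hypothetical data).
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "show_id": ["s1", "s2"],
    "type": ["Movie", "TV Show"],
})

# Keep every column whose name does not start with "Unnamed".
cleaned = sample.loc[:, ~sample.columns.str.startswith("Unnamed")]

print(list(cleaned.columns))
```

Applying the same mask to `df` would be optional; the rest of the notebook does not depend on these columns.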


In [3]:
# Split the parsed date into calendar year and month number (1-12).
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
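
If any `date_added` value were missing, the `.dt` accessors would return floats with `NaN` rather than integers. The sketch below shows that behaviour on a hypothetical three-row series and one possible remedy, pandas' nullable `Int64` dtype. Whether the cleaned file still contains missing dates is not something this notebook shows, so treat this as a precaution.

```python
import pandas as pd

# Hypothetical dates, one of them missing.
dates = pd.Series(pd.to_datetime(["2021-09-25", None, "2019-01-01"]))

# Plain .dt.year yields float64 when a NaT is present.
raw_year = dates.dt.year

# Casting to the nullable integer dtype keeps whole numbers and marks gaps as <NA>.
year_added = raw_year.astype("Int64")
month_added = dates.dt.month.astype("Int64")

print(raw_year.dtype, year_added.dtype)
```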


In [4]:
# Years between a title's release and the year it was added to Netflix.
df['content_age'] = df['year_added'] - df['release_year']
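
Because `content_age` is a plain difference of two year columns, it can come out negative when a title is listed as added before its release year. A small sanity check might look like the following sketch; the titles and years are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; title "C" is added a year before its release year.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "year_added": [2021, 2019, 2020],
    "release_year": [2020, 2019, 2021],
})

sample["content_age"] = sample["year_added"] - sample["release_year"]

# Rows worth inspecting before the feature is used downstream.
negative = sample[sample["content_age"] < 0]

print(negative[["title", "content_age"]])
```

How to treat such rows (keep, clip to zero, or drop) is an analysis decision that this notebook leaves open.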


In [5]:
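# Force a numeric dtype; values that cannot be parsed become NaN.
df['duration_value'] = pd.to_numeric(df['duration_value'], errors='coerce')
# Lowercase the unit labels ("Min" -> "min", "Seasons" -> "seasons").
df['duration_unit'] = df['duration_unit'].str.lower()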
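
After lowercasing, the unit column still holds both `season` and `seasons`, as the final preview shows. If a single label per unit is wanted for grouping, one option is to map the plural onto the singular. This is a sketch on hypothetical values, not a step the notebook performs:

```python
import pandas as pd

# Hypothetical raw values, including an unparseable number and a missing unit.
values = pd.Series(["90", "2", "n/a"])
units = pd.Series(["Min", "Seasons", "Season", None])

# errors="coerce" turns anything non-numeric into NaN instead of raising.
duration_value = pd.to_numeric(values, errors="coerce")

# Lowercase, then collapse the plural form into the singular.
duration_unit = units.str.lower().replace({"seasons": "season"})

print(duration_value.tolist())
print(duration_unit.tolist())
```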



In [6]:
df['is_movie'] = (df['type'] == 'Movie').astype(int)


In [7]:
# Count comma-separated genres; assumes `listed_in` has no missing values.
df['genre_count'] = df['listed_in'].apply(lambda x: len(x.split(',')))
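
The `apply`/`lambda` version above raises an `AttributeError` if `listed_in` contains a missing value, because `NaN` is a float and has no `split` method. A vectorized alternative that tolerates gaps could look like this sketch; counting a missing entry as 0 genres is an assumption made here, not a choice taken from the notebook.

```python
import pandas as pd

# Hypothetical genre strings, one of them missing.
listed_in = pd.Series([
    "Documentaries",
    "International TV Shows, TV Dramas, TV Mysteries",
    None,
])

# .str methods propagate missing values instead of raising;
# fillna(0) then treats "no genre listed" as a count of zero.
genre_count = listed_in.str.split(",").str.len().fillna(0).astype(int)

print(genre_count.tolist())
```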


In [8]:
df.to_csv("netflix_feature_engineered.csv", index=False)


In [9]:
df.head()

Unnamed: 0.1,Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_value,duration_unit,year_added,month_added,content_age,is_movie,genre_count
0,0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,min,2021,9,1,1,1
1,1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,seasons,2021,9,0,0,3
2,2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,season,2021,9,0,0,3
3,3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1.0,season,2021,9,0,0,2
4,4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2.0,seasons,2021,9,0,0,3



## 💾 Save Dataset

We will save the feature-engineered dataset for use in EDA.



In [10]:
df.to_csv("netflix_feature_engineered.csv", index=False)
print("✅ Feature-engineered dataset saved as netflix_feature_engineered.csv")


✅ Feature-engineered dataset saved as netflix_feature_engineered.csv
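
A quick way to confirm that a saved file reloads cleanly is a round trip through `to_csv` and `read_csv`. The sketch below uses a temporary directory and a hypothetical one-row frame so that it does not touch the real output file. It also shows that `index=False` prevents an extra `Unnamed: 0` column from appearing on reload.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame standing in for the engineered dataset.
sample = pd.DataFrame({
    "show_id": ["s1"],
    "date_added": pd.to_datetime(["2021-09-25"]),
    "is_movie": [1],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_features.csv")
    # index=False keeps the row index out of the file.
    sample.to_csv(path, index=False)
    # Dates are stored as text in CSV, so they must be parsed again on load.
    reloaded = pd.read_csv(path, parse_dates=["date_added"])

print(list(reloaded.columns))
```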


In [11]:
df.columns

Index(['Unnamed: 0', 'show_id', 'type', 'title', 'director', 'cast', 'country',
       'date_added', 'release_year', 'rating', 'duration', 'listed_in',
       'description', 'duration_value', 'duration_unit', 'year_added',
       'month_added', 'content_age', 'is_movie', 'genre_count'],
      dtype='object')