
# Step 6: Building a Simple Predictive Model (MLPClassifier)

In this section, you will build a simple neural network model to predict whether a title is a **Movie** or a **TV Show** using the Netflix dataset.  
We will use the `MLPClassifier` from `scikit-learn`.

---

## 1. Import required libraries

We will use **pandas** for data manipulation and **scikit-learn** for model building and evaluation.


In [2]:
! pip install pandas
! pip install matplotlib
! pip install seaborn
! pip install scikit-learn
! pip install numpy
! pip install scipy


In [3]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.metrics import classification_report, confusion_matrix


## 2. Load and inspect the dataset

In [3]:

# Load Netflix dataset (from Kaggle public source)
df = pd.read_csv("netflix_titles.csv")
df.head()


|   | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | | | | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


## 3. Data cleaning and preparation

In [4]:

# Drop rows with missing type values
df = df.dropna(subset=['type'])

# Select the model inputs and the target column
features = ['release_year', 'duration_num']
target = 'type'

# Extract the leading number from duration (e.g. "90 min" -> 90.0, "2 Seasons" -> 2.0)
df['duration_num'] = df['duration'].str.extract(r'(\d+)', expand=False).astype(float)

# Replace missing feature values with 0
X = df[features].fillna(0)
y = df[target]

# Encode target variable
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
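
To make the two transformations in this cell easier to follow, here is a minimal sketch on a few made-up rows. The `toy` DataFrame and the `toy_encoder` name are assumptions for illustration only; they are not part of the Netflix dataset or of the notebook's variables.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows that mimic the `type` and `duration` columns (illustrative only)
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
    "duration": ["90 min", "2 Seasons", None, "1 Season"],
})

# Pull the first run of digits out of each string; a missing duration stays NaN
toy["duration_num"] = toy["duration"].str.extract(r"(\d+)", expand=False).astype(float)

# LabelEncoder sorts the class names, so "Movie" becomes 0 and "TV Show" becomes 1
toy_encoder = LabelEncoder()
toy["type_encoded"] = toy_encoder.fit_transform(toy["type"])

print(toy)
print(list(toy_encoder.classes_))
```

Note that the extracted number means minutes for movies and seasons for TV shows, so the same column carries two different units.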




## 4. Split data into training and test sets

In [5]:

X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)
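
The split above is purely random. Because the two classes are not equally frequent, one optional variation is to pass `stratify` so that both partitions keep the same class ratio. The sketch below uses synthetic labels (`demo_X`, `demo_y` are assumed names, not notebook variables) to show the effect; it is not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels (70 of class 0, 30 of class 1) standing in for y_encoded
demo_y = np.array([0] * 70 + [1] * 30)
demo_X = np.arange(100).reshape(-1, 1)

# stratify=demo_y asks for the same class proportions in train and test
Xtr, Xte, ytr, yte = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)

print("share of class 1 in train:", ytr.mean())
print("share of class 1 in test: ", yte.mean())
```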


## 5. Train the MLPClassifier

In [6]:

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
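
Neural networks such as `MLPClassifier` are generally sensitive to the scale of their inputs, and here one feature is a year (around 2000) while the other is a small count or a number of minutes. A common option, sketched below under assumptions, is to standardize the features inside a pipeline. The synthetic `demo_X` and `demo_y` arrays and the labeling rule are invented for this example and do not come from the dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-ins for two features on very different scales
demo_X = np.column_stack([
    rng.integers(2000, 2022, size=200),  # year-like values
    rng.integers(1, 150, size=200),      # duration-like values
]).astype(float)
demo_y = (demo_X[:, 1] > 20).astype(int)  # arbitrary rule, only to obtain labels

# The scaler is fitted on the training data only and reapplied at predict time
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=42),
)
scaled_mlp.fit(demo_X, demo_y)
print("training accuracy:", scaled_mlp.score(demo_X, demo_y))
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, because `fit` only ever sees the training rows.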


## 6. Evaluate the model

In [7]:

y_pred = mlp.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=encoder.classes_))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


Classification Report:
               precision    recall  f1-score   support

       Movie       1.00      1.00      1.00      1214
     TV Show       1.00      1.00      1.00       548

    accuracy                           1.00      1762
   macro avg       1.00      1.00      1.00      1762
weighted avg       1.00      1.00      1.00      1762

Confusion Matrix:
 [[1213    1]
 [   0  548]]
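
The rounded values in the report hide the single error visible in the confusion matrix. In scikit-learn's convention, rows are true classes and columns are predicted classes, so the matrix says that one Movie was predicted as a TV Show. A short sketch that recomputes the metrics from the matrix printed above:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true Movie / TV Show, columns: predicted Movie / TV Show)
cm = np.array([[1213, 1],
               [0, 548]])

total = cm.sum()         # number of test samples
correct = np.trace(cm)   # diagonal entries are the correct predictions
accuracy = correct / total

# Recall per class: correct predictions divided by the row total
recall = np.diag(cm) / cm.sum(axis=1)
# Precision per class: correct predictions divided by the column total
precision = np.diag(cm) / cm.sum(axis=0)

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")
print("recall   :", recall.round(4))
print("precision:", precision.round(4))
```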



## 7. Reflection

- Which features seem most helpful for predicting the type?
- How well does the model perform?
- What preprocessing steps could improve the accuracy?
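
One possible starting point for the first question is to compare each feature's range within each class. If the ranges of a feature barely overlap between classes, a classifier can separate them using that feature almost alone. The sketch below uses made-up rows (`sample` is a hypothetical DataFrame, not real dataset values); in the notebook, the same `groupby` could be applied to `df`.

```python
import pandas as pd

# Made-up rows in the shape produced by the cleaning step (not real dataset values)
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "release_year": [2015, 2019, 2020, 2018, 2020, 2021],
    "duration_num": [95.0, 110.0, 88.0, 1.0, 3.0, 2.0],
})

# Min, mean, and max of each feature within each class
summary = sample.groupby("type")[["release_year", "duration_num"]].agg(["min", "mean", "max"])
print(summary)
```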
