In [None]:
import pandas as pd
df = pd.read_csv("netflix_content.csv")
df.head()

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie |


# Clean and Preprocess the Data

Before modelling, we need to convert the data into a numerical format. So, let’s clean and preprocess the data:

In [None]:
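# 'Hours Viewed' may be read as strings with thousands separators; strip them and cast to integers
df['Hours Viewed'] = df['Hours Viewed'].astype(str).str.replace(',', '', regex=False).astype('int64')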

# drop rows with missing titles or duplicate titles
df.dropna(subset=['Title'], inplace=True)
df.drop_duplicates(subset=['Title'], inplace=True)

# create simple content IDs for TensorFlow embeddings
df['Content_ID'] = df.reset_index().index.astype('int32')

# encode 'Language Indicator' and 'Content Type'
df['Language_ID'] = df['Language Indicator'].astype('category').cat.codes
df['ContentType_ID'] = df['Content Type'].astype('category').cat.codes

df[['Content_ID', 'Title', 'Hours Viewed', 'Language_ID', 'ContentType_ID']].head()

| | Content_ID | Title | Hours Viewed | Language_ID | ContentType_ID |
|---|---|---|---|---|---|
| 0 | 0 | The Night Agent: Season 1 | 812100000 | 0 | 1 |
| 1 | 1 | Ginny & Georgia: Season 2 | 665100000 | 0 | 1 |
| 2 | 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 622800000 | 3 | 1 |
| 3 | 3 | Wednesday: Season 1 | 507700000 | 0 | 1 |
| 4 | 4 | Queen Charlotte: A Bridgerton Story | 503000000 | 0 | 0 |
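
The integer codes produced by `cat.codes` follow the sorted order of the category labels, which is why `Movie` maps to 0 and `Show` to 1 in the output above. The sketch below uses a toy frame (its rows are made up for illustration, not taken from the dataset) to show how the code-to-label mapping can be recovered. It also shows why missing values need attention: pandas gives them the code -1, which is not a valid index for an embedding layer.

```python
import pandas as pd

# Toy data for illustration only; not rows from netflix_content.csv.
toy = pd.DataFrame({"Content Type": ["Show", "Movie", "Show", None]})

cat = toy["Content Type"].astype("category")
codes = cat.cat.codes

# Categories are sorted, so code i corresponds to cat.cat.categories[i].
code_to_label = dict(enumerate(cat.cat.categories))

print(codes.tolist())   # missing values show up as -1
print(code_to_label)
```

If the real columns contain missing values, one option is to fill them with a placeholder label such as `"Unknown"` before encoding.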


# Build a Neural Recommendation Model Using TensorFlow

We will use embeddings to capture complex relationships between features like language, type, and content ID:

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, Model

num_contents = df['Content_ID'].nunique()
num_languages = df['Language_ID'].nunique()
num_types = df['ContentType_ID'].nunique()

content_input = layers.Input(shape=(1,), dtype=tf.int32, name='content_id')
language_input = layers.Input(shape=(1,), dtype=tf.int32, name='language_id')
type_input = layers.Input(shape=(1,), dtype=tf.int32, name='content_type')

content_embedding = layers.Embedding(input_dim=num_contents+1, output_dim=32)(content_input)
language_embedding = layers.Embedding(input_dim=num_languages+1, output_dim=8)(language_input)
type_embedding = layers.Embedding(input_dim=num_types+1, output_dim=4)(type_input)

content_vec = layers.Flatten()(content_embedding)
language_vec = layers.Flatten()(language_embedding)
type_vec = layers.Flatten()(type_embedding)

combined = layers.Concatenate()([content_vec, language_vec, type_vec])
x = layers.Dense(64, activation='relu')(combined)
x = layers.Dense(32, activation='relu')(x)
output = layers.Dense(num_contents, activation='softmax')(x)

model = Model(inputs=[content_input, language_input, type_input], outputs=output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Embeddings compress high-dimensional categorical data (like content IDs or languages) into dense vectors where similar values cluster together. This allows our model to learn which content is similar.
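
Under the hood, an embedding layer is a trainable lookup table: row i of a weight matrix is the vector for ID i, which is equivalent to multiplying a one-hot vector by that matrix. The NumPy sketch below illustrates this with a small random matrix. The sizes and the names `embedding_matrix` and `embed` are hypothetical, and the code stands in for `layers.Embedding` rather than reproducing it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 IDs, each mapped to a 4-dimensional vector.
num_ids, dim = 6, 4
embedding_matrix = rng.normal(size=(num_ids, dim))

def embed(ids):
    """Look up one row of the matrix per integer ID."""
    return embedding_matrix[np.asarray(ids)]

# A lookup gives the same result as one-hot encoding followed by a matmul.
ids = [2, 5]
one_hot = np.eye(num_ids)[ids]
assert np.allclose(embed(ids), one_hot @ embedding_matrix)
print(embed(ids).shape)  # (2, 4)
```

During training, the rows of this matrix are updated by gradient descent like any other weights.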

# Train the Recommendation Model

We’ll use each title’s own content ID as the label, so the model learns to predict a title from its ID, language, and type. No user ratings are needed, which makes this a self-supervised learning approach:

In [None]:
model.fit(
    x={
        'content_id': df['Content_ID'],
        'language_id': df['Language_ID'],
        'content_type': df['ContentType_ID']
    },
    y=df['Content_ID'],
    epochs=5,
    batch_size=64
)

Epoch 1/5
300/300 ━━━━━━━━━━━━━━━━━━━━ 15s 44ms/step - accuracy: 0.0000e+00 - loss: 9.8788
Epoch 2/5
300/300 ━━━━━━━━━━━━━━━━━━━━ 13s 44ms/step - accuracy: 0.0000e+00 - loss: 9.8657
Epoch 3/5
300/300 ━━━━━━━━━━━━━━━━━━━━ 13s 44ms/step - accuracy: 0.0014 - loss: 9.6772
Epoch 4/5
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 45ms/step - accuracy: 0.0135 - loss: 8.1816
Epoch 5/5
300/300 ━━━━━━━━━━━━━━━━━━━━ 13s 43ms/step - accuracy: 0.1424 - loss: 5.8109


<keras.src.callbacks.history.History at 0x7a8671f0c9e0>
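
A quick sanity check on the first-epoch loss: a softmax that spreads probability uniformly over N classes has a cross-entropy of ln N. The log shows 300 batches of 64, so the frame holds at most 19,200 rows. The exact row count is not shown, so the figure used below is an assumption. The sketch computes the uniform baseline for that upper bound.

```python
import math

# Assumption: 300 batches x 64 rows is an upper bound on the row count;
# the exact number of titles is not shown in the log.
max_rows = 300 * 64

def uniform_cross_entropy(num_classes):
    """Cross-entropy of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)

baseline = uniform_cross_entropy(max_rows)
print(round(baseline, 2))  # 9.86
```

The losses reported for the first two epochs (about 9.88 and 9.87) sit near this value, which suggests the model starts out close to guessing uniformly and only begins to separate titles from epoch 3 onward.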

The trained model acts like a smart map for content, plotting every title based on its actual characteristics. On this map, similar titles are close neighbors, which lets us suggest new content by saying, "Here are other things that are in your immediate neighborhood."



# Recommend Similar Content

Once the model is trained, you can input any show/movie and get a list of similar titles. Here’s how:

In [None]:
import numpy as np

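def recommend_similar(content_title, top_k=5):
    # find the first title containing the query (case-insensitive, literal match)
    matches = df[df['Title'].str.contains(content_title, case=False, na=False, regex=False)]
    if matches.empty:
        raise ValueError(f"No title matching '{content_title}' found")
    content_row = matches.iloc[0]
    content_id = content_row['Content_ID']
    language_id = content_row['Language_ID']
    content_type_id = content_row['ContentType_ID']

    # score every title in the catalogue for this input
    predictions = model.predict({
        'content_id': np.array([content_id]),
        'language_id': np.array([language_id]),
        'content_type': np.array([content_type_id])
    })

    # take top_k + 1 scores, since the query title itself is usually among them
    top_indices = predictions[0].argsort()[-top_k-1:][::-1]
    # isin() filters rows, so the result is in DataFrame order, not ranked by score
    recommendations = df[df['Content_ID'].isin(top_indices)]
    return recommendations[['Title', 'Language Indicator', 'Content Type', 'Hours Viewed']]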

recommend_similar("Wednesday")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 187ms/step


| | Title | Language Indicator | Content Type | Hours Viewed |
|---|---|---|---|---|
| 3 | Wednesday: Season 1 | English | Show | 507700000 |
| 6537 | Hi! Come in: Season 2 // 嗨! 營業中: Season 2 | Non-English | Show | 1600000 |
| 6666 | Bakugan: Evolutions: Season 1 | English | Show | 1500000 |
| 7672 | Juanita | English | Movie | 1100000 |
| 9490 | First Sunday | English | Movie | 600000 |
| 12962 | Expeditionen ins Tierreich: Wildes Skandinavie... | English | Show | 200000 |
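
Because `isin` filters rows rather than ordering them, the table above appears in DataFrame order and includes the queried title. If a ranked list without the query is preferred, one option is to sort the score vector and drop the query's own ID before taking the top entries. The helper below is a sketch on a made-up score vector; `top_k_excluding` is a hypothetical name and is not part of the notebook.

```python
import numpy as np

def top_k_excluding(scores, query_id, top_k=5):
    """Return the IDs of the top_k highest scores, best first, skipping query_id."""
    ranked = np.argsort(scores)[::-1]      # highest score first
    ranked = ranked[ranked != query_id]    # drop the query itself
    return ranked[:top_k]

# Made-up scores for six items; item 3 plays the role of the query.
scores = np.array([0.07, 0.20, 0.10, 0.40, 0.15, 0.08])
print(top_k_excluding(scores, query_id=3, top_k=3))  # [1 4 2]
```

In the notebook, `scores` would be `predictions[0]`, and the returned IDs could be mapped back to rows with `df.set_index('Content_ID').loc[ids]` to preserve the ranking.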


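The content embedding maps each title into a 32-dimensional space, alongside an 8-dimensional language embedding and a 4-dimensional content-type embedding. Titles that the model places close together are likely to share:

- Language
- Content Type

`Hours Viewed` is not one of the model's inputs, so viewership patterns are not captured directly. Still, even without user feedback, your model can say: “Hey, these titles are kind of alike.”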
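
Closeness in an embedding space is commonly measured with cosine similarity. The sketch below ranks neighbors in a small hand-written matrix. In the notebook, the 32-dimensional matrix could be read from the content embedding layer's weights, but that layer is not given a name in the code above, so that step is left out here. The vectors and the function name `nearest_neighbors` are illustrative assumptions.

```python
import numpy as np

def nearest_neighbors(embeddings, query_id, top_k=2):
    """Rank items by cosine similarity to the query item, excluding the query."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[query_id]
    sims[query_id] = -np.inf  # never return the query itself
    return np.argsort(sims)[::-1][:top_k]

# Hand-written 2-D vectors for illustration: items 0 and 1 point almost the same way.
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
print(nearest_neighbors(embeddings, query_id=0))  # [1 2]
```

Cosine similarity ignores vector length and compares direction only, which is a common choice when embedding norms vary between items.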

With just content features and deep learning, you’ve now built a user-independent recommendation system using TensorFlow. It shows how to work with embeddings on real-world data and lays the foundation for building smarter, more scalable, and personalized AI systems.