## Recommendation System using Python and TensorFlow

Recommendation systems are the invisible engine behind the success of platforms like Netflix, Amazon, Spotify, and YouTube. They personalize your experience by suggesting what to watch, buy, or listen to next.

We’ll use a real Netflix dataset containing titles, content types, languages, and viewing hours. By the end, you’ll have a deep learning model that can answer questions like: If someone liked Wednesday, what else might they enjoy?

## Step 1: Load and Understand the Dataset

We’re using a Netflix 2023 dataset with the following fields:

* Title
* Available Globally?
* Release Date
* Hours Viewed
* Language Indicator
* Content Type

Let’s load the data and move forward:

In [1]:
import pandas as pd

netflix = pd.read_csv("netflix_content.csv")

netflix.head()

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type |
|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie |


This metadata is enough for content-based filtering, even without user behaviour data.

## Step 2: Clean and Preprocess the Data

Before modelling, we need the data in numerical form. Let’s inspect the columns first, then clean and encode them:

In [2]:
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24812 entries, 0 to 24811
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Title                24812 non-null  object
 1   Available Globally?  24812 non-null  object
 2   Release Date         8166 non-null   object
 3   Hours Viewed         24812 non-null  object
 4   Language Indicator   24812 non-null  object
 5   Content Type         24812 non-null  object
dtypes: object(6)
memory usage: 1.1+ MB


In [3]:
# "Hours Viewed" is stored as text; strip the comma separators and cast to integers
netflix["Hours Viewed"] = netflix["Hours Viewed"].str.replace(",", "", regex=False).astype("int64")

# Drop rows with a missing title, then drop duplicate titles
netflix.dropna(subset=["Title"], inplace=True)
netflix.drop_duplicates(subset=["Title"], inplace=True)

# Create simple 0-based content IDs for the TensorFlow embeddings
netflix["Content_ID"] = netflix.reset_index().index.astype("int32")

# Encode "Language Indicator" and "Content Type" as integer category codes
netflix["Language_ID"] = netflix["Language Indicator"].astype("category").cat.codes
netflix["ContentType_ID"] = netflix["Content Type"].astype("category").cat.codes

netflix.head()

| | Title | Available Globally? | Release Date | Hours Viewed | Language Indicator | Content Type | Content_ID | Language_ID | ContentType_ID |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 | English | Show | 0 | 0 | 1 |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 | English | Show | 1 | 0 | 1 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 | Korean | Show | 2 | 3 | 1 |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 | English | Show | 3 | 0 | 1 |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 | English | Movie | 4 | 0 | 0 |


Keras embedding layers look up rows by integer index, so they can’t take raw strings. That is why we converted the content metadata into integer category codes for use in embeddings.
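
To make the encoding step concrete, here is a minimal sketch on a toy column. The values are illustrative and not taken from the dataset. It shows how the category codes are assigned and that the mapping can be reversed:

```python
import pandas as pd

# Toy column standing in for "Language Indicator" (illustrative values only)
languages = pd.Series(["English", "Korean", "English", "Japanese"])

# A categorical dtype sorts the unique values and gives each one
# an integer code starting at 0
as_category = languages.astype("category")
codes = as_category.cat.codes

print(list(as_category.cat.categories))  # ['English', 'Japanese', 'Korean']
print(codes.tolist())                    # [0, 2, 0, 1]

# The codes index back into the category list, so titles can be decoded later
decoded = as_category.cat.categories.take(codes.to_numpy()).tolist()
print(decoded == languages.tolist())     # True
```

Missing values would receive the code -1, which is not a valid embedding index. The info() output above shows no missing values in the two encoded columns, so that case does not arise here.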

## Step 3: Build a Neural Recommendation Model Using TensorFlow

We will use embeddings to capture complex relationships between features like language, type, and content ID:

In [4]:
import tensorflow as tf
from tensorflow.keras import layers, Model

In [7]:
# Count the unique values in each ID column; these set the embedding and output sizes

num_contents = netflix["Content_ID"].nunique()
num_languages = netflix["Language_ID"].nunique()
num_types = netflix["ContentType_ID"].nunique()

In [8]:
# Define one integer input per feature (Keras functional API).
# Each sample supplies a single ID per feature, hence shape=(1,).

content_input = layers.Input(shape=(1,), dtype=tf.int32, name="content_id")
language_input = layers.Input(shape=(1,), dtype=tf.int32, name="language_id")
type_input = layers.Input(shape=(1,), dtype=tf.int32, name="content_type")

The next cell does three things for each input:

* Creates an embedding layer
* Applies it to the input (for example, content_input)
* Produces a dense vector representation of each ID (for example, of content_id)

In [10]:
content_embedding = layers.Embedding(input_dim= num_contents+1 , output_dim= 32)(content_input)
language_embedding = layers.Embedding(input_dim= num_languages+1 , output_dim= 8)(language_input)
type_embedding = layers.Embedding(input_dim= num_types+1 , output_dim= 4)(type_input)

The next cell flattens the output of each embedding layer, turning the (1, output_dim) tensor for each sample into a 1D vector, so it can be passed into dense (fully connected) layers or other downstream layers:

In [11]:
content_vec = layers.Flatten()(content_embedding)
language_vec = layers.Flatten()(language_embedding)
type_vec = layers.Flatten()(type_embedding)
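
The embedding and flatten steps can be pictured as a table lookup followed by a reshape. The sketch below uses plain NumPy rather than the Keras layers, with a random table standing in for learned weights. The sizes are hypothetical and chosen only to show the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 languages (+1 spare row, as in the model), 8-dim vectors
num_languages, embed_dim = 5, 8
table = rng.normal(size=(num_languages + 1, embed_dim))  # stand-in for learned weights

# A batch of 3 samples, each holding one language ID: shape (3, 1),
# matching Input(shape=(1,))
language_ids = np.array([[0], [3], [0]])

embedded = table[language_ids]               # row lookup -> shape (3, 1, 8)
flat = embedded.reshape(len(embedded), -1)   # what Flatten does -> shape (3, 8)

print(embedded.shape, flat.shape)            # (3, 1, 8) (3, 8)
print(np.array_equal(flat[0], flat[2]))      # True: same ID, same vector
```

Two samples with the same ID always receive the same vector, which is why training can only move whole rows of the table.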

In [13]:
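# Join the three feature vectors into one 44-dimensional vector (32 + 8 + 4)
combined = layers.Concatenate()([content_vec, language_vec, type_vec])

# Two hidden layers learn interactions between the features
X = layers.Dense(64, activation="relu")(combined)
X = layers.Dense(32, activation="relu")(X)

# One softmax probability per content ID
output = layers.Dense(num_contents, activation="softmax")(X)

model = Model(inputs=[content_input, language_input, type_input], outputs=output)
# Sparse categorical cross-entropy takes integer labels directly (no one-hot encoding)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])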

Embeddings compress high-dimensional categorical data (like content IDs or languages) into dense vectors where similar values cluster together. This allows our model to learn which content is similar.

## Step 4: Train the Recommendation Model

We’ll use the content itself as the label so the model learns to predict content from its features. This is a self-supervised learning approach:

In [14]:
model.fit(
    x={
        "content_id": netflix["Content_ID"],
        "language_id": netflix["Language_ID"],
        "content_type": netflix["ContentType_ID"],
    },
    y=netflix["Content_ID"],
    epochs=15,
    batch_size=64,
)

Epoch 1/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 30ms/step - accuracy: 0.0000e+00 - loss: 9.8788
Epoch 2/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 0.0000e+00 - loss: 9.8661
Epoch 3/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 0.0015 - loss: 9.7094
Epoch 4/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 31ms/step - accuracy: 0.0105 - loss: 8.3892
Epoch 5/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 31ms/step - accuracy: 0.1127 - loss: 6.1094
Epoch 6/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.3409 - loss: 4.0164
Epoch 7/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 0.5962 - loss: 2.3226
Epoch 8/15
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 0.8068 - loss: 1.0820
Epoch 9/15
300/3

<keras.src.callbacks.history.History at 0x207455866b0>

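Training structures the embedding space based on real metadata, so similar content should end up with similar embeddings. Keep in mind that the label is the same content ID the model receives as input, so the accuracy above measures how well the model reproduces that ID, not how well it generalizes. The learned vectors are what allow recommendations based on closeness in vector space.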
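
As a sanity check on the training log, the first-epoch loss can be compared with the loss of a model that guesses uniformly. The sketch below assumes at most 19,200 titles, an upper bound inferred from the 300 batches of 64 in the log. The exact count after deduplication is not shown:

```python
import math

# Assumption: 300 batches of 64 (from the log) means at most 19,200 titles
steps_per_epoch, batch_size = 300, 64
approx_num_contents = steps_per_epoch * batch_size

# A model that spreads probability evenly gives every class p = 1/N,
# so its cross-entropy is -ln(1/N) = ln(N)
uniform_loss = math.log(approx_num_contents)
print(round(uniform_loss, 2))  # 9.86
```

The logged loss of 9.8788 in epoch 1 is close to this value, which is consistent with the model starting from near-uniform predictions before the embeddings begin to separate the titles.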

## Step 5: Recommend Similar Content

Once the model is trained, you can input any show/movie and get a list of similar titles. Here’s how:

In [15]:
import numpy as np

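def recommend_similar(content_title, top_k=5):
    # Take the first title containing the search string (case-insensitive)
    matches = netflix[netflix["Title"].str.contains(content_title, case=False, na=False)]
    if matches.empty:
        raise ValueError(f"No title contains {content_title!r}")
    content_row = matches.iloc[0]
    content_id = content_row["Content_ID"]
    language_id = content_row["Language_ID"]
    content_type_id = content_row["ContentType_ID"]

    # Score every content ID for this single input row
    predictions = model.predict({
        "content_id": np.array([content_id]),
        "language_id": np.array([language_id]),
        "content_type": np.array([content_type_id]),
    })

    # Keep the top_k + 1 highest-scoring IDs: the queried title usually ranks
    # first, which leaves top_k other titles
    top_indices = predictions[0].argsort()[-top_k - 1:][::-1]
    # Note: isin() returns the matching rows in DataFrame order, not score order
    recommendations = netflix[netflix["Content_ID"].isin(top_indices)]
    return recommendations[["Title", "Language Indicator", "Content Type", "Hours Viewed"]]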

In [16]:
recommend_similar("Wednesday") 

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 423ms/step


| | Title | Language Indicator | Content Type | Hours Viewed |
|---|---|---|---|---|
| 3 | Wednesday: Season 1 | English | Show | 507700000 |
| 657 | ALVINNN!!! And the Chipmunks: Season 2 | English | Show | 29300000 |
| 3364 | Justice Served: Season 1 | English | Show | 5400000 |
| 4948 | Superstition: Season 1 | English | Show | 2900000 |
| 9940 | Berserk: Season 1 // ベルセルク: シーズン1 | Japanese | Show | 500000 |
| 10906 | Last Tango in Halifax: Season 4 | English | Show | 400000 |
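
The result above includes the queried title itself and lists rows in DataFrame order rather than by score. If a ranked list without the query is preferred, one possible variation is sketched below. The helper top_k_excluding and the score values are hypothetical and not part of the original notebook:

```python
import numpy as np

def top_k_excluding(scores, query_id, k):
    """Return the k highest-scoring IDs in descending order, skipping query_id."""
    order = np.argsort(scores)[::-1]      # all IDs, best first
    order = order[order != query_id]      # drop the queried title itself
    return order[:k].tolist()

# Hypothetical softmax output over 6 titles; title 3 is the query
scores = np.array([0.05, 0.20, 0.10, 0.50, 0.03, 0.12])
print(top_k_excluding(scores, query_id=3, k=3))  # [1, 5, 2]
```

To keep that ranking when looking the rows up, selecting with netflix.set_index("Content_ID").loc[ids] preserves the order of ids, whereas isin() does not.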


The embeddings map each content item into a 32-dimensional space. Items that are closer in this space are likely to be similar in:

* Language
* Content Type

Hours Viewed is not one of the model’s inputs, so viewership patterns are not captured directly.

So, even without user feedback, your model can say: “Hey, these titles are kind of alike.”
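
Closeness in the embedding space can also be measured directly, for example with cosine similarity. The sketch below is a hypothetical alternative to ranking by softmax score. It uses a small hand-made matrix in place of the trained 32-dimensional weights, which in the real model could be read from the content Embedding layer with get_weights():

```python
import numpy as np

def nearest_by_cosine(vectors, query_id, k):
    """Return the k IDs whose vectors point most nearly the same way as query_id's."""
    # Assumes no all-zero rows, which would cause a division by zero
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[query_id]          # cosine similarity to the query
    sims[query_id] = -np.inf              # never recommend the query itself
    return np.argsort(sims)[::-1][:k].tolist()

# Hypothetical embeddings for 4 titles in 3 dimensions
vectors = np.array([
    [1.0, 0.0, 0.0],    # title 0 (the query)
    [0.9, 0.1, 0.0],    # title 1: nearly the same direction
    [0.0, 1.0, 0.0],    # title 2: orthogonal
    [-1.0, 0.0, 0.0],   # title 3: opposite direction
])
print(nearest_by_cosine(vectors, query_id=0, k=2))  # [1, 2]
```

Cosine similarity ignores vector length and compares direction only, which is a common choice when embedding norms vary between items.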

## Final Words

With just content features and deep learning, you’ve now built a powerful, user-independent recommendation system using TensorFlow. This not only showcases your ability to work with embeddings and real-world data but also lays the foundation for building smarter, scalable, and personalized AI systems, just like the ones used by Netflix and Amazon.