<div style="padding:20px; 
            color:#36FF00;
            margin:10px;
            text-align: center;
            font-size:200%;
            display:fill;
            border-radius:10px;
            border-style: solid;
            border-color: #36FF00;
            background-color:#000000;
            overflow:hidden;
            font-weight:500">INTRODUCTION</div>

### Unlock a world of cinematic wonders with an AI-driven Movie Recommendation System. Tailored to your tastes, it learns embeddings for users and movies from past ratings, combines them with each film's genres, and predicts how you would rate a movie, turning every movie night into an adventure of discovery.

#### <a id="top"></a>
# <div style="box-shadow: rgb(255,217,19) 0px 0px 20px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(31, 193, 27) 10px -10px, rgb(255, 255, 255) 20px -20px 10px -3px, rgb(60,121,245) 20px -20px, rgb(255, 255, 255) 30px -30px 15px -3px, rgb(255, 156, 85) 30px -30px, rgb(255, 255, 255) 40px -40px 0px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>TABLE OF CONTENTS</b></div>

<div style="background-color: rgba(60, 121, 245, 0.03); padding:30px; font-size:15px; font-family: consolas;">
<ul>
    <li><a href="#1" target="_self" rel=" noreferrer nofollow">1. Importing Some Libraries </a></li>
    <li><a href="#2" target="_self" rel=" noreferrer nofollow">2. Data Collection & Processing </a></li>
    <li><a href="#4" target="_self" rel=" noreferrer nofollow">3. Model</a></li>
    <li><a href="#5" target="_self" rel=" noreferrer nofollow">4. Training</a></li>
    <li><a href="#6" target="_self" rel=" noreferrer nofollow">5. Evaluation</a></li>
    <li><a href="#7" target="_self" rel=" noreferrer nofollow">6. Confusion Matrix / Accuracy</a></li>
</ul>
</div>

#### <a id="1"></a>
# <div style="box-shadow: rgb(31 , 193 , 27)  0px 0px 20px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(0 , 121 ,245 ) 20px -10px, rgb(255, 255, 255) 20px -20px 10px -3px, rgb(255,85,85) 30px -20px, rgb(255, 255, 255) 30px -30px 15px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>IMPORTING SOME LIBRARIES</b></div>

In [1]:
!pip install ydata-profiling



In [2]:
#PYTORCH
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

#PROFILEREPORT
from ydata_profiling import ProfileReport

#SKLEARN
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

#PANDAS FOR DATA PROCESSING 
import pandas as pd



### **Checking whether a GPU is available**

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


#### <a id="2"></a>
# <div style="box-shadow: rgb(255 , 156 , 85) 0px 0px 20px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(127, 27, 193) 20px -10px, rgb(255, 255, 255) 20px -20px 10px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>DATA COLLECTION / PRE-PROCESSING </b></div>

In [4]:
movies = pd.read_csv('/kaggle/input/movierecommenderdataset/movies.csv')
ratings = pd.read_csv('/kaggle/input/movierecommenderdataset/ratings.csv')

In [5]:
movies.isnull().sum()
print("\n")
ratings.isnull().sum()





userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

### **Merging the data frames on 'movieId'**

In [6]:
data = pd.merge(ratings, movies, on='movieId')

In [7]:
ProfileReport(data)


### **Collecting the unique user and movie IDs**

In [8]:
user_ids = data['userId'].unique()
movie_ids = data['movieId'].unique()

### **Creating a mapping between original IDs and indices**

In [9]:
user_id_mapping = {original: idx for idx, original in enumerate(user_ids)}
movie_id_mapping = {original: idx for idx, original in enumerate(movie_ids)}

### **Adding new columns with the indices**

In [10]:
data['userIndex'] = data['userId'].map(user_id_mapping)
data['movieIndex'] = data['movieId'].map(movie_id_mapping)
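`nn.Embedding(num_embeddings, ...)` only accepts indices from `0` to `num_embeddings - 1`, and the raw IDs in the CSV files are not guaranteed to be contiguous, which is why they are remapped here. Below is a stdlib-only sketch of the same mapping, plus the inverse map you would need to turn a predicted `movieIndex` back into a `movieId`. The helpers `build_index` and `invert` are hypothetical names. Because `Series.unique()` returns values in order of appearance, this should assign the same indices as the `enumerate(...)` comprehensions above.

```python
def build_index(ids):
    """Hypothetical helper: map each distinct original ID to a contiguous index, in order of first appearance."""
    mapping = {}
    for original in ids:
        if original not in mapping:
            mapping[original] = len(mapping)
    return mapping


def invert(mapping):
    """Hypothetical helper: reverse an {original_id: index} map into {index: original_id}."""
    return {idx: original for original, idx in mapping.items()}


# Toy IDs with gaps, standing in for data['movieId'] (illustrative values only)
movie_id_mapping_demo = build_index([318, 1, 318, 2571, 1])
index_to_movie_id = invert(movie_id_mapping_demo)
print(movie_id_mapping_demo)  # {318: 0, 1: 1, 2571: 2}
print(index_to_movie_id[2])   # 2571
```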

### **Extracting movie genres as a list**

In [11]:
data['genres_list'] = data['genres'].apply(lambda x: x.split('|'))

### **Splitting the data into training and testing sets**

In [12]:
train, test = train_test_split(data, test_size=0.2, random_state=42)

### **Defining the PyTorch dataset**

In [13]:
class HybridDataset(Dataset):
    
    """Map-style dataset for the hybrid model.

    -> __init__ converts the user indices, movie indices, genre vectors, and ratings to torch tensors
    -> __len__ returns the number of ratings
    -> __getitem__ returns the (user, movie, genres, rating) tuple at a given index """
    
    def __init__(self, user_indices, movie_indices, genres, ratings):
        self.user_indices = torch.tensor(user_indices, dtype=torch.long)
        self.movie_indices = torch.tensor(movie_indices, dtype=torch.long)
        self.genres = torch.stack(genres)  # Stack the list of tensors
        self.ratings = torch.tensor(ratings, dtype=torch.float32)

    def __len__(self):
        return len(self.user_indices)

    def __getitem__(self, idx):
        return self.user_indices[idx], self.movie_indices[idx], self.genres[idx], self.ratings[idx]
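`HybridDataset` is a map-style dataset, so a `DataLoader` only needs `__len__` and `__getitem__` from it. The plain-Python sketch below approximates what the loader then does: it builds the list of indices (optionally shuffled), fetches one sample per index, and collates the list of tuples into one tuple of columns. `ToyRatings` and `iterate_batches` are hypothetical stand-ins, and the real `DataLoader`'s default collate function stacks the fields into tensors rather than lists.

```python
import random


class ToyRatings:
    """Hypothetical minimal map-style dataset with the same interface as HybridDataset."""

    def __init__(self, users, movies, ratings):
        self.users, self.movies, self.ratings = users, movies, ratings

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        return self.users[idx], self.movies[idx], self.ratings[idx]


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Hypothetical helper: yield (users, movies, ratings) column lists, roughly as a DataLoader would."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        samples = [dataset[i] for i in indices[start:start + batch_size]]
        # Collate: turn [(u, m, r), ...] into ([u, ...], [m, ...], [r, ...])
        yield tuple(list(column) for column in zip(*samples))


toy = ToyRatings([0, 1, 2], [5, 6, 7], [4.0, 3.5, 5.0])
batches = list(iterate_batches(toy, batch_size=2))
print(batches[0])  # ([0, 1], [5, 6], [4.0, 3.5])
```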

### **Creating the training and testing datasets**

In [14]:
# Compute the genre vocabulary once instead of once per row (same 951 entries as before).
# Note: these are the distinct pipe-separated genre *combinations*, so only single-genre
# entries can ever match an element of a movie's genres_list.
genre_vocab = data['genres'].unique()

def encode_genres(genres_list):
    return torch.tensor([1 if genre in genres_list else 0 for genre in genre_vocab], dtype=torch.float32)

train_dataset = HybridDataset(train['userIndex'].values, train['movieIndex'].values,
                              [encode_genres(g) for g in train['genres_list']],
                              train['rating'].values)
test_dataset = HybridDataset(test['userIndex'].values, test['movieIndex'].values,
                             [encode_genres(g) for g in test['genres_list']],
                             test['rating'].values)
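The genre encoding above deserves a second look. `data['genres'].unique()` returns the distinct pipe-separated combinations (for example `'Comedy|Romance'`), while `genre in genres` tests membership in the list of individual genre names. As a result, only the vocabulary entries that happen to be a single genre can ever be 1, most of the 951 input columns are always zero, and a genre that never appears on its own in the data is not encoded at all. Below is a sketch of a multi-hot encoding over individual genre names. `build_genre_vocab` and `multi_hot` are hypothetical helpers. Switching to them would change `num_genres`, and therefore the model summary and results recorded in this notebook, so treat this as an alternative to try rather than what produced the outputs shown here.

```python
def build_genre_vocab(genre_strings):
    """Hypothetical helper: split pipe-separated genre strings and index each individual genre (sorted)."""
    names = sorted({name for s in genre_strings for name in s.split('|')})
    return {name: i for i, name in enumerate(names)}


def multi_hot(genres_list, genre_to_idx):
    """Hypothetical helper: 0/1 float vector with a 1 for every genre the movie has."""
    vec = [0.0] * len(genre_to_idx)
    for name in genres_list:
        vec[genre_to_idx[name]] = 1.0
    return vec


# Illustrative genre strings in the same format as movies.csv
genre_strings = ['Adventure|Animation|Comedy', 'Comedy|Romance', 'Drama']
vocab = build_genre_vocab(genre_strings)
print(vocab)  # {'Adventure': 0, 'Animation': 1, 'Comedy': 2, 'Drama': 3, 'Romance': 4}
print(multi_hot(['Comedy', 'Romance'], vocab))  # [0.0, 0.0, 1.0, 0.0, 1.0]

# The combination-based test used above finds no match for the same movie here,
# because no combination string in this toy vocabulary is exactly 'Comedy' or 'Romance'
print([1 if combo in ['Comedy', 'Romance'] else 0 for combo in genre_strings])  # [0, 0, 0]
```

In the notebook, this would roughly become `genre_to_idx = build_genre_vocab(data['genres'].unique())`, with `torch.tensor(multi_hot(g, genre_to_idx))` for each row and `num_genres=len(genre_to_idx)` when building the model.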

#### <a id="4"></a>
# <div style="box-shadow: rgb(193, 27, 138) 0px 0px 20px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(27, 160, 193) 20px -10px, rgb(255, 255, 255) 20px -20px 10px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>MODEL</b></div>

In [15]:
class HybridModel(nn.Module):
    
    """ Hybrid recommendation model
    -> __init__ defines the layers
          > Embedding tables for users and movies
          > An MLP (multi-layer perceptron) that maps the genre vector to an embedding-sized vector
          > A linear layer that maps the concatenated user, movie, and genre vectors to one rating
    -> forward runs the prediction
          > Look up the user and movie embeddings
          > Apply the MLP to the genre information
          > Concatenate the user, movie, and genre vectors
          > Apply the final linear layer to get the predicted rating  """
    
    
    def __init__(self, num_users, num_movies, num_genres, embedding_size, hidden_size, dropout_rate=0.5):
        super(HybridModel, self).__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_size)
        self.movie_embedding = nn.Embedding(num_movies, embedding_size)
        self.genre_layer = nn.Sequential(
            nn.Linear(num_genres, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_size, embedding_size),  # project to embedding_size so it can be concatenated with the user/movie embeddings
            nn.ReLU()
        )
        self.concat_layer = nn.Linear(embedding_size * 3, 1)

    def forward(self, user_indices, movie_indices, genres):
        user_embedded = self.user_embedding(user_indices)
        movie_embedded = self.movie_embedding(movie_indices)
        genre_embedded = self.genre_layer(genres)
        concatenated = torch.cat([user_embedded, movie_embedded, genre_embedded], dim=1)
        output = self.concat_layer(concatenated).squeeze(-1)  # (batch, 1) -> (batch,), also when batch == 1

        return output

### **Instantiate the model**

In [16]:
model = HybridModel(num_users=len(user_ids), num_movies=len(movie_ids), num_genres=len(data['genres'].unique()),
                    embedding_size=50, hidden_size=64)

In [17]:
model

HybridModel(
  (user_embedding): Embedding(610, 50)
  (movie_embedding): Embedding(9724, 50)
  (genre_layer): Sequential(
    (0): Linear(in_features=951, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=64, out_features=50, bias=True)
    (4): ReLU()
  )
  (concat_layer): Linear(in_features=150, out_features=1, bias=True)
)
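As a sanity check on the summary above, the trainable-parameter count can be worked out by hand. Each `Embedding` holds `num_embeddings × embedding_size` weights, each `Linear(in, out)` holds `in × out` weights plus `out` biases, and `ReLU`/`Dropout` have no parameters. The hypothetical helper below does that arithmetic. For the sizes printed above it gives 581,029, which should match `sum(p.numel() for p in model.parameters())`. It also shows that the movie embedding table accounts for most of the model.

```python
def hybrid_param_count(num_users, num_movies, num_genres, embedding_size, hidden_size):
    """Hypothetical helper: count trainable parameters of HybridModel from its constructor arguments."""
    embeddings = (num_users + num_movies) * embedding_size
    genre_mlp = (num_genres * hidden_size + hidden_size) + (hidden_size * embedding_size + embedding_size)
    head = 3 * embedding_size + 1  # concat_layer: Linear(3 * embedding_size, 1)
    return embeddings + genre_mlp + head


total = hybrid_param_count(num_users=610, num_movies=9724, num_genres=951, embedding_size=50, hidden_size=64)
movie_share = 9724 * 50 / total
print(total)                  # 581029
print(round(movie_share, 2))  # 0.84
```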

### **Moving the model to the selected device**

In [18]:
model.to(device)


### **Defining loss function and optimizer**

In [19]:
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
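For one rating, `nn.MSELoss` computes `(prediction - rating) ** 2`, whose gradient with respect to the prediction is `2 * (prediction - rating)`. `optim.Adam` then rescales each parameter's gradient using running averages of the gradient and of its square. Below is a scalar, stdlib-only sketch of that update. It uses PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8` together with the notebook's `lr=0.001`. `adam_step` is a hypothetical helper, not part of `torch.optim`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: one Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Treat the prediction itself as the parameter and fit it toward a rating of 4.0
prediction, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * (prediction - 4.0)  # derivative of the squared error (prediction - 4.0) ** 2
    prediction, m, v = adam_step(prediction, grad, m, v, t)
print(round(prediction, 6))  # 0.003
```

While the gradient keeps the same sign, each step moves the parameter by roughly `lr`, regardless of how large the error is. This is the main practical difference from plain gradient descent on the same loss.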

### **Creating the data loaders (`pin_memory=True` speeds up host-to-GPU copies)**

In [20]:
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, pin_memory=True)
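With the default `drop_last=False`, each loader yields `ceil(n / batch_size)` batches, and the last one can be smaller than the rest. The test set has 5547 + 2285 + 3728 + 8608 = 20168 rows (the total of the confusion matrix reported at the end of this notebook), so its final batch holds 8 samples. A batch of exactly one sample is the case where a bare `.squeeze()` on the model output would collapse it to a 0-d tensor, which is why squeezing only the last dimension (`.squeeze(-1)`) is the safer choice. `batch_layout` is a hypothetical helper.

```python
import math


def batch_layout(num_samples, batch_size):
    """Hypothetical helper: (number of batches, size of the last batch) for a loader with drop_last=False."""
    if num_samples == 0:
        return 0, 0
    num_batches = math.ceil(num_samples / batch_size)
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last


test_size = 5547 + 2285 + 3728 + 8608  # total of the confusion matrix reported at the end
print(test_size, batch_layout(test_size, 64))  # 20168 (316, 8)
```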

#### <a id="5"></a>
# <div style="box-shadow: rgb(27, 146, 193) 0px 5px 40px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(31, 193, 27) 20px -10px, rgb(255, 255, 255) 20px -20px 10px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>TRAINING</b></div>

In [21]:
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    for user_indices, movie_indices, genres, ratings in train_loader:
        user_indices, movie_indices, genres, ratings = user_indices.to(device), movie_indices.to(device), genres.to(device), ratings.to(device)

        optimizer.zero_grad()
        predictions = model(user_indices, movie_indices, genres)
        loss = criterion(predictions, ratings)
        loss.backward()
        optimizer.step()
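The loop above never records the loss, so the notebook cannot show whether 10 epochs are too few or too many. Because `nn.MSELoss` averages within each batch, an epoch-level figure should weight each batch by its size, since the last batch is usually smaller. Below is a minimal sketch of that bookkeeping. Inside the real loop, you would accumulate `loss.item() * ratings.size(0)` and the batch size after `optimizer.step()`. `epoch_mse` is a hypothetical name, and the square root of its result is the training RMSE in rating units.

```python
import math


def epoch_mse(batch_losses, batch_sizes):
    """Hypothetical helper: size-weighted mean of per-batch mean-squared-error values."""
    total_squared_error = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total_squared_error / sum(batch_sizes)


# Two full batches of 64 and a smaller final batch (illustrative loss values)
losses = [0.90, 0.80, 2.00]
sizes = [64, 64, 16]
mse = epoch_mse(losses, sizes)
naive = sum(losses) / len(losses)  # unweighted mean over-counts the small last batch
print(round(mse, 4), round(naive, 4), round(math.sqrt(mse), 4))
```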

#### <a id="6"></a>
# <div style="box-shadow: rgb(116, 193, 27) 0px 0px 30px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(31, 193, 27) 20px -10px, rgb(255, 255, 255) 40px -20px 10px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>EVALUATION</b></div>

In [22]:
model.eval()
all_predictions = []
all_ratings = []

with torch.no_grad():
    for user_indices, movie_indices, genres, ratings in test_loader:
        user_indices, movie_indices, genres, ratings = user_indices.to(device), movie_indices.to(device), genres.to(device), ratings.to(device)
        predictions = model(user_indices, movie_indices, genres)
        all_predictions.extend(predictions.cpu().tolist())
        all_ratings.extend(ratings.cpu().tolist())
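The model is trained as a regressor, so before any thresholding, `all_predictions` and `all_ratings` can be compared directly in rating units. Below is a stdlib-only sketch of RMSE and MAE over the two lists collected above. `regression_metrics` is a hypothetical helper; `sklearn.metrics.mean_squared_error` and `mean_absolute_error` compute the same underlying quantities.

```python
import math


def regression_metrics(predictions, targets):
    """Hypothetical helper: return (RMSE, MAE) between predicted and true ratings."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty lists of equal length")
    errors = [p - t for p, t in zip(predictions, targets)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy values; in the notebook these would be all_predictions and all_ratings
rmse, mae = regression_metrics([3.0, 4.0, 5.0], [3.5, 4.0, 4.0])
print(round(rmse, 4), round(mae, 4))  # 0.6455 0.5
```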

### **Converting ratings to binary recommendations using a threshold (e.g., 3.5)**

In [23]:
threshold = 3.5
binary_predictions = [1 if pred >= threshold else 0 for pred in all_predictions]
binary_ratings = [1 if rating >= threshold else 0 for rating in all_ratings]

#### <a id="7"></a>
# <div style="box-shadow: rgb(27, 146, 193) 0px 5px 40px 3px inset, rgb(255,255, 255) 10px -10px 5px -3px, rgb(31, 193, 27) 20px -10px, rgb(255, 255, 255) 20px -20px 10px -3px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(255, 85, 85);"><b>CONFUSION MATRIX & ACCURACY</b></div>

In [24]:
accuracy = accuracy_score(binary_ratings, binary_predictions)
print(f'Accuracy: {accuracy:.4f}')

# print confusion matrix
conf_matrix = confusion_matrix(binary_ratings, binary_predictions)
print('Confusion Matrix:')
print(conf_matrix)

Accuracy: 0.7019
Confusion Matrix:
[[5547 2285]
 [3728 8608]]
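For labels 0/1, scikit-learn lays out `confusion_matrix` as `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions), so the matrix above can be unpacked into precision, recall, and F1. Below is a stdlib-only sketch, using a hypothetical helper `binary_report`, applied to the reported numbers. It gives a precision of about 0.79 and a recall of about 0.70. It also computes the accuracy of always predicting the more common class, about 0.61 on this split, which is a useful baseline for reading the 0.7019 accuracy.

```python
def binary_report(conf_matrix):
    """Hypothetical helper: derive metrics from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = conf_matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Accuracy of a constant classifier that always predicts the more common class
        "majority_baseline": max(tp + fn, tn + fp) / total,
    }


report = binary_report([[5547, 2285], [3728, 8608]])  # values from the output above
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```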


# <div style="box-shadow: rgba(240, 46, 170, 0.4) -5px 5px inset, rgba(240, 46, 170, 0.3) -10px 10px inset, rgba(240, 46, 170, 0.2) -15px 15px inset, rgba(240, 46, 170, 0.1) -20px 20px inset, rgba(240, 46, 170, 0.05) -25px 25px inset; padding:20px; font-size:30px; font-family: consolas; display:fill; border-radius:15px; color: rgba(240, 46, 170, 0.7)"> <b> 💻 Thank You!</b></div>