# Movie Recommendation System

This notebook demonstrates a movie recommendation system that uses an epsilon-greedy K-armed bandit algorithm to suggest movies, treating the review scores in the dataset as feedback.
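
The epsilon-greedy selection rule can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the notebook's code: $Q_t(a)$ is the current estimated score of movie $a$, $\varepsilon$ is the exploration probability, and $K$ is the number of movies.

$$
a_t =
\begin{cases}
\text{an arm drawn uniformly from the } K \text{ movies} & \text{with probability } \varepsilon, \\
\arg\max_{a} Q_t(a) & \text{with probability } 1 - \varepsilon.
\end{cases}
$$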

### Import Libraries

In [83]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [84]:
import numpy as np
import pandas as pd
from typing import List, Tuple

### Loading and Cleaning the Dataset

In [85]:
df = pd.read_csv("/content/drive/MyDrive/dataset/movie.csv")
df

Unnamed: 0,Movie,Reviewer,Publish,Review,Date,Score,unique_id
0,HOTEL TRANSYLVANIA: TRANSFORMANIA,James Luxford,City AM,I guess its always been hard for me to see the...,03/03/2022,40.0,1000
1,HOTEL TRANSYLVANIA: TRANSFORMANIA,Mat Brunet,AniMat's Review (YouTube),Hotel Transylvania: Transformania is a present...,15/02/2022,30.0,1000
2,HOTEL TRANSYLVANIA: TRANSFORMANIA,Robert Levin,Newsday,The conceit still works well enough to mostly ...,29/01/2022,75.0,1000
3,HOTEL TRANSYLVANIA: TRANSFORMANIA,Jackie K. Cooper,jackiekcooper.com,"No Adam Sandler this time out, but the animate...",28/01/2022,60.0,1000
4,HOTEL TRANSYLVANIA: TRANSFORMANIA,Asher Luberto,The Playlist,It's not likely we'll see another one of these...,28/01/2022,74.0,1000
...,...,...,...,...,...,...,...
417053,HITLER: THE LAST TEN DAYS,Roger Ebert,Chicago Sun-Times,"There's no tragedy in this movie, no sense of ...",13/06/2020,25.0,10364
417054,HITLER: THE LAST TEN DAYS,TV Guide Staff,TV Guide,Yet another failed attempt to make the Fuhrer ...,23/02/2012,40.0,10364
417055,THE ONE AND ONLY,Gene Siskel,Chicago Tribune,"""The One and Only"" is really two stories at od...",31/08/2021,50.0,10365
417056,THE ONE AND ONLY,Stanley Eichelbaum,San Francisco Examiner,"[A] silly, old-fashioned comedy directed with ...",31/08/2021,75.0,10365


In [86]:
df.nunique()

Unnamed: 0,0
Movie,9366
Reviewer,5736
Publish,1615
Review,416395
Date,6259
Score,500
unique_id,9366


In [87]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 417058 entries, 0 to 417057
Data columns (total 7 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Movie      417058 non-null  object 
 1   Reviewer   417058 non-null  object 
 2   Publish    417058 non-null  object 
 3   Review     417058 non-null  object 
 4   Date       417058 non-null  object 
 5   Score      417058 non-null  float64
 6   unique_id  417058 non-null  int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 22.3+ MB


In [88]:
# Keep only the columns the bandit needs: movie ID, title, and review score.
df_new = df[['unique_id', 'Movie', 'Score']]

In [89]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 417058 entries, 0 to 417057
Data columns (total 3 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   unique_id  417058 non-null  int64  
 1   Movie      417058 non-null  object 
 2   Score      417058 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 9.5+ MB


In [90]:
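# Drop rows with missing values. The info() output above reports no nulls,
# so the row count is unchanged here.
clean_df_new = df_new.dropna()
clean_df_new.info()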

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 417058 entries, 0 to 417057
Data columns (total 3 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   unique_id  417058 non-null  int64  
 1   Movie      417058 non-null  object 
 2   Score      417058 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 9.5+ MB


In [91]:
clean_df_new.head(5)

Unnamed: 0,unique_id,Movie,Score
0,1000,HOTEL TRANSYLVANIA: TRANSFORMANIA,40.0
1,1000,HOTEL TRANSYLVANIA: TRANSFORMANIA,30.0
2,1000,HOTEL TRANSYLVANIA: TRANSFORMANIA,75.0
3,1000,HOTEL TRANSYLVANIA: TRANSFORMANIA,60.0
4,1000,HOTEL TRANSYLVANIA: TRANSFORMANIA,74.0


## Bandit Class for Movie Recommendations

The class below implements an epsilon-greedy strategy for a K-armed bandit: each movie is an arm, and a review score from the dataset serves as the reward.
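
The running-average arithmetic in the class's `update` method can be checked in isolation. The sketch below is a minimal illustration, not part of the notebook: `incremental_mean` is a hypothetical helper, and the input is the five scores shown in the `head(5)` output above. It applies the same formula one reward at a time and compares the result with an ordinary batch mean.

```python
import numpy as np


def incremental_mean(rewards):
    """Apply the running-average update one reward at a time."""
    value, count = 0.0, 0
    for reward in rewards:
        count += 1
        # Same arithmetic as the update step: weight the old estimate by
        # (n - 1) / n and the new reward by 1 / n.
        value = ((count - 1) / count) * value + (1 / count) * reward
    return value


# The five scores from the head(5) output, used only as sample input.
sample_rewards = [40.0, 30.0, 75.0, 60.0, 74.0]
estimate = incremental_mean(sample_rewards)
batch_mean = float(np.mean(sample_rewards))

# The two values should agree up to floating-point rounding.
print(estimate, batch_mean)
```

Because the update needs only the previous estimate and a count, no per-movie reward history has to be stored.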

In [92]:
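class MovieRecommendationBandit:
    """Epsilon-greedy K-armed bandit in which each movie is one arm."""

    def __init__(self, df: pd.DataFrame, epsilon: float = 0.1):
        self.df = df
        self.epsilon = epsilon  # probability of exploring a random movie
        self.movie_ids = df['unique_id'].unique()
        self.n_arms = len(self.movie_ids)
        # Per-movie pull counts and running-average reward estimates.
        self.counts = {movie_id: 0 for movie_id in self.movie_ids}
        self.values = {movie_id: 0.0 for movie_id in self.movie_ids}

    def select_arm(self) -> int:
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if np.random.random() < self.epsilon:
            return np.random.choice(self.movie_ids)
        else:
            # Ties go to the first movie in insertion order.
            return max(self.values, key=self.values.get)

    def update(self, movie_id: int, reward: float) -> None:
        # Incremental mean: equivalent to averaging every reward seen so far
        # for this movie, without storing the reward history.
        self.counts[movie_id] += 1
        n = self.counts[movie_id]
        value = self.values[movie_id]
        new_value = ((n - 1) / n) * value + (1 / n) * reward
        self.values[movie_id] = new_value

    def recommend(self, n_recommendations: int = 5) -> List[Tuple[int, str, float]]:
        recommendations = []
        for _ in range(n_recommendations):
            movie_id = self.select_arm()
            # The reward is always the Score in the movie's first row of df.
            movie = self.df[self.df['unique_id'] == movie_id].iloc[0]
            # The tuple records the estimate as it stood before this update,
            # so a movie's first appearance reports 0.00.
            recommendations.append((movie_id, movie['Movie'], self.values[movie_id]))
            self.update(movie_id, movie['Score'])
        return recommendations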

## Testing the Recommendation System

Initialize the recommender and retrieve recommendations.

In [93]:
# Use the cleaned three-column frame prepared above rather than the raw one.
recommender = MovieRecommendationBandit(clean_df_new)

def get_recommendations(n_recommendations: int = 5):
    recommendations = recommender.recommend(n_recommendations)
    print(f"Top {n_recommendations} Recommendations:")
    for i, (movie_id, title, estimated_rating) in enumerate(recommendations, 1):
        print(f"{i}. {title} (ID: {movie_id}, Estimated Rating: {estimated_rating:.2f})")


In [94]:
get_recommendations(5)

Top 5 Recommendations:
1. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 0.00)
2. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
3. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
4. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
5. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)


## Displaying Recommendations

Call `get_recommendations` repeatedly to simulate successive interactions and see how the recommendations evolve.
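
In the class above, every pull of a given movie is rewarded with the score from that movie's first row, so an estimate stops changing after its first update. As a point of comparison, here is a minimal sketch of a variant that draws a random review score on each pull. Everything in it is an assumption made for illustration: the class name `SampledReviewBandit`, the `pull` method, the `seed` parameter, and the toy data frame. It also uses `np.random.default_rng` in place of the global `np.random` functions used above.

```python
import numpy as np
import pandas as pd


class SampledReviewBandit:
    """Hypothetical epsilon-greedy variant: each pull samples one review score."""

    def __init__(self, df, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        # Collect each movie's scores once so a pull does not rescan the frame.
        self.scores = {
            movie_id: group.to_numpy()
            for movie_id, group in df.groupby("unique_id")["Score"]
        }
        self.movie_ids = list(self.scores)
        self.counts = {movie_id: 0 for movie_id in self.movie_ids}
        self.values = {movie_id: 0.0 for movie_id in self.movie_ids}

    def select_arm(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.movie_ids[self.rng.integers(len(self.movie_ids))]
        return max(self.values, key=self.values.get)

    def pull(self):
        movie_id = self.select_arm()
        # Reward: one review score drawn uniformly from this movie's reviews.
        reward = float(self.rng.choice(self.scores[movie_id]))
        self.counts[movie_id] += 1
        # Incremental mean, algebraically the same as the update rule above.
        self.values[movie_id] += (reward - self.values[movie_id]) / self.counts[movie_id]
        return movie_id, reward


# Toy data with made-up scores; not taken from the dataset.
toy = pd.DataFrame({
    "unique_id": [1, 1, 1, 2, 2, 2],
    "Movie": ["A", "A", "A", "B", "B", "B"],
    "Score": [20.0, 30.0, 40.0, 70.0, 80.0, 90.0],
})

bandit = SampledReviewBandit(toy, epsilon=0.2, seed=0)
for _ in range(200):
    bandit.pull()

print(bandit.counts)
print(bandit.values)
```

With sampled rewards, a movie's estimate keeps moving toward its mean review score as it is pulled more often, rather than freezing at its first review.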

In [95]:
for _ in range(100):
    get_recommendations(1)

Top 1 Recommendations:
1. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
Top 1 Recommendations:
1. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
Top 1 Recommendations:
1. HOTEL TRANSYLVANIA: TRANSFORMANIA (ID: 1000, Estimated Rating: 40.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 0.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. ILO ILO (ID: 6142, Estimated Rating: 0.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
Top 1 Recommendations:
1. THE BABADOOK (ID: 

In [96]:
print("\nFinal Recommendations after 100 interactions:")
get_recommendations(5)


Final Recommendations after 100 interactions:
Top 5 Recommendations:
1. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
2. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
3. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
4. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
5. THE BABADOOK (ID: 5670, Estimated Rating: 79.00)
