# Benchmark calculation

## Idea behind benchmark

The model takes information about a user and a movie as input. \
Based on this data, the model predicts whether the user will like the movie. \
So in the end we can tell which movies a user will like. \
The idea behind the benchmark evaluation is the following:

1. We take each user's ratings from the initial dataset.
2. We let the model predict, for a held-out part of the user-movie pairs, whether the user will like the movie.
3. We compare the results:
    - If the model said the user will like the movie and the user rated it >= 4 in the initial dataset, the prediction is correct and the model gains score points.
    - If the model said the user will like the movie but the user rated it < 4, the prediction is wrong and the model loses score points.
    - Likewise, if the model said the user will dislike the movie and the user rated it < 4, the model gains score points; if the original rating was >= 4, the model loses points.
    
    
## Score points calculation
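Each prediction is worth `1 / len(marks.get(user_id))` score points, where `marks.get(user_id)` is the list of movies evaluated for that user. \
This means that the more movies are evaluated for a user, the smaller the reward or penalty for a single prediction; the fewer movies are evaluated, the larger it is. \
Each user's points therefore sum to a value between -1 and 1, and the final benchmark is the average of these sums over all users.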
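
To make the weighting concrete, here is a minimal sketch on made-up data. The users, movies, and the `toy_score` helper are hypothetical and not part of the notebook; the sketch only mirrors the rule described above, assuming binary like/dislike labels.

```python
def toy_score(predictions, actual_likes):
    """Score toy predictions; both arguments are {user: {movie: 0 or 1}}.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for user, user_predictions in predictions.items():
        # Each prediction is worth 1 / (number of movies evaluated for this user)
        weight = 1 / len(user_predictions)
        for movie, predicted in user_predictions.items():
            if predicted == actual_likes[user][movie]:
                total += weight
            else:
                total -= weight
    # Average the per-user sums over all users
    return total / len(predictions)


predictions = {
    "A": {"m1": 1, "m2": 0},
    "B": {"m1": 1, "m2": 1, "m3": 0, "m4": 0},
}
actual_likes = {
    "A": {"m1": 1, "m2": 1},                    # one of two predictions is wrong
    "B": {"m1": 1, "m2": 1, "m3": 0, "m4": 1},  # one of four predictions is wrong
}

score = toy_score(predictions, actual_likes)
print(score)  # 0.25
```

User A contributes 0.5 - 0.5 = 0 and user B contributes 0.75 - 0.25 = 0.5, so a single mistake costs user A more than it costs user B, and the final score is (0 + 0.5) / 2 = 0.25.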

In [1]:
import pandas as pd

# Load the data
file_path = '/home/danila/Assignment2PMLDL/encoded_data.csv'

user_preferences = pd.read_csv(file_path)
user_preferences = user_preferences.drop(columns=['Unnamed: 0'])

# Create the binary target: 1 if the rating is at least the threshold, else 0
threshold = 4
user_preferences['liked'] = (user_preferences['rating'] >= threshold).astype(int)

In [2]:
user_preferences

Unnamed: 0,user_id,movie_id,rating,age,unknown,Action,Adventure,Animation,Children's,Comedy,...,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,occupation_encoded,gender_encoded,liked
0,196,242,3,49,0,0,0,0,0,1,...,0,0,0,0,0,0,0,20,1,0
1,305,242,5,23,0,0,0,0,0,1,...,0,0,0,0,0,0,0,14,1,1
2,6,242,4,42,0,0,0,0,0,1,...,0,0,0,0,0,0,0,6,1,1
3,234,242,4,60,0,0,0,0,0,1,...,0,0,0,0,0,0,0,15,1,1
4,63,242,3,31,0,0,0,0,0,1,...,0,0,0,0,0,0,0,11,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,863,1679,3,17,0,0,0,0,0,0,...,0,0,1,0,1,0,0,18,1,0
99996,863,1678,1,17,0,0,0,0,0,0,...,0,0,0,0,0,0,0,18,1,0
99997,863,1680,2,17,0,0,0,0,0,0,...,0,0,1,0,0,0,0,18,1,0
99998,896,1681,3,28,0,0,0,0,0,1,...,0,0,0,0,0,0,0,20,1,0


In [3]:
from sklearn.model_selection import train_test_split

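# Create the data for evaluation
# Note: only `rating` is dropped, so X still contains the `liked` column,
# which is identical to the target y
X = user_preferences.drop(columns=['rating'])
y = user_preferences['liked']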

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=43)

In [4]:
import pickle

# Load the trained model
path = "/home/danila/Assignment2PMLDL/github/Movie-Recommender-System/models/model.pkl"

with open(path, "rb") as f:
    model = pickle.load(f)

In [5]:
print(model)

LogisticRegression(max_iter=10000)


In [6]:
predictions = model.predict(X_test)

In [7]:
# Dict mapping each user_id to a list of (movie_id, predicted label) pairs
X_test_reset = X_test.reset_index(drop=True)
marks = {}

for user_id, movie_id, prediction in zip(
        X_test_reset['user_id'], X_test_reset['movie_id'], predictions):
    marks.setdefault(user_id, []).append((movie_id, prediction))

In [8]:
print(len(marks))

943


In [9]:
# Dict mapping each user_id to a list of (movie_id, rating) pairs
# for every movie the user rated in the full dataset
user_preferences_reset = user_preferences.reset_index(drop=True)
marks_initial = {}

for user_id, movie_id, rating in zip(
        user_preferences_reset['user_id'],
        user_preferences_reset['movie_id'],
        user_preferences_reset['rating']):
    marks_initial.setdefault(user_id, []).append((movie_id, rating))

In [10]:
print(len(marks_initial))

943


In [11]:
# Sorting dicts by user_id for convenient work
marks = dict(sorted(marks.items()))
marks_initial = dict(sorted(marks_initial.items()))

In [12]:
def calculate_benchmark(marks, marks_initial):
    """Return the mean per-user score, a value between -1 and 1.

    marks:         {user_id: [(movie_id, predicted label), ...]}
    marks_initial: {user_id: [(movie_id, rating), ...]}
    """
    score = 0

    for user_id, user_predictions in marks.items():
        if user_id not in marks_initial:
            continue
        # Movies the user actually rated, and the subset rated >= 4 (liked)
        rated = {movie_id for movie_id, _ in marks_initial[user_id]}
        liked = {movie_id for movie_id, rating in marks_initial[user_id]
                 if rating >= 4}
        # Each prediction is worth 1 / (number of predictions for this user)
        weight = 1 / len(user_predictions)

        for movie_id, prediction in user_predictions:
            if movie_id not in rated:
                continue
            # Correct if "will like" (1) matches an original rating >= 4
            if (prediction == 1) == (movie_id in liked):
                score += weight
            else:
                score -= weight

    return score / len(marks)

In [13]:
print(calculate_benchmark(marks, marks_initial))

0.9999999999998678


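The benchmark result is 0.9999999999998678, which is 1.0 up to floating-point rounding, on a test split of 40% of the data. Taken at face value, this means the model predicted essentially every evaluated user-movie pair correctly. Note, however, that the feature matrix `X` still contains the `liked` column (only `rating` is dropped), so the model sees the target among its inputs. The near-perfect score therefore most likely reflects target leakage rather than real predictive quality.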
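
One way to read this number: every prediction is worth plus or minus 1/n for a user with n evaluated movies, so a user's contribution equals 2 * accuracy - 1, and the benchmark is the mean of that value over users. The sketch below cross-checks this reading on a tiny made-up table. The `predicted` column and the `benchmark_from_frame` helper are assumptions for illustration, not names from the notebook.

```python
import pandas as pd


def benchmark_from_frame(df):
    """Mean over users of (2 * per-user accuracy - 1).

    Assumes one row per (user_id, movie_id) pair with binary `liked`
    and `predicted` columns; `predicted` is a hypothetical column name.
    """
    correct = df["liked"] == df["predicted"]
    # Fraction of correct predictions for each user
    per_user_accuracy = correct.groupby(df["user_id"]).mean()
    return float((2 * per_user_accuracy - 1).mean())


toy = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 2, 2],
    "liked":     [1, 1, 1, 1, 0, 1],
    "predicted": [1, 0, 1, 1, 0, 0],
})

result = benchmark_from_frame(toy)
print(result)  # 0.25: user 1 is 50% accurate (0.0), user 2 is 75% accurate (0.5)
```

Under this reading, a score of 0 corresponds to 50% accuracy per user on average, and a score of 1 means every prediction was correct.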