# RatingBased Models Prediction

In this notebook, predictions are made with both the RatedOnly and the UnratedSample models.

The examples below show how to make predictions with the regression model.

Before asking how, however, it is useful to ask why.

# Why Should A Prediction Be Made For Rating Based Datasets?

As mentioned before, Rating Based datasets contain a continuous rating value in the range [0, 1] for each interacted pair. A rating is the most direct available measure of how much a user liked a movie, so predictions learned from it are based directly on users' tastes.

These models can provide a better experience for users, because they aim to recommend movies that users are likely to enjoy. Users who are satisfied with their experience keep their motivation to continue using the system. Especially for products such as movies, whose purpose is to give users a good time, it is a great advantage if the recommendations are based on the users' tastes rather than on the needs of the users or of the system.


# How To Make Predictions With Rating Based Models?

Since the datasets have a continuous rating value for each interacted pair, a regression model is used. The regression model has an output perceptron with a sigmoid activation function, i.e. the model's predictions are decimals in the range [0, 1]. The movies that the user is most likely to like are the ones recommended. To find them, the relevant user is given as input to the regression model together with every movie, one pair per movie. The desired number of movies whose predictions are closest to 1.0 (the highest rating) are then suggested to the user, excluding the movies that the user has already watched.
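The procedure described above can be sketched independently of the trained network. In this minimal sketch, `recommend_top_n` and `toy_score_fn` are hypothetical names that do not appear in the project; `toy_score_fn` is only a stand-in for the regression model and returns a sigmoid value in [0, 1] for every (user, movie) pair.

```python
import numpy as np
import pandas as pd

def recommend_top_n(score_fn, user_id, movie_ids, watched_ids, n):
    """Score every movie for one user and return the n best unseen movies."""
    movie_ids = np.asarray(movie_ids)
    # One (user, movie) pair per movie: the user id is repeated for every movie.
    user_column = np.full(movie_ids.shape[0], user_id, dtype=int)
    scores = np.asarray(score_fn(user_column, movie_ids), dtype=float).reshape(-1)
    result = pd.DataFrame({'MovieId': movie_ids, 'Prediction': scores})
    # Drop the movies the user has already watched, then keep the highest scores.
    unseen = result[~result['MovieId'].isin(watched_ids)]
    return unseen.nlargest(n, 'Prediction').reset_index(drop=True)

# Hypothetical stand-in for the trained model: a sigmoid of the movie id.
def toy_score_fn(user_ids, movie_ids):
    return 1.0 / (1.0 + np.exp(-(movie_ids - 3.0)))

top = recommend_top_n(toy_score_fn, user_id=13, movie_ids=[1, 2, 3, 4, 5],
                      watched_ids=[5], n=2)
print(top)
```

Movie 5 has the highest toy score but is excluded because it is in `watched_ids`, so the two recommendations are the next best movies.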


In [1]:
#Importing libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import warnings

In [2]:
#Printing library versions
print('numpy Version: ' + np.__version__)
print('pandas Version: ' + pd.__version__)
print('tensorflow Version: ' + tf.__version__)

numpy Version: 1.16.5
pandas Version: 0.25.1
tensorflow Version: 2.0.0


In [3]:
#Loading Rating raw data and Movie Raw data from pkl file
ratingDf = pd.read_pickle('../Data/pkl/1M/RawData/Rating.pkl')
movieDf = pd.read_pickle('../Data/pkl/1M/RawData/Movie.pkl')

#Determining the users to be used for predictions
#The numbers were selected arbitrarily
#Side note: the first number was chosen by my mother and the second by me
users = [905, 13]

#Determining Pool Size 
recommendedPoolSize = 25

#Determining recommendation Size
recommendationSize = 5

#Returns the top poolSize movies for the given userId as a DataFrame
def GetRecommendedPool(userId, model, poolSize):

    #Getting User interactions over movies
    ratedMoviesByUser = ratingDf[ratingDf['UserId'] == userId]['MovieId'].values

    #Building the model input by pairing a numpy array of shape (movieSize, ) filled with userId
    #with a numpy array that contains every MovieId
    #The model predicts the user's rating for all movies
    #The movies the user has already interacted with are then dropped
    #Finally the top poolSize movies are selected and returned
    predictInput = [np.full((movieDf.shape[0]), userId, dtype=int), movieDf['MovieId'].values]

    #Model Predict
    predictList = model.predict(x = predictInput)

    #Creating Dataframe with MovieId and Prediction that contains user's Ratings for movies
    resultDf = pd.DataFrame({'MovieId' : movieDf['MovieId'].values, 'Prediction' : predictList.reshape(-1)})

    #The (poolSize) movies with the highest Prediction score (Closest to 1.0)
    #Are selected from the movies that are not interacted by the user
    resultDf = resultDf[~resultDf['MovieId'].isin(ratedMoviesByUser)].nlargest(poolSize, 'Prediction')

    #Merging resultDf with movieDf on MovieId to add the movie titles to resultDf
    resultDf = pd.merge(resultDf, movieDf, on='MovieId', how='left').reset_index(drop = True)

    return resultDf

#Returns Top (recommendedMovieSize) Recommended Movies From TopRecommendedDf as a String
def TopNMovie(TopRecommendedDf, recommendedMovieSize):
    result = ''
    
    counter = 1
    for index, row in TopRecommendedDf[:recommendedMovieSize].iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result

#Returns (recommendedMovieSize) randomly chosen movies from TopRecommendedDf as a String
#This prevents always recommending the same movies to the user in the same order.
#The Fitness Proportionate Selection (Roulette Wheel Selection) method can also be used for this.
#see https://en.wikipedia.org/wiki/Fitness_proportionate_selection
#The Tournament Selection method can also be used for this.
#see https://en.wikipedia.org/wiki/Tournament_selection
def TopNRandom(TopRecommendedDf, recommendedMovieSize):
    result = ''

    counter = 1
    for index, row in TopRecommendedDf.sample(n = recommendedMovieSize).iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result
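The two alternatives mentioned in the comments of `TopNRandom` could look roughly like the sketch below. The function names `roulette_select` and `tournament_select`, the `tournament_size` parameter and the toy pool are assumptions made for illustration; the only thing taken from the notebook is that the pool is a DataFrame with `Title` and `Prediction` columns.

```python
import numpy as np
import pandas as pd

def roulette_select(pool_df, n, seed=None):
    """Draw n distinct rows with probability proportional to 'Prediction'."""
    # DataFrame.sample rescales the weights so that they sum to 1.
    return pool_df.sample(n=n, weights='Prediction', random_state=seed)

def tournament_select(pool_df, n, tournament_size=3, seed=None):
    """Pick n distinct rows; each pick is the best of a small random group."""
    rng = np.random.RandomState(seed)
    remaining = pool_df.copy()
    winners = []
    for _ in range(min(n, len(pool_df))):
        k = min(tournament_size, len(remaining))
        # Draw k contenders without replacement from the rows not yet chosen.
        labels = rng.choice(remaining.index.to_numpy(), size=k, replace=False)
        winner = remaining.loc[labels, 'Prediction'].idxmax()
        winners.append(winner)
        remaining = remaining.drop(index=winner)
    return pool_df.loc[winners]

# Toy pool with made-up titles and scores, only to exercise the functions.
toy_pool = pd.DataFrame({'Title': ['A', 'B', 'C', 'D', 'E'],
                         'Prediction': [0.99, 0.95, 0.90, 0.85, 0.80]})
picked_roulette = roulette_select(toy_pool, 3, seed=0)
picked_tournament = tournament_select(toy_pool, 3, tournament_size=2, seed=0)
print(picked_roulette['Title'].tolist())
print(picked_tournament['Title'].tolist())
```

Both variants still vary the recommendations between calls, but they favour movies with higher predictions, whereas a uniform sample treats every movie in the pool equally. With `tournament_size` equal to the pool size the tournament variant degenerates to a plain top-N selection.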

## RatedOnlyModel Prediction

This model is trained with the RatingBased RatedOnly dataset. This dataset contains a continuous rating value in the range [0, 1] for each interacted pair (see the Training7 notebook for more information).

In [4]:
#Suppressing warnings, because loading the model raises the "Converting sparse IndexedSlices to a dense Tensor of unknown shape" warning
warnings.filterwarnings('ignore')

#Loading the best model for the RatingBased RatedOnly dataset from the h5 file
#See Training7 notebook for more information
model = tf.keras.models.load_model("../Model/RatedOnlyModel/Model15.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation7 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/RatedOnly.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/RatedOnly.pkl')

In [5]:
#Printing ModelTable for information
modelTable

Unnamed: 0,ModelData,Loss,Mse,Rmse,Msle,Mae,Mape
0,Training,0.028725,0.028634,0.169215,0.012069,0.126442,5789873.5
1,Validation,0.036957,0.036867,0.192007,0.015573,0.144817,7129178.5
2,Test,0.037141,0.037051,0.192486,0.015663,0.145324,7102044.0
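Two things in this table can be checked with a few lines of arithmetic. The sketch below assumes that the `Mape` column follows the usual definition in which the denominator is clamped to a small epsilon (as the Keras implementation does) and that some targets in the dataset are exactly 0; neither assumption is stated in this notebook. Under those assumptions a single zero target is enough to push the percentage error into the millions, which would make `Mape` uninformative here. The function `mape` and the toy arrays are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error with the denominator clamped to eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denominator = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denominator)

toy_pred = np.array([0.1, 0.5, 0.9])
mape_without_zero = mape([0.2, 0.5, 1.0], toy_pred)  # no zero targets
mape_with_zero = mape([0.0, 0.5, 1.0], toy_pred)     # one target is exactly 0
print(mape_without_zero, mape_with_zero)

# Rmse is the square root of Mse; checked here on the Training row.
rmse_from_mse = np.sqrt(0.028634)
print(rmse_from_mse)
```

The `Mse`, `Rmse` and `Mae` columns do not divide by the target, so they remain meaningful when ratings are close to 0.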


In [6]:
#Printing first user's representation numbers and Mean Squared Error information from lookup table
#The table shows the reliability of the prediction for first user
lookupTable.iloc[users[0]]

User             905.00000
TraingingRep       2.00000
TrainingMSE        0.01399
ValidationRep      0.00000
ValidationMSE          NaN
TestRep            1.00000
TestMSE            0.00136
AllRep             3.00000
AllMSE             0.00978
Name: 905, dtype: float64
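The lookup table itself is built in the PredictPreparation7 notebook, which is not shown here. As a rough illustration of what one pair of its columns could contain, the sketch below counts how often each user appears in a split and computes that user's mean squared error. The column names `UserId`, `Rating` and `Prediction`, the function `per_user_reliability` and the toy data are assumptions, not the project's code.

```python
import pandas as pd

def per_user_reliability(split_df):
    """Return the sample count and mean squared error for every user in a split."""
    squared_error = (split_df['Rating'] - split_df['Prediction']) ** 2
    grouped = squared_error.groupby(split_df['UserId'])
    return pd.DataFrame({'Rep': grouped.count(), 'MSE': grouped.mean()})

# Toy split with made-up ratings and predictions.
toy_split = pd.DataFrame({
    'UserId':     [13, 13, 905],
    'Rating':     [1.0, 0.5, 0.8],
    'Prediction': [0.9, 0.7, 0.8],
})
reliability = per_user_reliability(toy_split)
print(reliability)
```

A user with very few samples, such as the first user above, has an error estimate that rests on only a handful of ratings, so a low MSE for that user says less than the same MSE for a user with many samples.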

In [7]:
#According to the prediction results of the RatedOnlyModel,
#The first 25 movies recommended to the first user are calculated and stored as dataframe.
#This dataframe can be used repeatedly for movie recommendation to the first user.
#This avoids running the prediction again for each set of movies recommended to the user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

Unnamed: 0,MovieId,Prediction,Title
0,6107,0.986484,Ivan Vasilievich: Back to the Future (Ivan Vas...
1,6095,0.978646,Gentlemen of Fortune (Dzhentlmeny udachi) (1972)
2,15371,0.975409,Shoah (1985)
3,6684,0.972694,The Adventures of Sherlock Holmes and Doctor W...
4,8461,0.968305,Planet Earth II (2016)
5,10852,0.960202,If Only (2004)
6,12048,0.960075,"Diamond Arm, The (Brilliantovaya ruka) (1968)"
7,7931,0.956375,Kenny (2006)
8,6597,0.955681,What Men Talk About (2010)
9,12075,0.955642,Don't Look Now: We're Being Shot At (La grande...
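Reusing a pool, as the comments of the cell above describe, can be made explicit with a small cache that builds each user's pool once and returns the stored result afterwards. `PoolCache` and `fake_build_pool` are hypothetical names used only for this sketch; in the notebook the builder would be a function that wraps `GetRecommendedPool`.

```python
class PoolCache:
    """Keeps one recommendation pool per user so the model is queried only once."""

    def __init__(self, build_pool):
        # build_pool is any callable that maps a user id to that user's pool.
        self._build_pool = build_pool
        self._pools = {}

    def get(self, user_id):
        # Build the pool on first request, then serve the stored copy.
        if user_id not in self._pools:
            self._pools[user_id] = self._build_pool(user_id)
        return self._pools[user_id]

# Stand-in builder that records how often it is called.
build_calls = []

def fake_build_pool(user_id):
    build_calls.append(user_id)
    return ['pool for user', user_id]

cache = PoolCache(fake_build_pool)
first = cache.get(905)
second = cache.get(905)   # served from the cache, no second build
print(len(build_calls))
```

With the notebook's own objects this could be created as `PoolCache(lambda userId: GetRecommendedPool(userId, model, recommendedPoolSize))`. A real system would also need to rebuild a pool after the user rates new movies or the model is retrained, which this sketch does not handle.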


In [8]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Ivan Vasilievich: Back to the Future (Ivan Vasilievich menyaet professiyu) (1973)
2) Gentlemen of Fortune (Dzhentlmeny udachi) (1972)
3) Shoah (1985)
4) The Adventures of Sherlock Holmes and Doctor Watson: The Hunt for the Tiger (1980)
5) Planet Earth II (2016)



In [9]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Star Wars: Episode V - The Empire Strikes Back (1980)
2) Patema Inverted (2013)
3) Red and the White, The (Csillagosok, katonák) (1967)
4) Atalante, L' (1934)
5) Cinderella (1997)



In [10]:
print(TopNRandom(pool, recommendationSize))

1) Son's Room, The (Stanza del figlio, La) (2001)
2) Margaret's Museum (1995)
3) Gentlemen of Fortune (Dzhentlmeny udachi) (1972)
4) If Only (2004)
5) Lord of the Rings: The Return of the King, The (2003)



In [11]:
#Printing second user's representation numbers and Mean Squared Error information from lookup table
#The table shows the reliability of the prediction for second user
lookupTable.iloc[users[1]]

User              13.000000
TraingingRep     125.000000
TrainingMSE        0.006530
ValidationRep     23.000000
ValidationMSE      0.016269
TestRep           26.000000
TestMSE            0.005393
AllRep           174.000000
AllMSE             0.007648
Name: 13, dtype: float64

In [12]:
#According to the prediction results of the RatedOnlyModel,
#The first 25 movies recommended to the second user are calculated and stored as dataframe.
#This dataframe can be used repeatedly for movie recommendation to the second user.
#This avoids running the prediction again for each set of movies recommended to the user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

Unnamed: 0,MovieId,Prediction,Title
0,10850,0.971989,Cosmos: A Spacetime Odissey
1,7533,0.965129,Band of Brothers (2001)
2,6616,0.964532,Puella Magi Madoka Magica the Movie Part I: Be...
3,10244,0.962851,"Trouble with Angels, The (1966)"
4,8461,0.960614,Planet Earth II (2016)
5,15371,0.958868,Shoah (1985)
6,15056,0.958004,"Night in Casablanca, A (1946)"
7,12617,0.95514,Rick and Morty: State of Georgia Vs. Denver Fe...
8,6107,0.953472,Ivan Vasilievich: Back to the Future (Ivan Vas...
9,8463,0.952721,Death Note: Desu nôto (2006–2007)


In [13]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Cosmos: A Spacetime Odissey
2) Band of Brothers (2001)
3) Puella Magi Madoka Magica the Movie Part I: Beginnings (2012)
4) Trouble with Angels, The (1966)
5) Planet Earth II (2016)



In [14]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Cosmos
2) Black Mirror
3) Puella Magi Madoka Magica the Movie Part I: Beginnings (2012)
4) Jimmy Carr: Making People Laugh (2010)
5) Night in Casablanca, A (1946)



In [15]:
print(TopNRandom(pool, recommendationSize))

1) Night in Casablanca, A (1946)
2) Mother and Child (2009)
3) Cosmos
4) 12 Chairs (1971)
5) Ivan Vasilievich: Back to the Future (Ivan Vasilievich menyaet professiyu) (1973)



## UnratedSampleModel Prediction

This model is trained with the RatingBased UnratedSample dataset. This dataset contains a continuous rating value in the range [0, 1] for each interacted pair, plus a small sample of unrated pairs that are rated as 0.0 (see the Training8 notebook for more information).

In [16]:
#Loading the best model for the RatingBased UnratedSample dataset from the h5 file
#See Training8 notebook for more information
model = tf.keras.models.load_model("../Model/UnratedSampleModel/Model9.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation8 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/UnratedSample.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/UnratedSample.pkl')

In [17]:
#Printing ModelTable for information
modelTable

Unnamed: 0,ModelData,Loss,Mse,Rmse,Msle,Mae,Mape
0,Training,0.025243,0.025198,0.15874,0.0103,0.118217,3273819.75
1,Validation,0.033959,0.033915,0.18416,0.014048,0.136753,4935205.5
2,Test,0.033638,0.033594,0.183286,0.013907,0.136222,4838788.5


In [18]:
#Printing first user's representation numbers and Mean Squared Error information from lookup table
#The table shows the reliability of the prediction for first user
lookupTable.iloc[users[0]]

User             905.000000
TraingingRep       5.000000
TrainingMSE        0.065296
ValidationRep      0.000000
ValidationMSE           NaN
TestRep            0.000000
TestMSE                 NaN
AllRep             5.000000
AllMSE             0.065296
Name: 905, dtype: float64

In [19]:
#According to the prediction results of the UnratedSampleModel,
#The first 25 movies recommended to the first user are calculated and stored as dataframe.
#This dataframe can be used repeatedly for movie recommendation to the first user.
#This avoids running the prediction again for each set of movies recommended to the user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

Unnamed: 0,MovieId,Prediction,Title
0,6947,0.988059,From Justin to Kelly (2003)
1,15889,0.971369,Die! Die! My Darling! (Fanatic) (1965)
2,12850,0.966814,Myra Breckinridge (1970)
3,15368,0.964247,Sergeants 3 (1962)
4,20844,0.959079,Tropicália (2012)
5,18290,0.958615,Jimmy Carr: Comedian (2007)
6,15627,0.9541,I Know a Woman Like That (2009)
7,11393,0.953406,Annie Get Your Gun (1967)
8,15446,0.952785,Enigma (1983)
9,15517,0.951666,This is Bob Hope... (2017)


In [20]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) From Justin to Kelly (2003)
2) Die! Die! My Darling! (Fanatic) (1965)
3) Myra Breckinridge (1970)
4) Sergeants 3 (1962)
5) Tropicália (2012)



In [21]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Incendiary (2008)
2) Teenage Caveman (2002)
3) Son of the Mask (2005)
4) My House in Umbria (2003)
5) Battle of Los Angeles (2011)



In [22]:
print(TopNRandom(pool, recommendationSize))

1) Unbeatable (Ji zhan) (2013)
2) Tale of Tales (2015)
3) Enigma (1983)
4) Myra Breckinridge (1970)
5) This is Bob Hope... (2017)



In [23]:
#Printing second user's representation numbers and Mean Squared Error information from lookup table
#The table shows the reliability of the prediction for second user
lookupTable.iloc[users[1]]

User              13.000000
TraingingRep     131.000000
TrainingMSE        0.006175
ValidationRep     26.000000
ValidationMSE      0.006305
TestRep           18.000000
TestMSE            0.004073
AllRep           175.000000
AllMSE             0.005978
Name: 13, dtype: float64

In [24]:
#According to the prediction results of the UnratedSampleModel,
#The first 25 movies recommended to the second user are calculated and stored as dataframe.
#This dataframe can be used repeatedly for movie recommendation to the second user.
#This avoids running the prediction again for each set of movies recommended to the user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

Unnamed: 0,MovieId,Prediction,Title
0,10850,0.957108,Cosmos: A Spacetime Odissey
1,15056,0.944261,"Night in Casablanca, A (1946)"
2,15371,0.943624,Shoah (1985)
3,1386,0.941989,Hellsing Ultimate OVA Series (2006)
4,3155,0.938403,"Unvanquished, The (Aparajito) (1957)"
5,5257,0.937016,Your Name. (2016)
6,7533,0.936471,Band of Brothers (2001)
7,15956,0.935918,Electric Dreams (1984)
8,11534,0.926764,Ghost in the Shell Arise - Border 4: Ghost Sta...
9,2135,0.926077,Anne of Green Gables (1985)


In [25]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Cosmos: A Spacetime Odissey
2) Night in Casablanca, A (1946)
3) Shoah (1985)
4) Hellsing Ultimate OVA Series (2006)
5) Unvanquished, The (Aparajito) (1957)



In [26]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Band of Brothers (2001)
2) Cosmos: A Spacetime Odissey
3) Anne of Green Gables (1985)
4) I Am David (2003)
5) Planet Ocean (2012)



In [27]:
print(TopNRandom(pool, recommendationSize))

1) Firemen's Ball, The (Horí, má panenko) (1967)
2) Night in Casablanca, A (1946)
3) My Mother's Castle (Château de ma mère, Le) (1990)
4) Electric Dreams (1984)
5) Hellsing Ultimate OVA Series (2006)

