# BinaryLike Models Prediction

In this notebook, predictions are made with both the ObservedOnlyModel and the UnobservedSampleModel.

The notebook shows, by example, how to make predictions with a binary classification model.

However, it is useful to ask why before asking how.

# Why Should A Prediction Be Made For Binary Like Datasets?

As mentioned before, BinaryLike datasets contain rows labeled 0 or 1, corresponding to 'Not Liked' and 'Liked'. Based on the raw rating data, ratings below 3.0 are labeled as not liked, and ratings of 3.0 and above are labeled as liked. This provides a simple, valid criterion for likes, so the predictions are based on users' tastes, albeit at a very basic level.
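
The labeling rule above can be sketched in a few lines. This is only an illustration, not the preprocessing code used to build these datasets: the toy ratings and the column names `Rating` and `Like` are assumptions.

```python
import pandas as pd

# Toy ratings; the 'Rating' and 'Like' column names and the values are assumptions
toyRatingDf = pd.DataFrame({
    'UserId':  [1, 1, 2, 2],
    'MovieId': [10, 20, 10, 30],
    'Rating':  [2.5, 3.0, 4.5, 1.0],
})

# Ratings of 3.0 and above become 1 ('Liked'), ratings below 3.0 become 0 ('Not Liked')
toyRatingDf['Like'] = (toyRatingDf['Rating'] >= 3.0).astype(int)

print(toyRatingDf)
```

Note that a rating of exactly 3.0 falls on the 'Liked' side of the threshold.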

These models can provide a better experience for users, because they aim to recommend movies that users might like. Users who are satisfied with their experience are more likely to stay motivated to keep using the system. Especially when recommending products such as movies, which are meant to give users a good time, it is a great advantage if the recommendations are based on the users' tastes rather than on the needs of the users or of the system.


# How To Make Predictions With Binary Like Models?

Since the datasets have two labels, 0 ('Not Liked') and 1 ('Liked'), a binary classification model is used. A binary classification model has a single output neuron with a sigmoid activation function, so its predictions are decimal values in the range [0, 1]. The movies that the user is most likely to like are the ones that will be recommended. To find them, the relevant user is paired with every movie, and each pair is given to the classification model as input. The desired number of movies with predictions closest to 1.0 are then suggested to the user, excluding the movies that the user has already watched.
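
The ranking idea described above can be sketched on toy data before the real model is loaded. Everything here is hypothetical: the movie ids, the raw outputs (logits), and the watched list are made-up values, and the sigmoid is applied by hand only to show why the scores fall in [0, 1].

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw outputs for one user paired with five movies
movieIds = np.array([101, 102, 103, 104, 105])
logits = np.array([2.0, -1.0, 0.5, 3.0, -2.5])
scores = sigmoid(logits)

# Hypothetical list of movies the user has already watched
watched = np.array([104])

# Drop the watched movies, then keep the n candidates with the highest scores
n = 2
mask = ~np.isin(movieIds, watched)
candidateIds = movieIds[mask]
candidateScores = scores[mask]
topIdx = np.argsort(candidateScores)[::-1][:n]
recommended = candidateIds[topIdx]

print(recommended)
```

Movie 104 has the highest score but is left out because the user has already watched it.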


In [1]:
#Importing libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import warnings

In [2]:
#Printing library versions
print('numpy Version: ' + np.__version__)
print('pandas Version: ' + pd.__version__)
print('tensorflow Version: ' + tf.__version__)

numpy Version: 1.16.5
pandas Version: 0.25.1
tensorflow Version: 2.0.0


In [3]:
#Loading Rating raw data and Movie Raw data from pkl file
ratingDf = pd.read_pickle('../Data/pkl/1M/RawData/Rating.pkl')
movieDf = pd.read_pickle('../Data/pkl/1M/RawData/Movie.pkl')

#Selecting the users to make predictions for
#The numbers were picked arbitrarily
#Trivia: the first number was chosen by my mother and the second by me
users = [905, 13]

#Determining Pool Size 
recommendedPoolSize = 25

#Determining recommendation Size
recommendationSize = 5

#Returns the top poolSize movies for the given userId as a DataFrame
def GetRecommendedPool(userId, model, poolSize):

    #Getting the ids of the movies the user has already interacted with
    ratedMoviesByUser = ratingDf[ratingDf['UserId'] == userId]['MovieId'].values

    #Building the model input from two numpy arrays of shape (movieSize, ):
    #one filled with userId and one containing every MovieId
    #The model predicts the user's likes for all movies,
    #then the movies the user has already interacted with are removed
    #and the top poolSize movies are selected and returned
    predictInput = [np.full((movieDf.shape[0]), userId, dtype=int), movieDf['MovieId'].values]

    #Model Predict
    predictList = model.predict(x = predictInput)

    #Creating a DataFrame of MovieId and Prediction that holds the user's predicted like for each movie
    resultDf = pd.DataFrame({'MovieId' : movieDf['MovieId'].values, 'Prediction' : predictList.reshape(-1)})

    #The (poolSize) movies with the highest Prediction score (closest to 1.0)
    #are selected from the movies the user has not interacted with
    resultDf = resultDf[~resultDf['MovieId'].isin(ratedMoviesByUser)].nlargest(poolSize, 'Prediction')

    #Merging resultDf with movieDf on MovieId to add the movie titles to resultDf
    resultDf = pd.merge(resultDf, movieDf, on='MovieId', how='left').reset_index(drop = True)

    return resultDf

#Returns Top (recommendedMovieSize) Recommended Movies From TopRecommendedDf as a String
def TopNMovie(TopRecommendedDf, recommendedMovieSize):
    result = ''
    
    counter = 1
    for index, row in TopRecommendedDf[:recommendedMovieSize].iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result

#Returns (recommendedMovieSize) randomly chosen movies from TopRecommendedDf as a String
#This avoids always recommending the same movies to the user in the same order.
#Instead of uniform sampling, Fitness Proportionate Selection (Roulette Wheel Selection) can also be used.
#see https://en.wikipedia.org/wiki/Fitness_proportionate_selection
#Tournament selection can also be used.
#see https://en.wikipedia.org/wiki/Tournament_selection
def TopNRandom(TopRecommendedDf, recommendedMovieSize):
    result = ''

    counter = 1
    for index, row in TopRecommendedDf.sample(n = recommendedMovieSize).iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result
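
The comments on `TopNRandom` mention fitness proportionate selection as an alternative to uniform sampling. A minimal sketch of that idea is shown below, using the `weights` argument of `DataFrame.sample` so that movies with a higher `Prediction` are drawn more often. The helper name `TopNWeightedRandom`, its `randomState` parameter, and the toy pool are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical helper: a sketch of fitness proportionate (roulette wheel) selection
def TopNWeightedRandom(TopRecommendedDf, recommendedMovieSize, randomState=None):
    # Rows are drawn without replacement, with probability proportional to 'Prediction'
    sampledDf = TopRecommendedDf.sample(n = recommendedMovieSize,
                                        weights = 'Prediction',
                                        random_state = randomState)

    # Same output format as TopNMovie and TopNRandom
    lines = [str(counter) + ") " + title
             for counter, title in enumerate(sampledDf['Title'], start = 1)]
    return '\r\n'.join(lines) + '\r\n'

# Toy pool with made-up titles and scores
toyPool = pd.DataFrame({
    'MovieId': [1, 2, 3, 4],
    'Prediction': [0.98, 0.97, 0.60, 0.55],
    'Title': ['Movie A', 'Movie B', 'Movie C', 'Movie D'],
})

weightedResult = TopNWeightedRandom(toyPool, 2, randomState = 0)
print(weightedResult)
```

When all scores in the pool are very close to each other, this behaves almost like uniform sampling, so the weighting matters mostly for pools with a wider spread of predictions.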

## ObservedOnlyModel Prediction

This model is trained with the BinaryLike ObservedOnly dataset. This dataset contains only observed user-movie pairs, each labeled 0 ('Not Liked') or 1 ('Liked') (see the Training3 notebook for more information).

In [4]:
#Ignoring warnings, to hide the 'Converting sparse IndexedSlices to a dense Tensor of unknown shape' warning
warnings.filterwarnings('ignore')

#Loading the best model for the BinaryLike ObservedOnly dataset from the h5 file
#See Training3 notebook for more information
model = tf.keras.models.load_model("../Model/ObservedOnlyModel/Model9.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation3 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/ObservedOnly.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/ObservedOnly.pkl')

In [5]:
#Printing ModelTable for information
modelTable

|   | ModelData | Negative | Positive | Loss | TP | FP | TN | FN | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Training | 136798 | 616756 | 0.298054 | 596475 | 73934 | 62864 | 20281 | 0.874972 |
| 1 | Validation | 22165 | 101504 | 0.375304 | 96258 | 14130 | 8035 | 5246 | 0.843324 |
| 2 | Test | 22207 | 101463 | 0.374106 | 96340 | 14155 | 8052 | 5123 | 0.844117 |
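
The confusion matrix counts in the ModelTable allow more than accuracy to be derived. The sketch below recomputes accuracy from the Test row as a sanity check and adds precision, recall and specificity. The counts are copied from the table above, and the extra metrics are an illustration rather than something the original notebooks report.

```python
# Confusion matrix counts copied from the Test row of the ModelTable
tp, fp, tn, fn = 96340, 14155, 8052, 5123

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)      # share of predicted likes that are real likes
recall = tp / (tp + fn)         # share of real likes that the model finds
specificity = tn / (tn + fp)    # share of real 'Not Liked' pairs that the model finds

print('Accuracy   : %.6f' % accuracy)
print('Precision  : %.6f' % precision)
print('Recall     : %.6f' % recall)
print('Specificity: %.6f' % specificity)
```

The recomputed accuracy should agree with the Accuracy column. Because the classes are imbalanced (far more 'Liked' than 'Not Liked' rows), specificity is worth checking alongside accuracy.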


In [6]:
#Printing first user's representation numbers and correct classification information from lookup table
#The table shows the reliability of the prediction for first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep           2
TrainingCorrect        2
ValidationRep          0
ValidationCorrect      0
TestRep                1
TestCorrect            0
AllRep                 3
AllCorrect             2
Name: 905, dtype: int64
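
One way to read this row is as a per-user correct ratio. The sketch below assumes that each `...Rep` column counts the user's rows in that split and each `...Correct` column counts how many of them the model classified correctly. The helper `CorrectRatio` is hypothetical, and the values are copied from the output above (including the `TraingingRep` spelling used by the table).

```python
import pandas as pd

# Values copied from the lookup table output for user 905
userRow = pd.Series({
    'TraingingRep': 2, 'TrainingCorrect': 2,
    'ValidationRep': 0, 'ValidationCorrect': 0,
    'TestRep': 1, 'TestCorrect': 0,
    'AllRep': 3, 'AllCorrect': 2,
})

# Hypothetical helper: share of correctly classified rows for one split
def CorrectRatio(row, repColumn, correctColumn):
    # Returns None when the user has no rows in that split, to avoid dividing by zero
    if row[repColumn] == 0:
        return None
    return row[correctColumn] / row[repColumn]

allRatio = CorrectRatio(userRow, 'AllRep', 'AllCorrect')
validationRatio = CorrectRatio(userRow, 'ValidationRep', 'ValidationCorrect')

print(allRatio, validationRatio)
```

With only three rows in total, this ratio says little on its own; the small `AllRep` value is the more important signal that the model has seen very little data for this user.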

In [7]:
#According to the prediction results of the ObservedOnlyModel,
#the top 25 movies to recommend to the first user are calculated and stored as a dataframe.
#This dataframe can be reused for every movie recommendation made to the first user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 4207 | 0.983531 | Widows' Peak (1994) |
| 1 | 2060 | 0.981522 | Bicycle Thieves (a.k.a. The Bicycle Thief) (a.... |
| 2 | 2983 | 0.97912 | Cure, The (1995) |
| 3 | 8000 | 0.978546 | Court Jester, The (1956) |
| 4 | 1805 | 0.978482 | Laura (1944) |
| 5 | 5373 | 0.977598 | It Happened One Night (1934) |
| 6 | 3012 | 0.975161 | Meet John Doe (1941) |
| 7 | 1273 | 0.972279 | Adventures of Robin Hood, The (1938) |
| 8 | 4197 | 0.971712 | Chamber, The (1996) |
| 9 | 8094 | 0.968837 | Niagara (1953) |


In [8]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Widows' Peak (1994)
2) Bicycle Thieves (a.k.a. The Bicycle Thief) (a.k.a. The Bicycle Thieves) (Ladri di biciclette) (1948)
3) Cure, The (1995)
4) Court Jester, The (1956)
5) Laura (1944)



In [9]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) North by Northwest (1959)
2) Laura (1944)
3) Cure, The (1995)
4) Ghosts of Mississippi (1996)
5) Widows' Peak (1994)



In [10]:
print(TopNRandom(pool, recommendationSize))

1) Court Jester, The (1956)
2) Chamber, The (1996)
3) Guys and Dolls (1955)
4) Widows' Peak (1994)
5) Shadowlands (1993)



In [11]:
#Printing second user's representation numbers and correct classification information from lookup table
#The table shows the reliability of the prediction for second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         138
TrainingCorrect      136
ValidationRep         17
ValidationCorrect     17
TestRep               19
TestCorrect           19
AllRep               174
AllCorrect           172
Name: 13, dtype: int64

In [12]:
#According to the prediction results of the ObservedOnlyModel,
#the top 25 movies to recommend to the second user are calculated and stored as a dataframe.
#This dataframe can be reused for every movie recommendation made to the second user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 1637 | 1.0 | Von Ryan's Express (1965) |
| 1 | 1805 | 1.0 | Laura (1944) |
| 2 | 4282 | 1.0 | Murder on the Orient Express (1974) |
| 3 | 6761 | 1.0 | Stalingrad (1993) |
| 4 | 9169 | 1.0 | The Post (2017) |
| 5 | 11658 | 1.0 | Green Lantern: First Flight (2009) |
| 6 | 8513 | 0.999999 | Frequently Asked Questions About Time Travel (... |
| 7 | 796 | 0.999999 | Letters from Iwo Jima (2006) |
| 8 | 911 | 0.999999 | Strangers on a Train (1951) |
| 9 | 1091 | 0.999998 | Girl with the Dragon Tattoo, The (Män som hata... |


In [13]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Von Ryan's Express (1965)
2) Laura (1944)
3) Murder on the Orient Express (1974)
4) Stalingrad (1993)
5) The Post (2017)



In [14]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Murder on the Orient Express (1974)
2) Pale Rider (1985)
3) Sidewalks of New York (2001)
4) Christmas Carol, A (1938)
5) And Then There Were None (1945)



In [15]:
print(TopNRandom(pool, recommendationSize))

1) The Post (2017)
2) State of Play (2009)
3) Von Ryan's Express (1965)
4) Revolver (2005)
5) Letters from Iwo Jima (2006)



## UnobservedSampleModel Prediction

This model is trained with the BinaryLike UnobservedSample dataset. This dataset contains the observed user-movie pairs, each labeled 0 ('Not Liked') or 1 ('Liked'), plus a small sample of unobserved pairs labeled as not liked (see the Training4 notebook for more information).
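
The idea of adding a small sample of unobserved pairs as 'Not Liked' can be sketched as follows. This is only an illustration under assumed names and toy data; the actual dataset construction, including how many pairs are sampled and how, may differ from this sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# Toy observed interactions; the 'Like' column name is an assumption
observedDf = pd.DataFrame({
    'UserId':  [1, 1, 2],
    'MovieId': [10, 20, 10],
    'Like':    [1, 0, 1],
})
allUsers = [1, 2]
allMovies = [10, 20, 30]

# Every (user, movie) pair that has no observed rating
observedPairs = set(zip(observedDf['UserId'], observedDf['MovieId']))
unobservedPairs = [(u, m) for u in allUsers for m in allMovies
                   if (u, m) not in observedPairs]

# Draw a small sample of unobserved pairs and label them 0 ('Not Liked')
sampleSize = 2
chosen = rng.choice(len(unobservedPairs), size = sampleSize, replace = False)
sampledDf = pd.DataFrame([unobservedPairs[i] for i in chosen],
                         columns = ['UserId', 'MovieId'])
sampledDf['Like'] = 0

# Observed rows plus the sampled unobserved rows
combinedDf = pd.concat([observedDf, sampledDf], ignore_index = True)
print(combinedDf)
```

Enumerating all unobserved pairs is fine for a toy example but would be expensive on the full user and movie sets, where sampling candidate pairs and rejecting the observed ones is a common alternative.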


In [16]:
#Loading the best model for the BinaryLike UnobservedSample dataset from the h5 file
#See Training4 notebook for more information
model = tf.keras.models.load_model("../Model/UnobservedSampleModel/Model6.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation4 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/UnobservedSample.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/UnobservedSample.pkl')

In [17]:
#Printing ModelTable for information
modelTable

|   | ModelData | Negative | Positive | Loss | TP | FP | TN | FN | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Training | 144600 | 616153 | 0.313082 | 593015 | 74714 | 69886 | 23138 | 0.871375 |
| 1 | Validation | 23138 | 101936 | 0.385071 | 96009 | 14225 | 8913 | 5927 | 0.838879 |
| 2 | Test | 23440 | 101634 | 0.387543 | 95670 | 14416 | 9024 | 5964 | 0.837056 |


In [18]:
#Printing first user's representation numbers and correct classification information from lookup table
#The table shows the reliability of the prediction for first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep           5
TrainingCorrect        5
ValidationRep          0
ValidationCorrect      0
TestRep                2
TestCorrect            1
AllRep                 7
AllCorrect             6
Name: 905, dtype: int64

In [19]:
#According to the prediction results of the UnobservedSampleModel,
#the top 25 movies to recommend to the first user are calculated and stored as a dataframe.
#This dataframe can be reused for every movie recommendation made to the first user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 2989 | 0.944068 | Man of the House (1995) |
| 1 | 7183 | 0.934238 | Leatherface: Texas Chainsaw Massacre III (1990) |
| 2 | 1487 | 0.924031 | Leonard Part 6 (1987) |
| 3 | 4758 | 0.919242 | Dunston Checks In (1996) |
| 4 | 8306 | 0.916119 | Curly Sue (1991) |
| 5 | 4296 | 0.912362 | Alone in the Dark (2005) |
| 6 | 4041 | 0.912103 | Glitter (2001) |
| 7 | 1239 | 0.909909 | Bulletproof (1996) |
| 8 | 3634 | 0.909755 | Leprechaun (1993) |
| 9 | 1991 | 0.907215 | American Ninja 2: The Confrontation (1987) |


In [20]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Man of the House (1995)
2) Leatherface: Texas Chainsaw Massacre III (1990)
3) Leonard Part 6 (1987)
4) Dunston Checks In (1996)
5) Curly Sue (1991)



In [21]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Legally Blonde 2: Red, White & Blonde (2003)
2) It Takes Two (1995)
3) Friday the 13th Part 3: 3D (1982)
4) Iron Eagle II (1988)
5) Leonard Part 6 (1987)



In [22]:
print(TopNRandom(pool, recommendationSize))

1) It Takes Two (1995)
2) Leonard Part 6 (1987)
3) Bulletproof (1996)
4) Solo (1996)
5) Employee of the Month (2006)



In [23]:
#Printing second user's representation numbers and correct classification information from lookup table
#The table shows the reliability of the prediction for second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         121
TrainingCorrect      120
ValidationRep         28
ValidationCorrect     28
TestRep               27
TestCorrect           26
AllRep               176
AllCorrect           174
Name: 13, dtype: int64

In [24]:
#According to the prediction results of the UnobservedSampleModel,
#the top 25 movies to recommend to the second user are calculated and stored as a dataframe.
#This dataframe can be reused for every movie recommendation made to the second user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 855 | 1.0 | Shadowlands (1993) |
| 1 | 1731 | 1.0 | Tender Mercies (1983) |
| 2 | 1805 | 1.0 | Laura (1944) |
| 3 | 6314 | 1.0 | Boys of St. Vincent, The (1992) |
| 4 | 6427 | 1.0 | Murphy's Romance (1985) |
| 5 | 8912 | 1.0 | Night to Remember, A (1958) |
| 6 | 1758 | 1.0 | Moneyball (2011) |
| 7 | 3136 | 1.0 | Next Three Days, The (2010) |
| 8 | 4207 | 1.0 | Widows' Peak (1994) |
| 9 | 1801 | 1.0 | Some Folks Call It a Sling Blade (1993) |


In [25]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Shadowlands (1993)
2) Tender Mercies (1983)
3) Laura (1944)
4) Boys of St. Vincent, The (1992)
5) Murphy's Romance (1985)



In [26]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Sully (2016)
2) Christmas Carol, A (1938)
3) Spy Who Loved Me, The (1977)
4) Night to Remember, A (1958)
5) Widows' Peak (1994)



In [27]:
print(TopNRandom(pool, recommendationSize))

1) Night to Remember, A (1958)
2) Murphy's Romance (1985)
3) Murder on the Orient Express (1974)
4) Moll Flanders (1996)
5) Widows' Peak (1994)

