# BinaryInteraction Models Prediction

In this notebook, predictions are made with both the InteractedOnlyModel and the NotInteractedSampleModel.

It shows how to make predictions with a binary classification model.

However, it is useful to ask why before asking how.

# Why Should A Prediction Be Made For Binary Interaction Datasets?

As mentioned before, Binary Interaction datasets do not provide a valid criterion for whether a user likes a movie. This means that these predictions are not based on users' tastes. So what are the predictions based on? Thinking it through, they are based on the movies that are most likely to prompt users to interact with them (that is, to rate them).

A user doesn't have to rate every movie they watch. If users are recommended movies that they will love or hate, and therefore feel compelled to rate, their interaction with the system will increase. What matters is not whether they like the recommended movie, but whether they rate it. For this reason, users can be directed to the movies they are most likely to interact with.

In this case, the system's interaction counts can give the impression that it has more users than it actually does. This may lead new users to join a system they believe is widely used. On the other hand, since users are directed to movies they may not like, they may come to see the movies they watch as a waste of time and stop using the system.

# How To Make Predictions With Binary Interaction Models?

Since the datasets have two labels, [0, 1], corresponding to ['Not Interacted', 'Interacted'], a binary classification model is used. The model has a single output neuron with a sigmoid activation function, so its predictions are decimal numbers in the range [0, 1]. Because the movies the user is most likely to interact with are the ones to recommend, the user is paired with every movie and each (user, movie) pair is given to the classification model as input. Excluding the movies the user has already rated, the desired number of movies whose predictions are closest to 1.0 are then recommended to the user.
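The procedure above can be sketched with NumPy alone. In this minimal sketch, a hypothetical `toy_model` function stands in for the trained Keras model: it produces a made-up raw score for each (user, movie) pair and passes it through a sigmoid, so every score lies between 0 and 1. The user is paired with every movie, already-rated movies are dropped and the highest-scoring movies are kept, following the same steps as `GetRecommendedPool` below. The scoring formula and the IDs are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def toy_model(user_ids, movie_ids):
    # Hypothetical stand-in for model.predict: a made-up raw score per (user, movie) pair
    raw = np.sin(user_ids * 0.7 + movie_ids * 1.3) * 3.0
    return sigmoid(raw)

def recommend(user_id, all_movie_ids, rated_movie_ids, n):
    # Pair the user with every movie: one input row per movie
    user_column = np.full(all_movie_ids.shape[0], user_id, dtype=int)
    scores = toy_model(user_column, all_movie_ids)

    # Drop the movies the user has already rated
    mask = ~np.isin(all_movie_ids, rated_movie_ids)
    candidates, candidate_scores = all_movie_ids[mask], scores[mask]

    # Sort by score, highest (closest to 1.0) first, and keep the top n
    order = np.argsort(-candidate_scores, kind='stable')[:n]
    return candidates[order], candidate_scores[order]

movie_ids = np.arange(10)
top_ids, top_scores = recommend(user_id=905, all_movie_ids=movie_ids, rated_movie_ids=[2, 5], n=3)
print(top_ids, np.round(top_scores, 3))
```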


In [1]:
#Importing libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import warnings

In [2]:
#Printing library versions
print('numpy Version: ' + np.__version__)
print('pandas Version: ' + pd.__version__)
print('tensorflow Version: ' + tf.__version__)

numpy Version: 1.16.5
pandas Version: 0.25.1
tensorflow Version: 2.0.0


In [3]:
#Loading the raw rating data and raw movie data from pkl files
ratingDf = pd.read_pickle('../Data/pkl/1M/RawData/Rating.pkl')
movieDf = pd.read_pickle('../Data/pkl/1M/RawData/Movie.pkl')

#Users for whom predictions will be made
#The IDs were chosen at random
#Trivia: the first ID was picked by my mother and the second by me
users = [905, 13]

#Pool size: number of top-scoring movies kept for each user
recommendedPoolSize = 25

#Recommendation size: number of movies shown to the user at a time
recommendationSize = 5

#Returns the top poolSize movies for the given userId as a DataFrame
def GetRecommendedPool(userId, model, poolSize):

    #Getting the IDs of the movies the user has already rated
    ratedMoviesByUser = ratingDf[ratingDf['UserId'] == userId]['MovieId'].values

    #Building the model input: an array of shape (movieSize,) filled with userId,
    #paired with an array that contains every MovieId.
    #The model predicts the user's interaction with every movie; movies the user
    #has already interacted with are then removed, and the top poolSize movies are returned
    predictInput = [np.full((movieDf.shape[0]), userId, dtype=int), movieDf['MovieId'].values]

    #Model Predict
    predictList = model.predict(x = predictInput)

    #Creating a DataFrame with MovieId and Prediction (the user's predicted interaction with each movie)
    resultDf = pd.DataFrame({'MovieId' : movieDf['MovieId'].values, 'Prediction' : predictList.reshape(-1)})

    #The poolSize movies with the highest Prediction score (closest to 1.0)
    #are selected from the movies the user has not interacted with
    resultDf = resultDf[~resultDf['MovieId'].isin(ratedMoviesByUser)].nlargest(poolSize, 'Prediction')

    #Merging resultDf with movieDf on MovieId to add the movie titles
    resultDf = pd.merge(resultDf, movieDf, on='MovieId', how='left').reset_index(drop = True)

    return resultDf

#Returns the top recommendedMovieSize movies from TopRecommendedDf as a numbered string
def TopNMovie(TopRecommendedDf, recommendedMovieSize):
    result = ''

    #Numbering the movies starting from 1
    for counter, (_, row) in enumerate(TopRecommendedDf.head(recommendedMovieSize).iterrows(), start=1):
        result += str(counter) + ") " + row['Title'] + '\r\n'

    return result

#Returns recommendedMovieSize randomly chosen movies from TopRecommendedDf as a numbered string
#This prevents always recommending the same movies to the user in the same order.
#Fitness Proportionate Selection (Roulette Wheel Selection) could also be used for this,
#see https://en.wikipedia.org/wiki/Fitness_proportionate_selection
#as could Tournament Selection,
#see https://en.wikipedia.org/wiki/Tournament_selection
def TopNRandom(TopRecommendedDf, recommendedMovieSize):
    result = ''

    #Uniform random sample without replacement, numbered starting from 1
    for counter, (_, row) in enumerate(TopRecommendedDf.sample(n = recommendedMovieSize).iterrows(), start=1):
        result += str(counter) + ") " + row['Title'] + '\r\n'

    return result
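The comments on `TopNRandom` mention Fitness Proportionate (Roulette Wheel) Selection and Tournament Selection as alternatives to uniform sampling. A minimal pure-Python sketch of both is shown below, assuming the pool's `Prediction` scores are used as fitness values. Both draw without replacement so that a movie is never recommended twice in one list; the function names are hypothetical, and the titles and scores are copied from the NotInteractedSampleModel pool of the first user.

```python
import random

def roulette_wheel_select(items, weights, k, rng=random):
    # Fitness proportionate selection without replacement:
    # each remaining item is picked with probability weight / total weight
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(min(k, len(items))):
        spin = rng.uniform(0, sum(weights))
        cumulative = 0.0
        for i, w in enumerate(weights):
            cumulative += w
            if spin <= cumulative:
                break
        # If rounding leaves spin above cumulative, i is the last index
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen

def tournament_select(items, scores, k, tournament_size=3, rng=random):
    # Tournament selection without replacement:
    # sample a few remaining candidates at random and keep the best-scoring one
    remaining = list(zip(items, scores))
    chosen = []
    for _ in range(min(k, len(remaining))):
        contenders = rng.sample(remaining, min(tournament_size, len(remaining)))
        winner = max(contenders, key=lambda pair: pair[1])
        chosen.append(winner[0])
        remaining.remove(winner)
    return chosen

pool_titles = ['Forrest Gump (1994)', 'Matrix, The (1999)', 'Pulp Fiction (1994)',
               'Braveheart (1995)', 'Godfather, The (1972)']
pool_scores = [0.679001, 0.660949, 0.638051, 0.567923, 0.554815]

rng = random.Random(42)
print(roulette_wheel_select(pool_titles, pool_scores, 3, rng))
print(tournament_select(pool_titles, pool_scores, 3, rng=rng))
```

Roulette wheel selection favours high scores in proportion to their size, while the tournament size controls how greedy tournament selection is (a size of 1 is uniform sampling). In pandas, `DataFrame.sample(n=..., weights='Prediction')` gives a similar score-weighted draw without replacement.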

## InteractedOnlyModel Prediction

This model is trained on the BinaryInteraction InteractedOnly dataset. Since this dataset contains only interacted pairs (every label is 1), the model learns to produce a positive output for every input. This makes it the least useful of the models (see the Training1 notebook for more information).
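Why a dataset made only of positive pairs yields a useless model can be shown with plain arithmetic. In this sketch, a hypothetical `always_interacted` classifier ignores its input and always outputs 1.0. On labels that are all 1 it reaches the same perfect accuracy that the ModelTable below reports, even though it cannot tell any two movies apart. The pair count reuses the Training row of that table; the small mixed example is made up.

```python
def always_interacted(user_id, movie_id):
    # Hypothetical degenerate model: predicts "Interacted" for every pair
    return 1.0

def accuracy(predictions, labels, threshold=0.5):
    # Fraction of pairs whose thresholded prediction matches the label
    hits = sum((p >= threshold) == bool(y) for p, y in zip(predictions, labels))
    return hits / len(labels)

# InteractedOnly-style data: every label is 1 (753,545 training pairs, no negatives)
n_pairs = 753545
labels = [1] * n_pairs
predictions = [always_interacted(0, m) for m in range(n_pairs)]
print(accuracy(predictions, labels))

# The same constant model on data that also contains a negative is no longer perfect
mixed_labels = [1, 1, 1, 0]
print(accuracy([always_interacted(0, m) for m in range(4)], mixed_labels))
```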

### Predictions will be made as an example only!

In [4]:
#Ignore the 'Converting sparse IndexedSlices to a dense Tensor of unknown shape' warning
#Note: filterwarnings('ignore') silences every warning, not only this one
warnings.filterwarnings('ignore')

#Loading the best model for the BinaryInteraction InteractedOnly dataset from an h5 file
#See the Training1 notebook for more information
model = tf.keras.models.load_model("../Model/InteractedOnlyModel/Model3.h5")

#Loading ModelTable and LookupTable from pkl files
#These tables provide more information about the reliability of the predictions
#See the PredictPreparation1 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/InteractedOnly.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/InteractedOnly.pkl')

In [5]:
#Printing the ModelTable for information
#As the table shows, the model is completely useless: every pair is positive, so it never has to predict a negative
modelTable

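|   | ModelData | Negative | Positive | Loss | TP | FP | TN | FN | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Training | 0 | 753545 | 1e-06 | 753545 | 0 | 0 | 0 | 1.0 |
| 1 | Validation | 0 | 123674 | 1e-06 | 123674 | 0 | 0 | 0 | 1.0 |
| 2 | Test | 0 | 123674 | 1e-06 | 123674 | 0 | 0 | 0 | 1.0 |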


In [6]:
#Printing the first user's representation counts and correct-classification counts from the lookup table
#The table shows how reliable the predictions are for the first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep           3
TrainingCorrect        3
ValidationRep          0
ValidationCorrect      0
TestRep                0
TestCorrect            0
AllRep                 3
AllCorrect             3
Name: 905, dtype: int64

In [7]:
#Using the InteractedOnlyModel's predictions, the top 25 movies for the first user
#are computed and stored as a DataFrame.
#This DataFrame can be reused for every movie recommendation to the first user,
#so the recommendation system does not have to predict again for each set of recommended movies
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 34 | 1.0 | Godfather: Part II, The (1974) |
| 1 | 40 | 1.0 | Saving Private Ryan (1998) |
| 2 | 42 | 1.0 | Toy Story (1995) |
| 3 | 43 | 1.0 | Jumanji (1995) |
| 4 | 44 | 1.0 | Father of the Bride Part II (1995) |
| 5 | 45 | 1.0 | Heat (1995) |
| 6 | 46 | 1.0 | GoldenEye (1995) |
| 7 | 49 | 1.0 | Ace Ventura: When Nature Calls (1995) |
| 8 | 52 | 1.0 | Leaving Las Vegas (1995) |
| 9 | 54 | 1.0 | Twelve Monkeys (a.k.a. 12 Monkeys) (1995) |
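Computing a user's pool once and reusing it, as the comments above describe, is a simple cache. A minimal sketch, assuming an in-memory dictionary keyed by user ID is enough: `make_pool_cache` wraps a hypothetical `compute_pool` function (a stand-in for `GetRecommendedPool`) so that the expensive prediction runs only on a user's first request. The `calls` list only records how often the prediction actually runs.

```python
from typing import Callable, Dict, List

def make_pool_cache(compute_pool: Callable[[int], List[str]]) -> Callable[[int], List[str]]:
    # Pools already computed, keyed by user ID
    cache: Dict[int, List[str]] = {}

    def get_pool(user_id: int) -> List[str]:
        # Run the expensive prediction only on the first request for this user
        if user_id not in cache:
            cache[user_id] = compute_pool(user_id)
        return cache[user_id]

    return get_pool

calls = []

def compute_pool(user_id: int) -> List[str]:
    # Hypothetical stand-in for GetRecommendedPool(userId, model, poolSize)
    calls.append(user_id)
    return [f'movie-{user_id}-{i}' for i in range(25)]

get_pool = make_pool_cache(compute_pool)
for _ in range(3):
    get_pool(905)   # only the first of these calls runs compute_pool
get_pool(13)
print(calls)
```

A real system would also need to drop a user's cached pool after they rate new movies, since `GetRecommendedPool` only excludes the movies that were rated when the pool was computed.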


In [8]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Godfather: Part II, The (1974)
2) Saving Private Ryan (1998)
3) Toy Story (1995)
4) Jumanji (1995)
5) Father of the Bride Part II (1995)



In [9]:
#Randomly selected movies from the pool
#As the example shows, this function can recommend different movies to the user each time
#For details, see the comments on the function
print(TopNRandom(pool, recommendationSize))

1) Broken Arrow (1996)
2) Batman Forever (1995)
3) Braveheart (1995)
4) Crimson Tide (1995)
5) Twelve Monkeys (a.k.a. 12 Monkeys) (1995)



In [10]:
print(TopNRandom(pool, recommendationSize))

1) Saving Private Ryan (1998)
2) Godfather: Part II, The (1974)
3) Toy Story (1995)
4) Leaving Las Vegas (1995)
5) Father of the Bride Part II (1995)



In [11]:
#Printing the second user's representation counts and correct-classification counts from the lookup table
#The table shows how reliable the predictions are for the second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         135
TrainingCorrect      135
ValidationRep         17
ValidationCorrect     17
TestRep               22
TestCorrect           22
AllRep               174
AllCorrect           174
Name: 13, dtype: int64

In [12]:
#Using the InteractedOnlyModel's predictions, the top 25 movies for the second user
#are computed and stored as a DataFrame.
#This DataFrame can be reused for every movie recommendation to the second user,
#so the recommendation system does not have to predict again for each set of recommended movies
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 0 | 1.0 | Three Colors: Blue (Trois couleurs: Bleu) (1993) |
| 1 | 1 | 1.0 | Kalifornia (1993) |
| 2 | 2 | 1.0 | Weekend at Bernie's (1989) |
| 3 | 3 | 1.0 | Better Off Dead... (1985) |
| 4 | 4 | 1.0 | Waiting for Guffman (1996) |
| 5 | 5 | 1.0 | Event Horizon (1997) |
| 6 | 6 | 1.0 | Spawn (1997) |
| 7 | 7 | 1.0 | Weird Science (1985) |
| 8 | 8 | 1.0 | ¡Three Amigos! (1986) |
| 9 | 9 | 1.0 | Stigmata (1999) |
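For the second user, every displayed prediction is 1.0 and the pool starts with MovieIds 0 to 9 in order. That is what one would expect if the scores are exactly tied: `DataFrame.nlargest` keeps the first occurrences of tied values by default (`keep='first'`), so the pool falls back to the row order of `movieDf`. Whether the model's float32 outputs are exactly equal is an assumption based on the displayed values; the small made-up DataFrame below only demonstrates the tie-breaking behaviour.

```python
import pandas as pd

# Made-up scores: every movie ties at 1.0, as the displayed output suggests
resultDf = pd.DataFrame({'MovieId': [7, 3, 9, 1, 5], 'Prediction': [1.0] * 5})

# With ties, nlargest falls back to row order (keep='first' is the default)
top = resultDf.nlargest(3, 'Prediction')
print(top['MovieId'].tolist())

# keep='all' returns every tied row instead of cutting off by position
print(len(resultDf.nlargest(3, 'Prediction', keep='all')))
```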


In [13]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Three Colors: Blue (Trois couleurs: Bleu) (1993)
2) Kalifornia (1993)
3) Weekend at Bernie's (1989)
4) Better Off Dead... (1985)
5) Waiting for Guffman (1996)



In [14]:
#Randomly selected movies from the pool
#As the example shows, this function can recommend different movies to the user each time
#For details, see the comments on the function
print(TopNRandom(pool, recommendationSize))

1) Waiting for Guffman (1996)
2) Hackers (1995)
3) Spawn (1997)
4) Nurse Betty (2000)
5) Stigmata (1999)



In [15]:
print(TopNRandom(pool, recommendationSize))

1) Harold and Maude (1971)
2) Three Colors: Blue (Trois couleurs: Bleu) (1993)
3) Room with a View, A (1986)
4) Hollow Man (2000)
5) RoboCop 2 (1990)



## NotInteractedSampleModel Prediction

This model is trained on the BinaryInteraction NotInteractedSample dataset. Since this dataset also contains a small sample of not-interacted pairs, the resulting model is more useful than the one trained on the BinaryInteraction InteractedOnly dataset (see the Training2 notebook for more information).
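The NotInteractedSample dataset adds a sample of not-interacted (user, movie) pairs labelled 0. How that sample was actually drawn is covered in another notebook; the sketch below only illustrates one common approach, uniform negative sampling by rejection, and is not necessarily the method used there. The ModelTable below reports roughly one negative for every three positives (for example, 250322 vs. 750814 in training), so the sketch uses a hypothetical `negatives_per_positive` ratio of 1/3 on made-up interactions.

```python
import random

def sample_negatives(interactions, all_movie_ids, negatives_per_positive, rng=random):
    # interactions: set of (user_id, movie_id) pairs the users rated (label 1)
    users = sorted({user for user, _ in interactions})
    wanted = round(len(interactions) * negatives_per_positive)
    # Never ask for more negatives than there are unrated pairs
    # (assumes every rated movie appears in all_movie_ids)
    available = len(users) * len(set(all_movie_ids)) - len(interactions)
    wanted = min(wanted, available)

    negatives = set()
    # Rejection sampling: draw uniform random pairs, skip rated pairs and duplicates
    while len(negatives) < wanted:
        pair = (rng.choice(users), rng.choice(all_movie_ids))
        if pair not in interactions:
            negatives.add(pair)
    return sorted(negatives)

positives = {(905, 1), (905, 4), (13, 2), (13, 3), (13, 7), (13, 8)}
movie_ids = list(range(10))
negatives = sample_negatives(positives, movie_ids, negatives_per_positive=1/3, rng=random.Random(7))

# Labelled dataset: 1 = Interacted, 0 = Not Interacted
dataset = [(u, m, 1) for u, m in sorted(positives)] + [(u, m, 0) for u, m in negatives]
print(len(dataset), negatives)
```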


In [16]:
#Loading the best model for the BinaryInteraction NotInteractedSample dataset from an h5 file
#See the Training2 notebook for more information
model = tf.keras.models.load_model("../Model/NotInteractedSampleModel/Model8.h5")

#Loading ModelTable and LookupTable from pkl files
#These tables provide more information about the reliability of the predictions
#See the PredictPreparation2 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/NotInteractedSample.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/NotInteractedSample.pkl')

In [17]:
#Printing ModelTable for information
modelTable

|   | ModelData | Negative | Positive | Loss | TP | FP | TN | FN | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Training | 250322 | 750814 | 0.290982 | 734148 | 36244 | 214078 | 16666 | 0.94715 |
| 1 | Validation | 41648 | 125046 | 0.320546 | 121500 | 6677 | 34971 | 3546 | 0.938672 |
| 2 | Test | 41661 | 125033 | 0.324791 | 121393 | 6833 | 34828 | 3640 | 0.937172 |
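The Accuracy column can be reproduced from the confusion-matrix counts, and the same counts also give precision and recall, which the table does not list. The sketch below applies the standard definitions to the TP, FP, TN and FN values copied from the table; the `metrics` helper is a hypothetical name.

```python
def metrics(tp, fp, tn, fn):
    # Standard binary-classification metrics from confusion-matrix counts
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # share of predicted "Interacted" pairs that were interacted
    recall = tp / (tp + fn)      # share of interacted pairs the model found
    return accuracy, precision, recall

# (TP, FP, TN, FN) copied from the NotInteractedSample ModelTable
rows = {
    'Training': (734148, 36244, 214078, 16666),
    'Validation': (121500, 6677, 34971, 3546),
    'Test': (121393, 6833, 34828, 3640),
}
for name, counts in rows.items():
    acc, prec, rec = metrics(*counts)
    print(f'{name:<10} accuracy={acc:.6f} precision={prec:.4f} recall={rec:.4f}')
```

Because the data are imbalanced (about three positives for each negative), a model that always answered "Interacted" would already reach about 0.75 accuracy, so precision and recall are worth reading alongside accuracy.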


In [18]:
#Printing the first user's representation counts and correct-classification counts from the lookup table
#The table shows how reliable the predictions are for the first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep          25
TrainingCorrect       23
ValidationRep          6
ValidationCorrect      6
TestRep                4
TestCorrect            4
AllRep                35
AllCorrect            33
Name: 905, dtype: int64

In [19]:
#Using the NotInteractedSampleModel's predictions, the top 25 movies for the first user
#are computed and stored as a DataFrame.
#This DataFrame can be reused for every movie recommendation to the first user,
#so the recommendation system does not have to predict again for each set of recommended movies
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 121 | 0.679001 | Forrest Gump (1994) |
| 1 | 376 | 0.660949 | Matrix, The (1999) |
| 2 | 106 | 0.638051 | Pulp Fiction (1994) |
| 3 | 221 | 0.595714 | Star Wars: Episode VI - Return of the Jedi (1983) |
| 4 | 110 | 0.58865 | Shawshank Redemption, The (1994) |
| 5 | 421 | 0.586525 | Fight Club (1999) |
| 6 | 73 | 0.567923 | Braveheart (1995) |
| 7 | 101 | 0.564251 | Star Wars: Episode IV - A New Hope (1977) |
| 8 | 165 | 0.560148 | Silence of the Lambs, The (1991) |
| 9 | 190 | 0.554815 | Godfather, The (1972) |


In [20]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Forrest Gump (1994)
2) Matrix, The (1999)
3) Pulp Fiction (1994)
4) Star Wars: Episode VI - Return of the Jedi (1983)
5) Shawshank Redemption, The (1994)



In [21]:
#Randomly selected movies from the pool
#As the example shows, this function can recommend different movies to the user each time
#For details, see the comments on the function
print(TopNRandom(pool, recommendationSize))

1) Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
2) Robin Hood: Men in Tights (1993)
3) Forrest Gump (1994)
4) Matrix, The (1999)
5) Usual Suspects, The (1995)



In [22]:
print(TopNRandom(pool, recommendationSize))

1) Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
2) Jurassic Park (1993)
3) Godfather, The (1972)
4) Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
5) Silence of the Lambs, The (1991)



In [23]:
#Printing the second user's representation counts and correct-classification counts from the lookup table
#The table shows how reliable the predictions are for the second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         154
TrainingCorrect      146
ValidationRep         26
ValidationCorrect     23
TestRep               31
TestCorrect           30
AllRep               211
AllCorrect           199
Name: 13, dtype: int64

In [24]:
#Using the NotInteractedSampleModel's predictions, the top 25 movies for the second user
#are computed and stored as a DataFrame.
#This DataFrame can be reused for every movie recommendation to the second user,
#so the recommendation system does not have to predict again for each set of recommended movies
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

|   | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 376 | 0.999918 | Matrix, The (1999) |
| 1 | 668 | 0.999825 | Lord of the Rings: The Return of the King, The... |
| 2 | 674 | 0.999808 | Kill Bill: Vol. 2 (2004) |
| 3 | 216 | 0.999728 | Raiders of the Lost Ark (Indiana Jones and the... |
| 4 | 924 | 0.99959 | Sleepy Hollow (1999) |
| 5 | 215 | 0.999509 | Star Wars: Episode V - The Empire Strikes Back... |
| 6 | 1084 | 0.999498 | Lives of Others, The (Das leben der Anderen) (... |
| 7 | 1079 | 0.999391 | Harry Potter and the Prisoner of Azkaban (2004) |
| 8 | 1589 | 0.999383 | Great Escape, The (1963) |
| 9 | 1333 | 0.99932 | E.T. the Extra-Terrestrial (1982) |


In [25]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Matrix, The (1999)
2) Lord of the Rings: The Return of the King, The (2003)
3) Kill Bill: Vol. 2 (2004)
4) Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
5) Sleepy Hollow (1999)



In [26]:
#Randomly selected movies from the pool
#As the example shows, this function can recommend different movies to the user each time
#For details, see the comments on the function
print(TopNRandom(pool, recommendationSize))

1) Avatar (2009)
2) Lives of Others, The (Das leben der Anderen) (2006)
3) Star Wars: Episode VI - Return of the Jedi (1983)
4) Kill Bill: Vol. 2 (2004)
5) American History X (1998)



In [27]:
print(TopNRandom(pool, recommendationSize))

1) Great Escape, The (1963)
2) Kill Bill: Vol. 2 (2004)
3) Harry Potter and the Prisoner of Azkaban (2004)
4) Matrix, The (1999)
5) Alien (1979)

