# CategoricalLike Models Prediction

In this notebook, predictions are made with both the CategorizedOnlyModel and the NotCategorizedSampleModel.

It shows, by example, how to make predictions with a Categorical Classification Model.

However, it is useful to ask the "why" question before asking the "how" question.

# Why Should A Prediction Be Made For Categorical Like Datasets?

As mentioned before, rows in Categorical Like datasets are labeled [0, 1, 2], corresponding to ['Hated', 'Not Liked', 'Liked']. Based on the raw rating data, ratings in the range [0, 2) are labeled Hated, ratings in the range [2, 4) are labeled Not Liked, and ratings in the range [4, 5] are labeled Liked. This provides a more valid criterion for likes than the Binary Like dataset, which means that these predictions are based on users' tastes.
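
The labeling rule can be sketched as a small function. This is only an illustration of the rating ranges described here, not the preprocessing code that built the dataset; the names `LABEL_NAMES` and `rating_to_label` are hypothetical.

```python
# Hypothetical helper: maps a raw rating in [0, 5] to a class index.
LABEL_NAMES = ['Hated', 'Not Liked', 'Liked']

def rating_to_label(rating):
    # Ratings outside [0, 5] are not covered by the labeling rule.
    if not 0 <= rating <= 5:
        raise ValueError('rating must be in [0, 5]')
    if rating < 2:
        return 0  # [0, 2) -> Hated
    if rating < 4:
        return 1  # [2, 4) -> Not Liked
    return 2      # [4, 5] -> Liked

for rating in [0.5, 2.0, 3.5, 4.0, 5.0]:
    print(rating, LABEL_NAMES[rating_to_label(rating)])
```

Note that the boundary ratings 2 and 4 fall into the higher class, because the lower ranges are half-open.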

These models can provide a better experience for users, as they aim to recommend movies that users might like. Users who are satisfied with their experience are less likely to lose their motivation to keep using the system. Especially when recommending products such as movies, which are meant to give users a good time, it is a great advantage if the recommendations are based on the tastes of the users rather than on the needs of the users or the system.


# How To Make Predictions With Categorical Like Models?

Since the datasets have 3 labels, [0, 1, 2] for ['Hated', 'Not Liked', 'Liked'], a Categorical Classification Model is used. A Categorical Classification Model has 3 output units that produce a result through a softmax activation function, i.e. each prediction is an array of decimals indicating the proportion (predicted probability) of belonging to each class. The movies that the user is most likely to like are recommended. To do this, the relevant user is given as input to the classification model paired with every movie separately. The desired number of movies with the highest proportion of belonging to the Liked class are then suggested to the user, excluding the movies that the user has already watched.
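
As a rough, library-free illustration of this procedure, the sketch below applies softmax to made-up raw scores and keeps the movies with the highest Liked proportion. The function names, the movie ids and the scores are all hypothetical; the real model applies softmax internally and returns the proportions directly.

```python
import math

def softmax(scores):
    # Subtracting the maximum keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_liked(scores_by_movie, watched, n):
    # scores_by_movie: {movieId: [raw score per class]}, index 2 is Liked.
    liked = {movie: softmax(scores)[2]
             for movie, scores in scores_by_movie.items()
             if movie not in watched}
    # Sort movie ids by their Liked proportion, highest first.
    return sorted(liked, key=liked.get, reverse=True)[:n]

# Made-up raw output scores for four hypothetical movies.
scores = {
    10: [0.1, 0.2, 2.5],
    11: [1.0, 1.0, 1.0],
    12: [2.0, 0.5, 0.1],
    13: [0.0, 0.3, 3.0],
}
print(top_liked(scores, watched={13}, n=2))
```

Movie 13 has the highest Liked score but is skipped because the user has already watched it.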


In [1]:
#Importing libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import warnings

In [2]:
#Printing library versions
print('numpy Version: ' + np.__version__)
print('pandas Version: ' + pd.__version__)
print('tensorflow Version: ' + tf.__version__)

numpy Version: 1.16.5
pandas Version: 0.25.1
tensorflow Version: 2.0.0


In [3]:
#Loading Rating raw data and Movie Raw data from pkl file
ratingDf = pd.read_pickle('../Data/pkl/1M/RawData/Rating.pkl')
movieDf = pd.read_pickle('../Data/pkl/1M/RawData/Movie.pkl')

#Users to be used for predictions are determined
#Numbers are randomly selected
#Unnecessary information : First of the numbers by my mother and the other by me
users = [905, 13]

#Determining Pool Size 
recommendedPoolSize = 25

#Determining recommendation Size
recommendationSize = 5

#Returns the top poolSize movies for the given userId as a DataFrame
def GetRecommendedPool(userId, model, poolSize):

    #Getting the movies the user has already interacted with
    ratedMoviesByUser = ratingDf[ratingDf['UserId'] == userId]['MovieId'].values

    #Creating the model input by combining a numpy array of shape (movieSize, )
    #filled with userId and a numpy array that includes all MovieIds
    #The model predicts the proportion at which every movie belongs to each category for the user
    #After that, movies the user has already interacted with are removed
    #Then the top poolSize movies with the highest proportion in the Liked category are returned
    predictInput = [np.full((movieDf.shape[0]), userId, dtype=int), movieDf['MovieId'].values]

    #Model Predict
    predictList = model.predict(x = predictInput)

    #Creating a DataFrame with MovieId and Prediction
    #Prediction is the movie's proportion of belonging to the Liked category (column index 2) for the user
    resultDf = pd.DataFrame({'MovieId' : movieDf['MovieId'].values, 'Prediction' : predictList[:, 2]})

    #The (poolSize) movies with the highest Prediction score (closest to 1.0)
    #are selected from the movies the user has not interacted with
    resultDf = resultDf[~resultDf['MovieId'].isin(ratedMoviesByUser)].nlargest(poolSize, 'Prediction')

    #Merging resultDf and movieDf on MovieId to add the movie titles to resultDf
    resultDf = pd.merge(resultDf, movieDf, on='MovieId', how='left').reset_index(drop = True)

    return resultDf

#Returns the top (recommendedMovieSize) recommended movies from TopRecommendedDf as a string
def TopNMovie(TopRecommendedDf, recommendedMovieSize):
    result = ''
    
    counter = 1
    for index, row in TopRecommendedDf[:recommendedMovieSize].iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result

#Returns (recommendedMovieSize) randomly selected movies from TopRecommendedDf as a string
#This prevents always recommending the same movies to the user in the same order.
#Instead of uniform sampling, the Fitness Proportionate Selection (Roulette Wheel Selection) method can also be used.
#see https://en.wikipedia.org/wiki/Fitness_proportionate_selection
#The Tournament Selection method can also be used.
#see https://en.wikipedia.org/wiki/Tournament_selection
def TopNRandom(TopRecommendedDf, recommendedMovieSize):
    result = ''

    counter = 1
    for index, row in TopRecommendedDf.sample(n = recommendedMovieSize).iterrows():
        result += str(counter) + ") " + row['Title'] + '\r\n'
        counter += 1

    return result
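
The comments on TopNRandom mention Fitness Proportionate Selection as an alternative to uniform sampling. A minimal sketch of that idea, assuming the Prediction scores are used directly as selection weights, could look like the following. The function name `roulette_pick`, its parameters and the sample titles are hypothetical and not part of the notebook.

```python
import random

def roulette_pick(titles, predictions, n, seed=None):
    # Draws up to n distinct titles; a title's chance of being drawn
    # in each round is proportional to its prediction score.
    # Assumes all scores are positive, as softmax outputs are.
    rng = random.Random(seed)
    candidates = list(zip(titles, predictions))
    picked = []
    for _ in range(min(n, len(candidates))):
        weights = [score for _, score in candidates]
        # random.choices draws one index with probability proportional to its weight.
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        # Removing the drawn candidate prevents recommending it twice.
        picked.append(candidates.pop(idx)[0])
    return picked

# Made-up pool of four titles with their Liked proportions.
sample_titles = ['Movie A', 'Movie B', 'Movie C', 'Movie D']
sample_scores = [0.90, 0.85, 0.60, 0.30]
print(roulette_pick(sample_titles, sample_scores, n=2, seed=0))
```

Compared with uniform sampling, this keeps some variety in the recommendations while still favoring the movies the model scores highest.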

## CategorizedOnlyModel Prediction

This model is trained with the CategoricalLike CategorizedOnly dataset. Rows in this dataset are labeled [0, 1, 2], corresponding to ['Hated', 'Not Liked', 'Liked'] (see the Training5 notebook for more information).

In [4]:
#Ignoring warnings caused by the "Converting sparse IndexedSlices to a dense Tensor of unknown shape" message
warnings.filterwarnings('ignore')

#Loading the best model for the CategoricalLike CategorizedOnly dataset from h5 file
#See Training5 notebook for more information
model = tf.keras.models.load_model("../Model/CategorizedOnlyModel/Model18.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation5 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/CategorizedOnly.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/CategorizedOnly.pkl')

In [5]:
#Printing ModelTable for information
modelTable

| | ModelData | Hated | Not Liked | Liked | Loss | Accuracy |
|---|---|---|---|---|---|---|
| 0 | Training | 50014 | 329878 | 373741 | 0.639107 | 0.817713 |
| 1 | Validation | 7834 | 54101 | 61695 | 0.75143 | 0.780199 |
| 2 | Test | 7860 | 53972 | 61798 | 0.750584 | 0.782199 |


In [6]:
#Printing the first user's representation counts and correct classification counts from the lookup table
#The table indicates how reliable the predictions are for the first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep           2
TrainingCorrect        1
ValidationRep          0
ValidationCorrect      0
TestRep                1
TestCorrect            0
AllRep                 3
AllCorrect             1
Name: 905, dtype: int64
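
One way to read this output, assuming the Rep columns count the user's rows in each split and the Correct columns count how many of those rows were classified correctly, is as a per-user ratio of correct classifications. The helper `correct_ratio` below is a hypothetical sketch of that reading; the numbers are copied from the output for user 905.

```python
def correct_ratio(correct, rep):
    # Returns None when the user has no rows in the split,
    # since a ratio is undefined for zero representations.
    return correct / rep if rep else None

# Values copied from the lookup table output for user 905.
user905 = {
    'AllRep': 3, 'AllCorrect': 1,
    'ValidationRep': 0, 'ValidationCorrect': 0,
}

print(correct_ratio(user905['AllCorrect'], user905['AllRep']))
print(correct_ratio(user905['ValidationCorrect'], user905['ValidationRep']))
```

With only 3 rows for this user in total, the ratio rests on very little evidence, so the user's predictions should be treated with caution.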

In [7]:
#According to the prediction results of the CategorizedOnlyModel,
#the top 25 movies recommended to the first user are calculated and stored as a dataframe.
#This dataframe can be reused for movie recommendations to the first user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

| | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 6817 | 0.88603 | Tunnel, The (Tunnel, Der) (2001) |
| 1 | 12439 | 0.88372 | Citizen X (1995) |
| 2 | 6297 | 0.873212 | Sleep Tight (Mientras duermes) (2011) |
| 3 | 10833 | 0.863436 | Anne of the Thousand Days (1969) |
| 4 | 12357 | 0.86213 | Brimstone (2016) |
| 5 | 11666 | 0.86131 | Rififi (Du rififi chez les hommes) (1955) |
| 6 | 11152 | 0.859802 | Ballad of a Soldier (Ballada o soldate) (1959) |
| 7 | 10044 | 0.858677 | Four Musketeers, The (1974) |
| 8 | 12793 | 0.854288 | Hush... Hush, Sweet Charlotte (1964) |
| 9 | 18752 | 0.852095 | Song of Bernadette, The (1943) |
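
The idea of computing a pool once and reusing it can be sketched as a small per-user cache. This is only an assumption about how such reuse might be organized in a recommendation system; `PoolCache` and `fake_build_pool` are hypothetical names, and the stand-in builder replaces the real model call so the example runs on its own.

```python
class PoolCache:
    # Stores one recommendation pool per user so the costly
    # prediction step runs at most once for each user.
    def __init__(self, build_pool):
        self._build_pool = build_pool
        self._pools = {}

    def get(self, user_id):
        if user_id not in self._pools:
            self._pools[user_id] = self._build_pool(user_id)
        return self._pools[user_id]

# Stand-in for the real pool computation; it records every call.
calls = []

def fake_build_pool(user_id):
    calls.append(user_id)
    return ['movie-%d-%d' % (user_id, i) for i in range(3)]

cache = PoolCache(fake_build_pool)
cache.get(905)
cache.get(905)  # Served from the cache, no second computation.
cache.get(13)
print(calls)
```

In this notebook, the builder could be something like `lambda userId: GetRecommendedPool(userId, model, recommendedPoolSize)`, so that TopNMovie and TopNRandom keep drawing from the same stored pool.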


In [8]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Tunnel, The (Tunnel, Der) (2001)
2) Citizen X (1995)
3) Sleep Tight (Mientras duermes) (2011)
4) Anne of the Thousand Days (1969)
5) Brimstone (2016)



In [9]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Long Walk Home, The (1990)
2) Twelve O'Clock High (1949)
3) Four Musketeers, The (1974)
4) You're Not You (2014)
5) Scooby-Doo! and the Loch Ness Monster (2004)



In [10]:
print(TopNRandom(pool, recommendationSize))

1) I Want to Live! (1958)
2) Into the Woods (1991)
3) Twelve O'Clock High (1949)
4) Confessions (Kokuhaku) (2010)
5) Sleep Tight (Mientras duermes) (2011)



In [11]:
#Printing the second user's representation counts and correct classification counts from the lookup table
#The table indicates how reliable the predictions are for the second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         131
TrainingCorrect      113
ValidationRep         23
ValidationCorrect     19
TestRep               20
TestCorrect           17
AllRep               174
AllCorrect           149
Name: 13, dtype: int64

In [12]:
#According to the prediction results of the CategorizedOnlyModel,
#the top 25 movies recommended to the second user are calculated and stored as a dataframe.
#This dataframe can be reused for movie recommendations to the second user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

| | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 1542 | 0.999867 | DiG! (2004) |
| 1 | 2946 | 0.999821 | Samsara (2011) |
| 2 | 4065 | 0.999666 | American Dream (1990) |
| 3 | 8466 | 0.999481 | Black Mirror |
| 4 | 3377 | 0.99928 | Cashback (2004) |
| 5 | 10384 | 0.999273 | Call Northside 777 (1948) |
| 6 | 11701 | 0.998688 | Summer of '42 (1971) |
| 7 | 13137 | 0.998546 | My Mother's Castle (Château de ma mère, Le) (1... |
| 8 | 10044 | 0.99852 | Four Musketeers, The (1974) |
| 9 | 2687 | 0.998489 | Witness for the Prosecution (1957) |


In [13]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) DiG! (2004)
2) Samsara (2011)
3) American Dream (1990)
4) Black Mirror
5) Cashback (2004)



In [14]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Black Mirror
2) Witness for the Prosecution (1957)
3) Four Musketeers, The (1974)
4) Call Northside 777 (1948)
5) Seven Days in May (1964)



In [15]:
print(TopNRandom(pool, recommendationSize))

1) My Flesh and Blood (2003)
2) Witness for the Prosecution (1957)
3) Call Northside 777 (1948)
4) My Mother's Castle (Château de ma mère, Le) (1990)
5) Dodsworth (1936)



## NotCategorizedSampleModel Prediction

This model is trained with the CategoricalLike NotCategorizedSample dataset. Rows in this dataset are labeled [0, 1, 2], corresponding to ['Hated', 'Not Liked', 'Liked'], and it also includes a small sample of not categorized pairs labeled as Hated (see the Training6 notebook for more information).

In [16]:
#Loading the best model for the CategoricalLike NotCategorizedSample dataset from h5 file
#See Training6 notebook for more information
model = tf.keras.models.load_model("../Model/NotCategorizedSampleModel/Model18.h5")

#Loading ModelTable and LookupTable from pkl file
#These tables will provide more information about the reliability of the predictions
#See PredictPreparation6 notebook for more information
modelTable = pd.read_pickle('../PredictData/ModelTable/NotCategorizedSample.pkl')
lookupTable = pd.read_pickle('../PredictData/LookupTable/NotCategorizedSample.pkl')

In [17]:
#Printing ModelTable for information
modelTable

| | ModelData | Hated | Liked | Loved | Loss | Accuracy |
|---|---|---|---|---|---|---|
| 0 | Training | 57452 | 329602 | 373745 | 0.663164 | 0.817254 |
| 1 | Validation | 9153 | 54141 | 61757 | 0.795134 | 0.777153 |
| 2 | Test | 9111 | 54208 | 61732 | 0.794496 | 0.778344 |


In [18]:
#Printing the first user's representation counts and correct classification counts from the lookup table
#The table indicates how reliable the predictions are for the first user
lookupTable.iloc[users[0]]

User                 905
TraingingRep           4
TrainingCorrect        2
ValidationRep          0
ValidationCorrect      0
TestRep                0
TestCorrect            0
AllRep                 4
AllCorrect             2
Name: 905, dtype: int64

In [19]:
#According to the prediction results of the NotCategorizedSampleModel,
#the top 25 movies recommended to the first user are calculated and stored as a dataframe.
#This dataframe can be reused for movie recommendations to the first user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[0], model, recommendedPoolSize)
pool

| | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 8334 | 0.980466 | Death on the Staircase (Soupçons) (2004) |
| 1 | 1542 | 0.971499 | DiG! (2004) |
| 2 | 1808 | 0.968911 | Angel and the Badman (1947) |
| 3 | 2753 | 0.966664 | Jetée, La (1962) |
| 4 | 9332 | 0.964461 | Night and Fog (Nuit et brouillard) (1955) |
| 5 | 11666 | 0.959249 | Rififi (Du rififi chez les hommes) (1955) |
| 6 | 7965 | 0.958997 | Doulos, Le (1962) |
| 7 | 7922 | 0.957007 | Cria Cuervos (1976) |
| 8 | 2967 | 0.953518 | Burden of Dreams (1982) |
| 9 | 8333 | 0.953437 | 56 Up (2012) |


In [20]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Death on the Staircase (Soupçons) (2004)
2) DiG! (2004)
3) Angel and the Badman (1947)
4) Jetée, La (1962)
5) Night and Fog (Nuit et brouillard) (1955)



In [21]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) Doulos, Le (1962)
2) Mirror, The (Zerkalo) (1975)
3) Brothers (Brødre) (2004)
4) Night and Fog (Nuit et brouillard) (1955)
5) Rififi (Du rififi chez les hommes) (1955)



In [22]:
print(TopNRandom(pool, recommendationSize))

1) When Night Is Falling (1995)
2) Olive Kitteridge (2014)
3) Matewan (1987)
4) Come and See (Idi i smotri) (1985)
5) Gangs of Wasseypur (2012)



In [23]:
#Printing the second user's representation counts and correct classification counts from the lookup table
#The table indicates how reliable the predictions are for the second user
lookupTable.iloc[users[1]]

User                  13
TraingingRep         129
TrainingCorrect      107
ValidationRep         20
ValidationCorrect     16
TestRep               25
TestCorrect           22
AllRep               174
AllCorrect           145
Name: 13, dtype: int64

In [24]:
#According to the prediction results of the NotCategorizedSampleModel,
#the top 25 movies recommended to the second user are calculated and stored as a dataframe.
#This dataframe can be reused for movie recommendations to the second user.
#This avoids running the prediction again for each set of movies recommended to a user in the recommendation system
pool = GetRecommendedPool(users[1], model, recommendedPoolSize)
pool

| | MovieId | Prediction | Title |
|---|---|---|---|
| 0 | 9447 | 0.999274 | Frozen Planet (2011) |
| 1 | 8803 | 0.998855 | Last Klezmer: Leopold Kozlowski, His Life and ... |
| 2 | 8616 | 0.998128 | Imagine: John Lennon (1988) |
| 3 | 9932 | 0.997904 | Montenegro (1981) |
| 4 | 3927 | 0.997863 | Richard Pryor Live on the Sunset Strip (1982) |
| 5 | 11733 | 0.997687 | Adventures of Sherlock Holmes, The (1939) |
| 6 | 7925 | 0.997314 | U.S. vs. John Lennon, The (2006) |
| 7 | 7861 | 0.997142 | Train, The (1964) |
| 8 | 713 | 0.997127 | I Am David (2003) |
| 9 | 3884 | 0.997122 | Killing, The (1956) |


In [25]:
#Most recommended movies
print(TopNMovie(pool, recommendationSize))

1) Frozen Planet (2011)
2) Last Klezmer: Leopold Kozlowski, His Life and Music, The (1994)
3) Imagine: John Lennon (1988)
4) Montenegro (1981)
5) Richard Pryor Live on the Sunset Strip (1982)



In [26]:
#Randomly selected movies from pool
#As seen in the example, with this function, different movies can be recommended to the user
#For detailed information, see the comment lines of the function.
print(TopNRandom(pool, recommendationSize))

1) The Night Manager (2016)
2) U.S. vs. John Lennon, The (2006)
3) Endurance (1999)
4) Matewan (1987)
5) Divorce - Italian Style (Divorzio all'italiana) (1961)



In [27]:
print(TopNRandom(pool, recommendationSize))

1) Band of Brothers (2001)
2) Killing, The (1956)
3) Imagine: John Lennon (1988)
4) My Winnipeg (2007)
5) Adventures of Sherlock Holmes, The (1939)

