# Recommendation - Model 🍿

---

<img src="https://visithrastnik.si/uploads/tic/public/generic_list_item/6-kulturna_prireditev_v_avli_kulturnega_centra_zagorje_ob_savi.jpg" />

---

First, let's import the results obtained previously.

In [3]:
import pickle

with open("ratings_matrix.pkl", "rb") as file:
    ratings_matrix = pickle.load(file)
    
ratings_matrix

<610x9724 sparse matrix of type '<class 'numpy.float64'>'
	with 100836 stored elements in Compressed Sparse Row format>
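The repr above reports a 610 × 9724 matrix with 100836 stored ratings. A quick calculation shows how sparse it is, which is why a sparse CSR format is used instead of a dense array. Only the three numbers from the repr are used here:

```python
# Shape and number of stored ratings, as reported by the repr above
n_users, n_items, n_ratings = 610, 9724, 100836

n_cells = n_users * n_items      # every possible (user, movie) pair
density = n_ratings / n_cells    # fraction of pairs that actually have a rating

print(f"{n_cells} cells, density = {density:.2%}")   # 5931640 cells, density = 1.70%
```

In other words, roughly 98.3% of the (user, movie) pairs are unrated.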

In [4]:
index_path = 'index.pkl'
with open(index_path, 'rb') as f:
    indexes = pickle.load(f)
idx_to_mid, mid_to_idx, idx_to_uid, uid_to_idx = indexes

In [5]:
idx_to_mid[20]

356
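The earlier notebook that built these mappings is not shown here. Mappings like `mid_to_idx` / `idx_to_mid` are typically obtained by enumerating the distinct IDs and using their positions as matrix rows and columns. The sketch below assumes the IDs are numbered in order of first appearance, which appears consistent with `mid_to_idx[1] == 0` but is still an assumption; the helper `build_index` and the toy ratings are hypothetical:

```python
from scipy.sparse import csr_matrix

def build_index(ids):
    """Map each distinct ID to a 0-based position, in order of first appearance (hypothetical helper)."""
    id_to_idx = {id_: i for i, id_ in enumerate(dict.fromkeys(ids))}
    idx_to_id = {i: id_ for id_, i in id_to_idx.items()}
    return id_to_idx, idx_to_id

# Toy ratings as parallel lists (userId, movieId, rating): illustrative values only
user_ids = [1, 1, 2, 5]
movie_ids = [1, 356, 356, 2571]
ratings = [4.0, 5.0, 3.5, 4.5]

uid_to_idx, idx_to_uid = build_index(user_ids)
mid_to_idx, idx_to_mid = build_index(movie_ids)

# Rows are users, columns are movies, values are ratings
rows = [uid_to_idx[u] for u in user_ids]
cols = [mid_to_idx[m] for m in movie_ids]
matrix = csr_matrix((ratings, (rows, cols)),
                    shape=(len(uid_to_idx), len(mid_to_idx)))

print(matrix.shape)     # (3, 3)
print(idx_to_mid[1])    # 356
```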

In [6]:
with open("idx_to_mid.pkl", "wb") as file:
    pickle.dump(idx_to_mid, file)
    
with open("mid_to_idx.pkl", "wb") as file:
    pickle.dump(mid_to_idx, file)
    
with open("uid_to_idx.pkl", "wb") as file:
    pickle.dump(uid_to_idx, file)
    
with open("idx_to_uid.pkl", "wb") as file:
    pickle.dump(idx_to_uid, file)

Let's now use LightFM, which will let us build a similarity matrix. First, we randomly hold out 20% of the ratings as a test set.

In [7]:
from lightfm.cross_validation import random_train_test_split
import numpy as np

train_matrix, test_matrix = random_train_test_split(ratings_matrix,
                                                    test_percentage=0.2,
                                                    random_state=np.random.RandomState(0))
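`random_train_test_split` shuffles the stored (user, movie) interactions and assigns a fraction of them to the test set, so both matrices keep the full 610 × 9724 shape and no rating ends up in both. The following simplified NumPy/SciPy re-implementation is for illustration only; `split_interactions` is a hypothetical name, not LightFM's code:

```python
import numpy as np
from scipy.sparse import coo_matrix

def split_interactions(interactions, test_percentage=0.2, seed=0):
    """Randomly split the stored entries of a sparse matrix into train/test (sketch)."""
    coo = interactions.tocoo()
    rng = np.random.RandomState(seed)
    order = rng.permutation(coo.nnz)                 # shuffle the stored entries
    cutoff = int((1.0 - test_percentage) * coo.nnz)
    train_idx, test_idx = order[:cutoff], order[cutoff:]

    def subset(idx):
        # Same shape as the input, keeping only the selected entries
        return coo_matrix((coo.data[idx], (coo.row[idx], coo.col[idx])),
                          shape=coo.shape)

    return subset(train_idx), subset(test_idx)

# Toy 3 x 4 ratings matrix with 5 stored ratings
toy = coo_matrix(([4.0, 5.0, 3.0, 2.5, 4.5], ([0, 0, 1, 2, 2], [0, 3, 1, 2, 3])),
                 shape=(3, 4))
train, test = split_interactions(toy, test_percentage=0.2)
print(train.nnz, test.nnz)   # 4 1
```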



In [8]:
from lightfm import LightFM

model = LightFM(no_components=30, loss="warp", random_state=0)

model.fit(train_matrix, epochs=50, verbose=True)

Epoch 0
Epoch 1
...
Epoch 49


<lightfm.lightfm.LightFM at 0x114d34b00>

In [9]:
from lightfm.evaluation import precision_at_k

k = 5
pre_k = precision_at_k(model, test_matrix, train_matrix, k=k).mean()

print("Precision at k={} is {}".format(k, pre_k))

Precision at k=5 is 0.2671052813529968
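`precision_at_k` ranks the movies for each user by predicted score, leaves out those already present in `train_matrix` (passed as the training interactions), and measures which fraction of the top k appear in the test set; `.mean()` then averages over users. A value of about 0.267 at k = 5 therefore means that, on average, roughly 1.3 of each user's top 5 suggestions are movies they rated in the held-out 20%. A NumPy sketch of the per-user computation with made-up scores (`precision_at_k_single` is hypothetical, not LightFM's implementation):

```python
import numpy as np

def precision_at_k_single(scores, train_items, test_items, k=5):
    """Precision@k for one user: share of the top-k unseen items that are in the test set."""
    scores = scores.astype(float).copy()
    scores[list(train_items)] = -np.inf      # never re-recommend training items
    top_k = np.argsort(-scores)[:k]          # k highest-scoring remaining items
    hits = len(set(top_k.tolist()) & set(test_items))
    return hits / k

# Made-up scores for 8 movies
scores = np.array([0.9, 0.8, 0.1, 0.7, 0.3, 0.6, 0.2, 0.5])
result = precision_at_k_single(scores, train_items={0, 1}, test_items={3, 4}, k=5)
print(result)   # 0.4
```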


We now move on to the similarity matrix, which will be very useful when, starting from one or several movies, we want to suggest the 5 movies most similar to them. When several movies are given, we simply sum, for each candidate movie, the similarity values the matrix gives with respect to each input movie.
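The summing strategy for several movies can be sketched as follows: the similarity rows of the chosen movies are added up, the chosen movies themselves are excluded, and the highest totals are kept. The small matrix is made up, and `recommend_from_several` is a hypothetical helper, not code from this notebook:

```python
import numpy as np

def recommend_from_several(similarity, liked_idx, n=5):
    """Indices of the n items with the highest summed similarity to the items in liked_idx."""
    totals = similarity[liked_idx].sum(axis=0)   # one summed score per candidate item
    totals[liked_idx] = -np.inf                  # do not recommend the input movies
    return np.argsort(-totals)[:n]

# Made-up symmetric 5 x 5 similarity matrix
sim = np.array([
    [1.0, 0.2, 0.8, 0.1, 0.4],
    [0.2, 1.0, 0.3, 0.9, 0.5],
    [0.8, 0.3, 1.0, 0.0, 0.6],
    [0.1, 0.9, 0.0, 1.0, 0.2],
    [0.4, 0.5, 0.6, 0.2, 1.0],
])
result = recommend_from_several(sim, [0, 1], n=2)
print(result)   # [2 3]
```

Summing and averaging give the same ranking, since averaging only divides every total by the same number of input movies.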

In [10]:
print(model.user_embeddings.shape)
model.user_embeddings

(610, 30)


array([[ 0.8197754 ,  0.58469   ,  0.8000792 , ...,  0.1683342 ,
         0.3600048 , -0.21243623],
       [-0.6790457 , -0.12120218, -0.22247835, ...,  0.16173059,
        -0.00376948,  0.7129165 ],
       [-0.3109635 ,  0.00349523,  0.44889408, ...,  0.25184995,
         0.8891051 , -0.25782192],
       ...,
       [ 1.4590213 ,  0.12048177,  0.23078242, ..., -0.2774047 ,
        -1.1821449 ,  0.4094976 ],
       [-0.7227708 ,  0.80425507, -0.17750198, ..., -0.45044312,
         0.02396172, -0.43270677],
       [ 0.562239  ,  1.4131409 , -1.4402713 , ...,  0.42333522,
        -2.0645046 ,  1.479885  ]], dtype=float32)
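Each of the 610 rows above is a 30-dimensional user vector, and `model.item_embeddings` holds one such vector per movie. Without user or item features, LightFM scores a (user, movie) pair as the dot product of the two embeddings plus the user and item biases. A NumPy sketch with random stand-in arrays (shapes reduced, values made up) in place of `model.user_embeddings`, `model.item_embeddings`, `model.user_biases` and `model.item_biases`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_components = 4, 6, 30

# Random stand-ins for the fitted model's arrays
user_emb = rng.normal(size=(n_users, n_components)).astype(np.float32)
item_emb = rng.normal(size=(n_items, n_components)).astype(np.float32)
user_bias = rng.normal(size=n_users).astype(np.float32)
item_bias = rng.normal(size=n_items).astype(np.float32)

# scores[u, i] = <user_emb[u], item_emb[i]> + user_bias[u] + item_bias[i]
scores = user_emb @ item_emb.T + user_bias[:, None] + item_bias[None, :]

user = 0
top_items = np.argsort(-scores[user])[:3]   # 3 best-scoring items for user 0
print(scores.shape, top_items)
```

The user bias shifts all of a user's scores equally, so it does not change that user's ranking; only the dot product and the item bias do.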

In [11]:
import numpy as np

# Pearson correlation between every pair of movie embeddings: a 9724 x 9724 matrix
similarity_scores = np.corrcoef(model.item_embeddings)
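`np.corrcoef` treats each row as a variable, so applied to the 9724 × 30 item embeddings it returns Pearson correlations between movies. This equals the cosine similarity of the mean-centred embedding rows, as the following check on random stand-in data shows. Cosine similarity on the raw, uncentred embeddings is a common alternative and can rank movies differently:

```python
import numpy as np

rng = np.random.default_rng(42)
item_emb = rng.normal(size=(8, 30))    # stand-in for model.item_embeddings

corr = np.corrcoef(item_emb)           # 8 x 8 Pearson correlations between rows

# Same thing by hand: centre each row, scale it to unit length, take dot products
centred = item_emb - item_emb.mean(axis=1, keepdims=True)
unit = centred / np.linalg.norm(centred, axis=1, keepdims=True)
manual = unit @ unit.T

print(corr.shape, np.allclose(corr, manual))   # (8, 8) True
```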

Let's try our model on **Toy Story**, whose movieId is 1, and look at its **top 5 recommendations**.

In [25]:
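# Row index of Toy Story (movieId 1) in the similarity matrix.
# Note: an ascending np.argsort on this row would list the LEAST similar movies first.
mid_to_idx[1]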

0

In [27]:
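import pandas as pd

# Assumption: `movies` is the MovieLens movies.csv table (movieId, title, genres)
# used in the earlier notebook; it is not loaded anywhere else in this one.
movies = pd.read_csv("movies.csv")

idx = mid_to_idx[1]  # row of Toy Story in the similarity matrix
movie = []

# Sort by decreasing similarity and skip position 0 (Toy Story itself)
for i in np.argsort(-similarity_scores[idx])[1:6]:
    movie.append(movies[movies['movieId'] == idx_to_mid[i]]['title'].values[0])

movie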

['Star Wars: Episode VI - Return of the Jedi (1983)',
 'Star Wars: Episode IV - A New Hope (1977)',
 'Pulp Fiction (1994)',
 'Beetlejuice (1988)',
 'Jurassic Park (1993)']
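Slicing `[1:6]` assumes the query movie ranks first in its own row. With `np.corrcoef` its self-correlation is 1, the maximum possible value, but another movie could tie with it. A slightly more robust variant masks the query index explicitly and uses `np.argpartition`, so only the k best candidates are fully sorted instead of all 9724. A sketch on a made-up similarity row (`top_k_similar` is a hypothetical helper; it requires k to be smaller than the number of movies):

```python
import numpy as np

def top_k_similar(sim_row, query_idx, k=5):
    """Indices of the k items most similar to query_idx, most similar first."""
    row = sim_row.astype(float).copy()
    row[query_idx] = -np.inf                          # never return the query movie itself
    candidates = np.argpartition(-row, k)[:k]         # the k best, in arbitrary order
    return candidates[np.argsort(-row[candidates])]   # sort only those k

sim_row = np.array([1.0, 0.3, 0.95, -0.2, 0.7, 0.1, 0.85])
result = top_k_similar(sim_row, query_idx=0, k=3)
print(result)   # [2 6 4]
```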

Let's now save this as a pickle so we can reuse it in a Flask application.

In [28]:
# Use a context manager so the file is properly closed after writing
with open("movie.pkl", "wb") as file:
    pickle.dump(movie, file)
movie

['Star Wars: Episode VI - Return of the Jedi (1983)',
 'Star Wars: Episode IV - A New Hope (1977)',
 'Pulp Fiction (1994)',
 'Beetlejuice (1988)',
 'Jurassic Park (1993)']

In [23]:
with open("similarity_scores.pkl", "wb") as file:
    pickle.dump(similarity_scores, file)
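The matrix pickled here is large: 9724 × 9724 float64 values take 9724² × 8 = 756,449,408 bytes, roughly 756 MB, whereas the 9724 × 30 float32 item embeddings take about 1.2 MB. One option for a web app, offered as an alternative rather than what this project does, is to ship only the embeddings and compute the single correlation row needed per request. A sketch (`correlation_row` is hypothetical) that reproduces one row of `np.corrcoef` on random stand-in embeddings:

```python
import numpy as np

def correlation_row(item_emb, idx):
    """Pearson correlations between item idx and every item (one row of np.corrcoef)."""
    centred = item_emb - item_emb.mean(axis=1, keepdims=True)
    unit = centred / np.linalg.norm(centred, axis=1, keepdims=True)
    return unit @ unit[idx]

n_items = 9724
full_bytes = n_items * n_items * 8
print(full_bytes)   # 756449408 bytes for the full float64 matrix

rng = np.random.default_rng(1)
item_emb = rng.normal(size=(50, 30)).astype(np.float32)   # stand-in for model.item_embeddings
row = correlation_row(item_emb, 3)
print(np.allclose(row, np.corrcoef(item_emb)[3], atol=1e-5))   # True
```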

In [31]:
with open("movies.pkl", "wb") as file:
    pickle.dump(movies, file)

The screenshot shown in the README is simply an implementation of these computations as a Flask application with an HTML interface. To build it, you will need the following functions; the code is also available in the folder.

In [19]:
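def get_sim_scores(mid):
    """Return the similarity row of the movie with MovieLens id `mid`."""
    idx = mid_to_idx[mid]            # movieId -> row index in the matrix
    sims = similarity_scores[idx]    # similarity of this movie to every movie
    return sims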

In [20]:
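def get_ranked_recos(sims):
    """Rank all movies by decreasing similarity as (movieId, score, title) tuples."""
    movie = []
    # Skip position 0, which is normally the query movie itself
    for i in np.argsort(-sims)[1:]:
        mid = idx_to_mid[i]
        score = sims[i]
        name = movies[movies['movieId'] == mid]['title'].values[0]
        movie.append((mid, score, name))
    return movie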
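`get_ranked_recos` filters the `movies` DataFrame once per candidate, about 9700 times per call, and returns every movie. A lighter variant, assuming the titles are first turned into a `{movieId: title}` dict (for example with `dict(zip(movies['movieId'], movies['title']))`), looks titles up directly and stops after `n` results. The helper `get_top_recos`, the toy IDs and the similarity values are hypothetical:

```python
import numpy as np

def get_top_recos(sims, query_idx, idx_to_mid, titles, n=5):
    """(movieId, score, title) for the n movies most similar to query_idx."""
    order = [int(i) for i in np.argsort(-sims) if i != query_idx][:n]
    return [(idx_to_mid[i], float(sims[i]), titles[idx_to_mid[i]]) for i in order]

# Toy data: 4 movies
idx_to_mid = {0: 1, 1: 260, 2: 296, 3: 480}
titles = {1: "Toy Story (1995)",
          260: "Star Wars: Episode IV - A New Hope (1977)",
          296: "Pulp Fiction (1994)",
          480: "Jurassic Park (1993)"}
sims = np.array([1.0, 0.62, 0.41, 0.55])   # made-up similarity row for movie index 0

result = get_top_recos(sims, 0, idx_to_mid, titles, n=2)
for mid, score, title in result:
    print(mid, round(score, 2), title)
```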