# Recommendation system

### Applying to the Real Movie Data

https://towardsdatascience.com/item-based-collaborative-filtering-in-python-91f747200fab

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import math

from sklearn.neighbors import NearestNeighbors

### 1. **Get the data:**
The MovieLens dataset is a collection of movie ratings gathered by the GroupLens research group at the University of Minnesota. The version used here includes about 100,000 movie ratings made by more than 600 users.

In [2]:
ratings = pd.read_csv('dataset/ratings.csv', usecols=['userId','movieId','rating'])
movies_df = pd.read_csv('dataset/movies.csv')

In [3]:
ratings

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0
...,...,...,...
100831,610,166534,4.0
100832,610,168248,5.0
100833,610,168250,5.0
100834,610,168252,5.0


In [4]:
movies_df

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


### 2. Clean the data:
The first step is to make sure the data is clean and in a suitable format. To do that, we start by creating a new column that holds the release year, which is embedded in the title.

**Note:** the movie genre is not taken into account in the algorithm.

In [5]:
# Raw strings avoid invalid-escape warnings for \( and \d in the patterns.
movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)
movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)
movies_df['title'] = movies_df.title.str.replace(r' (\(\d\d\d\d\))', '', regex=True)
movies_df = movies_df.drop(['genres'], axis=1) 
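To see what the cleaning step does, here is a minimal sketch on a few made-up titles. The `toy` DataFrame and its contents are illustrative assumptions, not part of the dataset, and the pattern is a slightly more compact variant of the one used above.

```python
import pandas as pd

# Toy titles in the "Title (YYYY)" style; the last one has no year.
toy = pd.DataFrame(
    {"title": ["Toy Story (1995)", "Jumanji (1995)", "1984 (1984)", "Untitled Film"]}
)

# Capture only the four digits inside the parentheses.
toy["year"] = toy["title"].str.extract(r"\((\d{4})\)", expand=False)

# Remove the " (YYYY)" suffix from the title.
toy["title"] = toy["title"].str.replace(r"\s*\(\d{4}\)", "", regex=True)

print(toy)
```

Titles without a year keep their text unchanged and get a missing value in `year`.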

### 3. Create a user-item matrix:
Create a matrix that crosses users with movies, where each value is the rating a user gave to a movie. In the pivot used below, rows are movie titles and columns are users, and missing ratings are filled with 0.
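As a small illustration of the pivot, the sketch below builds the same kind of matrix from five made-up ratings. The names `toy_ratings` and `toy_matrix` are hypothetical and the values are invented for the example.

```python
import pandas as pd

# Five invented ratings: three users, three movies.
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title":  ["A", "B", "A", "B", "C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Rows are titles, columns are users; a 0 means "not rated".
toy_matrix = toy_ratings.pivot_table(
    index="title", columns="userId", values="rating"
).fillna(0)

print(toy_matrix)
```

Note that `pivot_table` averages by default, so if two different movies share a title after the year is stripped, their ratings by the same user would be merged into one mean value.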

In [6]:
ratings_df = pd.merge(ratings, movies_df, how='inner', on='movieId')

In [7]:
ratings_df

Unnamed: 0,userId,movieId,rating,title,year
0,1,1,4.0,Toy Story,1995
1,5,1,4.0,Toy Story,1995
2,7,1,4.5,Toy Story,1995
3,15,1,2.5,Toy Story,1995
4,17,1,4.5,Toy Story,1995
...,...,...,...,...,...
100831,610,160341,2.5,Bloodmoon,1997
100832,610,160527,4.5,Sympathy for the Underdog,1971
100833,610,160836,3.0,Hazard,2005
100834,610,163937,3.5,Blair Witch,2016


In [8]:
df_matrix = ratings_df.pivot_table(index='title',columns='userId',values='rating').fillna(0)

In [9]:
df_matrix

title \ userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
'71,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
'Hellboy': The Seeds of Creation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Round Midnight,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Salem's Lot,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Til There Was You,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
eXistenZ,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,5.0,0.0,0.0,0.0,0.0,4.5,0.0,0.0
xXx,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.5,0.0,2.0
xXx: State of the Union,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.5
¡Three Amigos!,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [10]:
df_matrix.values

array([[0. , 0. , 0. , ..., 0. , 0. , 4. ],
       [0. , 0. , 0. , ..., 0. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 0. ],
       ...,
       [0. , 0. , 0. , ..., 0. , 0. , 1.5],
       [4. , 0. , 0. , ..., 0. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 0. ]])

We will use **NearestNeighbors()** with the **cosine** metric to compute the distance between movies and find the movies most similar to each one.

In [11]:
from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(df_matrix.values)
distances, indices = knn.kneighbors(df_matrix.values, n_neighbors=8)

The number of nearest neighbors has been set to 8.
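The following sketch runs the same kind of search on a tiny made-up matrix, so the shape of the result is easier to inspect. The array `X` and the names `toy_knn`, `toy_dist` and `toy_idx` are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows are items, columns are users (invented values).
X = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 5.0],
])

toy_knn = NearestNeighbors(metric="cosine", algorithm="brute")
toy_knn.fit(X)

# Ask for 2 neighbors per row: the row itself plus its closest other row.
toy_dist, toy_idx = toy_knn.kneighbors(X, n_neighbors=2)

print(toy_idx)
print(toy_dist)
```

Both returned arrays have one row per item and one column per requested neighbor, sorted from nearest to farthest.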

In [12]:
indices

array([[8809, 6281, 1910, ..., 5523, 8808, 9306],
       [5602, 7659,    1, ...,   11,    2, 4547],
       [   2,  607,  646, ..., 5602,  389,  403],
       ...,
       [9443, 6658, 4740, ..., 3696, 6718, 2879],
       [9444, 5582, 3199, ..., 7665, 2723,  722],
       [9445, 5813, 3058, ..., 1867,  708, 7829]], dtype=int64)

**indices** shows the nearest movies for each movie. Each row corresponds to the same row of df_matrix. The first element of a row is the most similar (nearest) movie, which is usually the movie itself, although ties at distance 0 can put another movie first. The second element is the second nearest, the third is the third nearest, and so on.

For example, the first row **[8809, 6281, 1910, ..., 5523, 8808, 9306]** lists the neighbors of the movie at row 0 of df_matrix: the nearest is 'movie_8809', the second nearest is 'movie_6281', the third is 'movie_1910', and so on.

In [13]:
distances

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.28445825, 0.29289322,
        0.45727958],
       [0.        , 0.29289322, 0.29289322, ..., 0.29289322, 0.29289322,
        0.29289322],
       ...,
       [0.        , 0.30170275, 0.30239885, ..., 0.30398834, 0.30563493,
        0.30563493],
       [0.        , 0.46429966, 0.48672079, ..., 0.53204949, 0.53611603,
        0.53811618],
       [0.        , 0.        , 0.        , ..., 0.24742331, 0.25721865,
        0.29289322]])

**distances** shows the distance between movies. Each number in this array is the distance to the movie at the same position in **indices**.

In [14]:
data1 = pd.DataFrame(indices, columns=['col1', 'col2', 'col3', 'col4','col5', 'col6', 'col7', 'col8'])

In [15]:
data2 = data1.loc[:, ['col1', 'col2']].values

In [16]:
data3 = data1.loc[:, ['col1', 'col3']].values

In [17]:
data4 = data1.loc[:, ['col1', 'col4']].values

In [18]:
data5 = data1.loc[:, ['col1', 'col5']].values

In [19]:
data6 = data1.loc[:, ['col1', 'col6']].values

In [20]:
data7 = data1.loc[:, ['col1', 'col7']].values

In [21]:
data8 = data1.loc[:, ['col1', 'col8']].values

In [22]:
# Stack every (col1, colK) pair, K = 2..8, into one two-column array.
c = np.concatenate((data2, data3, data4, data5, data6, data7, data8), axis=0)

In [23]:
c

array([[8809, 6281],
       [5602, 7659],
       [   2,  607],
       ...,
       [9443, 2879],
       [9444,  722],
       [9445, 7829]], dtype=int64)

In [24]:
import csv

with open('source.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(c)
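The pair list written above can also be built in a single step. This is a sketch on a made-up 3x3 array standing in for `indices`; the names `toy_indices`, `first`, `others` and `pairs` are hypothetical.

```python
import numpy as np

# Toy stand-in for the neighbor index array: 3 movies, 3 neighbors each.
toy_indices = np.array([
    [0, 2, 1],
    [1, 0, 2],
    [2, 0, 1],
])

first = toy_indices[:, 0]    # first column of every row
others = toy_indices[:, 1:]  # remaining neighbor columns

# Pair the first column with each remaining column, one column block at a time,
# which matches concatenating the (col1, colK) blocks in order.
pairs = np.column_stack((
    np.tile(first, others.shape[1]),
    others.T.ravel(),
))

print(pairs)
```

This avoids one intermediate variable per column, so adding or removing neighbors only requires changing `n_neighbors`.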

### 4. Compute similarities:
The algorithm consists of three steps:

* **1.** Compute the similarity between movies using cosine similarity or Pearson correlation. The similarities are computed from all the ratings made by all users.

* **2.** Predict the rating of each movie the user has not seen:

    * First, find the movies most similar to the unseen movie, using the similarity computed in Step 1.
    * Next, compute the weighted average of the ratings the user gave to those similar movies. The weight is the similarity, computed as 1 minus the cosine distance.
    * The weighted average is used as the predicted rating for the unseen movie.

* **3.** Recommend the movies with the highest predicted ratings for the user.

In this way, the recommender suggests movies based on the user's ratings of similar movies, which makes the recommendations personalized.
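Step 2 can be sketched as a small function. This is a simplified illustration under the same conventions as the notebook (a rating of 0 means "not rated", similarity is 1 minus the cosine distance); the function name `predict_rating` and the sample numbers are assumptions.

```python
import numpy as np

def predict_rating(neighbor_distances, neighbor_ratings):
    """Similarity-weighted mean of the user's ratings for neighboring movies.

    Neighbors the user has not rated (rating 0) are skipped.
    Returns 0.0 when no rated neighbor carries positive total weight.
    """
    sims = 1.0 - np.asarray(neighbor_distances, dtype=float)
    ratings = np.asarray(neighbor_ratings, dtype=float)
    rated = ratings > 0
    weight_sum = sims[rated].sum()
    if weight_sum <= 0:
        return 0.0
    return float((sims[rated] * ratings[rated]).sum() / weight_sum)

# Three neighbors; the user rated only the first two.
pred = predict_rating([0.2, 0.5, 0.4], [5.0, 3.0, 0.0])
print(pred)
```

Here the weights are 0.8 and 0.5, so the prediction is (0.8 * 5 + 0.5 * 3) / 1.3, which lands between the two known ratings and closer to the more similar movie.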

#### 4.1. Cosine similarity

In [25]:
df = df_matrix
df1 = df.copy()

In [26]:
def recommend_movies(user, num_recommended_movies):
    
    print('The list of the Movies {} Has Watched \n'.format(user))

    for m in df[df[user] > 0][user].index.tolist():
        print(m)
  
    print('\n')

    recommended_movies = []

    for m in df[df[user] == 0].index.tolist():
        index_df = df.index.tolist().index(m)
        predicted_rating = df1.iloc[index_df, df1.columns.tolist().index(user)]
        recommended_movies.append((m, predicted_rating))

    sorted_rm = sorted(recommended_movies, key=lambda x:x[1], reverse=True)
  
    print('The list of the Recommended Movies \n')
    rank = 1
    for recommended_movie in sorted_rm[:num_recommended_movies]:
        
        print('{}: {} - predicted rating:{}'.format(rank, recommended_movie[0], recommended_movie[1]))
        rank = rank + 1

In [27]:
def movie_recommender_cosinus(user, num_neighbors, num_recommendation):
    
    number_neighbors = num_neighbors

    knn = NearestNeighbors(metric='cosine', algorithm='brute')
    knn.fit(df.values)
    distances, indices = knn.kneighbors(df.values, n_neighbors=number_neighbors)

    user_index = df.columns.tolist().index(user)

    for m,t in list(enumerate(df.index)):
        if df.iloc[m, user_index] == 0:
            sim_movies = indices[m].tolist()
            movie_distances = distances[m].tolist()
            
            if m in sim_movies:
                id_movie = sim_movies.index(m)
                sim_movies.remove(m)
                movie_distances.pop(id_movie) 

            else:
                sim_movies = sim_movies[:num_neighbors-1]
                movie_distances = movie_distances[:num_neighbors-1]
           
            movie_similarity = [1-x for x in movie_distances]
            movie_similarity_copy = movie_similarity.copy()
            nominator = 0
            
            for s in range(0, len(movie_similarity)):
                if df.iloc[sim_movies[s], user_index] == 0:
                    if len(movie_similarity_copy) == (number_neighbors - 1):
                        movie_similarity_copy.pop(s)
          
                    else:
                        movie_similarity_copy.pop(s-(len(movie_similarity)-len(movie_similarity_copy)))
            
                else:
                    nominator = nominator + movie_similarity[s]*df.iloc[sim_movies[s],user_index]
          
            if len(movie_similarity_copy) > 0:
                if sum(movie_similarity_copy) > 0:
                    predicted_r = nominator/sum(movie_similarity_copy)
        
                else:
                    predicted_r = 0

            else:
                predicted_r = 0
        
            df1.iloc[m,user_index] = predicted_r
    recommend_movies(user,num_recommendation)

#### 4.2. Pearson Correlation

In this case I decided to load my own Netflix ratings instead of using an unknown user from the MovieLens dataset.

#### 4.2.1. Netflix Account Data Ratings

In [28]:
netflix = pd.read_csv("dataset/netflix_ratings.csv",  usecols=['Profile Name','Title Name','Thumbs Value'])
netflix.columns = ['Profile','title','rating']

netflix

Unnamed: 0,Profile,title,rating
0,Casa,The Wedding Unplanner,1.0
1,Casa,Freud,2.0
2,guillermo,Gladiator,5.0
3,guillermo,Bohemian Rhapsody,4.0
4,guillermo,Seven Pounds,2.0
...,...,...,...
58,25,Thor: Ragnarok,4.5
59,25,"Three Billboards Outside Ebbing, Missouri",5.0
60,25,Up,5.0
61,25,WALL·E,5.0


In [29]:
netflix = netflix[netflix['Profile'] == '25']
netflix = netflix.drop(['Profile'], axis=1)

In [30]:
netflix.head()

Unnamed: 0,title,rating
37,Avengers: Infinity War - Part I,5.0
38,Blade Runner 2049,4.0
39,"Dark Knight Rises, The",5.0
40,"Dark Knight, The",5.0
41,Deadpool 2,5.0


In [31]:
inputId = movies_df[movies_df['title'].isin(netflix['title'].tolist())]
netflix = pd.merge(inputId, netflix)
netflix = netflix.drop(['year'], axis=1)

In [32]:
userSubset = ratings_df[ratings_df['movieId'].isin(netflix['movieId'].tolist())]
userSubset.head()

Unnamed: 0,userId,movieId,rating,title,year
1365,1,231,5.0,Dumb & Dumber (Dumb and Dumber),1994
1366,6,231,3.0,Dumb & Dumber (Dumb and Dumber),1994
1367,8,231,4.0,Dumb & Dumber (Dumb and Dumber),1994
1368,14,231,3.0,Dumb & Dumber (Dumb and Dumber),1994
1369,18,231,2.5,Dumb & Dumber (Dumb and Dumber),1994


First we filter the users who have watched the movies in our input and group their ratings by user.

In [33]:
# Group by a scalar key so each group name is a plain userId rather than a 1-tuple.
userSubsetGroup = userSubset.groupby('userId')

In [34]:
userSubsetGroup = sorted(userSubsetGroup, key=lambda x: len(x[1]), reverse=True)
userSubsetGroup = userSubsetGroup[0:100]

For the users who have rated movies in common with our input, we compute the Pearson correlation between their ratings and ours.

In [35]:
pearsonCorrelationDict = {}

# Sort our own ratings once so they line up with each user's sorted ratings.
netflix = netflix.sort_values(by='movieId')

for name, group in userSubsetGroup:
    # Sort this user's ratings by movieId to align them with ours.
    group = group.sort_values(by='movieId')

    # Keep only our ratings for the movies this user has also rated.
    temp_df = netflix[netflix['movieId'].isin(group['movieId'].tolist())]
    tempRatingList = temp_df['rating'].tolist()
    tempGroupList = group['rating'].tolist()

    data_corr = {'tempGroupList': tempGroupList,
                 'tempRatingList': tempRatingList}
    pd_corr = pd.DataFrame(data_corr)
    r = pd_corr.corr(method="pearson")["tempRatingList"]["tempGroupList"]

    # The correlation is NaN when either list is constant; treat it as 0.
    if math.isnan(r):
        r = 0
    pearsonCorrelationDict[name] = r
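For reference, the Pearson correlation between two aligned rating lists can be written out directly. This is a minimal sketch, not the code the notebook runs; the helper name `pearson` is hypothetical, and returning 0 for constant input mirrors the NaN handling in the loop.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists.

    Returns 0.0 when either list is constant, since the correlation
    is undefined in that case.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0:
        return 0.0
    return float((dx * dy).sum() / denom)

print(pearson([1, 2, 3], [2, 4, 6]))        # ratings move together
print(pearson([1, 2, 3], [3, 2, 1]))        # ratings move in opposite directions
print(pearson([2.0, 4.0, 3.0], [5.0, 5.0, 5.0]))  # one list is constant
```

The constant case matters here: when every movie in common was rated 5.0 by the input user, the correlation with that neighbor carries no information and falls back to 0.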

In [36]:
temp_df

Unnamed: 0,movieId,title,rating
1,260,Star Wars: Episode IV - A New Hope,5.0
3,1198,Raiders of the Lost Ark (Indiana Jones and the...,5.0
4,2028,Saving Private Ryan,5.0
5,2571,"Matrix, The",5.0
8,4993,"Lord of the Rings: The Fellowship of the Ring,...",5.0
9,5952,"Lord of the Rings: The Two Towers, The",5.0
10,7153,"Lord of the Rings: The Return of the King, The",5.0
11,58559,"Dark Knight, The",5.0
16,68954,Up,5.0
18,79132,Inception,5.0


In [37]:
data_corr

{'tempGroupList': [2.0, 4.0, 3.0, 2.5, 2.0, 2.0, 2.0, 4.0, 5.0, 2.0],
 'tempRatingList': [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]}

In [40]:
userMatrix = pd.DataFrame.from_dict(pearsonCorrelationDict, orient='index')
userMatrix.columns = ['similarityIndex']
userMatrix['userId'] = userMatrix.index
userMatrix.index = range(len(userMatrix))
userMatrix.head()

Unnamed: 0,similarityIndex,userId
0,1.0,25
1,0.613722,62
2,0.007564,249
3,0.648546,305
4,0.3669,68


In [41]:
topUsers=userMatrix.sort_values(by='similarityIndex', ascending=False)[0:50]

In [42]:
topUsersRating=topUsers.merge(ratings_df, left_on='userId', right_on='userId', how='inner')

In [43]:
topUsersRating['weightedRating'] = topUsersRating['similarityIndex']*topUsersRating['rating']
topUsersRating.head()

Unnamed: 0,similarityIndex,userId,movieId,rating,title,year,weightedRating
0,1.0,25,231,4.0,Dumb & Dumber (Dumb and Dumber),1994,4.0
1,1.0,25,260,5.0,Star Wars: Episode IV - A New Hope,1977,5.0
2,1.0,25,527,5.0,Schindler's List,1993,5.0
3,1.0,25,1198,5.0,Raiders of the Lost Ark (Indiana Jones and the...,1981,5.0
4,1.0,25,2028,5.0,Saving Private Ryan,1998,5.0


In [44]:
# Select the numeric columns before summing so text columns (title, year) are not aggregated.
tempTopUsersRating = topUsersRating.groupby('movieId')[['similarityIndex','weightedRating']].sum()
tempTopUsersRating.columns = ['sum_similarityIndex','sum_weightedRating']
tempTopUsersRating.head()

movieId,sum_similarityIndex,sum_weightedRating
1,16.793302,64.22969
2,9.505035,30.445308
3,2.699094,7.696141
5,4.197755,10.521883
6,9.178755,37.021836


In [45]:
recommendation_df = pd.DataFrame()

In [46]:
recommendation_df['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating']/tempTopUsersRating['sum_similarityIndex']
recommendation_df['movieId'] = tempTopUsersRating.index
recommendation_df.head()

movieId (index),weighted average recommendation score,movieId
1,3.824721,1
2,3.203072,2
3,2.851379,3
5,2.50655,5
6,4.033427,6
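The score above is a similarity-weighted mean of the ratings each movie received from the top users. The sketch below reproduces the calculation on three invented rows; the `toy` DataFrame and its values are assumptions for illustration.

```python
import pandas as pd

# Movie 10 was rated by two similar users, movie 20 by only one.
toy = pd.DataFrame({
    "movieId":         [10, 10, 20],
    "similarityIndex": [1.0, 0.5, 0.5],
    "rating":          [4.0, 2.0, 5.0],
})
toy["weightedRating"] = toy["similarityIndex"] * toy["rating"]

# Sum weights and weighted ratings per movie, then divide.
sums = toy.groupby("movieId")[["similarityIndex", "weightedRating"]].sum()
score = sums["weightedRating"] / sums["similarityIndex"]

print(score)
```

A movie rated by a single user gets exactly that user's rating as its score, whatever the similarity. This may be why several little-rated movies can tie at 5.0 at the top of the ranking.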


In [47]:
recommendation_df = recommendation_df.sort_values(by='weighted average recommendation score', ascending=False)
recommendation_df.head(10)

movieId (index),weighted average recommendation score,movieId
92494,5.0,92494
71033,5.0,71033
232,5.0,232
8607,5.0,8607
60333,5.0,60333
27320,5.0,27320
5279,5.0,5279
109968,5.0,109968
94810,5.0,94810
3814,5.0,3814


In [48]:
def movie_recommender_pearson(num_recommended_movies):
    
    print('The list of the Movies Watched by User \n')

    for m in netflix['title']:
        print(m)
  
    print('\n')
    
    recommended_movies = movies_df.loc[movies_df['movieId'].isin(recommendation_df.head(num_recommended_movies)['movieId'].tolist())]
    
    print('The list of the Recommended Movies: \n')
    rank = 1
    
    for n in recommended_movies['title']:
        print('{}: {}'.format(rank,n))
        rank = rank + 1

### 5. Make predictions:
Use the similarities computed above to predict the rating a user would give to a specific movie.

#### 5.1. Movie recommendations for a selected user

In [49]:
movie_recommender_cosinus(25,10,5)

The list of the Movies 25 Has Watched 

Avengers: Infinity War - Part I
Blade Runner 2049
Dark Knight Rises, The
Dark Knight, The
Deadpool 2
Dumb & Dumber (Dumb and Dumber)
Gladiator
Inception
Incredibles 2
Inglourious Basterds
Iron Man
Lord of the Rings: The Fellowship of the Ring, The
Lord of the Rings: The Return of the King, The
Lord of the Rings: The Two Towers, The
Matrix, The
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark)
Saving Private Ryan
Schindler's List
Shutter Island
Star Wars: Episode IV - A New Hope
The Imitation Game
Thor: Ragnarok
Three Billboards Outside Ebbing, Missouri
Up
WALL·E
Wonder


The list of the Recommended Movies 

1: Amelie (Fabuleux destin d'Amélie Poulain, Le) - predicted rating:5.000000000000001
2: Fight Club - predicted rating:5.000000000000001
3: Green Mile, The - predicted rating:5.000000000000001
4: Intouchables - predicted rating:5.000000000000001
5: Memento - predicted rating:5.000000000000001


In [50]:
movie_recommender_pearson(5)

The list of the Movies Watched by User 

Dumb & Dumber (Dumb and Dumber)
Star Wars: Episode IV - A New Hope
Schindler's List
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark)
Saving Private Ryan
Matrix, The
Gladiator
Lord of the Rings: The Fellowship of the Ring, The
Lord of the Rings: The Two Towers, The
Lord of the Rings: The Return of the King, The
Gladiator
Dark Knight, The
Iron Man
WALL·E
Inglourious Basterds
Up
Shutter Island
Inception
Dark Knight Rises, The
The Imitation Game
Avengers: Infinity War - Part I
Thor: Ragnarok
Iron Man
Blade Runner 2049
Three Billboards Outside Ebbing, Missouri
Wonder
Incredibles 2
Deadpool 2


The list of the Recommended Movies: 

1: Eat Drink Man Woman (Yin shi nan nu)
2: Tokyo Godfathers
3: Encounters at the End of the World
4: Secret in Their Eyes, The (El secreto de sus ojos)
5: Dylan Moran: Monster


#### 5.2. Recommending movies similar to a selected movie

In [51]:
def recommend_movie(title):
    
    index_movie = df_matrix.index.tolist().index(title) # get an index for a movie
    similar_movies = indices[index_movie].tolist() # make list for similar movies
    movie_distances = distances[index_movie].tolist() # the list for distances of similar movies
    id_movie = similar_movies.index(index_movie) # get the position of the movie itself in indices and distances

    print('Similar Movies to '+str(df_matrix.index[index_movie])+': \n')

    similar_movies.remove(index_movie) # remove the movie itself in indices
    movie_distances.pop(id_movie) # remove the movie itself in distances

    j = 1
    
    for i in similar_movies:
        print(str(j)+': '+str(df_matrix.index[i])+', the distance with '+str(title)+': '+str(movie_distances[j-1]))
        j = j + 1

In [52]:
recommend_movie('Toy Story')

Similar Movies to Toy Story: 

1: Toy Story 2, the distance with Toy Story: 0.42739873968028474
2: Jurassic Park, the distance with Toy Story: 0.43436319591384365
3: Independence Day (a.k.a. ID4), the distance with Toy Story: 0.43573830647233414
4: Star Wars: Episode IV - A New Hope, the distance with Toy Story: 0.4426118294200635
5: Forrest Gump, the distance with Toy Story: 0.45290409205982585
6: Lion King, The, the distance with Toy Story: 0.4588546505397664
7: Star Wars: Episode VI - Return of the Jedi, the distance with Toy Story: 0.458910695227416


In [53]:
pelis = df_matrix.index

In [54]:
pelis[8486]

'Thrill of It All, The'