# Finding Non-personalized Recommendations
## Finding all pairs of movies
In this exercise, you will work through how to find all pairs of movies or all permutations of pairs of movies that have been watched by the same person.

The user_ratings_df DataFrame, which lists users and the movies they have seen, has been loaded once again.

You will first need to create a function that finds all possible pairs of items in the list it is applied to. For ease of use, you will output the values as a DataFrame. Since you only want to find movies that have been seen by the same person, and not all possible permutations across the whole dataset, you will group by userId when applying the function.

In [None]:
import pandas as pd
from itertools import permutations

# Create the function to find all permutations
def find_movie_pairs(x):
  pairs = pd.DataFrame(list(permutations(x.values, 2)),
                       columns=['movie_a', 'movie_b'])
  return pairs

# Apply the function to the title column and reset the index
movie_combinations = user_ratings_df.groupby('userId')['title'].apply(find_movie_pairs).reset_index(drop=True)

print(movie_combinations)
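
To make the grouping step concrete, here is a minimal sketch on made-up data. The `toy_ratings` frame and its contents are hypothetical; only the `userId` and `title` column names follow the exercise. It builds the ordered pairs one user at a time, so movies are only paired when the same person watched both.

```python
from itertools import permutations

import pandas as pd

# Hypothetical toy data: two users and the movies they watched
toy_ratings = pd.DataFrame({
    "userId": ["u1", "u1", "u1", "u2", "u2"],
    "title": ["Thor", "Rio", "Cars 2", "Thor", "Rio"],
})

# Build the ordered pairs user by user, then stack them into one DataFrame
pair_frames = [
    pd.DataFrame(list(permutations(titles, 2)), columns=["movie_a", "movie_b"])
    for _, titles in toy_ratings.groupby("userId")["title"]
]
toy_pairs = pd.concat(pair_frames, ignore_index=True)

print(toy_pairs)
```

Because permutations are ordered, each pair appears in both directions (Thor/Rio and Rio/Thor), which is what later lets you filter on `movie_a` alone.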

##  Counting up the pairs
You can now create a DataFrame of all the permutations of movies that have been watched by the same user. This is of limited use unless you can find which movies are most commonly paired.

In this exercise, you will work with the movie_combinations DataFrame that you created in the last exercise (that has been loaded for you), and generate a new DataFrame containing the counts of occurrences of each of the pairs within.

In [None]:
# Calculate how often each item in movie_a occurs with the items in movie_b
combination_counts = movie_combinations.groupby(['movie_a', 'movie_b']).size()

# Convert the results to a DataFrame and reset the index
combination_counts_df = combination_counts.to_frame(name='size').reset_index()
print(combination_counts_df.head())

In [None]:
          movie_a |                         movie_b | size
0  21 Jump Street |          Atlas Shrugged: Part 1 |    1
1  21 Jump Street | Avengers: Infinity War - Part I |    3
2  21 Jump Street |                        Bad Boys |    7
3  21 Jump Street |                    Bad Teacher  |    8
4  21 Jump Street |             Battle: Los Angeles |    2
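
As a small illustration of the counting step, the sketch below runs the same `groupby(...).size()` pattern on a hypothetical `toy_pairs` frame (the rows are invented for the example).

```python
import pandas as pd

# Hypothetical pairs, as they might come out of the previous step
toy_pairs = pd.DataFrame({
    "movie_a": ["Thor", "Thor", "Rio", "Thor", "Rio"],
    "movie_b": ["Rio", "Cars 2", "Thor", "Rio", "Thor"],
})

# Count how often each (movie_a, movie_b) pair occurs
toy_counts = (
    toy_pairs.groupby(["movie_a", "movie_b"])
    .size()
    .to_frame(name="size")
    .reset_index()
)

print(toy_counts)
```

Each distinct pair becomes one row, and the `size` column holds the number of users who watched both movies.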

## Making your first movie recommendations
Now that you have found the most commonly paired movies, you can make your first recommendations!

While you are not taking in any information about the person watching, and do not even know any details about the movie, you can still make valuable recommendations by examining which groups of movies are watched by the same people. In this exercise, you will examine the movies often watched by the same people who watched Thor, and then use this data to give a recommendation to someone who has just watched the movie. The DataFrame you generated in the last lesson, combination_counts_df, which contains counts of how often movies are watched together, has been loaded for you.

In [None]:
import matplotlib.pyplot as plt

# Sort the counts from highest to lowest
combination_counts_df.sort_values('size', ascending=False, inplace=True)

# Find the movies most frequently watched by people who watched Thor
thor_df = combination_counts_df[combination_counts_df['movie_a'] == 'Thor']

# Plot the results
thor_df.plot.bar(x="movie_b")
plt.show()

In [None]:
In [2]:
thor_df
Out[2]:

    movie_a                          movie_b  size
137    Thor                   21 Jump Street    12
147    Thor                    Green Lantern    10
143    Thor                      Bridesmaids     9
140    Thor                         Bad Boys     8
141    Thor                      Bad Teacher     6
139    Thor  Avengers: Infinity War - Part I     6
142    Thor              Battle: Los Angeles     4
146    Thor                           Cars 2     2
145    Thor                          Carnage     2
144    Thor                  Captain America     2
148    Thor                              Rio     2
138    Thor           Atlas Shrugged: Part 1     1

## Inference
Good work! You can see that 21 Jump Street was the movie most commonly watched by those who watched Thor. This makes it a good movie to recommend to Thor watchers, since the two movies appear to share fans.
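
The lookup above could be wrapped in a small helper. The sketch below is one possible shape, not part of the original exercise: the function name `most_paired_with` and the `toy_counts` frame are hypothetical, though the column names match combination_counts_df.

```python
import pandas as pd


def most_paired_with(counts_df, movie, n=3):
    """Return the n movies most often watched alongside `movie`."""
    matches = counts_df[counts_df["movie_a"] == movie]
    return matches.nlargest(n, "size")["movie_b"].tolist()


# Hypothetical counts in the same layout as combination_counts_df
toy_counts = pd.DataFrame({
    "movie_a": ["Thor", "Thor", "Thor", "Rio"],
    "movie_b": ["21 Jump Street", "Green Lantern", "Rio", "Thor"],
    "size": [12, 10, 2, 2],
})

print(most_paired_with(toy_counts, "Thor", n=2))
```

Using `nlargest` avoids sorting the whole DataFrame in place when only the top few rows are needed.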

# Content-Based Recommendation

In [None]:
# Select only the rows with values in the name column equal to Toy Story
toy_story_genres = movie_genre_df[movie_genre_df['name'] == 'Toy Story']


In [None]:
"""
movie_genre_df.head(2)


        name genre_list
0  Toy Story  Adventure
1  Toy Story  Animation
"""

In [None]:
# Create cross-tabulated DataFrame from name and genre_list columns
movie_cross_table = pd.crosstab(movie_genre_df['name'], movie_genre_df['genre_list'])

In [None]:
"""
movie_cross_table


genre_list                      Action  Adventure  Animation  Children  Comedy  ...  Drama  Fantasy  Horror  Romance  Thriller
name                                                                            ...                                           
Ace Ventura: When Nature Calls       0          0          0         0       1  ...      0        0       0        0         0
American President, The              0          0          0         0       1  ...      1        0       0        1         0
Balto                                0          1          1         1       0  ...      0        0       0        0         0
"""

In [None]:
# Select only the rows with Toy Story as the index
toy_story_genres_ct = movie_cross_table[movie_cross_table.index == 'Toy Story']
print(toy_story_genres_ct)

In [None]:
"""
In [7]:
toy_story_genres_ct
Out[7]:

genre_list  Action  Adventure  Animation  Children  Comedy  ...  Drama  Fantasy  Horror  Romance  Thriller
name                                                        ...                                           
Toy Story        0          1          1         1       1  ...      0        1       0        0         0

"""

## Comparing individual movies with Jaccard similarity
In the last lesson, you built a DataFrame of movies, where each column represents a different genre. You can now use this DataFrame to compare movies by measuring the Jaccard similarity between rows. The higher the Jaccard similarity score, the more similar the two items are.

In this exercise, you will compare the movie GoldenEye with the movie Toy Story, then GoldenEye with Skyfall, and see how the two scores differ.

The DataFrame movie_cross_table containing all the movies as rows and the genres as Boolean columns that you created in the last lesson has been loaded.

In [None]:
# Import numpy and the Jaccard similarity metric
import numpy as np
from sklearn.metrics import jaccard_score

# Extract just the rows containing GoldenEye and Toy Story
goldeneye_values = movie_cross_table.loc['GoldenEye'].values
toy_story_values = movie_cross_table.loc['Toy Story'].values

# Find the similarity between GoldenEye and Toy Story
print(jaccard_score(goldeneye_values, toy_story_values))

# Repeat for GoldenEye and Skyfall
skyfall_values = movie_cross_table.loc['Skyfall'].values
print(jaccard_score(goldeneye_values, skyfall_values))
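
For intuition, the Jaccard similarity of two binary genre rows is the number of genres both movies have divided by the number of genres at least one of them has. The sketch below checks that by hand against `jaccard_score`; the two vectors and their genre order are made up for the example.

```python
import numpy as np
from sklearn.metrics import jaccard_score

# Hypothetical genre flags in the order:
# Action, Adventure, Animation, Comedy, Thriller
movie_x = np.array([1, 1, 0, 0, 1])
movie_y = np.array([0, 1, 1, 1, 0])

# Genres shared by both movies, and genres present in at least one
shared = np.logical_and(movie_x, movie_y).sum()
either = np.logical_or(movie_x, movie_y).sum()
manual_jaccard = shared / either

print(manual_jaccard)
print(jaccard_score(movie_x, movie_y))
```

Here the movies share one genre out of five that appear in either, so both lines print 0.2.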

## Comparing all your movies at once
While finding the Jaccard similarity between any two individual movies in your dataset is fine for small-scale analyses, computing it pair by pair can prove too slow for making recommendations on larger datasets.

In this exercise, you will find the similarities between all movies and store them in a DataFrame for quick and easy lookup.

When finding the similarities between the rows in a DataFrame, you could run through all pairs and calculate them individually, but it's more efficient to use the pdist() (pairwise distance) function from scipy.

pdist() returns the distances as a condensed one-dimensional array, which can be reshaped into the desired square matrix using squareform() from the same library. Since you want similarity values as opposed to distances, you should subtract the values from 1.

movie_cross_table has once again been loaded for you.

In [None]:
# Import functions from scipy
from scipy.spatial.distance import pdist, squareform

# Calculate all pairwise distances
jaccard_distances = pdist(movie_cross_table.values, metric='jaccard')

# Convert the distances to a square matrix
jaccard_similarity_array = 1 -  squareform(jaccard_distances)

# Wrap the array in a pandas DataFrame
jaccard_similarity_df = pd.DataFrame(jaccard_similarity_array, index=movie_cross_table.index, columns=movie_cross_table.index)

# Print the top 5 rows of the DataFrame
print(jaccard_similarity_df.head())

In [None]:
'''

____ Output :: 

name                                  21 Jump Street  Alvin and the Chipmunks: Chipwrecked  Another Earth  Beastly  Bridesmaids  ...    Cars 2  Green Lantern  Oldboy       Rio      Thor
name                                                                                                                             ...                                                     
21 Jump Street                              1.000000                                  0.25            0.0      0.0     0.333333  ...  0.142857            0.2     0.2  0.166667  0.142857
Alvin and the Chipmunks: Chipwrecked        0.250000                                  1.00            0.0      0.0     0.500000  ...  0.400000            0.0     0.0  0.500000  0.000000
Another Earth                               0.000000                                  0.00            1.0      0.5     0.000000  ...  0.000000            0.2     0.2  0.000000  0.142857
Beastly                                     0.000000                                  0.00            0.5      1.0     0.000000  ...  0.000000            0.0     0.2  0.000000  0.333333
Bridesmaids                                 0.333333                                  0.50            0.0      0.0     1.000000  ...  0.200000            0.0     0.0  0.250000  0.000000

'''
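
To see what `pdist()` and `squareform()` each contribute, here is a minimal sketch on a hypothetical three-movie table (`toy_cross_table` and its genres are invented). With three rows, `pdist()` returns three distances, one per unordered pair, and `squareform()` expands them into a symmetric 3 x 3 matrix with zeros on the diagonal.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical cross table: movies as rows, genres as 0/1 columns
toy_cross_table = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [0, 0, 1]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["Action", "Adventure", "Animation"],
)

# Condensed vector of pairwise Jaccard distances: (A,B), (A,C), (B,C)
condensed = pdist(toy_cross_table.values.astype(bool), metric="jaccard")

# Expand to a square matrix and flip distances into similarities
toy_similarity = pd.DataFrame(
    1 - squareform(condensed),
    index=toy_cross_table.index,
    columns=toy_cross_table.index,
)

print(condensed)
print(toy_similarity)
```

After subtracting from 1, the diagonal becomes 1.0 (each movie is identical to itself), which matches the output shown above.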

In [None]:
## Recommend a Movie similar to Thor 
# Wrap the preloaded array in a DataFrame
jaccard_similarity_df = pd.DataFrame(jaccard_similarity_array, index=movie_cross_table.index, columns=movie_cross_table.index)

# Find the values for the movie Thor
jaccard_similarity_series = jaccard_similarity_df.loc['Thor']

# Sort these values from highest to lowest
ordered_similarities = jaccard_similarity_series.sort_values(ascending=False)

# Print the results
print(ordered_similarities)

In [None]:
'''
name
Thor                                    1.000000
Green Lantern                           0.333333
Cars 2                                  0.250000
Captain America: The First Avenger      0.250000
Carnage                                 0.166667
Another Earth                           0.142857
21 Jump Street                          0.142857
Rio                                     0.125000
Bridesmaids                             0.000000
Alvin and the Chipmunks: Chipwrecked    0.000000

'''

## Content-Based using Text Features 

In [None]:
'''
df_plots


                             Title                                               Plot
0   Ace Ventura: When Nature Calls  In the Himalayas, after a failed rescue missio...
1      Dracula: Dead and Loving It  Solicitor Thomas Renfield travels all the way ...
2      Father of the Bride Part II  The film begins five years after the events of...
3                       Four Rooms  The film is set on New Year's Eve, and starts ...
4                 Grumpier Old Men  The feud between Max (Walter Matthau) and John...
5                          Jumanji  In 1869, near Brantford, New Hampshire, two br...
6                     Sudden Death  Darren McCord (Jean-Claude Van Damme) is a Fre...
7                     Tom and Huck  The movie opens with Injun Joe (Eric Schweig) ...
8                        Toy Story  In a world where toys are living things who pr...
9                Waiting to Exhale  "Friends are the People who let you be yoursel...
10                       GoldenEye  In 1986, at Arkhangelsk, MI6 agents James Bond...
11                         Skyfall  MI6 agents James Bond and Eve Moneypenny pursu...
'''

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Instantiate the vectorizer object and transform the plot column
vectorizer = TfidfVectorizer(max_df=0.7, min_df=2)
vectorized_data = vectorizer.fit_transform(df_plots['Plot']) 

# Create a DataFrame from the TF-IDF array
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
tfidf_df = pd.DataFrame(vectorized_data.toarray(), columns=vectorizer.get_feature_names_out())

# Assign the movie titles to the index and inspect
tfidf_df.index = df_plots['Title']
print(tfidf_df.head())

In [None]:
'''
                               000       100  abandoned     above  accidentally  ...     wrong      year     years       you     young
Title                                                                                  ...                                                  
Ace Ventura: When Nature Calls  0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.044595  0.000000  0.053863
Dracula: Dead and Loving It     0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.000000  0.055645  0.000000
Father of the Bride Part II     0.045850  0.045850        0.0  0.000000      0.000000  ...  0.045850  0.000000  0.030099  0.000000  0.072708
Four Rooms                      0.039916  0.039916        0.0  0.079831      0.039916  ...  0.039916  0.079831  0.026203  0.000000  0.000000
Grumpier Old Men                0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.000000  0.000000  0.000000


'''

In [None]:
# Import cosine_similarity measure
from sklearn.metrics.pairwise import cosine_similarity

# Create the array of cosine similarity values
cosine_similarity_array = cosine_similarity(tfidf_summary_df)

# Wrap the array in a pandas DataFrame
cosine_similarity_df = pd.DataFrame(cosine_similarity_array, columns=tfidf_summary_df.index, index=tfidf_summary_df.index)

# Print the top 5 rows of the DataFrame
print(cosine_similarity_df.head()) ## 18 * 18 matrix


In [None]:
'''
                                                       Thor  21 Jump Street  The Avengers    Oldboy  
The Adventures of Tintin: The Secret of the Uni...  0.312927        0.282663      0.374425  0.248183  
Alvin and the Chipmunks: Chipwrecked                0.323938        0.311788      0.400024  0.267687  
Another Earth                                       0.304739        0.236896      0.229218  0.249804  
Beastly                                             0.229194        0.187408      0.186539  0.207715  
The Beaver                                          0.300383        0.238325      0.266592  0.253751  


'''
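
Cosine similarity is the dot product of two vectors divided by the product of their lengths. As a quick check, the sketch below computes it by hand for two made-up TF-IDF-style vectors and compares the result with `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical feature vectors
vec_a = np.array([0.0, 0.5, 0.5])
vec_b = np.array([0.0, 1.0, 0.0])

# Dot product divided by the product of the vector norms
manual = vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))

# cosine_similarity expects 2-D input, hence the reshape
library = cosine_similarity(vec_a.reshape(1, -1), vec_b.reshape(1, -1))[0, 0]

print(manual, library)
```

Unlike the Jaccard score, this measure uses the size of each feature weight, not just whether the feature is present.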

In [None]:
## How to make a recommendation
# Wrap the preloaded array in a DataFrame
cosine_similarity_df = pd.DataFrame(cosine_similarity_array, index=tfidf_summary_df.index, columns=tfidf_summary_df.index)

# Find the values for the movie Rio
cosine_similarity_series = cosine_similarity_df.loc['Rio']

# Sort these values highest to lowest
ordered_similarities = cosine_similarity_series.sort_values(ascending=False)

# Print the results
print(ordered_similarities)

In [None]:
'''
Rio                                                    1.000000
Alvin and the Chipmunks: Chipwrecked                   0.361180
The Avengers                                           0.344869
The Hangover: Part II                                  0.344407
The Adventures of Tintin: The Secret of the Unicorn    0.327422
Thor                                                   0.318216
Green Lantern                                          0.314570
Carnage                                                0.312067
Cars 2                                                 0.306499
21 Jump Street                                         0.290252
Another Earth                                          0.281780
Captain America: The First Avenger                     0.266358
The Twilight Saga: Breaking Dawn - Part 1              0.262755
Oldboy                                                 0.252323
Bridesmaids                                            0.234514
Beastly                                                0.213502

'''

## Build a User Profile for Recommendation

The user has watched some movies. We gather the features of all the movies he or she watched, work out how heavily each TF-IDF feature is weighted across them, then set aside the movies already watched and recommend from the movies he or she has not yet seen.

You are now able to generate suggestions for similar items based on their labeled features or based on their descriptions. But sometimes finding similar items might not be enough. In the next exercises, you will work through how one could create recommendations based on a user and all the items they liked as opposed to a singular item. You will first generate a profile for a user by aggregating all of the movies they have previously enjoyed.

The tfidf_summary_df you have been working on in the last few exercises has been loaded for you. This contains a row per movie with their titles as the index and a column for each feature containing their respective TF-IDF score.

In [None]:
tfidf_summary_df

In [None]:
'''
                               000       100  abandoned     above  accidentally  ...     wrong      year     years       you     young
Title                                                                                  ...                                                  
Ace Ventura: When Nature Calls  0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.044595  0.000000  0.053863
Dracula: Dead and Loving It     0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.000000  0.055645  0.000000
Father of the Bride Part II     0.045850  0.045850        0.0  0.000000      0.000000  ...  0.045850  0.000000  0.030099  0.000000  0.072708
Four Rooms                      0.039916  0.039916        0.0  0.079831      0.039916  ...  0.039916  0.079831  0.026203  0.000000  0.000000
Grumpier Old Men                0.000000  0.000000        0.0  0.000000      0.000000  ...  0.000000  0.000000  0.000000  0.000000  0.000000


'''

In [None]:
list_of_movies_enjoyed = ['Captain America: The First Avenger', 'Green Lantern', 'The Avengers'] 

# Create a subset of only the movies the user has enjoyed
movies_enjoyed_df = tfidf_summary_df.reindex(list_of_movies_enjoyed)

# Generate the user profile by finding the average scores of movies they enjoyed
user_prof = movies_enjoyed_df.mean()

# Inspect the results
print(user_prof)

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

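# Find the subset of tfidf_summary_df that does not include movies in list_of_movies_enjoyed
# (the profile was built from tfidf_summary_df, so the candidates must share its columns)
tfidf_subset_df = tfidf_summary_df.drop(list_of_movies_enjoyed, axis=0)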

# Calculate the cosine_similarity and wrap it in a DataFrame
similarity_array = cosine_similarity(user_prof.values.reshape(1, -1), tfidf_subset_df)
similarity_df = pd.DataFrame(similarity_array.T, index=tfidf_subset_df.index, columns=["similarity_score"])

# Sort the values from high to low by the values in the similarity_score
sorted_similarity_df = similarity_df.sort_values(by="similarity_score", ascending=False)

# Inspect the most similar to the user preferences
print(sorted_similarity_df.head())

In [None]:
'''
                                similarity_score
Title                                           
21 Jump Street                          0.362488
Thor                                    0.266075
X-Men: First Class                      0.263540
Transformers: Dark of the Moon          0.224254
Beastly                                 0.179626

'''
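
The whole user-profile flow fits in a few lines on toy data. In the sketch below, `toy_tfidf`, its three feature columns and the movie names are all hypothetical; the steps (average the liked rows, drop them from the candidates, rank the rest by cosine similarity) mirror the cells above.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical TF-IDF table: movies as rows, terms as columns
toy_tfidf = pd.DataFrame(
    [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2], [0.7, 0.2, 0.1], [0.0, 0.9, 0.1]],
    index=["Hero Film 1", "Hero Film 2", "Hero Film 3", "Romance Film"],
    columns=["hero", "love", "car"],
)
liked = ["Hero Film 1", "Hero Film 2"]

# User profile: the average feature vector of the liked movies
profile = toy_tfidf.loc[liked].mean()

# Candidates: every movie the user has not already seen
candidates = toy_tfidf.drop(liked, axis=0)

# Score each candidate against the profile and rank them
scores = cosine_similarity(profile.values.reshape(1, -1), candidates.values)
ranking = pd.Series(scores[0], index=candidates.index).sort_values(ascending=False)

print(ranking)
```

Because both liked movies weight "hero" heavily, the remaining hero film ranks well above the romance film.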

# Collaborative Filtering System

#### Note: never fill missing ratings with zeros directly. The example below shows why.

In [None]:
'''
## Initial Dataframe 

        Forrest Gump  Pulp Fiction  Toy Story  The Matrix
User_A            10             9          7        None
User_B            10             9          7           0
User_C            10             9          7           8

 *** User_B and User_C are both similar to User_A

'''

In [None]:
'''
### After filling the missing value with zeros

        Forrest Gump  Pulp Fiction  Toy Story  The Matrix
User_A            10             9          7           0
User_B            10             9          7           0
User_C            10             9          7           8

 *** User_A now looks identical to User_B and less similar to User_C. The zero is read as a real rating, so it skews the data.


'''

### Ideal way to fill the missing values 

In [None]:
'''
In [5]:
user_ratings_table.head()
Out[5]:

title     Forrest Gump (1994)  Matrix, The (1999)  Pulp Fiction (1994)  Shawshank Redemption, The (1994)  Silence of the Lambs, The (1991)
userId                                                                                                                                    
user_001                  4.0                 5.0                  3.0                               NaN                               4.0
user_002                  NaN                 NaN                  NaN                               3.0                               NaN
user_004                  NaN                 1.0                  1.0                               NaN                               5.0
user_005                  NaN                 NaN                  5.0                               3.0                               NaN
user_006                  5.0                 NaN                  2.0                               5.0                               4.0

'''


In [None]:
# Get the average rating for each user 
avg_ratings = user_ratings_table.mean(axis=1)


'''
user_001    4.000000
user_002    3.000000
user_004    2.333333
user_005    4.000000
user_006    4.000000
              ...   
user_606    4.400000
user_607    4.500000
user_608    4.300000
user_609    4.000000
user_610    4.100000
Length: 499, dtype: float64


'''


# Center each user's ratings around 0
user_ratings_table_centered = user_ratings_table.sub(avg_ratings, axis=0)

'''

In [7]:
user_ratings_table_centered
Out[7]:

title     Forrest Gump (1994)  Matrix, The (1999)  Pulp Fiction (1994)  Shawshank Redemption, The (1994)  Silence of the Lambs, The (1991)
userId                                                                                                                                    
user_001                  0.0            1.000000            -1.000000                               NaN                          0.000000
user_002                  NaN                 NaN                  NaN                               0.0                               NaN
user_004                  NaN           -1.333333            -1.333333                               NaN                          2.666667
user_005                  NaN                 NaN             1.000000                              -1.0    

'''


# Fill in the missing data with 0s
user_ratings_table_normed = user_ratings_table_centered.fillna(0)


'''

title     Forrest Gump (1994)  Matrix, The (1999)  Pulp Fiction (1994)  Shawshank Redemption, The (1994)  Silence of the Lambs, The (1991)
userId                                                                                                                                    
user_001                  0.0            1.000000            -1.000000                               0.0                          0.000000
user_002                  0.0            0.000000             0.000000                               0.0                          0.000000
user_004                  0.0           -1.333333            -1.333333                               0.0                          2.666667
user_005                  0.0            0.000000             1.000000                              -1.0                          0.000000
user_006                  1.0            0.000000            -2.000000                               1.0   


'''
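
The sketch below replays the User_A / User_B / User_C example from the note above, comparing naive zero-filling with centering each user's ratings before filling. The ratings come from that example; the helper name `user_similarity` is hypothetical, and cosine similarity is used here as one reasonable way to compare users.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

toy_ratings = pd.DataFrame(
    [[10, 9, 7, np.nan], [10, 9, 7, 0], [10, 9, 7, 8]],
    index=["User_A", "User_B", "User_C"],
    columns=["Forrest Gump", "Pulp Fiction", "Toy Story", "The Matrix"],
)


def user_similarity(filled):
    """Pairwise cosine similarity between the rows of a filled table."""
    return pd.DataFrame(
        cosine_similarity(filled), index=filled.index, columns=filled.index
    )


# Naive approach: treat the missing rating as a zero
naive = user_similarity(toy_ratings.fillna(0))

# Centered approach: subtract each user's mean, then fill with 0
# (0 now means "no opinion" rather than "lowest possible score")
centered = toy_ratings.sub(toy_ratings.mean(axis=1), axis=0).fillna(0)
normed = user_similarity(centered)

print(naive.loc["User_A"])
print(normed.loc["User_A"])
```

With zeros filled in directly, User_A looks closest to User_B. After centering, the missing value no longer counts as a dislike, and User_A comes out closer to User_C.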




### Similar Movies and Different Movies

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Assign the arrays to variables
sw_IV = movie_ratings_centered.loc['Star Wars: Episode IV - A New Hope (1977)', :].values.reshape(1, -1)
sw_V = movie_ratings_centered.loc['Star Wars: Episode V - The Empire Strikes Back (1980)', :].values.reshape(1, -1)

# Find the similarity between two Star Wars movies
similarity_A = cosine_similarity(sw_IV, sw_V)
print(similarity_A)  # about 0.53

# Assign the arrays to variables
jurassic_park = movie_ratings_centered.loc['Jurassic Park (1993)', :].values.reshape(1, -1)
pulp_fiction = movie_ratings_centered.loc['Pulp Fiction (1994)', :].values.reshape(1, -1)

# Find the similarity between Pulp Fiction and Jurassic Park
similarity_B = cosine_similarity(jurassic_park, pulp_fiction)
print(similarity_B)  # about -0.25
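
The cell above relies on a preloaded movie_ratings_centered table with movies as rows. The notes do not show how it was built; the sketch below assumes it is simply the centered, zero-filled user table transposed, and uses invented ratings to show how a positive and a negative item similarity can arise.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-by-movie ratings
toy_user_ratings = pd.DataFrame(
    [[5.0, 4.0, np.nan], [4.0, 5.0, 1.0], [1.0, np.nan, 5.0]],
    index=["user_a", "user_b", "user_c"],
    columns=["Movie X", "Movie Y", "Movie Z"],
)

# Center each user's ratings, fill the gaps, then transpose so that
# movies become rows (an assumption about how the real table was made)
centered = toy_user_ratings.sub(toy_user_ratings.mean(axis=1), axis=0).fillna(0)
toy_movie_ratings_centered = centered.T

movie_x = toy_movie_ratings_centered.loc["Movie X", :].values.reshape(1, -1)
movie_y = toy_movie_ratings_centered.loc["Movie Y", :].values.reshape(1, -1)
movie_z = toy_movie_ratings_centered.loc["Movie Z", :].values.reshape(1, -1)

sim_xy = cosine_similarity(movie_x, movie_y)[0, 0]
sim_xz = cosine_similarity(movie_x, movie_z)[0, 0]

print(sim_xy, sim_xz)
```

Movie X and Movie Y get a positive score, while Movie X and Movie Z, which split the same users in opposite directions, get a negative one.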

## Scaling to the full list of movies

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Generate the similarity matrix
similarities = cosine_similarity(movie_ratings_centered)

# Wrap the similarities in a DataFrame
cosine_similarity_df = pd.DataFrame(similarities, index=movie_ratings_centered.index, columns=movie_ratings_centered.index)

# Find the similarity values for a specific movie
cosine_similarity_series = cosine_similarity_df.loc['Star Wars: Episode IV - A New Hope (1977)']

# Sort these values highest to lowest
ordered_similarities = cosine_similarity_series.sort_values(ascending=False)

print(ordered_similarities)

In [None]:
'''
In [6]: ### Movie Recommendation:: Star Wars: Episode IV - A New Hope (1977)
ordered_similarities.head()
Out[6]:

title
Star Wars: Episode IV - A New Hope (1977)                                         1.000000
Star Wars: Episode V - The Empire Strikes Back (1980)                             0.535705
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    0.078400
Lord of the Rings: The Fellowship of the Ring, The (2001)                         0.020582
Schindler's List (1993)                                                           0.013602
Name: Star Wars: Episode IV - A New Hope (1977), dtype: float64

'''

# K-Nearest Neighbors

K-nearest neighbors can be used to infer how someone might rate an item based on the wisdom of a (similar) crowd.

In [None]:
'''
user_similarities
Out[1]:

userId    user_001  user_002  user_003  user_004  user_005  ...  user_606  user_607  user_608  user_609  user_610
userId                                                      ...                                                  
user_001  1.000000       0.0       0.0  0.187234 -0.164525  ...  0.376280  0.029487  0.312335 -0.176184  0.133835
user_002  0.000000       0.0       0.0  0.000000  0.000000  ...  0.000000  0.000000  0.000000  0.000000  0.000000
user_003  0.000000       0.0       0.0  0.000000  0.000000  ...  0.000000  0.000000  0.000000  0.000000  0.000000
user_004  0.187234       0.0       0.0  1.000000 -0.223980  ... -0.040776 -0.249841 -0.271438 -0.200643 -0.217601
user_005 -0.164525       0.0       0.0 -0.223980  1.000000  ...  0.355365 -0.233540  0.369204  0.237733  0.050241
...            ...       ...       ...       ...       ...  ...       ...       ...       ...       ...       ...
user_606  0.376280       0.0       0.0 -0.040776  0.355365  ...  1.000000 -0.074804  0.596448  0.352719  0.018341
user_607  0.029487       0.0       0.0 -0.249841 -0.233540  ... -0.074804  1.000000 -0.123081 -0.154813 -0.238457
user_608  0.312335       0.0       0.0 -0.271438  0.369204  ...  0.596448 -0.123081  1.000000  0.443575  0.034959
user_609 -0.176184       0.0       0.0 -0.200643  0.237733  ...  0.352719 -0.154813  0.443575  1.000000 -0.475666
user_610  0.133835       0.0       0.0 -0.217601  0.050241  ...  0.018341 -0.238457  0.034959 -0.475666  1.000000

[569 rows x 569 columns]
'''

In [None]:
'''
user_ratings_table:

title     Star Wars: Episode V - The Empire Strikes Back (1980)  Terminator 2: Judgment Day (1991)  Toy Story (1995)  Usual Suspects, The (1995)  
userId                                                                                                                                            
user_001                                                5.0                                    NaN               4.0                         5.0  
user_002                                                NaN                                    NaN               NaN                         NaN  
user_003                                                NaN                                    NaN               NaN                         NaN  
user_004                                                5.0                                    NaN               NaN                         NaN  
user_005                                                NaN                                    3.0               4.0                         4.0  
...                                                     ...                                    ...               ...                         ...  
user_606                                                4.5                                    3.5               2.5                         4.5  
user_607                                                3.0                                    4.0               4.0                         NaN  
user_608                                                4.0                                    3.0               2.5                         4.5  
user_609                                                NaN                                    3.0               3.0                         NaN  
user_610                                                5.0                                    5.0               5.0                         4.0  

[569 rows x 20 columns]

'''

In [None]:
# Isolate the similarity scores for user_001 and sort them
user_similarity_series = user_similarities.loc['user_001']
ordered_similarities = user_similarity_series.sort_values(ascending=False)


In [None]:
'''
Out[3]:

userId
user_001    1.000000
user_282    0.643255
user_246    0.627894
user_552    0.627510
user_057    0.609214
              ...   
user_575   -0.501965
user_501   -0.501965
user_402   -0.506469
user_482   -0.579619
user_318   -0.594879
'''

In [None]:
# Find the top 10 most similar users (position 0 is user_001 itself, so skip it)
nearest_neighbors = ordered_similarities.iloc[1:11].index

In [None]:
'''
In [10]:
nearest_neighbors
Out[10]:
Index(['user_282', 'user_246', 'user_552', 'user_057', 'user_210', 'user_332', 'user_339', 'user_072', 'user_597', 'user_312'], dtype='object', name='userId')

'''
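
To show where the neighbor list is heading, here is a minimal sketch of one simple way to turn it into a predicted rating: average the ratings that the k most similar users gave the movie. The similarity values, ratings and the choice of k = 2 are all invented for the example, and a plain mean is an assumption rather than the method used in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical similarity scores between user_001 and everyone else
toy_similarities = pd.Series(
    {"user_001": 1.0, "user_002": 0.9, "user_003": 0.8, "user_004": -0.3}
)

# Hypothetical ratings; user_001 has not rated the movie yet
toy_ratings = pd.DataFrame(
    {"Toy Story (1995)": [np.nan, 4.0, 5.0, 1.0]},
    index=["user_001", "user_002", "user_003", "user_004"],
)

k = 2
ordered = toy_similarities.sort_values(ascending=False)

# Skip position 0 (the user themselves) and keep the next k users
neighbors = ordered.iloc[1 : k + 1].index

# Predict the rating as the mean of the neighbors' ratings (NaNs are skipped)
neighbor_ratings = toy_ratings.reindex(neighbors)
predicted = neighbor_ratings["Toy Story (1995)"].mean()

print(list(neighbors), predicted)
```

The dissimilar user_004 is left out, so their low rating does not pull the prediction down.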

In [None]:
# Extract the ratings of the neighbors
neighbor_ratings = user_ratings_table.reindex(nearest_neighbors)

In [None]:
'''
neighbor_ratings
Out[12]:

title     American Beauty (1999)  Apollo 13 (1995)  Braveheart (1995)  Fight Club (1999)  Forrest Gump (1994)  ...  Star Wars: Episode IV - A New Hope (1977)  \
userId                                                                                                         ...                                              
user_282                     4.5               4.5                NaN                4.5                  4.5  ...                                        4.0   
user_246                     NaN               NaN                NaN                4.0                  3.5  ...                                        5.0   
user_552                     NaN               NaN                NaN                4.5                  4.0  ...                                        NaN   
user_057                     5.0               3.0                4.0                NaN                  4.0  ...                                        5.0   
user_210                     NaN               NaN                NaN                NaN                  NaN  ...                                        5.0   
user_332                     4.5               3.5                3.5                4.5                  4.5  ...                                        NaN   
user_339                     5.0               4.0                NaN                4.0                  4.0  ...                                        NaN   
user_072                     4.5               4.0                4.5                NaN                  4.0  ...                                        5.0   
user_597                     5.0               NaN                5.0                NaN                  5.0  ...                                        5.0   
user_312                     NaN               NaN                4.0                NaN                  4.0  ...                                        5.0   

title     Star Wars: Episode V - The Empire Strikes Back (1980)  Terminator 2: Judgment Day (1991)  Toy Story (1995)  Usual Suspects, The (1995)  
userId                                                                                                                                            
user_282                                                NaN                                    4.5               4.5                         4.5  
user_246                                                5.0                                    5.0               NaN                         4.5  
user_552                                                5.0                                    NaN               NaN                         NaN  
user_057                                                4.0                                    4.0               5.0                         5.0  
user_210                                                5.0                                    4.0               NaN                         NaN  
user_332                                                4.5                                    3.5               4.0                         4.0  
user_339                                                NaN                                    NaN               4.0                         NaN  
user_072                                                4.5                                    4.5               NaN                         4.5  
user_597                                                5.0                                    5.0               4.0                         5.0  
user_312                                                5.0                                    4.0               NaN                         NaN  

[10 rows x 20 columns]

'''

In [None]:
# Calculate the mean rating given by the user's nearest neighbors
print(neighbor_ratings['Apollo 13 (1995)'].mean())

## 3.8 
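
The neighbor-averaging steps above can be tried end to end on a small made-up example. The sketch below is illustrative only: the user names, movie titles, similarity scores, and the choice of k = 2 are all assumptions, not values from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical similarity of every user to user_001 (self-similarity is 1.0)
similarities = pd.Series(
    {"user_001": 1.0, "user_002": 0.9, "user_003": -0.2,
     "user_004": 0.6, "user_005": 0.1},
    name="user_001",
)

# Hypothetical ratings table: rows are users, columns are movies, NaN = not rated
ratings = pd.DataFrame(
    {"Movie A": [np.nan, 4.0, 2.0, np.nan, 3.0],
     "Movie B": [5.0, 4.5, 1.0, 4.0, np.nan]},
    index=["user_001", "user_002", "user_003", "user_004", "user_005"],
)

k = 2  # assumed neighborhood size for this toy example

# Sort from most to least similar; position 0 is the target user itself
ordered = similarities.sort_values(ascending=False)
nearest = ordered.iloc[1:k + 1].index

# Keep only the neighbors' rows, in order of similarity
neighbor_ratings = ratings.reindex(nearest)

# mean() skips NaN, so only neighbors who actually rated the movie contribute
predicted = neighbor_ratings["Movie A"].mean()
print(list(nearest), predicted)
```

Because `mean()` ignores missing values, a neighbor who has not rated the movie simply drops out of the average rather than pulling it toward zero.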


# KNN predictions
With the data in the correct shape from the last exercise, you can now use it to infer how user_001 feels about Apollo 13 (1995).

As a reminder, the data you prepared in the last exercise (and that has been loaded into this one) is:

 > target_user_x - Centered ratings that user_001 has given to the movies they have seen.
 > other_users_x - Centered ratings for all other users and the movies they have rated, excluding the movie Apollo 13.
 > other_users_y - Raw ratings that all other users have given the movie Apollo 13.

You will use other_users_x and other_users_y to fit a KNeighborsRegressor from scikit-learn and use it to predict what user_001 might have rated Apollo 13 (1995). The cell below also fits a second model on the movie-based counterparts (other_movies_x, other_movies_y, target_movie_x) so that the user-user and item-item predictions can be compared.
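
To make clear what the regressor computes, here is a minimal NumPy sketch of K-nearest-neighbors regression. It mirrors the scikit-learn defaults (5 neighbors, Euclidean distance, unweighted mean) but is not the library's implementation. The function name `knn_regress` and the toy arrays are hypothetical, and treating 0.0 as "unrated" in the centered data is an assumption.

```python
import numpy as np

def knn_regress(train_x, train_y, query_x, n_neighbors=5):
    """Predict one value as the plain mean of the targets of the closest rows."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)

    # Euclidean distance from the query row to every training row
    distances = np.linalg.norm(train_x - query_x, axis=1)

    # Positions of the n_neighbors smallest distances
    nearest = np.argsort(distances, kind="stable")[:n_neighbors]

    # Unweighted average of those neighbors' target values
    return train_y[nearest].mean()

# Hypothetical centered ratings for four other users on three movies
# (assumption: 0.0 stands for an unrated movie)
other_x = [[ 1.0, -0.5, 0.0],
           [ 0.9, -0.4, 0.1],
           [-1.0,  0.5, 0.0],
           [-0.8,  0.6, 0.2]]
other_y = [4.5, 4.0, 2.0, 1.5]   # raw ratings of the movie being predicted
target_x = [1.0, -0.5, 0.1]      # the target user's centered ratings

prediction = knn_regress(other_x, other_y, target_x, n_neighbors=2)
print(prediction)
```

In this toy data the two users with tastes closest to the target rated the movie 4.5 and 4.0, so the prediction is their average.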


In [None]:
from sklearn.neighbors import KNeighborsRegressor

# Instantiate the user KNN model
user_knn = KNeighborsRegressor()

# Fit the model and predict the target user
user_knn.fit(other_users_x, other_users_y)
user_user_pred = user_knn.predict(target_user_x)
print("The user-user model predicts {}".format(user_user_pred))

# Instantiate the movie KNN model
movie_knn = KNeighborsRegressor()

# Fit the model on the movie data and predict
movie_knn.fit(other_movies_x, other_movies_y)
item_item_pred = movie_knn.predict(target_movie_x)
print("The item-item model predicts {}".format(item_item_pred))



In [None]:
'''
<script.py> output:
    The user-user model predicts [4.5]
    The item-item model predicts [4.1]
'''