## Recommender system

Whether you are watching YouTube, ordering food or buying books online, listening to Spotify, or using LinkedIn, you
constantly get recommendations for new videos, what to eat, and much more. Behind all of this is a recommender system.

### Warmup
Creating a simple recommender system for movies with KNN (k-nearest neighbors).

The dataset has roughly 100,000 ratings.

In [4]:
import pandas as pd
from scipy.sparse import csr_matrix
# k-nearest neighbors
from sklearn.neighbors import NearestNeighbors

# Fuzzy string matching with fuzzywuzzy:
#   - matches movie titles even if the query has spelling mistakes,
#     wrong capitalization, or missing spaces
#   - returns the index of the matched movie
from fuzzywuzzy import process
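
To show what fuzzy matching does, here is a minimal sketch that uses `difflib` from the standard library as a stand-in for fuzzywuzzy. The `titles` list and the `best_match` helper are hypothetical and only for illustration. The scores will not be the same as the ones fuzzywuzzy reports, but the idea is the same: score every title against the query and keep the best one together with its index.

```python
import difflib

# Hypothetical titles for illustration; the notebook matches against df_movies['title']
titles = ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)", "Jurassic Park (1993)"]

def best_match(query, choices):
    """Return (title, score, index) of the choice most similar to the query."""
    query = query.lower()
    # Similarity ratio between 0 and 1 for every candidate title
    scored = [
        (difflib.SequenceMatcher(None, query, title.lower()).ratio(), index)
        for index, title in enumerate(choices)
    ]
    score, index = max(scored)
    # Scale to 0-100 so the tuple looks like (title, score, index)
    return choices[index], round(score * 100), index

# A misspelled, lowercase query still finds the right movie
title, score, index = best_match("toy stori", titles)
print(title, score, index)
```

The returned index is what matters for the recommender, because it is used to look up the movie later on.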

In [5]:
movies = ("/Users/joeloscarsson/Documents/www/Machine-Learning/Projects/data2/movies.csv")
ratings = ("/Users/joeloscarsson/Documents/www/Machine-Learning/Projects/data2/ratings.csv")

df_movies = pd.read_csv(movies, usecols=['movieId', 'title'], dtype={'movieId': 'int32', 'title': 'str'})
df_ratings = pd.read_csv(ratings, usecols=['userId', 'movieId', 'rating'], dtype={'userId': 'int32', 'movieId': 'int32', 'rating':'float32'})

In [6]:
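# Convert the data to a movie x user matrix so K-Nearest can compare movies
# Store it as a sparse matrix
# Example (rows = movies, columns = users):
#          Users
#         [4,4,5] A
# Movies  [3,3,4] B   cos(A, B) ≈ 0.999 => very similar
#         [3,2,1] C   cos(A, C) ≈ 0.885 => less similar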


# We use "rating" as the values for the same reason we used "Sales" in the exercises: it gives us a number to compare
movies_users = df_ratings.pivot(index='movieId', columns='userId', values='rating').fillna(0)
# The pivot has a lot of NaN values because most users have not rated most movies. We can't process NaN, so we fill them with 0
mat_movies_users = csr_matrix(movies_users.values)
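
As a worked example, cosine similarity can be computed by hand for the three vectors A, B and C from the comment above. This is a small sketch in plain Python; the `cosine_similarity` helper is written here for illustration and is not part of the notebook.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Ratings of three movies by the same three users
A = [4, 4, 5]
B = [3, 3, 4]
C = [3, 2, 1]

sim_ab = cosine_similarity(A, B)
sim_ac = cosine_similarity(A, C)
print(round(sim_ab, 3), round(sim_ac, 3))  # prints 0.999 0.885
```

A and B point in almost the same direction (users rated them in the same pattern), so their similarity is close to 1. C has a different rating pattern, so it scores lower.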

In [7]:
# Ways to specify the distance between two vectors:
# Euclidean Distance
# Manhattan Distance
# Minkowski Distance
# Cosine Similarity (used here)


# Using brute because it traverses all data points in the whole dataset
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20)
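
The measures listed above can be compared on a small example. The sketch below is plain Python with hypothetical vectors `u` and `v`. Note that scikit-learn's `metric='cosine'` is a distance, that is 1 minus the cosine similarity, so 0 means the vectors point in the same direction.

```python
import math

def minkowski(u, v, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

# Hypothetical ratings: v follows the same pattern as u, only scaled up
u = [1, 2, 3]
v = [2, 4, 6]

manhattan = minkowski(u, v, 1)
euclidean = minkowski(u, v, 2)
cosine = cosine_distance(u, v)
print(manhattan, round(euclidean, 3), round(abs(cosine), 3))
```

Manhattan and Euclidean distance see `u` and `v` as far apart because the values differ in size. The cosine distance is (up to rounding) zero because only the direction of the vectors counts, which is one reason it is a common choice for rating data.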

In [8]:
model_knn.fit(mat_movies_users)
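
What `algorithm='brute'` means can be sketched in plain Python: compute the distance from the query to every row and keep the `k` smallest. This is only an illustration of the idea, not how scikit-learn implements it, and the `ratings` matrix and `brute_force_neighbors` helper are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

def brute_force_neighbors(rows, query_index, k):
    """Return the k (distance, row index) pairs closest to rows[query_index]."""
    query = rows[query_index]
    # Brute force: one distance computation per row in the dataset
    scored = [(cosine_distance(query, row), i) for i, row in enumerate(rows)]
    scored.sort()
    return scored[:k]

# Hypothetical movie x user matrix (rows = movies, columns = users)
ratings = [
    [4, 4, 5],  # movie 0
    [3, 3, 4],  # movie 1
    [3, 2, 1],  # movie 2
    [0, 5, 0],  # movie 3
]

neighbors = brute_force_neighbors(ratings, query_index=0, k=3)
neighbor_ids = [i for _, i in neighbors]
print(neighbor_ids)  # prints [0, 1, 2]
```

The query movie is always its own nearest neighbor (distance 0), which is why it has to be filtered out of the recommendations.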

In [16]:
# recommender(movie_name) => prints a list of recommended movies

def recommender(movie_name, data, model, n_recommendations):
    model.fit(data)
    # extractOne picks the best fuzzy match for movie_name in the 'title' column of df_movies
    # It returns a tuple (title, score, index), so element [2] is the index of the matched movie
    idx = process.extractOne(movie_name, df_movies['title'])[2]
    # print(idx)
    print('Movie Selected: ', df_movies['title'][idx], 'Index: ', idx)
    print('Searching for recommendations......')
    # Use the row of the matched movie to find the movies closest to it
    # Note: this assumes that row idx of the matrix is the same movie as row idx of df_movies
    distances, indices = model.kneighbors(data[idx], n_neighbors=n_recommendations)

    # The raw result is the distances and indices of the closest movies; printing is commented out in favor of the loop below
    # print(distances, indices)

    # Print the titles of the closest movies
    # We don't want to compare Toy Story to Toy Story, therefore i != idx (the movie itself shows up as NaN)
    for i in indices:
        print(df_movies['title'][i].where(i != idx))

recommender('toy story', mat_movies_users, model_knn, 20)

# For 'toy story', process.extractOne returns ('Toy Story (1995)', 90, 0):
#   the matched title,
#   the match score (90),
#   the index of that movie in the dataset (0)

# Based on the user ratings we get movies similar to 'toy story'. They don't have to be similar in content though
# It is also possible to sort based on genres

Movie Selected:  Toy Story (1995) Index:  0
Searching for recommendations......
0                                                     NaN
2353                                 'night Mother (1986)
418                                  Jurassic Park (1993)
615                  Independence Day (a.k.a. ID4) (1996)
224             Star Wars: Episode IV - A New Hope (1977)
314                                   Forrest Gump (1994)
322                                 Lion King, The (1994)
910     Once Upon a Time in the West (C'era una volta ...
546                            Mission: Impossible (1996)
963                                           Diva (1981)
968                           Arsenic and Old Lace (1944)
3189            Rififi (Du rififi chez les hommes) (1955)
506                                        Aladdin (1992)
123                                      Apollo 13 (1995)
257                                   Pulp Fiction (1994)
897                 Cheech and Chong's Up in Smoke