# What are recommender systems?
* Amazon
* Netflix

* The people who bought this also bought
* The people who viewed this also viewed

## User based collaborative filtering
It's just a fancy name for saying recommending stuff based on the combination of what you did and what everybody else did.
* The idea here is we build up a matrix of everything that every user has ever bought, or viewed, or rated, or whatever signal of interest that you want to base the system on.
* We use that matrix to compute the similarity between different users. (behavior)
* Two users who liked mostly the same things would be very similar to each other and we can then sort this similarity scores. If we can find all users similar to you based on their past behavior and recommend stuff that they liked that they didn't look at yet.

## Limitations of user-based collabrative filtering
The following is the list of some limitations of  user.based collaborative filtering:
* One problem is that people are fickle; their tastes are always changing. The lady started gettting more dramas or romance films or romcoms. If the guy ended up with a high similarity to her just based of her earlier sci-fi period, and we ended up recommending romantic comedies to him as result?
* The other problem is that there's usually a lot more people than there are things in your system. So 7 billion  people in the world, there's probably not 7 billion movies in the world. The computational problem finding all the similiraties between all of the users in your system is probably much greater than the problem of finding similarities between the item in your system. 
* The final problem is that people do bad things. There's a very real economic incentive to make sure that your product or your movie it is gets recommended to people, and there are people who try to game the system to make that happen for their new movie, or their product.

## Collaboritve filtering

In [2]:
import pandas as pd

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
cols = ["user_id", "movie_id", "rating"]
ratings = pd.read_csv("/home/nesmv/Documentos/10mo Semestre/Analisis y ML/Datasets/ml-100k/u.data",
                     sep = "\t", names = cols, usecols=range(3))
ratings

Unnamed: 0,user_id,movie_id,rating
0,196,242,3
1,186,302,3
2,22,377,1
3,244,51,2
4,166,346,1
...,...,...,...
99995,880,476,3
99996,716,204,5
99997,276,1090,1
99998,13,225,2


In [1]:
import numpy as np

In [5]:
movieStats = ratings.groupby('title').agg({'rating': [np.size, np.mean]})
movieStats

KeyError: 'title'

In [None]:
popularMovies = movieStats['rating']['size'] >= 100
movieStats[popularMovies].sort_values([('rating','mean')],
                                     ascending = False)[:15]

In [None]:
sdf = pd.DateFrame(similarMovies, columns=['similarity'])

In [None]:
df = movieStats[popularMovies]
df[sdf.columns] = sdf

In [None]:
df

In [None]:
df.sort_values(['similarity'], ascending = False)[:15]

## Making movie recommendations to people

In [6]:
cols = ["user_id", "movie_id", "rating"]
ratings = pd.read_csv("/home/nesmv/Documentos/10mo Semestre/Analisis y ML/Datasets/ml-100k/u.data",
                     sep = "\t", names = cols, usecols=range(3))
ratings

Unnamed: 0,user_id,movie_id,rating
0,196,242,3
1,186,302,3
2,22,377,1
3,244,51,2
4,166,346,1
...,...,...,...
99995,880,476,3
99996,716,204,5
99997,276,1090,1
99998,13,225,2


In [11]:
ratings = pd.merge(movies, ratings)
ratings.head()

Unnamed: 0,movie_id,title,user_id,rating
0,1,Toy Story (1995),308,4
1,1,Toy Story (1995),287,5
2,1,Toy Story (1995),148,4
3,1,Toy Story (1995),280,4
4,1,Toy Story (1995),66,3


In [12]:
mcols = ["movie_id", "title"]
movies = pd.read_csv("/home/nesmv/Documentos/10mo Semestre/Analisis y ML/Datasets/ml-100k/u.item",
                    sep = "|", encoding = 'ISO-8859-1',
                    names = mcols, usecols=range(2))
movies

Unnamed: 0,movie_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)
...,...,...
1677,1678,Mat' i syn (1997)
1678,1679,B. Monkey (1998)
1679,1680,Sliding Doors (1998)
1680,1681,You So Crazy (1994)


In [13]:
userRatings = ratings.pivot_table(index=['user_id'],
                                  columns = ['title'], 
                                  values='rating')
userRatings.head()

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,2.0,,,,,4.0,,,...,,,,4.0,,,,,4.0,


In [14]:
corrMatrix = userRatings.corr()
corrMatrix.head()

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Til There Was You (1997),1.0,,-1.0,-0.5,-0.5,0.522233,,-0.426401,,,...,,,,,,,,,,
1-900 (1994),,1.0,,,,,,-0.981981,,,...,,,,-0.944911,,,,,,
101 Dalmatians (1996),-1.0,,1.0,-0.04989,0.269191,0.048973,0.266928,-0.043407,,0.111111,...,,-1.0,,0.15884,0.119234,0.680414,-4.8756e-17,0.707107,,
12 Angry Men (1957),-0.5,,-0.04989,1.0,0.666667,0.256625,0.274772,0.178848,,0.457176,...,,,,0.096546,0.068944,-0.361961,0.1443376,1.0,1.0,
187 (1997),-0.5,,0.269191,0.666667,1.0,0.596644,,-0.5547,,1.0,...,,0.866025,,0.455233,-0.5,0.5,0.4753271,,,


In [15]:
corrMatrix = userRatings.corr(method='pearson', min_periods=100)
# at least 100 people rated both of those movies
corrMatrix.head()

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Til There Was You (1997),,,,,,,,,,,...,,,,,,,,,,
1-900 (1994),,,,,,,,,,,...,,,,,,,,,,
101 Dalmatians (1996),,,1.0,,,,,,,,...,,,,,,,,,,
12 Angry Men (1957),,,,1.0,,,,,,,...,,,,,,,,,,
187 (1997),,,,,,,,,,,...,,,,,,,,,,


## Understanding movie recommendations with an example

In [19]:
myRatings = userRatings.loc[100].dropna()
myRatings

title
Air Force One (1997)                   4.0
Amistad (1997)                         4.0
Anna Karenina (1997)                   3.0
Apostle, The (1997)                    4.0
Apt Pupil (1998)                       5.0
As Good As It Gets (1997)              5.0
Big Bang Theory, The (1994)            4.0
Boogie Nights (1997)                   3.0
Career Girls (1997)                    1.0
Chairman of the Board (1998)           1.0
Chasing Amy (1997)                     3.0
Conspiracy Theory (1997)               4.0
Contact (1997)                         4.0
Dante's Peak (1997)                    3.0
Dark City (1998)                       4.0
Desperate Measures (1998)              3.0
English Patient, The (1996)            3.0
Eve's Bayou (1997)                     2.0
Evita (1996)                           3.0
Flubber (1997)                         2.0
Full Monty, The (1997)                 4.0
Full Speed (1996)                      2.0
G.I. Jane (1997)                       3.0
Game,

In [21]:
## Create a series called simCandidates:
simCandidates = pd.Series()
for i in range(0, len(myRatings.index)):
    print("Adding sims for " + myRatings.index[i] + "...")
    #Retrieve similar movies to this one that I rated
    sims = corrMatrix[myRatings.index[i]].dropna()
    #Now scale its similarity by how well I rated this movie
    sims = sims.map(lambda x: x * myRatings[i]) #.dropna()
    #Add the score to the list of similarity candidates
    simCandidates = simCandidates.add(sims)
print("sorting...")
simCandidates.sort_values(inplace = True, ascending = False)
print(simCandidates.head(10))

Adding sims for Air Force One (1997)...
Adding sims for Amistad (1997)...
Adding sims for Anna Karenina (1997)...
Adding sims for Apostle, The (1997)...
Adding sims for Apt Pupil (1998)...
Adding sims for As Good As It Gets (1997)...
Adding sims for Big Bang Theory, The (1994)...
Adding sims for Boogie Nights (1997)...
Adding sims for Career Girls (1997)...
Adding sims for Chairman of the Board (1998)...
Adding sims for Chasing Amy (1997)...
Adding sims for Conspiracy Theory (1997)...
Adding sims for Contact (1997)...
Adding sims for Dante's Peak (1997)...
Adding sims for Dark City (1998)...
Adding sims for Desperate Measures (1998)...
Adding sims for English Patient, The (1996)...
Adding sims for Eve's Bayou (1997)...
Adding sims for Evita (1996)...
Adding sims for Flubber (1997)...
Adding sims for Full Monty, The (1997)...
Adding sims for Full Speed (1996)...
Adding sims for G.I. Jane (1997)...
Adding sims for Game, The (1997)...
Adding sims for Gattaca (1997)...
Adding sims for Good

# Using groupby command to combine rows

In [22]:
simCandidates = simCandidates.groupby(simCandidates.index).sum()
simCandidates.sort_values(inplace=True, ascending = False)
simCandidates.head(10)

title
2001: A Space Odyssey (1968)          0
Nutty Professor, The (1996)           0
Peacemaker, The (1997)                0
People vs. Larry Flynt, The (1996)    0
Phenomenon (1996)                     0
Postino, Il (1994)                    0
Primal Fear (1996)                    0
Princess Bride, The (1987)            0
Psycho (1960)                         0
Pulp Fiction (1994)                   0
dtype: object

# Removing entries 