# Joke Recommender System

## Surprise is a Python scikit for building recommender systems
http://surpriselib.com/

https://github.com/NicolasHug/Surprise

In [4]:
import os
#os.chdir("/home/sagar/insofe/acads/CF")

In [8]:
from surprise import Dataset, Reader, KNNWithMeans
from surprise.model_selection import cross_validate
import pandas as pd

## Setting seed for reproducible results

In [9]:
import random
import numpy as np

my_seed = 9873923
random.seed(my_seed)
np.random.seed(my_seed)
# random.seed seeds Python's built-in random module
# np.random.seed seeds NumPy's global random generator
# Some libraries draw from the built-in module and others from NumPy, so both are seeded
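To illustrate what seeding buys, here is a minimal sketch using only Python's built-in `random` module. The helper `draw` is a hypothetical name introduced for this illustration; the same idea applies to NumPy's generator.

```python
import random


def draw(seed, count=3):
    """Seed the built-in generator, then return `count` random floats."""
    random.seed(seed)
    return [random.random() for _ in range(count)]


first = draw(9873923)
second = draw(9873923)   # same seed as above
other = draw(1)          # a different seed

print(first == second)   # identical sequences when the seed is the same
print(first == other)    # a different seed gives a different sequence
```

Re-running a cell without re-seeding continues the sequence instead of restarting it, so results are only repeatable when the seed is set before the random draws happen.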

## Joke Dataset
http://eigentaste.berkeley.edu/dataset/

150 Jokes

63978 Users

### Reading jokes files
Note: the data is tab-separated.

In [10]:
jokes = pd.read_csv("jester_items.tsv", sep="\t", names=["itemID", "Joke"], usecols=["Joke"])
jokes.head()

|   | Joke |
|---|------|
| 0 | A man visits the doctor. The doctor says, "I h... |
| 1 | This couple had an excellent relationship goin... |
| 2 | Q. What's 200 feet long and has 4 teeth? A. Th... |
| 3 | Q. What's the difference between a man and a t... |
| 4 | Q. What's O. J. Simpson's web address? A. Slas... |


### Reading the ratings file

In [11]:
ratings = pd.read_csv("jester_ratings.csv")
ratings.head()

|   | userID | itemID | rating |
|---|--------|--------|--------|
| 0 | 1 | 5 | 0.219 |
| 1 | 1 | 7 | -9.281 |
| 2 | 1 | 8 | -9.281 |
| 3 | 1 | 13 | -6.781 |
| 4 | 1 | 15 | 0.875 |


In [12]:
ratings.describe()

|       | userID | itemID | rating |
|-------|--------|--------|--------|
| count | 1761439.0 | 1761439.0 | 1761439.0 |
| mean | 32723.22 | 70.71133 | 1.618602 |
| std | 18280.11 | 46.0079 | 5.302608 |
| min | 1.0 | 5.0 | -10.0 |
| 25% | 17202.0 | 21.0 | -2.031 |
| 50% | 34808.0 | 69.0 | 2.219 |
| 75% | 47306.0 | 112.0 | 5.719 |
| max | 63978.0 | 150.0 | 10.0 |


### Defining the parser to read data into a Surprise dataset
The parser (`Reader`) needs the rating scale, and the DataFrame columns must be in this specific order: ['userID', 'itemID', 'rating'].

In [14]:
# Work on a subset: keep only ratings from users with ID below no_of_users (IDs 1 to 999)
no_of_users = 1000
reader = Reader(rating_scale=(-10, 10))
data = Dataset.load_from_df(ratings[ratings.userID < no_of_users], reader)

## Simulation Parameters
- Algorithm Type
- User-Based vs Item-Based
- Similarity Metric
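
As a rough illustration of the similarity-metric choice listed above, the sketch below computes cosine similarity between two items in plain Python, using only the users who rated both. The function name `cosine_sim` and the sample ratings are hypothetical, and returning 0.0 in the degenerate cases is a choice made for this sketch rather than a statement about how Surprise handles them.

```python
from math import sqrt


def cosine_sim(ratings_a, ratings_b):
    """Cosine similarity between two items over their common raters.

    Each argument maps user id -> rating for one item.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0  # no shared raters: treated as "no similarity" here
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = sqrt(sum(ratings_b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # all-zero ratings: cosine is undefined, return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings for two jokes, keyed by user id.
joke_5 = {1: 4.0, 2: -2.0, 3: 6.0}
joke_7 = {1: 2.0, 2: -1.0, 4: 9.0}

print(cosine_sim(joke_5, joke_7))
```

An item-based setup compares columns of the rating matrix (jokes) in this way, while a user-based setup would compare rows (users) over their commonly rated jokes.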

In [18]:
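sim_parameters = {'name': 'cosine', 'user_based': False}
algo = KNNWithMeans(sim_options=sim_parameters)
# user_based=False computes similarities between items (jokes) rather than between users.
# KNNWithMeans centers each rating on a mean before weighting neighbors; basic KNN
# (without means) uses the raw ratings. Refer to the notes of 11 Feb 2018 for details.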
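
The difference between KNN with and without means can be sketched in a few lines. This is a simplified illustration, assuming the neighbors have already been selected; the function names and sample numbers are hypothetical and are not Surprise's internals.

```python
def predict_basic(neighbors):
    """Similarity-weighted average of the neighbors' raw ratings."""
    den = sum(sim for sim, _, _ in neighbors)
    if den == 0:
        return None  # no usable neighbors in this sketch
    return sum(sim * rating for sim, rating, _ in neighbors) / den


def predict_with_means(target_mean, neighbors):
    """Target mean plus the weighted average of mean-centered neighbor ratings."""
    den = sum(sim for sim, _, _ in neighbors)
    if den == 0:
        return target_mean  # fall back to the mean when there are no neighbors
    num = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    return target_mean + num / den


# Each neighbor: (similarity to the target, the user's rating of it, its mean rating).
neighbors = [(0.8, 4.0, 2.0), (0.2, -1.0, 0.0)]

print(predict_basic(neighbors))            # weighted average of 4.0 and -1.0
print(predict_with_means(1.5, neighbors))  # 1.5 plus the weighted deviation
```

Centering on the mean lets a neighbor that is usually rated high contribute "above its own average" rather than its raw score, which matters on a scale like Jester's that runs from -10 to 10.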

## Cross Validation Accuracies

In [19]:
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=2, verbose=True)
# cv=2: 2-fold cross-validation
# verbose=True: prints per-fold statistics
# Test time is much higher than fit time because KNN is a lazy learner: most of the work happens at prediction time.

Evaluating RMSE, MAE of algorithm KNNWithMeans on 2 split(s).

                  Fold 1  Fold 2  Mean    Std     
RMSE (testset)    4.7799  4.7656  4.7728  0.0072  
MAE (testset)     3.6928  3.6639  3.6784  0.0144  
Fit time          0.08    0.08    0.08    0.00    
Test time         2.10    2.04    2.07    0.03    


{'fit_time': (0.07505011558532715, 0.08255505561828613),
 'test_mae': array([ 3.69281202,  3.66392471]),
 'test_rmse': array([ 4.77991229,  4.76559591]),
 'test_time': (2.096890687942505, 2.0403530597686768)}
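
For reference, the two accuracy measures reported above can be written out directly. This is a minimal sketch on hypothetical (true rating, estimated rating) pairs, not the data from this run.

```python
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


# Hypothetical (true, estimated) pairs on the -10 to 10 scale.
pairs = [(3.0, 1.0), (-2.0, 0.0), (5.0, 5.0), (0.0, 4.0)]

print(rmse(pairs))
print(mae(pairs))
```

RMSE squares each error before averaging, so large misses weigh more heavily than in MAE; that is why the RMSE in the table is larger than the MAE for the same folds.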

## Training the model on complete data

In [20]:
trainset = data.build_full_trainset()
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


  sim = construction_func[name](*args)


<surprise.prediction_algorithms.knns.KNNWithMeans at 0xb40e628518>

## Filtering instances which can be used for predictions

In [21]:
testset = trainset.build_anti_testset()
# Build the test instances we can predict for: every (user, item) pair that has no rating in the training set
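
Conceptually, the anti-testset is the set of missing cells in the user-item matrix. The sketch below shows the idea in plain Python on a tiny hypothetical sample; `anti_testset` is an illustrative helper, not the Surprise method, and filling the unknown rating with the global mean mirrors what Surprise does by default.

```python
def anti_testset(ratings, fill):
    """Return (user, item, fill) for every user-item pair absent from `ratings`.

    ratings: list of (user, item, rating) tuples.
    """
    users = sorted({user for user, _, _ in ratings})
    items = sorted({item for _, item, _ in ratings})
    rated = {(user, item) for user, item, _ in ratings}
    return [(u, i, fill) for u in users for i in items if (u, i) not in rated]


# Hypothetical sample: user 1 rated jokes 5 and 7, user 2 rated only joke 5.
sample_ratings = [(1, 5, 0.219), (1, 7, -9.281), (2, 5, 3.0)]
global_mean = sum(r for _, _, r in sample_ratings) / len(sample_ratings)

anti = anti_testset(sample_ratings, global_mean)
print(anti)  # only the pair (user 2, joke 7) is missing
```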

## Making Predictions

In [22]:
predictions = algo.test(testset)

### Function to calculate top 10 predictions for each user

In [23]:
# Fetching top 10 predictions for each user
from collections import defaultdict
def get_top_n(predictions, n=10):
    '''Return the top-N recommendation for each user from a set of predictions.
    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendation to output for each user. Default
            is 10.
    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

top_n = get_top_n(predictions, n=10)
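
As a possible variation, the same grouping can be done with `heapq.nlargest`, which avoids sorting each user's full list when only a few items are needed. This is a sketch, not a replacement for the function above: `top_n_heap` is a hypothetical name, and the `Prediction` namedtuple here is a stand-in with the same five fields that the loop above unpacks.

```python
from collections import defaultdict, namedtuple
from heapq import nlargest

# Stand-in for the prediction tuples returned by algo.test (same field order).
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def top_n_heap(predictions, n=10):
    """Map each user id to its n highest (item id, estimate) pairs."""
    by_user = defaultdict(list)
    for uid, iid, _, est, _ in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the pairs in descending order of the estimate.
    return {uid: nlargest(n, pairs, key=lambda pair: pair[1])
            for uid, pairs in by_user.items()}


# Hypothetical predictions for two users.
sample_predictions = [
    Prediction(1, 114, 0.0, 7.2, {}),
    Prediction(1, 47, 0.0, 3.1, {}),
    Prediction(1, 117, 0.0, 6.5, {}),
    Prediction(2, 127, 0.0, 8.0, {}),
]

print(top_n_heap(sample_predictions, n=2))
# {1: [(114, 7.2), (117, 6.5)], 2: [(127, 8.0)]}
```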

### Function to get first n users and corresponding predictions

In [24]:
from itertools import islice

def take(n, iterable):
    "Return first n items of the iterable as a list"
    return list(islice(iterable, n))

## Top Predictions Matrix

In [26]:
for uid, user_ratings in take(10,top_n.items()):
    print(uid, [iid for (iid, _) in user_ratings])

1 [114, 117, 126, 143, 129, 138, 111, 47, 110, 63]
2 [127, 91, 87, 129, 88, 114, 138, 119, 105, 56]
3 [127, 114, 35, 36, 89, 129, 119, 76, 69, 91]
4 [72, 27, 89, 50, 127, 114, 32, 129, 119, 138]
5 [127, 114, 106, 105, 119, 138, 129, 134, 135, 126]
6 [49, 66, 129, 127, 119, 138, 114, 106, 117, 56]
7 [127, 114, 129, 138, 119, 105, 106, 117, 126, 145]
8 [129, 127, 114, 138, 119, 143, 145, 117, 134, 126]
9 [89, 36, 32, 129, 127, 119, 105, 35, 87, 121]
10 [114, 127, 138, 129, 148, 137, 139, 132, 150, 134]


## Top Jokes for each User

In [27]:
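# Printing the top predicted jokes for each user
# Caution: `jokes` has the default 0-based row index, so looking up the item ID
# as a label is off by one for 1-based item IDs and raises a KeyError for item 150.
for uid, user_ratings in take(10, top_n.items()):
    print("For User", uid)
    for (iid, _) in user_ratings:
        print(iid)
        print(jokes.loc[int(iid), "Joke"])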

For User 1
114
A lady bought a new Lexus. It cost a bundle. Two days later, she brought it back, complaining that the radio was not working. "Madam," said the sales manager, "the audio system in this car is completely automatic. All you need to do is tell it what you want to listen to, and you will hear exactly that!" She drove out, somewhat amazed and a little confused. She looked at the radio and said, "Nelson." The radio responded, "Ricky or Willie?" She was astounded. If she wanted Beethoven, that's what she got. If she wanted Nat King Cole, she got it. She was stopped at a traffic light enjoying "On the Road Again" when the light turned green and she pulled out. Suddenly an enormous sports utility vehicle coming from the street she was crossing sped toward her, obviously not paying attention to the light. She swerved and narrowly missed a collision. "Idiot!" she yelled and, from the radio, "Ladies and gentlemen, the President of the United States."
117
A man goes into a drug store

KeyError: 'the label [150] is not in the [index]'
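
The `KeyError` arises because `jokes` was read with only the `Joke` column, so it carries the default 0-based row index (0 to 149 for 150 jokes), while the lookup uses the item ID as a label. Label 150 therefore does not exist, and, assuming the file lists jokes in item-ID order starting at 1, the other lookups return the neighbouring joke. One way around this is to key the jokes by their item ID. The sketch below does that with the standard `csv` module on a hypothetical in-memory sample that is assumed to mirror the layout of `jester_items.tsv` (item ID, tab, joke text); `load_jokes` is an illustrative helper.

```python
import csv
import io

# Hypothetical sample in the assumed layout of jester_items.tsv.
sample = "1\tFirst joke\n2\tSecond joke\n"


def load_jokes(handle):
    """Return {item_id: joke_text} read from a tab-separated file handle."""
    return {int(item_id): text
            for item_id, text in csv.reader(handle, delimiter="\t")}


jokes_by_id = load_jokes(io.StringIO(sample))

print(jokes_by_id[2])  # looked up by item ID, not by row position
```

In pandas, a comparable change would be to read both columns and pass `index_col="itemID"` to `pd.read_csv`, so that `jokes.loc[int(iid), "Joke"]` resolves the item ID as intended.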