# Recommendations based on Jaccard index
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic for gauging the similarity and diversity of finite sample sets. It is defined as the size of the intersection divided by the size of the union of the sets, so it ranges from 0 (no shared items) to 1 (identical sets).

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
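
As a quick worked example of the formula, the sketch below computes the index for two small, made-up reading lists. The function name `jaccard_index`, the sample sets, and the choice to return 0.0 for two empty sets are illustrative assumptions, not part of the dataset or the assignment.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    # Assumed convention: two empty sets get similarity 0.0 (avoids dividing by zero)
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical reading lists
reader_a = {"book-1", "book-2", "book-3"}
reader_b = {"book-2", "book-3", "book-4", "book-5"}

similarity = jaccard_index(reader_a, reader_b)  # 2 shared items / 5 distinct items
distance = 1 - similarity                        # Jaccard distance
print(similarity, distance)
```

The notebook cells below work with the Jaccard distance, `1 - J(A, B)`, so smaller values mean more similar users.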

Your objective is to create a Python function that uses the Jaccard index to find users with similar interests and recommend content from them. You will embed this function in the Streamlit app.


### 1. Loading the data
Load a subset from the ratings dataset.

In [1]:
import pandas as pd
df = pd.read_csv('./data/BX-Book-Ratings-Subset.csv', sep=';', encoding='latin-1')

In [2]:
df.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,277427,002542730X,10
1,277427,0061009059,9
2,277427,0316776963,8
3,277427,0345413903,10
4,277427,0380702843,8


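In [3]:
# build the per-user ISBN lists first, so the names used below are defined
users = df.groupby('User-ID')['ISBN'].apply(list)
comparison_user = users[98783]

print(set(comparison_user))
print(set(users[277427]))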

In [4]:
users = df.groupby('User-ID')['ISBN'].apply(list)
comparison_user = set(users[98783])
user_x = set(users[277427])

# books rated by user 277427 that the comparison user has not rated
z = user_x.difference(comparison_user)

list_x = list(z)
list_x

In [5]:
# create a list of rated ISBNs per user
users = df.groupby('User-ID')['ISBN'].apply(list)
comparison_user = set(users[98783])

dist_list = []
for user, isbns in users.items():
    other_user = set(isbns)
    union = comparison_user | other_user
    intersect = comparison_user & other_user
    jac_sim = len(intersect) / len(union)
    # Jaccard distance: 0.0 means identical sets, 1.0 means nothing in common
    distance = 1 - jac_sim
    dist_list.append(distance)

dist_list.sort()
print(dist_list)

[0.0, 0.6842105263157895, 0.7727272727272727, 0.8148148148148149, 0.8148148148148149, 0.8181818181818181, 0.8260869565217391, 0.8333333333333334, 0.8333333333333334, 0.8333333333333334, 0.84, 0.8571428571428572, 0.8695652173913043, 0.88, 0.88, 0.8846153846153846, 0.8888888888888888, 0.8888888888888888, 0.8947368421052632, 0.8947368421052632, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9047619047619048, 0.9047619047619048, 0.9047619047619048, 0.9047619047619048, 0.9090909090909091, 0.9090909090909091, 0.9117647058823529, 0.9117647058823529, 0.9142857142857143, 0.9142857142857143, 0.9148936170212766, 0.9166666666666666, 0.92, 0.92, 0.9210526315789473, 0.9230769230769231, 0.9230769230769231, 0.9230769230769231, 0.9285714285714286, 0.9285714285714286, 0.9302325581395349, 0.9322033898305084, 0.9333333333333333, 0.935483870967742, 0.9365079365079365, 0.9365079365079365, 0.9375, 0.9375, 0.9393939393939394, 0.9411764705882353, 0.9411764705882353, 0.9428571428571428, 0.9444444444444444, 0.9444444444444444, 0.94
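
Sorting the bare distances discards which user each distance belongs to. A minimal sketch of one way to keep that link is shown here, using a small hand-made dictionary in place of the `users` Series. The user IDs, the ISBN placeholders, and the name `comparison_id` are all hypothetical.

```python
# Hypothetical ISBN lists per user ID (stand-ins for the `users` Series)
users = {
    1: ["a", "b", "c", "d"],
    2: ["a", "b", "c"],
    3: ["c", "d", "e", "f"],
    4: ["x", "y"],
}
comparison_id = 1
comparison_set = set(users[comparison_id])

distances = []
for user_id, isbns in users.items():
    if user_id == comparison_id:
        continue  # skip the comparison user itself (distance 0.0)
    other = set(isbns)
    jac_sim = len(comparison_set & other) / len(comparison_set | other)
    distances.append((user_id, 1 - jac_sim))

# sort by distance, smallest (most similar) first
distances.sort(key=lambda pair: pair[1])
print(distances)
```

With `(user_id, distance)` pairs, the nearest neighbours can be read directly from the front of the list rather than being recovered through a threshold.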

### 2. Generating the recommendations

In [28]:
# user we want to check
user_id = 98783

def get_jaccard_recommendations(user_id):

    # create a list of rated ISBNs per user
    users = df.groupby('User-ID')['ISBN'].apply(list)

    comparison_user = set(users[user_id])

    new_content = set()
    similar_users = []

    for user, isbns in users.items():

        # calculate the Jaccard distance (1 - Jaccard index)
        other_user = set(isbns)
        union = comparison_user | other_user
        intersect = comparison_user & other_user
        distance = 1 - len(intersect) / len(union)

        # tweak this threshold: closer to 0.0 means more similar.
        # 0.0 is the selected user (or a user with an identical set), so skip it.
        if distance < 0.80 and distance != 0.0:

            # collect the ISBNs this user rated that the selected user has not
            new_content.update(other_user - comparison_user)
            # add the user to similar_users
            similar_users.append(user)

    # select the rating rows of the recommended books
    selected_books = df["ISBN"].isin(new_content)
    df_recommendations = df[selected_books]

    return df_recommendations

# display the recommendations
df_recommendations = get_jaccard_recommendations(user_id)
df_recommendations

Unnamed: 0,User-ID,ISBN,Book-Rating
95,278535,0446610038,10
266,1435,0345387651,5
456,3827,0345409671,8
458,3827,0345423402,7
552,5539,0345423402,8
...,...,...,...
40635,273113,0380820854,5
40773,274004,0345384466,9
40809,274061,0345423402,10
40830,274061,0451176464,10
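
The function returns every rating row for the candidate books, without any ordering. One possible way to rank the candidates is to count how many similar users rated each unseen book, sketched below on made-up sets. The user IDs, the ISBN placeholders, and the names `selected_user` and `votes` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical ISBN sets: the selected user and the users judged similar to them
selected_user = {"a", "b", "c"}
similar_users = {
    10: {"a", "b", "d", "e"},
    11: {"b", "c", "d"},
    12: {"a", "c", "d", "f"},
}

# count, for each unseen book, how many similar users rated it
votes = Counter()
for isbns in similar_users.values():
    votes.update(isbns - selected_user)

# books shared by the most similar users come first
ranked = votes.most_common()
print(ranked)
```

This is only one heuristic; weighting each vote by the user's Jaccard similarity or by the rating itself would be equally reasonable variations.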


In [19]:
# relative path, matching the ratings file loaded from ./data above
df_books = pd.read_csv('./data/BX-Books.csv', sep=';', encoding='latin-1', low_memory=False)


In [23]:
df_test = df_recommendations.merge(df_books, on='ISBN')



In [24]:
df_test

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,278535,0446610038,10,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
1,6543,0446610038,9,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
2,9417,0446610038,7,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
3,11676,0446610038,10,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
4,16795,0446610038,9,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
...,...,...,...,...,...,...,...,...,...,...
517,254206,0451176464,9,Gerald's Game,Stephen King,2001,Signet Book,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...
518,259629,0451176464,9,Gerald's Game,Stephen King,2001,Signet Book,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...
519,259901,0451176464,8,Gerald's Game,Stephen King,2001,Signet Book,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...
520,270820,0451176464,7,Gerald's Game,Stephen King,2001,Signet Book,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...,http://images.amazon.com/images/P/0451176464.0...


In [26]:
unique_df = df_test.drop_duplicates(subset=['ISBN'])
unique_df.head(10)

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,278535,0446610038,10,1st to Die: A Novel,James Patterson,2002,Warner Vision,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...,http://images.amazon.com/images/P/0446610038.0...
80,1435,0345387651,5,The Cider House Rules,John Irving,1994,Ballantine Books,http://images.amazon.com/images/P/0345387651.0...,http://images.amazon.com/images/P/0345387651.0...,http://images.amazon.com/images/P/0345387651.0...
118,3827,0345409671,8,"Memnoch the Devil (Vampire Chronicles, No 5)",Anne Rice,1997,Ballantine Books,http://images.amazon.com/images/P/0345409671.0...,http://images.amazon.com/images/P/0345409671.0...,http://images.amazon.com/images/P/0345409671.0...
137,3827,0345423402,7,A Kiss of Shadows (Meredith Gentry Novels (Pap...,LAURELL K. HAMILTON,2002,Ballantine Books,http://images.amazon.com/images/P/0345423402.0...,http://images.amazon.com/images/P/0345423402.0...,http://images.amazon.com/images/P/0345423402.0...
159,6073,0345427637,6,The Angel of Darkness,Caleb Carr,1998,Ballantine Books,http://images.amazon.com/images/P/0345427637.0...,http://images.amazon.com/images/P/0345427637.0...,http://images.amazon.com/images/P/0345427637.0...
180,10560,0345384466,8,The Witching Hour (Lives of the Mayfair Witches),ANNE RICE,1993,Ballantine Books,http://images.amazon.com/images/P/0345384466.0...,http://images.amazon.com/images/P/0345384466.0...,http://images.amazon.com/images/P/0345384466.0...
225,11676,0061097853,8,The First Eagle (Jim Chee Novels),Tony Hillerman,1999,HarperTorch,http://images.amazon.com/images/P/0061097853.0...,http://images.amazon.com/images/P/0061097853.0...,http://images.amazon.com/images/P/0061097853.0...
236,11676,0345409973,8,The Cobra Event,Richard Preston,1998,Ballantine Books,http://images.amazon.com/images/P/0345409973.0...,http://images.amazon.com/images/P/0345409973.0...,http://images.amazon.com/images/P/0345409973.0...
245,11676,0380820854,5,"To Sir Phillip, With Love",Julia Quinn,2003,Avon,http://images.amazon.com/images/P/0380820854.0...,http://images.amazon.com/images/P/0380820854.0...,http://images.amazon.com/images/P/0380820854.0...
256,11676,0425155404,8,Invasion,Robin Cook,1997,Berkley Publishing Group,http://images.amazon.com/images/P/0425155404.0...,http://images.amazon.com/images/P/0425155404.0...,http://images.amazon.com/images/P/0425155404.0...
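
Note that `drop_duplicates` keeps the first row per ISBN, so the `User-ID` and `Book-Rating` shown for each book are simply whichever rating happened to come first. As one possible alternative, the sketch below aggregates the ratings per book instead. The small DataFrame is made-up data that reuses the notebook's column names, and the output column names `n_ratings` and `mean_rating` are assumptions.

```python
import pandas as pd

# Made-up merged recommendations with the same column names as df_test
df_example = pd.DataFrame({
    "ISBN": ["0001", "0001", "0002", "0002", "0002"],
    "Book-Title": ["Book A", "Book A", "Book B", "Book B", "Book B"],
    "Book-Rating": [10, 8, 7, 9, 8],
})

# one row per book, with the number of ratings and their mean
summary = (
    df_example
    .groupby(["ISBN", "Book-Title"], as_index=False)
    .agg(n_ratings=("Book-Rating", "size"), mean_rating=("Book-Rating", "mean"))
    .sort_values("n_ratings", ascending=False)
)
print(summary)
```

Sorting by `n_ratings` puts the books rated most often among the similar users at the top, which may be a more useful order for display than the arbitrary first-row order.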


In [13]:
# create a dataframe where you get all the ratings from the selected user
df_user_ratings = df[df['User-ID'] == 98783]
df_user_ratings

Unnamed: 0,User-ID,ISBN,Book-Rating
15125,98783,0345313860,8
15126,98783,0345334531,3
15127,98783,0345337662,8
15128,98783,0345339703,9
15129,98783,0345351525,7
15130,98783,034538475X,8
15131,98783,0345397819,8
15132,98783,0553262505,8
15133,98783,0553583468,9
15134,98783,0553584375,9


In [14]:
df_test2 = df_user_ratings.merge(df_books, on='ISBN')
df_test2

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,98783,0345313860,8,"The Vampire Lestat (Vampire Chronicles, Book II)",ANNE RICE,1986,Ballantine Books,http://images.amazon.com/images/P/0345313860.0...,http://images.amazon.com/images/P/0345313860.0...,http://images.amazon.com/images/P/0345313860.0...
1,98783,0345334531,3,Feast of All Saints,Anne Rice,1986,Ballantine Books,http://images.amazon.com/images/P/0345334531.0...,http://images.amazon.com/images/P/0345334531.0...,http://images.amazon.com/images/P/0345334531.0...
2,98783,0345337662,8,Interview with the Vampire,Anne Rice,1993,Ballantine Books,http://images.amazon.com/images/P/0345337662.0...,http://images.amazon.com/images/P/0345337662.0...,http://images.amazon.com/images/P/0345337662.0...
3,98783,0345339703,9,The Fellowship of the Ring (The Lord of the Ri...,J.R.R. TOLKIEN,1986,Del Rey,http://images.amazon.com/images/P/0345339703.0...,http://images.amazon.com/images/P/0345339703.0...,http://images.amazon.com/images/P/0345339703.0...
4,98783,0345351525,7,The Queen of the Damned (Vampire Chronicles (P...,Anne Rice,1993,Ballantine Books,http://images.amazon.com/images/P/0345351525.0...,http://images.amazon.com/images/P/0345351525.0...,http://images.amazon.com/images/P/0345351525.0...
5,98783,034538475X,8,The Tale of the Body Thief (Vampire Chronicles...,Anne Rice,1993,Ballantine Books,http://images.amazon.com/images/P/034538475X.0...,http://images.amazon.com/images/P/034538475X.0...,http://images.amazon.com/images/P/034538475X.0...
6,98783,0345397819,8,Lasher: Lives of the Mayfair Witches (Lives of...,Anne Rice,1995,Ballantine Books,http://images.amazon.com/images/P/0345397819.0...,http://images.amazon.com/images/P/0345397819.0...,http://images.amazon.com/images/P/0345397819.0...
7,98783,0553262505,8,"A Wizard of Earthsea (Earthsea Trilogy, Book 1)",URSULA K. LE GUIN,1984,Bantam,http://images.amazon.com/images/P/0553262505.0...,http://images.amazon.com/images/P/0553262505.0...,http://images.amazon.com/images/P/0553262505.0...
8,98783,0553583468,9,"Whisper of Evil (Hooper, Kay. Evil Trilogy.)",Kay Hooper,2002,Bantam Books,http://images.amazon.com/images/P/0553583468.0...,http://images.amazon.com/images/P/0553583468.0...,http://images.amazon.com/images/P/0553583468.0...
9,98783,0553584375,9,No One to Trust,IRIS JOHANSEN,2003,Bantam,http://images.amazon.com/images/P/0553584375.0...,http://images.amazon.com/images/P/0553584375.0...,http://images.amazon.com/images/P/0553584375.0...
