# Recommendations based on Jaccard index
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets. 

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

Your objective is to create a Python function that utilizes the Jaccard index to find and recommend content from users who share similar interests. You will embed this function to the streamlit app


### 1. Loading the data
Load a subset from the ratings dataset.

In [1]:
import pandas as pd
df = pd.read_csv('./data/BX-Book-Ratings-Subset.csv', sep=';', encoding='latin-1')

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


### 2. Generating the recommendations

In [4]:
import itertools

# user we want to check
id = 98783

def get_jaccard_recommendations(id):
  # create lists per user
  users = df.groupby('User-ID')['ISBN'].apply(list)
  
  new_content = []
  similar_users = []
  
  for user, value in users.items():
    a = set(users[id])
    b = set(users[user])
    new = b.difference(a)

    j = float(len(a.intersection(b))) / len(a.union(b))
    
    # tweak this parameter. Closer to 0.0 is more the same. 0.0 is the user.
    if j < 0.8 and j != 0.0:
      new_content.append(new)
      similar_users.append(user)

  # flatten the list with the sets
  new_content = list(itertools.chain(*new_content))

  df_recommendations = df[df['User-ID'].isin(similar_users) & df['ISBN'].isin(new_content)]

  df_recommendations.sort_values('Book-Rating', ascending=False)

  return df_recommendations

get_jaccard_recommendations(id)

Unnamed: 0,User-ID,ISBN,Book-Rating
290,2033,0060248025,10
291,2033,0060256737,10
292,2033,0140386645,8
293,2033,0142000663,10
295,2033,0439064864,9
...,...,...,...
41054,276050,0553279912,7
41055,276050,0553377868,7
41056,276050,0671021001,9
41057,276050,067102423X,8
