# 📚 Recommendations based on Frequently Reviewed Together (association rules)
For the final part of this assignment, turn to section 5.4 of the Practical Recommender Systems book (pp. 113-127). Read it and [download](https://www.manning.com/downloads/1927) the code that accompanies the book. Explore `association_rules_calculator.py` in the `builder` directory and translate it to this notebook. Falk uses a different infrastructure, but the code is fairly simple to adapt. The guidelines below should speed up the process.

The steps found in the source code are:
1. Opening the data
2. Generating the transactions or, in our case, reviews
3. Calculating the support and confidence
4. Saving the results

### 1. Opening the data
Since we are using `.csv` files rather than a database, we can load them into a dataframe. Decide which columns are necessary: we are looking for cases where user A reviewed both book x and book y.
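
As a minimal sketch of that decision, assuming only the user and the book identifier matter for "reviewed together", the frame can be reduced to two columns. The toy values below are made up for illustration; the column names follow `ratings.csv`.

```python
import pandas as pd

# Toy stand-in for ratings.csv; the values are invented for illustration.
toy_ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 2],
    "ISBN": ["A", "B", "A", "B", "B"],
    "Book-Rating": [7, 8, 5, 9, 9],
})

# Keep only who reviewed what, and drop repeated (user, book) pairs.
pairs = toy_ratings[["User-ID", "ISBN"]].drop_duplicates()
print(pairs)
```

The rating value itself is not used by the association rules in this notebook, so dropping it here loses nothing for this step.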

In [1]:
import pandas as pd

In [2]:
# loading the data
ratings = pd.read_csv('/Users/ekaterinamazur/PycharmProjects/INFOMPPM1/Week 01/data/ratings.csv')

In [6]:
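# count the number of reviews per user
df_users = ratings.groupby('User-ID')['ISBN'].count().reset_index(name='counts')
# select users with more than 100 and fewer than 200 reviews
users = list(df_users[(df_users['counts'] > 100) & (df_users['counts'] < 200)]['User-ID'])

# display the ratings of those users (the result is not assigned, so `ratings` is unchanged)
ratings.loc[ratings['User-ID'].isin(users)]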

# ratings[ratings['User-ID'] == 104636]

Unnamed: 0.1,Unnamed: 0,User-ID,ISBN,Book-Rating
1406,78074,16795,0060930535,4
1407,78080,16795,0060976845,7
1408,78087,16795,0061009059,9
1409,78130,16795,0062502182,8
1410,78192,16795,0140244824,6
...,...,...,...,...
1514,80844,16795,1558744150,8
1515,80846,16795,155874424X,8
1516,80848,16795,1558745017,7
1517,80901,16795,1573221112,8


### 2. Generating the reviews
What we want is a list of lists, where each inner list holds the books reviewed by one user. In the shopping-list example, the output was
`[['eggs','milk','bread'], ['bacon', 'bread'], [...], [...]]`
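
A small sketch of how such a list of lists can be produced from a long table, using the shopping-list items above. The frame `toy` and its column name `item` are hypothetical; in this notebook the corresponding column is `ISBN`.

```python
import pandas as pd

# Hypothetical long-format table: one row per (user, item).
toy = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2],
    "item": ["eggs", "milk", "bread", "bacon", "bread"],
})

# Collect the items of each user into one list, then gather all lists.
transactions = toy.groupby("User-ID")["item"].apply(list).tolist()
print(transactions)  # [['eggs', 'milk', 'bread'], ['bacon', 'bread']]
```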

In [8]:
user_reviews = ratings.groupby('User-ID')['ISBN'].apply(list)
reviewed = user_reviews.values.tolist()

[['0140230165',
  '0553564528',
  '0553575538',
  '055357695X',
  '0553583441',
  '0553583468'],
 ['0316666343',
  '0316779423',
  '0316789089',
  '0345339711',
  '0385420161',
  '0670892963',
  '0671042556',
  '0679746048'],
 ['0060256672',
  '0140077022',
  '0452282152',
  '0553268449',
  '0671003755',
  '0877733759'],
 ['0373218400',
  '0373484224',
  '0517556278',
  '0553250426',
  '067101420X',
  '0743457943'],
 ['0060934417',
  '0060976845',
  '0394758285',
  '0446672211',
  '0446673544',
  '0449149676',
  '0679772677'],
 ['0060930535',
  '0099771519',
  '0142001430',
  '0151002290',
  '0156001314',
  '0767902521'],
 ['0399133143',
  '0446672211',
  '0517149257',
  '0671024248',
  '0971880107',
  '155874262X'],
 ['0345353145',
  '0449221482',
  '0553279912',
  '0553348973',
  '0670892963',
  '0671023616'],
 ['0140386645',
  '0142000663',
  '0451456718',
  '051513628X',
  '0553573136',
  '0671524313',
  '0765342987',
  '0812575717',
  '0836204387',
  '0836218787',
  '0836220889',


In [9]:
# This code originates from the book Practical Recommender Systems,
# with some minor tweaks to make it work with the current dataset.

from collections import defaultdict
from itertools import combinations
from datetime import datetime

def calculate_itemsets_one(reviewed, min_sup=0.01):
    N = len(reviewed)
    print(N)
    temp = defaultdict(int)
    one_itemsets = dict()

    for items in reviewed:
        for item in items:
            inx = frozenset({item})
            temp[inx] += 1

    print("temp:")
    i = 0
    # keep only the items whose count exceeds the minimum support threshold
    for key, itemset in temp.items():
        #print(f"{key}, {itemset}, {min_sup}, {min_sup * N}")
        if itemset > min_sup * N:
            i = i + 1
            one_itemsets[key] = itemset
    print(i)
    return one_itemsets

def calculate_itemsets_two(reviewed, one_itemsets):
    two_itemsets = defaultdict(int)

    for items in reviewed:
        items = list(set(items))  # remove duplicates

        # combinations() yields nothing for fewer than two items,
        # so no separate length check is needed
        for pair in combinations(items, 2):
            if has_support(pair, one_itemsets):
                two_itemsets[frozenset(pair)] += 1
    return two_itemsets

def calculate_association_rules(one_itemsets, two_itemsets, N):
    timestamp = datetime.now()

    rules = []
    for source, source_freq in one_itemsets.items():
        for key, group_freq in two_itemsets.items():
            if source.issubset(key):
                target = key.difference(source)
                support = group_freq / N
                confidence = group_freq / source_freq
                rules.append((timestamp, next(iter(source)), next(iter(target)),
                              confidence, support))
    return rules

def has_support(perm, one_itemsets):
    # both items of the pair must be frequent on their own
    return (frozenset({perm[0]}) in one_itemsets
            and frozenset({perm[1]}) in one_itemsets)

min_sup = 0.01
N = len(reviewed)

### 3. Calculating the support and confidence
This takes some puzzling, but the source code gives a clear idea of the approach. You can reuse its subroutines and pass them the list of reviews grouped per user. Experiment with the _minimum support_ parameter: a stricter (higher) threshold results in fewer associations.
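
To make the two measures concrete, here is a small worked sketch on made-up shopping lists. It follows the same definitions as the code above: support is the number of lists containing both items divided by the number of lists, and confidence is that same pair count divided by the count of the source item. The helper names `support` and `confidence` are illustrative and do not come from the book.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions for illustration.
transactions = [
    ["eggs", "milk", "bread"],
    ["bacon", "bread"],
    ["eggs", "bread"],
    ["milk"],
]
N = len(transactions)

# How many transactions contain each item, and each unordered pair of items.
item_counts = Counter(item for t in transactions for item in set(t))
pair_counts = Counter(
    frozenset(pair)
    for t in transactions
    for pair in combinations(sorted(set(t)), 2)
)

def support(a, b):
    # Fraction of all transactions that contain both a and b.
    return pair_counts[frozenset((a, b))] / N

def confidence(source, target):
    # Of the transactions containing source, the fraction that also contain target.
    return pair_counts[frozenset((source, target))] / item_counts[source]

print(support("eggs", "bread"))      # 0.5
print(confidence("eggs", "bread"))   # 1.0
print(confidence("bread", "eggs"))   # 2 of the 3 bread lists also contain eggs
```

Note that support is symmetric while confidence is not: every list with eggs also has bread, but only two of the three lists with bread have eggs.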

In [11]:
one_itemsets = calculate_itemsets_one(reviewed, min_sup)
two_itemsets = calculate_itemsets_two(reviewed, one_itemsets)
rules = calculate_association_rules(one_itemsets, two_itemsets, N)

# check how many associations are made
len(rules)

1432
temp:
413


96940

### 4. Saving the results
Create a dataframe from the results of step 3. To make it work with the current app, make sure the columns are `source;target;support;confidence`. Save the recommendations as `recommendations-seeded-associations.csv` and replace the file in the app directory.
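
One way to check the column layout before replacing the file is to write to an in-memory buffer and inspect the header line. This is only a sketch: `toy_rules` is a hypothetical stand-in that mimics the tuple layout produced in step 3, `(timestamp, source, target, confidence, support)`, with made-up numbers.

```python
import io
import pandas as pd

# Hypothetical rules: (timestamp, source, target, confidence, support).
toy_rules = [
    (None, "0446672211", "0060976845", 0.5, 0.02),
    (None, "0060976845", "0446672211", 0.25, 0.02),
]

# Passing `columns` fixes the order regardless of the dict key order.
df = pd.DataFrame(
    [{"source": r[1], "target": r[2], "support": r[4], "confidence": r[3]}
     for r in toy_rules],
    columns=["source", "target", "support", "confidence"],
)

# Write to a buffer instead of a file and look at the header line.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=";")
header = buffer.getvalue().splitlines()[0]
print(header)  # source;target;support;confidence
```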

In [12]:
associations = []

# each rule is a tuple: (timestamp, source, target, confidence, support)
for rule in rules:
    association = {
        'source': str(rule[1]),
        'target': str(rule[2]),
        'support': rule[4],
        'confidence': rule[3]
    }
    # append to list
    associations.append(association)

# create dataframe with the column order the app expects
df_associations = pd.DataFrame(associations, columns=['source', 'target', 'support', 'confidence'])

df_associations.to_csv('recommendations-seeded-associations.csv', index=False, sep=';')