# 📚 Recommendations based on Frequently Reviewed Together (association rules)
For the final segment of this assignment, read section 5.4 of the _Practical Recommender Systems_ book (pages 113-127). Then download the code provided with the book and focus on `association_rules_calculator.py` in the `builder` directory. Your task is to adapt this code for use in this notebook, translating its steps into a format suitable for our environment.

The steps found in the source code are:
1. Load the data
2. Generate transactions or, in our case, reviews
3. Calculate the support and confidence
4. Save the results

### 1. Load the data
Instead of using a database, load your `.csv` files into a dataframe. Select the data necessary for identifying which user reviewed which books.


In [1]:
import pandas as pd
df = pd.read_csv('data/ratings_subset.csv', sep=';', encoding='latin-1')
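
A minimal sketch of the column selection, assuming the ratings file uses the column names `User-ID` and `ISBN` (the names the next step groups on). The inline `sample_df` and its `Book-Rating` column are hypothetical stand-ins for the loaded file, so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe loaded from the ratings CSV.
sample_df = pd.DataFrame({
    'User-ID': [1, 1, 2, 2, 2, 1],
    'ISBN': ['A', 'B', 'A', 'C', 'C', 'A'],
    'Book-Rating': [5, 0, 8, 7, 7, 5],
})

# Keep only the columns that say who reviewed what,
# and drop repeated (user, book) pairs.
user_books = sample_df[['User-ID', 'ISBN']].drop_duplicates().reset_index(drop=True)
print(user_books)
```

Dropping duplicate pairs here is a design choice, not a requirement: it keeps a book from being counted twice for the same user when item frequencies are calculated later.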

### 2. Generating the reviews
In this context, the transactions are the reviews. You need to compile a list of lists, where each inner list contains the books (ISBNs) reviewed by the same user, similar to how items bought together are grouped into shopping lists in the example: `[['eggs','milk','bread'], ['bacon', 'bread'], [...], [...]]`

In [None]:
grouped_reviews = df.groupby(['User-ID'])
reviews_list = grouped_reviews['ISBN'].apply(list)

# turn the Series of per-user lists into a plain list of lists
reviews = [rs for rs in reviews_list]
# or: reviews = reviews_list.values.tolist()
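
To see what the grouping produces, here is a small sketch on made-up data that reuses the shopping-list items from the example above. The `toy` frame and its user ids are hypothetical; only the `groupby` pattern mirrors the cell above.

```python
import pandas as pd

# Hypothetical toy data: three users, items named after the shopping example.
toy = pd.DataFrame({
    'User-ID': [10, 10, 10, 20, 20, 30],
    'ISBN': ['eggs', 'milk', 'bread', 'bacon', 'bread', 'milk'],
})

# One list of items per user; groupby sorts the group keys by default,
# so the users appear in ascending id order.
toy_reviews = toy.groupby('User-ID')['ISBN'].apply(list).tolist()
print(toy_reviews)
# [['eggs', 'milk', 'bread'], ['bacon', 'bread'], ['milk']]
```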

### 3. Calculate the support and confidence
This step takes some puzzling, but the source code gives a clear idea of what is needed. You can reuse the subroutines from the source code and pass them the list of reviews that belong together. Experiment with the _minimum support_ parameter: a value that is too strict (too high) filters out most books and leaves few or no associations.
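
As a worked example of the two measures, the sketch below computes them by hand for one rule on a made-up set of transactions. It follows the definitions used in the adapted code: support is the pair count divided by the number of transactions, and confidence is the pair count divided by the count of the source item. The transactions and the helper `count_containing` are hypothetical.

```python
# Hypothetical toy transactions, in the style of the shopping example.
transactions = [
    ['eggs', 'milk', 'bread'],
    ['bacon', 'bread'],
    ['milk', 'bread'],
    ['eggs', 'bacon'],
]
n = len(transactions)

def count_containing(items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if set(items) <= set(t))

# Rule: milk -> bread
pair_count = count_containing(['milk', 'bread'])    # 2 transactions hold both
support = pair_count / n                            # 2 / 4
confidence = pair_count / count_containing(['milk'])  # 2 / 2
print(support, confidence)
# 0.5 1.0
```

Every transaction with milk also contains bread, so the confidence is 1.0, while the pair itself shows up in only half of all transactions.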

In [31]:
# adapted code from book
from datetime import datetime
from collections import defaultdict
from itertools import combinations

def calculate_association_rules(one_itemsets, two_itemsets, N):
    timestamp = datetime.now()

    rules = []
    for source, source_freq in one_itemsets.items():
        for key, group_freq in two_itemsets.items():
            if source.issubset(key):
                target = key.difference(source)
                support = group_freq / N
                confidence = group_freq / source_freq
                rules.append((timestamp, next(iter(source)), next(iter(target)),
                              confidence, support))
    return rules

def calculate_itemsets_one(transactions, min_sup=0.01):

    N = len(transactions)

    temp = defaultdict(int)
    one_itemsets = dict()

    for items in transactions:
        for item in items:
            inx = frozenset({item})
            temp[inx] += 1

    # keep only the items whose count exceeds the minimum support threshold
    for key, count in temp.items():
        if count > min_sup * N:
            one_itemsets[key] = count

    return one_itemsets

def calculate_itemsets_two(transactions, one_itemsets):
    two_itemsets = defaultdict(int)

    for items in transactions:
        items = list(set(items))  # remove duplications

        # combinations() yields nothing for fewer than two items,
        # so no separate length check is needed
        for pair in combinations(items, 2):
            if has_support(pair, one_itemsets):
                two_itemsets[frozenset(pair)] += 1
    return two_itemsets

def has_support(perm, one_itemsets):
    return frozenset({perm[0]}) in one_itemsets and \
           frozenset({perm[1]}) in one_itemsets

In [49]:
def build_association_rules(reviews, min_support=0.01):
    n = len(reviews)
    one_itemsets = calculate_itemsets_one(reviews, min_support)
    two_itemsets = calculate_itemsets_two(reviews, one_itemsets)
    rules = calculate_association_rules(one_itemsets, two_itemsets, n)
    return rules

rules = build_association_rules(reviews, 0.01)
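
To get a feel for the minimum support parameter, it can help to sweep a few values and count the resulting rules. The sketch below does this on made-up data. `count_rules` is a hypothetical, simplified stand-in for `build_association_rules`: it only counts rules, and it deduplicates items within a transaction before counting, which the adapted code does not do for single items.

```python
from collections import Counter
from itertools import combinations

def count_rules(transactions, min_sup):
    """Count the pairwise rules that survive a minimum support threshold."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for b in baskets for item in b)
    # Same strict comparison as the adapted code: count > min_sup * N.
    frequent = {item for item, c in item_counts.items() if c > min_sup * n}
    pairs = Counter(
        frozenset(p)
        for b in baskets
        for p in combinations(sorted(b & frequent), 2)
    )
    # Each surviving pair {a, b} yields two rules: a -> b and b -> a.
    return 2 * len(pairs)

# Hypothetical toy transactions.
toy_transactions = [
    ['a', 'b', 'c'],
    ['a', 'b'],
    ['a', 'c'],
    ['b', 'd'],
    ['a'],
]

results = {s: count_rules(toy_transactions, s) for s in (0.1, 0.3, 0.5, 0.7)}
print(results)
# {0.1: 8, 0.3: 6, 0.5: 2, 0.7: 0}
```

On the real data the same kind of sweep can be run with `len(build_association_rules(reviews, s))` for a handful of thresholds `s`.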

### 4. Save the results
Create a dataframe from the results of step 3. To make it work with the current app, make sure the columns are `source;target;support;confidence`. Save the recommendations as `recommendations-seeded-associations.csv` and replace the file in the app directory.

In [50]:
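# each rule tuple is (timestamp, source, target, confidence, support)
rules_df = pd.DataFrame(rules, columns=['timestamp', 'source', 'target', 'confidence', 'support'])

# drop the timestamp and put the columns in the order the app expects
rules_df = rules_df[['source', 'target', 'support', 'confidence']]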

# save the dataframe
rules_df.to_csv('app/recommendations/recommendations-seeded-associations.csv', sep=';', index=False)
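
Before replacing the file in the app directory, a quick check of the header can catch a wrong column order or separator. The sketch below writes a small hypothetical frame to an in-memory buffer instead of the real path; `check_df` and its values are made up, and how the app itself reads the file is not shown here.

```python
import io
import pandas as pd

# Hypothetical rules in the expected column layout.
check_df = pd.DataFrame({
    'source': ['X1', 'X2'],
    'target': ['X2', 'X1'],
    'support': [0.02, 0.02],
    'confidence': [0.4, 0.5],
})

# Write with the same options as the real export, but to memory.
buffer = io.StringIO()
check_df.to_csv(buffer, sep=';', index=False)

# The first line of the output is the header the app will see.
header = buffer.getvalue().splitlines()[0]
print(header)
# source;target;support;confidence

# Reading the text back with the same separator should restore the columns.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), sep=';')
```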