## <center>Association Rules </center>
### <center>Jupyter Notebook</center>

## What are Association Rules?
<img width="500" src="images/marketbasketanalysis.png" />

## Problem to solve:
- "What do my customers buy?"
- "Which products are bought together?"

## Goal:
- Find <strong>associations</strong> and <strong>correlations</strong> between the different items that customers place in their shopping baskets

## Key figures example:
<img width="500" src="images/example.png" />

```text
Support(B)      = (Transactions containing B) / (Total transactions)
Confidence(A→B) = (Transactions containing both A and B) / (Transactions containing A)
Lift(A→B)       = Confidence(A→B) / Support(B)
```
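
To make the three key figures concrete, here is a minimal sketch that computes them on a tiny, made-up list of baskets. The transactions and the helper names (`support`, `confidence`, `lift`) are illustrative assumptions and are not part of the dataset or of any library.

```python
# Toy baskets, invented for illustration only
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def support(itemset):
    """Share of all transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(a, b):
    """Share of the transactions containing `a` that also contain `b`."""
    return support(a | b) / support(a)

def lift(a, b):
    """Confidence of a -> b relative to how common `b` is overall."""
    return confidence(a, b) / support(b)

a, b = {"bread"}, {"butter"}
print(support(b))        # 2 of 4 baskets -> 0.5
print(confidence(a, b))  # 2 of the 3 bread baskets -> 0.666...
print(lift(a, b))        # 0.666... / 0.5 -> 1.333...
```

A lift above 1 suggests that the two items appear together more often than they would if they were bought independently.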

## Let's write the code:
- Read the corresponding blog article: https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/
### Install the mlxtend library from the Anaconda Prompt
- https://anaconda.org/conda-forge/mlxtend



- import numpy and pandas under their conventional names, plus the mlxtend functions we need

In [1]:
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

- load the dataset
- store it as a pandas DataFrame in a variable

In [2]:
data = pd.read_csv('store_data.csv', header=None)

# header=None tells pandas that the file has no header row.
# Otherwise the first shopping basket would be used as the column names.

# If the columns in your dataset are separated by ";", pass the parameter
# sep=";" to pd.read_csv

- use the .head() method to see what the data looks like

In [3]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


- each row is one shopping basket
- a basket with fewer than 20 items is padded with NaN in the remaining columns

In [4]:
data.shape

(7501, 20)

- There are 7501 shopping baskets, and a basket holds at most 20 items (the 20 columns are item slots, not 20 different products)

### Data Preprocessing
- Read more : http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
- The TransactionEncoder, which prepares the data for the Apriori algorithm, expects the dataset as a list of lists
- we have to transform our DataFrame into a list of lists
- records will be our outer list
- we iterate over every shopping basket in the dataset and append it to the list

In [5]:
records = []

# str() turns every cell into a string, so missing values (NaN) become the item 'nan'
for i in range(data.shape[0]):
    records.append([str(data.values[i, j]) for j in range(data.shape[1])])
print("This is the first small list in the big List:    ",records[0]) 

This is the first small list in the big List:     ['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
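
The loop above converts missing values into the string 'nan', which is why a 'nan' column has to be dropped later. As an alternative, here is a minimal sketch that skips missing values while building the lists. The helper name `basket_items` and the sample rows are assumptions made for illustration; with the notebook's DataFrame, the rows could come from `data.values.tolist()`.

```python
import math

def basket_items(row):
    """Return the non-missing entries of one basket row as strings."""
    return [
        str(value)
        for value in row
        if not (isinstance(value, float) and math.isnan(value))
    ]

# Two made-up rows shaped like the padded rows of the dataset
rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]

clean_records = [basket_items(row) for row in rows]
print(clean_records)  # [['burgers', 'meatballs', 'eggs'], ['chutney']]
```

If the records were built this way, no 'nan' item would reach the encoder, so the later step that drops the 'nan' column would have to be removed as well.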


- now we have a big list containing many smaller lists
- we have to one-hot encode it: one row per basket, one Boolean column per item

In [6]:
te = TransactionEncoder() # -> we create an instance of the TransactionEncoder
# read more: http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/

te_ary = te.fit(records).transform(records) # -> we fit te on the big list and then transform the list

te_ary # -> this variable contains the encoded Boolean array

array([[False,  True,  True, ...,  True, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False,  True, False]])
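
To show what this encoding step does conceptually, here is a minimal pure-Python sketch. It is not mlxtend's implementation; the function name `one_hot` and the toy baskets are assumptions. The columns are sorted alphabetically, which appears to match the order of the item names listed by the encoder.

```python
def one_hot(transactions):
    """Encode baskets as rows of Booleans, one column per unique item."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        rows.append([item in present for item in columns])
    return columns, rows

# Toy baskets, invented for illustration only
baskets = [["eggs", "burgers"], ["milk"], ["eggs"]]
columns, rows = one_hot(baskets)

print(columns)  # ['burgers', 'eggs', 'milk']
for row in rows:
    print(row)
```

Each row answers, for every known item, the question "is this item in the basket?".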

- let's view True/False as 1/0 (the result is only displayed, te_ary itself stays Boolean)

In [7]:
te_ary.astype("int")

array([[0, 1, 1, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 1, 0]])

- the TransactionEncoder instance has an attribute, columns_, containing the unique item names

In [8]:
te.columns_

[' asparagus',
 'almonds',
 'antioxydant juice',
 'asparagus',
 'avocado',
 'babies food',
 'bacon',
 'barbecue sauce',
 'black tea',
 'blueberries',
 'body spray',
 'bramble',
 'brownies',
 'bug spray',
 'burger sauce',
 'burgers',
 'butter',
 'cake',
 'candy bars',
 'carrots',
 'cauliflower',
 'cereals',
 'champagne',
 'chicken',
 'chili',
 'chocolate',
 'chocolate bread',
 'chutney',
 'cider',
 'clothes accessories',
 'cookies',
 'cooking oil',
 'corn',
 'cottage cheese',
 'cream',
 'dessert wine',
 'eggplant',
 'eggs',
 'energy bar',
 'energy drink',
 'escalope',
 'extra dark chocolate',
 'flax seed',
 'french fries',
 'french wine',
 'fresh bread',
 'fresh tuna',
 'fromage blanc',
 'frozen smoothie',
 'frozen vegetables',
 'gluten free bar',
 'grated cheese',
 'green beans',
 'green grapes',
 'green tea',
 'ground beef',
 'gums',
 'ham',
 'hand protein bar',
 'herb & pepper',
 'honey',
 'hot dogs',
 'ketchup',
 'light cream',
 'light mayo',
 'low fat yogurt',
 'magazines',
 'mashe
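
The listing above contains both ' asparagus' (with a leading space) and 'asparagus', so the same product is counted as two different items. A minimal sketch of one possible fix is to strip whitespace from every item name before encoding. The helper name `strip_item_names` and the sample data are assumptions; the notebook itself keeps both columns.

```python
def strip_item_names(transactions):
    """Remove leading and trailing whitespace from every item name."""
    return [[item.strip() for item in basket] for basket in transactions]

# Made-up baskets that reproduce the duplicate
raw = [[" asparagus", "eggs"], ["asparagus"]]
cleaned = strip_item_names(raw)

unique_before = {item for basket in raw for item in basket}
unique_after = {item for basket in cleaned for item in basket}

print(sorted(unique_before))  # [' asparagus', 'asparagus', 'eggs']
print(sorted(unique_after))   # ['asparagus', 'eggs']
```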

- let's create a new DataFrame from the encoded array
- delete the 'nan' column, which is not a real item

In [9]:
processedData = pd.DataFrame(te_ary, columns=te.columns_)
processedData = processedData.drop(['nan'], axis=1)
processedData.head(5)

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,True,True,False,True,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False


# Let's find frequent itemsets

- Read more about the Apriori algorithm: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

In [10]:
frequent_itemsets = apriori(processedData, min_support=0.005, use_colnames=True)

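# note: .head(10) runs first, so this sorts only the first 10 itemsets;
# use .sort_values(...).head(10) to get the 10 itemsets with the highest support
frequent_itemsets.head(10).sort_values(by=['support'], ascending=False)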

| | support | itemsets |
|---|---|---|
| 8 | 0.033729 | (brownies) |
| 2 | 0.033329 | (avocado) |
| 0 | 0.020397 | (almonds) |
| 5 | 0.014265 | (black tea) |
| 7 | 0.011465 | (body spray) |
| 4 | 0.010799 | (barbecue sauce) |
| 6 | 0.009199 | (blueberries) |
| 1 | 0.008932 | (antioxydant juice) |
| 3 | 0.008666 | (bacon) |
| 9 | 0.008666 | (bug spray) |


- this DataFrame now contains all itemsets with at least 0.5% support
- You can change the support threshold with the min_support parameter
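
To illustrate what "at least min_support" means, here is a minimal brute-force sketch that counts every itemset of up to two items and keeps the frequent ones. It is not the Apriori algorithm itself: Apriori avoids enumerating all combinations by using the fact that every subset of a frequent itemset must also be frequent. The function name, the `max_len` parameter, and the toy baskets are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, min_support, max_len=2):
    """Return {itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }

# Toy baskets, invented for illustration only
baskets = [
    ["eggs", "burgers"],
    ["eggs", "milk"],
    ["eggs", "burgers", "milk"],
    ["chutney"],
]

result = frequent_itemsets_bruteforce(baskets, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), supp)
```

With min_support=0.5, 'chutney' (1 of 4 baskets) and the pair of burgers and milk (1 of 4 baskets) are dropped, while 'eggs' (3 of 4 baskets) stays.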

# Let's find some rules

In [11]:
rules = association_rules(frequent_itemsets, min_threshold=0.01, metric="support") # keep only rules with at least 1% support
rules.head(10)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (mineral water) | (avocado) | 0.238368 | 0.033329 | 0.011598 | 0.048658 | 1.459926 | 0.003654 | 1.016113 |
| 1 | (avocado) | (mineral water) | 0.033329 | 0.238368 | 0.011598 | 0.348 | 1.459926 | 0.003654 | 1.168147 |
| 2 | (cake) | (burgers) | 0.081056 | 0.087188 | 0.011465 | 0.141447 | 1.622319 | 0.004398 | 1.063198 |
| 3 | (burgers) | (cake) | 0.087188 | 0.081056 | 0.011465 | 0.131498 | 1.622319 | 0.004398 | 1.05808 |
| 4 | (chocolate) | (burgers) | 0.163845 | 0.087188 | 0.017064 | 0.10415 | 1.194537 | 0.002779 | 1.018933 |
| 5 | (burgers) | (chocolate) | 0.087188 | 0.163845 | 0.017064 | 0.195719 | 1.194537 | 0.002779 | 1.03963 |
| 6 | (burgers) | (eggs) | 0.087188 | 0.179709 | 0.028796 | 0.330275 | 1.83783 | 0.013128 | 1.224818 |
| 7 | (eggs) | (burgers) | 0.179709 | 0.087188 | 0.028796 | 0.160237 | 1.83783 | 0.013128 | 1.086988 |
| 8 | (french fries) | (burgers) | 0.170911 | 0.087188 | 0.021997 | 0.128705 | 1.476173 | 0.007096 | 1.04765 |
| 9 | (burgers) | (french fries) | 0.087188 | 0.170911 | 0.021997 | 0.252294 | 1.476173 | 0.007096 | 1.108844 |


- the default metric is "confidence"
- change it with the metric parameter, e.g. metric="support"; min_threshold then applies to the chosen metric
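
As a sanity check on the rules table, here is a minimal sketch that recomputes the derived columns for the rule (burgers) -> (eggs) from its three support values. The function name `rule_metrics` is an assumption. Confidence and lift follow the formulas from the key figures section; leverage and conviction use their standard definitions.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Derive rule metrics from the supports of A, C, and A together with C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    # Difference between observed co-occurrence and the one expected under independence
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Values copied from the (burgers) -> (eggs) row of the rules table
metrics = rule_metrics(support_a=0.087188, support_c=0.179709, support_ac=0.028796)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Up to rounding of the inputs, the results agree with the table: confidence about 0.33, lift about 1.84, leverage about 0.013, and conviction about 1.22.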