# Association rules

### Usage:
1. How likely is one to buy bread if (s)he bought milk & eggs?
2. Product placement optimization
3. Product recommendations

### Association rule:
If one buys eggs, the probability of buying bread is __ %.


### Advantages:
1. Fast
2. Works with relatively small amounts of data
3. Little, if any, feature engineering

## Feature engineering
Feature engineering is the process of transforming raw data into predictive features that fit the requirements (and / or improve the performance) of a machine learning model.

### Ways to measure association
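1. Support - the fraction of transactions in the dataset that contain an item (or itemset).
2. Confidence - the probability of seeing the "then" term, given that the transaction contains the "if" term.
3. Lift - how much more often the "if" and "then" terms occur together than they would if they were independent: lift > 1 - likely, lift < 1 - unlikely, lift = 1 - no association between items.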
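
The three measures can also be written as formulas. The notation is introduced here for illustration and is not part of the original notes: $A \rightarrow B$ is the rule "if $A$ then $B$", $N$ is the total number of transactions, and $\mathrm{count}(X)$ is the number of transactions containing $X$.

$$
\mathrm{support}(A) = \frac{\mathrm{count}(A)}{N}
\qquad
\mathrm{confidence}(A \rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
\qquad
\mathrm{lift}(A \rightarrow B) = \frac{\mathrm{confidence}(A \rightarrow B)}{\mathrm{support}(B)}
$$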

Example:
5000 transactions occurred during a month.
500 of them included bread.
350 of them included eggs.
150 of them included both bread & eggs.

support(bread) = 500 / 5000 = 10%
confidence(bread -> eggs) = 150 / 500 = 30%
lift(bread -> eggs) = 0.3 / (350 / 5000) = 0.3 / 0.07 = 4.29

With a lift of 4.29 (greater than 1), the rule is: if a customer buys bread, (s)he is likely to buy eggs.
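
The same arithmetic can be reproduced in a few lines of plain Python. This is only a sketch of the bread/eggs example: the counts come from the example, while the helper names (`support`, `confidence`, `lift`) are made up for illustration and are not mlxtend functions.

```python
# Counts from the bread/eggs example.
n_transactions = 5000
n_bread = 500
n_eggs = 350
n_both = 150


def support(count, total):
    """Fraction of all transactions that contain the item(set)."""
    return count / total


def confidence(count_both, count_if):
    """Share of the "if" transactions that also contain the "then" term."""
    return count_both / count_if


def lift(conf, support_then):
    """Confidence relative to how common the "then" term is overall."""
    return conf / support_then


support_bread = support(n_bread, n_transactions)
support_eggs = support(n_eggs, n_transactions)

conf_bread_eggs = confidence(n_both, n_bread)
lift_bread_eggs = lift(conf_bread_eggs, support_eggs)

# The reverse rule has a different confidence but the same lift.
conf_eggs_bread = confidence(n_both, n_eggs)
lift_eggs_bread = lift(conf_eggs_bread, support_bread)

print(f"support(bread)            = {support_bread:.0%}")
print(f"confidence(bread -> eggs) = {conf_bread_eggs:.0%}")
print(f"lift(bread -> eggs)       = {lift_bread_eggs:.2f}")
print(f"lift(eggs -> bread)       = {lift_eggs_bread:.2f}")
```

Confidence depends on the direction of the rule, whereas lift is symmetric: swapping "if" and "then" gives the same value.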

### Apriori
Apriori is an algorithm for mining frequent itemsets, and from them association rules, over structured (transactional) data.
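
To make the idea concrete, here is a minimal pure-Python sketch of the level-wise search Apriori performs: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. It is an illustration only and not how mlxtend implements `apriori`; the function name `apriori_sketch` and the toy baskets are made up.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(items): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {item for t in transactions for item in t}:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that reach the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, not taken from the groceries dataset.
baskets = [
    {"bread", "eggs", "milk"},
    {"bread", "eggs"},
    {"bread", "milk"},
    {"eggs"},
    {"milk"},
]
result = apriori_sketch(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search fast: an itemset can only be frequent if all of its subsets are frequent, so most large candidates are never counted.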


# Association rules with Apriori

In [1]:
! pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.17.3-py2.py3-none-any.whl (1.3 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.17.3


In [2]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

### Importing data

In [3]:
address = './Data/groceries.csv'
data = pd.read_csv(address)
data.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9
0,citrus fruit,semi-finished bread,margarine,ready soups,,,,,
1,tropical fruit,yogurt,coffee,,,,,,
2,whole milk,,,,,,,,
3,pip fruit,yogurt,cream cheese,meat spreads,,,,,
4,other vegetables,whole milk,condensed milk,long life bakery product,,,,,


In [4]:
# One-hot encode the baskets; each resulting column is named "<position>_<item>"
basket_sets = pd.get_dummies(data)
basket_sets.head()

Unnamed: 0,1_Instant food products,1_UHT-milk,1_artif. sweetener,1_baby cosmetics,1_bags,1_baking powder,1_bathroom cleaner,1_beef,1_berries,1_beverages,1_bottled beer,1_bottled water,1_brandy,1_brown bread,1_butter,1_butter milk,1_candy,1_canned beer,1_canned fish,1_canned fruit,1_canned vegetables,1_cat food,1_chewing gum,1_chicken,1_chocolate,1_chocolate marshmallow,1_citrus fruit,1_cleaner,1_cling film/bags,1_coffee,1_condensed milk,1_cookware,1_cream cheese,1_curd,1_curd cheese,1_decalcifier,1_dental care,1_dessert,1_detergent,1_dish cleaner,...,9_pastry,9_pet care,9_photo/film,9_pickled vegetables,9_popcorn,9_pot plants,9_potato products,9_processed cheese,9_red/blush wine,9_rice,9_rolls/buns,9_root vegetables,9_rubbing alcohol,9_rum,9_salt,9_salty snack,9_sauces,9_seasonal products,9_semi-finished bread,9_shopping bags,9_skin care,9_sliced cheese,9_soda,9_soft cheese,9_sparkling wine,9_specialty bar,9_specialty cheese,9_specialty vegetables,9_spread cheese,9_sugar,9_sweet spreads,9_tea,9_vinegar,9_waffles,9_whipped/sour cream,9_white bread,9_white wine,9_whole milk,9_yogurt,9_zwieback
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
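
Note that the encoded columns carry the basket position as a prefix, so `1_whole milk` and `2_whole milk` are treated as two different items. If the position in the basket is not meaningful, one column per item may be preferable. Below is a dependency-free sketch of that encoding; the `rows` are a shortened, partly hypothetical stand-in for the grocery table, with `None` marking empty positions (assumed here, pandas itself would show `NaN`).

```python
# Hypothetical rows shaped like the grocery table: one basket per row,
# unused positions left as None.
rows = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups", None],
    ["tropical fruit", "yogurt", "coffee", None, None],
    ["whole milk", None, None, None, None],
    ["yogurt", "whole milk", None, None, None],
]

# Drop the empty positions so that each basket is just a set of items.
baskets = [{item for item in row if item is not None} for row in rows]

# One column per distinct item, regardless of its position in the basket.
all_items = sorted(set().union(*baskets))

# One boolean row per basket: True if the basket contains the item.
encoded = [{item: item in basket for item in all_items} for basket in baskets]

for row in encoded:
    print([item for item, present in row.items() if present])
```

Here "whole milk" ends up in a single column even though it appears in position 1 of one basket and position 2 of another. A list of dictionaries like `encoded` can be passed to `pd.DataFrame` to get a table in the same one-hot layout.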


### Support calculation

In [6]:
apriori(basket_sets, min_support=0.02, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.030421,(1_beef)
1,0.034951,(1_canned beer)
2,0.029126,(1_chicken)
3,0.049191,(1_citrus fruit)
4,0.064401,(1_frankfurter)
5,0.04466,(1_other vegetables)
6,0.024272,(1_pip fruit)
7,0.040453,(1_pork)
8,0.038835,(1_rolls/buns)
9,0.033981,(1_root vegetables)


In [None]:
df = basket_sets
# Lower the support threshold to 0.2% so that rarer itemsets are kept as well
frequent_item_sets = apriori(df, min_support=0.002, use_colnames=True)
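
Once frequent itemsets and their supports are known, rules follow by splitting each itemset into an "if" part and a "then" part. The notebook imports `association_rules` from mlxtend for this step; the sketch below is not that function, only a plain-Python illustration of the calculation. The name `rules_from_itemsets` is made up, and the supports are taken from the bread/eggs example rather than from the groceries data.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_lift=1.0):
    """Build (antecedent, consequent, confidence, lift) tuples.

    `supports` maps frozenset(items) -> support and must contain every
    subset of each itemset, which holds for the output of Apriori.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs both an "if" and a "then" part
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                conf = sup / supports[antecedent]
                lift = conf / supports[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, conf, lift))
    return rules


# Supports from the bread/eggs example: 500, 350 and 150 out of 5000.
supports = {
    frozenset({"bread"}): 0.10,
    frozenset({"eggs"}): 0.07,
    frozenset({"bread", "eggs"}): 0.03,
}

rules = rules_from_itemsets(supports)
for antecedent, consequent, conf, lift in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"confidence={conf:.2f}, lift={lift:.2f}")
```

Filtering on lift rather than confidence alone avoids keeping rules whose "then" item is simply popular on its own.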

