# Association Rule Learning
- Apriori is a popular algorithm for association rule learning.
- Association rules describe relationships between items: they capture the pattern of buying one item together with others. Supermarkets commonly use them to increase sales, because knowing which items are purchased together opens up ways to generate more profit.
- Based on previous transactions, these rules estimate how likely items are to be bought together by a customer. For example, a person who buys bread might also buy milk or jam. Similarly, a person buying beer may also pick up chips or other snacks. Transactions therefore follow patterns.
- If items A and B are frequently bought together, several techniques can be used to increase profit. For example:
    - Discounts can be offered to customers who buy these items together.
    - Advertising campaigns can target customers who buy only one of the items, encouraging them to buy the other as well.
    - Items A and B can be placed together in the store so that customers can easily find frequently bought items in one place.

# Components of Apriori Algorithm
- Support {A}-->{B}: It tells how frequently the itemset (A and B together) is bought across all transactions. Support is given by:
        support({A}-->{B}) = Transactions containing both A and B / Total number of transactions
- Confidence {A}-->{B}: Defines how likely a customer is to buy item B given that he/she has already bought item A. Confidence is given by:
        confidence({A}-->{B}) = Transactions containing both A and B / Transactions containing A
     - Confidence is similar to conditional probability, i.e. the probability of B given A
- Lift {A}-->{B}: It indicates how much more often B is sold when A is sold, compared with how often B is sold overall. A lift above 1 means A and B are bought together more often than would be expected if they were independent, so lift is commonly used to judge the strength of a rule. Lift is given by:
        lift({A}-->{B}) = confidence({A}-->{B}) / support({B})
     - Here support({B}) = Transactions containing B / Total number of transactions
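
As a quick check of these metrics, here is a minimal sketch on a made-up set of five transactions. The data and the helper names `support`, `confidence` and `lift` are illustrative assumptions, not part of apyori. It uses the standard definitions: the support of a rule counts transactions containing both sides, and lift divides the confidence by the support of the consequent alone.

```python
# Toy transactions, invented for illustration only
toy_transactions = [
    {"bread", "milk"},
    {"bread", "jam"},
    {"bread", "milk", "jam"},
    {"beer", "chips"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Rule {bread} --> {milk}
rule_support = support({"bread", "milk"}, toy_transactions)          # 2 of 5 baskets
rule_confidence = confidence({"bread"}, {"milk"}, toy_transactions)  # 2 of the 3 bread baskets
rule_lift = lift({"bread"}, {"milk"}, toy_transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the lift comes out slightly above 1, which would suggest that bread buyers pick up milk a little more often than shoppers in general.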

# Importing the libraries

In [2]:
# !pip install apyori

In [3]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Data Preprocessing

In [4]:
data = pd.read_csv("Market_Basket_Optimisation.csv",header=None)
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [5]:
total_items = len(data.columns)
# the 20 columns correspond to the maximum number of items bought in a single transaction
total_items

20

In [6]:
total_trans = len(data)
# 7501 transactions in total: each row is one transaction and its columns hold the items bought in it
total_trans

7501

In [7]:
data.values[1,2]

'eggs'

In [8]:
transaction = []
for i in range(0, total_trans):
    transaction.append([str(data.values[i,j]) for j in range(0,total_items) if (str(data.values[i,j])!='nan')])
# transaction is now a list of lists: each sublist holds all the items of a single transaction,
# with the empty (NaN) cells dropped.
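
The same list-of-lists structure can also be built without pandas. The following is only a sketch: it uses the standard `csv` module on a small inline sample (the sample text and the helper name `read_transactions` are made up for illustration) and drops empty cells instead of NaN values.

```python
import csv
import io

# Hypothetical inline sample standing in for the CSV file
sample_csv = "burgers,meatballs,eggs\nchutney\nturkey,avocado,,\n"


def read_transactions(handle):
    """Return one list of non-empty item names per CSV row."""
    return [
        [cell.strip() for cell in row if cell.strip()]
        for row in csv.reader(handle)
    ]


sample_transactions = read_transactions(io.StringIO(sample_csv))
print(sample_transactions)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("Market_Basket_Optimisation.csv", newline="")`.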

In [9]:
transaction

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers', 'meatballs', 'eggs'],
 ['chutney'],
 ['turkey', 'avocado'],
 ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea'],
 ['low fat yogurt'],
 ['whole wheat pasta', 'french fries'],
 ['soup', 'light cream', 'shallot'],
 ['frozen vegetables', 'spaghetti', 'green tea'],
 ['french fries'],
 ['eggs', 'pet food'],
 ['cookies'],
 ['turkey', 'burgers', 'mineral water', 'eggs', 'cooking oil'],
 ['spaghetti', 'champagne', 'cookies'],
 ['mineral water', 'salmon'],
 ['mineral water'],
 ['shrimp',
  'chocolate',
  'chicken',
  'honey',
  'oil',
  'cooking oil',
  'low fat yogurt'],
 ['turkey', 'eggs'],
 ['turkey',
  'fresh tuna',
  'tomatoes',
  'spagh

## Training the apriori model
- The apriori() function takes the transactions as its first argument. This must be an iterable of transactions (here, a list of lists).
- The other arguments are:
    - min_support: The minimum support a relation must have to be kept
    - min_confidence: The minimum confidence threshold
    - min_lift: The minimum lift of relations
    - max_length: The maximum number of items in a relation
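
To get a feel for the min_support threshold, it can help to translate it into an absolute number of transactions. A small sketch, assuming the 7501 transactions of this dataset and that itemsets whose support equals the threshold are kept (the variable names and the `desired_count` value are illustrative):

```python
import math

total_transactions = 7501   # number of rows in the dataset
min_support = 0.003         # threshold used in this notebook

# Smallest number of transactions an itemset must appear in to reach min_support
min_count = math.ceil(min_support * total_transactions)
print(min_count)

# Going the other way: the support that corresponds to a desired minimum count
desired_count = 30          # hypothetical business requirement
print(desired_count / total_transactions)
```

Read this way, a threshold of 0.003 asks for itemsets that appear in a couple of dozen transactions out of the 7501.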

In [10]:
from apyori import apriori
rules = apriori(transactions=transaction, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=3, max_length=3)
# min_support, min_confidence and min_lift can be adjusted as needed.
# Note: min_length=3 has no visible effect here (the output below still contains 2-item sets);
# apyori does not appear to use this argument.
# apriori() returns the relation records that satisfy the given thresholds
result = list(rules)
len(result)
# 56 relation records in total; each record contains one or more rules

56

- First, 1-itemsets are created by taking each item individually.
- The algorithm counts how many transactions contain each item and computes the support of each item.
- Items whose support reaches min_support are combined into 2-itemsets.
- This continues level by level with larger itemsets; since max_length=3 was given, itemsets of up to 3 items are evaluated.
- Rules are then generated internally from the frequent itemsets, and only those whose confidence and lift reach the min_confidence and min_lift thresholds are returned.
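
The level-wise search described above can be sketched in plain Python. This is a simplified illustration of the general Apriori idea on made-up data, not apyori's actual implementation; the function name `frequent_itemsets` and the toy transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_length):
    """Level-wise search for itemsets whose support is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    total = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / total

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: every item on its own
    current = keep_frequent(
        {frozenset([item]) for basket in baskets for item in basket}
    )
    frequent = dict(current)

    size = 2
    while current and size <= max_length:
        # Join step: merge frequent itemsets of the previous level
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every smaller subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent


# Toy transactions, invented for illustration only
toy_transactions = [
    ["bread", "milk"],
    ["bread", "jam"],
    ["bread", "milk", "jam"],
    ["beer", "chips"],
    ["milk"],
]
found = frequent_itemsets(toy_transactions, min_support=0.4, max_length=3)
for itemset, value in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if all of its subsets are frequent, so most larger combinations never need to be counted.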

# Visualising the rules

# Output- Frequently bought items together

In [11]:
result
# The rules are now in a list; each one carries its support, confidence and lift.
# The first rule shows that a customer who buys light cream (items_base) has a 29% chance (confidence)
# of also buying chicken (items_add)

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

## Putting the results into well organised dataframes

In [12]:
result[0]
# items: the items bought together, i.e. chicken and light cream
# support: meets min_support
# items_base: antecedent - light cream
# items_add: consequent - chicken
# Rule: antecedent to consequent - {light cream}-->{chicken}
# confidence: meets min_confidence
# lift: meets min_lift
# Only one ordered_statistic is listed: the rule a-->b passed the thresholds, while the reverse rule b-->a
# does not appear in the output, which indicates that it did not pass them

RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
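
The figures in this record can be cross-checked by hand. The sketch below assumes 34 transactions containing both items, 117 containing light cream and 450 containing chicken; these counts are inferred from the reported support, confidence and lift rather than read from the dataset. Under that assumption it also shows why the reverse rule is missing.

```python
total_transactions = 7501   # from len(data) earlier in the notebook

# Assumed counts, inferred from the reported metrics (not read from the data)
n_both = 34           # baskets with both chicken and light cream
n_light_cream = 117   # baskets with light cream
n_chicken = 450       # baskets with chicken

# Rule {light cream} --> {chicken}
support = n_both / total_transactions
confidence = n_both / n_light_cream
lift = confidence / (n_chicken / total_transactions)

# Reverse rule {chicken} --> {light cream}
reverse_confidence = n_both / n_chicken
reverse_lift = reverse_confidence / (n_light_cream / total_transactions)

print(round(support, 6), round(confidence, 6), round(lift, 6))
print(round(reverse_confidence, 6), round(reverse_lift, 6))
```

Lift is the same in both directions, but the reverse confidence falls below the min_confidence of 0.2, which would explain why only {light cream}-->{chicken} is reported.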

In [13]:
result[0].ordered_statistics[0].items_base

frozenset({'light cream'})

In [14]:
# Getting the itemset
result[0].items

frozenset({'chicken', 'light cream'})

In [15]:
# Getting the support
result[0].support

0.004532728969470737

In [16]:
# ordered_statistics gives access to the antecedent, consequent, confidence and lift of each rule.
# The first RelationRecord holds only one rule; in total there are 56 relation records, each holding one or more rules.
# When a record holds several rules, ordered_statistics contains one entry per rule.
result[0].ordered_statistics

[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]

In [17]:
# Index [0] selects the first rule of the record
# A record can hold more than one rule; use index [1], [2], ... for the others
result[0].ordered_statistics[0]

OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)

In [18]:
# Getting antecedent for rule 1
result[0].ordered_statistics[0].items_base

frozenset({'light cream'})

In [19]:
# Getting consequent for rule 1
result[0].ordered_statistics[0].items_add

frozenset({'chicken'})

In [20]:
# Getting confidence for rule 1
result[0].ordered_statistics[0].confidence

0.29059829059829057

In [21]:
# Getting lift for rule 1
result[0].ordered_statistics[0].lift

4.84395061728395

### Printing the results in the dataframe to visualize it in a better way

In [22]:
df1= pd.DataFrame()
support=[]
item=[]
antecedent=[]
consequent=[]
confidence=[]
lift=[]
# One row per rule: a relation record can hold several ordered statistics
for record in result:
    for ordered_stat in record.ordered_statistics:
        support.append(record.support)
        item.append(record.items)
        antecedent.append(ordered_stat.items_base)
        consequent.append(ordered_stat.items_add)
        confidence.append(ordered_stat.confidence)
        lift.append(ordered_stat.lift)

In [23]:
df1['Items'] = list(map(set,item))
# convert each frozenset to a plain set {} and collect them into a list for the column
df1['Antecedent'] = list(map(set,antecedent))
df1['Consequent'] = list(map(set,consequent))
df1['Support'] = support
df1['Confidence'] = confidence
df1['Lift'] = lift
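
As an alternative to filling six parallel lists, the rules can be flattened into one dictionary per rule. The sketch below uses `namedtuple` stand-ins that mirror the field names seen in the output above, with values copied from the first two records; the stand-ins and the helper name `flatten_rules` are assumptions for illustration, and with apyori available the real `result` list would be passed instead.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the apyori output above
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

sample_result = [
    RelationRecord(
        frozenset({"chicken", "light cream"}),
        0.004532728969470737,
        [OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                          0.29059829059829057, 4.84395061728395)],
    ),
    RelationRecord(
        frozenset({"escalope", "mushroom cream sauce"}),
        0.005732568990801226,
        [OrderedStatistic(frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
                          0.3006993006993007, 3.790832696715049)],
    ),
]


def flatten_rules(records):
    """Return one dict per rule, repeating the record-level fields."""
    return [
        {
            "Items": set(record.items),
            "Antecedent": set(stat.items_base),
            "Consequent": set(stat.items_add),
            "Support": record.support,
            "Confidence": stat.confidence,
            "Lift": stat.lift,
        }
        for record in records
        for stat in record.ordered_statistics
    ]


# Strongest rules first
rows = sorted(flatten_rules(sample_result), key=lambda row: row["Lift"], reverse=True)
for row in rows:
    print(row["Antecedent"], "-->", row["Consequent"], round(row["Lift"], 3))
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)` to get the same columns as `df1`.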

In [25]:
# Sorting in descending order by Lift
df1 = df1.sort_values(by = 'Lift', ascending = False)
df1.head()

|  | Items | Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|---|---|
| 69 | {mineral water, whole wheat pasta, olive oil} | {mineral water, whole wheat pasta} | {olive oil} | 0.003866 | 0.402778 | 6.115863 |
| 56 | {spaghetti, ground beef, tomato sauce} | {tomato sauce} | {spaghetti, ground beef} | 0.003066 | 0.216981 | 5.535971 |
| 3 | {honey, fromage blanc} | {fromage blanc} | {honey} | 0.003333 | 0.245098 | 5.164271 |
| 58 | {spaghetti, ground beef, tomato sauce} | {spaghetti, tomato sauce} | {ground beef} | 0.003066 | 0.489362 | 4.9806 |
| 0 | {chicken, light cream} | {light cream} | {chicken} | 0.004533 | 0.290598 | 4.843951 |


- Each row shows that the items in Consequent are frequently bought together with the items in Antecedent.
- Business owners can run offers on these items to increase their profit.
- Supermarkets can place the Antecedent and Consequent items near each other so that customers can pick up both in one place without having to search for them.

## There is a great package, jupyter_to_medium, which can be used to post Jupyter notebooks directly as Medium posts. Here is the link to its launch:
https://www.youtube.com/watch?v=lU9jogfXNqE&t=4s