## Association Rule Learning - Apriori

For this implementation of the Apriori model, we are going to use a third-party Python library called apyori (`apyori.py`), distributed through the Python Package Index, which implements some useful classes.

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Increasing the size of all graphs 
plt.rcParams['figure.figsize'] = 16, 8

In [2]:
# Suppressing unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
# Importing the dataset
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
rows, columns = dataset.shape
dataset.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


Each row of this dataset is one transaction: the products a customer bought together in a store. Rows with fewer than 20 products are padded with empty cells. We are going to find associations between these products so the store can improve the placement of these items.

In [4]:
# Creating a list of lists to use as our dataset
# Note: str() turns empty (NaN) cells into the string 'nan', which apriori then treats as an item
transactions = []
for i in range(rows):
    transactions.append([str(dataset.values[i, j]) for j in range(columns)])
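
Because every cell is converted with `str`, the empty cells that pad short rows end up in the transactions as the item `'nan'`, and it appears in one of the item sets printed further down. The following is a minimal sketch of a variant that skips empty cells. The names `build_transactions` and `sample_rows` are hypothetical and not part of the original notebook, and the outputs recorded in this notebook come from the original cell, not from this variant.

```python
import math


def build_transactions(rows):
    """Convert rows of cells into lists of item names, skipping empty cells."""
    transactions = []
    for row in rows:
        items = []
        for cell in row:
            # pandas represents empty CSV cells as float NaN; None is skipped too
            if cell is None or (isinstance(cell, float) and math.isnan(cell)):
                continue
            items.append(str(cell))
        transactions.append(items)
    return transactions


# Toy rows shaped like dataset.values (assumption: short rows are padded with NaN)
sample_rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]
print(build_transactions(sample_rows))
```

In the notebook this would presumably be called as `build_transactions(dataset.values)`. Dropping the padding can change which rules are found, so results may differ from the ones shown here.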

In [5]:
# Viewing the dataset created
print("Single Transaction: ", transactions[0])

Single Transaction:  ['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']


###### Training Apriori on the dataset

While using the apriori library we need to specify certain parameters, which we estimate and pass to `apriori`.

* min-support:
    * Suppose we only care about items purchased at least 3 times a day, which is 21 times in 1 week. Out of roughly 7,500 transactions, the support for such an item is 3*7/7500 = 0.0028, which we round to 0.003.
* min-confidence:
    * We don't want rules to be too obvious. Setting the confidence too high may leave us with few useful results, so we set it to 20% = 0.2.
* min-lift:
    * We keep only rules with a lift of at least 3, which indicates a strong association.
* min-length:
    * The minimum number of items in a rule should be greater than 1, because a single purchased product gives little insight into recommendations or placement of items. So we set it to 2.
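
To make these thresholds concrete, here is a small sketch of how support, confidence and lift are commonly defined, computed on a made-up toy basket. The helper functions and the `toy` data are illustrative assumptions and are not taken from apyori, which computes these measures internally.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, base, add):
    """Of the transactions containing base, the fraction that also contain add."""
    return support(transactions, set(base) | set(add)) / support(transactions, base)


def lift(transactions, base, add):
    """How much more often add occurs with base than it occurs overall."""
    return confidence(transactions, base, add) / support(transactions, add)


# Made-up toy data, only to illustrate the definitions
toy = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread"],
    ["milk"],
    ["eggs"],
]

print("support(milk)            =", support(toy, ["milk"]))
print("confidence(milk -> bread) =", round(confidence(toy, ["milk"], ["bread"]), 3))
print("lift(milk -> bread)       =", round(lift(toy, ["milk"], ["bread"]), 3))

# The min-support estimate from the text: 3 purchases a day over 7 days
min_support_estimate = 3 * 7 / 7500
print("estimated min support     =", round(min_support_estimate, 4))
```

A lift close to 1 means the two items are bought together about as often as chance would suggest, which is why a threshold of 3 filters for much stronger associations.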

In [6]:
from apyori import apriori
rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)

###### Visualizing the results

In [7]:
results = list(rules)
results[0]

RelationRecord(items=frozenset(['chicken', 'light cream']), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset(['light cream']), items_add=frozenset(['chicken']), confidence=0.29059829059829057, lift=4.84395061728395)])

In [8]:
results[0][0]

frozenset({'chicken', 'light cream'})

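From the above results we can see that "chicken" and "light cream" form the first associated pair returned. Customers who buy light cream also buy chicken in about 29% of cases (confidence 0.29), which is about 4.8 times more often than chicken is bought overall (lift 4.84).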
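
As a quick check on how these numbers relate, the individual supports of the two items can be recovered from the printed record, assuming the usual definitions confidence = support(pair) / support(base) and lift = confidence / support(add). The variable names below are illustrative.

```python
# Values copied from the RelationRecord printed above
support_pair = 0.004532728969470737  # support of {chicken, light cream}
confidence = 0.29059829059829057     # confidence of light cream -> chicken
lift = 4.84395061728395              # lift of light cream -> chicken

# confidence = support(pair) / support(base), so support(base) = support(pair) / confidence
support_light_cream = support_pair / confidence

# lift = confidence / support(add), so support(add) = confidence / lift
support_chicken = confidence / lift

print("support(light cream) ~", round(support_light_cream, 4))
print("support(chicken)     ~", round(support_chicken, 4))
```

Under these definitions, light cream appears in roughly 1.6% of transactions and chicken in roughly 6%, so a 29% chance of chicken given light cream is well above the baseline.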

With data such as this, the store can better judge the placement of its items. Below are the first 20 associated item sets returned. The arrows only separate the items within a set; they do not indicate the direction of a rule.

In [9]:
# Printing the item sets of the first 20 rules
for i in range(20):
    for j in results[i][0]:
        print(j, "->", end=" ")
    print("\n")

chicken -> light cream -> 

escalope -> mushroom cream sauce -> 

pasta -> escalope -> 

honey -> fromage blanc -> 

herb & pepper -> ground beef -> 

tomato sauce -> ground beef -> 

olive oil -> light cream -> 

olive oil -> whole wheat pasta -> 

pasta -> shrimp -> 

spaghetti -> avocado -> milk -> 

cake -> burgers -> milk -> 

turkey -> burgers -> chocolate -> 

turkey -> burgers -> milk -> 

cake -> frozen vegetables -> tomatoes -> 

spaghetti -> cereals -> ground beef -> 

chicken -> milk -> ground beef -> 

chicken -> nan -> light cream -> 

olive oil -> chicken -> milk -> 

olive oil -> chicken -> spaghetti -> 

frozen vegetables -> shrimp -> chocolate -> 
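
The listing above shows only which items belong together, not the direction or strength of each rule. The following sketch flattens records into one row per rule and sorts them by lift. It uses stand-in namedtuples that mirror the field names visible in the printed `RelationRecord`, so that it runs without apyori. `flatten_rules` and `skip_items` are hypothetical names, and skipping item sets that contain `'nan'` is an assumption about what is wanted here.

```python
from collections import namedtuple

# Stand-ins mirroring the fields of the record printed earlier;
# in the notebook, the real objects in `results` would be passed instead.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)


def flatten_rules(records, skip_items=("nan",)):
    """Return one dict per rule, sorted by lift in descending order."""
    rows = []
    for record in records:
        # Skip item sets that contain placeholder items such as 'nan'
        if any(item in skip_items for item in record.items):
            continue
        for stat in record.ordered_statistics:
            rows.append({
                "base": sorted(stat.items_base),
                "add": sorted(stat.items_add),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)


# The single record shown earlier in this notebook
sample = [
    RelationRecord(
        items=frozenset(["chicken", "light cream"]),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset(["light cream"]),
                items_add=frozenset(["chicken"]),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

for rule in flatten_rules(sample):
    print(", ".join(rule["base"]), "->", ", ".join(rule["add"]),
          "| confidence = %.2f, lift = %.2f" % (rule["confidence"], rule["lift"]))
```

Calling `flatten_rules(results)` in the notebook would presumably give a list that can be passed to `pd.DataFrame` for easier inspection.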

