## Market Basket Analysis Study (Extra) ###
This is a sample Python implementation of the Week 11 lesson.

Demos in this notebook:
1. Apriori Algorithm Association Rule Mining

Uncomment and run the code below if you have not installed apyori before.

In [4]:
#!pip3 install apyori

In [5]:
import numpy as np  
import pandas as pd  
from apyori import apriori  

In [6]:
store_data = pd.read_csv('store_data.csv', header=None)  
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


The Apriori library we are going to use expects the dataset as a list of lists: the outer list is the whole dataset and each inner list is one transaction. Our data is currently a pandas DataFrame in which shorter transactions are padded with empty (NaN) cells. The following script converts the DataFrame into a list of lists. Because every cell is passed through str(), the empty cells become the literal string 'nan', which is why 'nan' can show up as an item in the mined rules later on.

In [7]:
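# Convert the DataFrame into a list of transactions (one inner list per row).
# Empty cells are NaN, so str() turns them into the literal item 'nan'.
n_rows, n_cols = store_data.shape  # 7501 rows x 20 columns for this dataset
records = []
for i in range(n_rows):
    records.append([str(store_data.values[i, j]) for j in range(n_cols)])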
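If the 'nan' placeholders are not wanted, one option is to drop missing cells while building each transaction. The sketch below is an alternative, not what this notebook runs: `clean_transaction` is a hypothetical helper, and `sample_rows` is a hand-built stand-in shaped like `store_data.values.tolist()`.

```python
import math


def clean_transaction(row):
    """Return the items in one row as strings, skipping missing cells (None or NaN)."""
    items = []
    for cell in row:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue
        items.append(str(cell))
    return items


# Stand-in rows (assumption): same shape as store_data.values.tolist(),
# with NaN marking the empty cells of shorter transactions.
nan = float("nan")
sample_rows = [
    ["burgers", "meatballs", "eggs", nan],
    ["chutney", nan, nan, nan],
    ["turkey", "avocado", nan, nan],
]

clean_records = [clean_transaction(row) for row in sample_rows]
print(clean_records)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```

On the real data this would be `[clean_transaction(row) for row in store_data.values.tolist()]`. The outputs shown in the rest of this notebook were produced with the 'nan' strings included, so rule counts and metrics would likely differ after cleaning.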

### Apriori Algorithm

Refer to the following link for a detailed explanation of the implementation:
- [Association Rule Mining from StackAbuse](https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/)

The apriori function takes several parameters. The first is the list of lists that you want to extract rules from. The min_support parameter keeps only the itemsets whose support is at least the specified value. Next, the min_confidence parameter keeps only the rules whose confidence is at least the specified threshold. Similarly, the min_lift parameter sets the minimum lift for the shortlisted rules. Finally, the min_length parameter is intended to specify the minimum number of items that you want in your rules. Note, however, that the keyword arguments apyori documents are min_support, min_confidence, min_lift and max_length, so min_length may be silently ignored by the library.
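To make the role of min_support more concrete, here is a minimal sketch of the level-wise search that the Apriori algorithm is known for. It is a teaching illustration on a toy dataset, not apyori's actual implementation; `frequent_itemsets` and `toy_transactions` are hypothetical names.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    all_items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


toy_transactions = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
]
frequent = frequent_itemsets(toy_transactions, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.6, the three single items and the three pairs survive, while the triple (present in 2 of 5 transactions) is dropped. Raising min_support shrinks the result; lowering it lets rarer combinations through.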

In [8]:
# You are encouraged to experiment with different settings for min_support, min_confidence, etc.
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)  
association_results = list(association_rules)

In [9]:
#Check the number of rules mined
print(len(association_results)) 

48


In [10]:
# Check the first rule we have mined
print(association_results[0])

RelationRecord(items=frozenset(['chicken', 'light cream']), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset(['light cream']), items_add=frozenset(['chicken']), confidence=0.29059829059829057, lift=4.84395061728395)])
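The printed record exposes named fields (`items`, `support`, `ordered_statistics`, and within each statistic `items_base`, `items_add`, `confidence`, `lift`). The sketch below shows one way to read a rule through those names so that the base and added items are kept apart. The two namedtuples are local stand-ins that copy the field names from the output above so the example runs without apyori; `describe` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins (assumption): same field names as in the printed record above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)


def describe(record):
    """Return one readable line per ordered statistic in a record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"Rule: {base} -> {add} "
            f"(support={record.support:.4f}, "
            f"confidence={stat.confidence:.4f}, lift={stat.lift:.2f})"
        )
    return lines


# Values copied from the record printed above.
first = RelationRecord(
    items=frozenset(["chicken", "light cream"]),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset(["light cream"]),
            items_add=frozenset(["chicken"]),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

for line in describe(first):
    print(line)
# Rule: light cream -> chicken (support=0.0045, confidence=0.2906, lift=4.84)
```

Reading by field name makes the direction explicit: the rule runs from light cream (the base) to chicken (the added item).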


For instance, the first record shows that light cream and chicken are frequently bought together. One plausible explanation is that light cream is commonly used in recipes for chicken. Another is that people who purchase light cream are careful about what they eat and are therefore more likely to buy white meat such as chicken instead of red meat such as beef. The rule itself only shows the association; it does not tell us which explanation holds.

The support value for the first rule is 0.0045. This number is calculated by dividing the number of transactions that contain both light cream and chicken by the total number of transactions. The confidence for the rule is 0.2906, which shows that out of all the transactions that contain light cream, 29.06% also contain chicken. Finally, the lift of 4.84 tells us that customers who buy light cream are 4.84 times more likely to buy chicken than customers in general.
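The three metrics can be checked by hand on a small example. The sketch below computes them for the rule light cream -> chicken on a made-up set of eight transactions; the data and the `rule_metrics` helper are illustrative assumptions, not part of the store dataset or of apyori.

```python
def rule_metrics(transactions, base, add):
    """Return (support, confidence, lift) for the rule base -> add."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    base, add = set(base), set(add)

    n_base = sum(base <= t for t in sets)          # transactions containing the base items
    n_add = sum(add <= t for t in sets)            # transactions containing the added items
    n_both = sum((base | add) <= t for t in sets)  # transactions containing both

    support = n_both / n
    confidence = n_both / n_base
    lift = confidence / (n_add / n)
    return support, confidence, lift


# Made-up transactions (assumption), chosen so the arithmetic is easy to follow.
toy = [
    ["light cream", "chicken"],
    ["light cream", "chicken", "eggs"],
    ["chicken", "eggs"],
    ["chicken"],
    ["eggs"],
    ["milk"],
    ["milk", "eggs"],
    ["bread"],
]

support, confidence, lift = rule_metrics(toy, ["light cream"], ["chicken"])
print(support, confidence, lift)
# 0.25 1.0 2.0
```

Here both items appear together in 2 of 8 transactions (support 0.25), every light cream basket also contains chicken (confidence 1.0), and chicken appears in half of all baskets, so the lift is 1.0 / 0.5 = 2.0. Applying the same relation to the mined rule, 0.2906 / 4.84 is roughly 0.06, which implies chicken appears in about 6% of all transactions.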

In [11]:
# Clearer representation:

for item in association_results:

    # item[0] is the frozenset of all items in the rule (base and added items together).
    # A frozenset has no defined order, so the arrow below only pairs two of the items;
    # the rule's actual direction is given by items_base and items_add in item[2][0].
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; its first entry holds
    # the confidence at index 2 and the lift at index 3
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: chicken -> light cream
Support: 0.00453272896947
Confidence: 0.290598290598
Lift: 4.84395061728
Rule: escalope -> mushroom cream sauce
Support: 0.0057325689908
Confidence: 0.300699300699
Lift: 3.79083269672
Rule: pasta -> escalope
Support: 0.00586588454873
Confidence: 0.372881355932
Lift: 4.70081185016
Rule: herb & pepper -> ground beef
Support: 0.0159978669511
Confidence: 0.323450134771
Lift: 3.29199384113
Rule: tomato sauce -> ground beef
Support: 0.00533262231702
Confidence: 0.377358490566
Lift: 3.84065948132
Rule: olive oil -> whole wheat pasta
Support: 0.00799893347554
Confidence: 0.27149321267
Lift: 4.12241009764
Rule: pasta -> shrimp
Support: 0.00506599120117
Confidence: 0.322033898305
Lift: 4.50667214774
Rule: chicken -> nan
Support: 0.00453272896947
Confidence: 0.290598290598
Lift: 4.84395061728
Rule: frozen vegetables -> shrimp
Support: 0.00533262231702
Confidence: 0.232558139535
Lift: 3.25451232211
Rule: spaghetti -> ground beef
Support: 0.00479936008532
Confidence: 0.