## Import the Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#from apyori import apriori

In [None]:
#pip install apyori

In [None]:
from apyori import apriori

## Read and Display the Data

In [2]:
store_data = pd.read_csv("Market_Basket_Optimisation.csv", header=None)
store_data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


## Data Preprocessing
* The Apriori algorithm needs the data as a list of lists, with one inner list per transaction.

In [3]:
records = []
for i in range(0, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])

## 📘 Step-by-Step Explanation of the List Comprehension

**`records = []`**  
Creates an empty list to store the final data.

**`for i in range(0, 7501):`**  
Loops through each row in the DataFrame.  
`7501` is the total number of rows; `len(store_data)` returns the same value without hardcoding it.

**`store_data.values[i, j]`**  
Accesses the raw value at row `i`, column `j`.  
Note: `store_data.values` returns the entire DataFrame as a NumPy array.

**`[str(store_data.values[i, j]) for j in range(0, 20)]`**  
This inner list comprehension:  
- Loops through the **first 20 columns** (`j` from 0 to 19)  
- Converts each value to a string  
- Collects all 20 string values into a list representing one row

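**`records.append(...)`**  
Appends the list built for the current row to `records`, so `records` ends up as a list of lists with one inner list per transaction.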
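The same loop can read the row and column counts from the DataFrame instead of hardcoding them. A minimal sketch, assuming a small made-up DataFrame (`toy_data`, not part of the original notebook) in place of `store_data`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for store_data: 3 transactions, padded with NaN.
toy_data = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
    ["turkey", "avocado", np.nan],
])

n_rows, n_cols = toy_data.shape  # taken from the data, not hardcoded
values = toy_data.values         # look up the NumPy array once, outside the loop

toy_records = []
for i in range(n_rows):
    # str() turns a missing cell into the literal string 'nan'
    toy_records.append([str(values[i, j]) for j in range(n_cols)])

print(toy_records)
```

With this shape-based version, the loop should keep working if the file gains rows or columns.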

In [4]:
len(records)

7501

In [5]:
records[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']

In [6]:
records[3]

['turkey',
 'avocado',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan',
 'nan']

In [7]:
print(records[1])
print(records[100])
print(records[200])


['burgers', 'meatballs', 'eggs', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
['mineral water', 'barbecue sauce', 'chocolate', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
['green tea', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
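Because missing cells were converted with `str`, every short transaction is padded with the literal string `'nan'`, and Apriori will treat it as a real item. One possible cleanup is sketched below on two shortened rows in the same shape as `records`; the names `sample_records` and `cleaned_records` are illustrative only.

```python
# Two padded transactions in the same shape as `records` (shortened for brevity).
sample_records = [
    ['burgers', 'meatballs', 'eggs', 'nan', 'nan'],
    ['green tea', 'nan', 'nan', 'nan', 'nan'],
]

# Keep only real items; 'nan' is the string that str() produces for a missing cell.
cleaned_records = [
    [item for item in row if item != 'nan']
    for row in sample_records
]

print(cleaned_records)
```

Applying the same filter to the full `records` list before calling `apriori` should keep `'nan'` out of the derived itemsets, at the cost of changing the results shown in this notebook.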


In [None]:
records[50]

## Apriori Algorithm

* Now it is time to apply the algorithm to the data.
* We have to provide `min_support`, `min_confidence`, `min_lift`, and `min_length` of the itemset to find rules.

#### Measure 1: Support.
This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

![](https://annalyzin.files.wordpress.com/2016/04/association-rule-support-table.png?w=503&h=447)

If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.
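As a rough illustration of the support calculation, here is a small sketch in plain Python. The eight transactions are hypothetical toy data, chosen only so that the numbers match the figures quoted above (4 of 8 for {apple}, 2 of 8 for {apple, beer, rice}); the `support` helper is not part of `apyori`.

```python
# Hypothetical toy transactions, chosen to match the figures quoted in the text.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

support_apple = support({"apple"}, transactions)
support_apple_beer_rice = support({"apple", "beer", "rice"}, transactions)

print(support_apple)            # 0.5
print(support_apple_beer_rice)  # 0.25
```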

#### Measure 2: Confidence. 
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.

![](https://annalyzin.files.wordpress.com/2016/03/association-rule-confidence-eqn.png?w=527&h=77)

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.
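The drawback can be seen on a toy example. The sketch below uses hypothetical transactions (not the notebook's data) and illustrative helper names; it computes the confidence of {apple -> beer} and compares it with how often beer is bought overall.

```python
# Hypothetical toy transactions: beer appears in 6 of 8 baskets.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """support(X and Y together) divided by support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

conf_apple_beer = confidence({"apple"}, {"beer"}, transactions)
support_beer = support({"beer"}, transactions)

print(conf_apple_beer)  # 0.75
print(support_beer)     # 0.75
```

In this toy data the 75% confidence is no higher than beer's overall share of baskets, so buying apples tells us nothing extra about buying beer.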

#### Measure 3: Lift. 
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is more likely to be bought if item X is bought, while a value less than 1 means that item Y is less likely to be bought if item X is bought.
![](https://annalyzin.files.wordpress.com/2016/03/association-rule-lift-eqn.png?w=566&h=80)
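
A minimal sketch of the lift calculation follows, again on hypothetical toy transactions with illustrative helper names. Lift is computed here as the support of X and Y together divided by the product of their individual supports.

```python
# Hypothetical toy transactions (not the notebook's data).
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(x, y, transactions):
    """support(X and Y together) / (support(X) * support(Y))."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))

lift_apple_beer = lift({"apple"}, {"beer"}, transactions)
lift_rice_chicken = lift({"rice"}, {"chicken"}, transactions)

print(lift_apple_beer)    # 1.0, no association
print(lift_rice_chicken)  # 2.0, chicken is twice as likely in baskets with rice
```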

In [8]:
from apyori import apriori

In [30]:
association_rules = apriori(records, min_support=0.045)
association_results = list(association_rules)

In [31]:
association_results

[RelationRecord(items=frozenset({'burgers'}), support=0.0871883748833489, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'burgers'}), confidence=0.0871883748833489, lift=1.0)]),
 RelationRecord(items=frozenset({'cake'}), support=0.08105585921877083, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'cake'}), confidence=0.08105585921877083, lift=1.0)]),
 RelationRecord(items=frozenset({'champagne'}), support=0.04679376083188908, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'champagne'}), confidence=0.04679376083188908, lift=1.0)]),
 RelationRecord(items=frozenset({'chicken'}), support=0.05999200106652446, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'chicken'}), confidence=0.05999200106652446, lift=1.0)]),
 RelationRecord(items=frozenset({'chocolate'}), support=0.1638448206905746, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_a

In [32]:
len(association_results)

65

In [18]:
association_results[0]

RelationRecord(items=frozenset({'burgers'}), support=0.0871883748833489, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'burgers'}), confidence=0.0871883748833489, lift=1.0)])

In [19]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

## Number of Relations Derived

In [20]:
len(association_results)

48

### Association Rules Derived

In [21]:
association_results[0]

RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
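
The record above nests one or more `OrderedStatistic` entries inside a `RelationRecord`. The sketch below shows one way to turn such a record into a readable line. To stay runnable without `apyori`, it defines stand-in namedtuples whose field names are copied from the printed output; that real `apyori` records expose the same attributes is an assumption here, and `describe` is an illustrative helper.

```python
from collections import namedtuple

# Stand-ins that mirror the field names printed above (assumption: apyori's
# own records can be read the same way).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

record = RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

def describe(record):
    """Return one readable line per ordered statistic in a record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} | support={record.support:.4f} "
            f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
        )
    return lines

for line in describe(record):
    print(line)
```

Since these records are tuples, positional access such as `record[0]` or `record[2][0][2]` reaches the same values as `record.items` and `record.ordered_statistics[0].confidence`.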

In [22]:
# rule[0] is the itemset; rule[2][0][2] is the confidence of its first ordered statistic
for rule in association_results:
    print(f'{rule[0]} ---------> {rule[2][0][2]}')

frozenset({'chicken', 'light cream'}) ---------> 0.29059829059829057
frozenset({'mushroom cream sauce', 'escalope'}) ---------> 0.3006993006993007
frozenset({'escalope', 'pasta'}) ---------> 0.3728813559322034
frozenset({'ground beef', 'herb & pepper'}) ---------> 0.3234501347708895
frozenset({'ground beef', 'tomato sauce'}) ---------> 0.3773584905660377
frozenset({'olive oil', 'whole wheat pasta'}) ---------> 0.2714932126696833
frozenset({'shrimp', 'pasta'}) ---------> 0.3220338983050847
frozenset({'nan', 'chicken', 'light cream'}) ---------> 0.29059829059829057
frozenset({'chocolate', 'frozen vegetables', 'shrimp'}) ---------> 0.23255813953488375
frozenset({'ground beef', 'spaghetti', 'cooking oil'}) ---------> 0.5714285714285714
frozenset({'mushroom cream sauce', 'escalope', 'nan'}) ---------> 0.3006993006993007
frozenset({'pasta', 'escalope', 'nan'}) ---------> 0.3728813559322034
frozenset({'ground beef', 'spaghetti', 'frozen vegetables'}) ---------> 0.31100478468899523
frozenset({

## References

* **Theory**: https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html
* **Medium Blog**: https://medium.com/@khaledgama4/customer-behavior-analysis-identifies-hidden-patterns-association-rule-algorithm-6630e1abaafd