# Apriori

## Importing the libraries

Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association rules are widely used to analyze retail basket or transaction data. The goal is to identify strong rules in that data using measures of interestingness such as support, confidence, and lift.

Details of the dataset

The dataset has 9835 rows, each holding one purchase order (transaction) from a grocery store. These orders can be analysed with Market Basket Analysis, using an algorithm such as Apriori to generate association rules.

Some important terms:

Support: This says how popular an itemset is, measured as the proportion of transactions in which the itemset appears.

Confidence: This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured as the proportion of transactions containing item X in which item Y also appears.

Lift: This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. It is the confidence of {X -> Y} divided by the support of Y. A lift above 1 means Y is bought more often together with X than its overall popularity would suggest; a lift of 1 means no association.
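As a minimal sketch, the three measures can be computed by hand on a few made-up transactions. The baskets and the `support` helper below are illustrative assumptions and are not taken from the groceries dataset.

```python
# Toy transactions, invented for illustration only
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Rule {milk} -> {bread}
support_xy = support({"milk", "bread"}, transactions)       # 2 of 5 baskets
confidence = support_xy / support({"milk"}, transactions)   # share of milk baskets with bread
lift = confidence / support({"bread"}, transactions)        # confidence relative to bread's popularity

print(f"support={support_xy:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

In this toy data the lift is slightly above 1, so bread is only a little more likely in baskets that contain milk than in baskets overall.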

In [1]:
!pip install apyori

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5976 sha256=7d6f2193557aa868f97bd68dafe297629321766acdfcba6bc0de5e49e820e8f6
  Stored in directory: /root/.cache/pip/wheels/32/2a/54/10c595515f385f3726642b10c60bf788029e8f3a1323e3913a
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [7]:
# header=None keeps the file's own header line as the first data row
df = pd.read_csv('groceries - groceries.csv', header=None)

In [8]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,23,24,25,26,27,28,29,30,31,32
0,Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,...,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
1,4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,...,,,,,,,,,,
2,3,tropical fruit,yogurt,coffee,,,,,,,...,,,,,,,,,,
3,1,whole milk,,,,,,,,,...,,,,,,,,,,
4,4,pip fruit,yogurt,cream cheese,meat spreads,,,,,,...,,,,,,,,,,


In [18]:
df = df.drop(0, axis=1)
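The first `df.head()` output shows the file's own header line ('Item(s)', 'Item 1', ...) stored as data row 0, yet the outputs after this step start at index 1 and `df.shape` reports 9835 rows. That row was presumably removed in a cell that is not shown. A minimal sketch of both clean-up steps, run on a made-up in-memory stand-in for the file (the CSV text and the name `toy` are assumptions for illustration):

```python
import io
import pandas as pd

# Made-up stand-in with the same layout as the groceries file:
# a header line, then an item count followed by the items of each basket
csv_text = (
    "Item(s),Item 1,Item 2,Item 3\n"
    "2,citrus fruit,margarine,\n"
    "1,whole milk,,\n"
)
toy = pd.read_csv(io.StringIO(csv_text), header=None)

toy = toy.drop(0, axis=0)  # drop the header line that was read as data row 0
toy = toy.drop(0, axis=1)  # drop the item-count column, as in the cell above

print(toy.shape)
print(toy)
```

Reading the file with pandas' default header handling instead of `header=None` would avoid the first step, at the cost of different column labels.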

In [19]:
df.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,23,24,25,26,27,28,29,30,31,32
1,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,...,,,,,,,,,,
2,tropical fruit,yogurt,coffee,,,,,,,,...,,,,,,,,,,
3,whole milk,,,,,,,,,,...,,,,,,,,,,
4,pip fruit,yogurt,cream cheese,meat spreads,,,,,,,...,,,,,,,,,,
5,other vegetables,whole milk,condensed milk,long life bakery product,,,,,,,...,,,,,,,,,,


In [30]:
df.shape

(9835, 32)

In [21]:
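# Build one list of item names per transaction.
# All 32 item columns are read, and empty (NaN) cells are skipped so that
# the string 'nan' is not treated as an item.
transactions = []
for row in df.values:
  transactions.append([str(item) for item in row if pd.notna(item)])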

## Training the Apriori model on the dataset

In [22]:
from apyori import apriori

In [31]:
rules = apriori(transactions=transactions, min_support=0.002, min_confidence=0.1, min_lift=3, min_length=2, max_length=2)
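To make the three thresholds concrete, here is a brute-force sketch that scores every rule between two single items and keeps those passing `min_support`, `min_confidence`, and `min_lift`. It is not the Apriori algorithm itself and not apyori's implementation (there is no candidate pruning); the function name `pair_rules` and the toy baskets are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Brute-force sketch: score every rule X -> Y between two single items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Toy baskets, invented for illustration only
toy = [
    ["beer", "chips"],
    ["beer", "chips", "milk"],
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["milk"],
    ["bread"],
]
toy_rules = pair_rules(toy, min_support=0.3, min_confidence=0.5, min_lift=1.5)
for rule in toy_rules:
    print(rule)
```

In the toy data, milk and bread appear together as often as beer and chips, but only the beer and chips rules survive the lift filter, because milk and bread are popular enough on their own that buying one says little about the other.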

In [32]:
rules

<generator object apriori at 0x7f6907422f20>

## Visualising the results

### Displaying the first results coming directly from the output of the apriori function

In [33]:
results=list(rules)

In [34]:
results

[RelationRecord(items=frozenset({'Instant food products', 'hamburger meat'}), support=0.003050330452465684, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Instant food products'}), items_add=frozenset({'hamburger meat'}), confidence=0.379746835443038, lift=11.42143769597027)]),
 RelationRecord(items=frozenset({'baking powder', 'sugar'}), support=0.003253685815963396, ordered_statistics=[OrderedStatistic(items_base=frozenset({'baking powder'}), items_add=frozenset({'sugar'}), confidence=0.18390804597701146, lift=5.431638535086811)]),
 RelationRecord(items=frozenset({'baking powder', 'whipped/sour cream'}), support=0.004575495678698526, ordered_statistics=[OrderedStatistic(items_base=frozenset({'baking powder'}), items_add=frozenset({'whipped/sour cream'}), confidence=0.25862068965517243, lift=3.607850330154072)]),
 RelationRecord(items=frozenset({'beef', 'flour'}), support=0.0028469750889679717, ordered_statistics=[OrderedStatistic(items_base=frozenset({'flour'}), items_add

### Organising the results into a pandas DataFrame

In [35]:
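def inspect(results):
    # Each result is a RelationRecord: (items, support, ordered_statistics).
    # Only the first ordered statistic of each record is kept, and only the
    # first item on each side (each side holds a single item with max_length=2).
    lhs         = [tuple(result[2][0][0])[0] for result in results]  # items_base
    rhs         = [tuple(result[2][0][1])[0] for result in results]  # items_add
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])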
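The positional indexing in `inspect` can be hard to read. A possible alternative, sketched below, uses the field names visible in the printed records (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`) and emits one row per ordered statistic instead of only the first. The namedtuples here are local stand-ins so the sketch runs without apyori, and `inspect_all` is a hypothetical name; the sample record is copied from the output above.

```python
from collections import namedtuple
import pandas as pd

# Stand-ins mirroring the field names shown in the printed output;
# with apyori installed, the real records from list(rules) would be used instead.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def inspect_all(results):
    """One row per ordered statistic, joining multi-item sides with ', '."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({
                "Left Hand Side": ", ".join(sorted(stat.items_base)),
                "Right Hand Side": ", ".join(sorted(stat.items_add)),
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    return pd.DataFrame(rows)

sample = [
    RelationRecord(
        items=frozenset({"Instant food products", "hamburger meat"}),
        support=0.003050330452465684,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"Instant food products"}),
                items_add=frozenset({"hamburger meat"}),
                confidence=0.379746835443038,
                lift=11.42143769597027,
            )
        ],
    )
]
sample_df = inspect_all(sample)
print(sample_df)
```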

### Displaying the results unsorted

In [36]:
resultsinDataFrame

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
0,Instant food products,hamburger meat,0.00305,0.379747,11.421438
1,baking powder,sugar,0.003254,0.183908,5.431639
2,baking powder,whipped/sour cream,0.004575,0.258621,3.60785
3,flour,beef,0.002847,0.163743,3.120948
4,herbs,beef,0.002847,0.175,3.335514
5,beef,root vegetables,0.017387,0.331395,3.040367
6,grapes,berries,0.002644,0.118182,3.55449
7,berries,whipped/sour cream,0.009049,0.272171,3.796886
8,liquor,bottled beer,0.004677,0.422018,5.240594
9,red/blush wine,bottled beer,0.004881,0.253968,3.15376


### Displaying the results sorted by descending lift

In [37]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
0,Instant food products,hamburger meat,0.00305,0.379747,11.421438
35,liquor,red/blush wine,0.002135,0.192661,10.025484
21,flour,sugar,0.004982,0.28655,8.463112
36,popcorn,salty snack,0.002237,0.309859,8.19211
29,ham,processed cheese,0.00305,0.117188,7.070792
37,processed cheese,white bread,0.004169,0.251534,5.975445
32,pasta,hamburger meat,0.002745,0.182432,5.48692
1,baking powder,sugar,0.003254,0.183908,5.431639
8,liquor,bottled beer,0.004677,0.422018,5.240594
31,ham,white bread,0.005084,0.195312,4.639851
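For readers less familiar with `nlargest`, the sketch below shows that it selects the same rows as sorting by the column in descending order and taking the head. The three rows are copied from the table above; the names `toy`, `top_a`, and `top_b` are illustrative.

```python
import pandas as pd

# Three rows copied from the results table
toy = pd.DataFrame({
    "Left Hand Side": ["baking powder", "Instant food products", "liquor"],
    "Right Hand Side": ["sugar", "hamburger meat", "bottled beer"],
    "Lift": [5.431639, 11.421438, 5.240594],
})

top_a = toy.nlargest(n=2, columns="Lift")
top_b = toy.sort_values("Lift", ascending=False).head(2)

print(top_a)
print(top_b)
```

Both calls keep the original row labels, which is why the index column in the sorted table is out of order.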


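## Conclusion
Each rule estimates how likely a customer who buys the left-hand-side product is to also buy the right-hand-side product. For example, a customer who buys instant food products has about a 38% chance of also buying hamburger meat. The two items appear together in about 0.3% of all transactions, and the rule has a lift of 11.42, meaning hamburger meat is bought roughly 11 times more often in baskets that contain instant food products than in baskets overall. The remaining rules are read the same way.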
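As a sanity check on this reading, the reported figures for the first rule can be turned back into approximate transaction counts. The sketch below uses only numbers from the output above (9835 rows, and the support, confidence, and lift of the rule); the variable names are illustrative.

```python
n_transactions = 9835                 # number of rows reported by df.shape
support_xy = 0.003050330452465684     # instant food products and hamburger meat together
confidence = 0.379746835443038
lift = 11.42143769597027

both = support_xy * n_transactions          # baskets containing both items
lhs = both / confidence                     # baskets containing instant food products
rhs = confidence / lift * n_transactions    # baskets containing hamburger meat

print(round(both), round(lhs), round(rhs))
```

The counts come out at roughly 30 baskets with both items, 79 with instant food products, and 327 with hamburger meat. The high lift therefore rests on a small number of baskets, which is worth keeping in mind when acting on the rule.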