
# Apriori

## Importing the libraries

In [2]:
!pip install apyori

Collecting apyori
  Downloading https://files.pythonhosted.org/packages/5e/62/5ffde5c473ea4b033490617ec5caa80d59804875ad3c3c57c0976533a21a/apyori-1.1.2.tar.gz
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-cp36-none-any.whl size=5975 sha256=c2e27155bb308794ade9a5c0d0575d33969933c93e9844cd5e6c9ec2fdcf4bdc
  Stored in directory: /root/.cache/pip/wheels/5d/92/bb/474bbadbc8c0062b9eb168f69982a0443263f8ab1711a8cad0
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [5]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
dataset.head()

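|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 | chutney |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 3 | turkey | avocado |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |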


### Creating a list of transactions, where each transaction is the list of products one customer bought

In [6]:
# Each row is one basket; dropping the NaN padding leaves only the products that were bought
transactions = [row[row.notnull()].to_list() for _, row in dataset.iterrows()]
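The same list of lists can also be built without pandas, using the standard `csv` module: the cells that pandas loads as NaN are simply empty or missing fields there. The sketch below assumes the file layout shown above (one basket per row, no header row); `read_transactions` is a hypothetical helper name, and the sample rows are copied from the `dataset.head()` output.

```python
import csv
import io


def read_transactions(lines):
    """Turn CSV lines (one basket per row, no header) into lists of item names.

    Empty fields, which pandas would load as NaN, are dropped, so each inner
    list holds only the products that were actually bought.
    """
    return [[item.strip() for item in row if item.strip()] for row in csv.reader(lines)]


# In-memory sample mimicking rows 1-3 of Market_Basket_Optimisation.csv
sample = io.StringIO("burgers,meatballs,eggs,,,\nchutney,,,,,\nturkey,avocado,,,,\n")
transactions = read_transactions(sample)
print(transactions)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```

With the real file, `open('Market_Basket_Optimisation.csv', newline='')` can be passed in place of `sample`.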

## Training the Apriori model on the dataset

In [49]:
from apyori import apriori
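# min_support: minimum fraction of all transactions that must contain the item pair.
#   Here 0.003 ≈ 3 x 7 / 7501 = (purchases per day x days per week) / (transactions in the week),
#   i.e. the pair must be bought together roughly 3 times a day over the week of data.
# min_confidence: minimum fraction of the transactions containing product A that also contain product B.
#   A common starting point is 0.8 (B is bought in 80% of the baskets with A), but it yields few
#   rules on this data, so it can be lowered step by step (e.g. halved); 0.2 works well here.
# min_lift: minimum ratio of that confidence to the overall support of B; 3 is a good lower bound.
# max_length=2 limits itemsets to two products, i.e. one product on each side of a rule.
#   apyori's apriori() only reads min_support, min_confidence, min_lift and max_length,
#   so min_length has no effect.
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)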
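The three thresholds correspond to the standard rule metrics: support(A) is the fraction of transactions containing A, confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). A minimal sketch computing them by hand; the helper names and the eight toy baskets are made up for illustration and are not part of apyori or the dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support of lhs ∪ rhs divided by support of lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence divided by the baseline support of rhs; > 1 means a positive association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Toy baskets, invented for illustration (not from Market_Basket_Optimisation.csv)
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "water"],
    ["pasta", "shrimp"],
    ["water"],
    ["water", "eggs"],
    ["eggs"],
    ["escalope"],
    ["water"],
]
print(support(toy, ["pasta", "escalope"]))       # 2/8 = 0.25
print(confidence(toy, ["pasta"], ["escalope"]))  # 0.25 / 0.375, about 0.667
print(lift(toy, ["pasta"], ["escalope"]))        # 0.667 / 0.375, about 1.778
print(lift(toy, ["escalope"], ["pasta"]))        # same value: lift is symmetric

# The min_support arithmetic from the comment: about 3 purchases a day for 7 days
print(3 * 7 / 7501)  # about 0.0028, rounded up to 0.003
```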

## Visualising the results

### Displaying the raw results returned by the apriori function

In [50]:
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a pandas DataFrame

In [51]:
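def extract_data_from_results(results):
    """Flatten apyori RelationRecords into [lhs, rhs, support, confidence, lift] rows."""
    extracted_data = []
    for result in results:
        # Keep only the first ordered statistic (one direction of the rule) per record
        result_statistics = result.ordered_statistics[0]

        # With max_length=2 each side holds a single item, so take it out of its frozenset
        lhs               = tuple(result_statistics.items_base)[0]
        rhs               = tuple(result_statistics.items_add)[0]
        support           = result.support
        confidence        = result_statistics.confidence
        lift              = result_statistics.lift

        extracted_data.append([lhs, rhs, support, confidence, lift])
    return extracted_data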

df = pd.DataFrame(extract_data_from_results(results), columns=['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
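`extract_data_from_results` keeps only `ordered_statistics[0]` of each record. Because lift is symmetric (lift(A → B) = lift(B → A)), a pair that clears `min_lift` in one direction clears it in the other, so a record can carry both A → B and B → A whenever both pass `min_confidence`, and the second one is then dropped. Below is a sketch of a variant that keeps every direction, assuming the record layout printed above. `RelationRecord` and `OrderedStatistic` here are hypothetical stand-in namedtuples with the same field names, so the example runs without apyori; `extract_all_rules` is likewise a hypothetical name.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names of the apyori records printed above
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])


def extract_all_rules(results):
    """Return one [lhs, rhs, support, confidence, lift] row per ordered statistic."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            # Joining sorted items keeps multi-item sides readable if max_length is raised
            lhs = ", ".join(sorted(stat.items_base))
            rhs = ", ".join(sorted(stat.items_add))
            rows.append([lhs, rhs, record.support, stat.confidence, stat.lift])
    return rows


# Made-up record whose rule passed the thresholds in both directions
example = [
    RelationRecord(
        items=frozenset({"a", "b"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a"}), frozenset({"b"}), 0.25, 3.5),
            OrderedStatistic(frozenset({"b"}), frozenset({"a"}), 0.40, 3.5),
        ],
    )
]
for row in extract_all_rules(example):
    print(row)
# ['a', 'b', 0.01, 0.25, 3.5]
# ['b', 'a', 0.01, 0.4, 3.5]
```

Its rows can be passed to the same `pd.DataFrame(..., columns=[...])` call used above.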

### Displaying the unsorted results

In [52]:
df

|  | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by lift in descending order

In [53]:
df.sort_values(by='Lift', axis=0, ascending=False, inplace=True, ignore_index=True)
df

|  | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 1 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 4 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 7 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 8 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |


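These are the 9 rules that the algorithm found. A customer who buys the product in the Left Hand Side column is roughly 3 to 5 times as likely as the average customer to also buy the product in the Right Hand Side column (lift between 3.11 and 5.16), and the Right Hand Side product appears in about 20% to 38% of the baskets that contain the Left Hand Side product (confidence).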
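For intuition about what `apriori` does with `max_length=2`, here is a simplified, pure-Python sketch of the level-wise Apriori idea restricted to pairs: count single items first, keep only the frequent ones, and count only pairs built from them, since a pair can never be more frequent than either of its items. This is not apyori's implementation; `apriori_pairs` and the toy baskets are hypothetical. Unlike `extract_data_from_results`, it lists both directions of a pair when both pass the thresholds.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(transactions, min_support, min_confidence, min_lift):
    """Level-wise Apriori limited to rules with one item on each side.

    Returns (lhs, rhs, support, confidence, lift) tuples sorted by descending lift.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Level 1: count every item, keep only those meeting min_support
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}

    # Level 2: only pairs of frequent items are candidates (the Apriori pruning step)
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), c in pair_counts.items():
        pair_support = c / n
        if pair_support < min_support:
            continue
        # Score both directions of the pair as separate rules
        for lhs, rhs in ((a, b), (b, a)):
            conf = c / item_counts[lhs]
            lift = conf / (item_counts[rhs] / n)
            if conf >= min_confidence and lift >= min_lift:
                rules.append((lhs, rhs, pair_support, conf, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Toy baskets, invented for illustration
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "water"],
    ["pasta", "shrimp"],
    ["water"],
    ["water", "eggs"],
    ["eggs"],
    ["escalope"],
    ["water"],
]
for rule in apriori_pairs(toy, min_support=0.2, min_confidence=0.5, min_lift=1.5):
    print(rule)
```

On the toy data only the pasta/escalope pair survives (support 0.25, confidence 2/3, lift 16/9 in both directions). On the real data, `apriori_pairs(transactions, 0.003, 0.2, 3)` applies the same thresholds as the apyori call above.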