# Apriori

## Importing the libraries

In [85]:
import pandas as pd
import matplotlib.pyplot as plt

## Data Preprocessing

In [86]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

# Extract the non-missing items from each row as a list of strings.
# Note: dataset.apply(..., axis=1) calls the lambda once per row in Python;
# it is concise, but it is not a vectorized operation.
transactions = dataset.apply(
    lambda row: [str(item) for item in row if pd.notna(item)], axis=1).tolist()

## Training the Apriori model on the dataset

In [87]:
from apyori import apriori

# Keep only products that appear at least 3 times per day.
# All transactions were recorded over one week (7 days), so the
# threshold is 3 * 7 = 21 transactions out of the whole dataset.
min_support = 3 * 7 / len(dataset)

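# Value taken from the course (the rationale is not explained here).
min_confidence = 0.2

# Also taken from the course. Note: apyori reads this threshold from the
# keyword `min_lift`; the misspelled `min_lif` passed below is ignored,
# which is why rules with a lift below 3 appear in the raw output.
min_lif = 3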

# Trying to find the best combinations of two products
max_length = 2

rules = apriori(transactions=transactions,
                min_support=min_support,
                min_confidence=min_confidence,
                min_lif=min_lif,
                max_length=max_length)

# Remove rules with one item
rules = [rule for rule in rules if len(rule.items) >= 2]
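
A lift threshold can also be applied after mining, which makes the filter explicit and easy to check. The following is a minimal sketch, not part of the original notebook: the two namedtuples are hypothetical stand-ins that only mirror the field names of apyori's records, `filter_by_lift` is an illustrative helper, and the sample values are rounded from the first two records in the notebook's raw output.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names of apyori's records,
# so the sketch runs without the library or the dataset.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])


def filter_by_lift(records, min_lift):
    """Keep only the ordered statistics whose lift is >= min_lift."""
    kept = []
    for record in records:
        stats = [s for s in record.ordered_statistics if s.lift >= min_lift]
        if stats:  # drop records left with no qualifying rule
            kept.append(record._replace(ordered_statistics=stats))
    return kept


# Two records with values rounded from the notebook's raw output.
sample = [
    RelationRecord(frozenset({"almonds", "burgers"}), 0.0052, [
        OrderedStatistic(frozenset({"almonds"}), frozenset({"burgers"}),
                         0.2549, 2.9236)]),
    RelationRecord(frozenset({"almonds", "chocolate"}), 0.0060, [
        OrderedStatistic(frozenset({"almonds"}), frozenset({"chocolate"}),
                         0.2941, 1.7951)]),
]

strong = filter_by_lift(sample, min_lift=2.0)
print([sorted(r.items) for r in strong])  # [['almonds', 'burgers']]
```

Filtering per ordered statistic rather than per record matters once `max_length` is raised, because a single record can then carry several rules with different lifts.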

## Visualising the results

### Displaying the results coming directly from the output of the apriori function

In [88]:
results = list(rules)
results[:5]  # show the first 5 records to illustrate the data structure

[RelationRecord(items=frozenset({'almonds', 'burgers'}), support=0.005199306759098787, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'burgers'}), confidence=0.25490196078431376, lift=2.923577382023146)]),
 RelationRecord(items=frozenset({'almonds', 'chocolate'}), support=0.005999200106652446, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'chocolate'}), confidence=0.29411764705882354, lift=1.7950988369310295)]),
 RelationRecord(items=frozenset({'almonds', 'eggs'}), support=0.006532462338354886, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'eggs'}), confidence=0.3202614379084967, lift=1.7821076007059597)]),
 RelationRecord(items=frozenset({'almonds', 'french fries'}), support=0.004399413411545127, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'french fries'}), confidence=0.21568627450980393, lift=1.261

### Organising the results into a pandas DataFrame

In [89]:
# Flatten each record into one tuple per rule:
# (left-hand side, right-hand side, support, confidence, lift)
def inspect(results):
    return [
        (statistic.items_base, statistic.items_add, result.support, statistic.confidence, statistic.lift)
        for result in results
        for statistic in result.ordered_statistics
    ]

resultDataFrame = pd.DataFrame(
    inspect(results), 
    columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])

### Displaying the results sorted by descending lifts

In [94]:
# Show the 10 rules with the highest lift, in descending order
resultDataFrame.nlargest(n=10, columns='Lift') 

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 171 | (fromage blanc) | (honey) | 0.003333 | 0.245098 | 5.164271 |
| 61 | (light cream) | (chicken) | 0.004533 | 0.290598 | 4.843951 |
| 144 | (pasta) | (escalope) | 0.005866 | 0.372881 | 4.700812 |
| 279 | (pasta) | (shrimp) | 0.005066 | 0.322034 | 4.506672 |
| 276 | (whole wheat pasta) | (olive oil) | 0.007999 | 0.271493 | 4.12241 |
| 60 | (extra dark chocolate) | (chicken) | 0.0028 | 0.233333 | 3.889407 |
| 204 | (tomato sauce) | (ground beef) | 0.005333 | 0.377358 | 3.840659 |
| 143 | (mushroom cream sauce) | (escalope) | 0.005733 | 0.300699 | 3.790833 |
| 194 | (herb & pepper) | (ground beef) | 0.015998 | 0.32345 | 3.291994 |
| 217 | (light cream) | (olive oil) | 0.0032 | 0.205128 | 3.11471 |


## Explanation
For example, let's take the combination of products with the highest lift.

In [92]:
resultDataFrame.nlargest(n=1, columns='Lift') 

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 171 | (fromage blanc) | (honey) | 0.003333 | 0.245098 | 5.164271 |


### Support

<img src="apriori_files/support_formula.png" alt="Support" width="600"/>

Support is the proportion of transactions that contain a specific combination of items. The support for `fromage blanc - honey` is 0.003333, or about 0.33%. So roughly 1 in every 300 transactions includes both `fromage blanc` and `honey`.
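
To make the definition concrete, here is a minimal sketch that counts support by hand. The `support` helper and the `toy` transactions are made up for illustration; they are not taken from the dataset or from apyori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy data: 8 transactions, 2 of which contain both items.
toy = [
    ["fromage blanc", "honey"],
    ["fromage blanc", "honey", "eggs"],
    ["fromage blanc", "eggs"],
    ["fromage blanc", "bread"],
    ["honey", "bread"],
    ["eggs"],
    ["bread"],
    ["eggs", "bread"],
]

print(support(toy, {"fromage blanc", "honey"}))  # 2 / 8 = 0.25
```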

### Confidence

<img src="apriori_files/confidence_formula.png" alt="Confidence" width="600"/>

Confidence is the probability that the second item will appear in a transaction if the first one is already present. In this example, if a transaction contains `fromage blanc`, there is approximately a 24.5% chance (0.245) that it will also contain `honey`.
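
The same idea can be sketched in a few lines: confidence looks only at the transactions that already contain the first item. The `confidence` helper and the `toy` transactions below are hypothetical and serve only to illustrate the definition.

```python
def confidence(transactions, base, add):
    """Share of the transactions containing `base` that also contain `add`."""
    base, add = set(base), set(add)
    with_base = [t for t in transactions if base <= set(t)]
    if not with_base:
        raise ValueError("the base itemset never occurs")
    with_both = [t for t in with_base if add <= set(t)]
    return len(with_both) / len(with_base)


# Hypothetical toy data: fromage blanc occurs 4 times, 2 of them with honey.
toy = [
    ["fromage blanc", "honey"],
    ["fromage blanc", "honey", "eggs"],
    ["fromage blanc", "eggs"],
    ["fromage blanc", "bread"],
    ["honey", "bread"],
    ["eggs"],
    ["bread"],
    ["eggs", "bread"],
]

print(confidence(toy, {"fromage blanc"}, {"honey"}))  # 2 / 4 = 0.5
```

Note that confidence is not symmetric: in this toy data, the confidence of `honey` leading to `fromage blanc` is 2 / 3, because `honey` occurs in only 3 transactions.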

### Lift

<img src="apriori_files/lift_formula.png" alt="Lift" width="600"/>

Lift measures how much more likely the second item (`honey`) is to appear in a transaction that contains the first item (`fromage blanc`) than it would be if the two items were independent.

In our case, a lift of 5.16 means that a transaction containing `fromage blanc` is 5.16 times more likely to include `honey` than a transaction chosen at random.

Summary:
- Lift > 1: Positive association (the items are bought together more often than expected if they were independent).
- Lift = 1: No association (the items are bought independently).
- Lift < 1: Negative association (the items are bought together less often than expected if they were independent).
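
As a rough illustration, lift can be computed as the confidence of the rule divided by the support of the second item. The `lift` helper and the `toy` transactions in this sketch are hypothetical. The last lines apply the same relation to the notebook's top rule to recover the overall share of transactions containing `honey`, which the DataFrame does not show directly.

```python
def lift(transactions, base, add):
    """Confidence of base -> add divided by the support of `add`."""
    base, add = set(base), set(add)
    n = len(transactions)
    n_base = sum(1 for t in transactions if base <= set(t))
    n_add = sum(1 for t in transactions if add <= set(t))
    n_both = sum(1 for t in transactions if (base | add) <= set(t))
    if n_base == 0 or n_add == 0:
        raise ValueError("both itemsets must occur at least once")
    return (n_both / n_base) / (n_add / n)


# Hypothetical toy data: confidence is 2/4 and honey's support is 3/8.
toy = [
    ["fromage blanc", "honey"],
    ["fromage blanc", "honey", "eggs"],
    ["fromage blanc", "eggs"],
    ["fromage blanc", "bread"],
    ["honey", "bread"],
    ["eggs"],
    ["bread"],
    ["eggs", "bread"],
]

# (2/4) / (3/8) is about 1.33: a mild positive association.
print(round(lift(toy, {"fromage blanc"}, {"honey"}), 2))

# Applying lift = confidence / support(honey) to the notebook's top rule
# gives the implied support of honey: about 0.0475, i.e. roughly 4.75%.
implied_honey_support = 0.245098 / 5.164271
print(round(implied_honey_support, 4))
```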