# Market Basket Analysis using Apriori Algorithm

**What we did in this notebook:**
1. Loaded the transactional data (data.csv file).

2. Applied the Apriori algorithm:
    - Minimum support was set to 0.02, meaning an itemset must appear in at least 2% of transactions to be kept.

3. Generated frequent itemsets (single products, pairs, and triplets that appear in at least 2% of transactions).

4. Created association rules:
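    - Calculated confidence (of the transactions that contain the antecedent, the share that also contain the consequent).
    - Calculated lift (confidence divided by the consequent's support; a value above 1 means the items co-occur more often than they would by chance).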
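
The three metrics above can be computed by hand on a tiny example. The sketch below is only an illustration in plain Python: the transactions and the helper names (`support`, `confidence`, `lift`) are hypothetical and are not part of `data.csv` or of mlxtend.

```python
# Toy transactions (hypothetical; not taken from data.csv)
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)


s = support({"milk", "bread"})        # 2 of 5 transactions
c = confidence({"milk"}, {"bread"})   # 2 of the 3 milk transactions
l = lift({"milk"}, {"bread"})         # confidence / support(bread)
print(round(s, 3), round(c, 3), round(l, 3))
```

Here milk and bread each appear in 3 of 5 transactions and together in 2, so the lift is only slightly above 1: a weak association.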



In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
# Load dataset
df = pd.read_csv('data.csv', encoding='ISO-8859-1')
df.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |


In [3]:
# Drop rows with missing values and keep positive quantities
df.dropna(inplace=True)
df = df[df['Quantity'] > 0]

In [4]:
# Create a basket for each transaction by country (e.g., UK)
basket = (df[df['Country'] == 'United Kingdom']
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().fillna(0))

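# Convert quantities to booleans: True if the product appears on the invoice.
# A boolean frame avoids the deprecated DataFrame.applymap and is the input type mlxtend recommends.
basket_sets = basket > 0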
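
To see what the basket step produces, here is a minimal sketch of the same `groupby` / `unstack` pattern on a toy frame. The invoice numbers and product names are made up for illustration, and the binarisation uses a boolean comparison, which is one of several equivalent ways to do it.

```python
import pandas as pd

# Hypothetical line items: invoice A1 lists MUG twice, B2 has no PLATE
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1", "B2", "B2"],
    "Description": ["MUG", "PLATE", "MUG", "MUG", "SPOON"],
    "Quantity": [2, 1, 3, 1, 4],
})

# One row per invoice, one column per product, summed quantities as values
toy_basket = (toy.groupby(["InvoiceNo", "Description"])["Quantity"]
              .sum().unstack().fillna(0))

# True where the product appears on the invoice at all
toy_sets = toy_basket > 0
print(toy_sets)
```

Repeated lines for the same product on one invoice are summed before binarising, so each invoice contributes a product at most once.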

In [5]:
# Apply Apriori Algorithm
frequent_itemsets = apriori(basket_sets, min_support=0.02, use_colnames=True)
frequent_itemsets



| | support | itemsets |
|:--|:--|:--|
| 0 | 0.022404 | (3 STRIPEY MICE FELTCRAFT) |
| 1 | 0.037720 | (6 RIBBONS RUSTIC CHARM) |
| 2 | 0.025767 | (60 CAKE CASES VINTAGE CHRISTMAS) |
| 3 | 0.035257 | (60 TEATIME FAIRY CAKE CASES) |
| 4 | 0.026668 | (72 SWEETHEART FAIRY CAKE CASES) |
| ... | ... | ... |
| 230 | 0.023004 | (ROSES REGENCY TEACUP AND SAUCER , PINK REGENC... |
| 231 | 0.025707 | (WHITE HANGING HEART T-LIGHT HOLDER, RED HANGI... |
| 232 | 0.021142 | (ROSES REGENCY TEACUP AND SAUCER , REGENCY CAK... |
| 233 | 0.027509 | (WOODEN PICTURE FRAME WHITE FINISH, WOODEN FRA... |


In [6]:
# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values('lift', ascending=False).head(10)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| 71 | (ROSES REGENCY TEACUP AND SAUCER , PINK REGENC... | (GREEN REGENCY TEACUP AND SAUCER) | 0.023004 | 0.036759 | 0.020482 | 0.890339 | 24.221015 | 1.0 | 0.019636 | 8.783841 | 0.981287 | 0.521407 | 0.886155 | 0.723764 |
| 74 | (GREEN REGENCY TEACUP AND SAUCER) | (ROSES REGENCY TEACUP AND SAUCER , PINK REGENC... | 0.036759 | 0.023004 | 0.020482 | 0.55719 | 24.221015 | 1.0 | 0.019636 | 2.206352 | 0.9953 | 0.521407 | 0.546763 | 0.723764 |
| 75 | (PINK REGENCY TEACUP AND SAUCER) | (ROSES REGENCY TEACUP AND SAUCER , GREEN REGEN... | 0.029611 | 0.02859 | 0.020482 | 0.691684 | 24.192941 | 1.0 | 0.019635 | 3.150691 | 0.987919 | 0.542994 | 0.682609 | 0.704035 |
| 70 | (ROSES REGENCY TEACUP AND SAUCER , GREEN REGEN... | (PINK REGENCY TEACUP AND SAUCER) | 0.02859 | 0.029611 | 0.020482 | 0.716387 | 24.192941 | 1.0 | 0.019635 | 3.421518 | 0.986881 | 0.542994 | 0.707732 | 0.704035 |
| 4 | (GREEN REGENCY TEACUP AND SAUCER) | (PINK REGENCY TEACUP AND SAUCER) | 0.036759 | 0.029611 | 0.024266 | 0.660131 | 22.293137 | 1.0 | 0.023177 | 2.855182 | 0.991593 | 0.57632 | 0.64976 | 0.739802 |
| 5 | (PINK REGENCY TEACUP AND SAUCER) | (GREEN REGENCY TEACUP AND SAUCER) | 0.029611 | 0.036759 | 0.024266 | 0.819473 | 22.293137 | 1.0 | 0.023177 | 5.335706 | 0.984289 | 0.57632 | 0.812583 | 0.739802 |
| 73 | (ROSES REGENCY TEACUP AND SAUCER ) | (GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY... | 0.040723 | 0.024266 | 0.020482 | 0.50295 | 20.726763 | 1.0 | 0.019494 | 1.96305 | 0.992157 | 0.460189 | 0.490589 | 0.673505 |
| 72 | (GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY... | (ROSES REGENCY TEACUP AND SAUCER ) | 0.024266 | 0.040723 | 0.020482 | 0.844059 | 20.726763 | 1.0 | 0.019494 | 6.151553 | 0.975423 | 0.460189 | 0.837439 | 0.673505 |
| 6 | (ROSES REGENCY TEACUP AND SAUCER ) | (GREEN REGENCY TEACUP AND SAUCER) | 0.040723 | 0.036759 | 0.02859 | 0.702065 | 19.099148 | 1.0 | 0.027093 | 3.233057 | 0.987871 | 0.584767 | 0.690695 | 0.739921 |
| 7 | (GREEN REGENCY TEACUP AND SAUCER) | (ROSES REGENCY TEACUP AND SAUCER ) | 0.036759 | 0.040723 | 0.02859 | 0.777778 | 19.099148 | 1.0 | 0.027093 | 4.316746 | 0.983805 | 0.584767 | 0.768344 | 0.739921 |
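
As a sanity check, the confidence, lift, and leverage columns can be re-derived from the three support columns. The sketch below does this for the top rule (index 71), using the rounded values displayed in the output; small differences in the last digits are expected because the displayed supports are rounded to six decimals.

```python
import math

# Values copied from the row with index 71 in the rules output
antecedent_support = 0.023004
consequent_support = 0.036759
rule_support = 0.020482  # support of antecedent and consequent together

# confidence = support(A and C) / support(A)
confidence = rule_support / antecedent_support

# lift = confidence / support(C)
lift = confidence / consequent_support

# leverage = support(A and C) - support(A) * support(C)
leverage = rule_support - antecedent_support * consequent_support

# Compare against the displayed columns, allowing for rounding
print(math.isclose(confidence, 0.890339, rel_tol=1e-3))
print(math.isclose(lift, 24.221015, rel_tol=1e-3))
print(math.isclose(leverage, 0.019636, abs_tol=1e-5))
```

A lift of about 24 means that baskets containing the antecedent items include the green teacup roughly 24 times more often than baskets overall do.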


# Conclusions Drawn from Apriori Results:

| Finding | Meaning | Business Action |
|:--------|:--------|:----------------|
| Products like X and Y were often bought together | Customers commonly buy them in the same cart. | ➔ Bundle these products, give combo offers. |
| High-confidence rules (e.g., 80%) | In the data, 80% of baskets that contained X also contained Y. | ➔ Recommend Y whenever X is added to cart. |
| High-lift rules (lift well above 1) | X and Y appear together far more often than they would if bought independently; a lift of 1 means no association. | ➔ Cross-sell or suggest together in marketing. |
| Some products form triplet bundles (X, Y, Z) | Three products often bought in one go. | ➔ Create "family packs" or bigger bundles for discounts. |
| Some low-support itemsets | Very rare combinations, not very actionable. | ➔ Can ignore or treat as outliers for now. |
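
One way to turn the table above into a shortlist is to filter the rules by confidence and lift. The sketch below is an illustration only: it rebuilds four rows of the rules output (indices 4 to 7) by hand so that it runs on its own, and the thresholds `MIN_CONFIDENCE` and `MIN_LIFT` are hypothetical values, not recommendations from the notebook.

```python
import pandas as pd

GREEN = "GREEN REGENCY TEACUP AND SAUCER"
PINK = "PINK REGENCY TEACUP AND SAUCER"
ROSES = "ROSES REGENCY TEACUP AND SAUCER "  # trailing space as in the data

# Rows 4-7 of the rules output, rebuilt by hand for a standalone example
rules_sample = pd.DataFrame({
    "antecedents": [frozenset({GREEN}), frozenset({PINK}),
                    frozenset({ROSES}), frozenset({GREEN})],
    "consequents": [frozenset({PINK}), frozenset({GREEN}),
                    frozenset({GREEN}), frozenset({ROSES})],
    "confidence": [0.660131, 0.819473, 0.702065, 0.777778],
    "lift": [22.293137, 22.293137, 19.099148, 19.099148],
})

MIN_CONFIDENCE = 0.75  # hypothetical threshold
MIN_LIFT = 1.0         # hypothetical threshold; lift of 1 means no association

# Keep rules that pass both thresholds, most confident first
strong = (rules_sample[(rules_sample["confidence"] >= MIN_CONFIDENCE)
                       & (rules_sample["lift"] > MIN_LIFT)]
          .sort_values("confidence", ascending=False))
print(strong)
```

With these thresholds only the PINK → GREEN and GREEN → ROSES rules remain. Note that a rule and its reverse share the same lift but can differ in confidence, so the direction matters when deciding which product to recommend.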


# Example (Hypothetical based on typical Apriori output)
| Antecedent (If Buy) | Consequent (Then Buy) | Support | Confidence | Lift | Business Action |
|:--------------------|:----------------------|:--------|:-----------|:-----|:----------------|
| Milk | Bread | 0.15 | 0.8 | 2.5 | Cross-sell: "Buy Milk? Get Bread at 10% off!" |
| Pen | Notebook | 0.12 | 0.7 | 1.9 | Bundle: Pen + Notebook combo pack |
| Diapers | Baby Wipes | 0.10 | 0.85 | 3.1 | Recommend: "Customers who bought Diapers also bought Baby Wipes" |


# Final Business Conclusions:
- Suggest related items on the website to increase cart value.

- Create combo offers based on frequent itemsets.

- Design promotions around products with high lift values.

- Avoid wasting marketing on random unrelated items.

- Better organize shelves in physical stores (keep associated items close).

# Important Note
- Apriori finds patterns, but it does not predict.

- It is descriptive analytics: it tells you what has happened, not what will happen.

- It is well suited to marketing, product placement, and bundling decisions.