
# Market Basket Analysis using Apriori Algorithm

This analysis explores customer purchasing patterns in a retail dataset using the Apriori algorithm.
We analyze 7,501 transactions to discover associations between products and identify which items are
frequently bought together.

We will:
- Process transaction data from 'Market_Basket_Optimisation.csv'
- Apply the Apriori algorithm with parameters:
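  * min_support = 0.003 (an item combination must appear in at least 0.3% of transactions)
  * min_confidence = 0.2 (the right-hand item appears in at least 20% of the transactions that contain the left-hand item)
  * min_lift = 3 (the items occur together at least 3 times more often than expected if they were independent)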
- Generate and visualize association rules
- Identify top product combinations by confidence

Expected outcomes:
- Discover strong product associations
- Identify frequently co-purchased items
- Generate actionable insights for product placement and marketing strategies

## Importing the libraries

In [None]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5954 sha256=19e9e232cfa3e3aab055de161b9dd837a4f322a9cf665979fd826dca12473416
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Mount Google Drive so the dataset can be read from Colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# os is needed to change into the folder that holds the dataset
import os
os.chdir('/content/drive/MyDrive/Machine Python and R/Association Rule Learning/Dataset')

# List the files available in the current folder
os.listdir()

['Market_Basket_Optimisation.csv']

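## Data Preprocessing

In [None]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
# apyori expects a list of transactions, each a list of item names (strings).
# Shorter baskets are padded with NaN by pandas, so skip those cells
# instead of turning them into the string 'nan'.
transactions = []
for i in range(len(dataset)):
  transactions.append([str(item) for item in dataset.values[i] if pd.notna(item)])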
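
As a small illustration of this loading step, the sketch below parses a two-row stand-in for the CSV and turns each row into a list of item names. The sample rows and the helper name `read_transactions` are invented for the example; they are not taken from the dataset or the notebook. It uses only the standard library, whereas the notebook reads the file with pandas.

```python
import csv
import io

# Hypothetical two-row sample; the real file holds 7,501 rows of up to 20 items.
sample_csv = "shrimp,almonds,avocado\nburgers,eggs\n"

def read_transactions(handle):
    """Return one list of non-empty item names per CSV row."""
    transactions = []
    for row in csv.reader(handle):
        # Drop empty cells so that padding never becomes an "item".
        transactions.append([cell.strip() for cell in row if cell.strip()])
    return transactions

toy_transactions = read_transactions(io.StringIO(sample_csv))
print(toy_transactions)
```

Each inner list keeps only the items that were actually purchased, so baskets of different sizes stay different lengths.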

## Training the Apriori model on the dataset

In [None]:
from apyori import apriori
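# Keep rules with support >= 0.3%, confidence >= 20% and lift >= 3; max_length = 2 limits itemsets to pairs
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)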
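
To make the three thresholds concrete, here is a minimal pure-Python sketch of pair-level rule mining. It is not apyori's implementation: the function name `pair_rules`, the toy baskets, and the looser thresholds used in the demo are all assumptions made for illustration. It counts how often each item and each pair occurs, then keeps only the rules that clear all three thresholds.

```python
from collections import Counter
from itertools import combinations, permutations

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (lhs, rhs, support, confidence, lift) for every qualifying pair rule."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        frozenset(pair)
        for basket in baskets
        for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n                     # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in permutations(sorted(pair)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # confidence / P(rhs)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Hypothetical baskets: pasta and escalope always appear together, milk is common.
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "milk"],
    ["milk"], ["milk"], ["milk", "eggs"], ["eggs"], ["milk"], ["eggs"],
]
toy_rules = pair_rules(toy, min_support=0.2, min_confidence=0.5, min_lift=3)
for rule in toy_rules:
    print(rule)
```

Only the pasta/escalope pair survives: the pairs involving milk fall below the support threshold, which shows that a frequent item on its own is not enough to produce a rule.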

## Visualising the results

### Displaying the raw results returned by the apriori function

In [None]:
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a Pandas DataFrame

In [None]:
# Function to extract and organize Apriori results into a structured format
def inspect(results):
    # Extract Left Hand Side items (first item in each rule)
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    # Extract Right Hand Side items (second item in each rule)
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    # Extract support values for each rule
    supports    = [result[1] for result in results]
    # Extract confidence values for each rule
    confidences = [result[2][0][2] for result in results]
    # Extract lift values for each rule
    lifts       = [result[2][0][3] for result in results]
    # Combine all extracted values into a list of tuples
    return list(zip(lhs, rhs, supports, confidences, lifts))

# Create DataFrame with organized results and labeled columns
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
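
The positional indexing in `inspect` (for example `result[2][0][0]`) is easier to follow once the shape of a record is spelled out. The sketch below builds stand-in namedtuples with the field names that appear in the printed output above (`items`, `support`, `ordered_statistics`, and `items_base`, `items_add`, `confidence`, `lift`). They are local stand-ins defined here for illustration, not imports from apyori, and `inspect_by_name` is a hypothetical variant that reads the same fields by name.

```python
from collections import namedtuple

# Stand-ins that mirror the field names shown in the printed results.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

# First record from the printed output, rebuilt by hand.
record = RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

def inspect_by_name(results):
    """Same extraction as inspect(), using field names instead of positions."""
    rows = []
    for result in results:
        stat = result.ordered_statistics[0]   # first rule listed for this itemset
        lhs = next(iter(stat.items_base))     # single item, because rules are pairs
        rhs = next(iter(stat.items_add))
        rows.append((lhs, rhs, result.support, stat.confidence, stat.lift))
    return rows

# Positional and named access reach the same value.
assert record[2][0][0] == record.ordered_statistics[0].items_base
named_rows = inspect_by_name([record])
print(named_rows)
```

Like `inspect`, this reads only the first ordered statistic of each record, so a pair that qualified in both directions would contribute only one row.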

### Displaying the unsorted results

In [None]:
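# Confidence: fraction of transactions containing the left-hand item that also contain the right-hand item
# Support: fraction of all transactions that contain both items
# Lift: confidence divided by the overall frequency of the right-hand item (above 1 means a positive association)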
resultsinDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by descending confidence

In [None]:
# Display top 10 rules with highest confidence values (strongest product associations)
resultsinDataFrame.nlargest(n = 10, columns= 'Confidence')

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
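
Confidence is only one way to rank the rules; ranking by lift instead highlights the pairs that deviate most from independence. As a small sketch, the nine rules from the table are re-entered below as plain tuples and ranked by lift with the standard library. In the notebook itself the equivalent call would be `resultsinDataFrame.nlargest(n = 10, columns = 'Lift')`.

```python
# (lhs, rhs, support, confidence, lift), copied from the table of results.
rules_table = [
    ("light cream", "chicken", 0.004533, 0.290598, 4.843951),
    ("mushroom cream sauce", "escalope", 0.005733, 0.300699, 3.790833),
    ("pasta", "escalope", 0.005866, 0.372881, 4.700812),
    ("fromage blanc", "honey", 0.003333, 0.245098, 5.164271),
    ("herb & pepper", "ground beef", 0.015998, 0.32345, 3.291994),
    ("tomato sauce", "ground beef", 0.005333, 0.377358, 3.840659),
    ("light cream", "olive oil", 0.0032, 0.205128, 3.11471),
    ("whole wheat pasta", "olive oil", 0.007999, 0.271493, 4.12241),
    ("pasta", "shrimp", 0.005066, 0.322034, 4.506672),
]

# Sort by the lift column (index 4), highest first.
by_lift = sorted(rules_table, key=lambda rule: rule[4], reverse=True)

for lhs, rhs, support, confidence, lift in by_lift[:3]:
    print(f"{lhs} -> {rhs}: lift={lift:.2f}, confidence={confidence:.2f}")
```

The two rankings disagree: fromage blanc -> honey has the highest lift of the nine rules but the second-lowest confidence, so the choice of sort column changes which rules look most important.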


## Explanation of results

This analysis uses the Apriori algorithm to discover associations between products in shopping baskets. Here is a brief breakdown.

Rules found:
- Each rule pairs a left-hand item with a right-hand item that is frequently bought alongside it
- The analysis is restricted to pairs of items (min_length = 2, max_length = 2)
- Rules are evaluated using three metrics:
  * Support: fraction of all transactions that contain both items
  * Confidence: fraction of transactions containing the left-hand item that also contain the right-hand item
  * Lift: confidence divided by the overall frequency of the right-hand item; values above 1 mean the items occur together more often than they would if independent

Parameters used:
- Minimum support: 0.003 (an item pair must appear in at least 0.3% of transactions, which is 23 or more of the 7,501)
- Minimum confidence: 0.2 (the right-hand item must appear in at least 20% of the transactions that contain the left-hand item)
- Minimum lift: 3 (the pair must occur together at least 3 times more often than expected if the items were independent)
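
As a worked example of the three metrics, take the rule light cream -> chicken from the results. The counts used below (34 joint transactions, 117 containing light cream, 450 containing chicken) are not printed anywhere in the notebook; they are back-calculated from the reported support, confidence, and lift over 7,501 transactions, so treat them as an inference rather than as data read from the file.

```python
import math

n_transactions = 7501   # stated in the introduction
n_both = 34             # inferred: baskets with light cream and chicken
n_light_cream = 117     # inferred: baskets containing light cream
n_chicken = 450         # inferred: baskets containing chicken

support = n_both / n_transactions                  # P(light cream and chicken)
confidence = n_both / n_light_cream                # P(chicken | light cream)
lift = confidence / (n_chicken / n_transactions)   # confidence / P(chicken)

# Compare with the values reported for this rule in the raw results.
assert math.isclose(support, 0.004532728969470737)
assert math.isclose(confidence, 0.29059829059829057)
assert math.isclose(lift, 4.84395061728395)

print(f"support={support:.6f} confidence={confidence:.6f} lift={lift:.6f}")
```

Read this way, a lift of about 4.8 says that a basket containing light cream is roughly 4.8 times as likely to contain chicken as a basket picked at random.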

## Conclusion

The algorithm identified nine product associations in the market basket data that meet all three thresholds. They are:
- Displayed in a DataFrame with a Left Hand Side (the item already in the basket) and a Right Hand Side (the associated item)
- Sorted by confidence, so the rules most likely to hold appear first
- Filtered to keep only strong relationships (lift of at least 3)

Business applications:
- Product placement optimization
- Promotional strategy development
- Cross-selling recommendations
- Inventory management
- Store layout planning

The results can be used to:
- Design targeted promotions
- Optimize store layouts
- Improve stock management
- Create effective product bundles
- Enhance customer shopping experience