# Market Basket Analysis with Apriori Algorithm

### Support, Confidence, Lift
- Support:  
    - proportion of transactions in the dataset that contain every item in the rule (both sides together).
    - frequency of occurrence of a rule in the dataset. 
    - high support value indicates that the rule occurs frequently in the dataset.

- Confidence: 
    - conditional probability that a transaction containing the antecedent (items on the left-hand side of the rule) also contains the consequent (items on the right-hand side of the rule). 
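    - measures how reliably the consequent follows from the antecedent.
    - a high confidence value means most transactions containing the antecedent also contain the consequent. It does not account for how common the consequent is overall, which is what lift corrects for.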

- Lift: 
    - ratio of observed support to expected support if the items in the rule were statistically independent. 
    - lift > 1 means the items occur together more often than expected under independence.
    - lift < 1 means the items occur together less often than expected under independence.
    - lift = 1 means the antecedent and the consequent occur together exactly as often as independence predicts.
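
The three definitions above can be written compactly as formulas. The symbols are introduced here for illustration and are not part of the original notes: $A$ is the antecedent, $B$ the consequent, $N$ the number of transactions, and $\mathrm{count}(X)$ the number of transactions that contain every item in $X$.

$$
\begin{aligned}
\mathrm{support}(A \Rightarrow B) &= \frac{\mathrm{count}(A \cup B)}{N} \\
\mathrm{confidence}(A \Rightarrow B) &= \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)} = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)\,\mathrm{support}(B)}
\end{aligned}
$$

The last form is symmetric in $A$ and $B$, so a rule and its reverse always share the same lift even when their confidences differ.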


In [52]:
!pip3 install apyori

Defaulting to user installation because normal site-packages is not writeable


In [None]:
from apyori import apriori

In [40]:

# A toy dataset with transactions
dataset = [
    ['egg', 'bread', 'milk', 'cheese'],
    ['bread', 'milk', 'egg'],
    ['beer', 'milk', 'cheese'],
    ['beer', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'],
    ['egg', 'bread', 'butter'],
    ['beer', 'bread', 'milk', 'cheese'],
    ['beer', 'bread', 'milk'],
    ['egg', 'milk', 'cheese'],
    ['egg', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'],
    ['egg', 'bread', 'butter'],
    ['beer', 'butter', 'cheese']
]

## Let's check some rules
- min_support: minimum support (fraction of transactions) for an itemset to be considered frequent.
- min_confidence: minimum confidence for a rule to be reported.
- min_lift: minimum lift for a rule to be considered interesting.
- min_length: intended minimum number of items in an itemset. Note that apyori does not appear to read this argument (it accepts max_length instead), which is why single-item results still show up in the output below.

#### In the example below, min_support=0.5, min_confidence=0.6, min_lift=1, and min_length=2.
- This keeps itemsets that occur in at least 50% of the transactions and reports only rules with at least 60% confidence and a lift of at least 1.
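
To make the effect of min_support concrete, here is a minimal level-wise sketch of the frequent-itemset step. It is an illustration of the Apriori idea (join frequent k-itemsets, prune candidates with an infrequent subset), not apyori's actual implementation. The function name `frequent_itemsets` is hypothetical, and the dataset is repeated so the cell runs on its own.

```python
from itertools import combinations

# Same toy dataset as above, repeated so this cell is self-contained.
dataset = [
    ['egg', 'bread', 'milk', 'cheese'], ['bread', 'milk', 'egg'],
    ['beer', 'milk', 'cheese'], ['beer', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'], ['egg', 'bread', 'butter'],
    ['beer', 'bread', 'milk', 'cheese'], ['beer', 'bread', 'milk'],
    ['egg', 'milk', 'cheese'], ['egg', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'], ['egg', 'bread', 'butter'],
    ['beer', 'butter', 'cheese'],
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: build size-k candidates from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


for itemset, supp in sorted(frequent_itemsets(dataset, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(supp, 4))
```

With min_support=0.5 this yields four single-item itemsets: 'bread', 'egg', 'cheese' and 'butter'. The last two are frequent (7 of 13 transactions, about 0.54), but 0.54 is below a confidence threshold of 0.6, which would explain why a run with min_confidence=0.6 does not report them.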

In [49]:
rules = apriori(dataset, min_support=0.5, min_confidence=0.6, min_lift=1, min_length=2)

for rule in rules:
    print(rule)

RelationRecord(items=frozenset({'bread'}), support=0.6153846153846154, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'bread'}), confidence=0.6153846153846154, lift=1.0)])
RelationRecord(items=frozenset({'egg'}), support=0.6153846153846154, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'egg'}), confidence=0.6153846153846154, lift=1.0)])


In the first record:
-  The itemset is the single item 'bread'.
-  The antecedent (items_base) is empty, so this is not a rule between two items; it only restates how often 'bread' occurs.
    -  support = 0.6154, i.e. 61.54% of the transactions (8 of 13) contain 'bread'.
    -  confidence = 0.6154: with an empty antecedent, confidence equals support, so any transaction has a 61.54% chance of containing 'bread'.
    -  lift = 1.0: confidence divided by the support of 'bread' is always 1 when the antecedent is empty, so this value says nothing about dependence on other items.
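
Since single-item records like this carry no information about associations, it can help to drop them and flatten the rest into plain rows. The helper below is a sketch: `rule_rows` and its `min_items` parameter are hypothetical names, and it only assumes that each record exposes the `items`, `support` and `ordered_statistics` fields seen in the printed output above.

```python
def rule_rows(records, min_items=2):
    """Flatten apyori-style records into (antecedent, consequent, support, confidence, lift) rows.

    Records whose itemset has fewer than `min_items` items are skipped,
    which removes the trivial single-item results with an empty antecedent.
    """
    rows = []
    for record in records:
        if len(record.items) < min_items:
            continue
        for stat in record.ordered_statistics:
            rows.append((
                sorted(stat.items_base),    # antecedent
                sorted(stat.items_add),     # consequent
                round(record.support, 4),
                round(stat.confidence, 4),
                round(stat.lift, 4),
            ))
    return rows
```

A possible use is `for row in rule_rows(apriori(dataset, min_support=0.2, min_confidence=0.5)): print(row)`, which prints one line per rule instead of one nested record per itemset.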

## Filter rules related to 'beer' with  min_support=0.2, min_confidence=0.5

In [30]:
# Run Apriori algorithm to find association rules
rules = apriori(dataset, min_support=0.2, min_confidence=0.5)

# Filter rules related to 'beer'
beer_rules = [rule for rule in rules if 'beer' in rule.items]
# Print the beer-related rules
if not beer_rules:
    print("No rules found for 'beer'")
else:
    for rule in beer_rules:
        print(rule)

RelationRecord(items=frozenset({'bread', 'beer'}), support=0.23076923076923078, ordered_statistics=[OrderedStatistic(items_base=frozenset({'beer'}), items_add=frozenset({'bread'}), confidence=0.6, lift=0.9749999999999999)])
RelationRecord(items=frozenset({'cheese', 'beer'}), support=0.23076923076923078, ordered_statistics=[OrderedStatistic(items_base=frozenset({'beer'}), items_add=frozenset({'cheese'}), confidence=0.6, lift=1.1142857142857143)])
RelationRecord(items=frozenset({'milk', 'beer'}), support=0.23076923076923078, ordered_statistics=[OrderedStatistic(items_base=frozenset({'beer'}), items_add=frozenset({'milk'}), confidence=0.6, lift=1.2999999999999998), OrderedStatistic(items_base=frozenset({'milk'}), items_add=frozenset({'beer'}), confidence=0.5, lift=1.2999999999999998)])


In the first record:
-  The itemset is {'bread', 'beer'}.
-  The rule is 'beer' -> 'bread': 60% of the transactions that contain 'beer' also contain 'bread' (confidence = 0.6).
-  The lift is 0.975, slightly below 1.0: 'bread' is marginally less common in 'beer' transactions (60%) than in the dataset overall (61.54%). This is a very weak negative association, close to independence.
- frozenset({'bread', 'beer'}) is the itemset the rule is built from:
    - support = 0.2308, i.e. 23.08% of the transactions in the dataset (3 of 13) contain both items.
    - The ordered_statistics field lists each rule derived from the itemset, with its antecedent (items_base), consequent (items_add), confidence and lift.
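
The numbers for 'beer' -> 'bread' can be checked by hand with a few lines of plain Python. This is a sketch that does not use apyori; the helper name `support` is introduced here for illustration, and the dataset is repeated so the cell runs on its own.

```python
# Same toy dataset as above, repeated so this cell is self-contained.
dataset = [
    ['egg', 'bread', 'milk', 'cheese'], ['bread', 'milk', 'egg'],
    ['beer', 'milk', 'cheese'], ['beer', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'], ['egg', 'bread', 'butter'],
    ['beer', 'bread', 'milk', 'cheese'], ['beer', 'bread', 'milk'],
    ['egg', 'milk', 'cheese'], ['egg', 'bread', 'butter'],
    ['egg', 'butter', 'cheese'], ['egg', 'bread', 'butter'],
    ['beer', 'butter', 'cheese'],
]
baskets = [set(t) for t in dataset]
n = len(baskets)


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / n


antecedent, consequent = {'beer'}, {'bread'}
supp = support(antecedent | consequent)   # 3 of 13 transactions contain both
conf = supp / support(antecedent)         # 3 of the 5 'beer' transactions
lift = conf / support(consequent)         # 0.6 divided by 8/13

print(f"support={supp:.4f} confidence={conf:.4f} lift={lift:.4f}")
# prints: support=0.2308 confidence=0.6000 lift=0.9750
```

Swapping in other antecedent and consequent sets reproduces the remaining rows of the output in the same way.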

## Filter rules related to 'bread' with min_support=0.2, min_confidence=0.5, min_lift=1.1

In [33]:
# Run Apriori algorithm to find association rules
rules = apriori(dataset, min_support=0.2, min_confidence=0.5, min_lift=1.1)

# Filter rules related to 'bread'
bread_rules = [rule for rule in rules if 'bread' in rule.items]
# Print the bread-related rules
if not bread_rules:
    print("No rules found for 'bread'")
else:
    for rule in bread_rules:
        print(rule)

RelationRecord(items=frozenset({'butter', 'egg', 'bread'}), support=0.23076923076923078, ordered_statistics=[OrderedStatistic(items_base=frozenset({'butter', 'bread'}), items_add=frozenset({'egg'}), confidence=0.75, lift=1.21875), OrderedStatistic(items_base=frozenset({'egg', 'bread'}), items_add=frozenset({'butter'}), confidence=0.6, lift=1.1142857142857143)])


In [34]:
### Lower the threshold to min_lift=1 and compare the results

In [35]:
# Run Apriori algorithm to find association rules
rules = apriori(dataset, min_support=0.2, min_confidence=0.5, min_lift=1)

# Filter rules related to 'bread'
bread_rules = [rule for rule in rules if 'bread' in rule.items]
# Print the bread-related rules
if not bread_rules:
    print("No rules found for 'bread'")
else:
    for rule in bread_rules:
        print(rule)

RelationRecord(items=frozenset({'bread'}), support=0.6153846153846154, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'bread'}), confidence=0.6153846153846154, lift=1.0)])
RelationRecord(items=frozenset({'egg', 'bread'}), support=0.38461538461538464, ordered_statistics=[OrderedStatistic(items_base=frozenset({'bread'}), items_add=frozenset({'egg'}), confidence=0.625, lift=1.015625), OrderedStatistic(items_base=frozenset({'egg'}), items_add=frozenset({'bread'}), confidence=0.625, lift=1.015625)])
RelationRecord(items=frozenset({'bread', 'milk'}), support=0.3076923076923077, ordered_statistics=[OrderedStatistic(items_base=frozenset({'bread'}), items_add=frozenset({'milk'}), confidence=0.5, lift=1.0833333333333333), OrderedStatistic(items_base=frozenset({'milk'}), items_add=frozenset({'bread'}), confidence=0.6666666666666666, lift=1.0833333333333333)])
RelationRecord(items=frozenset({'butter', 'egg', 'bread'}), support=0.23076923076923078, ordered_statist

## Filter rules related to 'cheese' and 'milk' with  min_support=0.2, min_confidence=0.2, min_lift=1.2

In [27]:
# Run Apriori algorithm to find association rules
rules = apriori(dataset, min_support=0.2, min_confidence=0.2, min_lift=1.2)


# Filter rules related to 'cheese' and 'milk'
cheese_milk_rules = [rule for rule in rules if ('cheese' in rule.items) and ('milk' in rule.items)]

# Print the cheese-milk rules
for rule in cheese_milk_rules:
    print(rule)

RelationRecord(items=frozenset({'cheese', 'milk'}), support=0.3076923076923077, ordered_statistics=[OrderedStatistic(items_base=frozenset({'cheese'}), items_add=frozenset({'milk'}), confidence=0.5714285714285715, lift=1.2380952380952381), OrderedStatistic(items_base=frozenset({'milk'}), items_add=frozenset({'cheese'}), confidence=0.6666666666666666, lift=1.2380952380952381)])


# TO DO
1. Find the rules related to 'beer' and 'egg' with min_support=0.2, min_confidence=0.5, min_lift=1.2 (and explain the result you get).
2. Check the rules related to 'bread' with min_support=0.5, min_confidence=0.7, min_lift=1.2.
    If no rules are obtained, add transactions to the dataset until some rules appear, then explain the result.

References:

- Liu, B., Hsu, W., & Ma, Y. (1999). Pruning and dynamic ordering of acquired association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (pp. 454-458).
- Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Pearson Education.
- https://www.kaggle.com/code/yugagrawal95/market-basket-analysis-apriori-in-python
- https://roshnirathore11-12.medium.com/market-basket-analysis-using-apriori-algorithm-in-python-874332318cd9
- https://github.com/ymoch/apyori