# Data Mining / Prospecção de Dados

## Sara C. Madeira, 2024/2025

# Project 1 - Pattern Mining

## Logistics 
**_Read Carefully_**

**Students should work in teams of 3 people**. 

Groups with fewer than 3 people might be allowed (with valid justification), but will not have better grades for this reason. 

The quality of the project will dictate its grade, not the number of people working.

**The project's solution should be uploaded in Moodle before the end of `May, 4th (23:59)`.** 

Students should **upload a `.zip` file** containing a folder with all the files necessary for project evaluation. 
Groups should be registered in [Moodle](https://moodle.ciencias.ulisboa.pt/mod/groupselect/view.php?id=139096) and the `zip` file should be identified as `PDnn.zip` where `nn` is the number of your group.

**It is mandatory to produce a Jupyter notebook containing code and text/images/tables/etc describing the solution and the results. Projects not delivered in this format will not be graded. You can use `PD_202425_P1.ipynb` as template. In your `.zip` folder you should also include an HTML version of your notebook with all the outputs.**

**Decisions should be justified and results should be critically discussed.** 

Remember that **your notebook should be as clear and organized as possible**, that is, **only the relevant code and experiments should be presented, not everything you tried and did not work, or is not relevant** (that can be discussed in the text, if relevant)! Tables and figures can be used together with text to summarize results and conclusions, improving understanding, readability and concision. **More does not mean better! The target is quality not quantity!**

_**Project solutions containing only code and outputs without discussions will achieve a maximum grade of 10 out of 20.**_

## Dataset and Tools

The dataset to be analysed is **`Foodmart_2025_DM.csv`**, which is a modified and integrated version of the **Foodmart database**, used in several [Kaggle](https://www.kaggle.com) Pattern Mining competitions, with the goal of finding **actionable patterns** by analysing data from the `FOODmart Ltd` company, a leading supermarket chain. 

`FOODmart Ltd` has different types of stores: Deluxe Supermarkets, Gourmet Supermarkets, Mid-Size Grocerys, Small Grocerys and Supermarkets.

Your **goals** are to find: 
1. **global patterns** (common to all stores) and
2. **local/specific patterns** (related to the type of store).

**`Foodmart_2025_DM.csv`** stores **69549 transactions** from **24 stores**, where **103 different products** can be bought. 

Each transaction (row) has a `STORE_ID` (integer from 1 to 24), and a list of products (items), together with the quantities bought. 

In the transaction highlighted below, a given customer bought 1 unit of soup, 2 of cheese and 1 of wine at store 2.

<img src="Foodmart_2025_DM_Example.png" alt="Foodmart_2025_DM_Example" style="width: 1000px;"/>
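
As a rough illustration of the row layout described above, the sketch below parses one line into a store number and a product-to-quantity mapping. The raw line format (`STORE_ID=2,Soup=1,...`) and the helper name `parse_row` are assumptions made for this example; they are not taken from the dataset file itself.

```python
def parse_row(line):
    """Split one raw row into (store_id, {product: quantity}).

    Assumes comma-separated `name=value` pairs, one of which is STORE_ID.
    """
    store_id = None
    quantities = {}
    for field in line.strip().split(","):
        name, _, value = field.partition("=")
        if name == "STORE_ID":
            store_id = int(value)
        elif name:  # skip empty fields, e.g. from a blank line
            quantities[name] = int(value)
    return store_id, quantities


# Hypothetical row matching the highlighted example: store 2, soup, cheese, wine.
example_line = "STORE_ID=2,Soup=1,Cheese=2,Wine=1"
store_id, quantities = parse_row(example_line)
print(store_id, quantities)
```

Keeping the quantities in a dictionary makes it easy to drop them later (for the global patterns) by using only the dictionary keys.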

In this context, the project has **2 main tasks**:
1. Mining Frequent Itemsets and Association Rules: Ignoring Product Quantities and Stores **(global patterns)**
2. Mining Frequent Itemsets and Association Rules: Looking for Differences between Stores **(local/specific patterns)**

**While doing PATTERN and ASSOCIATION MINING keep in mind the following basic/key questions and BE CREATIVE!**

1. What are the most popular products?
2. Which products are bought together?
3. What are the frequent patterns?
4. Can we find associations showing that when people buy a product or set of products they also buy other product(s)?
5. Are these associations strong? Can we trust them? Are they misleading?
6. Can we analyse these patterns and evaluate these associations to find not only frequent and strong associations, but also interesting patterns and associations?

**In this project you should use [Python 3](https://www.python.org), [Jupyter Notebook](http://jupyter.org) and [`MLxtend`](http://rasbt.github.io/mlxtend/).**

When using `MLxtend`, frequent patterns can be discovered using either `Apriori` or `FP-Growth`. **Choose the pattern mining algorithm to be used.** 

## Team Identification

**GROUP 18**

Students:

* Student 1 - Lloyd D'Silva 64858
* Student 2 - Matei Lupașcu 64471
* Student 3 - Vram Davtyan 64691

## 1. Mining Frequent Itemsets and Association Rules: Ignoring Product Quantities and Stores

In this first task you should load and preprocess the dataset **`Foodmart_2025_DM.csv`** in order to compute frequent itemsets and generate association rules considering all the transactions, regardless of the store, and ignoring product quantities.

### 1.1. Load and Preprocess Dataset

 **Product quantities and stores should not be considered.**

In [1]:
# Preprocessing function
def preprocess_transaction(line):
    items = line.strip().split(',')
    products = []
    for item in items:
        key_value = item.split('=')
        if key_value[0] != 'STORE_ID':  # Skip STORE_ID
            products.append(key_value[0])  # Keep only product name
    return products

# Read and preprocess all transactions
transactions = []
with open('Foodmart_2025_DM.csv', 'r', encoding='utf-8') as file:
    for line in file:
        transaction = preprocess_transaction(line)
        transactions.append(transaction)

# Display example transactions
transactions[:10]

[['Pasta', 'Soup'],
 ['Soup', 'Fresh Vegetables', 'Milk', 'Plastic Utensils'],
 ['Cheese', 'Deodorizers', 'Hard Candy', 'Jam'],
 ['Fresh Vegetables'],
 ['Cleaners', 'Cookies', 'Eggs', 'Preserves'],
 ['Soup', 'Cheese', 'Nasal Sprays'],
 ['Dips', 'Jelly', 'Tofu'],
 ['Cookies', 'Preserves', 'Dips'],
 ['Fresh Vegetables', 'Cleaners', 'Cereal', 'Deli Meats', 'Rice'],
 ['Soup', 'Jelly', 'Flavored Drinks', 'French Fries', 'Spices']]

## 1.2. Compute Frequent Itemsets 
## APRIORI 

* Compute frequent itemsets considering a minimum support S_min. 
* Present frequent itemsets organized by length (number of items). 
* List frequent 1-itemsets, 2-itemsets, 3-itemsets, etc. with support of at least S, where S < S_min.
* Change the minimum support values and discuss the results.

In [2]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori



# Encode the transactions into a dataframe
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

S_min=0.03

# Compute frequent itemsets with a minimum support
frequent_itemsets = apriori(df, min_support=S_min, use_colnames=True)
# Add a column for the length (number of items)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
# Display itemsets organized by size
print(f'Here are frequent itemsets using the min_support {S_min}')


for i in range(1, frequent_itemsets['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets[(frequent_itemsets['length'] == i)])

Here are frequent itemsets using the min_support 0.03

Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.053962,(Batteries),1
1,0.040633,(Bologna),1
2,0.078549,(Canned Vegetables),1
3,0.054293,(Cereal),1
4,0.117802,(Cheese),1
5,0.064717,(Chips),1
6,0.066716,(Chocolate Candy),1
7,0.039771,(Cleaners),1
8,0.052912,(Coffee),1
9,0.105408,(Cookies),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
50,0.031144,"(Fresh Vegetables, Cheese)",2
51,0.035227,"(Dried Fruit, Fresh Vegetables)",2
52,0.050914,"(Fresh Fruit, Fresh Vegetables)",2
53,0.035443,"(Soup, Fresh Vegetables)",2


### Frequent 1-itemsets, 2-itemsets, 3-itemsets, etc. with support of at least S = 0.004 (S < S_min)

In [3]:
def freq_itemsets_by_support(S_min):
    # Compute frequent itemsets with a minimum support
    frequent_itemsets = apriori(df, min_support=S_min, use_colnames=True)

    # Add a column for the length (number of items)
    frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))

    # Display itemsets organized by size
    for i in range(1, frequent_itemsets['length'].max() + 1):
        print(f"\nFrequent {i}-itemsets:")
        display(frequent_itemsets[(frequent_itemsets['length'] == i)])


supp_val=0.004
print(f'Here are frequent itemsets using the min_support {supp_val}')
freq_itemsets_by_support(supp_val)

Here are frequent itemsets using the min_support 0.004

Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.014407,(Acetominifen),1
1,0.014321,(Anchovies),1
2,0.026672,(Aspirin),1
3,0.013357,(Auto Magazines),1
4,0.013444,(Bagels),1
...,...,...,...
97,0.013933,(Toothbrushes),1
98,0.027808,(Tuna),1
99,0.054595,(Waffles),1
100,0.080677,(Wine),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.004558,"(Fresh Vegetables, Acetominifen)",2
103,0.008613,"(Aspirin, Fresh Vegetables)",2
104,0.005593,"(Batteries, Cheese)",2
105,0.005234,"(Batteries, Cookies)",2
106,0.006068,"(Batteries, Dried Fruit)",2
...,...,...,...
501,0.006068,"(Soda, Soup)",2
502,0.004198,"(Soup, Spices)",2
503,0.005723,"(Soup, TV Dinner)",2
504,0.006988,"(Waffles, Soup)",2



Frequent 3-itemsets:


Unnamed: 0,support,itemsets,length
506,0.004026,"(Fresh Fruit, Canned Vegetables, Fresh Vegetab...",3
507,0.0045,"(Dried Fruit, Fresh Vegetables, Cheese)",3
508,0.005334,"(Fresh Fruit, Fresh Vegetables, Cheese)",3
509,0.004227,"(Soup, Fresh Vegetables, Cheese)",3
510,0.00555,"(Fresh Fruit, Cookies, Fresh Vegetables)",3
511,0.004946,"(Fresh Fruit, Dried Fruit, Fresh Vegetables)",3
512,0.004673,"(Fresh Fruit, Paper Wipes, Fresh Vegetables)",3
513,0.007045,"(Fresh Fruit, Soup, Fresh Vegetables)",3
514,0.004299,"(Fresh Fruit, Fresh Vegetables, Wine)",3


## Comparison of different support values

In [4]:
supp_val=0.01
print(f'Here are frequent itemsets using the min_support {supp_val}')
freq_itemsets_by_support(supp_val)
####################
supp_val=0.05
print(f'Here are frequent itemsets using the min_support {supp_val}')
freq_itemsets_by_support(supp_val)

Here are frequent itemsets using the min_support 0.01

Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.014407,(Acetominifen),1
1,0.014321,(Anchovies),1
2,0.026672,(Aspirin),1
3,0.013357,(Auto Magazines),1
4,0.013444,(Bagels),1
...,...,...,...
97,0.013933,(Toothbrushes),1
98,0.027808,(Tuna),1
99,0.054595,(Waffles),1
100,0.080677,(Wine),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.010798,"(Fresh Fruit, Batteries)",2
103,0.015054,"(Batteries, Fresh Vegetables)",2
104,0.011948,"(Bologna, Fresh Vegetables)",2
105,0.012193,"(Fresh Fruit, Canned Vegetables)",2
106,0.022042,"(Canned Vegetables, Fresh Vegetables)",2
...,...,...,...
173,0.011258,"(Fresh Vegetables, Spices)",2
174,0.012222,"(Fresh Vegetables, TV Dinner)",2
175,0.013990,"(Waffles, Fresh Vegetables)",2
176,0.020475,"(Fresh Vegetables, Wine)",2


Here are frequent itemsets using the min_support 0.05

Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.053962,(Batteries),1
1,0.078549,(Canned Vegetables),1
2,0.054293,(Cereal),1
3,0.117802,(Cheese),1
4,0.064717,(Chips),1
5,0.066716,(Chocolate Candy),1
6,0.052912,(Coffee),1
7,0.105408,(Cookies),1
8,0.05425,(Cooking Oil),1
9,0.053602,(Deli Meats),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
31,0.050914,"(Fresh Fruit, Fresh Vegetables)",2


### Frequent Itemset Analysis 1.2

For this analysis, we initially set the minimum support to 3%. With this setting, we found 50 1-itemsets and 4 2-itemsets. Interestingly, we didn’t observe any 3-itemsets at this level of support.

Next, we lowered the minimum support to 0.004, which is much smaller than our initial 3%. This resulted in a much higher number of frequent itemsets. Specifically, we saw 102 1-itemsets, 404 2-itemsets, and 9 3-itemsets. This shows how the number of itemsets increases as we lower the support threshold.

We also tested two other support values, 0.01 and 0.05, to see how sensitive the results are to the threshold. The effect is large: raising the minimum support from 0.01 to 0.05 reduced the number of 1-itemsets from 102 to 31 and the number of 2-itemsets from 75 to 1. At 0.01 there are already no 3-itemsets left, compared with 9 at 0.004.

In summary, the number of frequent itemsets depends strongly on the support threshold, and the longer itemsets are the first to disappear as the threshold rises.
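
To make the thresholds used above more concrete, a relative minimum support can be converted into the number of transactions an itemset must appear in. The short sketch below does this for the four values we tried, using the 69549 transactions stated in the dataset description; the helper name `min_count` is ours.

```python
import math

N_TRANSACTIONS = 69549  # number of transactions given in the dataset description


def min_count(min_support, n_transactions=N_TRANSACTIONS):
    """Smallest number of transactions an itemset needs to reach min_support."""
    return math.ceil(min_support * n_transactions)


for s in (0.004, 0.01, 0.03, 0.05):
    print(f"min_support={s:<5} -> at least {min_count(s)} transactions")
```

Seen this way, moving from 0.004 to 0.05 raises the bar from a few hundred transactions to several thousand, which helps explain why so few pairs survive at the higher thresholds.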


## FPGROWTH

In [5]:
from mlxtend.frequent_patterns import fpgrowth


S_min=0.03
# Compute frequent itemsets with FPGrowth using same support threshold
frequent_itemsets_fp = fpgrowth(df, min_support=S_min, use_colnames=True)

# Add a column for the length (number of items)
frequent_itemsets_fp['length'] = frequent_itemsets_fp['itemsets'].apply(lambda x: len(x))

# Display itemsets organized by size
for i in range(1, frequent_itemsets_fp['length'].max() + 1):
    print(f"\nFPGrowth Frequent {i}-itemsets:")
    display(frequent_itemsets_fp[frequent_itemsets_fp['length'] == i])




FPGrowth Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.120059,(Soup),1
1,0.049217,(Pasta),1
2,0.284174,(Fresh Vegetables),1
3,0.066313,(Milk),1
4,0.039814,(Plastic Utensils),1
5,0.117802,(Cheese),1
6,0.041352,(Jam),1
7,0.105408,(Cookies),1
8,0.06604,(Preserves),1
9,0.064041,(Eggs),1



FPGrowth Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
50,0.035443,"(Soup, Fresh Vegetables)",2
51,0.031144,"(Fresh Vegetables, Cheese)",2
52,0.050914,"(Fresh Fruit, Fresh Vegetables)",2
53,0.035227,"(Dried Fruit, Fresh Vegetables)",2


## Comparison Between Apriori and FPGrowth (For This Dataset)
In our analysis, we applied both the Apriori and FPGrowth algorithms to find frequent itemsets in the transaction data. Here's how they compare in this context:

Apriori: The Apriori algorithm finds frequent itemsets level by level: it builds candidate k-itemsets from the frequent (k-1)-itemsets, discards candidates that contain an infrequent subset, and counts the support of the rest. It produced the frequent 1-itemsets (e.g., 'Fresh Vegetables', 'Cheese', and 'Soup') and the 2-itemsets (e.g., 'Soup' with 'Fresh Vegetables'). Because it generates and counts candidates at every level, it can become slow as the dataset grows or as the support threshold is lowered.

FPGrowth: FPGrowth found the same frequent itemsets with the same supports (e.g., 'Fresh Vegetables', 'Soup', and 'Cheese' as individual items, and pairs like 'Soup' with 'Fresh Vegetables'); only the row order differs. It compresses the transactions into an FP-tree and mines that tree directly, which avoids explicit candidate generation.

In conclusion, for this dataset both algorithms yielded identical frequent itemsets. We did not time them here, but FPGrowth is usually faster, especially at low support thresholds where many itemsets are frequent, which makes it the preferable choice for larger-scale data mining tasks.
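
The level-wise candidate generation that Apriori relies on can be sketched in a few lines of plain Python. This is a simplified illustration on toy data, not the `MLxtend` implementation; the function name `apriori_sketch` and the toy transactions are made up for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise frequent itemset search with subset-based pruning."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in baskets for item in basket})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only the candidates that reach the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions (hypothetical, not taken from the dataset).
toy = [
    ["Soup", "Fresh Vegetables"],
    ["Soup", "Fresh Vegetables", "Cheese"],
    ["Fresh Vegetables", "Cheese"],
    ["Soup"],
    ["Fresh Vegetables"],
]
result = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy run the 3-itemset is never counted, because one of its pairs (Soup, Cheese) is infrequent and the prune step removes it. FPGrowth reaches the same result without building such candidate sets.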

## 1.3. Generate Association Rules from Frequent Itemsets
## APRIORI

Using a minimum support S_min justified by the previous results. 
* Generate association rules with a chosen value (C) for minimum confidence. 
* Generate association rules with a chosen value (L) for minimum lift. 
* Generate association rules with both confidence >= C and lift >= L.
* Change C and L when it makes sense and discuss the results.
* Use other metrics besides confidence and lift.
* Evaluate how good the rules are given the metrics and how interesting they are from your point of view.

In [6]:
from mlxtend.frequent_patterns import association_rules

# Generate all association rules based on frequent itemsets
rules_confidence = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
rules_lift = association_rules(frequent_itemsets, metric="lift", min_threshold=0.9)
rules_both = rules_confidence[rules_confidence['lift'] >= 0.9]

# Try different metrics
rules_by_leverage = association_rules(frequent_itemsets, metric="leverage", min_threshold=0.001)
rules_by_conviction = association_rules(frequent_itemsets, metric="conviction", min_threshold=1.0)

# Display
print("\nRules with confidence >= C:")
display(rules_confidence)

print("\nRules with lift >= L:")
display(rules_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_both)

print("\nRules based on leverage:")
display(rules_by_leverage)

print("\nRules based on conviction:")
display(rules_by_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
1,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
2,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
3,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
7,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
1,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
2,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
3,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
7,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
1,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
2,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
3,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
7,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
1,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251
2,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
3,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
4,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
5,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
1,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251
2,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
3,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
4,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
5,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966


## FPGROWTH

In [7]:
# Generate association rules from FPGrowth itemsets
rules_fp_conf = association_rules(frequent_itemsets_fp, metric="confidence", min_threshold=0.1)
rules_fp_lift = association_rules(frequent_itemsets_fp, metric="lift", min_threshold=0.9)
rules_fp_both = rules_fp_conf[rules_fp_conf['lift'] >= 0.9]

# Try different metrics
rules_fp_by_leverage = association_rules(frequent_itemsets_fp, metric="leverage", min_threshold=0.001)
rules_fp_by_conviction = association_rules(frequent_itemsets_fp, metric="conviction", min_threshold=1.0)

print("\nFPGrowth Rules with confidence >= C:")
display(rules_fp_conf)

print("\nFPGrowth Rules with lift >= L:")
display(rules_fp_lift)

print("\nFPGrowth Rules with confidence >= C and lift >= L:")
display(rules_fp_both)

print("\nRules based on leverage:")
display(rules_fp_by_leverage)

print("\nRules based on conviction:")
display(rules_fp_by_conviction)


FPGrowth Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
1,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966
2,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
3,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
7,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251



FPGrowth Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
1,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966
2,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
3,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
7,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251



FPGrowth Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
1,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966
2,(Fresh Vegetables),(Cheese),0.284174,0.117802,0.031144,0.109593,0.930318,1.0,-0.002333,0.990781,-0.094724,0.083983,-0.009305,0.186983
3,(Cheese),(Fresh Vegetables),0.117802,0.284174,0.031144,0.264372,0.930318,1.0,-0.002333,0.973082,-0.078258,0.083983,-0.027663,0.186983
4,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
5,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
6,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
7,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
1,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966
2,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
3,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
4,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
5,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.120059,0.284174,0.035443,0.29521,1.038835,1.0,0.001325,1.015658,0.042484,0.096105,0.015417,0.209966
1,(Fresh Vegetables),(Soup),0.284174,0.120059,0.035443,0.124722,1.038835,1.0,0.001325,1.005327,0.052224,0.096105,0.005299,0.209966
2,(Fresh Fruit),(Fresh Vegetables),0.175229,0.284174,0.050914,0.290556,1.022457,1.0,0.001118,1.008995,0.02663,0.124639,0.008915,0.23486
3,(Fresh Vegetables),(Fresh Fruit),0.284174,0.175229,0.050914,0.179164,1.022457,1.0,0.001118,1.004794,0.030683,0.124639,0.004771,0.23486
4,(Dried Fruit),(Fresh Vegetables),0.117212,0.284174,0.035227,0.30054,1.057592,1.0,0.001918,1.023398,0.061686,0.096207,0.022863,0.212251
5,(Fresh Vegetables),(Dried Fruit),0.284174,0.117212,0.035227,0.123963,1.057592,1.0,0.001918,1.007706,0.076073,0.096207,0.007647,0.212251


## 1. Rules with Confidence >= 0.1:
These rules are selected based on the confidence metric with a threshold of 0.1.

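Confidence estimates the probability that the consequent appears in a transaction given that the antecedent is present. All 8 rules pass this threshold, but their confidences are low (about 0.11 to 0.30) and close to the support of the consequent, so on its own this filter does not isolate strong associations.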

## 2. Rules with Lift >= 0.9:
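Lift is the ratio between the confidence of the rule and the support of the consequent, i.e. how much more likely the consequent is when the antecedent is present than it is overall. A lift of 1 means independence, values above 1 indicate a positive association, and values below 1 a negative one.

With the minimum threshold set to 0.9, all 8 rules are kept, including the two Cheese / Fresh Vegetables rules with lift 0.93, which indicate a slightly negative association. All lifts lie between 0.93 and 1.06, so these item pairs are close to independent.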

## 3. Rules with Confidence >= 0.1 and Lift >= 0.9:
These are the rules from the confidence-filtered set that also have lift >= 0.9.

With these thresholds the combined filter removes nothing: the same 8 rules appear in all three sets. Stricter values of C and L (for example L > 1) would be needed to separate the more interesting rules.

## 4. Rules Based on Leverage:
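Leverage is the difference between the observed support of the rule and the support expected if antecedent and consequent were independent.

With a minimum threshold of 0.001, 6 rules remain; the two Cheese / Fresh Vegetables rules are dropped because their leverage is negative (-0.002333). The remaining leverages are small (about 0.0011 to 0.0019), so these pairs co-occur only slightly more often than expected.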

## 5. Rules Based on Conviction:
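Conviction compares how often the rule would fail (antecedent without consequent) if the two sides were independent with how often it actually fails. A value of 1 means independence; larger values mean the consequent depends more strongly on the antecedent.

With a minimum threshold of 1.0, the same 6 rules remain, with conviction between 1.005 and 1.023, which is only marginally above independence.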
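
As a sanity check on the definitions above, the four metrics can be recomputed by hand from the three supports of a rule. The sketch below uses the standard formulas and the rounded supports of the (Dried Fruit) -> (Fresh Vegetables) row; the function name `rule_metrics` is ours, and small differences from the table are expected because the inputs are rounded.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Compute common interestingness measures for a rule A -> C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rounded supports copied from the (Dried Fruit) -> (Fresh Vegetables) rule.
metrics = rule_metrics(support_a=0.117212, support_c=0.284174, support_ac=0.035227)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values agree with the table to about three decimal places, which confirms how each column is derived from the supports.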


## Key Insights: 1.3
Raising the thresholds for confidence, lift, or conviction keeps only the stronger associations. Here, requiring lift >= 1 (or positive leverage) already removes the Cheese / Fresh Vegetables rules.

Combining confidence and lift is generally the easiest to interpret in a business context (e.g., cross-selling products), because confidence alone is inflated by very frequent consequents such as Fresh Vegetables (support 0.28).

Leverage and conviction complement lift: leverage measures the deviation from independence in absolute terms, and conviction measures it in terms of how often the rule fails. None of these metrics establishes statistical significance or causality.

## 1.4. Take a Look at Maximal Patterns: Compute Maximal Frequent Itemsets
- discuss their utility compared to frequent patterns
- analyse the association rules they can unravel
## APRIORI

In [8]:
# Maximal frequent itemsets
maximal_itemsets = frequent_itemsets.copy()
maximal_itemsets['is_maximal'] = True

# Check if an itemset is subset of any larger itemset
for idx, row in maximal_itemsets.iterrows():
    for idx2, row2 in maximal_itemsets.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets.at[idx, 'is_maximal'] = False
            break

# Only keep maximal itemsets
maximal_itemsets = maximal_itemsets[maximal_itemsets['is_maximal']]

print("\nMaximal frequent itemsets:")
display(maximal_itemsets)


Maximal frequent itemsets:


Unnamed: 0,support,itemsets,length,is_maximal
0,0.053962,(Batteries),1,True
1,0.040633,(Bologna),1,True
2,0.078549,(Canned Vegetables),1,True
3,0.054293,(Cereal),1,True
5,0.064717,(Chips),1,True
6,0.066716,(Chocolate Candy),1,True
7,0.039771,(Cleaners),1,True
8,0.052912,(Coffee),1,True
9,0.105408,(Cookies),1,True
10,0.05425,(Cooking Oil),1,True


## FPGROWTH

In [9]:
# Identify maximal itemsets from FPGrowth results

maximal_fp = frequent_itemsets_fp.copy()
maximal_fp['is_maximal'] = True

for idx, row in maximal_fp.iterrows():
    for idx2, row2 in maximal_fp.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_fp.at[idx, 'is_maximal'] = False
            break

maximal_fp = maximal_fp[maximal_fp['is_maximal']]

print("\nFPGrowth Maximal frequent itemsets:")
display(maximal_fp)


FPGrowth Maximal frequent itemsets:


Unnamed: 0,support,itemsets,length,is_maximal
1,0.049217,(Pasta),1,True
3,0.066313,(Milk),1,True
4,0.039814,(Plastic Utensils),1,True
6,0.041352,(Jam),1,True
7,0.105408,(Cookies),1,True
8,0.06604,(Preserves),1,True
9,0.064041,(Eggs),1,True
10,0.039771,(Cleaners),1,True
11,0.054868,(Dips),1,True
12,0.039181,(Jelly),1,True


## Maximal Frequent Itemsets

Most maximal itemsets are 1-itemsets, such as (Batteries), (Bologna) and (Canned Vegetables); only a few are 2-itemsets.

With S_min = 0.03 the maximal itemsets are almost the same as the frequent itemsets: the only frequent itemsets that are not maximal are the 1-itemsets contained in a frequent 2-itemset.
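The maximality check performed by the nested loops above can also be expressed with set comparisons. The following is a small standalone sketch on toy itemsets; the function name `maximal_itemsets` is ours. mlxtend also provides an `fpmax` function in `mlxtend.frequent_patterns` that mines maximal itemsets directly, which may be preferable on larger data.

```python
def maximal_itemsets(frequent):
    """Return the itemsets that have no proper superset among `frequent`."""
    itemsets = [frozenset(s) for s in frequent]
    # `s < other` is True only when s is a proper subset of other.
    return [s for s in itemsets if not any(s < other for other in itemsets)]


# Toy example: two of the 1-itemsets are absorbed by the 2-itemset.
toy = [{"Cookies"}, {"Soup"}, {"Fresh Vegetables"}, {"Soup", "Fresh Vegetables"}]
for itemset in maximal_itemsets(toy):
    print(sorted(itemset))
```

The function accepts any iterable of itemsets, so a column such as `frequent_itemsets['itemsets']` could be passed to it as well.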


## 1.5 Conclusions from Mining Frequent Patterns in All Stores (Global Patterns and Rules)

### **Final Conclusion**

In this analysis, we explored frequent itemset mining using both the **Apriori** and **FPGrowth** algorithms, and examined how different support thresholds impact the results. With a **minimum support of 0.03**, we identified 50 **1-itemsets** and 4 **2-itemsets**, with no **3-itemsets**. When the minimum support was lowered to 0.004, the number of frequent itemsets increased sharply, to 102 **1-itemsets**, 404 **2-itemsets**, and 9 **3-itemsets**. Lowering the support threshold therefore captures a wider range of itemsets, but many of the additional ones are rare co-occurrences that may be noise.

Testing with other support values, such as 0.01 and 0.05, confirmed how sensitive the results are to the support threshold. For instance, the number of **1-itemsets** dropped from 102 to 31 as we increased the minimum support from 0.01 to 0.05.
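The effect of the threshold can be reproduced on a toy example. The sketch below counts frequent itemsets per length by brute-force enumeration. It is only meant for a handful of small transactions and is not a substitute for `apriori` or `fpgrowth`. The transactions and the helper `count_frequent_by_length` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def count_frequent_by_length(transactions, min_support, max_len=3):
    """Count frequent itemsets of each length (brute force, toy data only)."""
    n = len(transactions)
    occurrences = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            occurrences.update(combinations(items, k))
    by_length = Counter()
    for itemset, count in occurrences.items():
        if count / n >= min_support:
            by_length[len(itemset)] += 1
    return dict(sorted(by_length.items()))


# Hypothetical transactions, not taken from the Foodmart dataset.
toy_transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B"], ["A"]]
for s_min in (0.2, 0.4, 0.6):
    print(s_min, count_frequent_by_length(toy_transactions, s_min))
```

As the threshold rises, the longer itemsets disappear first, which mirrors the behaviour observed on the real data.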

In addition to frequent itemsets, we explored association rules based on different metrics like **confidence**, **lift**, **leverage**, and **conviction**. We found that higher thresholds for **confidence** and **lift** produced more meaningful and interpretable rules, which are particularly useful in business applications like cross-selling or identifying customer purchasing patterns. **Leverage** and **conviction** complemented these metrics by quantifying, respectively, the absolute departure from independence and the directional dependence of the consequent on the antecedent; neither measures statistical significance or causality.

Lastly, the **maximal itemsets**, mostly **1-itemsets** such as 'Batteries' and 'Bologna', largely coincided with the frequent itemsets found using a minimum support of 0.03. A maximal itemset is a frequent itemset that has no frequent superset, so the maximal itemsets form a compact summary from which every frequent itemset can be recovered as a subset, although not its support.

### **Key Takeaways:**

* The choice of **support** significantly impacts the number of frequent itemsets discovered, with lower thresholds capturing more itemsets but also potentially adding noise.
* **Association rules** based on **confidence** and **lift** provide useful insights for business decision-making, while **leverage** and **conviction** are complementary measures of departure from independence; none of these metrics establishes a causal relationship.
* The **maximal itemsets** give a compact summary of the frequent patterns, which can be valuable for further decision-making or model building.

Overall, the combination of different metrics and algorithms gave a comprehensive view of the frequent itemsets and their relationships, providing actionable insights that could guide future business or research decisions.




## 2. Mining Frequent Itemsets and Association Rules: Looking for Differences between Stores

The 24 stores whose transactions were analysed in Task 1 are in fact **different types of stores**:
* Deluxe Supermarkets: STORE_ID = 8, 12, 13, 17, 19, 21
* Gourmet Supermarkets: STORE_ID = 4, 6
* Mid-Size Grocerys: STORE_ID = 9, 18, 20, 23
* Small Grocerys: STORE_ID = 2, 5, 14, 22
* Supermarkets: STORE_ID = 1, 3, 7, 10, 11, 15, 16

In this context, in this second task you should compute frequent itemsets and association rules for specific groups of stores (specific/local patterns), and then compare the store specific results with those obtained when all transactions were analysed independently of the type of store (global patterns). 

**The goal is to find similarities and differences in buying patterns according to the types of store. Do popular products change? Are there buying patterns specific to the type of store?**
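One way to organise the analysis is a lookup from `STORE_ID` to store type, built directly from the list above. This is only a sketch: the names `STORE_TYPES`, `STORE_ID_TO_TYPE` and `group_by_store_type` are our own. Store IDs that do not appear in the list are skipped, which is an assumption about how unlisted stores should be handled.

```python
# Store types and their STORE_IDs, copied from the task description.
STORE_TYPES = {
    "Deluxe Supermarkets": {8, 12, 13, 17, 19, 21},
    "Gourmet Supermarkets": {4, 6},
    "Mid-Size Grocerys": {9, 18, 20, 23},
    "Small Grocerys": {2, 5, 14, 22},
    "Supermarkets": {1, 3, 7, 10, 11, 15, 16},
}

# Reverse lookup: STORE_ID -> store type.
STORE_ID_TO_TYPE = {
    store_id: store_type
    for store_type, ids in STORE_TYPES.items()
    for store_id in ids
}


def group_by_store_type(transactions_with_store):
    """Split (store_id, products) pairs into one transaction list per store type."""
    groups = {store_type: [] for store_type in STORE_TYPES}
    for store_id, products in transactions_with_store:
        store_type = STORE_ID_TO_TYPE.get(store_id)
        if store_type is not None:  # assumption: ignore IDs missing from the list
            groups[store_type].append(products)
    return groups


# Tiny hypothetical input in the (store_id, products) shape used in this notebook.
sample = [(2, ["Pasta", "Soup"]), (8, ["Wine"]), (4, ["Cheese", "Wine"])]
groups = group_by_store_type(sample)
print({store_type: len(rows) for store_type, rows in groups.items()})
```

Each resulting list can then be encoded and mined in the same way as the full set of transactions.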

### 2.1. Analyse Deluxe Supermarkets and Gourmet Supermarkets

Here you should analyse **both** the transactions from **Deluxe Supermarkets (STORE_ID = 8, 12, 13, 17, 19, 21)** and **Gourmet Supermarkets (STORE_ID = 4, 6)**.

#### 2.1.1. Load/Preprocess the Dataset

In [10]:
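# Re-parse the transactions but keep STORE_ID
def preprocess_transaction_with_store(line):
    """Split one 'key=value,...' line into (store_id, list of product names)."""
    store_id = None
    products = []
    for item in line.strip().split(','):
        if not item:
            continue  # skip empty fields (e.g. blank lines)
        key, _, value = item.partition('=')
        if key == 'STORE_ID':
            store_id = int(value)
        else:
            # Only the product name is kept; the value after '=' is discarded
            products.append(key)
    return store_id, products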

transactions_with_store = []
with open('Foodmart_2025_DM.csv', 'r', encoding='utf-8') as file:
    for line in file:
        store_id, products = preprocess_transaction_with_store(line)
        transactions_with_store.append((store_id, products))

# Show examples
transactions_with_store[:10]


[(2, ['Pasta', 'Soup']),
 (2, ['Soup', 'Fresh Vegetables', 'Milk', 'Plastic Utensils']),
 (2, ['Cheese', 'Deodorizers', 'Hard Candy', 'Jam']),
 (2, ['Fresh Vegetables']),
 (2, ['Cleaners', 'Cookies', 'Eggs', 'Preserves']),
 (2, ['Soup', 'Cheese', 'Nasal Sprays']),
 (2, ['Dips', 'Jelly', 'Tofu']),
 (2, ['Cookies', 'Preserves', 'Dips']),
 (2, ['Fresh Vegetables', 'Cleaners', 'Cereal', 'Deli Meats', 'Rice']),
 (2, ['Soup', 'Jelly', 'Flavored Drinks', 'French Fries', 'Spices'])]

In [11]:
deluxe_ids = {8, 12, 13, 17, 19, 21}
gourmet_ids = {4, 6}

# Filter transactions
deluxe_transactions = [products for store_id, products in transactions_with_store if store_id in deluxe_ids]
gourmet_transactions = [products for store_id, products in transactions_with_store if store_id in gourmet_ids]


#### 2.1.2. Compute Frequent Itemsets

#### DELUXE SUPERMARKETS

##### APRIORI

In [12]:
# Encode
te = TransactionEncoder()
te_ary_deluxe = te.fit(deluxe_transactions).transform(deluxe_transactions)
df_deluxe = pd.DataFrame(te_ary_deluxe, columns=te.columns_)

# Frequent itemsets
S_min = 0.01  
frequent_itemsets_deluxe = apriori(df_deluxe, min_support=S_min, use_colnames=True)
frequent_itemsets_deluxe['length'] = frequent_itemsets_deluxe['itemsets'].apply(lambda x: len(x))

# Organize by length
for i in range(1, frequent_itemsets_deluxe['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets_deluxe[frequent_itemsets_deluxe['length'] == i])



Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.014466,(Acetominifen),1
1,0.013964,(Anchovies),1
2,0.027042,(Aspirin),1
3,0.014003,(Auto Magazines),1
4,0.014697,(Bagels),1
...,...,...,...
97,0.013926,(Toothbrushes),1
98,0.028816,(Tuna),1
99,0.053235,(Waffles),1
100,0.076689,(Wine),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.011496,"(Fresh Fruit, Batteries)",2
103,0.015006,"(Batteries, Fresh Vegetables)",2
104,0.012074,"(Bologna, Fresh Vegetables)",2
105,0.012421,"(Fresh Fruit, Canned Vegetables)",2
106,0.022258,"(Canned Vegetables, Fresh Vegetables)",2
...,...,...,...
169,0.011843,"(Fresh Vegetables, Spices)",2
170,0.013116,"(Fresh Vegetables, TV Dinner)",2
171,0.014042,"(Waffles, Fresh Vegetables)",2
172,0.019519,"(Fresh Vegetables, Wine)",2


##### FPGROWTH

In [13]:
# Frequent itemsets

S_min = 0.01

frequent_itemsets_deluxe_fp = fpgrowth(df_deluxe, min_support=S_min, use_colnames=True)
frequent_itemsets_deluxe_fp['length'] = frequent_itemsets_deluxe_fp['itemsets'].apply(lambda x: len(x))

# Organize by length

for i in range(1, frequent_itemsets_deluxe_fp['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets_deluxe_fp[frequent_itemsets_deluxe_fp['length'] == i])



Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.176291,(Fresh Fruit),1
1,0.121321,(Soup),1
2,0.013656,(Screwdrivers),1
3,0.055433,(Cooking Oil),1
4,0.057439,(Lightbulbs),1
...,...,...,...
97,0.014234,(Gum),1
98,0.013926,(Toothbrushes),1
99,0.014003,(Auto Magazines),1
100,0.012537,(Pots and Pans),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.051499,"(Fresh Fruit, Fresh Vegetables)",2
103,0.022567,"(Fresh Fruit, Soup)",2
104,0.036300,"(Soup, Fresh Vegetables)",2
105,0.016163,"(Cooking Oil, Fresh Vegetables)",2
106,0.014736,"(Lightbulbs, Fresh Vegetables)",2
...,...,...,...
169,0.010840,"(Fresh Fruit, Juice)",2
170,0.013424,"(Pasta, Fresh Vegetables)",2
171,0.013116,"(Fresh Vegetables, TV Dinner)",2
172,0.013309,"(Shampoo, Fresh Vegetables)",2


#### GOURMET SUPERMARKETS

##### APRIORI

In [14]:
# Encode
te = TransactionEncoder()
te_ary_gourmet = te.fit(gourmet_transactions).transform(gourmet_transactions)
df_gourmet = pd.DataFrame(te_ary_gourmet, columns=te.columns_)

# Frequent itemsets
S_min = 0.01
frequent_itemsets_gourmet = apriori(df_gourmet, min_support=S_min, use_colnames=True)
frequent_itemsets_gourmet['length'] = frequent_itemsets_gourmet['itemsets'].apply(lambda x: len(x))

# Organize by length
for i in range(1, frequent_itemsets_gourmet['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets_gourmet[frequent_itemsets_gourmet['length'] == i])


Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.013514,(Acetominifen),1
1,0.014640,(Anchovies),1
2,0.030593,(Aspirin),1
3,0.011261,(Auto Magazines),1
4,0.013701,(Bagels),1
...,...,...,...
97,0.013138,(Toothbrushes),1
98,0.026839,(Tuna),1
99,0.053679,(Waffles),1
100,0.083896,(Wine),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.015015,"(Batteries, Fresh Vegetables)",2
103,0.010886,"(Bologna, Fresh Vegetables)",2
104,0.010698,"(Canned Vegetables, Cookies)",2
105,0.012575,"(Fresh Fruit, Canned Vegetables)",2
106,0.020270,"(Canned Vegetables, Fresh Vegetables)",2
...,...,...,...
176,0.010135,"(Sugar, Fresh Vegetables)",2
177,0.012763,"(Fresh Vegetables, TV Dinner)",2
178,0.013514,"(Waffles, Fresh Vegetables)",2
179,0.022523,"(Fresh Vegetables, Wine)",2


##### FPGROWTH

In [15]:
# Frequent itemsets
frequent_itemsets_gourmet_fp = fpgrowth(df_gourmet, min_support=S_min, use_colnames=True)
frequent_itemsets_gourmet_fp['length'] = frequent_itemsets_gourmet_fp['itemsets'].apply(lambda x: len(x))

# Organize by length
for i in range(1, frequent_itemsets_gourmet_fp['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets_gourmet_fp[frequent_itemsets_gourmet_fp['length'] == i])


Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.176239,(Fresh Fruit),1
1,0.054429,(Sliced Bread),1
2,0.041667,(Ice Cream),1
3,0.035473,(Jelly),1
4,0.013514,(Sardines),1
...,...,...,...
97,0.014640,(Anchovies),1
98,0.014452,(Shellfish),1
99,0.011637,(Gum),1
100,0.013138,(Shrimp),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
102,0.053303,"(Fresh Fruit, Fresh Vegetables)",2
103,0.010135,"(Fresh Fruit, Sliced Bread)",2
104,0.014827,"(Sliced Bread, Fresh Vegetables)",2
105,0.010698,"(Ice Cream, Fresh Vegetables)",2
106,0.012387,"(Jelly, Fresh Vegetables)",2
...,...,...,...
176,0.012763,"(Fresh Vegetables, TV Dinner)",2
177,0.013701,"(Fresh Fruit, Pizza)",2
178,0.015390,"(Pizza, Fresh Vegetables)",2
179,0.013514,"(Lightbulbs, Fresh Vegetables)",2


#### 2.1.3. Generate Association Rules from Frequent Itemsets

#### DELUXE SUPERMARKETS

##### APRIORI

In [16]:
# Association rules
rules_deluxe_conf = association_rules(frequent_itemsets_deluxe, metric="confidence", min_threshold=0.1)
rules_deluxe_lift = association_rules(frequent_itemsets_deluxe, metric="lift", min_threshold=0.9)
rules_deluxe_both = rules_deluxe_conf[rules_deluxe_conf['lift'] >= 0.9]

# Other metrics
rules_deluxe_leverage = association_rules(frequent_itemsets_deluxe, metric="leverage", min_threshold=0.001)
rules_deluxe_conviction = association_rules(frequent_itemsets_deluxe, metric="conviction", min_threshold=1.0)


# Display
print("\nRules with confidence >= C:")
display(rules_deluxe_conf)

print("\nRules with lift >= L:")
display(rules_deluxe_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_deluxe_both)

print("\nRules based on leverage:")
display(rules_deluxe_leverage)

print("\nRules based on conviction:")
display(rules_deluxe_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Fruit),0.054469,0.176291,0.011496,0.211048,1.197156,1.0,0.001893,1.044054,0.174174,0.052428,0.042196,0.138128
1,(Batteries),(Fresh Vegetables),0.054469,0.290360,0.015006,0.275496,0.948808,1.0,-0.000810,0.979484,-0.053982,0.045497,-0.020946,0.163588
2,(Bologna),(Fresh Vegetables),0.041623,0.290360,0.012074,0.290083,0.999048,1.0,-0.000012,0.999611,-0.000994,0.037743,-0.000390,0.165834
3,(Canned Vegetables),(Fresh Fruit),0.076727,0.176291,0.012421,0.161890,0.918312,1.0,-0.001105,0.982817,-0.087880,0.051627,-0.017483,0.116175
4,(Canned Vegetables),(Fresh Vegetables),0.076727,0.290360,0.022258,0.290096,0.999089,1.0,-0.000020,0.999628,-0.000986,0.064549,-0.000373,0.183376
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,(Spices),(Fresh Vegetables),0.041045,0.290360,0.011843,0.288534,0.993711,1.0,-0.000075,0.997433,-0.006556,0.037059,-0.002573,0.164660
82,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
83,(Waffles),(Fresh Vegetables),0.053235,0.290360,0.014042,0.263768,0.908418,1.0,-0.001416,0.963881,-0.096236,0.042608,-0.037472,0.156064
84,(Wine),(Fresh Vegetables),0.076689,0.290360,0.019519,0.254527,0.876592,1.0,-0.002748,0.951933,-0.132302,0.056166,-0.050494,0.160876



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Batteries),0.176291,0.054469,0.011496,0.065208,1.197156,1.0,0.001893,1.011488,0.199933,0.052428,0.011358,0.138128
1,(Batteries),(Fresh Fruit),0.054469,0.176291,0.011496,0.211048,1.197156,1.0,0.001893,1.044054,0.174174,0.052428,0.042196,0.138128
2,(Batteries),(Fresh Vegetables),0.054469,0.290360,0.015006,0.275496,0.948808,1.0,-0.000810,0.979484,-0.053982,0.045497,-0.020946,0.163588
3,(Fresh Vegetables),(Batteries),0.290360,0.054469,0.015006,0.051681,0.948808,1.0,-0.000810,0.997060,-0.070658,0.045497,-0.002949,0.163588
4,(Bologna),(Fresh Vegetables),0.041623,0.290360,0.012074,0.290083,0.999048,1.0,-0.000012,0.999611,-0.000994,0.037743,-0.000390,0.165834
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
127,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
128,(Waffles),(Fresh Vegetables),0.053235,0.290360,0.014042,0.263768,0.908418,1.0,-0.001416,0.963881,-0.096236,0.042608,-0.037472,0.156064
129,(Fresh Vegetables),(Waffles),0.290360,0.053235,0.014042,0.048359,0.908418,1.0,-0.001416,0.994877,-0.124393,0.042608,-0.005149,0.156064
130,(Soup),(Wine),0.121321,0.076689,0.011033,0.090938,1.185808,1.0,0.001729,1.015675,0.178328,0.059006,0.015433,0.117401



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Fruit),0.054469,0.176291,0.011496,0.211048,1.197156,1.0,0.001893,1.044054,0.174174,0.052428,0.042196,0.138128
1,(Batteries),(Fresh Vegetables),0.054469,0.290360,0.015006,0.275496,0.948808,1.0,-0.000810,0.979484,-0.053982,0.045497,-0.020946,0.163588
2,(Bologna),(Fresh Vegetables),0.041623,0.290360,0.012074,0.290083,0.999048,1.0,-0.000012,0.999611,-0.000994,0.037743,-0.000390,0.165834
3,(Canned Vegetables),(Fresh Fruit),0.076727,0.176291,0.012421,0.161890,0.918312,1.0,-0.001105,0.982817,-0.087880,0.051627,-0.017483,0.116175
4,(Canned Vegetables),(Fresh Vegetables),0.076727,0.290360,0.022258,0.290096,0.999089,1.0,-0.000020,0.999628,-0.000986,0.064549,-0.000373,0.183376
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,(Fresh Vegetables),(Soup),0.290360,0.121321,0.036300,0.125017,1.030463,1.0,0.001073,1.004224,0.041658,0.096701,0.004206,0.212111
81,(Spices),(Fresh Vegetables),0.041045,0.290360,0.011843,0.288534,0.993711,1.0,-0.000075,0.997433,-0.006556,0.037059,-0.002573,0.164660
82,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
83,(Waffles),(Fresh Vegetables),0.053235,0.290360,0.014042,0.263768,0.908418,1.0,-0.001416,0.963881,-0.096236,0.042608,-0.037472,0.156064



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Batteries),0.176291,0.054469,0.011496,0.065208,1.197156,1.0,0.001893,1.011488,0.199933,0.052428,0.011358,0.138128
1,(Batteries),(Fresh Fruit),0.054469,0.176291,0.011496,0.211048,1.197156,1.0,0.001893,1.044054,0.174174,0.052428,0.042196,0.138128
2,(Dried Fruit),(Cheese),0.119546,0.11754,0.015276,0.127783,1.087142,1.0,0.001224,1.011743,0.091041,0.06887,0.011607,0.128874
3,(Cheese),(Dried Fruit),0.11754,0.119546,0.015276,0.129964,1.087142,1.0,0.001224,1.011974,0.090834,0.06887,0.011832,0.128874
4,(Soup),(Cheese),0.121321,0.11754,0.01543,0.127186,1.082062,1.0,0.00117,1.011051,0.08631,0.069061,0.01093,0.129231
5,(Cheese),(Soup),0.11754,0.121321,0.01543,0.131277,1.082062,1.0,0.00117,1.01146,0.08594,0.069061,0.01133,0.129231
6,(Deli Meats),(Fresh Vegetables),0.053466,0.29036,0.016665,0.311688,1.073455,1.0,0.00114,1.030987,0.072294,0.050937,0.030055,0.184541
7,(Fresh Vegetables),(Deli Meats),0.29036,0.053466,0.016665,0.057393,1.073455,1.0,0.00114,1.004166,0.096427,0.050937,0.004149,0.184541
8,(Donuts),(Fresh Vegetables),0.042433,0.29036,0.013463,0.317273,1.092688,1.0,0.001142,1.03942,0.088584,0.04216,0.037925,0.18182
9,(Fresh Vegetables),(Donuts),0.29036,0.042433,0.013463,0.046366,1.092688,1.0,0.001142,1.004124,0.119533,0.04216,0.004107,0.18182



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Batteries),0.176291,0.054469,0.011496,0.065208,1.197156,1.0,0.001893,1.011488,0.199933,0.052428,0.011358,0.138128
1,(Batteries),(Fresh Fruit),0.054469,0.176291,0.011496,0.211048,1.197156,1.0,0.001893,1.044054,0.174174,0.052428,0.042196,0.138128
2,(Dried Fruit),(Cheese),0.119546,0.117540,0.015276,0.127783,1.087142,1.0,0.001224,1.011743,0.091041,0.068870,0.011607,0.128874
3,(Cheese),(Dried Fruit),0.117540,0.119546,0.015276,0.129964,1.087142,1.0,0.001224,1.011974,0.090834,0.068870,0.011832,0.128874
4,(Fresh Fruit),(Cheese),0.176291,0.117540,0.021024,0.119256,1.014596,1.0,0.000302,1.001948,0.017465,0.077064,0.001944,0.149060
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71,(Fresh Vegetables),(Soup),0.290360,0.121321,0.036300,0.125017,1.030463,1.0,0.001073,1.004224,0.041658,0.096701,0.004206,0.212111
72,(Fresh Vegetables),(TV Dinner),0.290360,0.041276,0.013116,0.045171,1.094356,1.0,0.001131,1.004079,0.121499,0.041177,0.004062,0.181464
73,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
74,(Soup),(Wine),0.121321,0.076689,0.011033,0.090938,1.185808,1.0,0.001729,1.015675,0.178328,0.059006,0.015433,0.117401


##### FPGROWTH

In [17]:
# Association rules
rules_deluxe_fp_conf = association_rules(frequent_itemsets_deluxe_fp, metric="confidence", min_threshold=0.1)
rules_deluxe_fp_lift = association_rules(frequent_itemsets_deluxe_fp, metric="lift", min_threshold=0.9)
rules_deluxe_fp_both = rules_deluxe_fp_conf[rules_deluxe_fp_conf['lift'] >= 0.9]

# Other metrics
rules_deluxe_fp_leverage = association_rules(frequent_itemsets_deluxe_fp, metric="leverage", min_threshold=0.001)
rules_deluxe_fp_conviction = association_rules(frequent_itemsets_deluxe_fp, metric="conviction", min_threshold=1.0)


# Display
print("\nRules with confidence >= C:")
display(rules_deluxe_fp_conf)

print("\nRules with lift >= L:")
display(rules_deluxe_fp_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_deluxe_fp_both)

print("\nRules based on leverage:")
display(rules_deluxe_fp_leverage)

print("\nRules based on conviction:")
display(rules_deluxe_fp_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176291,0.290360,0.051499,0.292123,1.006070,1.0,0.000311,1.002490,0.007325,0.124048,0.002484,0.234742
1,(Fresh Vegetables),(Fresh Fruit),0.290360,0.176291,0.051499,0.177361,1.006070,1.0,0.000311,1.001301,0.008503,0.124048,0.001299,0.234742
2,(Fresh Fruit),(Soup),0.176291,0.121321,0.022567,0.128009,1.055126,1.0,0.001179,1.007670,0.063428,0.082048,0.007611,0.157009
3,(Soup),(Fresh Fruit),0.121321,0.176291,0.022567,0.186010,1.055126,1.0,0.001179,1.011939,0.059459,0.082048,0.011798,0.157009
4,(Soup),(Fresh Vegetables),0.121321,0.290360,0.036300,0.299205,1.030463,1.0,0.001073,1.012622,0.033644,0.096701,0.012464,0.212111
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,(Juice),(Fresh Fruit),0.052424,0.176291,0.010840,0.206770,1.172886,1.0,0.001598,1.038423,0.155557,0.049752,0.037001,0.134129
82,(Pasta),(Fresh Vegetables),0.048644,0.290360,0.013424,0.275971,0.950446,1.0,-0.000700,0.980127,-0.051956,0.041232,-0.020276,0.161103
83,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
84,(Shampoo),(Fresh Vegetables),0.042742,0.290360,0.013309,0.311372,1.072365,1.0,0.000898,1.030513,0.070495,0.041616,0.029609,0.178603



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176291,0.290360,0.051499,0.292123,1.006070,1.0,0.000311,1.002490,0.007325,0.124048,0.002484,0.234742
1,(Fresh Vegetables),(Fresh Fruit),0.290360,0.176291,0.051499,0.177361,1.006070,1.0,0.000311,1.001301,0.008503,0.124048,0.001299,0.234742
2,(Fresh Fruit),(Soup),0.176291,0.121321,0.022567,0.128009,1.055126,1.0,0.001179,1.007670,0.063428,0.082048,0.007611,0.157009
3,(Soup),(Fresh Fruit),0.121321,0.176291,0.022567,0.186010,1.055126,1.0,0.001179,1.011939,0.059459,0.082048,0.011798,0.157009
4,(Soup),(Fresh Vegetables),0.121321,0.290360,0.036300,0.299205,1.030463,1.0,0.001073,1.012622,0.033644,0.096701,0.012464,0.212111
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
127,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
128,(Shampoo),(Fresh Vegetables),0.042742,0.290360,0.013309,0.311372,1.072365,1.0,0.000898,1.030513,0.070495,0.041616,0.029609,0.178603
129,(Fresh Vegetables),(Shampoo),0.290360,0.042742,0.013309,0.045835,1.072365,1.0,0.000898,1.003242,0.095093,0.041616,0.003231,0.178603
130,(Fresh Vegetables),(Spices),0.290360,0.041045,0.011843,0.040787,0.993711,1.0,-0.000075,0.999731,-0.008840,0.037059,-0.000269,0.164660



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176291,0.290360,0.051499,0.292123,1.006070,1.0,0.000311,1.002490,0.007325,0.124048,0.002484,0.234742
1,(Fresh Vegetables),(Fresh Fruit),0.290360,0.176291,0.051499,0.177361,1.006070,1.0,0.000311,1.001301,0.008503,0.124048,0.001299,0.234742
2,(Fresh Fruit),(Soup),0.176291,0.121321,0.022567,0.128009,1.055126,1.0,0.001179,1.007670,0.063428,0.082048,0.007611,0.157009
3,(Soup),(Fresh Fruit),0.121321,0.176291,0.022567,0.186010,1.055126,1.0,0.001179,1.011939,0.059459,0.082048,0.011798,0.157009
4,(Soup),(Fresh Vegetables),0.121321,0.290360,0.036300,0.299205,1.030463,1.0,0.001073,1.012622,0.033644,0.096701,0.012464,0.212111
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,(Juice),(Fresh Fruit),0.052424,0.176291,0.010840,0.206770,1.172886,1.0,0.001598,1.038423,0.155557,0.049752,0.037001,0.134129
82,(Pasta),(Fresh Vegetables),0.048644,0.290360,0.013424,0.275971,0.950446,1.0,-0.000700,0.980127,-0.051956,0.041232,-0.020276,0.161103
83,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
84,(Shampoo),(Fresh Vegetables),0.042742,0.290360,0.013309,0.311372,1.072365,1.0,0.000898,1.030513,0.070495,0.041616,0.029609,0.178603



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Soup),0.176291,0.121321,0.022567,0.128009,1.055126,1.0,0.001179,1.00767,0.063428,0.082048,0.007611,0.157009
1,(Soup),(Fresh Fruit),0.121321,0.176291,0.022567,0.18601,1.055126,1.0,0.001179,1.011939,0.059459,0.082048,0.011798,0.157009
2,(Soup),(Fresh Vegetables),0.121321,0.29036,0.0363,0.299205,1.030463,1.0,0.001073,1.012622,0.033644,0.096701,0.012464,0.212111
3,(Fresh Vegetables),(Soup),0.29036,0.121321,0.0363,0.125017,1.030463,1.0,0.001073,1.004224,0.041658,0.096701,0.004206,0.212111
4,(Fresh Fruit),(Lightbulbs),0.176291,0.057439,0.011341,0.064333,1.120009,1.0,0.001215,1.007367,0.130083,0.050997,0.007313,0.13089
5,(Lightbulbs),(Fresh Fruit),0.057439,0.176291,0.011341,0.197448,1.120009,1.0,0.001215,1.026362,0.11368,0.050997,0.025685,0.13089
6,(Fresh Fruit),(Paper Wipes),0.176291,0.07908,0.015932,0.090372,1.142787,1.0,0.001991,1.012413,0.151687,0.066538,0.012261,0.145918
7,(Paper Wipes),(Fresh Fruit),0.07908,0.176291,0.015932,0.201463,1.142787,1.0,0.001991,1.031523,0.135675,0.066538,0.030559,0.145918
8,(Deli Meats),(Fresh Vegetables),0.053466,0.29036,0.016665,0.311688,1.073455,1.0,0.00114,1.030987,0.072294,0.050937,0.030055,0.184541
9,(Fresh Vegetables),(Deli Meats),0.29036,0.053466,0.016665,0.057393,1.073455,1.0,0.00114,1.004166,0.096427,0.050937,0.004149,0.184541



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176291,0.290360,0.051499,0.292123,1.006070,1.0,0.000311,1.002490,0.007325,0.124048,0.002484,0.234742
1,(Fresh Vegetables),(Fresh Fruit),0.290360,0.176291,0.051499,0.177361,1.006070,1.0,0.000311,1.001301,0.008503,0.124048,0.001299,0.234742
2,(Fresh Fruit),(Soup),0.176291,0.121321,0.022567,0.128009,1.055126,1.0,0.001179,1.007670,0.063428,0.082048,0.007611,0.157009
3,(Soup),(Fresh Fruit),0.121321,0.176291,0.022567,0.186010,1.055126,1.0,0.001179,1.011939,0.059459,0.082048,0.011798,0.157009
4,(Soup),(Fresh Vegetables),0.121321,0.290360,0.036300,0.299205,1.030463,1.0,0.001073,1.012622,0.033644,0.096701,0.012464,0.212111
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71,(Juice),(Fresh Fruit),0.052424,0.176291,0.010840,0.206770,1.172886,1.0,0.001598,1.038423,0.155557,0.049752,0.037001,0.134129
72,(Fresh Vegetables),(TV Dinner),0.290360,0.041276,0.013116,0.045171,1.094356,1.0,0.001131,1.004079,0.121499,0.041177,0.004062,0.181464
73,(TV Dinner),(Fresh Vegetables),0.041276,0.290360,0.013116,0.317757,1.094356,1.0,0.001131,1.040157,0.089932,0.041177,0.038607,0.181464
74,(Shampoo),(Fresh Vegetables),0.042742,0.290360,0.013309,0.311372,1.072365,1.0,0.000898,1.030513,0.070495,0.041616,0.029609,0.178603


#### GOURMET SUPERMARKETS

##### APRIORI

In [18]:
# Association rules
rules_gourmet_conf = association_rules(frequent_itemsets_gourmet, metric="confidence", min_threshold=0.1)
rules_gourmet_lift = association_rules(frequent_itemsets_gourmet, metric="lift", min_threshold=0.9)
rules_gourmet_both = rules_gourmet_conf[rules_gourmet_conf['lift'] >= 0.9]

# Other metrics
rules_gourmet_leverage = association_rules(frequent_itemsets_gourmet, metric="leverage", min_threshold=0.001)
rules_gourmet_conviction = association_rules(frequent_itemsets_gourmet, metric="conviction", min_threshold=1.0)

# Display
print("\nRules with confidence >= C:")
display(rules_gourmet_conf)

print("\nRules with lift >= L:")
display(rules_gourmet_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_gourmet_both)

print("\nRules based on leverage:")
display(rules_gourmet_leverage)

print("\nRules based on conviction:")
display(rules_gourmet_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.050113,0.288664,0.015015,0.299625,1.037974,1.0,0.000549,1.015651,0.038515,0.046377,0.015410,0.175821
1,(Bologna),(Fresh Vegetables),0.037162,0.288664,0.010886,0.292929,1.014777,1.0,0.000159,1.006033,0.015124,0.034565,0.005997,0.165320
2,(Canned Vegetables),(Cookies),0.073198,0.111111,0.010698,0.146154,1.315385,1.0,0.002565,1.041041,0.258703,0.061622,0.039423,0.121219
3,(Canned Vegetables),(Fresh Fruit),0.073198,0.176239,0.012575,0.171795,0.974785,1.0,-0.000325,0.994634,-0.027152,0.053090,-0.005395,0.121574
4,(Canned Vegetables),(Fresh Vegetables),0.073198,0.288664,0.020270,0.276923,0.959328,1.0,-0.000859,0.983763,-0.043744,0.059341,-0.016505,0.173572
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
89,(Waffles),(Fresh Vegetables),0.053679,0.288664,0.013514,0.251748,0.872116,1.0,-0.001982,0.950664,-0.134165,0.041096,-0.051896,0.149281
90,(Wine),(Fresh Vegetables),0.083896,0.288664,0.022523,0.268456,0.929997,1.0,-0.001695,0.972377,-0.075927,0.064343,-0.028408,0.173240
91,(Soup),(Wine),0.121434,0.083896,0.012387,0.102009,1.215896,1.0,0.002200,1.020170,0.202103,0.064202,0.019772,0.124830



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.050113,0.288664,0.015015,0.299625,1.037974,1.0,0.000549,1.015651,0.038515,0.046377,0.015410,0.175821
1,(Fresh Vegetables),(Batteries),0.288664,0.050113,0.015015,0.052016,1.037974,1.0,0.000549,1.002007,0.051431,0.046377,0.002003,0.175821
2,(Bologna),(Fresh Vegetables),0.037162,0.288664,0.010886,0.292929,1.014777,1.0,0.000159,1.006033,0.015124,0.034565,0.005997,0.165320
3,(Fresh Vegetables),(Bologna),0.288664,0.037162,0.010886,0.037711,1.014777,1.0,0.000159,1.000571,0.020471,0.034565,0.000570,0.165320
4,(Canned Vegetables),(Cookies),0.073198,0.111111,0.010698,0.146154,1.315385,1.0,0.002565,1.041041,0.258703,0.061622,0.039423,0.121219
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
127,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
128,(Fresh Vegetables),(Wine),0.288664,0.083896,0.022523,0.078023,0.929997,1.0,-0.001695,0.993630,-0.095692,0.064343,-0.006411,0.173240
129,(Wine),(Fresh Vegetables),0.083896,0.288664,0.022523,0.268456,0.929997,1.0,-0.001695,0.972377,-0.075927,0.064343,-0.028408,0.173240
130,(Soup),(Wine),0.121434,0.083896,0.012387,0.102009,1.215896,1.0,0.002200,1.020170,0.202103,0.064202,0.019772,0.124830



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.050113,0.288664,0.015015,0.299625,1.037974,1.0,0.000549,1.015651,0.038515,0.046377,0.015410,0.175821
1,(Bologna),(Fresh Vegetables),0.037162,0.288664,0.010886,0.292929,1.014777,1.0,0.000159,1.006033,0.015124,0.034565,0.005997,0.165320
2,(Canned Vegetables),(Cookies),0.073198,0.111111,0.010698,0.146154,1.315385,1.0,0.002565,1.041041,0.258703,0.061622,0.039423,0.121219
3,(Canned Vegetables),(Fresh Fruit),0.073198,0.176239,0.012575,0.171795,0.974785,1.0,-0.000325,0.994634,-0.027152,0.053090,-0.005395,0.121574
4,(Canned Vegetables),(Fresh Vegetables),0.073198,0.288664,0.020270,0.276923,0.959328,1.0,-0.000859,0.983763,-0.043744,0.059341,-0.016505,0.173572
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,(Sugar),(Fresh Vegetables),0.030218,0.288664,0.010135,0.335404,1.161919,1.0,0.001412,1.070329,0.143697,0.032827,0.065707,0.185257
88,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
90,(Wine),(Fresh Vegetables),0.083896,0.288664,0.022523,0.268456,0.929997,1.0,-0.001695,0.972377,-0.075927,0.064343,-0.028408,0.173240
91,(Soup),(Wine),0.121434,0.083896,0.012387,0.102009,1.215896,1.0,0.002200,1.020170,0.202103,0.064202,0.019772,0.124830



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Canned Vegetables),(Cookies),0.073198,0.111111,0.010698,0.146154,1.315385,1.0,0.002565,1.041041,0.258703,0.061622,0.039423,0.121219
1,(Cookies),(Canned Vegetables),0.111111,0.073198,0.010698,0.096284,1.315385,1.0,0.002565,1.025545,0.269737,0.061622,0.024909,0.121219
2,(Dried Fruit),(Cheese),0.118619,0.12012,0.015953,0.134494,1.11966,1.0,0.001705,1.016607,0.121255,0.071609,0.016336,0.133653
3,(Cheese),(Dried Fruit),0.12012,0.118619,0.015953,0.132812,1.11966,1.0,0.001705,1.016368,0.121462,0.071609,0.016104,0.133653
4,(Soup),(Cheese),0.121434,0.12012,0.01783,0.146832,1.222372,1.0,0.003244,1.031308,0.207063,0.079698,0.030358,0.147635
5,(Cheese),(Soup),0.12012,0.121434,0.01783,0.148438,1.222372,1.0,0.003244,1.031711,0.206754,0.079698,0.030736,0.147635
6,(Coffee),(Fresh Vegetables),0.052928,0.288664,0.016517,0.312057,1.081039,1.0,0.001238,1.034004,0.079154,0.050808,0.032886,0.184637
7,(Fresh Vegetables),(Coffee),0.288664,0.052928,0.016517,0.057217,1.081039,1.0,0.001238,1.00455,0.105385,0.050808,0.004529,0.184637
8,(Paper Wipes),(Cookies),0.07958,0.111111,0.010886,0.136792,1.231132,1.0,0.002044,1.029751,0.203971,0.060543,0.028892,0.117383
9,(Cookies),(Paper Wipes),0.111111,0.07958,0.010886,0.097973,1.231132,1.0,0.002044,1.020391,0.211207,0.060543,0.019984,0.117383



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.050113,0.288664,0.015015,0.299625,1.037974,1.0,0.000549,1.015651,0.038515,0.046377,0.015410,0.175821
1,(Fresh Vegetables),(Batteries),0.288664,0.050113,0.015015,0.052016,1.037974,1.0,0.000549,1.002007,0.051431,0.046377,0.002003,0.175821
2,(Bologna),(Fresh Vegetables),0.037162,0.288664,0.010886,0.292929,1.014777,1.0,0.000159,1.006033,0.015124,0.034565,0.005997,0.165320
3,(Fresh Vegetables),(Bologna),0.288664,0.037162,0.010886,0.037711,1.014777,1.0,0.000159,1.000571,0.020471,0.034565,0.000570,0.165320
4,(Canned Vegetables),(Cookies),0.073198,0.111111,0.010698,0.146154,1.315385,1.0,0.002565,1.041041,0.258703,0.061622,0.039423,0.121219
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,(Fresh Vegetables),(Sugar),0.288664,0.030218,0.010135,0.035111,1.161919,1.0,0.001412,1.005071,0.195905,0.032827,0.005045,0.185257
86,(Fresh Vegetables),(TV Dinner),0.288664,0.040916,0.012763,0.044213,1.080588,1.0,0.000952,1.003450,0.104842,0.040284,0.003438,0.178070
87,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
88,(Soup),(Wine),0.121434,0.083896,0.012387,0.102009,1.215896,1.0,0.002200,1.020170,0.202103,0.064202,0.019772,0.124830


## FPGROWTH

In [19]:
# Association rules
rules_gourmet_fp_conf = association_rules(frequent_itemsets_gourmet_fp, metric="confidence", min_threshold=0.1)
rules_gourmet_fp_lift = association_rules(frequent_itemsets_gourmet_fp, metric="lift", min_threshold=0.9)
rules_gourmet_fp_both = rules_gourmet_fp_conf[rules_gourmet_fp_conf['lift'] >= 0.9]

# Other metrics
rules_gourmet_fp_leverage = association_rules(frequent_itemsets_gourmet_fp, metric="leverage", min_threshold=0.001)
rules_gourmet_fp_conviction = association_rules(frequent_itemsets_gourmet_fp, metric="conviction", min_threshold=1.0)

# Display
print("\nRules with confidence >= C:")
display(rules_gourmet_fp_conf)

print("\nRules with lift >= L:")
display(rules_gourmet_fp_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_gourmet_fp_both)

print("\nRules based on leverage:")
display(rules_gourmet_fp_leverage)

print("\nRules based on conviction:")
display(rules_gourmet_fp_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176239,0.288664,0.053303,0.302449,1.047757,1.0,0.002430,1.019763,0.055332,0.129503,0.019380,0.243552
1,(Fresh Vegetables),(Fresh Fruit),0.288664,0.176239,0.053303,0.184655,1.047757,1.0,0.002430,1.010323,0.064077,0.129503,0.010217,0.243552
2,(Sliced Bread),(Fresh Fruit),0.054429,0.176239,0.010135,0.186207,1.056561,1.0,0.000543,1.012249,0.056614,0.045957,0.012101,0.121857
3,(Sliced Bread),(Fresh Vegetables),0.054429,0.288664,0.014827,0.272414,0.943707,1.0,-0.000884,0.977666,-0.059342,0.045169,-0.022844,0.161890
4,(Ice Cream),(Fresh Vegetables),0.041667,0.288664,0.010698,0.256757,0.889467,1.0,-0.001329,0.957071,-0.114787,0.033470,-0.044855,0.146909
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
89,(Pizza),(Fresh Fruit),0.054805,0.176239,0.013701,0.250000,1.418530,1.0,0.004042,1.098348,0.312153,0.063040,0.089542,0.163871
90,(Pizza),(Fresh Vegetables),0.054805,0.288664,0.015390,0.280822,0.972834,1.0,-0.000430,0.989096,-0.028696,0.046911,-0.011024,0.167069
91,(Lightbulbs),(Fresh Vegetables),0.052365,0.288664,0.013514,0.258065,0.893997,1.0,-0.001602,0.958758,-0.111209,0.041261,-0.043016,0.152439



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176239,0.288664,0.053303,0.302449,1.047757,1.0,0.002430,1.019763,0.055332,0.129503,0.019380,0.243552
1,(Fresh Vegetables),(Fresh Fruit),0.288664,0.176239,0.053303,0.184655,1.047757,1.0,0.002430,1.010323,0.064077,0.129503,0.010217,0.243552
2,(Fresh Fruit),(Sliced Bread),0.176239,0.054429,0.010135,0.057508,1.056561,1.0,0.000543,1.003266,0.064986,0.045957,0.003256,0.121857
3,(Sliced Bread),(Fresh Fruit),0.054429,0.176239,0.010135,0.186207,1.056561,1.0,0.000543,1.012249,0.056614,0.045957,0.012101,0.121857
4,(Sliced Bread),(Fresh Vegetables),0.054429,0.288664,0.014827,0.272414,0.943707,1.0,-0.000884,0.977666,-0.059342,0.045169,-0.022844,0.161890
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
127,(Pizza),(Fresh Fruit),0.054805,0.176239,0.013701,0.250000,1.418530,1.0,0.004042,1.098348,0.312153,0.063040,0.089542,0.163871
128,(Pizza),(Fresh Vegetables),0.054805,0.288664,0.015390,0.280822,0.972834,1.0,-0.000430,0.989096,-0.028696,0.046911,-0.011024,0.167069
129,(Fresh Vegetables),(Pizza),0.288664,0.054805,0.015390,0.053316,0.972834,1.0,-0.000430,0.998427,-0.037773,0.046911,-0.001575,0.167069
130,(Peanut Butter),(Fresh Vegetables),0.041104,0.288664,0.012200,0.296804,1.028199,1.0,0.000335,1.011576,0.028601,0.038416,0.011443,0.169533



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176239,0.288664,0.053303,0.302449,1.047757,1.0,0.002430,1.019763,0.055332,0.129503,0.019380,0.243552
1,(Fresh Vegetables),(Fresh Fruit),0.288664,0.176239,0.053303,0.184655,1.047757,1.0,0.002430,1.010323,0.064077,0.129503,0.010217,0.243552
2,(Sliced Bread),(Fresh Fruit),0.054429,0.176239,0.010135,0.186207,1.056561,1.0,0.000543,1.012249,0.056614,0.045957,0.012101,0.121857
3,(Sliced Bread),(Fresh Vegetables),0.054429,0.288664,0.014827,0.272414,0.943707,1.0,-0.000884,0.977666,-0.059342,0.045169,-0.022844,0.161890
5,(Jelly),(Fresh Vegetables),0.035473,0.288664,0.012387,0.349206,1.209734,1.0,0.002148,1.093029,0.179748,0.039735,0.085111,0.196060
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,(Eggs),(Fresh Fruit),0.065315,0.176239,0.011074,0.169540,0.961992,1.0,-0.000438,0.991934,-0.040556,0.048046,-0.008132,0.116187
88,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
89,(Pizza),(Fresh Fruit),0.054805,0.176239,0.013701,0.250000,1.418530,1.0,0.004042,1.098348,0.312153,0.063040,0.089542,0.163871
90,(Pizza),(Fresh Vegetables),0.054805,0.288664,0.015390,0.280822,0.972834,1.0,-0.000430,0.989096,-0.028696,0.046911,-0.011024,0.167069



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176239,0.288664,0.053303,0.302449,1.047757,1.0,0.00243,1.019763,0.055332,0.129503,0.01938,0.243552
1,(Fresh Vegetables),(Fresh Fruit),0.288664,0.176239,0.053303,0.184655,1.047757,1.0,0.00243,1.010323,0.064077,0.129503,0.010217,0.243552
2,(Jelly),(Fresh Vegetables),0.035473,0.288664,0.012387,0.349206,1.209734,1.0,0.002148,1.093029,0.179748,0.039735,0.085111,0.19606
3,(Fresh Vegetables),(Jelly),0.288664,0.035473,0.012387,0.042913,1.209734,1.0,0.002148,1.007773,0.243728,0.039735,0.007714,0.19606
4,(Sugar),(Fresh Vegetables),0.030218,0.288664,0.010135,0.335404,1.161919,1.0,0.001412,1.070329,0.143697,0.032827,0.065707,0.185257
5,(Fresh Vegetables),(Sugar),0.288664,0.030218,0.010135,0.035111,1.161919,1.0,0.001412,1.005071,0.195905,0.032827,0.005045,0.185257
6,(Dried Fruit),(Wine),0.118619,0.083896,0.011824,0.099684,1.188174,1.0,0.001873,1.017535,0.179687,0.062008,0.017233,0.120312
7,(Wine),(Dried Fruit),0.083896,0.118619,0.011824,0.14094,1.188174,1.0,0.001873,1.025983,0.172876,0.062008,0.025325,0.120312
8,(Fresh Fruit),(Wine),0.176239,0.083896,0.018393,0.104366,1.243991,1.0,0.003608,1.022855,0.238098,0.076087,0.022345,0.161803
9,(Wine),(Fresh Fruit),0.083896,0.176239,0.018393,0.219239,1.243991,1.0,0.003608,1.055075,0.214098,0.076087,0.0522,0.161803



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Fruit),(Fresh Vegetables),0.176239,0.288664,0.053303,0.302449,1.047757,1.0,0.002430,1.019763,0.055332,0.129503,0.019380,0.243552
1,(Fresh Vegetables),(Fresh Fruit),0.288664,0.176239,0.053303,0.184655,1.047757,1.0,0.002430,1.010323,0.064077,0.129503,0.010217,0.243552
2,(Fresh Fruit),(Sliced Bread),0.176239,0.054429,0.010135,0.057508,1.056561,1.0,0.000543,1.003266,0.064986,0.045957,0.003256,0.121857
3,(Sliced Bread),(Fresh Fruit),0.054429,0.176239,0.010135,0.186207,1.056561,1.0,0.000543,1.012249,0.056614,0.045957,0.012101,0.121857
4,(Jelly),(Fresh Vegetables),0.035473,0.288664,0.012387,0.349206,1.209734,1.0,0.002148,1.093029,0.179748,0.039735,0.085111,0.196060
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,(TV Dinner),(Fresh Vegetables),0.040916,0.288664,0.012763,0.311927,1.080588,1.0,0.000952,1.033809,0.077760,0.040284,0.032703,0.178070
86,(Fresh Fruit),(Pizza),0.176239,0.054805,0.013701,0.077742,1.418530,1.0,0.004042,1.024871,0.358168,0.063040,0.024267,0.163871
87,(Pizza),(Fresh Fruit),0.054805,0.176239,0.013701,0.250000,1.418530,1.0,0.004042,1.098348,0.312153,0.063040,0.089542,0.163871
88,(Peanut Butter),(Fresh Vegetables),0.041104,0.288664,0.012200,0.296804,1.028199,1.0,0.000335,1.011576,0.028601,0.038416,0.011443,0.169533


#### 2.1.4.  Take a look at Maximal Patterns

#### DELUXE SUPERMARKETS

##### APRIORI

In [20]:
# Maximal itemsets
maximal_itemsets_deluxe = frequent_itemsets_deluxe.copy()
maximal_itemsets_deluxe['is_maximal'] = True

for idx, row in maximal_itemsets_deluxe.iterrows():
    for idx2, row2 in maximal_itemsets_deluxe.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_deluxe.at[idx, 'is_maximal'] = False
            break

maximal_itemsets_deluxe = maximal_itemsets_deluxe[maximal_itemsets_deluxe['is_maximal']]
display(maximal_itemsets_deluxe)

Unnamed: 0,support,itemsets,length,is_maximal
0,0.014466,(Acetominifen),1,True
1,0.013964,(Anchovies),1,True
2,0.027042,(Aspirin),1,True
3,0.014003,(Auto Magazines),1,True
4,0.014697,(Bagels),1,True
...,...,...,...,...
169,0.011843,"(Fresh Vegetables, Spices)",2,True
170,0.013116,"(Fresh Vegetables, TV Dinner)",2,True
171,0.014042,"(Waffles, Fresh Vegetables)",2,True
172,0.019519,"(Fresh Vegetables, Wine)",2,True


##### FPGROWTH

In [21]:
# Maximal itemsets
maximal_itemsets_deluxe_fp = frequent_itemsets_deluxe_fp.copy()
maximal_itemsets_deluxe_fp['is_maximal'] = True

for idx, row in maximal_itemsets_deluxe_fp.iterrows():
    for idx2, row2 in maximal_itemsets_deluxe_fp.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_deluxe_fp.at[idx, 'is_maximal'] = False
            break

maximal_itemsets_deluxe_fp = maximal_itemsets_deluxe_fp[maximal_itemsets_deluxe_fp['is_maximal']]
display(maximal_itemsets_deluxe_fp)

Unnamed: 0,support,itemsets,length,is_maximal
2,0.013656,(Screwdrivers),1,True
11,0.028623,(Canned Fruit),1,True
12,0.028276,(Hamburger),1,True
16,0.029048,(Hard Candy),1,True
20,0.030938,(Rice),1,True
...,...,...,...,...
169,0.010840,"(Fresh Fruit, Juice)",2,True
170,0.013424,"(Pasta, Fresh Vegetables)",2,True
171,0.013116,"(Fresh Vegetables, TV Dinner)",2,True
172,0.013309,"(Shampoo, Fresh Vegetables)",2,True


#### GOURMET SUPERMARKETS

##### APRIORI

In [22]:
# Maximal itemsets
maximal_itemsets_gourmet = frequent_itemsets_gourmet.copy()
maximal_itemsets_gourmet['is_maximal'] = True

for idx, row in maximal_itemsets_gourmet.iterrows():
    for idx2, row2 in maximal_itemsets_gourmet.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_gourmet.at[idx, 'is_maximal'] = False
            break

maximal_itemsets_gourmet = maximal_itemsets_gourmet[maximal_itemsets_gourmet['is_maximal']]
display(maximal_itemsets_gourmet)

Unnamed: 0,support,itemsets,length,is_maximal
0,0.013514,(Acetominifen),1,True
1,0.014640,(Anchovies),1,True
2,0.030593,(Aspirin),1,True
3,0.011261,(Auto Magazines),1,True
4,0.013701,(Bagels),1,True
...,...,...,...,...
176,0.010135,"(Sugar, Fresh Vegetables)",2,True
177,0.012763,"(Fresh Vegetables, TV Dinner)",2,True
178,0.013514,"(Waffles, Fresh Vegetables)",2,True
179,0.022523,"(Fresh Vegetables, Wine)",2,True


##### FPGROWTH

In [23]:
# Maximal itemsets
maximal_itemsets_gourmet_fp = frequent_itemsets_gourmet_fp.copy()
maximal_itemsets_gourmet_fp['is_maximal'] = True

for idx, row in maximal_itemsets_gourmet_fp.iterrows():
    for idx2, row2 in maximal_itemsets_gourmet_fp.iterrows():
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_gourmet_fp.at[idx, 'is_maximal'] = False
            break

maximal_itemsets_gourmet_fp = maximal_itemsets_gourmet_fp[maximal_itemsets_gourmet_fp['is_maximal']]
display(maximal_itemsets_gourmet_fp)

Unnamed: 0,support,itemsets,length,is_maximal
4,0.013514,(Sardines),1,True
6,0.033596,(Rice),1,True
8,0.024775,(Cottage Cheese),1,True
13,0.038476,(Plastic Utensils),1,True
17,0.013138,(Clams),1,True
...,...,...,...,...
176,0.012763,"(Fresh Vegetables, TV Dinner)",2,True
177,0.013701,"(Fresh Fruit, Pizza)",2,True
178,0.015390,"(Pizza, Fresh Vegetables)",2,True
179,0.013514,"(Lightbulbs, Fresh Vegetables)",2,True
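
The four maximal-itemset cells above repeat the same nested loop. As a sketch only, the same check can be written once as a plain-Python helper. The function name `maximal_itemsets` and the toy supports below are illustrative assumptions, not part of the original notebook.

```python
def maximal_itemsets(supports):
    """Keep the itemsets that have no frequent proper superset.

    `supports` maps each frequent itemset (a frozenset) to its support.
    """
    itemsets = list(supports)
    maximal = {}
    for candidate in itemsets:
        # On frozensets, `<` tests for a proper subset.
        if not any(candidate < other for other in itemsets):
            maximal[candidate] = supports[candidate]
    return maximal


# Toy input with made-up supports (hypothetical, for illustration only).
toy_supports = {
    frozenset({"Soup"}): 0.12,
    frozenset({"Wine"}): 0.08,
    frozenset({"Rice"}): 0.03,
    frozenset({"Soup", "Wine"}): 0.012,
}

toy_maximal = maximal_itemsets(toy_supports)
for itemset, support in toy_maximal.items():
    print(sorted(itemset), support)
```

With an mlxtend result frame `df`, the input dictionary could be built with `dict(zip(df['itemsets'], df['support']))`, since the `itemsets` column holds `frozenset` objects. The check is still quadratic in the number of itemsets, which is acceptable for the few hundred itemsets found here.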


### **Summary of Frequent Itemset & Association Rule Analysis for Deluxe and Gourmet Supermarkets**

This analysis compares purchasing patterns between **Deluxe** and **Gourmet** supermarkets using frequent itemsets and association rule metrics.

---

### **1. Frequent Itemsets: Core Product Overlap**

* Deluxe stores had:

  * **102 unique 1-itemsets**
  * **72 unique 2-itemsets**

* Gourmet stores had:

  * **102 unique 1-itemsets**
  * **79 unique 2-itemsets**


**Conclusion:** The frequent 1-itemsets are identical across both store types (102 each), suggesting stable customer preferences regardless of store format. Gourmet stores have slightly more frequent 2-itemsets (79 versus 72).
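
Equal counts alone would not show that the itemsets themselves coincide, so a set comparison is a useful check. The sketch below uses plain Python sets; the helper names and the toy itemsets are assumptions made for illustration and do not come from the notebook.

```python
from collections import Counter


def compare_itemsets(store_a, store_b):
    """Split two collections of itemsets into shared and store-specific ones."""
    a, b = set(store_a), set(store_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def count_by_length(itemsets):
    """Number of itemsets per itemset size."""
    return dict(Counter(len(s) for s in itemsets))


# Hypothetical toy data: same counts per length, different content.
deluxe_toy = [frozenset({"Soup"}), frozenset({"Wine"}),
              frozenset({"Soup", "Wine"})]
gourmet_toy = [frozenset({"Soup"}), frozenset({"Cheese"}),
               frozenset({"Soup", "Cheese"})]

diff = compare_itemsets(deluxe_toy, gourmet_toy)
print(count_by_length(deluxe_toy), count_by_length(gourmet_toy))
print("shared:", [sorted(s) for s in diff["shared"]])
```

In the toy data both stores have two 1-itemsets and one 2-itemset, yet only one itemset is shared, which is why the comparison is done on the sets rather than on the counts.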

---

### **2. Association Rules: Buying Behavior Comparison**

Rules were evaluated using different metrics. The focus is on how many rules from one store matched those in the other.

| **Metric**              | **Deluxe Matches** | **Gourmet Total** | **Matched Rows** | **Key Insight**                                                      |
| ----------------------- | ------------------ | ----------------- | ---------------- | -------------------------------------------------------------------- |
| **Confidence (≥ 0.10)** | 86                 | 93                | 86               | Gourmet has 7 additional patterns → slightly more diverse behavior   |
| **Lift (≥ 0.90)**       | 132                | 132               | 132              | Complete overlap → the same rules pass the lift threshold in both    |
| **Confidence + Lift**   | 78                 | 80                | 78               | Gourmet has 2 extra rules → minor increase in complexity             |
| **Leverage (≥ 0.001)**  | 34                 | 48                | 34               | Gourmet has 14 unique rules → more distinct, non-random associations |
| **Conviction (≥ 1.0)**  | 76                 | 90                | 76               | Gourmet has 14 more rules → broader co-purchasing behavior           |
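
The notebook does not show how the matches in this table were counted. One plausible way, sketched here under that assumption, is to treat a rule as matched when both its antecedent and its consequent are the same in the two stores. The helper names and lift values below are hypothetical.

```python
def rule_key(antecedent, consequent):
    """Direction matters: Soup -> Wine and Wine -> Soup are different rules."""
    return (frozenset(antecedent), frozenset(consequent))


def match_rules(rules_a, rules_b):
    """Compare two rule sets given as dicts mapping rule_key(...) -> lift."""
    shared = set(rules_a) & set(rules_b)
    lift_gaps = {key: abs(rules_a[key] - rules_b[key]) for key in shared}
    return {
        "total_a": len(rules_a),
        "total_b": len(rules_b),
        "matched": len(shared),
        "lift_gaps": lift_gaps,
    }


# Hypothetical lift values, for illustration only.
store_a_rules = {
    rule_key({"Soup"}, {"Wine"}): 1.10,
    rule_key({"Wine"}, {"Soup"}): 1.10,
}
store_b_rules = {
    rule_key({"Soup"}, {"Wine"}): 1.25,
    rule_key({"Pizza"}, {"Fresh Fruit"}): 1.40,
}

summary = match_rules(store_a_rules, store_b_rules)
print(summary["total_a"], summary["total_b"], summary["matched"])
```

The `lift_gaps` entry makes it possible to see whether a matched rule is equally strong in both stores, which a match count alone does not reveal.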

---

### **3. Key Takeaways**

* **Similarities:**

  * Both stores share the same frequent 1-itemsets and most rule-based relationships.
  * The rules that pass the lift threshold are **identical** in both store types, so the same product pairs are associated in each. This alone does not show that the lift values are equal.

* **Differences:**

  * **Gourmet** consistently has **more rules** under several metrics, indicating:

    * Greater variety in purchase combinations.
    * Possibly a broader customer base or wider product range.
    * Store-specific influences like layout, promotions, or product assortment.

---

### **Final Conclusion**

Both Deluxe and Gourmet supermarkets feature a strong core of shared product preferences, but Gourmet stands out with more diverse and complex customer behavior, potentially due to a wider product range, promotions, or different customer segments. Deluxe, on the other hand, appears more focused and streamlined in its purchasing patterns.

#### 2.1.5.  Deluxe/Gourmet Supermarkets versus All Stores (Global versus Deluxe/Gourmet Supermarkets Specific Patterns and Rules)

Discuss the similarities and differences between the results obtained in task 1. (frequent itemsets and association rules found in transactions from all stores) and those obtained above (frequent itemsets and association rules found in transactions only from Deluxe/Gourmet Supermarkets).


### **Overall Observation: Similarities and Differences Between Task 1 and the Deluxe/Gourmet Analysis**

In **Task 1**, we conducted a frequent itemset and association rule analysis on transactions from all stores, focusing on various metrics such as **confidence**, **lift**, **leverage**, and **conviction**, alongside adjusting the support threshold to observe its impact on the results. This provided a broad view of customer purchasing behavior across the entire dataset, capturing a wide range of itemsets and associations.

In contrast, the analysis of transactions from **Deluxe** and **Gourmet** supermarkets was a targeted comparison of buying patterns between the two store types. Here’s how the findings compare:

---

### **1. Frequent Itemsets: Core Product Overlap and Variations**

* **Task 1 (All Stores)**: We observed the **sensitivity of frequent itemsets to different support thresholds**. As the support threshold was lowered, the number of **1-itemsets** and **2-itemsets** significantly increased, capturing a broader set of product combinations. This helped identify many popular and co-purchased items across all stores.

* **Deluxe/Gourmet Analysis**: Both stores shared **102 unique 1-itemsets**, confirming a strong overlap in popular products. However, **Gourmet** featured **79 unique 2-itemsets**, compared to **72 in Deluxe**, suggesting that Gourmet has a slightly more diverse set of frequently purchased item pairs. This indicates that while the core items are the same across stores, **Gourmet customers show a broader range of item pairings** than Deluxe.

**Comparison**: Only **Task 1** varied the support threshold; the **Deluxe/Gourmet analysis** used a single threshold. **Task 1** therefore gives a broad view of how many itemsets appear at each threshold, while the **Deluxe/Gourmet analysis** narrows in on differences in item pairings between the two store types.

---

### **2. Association Rules: Strength of Relationships**

* **Task 1 (All Stores)**: Using various thresholds for **confidence** and **lift**, we identified **meaningful associations** that were most likely to represent real customer behavior. For instance, higher thresholds for **confidence** and **lift** identified more reliable and interpretable rules, often useful for cross-selling or identifying product bundles.

* **Deluxe/Gourmet Analysis**: The two store types agreed on the rules that pass the **lift** threshold, but **Gourmet** produced additional rules under **confidence**, **leverage**, and **conviction**. **Gourmet's** purchasing behavior therefore appears more varied, which may reflect a wider mix of customer segments or store-specific factors.

**Comparison**: In **Task 1**, the rules were focused on capturing general trends across all stores, leading to valuable insights for high-level decision-making. In contrast, the **Deluxe/Gourmet analysis** allowed us to see how the metrics reveal **subtle differences** between stores, highlighting that **Gourmet** exhibits slightly more intricate relationships in product pairings and co-purchase patterns.

---

### **3. Impact of Support Thresholds**

* **Task 1 (All Stores)**: The sensitivity of the algorithm to the **support threshold** was a significant factor in the number of itemsets discovered. A lower threshold led to more itemsets being identified, but it also increased the risk of noise, requiring careful interpretation.

* **Deluxe/Gourmet Analysis**: This focused on **transaction data from specific store types** and did not test the effect of varying the support threshold. However, **Gourmet** stores produced a larger number of frequent 2-itemsets and additional association rules, indicating more **diversity** in purchasing behavior than in Deluxe.

**Comparison**: **Task 1** showed how sensitive frequent itemset mining is to the choice of support threshold, which affects both the quantity and the quality of the results. The **Deluxe/Gourmet analysis** did not vary the threshold, but it showed that **Gourmet** yields more item pairs and rules than Deluxe even though both share the same frequent 1-itemsets.
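
The threshold sensitivity described for Task 1 could also be checked on a store subset. The following is a minimal sketch that counts frequent 1- and 2-itemsets by brute force instead of calling mlxtend; the function name and the toy transactions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def count_frequent(transactions, min_support, max_len=2):
    """Count frequent itemsets of size 1..max_len by exhaustive enumeration."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate, fix a canonical order
        for size in range(1, max_len + 1):
            counts.update(combinations(items, size))
    per_size = Counter(
        len(itemset) for itemset, c in counts.items() if c / n >= min_support
    )
    return {size: per_size.get(size, 0) for size in range(1, max_len + 1)}


# Hypothetical toy transactions (not taken from the Foodmart data).
toy_transactions = [
    ["Soup", "Wine"],
    ["Soup", "Cheese"],
    ["Soup", "Wine", "Cheese"],
    ["Rice"],
    ["Soup"],
]

for threshold in (0.5, 0.4, 0.2):
    print(threshold, count_frequent(toy_transactions, threshold))
```

Brute-force enumeration is only practical for short itemsets and small baskets; for the real store subsets, the same loop over thresholds would call `apriori` or `fpgrowth` as in the cells of this notebook.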

---

### **4. Association Rule Metrics: Different Focus on Relationships**

* **Task 1 (All Stores)**: Metrics such as **confidence**, **lift**, **leverage**, and **conviction** were used to assess general patterns and the strength of item relationships. None of them is a test of statistical significance; they helped identify trends across all stores and showed which relationships were strongest.

* **Deluxe/Gourmet Analysis**: This focused on how these metrics reflected the **differences between stores**, with **Gourmet** having **more variety** in its association rules. For example, **Gourmet's** additional rules in **leverage** and **conviction** indicated a more **distinct and non-random** set of associations compared to Deluxe, pointing to **more varied or complex shopping patterns**.

**Comparison**: In **Task 1**, the metrics helped identify broad trends across all stores, while in the **Deluxe/Gourmet analysis**, they highlighted subtle distinctions between the two store types, showcasing **Gourmet's** more diverse buying patterns.
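
To make the four metrics concrete, the sketch below recomputes them from the three supports of one rule, using the standard definitions. The input values are the rounded supports of the Gourmet rule (Canned Vegetables) → (Cookies) shown in the outputs above; the function name `rule_metrics` is an assumption.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard rule metrics from the supports of A, C and A-and-C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule never fails.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rounded supports of (Canned Vegetables) -> (Cookies) in the Gourmet output.
metrics = rule_metrics(0.073198, 0.111111, 0.010698)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Because the inputs are rounded, the results agree with the table only to about four decimal places. The definitions also imply that lift ≥ 1, leverage ≥ 0 and conviction ≥ 1 all hold for the same rules, so a conviction threshold of 1.0 selects the rules with non-negative association.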

---

### **Overall Conclusion:**

The **frequent itemset and association rule analysis** for **Deluxe** and **Gourmet** supermarkets, when compared to the broader **Task 1** results, reveals both **similarities** and **differences**. **Task 1** showed how strongly the **support threshold** determines the number of itemsets found, and both analyses reveal strong core product preferences, across all stores and between the two supermarket types. The **Deluxe/Gourmet analysis** additionally shows that **Gourmet customers** generate more, and more varied, association rules, suggesting a **wider range of customer types or store-specific factors** influencing their behavior. This insight is valuable for tailoring marketing strategies and understanding **local customer preferences** more deeply.


### 2.2. Analyse Small Groceries

Here you should analyse **Small Groceries (STORE_ID = 2, 5, 14, 22)**.

#### 2.2.1.  Load/Preprocess the Dataset

**This should be trivial now.**

In [24]:
small_grocery_ids = {2, 5, 14, 22}

small_grocery_transactions = [products for store_id, products in transactions_with_store if store_id in small_grocery_ids]


#### 2.2.2. Compute Frequent Itemsets

##### APRIORI

In [25]:
# Encode
te = TransactionEncoder()
te_ary_small = te.fit(small_grocery_transactions).transform(small_grocery_transactions)
df_small = pd.DataFrame(te_ary_small, columns=te.columns_)

# Frequent itemsets
S_min = 0.01 
frequent_itemsets_sg = apriori(df_small, min_support=S_min, use_colnames=True)
frequent_itemsets_sg['length'] = frequent_itemsets_sg['itemsets'].apply(lambda x: len(x))

# Organize by length
for i in range(1, frequent_itemsets_sg['length'].max() + 1):
    print(f"\nFrequent {i}-itemsets:")
    display(frequent_itemsets_sg[frequent_itemsets_sg['length'] == i])


Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.015364,(Acetominifen),1
1,0.011414,(Anchovies),1
2,0.024583,(Aspirin),1
3,0.010975,(Auto Magazines),1
4,0.012291,(Bagels),1
...,...,...,...
93,0.011853,(Toothbrushes),1
94,0.023705,(Tuna),1
95,0.053117,(Waffles),1
96,0.088674,(Wine),1



Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
98,0.016242,"(Batteries, Fresh Vegetables)",2
99,0.012291,"(Bologna, Fresh Vegetables)",2
100,0.011414,"(Fresh Fruit, Canned Vegetables)",2
101,0.023705,"(Canned Vegetables, Fresh Vegetables)",2
102,0.010097,"(Canned Vegetables, Wine)",2
...,...,...,...
170,0.011853,"(Fresh Vegetables, Spices)",2
171,0.015364,"(Fresh Vegetables, TV Dinner)",2
172,0.016681,"(Waffles, Fresh Vegetables)",2
173,0.021510,"(Fresh Vegetables, Wine)",2


##### FPGROWTH

In [26]:
# Compute frequent itemsets with FPGrowth using same support threshold
frequent_itemsets_sg_fp = fpgrowth(df_small, min_support=S_min, use_colnames=True)

# Add a column for the length (number of items)
frequent_itemsets_sg_fp['length'] = frequent_itemsets_sg_fp['itemsets'].apply(lambda x: len(x))

# Display itemsets organized by size
for i in range(1, frequent_itemsets_sg_fp['length'].max() + 1):
    print(f"\nFPGrowth Frequent {i}-itemsets:")
    display(frequent_itemsets_sg_fp[frequent_itemsets_sg_fp['length'] == i])



FPGrowth Frequent 1-itemsets:


Unnamed: 0,support,itemsets,length
0,0.119842,(Soup),1
1,0.047410,(Pasta),1
2,0.277875,(Fresh Vegetables),1
3,0.061457,(Milk),1
4,0.035996,(Plastic Utensils),1
...,...,...,...
93,0.014925,(Pancakes),1
94,0.011853,(Clams),1
95,0.011414,(Fresh Fish),1
96,0.013169,(Sardines),1



FPGrowth Frequent 2-itemsets:


Unnamed: 0,support,itemsets,length
98,0.030729,"(Soup, Fresh Vegetables)",2
99,0.015803,"(Fresh Fruit, Soup)",2
100,0.012730,"(Pasta, Fresh Vegetables)",2
101,0.017559,"(Fresh Vegetables, Milk)",2
102,0.010536,"(Fresh Fruit, Milk)",2
...,...,...,...
170,0.021510,"(Fresh Vegetables, Wine)",2
171,0.015803,"(Fresh Fruit, Wine)",2
172,0.010975,"(Dried Fruit, Wine)",2
173,0.016681,"(Pizza, Fresh Vegetables)",2
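
Apriori and FPGrowth list the itemsets in a different order, which makes the two outputs hard to compare by eye. As a sketch, the results can be compared as mappings from itemset to support. The helper name is an assumption, and the small excerpt below uses supports that appear in the Small Groceries outputs of this notebook rather than the full frames.

```python
def same_itemsets(result_a, result_b, tol=1e-9):
    """True when both results contain the same itemsets with equal supports."""
    if set(result_a) != set(result_b):
        return False
    return all(abs(result_a[key] - result_b[key]) <= tol for key in result_a)


# Excerpt only: three itemsets, listed in a different order per algorithm.
apriori_excerpt = {
    frozenset({"Wine"}): 0.088674,
    frozenset({"Fresh Vegetables"}): 0.277875,
    frozenset({"Fresh Vegetables", "Wine"}): 0.021510,
}
fpgrowth_excerpt = {
    frozenset({"Wine", "Fresh Vegetables"}): 0.021510,
    frozenset({"Fresh Vegetables"}): 0.277875,
    frozenset({"Wine"}): 0.088674,
}

print(same_itemsets(apriori_excerpt, fpgrowth_excerpt))
```

Both algorithms are exact, so for the same data and `min_support` the check is expected to pass on the full results as well; a failure would point to a difference in the inputs rather than in the algorithms.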


#### 2.2.3. Generate Association Rules from Frequent Itemsets

##### APRIORI

In [27]:
# Association rules
rules_sg_conf = association_rules(frequent_itemsets_sg, metric="confidence", min_threshold=0.1)
rules_sg_lift = association_rules(frequent_itemsets_sg, metric="lift", min_threshold=0.9)
rules_sg_both = rules_sg_conf[rules_sg_conf['lift'] >= 0.9]

# Other metrics
rules_sg_leverage = association_rules(frequent_itemsets_sg, metric="leverage", min_threshold=0.001)
rules_sg_conviction = association_rules(frequent_itemsets_sg, metric="conviction", min_threshold=1.0)

# Display
print("\nRules with confidence >= C:")
display(rules_sg_conf)

print("\nRules with lift >= L:")
display(rules_sg_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_sg_both)

print("\nRules based on leverage:")
display(rules_sg_leverage)

print("\nRules based on conviction:")
display(rules_sg_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.049605,0.277875,0.016242,0.327434,1.178347,1.0,0.002458,1.073685,0.159253,0.052186,0.068628,0.192943
1,(Bologna),(Fresh Vegetables),0.043459,0.277875,0.012291,0.282828,1.017824,1.0,0.000215,1.006906,0.018308,0.039773,0.006859,0.163531
2,(Canned Vegetables),(Fresh Fruit),0.082090,0.176910,0.011414,0.139037,0.785924,1.0,-0.003109,0.956012,-0.228840,0.046099,-0.046012,0.101777
3,(Canned Vegetables),(Fresh Vegetables),0.082090,0.277875,0.023705,0.288770,1.039207,1.0,0.000894,1.015318,0.041102,0.070496,0.015087,0.187039
4,(Canned Vegetables),(Wine),0.082090,0.088674,0.010097,0.122995,1.387039,1.0,0.002817,1.039134,0.303994,0.062842,0.037660,0.118428
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,(Spices),(Fresh Vegetables),0.035996,0.277875,0.011853,0.329268,1.184950,1.0,0.001850,1.076622,0.161911,0.039244,0.071169,0.185961
88,(TV Dinner),(Fresh Vegetables),0.044337,0.277875,0.015364,0.346535,1.247087,1.0,0.003044,1.105070,0.207323,0.050072,0.095080,0.200913
89,(Waffles),(Fresh Vegetables),0.053117,0.277875,0.016681,0.314050,1.130182,1.0,0.001921,1.052736,0.121648,0.053073,0.050094,0.187041
90,(Wine),(Fresh Vegetables),0.088674,0.277875,0.021510,0.242574,0.872961,1.0,-0.003130,0.953393,-0.137698,0.062341,-0.048885,0.159992



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.049605,0.277875,0.016242,0.327434,1.178347,1.0,0.002458,1.073685,0.159253,0.052186,0.068628,0.192943
1,(Fresh Vegetables),(Batteries),0.277875,0.049605,0.016242,0.058452,1.178347,1.0,0.002458,1.009396,0.209595,0.052186,0.009309,0.192943
2,(Bologna),(Fresh Vegetables),0.043459,0.277875,0.012291,0.282828,1.017824,1.0,0.000215,1.006906,0.018308,0.039773,0.006859,0.163531
3,(Fresh Vegetables),(Bologna),0.277875,0.043459,0.012291,0.044234,1.017824,1.0,0.000215,1.000810,0.024251,0.039773,0.000810,0.163531
4,(Canned Vegetables),(Fresh Vegetables),0.082090,0.277875,0.023705,0.288770,1.039207,1.0,0.000894,1.015318,0.041102,0.070496,0.015087,0.187039
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129,(TV Dinner),(Fresh Vegetables),0.044337,0.277875,0.015364,0.346535,1.247087,1.0,0.003044,1.105070,0.207323,0.050072,0.095080,0.200913
130,(Waffles),(Fresh Vegetables),0.053117,0.277875,0.016681,0.314050,1.130182,1.0,0.001921,1.052736,0.121648,0.053073,0.050094,0.187041
131,(Fresh Vegetables),(Waffles),0.277875,0.053117,0.016681,0.060032,1.130182,1.0,0.001921,1.007356,0.159510,0.053073,0.007303,0.187041
132,(Soup),(Wine),0.119842,0.088674,0.010536,0.087912,0.991405,1.0,-0.000091,0.999164,-0.009754,0.053215,-0.000836,0.103362



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.049605,0.277875,0.016242,0.327434,1.178347,1.0,0.002458,1.073685,0.159253,0.052186,0.068628,0.192943
1,(Bologna),(Fresh Vegetables),0.043459,0.277875,0.012291,0.282828,1.017824,1.0,0.000215,1.006906,0.018308,0.039773,0.006859,0.163531
3,(Canned Vegetables),(Fresh Vegetables),0.082090,0.277875,0.023705,0.288770,1.039207,1.0,0.000894,1.015318,0.041102,0.070496,0.015087,0.187039
4,(Canned Vegetables),(Wine),0.082090,0.088674,0.010097,0.122995,1.387039,1.0,0.002817,1.039134,0.303994,0.062842,0.037660,0.118428
5,(Wine),(Canned Vegetables),0.088674,0.082090,0.010097,0.113861,1.387039,1.0,0.002817,1.035854,0.306191,0.062842,0.034613,0.118428
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,(Fresh Vegetables),(Soup),0.277875,0.119842,0.030729,0.110585,0.922753,1.0,-0.002572,0.989592,-0.103884,0.083732,-0.010518,0.183497
87,(Spices),(Fresh Vegetables),0.035996,0.277875,0.011853,0.329268,1.184950,1.0,0.001850,1.076622,0.161911,0.039244,0.071169,0.185961
88,(TV Dinner),(Fresh Vegetables),0.044337,0.277875,0.015364,0.346535,1.247087,1.0,0.003044,1.105070,0.207323,0.050072,0.095080,0.200913
89,(Waffles),(Fresh Vegetables),0.053117,0.277875,0.016681,0.314050,1.130182,1.0,0.001921,1.052736,0.121648,0.053073,0.050094,0.187041



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.049605,0.277875,0.016242,0.327434,1.178347,1.0,0.002458,1.073685,0.159253,0.052186,0.068628,0.192943
1,(Fresh Vegetables),(Batteries),0.277875,0.049605,0.016242,0.058452,1.178347,1.0,0.002458,1.009396,0.209595,0.052186,0.009309,0.192943
2,(Canned Vegetables),(Wine),0.082090,0.088674,0.010097,0.122995,1.387039,1.0,0.002817,1.039134,0.303994,0.062842,0.037660,0.118428
3,(Wine),(Canned Vegetables),0.088674,0.082090,0.010097,0.113861,1.387039,1.0,0.002817,1.035854,0.306191,0.062842,0.034613,0.118428
4,(Dried Fruit),(Cheese),0.107550,0.118525,0.019315,0.179592,1.515223,1.0,0.006568,1.074435,0.381009,0.093418,0.069278,0.171277
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,(Spices),(Fresh Vegetables),0.035996,0.277875,0.011853,0.329268,1.184950,1.0,0.001850,1.076622,0.161911,0.039244,0.071169,0.185961
58,(Fresh Vegetables),(TV Dinner),0.277875,0.044337,0.015364,0.055292,1.247087,1.0,0.003044,1.011596,0.274373,0.050072,0.011463,0.200913
59,(TV Dinner),(Fresh Vegetables),0.044337,0.277875,0.015364,0.346535,1.247087,1.0,0.003044,1.105070,0.207323,0.050072,0.095080,0.200913
60,(Waffles),(Fresh Vegetables),0.053117,0.277875,0.016681,0.314050,1.130182,1.0,0.001921,1.052736,0.121648,0.053073,0.050094,0.187041



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Batteries),(Fresh Vegetables),0.049605,0.277875,0.016242,0.327434,1.178347,1.0,0.002458,1.073685,0.159253,0.052186,0.068628,0.192943
1,(Fresh Vegetables),(Batteries),0.277875,0.049605,0.016242,0.058452,1.178347,1.0,0.002458,1.009396,0.209595,0.052186,0.009309,0.192943
2,(Bologna),(Fresh Vegetables),0.043459,0.277875,0.012291,0.282828,1.017824,1.0,0.000215,1.006906,0.018308,0.039773,0.006859,0.163531
3,(Fresh Vegetables),(Bologna),0.277875,0.043459,0.012291,0.044234,1.017824,1.0,0.000215,1.000810,0.024251,0.039773,0.000810,0.163531
4,(Canned Vegetables),(Fresh Vegetables),0.082090,0.277875,0.023705,0.288770,1.039207,1.0,0.000894,1.015318,0.041102,0.070496,0.015087,0.187039
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99,(Spices),(Fresh Vegetables),0.035996,0.277875,0.011853,0.329268,1.184950,1.0,0.001850,1.076622,0.161911,0.039244,0.071169,0.185961
100,(Fresh Vegetables),(TV Dinner),0.277875,0.044337,0.015364,0.055292,1.247087,1.0,0.003044,1.011596,0.274373,0.050072,0.011463,0.200913
101,(TV Dinner),(Fresh Vegetables),0.044337,0.277875,0.015364,0.346535,1.247087,1.0,0.003044,1.105070,0.207323,0.050072,0.095080,0.200913
102,(Waffles),(Fresh Vegetables),0.053117,0.277875,0.016681,0.314050,1.130182,1.0,0.001921,1.052736,0.121648,0.053073,0.050094,0.187041


## FPGROWTH

In [28]:
# Association rules
rules_sg_fp_conf = association_rules(frequent_itemsets_sg_fp, metric="confidence", min_threshold=0.1)
rules_sg_fp_lift = association_rules(frequent_itemsets_sg_fp, metric="lift", min_threshold=0.9)
rules_sg_fp_both = rules_sg_fp_conf[rules_sg_fp_conf['lift'] >= 0.9]

# Other metrics
rules_sg_fp_leverage = association_rules(frequent_itemsets_sg_fp, metric="leverage", min_threshold=0.001)
rules_sg_fp_conviction = association_rules(frequent_itemsets_sg_fp, metric="conviction", min_threshold=1.0)

# Display
print("\nRules with confidence >= C:")
display(rules_sg_fp_conf)

print("\nRules with lift >= L:")
display(rules_sg_fp_lift)

print("\nRules with confidence >= C and lift >= L:")
display(rules_sg_fp_both)

print("\nRules based on leverage:")
display(rules_sg_fp_leverage)

print("\nRules based on conviction:")
display(rules_sg_fp_conviction)


Rules with confidence >= C:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.119842,0.277875,0.030729,0.256410,0.922753,1.0,-0.002572,0.971133,-0.086852,0.083732,-0.029725,0.183497
1,(Fresh Vegetables),(Soup),0.277875,0.119842,0.030729,0.110585,0.922753,1.0,-0.002572,0.989592,-0.103884,0.083732,-0.010518,0.183497
2,(Soup),(Fresh Fruit),0.119842,0.176910,0.015803,0.131868,0.745399,1.0,-0.005398,0.948117,-0.279576,0.056250,-0.054722,0.110599
3,(Pasta),(Fresh Vegetables),0.047410,0.277875,0.012730,0.268519,0.966327,1.0,-0.000444,0.987208,-0.035289,0.040730,-0.012957,0.157166
4,(Milk),(Fresh Vegetables),0.061457,0.277875,0.017559,0.285714,1.028210,1.0,0.000482,1.010975,0.029233,0.054570,0.010855,0.174453
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,(Wine),(Fresh Fruit),0.088674,0.176910,0.015803,0.178218,1.007395,1.0,0.000116,1.001592,0.008055,0.063269,0.001589,0.133774
88,(Dried Fruit),(Wine),0.107550,0.088674,0.010975,0.102041,1.150738,1.0,0.001438,1.014885,0.146778,0.059242,0.014667,0.112902
89,(Wine),(Dried Fruit),0.088674,0.107550,0.010975,0.123762,1.150738,1.0,0.001438,1.018502,0.143738,0.059242,0.018166,0.112902
90,(Pizza),(Fresh Vegetables),0.052239,0.277875,0.016681,0.319328,1.149176,1.0,0.002165,1.060899,0.136966,0.053221,0.057403,0.189680



Rules with lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.119842,0.277875,0.030729,0.256410,0.922753,1.0,-0.002572,0.971133,-0.086852,0.083732,-0.029725,0.183497
1,(Fresh Vegetables),(Soup),0.277875,0.119842,0.030729,0.110585,0.922753,1.0,-0.002572,0.989592,-0.103884,0.083732,-0.010518,0.183497
2,(Pasta),(Fresh Vegetables),0.047410,0.277875,0.012730,0.268519,0.966327,1.0,-0.000444,0.987208,-0.035289,0.040730,-0.012957,0.157166
3,(Fresh Vegetables),(Pasta),0.277875,0.047410,0.012730,0.045814,0.966327,1.0,-0.000444,0.998327,-0.046034,0.040730,-0.001676,0.157166
4,(Fresh Vegetables),(Milk),0.277875,0.061457,0.017559,0.063191,1.028210,1.0,0.000482,1.001851,0.037994,0.054570,0.001847,0.174453
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129,(Wine),(Dried Fruit),0.088674,0.107550,0.010975,0.123762,1.150738,1.0,0.001438,1.018502,0.143738,0.059242,0.018166,0.112902
130,(Pizza),(Fresh Vegetables),0.052239,0.277875,0.016681,0.319328,1.149176,1.0,0.002165,1.060899,0.136966,0.053221,0.057403,0.189680
131,(Fresh Vegetables),(Pizza),0.277875,0.052239,0.016681,0.060032,1.149176,1.0,0.002165,1.008290,0.179763,0.053221,0.008222,0.189680
132,(Fresh Fruit),(Pizza),0.176910,0.052239,0.012291,0.069479,1.330025,1.0,0.003050,1.018527,0.301467,0.056680,0.018190,0.152387



Rules with confidence >= C and lift >= L:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Fresh Vegetables),0.119842,0.277875,0.030729,0.256410,0.922753,1.0,-0.002572,0.971133,-0.086852,0.083732,-0.029725,0.183497
1,(Fresh Vegetables),(Soup),0.277875,0.119842,0.030729,0.110585,0.922753,1.0,-0.002572,0.989592,-0.103884,0.083732,-0.010518,0.183497
3,(Pasta),(Fresh Vegetables),0.047410,0.277875,0.012730,0.268519,0.966327,1.0,-0.000444,0.987208,-0.035289,0.040730,-0.012957,0.157166
4,(Milk),(Fresh Vegetables),0.061457,0.277875,0.017559,0.285714,1.028210,1.0,0.000482,1.010975,0.029233,0.054570,0.010855,0.174453
5,(Milk),(Fresh Fruit),0.061457,0.176910,0.010536,0.171429,0.969018,1.0,-0.000337,0.993385,-0.032944,0.046243,-0.006659,0.115491
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,(Wine),(Fresh Fruit),0.088674,0.176910,0.015803,0.178218,1.007395,1.0,0.000116,1.001592,0.008055,0.063269,0.001589,0.133774
88,(Dried Fruit),(Wine),0.107550,0.088674,0.010975,0.102041,1.150738,1.0,0.001438,1.014885,0.146778,0.059242,0.014667,0.112902
89,(Wine),(Dried Fruit),0.088674,0.107550,0.010975,0.123762,1.150738,1.0,0.001438,1.018502,0.143738,0.059242,0.018166,0.112902
90,(Pizza),(Fresh Vegetables),0.052239,0.277875,0.016681,0.319328,1.149176,1.0,0.002165,1.060899,0.136966,0.053221,0.057403,0.189680



Rules based on leverage:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Soup),(Cheese),0.119842,0.118525,0.016242,0.135531,1.143481,1.0,0.002038,1.019672,0.142563,0.073123,0.019293,0.136284
1,(Cheese),(Soup),0.118525,0.119842,0.016242,0.137037,1.143481,1.0,0.002038,1.019926,0.142350,0.073123,0.019536,0.136284
2,(Fresh Fruit),(Cookies),0.176910,0.107112,0.021510,0.121588,1.135154,1.0,0.002561,1.016480,0.144653,0.081940,0.016213,0.161204
3,(Cookies),(Fresh Fruit),0.107112,0.176910,0.021510,0.200820,1.135154,1.0,0.002561,1.029918,0.133345,0.081940,0.029049,0.161204
4,(Dried Fruit),(Cookies),0.107550,0.107112,0.014486,0.134694,1.257511,1.0,0.002966,1.031876,0.229456,0.072368,0.030891,0.134970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,(Wine),(Dried Fruit),0.088674,0.107550,0.010975,0.123762,1.150738,1.0,0.001438,1.018502,0.143738,0.059242,0.018166,0.112902
58,(Pizza),(Fresh Vegetables),0.052239,0.277875,0.016681,0.319328,1.149176,1.0,0.002165,1.060899,0.136966,0.053221,0.057403,0.189680
59,(Fresh Vegetables),(Pizza),0.277875,0.052239,0.016681,0.060032,1.149176,1.0,0.002165,1.008290,0.179763,0.053221,0.008222,0.189680
60,(Fresh Fruit),(Pizza),0.176910,0.052239,0.012291,0.069479,1.330025,1.0,0.003050,1.018527,0.301467,0.056680,0.018190,0.152387



Rules based on conviction:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Fresh Vegetables),(Milk),0.277875,0.061457,0.017559,0.063191,1.028210,1.0,0.000482,1.001851,0.037994,0.054570,0.001847,0.174453
1,(Milk),(Fresh Vegetables),0.061457,0.277875,0.017559,0.285714,1.028210,1.0,0.000482,1.010975,0.029233,0.054570,0.010855,0.174453
2,(Plastic Utensils),(Fresh Vegetables),0.035996,0.277875,0.010975,0.304878,1.097176,1.0,0.000972,1.038846,0.091876,0.036232,0.037393,0.172186
3,(Fresh Vegetables),(Plastic Utensils),0.277875,0.035996,0.010975,0.039494,1.097176,1.0,0.000972,1.003642,0.122650,0.036232,0.003629,0.172186
4,(Soup),(Cheese),0.119842,0.118525,0.016242,0.135531,1.143481,1.0,0.002038,1.019672,0.142563,0.073123,0.019293,0.136284
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99,(Wine),(Dried Fruit),0.088674,0.107550,0.010975,0.123762,1.150738,1.0,0.001438,1.018502,0.143738,0.059242,0.018166,0.112902
100,(Pizza),(Fresh Vegetables),0.052239,0.277875,0.016681,0.319328,1.149176,1.0,0.002165,1.060899,0.136966,0.053221,0.057403,0.189680
101,(Fresh Vegetables),(Pizza),0.277875,0.052239,0.016681,0.060032,1.149176,1.0,0.002165,1.008290,0.179763,0.053221,0.008222,0.189680
102,(Fresh Fruit),(Pizza),0.176910,0.052239,0.012291,0.069479,1.330025,1.0,0.003050,1.018527,0.301467,0.056680,0.018190,0.152387
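
Because the lift filter above uses a threshold of 0.9, the resulting tables mix rules above and below independence (lift = 1). The sketch below separates the two groups. It is a minimal, pandas-free illustration: the helper `split_by_lift` is hypothetical, and the triples are a handful of rows copied from the outputs above rather than the full rule set.

```python
# (antecedent, consequent, lift) triples copied from the rule tables above.
sample_rules = [
    ("Batteries", "Fresh Vegetables", 1.178347),
    ("Bologna", "Fresh Vegetables", 1.017824),
    ("Canned Vegetables", "Fresh Fruit", 0.785924),
    ("Soup", "Wine", 0.991405),
    ("Soup", "Fresh Vegetables", 0.922753),
    ("Canned Vegetables", "Wine", 1.387039),
]


def split_by_lift(rules, min_lift=0.9):
    """Apply the lift threshold, then split the kept rules around lift = 1."""
    kept = [rule for rule in rules if rule[2] >= min_lift]
    positive = [rule for rule in kept if rule[2] > 1.0]      # co-occur more than expected
    at_or_below = [rule for rule in kept if rule[2] <= 1.0]  # independent or negatively associated
    return kept, positive, at_or_below


kept, positive, at_or_below = split_by_lift(sample_rules)
print(f"kept: {len(kept)}, lift > 1: {len(positive)}, lift <= 1: {len(at_or_below)}")
```

On the notebook's DataFrames, the same split could be obtained with boolean indexing, for example `rules_sg_lift[rules_sg_lift['lift'] > 1]`.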




## 2.2.4. Take a Look at Maximal Patterns

## APRIORI

In [29]:
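# Maximal itemsets: frequent itemsets that have no frequent proper superset
maximal_itemsets_sg = frequent_itemsets_sg.copy()
maximal_itemsets_sg['is_maximal'] = True

# Pairwise comparison of all itemsets (quadratic, acceptable for a few hundred rows)
for idx, row in maximal_itemsets_sg.iterrows():
    for idx2, row2 in maximal_itemsets_sg.iterrows():
        # A longer frequent itemset that contains this one means it is not maximal
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_sg.at[idx, 'is_maximal'] = False
            break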

maximal_itemsets_sg = maximal_itemsets_sg[maximal_itemsets_sg['is_maximal']]
display(maximal_itemsets_sg)

Unnamed: 0,support,itemsets,length,is_maximal
0,0.015364,(Acetominifen),1,True
1,0.011414,(Anchovies),1,True
2,0.024583,(Aspirin),1,True
3,0.010975,(Auto Magazines),1,True
4,0.012291,(Bagels),1,True
...,...,...,...,...
170,0.011853,"(Fresh Vegetables, Spices)",2,True
171,0.015364,"(Fresh Vegetables, TV Dinner)",2,True
172,0.016681,"(Waffles, Fresh Vegetables)",2,True
173,0.021510,"(Fresh Vegetables, Wine)",2,True


## FPGROWTH

In [30]:
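# Maximal itemsets: frequent itemsets that have no frequent proper superset
maximal_itemsets_sg_fp = frequent_itemsets_sg_fp.copy()
maximal_itemsets_sg_fp['is_maximal'] = True

# Pairwise comparison of all itemsets (quadratic, acceptable for a few hundred rows)
for idx, row in maximal_itemsets_sg_fp.iterrows():
    for idx2, row2 in maximal_itemsets_sg_fp.iterrows():
        # A longer frequent itemset that contains this one means it is not maximal
        if row['length'] < row2['length'] and row['itemsets'].issubset(row2['itemsets']):
            maximal_itemsets_sg_fp.at[idx, 'is_maximal'] = False
            break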

maximal_itemsets_sg_fp = maximal_itemsets_sg_fp[maximal_itemsets_sg_fp['is_maximal']]
display(maximal_itemsets_sg_fp)

Unnamed: 0,support,itemsets,length,is_maximal
7,0.027217,(Hard Candy),1,True
8,0.010975,(Deodorizers),1,True
12,0.037313,(Cleaners),1,True
13,0.030290,(Nasal Sprays),1,True
16,0.016242,(Tofu),1,True
...,...,...,...,...
170,0.021510,"(Fresh Vegetables, Wine)",2,True
171,0.015803,"(Fresh Fruit, Wine)",2,True
172,0.010975,"(Dried Fruit, Wine)",2,True
173,0.016681,"(Pizza, Fresh Vegetables)",2,True
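
The maximality check used in both cells above can also be written with Python's proper-subset operator on `frozenset`. The sketch below is an equivalent formulation on a toy input, not a replacement for the notebook code; the function name `maximal_itemsets` and the sample itemsets are illustrative assumptions.

```python
def maximal_itemsets(itemsets):
    """Return the itemsets that have no proper superset among the inputs."""
    itemsets = [frozenset(s) for s in itemsets]
    maximal = []
    for candidate in itemsets:
        # `<` on frozensets is the proper-subset test, which matches the
        # "shorter and contained in" condition used in the cells above.
        if not any(candidate < other for other in itemsets):
            maximal.append(candidate)
    return maximal


# Toy input: three 1-itemsets and one 2-itemset.
frequent = [
    frozenset({"Soup"}),
    frozenset({"Fresh Vegetables"}),
    frozenset({"Tofu"}),
    frozenset({"Soup", "Fresh Vegetables"}),
]
result = maximal_itemsets(frequent)
print(result)
```

Here `Soup` and `Fresh Vegetables` are absorbed by the pair that contains them, while `Tofu` stays maximal because no longer frequent itemset includes it. In the notebook, the input could be the `itemsets` column of `frequent_itemsets_sg`.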


### Final Conclusion 2.2

### **Overall Conclusion for Task 2.2: Small Grocery Store Analysis**

In **Task 2.2**, we analyzed transactions from a selected group of small grocery stores (identified by **small\_grocery\_ids = {2, 5, 14, 22}**) to identify frequent itemsets and association rules. This focused on understanding buying behaviors in a specific subset of stores, and the findings provide key insights into how customer preferences differ from those observed across all stores in earlier tasks.

---

### **1. Frequent Itemsets: Popularity of Items and Pairs**

* **Frequent 1-itemsets**: We identified **98 unique 1-itemsets**, which represent the most commonly purchased individual products across the small grocery stores.
* **Frequent 2-itemsets**: We also found **77 unique 2-itemsets**, indicating the most frequent pairs of products purchased together.

**Conclusion**: The frequent itemsets in these small grocery stores demonstrate a solid core set of products and pairings, which is consistent with general customer preferences observed across other store types. However, the **smaller number of 2-itemsets** compared to the earlier analysis suggests that customers in these stores may show **simpler or more focused buying patterns**, possibly due to the smaller size or more limited product offerings in these stores.
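
The counts of 1-itemsets and 2-itemsets quoted above can be tallied directly from the itemset lengths. The following is a small sketch on a toy list (the helper name `count_by_length` and the sample itemsets are assumptions made for illustration).

```python
from collections import Counter


def count_by_length(itemsets):
    """Count how many itemsets exist for each itemset length."""
    return Counter(len(itemset) for itemset in itemsets)


# Toy sample: three 1-itemsets and two 2-itemsets.
sample = [
    frozenset({"Soup"}),
    frozenset({"Pasta"}),
    frozenset({"Milk"}),
    frozenset({"Soup", "Fresh Vegetables"}),
    frozenset({"Fresh Fruit", "Soup"}),
]
counts = count_by_length(sample)
for length, count in sorted(counts.items()):
    print(f"{length}-itemsets: {count}")
```

Since the notebook's itemset DataFrames already carry a `length` column, `frequent_itemsets_sg['length'].value_counts()` should give the same tally there.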

---

### **2. Association Rules: Strength of Relationships**

We explored association rules using several key metrics, yielding the following insights:

| **Metric**             | **Count** | **Key Insight**                                                                                     |
| ---------------------- | --------- | --------------------------------------------------------------------------------------------------- |
| **Confidence (≥ 0.1)** | 92        | **92 rules** meet the minimum confidence; at a threshold this low they are candidate rules rather than strong ones. |
| **Lift (≥ 0.9)**       | 134       | **134 rules**; the threshold is below 1, so the set includes rules at or slightly below independence (e.g. `Soup → Wine`, lift 0.99). |
| **Leverage (≥ 0.001)** | 62        | **62 rules** co-occur slightly more often than expected under independence.                         |
| **Conviction (≥ 1)**   | 104       | **104 rules** whose consequent is at least as likely given the antecedent as independence predicts. |
| **Maximal Patterns**   | 128       | **128 maximal patterns**, i.e. frequent itemsets that have no frequent superset.                    |

**Conclusion**: Even within this focused set of stores, many product pairs pass the chosen thresholds. The rule counts for **lift and conviction** are higher partly because these filters keep both directions of a pair (`A → B` and `B → A`), whereas the confidence filter drops the direction whose confidence falls below 0.1. The lift values in the displayed rows lie between roughly 0.9 and 1.5, so the associations are modest rather than strong. The **maximal patterns** summarize the frequent itemsets that cannot be extended without falling below the support threshold.
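
To make the metrics in the table concrete, the sketch below recomputes confidence, lift, leverage, and conviction from the three support values of a single rule. It uses the standard definitions of these measures; the helper `rule_metrics` is a hypothetical name, and the inputs are the rounded supports of the `(Canned Vegetables) → (Wine)` row in the Apriori output, so the results should match that row only up to rounding.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Compute standard interestingness measures for the rule A -> C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction diverges when the rule always holds (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports copied from the (Canned Vegetables) -> (Wine) row above.
metrics = rule_metrics(support_a=0.082090, support_c=0.088674, support_ac=0.010097)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

From these definitions, conviction ≥ 1 holds exactly when confidence ≥ consequent support, which is the same condition as lift ≥ 1. The conviction filter with a threshold of 1 therefore selects the rules at or above independence.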

---

### **3. Key Observations**

* The overall number of **frequent 1-itemsets** and **2-itemsets** aligns with a more focused shopping behavior, with **simpler product pairings** observed compared to larger supermarket chains.

* The **association rules** that pass the **lift**, **confidence**, and **conviction** filters point to product pairs that co-occur more often than expected (for example `Canned Vegetables → Wine`, lift 1.39), which makes them candidates for targeted sales strategies and promotions.

---

### **Final Conclusion**

The analysis of **small grocery stores** reveals distinct customer behavior compared to larger supermarket chains. The data shows that while these stores feature a **core set of popular products**, the **simpler shopping patterns** and **more straightforward product pairings** suggest that customers are purchasing more essential or fundamental items, possibly due to limited options or convenience-driven shopping.

The high number of **meaningful association rules** (particularly in **lift, confidence, and conviction**) indicates that there are **strong, interpretable relationships** between products, which can be leveraged for strategic decision-making, such as **cross-selling or promotional bundling**. The identification of **maximal itemsets** further highlights the most significant product combinations that are crucial for understanding customer preferences in these specific stores.

In conclusion, this focused analysis of small grocery store transactions provides valuable insights into **more focused and efficient customer behavior**, allowing for tailored marketing and inventory management strategies that can enhance customer satisfaction and drive sales.



#### 2.2.5. Small Groceries versus All Stores (Global versus Small Groceries Specific Patterns and Rules)

Discuss the similarities and differences between the results obtained in task 1. (frequent itemsets and association rules found in transactions from all stores) and those obtained above (frequent itemsets and association rules found in transactions only from Small Groceries).

### **2.2.5. Small Groceries vs. All Stores: A Comparison of Global and Small Groceries Specific Patterns and Rules**

In this section, we compare the results obtained from **Task 1** (frequent itemsets and association rules from all stores) with those from the **small grocery stores** (Task 2.2), focusing on the similarities and differences between the two datasets. This comparison highlights how purchasing behaviors in small grocery stores may differ from those observed across a broader set of stores.

---

### **1. Frequent Itemsets: Core Product Overlap and Differences**

* **Task 1 (All Stores)**: In Task 1, we identified **102 unique 1-itemsets** and **404 2-itemsets** when analyzing all stores, suggesting a broad range of popular individual items and item pairings across various store types. A lower support threshold led to capturing more itemsets, representing a diverse set of frequently bought products and combinations.

* **Small Grocery Stores (Task 2.2)**: For small grocery stores, we found **98 unique 1-itemsets** and **77 unique 2-itemsets**. This indicates a similar core of frequently purchased individual products but with **fewer item pairs**, suggesting that small grocery stores feature a **narrower range of co-purchased items** compared to the more diverse selection found across all stores.

**Comparison**: The frequent itemsets from both datasets show a similar set of **core popular products** (102 vs. 98 1-itemsets), but small grocery stores have far fewer **2-itemsets** (77 vs. 404). Part of this gap comes from the lower support threshold used in Task 1, so the two counts are not directly comparable. With that caveat, the result is consistent with **customers in small grocery stores buying simpler, less varied combinations of products**. Shoppers in small groceries may prioritize essential or everyday items, potentially due to a more limited product selection or a focus on convenience.
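
Beyond comparing counts, the two result sets can be compared itemset by itemset. The sketch below shows one way to do this with set operations. The helper `compare_itemsets` is hypothetical and the inputs are toy data for illustration, not the notebook's actual results.

```python
def compare_itemsets(global_itemsets, subset_itemsets):
    """Split itemsets into shared, global-only and subset-only groups."""
    global_set = {frozenset(s) for s in global_itemsets}
    subset_set = {frozenset(s) for s in subset_itemsets}
    return {
        "shared": global_set & subset_set,
        "only_global": global_set - subset_set,
        "only_subset": subset_set - global_set,
    }


# Toy inputs (illustrative only).
all_stores = [{"Soup", "Fresh Vegetables"}, {"Fresh Fruit", "Soup"}, {"Cheese", "Wine"}]
small_groceries = [{"Soup", "Fresh Vegetables"}, {"Dried Fruit", "Wine"}]

comparison = compare_itemsets(all_stores, small_groceries)
for group, itemsets in comparison.items():
    print(group, [sorted(s) for s in itemsets])
```

In the notebook, the second argument could be the `itemsets` column of `frequent_itemsets_sg`, and the first the corresponding column of the Task 1 result. For a fair comparison, both would need to be mined with the same support threshold.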

---

### **2. Association Rules: Strength and Variety of Relationships**

The association rules were evaluated using several key metrics (confidence, lift, leverage, conviction), and we saw both similarities and differences between the global store dataset and the small grocery store dataset:

* **Confidence (≥ 0.1)**:

  * **Task 1 (All Stores)**: A relatively high number of rules with **confidence ≥ 0.1** were found across the dataset, indicating strong associations across a wide range of products.
  * **Small Grocery Stores (Task 2.2)**: We identified **92 rules** with confidence ≥ 0.1, which suggests a similarly **strong set of relationships** between products in small grocery stores.

* **Lift (≥ 0.9)**:

  * **Task 1 (All Stores)**: We observed **132 lift-based associations**, indicating strong relationships between product pairs.
  * **Small Grocery Stores (Task 2.2)**: **134 lift-based associations** were identified for small grocery stores, about as many as in the broader set of stores. Because the lift threshold of 0.9 is below 1, some of these rules describe pairs that co-occur slightly less often than independence would predict.

* **Leverage (≥ 0.001)**:

  * **Task 1 (All Stores)**: The leverage metric captured **a large number of relationships** indicative of non-random product pairings.
  * **Small Grocery Stores (Task 2.2)**: A more **focused set of 62 leverage-based rules** was found, suggesting that while small grocery stores also have meaningful associations, they tend to show **fewer complex or less frequent co-purchase relationships** compared to the broader set of stores.

* **Conviction (≥ 1)**:

  * **Task 1 (All Stores)**: Conviction-based rules revealed **diverse co-purchasing behavior** across stores.
  * **Small Grocery Stores (Task 2.2)**: **104 conviction-based rules** were found. A conviction of exactly 1 corresponds to independence, so the threshold of 1 keeps every rule whose consequent is at least as likely as independence predicts. The count therefore measures how many such rules exist rather than how strong they are, and it points to potentially **less variety** than in the global dataset.

**Comparison**: Overall, the numbers of rules passing the **confidence** and **lift** filters were fairly comparable across both datasets, with **small grocery stores showing just as many, or slightly more, rules**. However, small grocery stores exhibited **fewer rules based on leverage**, possibly reflecting a more **focused and simpler shopping behavior**.
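
One caveat when comparing these counts: lift and leverage are symmetric, so a filter on either metric returns both `A → B` and `B → A` (rows 0 and 1 of the lift output are an example), while the confidence filter may keep only one direction. The sketch below collapses directed rules into unordered item pairs so that counts can be compared on the same footing. The helper `unordered_pairs` is a hypothetical name, and the rules are a few rows copied from the lift output.

```python
def unordered_pairs(rules):
    """Collapse A -> B and B -> A into one unordered set of items."""
    return {
        frozenset(antecedent) | frozenset(consequent)
        for antecedent, consequent in rules
    }


# (antecedent, consequent) pairs copied from the lift-based output.
rules = [
    ({"Batteries"}, {"Fresh Vegetables"}),
    ({"Fresh Vegetables"}, {"Batteries"}),
    ({"Bologna"}, {"Fresh Vegetables"}),
    ({"Fresh Vegetables"}, {"Bologna"}),
    ({"Canned Vegetables"}, {"Fresh Vegetables"}),
]
pairs = unordered_pairs(rules)
print(len(rules), "directed rules ->", len(pairs), "distinct item pairs")
```

This works as a pair count because all rules here come from 2-itemsets. With longer itemsets, the union would also merge different antecedent/consequent splits of the same itemset.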

---

### **3. Maximal Itemsets: Significance of Core Products**

* **Task 1 (All Stores)**: The **maximal itemsets** identified in the global dataset highlighted the most significant product patterns, many of which were **broadly applicable across various store types**.

* **Small Grocery Stores (Task 2.2)**: A **similar approach** to identifying **maximal patterns** revealed **128 maximal itemsets** for small grocery stores, reflecting the most important combinations that cannot be extended further. This suggests that small grocery stores, despite their **limited inventory**, feature **core products** that drive much of the purchasing behavior.

**Comparison**: The maximal itemsets for both sets revealed the **key patterns** in customer behavior. However, small grocery stores show **more focused** and **practical combinations** of items, which are likely centered around **basic, everyday products** that customers return for regularly.

---

### **4. Conclusion: Differences in Customer Behavior and Product Relationships**

The comparison between the results from **Task 1 (All Stores)** and **Task 2.2 (Small Grocery Stores)** reveals both **shared trends** and **distinct differences**:

* **Similarities**:

  * Both datasets show strong overlaps in **core popular products** and **meaningful relationships between products**. The confidence and lift metrics indicate that **small grocery stores** and **all stores** exhibit **reliable associations** in product purchases.
* **Differences**:

  * Small grocery stores show a **more streamlined and focused** set of **item pairs and relationships**, likely due to a **simpler product offering** and **convenience-driven shopping**. They have **fewer 2-itemsets** and **less complex co-purchase relationships**, suggesting customers may primarily buy **staple items** in small quantities.
  * The global dataset (all stores) exhibits a wider variety of **product combinations** and more complex relationships, reflecting the diversity in customer behavior across larger stores with broader product assortments.

In summary, **small grocery stores** cater to a more focused customer base with simpler purchasing behaviors, while **all stores** capture a broader, more varied set of product pairings and complex customer behaviors. This insight can inform **marketing strategies**, suggesting that small grocery stores could benefit from **targeted promotions** for essential item combinations, while larger stores might emphasize a **broader range of product bundling** and cross-selling strategies.


### 2.3.  Deluxe/Gourmet Supermarkets versus Small Groceries

Discuss the similarities and differences between the results obtained in task 2.1. (frequent itemsets and association rules found in transactions only from Deluxe/Gourmet Supermarkets) and those obtained in task 2.2. (frequent itemsets and association rules found in transactions only from Small Groceries).

### **2.3. Deluxe/Gourmet Supermarkets vs. Small Groceries: A Comparison of Frequent Itemsets and Association Rules**

In this section, we compare the results obtained from **Task 2.1** (frequent itemsets and association rules from **Deluxe and Gourmet supermarkets**) with the results from **Task 2.2** (frequent itemsets and association rules from **small grocery stores**). This comparison allows us to highlight the similarities and differences in **customer purchasing behaviors** between the **premium supermarkets** (Deluxe and Gourmet) and **smaller grocery stores**, which may have distinct product assortments and customer preferences.

---

### **1. Frequent Itemsets: Core Product Overlap and Differences**

* **Task 2.1 (Deluxe/Gourmet Supermarkets)**:

  * **Frequent 1-itemsets**: We identified **102 unique 1-itemsets** for both Deluxe and Gourmet supermarkets, which suggests a consistent set of popular individual products across these higher-end stores.
  * **Frequent 2-itemsets**: **72 unique 2-itemsets** were found, showing a narrower range of frequent product pairings.

* **Task 2.2 (Small Grocery Stores)**:

  * **Frequent 1-itemsets**: **98 unique 1-itemsets** were identified, which is slightly fewer than in Deluxe/Gourmet supermarkets but still reflective of a strong core of popular products.
  * **Frequent 2-itemsets**: **77 unique 2-itemsets** were found, indicating a slightly higher number of product pairings compared to the premium supermarkets.

**Comparison**:
Both the **Deluxe/Gourmet supermarkets** and **small grocery stores** show a strong core of popular **1-itemsets**, suggesting that customers across these store types largely purchase the same set of essential products. Small grocery stores have **slightly more frequent 2-itemsets** than the Deluxe/Gourmet supermarkets (77 vs. 72). The difference is small, so, while core products remain consistent, frequent product pairings appear about as common in **small grocery stores** as in **Deluxe/Gourmet stores**; the counts alone do not show which store type has simpler purchasing patterns.

---

### **2. Association Rules: Strength of Relationships**

The association rules evaluated using confidence, lift, leverage, and conviction metrics provide further insights into **customer behavior** at both store types:

* **Confidence (≥ 0.1)**:

  * **Task 2.1 (Deluxe/Gourmet Supermarkets)**: **86 rules** were found with confidence ≥ 0.1.
  * **Task 2.2 (Small Grocery Stores)**: **92 rules** with confidence ≥ 0.1 were identified, slightly more than in Deluxe/Gourmet supermarkets. At a threshold this permissive, the count indicates how many associations exist, not how strong they are.

* **Lift (≥ 0.9)**:

  * **Task 2.1 (Deluxe/Gourmet Supermarkets)**: **132 lift-based rules** were found. Since a lift of 1 corresponds to independence and a lift below 1 to negative correlation, a threshold of 0.9 also admits rules whose items are unrelated or slightly negatively associated; only rules with lift clearly above 1 indicate a positive relationship.
  * **Task 2.2 (Small Grocery Stores)**: **134 lift-based rules** were identified, almost the same number as in Deluxe/Gourmet supermarkets, under the same caveat.

* **Leverage (≥ 0.001)**:

  * **Task 2.1 (Deluxe/Gourmet Supermarkets)**: **34 leverage-based rules** were identified, i.e. 34 rules whose items co-occur at least 0.001 (in support) more often than expected if they were independent.
  * **Task 2.2 (Small Grocery Stores)**: **62 leverage-based rules** were found, so nearly twice as many rules show this **positive, non-random co-occurrence** in small grocery stores as in Deluxe/Gourmet supermarkets.

* **Conviction (= 1)**:

  * **Task 2.1 (Deluxe/Gourmet Supermarkets)**: **76 conviction-based rules** were identified. A conviction of 1 means that antecedent and consequent are statistically independent, so these are rules in which the antecedent carries no information about the consequent, not strong associations.
  * **Task 2.2 (Small Grocery Stores)**: **104 conviction-based rules** were found, i.e. more rules of this **independent** kind in small grocery stores.

**Comparison**:

* **Confidence and Lift**: The two store types produce very similar rule counts for confidence (86 vs. 92) and lift (132 vs. 134). Given the permissive thresholds (confidence ≥ 0.1, lift ≥ 0.9), this shows that a similar number of candidate rules exists in both contexts; it does not by itself show that the associations are strong.
* **Leverage and Conviction**: The **small grocery stores** yield **more leverage-based rules** (62 vs. 34) and **more conviction-based rules** (104 vs. 76) than the **Deluxe/Gourmet supermarkets**. The leverage result suggests that more product pairs co-occur above chance in small grocery stores, possibly driven by **everyday, essential items** bought together. The conviction result (conviction = 1) counts rules with independent items and therefore should not be read as evidence of stronger relationships.
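The interpretation of the four metrics follows from their standard definitions. As a minimal sketch, assuming supports are given as fractions of transactions, the hypothetical helper `rule_metrics` below computes them for a rule A → C and contrasts an independent pair with a positively associated one. The support values are made up for illustration.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard metrics for a rule A -> C, computed from relative supports."""
    confidence = support_ac / support_a             # P(C | A)
    lift = confidence / support_c                   # 1 means independence
    leverage = support_ac - support_a * support_c   # 0 means independence
    # Conviction is infinite for rules that never fail (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Independent items: support(A and C) equals support(A) * support(C)
independent = rule_metrics(support_a=0.25, support_c=0.5, support_ac=0.125)

# Positively associated items: they co-occur more often than chance predicts
positive = rule_metrics(support_a=0.25, support_c=0.5, support_ac=0.1875)

print(independent)
print(positive)
```

In the independent case lift and conviction are both 1 and leverage is 0, which is why a filter such as lift ≥ 0.9 or conviction = 1 does not isolate strong rules. In the positive case all three values rise above those baselines.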

---

### **3. Maximal Itemsets: Significance of Core Products**

* **Task 2.1 (Deluxe/Gourmet Supermarkets)**:

  * The **maximal itemsets** identified in the Deluxe/Gourmet supermarkets are the frequent product combinations that have no frequent superset, and they summarize the key products and combinations driving purchases in these stores.
* **Task 2.2 (Small Grocery Stores)**:

  * **128 maximal patterns** were identified in small grocery stores, showing the most significant product combinations that cannot be extended further.

**Comparison**:
Both store types have maximal itemsets built around a set of **core products**. Small grocery stores appear to have **a slightly broader variety of maximal patterns** than the Deluxe/Gourmet supermarkets, but this can only be confirmed by setting the 128 patterns from Task 2.2 against the corresponding count from Task 2.1. If confirmed, it would suggest that **small grocery stores** have more **everyday product combinations** driving sales, whereas **Deluxe/Gourmet supermarkets** have a smaller set of **premium and specific product combinations**.
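For reference, an itemset is maximal when none of its proper supersets is frequent. The sketch below is a simple quadratic filter written in plain Python, not necessarily the algorithm used in the tasks above. It extracts the maximal itemsets from a list of frequent itemsets; the helper name `maximal_itemsets` and the toy items are hypothetical.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only the itemsets that have no frequent proper superset."""
    sets = [frozenset(s) for s in frequent_itemsets]
    # `s < other` is True when s is a proper subset of other
    return [s for s in sets if not any(s < other for other in sets)]


# Toy frequent itemsets (illustrative only, not real results)
frequent = [{"Soup"}, {"Cheese"}, {"Wine"}, {"Soup", "Cheese"}]
maximal = maximal_itemsets(frequent)
print(maximal)
```

Here `{"Soup"}` and `{"Cheese"}` are absorbed by `{"Soup", "Cheese"}`, while `{"Wine"}` stays maximal because it appears in no larger frequent itemset. The number of maximal patterns therefore depends on how the frequent 1- and 2-itemsets nest, not only on how many there are.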

---

### **4. Conclusion: Differences in Customer Behavior and Product Relationships**

The comparison between **Deluxe/Gourmet Supermarkets** and **small grocery stores** highlights several key differences and similarities:

* **Similarities**:

  * Both store types have a **similar number of frequent 1-itemsets** (102 vs. 98) and similar numbers of rules passing the **confidence** and **lift** thresholds.
  * Both datasets yield **maximal itemsets** built around core products, suggesting that these products drive much of the purchasing behavior in both store types.

* **Differences**:

  * **Small grocery stores** have **more frequent product pairings** (77 vs. 72 **2-itemsets**) and clearly more **leverage-based rules** (62 vs. 34), possibly because everyday products are bought together more regularly there.
  * In contrast, **Deluxe/Gourmet supermarkets** show **fewer non-random associations**. This may reflect more **specialized shopping behavior**, with customers buying higher-end items that are less commonly paired, although the rule counts alone cannot confirm this.

In summary, both **Deluxe/Gourmet supermarkets** and **small grocery stores** share a similar number of **popular products** and of rules passing the confidence and lift thresholds, while small grocery stores display **more above-chance product co-occurrences** than Deluxe/Gourmet stores. This insight could guide **differentiated marketing strategies**, with **small grocery stores** focusing on promoting **everyday items** and **bundled essentials**, while **Deluxe/Gourmet supermarkets** may emphasize **exclusive product pairings** and **premium offerings**.
