# Market Basket Analysis

To start this project, I import all the libraries used to manipulate the data, mine association rules, and time the algorithms.

In [3]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules, fpgrowth
import time

### Market Basket Analysis

Market basket analysis, also known as association-rule mining, is a method for uncovering customer purchase patterns by analyzing transactional data from stores. It can give retail companies a competitive edge: knowing which items a customer typically purchases together supports strategic improvements to store layouts, website design, and marketing strategies, such as promoting bundled offerings (Chen et al., 2005, p.339).

#### EDA and preprocessing

Below, I will start the EDA for the 'ecommerce' dataset, which contains transaction records from an online electronics store. This dataset has 92,250 records with 5 features, with zero missing values and no duplicates.

In [6]:
df = pd.read_csv('ecommerce.csv')
df.head()

Unnamed: 0,Product,Product Category (Enhanced Ecommerce),Transaction ID,Unique Purchases,Product Revenue
0,3.7V 3400mah LIION 12.6WH,Battery/Consumer Rechargeable,EC0043605902,47,"$1,597.53"
1,3V PHOTO LITHIUM,Battery/Primary Other,EC0043507670,47,"$1,246.44"
2,12V 11.2AH 225CCA AGM 12/0,Battery/Powersports,EC0043504182,41,"$4,714.59"
3,12V 12AH 165CCA FLOODED 6/0,Battery/Powersports,EC0043503186,39,"$2,456.61"
4,12V 12AH 210CCA AGM 12/0,Battery/Powersports,EC0043406547,34,"$3,570.00"


In [7]:
df.shape

(92250, 5)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92250 entries, 0 to 92249
Data columns (total 5 columns):
 #   Column                                 Non-Null Count  Dtype 
---  ------                                 --------------  ----- 
 0   Product                                92250 non-null  object
 1   Product Category (Enhanced Ecommerce)  92250 non-null  object
 2   Transaction ID                         92250 non-null  object
 3   Unique Purchases                       92250 non-null  int64 
 4   Product Revenue                        92250 non-null  object
dtypes: int64(1), object(4)
memory usage: 3.5+ MB


In [9]:
df.isnull().sum()

Product                                  0
Product Category (Enhanced Ecommerce)    0
Transaction ID                           0
Unique Purchases                         0
Product Revenue                          0
dtype: int64

In [10]:
df.duplicated().sum()

0

The code below will remove whitespace at the start and end of the 'Product' string.

In [12]:
df['Product'] = df['Product'].str.strip()

In the following code, I will use the 'astype' function to cast the 'Transaction ID' to string datatype to ensure consistency during the analysis.

In [14]:
df['Transaction ID'] = df['Transaction ID'].astype('str')

I will rename the columns to facilitate understanding and data manipulation.

In [16]:
df.rename(columns=({'Product': 'product', 'Product Category (Enhanced Ecommerce)': 'prod_categ', 'Transaction ID': 'transaction_id', 'Unique Purchases': 'unique_purchase', 'Product Revenue': 'prod_revenue'}), inplace=True)

In [17]:
df.head()

Unnamed: 0,product,prod_categ,transaction_id,unique_purchase,prod_revenue
0,3.7V 3400mah LIION 12.6WH,Battery/Consumer Rechargeable,EC0043605902,47,"$1,597.53"
1,3V PHOTO LITHIUM,Battery/Primary Other,EC0043507670,47,"$1,246.44"
2,12V 11.2AH 225CCA AGM 12/0,Battery/Powersports,EC0043504182,41,"$4,714.59"
3,12V 12AH 165CCA FLOODED 6/0,Battery/Powersports,EC0043503186,39,"$2,456.61"
4,12V 12AH 210CCA AGM 12/0,Battery/Powersports,EC0043406547,34,"$3,570.00"


**One hot encoding**

Now, it is necessary to transform the data into a format suitable for analysis. To achieve this, I will utilize the code below. In this code, the 'groupby' function aggregates the purchase data for each product within each transaction. Subsequently, the 'unstack' function pivots the data, converting the 'product' index into columns, with the 'transaction_id' becoming part of the index. The values in the resulting dataframe represent the summed 'unique_purchase' for each product in each transaction. Following this, missing values are filled with zero, and the 'set_index' function is used to set the 'transaction_id' feature as the index of the dataframe.

In [19]:
basket = (df.groupby(['transaction_id', 'product'])['unique_purchase']
          .sum().unstack().reset_index().fillna(0)
          .set_index('transaction_id'))
print(basket)

product           (4)F32T8 CENTIUM IS UNV  (not set)  \
transaction_id                                         
1234                                  0.0        1.0   
123456                                0.0        1.0   
12345678                              0.0        1.0   
<transaction id>                      0.0        0.0   
EC0032704676                          0.0        0.0   
...                                   ...        ...   
EC0044007291                          0.0        0.0   
EC0044007292                          0.0        0.0   
EC0044007293                          0.0        0.0   
EC0044007294                          0.0        0.0   
EC0044007295                          0.0        0.0   

product           1 BANK 10A ONBOARD BATTERY CHARGER  \
transaction_id                                         
1234                                             0.0   
123456                                           0.0   
12345678                                       
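To make the reshaping step easier to follow, here is a minimal sketch on a toy table. The transaction IDs, product names, and quantities are hypothetical and are not taken from the 'ecommerce' dataset; the sketch also passes `fill_value=0` to `unstack` instead of calling `fillna(0)` afterwards, which is a small variation on the cell above.

```python
import pandas as pd

# Toy transactions (hypothetical values, not from the ecommerce dataset).
toy = pd.DataFrame({
    "transaction_id": ["T1", "T1", "T2", "T3", "T3"],
    "product": ["AA", "AAA", "AA", "AAA", "9V"],
    "unique_purchase": [2, 1, 1, 3, 1],
})

# Sum purchases per (transaction, product), then pivot products into columns.
toy_basket = (
    toy.groupby(["transaction_id", "product"])["unique_purchase"]
    .sum()
    .unstack(fill_value=0)  # products absent from a transaction become 0
)
print(toy_basket)
```

Each row is one transaction and each column one product, which is the layout the encoding step expects.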

To finalize the one-hot encoding process, the function below converts positive quantities to 1 and values of zero or less to 0.

In [21]:
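def encode_units(x):
    # Zero or negative quantities mean the product is absent from the transaction.
    if x <= 0:
        return 0
    # Any positive quantity means the product is present.
    return 1
basket_ohe = basket.apply(lambda x: x.map(encode_units))
basket_ohe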

transaction_id,(4)F32T8 CENTIUM IS UNV,(not set),1 BANK 10A ONBOARD BATTERY CHARGER,1 BANK 5A ONBOARD BATTERY CHARGER,1.2V 1100MAH NICAD,1.2V 12000MAH NIMH,1.2V 1200MAH NICAD,1.2V 1200MAH NICAD 4/5A,1.2V 1200MAH NIMH,1.2V 1400MAH NICAD 4/5A,...,Y50-N18L-A W/METAL JACKET,YB16L-B W/METAL JACKET,YETI 1500X PORTABLE POWER STATION,YETI 200X PORTABLE POWER,YETI 3000X PORTABLE POWER STATION,YETI 400 PROTECTION CASE,YETI 500X PORTABLE POWER STATION,ZBUG LANTERN + LIGHT,ZUS SMART VEHICLE HEALTH MONITOR MINI,ZUS UNIVERSAL CAR AUDIO ADAPTER
1234,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
123456,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12345678,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
<transaction id>,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
EC0032704676,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
EC0044007291,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
EC0044007292,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
EC0044007293,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
EC0044007294,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
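For count data, the same presence/absence encoding can also be sketched as a single vectorized comparison. This is an alternative I am assuming is acceptable here, not the approach used in the cell above; the frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical quantity matrix: rows are transactions, columns are products.
toy_basket = pd.DataFrame(
    {"AA": [2.0, 0.0, 1.0], "AAA": [0.0, 3.0, 0.0]},
    index=["T1", "T2", "T3"],
)

# True where the product appears in the transaction at least once.
toy_ohe = toy_basket > 0
print(toy_ohe)
```

The result already has boolean columns, so no separate cast is needed before passing it to a frequent-itemset function.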


The EDA and data preprocessing steps above are shared by both algorithms, Apriori and FP-Growth.

### Apriori

#### Generating frequent itemsets

This procedure finds the itemsets from which the association rules are later derived, so that useful insights can be extracted efficiently from large datasets. The minimum support is the threshold that determines how often an itemset needs to appear to be considered frequent. For example, a minimum support of 0.1 means that only those itemsets that appear in at least 10% of all transactions will be considered frequent.  
Below, I experimented with various minimum support values and found that using 0.1 resulted in an empty dataframe, indicating that no itemset appeared in 10% of the transactions. However, when I used 0.01 (1%) and 0.001 (0.1%), the results were 15 and 233 itemsets, respectively. I chose the minimum support value of 0.001 (0.1%) to analyze a wider range of itemsets, which might contain more interesting and less obvious patterns. This should provide satisfactory results in both the Apriori and FP-Growth algorithms at low computational cost.
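The definition of support can be illustrated with a few lines of plain Python. The baskets, the `support` helper, and the 0.5 threshold below are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
# Hypothetical baskets, used only to illustrate the definition of support.
transactions = [
    {"AA", "AAA"},
    {"AA"},
    {"AA", "AAA", "9V"},
    {"9V"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

min_support = 0.5  # illustrative threshold, much higher than the 0.001 used here
for candidate in [{"AA"}, {"AA", "AAA"}, {"9V", "AAA"}]:
    s = support(candidate, transactions)
    label = "frequent" if s >= min_support else "not frequent"
    print(sorted(candidate), s, label)
```

Lowering `min_support` lets rarer itemsets through, which is why the count of frequent itemsets grows as the threshold drops.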

In [25]:
basket_ohe = basket_ohe.astype(bool)
frequent_itemsets = apriori(basket_ohe, min_support = 0.001, use_colnames = True)

In [26]:
print(frequent_itemsets.shape)

(233, 2)


#### Generating rules

Here, the association rules are generated from the frequent itemsets computed previously. These rules express associations between different items in the dataset, which makes it possible to find the strongest ones.

To facilitate understanding, I will create two business questions:
- Which items purchased together are more likely to occur than if bought independently?
- Which itemsets have a high probability of occurring?

In [28]:
rules_apriori = association_rules(frequent_itemsets, metric = "lift", min_threshold = 60)
rules_apriori.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(24PK 1.5V AAA ALKALINE),(24PK 1.5V AA ALKALINE),0.002831,0.004457,0.001662,0.587234,131.753099,0.00165,2.411882,0.995227
1,(24PK 1.5V AA ALKALINE),(24PK 1.5V AAA ALKALINE),0.004457,0.002831,0.001662,0.372973,131.753099,0.00165,1.590313,0.996853
2,(C ALKALINE BULK),(D ALKALINE BULK),0.003325,0.006878,0.00159,0.478261,69.531257,0.001567,1.903483,0.988906
3,(D ALKALINE BULK),(C ALKALINE BULK),0.006878,0.003325,0.00159,0.231173,69.531257,0.001567,1.296359,0.992444
4,"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",(1.5V IND AA ALK BULK),0.001927,0.011793,0.001614,0.8375,71.015552,0.001591,6.081273,0.987822
5,"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",(1.5V IND AAA ALK BULK),0.002192,0.009504,0.001614,0.736264,77.465396,0.001593,3.755629,0.98926
6,(1.5V IND AAA ALK BULK),"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",0.009504,0.002192,0.001614,0.169835,77.465396,0.001593,1.201939,0.996563
7,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",0.011793,0.001927,0.001614,0.136874,71.015552,0.001591,1.156347,0.997684
8,"(1.5V IND AAA ALK BULK, C ALKALINE BULK)",(1.5V IND AA ALK BULK),0.001614,0.011793,0.001325,0.820896,69.60758,0.001306,5.517488,0.987227
9,"(1.5V IND AAA ALK BULK, 1.5V IND AA ALK BULK)",(C ALKALINE BULK),0.006553,0.003325,0.001325,0.202206,60.818548,0.001303,1.249289,0.990046


Before printing the data above, I experimented with several values for 'min_threshold' and found that 60 kept the rules with the highest lift along with some lower-lift rules. The rules in rows 0 and 1, both with a lift of 131.75, show the strongest association: these two items appear together about 131 times more often than would be expected if they were bought independently.

Therefore, the first question can be answered: customers who buy the 24-pack of AA alkaline batteries are far more likely than the average customer to also buy the 24-pack of AAA alkaline batteries, and vice versa.
From a business-strategy perspective, it could be advantageous to offer a discount on only one of these two items, since the other one has a high likelihood of being bought as well.
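As a quick check on how these metrics relate, confidence and lift can be recomputed from the three support values in row 0 of the table. The inputs are the rounded values as displayed, so the results agree with the table only approximately.

```python
# Values copied from row 0 of the rules table (rounded as displayed).
antecedent_support = 0.002831  # 24PK 1.5V AAA ALKALINE
consequent_support = 0.004457  # 24PK 1.5V AA ALKALINE
joint_support = 0.001662       # both items in the same transaction

# confidence(A -> C) = support(A and C) / support(A)
confidence = joint_support / antecedent_support

# lift(A -> C) = confidence(A -> C) / support(C)
lift = confidence / consequent_support

print(f"confidence = {confidence:.3f}")
print(f"lift = {lift:.1f}")
```

A lift above 1 means the two items co-occur more often than independence would predict; the further above 1, the stronger the association.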

Below, the rules are filtered on two metrics simultaneously: 'lift' and 'confidence'. I used this to select rules with a very strong association (lift) and a high prediction accuracy (confidence).

In [31]:
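# Keep the rules that pass both thresholds; the filtered frame has to be assigned to be reused.
strong_rules_apriori = rules_apriori[(rules_apriori['lift'] >= 60) & (rules_apriori['confidence'] >= 0.7)]
# The full, unfiltered rule set is displayed for reference.
rules_apriori.head(10)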

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(24PK 1.5V AAA ALKALINE),(24PK 1.5V AA ALKALINE),0.002831,0.004457,0.001662,0.587234,131.753099,0.00165,2.411882,0.995227
1,(24PK 1.5V AA ALKALINE),(24PK 1.5V AAA ALKALINE),0.004457,0.002831,0.001662,0.372973,131.753099,0.00165,1.590313,0.996853
2,(C ALKALINE BULK),(D ALKALINE BULK),0.003325,0.006878,0.00159,0.478261,69.531257,0.001567,1.903483,0.988906
3,(D ALKALINE BULK),(C ALKALINE BULK),0.006878,0.003325,0.00159,0.231173,69.531257,0.001567,1.296359,0.992444
4,"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",(1.5V IND AA ALK BULK),0.001927,0.011793,0.001614,0.8375,71.015552,0.001591,6.081273,0.987822
5,"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",(1.5V IND AAA ALK BULK),0.002192,0.009504,0.001614,0.736264,77.465396,0.001593,3.755629,0.98926
6,(1.5V IND AAA ALK BULK),"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",0.009504,0.002192,0.001614,0.169835,77.465396,0.001593,1.201939,0.996563
7,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",0.011793,0.001927,0.001614,0.136874,71.015552,0.001591,1.156347,0.997684
8,"(1.5V IND AAA ALK BULK, C ALKALINE BULK)",(1.5V IND AA ALK BULK),0.001614,0.011793,0.001325,0.820896,69.60758,0.001306,5.517488,0.987227
9,"(1.5V IND AAA ALK BULK, 1.5V IND AA ALK BULK)",(C ALKALINE BULK),0.006553,0.003325,0.001325,0.202206,60.818548,0.001303,1.249289,0.990046
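Boolean indexing returns a new dataframe rather than modifying the original, so a filter only takes effect if its result is assigned or displayed. As a sketch, the snippet below applies the same two thresholds to the lift and confidence values of the ten rules displayed above; the `shown` and `strong` names are hypothetical, and rules beyond the first ten are not covered.

```python
import pandas as pd

# Confidence and lift of the ten rules displayed above, copied from the table.
shown = pd.DataFrame({
    "confidence": [0.587234, 0.372973, 0.478261, 0.231173, 0.8375,
                   0.736264, 0.169835, 0.136874, 0.820896, 0.202206],
    "lift": [131.753099, 131.753099, 69.531257, 69.531257, 71.015552,
             77.465396, 77.465396, 71.015552, 69.60758, 60.818548],
})

# Assign the filtered frame so the result is kept.
strong = shown[(shown["lift"] >= 60) & (shown["confidence"] >= 0.7)]
print(strong)
```

Among these ten rules, only rows 4, 5, and 8 pass both thresholds.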


In [32]:
print(f"Antecedents:\n9V IND ALK BULK: {basket_ohe['9V IND ALK BULK'].sum()}\n1.5V IND AAA ALK BULK: {basket_ohe['1.5V IND AAA ALK BULK'].sum()}\nConsequents:\n1.5V IND AA ALK BULK: {basket_ohe['1.5V IND AA ALK BULK'].sum()}")

Antecedents:
9V IND ALK BULK: 384
1.5V IND AAA ALK BULK: 789
Consequents:
1.5V IND AA ALK BULK: 979


Focusing on the most accurate prediction, shown in the confidence column, we can see that in row 4 there is an 83.75% probability that a customer who bought the antecedent items will also buy the consequent. Although the lift value is lower than in the previous selection, the confidence value is higher, suggesting that this itemset has a high chance of occurring.

This answers the second question: buying the items 1.5V IND AAA ALK BULK and 9V IND ALK BULK indicates a high chance of also buying 1.5V IND AA ALK BULK. With this in mind, it is possible to suggest a business strategy similar to the previous one, offering a discount on the antecedent items, since customers who buy them have a high chance of also buying the consequent item. I also recommend displaying these products near each other to improve the customer's experience and increase the chances of a purchase.

### Frequent Pattern (FP growth)

The parameter values used in the FP-Growth algorithm were the same as those used in the Apriori algorithm to promote a reliable comparison between them. The results were similar, as we will see below.

#### Generating frequent itemsets

In [37]:
frequent_itemsets_fpgrowth = fpgrowth(basket_ohe, min_support = 0.001, use_colnames = True)

In [38]:
print(frequent_itemsets_fpgrowth.shape)

(233, 2)


#### Generating rules

In [40]:
rules_fpgrowth = association_rules(frequent_itemsets_fpgrowth, metric = "lift", min_threshold = 60)
rules_fpgrowth.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(1.5V IND AAA ALK BULK, D ALKALINE BULK)",(1.5V IND AA ALK BULK),0.002289,0.011793,0.001771,0.773684,65.604312,0.001744,4.366495,0.987016
1,"(1.5V IND AA ALK BULK, D ALKALINE BULK)",(1.5V IND AAA ALK BULK),0.002409,0.009504,0.001771,0.735,77.332433,0.001748,3.737719,0.989453
2,(1.5V IND AAA ALK BULK),"(1.5V IND AA ALK BULK, D ALKALINE BULK)",0.009504,0.002409,0.001771,0.186312,77.332433,0.001748,1.226011,0.99654
3,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, D ALKALINE BULK)",0.011793,0.002289,0.001771,0.150153,65.604312,0.001744,1.17399,0.996509
4,"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",(1.5V IND AA ALK BULK),0.001927,0.011793,0.001614,0.8375,71.015552,0.001591,6.081273,0.987822
5,"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",(1.5V IND AAA ALK BULK),0.002192,0.009504,0.001614,0.736264,77.465396,0.001593,3.755629,0.98926
6,(1.5V IND AAA ALK BULK),"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",0.009504,0.002192,0.001614,0.169835,77.465396,0.001593,1.201939,0.996563
7,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",0.011793,0.001927,0.001614,0.136874,71.015552,0.001591,1.156347,0.997684
8,(C ALKALINE BULK),(D ALKALINE BULK),0.003325,0.006878,0.00159,0.478261,69.531257,0.001567,1.903483,0.988906
9,(D ALKALINE BULK),(C ALKALINE BULK),0.006878,0.003325,0.00159,0.231173,69.531257,0.001567,1.296359,0.992444


In [41]:
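# Keep the rules that pass both thresholds; the filtered frame has to be assigned to be reused.
strong_rules_fpgrowth = rules_fpgrowth[(rules_fpgrowth['lift'] >= 60) & (rules_fpgrowth['confidence'] >= 0.7)]
# The full, unfiltered rule set is displayed for reference.
rules_fpgrowth.head(10)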

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(1.5V IND AAA ALK BULK, D ALKALINE BULK)",(1.5V IND AA ALK BULK),0.002289,0.011793,0.001771,0.773684,65.604312,0.001744,4.366495,0.987016
1,"(1.5V IND AA ALK BULK, D ALKALINE BULK)",(1.5V IND AAA ALK BULK),0.002409,0.009504,0.001771,0.735,77.332433,0.001748,3.737719,0.989453
2,(1.5V IND AAA ALK BULK),"(1.5V IND AA ALK BULK, D ALKALINE BULK)",0.009504,0.002409,0.001771,0.186312,77.332433,0.001748,1.226011,0.99654
3,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, D ALKALINE BULK)",0.011793,0.002289,0.001771,0.150153,65.604312,0.001744,1.17399,0.996509
4,"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",(1.5V IND AA ALK BULK),0.001927,0.011793,0.001614,0.8375,71.015552,0.001591,6.081273,0.987822
5,"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",(1.5V IND AAA ALK BULK),0.002192,0.009504,0.001614,0.736264,77.465396,0.001593,3.755629,0.98926
6,(1.5V IND AAA ALK BULK),"(9V IND ALK BULK, 1.5V IND AA ALK BULK)",0.009504,0.002192,0.001614,0.169835,77.465396,0.001593,1.201939,0.996563
7,(1.5V IND AA ALK BULK),"(1.5V IND AAA ALK BULK, 9V IND ALK BULK)",0.011793,0.001927,0.001614,0.136874,71.015552,0.001591,1.156347,0.997684
8,(C ALKALINE BULK),(D ALKALINE BULK),0.003325,0.006878,0.00159,0.478261,69.531257,0.001567,1.903483,0.988906
9,(D ALKALINE BULK),(C ALKALINE BULK),0.006878,0.003325,0.00159,0.231173,69.531257,0.001567,1.296359,0.992444


In [42]:
print(f"Antecedents:\n9V IND ALK BULK: {basket_ohe['9V IND ALK BULK'].sum()}\n1.5V IND AAA ALK BULK: {basket_ohe['1.5V IND AAA ALK BULK'].sum()}\nConsequents:\n1.5V IND AA ALK BULK: {basket_ohe['1.5V IND AA ALK BULK'].sum()}")

Antecedents:
9V IND ALK BULK: 384
1.5V IND AAA ALK BULK: 789
Consequents:
1.5V IND AA ALK BULK: 979


### Comparison - Apriori and FP-Growth

In this project, I used identical parameter values for both algorithms to ensure a reliable comparison between them. Both returned 233 frequent itemsets, and the rules that appear in both displayed tables carry the same metric values; only the row order differs. This consistency, despite each algorithm taking a different approach, suggests that they identify the antecedents and consequents without ambiguity.

Therefore, the outputs do not reveal any major divergence between these models. Both algorithms are deterministic, which means that they produce the same output when the input and parameters are the same (Han, Pei and Yin, 2000). The difference between them lies in their efficiency and scalability, as we will see in the following section on the speed test.
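Because the two algorithms can return rows in a different order, an order-independent comparison is a safer way to confirm that two result frames match. The sketch below assumes frames with `support` and `itemsets` columns, the layout mlxtend returns; the `as_itemset_map` helper and the two toy frames are hypothetical.

```python
import pandas as pd

def as_itemset_map(frequent_itemsets, ndigits=9):
    """Map each itemset (as a frozenset) to its rounded support."""
    return {
        frozenset(items): round(float(sup), ndigits)
        for sup, items in zip(frequent_itemsets["support"],
                              frequent_itemsets["itemsets"])
    }

# Hypothetical result frames holding the same itemsets in a different row order.
out_a = pd.DataFrame({
    "support": [0.75, 0.50, 0.50],
    "itemsets": [frozenset({"AA"}), frozenset({"AAA"}), frozenset({"AA", "AAA"})],
})
out_b = out_a.iloc[::-1].reset_index(drop=True)

same = as_itemset_map(out_a) == as_itemset_map(out_b)
print(same)
```

In this notebook, the same helper could be applied to `frequent_itemsets` and `frequent_itemsets_fpgrowth`.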

### Speed - Apriori and FP-Growth

The speed test will be performed below to evaluate the efficiency of these algorithms because Market Basket Analysis is usually conducted on large datasets, which can result in significant computational and time costs.

#### Apriori

In [46]:
start_time = time.time()
# Calculate the frequent itemsets by calling the apriori method
frequent_itemsets_ap = apriori(basket_ohe, min_support=0.001, use_colnames=True)
# Calculate association rules
rules__apriori = association_rules(frequent_itemsets_ap, metric="confidence", min_threshold=0.7)
end_time = time.time()
calculation_time = end_time - start_time
print("Association rules calculated in {:.2f} seconds for Apriori.".format(calculation_time))

Association rules calculated in 3.12 seconds for Apriori.


#### FP-Growth

In [48]:
start_time = time.time()
# Calculate the frequent itemsets by calling the fpgrowth method
frequent_itemsets_fpgrowth = fpgrowth(basket_ohe, min_support=0.001, use_colnames=True)
# Calculate association rules
rules_fpgrowth = association_rules(frequent_itemsets_fpgrowth, metric="confidence", min_threshold=0.7)
end_time = time.time()
calculation_time = end_time - start_time
print("Association rules calculated in {:.2f} seconds for FP-Growth.".format(calculation_time))

Association rules calculated in 1.78 seconds for FP-Growth.


**Comparison**

In this speed test, the FP-Growth algorithm (1.78 seconds) was faster than the Apriori algorithm (3.12 seconds), being about 1.75 times quicker. This observation aligns with the findings of Hossain, Sattar, and Paul (2019). In addition, according to Heaton (2016), Apriori has serious scalability and memory issues compared to FP-Growth, making Apriori unsuitable for large datasets.

The superior performance of FP-Growth can be attributed to its use of the FP-tree structure, which minimizes database scans and reduces computational complexity. In contrast, Apriori relies on candidate generation and multiple database scans, which are computationally intensive (Shankar, 2024).
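A single timed run can be noisy, so one way to make such a comparison more robust is to repeat each call and keep the best time. The `best_of` helper below is a hypothetical sketch using `time.perf_counter`; the workload is a stand-in, not the actual algorithm calls.

```python
import time

def best_of(fn, repeats=5):
    """Return the shortest wall-clock time, in seconds, over `repeats` calls to `fn`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; in the notebook this could be a call such as
# lambda: apriori(basket_ohe, min_support=0.001, use_colnames=True).
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.4f} s")

# Ratio of the two single-run timings reported above.
speedup = 3.12 / 1.78
print(f"speedup: {speedup:.2f}x")
```

Taking the minimum over several runs reduces the influence of background activity on the machine.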


# References

Chen, Y.-L., Tang, K., Shen, R.-J. and Hu, Y.-H. (2005). Market basket analysis in a multiple store environment. *Decision Support Systems*, 40(2), pp.339–354. doi:https://doi.org/10.1016/j.dss.2004.04.009.

Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. *ACM SIGMOD Record*, 29(2), pp.1–12. doi:https://doi.org/10.1145/335191.335372.

Heaton, J. (2016). Comparing dataset characteristics that favor the Apriori, Eclat or FP-Growth frequent itemset mining algorithms. [online] *IEEE Xplore*. doi:https://doi.org/10.1109/SECON.2016.7506659.

Hossain, M., Sattar, A.H.M.S. and Paul, M.K. (2019). Market Basket Analysis Using Apriori and FP Growth Algorithm. [online] *IEEE Xplore*. doi:https://doi.org/10.1109/ICCIT48885.2019.9038197.

Numpy (2009). NumPy. [online] Numpy.org. Available at: https://numpy.org/. [Accessed 16 May 2024].

OpenAI (2024). ChatGPT (GPT-3.5 version) [Large language model]. Available at: https://chat.openai.com/chat [Accessed 20 May 2024].

pandas.pydata.org. (n.d.). User Guide — pandas 1.0.4 documentation. [online] Available at: https://pandas.pydata.org/docs/user_guide/index.html#user-guide. [Accessed 16 May 2024].

Shankar (2024). Apriori vs. FP-Growth: Comparative Study of Data Mining Algorithms - Algorithmic Mind. [online] Algorithmic Mind. Available at: https://algorithmicmind.org/difference-between-apriori-and-fp-growth/ [Accessed 22 May 2024].

**Explanation of AI uses in this project:** 

I used ChatGPT to refine ideas, find information about the topic, check grammar, and get information about code usage.