In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder


file_path = 'Online Retail_UTF8.csv'

df = pd.read_csv(file_path)

df.head()


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 8:26,3.39,17850.0,United Kingdom


In [None]:
# Preprocess the data
# Remove rows with a missing invoice number or product description
df.dropna(subset=['InvoiceNo', 'Description'], inplace=True)

# Filter out non-sale transactions (e.g. returns) by keeping only positive quantities
df = df[df['Quantity'] > 0]
# Keep only UK transactions: they make up a significant portion of the data,
# and analyzing one country keeps the analysis manageable
df = df[df['Country'] == 'United Kingdom']

# Prepare data for market basket analysis
# Pivot to one row per invoice and one column per product description;
# each cell holds the total quantity of that product on that invoice (0 if absent)
transactions = df.groupby(['InvoiceNo', 'Description'])['Quantity'].sum().unstack(fill_value=0)
# Convert quantities to booleans (True if the product appears on the invoice);
# this avoids the deprecated DataFrame.applymap and gives apriori the bool input it prefers
basket_sets = transactions > 0

# Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(basket_sets, min_support=0.02, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display the rules
print(rules.head())



                         antecedents                        consequents  \
0      (60 TEATIME FAIRY CAKE CASES)  (PACK OF 72 RETROSPOT CAKE CASES)   
1  (PACK OF 72 RETROSPOT CAKE CASES)      (60 TEATIME FAIRY CAKE CASES)   
2        (ALARM CLOCK BAKELIKE RED )       (ALARM CLOCK BAKELIKE GREEN)   
3       (ALARM CLOCK BAKELIKE GREEN)        (ALARM CLOCK BAKELIKE RED )   
4        (ALARM CLOCK BAKELIKE RED )        (ALARM CLOCK BAKELIKE PINK)   

   antecedent support  consequent support   support  confidence       lift  \
0            0.041387            0.062053  0.022480    0.543161   8.753114   
1            0.062053            0.041387  0.022480    0.362267   8.753114   
2            0.051116            0.048148  0.030944    0.605376  12.573307   
3            0.048148            0.051116  0.030944    0.642694  12.573307   
4            0.051116            0.036056  0.021601    0.422581  11.720171   

   leverage  conviction  zhangs_metric  
0  0.019912    2.053121       0.923997 
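
The invoice-by-product encoding built in the preprocessing cell can be sketched on a toy example. The invoice numbers, shortened product names, and quantities below are made up for illustration, and `fill_value=0` with a `> 0` comparison is just one way to produce the one-hot table:

```python
import pandas as pd

# Hypothetical toy data: three invoices, three products (not from the real dataset)
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1", "A2", "A3", "A3"],
    "Description": ["LANTERN", "HOLDER", "LANTERN", "HOLDER", "LANTERN", "HANGER"],
    "Quantity": [6, 6, 2, 8, 1, 3],
})

# One row per invoice, one column per product, cells hold summed quantities
quantities = (
    toy.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
)

# One-hot encoding: True if the product appears on the invoice at all
basket = quantities > 0

# The support of a single item is the share of invoices that contain it
item_support = basket.mean()

print(quantities)
print(basket)
print(item_support)
```

Each row of `basket` is one transaction, which is the shape `apriori` expects as input.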

Observations / Interpretations:

* Antecedents: This is the "if" part of the rule: the item (or itemset) already present in a transaction. For example, (PACK OF 72 RETROSPOT CAKE CASES) in the rule at index 1.

* Consequents: This is the "then" part of the rule: the item (or itemset) that is likely to appear in the same transaction as the antecedent. For instance, (60 TEATIME FAIRY CAKE CASES) in the rule at index 1.

* Antecedent Support: The proportion of transactions that contain the antecedent. For example, 0.062053 means that PACK OF 72 RETROSPOT CAKE CASES occurs in about 6.21% of all transactions.

* Consequent Support: The proportion of transactions that contain the consequent. 0.041387 indicates that 60 TEATIME FAIRY CAKE CASES is found in roughly 4.14% of transactions.

* Support: The proportion of transactions that contain both the antecedent and the consequent. A support of 0.022480 signifies that both items appear together in about 2.25% of all transactions.

* Confidence: The probability that a transaction containing the antecedent also contains the consequent, i.e. support divided by antecedent support. For instance, a confidence of 0.362267 for the rule at index 1 means that 36.23% of transactions with PACK OF 72 RETROSPOT CAKE CASES also contain 60 TEATIME FAIRY CAKE CASES.

* Lift: Measures how much more often the antecedent and consequent occur together than expected if they were statistically independent, i.e. confidence divided by consequent support. A lift greater than 1 indicates that the items are bought together more often than chance would predict. For example, the first two rules have a lift of 8.753114, meaning the two cake-case products appear together about 8.75 times as often as they would if they were purchased independently.

* Leverage: The difference between the observed support of the rule and the support expected under independence (the product of the antecedent and consequent supports). A value of 0 indicates independence; higher values indicate a stronger association.

* Conviction: A measure of how strongly the consequent depends on the antecedent, defined as (1 - consequent support) / (1 - confidence). A value of 1 indicates independence, and higher values mean the rule fails (the antecedent appears without the consequent) less often than chance would predict. For example, the rule at index 0 has a conviction of 2.053121, meaning that 60 TEATIME FAIRY CAKE CASES would appear without PACK OF 72 RETROSPOT CAKE CASES about 2.05 times as often if the two items were independent.

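* Zhang's Metric: A measure of association that ranges from -1 to 1. Values close to 1 indicate a strong positive association, values close to -1 indicate a strong negative association (the items tend not to be bought together), and 0 indicates independence.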
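
The definitions above can be checked by recomputing every metric from just three support values. The following is a minimal sketch, not mlxtend's own code: `rule_metrics` is a hypothetical helper name, and the inputs are the rounded supports printed for the rule at index 0.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Derive association-rule metrics for A -> C from three supports.

    support_a:  share of transactions containing the antecedent A
    support_c:  share of transactions containing the consequent C
    support_ac: share of transactions containing both A and C
    """
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = (1 - support_c) / (1 - confidence) if confidence < 1 else float("inf")
    # Zhang's metric: leverage scaled into the range [-1, 1]
    denom = max(support_ac * (1 - support_a), support_a * (support_c - support_ac))
    zhang = leverage / denom if denom else 0.0
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhang": zhang,
    }


# Rounded supports from the rule at index 0:
# (60 TEATIME FAIRY CAKE CASES) -> (PACK OF 72 RETROSPOT CAKE CASES)
metrics = rule_metrics(0.041387, 0.062053, 0.022480)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.4f}")
```

The results agree with the printed row to roughly three decimal places; the small differences come from the rounding of the displayed supports.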

Insights from the results:

* Cross-Sell Opportunities

Cake Cases:

The first two rules suggest a strong relationship between PACK OF 72 RETROSPOT CAKE CASES and 60 TEATIME FAIRY CAKE CASES. Lift is symmetric, so both directions of the rule share the same high value (8.753114): customers who buy one of these products are far more likely than the average customer to buy the other. This indicates a clear opportunity for cross-selling these items together. For instance, if a customer adds one type of cake case to their basket, recommending the other type could result in an additional sale.

Alarm Clocks:

The next rules indicate an even stronger relationship between different colors of ALARM CLOCK BAKELIKE (Green and Red, Red and Pink). The lift values here are higher still (12.573307 for Green and Red, 11.720171 for Red and Pink), which suggests that customers who buy an alarm clock in one color are significantly more likely to buy it in another color as well. This insight can be used to cross-sell these items by suggesting other colors to customers who have already selected an alarm clock.
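
Because each pair shows up twice in the rule table (once per direction), it can help to collapse mirrored rules before ranking pairs by lift. The sketch below does this on a hand-copied subset of the printed output; the `pair` column and the choice to keep the higher-confidence direction are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Hand-copied subset of the printed rules (item names keep the dataset's
# trailing space in "ALARM CLOCK BAKELIKE RED ")
cake_small = "60 TEATIME FAIRY CAKE CASES"
cake_large = "PACK OF 72 RETROSPOT CAKE CASES"
red, green, pink = (
    "ALARM CLOCK BAKELIKE RED ",
    "ALARM CLOCK BAKELIKE GREEN",
    "ALARM CLOCK BAKELIKE PINK",
)
rules = pd.DataFrame({
    "antecedents": [frozenset({cake_small}), frozenset({cake_large}),
                    frozenset({red}), frozenset({green}), frozenset({red})],
    "consequents": [frozenset({cake_large}), frozenset({cake_small}),
                    frozenset({green}), frozenset({red}), frozenset({pink})],
    "confidence": [0.543161, 0.362267, 0.605376, 0.642694, 0.422581],
    "lift": [8.753114, 8.753114, 12.573307, 12.573307, 11.720171],
})

# A rule and its mirror image cover the same unordered set of items
rules["pair"] = [a | c for a, c in zip(rules["antecedents"], rules["consequents"])]

# Keep the higher-confidence direction of each pair, then rank pairs by lift
best = (
    rules.sort_values("confidence", ascending=False)
    .drop_duplicates("pair")
    .sort_values("lift", ascending=False)
    .reset_index(drop=True)
)
print(best[["antecedents", "consequents", "confidence", "lift"]])
```

Keeping the higher-confidence direction is a natural choice for recommendations, since confidence is the share of antecedent baskets that also contain the consequent.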

* Leverage and Conviction

These metrics reinforce the insights from lift and confidence. Positive leverage values indicate that the items appear together more often than expected by chance: for the Red and Green alarm clocks, the displayed supports give a leverage of about 0.028 (0.030944 minus the product of 0.051116 and 0.048148), so the pair occurs in roughly 2.8 percentage points more transactions than independence would predict.
The conviction metric points the same way, but it is directional. From the displayed confidence and consequent support, conviction is about 2.66 for Green to Red, i.e. (1 - 0.051116) / (1 - 0.642694), and about 2.41 for Red to Green. Both values are well above 1, so the likelihood of selling a Red alarm clock increases substantially when a Green one is already in the basket, and vice versa.

* Zhang's Metric

The high values of Zhang's metric (0.923997 for the first cake-case rule, and about 0.97 for the Red and Green alarm clock rules when computed from the displayed supports) further confirm a strong positive association. Values this close to 1 indicate that the items are bought together far more often than independence would predict.

* Strategic Implications

Promotional Bundling:

Given the strong associations between certain items, bundling them together at a slight discount could encourage customers to purchase both, enhancing the average order value.

Targeted Marketing:

These insights can inform targeted marketing campaigns. For example, email marketing campaigns can specifically target customers who have bought ALARM CLOCK BAKELIKE in one color but not the others, highlighting the availability and appeal of the other colors.
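
As a rough illustration of how such a campaign list could be built, the sketch below finds customers who own at least one alarm clock color but not all of them. The purchase table and customer IDs are made up, and `targets` is a hypothetical name; with the real data, the filtered `df` would take the place of `purchases`.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical purchase history (customer IDs are illustrative only)
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 4],
    "Description": [
        "ALARM CLOCK BAKELIKE GREEN", "ALARM CLOCK BAKELIKE RED ",
        "ALARM CLOCK BAKELIKE GREEN",
        "ALARM CLOCK BAKELIKE PINK", "WHITE METAL LANTERN",
        "WHITE METAL LANTERN",
    ],
})

clock_colors = {
    "ALARM CLOCK BAKELIKE GREEN",
    "ALARM CLOCK BAKELIKE RED ",
    "ALARM CLOCK BAKELIKE PINK",
}

# Collect the set of clock colors each customer has already bought
clocks = purchases[purchases["Description"].isin(clock_colors)]
owned = defaultdict(set)
for customer, description in zip(clocks["CustomerID"], clocks["Description"]):
    owned[int(customer)].add(description)

# Target customers who own some colors but not all, and list the missing ones
targets = {
    customer: sorted(clock_colors - colors)
    for customer, colors in owned.items()
    if colors < clock_colors
}

for customer, missing in targets.items():
    print(customer, "->", missing)
```

Customers with no alarm clock purchase never enter `owned`, so they are left out of the campaign list.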

Online Recommendations:

Implementing a recommendation engine that suggests items based on the basket analysis findings could automatically enhance cross-selling opportunities on the website.
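
A rule-based recommender of this kind can be quite small. The following is a minimal sketch under stated assumptions: `recommend` is a hypothetical helper, candidates are scored by rule confidence (lift or another metric would also work), and the rule table is a hand-copied subset of the printed alarm clock rules rather than the full `rules` DataFrame.

```python
import pandas as pd

red, green, pink = (
    "ALARM CLOCK BAKELIKE RED ",
    "ALARM CLOCK BAKELIKE GREEN",
    "ALARM CLOCK BAKELIKE PINK",
)
# Hand-copied subset of the printed rules
rules = pd.DataFrame({
    "antecedents": [frozenset({red}), frozenset({green}), frozenset({red})],
    "consequents": [frozenset({green}), frozenset({red}), frozenset({pink})],
    "confidence": [0.605376, 0.642694, 0.422581],
})


def recommend(basket, rules, top_n=3):
    """Suggest items not yet in the basket, ranked by rule confidence."""
    basket = frozenset(basket)
    # A rule applies when all of its antecedent items are in the basket
    applies = rules["antecedents"].apply(lambda items: items <= basket)
    applicable = rules[applies]

    scores = {}
    for consequent, confidence in zip(applicable["consequents"], applicable["confidence"]):
        for item in consequent - basket:
            # Keep the best confidence seen for each candidate item
            scores[item] = max(scores.get(item, 0.0), confidence)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend([red], rules))
print(recommend([red, green], rules))
```

Items already in the basket are never suggested, so a basket holding both the Red and Green clocks only yields the Pink one.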


In summary, these insights point to clear opportunities for cross-selling these related items. By leveraging these associations, the retail store can enhance its sales strategy to increase both customer satisfaction (by making relevant recommendations) and sales performance.