In [1]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
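# Load the dataset (one comma-separated transaction per row)
# Note: with default header handling, the first transaction is read as the column name
df = pd.read_excel('Online retail.xlsx')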
df.head()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt


In [4]:
# Convert the data to a list of lists (transactions)
transactions = [row.split(',') for row in df.iloc[:, 0].astype(str).tolist()]

In [5]:
# Remove leading/trailing spaces from items
transactions = [[item.strip() for item in transaction] for transaction in transactions]

In [6]:
# Use TransactionEncoder to one-hot encode the data
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
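
To make the encoding step concrete, here is a minimal pure-Python sketch of one-hot encoding applied to the first three rows shown by `df.head()`. It does not use mlxtend; the variable names `sample`, `columns`, and `encoded` are illustrative, and sorting the columns alphabetically is a choice made for this sketch.

```python
# Three transactions copied from the df.head() output above.
sample = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

# One column per distinct item, sorted so the layout is deterministic.
columns = sorted({item for transaction in sample for item in transaction})

# One row per transaction: True where the item is present, False otherwise.
encoded = [[item in transaction for item in columns] for transaction in sample]

for row in encoded:
    print(dict(zip(columns, row)))
```

Each row becomes a fixed-width vector of booleans, which is the shape of input that `apriori` works on.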

In [7]:
# Apply the Apriori algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)
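
For intuition about what the `apriori` call does, the following is a small level-wise sketch in plain Python. It is not mlxtend's implementation; the function `apriori_sketch` and the toy baskets are hypothetical and only illustrate the join, prune, and count steps.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical toy baskets, not taken from the dataset.
toy = [
    ["spaghetti", "ground beef", "olive oil"],
    ["spaghetti", "ground beef"],
    ["spaghetti", "mineral water"],
    ["mineral water"],
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before they are counted.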

In [8]:
# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [9]:
# Display the rules
print(rules.head())

       antecedents      consequents  antecedent support  consequent support  \
0  (mineral water)        (avocado)            0.238267            0.033200   
1        (avocado)  (mineral water)            0.033200            0.238267   
2           (cake)        (burgers)            0.081067            0.087200   
3        (burgers)           (cake)            0.087200            0.081067   
4      (chocolate)        (burgers)            0.163867            0.087200   

    support  confidence      lift  representativity  leverage  conviction  \
0  0.011467    0.048125  1.449559               1.0  0.003556    1.015680   
1  0.011467    0.345382  1.449559               1.0  0.003556    1.163629   
2  0.011467    0.141447  1.622103               1.0  0.004398    1.063185   
3  0.011467    0.131498  1.622103               1.0  0.004398    1.058068   
4  0.017067    0.104150  1.194377               1.0  0.002777    1.018920   

   zhangs_metric   jaccard  certainty  kulczynski  
0       0.

In [10]:
# Sort rules by lift to see the most interesting rules
sorted_rules = rules.sort_values(by='lift', ascending=False)
print(sorted_rules.head(10)) # Display the top 10 rules

                    antecedents                 consequents  \
217               (ground beef)             (herb & pepper)   
216             (herb & pepper)               (ground beef)   
387               (ground beef)  (mineral water, spaghetti)   
386  (mineral water, spaghetti)               (ground beef)   
398  (mineral water, spaghetti)                 (olive oil)   
399                 (olive oil)  (mineral water, spaghetti)   
195         (frozen vegetables)                  (tomatoes)   
194                  (tomatoes)         (frozen vegetables)   
190                    (shrimp)         (frozen vegetables)   
191         (frozen vegetables)                    (shrimp)   

     antecedent support  consequent support   support  confidence      lift  \
217            0.098267            0.049467  0.016000    0.162822  3.291555   
216            0.049467            0.098267  0.016000    0.323450  3.291555   
387            0.098267            0.059733  0.017067    0.173677  2.

Analysis of the Generated Rules

Key Observations:

- High Lift Rules: Several rules exhibit a high lift (greater than 2.0), indicating strong positive associations. These are particularly interesting as they suggest that the purchase of the antecedent significantly increases the likelihood of the consequent being purchased.

- Specific Item Combinations: The rules highlight specific combinations of items that are frequently bought together. For example:

(herb & pepper) -> (ground beef): Customers who buy herb & pepper also buy ground beef in about 32% of cases (confidence 0.323), roughly 3.3 times the baseline rate for ground beef.

(mineral water, spaghetti) -> (ground beef): Customers who buy mineral water and spaghetti often buy ground beef.

(olive oil) -> (mineral water, spaghetti): Customers who buy olive oil are likely to buy mineral water and spaghetti.

(frozen vegetables) -> (tomatoes): Customers who buy frozen vegetables often buy tomatoes.

(frozen vegetables) -> (shrimp): Customers who buy frozen vegetables often buy shrimp.

- Support and Confidence Variations: High lift does not imply high support or confidence. The top rule pair (ground beef and herb & pepper) has a support of only 0.016 and a confidence of 0.16 or 0.32 depending on direction, so some strong associations apply to a small share of transactions.
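
Because lift alone can be misleading, it can help to filter on several metrics at once. The sketch below does this over a handful of rows copied from the printed outputs above; the thresholds (lift above 2, confidence of at least 0.3) are arbitrary values chosen for illustration, and the list-of-dicts layout stands in for the `rules` DataFrame.

```python
# Rows copied from the printed rule tables above (only rows with complete values).
rule_rows = [
    {"id": 0, "rule": "mineral water -> avocado", "support": 0.011467, "confidence": 0.048125, "lift": 1.449559},
    {"id": 1, "rule": "avocado -> mineral water", "support": 0.011467, "confidence": 0.345382, "lift": 1.449559},
    {"id": 2, "rule": "cake -> burgers", "support": 0.011467, "confidence": 0.141447, "lift": 1.622103},
    {"id": 3, "rule": "burgers -> cake", "support": 0.011467, "confidence": 0.131498, "lift": 1.622103},
    {"id": 4, "rule": "chocolate -> burgers", "support": 0.017067, "confidence": 0.104150, "lift": 1.194377},
    {"id": 217, "rule": "ground beef -> herb & pepper", "support": 0.016000, "confidence": 0.162822, "lift": 3.291555},
    {"id": 216, "rule": "herb & pepper -> ground beef", "support": 0.016000, "confidence": 0.323450, "lift": 3.291555},
]

MIN_LIFT = 2.0        # illustrative threshold
MIN_CONFIDENCE = 0.3  # illustrative threshold

# Keep rules that are both strongly associated and reasonably reliable.
strong = [r for r in rule_rows if r["lift"] > MIN_LIFT and r["confidence"] >= MIN_CONFIDENCE]
strong_ids = [r["id"] for r in strong]
print(strong_ids)
```

On the full DataFrame, the same idea would be a boolean mask such as `rules[(rules['lift'] > 2) & (rules['confidence'] >= 0.3)]`.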


Interpretation and Insights into Customer Purchasing Behavior



1) Complementary Items: The rules suggest that certain items are often purchased together as complementary products. For example:

- Ground Beef, Herb & Pepper: This suggests that customers who purchase ground beef are likely to also buy herb & pepper, possibly for seasoning or enhancing the flavor of the meat.
- Mineral Water, Spaghetti, Ground Beef: This combination indicates a potential meal pairing. Customers might be buying these items to prepare a pasta dish with ground beef.
- Olive Oil, Mineral Water, Spaghetti: This combination also suggests a meal preparation context. Olive oil is often used in pasta dishes.
- Frozen Vegetables with Tomatoes or Shrimp: Frozen vegetables pair separately with tomatoes and with shrimp, which suggests that customers might be preparing meals with these ingredients, such as stir-fries or seafood dishes.
  
2) Meal Planning: The presence of meal-related combinations (e.g., pasta ingredients, vegetable and shrimp) indicates that customers often purchase items with a specific meal or recipe in mind.

3) Cross-Selling Opportunities: Retailers can leverage these insights to improve cross-selling strategies. For example:

- Placing herb & pepper near ground beef.
- Bundling mineral water, spaghetti, and ground beef together.
- Displaying olive oil near pasta products.
- Suggesting tomato or shrimp recipes when customers buy frozen vegetables.

4) Product Placement: The rules can guide product placement within the store to increase sales. Placing associated items near each other can encourage customers to purchase them together.

5) Targeted Promotions: Retailers can create targeted promotions or discounts for product combinations that are frequently purchased together.

Interview Questions:

1. What is lift and why is it important in Association rules?

Lift: Lift measures how much more likely the consequent is purchased when the antecedent is purchased, while controlling for how popular the consequent is. It's calculated as:

Lift(A -> C) = Confidence(A -> C) / Support(C)

Importance: Lift helps identify rules that are truly interesting and not just due to the popularity of individual items. A lift greater than 1 indicates a positive association, a lift less than 1 indicates a negative association, and a lift of 1 indicates no association.
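
As a quick check of the formula, the sketch below recomputes lift for the (herb & pepper) -> (ground beef) rule from the confidence and consequent support printed in the top 10 output. The helper names `lift` and `describe` are illustrative and are not part of mlxtend.

```python
def lift(confidence, consequent_support):
    """Lift(A -> C) = Confidence(A -> C) / Support(C)."""
    return confidence / consequent_support

def describe(lift_value, tol=1e-9):
    """Label the direction of association implied by a lift value."""
    if abs(lift_value - 1.0) <= tol:
        return "no association"
    return "positive association" if lift_value > 1.0 else "negative association"

# Values for rule 216, (herb & pepper) -> (ground beef), copied from the output above.
rule_216_lift = lift(confidence=0.323450, consequent_support=0.098267)
print(round(rule_216_lift, 2), describe(rule_216_lift))
```

The result agrees with the reported lift of 3.291555 up to the rounding of the printed inputs.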

2. What is support and confidence? How do you calculate them?

Support: Support measures the frequency of an itemset in the dataset. It's calculated as:

Support(A) = Number of transactions containing A / Total number of transactions

Confidence: Confidence measures how often the consequent appears in transactions that contain the antecedent. It's calculated as:

Confidence(A -> C) = Support(A U C) / Support(A)
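
Both definitions can be computed directly from a list of transactions. The following is a minimal sketch on a hypothetical five-basket example (the baskets are made up for illustration and are not from the dataset); `support` and `confidence` are helper names chosen here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support(A U C) / Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical toy baskets.
toy = [
    ["ground beef", "herb & pepper", "spaghetti"],
    ["ground beef", "spaghetti"],
    ["herb & pepper", "ground beef"],
    ["mineral water"],
    ["spaghetti", "mineral water"],
]

s_beef = support(toy, ["ground beef"])                        # 3 of 5 baskets
c_rule = confidence(toy, ["herb & pepper"], ["ground beef"])  # (2/5) / (2/5)
print(s_beef, c_rule)
```

In this toy data every basket with herb & pepper also contains ground beef, so the confidence is 1.0 even though ground beef appears in only 60% of baskets.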

3. What are some limitations or challenges of Association rules mining?

- Computational Complexity: The Apriori algorithm can be computationally expensive for large datasets with many items.
- Spurious Rules: With low support thresholds, many uninteresting or spurious rules may be generated.
- Data Sparsity: If the dataset is sparse (many items are rarely purchased), it can be difficult to find meaningful associations.
- Interpretation: Interpreting a large number of rules can be challenging.
- Temporal Aspects: Association rules don't consider the temporal order of purchases.
- Handling Categorical Data: Association rules typically work with binary data (items present or absent). Handling categorical data with many categories can be challenging. 
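
The computational cost in the first point comes from how quickly the number of possible itemsets grows. The sketch below counts candidate itemsets by size for a hypothetical catalogue of 100 items; the item count is an assumption for illustration, not the size of this dataset.

```python
from math import comb

def itemset_counts(n_items, max_size):
    """Number of distinct itemsets of each size from 1 to max_size."""
    return {k: comb(n_items, k) for k in range(1, max_size + 1)}

N_ITEMS = 100  # hypothetical catalogue size

counts = itemset_counts(N_ITEMS, 3)
total_nonempty = 2 ** N_ITEMS - 1  # all non-empty itemsets of any size

for size, count in counts.items():
    print(f"size {size}: {count} possible itemsets")
print(f"all sizes: {total_nonempty} possible itemsets")
```

Apriori avoids enumerating most of these by pruning any candidate that has an infrequent subset, but a low `min_support` weakens that pruning and lets many more candidates through.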

### Interpretation of the First Rule (Mineral Water -> Avocado)

- Antecedents: (mineral water)
- Consequents: (avocado)
- Antecedent Support: 0.238267
- Consequent Support: 0.033200
- Support: 0.011467
- Confidence: 0.048125
- Lift: 1.449559
- Interpretation:

1) Antecedent Support: Mineral water appears in 23.8% of the transactions.
2) Consequent Support: Avocado appears in 3.3% of the transactions.
3) Support: The combined itemset (mineral water and avocado) appears in only 1.1% of the transactions.
4) Confidence: Given that mineral water is purchased, avocado is also purchased only 4.8% of the time.
5) Lift: Avocado is 1.45 times more likely to be purchased when mineral water is present, compared to its general popularity.
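
The reported metrics for this rule can be re-derived from just the three support values. The sketch below does so, including leverage and conviction, which appear in the output table but are not discussed in the text. The leverage and conviction formulas used here are the standard definitions and are assumed to match what the library reports.

```python
# Support values for (mineral water) -> (avocado), copied from the output above.
antecedent_support = 0.238267  # mineral water
consequent_support = 0.033200  # avocado
joint_support = 0.011467       # both together

confidence = joint_support / antecedent_support
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence.
leverage = joint_support - antecedent_support * consequent_support
# Conviction: how much more often the rule would be wrong if the items were independent.
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.6f} lift={lift:.6f}")
print(f"leverage={leverage:.6f} conviction={conviction:.6f}")
```

Small differences in the last digits relative to the table are expected, because the inputs here are already rounded to six decimals.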


Assessment:

- Weak Association: The low support (1.1%) and confidence (4.8%) indicate a weak association between mineral water and avocado.
- Limited Practicality: The low support and confidence suggest that this association might not be practically significant for most business decisions.
- Lift Still Relevant: The lift is above 1, which indicates a positive association, but the low support and confidence make it less impactful.


### Interpretation of the Top 10 Lift Rules

1) High Lift Values: Notice that all the lift values are above 2. This indicates strong positive associations between the antecedents and consequents, suggesting that these items are significantly more likely to be purchased together than expected by chance.

2) Rule 216 (Herb & Pepper -> Ground Beef):

- Lift: 3.291555
This is the strongest association in the top 10. Customers who buy herb & pepper are 3.29 times more likely to purchase ground beef than customers in general (confidence 0.323 against a baseline support of 0.098). Rule 217 is the same pair in the opposite direction and has the same lift.
- Business Implication: This is a strong cross-selling opportunity, although the pair appears in only 1.6% of transactions. Consider placing herb & pepper near ground beef, offering a discount when both are purchased, or promoting recipes that use both items.

3) Rule 386 (Mineral Water, Spaghetti -> Ground Beef):

- Lift: 2.907540
This multi-item rule suggests that customers who purchase mineral water and spaghetti are significantly more likely to also buy ground beef.
- Business Implication: This could indicate a pattern of customers buying ingredients for a specific type of meal (e.g., pasta with meat sauce). Consider creating meal bundles or recipe promotions that include these items.

4) Rule 399 (Olive Oil -> Mineral Water, Spaghetti):

- Lift: 2.614731
This is the reverse of rule 398. It shows that customers who buy olive oil are also more likely to purchase mineral water and spaghetti.
- Business Implication: This reinforces the idea of a meal-related pattern. Place olive oil near pasta products to encourage cross-selling.

5) Other Rules: The remaining top 10 rules pair frozen vegetables with tomatoes and with shrimp. Each also has a lift above 2, signifying a strong association that could be leveraged for marketing and sales strategies.
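
The rules in the top 10 come in pairs with identical lift because lift is symmetric in the antecedent and consequent, while confidence is not. A minimal check using the support values printed for rules 216 and 217:

```python
# Support values copied from the top 10 output above.
support_beef = 0.098267  # ground beef
support_herb = 0.049467  # herb & pepper
support_both = 0.016000  # both together

# Confidence depends on which item is the antecedent.
conf_beef_to_herb = support_both / support_beef  # rule 217
conf_herb_to_beef = support_both / support_herb  # rule 216

# Lift reduces to support_both / (support_beef * support_herb) in both directions.
lift_beef_to_herb = conf_beef_to_herb / support_herb
lift_herb_to_beef = conf_herb_to_beef / support_beef

print(f"confidence: {conf_beef_to_herb:.6f} vs {conf_herb_to_beef:.6f}")
print(f"lift:       {lift_beef_to_herb:.6f} vs {lift_herb_to_beef:.6f}")
```

This is why direction matters when acting on a rule: the lift is shared by both directions, but the confidence tells you which item is the better trigger for recommending the other.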