Problem Statement:

The departmental store has gathered data on the products it sells on a daily basis.
Using association rule concepts, provide insights on the rules and the plots.

The grocery store dataset, which tracks items purchased together in transactions, can address several business challenges:

Basket Analysis: Discover patterns in customer purchases, such as commonly bought item combinations (e.g., milk and bread). This insight can guide product placement and improve cross-selling strategies.

Customer Profiling: Group customers based on their buying behavior. This allows for targeted marketing efforts, such as offering personalized discounts on products they frequently purchase.

Demand Prediction: Use past transaction data to forecast the demand for different products throughout the year. This supports efficient supply chain management and helps maintain optimal inventory levels.

Store Layout Optimization: Understanding which products are often purchased together allows the store to arrange items strategically, making it easier for customers to find related products and boosting overall sales.

In [9]:
# Import required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

In [13]:
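# Load the transactions. read_csv treats the first line as the header, and
# on_bad_lines='skip' drops every row with more fields than that first line (4 here).
df = pd.read_csv('groceries.csv', on_bad_lines='skip')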
df

           citrus fruit       semi-finished bread            margarine               ready soups
0        tropical fruit                    yogurt               coffee                       NaN
1            whole milk                       NaN                  NaN                       NaN
2             pip fruit                    yogurt         cream cheese              meat spreads
3      other vegetables                whole milk       condensed milk  long life bakery product
4            rolls/buns                       NaN                  NaN                       NaN
...                 ...                       ...                  ...                       ...
6100             yogurt  long life bakery product                  NaN                       NaN
6101               pork         frozen vegetables               pastry                       NaN
6102          ice cream  long life bakery product  specialty chocolate             specialty bar
6103  cooking chocolate                       NaN                  NaN                       NaN
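
Because `read_csv` expects a rectangular table, baskets of different lengths fit it poorly: the first basket becomes the header and, with `on_bad_lines='skip'`, any basket longer than that first line is dropped. A minimal alternative sketch follows, assuming `groceries.csv` holds one comma-separated basket per line with no header row. `read_transactions` is a hypothetical helper name, and the sample text stands in for the real file.

```python
import csv
import io

def read_transactions(lines):
    """Turn an iterable of CSV lines into a list of baskets (lists of item names)."""
    baskets = []
    for row in csv.reader(lines):
        # Strip stray whitespace and drop empty cells left by trailing commas
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore completely blank lines
            baskets.append(items)
    return baskets

# Stand-in for the file contents: rows have different lengths and no header
sample = io.StringIO(
    "citrus fruit,semi-finished bread,margarine,ready soups\n"
    "tropical fruit,yogurt,coffee\n"
    "whole milk\n"
    "pip fruit,yogurt,cream cheese ,meat spreads\n"
)
transactions_all = read_transactions(sample)
print(len(transactions_all))
print(transactions_all[3])
```

With the real file, passing `open('groceries.csv', newline='')` in place of `sample` would keep every basket, including the first line and any basket with more than four items. Stripping whitespace also merges variants such as `'cream cheese '` and `'cream cheese'` into one item.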


In [14]:
transactions = []
# read_csv used the first basket as column names, so add it back as a transaction
transactions.append(df.keys().tolist())

In [17]:
# Add the remaining rows, one list per transaction
transactions.extend(df.values.tolist())

In [18]:
transactions

[['tropical fruit', 'yogurt', 'coffee', nan],
 ['whole milk', nan, nan, nan],
 ['pip fruit', 'yogurt', 'cream cheese ', 'meat spreads'],
 ['other vegetables',
  'whole milk',
  'condensed milk',
  'long life bakery product'],
 ['rolls/buns', nan, nan, nan],
 ['pot plants', nan, nan, nan],
 ['whole milk', 'cereals', nan, nan],
 ['beef', nan, nan, nan],
 ['frankfurter', 'rolls/buns', 'soda', nan],
 ['chicken', 'tropical fruit', nan, nan],
 ['butter', 'sugar', 'fruit/vegetable juice', 'newspapers'],
 ['fruit/vegetable juice', nan, nan, nan],
 ['packaged fruit/vegetables', nan, nan, nan],
 ['chocolate', nan, nan, nan],
 ['specialty bar', nan, nan, nan],
 ['other vegetables', nan, nan, nan],
 ['butter milk', 'pastry', nan, nan],
 ['whole milk', nan, nan, nan],
 ['bottled water', 'canned beer', nan, nan],
 ['yogurt', nan, nan, nan],
 ['sausage', 'rolls/buns', 'soda', 'chocolate'],
 ['other vegetables', nan, nan, nan],
 ['yogurt', 'beverages', 'bottled water', 'specialty bar'],
 ...

In [20]:
# Remove the NaN padding from each transaction.
# The nested list comprehension walks every sublist (one basket) and keeps only
# the items for which pd.isna(item) is False, i.e. the real product names.
transactions = [[item for item in sublist if not pd.isna(item)] for sublist in transactions]
transactions

[['tropical fruit', 'yogurt', 'coffee'],
 ['whole milk'],
 ['pip fruit', 'yogurt', 'cream cheese ', 'meat spreads'],
 ['other vegetables',
  'whole milk',
  'condensed milk',
  'long life bakery product'],
 ['rolls/buns'],
 ['pot plants'],
 ['whole milk', 'cereals'],
 ['beef'],
 ['frankfurter', 'rolls/buns', 'soda'],
 ['chicken', 'tropical fruit'],
 ['butter', 'sugar', 'fruit/vegetable juice', 'newspapers'],
 ['fruit/vegetable juice'],
 ['packaged fruit/vegetables'],
 ['chocolate'],
 ['specialty bar'],
 ['other vegetables'],
 ['butter milk', 'pastry'],
 ['whole milk'],
 ['bottled water', 'canned beer'],
 ['yogurt'],
 ['sausage', 'rolls/buns', 'soda', 'chocolate'],
 ['other vegetables'],
 ['yogurt', 'beverages', 'bottled water', 'specialty bar'],
 ['beef', 'grapes', 'detergent'],
 ['pastry', 'soda'],
 ['fruit/vegetable juice'],
 ['canned beer'],
 ['root vegetables', 'other vegetables', 'whole milk', 'dessert'],
 ['citrus fruit', 'zwieback', 'newspapers'],
 ['berries', 'yogurt'],
 ...

In [21]:
# Step 1: Convert the transactions into the one-hot (True/False) format that apriori expects
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

Unnamed: 0,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6100,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
6101,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
6102,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
6103,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
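
To make the encoded table easier to read, here is a pure-Python sketch of the one-hot idea behind it: one column per distinct item, one row per basket, and `True` where the basket contains the item. It only illustrates the concept on three baskets taken from the data above; `one_hot` is a hypothetical helper, not the `TransactionEncoder` implementation.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted item names and one boolean row per basket."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = [[item in basket for item in columns] for basket in transactions]
    return columns, rows

baskets = [
    ['tropical fruit', 'yogurt', 'coffee'],
    ['whole milk'],
    ['whole milk', 'cereals'],
]
columns, rows = one_hot(baskets)
print(columns)
for row in rows:
    print(row)
```

Each row has the same length regardless of basket size, which is why no NaN padding is needed after encoding.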


In [22]:
# Step 2: Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.001, use_colnames=True)
frequent_itemsets

      support                                         itemsets
0    0.003112                          (Instant food products)
1    0.015561                                       (UHT-milk)
2    0.001310                               (artif. sweetener)
3    0.004914                                  (baking powder)
4    0.001147                               (bathroom cleaner)
..        ...                                              ...
511  0.001147     (fruit/vegetable juice, sausage, rolls/buns)
512  0.001147       (whole milk, other vegetables, rolls/buns)
513  0.001310  (root vegetables, whole milk, other vegetables)
514  0.001147             (soda, other vegetables, whole milk)
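
The `support` column is the fraction of baskets that contain every item in the itemset, so `min_support=0.001` keeps itemsets that occur in at least 0.1% of baskets. A brute-force sketch of that definition on a handful of baskets from the data above is given here. It checks every candidate rather than pruning the way Apriori does, and `support` and `frequent_itemsets_bruteforce` are hypothetical helper names.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset.issubset(basket))
    return hits / len(transactions)

def frequent_itemsets_bruteforce(transactions, min_support, max_size=2):
    """Check every candidate itemset up to `max_size` items (no Apriori pruning)."""
    items = sorted({item for basket in transactions for item in basket})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

baskets = [
    ['frankfurter', 'rolls/buns', 'soda'],
    ['sausage', 'rolls/buns', 'soda', 'chocolate'],
    ['pastry', 'soda'],
    ['whole milk'],
]
for itemset, s in frequent_itemsets_bruteforce(baskets, min_support=0.5).items():
    print(itemset, s)
```

Brute force is only workable on toy data like this. Apriori avoids most of the candidates by skipping any itemset that has an infrequent subset.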


In [23]:
# Step 3: Generate association rules from the frequent itemsets (keep rules with lift >= 1)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [24]:
# Step 4: Output the results
print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
      support                                         itemsets
0    0.003112                          (Instant food products)
1    0.015561                                       (UHT-milk)
2    0.001310                               (artif. sweetener)
3    0.004914                                  (baking powder)
4    0.001147                               (bathroom cleaner)
..        ...                                              ...
511  0.001147     (fruit/vegetable juice, sausage, rolls/buns)
512  0.001147       (whole milk, other vegetables, rolls/buns)
513  0.001310  (root vegetables, whole milk, other vegetables)
514  0.001147             (soda, other vegetables, whole milk)
515  0.003440                      (soda, sausage, rolls/buns)

[516 rows x 2 columns]


In [25]:
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


Association Rules:
               antecedents            consequents   support  confidence  \
0               (UHT-milk)        (bottled water)  0.001966    0.126316   
1          (bottled water)             (UHT-milk)  0.001966    0.029557   
2               (UHT-milk)               (coffee)  0.001310    0.084211   
3                 (coffee)             (UHT-milk)  0.001310    0.038835   
4               (UHT-milk)                 (soda)  0.002129    0.136842   
..                     ...                    ...       ...         ...   
447     (soda, rolls/buns)              (sausage)  0.003440    0.192661   
448  (sausage, rolls/buns)                 (soda)  0.003440    0.256098   
449                 (soda)  (sausage, rolls/buns)  0.003440    0.027167   
450              (sausage)     (soda, rolls/buns)  0.003440    0.071429   
451           (rolls/buns)        (soda, sausage)  0.003440    0.027273   

         lift  
0    1.899404  
1    1.899404  
2    2.495657  
3    2.495657  
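
As a reading aid for the rules table, the metrics of rule 0 can be recomputed from the printed numbers. This sketch assumes the standard definitions: confidence of `A -> B` is support(A and B) / support(A), and lift of `A -> B` is that confidence divided by support(B). The support of `bottled water` is not printed on its own, so it is backed out from the confidence of rule 1. The results differ from the table in the last digits because the inputs are rounded.

```python
# Values copied from the outputs above (rounded to six decimals)
support_uht = 0.015561               # support of (UHT-milk) in the frequent itemsets
support_both = 0.001966              # support of {UHT-milk, bottled water}, rules 0 and 1
confidence_water_to_uht = 0.029557   # confidence of rule 1 (bottled water -> UHT-milk)

# confidence(A -> B) = support(A and B) / support(A)
confidence_uht_to_water = support_both / support_uht

# Rearranged for rule 1: support(bottled water) = support(both) / confidence(rule 1)
support_water = support_both / confidence_water_to_uht

# lift(A -> B) = confidence(A -> B) / support(B); values above 1 mean positive association
lift = confidence_uht_to_water / support_water

print(round(confidence_uht_to_water, 3))
print(round(lift, 2))
```

A lift of about 1.9 means that baskets with UHT-milk contain bottled water roughly 1.9 times as often as baskets overall. Lift is symmetric in the two sides of a rule, which is why rules 0 and 1 share the same value while their confidences differ.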