![alt text](image-3.png)

# **Market Basket Analysis Approach to Machine Learning**

# **1. Introduction**

##### *In recent years, transaction data have become a common object of research and analysis. In this study, transaction data are re-processed and explored to extract more valuable information, for instance which item has the highest sales, which in turn can guide restocking of that item. Transaction data also reveal the relationships between the items purchased together in a customer’s basket. That information can be used to design effective product displays and assortments that attract customers’ interest. The technique commonly used to analyze the contents of customers’ shopping baskets is market basket analysis.*

![alt text](image-4.png)

## *1.1 Market Basket Analysis*

##### *Market Basket Analysis is an **unsupervised machine learning** technique applied to customer shopping behavior: it identifies associations among the items that customers place in their shopping baskets. Specifically, it aims to find the items that customers most frequently purchase together. Here, an item is any product sold in the supermarket. Market basket analysis reveals which items are often bought together and are therefore good candidates for joint promotion. The name comes from the customers' habit of placing products into a shopping basket or onto a shopping list. Identifying customers' basket patterns helps a company shape its business strategy, for example by placing products that are frequently purchased together in one area of the store.*
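
As a minimal sketch of the idea of "items purchased together", the snippet below counts how often each pair of items appears in the same basket. The baskets are hypothetical and serve only as an illustration; they are not taken from the store's data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets for illustration only (not taken from the store data)
baskets = [
    ["TOMATO", "ONION", "GREEN CHILLI"],
    ["TOMATO", "ONION"],
    ["BREAD", "EGG"],
    ["TOMATO", "GREEN CHILLI"],
]

pair_counts = Counter()
for basket in baskets:
    # sorted(set(...)) removes duplicates and gives each pair a canonical order
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```

Counting raw pairs does not scale well to thousands of items, which is one reason threshold-based algorithms are used for the real analysis.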

## *1.2 Advantages*


##### * Market basket analysis is a data mining technique that analyzes patterns of co-occurrence and determines the strength of the link between products purchased together. It is also referred to as frequent itemset mining or association analysis.

##### * It is widely used to analyze the items in one or more shopping baskets that a customer holds at a particular moment.

##### * A market basket analysis application is worth implementing at a supermarket because it supports the design of sales promotions and can also serve as a reference for managing incoming and outgoing stock in the warehouse.

##### * In this study, the market basket analysis application is implemented at Quick-Pick Departmental Store (near Pondicherry University, Kalapet), which has so far been unable to make use of its transaction data. The application is expected to generate the desired results.

## *1.3 Customer Behaviour*

##### *Customer behavior is defined as a dynamic interaction between cognition, affect, behavior, and the environment through which people carry out exchange activities in their daily lives [3]. This definition highlights three points:*
1. Customer behavior is dynamic and therefore hard to predict.
2. It involves interaction between cognition, affect, behavior, and the events around the customer.
3. It involves exchange, such as the exchange of goods and money between merchant and customer.

Four factors influence a customer's purchase decisions:
1. Cultural Factor
2. Social Factor
3. Personal Factor
4. Psychological Factor

*Three variables must be considered in understanding customer behavior: the stimulus variable, the response variable, and the intervening variable.*

# **2. Objective**

##### The objective of this study is to achieve the following:

1. To take transaction data from the supermarket and apply machine learning to inform future decisions.
2. To apply data analysis in our thesis: inspecting, cleansing, transforming, and modeling data to discover useful information, inform conclusions, and support decision-making.
3. To apply data mining, using the market basket analysis (MBA) approach to machine learning to understand customers' purchase behaviour.

# **3. Data Pre-Processing**

##### Importing the monthwise sales data

In [None]:
# Importing libraries to import and clean the data

import numpy as np
import pandas as pd

#Loading April data
url1='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/main/quick%20pick%20data%20month%20wise/april_df.csv'
df1=pd.read_csv(url1)
df1

In [None]:
#Loading May dataset
url2='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/may_df.csv'
df2=pd.read_csv(url2)
df2

In [None]:
#Loading June data
url3='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/june_df.csv'
df3=pd.read_csv(url3)
df3

In [None]:
#Loading July data
url4='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/july_df.csv'
df4=pd.read_csv(url4)
df4

In [None]:
#Loading August dataset
url5='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/august_df.csv'
df5=pd.read_csv(url5)
df5

In [None]:
#Loading September dataset
url6='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/september_df.csv'
df6=pd.read_csv(url6)
df6

In [None]:
#Loading October dataset
url7='https://raw.githubusercontent.com/dharanidaran-t/MarketBasketAnalysis/data-preprocessing/quick%20pick%20data%20month%20wise/october_df.csv'
df7=pd.read_csv(url7)
df7

In [None]:
# Concatenating dataframes row-wise (stack)

combined_df = pd.concat([df1, df2, df3, df4, df5, df6, df7], ignore_index=True)
# Setting ignore_index=True ensures that the resulting dataframe has a continuous index.
combined_df
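
The seven loading cells above repeat the same two steps for each month. A more compact variant, shown here only as a sketch, reads every source in a loop and concatenates the results. The helper name `load_monthly` and the in-memory stand-in data are assumptions made so that the example runs offline; in the notebook the sources would be the monthly CSV URLs.

```python
import io
import pandas as pd

def load_monthly(sources):
    """Read each CSV source and stack the rows into one dataframe."""
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, ignore_index=True)

# Stand-in data so the sketch runs offline; in the notebook, `sources`
# would be the list of monthly CSV URLs (url1 ... url7).
april = io.StringIO("Bill No,Item Name,Qty\n1,TOMATO,2\n1,ONION,1\n")
may = io.StringIO("Bill No,Item Name,Qty\n2,EGG,6\n")
demo_df = load_monthly([april, may])
print(demo_df)
```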

In [None]:
combined_df.info()

In [None]:
# Data Type formatting

combined_df['Date']=pd.to_datetime(combined_df['Date'])
combined_df['Item Name']=combined_df['Item Name'].str.strip().astype('str')
combined_df.info()

In [None]:
# Creating year, month, day and day-of-week columns using pandas datetime accessors

combined_df['Year'] = combined_df['Date'].dt.year
combined_df['Month'] = combined_df['Date'].dt.month_name()
combined_df['Day'] = combined_df['Date'].dt.day
combined_df['Day of week'] = combined_df['Date'].dt.day_name()
combined_df

In [None]:
# Dropping the additional details
# drop() returns a new dataframe, so the result is assigned back

combined_df = combined_df.drop(columns=['Barcode','Net Amount'])

# Exporting the csv file
combined_df.to_csv(r'C:/Users/Win 10/Desktop/Market-Basket-Analysis/groceries_df.csv',index=False)

* The combined sales dataframe is now saved in a new CSV file (groceries_df).
* Next, let us create a new dataframe to analyze patterns of co-occurrence. In it, all items purchased under one Bill No are displayed in a single row (i.e., the items of a single purchase are laid out column-wise).

In [None]:
# Group by 'Bill Number' and create new columns for each item
df_grouped = combined_df.groupby('Bill No')['Item Name'].apply(lambda x: pd.Series(x.values)).unstack().reset_index()
# Rename columns
df_grouped.columns = ['Bill Number'] + [f'{i+1}' for i in range(df_grouped.shape[1]-1)]
df_grouped=df_grouped.drop(columns=['Bill Number'])
df_grouped.head(20)
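
An alternative way to obtain one basket per bill, sketched below on hypothetical rows, is to collect the item names of each bill into a Python list. This is not the notebook's method; it is shown because a list-of-lists is the shape that transaction encoders usually expect. Only the column names `Bill No` and `Item Name` come from the notebook.

```python
import pandas as pd

# Hypothetical line items; column names follow the notebook
sales = pd.DataFrame({
    "Bill No": [101, 101, 102, 103, 103, 103],
    "Item Name": ["TOMATO", "ONION", "EGG", "BREAD", "EGG", "TOMATO"],
})

# One list of items per bill
transactions = sales.groupby("Bill No")["Item Name"].agg(list)

# Keep only bills with at least two items, since a single item cannot form a pair
multi_item = transactions[transactions.map(len) >= 2]
print(multi_item.tolist())
```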

In [None]:
# Drop the rows which has NaN value at the 2nd column (column index 1)
df_grouped = df_grouped.dropna(subset=[df_grouped.columns[1]], axis=0)
# Drop columns from 11 to 77
df_grouped.drop(columns=df_grouped.columns[10:77], inplace=True)
df_grouped.head(20)
df_grouped

In [None]:
# Export to new csv file
df_grouped.to_csv(r'C:/Users/Win 10/Desktop/Market-Basket-Analysis/basket_df.csv',index=False)

# **4. Exploratory Data Analysis**

![alt text](image-8.png)

#### 4.1 Importing the data

In [None]:
# Importing libraries
# For importing, cleaning and analysing data
import numpy as np
import pandas as pd

In [None]:
groceries_df=pd.read_csv(r'C:/Users/Win 10/Desktop/Market-Basket-Analysis/groceries_df.csv')
groceries_df

#### 4.2 Data Cleaning

In [None]:
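# Functions to clean leading prefixes from item names:
# the RS10 label, batch letters (O, Q, X, Y) and leading numbers

import re

def rename_10(item_name):
    # Assumption: the intent is to relabel names starting with the literal text 'RS10' as 'PEN'.
    # A character class such as [RS10] would instead match any single leading R, S, 1 or 0.
    return re.sub(r'^RS10\s*', 'PEN', item_name)

groceries_df['Item Name'] = groceries_df['Item Name'].apply(rename_10)

def clean_item_name(item_name):
    # Assumption: a batch letter is followed by whitespace; requiring \s+ keeps
    # ordinary names that merely start with O, Q, X or Y intact
    return re.sub(r'^[OQXY]\s+', '', item_name)

# Apply function to 'Item Name' column
groceries_df['Item Name'] = groceries_df['Item Name'].apply(clean_item_name)

def removenum(item_name):
    # Strip leading digits, dots and hyphens plus any following whitespace
    return re.sub(r'^[\d.-]+\s*', '', item_name)

# Apply function to 'Item Name' column
groceries_df['Item Name'] = groceries_df['Item Name'].apply(removenum)

groceries_df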
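
When writing prefix-cleaning patterns like the ones in this cell, it helps to remember that square brackets denote a character class (one character out of a set), not a literal string. The sketch below contrasts the two on hypothetical item names; the names are invented purely to show the behaviour and may not occur in the dataset.

```python
import re

# Hypothetical item names, chosen only to show how the patterns behave
names = ["RS10", "RICE", "O TOMATO", "ONION"]

# Character class: matches ONE leading character out of R, S, 1, 0
char_class = [re.sub(r'^[RS10]\s*', 'PEN', n) for n in names]

# Literal prefix: matches only names that start with the text "RS10"
literal = [re.sub(r'^RS10\s*', 'PEN', n) for n in names]

# Requiring whitespace after the batch letter leaves ordinary words intact
batch = [re.sub(r'^[OQXY]\s+', '', n) for n in names]

print(char_class)
print(literal)
print(batch)
```

Running a pattern against a handful of real item names before applying it to the whole column is a cheap way to catch this kind of over-matching.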

In [None]:
groceries_df.sample(20)

In [None]:
# Checking for NaN values

groceries_df.isna().sum()

* There are no missing or NA values.

In [None]:
# Counting rows with negative or zero quantity

negativeorNull = groceries_df.loc[groceries_df['Qty']<=0]
print('Count of zero or negative quantity: ', len(negativeorNull))

In [None]:
# Counting rows with zero or negative price

negativeorNull = groceries_df.loc[groceries_df['Rate']<=0]
print('Count of zero or negative price: ', len(negativeorNull))

There are no entries with zero or negative quantity, so the data is ready to explore!
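
The two checks above only count the offending rows. Had any been found, they could be removed with a boolean mask, as in the following sketch. The rows are hypothetical; only the column names `Qty` and `Rate` come from the notebook.

```python
import pandas as pd

# Hypothetical rows; 'Qty' and 'Rate' follow the notebook's column names
demo = pd.DataFrame({
    "Item Name": ["TOMATO", "ONION", "EGG", "BREAD"],
    "Qty": [2, 0, 6, -1],
    "Rate": [30.0, 25.0, 0.0, 40.0],
})

# Keep rows where both quantity and price are strictly positive
valid = (demo["Qty"] > 0) & (demo["Rate"] > 0)
cleaned = demo.loc[valid].reset_index(drop=True)

print(f"Removed {len(demo) - len(cleaned)} of {len(demo)} rows")
```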

In [None]:
# Check 'Year' column for unique values

groceries_df['Year'].unique()

In [None]:
# Check 'Month' column for unique values

groceries_df['Month'].unique()

In [None]:
# Check 'Day' column for unusual values

groceries_df['Day'].unique()

In [None]:
# Check 'Day of week' column for unusual values

groceries_df['Day of week'].unique()

In [None]:
# Lets find out the time period of the dataset

print(f"The dataset is from dates {groceries_df['Date'].min()} to {groceries_df['Date'].max()}")

#### 4.3 Summary Statistics

In [None]:
groceries=groceries_df.drop(columns=['Year','Day'],axis=1)
groceries.describe()

In [None]:
groceries_df.groupby(['Year', 'Month']).describe()

In [None]:
# Lets find how many unique purchases have been done
groceries_df['Bill No'].unique()
purchases=groceries_df['Bill No'].nunique()
print(f"There are {purchases} unique Purchases.")

In [None]:
Items=groceries_df['Item Name'].unique()
Items

In [None]:
# Lets find how many unique items are there in the dataset

n_items=groceries_df['Item Name'].nunique()
print(f"There are {n_items} unique Items")

In [None]:
CountOfItem = groceries_df['Item Name'].value_counts()

sortedItems = CountOfItem.sort_values(ascending=False)

df=pd.DataFrame(list(sortedItems.items()),columns=['Item name','Counts'])
df
print('Top 50 items:\n',df.head(50))

In [None]:
# Lets find the least 15 purchased item type
df.tail(15)

In [None]:
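# Printing both results; a bare expression is displayed only when it is the last line of a cell
print(groceries_df.groupby(['Month'])['Amount'].sum())
print(groceries_df['Month'].value_counts())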

In [None]:
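# Printing both results; a bare expression is displayed only when it is the last line of a cell
print(groceries_df.groupby(['Day of week'])['Amount'].sum())
print(groceries_df['Day of week'].value_counts())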

In [None]:
# Lets find out the top 10 bills with the most items purchased

groceries_df['Bill No'].value_counts().head(10)

In [None]:
# Rate is the total price column
groceries_df.groupby(['Year', 'Month'])['Rate'].sum()

#### 4.4 Data Visualisation

In [None]:
#Importing Libraries
import pandas as pd

# For visualisation
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from wordcloud import WordCloud

#Importing Dataset
groceries_df

# RENAME COLUMN
groceries_df = groceries_df.rename(columns={'Item Name': 'title'})
#Checking the Data

groceries_df.head()

#Creating the text variable

text2 = " ".join(title for title in groceries_df.title)

# Creating word_cloud with text as argument in .generate() method

word_cloud2 = WordCloud(collocations = False, background_color = 'white').generate(text2)

# Display the generated Word Cloud

plt.imshow(word_cloud2, interpolation='bilinear')

plt.axis("off")

plt.show()

In [None]:
# RENAME COLUMN
groceries_df = groceries_df.rename(columns={'title': 'Item Name'})

In [None]:
item_freq = groceries_df['Item Name'].value_counts().sort_values(ascending = False).head(25)

plt.figure(figsize=(10, 6))
item_freq.plot(kind='bar', color='wheat')
plt.xlabel('Item Name')
plt.ylabel('Frequency (absolute)')
plt.title('Top-25 Absolute Item Frequency Plot')
plt.show()

# top 10 item frequencies
item_freq = groceries_df['Item Name'].value_counts().sort_values(ascending = False).head(10)

plt.figure(figsize=(8, 6))
item_freq.plot(kind='bar', color='blue')
plt.xlabel('Item Name')
plt.ylabel('Frequency (absolute)')
plt.title('Top-10 Absolute Item Frequency Plot')
plt.show()

In [None]:
# Lets find the 15 least purchased item types
item_freq = groceries_df['Item Name'].value_counts().sort_values(ascending = False).tail(15)
item_freq.plot(kind='bar', color='seagreen', edgecolor='black')

plt.xlabel('Item types')
plt.ylabel('Purchase count')
plt.title('Least 15 purchased item types')

In [None]:
# Lets see if there is any difference in the number of transactions month-wise

groceries_df['Month'].value_counts()
groceries_df['Month'].value_counts().sort_values(ascending = False).plot(kind='bar',edgecolor='black',color='green')
plt.title('Number of transactions month-wise')

* The maximum sales are recorded in August.
* The lowest sales are recorded in July.

In [None]:
# transaction per weekday
plt.figure(figsize = (8,6))
groceries_df.groupby(groceries_df['Day of week'])['Bill No'].nunique().sort_values(ascending = False).plot(kind='bar',color='lightblue')
plt.xlabel('Week day')
plt.ylabel('Transactions')
plt.title('Transactions by Weekday')
plt.xticks(rotation=0)
plt.show()

* Every day records an approximately equal number of transactions, except Sunday.

In [None]:
# Highest sales amount items (Top 50)

import jinja2

cm = sns.light_palette('pink', as_cmap = True)
item_sales = groceries_df.groupby('Item Name')['Rate'].sum().sort_values(ascending= False)
item_sales.to_csv('ItemSales.csv')
item_sales = pd.read_csv('ItemSales.csv')
item_sales.head(50).style.background_gradient(cmap=cm)

In [None]:
# Least sales amount items (Least 20)
import jinja2

cm = sns.light_palette('pink', as_cmap = True)
item_sales = groceries_df.groupby('Item Name')['Rate'].sum().sort_values(ascending= False)
item_sales.to_csv('ItemSales.csv')
item_sales = pd.read_csv('ItemSales.csv')
item_sales.tail(20).style.background_gradient(cmap=cm)

# **5. Research Methodology** 

## **5.1 Association Rule Mining with Apriori Algorithm**

##### **What is Apriori?**

#### The Apriori algorithm uses frequent itemsets to generate association rules, based on two assumptions:

*1. All subsets of a frequent itemset must be frequent.*

*2. Conversely, if a subset is infrequent, every itemset containing it is infrequent too.*

*The algorithm sets a minimum support value and iterates over the frequent itemsets. Itemsets whose support is below the threshold are ignored, and the iteration continues until no further itemsets can be removed.*
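
The level-wise procedure described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used by `mlxtend`: the function name `apriori_sketch` and the demo transactions are hypothetical, and no attempt is made at efficiency.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical transactions for illustration
demo = [
    ["TOMATO", "ONION", "GREEN CHILLI"],
    ["TOMATO", "ONION"],
    ["TOMATO", "GREEN CHILLI"],
    ["BREAD", "EGG"],
]
result = apriori_sketch(demo, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

The prune step is where the first assumption pays off: a candidate is discarded without counting its support as soon as one of its subsets is known to be infrequent.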

![alt text](image-6.png)

*An association rule answers the question of “what goes with what”. Applied to the transactions carried out by customers at a supermarket, it means studying the transaction database to determine which products tend to be purchased together with which other products; this is why association rule mining is frequently referred to as market basket analysis. The significance of an association rule is measured by two parameters, **support** and **confidence**. Support is the percentage of transactions in the database that contain a given combination of items, while confidence measures the strength of the relationship between the items in the rule.*

## **5.2 Core Measures in Association Rule Learning: Support, Confidence, and Lift**

### **i. Support:**

 *It signifies the popularity of an item or itemset, i.e. the fraction of transactions that contain it. Items that are bought infrequently are ignored when forming associations.*


### **ii. Confidence:**

 It gives the likelihood of purchasing Y when X is bought. This is in fact the conditional probability of Y given X. However, it does not account for the popularity (frequency) of Y; lift addresses that limitation.

### **iii. Lift:**

 It is the confidence of the rule divided by the support of the consequent. A lift greater than 1 suggests that the presence of the antecedent increases the chances that the consequent will occur in a given transaction. A lift below 1 indicates that purchasing the antecedent reduces the chances of purchasing the consequent in the same transaction, and a lift of exactly 1 means the two are independent.
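
As a small worked example of the three measures, the sketch below computes them for the rule TOMATO → ONION using the standard definitions: support(X → Y) is the share of baskets containing both X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). The baskets are hypothetical.

```python
# Hypothetical baskets for illustration
baskets = [
    {"TOMATO", "ONION"},
    {"TOMATO", "ONION", "EGG"},
    {"TOMATO"},
    {"EGG", "BREAD"},
    {"ONION"},
]

def support(items):
    """Fraction of baskets containing every item in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)

x, y = {"TOMATO"}, {"ONION"}
sup_xy = support(x | y)          # both items together
confidence = sup_xy / support(x) # P(Y | X)
lift = confidence / support(y)   # confidence relative to Y's baseline frequency
print(f"support={sup_xy:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here the lift comes out slightly above 1, so in this toy data buying tomato makes buying onion a little more likely than its baseline frequency alone would suggest.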

![alt text](image-2.png)

In [None]:

# For importing, cleaning and transforming data
import numpy as np
import pandas as pd
# For data analysis
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
groceries_df

In [None]:
basket=(groceries_df.groupby(['Bill No','Item Name'])['Qty']
        .sum().unstack().reset_index().fillna(0)
        .set_index('Bill No'))

def encode_unit(x):
    # 1 if the item was bought at all on this bill, otherwise 0
    if x<= 0:
        return 0
    if x> 0:
        return 1
basket_sets=basket.map(encode_unit)

freq_itemsets=apriori(basket_sets,min_support=0.01,use_colnames=True)
print(freq_itemsets)

rules=association_rules(freq_itemsets, metric="lift", min_threshold=0.05)

# confidence tells us how likely the consequent is to be bought when the antecedent is bought

# lift tells us the strength of the rule
print(rules.sort_values(by='lift', ascending=False))

##################################################################
# This code will raise MemoryError - it requires too much memory #
##################################################################
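
One possible way to shrink the bill-by-item matrix, offered only as a sketch, is to drop items that cannot reach the minimum support before pivoting, and to store the result as booleans. By the Apriori property such items can never be part of a frequent itemset. The helper name `build_basket` and the demo rows are assumptions; whether this is enough to avoid the memory error on the full dataset has not been verified here.

```python
import pandas as pd

def build_basket(df, min_support=0.01):
    """Boolean bill-by-item matrix restricted to items meeting min_support."""
    n_bills = df["Bill No"].nunique()
    # Share of bills in which each item appears at least once
    item_support = (
        df.drop_duplicates(["Bill No", "Item Name"])["Item Name"].value_counts() / n_bills
    )
    keep = item_support[item_support >= min_support].index
    filtered = df[df["Item Name"].isin(keep)]
    # crosstab counts line items per (bill, item); > 0 turns counts into booleans
    return pd.crosstab(filtered["Bill No"], filtered["Item Name"]) > 0

# Hypothetical line items; column names follow the notebook
demo = pd.DataFrame({
    "Bill No": [1, 1, 2, 2, 3, 4],
    "Item Name": ["TOMATO", "ONION", "TOMATO", "RARE ITEM", "ONION", "TOMATO"],
})
basket_bool = build_basket(demo, min_support=0.5)
print(basket_bool)
```

One caveat: a bill that contains only rare items disappears from the matrix, which changes the number of rows and therefore the support values computed afterwards. If exact supports matter, such bills would need to be re-added as all-False rows.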

### Since processing the whole basket takes too much memory, let's analyse the baskets month-wise

In [None]:
basket1=(groceries_df[groceries_df['Month']=="April"]
        .groupby(['Bill No','Item Name'])['Qty']
        .sum().unstack().reset_index().fillna(0)
        .set_index('Bill No'))

def encode_unit(x):
    if x<= 0:
        return 0
    if x>0:
        return 1
basket1_sets=basket1.map(encode_unit)

freq_itemsets_apr=apriori(basket1_sets,min_support=0.01,use_colnames=True)

rules_apr=association_rules(freq_itemsets_apr, metric="lift", min_threshold=0.05)

In [None]:
#Freq Items
freq_itemsets_apr.sort_values(by='support',ascending=False)

In [None]:
# confidence tells us how likely the consequent is to be bought when the antecedent is bought
# lift tells us the strength of the rule
rule1=rules_apr.sort_values(by='lift', ascending=False).drop(columns=['antecedent support','consequent support','leverage','conviction','zhangs_metric'])
rule1

#### *  *From the April basket, we infer that Tomato and Onion have been frequently bought together.*

### Let's analyse the July basket, which registered the lowest sales

In [None]:
basket3=(groceries_df[groceries_df['Month']=="July"]
        .groupby(['Bill No','Item Name'])['Qty']
        .sum().unstack().reset_index().fillna(0)
        .set_index('Bill No'))

def encode_unit(x):
    if x<= 0:
        return 0
    if x>0:
        return 1
basket3_sets=basket3.map(encode_unit)

freq_itemsets_july=apriori(basket3_sets,min_support=0.01,use_colnames=True)

rules_july=association_rules(freq_itemsets_july, metric="lift", min_threshold=0.05)

In [None]:
freq_itemsets_july.sort_values(by='support',ascending=False)

In [None]:
# confidence tells us how likely the consequent is to be bought when the antecedent is bought
# lift tells us the strength of the rule
rules_july.sort_values(by='lift', ascending=False).drop(columns=['antecedent support','consequent support','leverage','conviction','zhangs_metric'])

#### *  *From the July basket, we infer that Tomato-Onion and Green chilli-Tomato have been frequently bought together.*

### Let's analyse the August basket, which registered the highest sales

In [None]:
basket4=(groceries_df[groceries_df['Month']=="August"]
        .groupby(['Bill No','Item Name'])['Qty']
        .sum().unstack().reset_index().fillna(0)
        .set_index('Bill No'))

def encode_unit(x):
    if x<= 0:
        return 0
    if x>0:
        return 1
basket4_sets=basket4.map(encode_unit)

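freq_itemsets_aug=apriori(basket4_sets,min_support=0.01,use_colnames=True)

# Generating the rules for August, which the next cell displays
rules_aug=association_rules(freq_itemsets_aug, metric="lift", min_threshold=0.05)

freq_itemsets_aug.sort_values(by='support',ascending=False)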

In [None]:
rules_aug.sort_values(by='lift', ascending=False).drop(columns=['antecedent support','consequent support','leverage','conviction','zhangs_metric'])

### *   *From the August basket, we can see that those who bought a Pillow also bought a Bed, and vice versa. This is the only month in which non-grocery items dominate the basket; it may coincide with university admissions, when newly admitted students who opt for the hostels buy furniture items.*

![alt text](image-7.png)

 **To create a basket for the overall sales data, we cleaned the groceries dataset (removing single-item purchases and limiting each purchase to 10 columns), which reduced the size of the basket enough to run the overall analysis.**

In [None]:
# For importing, cleaning and transforming data
import numpy as np
import pandas as pd

# For data analysis
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Visualise the results
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline

# importing the basket dataframe
basket_df=pd.read_csv(r'C:/Users/Win 10/Desktop/Market-Basket-Analysis/basket_df.csv')

# Filling the NaN values with the word 'NA'
basket_df.fillna('NA', inplace=True)

# Formatting dataframe into list of lists
basket_df_list = basket_df.values.tolist()

# Removing 'NA' from each list
for i in range(len(basket_df_list)):
    basket_df_list[i] = [x for x in basket_df_list[i] if not x=='NA']

# Transactional encoding
te = TransactionEncoder()
te_ary = te.fit(basket_df_list).transform(basket_df_list)

df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

# Apriori Algorithm

frequent_itemsets = apriori(df_encoded, min_support=0.003, use_colnames=True)

rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.05)

# confidence tells us how likely the consequent is to be bought when the antecedent is bought
rules.sort_values(by='confidence', ascending=False).head()

# lift tells us the strength of the rule
rules.sort_values(by='lift', ascending=False).head()
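
To make the transactional encoding step less opaque, the sketch below builds the same kind of one-hot boolean table by hand with pandas. It mirrors roughly what `TransactionEncoder` produces (one row per transaction, one boolean column per distinct item) but is not its actual implementation; the transactions are hypothetical.

```python
import pandas as pd

# Hypothetical transactions for illustration
transactions = [
    ["TOMATO", "ONION"],
    ["EGG"],
    ["TOMATO", "EGG", "BREAD"],
]

# Sorted list of every distinct item becomes the column order
columns = sorted({item for basket in transactions for item in basket})

# One row per transaction, True where the item is present
encoded = pd.DataFrame(
    [[item in basket for item in columns] for basket in transactions],
    columns=columns,
)
print(encoded)
```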

In [None]:
frequent_itemsets.sort_values(by='support',ascending=False)

In [None]:
rules.sort_values(by='lift', ascending=False).drop(columns=['antecedent support','consequent support','leverage','conviction','zhangs_metric']).head()

In [None]:
rules.sort_values(by='lift', ascending=False).drop(columns=['antecedent support','consequent support','leverage','conviction','zhangs_metric']).tail()

# **5.3 Visualisation of the Association rules**

## 5.3.1 Creating a Bar Plot

In [None]:
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
fig = px.bar(rules, x=rules.index, y='support', text='support', labels={'index': 'Association Rule'})
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.update_layout(title='Association Rules by Support', xaxis_title='Association Rule', yaxis_title='Support')
fig.show()

In [None]:
sns.set_style("whitegrid")
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(projection = '3d')


x = rules['support']
y = rules['confidence']
z = rules['lift']

ax.set_xlabel("Support")
ax.set_ylabel("Confidence")
ax.set_zlabel("Lift")

ax.scatter(x, y, z)
ax.set_title("3D Distribution of Association Rules")

plt.show()

## 5.3.2 Creating Scatter Plot

In [None]:
#Setting up the style
plt.figure(figsize = (15, 15))
sns.set_style('darkgrid')
#Plotting the relationship between the metrics
plt.subplot(2,2,1)
sns.scatterplot(x="support", y="confidence",data=rules)
plt.subplot(2,2,2)
sns.scatterplot(x="support", y="lift",data=rules)
plt.subplot(2,2,3)
sns.scatterplot(x="confidence", y="lift",data=rules)
plt.subplot(2,2,4)
sns.scatterplot(x="antecedent support", y="consequent support",data=rules)
plt.title('Scatter-Plots')
plt.show()

In [None]:
fig = px.scatter(rules, x='confidence', y='lift', title='Confidence vs Lift')
fig.update_traces(marker=dict(size=12, color='skyblue', line=dict(width=2, color='DarkSlateGrey')), selector=dict(mode='markers'))
fig.update_layout(xaxis_title='Confidence', yaxis_title='Lift', showlegend=False)
fig.show()

In [None]:
fig = px.scatter(rules, x='confidence', y='support', title='Confidence vs Support')
fig.update_traces(marker=dict(size=12, color='skyblue', line=dict(width=2, color='DarkSlateGrey')), selector=dict(mode='markers'))
fig.update_layout(xaxis_title='Confidence', yaxis_title='Support', showlegend=False)
fig.show()

## 5.3.3 Network Graph

In [None]:
import networkx as nx
import re

In [None]:
def draw_network(rules, rules_to_show):
  # Directional Graph from NetworkX
  network = nx.DiGraph()
  
  # Loop through number of rules to show
  for i in range(rules_to_show):
    
    # Add a Rule Node
    network.add_nodes_from(["R"+str(i)])
    for antecedents in rules.iloc[i]['antecedents']: 
        # Add antecedent node and link to rule
        network.add_nodes_from([antecedents])
        network.add_edge(antecedents, "R"+str(i),  weight = 2)
      
    for consequents in rules.iloc[i]['consequents']:
        # Add consequent node and link to rule
        network.add_nodes_from([consequents])
        network.add_edge("R"+str(i), consequents,  weight = 2)

  color_map=[]  
  
  # For every node, if it's a rule, colour as Black, otherwise Orange
  for node in network:
       if re.compile(r"^[R]\d+$").fullmatch(node) is not None:
            color_map.append('black')
       else:
            color_map.append('orange')
  
  # Position nodes using spring layout
  pos = nx.spring_layout(network, k=16, scale=1)
  # Draw the network graph
  nx.draw(network, pos, node_color = color_map, font_size=8)            
  
  # Shift the text position upwards
  for p in pos:  
      pos[p][1] += 0.12

  nx.draw_networkx_labels(network, pos)
  plt.title("Network Graph for Association Rules")
  plt.show()

draw_network(rules, 10)

# **5.4 Business Application**

##### *Let’s say the store has bought too much Egg and is now worried that the stock will expire if it cannot be sold in time. To make matters worse, the profit margin on Egg is so low that the store cannot afford a promotional discount on it without cutting too deeply into its profits. One approach is to find out which products drive the sales of Egg and offer discounts on those products instead.*

In [None]:
egg_rules = rules[rules['consequents'].astype(str).str.contains('EGG')]
egg_rules = egg_rules.sort_values(by=['lift'],ascending = [False]).reset_index(drop = True)

display(egg_rules.head())

##### *For instance, we can apply a promotional discount on Vegetables and Bread. Some of the associations may seem counter-intuitive, but the rules state that these products do drive the sales of Egg.*
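
A note on the filter used above: matching on the string form of the consequents selects every rule whose text contains the letters "EGG", which can include unrelated items. The sketch below, on a hypothetical rules table shaped like the output of `association_rules`, tests set membership instead. The item names and lift values are invented for illustration.

```python
import pandas as pd

# Hypothetical rules table shaped like the output of association_rules
demo_rules = pd.DataFrame({
    "antecedents": [frozenset({"BREAD"}), frozenset({"ONION"}), frozenset({"MILK"})],
    "consequents": [frozenset({"EGG"}), frozenset({"VEGGIE MIX"}), frozenset({"EGG", "BREAD"})],
    "lift": [1.8, 1.2, 2.4],
})

# Exact membership test on the frozenset, instead of substring matching
has_egg = demo_rules["consequents"].apply(lambda items: "EGG" in items)
egg_only = demo_rules[has_egg].sort_values("lift", ascending=False).reset_index(drop=True)
print(egg_only)
```

In this toy table the substring approach would also select the "VEGGIE MIX" rule, while the membership test keeps only rules whose consequent set actually contains the item EGG.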

# **6. Inferences & Conclusions**

#### Exploratory Data Analysis (EDA)

- **Duration of the Dataset:** The dataset combines seven monthly sales files, April to October.
- **Number of Unique Itemsets:** The dataset contains a variety of unique itemsets, including several frequently purchased combinations.
- **Number of Purchases:** Over this period, the dataset recorded a significant number of purchases, showcasing the buying patterns of the customers.
- **Weekly Sales Analysis:** Transactions are spread roughly evenly across the days of the week, with Sunday as the exception.
- **Monthly Sales Analysis:** August recorded the highest sales and July the lowest.

#### Market Basket Analysis using Apriori Algorithm

The Apriori algorithm was applied to the dataset to identify frequent itemsets and association rules. The key findings include:

- **Onion-Tomato:** This combination was found to be one of the most frequently purchased together, indicating that customers commonly buy these two vegetables simultaneously.
- **Green Chilli-Tomato:** Similar to the Onion-Tomato combination, Green Chilli-Tomato pairs are also frequently bought together, suggesting a pattern in the customers' cooking habits.
- **Bed-Pillow:** This combination indicates a strong association between these bedding items, reflecting a common shopping behavior for home essentials.

The Apriori algorithm efficiently identified these frequent itemsets by analyzing the transactions and determining the support and confidence levels of the item pairs.

#### Suggestions for the Departmental Store

1. **Promotional Bundling:** Create bundled promotions for frequently bought together items like Onion-Tomato and Green Chilli-Tomato. This can encourage customers to buy more and increase sales volume.
  
2. **Cross-Merchandising:** Place related items like bed and pillows together in the store to make it easier for customers to find and purchase these items together, enhancing their shopping experience.

3. **Inventory Management:** Ensure that frequently paired items are always in stock to avoid missed sales opportunities. Regularly monitor inventory levels for these items and adjust stock accordingly.

4. **Targeted Marketing:** Utilize the insights from the market basket analysis to design targeted marketing campaigns. For example, send personalized offers to customers who frequently buy onions and tomatoes together.

5. **Layout Optimization:** Organize the store layout to reflect the common buying patterns identified. For instance, placing vegetables that are often bought together in close proximity can streamline the shopping process for customers.

By implementing these strategies, the departmental store can enhance customer satisfaction, boost sales, and improve overall operational efficiency.

# **References**

1. Raich, B. Ganguly, and M. Tota, "Machine Learning for Market Basket Analysis through," IOSR Journal of Engineering (IOSRJEN), pp. 22-23, 2019. 
2. S. Mainali, "MARKET BASKET ANALYSIS," GitHub, Kirtipur, 2016.
3. https://www.researchgate.net/publication/355894565_Market_Basket_Analysis_Approach_to_Machine_Learning
4. https://www.researchgate.net/publication/365489098_MARKET_BASKET_ANALYSIS_FOR_A_SUPERMARKET
5. https://www.kaggle.com/code/mukandkrishna/mba-apriori