# Association Rules - Market Basket Analysis

Some of us go to the grocery store with a fixed list, while others have a hard time sticking to one, no matter how determined we are. Whichever type of person you are, retailers are experts at creating temptations that inflate your spending.

Remember the time when you had the “_Ohh, I might need this as well._” moment? Retailers boost their sales by relying on this one simple intuition.

> **_People that buy this will most likely want to buy that as well._**

People who buy bread are more likely to buy butter as well, so an experienced assortment manager knows that a discount on bread can also push the sales of butter.

## Data-driven strategies

Large retailers rely on detailed market basket analysis to uncover associations between items. Using this valuable information, they can carry out various strategies to improve their revenue:

- Associated products are placed close to each other, so that buyers of one item would be prompted to buy the other.
- Discounts can be applied to only one of the associated products.

![](https://miro.medium.com/v2/resize:fit:770/0*EKYQHLhOlfpW3FeW)

## Association Rule Mining

But how exactly is a Market Basket Analysis carried out?

Data scientists are able to carry out Market Basket Analysis by implementing Association Rule Mining. Association Rule Mining is a **rule-based machine learning method that helps to uncover meaningful correlations between different products** according to their co-occurrence in a data set.

However, one of the major pitfalls is that it involves various formulas and parameters that can be difficult to follow for people without expertise in data mining. Therefore, before sharing your results with stakeholders, make sure that the underlying definitions are well understood.

## Core concepts illustration

I will be illustrating three of the core concepts that are used in Association Rule Mining with some simple examples below. This will assist you in grasping the data mining process.

Let’s say you have now opened up your own cafeteria. How will you utilize your data science skills to __understand which of the items on your menu are associated__?

There are six transactions in total with various different purchases that happened in your cafeteria.

![](https://miro.medium.com/v2/resize:fit:201/1*ak7rukRbBei1LAD-jHNuig.png)

We can utilize three core measures that are used in Association Rule Learning, which are: **Support**, **Confidence,** and **Lift**.

### 1. Support

**Support is simply the probability of an event occurring.** It is measured by the __proportion of transactions in which an item set appears__.

To put it another way, $Support(A)$ is the number of transactions that include $A$ divided by the total number of transactions.

If we analyze the transaction table above, the support for ``cookie`` is $3$ out of $6$. That is, out of a total of __$6$ transactions__, purchases containing ``cookies`` occurred __$3$ times__ (or $50\%$).

![](https://miro.medium.com/v2/resize:fit:184/1*JvwE_0DeslBzi9HJTYLVrw.png)

Support can be computed for multiple items at the same time as well. The support for ``cookie`` and ``cake`` together is $2$ out of $6$.

![](https://miro.medium.com/v2/resize:fit:267/1*zfzXNNkUTPVFOgLkSvnNwQ.png)
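
As a minimal sketch of the support calculation, the snippet below uses a hypothetical six-transaction table. The item names and baskets are assumptions chosen only to reproduce the counts quoted above (``cookie`` in 3 of 6 baskets, ``cookie`` and ``cake`` together in 2 of 6); they are not necessarily the rows of the table shown in the figure. The helper name ``support`` is also just illustrative.

```python
# Hypothetical baskets, chosen to match the counts quoted in the text.
transactions = [
    {"cookie", "cake"},
    {"cookie", "cake", "chocolate"},
    {"cookie", "chocolate"},
    {"chocolate", "coffee"},
    {"coffee"},
    {"tea"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


support_cookie = support({"cookie"}, transactions)               # 3 of 6 baskets
support_cookie_cake = support({"cookie", "cake"}, transactions)  # 2 of 6 baskets
print(support_cookie, support_cookie_cake)
```

The subset test ``itemset <= basket`` is what makes the same function work for a single item and for several items at once.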

### 2. Confidence

The confidence of a **consequent event** given an **antecedent event** can be described using __conditional probability__.

Simply put, it is **the probability of event $A$ happening given that event $B$ has already happened.**

This can be used to describe the probability of an item being purchased when another item is already in the basket. It is measured by dividing the proportion of transactions containing both items $X$ and $Y$ by the proportion of transactions containing $Y$, where $Y$ is the item already in the basket (the antecedent) and $X$ is the item whose purchase we are predicting (the consequent).

From the transactions table above, the confidence of ``{cookie -> cake}`` can be formulated as follows:

![](https://miro.medium.com/v2/resize:fit:560/1*faL2TJQAWLdKToSqp9LJkQ.png)

The __conditional probability__ can also be written as:

![](https://miro.medium.com/v2/resize:fit:477/1*-ecvkPBR_2sVPsBJFYx2lw.png)

Finally, we arrive at a solution of $2$ out of $3$. We can understand the intuition of confidence if we were to look only at ``Transaction 1`` to ``Transaction 3``. Out of $3$ purchases with ``cookies``, $2$ of them are actually bought together with a ``cake``!

### 3. Lift

Lift is the **_observed to expected ratio_** (abbreviated o/e). Lift measures **how likely an item is purchased when another item is purchased, while controlling for how popular both items are**. It is calculated by dividing the probability of both items occurring together by the product of the probabilities of the individual items, that is, the probability expected if there were no association between them.

![](https://miro.medium.com/v2/resize:fit:660/1*qDMrzlJob5o9K4rnAb1YQg.png)

A lift of $1$ will then mean that __both of the items are actually independent and without any association__. 

For any value __higher than $1$__, lift shows that there is __actually an association__. The __higher the value, the higher the association__.

Looking at the table again, the lift of ``{cookies -> cake}`` is $2$, which implies that there is actually an association between ``cookies`` and ``cakes``.
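
The confidence and lift figures quoted above can be reproduced with a small sketch. It takes the two supports stated in the Support section as inputs; the support of ``cake`` ($2$ out of $6$) is an assumption here, chosen because it is the value implied by the stated lift of $2$. The function names are illustrative only.

```python
from fractions import Fraction


def confidence(support_xy, support_x):
    """confidence(X -> Y) = support(X and Y) / support(X)."""
    return support_xy / support_x


def lift(support_xy, support_x, support_y):
    """lift(X -> Y) = support(X and Y) / (support(X) * support(Y))."""
    return support_xy / (support_x * support_y)


support_cookie = Fraction(3, 6)       # stated in the Support section
support_cookie_cake = Fraction(2, 6)  # stated in the Support section
support_cake = Fraction(2, 6)         # assumption: implied by the stated lift of 2

conf_cookie_cake = confidence(support_cookie_cake, support_cookie)
lift_cookie_cake = lift(support_cookie_cake, support_cookie, support_cake)
print(conf_cookie_cake, lift_cookie_cake)  # prints: 2/3 2
```

Using ``Fraction`` keeps the arithmetic exact, so the results match the hand calculation without floating-point rounding.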

Now that we have mastered all the core concepts, we can look into an algorithm that is able to generate item sets from transactional data, which is used to calculate these association rules.

## The Apriori Algorithm

### Overview

The Apriori Algorithm is one of the most popular algorithms used in association rule learning over relational databases. It **identifies the items in a data set and further extends them to larger and larger item sets**.

However, the Apriori Algorithm only extends an item set if it is frequent, that is, if its support exceeds a predetermined threshold.

More formally,

**The Apriori Algorithm proposes that:**

An itemset $I$ is not frequent if:

- $P(I) <$ minimum support threshold, where $I$ is any non-empty itemset
- Any subset of the itemset has a support below the minimum support threshold.

The second characteristic is known as the **Anti-monotone Property**!

A **good example**: if the probability of purchasing a ``burger`` is already below the minimum support, the probability of purchasing a ``burger`` and ``fries`` together will definitely be __below the minimum support as well__.
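
The burger example can be checked on a handful of made-up baskets. The sketch below is only an illustration of the property; the baskets and the ``support`` helper are hypothetical.

```python
# Hypothetical baskets for illustration only.
transactions = [
    {"burger", "fries"},
    {"burger"},
    {"fries", "soda"},
    {"soda"},
    {"salad"},
]


def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket) / len(transactions)


support_burger = support({"burger"})                 # 2 of 5 baskets
support_burger_fries = support({"burger", "fries"})  # 1 of 5 baskets

# Adding an item can only keep or shrink the set of matching baskets,
# so a superset can never be more frequent than any of its subsets.
assert support_burger_fries <= support_burger
print(support_burger, support_burger_fries)
```

This is why Apriori can discard every extension of an infrequent itemset without counting it.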

### Steps in the Apriori Algorithm

The diagram below illustrates how the **Apriori Algorithm** starts building from the __smallest itemset__ and further extends forward.

- The algorithm starts by **generating candidate itemsets through the Join Step**, that is, generating $(K+1)$-itemsets from $K$-itemsets. For example, the algorithm generates ``Cookie``, ``Chocolate`` and ``Cake`` in the first iteration.
- Immediately after that, the algorithm proceeds with the __Prune Step__, that is to **remove any candidate item set that does not meet the minimum support requirement**. For example, the algorithm will remove ``Cake`` if $Support($ ``Cake`` $)$ is below the predetermined minimum $Support$.

It iterates both of the steps until there are no possible further extensions left.

Note that this diagram is not the complete version of the transactions table above. It serves as an illustration to help paint the bigger picture of the flow.

![](https://miro.medium.com/v2/resize:fit:770/1*oGmHkz3QXn-Dxf7WZeuYSg.png)
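
The join and prune steps described above can be sketched in plain Python. This is a minimal, unoptimized illustration and not the implementation used by any particular library; the function name ``apriori_sketch``, the baskets, and the threshold of $0.3$ are all assumptions made for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # Level 1: the candidate 1-itemsets are simply the individual items.
    items = sorted({item for basket in transactions for item in basket})
    candidates = [frozenset([item]) for item in items]

    frequent = {}
    k = 1
    while candidates:
        # Prune step: keep only candidates that meet the minimum support.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-itemset candidates.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k + 1}
        # Anti-monotone check: every k-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k))
        ]
        k += 1
    return frequent


# Hypothetical baskets for illustration only.
transactions = [
    {"cookie", "cake"},
    {"cookie", "cake", "chocolate"},
    {"cookie", "chocolate"},
    {"chocolate", "coffee"},
    {"coffee"},
    {"tea"},
]

frequent = apriori_sketch(transactions, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The loop stops as soon as a level produces no surviving candidates, which mirrors the "no possible further extensions" condition.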

## Code Implementation

To perform a Market Basket Analysis implementation with the Apriori Algorithm, we will be using the [Groceries dataset](https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset) from Kaggle. The data set was published by Heeral Dedhia in 2020 under the General Public License, version 2.

The dataset has $38765$ rows of purchase orders from the grocery stores.

### Import and read data

- First of all, let’s import some necessary modules and read the datasets that we have downloaded from Kaggle.


In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import re
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
from mpl_toolkits.mplot3d import Axes3D
import networkx as nx

In [None]:
basket = pd.read_csv("Groceries_dataset.csv")
basket.head()

### Some EDA
Let’s first have a look at the top 10 best-selling products:

In [None]:
x = basket['itemDescription'].value_counts().sort_values(ascending=False)[:10]
fig = px.bar(x= x.index, y= x.values)
fig.update_layout(title_text= "Top 10 frequently sold products", xaxis_title= "Products", yaxis_title="Count", width=800, height=600)
fig.show()

Now let’s see which months had the highest number of items sold:

In [None]:
basket["Year"] = basket['Date'].str.split("-").str[-1]
basket["Month-Year"] = basket['Date'].str.split("-").str[1] + "-" + basket['Date'].str.split("-").str[-1]
fig1 = px.bar(basket["Month-Year"].value_counts(ascending=False), 
              orientation= "v", 
              color = basket["Month-Year"].value_counts(ascending=False),
               labels={'value':'Count', 'index':'Date','color':'Meter'})

fig1.update_layout(title_text="Exploring higher sales by the date", width=800, height=600)

fig1.show()

### Grouping into transactions

- The data set records one item purchase per row. We will have to group these purchases into baskets of items.
- After that, we will use ``TransactionEncoder`` to encode the transactions into a format that is suitable for the Apriori algorithm.

In [None]:
# Collect the items of each (member, date) pair into one basket (a list of items)
basket = basket.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index(drop=True)

encoder = TransactionEncoder()
encoded_data = encoder.fit_transform(basket)
transactions = pd.DataFrame(encoded_data, columns=encoder.columns_)
transactions.head()

**_Note:_** The data frame records each row as a transaction, and the items that were purchased in the transaction will be recorded as ``True``.
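
To make the two steps concrete, here is a standard-library sketch of the same idea on a few made-up rows: group item rows into baskets, then build one boolean column per distinct item. It only approximates the shape of the encoded table and is not how ``TransactionEncoder`` is implemented; the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the same shape as the dataset:
# (Member_number, Date, itemDescription)
rows = [
    (1000, "01-01-2015", "whole milk"),
    (1000, "01-01-2015", "rolls/buns"),
    (1000, "15-03-2015", "soda"),
    (2000, "01-01-2015", "whole milk"),
]

# Group item rows into one basket per (member, date) pair.
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

# One column per distinct item, one row per basket, True where the item was bought.
columns = sorted({item for items in baskets.values() for item in items})
encoded = [[col in items for col in columns] for items in baskets.values()]

print(columns)
for row in encoded:
    print(row)
```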

### Apriori and Association Rules

- The Apriori Algorithm will be used to generate frequent item sets. We set the minimum support so that an itemset must appear in at least $6$ transactions (that is, $6$ divided by the total number of transactions). The association rules are then generated, and we keep only the rules with a **Lift** value of at least $1.5$.


In [None]:
frequent_itemsets = apriori(transactions, min_support= 6/len(basket), use_colnames=True, max_len = 2)
rules = association_rules(frequent_itemsets, metric="lift",  min_threshold = 1.5)
print("Rules identified: ", len(rules))
rules.head()

* ``support``: Support measures the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a "frequent itemset" if its support is larger than a specified minimum-support threshold. Due to the downward closure property, all subsets of a frequent itemset are also frequent. In other words, support is simply the **probability that a transaction contains the itemset**. For a rule $A \rightarrow C$, it is the support of the combined itemset:
$$
\text{support}(A\rightarrow C) = \text{support}(A \cup C), \;\;\; \text{range: } [0, 1]
$$

* ``confidence``: The confidence of a rule $A \rightarrow C$ is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is not symmetric, that is, it is directed; for instance, the confidence for $A \rightarrow C$ is different from the confidence for $C \rightarrow A$. The confidence is $1$ (maximal) for a rule $A \rightarrow C$ if the consequent is present in every transaction that contains the antecedent. In other words, it tells us the **impact of one product on another**: the probability that a person who buys product $A$ will also buy product $C$. Its representation in mathematical terms is:
$$
\text{confidence}(A\rightarrow C) = \frac{\text{support}(A\rightarrow C)}{\text{support}(A)}, \;\;\; \text{range: } [0, 1]
$$

* ``lift``: The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule $A \rightarrow C$ occur together than we would expect if they were statistically independent. If $A$ and $C$ are independent, the **Lift** score will be exactly $1$. In other words, Lift will calculate the **confidence taking into account the popularity of both items**. Representation of lift in mathematical terms is:
$$
\text{lift}(A\rightarrow C) = \frac{\text{confidence}(A\rightarrow C)}{\text{support}(C)}, \;\;\; \text{range: } [0, \infty]
$$

* ``leverage``: Leverage computes the difference between the observed frequency of $A$ and $C$ appearing together and the frequency that would be expected if $A$ and $C$ were independent. A leverage value of $0$ indicates independence. The mathematical formula is as follows:
$$
\text{leverage}(A\rightarrow C) = \text{support}(A\rightarrow C) - \text{support}(A) \times \text{support}(C), \;\;\; \text{range: } [-1, 1]
$$

* ``conviction``: A __high conviction value means that the consequent is highly dependent on the antecedent__. For instance, in the case of a perfect confidence score, the denominator becomes $0$ (due to $1 - 1$), in which case the conviction score is defined as ``inf``. Similar to lift, if items are independent, the conviction is $1$. The mathematical formula is as follows:
$$
\text{conviction}(A\rightarrow C) = \frac{1 - \text{support}(C)}{1 - \text{confidence}(A\rightarrow C)}, \;\;\; \text{range: } [0, \infty]
$$

* ``zhangs_metric``: Measures both association and dissociation. The value ranges between $-1$ and $1$. A positive value $(>0)$ indicates **Association** and a negative value indicates **Dissociation**. The mathematical formula is as follows:
$$
\text{zhangs metric}(A\rightarrow C) = \frac{\text{confidence}(A\rightarrow C) - \text{confidence}(A'\rightarrow C)}{Max[ \text{confidence}(A\rightarrow C) , \text{confidence}(A'\rightarrow C)]}, \;\;\; \text{range: } [-1, 1]
$$
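
As a rough check of the formulas above, the sketch below computes every metric from three supports. It follows the formulas as written here rather than any library's internal code; the function name ``rule_metrics`` is hypothetical, and the inputs reuse the cafeteria example with $\text{support}(\text{cake}) = 2/6$ as an assumption.

```python
from fractions import Fraction


def rule_metrics(support_a, support_c, support_ac):
    """Metrics for a rule A -> C, assuming 0 < support_a < 1 and support_c > 0."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    # Confidence of "not A -> C": baskets with C but without A, over baskets without A.
    confidence_not_a = (support_c - support_ac) / (1 - support_a)
    zhang = (confidence - confidence_not_a) / max(confidence, confidence_not_a)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


# A = cookie, C = cake; support(cake) = 2/6 is an assumption for this example.
metrics = rule_metrics(Fraction(3, 6), Fraction(2, 6), Fraction(2, 6))
for name, value in metrics.items():
    print(name, value)
```

Because every basket with ``cake`` also contains ``cookie`` in this example, the confidence of "not cookie -> cake" is $0$ and Zhang's metric reaches its maximum of $1$.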

### Visualizations

To visualize our association rules, we can plot them in a 3D scatter plot. Rules that score high on support, confidence and lift at the same time are the most meaningful ones to investigate further.

In [None]:
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection = '3d')

x = rules['support']
y = rules['confidence']
z = rules['lift']

ax.set_xlabel("Support")
ax.set_ylabel("Confidence")
ax.set_zlabel("Lift")

ax.scatter(x, y, z)
ax.set_title("3D Distribution of Association Rules")

plt.show()

Another way to visualize the relationships between products is a **Network Graph**.

Let’s define a function that draws a network graph and lets us specify how many rules to show.

In [None]:
def draw_network(rules, rules_to_show):
    # Directed graph from NetworkX
    network = nx.DiGraph()

    # Loop through the number of rules to show
    for i in range(rules_to_show):
        rule_node = "R" + str(i)

        # Add a rule node
        network.add_node(rule_node)
        for antecedent in rules.iloc[i]['antecedents']:
            # Add antecedent node and link it to the rule
            network.add_node(antecedent)
            network.add_edge(antecedent, rule_node, weight=2)

        for consequent in rules.iloc[i]['consequents']:
            # Add consequent node and link the rule to it
            network.add_node(consequent)
            network.add_edge(rule_node, consequent, weight=2)

    # Colour rule nodes black and product nodes orange
    rule_pattern = re.compile(r"R\d+")
    color_map = ['black' if rule_pattern.fullmatch(node) else 'orange' for node in network]

    # Position nodes using spring layout
    pos = nx.spring_layout(network, k=16, scale=1)
    # Draw the network graph
    nx.draw(network, pos, node_color=color_map, font_size=8)

    # Shift the label positions upwards
    for p in pos:
        pos[p][1] += 0.12

    nx.draw_networkx_labels(network, pos)
    plt.title("Network Graph for Association Rules")
    plt.show()

draw_network(rules, 10)

### Business Application

Let’s say the grocery store has bought too much ``Whole Milk`` and is now worried that the stock will expire if it cannot be sold in time. To make matters worse, the profit margin on ``Whole Milk`` is so low that the store cannot afford a promotional discount without losing too much of its profit.

One approach that can be proposed is to find out which products drive the sales of ``Whole Milk`` and offer discounts on those products instead.

In [None]:
milk_rules = rules[rules['consequents'].astype(str).str.contains('whole milk')]
milk_rules = milk_rules.sort_values(by=['lift'], ascending=False).reset_index(drop=True)
milk_rules.head()

For instance, we can apply a promotional discount on ``Brandy``, ``Softener``, ``Canned Fruit``, ``Syrup`` and ``Artificial Sweetener``. Some of the associations may seem counter-intuitive, but the rules indicate that purchases of these products are associated with purchases of ``Whole Milk``.

## Another example - Movie Recommender using Apriori Algorithm

_Retrieved from Kaggle: https://www.kaggle.com/code/ankits29/movie-recommendation-with-ml-apriori-explained_

The dataset at hand contains records of movies watched by users and their ratings. Your job is to extract relations between the movies watched by a user and recommend movies based on what that user has previously watched. This is the same idea as YouTube recommending videos by saying that people who watched this video also watched that one, or Netflix and Amazon Prime recommending other movies or series based on your watch history and that of others who have watched the same movies as you.

Let's begin creating a recommendation system for movies.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Load the data

In [None]:
ratings_df = pd.read_csv('ratings_small.csv')
movies_df = pd.read_csv('movies_metadata.csv', low_memory=False)

In [None]:
ratings_df.head()

In [None]:
movies_df.head()

The ratings dataframe contains information of userId, the movieId of the movie watched by that user, the rating given by the user and timestamp.

The movies dataframe contains the information of the movies like movieId, title, genre and so on.

In [None]:
plt.figure(figsize=(10,5))
ax = sns.countplot(data=ratings_df, x='rating')
labels = (ratings_df['rating'].value_counts().sort_index())
plt.title('Distribution of Ratings')
plt.xlabel('Ratings')

for i,v in enumerate(labels):
    ax.text(i, v+100, str(v), horizontalalignment='center', size=14, color='black')
plt.show()

The ratings distribution shows that there are relatively few low ratings. This may be because most users who didn't like a movie didn't care enough to rate it. Keep this in mind, as it will be useful later: you wouldn't want to recommend movies with a relatively low number of ratings, since users probably didn't like them.

### Clean the Data

You can see that in the movies dataframe, there are a few records with a NaN title. These don't serve your purpose, as you cannot recommend movies without a title. You can remove these records.

In [None]:
title_mask = movies_df['title'].isna()

In [None]:
movies_df = movies_df.loc[~title_mask]

You also need to merge the two dataframes so that each record carries both the userId and the title of the movie watched by that user. If you know SQL, you might be familiar with the concept of a join. You can merge the two dataframes on a common column, the movie ID (``movieId`` in the ratings dataframe, ``id`` in the movies dataframe). As a result, each record of the ratings dataframe is combined with the corresponding details of the movie from the movies dataframe.

Before merging, you need to convert the ``id`` column of the movies dataframe from string to int so that it matches the type used in the ratings dataframe.

In [None]:
movies_df = movies_df.astype({'id': 'int64'})

In [None]:
df = pd.merge(ratings_df, movies_df[['id', 'title']], left_on='movieId', right_on='id')
df.head()

The ``id`` column duplicates ``movieId``, and the timestamp is not important for this problem, so you can drop the two.

In [None]:
df.drop(['timestamp', 'id'], axis=1, inplace=True)

The apriori model needs data in a format such that the userId forms the index, the columns are the movie titles and the values can be 1 or 0 depending on whether that user has watched the movie of the corresponding column. The resulting data is like a user's watchlist, for each userId, having 1 in columns of the movies that the user has watched and 0 otherwise.

You can achieve this by using pivot on the dataframe. To do so you need to first make sure there are no duplicate records for the combination of userId and title.

In [None]:
df = df.drop_duplicates(['userId','title'])

In [None]:
df_pivot = df.pivot(index='userId', columns='title', values='rating').fillna(0)

You need to convert the ratings to 0 or 1. Any positive rating means the user has watched the movie, so compare against 0 first; casting to int beforehand would truncate a fractional rating such as 0.5 to 0 and mark the movie as unwatched.

In [None]:
# 1 if the user rated (and therefore watched) the movie, 0 otherwise
df_pivot = (df_pivot > 0).astype('int64')

In [None]:
df_pivot.head()

Your data looks to be ready now.

### Train the Model

The apriori model estimates how likely a user is to watch movie M2 if they have already watched movie M1. It does so by computing support, confidence and lift for different combinations of movies.

* __Support__ of a movie M1 is the probability of users watching movie M1: out of all users, the fraction who have movie M1 in their watchlist.
* __Confidence__ of a rule M1 -> M2 is, out of all users who have watched movie M1, the fraction who have also watched movie M2. It is denoted as Confidence(M1 -> M2).
* __Lift__ is the ratio of the rule's __confidence__ to the __support__ of the consequent: Lift(M1 -> M2) = Confidence(M1 -> M2) / Support(M2).

From the definition of support, you know that Support(M2) is the likelihood of users watching movie M2 if you recommend it to all the users.

While Confidence(M1 -> M2), is the likelihood of users watching movie M2 if you recommend it to only the users who have already watched movie M1. In confidence you recommend the movie to a subset of population.

Lift, then, measures the increase in the likelihood of users watching movie M2 when you recommend it to that subset rather than to the entire population. A high lift (well above 1) suggests there is some relation between the two movies: users who have watched movie M1 are more likely than the average user to watch movie M2.
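
As a worked illustration with made-up numbers (not taken from this dataset), suppose $10\%$ of all users have watched M2, while $30\%$ of the users who watched M1 have also watched M2. Then:

$$
\text{Lift}(M1 \rightarrow M2) = \frac{\text{Confidence}(M1 \rightarrow M2)}{\text{Support}(M2)} = \frac{0.30}{0.10} = 3
$$

Under these assumed figures, recommending M2 only to users who watched M1 would reach an audience three times as likely to watch it as the general population.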

This is for just one pair; the model has to compute this for every possible combination of movies to recommend the movies that the user will most likely watch.

Looks like your model has to do a lot of computation.

You can ease its job by setting a minimum support threshold. As you have seen, low ratings are relatively rare, which suggests that disliked movies tend to collect few ratings, so you don't want the model to bother recommending them. Setting a minimum support ensures that at least some percentage of users have watched a movie before it is considered.

In [None]:
from mlxtend.frequent_patterns import apriori

frequent_itemset = apriori(df_pivot, min_support=0.07, use_colnames=True)

In [None]:
frequent_itemset.head()

The apriori algorithm has given you the support. Using ``association_rules``, you can compute the other parameters like confidence and lift.

In [None]:
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemset, metric="lift", min_threshold=1)

In [None]:
rules.head()

### Interpret the Results

Let's sort the results in descending order of lift, so that the movie the user is most likely to watch is recommended first.

In [None]:
df_res = rules.sort_values(by=['lift'], ascending=False)
df_res.head()

Let's see what your model recommends to someone who has watched **Men in Black II**.

In [None]:
df_MIB = df_res[df_res['antecedents'].apply(lambda x: len(x) ==1 and next(iter(x)) == 'Men in Black II')]

In [None]:
df_MIB = df_MIB[df_MIB['lift'] > 2]

In [None]:
df_MIB.head()

You have a bunch of recommendations there. Let's build a list of unique movies in descending order of lift.

In [None]:
movies = df_MIB['consequents'].values

movie_list = []
for movie in movies:
    for title in movie:
        if title not in movie_list:
            movie_list.append(title)

In [None]:
movie_list[0:10]

Great! You have the top 10 movies that the user is most likely to watch. The result looks convincing to me.
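
The order-preserving de-duplication used to build that list can also be sketched more compactly with ``dict.fromkeys``. The consequents below are hypothetical placeholders standing in for the ``consequents`` column, assumed to be already sorted by descending lift.

```python
# Hypothetical consequents, already sorted by descending lift.
consequents = [
    frozenset({"Movie A"}),
    frozenset({"Movie B"}),
    frozenset({"Movie A"}),
    frozenset({"Movie C"}),
]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so duplicates are dropped without disturbing the lift ranking.
movie_list = list(dict.fromkeys(title for itemset in consequents for title in itemset))
print(movie_list[:10])  # ['Movie A', 'Movie B', 'Movie C']
```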

We have our own recommendation system now.