## Market Basket Analysis with Apriori

Well hello there! 🙃 In this notebook I'll explore the groceries dataset and apply market basket analysis to identify products that are frequently purchased together and to construct association rules. But first, it is always good practice to do some exploratory data analysis.

![supermarket](https://www.savethestudent.org/uploads/Supermarket-Savings-1.jpg)

### Importing the data & first look 

Here we are just importing the necessary libraries, loading the data and taking a first look at its contents. 

In [None]:
#Importing the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import permutations
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

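#The 'seaborn-pastel' style was renamed 'seaborn-v0_8-pastel' in matplotlib 3.6, so use whichever name is available
pastel = 'seaborn-v0_8-pastel' if 'seaborn-v0_8-pastel' in plt.style.available else 'seaborn-pastel'
plt.style.use(pastel)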

In [None]:
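#Reading the csv file and using the date as index
#dayfirst=True assumes the Date column is written as DD-MM-YYYY; drop it if your copy of the file uses another format
gro = pd.read_csv('../input/groceries-dataset/Groceries_dataset.csv', index_col='Date', parse_dates=True, dayfirst=True)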

In [None]:
#Taking a look at the data shape and sorting by date
print(gro.shape)
gro.sort_index(inplace=True)
gro.head()

In [None]:
#Number of unique customers and items
print(gro.Member_number.nunique())
print(gro.itemDescription.nunique())

### Data Preparation and Visualization

Now, let's prepare the data to create some visualizations. We'll generate the following plots:

* Most purchased items 
* Least purchased items
* Total count of items sold per month each year
* Total count of items sold per day each year
* Total count of items sold per weekday each year
* Top customers (customers who bought the most)

In [None]:
#Creating new columns based on the date column
gro['year'] = gro.index.year
gro['month'] = gro.index.month
gro['day'] = gro.index.day
gro['weekday'] = gro.index.strftime('%A')
gro['monthName'] = gro.index.strftime('%B')
gro.head()

In [None]:
#Chart 1 - Most purchased items
gro['itemDescription'].value_counts().head(20).plot.bar(figsize=(8, 6), alpha=0.8, color='violet')
plt.title('20 most purchased items', size=15)
plt.ylabel('Quantity')

In [None]:
#Chart 2 - Least purchased items
gro['itemDescription'].value_counts().tail(20).plot.bar(figsize=(8, 6), alpha=0.8, color='lightseagreen')
plt.title('20 least purchased items', size=15)
plt.ylabel('Quantity')

In [None]:
#Chart 3 - Total items sold per month each year
plt.figure(figsize=(8,6))
ax = sns.countplot(x='monthName', hue='year', palette='GnBu', data=gro)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.title('Total items sold per month each year', size=15)
plt.xlabel('Month')
plt.ylabel('Quantity')

In [None]:
#Chart 4 - Total items sold per day each year
plt.figure(figsize=(15,8))
ax = sns.countplot(x='day', hue='year', palette='YlOrBr', data=gro)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.title('Total items sold per day each year', size=15)
plt.xlabel('Day')
plt.ylabel('Quantity')

In [None]:
#Chart 5 - Total items sold per weekday each year  
plt.figure(figsize=(8,6))
ax = sns.countplot(x='weekday', hue='year', palette='RdPu', data=gro)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.title('Total items sold per weekday each year', size=15)
plt.xlabel('Weekday')
plt.ylabel('Quantity')

In [None]:
#Chart 6 - Top customers (the 20 members with the most items purchased)
plt.figure(figsize=(8,6))
ax = sns.countplot(x='Member_number', palette='winter', data=gro, alpha=0.6, order=gro.Member_number.value_counts().iloc[:20].index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.title('Top Customers', size=15)
plt.xlabel('Customer')
plt.ylabel('Quantity Purchased')

### Data modelling for Association Rules and Apriori

Here we'll put the data into a form suitable for the Apriori algorithm. The first step is to generate the transactions: the set of items bought by a single customer on a given day. Then we'll one-hot encode them, so that each item becomes a boolean column indicating whether a transaction contains it.
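
To make the one-hot idea concrete before using the library, here is a minimal plain-Python sketch. The three toy baskets and the names `toy_transactions`, `columns` and `onehot_rows` are made up for illustration; they are not part of the groceries dataset or of mlxtend.

```python
# Toy transactions (made-up data, not taken from the groceries dataset)
toy_transactions = [
    ["whole milk", "rolls/buns"],
    ["soda", "whole milk"],
    ["rolls/buns"],
]

# Every distinct item becomes one column; sorting keeps the order stable
columns = sorted({item for basket in toy_transactions for item in basket})

# One row per transaction: True where the basket contains the column's item
onehot_rows = [[item in basket for item in columns] for basket in toy_transactions]

print(columns)
for row in onehot_rows:
    print(row)
```

Each row has one boolean per item, which is the same shape of table the encoder cell builds for the real transactions.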

In [None]:
#Grouping by customer and date to create transactions
transactions = gro.groupby(['Member_number', 'Date'])['itemDescription'].unique().reset_index()

In [None]:
#Taking a look at the number of transactions
print(transactions.shape)
transactions.head()

In [None]:
#Separating the transactions as a list of lists and taking a look at the first few
trsct = [list(i) for i in transactions.itemDescription.values]
trsct[:5]

In [None]:
#one hot encoding and creating the encoded Dataframe
encoder = TransactionEncoder().fit(trsct)
onehot = encoder.transform(trsct)
dfonehot = pd.DataFrame(onehot, columns=encoder.columns_)
dfonehot.head()

### The Apriori algorithm

Apriori proceeds by identifying the frequent individual items in the data and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the data. It prunes any candidate itemset that contains an infrequent subset, since such an itemset cannot be frequent itself.
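
The level-wise idea can be sketched in a few lines of plain Python. This is only an illustrative toy, not how mlxtend implements `apriori`; the function name `apriori_sketch` and the baskets in `toy_baskets` are made up for the example.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: keep the single items that are frequent
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidates of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only the candidates that are frequent themselves
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Made-up baskets for illustration
toy_baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "butter"},
    {"bread"},
]
frequent = apriori_sketch(toy_baskets, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this toy data, {bread, butter} appears in only one of four baskets, so the candidate {milk, bread, butter} is pruned without ever being counted.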

### Metrics 

A metric measures how strong or useful a rule is. We'll work with three metrics: **support**, **confidence** and **lift**.

**support:** Measures the share of transactions that contain an item (or a set of items). We can calculate it as follows:

support = number of transactions with item / total transactions

**confidence:** Says how likely item Y is to be purchased when item X is purchased. We can calculate it as follows:

confidence{X->Y} = support{X,Y} / support{X}

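**lift:** Says how much more likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. A lift above 1 means X and Y appear together more often than we would expect if they were independent. We can calculate it as follows:

lift{X->Y} = support{X,Y} / (support{X} * support{Y})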
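
As a quick sanity check of the three definitions, here is a small worked example in plain Python. The five baskets in `toy` are made up, and `support` is a helper written just for this sketch; note that lift divides by the product of the two individual supports.

```python
# Made-up baskets for illustration
toy = [
    {"milk", "coffee"},
    {"milk", "coffee", "bread"},
    {"milk"},
    {"coffee"},
    {"bread"},
]

def support(itemset):
    # Share of baskets that contain every item of the itemset
    return sum(itemset <= basket for basket in toy) / len(toy)

sup_x = support({"milk"})             # milk is in 3 of 5 baskets
sup_y = support({"coffee"})           # coffee is in 3 of 5 baskets
sup_xy = support({"milk", "coffee"})  # both are in 2 of 5 baskets

confidence = sup_xy / sup_x           # confidence{milk -> coffee}
lift = sup_xy / (sup_x * sup_y)       # lift{milk -> coffee}

print(round(sup_xy, 3), round(confidence, 3), round(lift, 3))
```

Here the lift comes out slightly above 1, so in this toy data milk and coffee show up together a little more often than independence would predict.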

### Association Rules

Association rules are if-then rules that describe connections between items. Classically, a rule is kept only if it meets a specified minimum support and a specified minimum confidence at the same time.

Each rule contains one or more antecedents and one or more consequents.

e.g. {milk} --> {coffee}

**Multi-antecedent rule**
e.g. {apple, banana} --> {orange}

**Multi-consequent rule**
e.g. {bread} --> {peanut butter, jam}
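
To see where single-, multi-antecedent and multi-consequent rules come from, the sketch below lists every way one frequent itemset can be split into an antecedent and a consequent. The helper `candidate_rules` is hypothetical and written only for this illustration; it is not an mlxtend function, and it does no filtering by support, confidence or lift.

```python
from itertools import combinations

def candidate_rules(itemset):
    """List every (antecedent, consequent) split of an itemset."""
    items = sorted(itemset)
    rules = []
    # The antecedent takes 1 .. len-1 items, the consequent takes the rest
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

toy_rules = candidate_rules({"bread", "peanut butter", "jam"})
for antecedent, consequent in toy_rules:
    print(set(antecedent), "-->", set(consequent))
```

A three-item set gives six candidate rules; a metric threshold then decides which of them are worth keeping.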

In [None]:
#Applying the apriori algorithm with a min_support of 0.002
frequent_itemsets = apriori(dfonehot, min_support=0.002, use_colnames=True)
print(len(frequent_itemsets))

In [None]:
#Compute association rules with a lift threshold of 1
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)

In [None]:
#Printing our final rules
rules