In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv("../input/groceries-dataset-for-market-basket-analysismba/Groceries data.csv")

### **Dataset**

> **We'll use the Groceries dataset from Kaggle to perform a Market Basket Analysis. This analysis can be used to recommend products based on items that are often purchased together, or to check how often products that are meant to be bought together actually are. For example, if a company offers a shirt+shorts combo, this analysis shows how well the combo is performing.**

In [None]:
df.head()

> **We'll create a date column by combining the year, month and day columns, which makes the data easier to manage.**

In [None]:
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df.head()

> **We need to group on Member_number and purchase date. Putting the two in a tuple enforces that a purchase is defined as one member visiting the store on one specific day. This would be easier with an orderId field, but since the dataset has none, we construct the member_date field to act as our own orderId. We also create a "quantity" field, which is redundant here because every row represents exactly 1 unit of a product.**

In [None]:
df['member_date'] = list(zip(df['Member_number'], df['date'].dt.date))
df['quantity'] = 1
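
> **As a quick check of the idea above, here is a minimal sketch on a made-up table (the rows below are invented for illustration and are not taken from the Groceries file). It builds the same (member, day) tuple and counts how many distinct orders it produces.**

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Member_number": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-01", "2015-01-01"]
    ),
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda", "whole milk"],
})

# Same construction as in the notebook: one (member, day) tuple per row.
toy["member_date"] = list(zip(toy["Member_number"], toy["date"].dt.date))

# Each distinct tuple is treated as one order.
n_orders = toy["member_date"].nunique()

# Number of item rows that fall into each order.
items_per_order = toy["member_date"].value_counts()

print(n_orders)
print(items_per_order)
```

> **Member 1 appears on two different days, so this toy table yields three orders from two members.**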

In [None]:
df.shape

> **Below we can see the products in descending order of how often they were bought, and also that the shop has 167 products. Next, we'll build the structure for the analysis by grouping the items by Member_number, year, month and day. It is important to note that this grouping assumes a member made exactly 1 purchase on a given day, with 1 or more items in that purchase.**

In [None]:
df.groupby('itemDescription').size().sort_values(ascending=False)

In [None]:
basket = df.groupby(['member_date', 'itemDescription'])['quantity'].count().unstack().fillna(0)

In [None]:
def convert_values(value):
    """Map a purchase count to a 0/1 flag: 1 if the item is in the order, else 0."""
    return 1 if value >= 1 else 0

In [None]:
basket = basket.applymap(convert_values)
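
> **To make the shape of the basket table concrete, here is a small sketch on invented data. The `order_id` column is a hypothetical stand-in for member_date, and the vectorized comparison `(counts >= 1).astype(int)` is shown as one possible alternative to the element-wise conversion, not as the method used in this notebook.**

```python
import pandas as pd

# Toy purchases, invented for illustration; "order_id" stands in for member_date.
toy = pd.DataFrame({
    "order_id": ["A", "A", "A", "B", "C", "C"],
    "itemDescription": ["whole milk", "whole milk", "soda", "whole milk", "soda", "yogurt"],
})
toy["quantity"] = 1

# One row per order, one column per product, cell = number of rows for that pair.
counts = (
    toy.groupby(["order_id", "itemDescription"])["quantity"]
    .count()
    .unstack()
    .fillna(0)
)

# Collapse counts to presence/absence flags (1 = product is in the order).
toy_basket = (counts >= 1).astype(int)

print(counts)
print(toy_basket)
```

> **Order A contains 'whole milk' twice, so its count is 2 before the conversion and 1 after it. Only the presence of a product in an order matters for the association rules.**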

> **We'll be using the MLxtend package, which offers tools for the calculations needed for the association rules.**

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
basket_items = apriori(basket, min_support = 0.005, use_colnames = True, max_len = 2)

In [None]:
rules = association_rules(basket_items, metric = 'lift')

> **And here we have our association rules for the Market Basket Analysis. Note that we give the most weight to the "confidence" field here, which is the estimated chance of buying the second product (consequent) given that the first one (antecedent) was bought. Since 'whole milk' is the most purchased product in this shop, it will be the consequent for most antecedents, but the sixth row of the table shows that someone who buys 'frankfurter' has a 13% chance of also buying 'other vegetables'.**

In [None]:
rules.sort_values("confidence", ascending=False).head(15)
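
> **To show what the support, confidence and lift columns mean, here is a minimal sketch that computes them by hand with the standard definitions. The transactions and the helper names `support` and `rule_metrics` are invented for illustration, so the numbers do not match the table above.**

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"whole milk", "frankfurter"},
    {"whole milk", "other vegetables"},
    {"frankfurter", "other vegetables"},
    {"whole milk"},
    {"whole milk", "other vegetables", "frankfurter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = s_ac / s_a   # share of antecedent baskets that also hold the consequent
    lift = confidence / s_c   # above 1: the pair co-occurs more often than if independent
    return {"support": s_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"frankfurter"}, {"other vegetables"}, transactions)
print(metrics)
```

> **In this toy data, 2 of the 3 baskets with 'frankfurter' also contain 'other vegetables', so the confidence is about 0.67. Dividing by the consequent's support of 0.6 gives a lift of about 1.11.**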

> **This analysis can also be used to make product recommendations, or to verify whether certain product combos are behaving as expected (for example, whether people are really buying the white shirt+pants combo instead of buying a purple shirt and white pants).**