### Market Basket Analysis



#### Import Libraries

In [None]:
# Core data libraries
import numpy as np
import pandas as pd
import os

# Frequent-itemset mining and rule generation from mlxtend
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

import mlxtend as ml

#### Read Data

In [None]:
df = pd.read_csv('../input/datasets-for-appiori/basket_analysis.csv')
df.head()

#### Drop Irrelevant Column

In [None]:
# 'Unnamed: 0' is typically a leftover index column from a saved CSV
df.drop(columns='Unnamed: 0', inplace=True)

In [None]:
df.head()

#### Apriori Analysis

In [None]:
df.info()

In [None]:
apriori(df, min_support=0.20)[1:25]

The numbers in the itemsets column of the table are column positions (0-15), not product names. Position 0 is Apple, position 1 is Bread, and position 14 is Yogurt, matching the df.info() output above:
<br><br>
 0 Apple 999 non-null bool<br>
 1 Bread 999 non-null bool<br>
 2 Butter 999 non-null bool<br>
...
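
The position-to-name lookup described above can be sketched by hand. This is only an illustration: the helper name `itemset_to_names` is hypothetical, and the list holds just the three column names shown in the excerpt (in the notebook, `list(df.columns)` would supply the full list).

```python
# Hypothetical illustration: only the three column names visible in the
# df.info() excerpt are used; list(df.columns) would give the full list.
columns = ["Apple", "Bread", "Butter"]


def itemset_to_names(itemset, columns):
    """Translate a frozenset of column positions into a frozenset of names."""
    return frozenset(columns[i] for i in itemset)


# Positions, as they appear in the itemsets column when use_colnames=False
example_itemset = frozenset({0, 2})
print(sorted(itemset_to_names(example_itemset, columns)))  # ['Apple', 'Butter']
```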

In [None]:
# apriori returns frequent itemsets (not rules), so count those
print("Number of frequent itemsets:", len(apriori(df, min_support=0.15)))

Now, by passing use_colnames=True to apriori, we switch from item (product) numbers to item (product) names.

In [None]:
apriori(df, min_support=0.20, use_colnames=True)[1:25]

The table above contains itemsets of one and two items. After setting the min_support value (0.20) and generating the frequent itemsets, we build the association rules table using the metric we are interested in (confidence, lift, conviction, etc.). Here we choose confidence as the metric, with a minimum threshold of 0.20 (20%).

In [None]:
frequent_itemsets = apriori(df, min_support=0.20, use_colnames=True)
rules1 = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.20)
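
To make the role of min_support concrete, here is a minimal brute-force sketch of support counting. It is not the Apriori algorithm itself (no candidate pruning) and not mlxtend's implementation; the transactions and the function names `support` and `frequent_itemsets` are invented for illustration.

```python
from itertools import combinations

# Toy transactions, invented purely for illustration (not the notebook's data).
transactions = [
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
    {"Bread", "Butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force enumeration of itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


for itemset, s in frequent_itemsets(transactions, min_support=0.5).items():
    print(sorted(itemset), s)
```

With a threshold of 0.5, the pair Bread and Butter (3 of 5 transactions) is kept, while Bread and Milk (2 of 5) is dropped.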

#### Association Rules

The association_rules function generates association rules from the frequent itemsets.

*For more information about association_rules: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/*

In [None]:
print("Number of Rules:", len(rules1))

**10 Rules Based on the Confidence Metric:**

In [None]:
rules1 = rules1.sort_values(['confidence'], ascending=False)

#rules1[1:11]
rules1

**Comment 1:** If we examine the row with index 0 (the rule from Ice cream to Butter):
* Support: Ice cream and Butter appear together in about 21% (0.207) of the transactions,
* Confidence: about 50% (0.504878) of the transactions that contain Ice cream also contain Butter,
* Lift: Butter is 1.20 times as likely to appear in baskets containing Ice cream as in baskets overall,
* Leverage: the two items appear together 0.03 (about 3 percentage points) more often than would be expected if they were purchased independently,
* Conviction: a value of 1.17 (greater than 1) indicates that Butter is moderately dependent on Ice cream being in the basket.
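
The five metrics in the list above can be reproduced from raw counts. The sketch below uses the standard definitions (the same ones documented for association_rules); the function name `rule_metrics` and the counts passed to it are invented for illustration and are not taken from the dataset.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Compute rule metrics for antecedent -> consequent from transaction counts."""
    support_a = n_antecedent / n_total   # P(antecedent)
    support_c = n_consequent / n_total   # P(consequent)
    support = n_both / n_total           # P(antecedent and consequent)
    confidence = support / support_a     # P(consequent | antecedent)
    lift = confidence / support_c        # 1.0 means independence
    leverage = support - support_a * support_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = (
        float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    )
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Invented counts: 100 baskets, 40 with A, 50 with C, 25 with both
for name, value in rule_metrics(100, 40, 50, 25).items():
    print(f"{name}: {value:.4f}")
```

A lift above 1, a positive leverage, and a conviction above 1 all point in the same direction: the two items co-occur more often than independence would predict.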

Now let's add the number of items in the antecedents and consequents as new columns and look at the first 5 rows:

In [None]:
# Count the items on each side of every rule
rules1["antecedent_len"] = rules1["antecedents"].apply(len)
rules1["consequents_len"] = rules1["consequents"].apply(len)
rules1.head(5)  # [1:6] would skip the top-ranked rule

The same steps can be repeated for the other metrics. For example, with the lift metric:

In [None]:
rules2 = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules2 = rules2.sort_values(['lift'], ascending=False)
rules2[1:6]

In [None]:
rules2["antecedent_len"] = rules2["antecedents"].apply(lambda x: len(x))
rules2["consequents_len"] = rules2["consequents"].apply(lambda x: len(x))
rules2[1:6]

**Filtering for Generated Rule Sets**

Filter 1: Let's see the first 10 records with an antecedent length of at least 1, a confidence of at least 0.20, and a lift greater than 1.

In [None]:
# head(10) keeps the top-ranked rule; [1:10] would skip it and return only 9 rows
rules1[(rules1['antecedent_len'] >= 1) &
       (rules1['confidence'] >= 0.20) &
       (rules1['lift'] > 1)].sort_values(['confidence'], ascending=False).head(10)

Filter 2: Similarly, the records whose antecedent is exactly Kidney Beans, sorted by confidence:

In [None]:
rules1[rules1['antecedents'] == {'Kidney Beans'}].sort_values(['confidence'], ascending=False)

Filter 3: The records whose consequent is exactly Butter, sorted by confidence:

In [None]:
rules1[rules1['consequents'] == {'Butter'}].sort_values(['confidence'], ascending=False)
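
Filters 2 and 3 compare each frozenset against a one-item set, so they match only rules whose antecedent (or consequent) is exactly that item. A minimal plain-Python sketch of the difference between an exact match and a "contains" match follows; the rule list is invented for illustration and does not come from the notebook's results.

```python
# Invented rules for illustration; mlxtend stores both sides as frozensets.
rules = [
    {"antecedents": frozenset({"Milk"}), "consequents": frozenset({"Butter"})},
    {"antecedents": frozenset({"Bread", "Milk"}), "consequents": frozenset({"Butter"})},
    {"antecedents": frozenset({"Butter"}), "consequents": frozenset({"Milk"})},
]

# Exact match: a frozenset equals a set when both hold the same elements
exact = [r for r in rules if r["antecedents"] == {"Milk"}]

# Containment match: the antecedent includes Milk, possibly with other items
containing = [r for r in rules if {"Milk"} <= r["antecedents"]]

print(len(exact), len(containing))  # 1 2
```

On a rules DataFrame, the containment variant could be written as a boolean mask such as `rules1['antecedents'].apply(lambda s: 'Butter' in s)`, which would matter once a lower support threshold produces multi-item antecedents.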