# BigBasket Data Analysis
### Market basket analysis / association analysis of a customer dataset
A practical implementation of market basket analysis in Python.
The technique is mainly used for recommendations in retail scenarios (online or offline),
typically to identify upsell, cross-sell, or recommendation opportunities.

In [1]:
#loading necessary packages
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
#reading data from a BigBasket dataset of customers
data = pd.read_csv('bigBasket.csv')

In [3]:
data.head()

Unnamed: 0,Member,Order,SKU,Created On,Description,Quantity
0,M09736,6468572,34993740,41904.94792,Other Sauces,4
1,M09736,6468572,15669800,41904.94792,Cashews,5
2,M09736,6468572,34989501,41904.94792,Other Dals,3
3,M09736,6468572,7572303,41904.94792,Namkeen,3
4,M09736,6468572,15669856,41904.94792,Sugar,2


## Data Preparation
### Data cleaning

In [4]:
data['Description'] = data['Description'].str.strip()  # strip leading and trailing whitespace from 'Description'
data.dropna(axis=0, subset=['Order'], inplace=True)  # drop rows with a missing 'Order' number
data['Order'] = data['Order'].astype('str')  # convert 'Order' numbers to strings
data = data[~data['Order'].str.contains('C')]  # drop credit orders (order numbers containing 'C'), if any
data.head()

Unnamed: 0,Member,Order,SKU,Created On,Description,Quantity
0,M09736,6468572,34993740,41904.94792,Other Sauces,4
1,M09736,6468572,15669800,41904.94792,Cashews,5
2,M09736,6468572,34989501,41904.94792,Other Dals,3
3,M09736,6468572,7572303,41904.94792,Namkeen,3
4,M09736,6468572,15669856,41904.94792,Sugar,2


### Record Count for Product Description 

In [5]:
data['Description'].value_counts()

Other Vegetables             4606
Beans                        4549
Root Vegetables              4303
Other Dals                   3272
Organic F&V                  3113
                             ... 
Lip Care                        1
Foot Care                       1
Office Stationery               1
Womens Deo                      1
Dishwash Liquids & Pastes       1
Name: Description, Length: 216, dtype: int64

### Building the transaction basket

In [6]:
mybasket= (data.groupby(['Order','Description'])['Quantity'].sum().unstack().reset_index().fillna(0).set_index('Order'))

In [7]:
#viewing transaction basket
mybasket.head()

Order,After Shave,Agarbatti,Almonds,Aluminium Foil & Cling Wrap,Antiseptics,Avalakki / Poha,Ayurvedic,Ayurvedic Food,Baby Care Accessories,Baby Cereal,...,Vanaspati,Veg & Fruit,Vermicelli,Vinegar,Wafers,Washing Bars,Whole Grains,Whole Spices,Womens Deo,Yogurt & Lassi
6422558,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6422636,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0
6423338,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0
6423534,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6423959,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In the transaction basket, a value of 0.0 means the product was absent from the 'Order'; a positive value is the total quantity of that product in the 'Order'.
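To make the reshaping step concrete, here is a minimal sketch of the same `groupby` / `unstack` pattern on a tiny, made-up table. The `toy` data and the name `toy_basket` are illustrative assumptions, not part of the BigBasket dataset.

```python
import pandas as pd

# Hypothetical mini dataset: two orders, three products
toy = pd.DataFrame({
    "Order": ["1", "1", "2", "2", "2"],
    "Description": ["Sugar", "Cashews", "Sugar", "Sugar", "Namkeen"],
    "Quantity": [2, 5, 1, 3, 4],
})

# One row per order, one column per product, summed quantity as the value
toy_basket = (
    toy.groupby(["Order", "Description"])["Quantity"]
    .sum()
    .unstack()   # products become columns
    .fillna(0)   # products missing from an order get 0
)

print(toy_basket)
```

Order `2` lists Sugar twice, so its quantities are added together, and products that never appear in an order end up as 0.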

### Defining a function that converts all values less than or equal to 0 to 0 and all values greater than or equal to 1 to 1

This leaves only 0s and 1s in the dataframe, which is the input format the association analysis algorithm expects.

In [9]:
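def my_encode_units(x):
    # 1 if the product is in the order (quantity >= 1), otherwise 0
    return 1 if x >= 1 else 0

# element-wise encoding; on pandas >= 2.1, DataFrame.map is the non-deprecated equivalent of applymap
my_basket_sets = mybasket.applymap(my_encode_units)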
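As an alternative sketch, the same 0/1 encoding can be written as a vectorized comparison instead of an element-wise function. The small `toy_basket` frame below is a made-up stand-in for `mybasket`, used only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for mybasket: summed quantities per order and product
toy_basket = pd.DataFrame(
    {"Cashews": [5.0, 0.0], "Sugar": [2.0, 4.0]},
    index=pd.Index(["1", "2"], name="Order"),
)

# True where the quantity is at least 1, then cast to 0/1 integers
toy_sets = (toy_basket >= 1).astype(int)

print(toy_sets)
```

On a basket with a couple of hundred product columns this avoids one Python function call per cell, which may be noticeably faster.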

## Training Model

### Using Apriori on the basket sets

In [12]:
#generating frequent itemsets
my_frequent_itemsets = apriori(my_basket_sets, min_support=0.07, use_colnames=True)
#keeps only itemsets with a support of at least 0.07, i.e. present in at least 7% of orders
#my_frequent_itemsets => DataFrame of frequent itemsets with their support
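To illustrate what the `min_support` threshold measures, the sketch below computes support by hand on a tiny 0/1 table. The four made-up orders in `toy_sets` are an assumption for illustration; support is simply the fraction of orders that contain every item in the itemset.

```python
import pandas as pd

# Hypothetical one-hot basket: 4 orders, 2 products
toy_sets = pd.DataFrame({
    "Banana": [1, 1, 0, 1],
    "Beans":  [1, 0, 1, 1],
})

# Support of a single item: share of orders containing it
support_banana = toy_sets["Banana"].mean()

# Support of the itemset {Banana, Beans}: share of orders containing both
support_both = toy_sets[["Banana", "Beans"]].all(axis=1).mean()

print(support_banana, support_both)
```

With a threshold of 0.07, an itemset is kept only if this fraction is at least 0.07.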

In [13]:
#generating rules from the frequent itemsets above
my_rules = association_rules(my_frequent_itemsets, metric="lift", min_threshold=1)

In [15]:
#viewing the first 100 rules
my_rules.head(100)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Banana),(Beans),0.260284,0.399070,0.122094,0.469079,1.175431,0.018222,1.131864
1,(Beans),(Banana),0.399070,0.260284,0.122094,0.305946,1.175431,0.018222,1.065790
2,(Banana),(Brinjals),0.260284,0.272445,0.084655,0.325240,1.193782,0.013742,1.078243
3,(Brinjals),(Banana),0.272445,0.260284,0.084655,0.310722,1.193782,0.013742,1.073176
4,(Banana),(Gourd & Cucumber),0.260284,0.300346,0.086682,0.333028,1.108815,0.008507,1.049001
...,...,...,...,...,...,...,...,...,...
95,(Other Vegetables),"(Beans, Gourd & Cucumber)",0.427805,0.183975,0.131632,0.307692,1.672466,0.052927,1.178702
96,"(Beans, Root Vegetables)",(Gourd & Cucumber),0.236437,0.300346,0.113032,0.478064,1.591711,0.042019,1.340497
97,"(Beans, Gourd & Cucumber)",(Root Vegetables),0.183975,0.414093,0.113032,0.614388,1.483694,0.036849,1.519419
98,"(Root Vegetables, Gourd & Cucumber)",(Beans),0.168356,0.399070,0.113032,0.671388,1.682382,0.045846,1.828692


##### Example: 

In [16]:
my_basket_sets['Banana'].sum()

2183

In [17]:
my_basket_sets['Gourd & Cucumber'].sum()

2519

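Here the rule Banana => Gourd & Cucumber (row 4 of the rules above) has support 0.087, confidence 0.333, and lift 1.109. Because the lift is above 1, the two items appear together more often than they would if they were bought independently, so we can recommend item B => 'Gourd & Cucumber' to someone who buys 'Banana'.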
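As a rough check on how these metrics relate, the sketch below recomputes confidence, lift, and leverage for Banana => Gourd & Cucumber from the values printed above. The total number of orders is not shown in the notebook, so `n_orders` is inferred from the item count and its support and should be read as an assumption.

```python
# Values copied from the outputs above
banana_orders = 2183        # my_basket_sets['Banana'].sum()
gourd_orders = 2519         # my_basket_sets['Gourd & Cucumber'].sum()
support_banana = 0.260284   # antecedent support
support_gourd = 0.300346    # consequent support
support_both = 0.086682     # support of {Banana, Gourd & Cucumber}

# Assumption: total order count inferred from count / support
n_orders = round(banana_orders / support_banana)

# confidence(A => B) = support(A and B) / support(A)
confidence = support_both / support_banana

# lift(A => B) = confidence(A => B) / support(B)
lift = confidence / support_gourd

# leverage(A => B) = support(A and B) - support(A) * support(B)
leverage = support_both - support_banana * support_gourd

print(n_orders, round(confidence, 4), round(lift, 4), round(leverage, 4))
```

The recomputed values agree with the `confidence`, `lift`, and `leverage` columns of row 4 up to rounding.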

## Making Recommendations
### Filtering Rules based on lift and confidence

In [41]:
my_rules[(my_rules['lift'] >=2) & (my_rules['confidence'] >= 0.4)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
37,(Moong Dal),(Other Dals),0.134852,0.232741,0.077262,0.572944,2.461723,0.045877,1.796625
47,(Toor Dal),(Other Dals),0.152259,0.232741,0.074401,0.488645,2.099522,0.038964,1.500443
135,"(Beans, Brinjals, Other Vegetables)",(Gourd & Cucumber),0.111601,0.300346,0.070824,0.634615,2.112949,0.037305,1.914843
136,"(Gourd & Cucumber, Brinjals)","(Beans, Other Vegetables)",0.141409,0.244664,0.070824,0.500843,2.047062,0.036226,1.513223
139,"(Brinjals, Other Vegetables)","(Gourd & Cucumber, Beans)",0.172529,0.183975,0.070824,0.410504,2.231303,0.039083,1.384277
140,"(Beans, Brinjals)","(Gourd & Cucumber, Other Vegetables)",0.162394,0.195183,0.070824,0.436123,2.234433,0.039127,1.427293
165,"(Gourd & Cucumber, Root Vegetables)","(Beans, Other Vegetables)",0.168356,0.244664,0.086086,0.511331,2.08993,0.044895,1.545701
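One possible way to turn filtered rules into a lookup is sketched below. The helper `recommend` and the three-row `rules` frame are hypothetical; the rows reuse values from the tables above, with antecedents and consequents stored as frozensets, as in the output of `association_rules`.

```python
import pandas as pd

# Hypothetical excerpt of the rules table (values copied from the outputs above)
rules = pd.DataFrame({
    "antecedents": [frozenset({"Moong Dal"}), frozenset({"Toor Dal"}), frozenset({"Banana"})],
    "consequents": [frozenset({"Other Dals"}), frozenset({"Other Dals"}), frozenset({"Gourd & Cucumber"})],
    "confidence": [0.572944, 0.488645, 0.333028],
    "lift": [2.461723, 2.099522, 1.108815],
})

def recommend(rules, item, min_lift=2.0, min_confidence=0.4):
    """Return consequents of rules whose antecedents contain `item`, strongest lift first."""
    mask = (
        rules["antecedents"].apply(lambda s: item in s)
        & (rules["lift"] >= min_lift)
        & (rules["confidence"] >= min_confidence)
    )
    hits = rules[mask].sort_values("lift", ascending=False)
    return [sorted(c) for c in hits["consequents"]]

print(recommend(rules, "Moong Dal"))
print(recommend(rules, "Banana"))
```

With the thresholds used in the filter above, 'Moong Dal' yields 'Other Dals', while the Banana rule is excluded because its lift and confidence fall below the cut-offs.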


# Result:
### The rules filtered above, derived from the transactions, give these recommendations, selected on support, lift, and confidence.