## Association Rule Mining - Market Basket Analysis with Mlxtend
<b> In this article, we build a small transaction dataset, encode it as a DataFrame, and use the Apriori algorithm to find frequent itemsets and the association rules between them, scored by measures such as confidence and lift.
    
<b> The goal is to identify which items are strongly associated, that is, which items tend to appear together in the same transaction.

<b> Create a list of transactions, where each transaction is a list of items, and store it in the "data" object

In [1]:
data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]

<b> Now install the 'mlxtend' library 

Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks. It provides the `TransactionEncoder` class and the `apriori` and `association_rules` functions used in the rest of this article.

In [9]:
# Install the mlxtend library into the environment of the running kernel.
%pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.21.0-py2.py3-none-any.whl (1.3 MB)
     ---------------------------------------- 1.3/1.3 MB 1.2 MB/s eta 0:00:00
Installing collected packages: mlxtend
Successfully installed mlxtend-0.21.0
Note: you may need to restart the kernel to use updated packages.


<b> Next, convert the list of transactions into the format needed for frequent itemset mining: a one-hot encoded NumPy boolean array with one row per transaction and one column per unique item. 
    
<b> A TransactionEncoder object does this conversion. Via the fit method, the TransactionEncoder learns the unique labels in the dataset, and via the transform method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy boolean array. The fit_transform method does both in one call:

In [10]:
# Import the TransactionEncoder class from the mlxtend.preprocessing module.
from mlxtend.preprocessing import TransactionEncoder

# Create a TransactionEncoder instance named "te".
te = TransactionEncoder()

# Fit the encoder on the data and transform it, storing the result in "te_array".
te_array = te.fit_transform(data)

In [11]:
# Display the "te_array" object.
te_array                        # a NumPy boolean array

array([[ True,  True, False, False,  True, False,  True],
       [ True,  True, False,  True, False,  True, False],
       [False,  True,  True,  True,  True, False,  True],
       [ True, False,  True, False, False,  True,  True],
       [False,  True,  True,  True,  True,  True, False],
       [False,  True,  True,  True, False, False,  True]])
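
To make the encoding step concrete, here is a minimal pure-Python sketch of the same idea: collect the unique items, sort them to fix the column order, then mark each transaction with one boolean per column. The helper name `one_hot_encode` is hypothetical and is not part of mlxtend; it only illustrates what the fit and transform steps produce.

```python
# Pure-Python illustration of one-hot encoding a list of transactions.
# `one_hot_encode` is a hypothetical helper, not an mlxtend function.
data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]


def one_hot_encode(transactions):
    """Return (sorted item labels, list of boolean rows)."""
    # "fit": learn the unique item labels and fix their order
    columns = sorted({item for t in transactions for item in t})
    # "transform": one boolean per column, True if the item is in the basket
    rows = [[col in set(t) for col in columns] for t in transactions]
    return columns, rows


columns, rows = one_hot_encode(data)
print(columns)
for row in rows:
    print(row)
```

The column labels and the boolean rows should line up with the array printed by the TransactionEncoder above.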

<b> Now create a DataFrame from the array above and store it as the 'df' object.

In [14]:
# importing the pandas library as pd 
import pandas as pd

# Create the DataFrame from the array,
# passing the unique column names that correspond to the array's columns via 'te.columns_'
df = pd.DataFrame(te_array, columns=te.columns_)

# print the DataFrame
df

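| | book | bread | eggs | jam | milk | pen | rice |
|---|---|---|---|---|---|---|---|
| 0 | True | True | False | False | True | False | True |
| 1 | True | True | False | True | False | True | False |
| 2 | False | True | True | True | True | False | True |
| 3 | True | False | True | False | False | True | True |
| 4 | False | True | True | True | True | True | False |
| 5 | False | True | True | True | False | False | True |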


## Modelling - Algorithm Implementation
<b> To find the frequent itemsets, we use the apriori function from the mlxtend.frequent_patterns module. No model is trained in this step: the function scans the one-hot encoded DataFrame and returns every itemset whose support (the fraction of transactions that contain it) is at least min_support. Consider the code below:

In [22]:
# Import the apriori function from the mlxtend.frequent_patterns module.
from mlxtend.frequent_patterns import apriori

# Find all itemsets that appear in at least 60% of the transactions.
itemset = apriori(df, min_support=0.6, use_colnames=True)

# print the result
itemset

| | support | itemsets |
|---|---|---|
| 0 | 0.833333 | (bread) |
| 1 | 0.666667 | (eggs) |
| 2 | 0.666667 | (jam) |
| 3 | 0.666667 | (rice) |
| 4 | 0.666667 | (bread, jam) |


<b> In the above code, the first line imports the apriori function. The second line calls it and returns the frequent itemsets (not rules yet). Its main parameters are:
    
   - df : pandas DataFrame in one-hot encoded format.
    
   - min_support (default: 0.5): A float between 0 and 1 for the minimum support of the itemsets returned.
    
   - use_colnames (default: False): If `True`, uses the DataFrame's column names in the returned DataFrame
  instead of column indices.
    
   - max_len (default: None): Maximum length of the itemsets generated. If `None`, all possible itemset lengths are evaluated.

Thresholds on confidence or lift are not parameters of `apriori`. They are applied in the next step, when rules are generated from the frequent itemsets.

It returns a pandas DataFrame with columns ['support', 'itemsets'] containing all itemsets whose support is >= `min_support` and whose length is at most `max_len` (if `max_len` is not None).   
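
The level-wise search that gives Apriori its name can be sketched in plain Python. This is only an illustration under simplifying assumptions, not mlxtend's implementation: `support` and `apriori_sketch` are hypothetical helper names, and the sketch recounts support from the raw transactions instead of using the boolean array. The key idea it shows is the pruning rule: a candidate of size k is only considered if all of its subsets of size k-1 are already frequent.

```python
from itertools import combinations

data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]
transactions = [frozenset(t) for t in data]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(min_support):
    """Hypothetical helper: level-wise search for frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    # Level 1: frequent single items
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Build size-k candidates only from items that are still frequent
        pool = sorted(set().union(*level))
        prev = set(level)
        level = [
            frozenset(c) for c in combinations(pool, k)
            # Apriori pruning: every (k-1)-subset must already be frequent
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
            and support(frozenset(c)) >= min_support
        ]
    return frequent


frequent = apriori_sketch(0.6)
for s, sup in sorted(frequent.items(),
                     key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), round(sup, 6))
```

With a threshold of 0.6 the sketch should find the same five itemsets as the `apriori` call above: bread, eggs, jam, rice, and the pair (bread, jam).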

<b> Now we use the extracted frequent itemsets to create rules. A rule is kept only if a chosen metric reaches a chosen threshold.
    
<b> We use the association_rules function from the mlxtend.frequent_patterns module. It takes the frequent itemsets and returns the association rules derived from them, together with their scores. Consider the code below:

In [23]:
# Import the association_rules function from the mlxtend.frequent_patterns module.
from mlxtend.frequent_patterns import association_rules

# Generate rules from the frequent itemsets, keeping those with confidence >= 0.6.
res = association_rules(itemset, metric='confidence', min_threshold=0.6)

# print the result
res

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (bread) | (jam) | 0.833333 | 0.666667 | 0.666667 | 0.8 | 1.2 | 0.111111 | 1.666667 |
| 1 | (jam) | (bread) | 0.666667 | 0.833333 | 0.666667 | 1.0 | 1.2 | 0.111111 | inf |


In the above code, the first line imports the association_rules function. The second line calls it and returns the rules. It takes the following parameters:

   - df : pandas DataFrame of frequent itemsets with columns ['support', 'itemsets']

   - metric : string (default: 'confidence'): Metric used to decide whether a rule is of interest. Supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'.

   - min_threshold : float (default: 0.8): Minimum value of the chosen metric for a rule to be kept.

It returns a pandas DataFrame of all rules for which metric(rule) >= min_threshold. The columns "antecedents" and "consequents" store the itemsets, and the remaining columns hold the scores:

   -  "antecedent support", "consequent support", "support", "confidence", "lift", "leverage", "conviction".
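
The scoring columns can be checked by hand from the raw transactions. The sketch below uses the standard definitions of confidence, lift, leverage, and conviction; `support` and `rule_metrics` are hypothetical helper names introduced for illustration and are not part of mlxtend.

```python
data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]
transactions = [set(t) for t in data]


def support(items):
    """Fraction of transactions that contain all of `items`."""
    return sum(set(items) <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Hypothetical helper: score the rule antecedent -> consequent."""
    sup_a = support(antecedent)
    sup_c = support(consequent)
    sup_ac = support(set(antecedent) | set(consequent))
    confidence = sup_ac / sup_a          # P(consequent | antecedent)
    lift = confidence / sup_c            # 1.0 would mean independence
    leverage = sup_ac - sup_a * sup_c    # observed minus expected co-occurrence
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = (float('inf') if confidence == 1
                  else (1 - sup_c) / (1 - confidence))
    return {'support': sup_ac, 'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}


bread_to_jam = rule_metrics(['bread'], ['jam'])
jam_to_bread = rule_metrics(['jam'], ['bread'])
for name, metrics in (('bread -> jam', bread_to_jam),
                      ('jam -> bread', jam_to_bread)):
    print(name, {k: f"{v:.6f}" for k, v in metrics.items()})
```

Up to rounding, the values should agree with the two rows returned by `association_rules`, including the infinite conviction for the rule jam -> bread.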

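<b> From the above, with min_support=0.6, metric='confidence', and min_threshold=0.6, the strongest association is between bread and jam. Every transaction that contains jam also contains bread (confidence 1.0), and the lift of 1.2 means the two items appear together 1.2 times as often as would be expected if they were independent. The support and confidence thresholds can be lowered or raised to suit the client's requirements.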
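
To get a feel for how the support threshold changes the result, the following sketch counts the frequent itemsets at a few thresholds by brute force. It is an illustration only: `count_frequent` is a hypothetical helper, and enumerating every combination is feasible here only because the dataset has seven distinct items.

```python
from itertools import combinations

data = [['milk', 'bread', 'rice', 'book'],
        ['bread', 'jam', 'book', 'pen'],
        ['jam', 'milk', 'bread', 'rice', 'eggs'],
        ['rice', 'eggs', 'pen', 'book'],
        ['eggs', 'pen', 'milk', 'bread', 'jam'],
        ['eggs', 'rice', 'bread', 'jam']]
transactions = [set(t) for t in data]
items = sorted(set().union(*transactions))


def count_frequent(min_support):
    """Hypothetical helper: brute-force count of itemsets meeting min_support."""
    count = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            # Support = share of transactions containing the whole candidate
            hits = sum(set(candidate) <= t for t in transactions)
            if hits / len(transactions) >= min_support:
                count += 1
    return count


counts = {threshold: count_frequent(threshold) for threshold in (0.5, 0.6, 0.7)}
for threshold, n_itemsets in counts.items():
    print(threshold, n_itemsets)
```

On this dataset the counts are 14, 5, and 1 itemsets for thresholds 0.5, 0.6, and 0.7: a lower threshold admits more (and weaker) patterns, while a higher one leaves only bread.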