<center><h1> Market Basket Affinity Analysis 🛒🧺 </h1>
    <h2> Apriori Model </h2>
<img src="https://miro.medium.com/max/1200/1*pExRvwQ7fuCgKvuvqaMttQ.png" width="1000" >
</center>

<br><br>


<h3>Navigate to</h3>
    
* [Problem Description](#section-one)
* [Data Modelling and Visualization](#section-two)
* [Apriori Model](#section-three)
* [Association Mapping Visualization](#section-four)
    

<a id="section-one"></a>
<h2>Problem Description</h2>

<h3>Affinity analysis</h3>
<h4>
Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers.
</h4>
<h3>Association Rule Mining</h3>
<h4>
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
</h4>
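The core of this idea is counting which items show up together. The following minimal sketch counts how often each pair of items shares a basket. It runs on a small made-up list of baskets, and the item names are hypothetical rather than taken from the Groceries data:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; each inner list is one transaction
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]

# Count every unordered item pair that appears together in a basket.
# sorted(set(...)) drops duplicate items and gives each pair one canonical order.
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(set(basket)), 2)
)

for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Pairs such as ("bread", "milk") and ("butter", "milk") rise to the top. Those frequent pairs are the kind of co-occurrence that association rules then describe.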
<h3>Apriori Algorithm</h3>
<h4>
Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
</h4>
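The level-wise search described above can be sketched in plain Python. This is an unoptimized illustration and not mlxtend's implementation. `apriori_sketch` is a hypothetical helper that rescans every basket to count support. Before counting, it prunes candidates using the Apriori property: every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item in `itemset`
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent individual items
    singletons = {frozenset([item]) for b in baskets for item in b}
    counted = {s: support(s) for s in singletons}
    current = {s for s, sup in counted.items() if sup >= min_support}
    frequent = {s: counted[s] for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the support threshold
        counted = {c: support(c) for c in candidates}
        current = {c for c, sup in counted.items() if sup >= min_support}
        frequent.update({c: counted[c] for c in current})
        k += 1
    return frequent


# Usage on four made-up baskets
result = apriori_sketch(
    [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"], ["butter", "milk"]],
    min_support=0.5,
)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Here the 3-item candidate {bread, butter, milk} is discarded without being counted, because its subset {bread, butter} is not frequent.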

> **Support**: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

> **Confidence**: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

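> **Lift**: This says how much more likely item Y is to be purchased when item X is purchased, compared with how popular item Y is overall: lift = confidence(X -> Y) / support(Y). A lift above 1 means X and Y are bought together more often than they would be if they were independent. A lift below 1 means they are bought together less often.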
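To make the three measures concrete, the small sketch below computes them directly from their definitions on four made-up baskets. The data and the helper names are illustrative assumptions:

```python
# Hypothetical toy baskets (not from the Groceries dataset)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
n = len(baskets)


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / n


def confidence(x, y):
    """Estimate of P(Y | X): support of X ∪ Y divided by support of X."""
    return support(x | y) / support(x)


def lift(x, y):
    """Confidence of X -> Y divided by the baseline popularity of Y."""
    return confidence(x, y) / support(y)


x, y = {"butter"}, {"milk"}
print(support(x | y), confidence(x, y), lift(x, y))
```

In this example lift(butter -> milk) is above 1, so butter buyers pick up milk more often than milk's overall popularity would suggest. lift(bread -> milk) falls below 1.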



In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# for market basket analysis
! pip install --index-url https://test.pypi.org/simple/ PyARMViz
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
import squarify
import matplotlib
from matplotlib import style
import matplotlib.pyplot as plt
import seaborn as sns
from PyARMViz import PyARMViz
from PyARMViz.Rule import generate_rule_from_dict

sns.set()
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (18, 18)
style.use('ggplot')

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<h2>Loading Data</h2>

In [None]:
basket = pd.read_csv('../input/groceries-dataset/Groceries_dataset.csv')
basket

<a id="section-two"></a>
<h2>Data Modelling and Visualization</h2>

The loaded dataset has 38765 rows, and each row records one item bought by one member (`Member_number`) on one date. The goal is to group these rows into baskets, where a basket is the set of items a customer purchased on a particular day.
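The grouping step is easier to see on a few made-up rows in the same long format. For the one-hot step, this sketch uses `pd.crosstab` in place of mlxtend's `TransactionEncoder` (the encoder used in the next cell), so the resulting table shape can be shown with plain pandas. The rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows in the same long format as Groceries_dataset.csv:
# one row per (member, date, item)
rows = pd.DataFrame({
    "Member_number": [1, 1, 1, 2, 2, 1],
    "Date": ["01-01-2015"] * 5 + ["05-01-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "yogurt", "whole milk", "soda", "yogurt"],
})

# One basket per (member, date) pair, stored as a list of item names
baskets = [
    group["itemDescription"].tolist()
    for _, group in rows.groupby(["Member_number", "Date"])
]

# One-hot table: one row per basket, one boolean column per item
onehot = pd.crosstab(
    index=[rows["Member_number"], rows["Date"]],
    columns=rows["itemDescription"],
).gt(0)

print(baskets)
print(onehot)
```

Member 1 contributes two baskets because they shopped on two different dates. Each basket becomes one row of booleans.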

In [None]:
# Group the rows into one basket per (member, date) pair
transactions = [group['itemDescription'].tolist()
                for _, group in basket.groupby(['Member_number', 'Date'])]
# One-hot encode: one row per basket, one boolean column per item
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
transactions = pd.DataFrame(te_ary, columns=te.columns_)
# Number of baskets containing each item (True counts as 1 when summed)
counts = transactions.sum()
item = pd.DataFrame({'Count': counts.values, 'Item': counts.index})
item = item.sort_values('Count', ascending=False).head(50)
transactions

In [None]:
fig, ax = plt.subplots()
cmap = matplotlib.cm.coolwarm

mini = min(item["Count"])
maxi = max(item["Count"])

norm = matplotlib.colors.Normalize(vmin=mini, vmax=maxi)
colors = [cmap(norm(value)) for value in item["Count"]]

squarify.plot(sizes=item["Count"], label=item["Item"], alpha=0.8, color=colors)
plt.axis('off')
plt.title("Top 50 Frequent Basket Items", fontsize=32)
ttl = ax.title
ttl.set_position([.5, 1.05])

<a id='section-three'></a>
<h2>Apriori Model</h2>
In this section, Apriori is run on the one-hot encoded baskets. Frequent itemsets are mined first, and association rules are then derived from them.
<center><img src='https://uhlibraries.pressbooks.pub/app/uploads/sites/17/2019/11/Screen-Shot-2019-12-26-at-8.17.30-PM.png' width='600'></center>

<h3>Frequent Itemsets</h3>
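The `min_support=0.001` threshold is a fraction of baskets. This notebook uses 14963 as its total number of baskets, so a quick calculation converts the threshold to an absolute count:

```python
import math

num_baskets = 14963   # transaction total used in this notebook
min_support = 0.001   # threshold passed to apriori below

# Support is count / num_baskets, so an itemset is kept when
# count >= min_support * num_baskets; counts are integers, so round up.
min_count = math.ceil(min_support * num_baskets)
print(min_count)  # 15
```

An itemset must therefore appear in at least 15 baskets to be kept. `max_len=5` additionally caps itemsets at five items.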

In [None]:
# Mine every itemset present in at least 0.1% of baskets, up to 5 items long
frequent_itemsets = apriori(transactions, min_support=0.001, use_colnames=True, max_len=5)
# Number of items in each itemset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets

<h3>The table above lists the frequent itemsets; association rules are derived from them below</h3>
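Each rule comes from splitting a frequent itemset into an antecedent X and a consequent Y, then scoring that split with the measures defined earlier. The sketch below uses `rules_from_itemsets`, a simplified hypothetical helper. It is not mlxtend's `association_rules`, which also reports leverage, conviction and other metrics. The helper assumes its input maps every frequent itemset, including all of that itemset's subsets, to its support:

```python
from itertools import combinations


def rules_from_itemsets(frequent):
    """Yield (antecedent, consequent, confidence, lift) for every split of each
    frequent itemset into two non-empty parts.

    `frequent` maps frozenset -> support and must contain every subset of each
    itemset (Apriori's output has this property).
    """
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                y = itemset - x
                conf = supp / frequent[x]
                yield x, y, conf, conf / frequent[y]


# Hypothetical supports, consistent with four toy baskets
frequent = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "milk"}): 0.5,
    frozenset({"butter", "milk"}): 0.5,
}

for x, y, conf, lift in rules_from_itemsets(frequent):
    print(set(x), "->", set(y), round(conf, 3), round(lift, 3))
```

Both directions of each pair become separate rules. They share the same lift but usually differ in confidence.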

In [None]:
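# Derive rules from the frequent itemsets. A lift threshold of 0.001 filters nothing here:
# lift = support(X ∪ Y) / (support(X) * support(Y)) >= support(X ∪ Y) >= 0.001
rules_df = association_rules(frequent_itemsets, metric="lift", min_threshold=0.001)
# Total number of baskets (rows of the one-hot DataFrame): 14963 for this dataset
num_transactions = len(transactions)

In [None]:
# Rebuild each basket as a set so subset checks are fast
baskets = [set(group['itemDescription'])
           for _, group in basket.groupby(['Member_number', 'Date'])]

def count_containing(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for t in baskets if itemset <= t)

In [None]:
# Convert each rule into the dict format read by PyARMViz's generate_rule_from_dict
rules_dict = []
for row in rules_df.itertuples(index=False):
    lhs, rhs = row.antecedents, row.consequents
    rules_dict.append({
        'lhs': tuple(lhs),
        'rhs': tuple(rhs),
        'count_full': count_containing(lhs | rhs),  # baskets containing both sides
        'count_lhs': count_containing(lhs),         # baskets containing the antecedent
        'count_rhs': count_containing(rhs),         # baskets containing the consequent
        'num_transactions': num_transactions,
    })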
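Rescanning all 14963 baskets three times per rule becomes slow when there are many rules. One alternative, sketched here as an assumption and not what this notebook does, is to recover the counts from the support columns that mlxtend's rules table already contains (`antecedent support`, `consequent support`, `support`). This works because each support is a count divided by the number of baskets. `counts_from_supports` is a hypothetical helper:

```python
def counts_from_supports(antecedent_support, consequent_support, support, num_transactions):
    """Convert relative supports back into absolute basket counts.

    Support is count / num_transactions, so multiplying back and rounding
    recovers the integer count (up to floating-point error).
    """
    return {
        "count_full": round(support * num_transactions),
        "count_lhs": round(antecedent_support * num_transactions),
        "count_rhs": round(consequent_support * num_transactions),
    }


# Hypothetical rule: 30 of 14963 baskets contain both sides,
# 120 contain the antecedent and 2000 contain the consequent
n = 14963
print(counts_from_supports(120 / n, 2000 / n, 30 / n, n))
```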
    

<a id='section-four'></a>

<h2>Visualizations</h2>


<h4>We visualize the same association rules with three different plots.</h4>

In [None]:
# Build PyARMViz rule objects from the dicts assembled above
rules = [generate_rule_from_dict(rd) for rd in rules_dict]

<h3>Parallel Category Plot</h3>

In [None]:
PyARMViz.generate_parallel_category_plot(rules)

<h3>Network Plot</h3>

In [None]:
PyARMViz.generate_rule_graph_plotly(rules)

<h3>Rule Strength Plot</h3>

In [None]:
PyARMViz.generate_rule_strength_plot(rules)