## Apriori Based Recommendation Engine

### Based on support, confidence and lift

Support refers to the popularity of an item. It is calculated as the number of transactions containing that item divided by the total number of transactions.

Confidence refers to the likelihood that an item B is also bought if item A is bought. It can be calculated by finding the number of transactions where A and B are bought together, divided by the total number of transactions where A is bought. Mathematically, it can be represented as:

Confidence(A → B) = (Transactions containing both (A and B))/(Transactions containing A)


Lift refers to how much more often B is sold when A is sold, relative to how often B is sold overall.
Lift(A → B) is calculated by dividing Confidence(A → B) by Support(B).
Mathematically, it can be represented as:

Lift(A → B) = (Confidence(A → B))/(Support(B))

Interpreting an association rule by lift:

- lift = 1 → There is no association between A and B.
- lift < 1 → A and B are unlikely to be bought together.
- lift > 1 → The greater the lift, the greater the likelihood of buying both products together.
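The three definitions above can be checked by hand on a tiny example. The sketch below uses a made-up list of five transactions (not taken from the sales data), and the helper names `support`, `confidence` and `lift` are chosen here for illustration only.

```python
# Toy transactions (made up for illustration; not taken from the sales data).
transactions = [
    {"Latte", "Hazelnut syrup"},
    {"Latte", "Croissant"},
    {"Latte", "Hazelnut syrup", "Croissant"},
    {"Cappuccino"},
    {"Cappuccino", "Croissant"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """Confidence(A -> B) = Support(A and B) / Support(A)."""
    return support(set(a) | set(b)) / support(a)

def lift(a, b):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    return confidence(a, b) / support(b)

a, b = {"Latte"}, {"Hazelnut syrup"}
print(support(a))        # 3 of the 5 transactions contain Latte
print(confidence(a, b))  # 2 of the 3 Latte transactions also contain Hazelnut syrup
print(lift(a, b))        # above 1, so the pair co-occurs more often than chance
```

Dividing the two supports gives the same confidence as dividing the raw transaction counts, because the total number of transactions cancels out.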

In [None]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import association_rules,apriori

## Read Dataset

In [2]:
sales_reciepts = pd.read_csv('./datasets/201904 sales reciepts.csv')
sales_reciepts.head()

Unnamed: 0,transaction_id,transaction_date,transaction_time,sales_outlet_id,staff_id,customer_id,instore_yn,order,line_item_id,product_id,quantity,line_item_amount,unit_price,promo_item_yn
0,7,2019-04-01,12:04:43,3,12,558,N,1,1,52,1,2.5,2.5,N
1,11,2019-04-01,15:54:39,3,17,781,N,1,1,27,2,7.0,3.5,N
2,19,2019-04-01,14:34:59,3,17,788,Y,1,1,46,2,5.0,2.5,N
3,32,2019-04-01,16:06:04,3,12,683,N,1,1,23,2,5.0,2.5,N
4,33,2019-04-01,19:18:37,3,17,99,Y,1,1,34,1,2.45,2.45,N


In [3]:
product = pd.read_csv('./datasets/product.csv')
product.head()

Unnamed: 0,product_id,product_group,product_category,product_type,product,product_description,unit_of_measure,current_wholesale_price,current_retail_price,tax_exempt_yn,promo_yn,new_product_yn
0,1,Whole Bean/Teas,Coffee beans,Organic Beans,Brazilian - Organic,It's like Carnival in a cup. Clean and smooth.,12 oz,14.4,$18.00,Y,N,N
1,2,Whole Bean/Teas,Coffee beans,House blend Beans,Our Old Time Diner Blend,Out packed blend of beans that is reminiscent ...,12 oz,14.4,$18.00,Y,N,N
2,3,Whole Bean/Teas,Coffee beans,Espresso Beans,Espresso Roast,Our house blend for a good espresso shot.,1 lb,11.8,$14.75,Y,N,N
3,4,Whole Bean/Teas,Coffee beans,Espresso Beans,Primo Espresso Roast,Our primium single source of hand roasted beans.,1 lb,16.36,$20.45,Y,N,N
4,5,Whole Bean/Teas,Coffee beans,Gourmet Beans,Columbian Medium Roast,A smooth cup of coffee any time of day.,1 lb,12.0,$15.00,Y,N,N


## Data Wrangling

### Merge data

In [4]:
dataset = pd.merge(sales_reciepts[['transaction_id','transaction_date','sales_outlet_id','customer_id','product_id','quantity']],product[['product_id','product_category','product']],on='product_id',how='left')

dataset.head()

Unnamed: 0,transaction_id,transaction_date,sales_outlet_id,customer_id,product_id,quantity,product_category,product
0,7,2019-04-01,3,558,52,1,Tea,Traditional Blend Chai Rg
1,11,2019-04-01,3,781,27,2,Coffee,Brazilian Lg
2,19,2019-04-01,3,788,46,2,Tea,Serenity Green Tea Rg
3,32,2019-04-01,3,683,23,2,Coffee,Our Old Time Diner Blend Rg
4,33,2019-04-01,3,99,34,1,Coffee,Jamaican Coffee River Sm


In [6]:
dataset[dataset['product'].str.contains('Dark chocolate')]['product'].unique()

array(['Dark chocolate Lg', 'Dark chocolate Rg', 'Dark chocolate'],
      dtype=object)

In [7]:
dataset['product'].nunique()

80

In [8]:
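# Strip the size suffixes (regular, small, large) so that sizes of the same product are merged
dataset['product'] = dataset['product'].str.replace(' Rg', '', regex=False)
dataset['product'] = dataset['product'].str.replace(' Sm', '', regex=False)
dataset['product'] = dataset['product'].str.replace(' Lg', '', regex=False)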

In [9]:
dataset['product'].nunique()

45
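The replacements above remove " Rg", " Sm" and " Lg" wherever they appear in a name. If only a trailing size suffix should be removed, an anchored regular expression is one option. The sketch below shows the pattern with the standard `re` module on a few product names; `strip_size` is a hypothetical helper, and the same pattern could presumably be passed to `Series.str.replace(..., regex=True)`.

```python
import re

# Matches a trailing size suffix only: " Rg", " Sm" or " Lg" at the end of the name.
SIZE_SUFFIX = re.compile(r" (Rg|Sm|Lg)$")

def strip_size(name):
    """Return the product name without a trailing size suffix."""
    return SIZE_SUFFIX.sub("", name)

names = ["Dark chocolate Lg", "Dark chocolate Rg", "Dark chocolate", "Brazilian Lg"]
print(sorted({strip_size(n) for n in names}))
```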

### Choose Product Subset

In [11]:
products_to_take =['Cappuccino', 'Latte', 'Espresso shot',  \
                     'Dark chocolate','Sugar Free Vanilla syrup', 'Chocolate syrup',\
                    'Carmel syrup', 'Hazelnut syrup', 'Ginger Scone',  \
                    'Chocolate Croissant', 'Jumbo Savory Scone', 'Cranberry Scone', 'Hazelnut Biscotti',\
                    'Croissant', 'Almond Croissant', 'Oatmeal Scone', 'Chocolate Chip Biscotti',\
                    'Ginger Biscotti']

In [15]:
len(products_to_take)

18

In [13]:
# Copy the filtered frame so that later column assignments do not act on a view
dataset = dataset[dataset['product'].isin(products_to_take)].copy()
dataset

Unnamed: 0,transaction_id,transaction_date,sales_outlet_id,customer_id,product_id,quantity,product_category,product
16,108,2019-04-01,3,65,40,1,Coffee,Cappuccino
17,112,2019-04-01,3,90,37,2,Coffee,Espresso shot
20,127,2019-04-01,3,116,41,2,Coffee,Cappuccino
21,134,2019-04-01,3,189,38,2,Coffee,Latte
22,135,2019-04-01,3,131,40,1,Coffee,Cappuccino
...,...,...,...,...,...,...,...,...
49883,742,2019-04-29,8,0,41,1,Coffee,Cappuccino
49884,742,2019-04-29,8,0,74,1,Bakery,Ginger Biscotti
49885,746,2019-04-29,8,0,37,2,Coffee,Espresso shot
49886,746,2019-04-29,8,0,71,1,Bakery,Chocolate Croissant


In [14]:
dataset['product'].nunique()

18

In [18]:
dataset[['product','product_category']].drop_duplicates().reset_index(drop=True)


Unnamed: 0,product,product_category
0,Cappuccino,Coffee
1,Espresso shot,Coffee
2,Latte,Coffee
3,Dark chocolate,Drinking Chocolate
4,Oatmeal Scone,Bakery
5,Jumbo Savory Scone,Bakery
6,Chocolate Chip Biscotti,Bakery
7,Ginger Biscotti,Bakery
8,Chocolate Croissant,Bakery
9,Hazelnut Biscotti,Bakery


### Clean Transactions

In [19]:
dataset['transaction'] = dataset['transaction_id'].astype(str) +"_"+  dataset['customer_id'].astype(str)

In [22]:
num_of_items_for_each_transaction = dataset['transaction'].value_counts().reset_index()
num_of_items_for_each_transaction

Unnamed: 0,transaction,count
0,209_0,31
1,206_0,30
2,204_0,27
3,208_0,25
4,203_0,24
...,...,...
8381,135_523,1
8382,130_157,1
8383,121_465,1
8384,118_748,1


In [30]:
valid_transactions = num_of_items_for_each_transaction[(num_of_items_for_each_transaction['count']>1)]['transaction']
valid_transactions

0          209_0
1          206_0
2          204_0
3          208_0
4          203_0
          ...   
2641      2403_0
2642     398_204
2643    208_8327
2644      1373_0
2645    361_5672
Name: transaction, Length: 2646, dtype: object

In [32]:
dataset = dataset[dataset['transaction'].isin(valid_transactions)]
dataset

Unnamed: 0,transaction_id,transaction_date,sales_outlet_id,customer_id,product_id,quantity,product_category,product,transaction
34,199,2019-04-01,3,112,41,2,Coffee,Cappuccino,199_112
35,199,2019-04-01,3,112,79,1,Bakery,Jumbo Savory Scone,199_112
54,296,2019-04-01,3,328,39,1,Coffee,Latte,296_328
55,296,2019-04-01,3,328,79,1,Bakery,Jumbo Savory Scone,296_328
64,357,2019-04-01,3,530,41,2,Coffee,Cappuccino,357_530
...,...,...,...,...,...,...,...,...,...
49880,736,2019-04-29,8,0,38,1,Coffee,Latte,736_0
49883,742,2019-04-29,8,0,41,1,Coffee,Cappuccino,742_0
49884,742,2019-04-29,8,0,74,1,Bakery,Ginger Biscotti,742_0
49885,746,2019-04-29,8,0,37,2,Coffee,Espresso shot,746_0


In [33]:
dataset.shape

(10189, 9)
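Note that `value_counts` counts rows, so a transaction with two rows of the same product (for example two sizes merged into one name) also passes the `count > 1` filter. A stricter variant would require at least two distinct products per transaction. The sketch below illustrates that variant in plain Python on toy rows; it is an alternative under that assumption, not the filter used in this notebook.

```python
from collections import defaultdict

# Toy (transaction, product) rows, loosely modelled on the data above.
rows = [
    ("199_112", "Cappuccino"),
    ("199_112", "Jumbo Savory Scone"),
    ("108_65", "Cappuccino"),
    ("9_0", "Latte"),
    ("9_0", "Latte"),
]

# Collect the distinct products seen in each transaction.
products_by_transaction = defaultdict(set)
for transaction, product in rows:
    products_by_transaction[transaction].add(product)

# Keep transactions with at least two different products.
valid = {t for t, products in products_by_transaction.items() if len(products) > 1}
filtered_rows = [row for row in rows if row[0] in valid]
print(filtered_rows)
```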

### Product Trend

In [35]:
dataset['product_category'].value_counts()

product_category
Bakery                3800
Coffee                3174
Flavours              2246
Drinking Chocolate     947
Packaged Chocolate      22
Name: count, dtype: int64

In [36]:
dataset['product'].value_counts()

product
Cappuccino                  1290
Latte                       1256
Dark chocolate               969
Chocolate Croissant          636
Espresso shot                628
Sugar Free Vanilla syrup     605
Chocolate syrup              568
Carmel syrup                 561
Hazelnut syrup               512
Ginger Scone                 417
Jumbo Savory Scone           357
Croissant                    355
Chocolate Chip Biscotti      352
Cranberry Scone              350
Almond Croissant             347
Hazelnut Biscotti            338
Oatmeal Scone                334
Ginger Biscotti              314
Name: count, dtype: int64

#### Cappuccino is the highest selling item

### Popularity Recommendation Engine

#### For questions such as "What is the best-selling item?", "What do you suggest I buy?" or "What is your best item?"


In [None]:
popularity_recommendation = dataset.groupby(['product','product_category']).count().reset_index()
popularity_recommendation 

Unnamed: 0,product,product_category,transaction_id,transaction_date,sales_outlet_id,customer_id,product_id,quantity,transaction
0,Almond Croissant,Bakery,347,347,347,347,347,347,347
1,Cappuccino,Coffee,1290,1290,1290,1290,1290,1290,1290
2,Carmel syrup,Flavours,561,561,561,561,561,561,561
3,Chocolate Chip Biscotti,Bakery,352,352,352,352,352,352,352
4,Chocolate Croissant,Bakery,636,636,636,636,636,636,636
5,Chocolate syrup,Flavours,568,568,568,568,568,568,568
6,Cranberry Scone,Bakery,350,350,350,350,350,350,350
7,Croissant,Bakery,355,355,355,355,355,355,355
8,Dark chocolate,Drinking Chocolate,947,947,947,947,947,947,947
9,Dark chocolate,Packaged Chocolate,22,22,22,22,22,22,22


In [42]:
popularity_recommendation = popularity_recommendation[['product','product_category','transaction_id']]
popularity_recommendation = popularity_recommendation.rename(columns={'transaction_id':"no_of_transactions"})

In [43]:
popularity_recommendation

Unnamed: 0,product,product_category,no_of_transactions
0,Almond Croissant,Bakery,347
1,Cappuccino,Coffee,1290
2,Carmel syrup,Flavours,561
3,Chocolate Chip Biscotti,Bakery,352
4,Chocolate Croissant,Bakery,636
5,Chocolate syrup,Flavours,568
6,Cranberry Scone,Bakery,350
7,Croissant,Bakery,355
8,Dark chocolate,Drinking Chocolate,947
9,Dark chocolate,Packaged Chocolate,22


In [45]:
popularity_recommendation.to_csv('api/recommendation_objects/popularity_recommendation.csv')
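Because the table is grouped by both product and category, Dark chocolate appears twice (947 + 22 rows). One way to answer a "most popular item" question is to sum the counts per product and take the top k. The sketch below does this in plain Python with a few rows copied from the table above; `top_k` is a hypothetical helper, not part of the saved file or any API.

```python
from collections import Counter

# (product, product_category, no_of_transactions) rows copied from the table above.
rows = [
    ("Almond Croissant", "Bakery", 347),
    ("Cappuccino", "Coffee", 1290),
    ("Carmel syrup", "Flavours", 561),
    ("Dark chocolate", "Drinking Chocolate", 947),
    ("Dark chocolate", "Packaged Chocolate", 22),
]

def top_k(rows, k):
    """Sum counts per product across categories and return the k most frequent."""
    totals = Counter()
    for product, _category, count in rows:
        totals[product] += count
    return totals.most_common(k)

print(top_k(rows, 2))
```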

### Apriori Recommendation Engine

In [46]:
train_basket = (dataset.groupby(['transaction','product'])['product'].count().reset_index(name='count'))

In [47]:
train_basket

Unnamed: 0,transaction,product,count
0,1000_0,Dark chocolate,1
1,1000_0,Oatmeal Scone,1
2,1001_8306,Cappuccino,1
3,1001_8306,Carmel syrup,1
4,1002_0,Carmel syrup,1
...,...,...,...
8391,9_0,Croissant,1
8392,9_0,Dark chocolate,3
8393,9_0,Ginger Scone,2
8394,9_0,Latte,3


In [50]:
my_basket = train_basket.pivot_table(index='transaction',columns='product',values='count',aggfunc='sum').fillna(0)
my_basket

product,Almond Croissant,Cappuccino,Carmel syrup,Chocolate Chip Biscotti,Chocolate Croissant,Chocolate syrup,Cranberry Scone,Croissant,Dark chocolate,Espresso shot,Ginger Biscotti,Ginger Scone,Hazelnut Biscotti,Hazelnut syrup,Jumbo Savory Scone,Latte,Oatmeal Scone,Sugar Free Vanilla syrup
transaction,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1000_0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1001_8306,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1002_0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1004_5383,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1005_0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
998_5530,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
998_5793,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0
998_601,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
99_0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0


In [51]:
def encode_units(x):
  """Convert a purchase count into a 0/1 presence flag."""
  if x <= 0:
    return 0
  return 1

my_basket_sets = my_basket.applymap(encode_units)
my_basket_sets.head(10)



product,Almond Croissant,Cappuccino,Carmel syrup,Chocolate Chip Biscotti,Chocolate Croissant,Chocolate syrup,Cranberry Scone,Croissant,Dark chocolate,Espresso shot,Ginger Biscotti,Ginger Scone,Hazelnut Biscotti,Hazelnut syrup,Jumbo Savory Scone,Latte,Oatmeal Scone,Sugar Free Vanilla syrup
transaction,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1000_0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
1001_8306,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1002_0,0,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0
1004_5383,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
1005_0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0
1005_5559,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
1006_0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1007_8375,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0
1008_0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0
1009_5183,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1


In [52]:

frequent_items = apriori(my_basket_sets, min_support = 0.05,use_colnames = True)
frequent_items.head()

Unnamed: 0,support,itemsets
0,0.115646,(Almond Croissant)
1,0.388889,(Cappuccino)
2,0.191232,(Carmel syrup)
3,0.112623,(Chocolate Chip Biscotti)
4,0.135676,(Chocolate Croissant)
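To give an intuition for what `apriori` computes, here is a minimal pure-Python sketch of the level-wise idea: find frequent single items, join them into larger candidates, and discard any candidate with an infrequent subset. It is a simplified illustration on made-up baskets, not mlxtend's implementation, and `apriori_sketch` is a hypothetical name.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Made-up baskets for illustration.
baskets = [
    {"Latte", "Hazelnut syrup"},
    {"Latte", "Hazelnut syrup", "Croissant"},
    {"Latte", "Croissant"},
    {"Cappuccino"},
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most larger candidates are never counted.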


In [53]:
rules_basket = association_rules(frequent_items, metric = "lift", min_threshold = 1)
rules_basket.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Cappuccino),(Almond Croissant),0.388889,0.115646,0.053288,0.137026,1.184874,0.008314,1.024775,0.255319
1,(Almond Croissant),(Cappuccino),0.115646,0.388889,0.053288,0.460784,1.184874,0.008314,1.133333,0.176432
2,(Dark chocolate),(Almond Croissant),0.277022,0.115646,0.057445,0.207367,1.793115,0.025409,1.115717,0.611791
3,(Almond Croissant),(Dark chocolate),0.115646,0.277022,0.057445,0.496732,1.793115,0.025409,1.436567,0.500152
4,(Almond Croissant),(Latte),0.115646,0.382086,0.054422,0.470588,1.231629,0.010235,1.167171,0.21266
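As a sanity check, the confidence, lift and leverage columns can be recomputed from the support columns. The sketch below uses the values from row 0 of the table above, (Cappuccino) → (Almond Croissant), and should agree with the table up to rounding of the displayed inputs.

```python
# Values copied from row 0 of the rules table: (Cappuccino) -> (Almond Croissant).
support_both = 0.053288         # support of the pair
antecedent_support = 0.388889   # support of Cappuccino
consequent_support = 0.115646   # support of Almond Croissant

# Confidence(A -> B) = Support(A and B) / Support(A)
confidence = support_both / antecedent_support
# Lift(A -> B) = Confidence(A -> B) / Support(B)
lift = confidence / consequent_support
# Leverage = Support(A and B) - Support(A) * Support(B)
leverage = support_both - antecedent_support * consequent_support

print(round(confidence, 4), round(lift, 4), round(leverage, 4))
```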


In [54]:

# Save Rules Basket
rules_basket.to_pickle('rules_basket.pkl')

In [64]:
rules_basket[rules_basket['antecedents']=={'Latte'}].sort_values('confidence',ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
71,(Latte),(Sugar Free Vanilla syrup),0.382086,0.200302,0.108844,0.284866,1.422182,0.032311,1.118249,0.480415
33,(Latte),(Carmel syrup),0.382086,0.191232,0.10771,0.281899,1.474121,0.034643,1.12626,0.520509
45,(Latte),(Chocolate syrup),0.382086,0.188964,0.103175,0.27003,1.428997,0.030974,1.111053,0.485842
69,(Latte),(Hazelnut syrup),0.382086,0.17158,0.101285,0.265084,1.544961,0.035727,1.127231,0.570848
53,(Latte),(Croissant),0.382086,0.114135,0.057067,0.149357,1.308605,0.013458,1.041407,0.381651
39,(Latte),(Chocolate Croissant),0.382086,0.135676,0.055178,0.144411,1.064381,0.003338,1.010209,0.097889
67,(Latte),(Ginger Scone),0.382086,0.133409,0.055178,0.144411,1.082472,0.004204,1.01286,0.1233
5,(Latte),(Almond Croissant),0.382086,0.115646,0.054422,0.142433,1.231629,0.010235,1.031236,0.304358
65,(Latte),(Ginger Biscotti),0.382086,0.106198,0.054044,0.141444,1.33189,0.013467,1.041053,0.403272
49,(Latte),(Cranberry Scone),0.382086,0.116024,0.051398,0.13452,1.159416,0.007067,1.021371,0.222518


Among the rules with Latte as the antecedent, Hazelnut syrup has the highest lift (1.54), while Sugar Free Vanilla syrup has the highest confidence (0.285). The top-k recommendations for Latte can be picked from this table.


### Save in JSON Format

In [59]:
product_categories = dataset[['product','product_category']].drop_duplicates().set_index('product').to_dict()['product_category']

In [60]:
product_categories

{'Cappuccino': 'Coffee',
 'Jumbo Savory Scone': 'Bakery',
 'Latte': 'Coffee',
 'Chocolate Chip Biscotti': 'Bakery',
 'Espresso shot': 'Coffee',
 'Hazelnut Biscotti': 'Bakery',
 'Chocolate Croissant': 'Bakery',
 'Dark chocolate': 'Packaged Chocolate',
 'Cranberry Scone': 'Bakery',
 'Croissant': 'Bakery',
 'Almond Croissant': 'Bakery',
 'Ginger Biscotti': 'Bakery',
 'Oatmeal Scone': 'Bakery',
 'Ginger Scone': 'Bakery',
 'Chocolate syrup': 'Flavours',
 'Hazelnut syrup': 'Flavours',
 'Carmel syrup': 'Flavours',
 'Sugar Free Vanilla syrup': 'Flavours'}

In [61]:
recommendations_json = {}

antecedents = rules_basket['antecedents'].unique()
for antecedent in antecedents:
    df_rec = rules_basket[rules_basket['antecedents']==antecedent]
    df_rec = df_rec.sort_values('confidence',ascending=False)
    key = "_".join(sorted(antecedent))  # sort so that multi-item keys are deterministic
    recommendations_json[key] = []
    for _, row in df_rec.iterrows():
        rec_objects =row['consequents']
        for rec_object in rec_objects:
            already_exists = False
            for current_rec_object in recommendations_json[key]:
                if rec_object == current_rec_object['product']:
                    already_exists=True
            if already_exists:
                continue
            
            rec = {'product':rec_object, 
                   'product_category':product_categories[rec_object],
                   'confidence': row['confidence']
                  }
            recommendations_json[key].append(rec)

In [None]:
# Top 4 recommendations for Latte
recommendations_json['Latte'][:4]

[{'product': 'Sugar Free Vanilla syrup',
  'product_category': 'Flavours',
  'confidence': 0.28486646884273},
 {'product': 'Carmel syrup',
  'product_category': 'Flavours',
  'confidence': 0.2818991097922849},
 {'product': 'Chocolate syrup',
  'product_category': 'Flavours',
  'confidence': 0.27002967359050445},
 {'product': 'Hazelnut syrup',
  'product_category': 'Flavours',
  'confidence': 0.26508407517309596}]

In [66]:
import json
with open('api/recommendation_objects/apriori_recommendations.json', 'w') as json_file:
    json.dump(recommendations_json, json_file)
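A consumer of the saved file would typically load it and look up a product key. The sketch below assumes the structure built above (a product key mapping to a list of dicts sorted by confidence) and uses an in-memory JSON string with two of the Latte entries as a stand-in for the file; `recommend` is a hypothetical helper, not part of any existing API.

```python
import json

# A small stand-in for the saved file, using two of the Latte entries shown above.
saved = json.dumps({
    "Latte": [
        {"product": "Sugar Free Vanilla syrup", "product_category": "Flavours",
         "confidence": 0.28486646884273},
        {"product": "Carmel syrup", "product_category": "Flavours",
         "confidence": 0.2818991097922849},
    ]
})

def recommend(recommendations, product, k=4):
    """Return up to k recommended product names, or an empty list if unknown."""
    return [rec["product"] for rec in recommendations.get(product, [])[:k]]

recommendations = json.loads(saved)
print(recommend(recommendations, "Latte", k=1))
print(recommend(recommendations, "Unknown item"))
```

Returning an empty list for unknown products lets the caller fall back to the popularity-based recommendations.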