### Summary:
* [Problem definition](#problem_definition)
* [Data Analysis](#data_analysis)
* [Market Basket Analysis](#market_basket_analysis)
    * [Item Recommendation](#item_recommendation)
* [Conclusions](#conclusions)
* [References](#references)

In [2]:
# !wget https://raw.githubusercontent.com/dasarpai/DAI-Datasets/main/19560-indian-takeaway-orders.zip

In [3]:
# !rm -rf /content/testdata

In [4]:
# !mkdir testdata
# !mv 19560-indian-takeaway-orders.zip testdata
# !unzip testdata/19560-indian-takeaway-orders.zip -d testdata/

In [5]:
# !pip install unidecode

In [6]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import unidecode
import plotly.express as px

<a id='problem_definition'></a>
### Problem Definition:

This dataset consists of 33k orders from two Indian takeaway restaurants in London, UK. The purpose of this notebook is to increase cross-selling by applying market basket analysis (association rules) to find items that can be recommended while the customer is placing a takeaway order. In general, when two items X and Y are frequently bought together:

* Both X and Y could be placed on the same shelf, so that buyers of one item would be prompted to buy the other;
* Promotional discounts could be applied to only one of the two items;
* Advertisements on X could be targeted to shoppers buying Y;
* X and Y could be combined into a new product.

<a id='data_analysis'></a>
### Data Analysis:

Since we have two data sets, one per restaurant, we merge them into one for easier analysis. To distinguish the orders of the first restaurant from those of the second, we add a `restaurant` column. We also drop the `Total products` column because it is just the sum of the item quantities in each order.

In [7]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, module="ipykernel.ipkernel")

# The filter above suppresses this warning: /usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning:
# `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above

# Suppress the specific FutureWarning
warnings.filterwarnings("ignore", category=FutureWarning, message="Allowing arbitrary scalar fill_value in SparseDtype is deprecated")
# to suppress the warning: <ipython-input-108-2f1a0893fb26>:1: FutureWarning: Allowing arbitrary scalar fill_value in SparseDtype is deprecated. In a future version, the fill_value must be a valid value for the SparseDtype.subtype.
#  sparse_df_items = pd.DataFrame.sparse.from_spmatrix(oht_orders, columns=te.columns_)


In [8]:
path = r'/content/testdata/19560-indian-takeaway-orders/'
orders_first_restaurant = pd.read_csv(path + 'restaurant-1-orders.csv')
orders_first_restaurant['restaurant'] = '1 - restaurant'

orders_second_restaurant = pd.read_csv(path + 'restaurant-2-orders.csv')
orders_second_restaurant['restaurant'] = '2 - restaurant'
orders_second_restaurant.rename(columns={'Order ID':'Order Number'},inplace=True)

orders = pd.concat([orders_first_restaurant,orders_second_restaurant])
orders.drop('Total products',axis=1,inplace=True)

To facilitate the analysis, the column names are converted to lowercase with underscores instead of spaces (done in the `format_columns` cell further down).

In [9]:
orders

Unnamed: 0,Order Number,Order Date,Item Name,Quantity,Product Price,restaurant
0,16118,03/08/2019 20:25,Plain Papadum,2,0.80,1 - restaurant
1,16118,03/08/2019 20:25,King Prawn Balti,1,12.95,1 - restaurant
2,16118,03/08/2019 20:25,Garlic Naan,1,2.95,1 - restaurant
3,16118,03/08/2019 20:25,Mushroom Rice,1,3.95,1 - restaurant
4,16118,03/08/2019 20:25,Paneer Tikka Masala,1,8.95,1 - restaurant
...,...,...,...,...,...,...
119178,8144,13/02/2017 12:59,House Red wine 75cl,1,17.95,2 - restaurant
119179,7463,03/01/2017 19:13,House white wine 75cl,1,17.95,2 - restaurant
119180,6719,24/11/2016 18:35,House Red wine 75cl,1,17.95,2 - restaurant
119181,5251,21/08/2016 17:55,House white wine 75cl,1,17.95,2 - restaurant


In [10]:
orders.info()

<class 'pandas.core.frame.DataFrame'>
Index: 194001 entries, 0 to 119182
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Order Number   194001 non-null  int64  
 1   Order Date     194001 non-null  object 
 2   Item Name      194001 non-null  object 
 3   Quantity       194001 non-null  int64  
 4   Product Price  194001 non-null  float64
 5   restaurant     194001 non-null  object 
dtypes: float64(1), int64(2), object(3)
memory usage: 10.4+ MB


In [11]:
orders.sample(5)

Unnamed: 0,Order Number,Order Date,Item Name,Quantity,Product Price,restaurant
65510,2236,02/04/2016 20:40,Paneer Tikka Masala,1,8.95,1 - restaurant
73566,13287,07/02/2019 16:21,Korma - King Prawn,1,12.95,1 - restaurant
44256,12656,24/12/2018 19:05,Bombay Aloo,1,5.95,1 - restaurant
2696,21170,05/01/2019 19:22,Red Sauce,1,0.5,2 - restaurant
49012,5171,21/04/2017 18:52,Bombay Aloo,1,5.95,1 - restaurant


In [12]:
orders.select_dtypes(include='number').describe()

Unnamed: 0,Order Number,Quantity,Product Price
count,194001.0,194001.0,194001.0
mean,12844.91633,1.246823,5.176943
std,6151.109227,0.740706,3.259806
min,630.0,1.0,0.5
25%,7826.0,1.0,2.95
50%,12268.0,1.0,3.95
75%,17278.0,1.0,8.95
max,25583.0,51.0,17.95


In [13]:
# Use describe() and calculate the sum
desc = orders.select_dtypes(include='number').describe()
sum_row = pd.DataFrame(orders.select_dtypes(include='number').sum()).T  # Calculate the sum and transpose to match describe() format
sum_row.index = ['sum']  # Rename the index to "sum"

# Append the sum row to the describe output
desc_with_sum = pd.concat([desc, sum_row])

print(desc_with_sum)

       Order Number       Quantity  Product Price
count  1.940010e+05  194001.000000   1.940010e+05
mean   1.284492e+04       1.246823   5.176943e+00
std    6.151109e+03       0.740706   3.259806e+00
min    6.300000e+02       1.000000   5.000000e-01
25%    7.826000e+03       1.000000   2.950000e+00
50%    1.226800e+04       1.000000   3.950000e+00
75%    1.727800e+04       1.000000   8.950000e+00
max    2.558300e+04      51.000000   1.795000e+01
sum    2.491927e+09  241885.000000   1.004332e+06


In [14]:
def format_columns(column):
    new_column = ' '.join(column.split())
    new_column = new_column.replace(' ','_')
    return new_column.lower()

orders.columns = [format_columns(c) for c in orders.columns]

In [15]:
orders

Unnamed: 0,order_number,order_date,item_name,quantity,product_price,restaurant
0,16118,03/08/2019 20:25,Plain Papadum,2,0.80,1 - restaurant
1,16118,03/08/2019 20:25,King Prawn Balti,1,12.95,1 - restaurant
2,16118,03/08/2019 20:25,Garlic Naan,1,2.95,1 - restaurant
3,16118,03/08/2019 20:25,Mushroom Rice,1,3.95,1 - restaurant
4,16118,03/08/2019 20:25,Paneer Tikka Masala,1,8.95,1 - restaurant
...,...,...,...,...,...,...
119178,8144,13/02/2017 12:59,House Red wine 75cl,1,17.95,2 - restaurant
119179,7463,03/01/2017 19:13,House white wine 75cl,1,17.95,2 - restaurant
119180,6719,24/11/2016 18:35,House Red wine 75cl,1,17.95,2 - restaurant
119181,5251,21/08/2016 17:55,House white wine 75cl,1,17.95,2 - restaurant


As we can see, there are no null values in `orders`.

In [16]:
orders.isna().mean()

Unnamed: 0,0
order_number,0.0
order_date,0.0
item_name,0.0
quantity,0.0
product_price,0.0
restaurant,0.0


In [17]:
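def stats_summary(data):
    # Note: product_price is summed row by row, without weighting by quantity
    sales = data['product_price'].sum()
    quantity = data['quantity'].sum()
    # Orders are identified by order_number alone
    orders_count = data['order_number'].nunique()

    avg_price_order = sales / orders_count        # summed prices per order
    avg_price_food = sales / quantity             # summed prices per unit sold
    count_food_orders = quantity / orders_count   # units sold per order

    return {'avg_price_order': avg_price_order,
            'avg_price_food': avg_price_food,
            'count_food_orders': count_food_orders
           }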

def print_summary(data):

    stats = stats_summary(data)

    for key, value in stats.items():
        print(f'{key}: {value:.3}')

As we can see, the average order price differs by \\$ 1.44 between the two restaurants, and the average item price in restaurant 1 is \\$ 0.16 higher than in restaurant 2. However, restaurant 2 sells more items per order, which raises its average order price. For the combined data, the average order price is \\$ 43.6, with about 10.5 items per order at \\$ 4.15 per item on average. These combined figures are higher than either restaurant's own, which suggests that the two restaurants share some order numbers and that those orders are counted as one in the combined data.

In [18]:
print('\n1 - restaurant\n')
print_summary(orders[orders['restaurant']=='1 - restaurant'])
print('\n2 - restaurant\n')
print_summary(orders[orders['restaurant']=='2 - restaurant'])

print('\nTotal\n')
print_summary(orders)


1 - restaurant

avg_price_order: 29.5
avg_price_food: 4.25
count_food_orders: 6.94

2 - restaurant

avg_price_order: 31.0
avg_price_food: 4.09
count_food_orders: 7.57

Total

avg_price_order: 43.6
avg_price_food: 4.15
count_food_orders: 10.5
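
A combined average order price above both per-restaurant averages can arise when two restaurants reuse the same order numbers and orders are counted by `order_number` alone. Whether that is what happens in this dataset is an assumption. The minimal sketch below uses made-up rows to show the difference between counting orders by number and counting them by `(restaurant, order_number)`.

```python
# Toy rows: (restaurant, order_number, item). All values are made up for illustration.
rows = [
    ("1 - restaurant", 100, "pilau rice"),
    ("1 - restaurant", 100, "naan"),
    ("2 - restaurant", 100, "korma"),       # same number, different restaurant
    ("2 - restaurant", 101, "plain rice"),
]

# Counting by order number alone merges the two unrelated orders numbered 100.
orders_by_number = {number for _, number, _ in rows}

# Counting by (restaurant, order number) keeps them apart.
orders_by_key = {(restaurant, number) for restaurant, number, _ in rows}

print(len(orders_by_number))  # 2
print(len(orders_by_key))     # 3
```

With fewer distinct orders in the denominator, per-order averages come out larger, which would be consistent with the combined figures above.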


As we can see, an item appears 509 times on average. The median (113) is much lower, so the mean is pulled up by a few very frequent items. The maximum is 13093 for Pilau Rice, which corresponds to 56.8% of the 23041 unique orders. Relative to the number of unique orders, the mean frequency is about 2.21% (mean / total unique orders) and the median is 0.49%.

In [19]:
orders.groupby('item_name').size().reset_index(name='quantity')

Unnamed: 0,item_name,quantity
0,Achar Chicken,127
1,Achar Lamb,77
2,Aloo Brinjal,146
3,Aloo Chaat,511
4,Aloo Dupiaza,259
...,...,...
376,Vindaloo - chicken-tikka,6
377,Vindaloo - king-prawn,1
378,Vindaloo - lamb,12
379,Vindaloo - prawn,2


In [20]:
items_frequence = orders.groupby('item_name').size().reset_index(name='numo') #numo - number of orders
items_frequence

Unnamed: 0,item_name,numo
0,Achar Chicken,127
1,Achar Lamb,77
2,Aloo Brinjal,146
3,Aloo Chaat,511
4,Aloo Dupiaza,259
...,...,...
376,Vindaloo - chicken-tikka,6
377,Vindaloo - king-prawn,1
378,Vindaloo - lamb,12
379,Vindaloo - prawn,2


In [21]:
print(items_frequence.describe())
print(f'\ntotal unique orders: {len(orders["order_number"].unique())}')

               numo
count    381.000000
mean     509.188976
std     1261.237290
min        1.000000
25%       26.000000
50%      113.000000
75%      408.000000
max    13093.000000

total unique orders: 23041
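
The percentages quoted for item frequency can be reproduced from the `describe()` output and the unique order count above. This is only a worked check of the arithmetic; the variable names are illustrative. Note that the numerators are row counts per item, divided by the number of unique orders.

```python
# Figures copied from the describe() output and the unique order count above.
mean_item_rows = 509.188976    # mean of `numo`
median_item_rows = 113.0       # 50% quantile of `numo`
max_item_rows = 13093.0        # max of `numo`
unique_orders = 23041

mean_share = mean_item_rows / unique_orders
median_share = median_item_rows / unique_orders
max_share = max_item_rows / unique_orders

print(f"mean share:   {mean_share:.2%}")    # 2.21%
print(f"median share: {median_share:.2%}")  # 0.49%
print(f"max share:    {max_share:.2%}")     # 56.82%
```

The mean share, roughly 0.0221, is the value used later as the minimum support threshold.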


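We need to normalize the item names by removing extra spaces, lowercasing, and stripping accents, because the same item may be written in slightly different ways and would otherwise be counted as separate items. Note that this does not unify hyphenation variants such as `vindaloo - chicken tikka` and `vindaloo - chicken-tikka`, which remain separate items.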

In [22]:
def text_normalization(text):
    new_text = ' '.join(text.split())
    return unidecode.unidecode(new_text.lower())

orders['item_name'] = orders['item_name'].apply(text_normalization)
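
To illustrate what this kind of normalization does, here is a minimal standalone sketch. It uses `unicodedata` from the standard library as a stand-in for `unidecode` (so its results may differ from `unidecode` for some characters), and `normalize_item_name` is a hypothetical name, not part of the notebook.

```python
import unicodedata


def normalize_item_name(text: str) -> str:
    """Collapse whitespace, lowercase, and strip accents from an item name."""
    collapsed = " ".join(text.split())
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", collapsed.lower())
    # Dropping the combining marks leaves the unaccented base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_item_name("  Plain   Papadum "))        # plain papadum
print(normalize_item_name("Saag  Panéer"))              # saag paneer
print(normalize_item_name("Vindaloo - chicken-tikka"))  # vindaloo - chicken-tikka
```

The last call shows that hyphens are left untouched, so spelling variants that differ only in hyphenation are not merged by this step.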

In [23]:
orders['item_name']

Unnamed: 0,item_name
0,plain papadum
1,king prawn balti
2,garlic naan
3,mushroom rice
4,paneer tikka masala
...,...
119178,house red wine 75cl
119179,house white wine 75cl
119180,house red wine 75cl
119181,house white wine 75cl


<a id='market_basket_analysis'></a>
### Market Basket Analysis (Association Rules)

Using the frequency of items across orders, we try to find out whether the presence of one product in an order is associated with the presence of another, with the goal of recommending that product as an additional purchase.

### Example
Consider a set of market basket transactions in which diapers and beer form a frequent itemset. From this we can derive a rule saying that customers who buy diapers have a certain probability of also buying beer.

![example_mba_kaggle.png](attachment:7f09ccef-4191-4585-bc60-c1db7a9d03d5.png)

We need to transform the dataset into a list of items per order and then convert it into a matrix where the columns are items and the rows are orders. If an item belongs to an order, the cell is set to 1, otherwise to 0.

In [24]:
# group the dataframe into the list of unique items in each order
item_list = orders.groupby('order_number')['item_name'].unique()

# one-hot encode the orders: 1 if the item belongs to the order, otherwise 0
te = TransactionEncoder()
oht_orders = te.fit(item_list).transform(item_list, sparse=True)

To save memory, we represent the transaction data in sparse format, because we have 346 items and 23041 orders and most cells are 0.
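
As a rough illustration of the encoding step, the sketch below builds the same kind of 0/1 matrix in plain Python on three made-up orders. It is meant to mirror the idea behind `TransactionEncoder`, not its implementation, and it also shows why a sparse representation helps: only the positions of the ones need to be stored.

```python
# Three made-up orders, each a list of item names.
transactions = [
    ["pilau rice", "naan"],
    ["plain papadum", "mango chutney", "pilau rice"],
    ["naan"],
]

# One column per distinct item, in sorted order.
columns = sorted({item for order in transactions for item in order})

# Dense one-hot matrix: one row per order, 1 if the item is in the order.
dense = [[int(item in order) for item in columns] for order in transactions]

# Sparse view: keep only the (row, column) positions that hold a 1.
nonzero = [(r, c) for r, row in enumerate(dense) for c, v in enumerate(row) if v]

print(columns)       # ['mango chutney', 'naan', 'pilau rice', 'plain papadum']
print(dense)         # [[0, 1, 1, 0], [1, 0, 1, 1], [0, 1, 0, 0]]
print(len(nonzero))  # 6 stored entries instead of 12 cells
```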

In [25]:
item_list

Unnamed: 0_level_0,item_name
order_number,Unnamed: 1_level_1
630,[onion bhaji]
647,[onion bhaji]
648,[onion bhaji]
651,[onion bhaji]
764,[onion bhaji]
...,...
25579,"[madras - chicken tikka, mini bhaji, plain ric..."
25580,"[tandoori fish karahi, aloo gobi, plain rice, ..."
25581,"[saag paneer, chapati, onion bhaji, pilau rice..."
25582,"[tandoori king prawn masala, pilau rice, peshw..."


In [26]:
item_list.loc[25579]

array(['madras - chicken tikka', 'mini bhaji', 'plain rice', 'naan'],
      dtype=object)

In [27]:
oht_orders.data.shape

(188095,)

In [28]:
sparse_df_items = pd.DataFrame.sparse.from_spmatrix(oht_orders, columns=te.columns_)

As the threshold for the minimum frequency of an itemset (the support metric), we use the average item frequency divided by the number of unique orders, which is about 2.21% (`min_support=0.02209`), and we limit the itemset length with `max_len=11`.

In [29]:
sparse_df_items

Unnamed: 0,achar chicken,achar lamb,aloo brinjal,aloo chaat,aloo dupiaza,aloo gobi,aloo methi,aloo mithy,aloo peas,baingan hari mirch,...,vegetable samosa,vindaloo,vindaloo - chicken,vindaloo - chicken tikka,vindaloo - chicken-tikka,vindaloo - king prawn,vindaloo - king-prawn,vindaloo - lamb,vindaloo - prawn,vindaloo sauce
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23036,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
23037,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
23038,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
23039,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Together, the two restaurants serve 346 distinct menu items.

In [37]:
sparse_df_items.astype(int).sum(axis=1).sort_values(ascending=False)

Unnamed: 0,0
7114,65
7116,63
7113,51
7077,46
1730,35
...,...
16473,1
16474,1
16475,1
16476,1


Some orders contain 65, 63, or even 51 distinct items. We do not want to rely on such unusually large orders for our recommendation system, although the following cells do not filter them out explicitly.

In [30]:
sparse_df_items.astype(int).sum(axis=0).sort_values(ascending=False)/23041

Unnamed: 0,0
pilau rice,0.501714
plain papadum,0.397813
naan,0.289875
garlic naan,0.266655
bombay aloo,0.243522
...,...
saag - chicken-tikka,0.000043
saag - king-prawn,0.000043
chicken chaat main,0.000043
bombay - prawn,0.000043


Calculating the support for each item: the support of an item is the share of orders in which that item appears. As we can see, pilau rice appears in about 50% of orders.

Keep in mind that an itemset is a combination of items ordered together. Its support is therefore computed differently: it is the share of orders that contain every item of the itemset.
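
The two definitions of support can be sketched in a few lines of plain Python. The transactions below are made up for illustration, and `support` is a hypothetical helper, not an mlxtend function.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for order in transactions if wanted <= set(order))
    return hits / len(transactions)


# Four made-up orders.
transactions = [
    {"pilau rice", "naan"},
    {"pilau rice", "plain papadum", "mango chutney"},
    {"plain papadum", "mango chutney"},
    {"pilau rice"},
]

print(support({"pilau rice"}, transactions))                      # 0.75
print(support({"plain papadum", "mango chutney"}, transactions))  # 0.5
print(support({"pilau rice", "plain papadum"}, transactions))     # 0.25
```

The support of an itemset can never exceed the support of any of its items, which is the property Apriori relies on to prune candidates.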

In [40]:
# sparse_df_items.astype(int).sum(axis=1).sort_values(ascending=False)/346

In [41]:
# frequent_itemsets1 = apriori(sparse_df_items, min_support=0.0001,max_len=11, use_colnames=True, verbose=1)

# Calling apriori twice in the same session exhausted memory and crashed the kernel, so the call above is left commented out. The library may fix this in the future.

In [34]:
frequent_itemsets = apriori(sparse_df_items, min_support=0.02209, max_len=11, use_colnames=True, verbose=1)

Processing 600 combinations | Sampling itemset size 5


We create a copy of the frequent itemsets to build a custom output for analysis.

In [38]:
frequent_itemsets['itemsets']

Unnamed: 0,itemsets
0,(aloo gobi)
1,(bhindi bhajee)
2,(bhuna)
3,(bombay aloo)
4,(butter chicken)
...,...
561,"(pilau rice, plain papadum, red sauce, mint sa..."
562,"(pilau rice, naan, onion bhaji, plain papadum)"
563,"(pilau rice, naan, onion chutney, plain papadum)"
564,"(pilau rice, naan, plain rice, plain papadum)"


In [39]:
frequent_itemsets['support']

Unnamed: 0,support
0,0.060761
1,0.025476
2,0.046960
3,0.243522
4,0.105768
...,...
561,0.024825
562,0.022872
563,0.022872
564,0.023957


In [35]:
frequent_itemsets_plot = frequent_itemsets.copy()
frequent_itemsets_plot['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets_plot['support'] = (frequent_itemsets_plot['support'] * 100).round(2)
frequent_itemsets_plot["itemsets"] = frequent_itemsets_plot["itemsets"].apply(lambda x: ', '.join(list(x))).astype("str")

We get 566 frequent itemsets with the filter we applied. Most itemsets have size 2, and the largest has size 5, though larger itemsets do not necessarily have high support. Support varies widely for single items (high standard deviation at length 1) and much less for itemsets of length 2 or 3, whose mean support is 4.64% and 3.48% of orders respectively.

In [36]:
frequent_itemsets_plot.groupby('length')['support'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
length,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,73.0,8.864384,8.976832,2.22,3.28,5.06,10.58,50.17
2,252.0,4.637302,3.121391,2.21,2.7375,3.51,5.215,24.15
3,197.0,3.484112,1.451654,2.21,2.54,2.99,3.92,12.07
4,43.0,2.90814,0.703607,2.27,2.43,2.68,3.1,5.01
5,1.0,2.28,,2.28,2.28,2.28,2.28,2.28


As we can see, 50.17% of orders include pilau rice and 39.78% include plain papadum. In addition, 24.15% of orders include both plain papadum and pilau rice, and 19.2% include both plain papadum and mango chutney. Finally, 2.28% of orders include plain papadum, onion chutney, mango chutney, mint sauce, and pilau rice together.

In [72]:
frequent_itemsets_plot.sort_values('support',ascending=True).iloc[20:35]

Unnamed: 0,support,itemsets,length
386,2.26,"chicken tikka masala, plain papadum, onion bhaji",3
441,2.26,"pilau rice, madras, plain naan",3
464,2.27,"onion chutney, red sauce, mango chutney",3
539,2.27,"pilau rice, garlic naan, naan, plain papadum",4
376,2.27,"chicken tikka masala, plain rice, mango chutney",3
190,2.27,"onion bhajee, keema naan",2
203,2.27,"onion bhajee, korma",2
446,2.28,"mint sauce, peshwari naan, mango chutney",3
107,2.28,"plain naan, butter chicken",2
396,2.28,"plain papadum, chicken tikka masala, red sauce",3


In [74]:
# take the 40 itemsets with the highest support, then sort ascending for plotting
top_frequence = frequent_itemsets_plot.sort_values('support',ascending=False).head(40).sort_values('support')
fig = px.bar(top_frequence, y="support", x="itemsets", orientation='v', text='support')
fig.update_traces(textposition="outside")
fig.update_layout(height=300)
fig.show()

We filter on lift with `min_threshold=1` to keep only rules whose antecedents and consequents appear together in an order at least as often as would be expected if they were independent. As we can see, frequently bought items yield many candidate recommendations. For example, there are 174 rules with plain papadum as the antecedent.

In [43]:
market_basket_rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
market_basket_rules.groupby('antecedents').size().sort_values(ascending=False)

Unnamed: 0_level_0,0
antecedents,Unnamed: 1_level_1
(plain papadum),174
(pilau rice),166
(mango chutney),105
(naan),71
(mint sauce),70
...,...
"(pilau rice, garlic naan, mango chutney)",1
"(garlic naan, mango chutney, mint sauce)",1
"(garlic naan, naan, plain papadum)",1
"(plain naan, chicken tikka masala, plain papadum)",1


<a id='item_recommendation'></a>
### Item Recommendation

To select the best recommendations, we keep the rule with the highest confidence for each antecedent. We ran this for the top 20 most frequent itemsets and got some recommendations:

* 4.44% (confidence) of those who buy pilau rice also buy paratha;
* If one buys garlic naan, it is likely that one has also bought plain papadum with saag aloo. It is therefore possible to create a bundle with these items and apply a discount;
* If one buys pilau rice with chicken tikka masala, it is likely that one has also bought madras. It is therefore possible to apply a discount to the madras.
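
Selecting the best rule per antecedent can be sketched without pandas as follows. The rules and their metric values are illustrative only, and the dictionary layout is an assumption made for this example rather than the structure returned by `association_rules`.

```python
# Illustrative rules; the metric values are examples, not notebook output.
rules = [
    {"antecedent": ("mango chutney",), "consequent": ("plain papadum",),
     "confidence": 0.84, "lift": 2.12},
    {"antecedent": ("mango chutney",), "consequent": ("pilau rice",),
     "confidence": 0.55, "lift": 1.10},
    {"antecedent": ("naan",), "consequent": ("pilau rice",),
     "confidence": 0.61, "lift": 1.22},
    {"antecedent": ("naan",), "consequent": ("bombay aloo",),
     "confidence": 0.30, "lift": 1.23},
]

# Rank by confidence first and lift second, both descending.
ranked = sorted(rules, key=lambda r: (r["confidence"], r["lift"]), reverse=True)

# Keep only the first (best ranked) rule seen for each antecedent.
best = {}
for rule in ranked:
    best.setdefault(rule["antecedent"], rule)

for antecedent, rule in best.items():
    print(antecedent, "->", rule["consequent"], rule["confidence"])
```

Sorting before deduplicating is what makes "keep the first occurrence" equivalent to "keep the highest confidence", which is the same idea as sorting a DataFrame and then dropping duplicate antecedents.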

In [75]:
# best_item_recommendations = market_basket_rules.sort_values(['confidence','lift'],ascending='False').drop_duplicates(subset=['antecedents'])
best_item_recommendations = market_basket_rules.sort_values(['confidence','lift'],ascending=False).drop_duplicates(subset=['antecedents'])
top_20_frequence_items = frequent_itemsets.sort_values('support',ascending=False).head(20)['itemsets']
best_item_recommendations[best_item_recommendations['antecedents'].isin(top_20_frequence_items)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
303,(mango chutney),(plain papadum),0.227811,0.397813,0.192006,0.842827,2.118654,0.101379,3.831372,0.683774
341,(mint sauce),(plain papadum),0.166225,0.397813,0.133414,0.802611,2.01756,0.067288,3.050764,0.604902
135,(chicken tikka masala),(pilau rice),0.231197,0.501714,0.158413,0.685189,1.365695,0.042419,1.582807,0.348297
419,(plain naan),(pilau rice),0.16284,0.501714,0.103728,0.636994,1.269634,0.022029,1.372664,0.253681
1284,"(plain papadum, mango chutney)",(pilau rice),0.192006,0.501714,0.120698,0.628617,1.252937,0.024366,1.341702,0.249848
39,(bombay aloo),(pilau rice),0.243522,0.501714,0.151165,0.620745,1.237248,0.028987,1.313854,0.253483
409,(peshwari naan),(pilau rice),0.155766,0.501714,0.09648,0.619393,1.234552,0.01833,1.309185,0.225044
369,(naan),(pilau rice),0.289875,0.501714,0.17764,0.612816,1.221445,0.032206,1.286949,0.255303
215,(keema naan),(pilau rice),0.16041,0.501714,0.097956,0.61066,1.217147,0.017476,1.279822,0.212493
421,(plain papadum),(pilau rice),0.397813,0.501714,0.241526,0.607135,1.210121,0.041938,1.268338,0.288343
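
The rule metrics in the table can be recomputed from the three support values of a rule. The sketch below does this for the first row, (mango chutney) -> (plain papadum), using the standard definitions of confidence, lift, leverage, and conviction; small differences from the table come from the inputs being rounded to six decimals.

```python
# Support values copied from the first row of the table above.
support_both = 0.192006        # orders with mango chutney AND plain papadum
support_antecedent = 0.227811  # orders with mango chutney
support_consequent = 0.397813  # orders with plain papadum

# Share of antecedent orders that also contain the consequent.
confidence = support_both / support_antecedent
# How much more often the pair occurs than expected under independence.
lift = confidence / support_consequent
# Difference between observed and expected joint support.
leverage = support_both - support_antecedent * support_consequent
# Ratio of expected to observed frequency of the rule being wrong.
conviction = (1 - support_consequent) / (1 - confidence)

print(f"confidence: {confidence:.3f}")  # 0.843
print(f"lift:       {lift:.3f}")        # 2.119
print(f"leverage:   {leverage:.3f}")    # 0.101
print(f"conviction: {conviction:.3f}")  # 3.831
```

A lift of about 2.1 means that orders with mango chutney contain plain papadum roughly twice as often as orders in general.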


<a id='conclusions'></a>
### Conclusions:

* The number of items per order is higher in the second restaurant, which raises its average order price; the difference between the two restaurants is \\$ 1.44;
* For the combined data, the average order price is \\$ 43.6, with about 10.5 items per order at \\$ 4.15 per item on average;
* An item appears 509 times on average, which corresponds to about 2.21% of orders;
* 50.17% of orders include pilau rice;
* 24.15% of orders include both plain papadum and pilau rice, and 19.2% include both plain papadum and mango chutney;
* 2.28% of orders include plain papadum, onion chutney, mango chutney, mint sauce, and pilau rice together;
* 4.44% (confidence) of those who buy pilau rice also buy paratha;
* If one buys plain papadum with pilau rice, it is likely that one has also bought garlic naan with naan;
* If one buys garlic naan, it is likely that one has also bought plain papadum with saag aloo. It is therefore possible to create a bundle with these items and apply a discount;
* If one buys pilau rice with chicken tikka masala, it is likely that one has also bought madras. It is therefore possible to apply a discount to the madras.


And that’s it! It has been a pleasure to make this kernel, I have learned a lot! Thank you for reading and if you like it, please upvote it!

<a id='references'></a>
### References:

Annalyn Ng (2016). Association rules apriori algorithm tutorial. Retrieved from: [https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html](https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html)

Hafsa Jabeen (2018). Market Basket Analysis using R. Retrieved from: [https://www.datacamp.com/community/tutorials/market-basket-analysis-r](https://www.datacamp.com/community/tutorials/market-basket-analysis-r)

Association Rules. Retrieved from: [http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/)

Apriori. Retrieved from: [http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)
