### How to increase sales in a certain area (especially outside Java)?
Managers and C-level executives ask this question every month. Retail is a complex business, and some areas carry more product types than others. For example:
- Area A sells 10 different types of shampoo and 2 different types of chips.
- Area B sells 3 different types of shampoo and 5 different types of chips.

Experimenting with adding a certain product type to an area can increase sales performance.
Determining which product to place in a certain area (especially outside Java) is crucial because pricing, promotions, and so on differ from area to area.

However, the purpose of this notebook is to use Apriori to learn from the transaction behavior of an area with top sales of a product that we want to expand to other areas:
- When a customer buys product X, which products are likely to be bought together with it?

Here's the approach:
1. Run Apriori on one or a few areas with top sales of the product (for example, fruit is a top seller in the Jakarta branch, so "what if we try to sell it in other areas?").
2. Learn its pair products.
3. Search for areas where we want to improve sales and that do not sell the fruit yet.
4. Check those areas' top sellers; if they match the pair products from step 2, we can consider that area.
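Steps 3 and 4 boil down to a set comparison. Below is a minimal plain-Python sketch, assuming the pair products from step 2 and each area's assortment and top sellers are already available as sets. The function name `candidate_areas`, the area names other than Jakarta, and all product lists are hypothetical.

```python
def candidate_areas(target, pair_products, area_products, area_top_sales):
    """Areas that do not sell `target` yet but whose top sellers overlap
    with the products usually bought together with `target`.

    target         -- product we want to expand (e.g. "FRUIT")
    pair_products  -- set of products that co-occur with `target` (step 2)
    area_products  -- dict: area -> set of products currently sold there
    area_top_sales -- dict: area -> set of that area's top-selling products
    """
    candidates = {}
    for area, sold in area_products.items():
        if target in sold:
            continue  # Step 3: skip areas that already sell the target.
        overlap = pair_products & area_top_sales.get(area, set())
        if overlap:  # Step 4: top sellers match the pair products.
            candidates[area] = sorted(overlap)
    return candidates


# Hypothetical inputs, for illustration only.
pair_products = {"YOGURT", "BREAD"}  # assumed step-2 result for FRUIT
area_products = {
    "Jakarta": {"FRUIT", "YOGURT", "BREAD"},
    "Area B": {"YOGURT", "BREAD", "AQUA"},
    "Area C": {"AQUA", "INDOMIE"},
}
area_top_sales = {
    "Jakarta": {"FRUIT", "YOGURT"},
    "Area B": {"BREAD", "AQUA"},
    "Area C": {"INDOMIE"},
}
print(candidate_areas("FRUIT", pair_products, area_products, area_top_sales))
# {'Area B': ['BREAD']}
```

Jakarta is skipped because it already sells fruit, and Area C is dropped because none of its top sellers is a pair product.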

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
df = pd.read_excel('/content/dummy_data.xlsx')

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6259 entries, 0 to 6258
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   order_id          6259 non-null   object
 1   product_category  6259 non-null   object
dtypes: object(2)
memory usage: 97.9+ KB
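The `df.info()` output shows the data is in long format: one row per (order, product category) pair. Apriori needs one row per order with a True/False flag per category, which is what the `get_dummies` + `groupby` step in the next cell produces. A plain-Python sketch of the same reshaping; the helper name `encode_baskets` and the order IDs are hypothetical:

```python
def encode_baskets(rows):
    """Collapse long-format (order_id, product_category) rows into a
    one-hot basket table: {order_id: {category: bool}}."""
    categories = sorted({category for _, category in rows})
    baskets = {}
    for order_id, category in rows:
        # A set ignores repeated categories within the same order.
        baskets.setdefault(order_id, set()).add(category)
    return {
        order_id: {category: category in items for category in categories}
        for order_id, items in baskets.items()
    }


# Toy rows in the same shape as the notebook's DataFrame (hypothetical data).
rows = [
    ("ORD-1", "BISCUIT"),
    ("ORD-1", "CHIP CHIP"),
    ("ORD-1", "BISCUIT"),  # duplicate category in one order counts once
    ("ORD-2", "ULTRA MILK"),
]
for order_id, flags in encode_baskets(rows).items():
    print(order_id, flags)
```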

In [6]:
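# One row per order, one boolean column per product category
# (True if the order contains at least one item of that category).
df_encoded = pd.get_dummies(df['product_category']).groupby(df['order_id']).max().astype(bool)

# Keep itemsets that appear in at least 1% of orders.
min_support = 0.01
frequent_itemsets = apriori(df_encoded, min_support=min_support, use_colnames=True)

# Derive rules; a 1% confidence threshold keeps almost every rule.
min_confidence = 0.01
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

rules["lift"] = rules["lift"].round(2)

rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]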

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (ULTRA MILK) | (AIR FRESHER) | 0.011100 | 0.011117 | 1.00 |
| 1 | (AIR FRESHER) | (ULTRA MILK) | 0.011100 | 1.000000 | 1.00 |
| 2 | (AQUA) | (GREENFIELD) | 0.010595 | 0.328125 | 1.97 |
| 3 | (GREENFIELD) | (AQUA) | 0.010595 | 0.063636 | 1.97 |
| 4 | (ULTRA MILK) | (AQUA) | 0.032291 | 0.032340 | 1.00 |
| ... | ... | ... | ... | ... | ... |
| 849 | (ULTRA MILK, INDOMIE) | (SUGAR, MINYAK GORENG) | 0.011100 | 0.068111 | 4.22 |
| 850 | (SUGAR) | (ULTRA MILK, MINYAK GORENG, INDOMIE) | 0.011100 | 0.207547 | 5.96 |
| 851 | (MINYAK GORENG) | (SUGAR, INDOMIE, ULTRA MILK) | 0.011100 | 0.134146 | 5.02 |
| 852 | (INDOMIE) | (SUGAR, MINYAK GORENG, ULTRA MILK) | 0.011100 | 0.067485 | 4.18 |
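mlxtend's `apriori` finds frequent itemsets level by level: it keeps single items above `min_support`, joins them into candidate pairs, drops any candidate with an infrequent subset, and repeats for larger itemsets. A simplified plain-Python sketch of that level-wise idea (an illustration, not mlxtend's actual implementation), assuming each basket is a Python set; the function name `apriori_sketch` and the toy baskets are hypothetical:

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Minimal level-wise Apriori: returns {frozenset(itemset): support}.

    baskets     -- list of sets, one set of product categories per order
    min_support -- fraction of baskets an itemset must appear in
    """
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    items = {item for basket in baskets for item in basket}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Keep only candidates that meet the support threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Toy baskets (hypothetical data).
baskets = [
    {"BISCUIT", "CHIP CHIP", "ULTRA MILK"},
    {"BISCUIT", "CHIP CHIP"},
    {"ULTRA MILK", "AQUA"},
    {"BISCUIT", "ULTRA MILK"},
]
for itemset, s in apriori_sketch(baskets, min_support=0.5).items():
    print(sorted(itemset), s)
```

With `min_support=0.5`, AQUA (1 of 4 baskets) is dropped at the first level, and the triple is pruned because its subset {CHIP CHIP, ULTRA MILK} is infrequent.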


In [8]:
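# Keep rules whose consequent contains CHIP CHIP; antecedents/consequents are
# frozensets, so an exact membership test avoids accidental substring matches.
filtered_rules = rules[rules['consequents'].apply(lambda items: 'CHIP CHIP' in items)]
filtered_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].sort_values(by='lift', ascending=False).head(20)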

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 22 | (BISCUIT) | (CHIP CHIP) | 0.010595 | 0.221053 | 4.47 |
| 276 | (ULTRA MILK, BISCUIT) | (CHIP CHIP) | 0.010595 | 0.221053 | 4.47 |
| 280 | (BISCUIT) | (ULTRA MILK, CHIP CHIP) | 0.010595 | 0.221053 | 4.47 |
| 400 | (COOKIES) | (ULTRA MILK, CHIP CHIP) | 0.012109 | 0.196721 | 3.98 |
| 396 | (ULTRA MILK, COOKIES) | (CHIP CHIP) | 0.012109 | 0.196721 | 3.98 |
| 64 | (COOKIES) | (CHIP CHIP) | 0.012109 | 0.196721 | 3.98 |
| 423 | (POTATO) | (ULTRA MILK, CHIP CHIP) | 0.013623 | 0.188811 | 3.82 |
| 420 | (POTATO, ULTRA MILK) | (CHIP CHIP) | 0.013623 | 0.188811 | 3.82 |
| 72 | (POTATO) | (CHIP CHIP) | 0.013623 | 0.188811 | 3.82 |
| 312 | (BREAD, ULTRA MILK) | (CHIP CHIP) | 0.015641 | 0.128631 | 2.6 |
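The filtering cell keeps only rules whose consequent contains CHIP CHIP and ranks them by lift. A plain-Python sketch of how such rules can be derived from itemset supports and filtered the same way; this is an illustration rather than mlxtend's `association_rules`, and the function name `rules_with_consequent` and the support values are hypothetical toy numbers:

```python
from itertools import combinations


def rules_with_consequent(supports, target, min_confidence=0.0):
    """Build association rules from itemset supports, keep only those whose
    consequent contains `target`, and sort them by lift (highest first).

    supports -- {frozenset(itemset): support}; must include every subset of
                each itemset (Apriori output satisfies this)
    Returns a list of (antecedent, consequent, support, confidence, lift).
    """
    rules = []
    for itemset, s_xy in supports.items():
        if len(itemset) < 2:
            continue
        # Split the itemset into every non-empty antecedent/consequent pair.
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                x = frozenset(ante)
                y = itemset - x
                if target not in y:
                    continue
                confidence = s_xy / supports[x]   # P(Y | X)
                lift = confidence / supports[y]   # P(Y | X) / P(Y)
                if confidence >= min_confidence:
                    rules.append((x, y, s_xy, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Toy supports (hypothetical), including every subset of each itemset.
supports = {
    frozenset({"BISCUIT"}): 0.75,
    frozenset({"CHIP CHIP"}): 0.5,
    frozenset({"ULTRA MILK"}): 0.75,
    frozenset({"BISCUIT", "CHIP CHIP"}): 0.5,
    frozenset({"BISCUIT", "ULTRA MILK"}): 0.5,
}
for ante, cons, sup, conf, lift in rules_with_consequent(supports, "CHIP CHIP"):
    print(sorted(ante), "->", sorted(cons), sup, round(conf, 3), round(lift, 2))
```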


Reading the first row of the filtered rules, (BISCUIT) → (CHIP CHIP):
- Antecedent: the customer buys "BISCUIT".
- Consequent: the customer also buys "CHIP CHIP".
- Support: only about 1.06% of all transactions contain both "BISCUIT" and "CHIP CHIP".
- Confidence: of the transactions that contain "BISCUIT", 22.1% also contain "CHIP CHIP".
- Lift: "CHIP CHIP" is 4.47 times more likely to appear in a basket that contains "BISCUIT" than in a basket chosen at random (its baseline rate across all transactions).
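These metrics can be reproduced by hand: support is P(A and C), confidence is P(C | A), and lift is confidence divided by P(C). A minimal sketch with toy baskets (hypothetical data; the function name `rule_metrics` is also hypothetical), plus the same arithmetic run backwards on the BISCUIT → CHIP CHIP row, which implies BISCUIT appears in roughly 4.8% of orders and CHIP CHIP in roughly 4.9% (approximate, since lift is rounded to two decimals):

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    baskets    -- list of sets, one per transaction
    antecedent -- set of items on the left-hand side
    consequent -- set of items on the right-hand side
    """
    n = len(baskets)
    n_ante = sum(antecedent <= b for b in baskets)
    n_cons = sum(consequent <= b for b in baskets)
    n_both = sum((antecedent | consequent) <= b for b in baskets)
    support = n_both / n              # P(A and C)
    confidence = n_both / n_ante      # P(C | A)
    lift = confidence / (n_cons / n)  # P(C | A) / P(C)
    return support, confidence, lift


# Toy data (hypothetical): 10 orders, BISCUIT in 4, CHIP CHIP in 2, both in 2.
baskets = [{"BISCUIT", "CHIP CHIP"}] * 2 + [{"BISCUIT"}] * 2 + [{"ULTRA MILK"}] * 6
print(rule_metrics(baskets, {"BISCUIT"}, {"CHIP CHIP"}))

# Working backwards from the notebook's first filtered row
# (support 0.010595, confidence 0.221053, lift 4.47):
support_biscuit = 0.010595 / 0.221053  # support / confidence = P(BISCUIT)
support_chip = 0.221053 / 4.47         # confidence / lift = P(CHIP CHIP), approximate
print(round(support_biscuit, 4), round(support_chip, 4))
```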

Reference:
- https://towardsdatascience.com/data-mining-market-basket-analysis-with-apriori-algorithm-970ff256a92c
- https://www.datacamp.com/blog/5-ways-use-data-science-marketing