<h1 align=center><font size = 5>Offer Food with Association Rules Analysis</font></h1>

<br>

<img src="https://images.unsplash.com/photo-1542838132-92c53300491e?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=774&q=80" height=480 width=950 alt="market">

<small>Picture Source: <a href="https://unsplash.com/photos/D6Tu_L3chLE">Unsplash</a></small>

<br>

<h2>Data Set Information:</h2>

<p>The dataset has no real-world equivalent; it is generated entirely at random. The rules found here therefore only illustrate the method, and a real transaction data set is needed to make real recommendations. If you have a suitable data set, you can run the same analysis on it (and I hope you will show me the results).</p>

<br>

<h2>Association Rules Analysis</h2>

<p>Association Rules Analysis is a data mining technique used to discover interesting relationships or associations among a set of items in large datasets. It is commonly applied in market basket analysis, where the goal is to identify patterns in customer purchasing behavior.</p>

<br>

<p>The analysis involves finding associations between items that frequently co-occur in transactions. The associations are represented as rules of the form "if {itemset A} then {itemset B}". For example, a rule could be "if a customer buys bread and milk, then they are likely to buy eggs."</p>

<br>

<h2>Keywords</h2>

<ul>
	<li>Market</li>
	<li>Machine Learning</li>
	<li>Association Rules Analysis</li>
	<li>Apriori</li>
	<li>Association Rule Mining</li>
</ul>

<br>

<h1>Objective for this Notebook</h1>

<p>In this project, a recommendation model is built with <i>association rules analysis</i>, based on the products that customers buy together.</p>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ul>
<li><a href="#importing_libraries">Importing Libraries</a></li>
<li><a href="#data_preprocessing">Data Preprocessing</a></li>
<li><a href="#build_ara_model">Building Association Rules Analysis Model</a></li>
</ul>
<br>

<p></p>
Estimated Time Needed: <strong>20 min</strong>
</div>

<a id="importing_libraries"></a>

<h2 align=center>Importing Libraries</h2>

In [None]:
import pandas as pd
import numpy as np

<br>

<a id="data_preprocessing"></a>

<h2 align=center>Data Preprocessing</h2>

In [None]:
items = ['Coffee', 'Bread', 'Milk', 'Eggs', 'Butter', 'Juice', 'Cereal', 'Yogurt', 'Cheese', 'Pasta',
         'Rice', 'Chicken', 'Beef', 'Fish', 'Apples', 'Bananas', 'Oranges', 'Grapes', 'Strawberries',
         'Tomatoes', 'Potatoes', 'Carrots', 'Onions', 'Lettuce', 'Broccoli', 'Cucumber', 'Soap',
         'Shampoo', 'Toothpaste']

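<p>The code below generates a DataFrame with 1000 customer rows and 29 item columns (30 columns in total once <code>CustomerID</code> is added). You can change the number of rows through <code>df_range</code>. Each value in the DataFrame represents whether the <b>customer took the item (1)</b> or <b>not (0)</b>, and each item is drawn independently with a probability of 0.3 of being taken. The 'CustomerID' column is added to uniquely identify each customer.</p>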

In [None]:
np.random.seed(42)

data = []

df_range = 1000 #@param {type:"number"}
for _ in range(df_range):
    customer = [int(pd.Series([0, 1]).sample(n=1, weights=[0.7, 0.3]).iloc[0]) for _ in range(len(items))]
    data.append(customer)

df = pd.DataFrame(data, columns=items)
df.insert(0, 'CustomerID', range(1, df_range+1))
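
The cell above draws one value at a time with `pd.Series.sample`, which gets slow as the table grows. A vectorized alternative could look like the sketch below. It is only an illustration: `demo_items`, `n_customers`, `p_buy` and `demo_df` are hypothetical names, the item list is shortened, and the random values will not match the ones generated above.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical item list; the notebook itself uses the full `items` list.
demo_items = ['Coffee', 'Bread', 'Milk', 'Eggs']
n_customers = 1000
p_buy = 0.3  # same purchase probability as the weights [0.7, 0.3] above

rng = np.random.default_rng(42)

# One uniform draw per (customer, item); a draw below p_buy counts as a purchase.
basket = (rng.random((n_customers, len(demo_items))) < p_buy).astype(int)

demo_df = pd.DataFrame(basket, columns=demo_items)
demo_df.insert(0, 'CustomerID', range(1, n_customers + 1))
print(demo_df.shape)
```

Drawing the whole matrix in one call keeps the generation step fast even for much larger values of `n_customers`.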

In [None]:
df.head()

Unnamed: 0,CustomerID,Coffee,Bread,Milk,Eggs,Butter,Juice,Cereal,Yogurt,Cheese,...,Tomatoes,Potatoes,Carrots,Onions,Lettuce,Broccoli,Cucumber,Soap,Shampoo,Toothpaste
0,1,0,1,1,0,0,0,0,1,0,...,0,0,0,0,0,0,1,0,0,0
1,2,0,0,0,0,1,1,1,0,0,...,0,0,1,1,1,1,0,1,0,0
2,3,0,0,0,0,1,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1
3,4,0,1,0,0,1,1,0,1,0,...,0,1,0,0,0,0,1,1,0,1
4,5,1,0,1,0,1,1,0,0,0,...,0,0,1,0,1,1,0,0,0,0


<p>Let's look at the last 5 rows of the DataFrame. You can pass an integer to <code>tail()</code> to retrieve a different number of rows.</p>

In [None]:
df.tail()

Unnamed: 0,CustomerID,Coffee,Bread,Milk,Eggs,Butter,Juice,Cereal,Yogurt,Cheese,...,Tomatoes,Potatoes,Carrots,Onions,Lettuce,Broccoli,Cucumber,Soap,Shampoo,Toothpaste
995,996,1,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,1,1,1
996,997,1,0,1,0,1,0,0,1,0,...,0,0,0,1,1,1,0,0,0,0
997,998,1,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
998,999,1,0,0,0,0,0,0,1,0,...,1,1,1,0,0,0,1,0,1,0
999,1000,1,0,1,0,0,0,0,0,1,...,0,1,0,0,1,0,0,1,0,1


<p>Show descriptive statistics of the DataFrame and transpose the result so that each row describes one column.</p>

In [None]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
CustomerID,1000.0,500.5,288.819436,1.0,250.75,500.5,750.25,1000.0
Coffee,1000.0,0.308,0.461898,0.0,0.0,0.0,1.0,1.0
Bread,1000.0,0.276,0.44724,0.0,0.0,0.0,1.0,1.0
Milk,1000.0,0.301,0.458922,0.0,0.0,0.0,1.0,1.0
Eggs,1000.0,0.312,0.463542,0.0,0.0,0.0,1.0,1.0
Butter,1000.0,0.302,0.459355,0.0,0.0,0.0,1.0,1.0
Juice,1000.0,0.309,0.462312,0.0,0.0,0.0,1.0,1.0
Cereal,1000.0,0.307,0.46148,0.0,0.0,0.0,1.0,1.0
Yogurt,1000.0,0.3,0.458487,0.0,0.0,0.0,1.0,1.0
Cheese,1000.0,0.284,0.451162,0.0,0.0,0.0,1.0,1.0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 30 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   CustomerID    1000 non-null   int64
 1   Coffee        1000 non-null   int64
 2   Bread         1000 non-null   int64
 3   Milk          1000 non-null   int64
 4   Eggs          1000 non-null   int64
 5   Butter        1000 non-null   int64
 6   Juice         1000 non-null   int64
 7   Cereal        1000 non-null   int64
 8   Yogurt        1000 non-null   int64
 9   Cheese        1000 non-null   int64
 10  Pasta         1000 non-null   int64
 11  Rice          1000 non-null   int64
 12  Chicken       1000 non-null   int64
 13  Beef          1000 non-null   int64
 14  Fish          1000 non-null   int64
 15  Apples        1000 non-null   int64
 16  Bananas       1000 non-null   int64
 17  Oranges       1000 non-null   int64
 18  Grapes        1000 non-null   int64
 19  Strawberries  1000 non-null 

In [None]:
print("Number of NaN values: {}.".format(df.isnull().sum().sum()))

Number of NaN values: 0.


In [None]:
df.shape

(1000, 30)

In [None]:
df.to_excel('data.xlsx', index=False)

In [None]:
# CustomerID is only an identifier, so drop it before mining itemsets.
df.drop(columns=['CustomerID'], inplace=True)

<br>

<a id="build_ara_model"></a>

<h2 align=center>Building Association Rules Analysis Model</h2>

In [None]:
from mlxtend.frequent_patterns import apriori

# Keep every itemset that appears in at least 2% of the baskets.
df1 = apriori(df, min_support=0.02, use_colnames=True)

print(df1)

      support                                itemsets
0       0.308                                (Coffee)
1       0.276                                 (Bread)
2       0.301                                  (Milk)
3       0.312                                  (Eggs)
4       0.302                                (Butter)
...       ...                                     ...
3747    0.020  (Milk, Yogurt, Strawberries, Cucumber)
3748    0.020        (Cheese, Milk, Soap, Toothpaste)
3749    0.020      (Eggs, Butter, Onions, Toothpaste)
3750    0.020   (Potatoes, Broccoli, Juice, Cucumber)
3751    0.021    (Carrots, Bananas, Apples, Cucumber)

[3752 rows x 2 columns]


In [None]:
from mlxtend.frequent_patterns import association_rules

In [None]:
rule = association_rules(df1, metric = "confidence", min_threshold = 0.2)

The strength of an association rule is measured by two metrics: support and confidence.

<p><code>Support</code>: Support indicates how frequently an itemset appears in the dataset. For a rule, it is the proportion of transactions that contain both itemset X and itemset Y; in the formulas here, X ∩ Y denotes the event that a transaction contains both.</p>

$$supp(X \cap Y) = P(X \cap Y)$$
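
As a small illustration of the definition, support can be counted by hand. The sketch below uses five hypothetical toy baskets (not the notebook's data), and `support` is an illustrative helper, not part of mlxtend.

```python
# Hypothetical toy baskets, not taken from the notebook's data.
baskets = [
    {"Bread", "Milk", "Eggs"},
    {"Bread", "Milk"},
    {"Bread"},
    {"Milk", "Eggs"},
    {"Eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

supp_bread = support({"Bread"}, baskets)               # 3 of 5 baskets
supp_bread_milk = support({"Bread", "Milk"}, baskets)  # 2 of 5 baskets
print(supp_bread, supp_bread_milk)
```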

<br>

<p><code>Confidence</code>: Confidence measures the reliability of the association rule. It is the proportion of transactions containing itemset X that also contain itemset Y.</p>

$$conf(X \Rightarrow Y) = P(Y | X) = \frac{supp(X \cap Y)}{supp(X)}$$
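
The formula can be checked against the notebook's own numbers. The sketch below plugs in the supports reported for the Coffee ⇒ Rice rule in the rule table further down; `confidence` is an illustrative helper, not an mlxtend function.

```python
def confidence(supp_xy, supp_x):
    """conf(X => Y) = supp(X and Y) / supp(X)."""
    if supp_x <= 0:
        raise ValueError("supp(X) must be positive")
    return supp_xy / supp_x

supp_coffee = 0.308       # antecedent support of Coffee
supp_coffee_rice = 0.105  # support of Coffee and Rice together

conf_coffee_rice = confidence(supp_coffee_rice, supp_coffee)
print(round(conf_coffee_rice, 6))
```

The result agrees with the confidence of 0.340909 listed for that rule.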

<br>

Additionally, two other metrics are often used in association rules analysis:

<br>

<p><code>Lift</code>: Lift measures the strength of the association rule as the ratio of the observed support to the support expected if itemsets X and Y were independent of each other. A lift greater than 1 indicates a positive association, a lift less than 1 indicates a negative association, and a lift of 1 means the two itemsets appear independent.</p>

$$lift(X \Rightarrow Y) = \frac{supp(X \cap Y)}{supp(X) \cdot supp(Y)}$$
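
A quick numeric check of the lift formula, again using the supports reported for Coffee ⇒ Rice in the rule table further down. `lift` is an illustrative helper written for this sketch.

```python
def lift(supp_xy, supp_x, supp_y):
    """Observed joint support divided by the support expected under independence."""
    return supp_xy / (supp_x * supp_y)

# Coffee and Rice each appear in 30.8% of baskets and together in 10.5%.
lift_coffee_rice = lift(supp_xy=0.105, supp_x=0.308, supp_y=0.308)
print(round(lift_coffee_rice, 4))
```

The value is slightly above 1, matching the lift of about 1.1068 listed for that rule. Since the data is random, a lift this close to 1 is what one would expect.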

<br>




<p><code>Conviction</code>: Conviction measures the degree of implication of the rule. It is the ratio of the frequency with which X would appear without Y if X and Y were independent to the observed frequency of X appearing without Y, that is, of the rule making an incorrect prediction. Higher conviction values indicate stronger implications, and a value of 1 means X and Y appear independent.</p>

$$conv(X \Rightarrow Y) = \frac{1-supp(Y)}{1-conf(X \Rightarrow Y)}$$
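
The conviction formula can be sketched the same way, using the Coffee ⇒ Rice values from the rule table further down. `conviction` is an illustrative helper, and returning infinity for a rule with confidence 1 is a choice made for this sketch.

```python
import math

def conviction(supp_y, conf_xy):
    """(1 - supp(Y)) / (1 - conf(X => Y)); infinite when the rule never fails."""
    if conf_xy >= 1.0:
        return math.inf
    return (1.0 - supp_y) / (1.0 - conf_xy)

supp_rice = 0.308
conf_coffee_rice = 0.105 / 0.308  # supp(Coffee and Rice) / supp(Coffee)

conv_coffee_rice = conviction(supp_rice, conf_coffee_rice)
print(round(conv_coffee_rice, 4))
```

The result matches the conviction of about 1.0499 listed for that rule.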

<br>

Association rules analysis helps uncover hidden patterns and relationships in the data, which can be used for various purposes. For example, in retail, it can be used to make product recommendations, optimize store layouts, plan promotional strategies, and improve inventory management.

By identifying the frequent itemsets and generating meaningful association rules, businesses can gain insights into customer behavior, improve decision-making, and enhance the overall customer experience.

<br>

<small>Source: <a href='https://en.wikipedia.org/wiki/Association_rule_learning'>Wikipedia</a></small>

In [None]:
rule[(rule['confidence'] >= 0.30) & (rule['support'] >= 0.1)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
18,(Coffee),(Rice),0.308,0.308,0.105,0.340909,1.106848,0.010136,1.049931
19,(Rice),(Coffee),0.308,0.308,0.105,0.340909,1.106848,0.010136,1.049931
40,(Coffee),(Carrots),0.308,0.310,0.102,0.331169,1.068287,0.006520,1.031650
41,(Carrots),(Coffee),0.310,0.308,0.102,0.329032,1.068287,0.006520,1.031346
46,(Coffee),(Broccoli),0.308,0.315,0.108,0.350649,1.113173,0.010980,1.054900
...,...,...,...,...,...,...,...,...,...
793,(Cucumber),(Broccoli),0.317,0.315,0.110,0.347003,1.101597,0.010145,1.049010
804,(Toothpaste),(Cucumber),0.309,0.317,0.101,0.326861,1.031107,0.003047,1.014649
805,(Cucumber),(Toothpaste),0.317,0.309,0.101,0.318612,1.031107,0.003047,1.014106
808,(Soap),(Toothpaste),0.286,0.309,0.105,0.367133,1.188132,0.016626,1.091856


In [None]:
# Turn the frozensets into plain comma-separated strings so they can be compared with ==.
rule["antecedents"] = rule["antecedents"].apply(lambda x: ', '.join(list(x))).astype(str)
rule["consequents"] = rule["consequents"].apply(lambda x: ', '.join(list(x))).astype(str)

In [None]:
rule

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,Coffee,Bread,0.308,0.276,0.083,0.269481,0.976379,-0.002008,0.991076
1,Bread,Coffee,0.276,0.308,0.083,0.300725,0.976379,-0.002008,0.989596
2,Coffee,Milk,0.308,0.301,0.080,0.259740,0.862924,-0.012708,0.944263
3,Milk,Coffee,0.301,0.308,0.080,0.265781,0.862924,-0.012708,0.942498
4,Coffee,Eggs,0.308,0.312,0.092,0.298701,0.957376,-0.004096,0.981037
...,...,...,...,...,...,...,...,...,...
10778,"Bananas, Apples, Cucumber",Carrots,0.045,0.310,0.021,0.466667,1.505376,0.007050,1.293750
10779,"Carrots, Bananas","Apples, Cucumber",0.088,0.103,0.021,0.238636,2.316858,0.011936,1.178149
10780,"Bananas, Apples","Carrots, Cucumber",0.098,0.107,0.021,0.214286,2.002670,0.010514,1.136545
10781,"Bananas, Cucumber","Carrots, Apples",0.101,0.107,0.021,0.207921,1.943185,0.010193,1.127412


In [None]:
# Items that can be used as antecedents in the queries below.
items

['Coffee',
 'Bread',
 'Milk',
 'Eggs',
 'Butter',
 'Juice',
 'Cereal',
 'Yogurt',
 'Cheese',
 'Pasta',
 'Rice',
 'Chicken',
 'Beef',
 'Fish',
 'Apples',
 'Bananas',
 'Oranges',
 'Grapes',
 'Strawberries',
 'Tomatoes',
 'Potatoes',
 'Carrots',
 'Onions',
 'Lettuce',
 'Broccoli',
 'Cucumber',
 'Soap',
 'Shampoo',
 'Toothpaste']

In [None]:
rule[(rule['antecedents'] == 'Coffee') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,Coffee,Bread,0.308,0.276,0.083,0.269481,0.976379,-0.002008,0.991076
2,Coffee,Milk,0.308,0.301,0.08,0.25974,0.862924,-0.012708,0.944263
4,Coffee,Eggs,0.308,0.312,0.092,0.298701,0.957376,-0.004096,0.981037
6,Coffee,Butter,0.308,0.302,0.088,0.285714,0.946074,-0.005016,0.9772
8,Coffee,Juice,0.308,0.309,0.089,0.288961,0.935149,-0.006172,0.971817
10,Coffee,Cereal,0.308,0.307,0.082,0.266234,0.867211,-0.012556,0.944442
12,Coffee,Yogurt,0.308,0.3,0.086,0.279221,0.930736,-0.0064,0.971171
15,Coffee,Cheese,0.308,0.284,0.092,0.298701,1.051765,0.004528,1.020963
16,Coffee,Pasta,0.308,0.283,0.09,0.292208,1.032536,0.002836,1.013009
18,Coffee,Rice,0.308,0.308,0.105,0.340909,1.106848,0.010136,1.049931


In [None]:
rule[(rule['antecedents'] == 'Bread, Milk') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction


In [None]:
rule[(rule['antecedents'] == 'Bread, Milk') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]['consequents'][0:3].values

array([], dtype=object)

In [None]:
rule[(rule['antecedents'] == 'Bread, Milk') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]['support'][0:3].values

array([], dtype=float64)
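
The three queries for `'Bread, Milk'` come back empty. A likely cause, stated here as an assumption rather than a verified diagnosis, is that the antecedent strings were joined from frozensets, whose iteration order is not fixed, so the same rule may be stored as `'Milk, Bread'`. A minimal sketch of an order-independent lookup follows. `rules_demo` is a small hypothetical table in the shape returned by `association_rules`, and `match_antecedent` is an illustrative helper.

```python
import pandas as pd

# Hypothetical rules table; the confidence values are made up for illustration.
rules_demo = pd.DataFrame({
    "antecedents": [frozenset({"Milk", "Bread"}), frozenset({"Bread"})],
    "consequents": [frozenset({"Eggs"}), frozenset({"Milk"})],
    "confidence": [0.31, 0.30],
})

def match_antecedent(rules, wanted):
    """Return the rules whose antecedent set equals `wanted`, regardless of item order."""
    target = frozenset(wanted)
    return rules[rules["antecedents"].apply(lambda s: s == target)]

hits = match_antecedent(rules_demo, ["Bread", "Milk"])
print(hits)
```

This comparison has to run while the `antecedents` column still holds frozensets, that is, before it is converted to strings.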

In [None]:
def suggest_item(rule, items, min_threshold=0.2, min_support=0.02, parameter='support', confidence_thold=0.10, support_thold=0.001):
  """Return the consequents suggested for `items` and the chosen metric for each of them.

  The `rule` argument is kept so existing calls still work, but the rules are
  recomputed from the global `df` with the given `min_support` and `min_threshold`.
  """
  from mlxtend.frequent_patterns import apriori, association_rules

  frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)
  rule = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_threshold)

  # Turn the frozensets into plain comma-separated strings so they can be compared with ==.
  rule["antecedents"] = rule["antecedents"].apply(lambda x: ', '.join(list(x))).astype(str)
  rule["consequents"] = rule["consequents"].apply(lambda x: ', '.join(list(x))).astype(str)

  # Build the filter once and reuse it for both return values.
  mask = ((rule['antecedents'] == items) &
          (rule['confidence'] >= confidence_thold) &
          (rule['support'] >= support_thold))

  suggestions = rule.loc[mask, 'consequents'].values
  parameters = rule.loc[mask, parameter].values

  return suggestions, parameters

In [None]:
item = 'Bread'
suggested_items, parameter = suggest_item(rule, item, parameter='confidence')
for i in range(len(suggested_items)):
  print(f'I recommend to buy you {suggested_items[i]} with {parameter[i]} value with {item}.')

I recommend to buy you Coffee with 0.3007246376811594 value with Bread.
I recommend to buy you Milk with 0.30434782608695654 value with Bread.
I recommend to buy you Eggs with 0.30434782608695654 value with Bread.
I recommend to buy you Butter with 0.3188405797101449 value with Bread.
I recommend to buy you Juice with 0.31159420289855067 value with Bread.
I recommend to buy you Cereal with 0.32246376811594196 value with Bread.
I recommend to buy you Yogurt with 0.26449275362318836 value with Bread.
I recommend to buy you Cheese with 0.3079710144927536 value with Bread.
I recommend to buy you Pasta with 0.2681159420289855 value with Bread.
I recommend to buy you Rice with 0.3079710144927536 value with Bread.
I recommend to buy you Chicken with 0.27536231884057966 value with Bread.
I recommend to buy you Beef with 0.30434782608695654 value with Bread.
I recommend to buy you Fish with 0.3007246376811594 value with Bread.
I recommend to buy you Apples with 0.31159420289855067 value with Br

<p>The best recommendation for the <code>Bread</code> item, that is, the suggestion with the highest value of the chosen metric (confidence here), is shown below.</p>

In [None]:
list_zip = zip(suggested_items, parameter)
zipped_list = list(list_zip)
df2 = pd.DataFrame(zipped_list, columns = ['item','parameter'])
best_parameter = df2['parameter'].max()
best_suggestion = df2['item'][df2['parameter'].argmax()]
print(f'Best parameter: {best_parameter}')
print(f'Best suggestion: {best_suggestion}')

Best parameter: 0.32246376811594196
Best suggestion: Cereal
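
If more than one suggestion is wanted, the same table can be ranked instead of reduced to its maximum. The sketch below is a hedged illustration: `suggestions_demo` holds three (item, confidence) pairs copied from the Bread output above and rounded to four decimals, and `top` is a hypothetical name.

```python
import pandas as pd

# Three (item, confidence) pairs from the Bread output above, rounded to 4 decimals.
suggestions_demo = pd.DataFrame({
    "item": ["Butter", "Cereal", "Yogurt"],
    "confidence": [0.3188, 0.3225, 0.2645],
})

# Keep the two suggestions with the highest confidence, best first.
top = suggestions_demo.nlargest(2, "confidence").reset_index(drop=True)
print(top)
```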


<br>

<h1>Contact Me</h1>
<p>If you have any comments or questions, please contact me:</p>

<ul>
  <li>Twitter: <a href="https://twitter.com/Doguilmak">Doguilmak</a></li>
  <li>Mail address: doguilmak@gmail.com</li>
</ul>

In [None]:
from datetime import datetime
print(f"Changes have been made to the project on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Changes have been made to the project on 2023-06-13 15:24:17
