<h1 align=center><font size = 5>Offer Food with Association Rules Analysis</font></h1>

<br>

<img src="https://images.unsplash.com/photo-1542838132-92c53300491e?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=774&q=80" height=480 width=950 alt="market">

<small>Picture Source: <a href="https://unsplash.com/photos/D6Tu_L3chLE">Unsplash</a></small>

<br>

<h2>Data Set Information:</h2>

<p>The dataset has no real-world equivalent; it is completely randomly generated. To make real recommendations, the analysis should therefore be repeated on a real data set. If you have a suitable data set, you can run this notebook on it (I hope you will share the results with me).</p>

<br>

<h2>Keywords</h2> 

<ul>
	<li>Market</li>
	<li>Machine Learning</li>
	<li>Association Rules Analysis</li>
	<li>Apriori</li>
	<li>Association Rule Mining</li>
</ul> 

<br>

<h1>Objective for this Notebook</h1>

<p>In this project, a recommendation model is built with <i>association rule analysis</i>, based on the products that customers prefer to buy together.</p>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ul>
<li><a href="#importing_libraries">Importing Libraries</a></li>
<li><a href="#data_preprocessing">Data Preprocessing</a></li>
<li><a href="#build_ara_model">Building Association Rules Analysis Model</a></li>
</ul>
<br>

<p></p>
Estimated Time Needed: <strong>20 min</strong>
</div>

<a id="importing_libraries"></a>

<h2 align=center>Importing Libraries</h2>

In [1]:
import pandas as pd
import numpy as np

<br>

<a id="data_preprocessing"></a>

<h2 align=center>Data Preprocessing</h2>

In [2]:
df = pd.read_csv('pocket.csv')

In [3]:
df.head()

Unnamed: 0,BREAD,CHEESE,COFFEE,EGG,MILK,TEA,BACON,CHOCOLATE,WATER
0,0,0,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,1
2,0,1,0,0,0,0,1,0,0
3,0,0,1,0,0,0,1,0,0
4,1,1,0,1,1,0,1,0,0


In [4]:
df.tail()

Unnamed: 0,BREAD,CHEESE,COFFEE,EGG,MILK,TEA,BACON,CHOCOLATE,WATER
1994,1,0,0,0,0,0,0,0,0
1995,1,1,0,1,0,0,0,0,1
1996,0,0,0,0,0,1,0,0,1
1997,0,0,0,1,1,0,1,1,1
1998,1,0,0,0,0,1,0,0,1


In [5]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
BREAD,1999.0,0.335668,0.472342,0.0,0.0,0.0,1.0,1.0
CHEESE,1999.0,0.342671,0.474721,0.0,0.0,0.0,1.0,1.0
COFFEE,1999.0,0.322161,0.467421,0.0,0.0,0.0,1.0,1.0
EGG,1999.0,0.321161,0.467039,0.0,0.0,0.0,1.0,1.0
MILK,1999.0,0.338169,0.473205,0.0,0.0,0.0,1.0,1.0
TEA,1999.0,0.34017,0.473885,0.0,0.0,0.0,1.0,1.0
BACON,1999.0,0.316658,0.465289,0.0,0.0,0.0,1.0,1.0
CHOCOLATE,1999.0,0.326663,0.46911,0.0,0.0,0.0,1.0,1.0
WATER,1999.0,0.333167,0.471463,0.0,0.0,0.0,1.0,1.0


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1999 entries, 0 to 1998
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   BREAD      1999 non-null   int64
 1   CHEESE     1999 non-null   int64
 2   COFFEE     1999 non-null   int64
 3   EGG        1999 non-null   int64
 4   MILK       1999 non-null   int64
 5   TEA        1999 non-null   int64
 6   BACON      1999 non-null   int64
 7   CHOCOLATE  1999 non-null   int64
 8   WATER      1999 non-null   int64
dtypes: int64(9)
memory usage: 140.7 KB


In [7]:
print("Number of NaN values: {}.".format(df.isnull().sum().sum()))

Number of NaN values: 0.


In [8]:
df.shape

(1999, 9)
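
<p>Beyond checking for missing values, it can help to confirm that every column is strictly binary before mining itemsets. The sketch below assumes a small stand-in frame (<code>sample</code>, a hypothetical name) built from the first five rows shown by <code>df.head()</code>. Converting to <code>bool</code> is optional here; some mlxtend releases warn when <code>apriori</code> receives non-boolean columns, so the conversion is shown as one possible precaution.</p>

```python
import pandas as pd

# Hypothetical stand-in for pocket.csv: three columns from the df.head() output.
sample = pd.DataFrame({
    "BREAD":  [0, 0, 0, 0, 1],
    "CHEESE": [0, 1, 1, 0, 1],
    "COFFEE": [0, 1, 0, 1, 0],
})

# True only if every cell is 0 or 1.
is_binary = bool(sample.isin([0, 1]).all().all())

# Boolean version of the same table (0 -> False, 1 -> True).
sample_bool = sample.astype(bool)

print(is_binary)
print(sample_bool.dtypes)
```

<p>On the full data set the same two lines would be applied to <code>df</code> instead of <code>sample</code>.</p>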

<br>

<a id="build_ara_model"></a>

<h2 align=center>Building Association Rules Analysis Model</h2>

In [9]:
from mlxtend.frequent_patterns import apriori
df1 = apriori(df, min_support=0.02, use_colnames = True)

print(df1)

      support                   itemsets
0    0.335668                    (BREAD)
1    0.342671                   (CHEESE)
2    0.322161                   (COFFEE)
3    0.321161                      (EGG)
4    0.338169                     (MILK)
..        ...                        ...
124  0.043522   (CHOCOLATE, WATER, MILK)
125  0.035518    (BACON, CHOCOLATE, TEA)
126  0.035518        (BACON, WATER, TEA)
127  0.040520    (WATER, CHOCOLATE, TEA)
128  0.039020  (BACON, WATER, CHOCOLATE)

[129 rows x 2 columns]
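
<p>To make the <code>support</code> column concrete, here is a brute-force sketch of what a frequent-itemset search computes. It is not mlxtend's implementation; the function names are hypothetical, and the five transactions are the rows shown by <code>df.head()</code>, so the numbers differ from the full-data output above.</p>

```python
from itertools import combinations

# The five baskets from df.head(); row 0 is an empty basket.
transactions = [
    set(),
    {"CHEESE", "COFFEE", "WATER"},
    {"CHEESE", "BACON"},
    {"COFFEE", "BACON"},
    {"BREAD", "CHEESE", "EGG", "MILK", "BACON"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_len=2):
    """Enumerate all itemsets up to max_len and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

freq = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in freq.items():
    print(sorted(itemset), s)
```

<p>Apriori reaches the same result more efficiently by skipping any candidate whose subsets are already known to be infrequent; the exhaustive loop here is only meant to show the definition.</p>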


In [10]:
from mlxtend.frequent_patterns import association_rules

In [11]:
rule = association_rules(df1, metric = "confidence", min_threshold = 0.2)

<p><code>Support</code>: Support indicates how frequently the itemset appears in the dataset, i.e. the fraction of transactions that contain both X and Y.</p>

$$supp(X \Rightarrow Y) = P(X \cap Y)$$

<br>

<p><code>Confidence</code>: Confidence is the percentage of all transactions satisfying X that also satisfy Y.</p>

$$conf(X \Rightarrow Y) = P(Y | X) = \frac{supp(X \cap Y)}{supp(X)}$$

<br>

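<p><code>Lift</code>: Lift is the ratio of the observed support to the support expected if X and Y were independent. A lift close to 1 means that X and Y occur together about as often as chance would predict.</p>

$$lift(X \Rightarrow Y) = \frac{supp(X \cap Y)}{supp(X) \cdot supp(Y)}$$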

<br>

<p><code>Conviction</code>: Conviction is the ratio of the frequency with which X would be expected to appear without Y if X and Y were independent to the observed frequency of X appearing without Y.</p>

$$conv(X \Rightarrow Y) = \frac{1-supp(Y)}{1-conf(X \Rightarrow Y)}$$

<br>

Source: <a href='https://en.wikipedia.org/wiki/Association_rule_learning'>Wikipedia</a>
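
<p>As a worked check of the formulas above, the sketch below recomputes the metrics for the rule BREAD &rArr; CHEESE from its three support values, copied from the rule table. The <code>leverage</code> column that mlxtend reports is not defined in the text; it is computed here as supp(X &cap; Y) minus supp(X) &middot; supp(Y). The variable names are illustrative only.</p>

```python
# Values copied from the BREAD -> CHEESE row of the rule table.
supp_x = 0.335668   # antecedent support, supp(X)
supp_y = 0.342671   # consequent support, supp(Y)
supp_xy = 0.111556  # joint support of X and Y

confidence = supp_xy / supp_x
lift = supp_xy / (supp_x * supp_y)
leverage = supp_xy - supp_x * supp_y
conviction = (1 - supp_y) / (1 - confidence)

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.4f}")
print(f"conviction = {conviction:.4f}")
```

<p>The results should agree with the table row to about four decimal places; small differences come from the rounded inputs. A lift just below 1 and a slightly negative leverage are what one would expect from randomly generated data.</p>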

In [12]:
rule[(rule['confidence'] >= 0.30) & (rule['support'] >= 0.1)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(BREAD),(CHEESE),0.335668,0.342671,0.111556,0.332340,0.969850,-0.003468,0.984526
1,(CHEESE),(BREAD),0.342671,0.335668,0.111556,0.325547,0.969850,-0.003468,0.984995
2,(BREAD),(COFFEE),0.335668,0.322161,0.105553,0.314456,0.976083,-0.002586,0.988761
3,(COFFEE),(BREAD),0.322161,0.335668,0.105553,0.327640,0.976083,-0.002586,0.988060
4,(BREAD),(EGG),0.335668,0.321161,0.110055,0.327869,1.020888,0.002252,1.009981
...,...,...,...,...,...,...,...,...,...
67,(CHOCOLATE),(BACON),0.326663,0.316658,0.103052,0.315467,0.996238,-0.000389,0.998260
68,(BACON),(WATER),0.316658,0.333167,0.106053,0.334913,1.005242,0.000553,1.002626
69,(WATER),(BACON),0.333167,0.316658,0.106053,0.318318,1.005242,0.000553,1.002435
70,(WATER),(CHOCOLATE),0.333167,0.326663,0.110555,0.331832,1.015822,0.001722,1.007735


In [13]:
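# Convert each frozenset to a comma-separated string so rules can be filtered by plain string comparison.
# Note: frozensets are unordered, so the item order inside each string is not guaranteed.
rule["antecedents"] = rule["antecedents"].apply(lambda x: ', '.join(list(x))).astype(str)
rule["consequents"] = rule["consequents"].apply(lambda x: ', '.join(list(x))).astype(str)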

In [14]:
rule

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,BREAD,CHEESE,0.335668,0.342671,0.111556,0.332340,0.969850,-0.003468,0.984526
1,CHEESE,BREAD,0.342671,0.335668,0.111556,0.325547,0.969850,-0.003468,0.984995
2,BREAD,COFFEE,0.335668,0.322161,0.105553,0.314456,0.976083,-0.002586,0.988761
3,COFFEE,BREAD,0.322161,0.335668,0.105553,0.327640,0.976083,-0.002586,0.988060
4,BREAD,EGG,0.335668,0.321161,0.110055,0.327869,1.020888,0.002252,1.009981
...,...,...,...,...,...,...,...,...,...
319,"WATER, TEA",CHOCOLATE,0.120060,0.326663,0.040520,0.337500,1.033174,0.001301,1.016357
320,"CHOCOLATE, TEA",WATER,0.115058,0.333167,0.040520,0.352174,1.057051,0.002187,1.029340
321,"BACON, WATER",CHOCOLATE,0.106053,0.326663,0.039020,0.367925,1.126311,0.004376,1.065279
322,"BACON, CHOCOLATE",WATER,0.103052,0.333167,0.039020,0.378641,1.136491,0.004686,1.073185


In [15]:
rule[(rule['antecedents'] == 'BREAD') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,BREAD,CHEESE,0.335668,0.342671,0.111556,0.33234,0.96985,-0.003468,0.984526
2,BREAD,COFFEE,0.335668,0.322161,0.105553,0.314456,0.976083,-0.002586,0.988761
4,BREAD,EGG,0.335668,0.321161,0.110055,0.327869,1.020888,0.002252,1.009981
6,BREAD,MILK,0.335668,0.338169,0.116058,0.345753,1.022425,0.002546,1.011591
8,BREAD,TEA,0.335668,0.34017,0.116558,0.347243,1.020792,0.002374,1.010835
10,BREAD,BACON,0.335668,0.316658,0.109055,0.324888,1.02599,0.002763,1.01219
12,BREAD,CHOCOLATE,0.335668,0.326663,0.107054,0.318927,0.976317,-0.002597,0.988641
14,BREAD,WATER,0.335668,0.333167,0.113557,0.338301,1.015411,0.001723,1.00776


In [16]:
rule[(rule['antecedents'] == 'WATER, CHOCOLATE') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
155,"WATER, CHOCOLATE",BREAD,0.110555,0.335668,0.037019,0.334842,0.997539,-9.1e-05,0.998758
217,"WATER, CHOCOLATE",CHEESE,0.110555,0.342671,0.037519,0.339367,0.990356,-0.000365,0.994997
263,"WATER, CHOCOLATE",COFFEE,0.110555,0.322161,0.034517,0.312217,0.969134,-0.001099,0.985542
291,"WATER, CHOCOLATE",EGG,0.110555,0.321161,0.038519,0.348416,1.084866,0.003013,1.04183
309,"WATER, CHOCOLATE",MILK,0.110555,0.338169,0.043522,0.393665,1.164107,0.006135,1.091527
318,"WATER, CHOCOLATE",TEA,0.110555,0.34017,0.04052,0.366516,1.077449,0.002913,1.041589
323,"WATER, CHOCOLATE",BACON,0.110555,0.316658,0.03902,0.352941,1.11458,0.004011,1.056073
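
<p>The filter above matches the antecedent by its joined string, so <code>'WATER, CHOCOLATE'</code> and <code>'CHOCOLATE, WATER'</code> count as different keys, and which one appears depends on how the underlying frozenset happens to iterate. One possible way around this, sketched below with hypothetical helper names and three rows copied from the rule tables, is to compare sets directly or to sort the items before joining.</p>

```python
# Three rules copied from the tables, kept as frozensets instead of strings.
rules = [
    {"antecedents": frozenset({"WATER", "CHOCOLATE"}),
     "consequents": frozenset({"MILK"}), "confidence": 0.393665},
    {"antecedents": frozenset({"WATER", "CHOCOLATE"}),
     "consequents": frozenset({"EGG"}), "confidence": 0.348416},
    {"antecedents": frozenset({"BREAD"}),
     "consequents": frozenset({"TEA"}), "confidence": 0.347243},
]

def canonical(items):
    """Join items in sorted order so the same set always yields the same string."""
    return ", ".join(sorted(items))

def rules_for(basket, rules):
    """Return the rules whose antecedent equals the basket, ignoring item order."""
    key = frozenset(basket)
    return [r for r in rules if r["antecedents"] == key]

matches_a = rules_for(["WATER", "CHOCOLATE"], rules)
matches_b = rules_for(["CHOCOLATE", "WATER"], rules)

print(canonical({"WATER", "CHOCOLATE"}))
print([sorted(r["consequents"]) for r in matches_a])
```

<p>Both lookups return the same two rules. If string keys are preferred, applying <code>canonical</code> to both the rule column and the query would give the same order-independent behavior.</p>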


In [17]:
rule[(rule['antecedents'] == 'WATER, CHOCOLATE') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]['consequents'][0:3].values

array(['BREAD', 'CHEESE', 'COFFEE'], dtype=object)

In [18]:
rule[(rule['antecedents'] == 'WATER, CHOCOLATE') & (rule['confidence'] >= 0.10) & (rule['support'] >= 0.001)]['support'][0:3].values

array([0.03701851, 0.03751876, 0.03451726])

In [19]:
def suggest_item(rule, items, min_threshold=0.2, min_support=0.02, parameter='support', confidence_thold=0.10, support_thold=0.001):
  from mlxtend.frequent_patterns import apriori, association_rules

  # Note: the `rule` argument is overwritten below. The rules are rebuilt from
  # the global `df` so that min_support and min_threshold take effect.
  df1 = apriori(df, min_support=min_support, use_colnames=True)
  rule = association_rules(df1, metric="confidence", min_threshold=min_threshold)

  # Convert the frozenset columns to comma-separated strings for matching.
  rule["antecedents"] = rule["antecedents"].apply(lambda x: ', '.join(list(x))).astype(str)
  rule["consequents"] = rule["consequents"].apply(lambda x: ', '.join(list(x))).astype(str)

  # Build the row filter once and reuse it for both return values.
  mask = (rule['antecedents'] == items) & \
         (rule['confidence'] >= confidence_thold) & \
         (rule['support'] >= support_thold)

  suggestions = rule.loc[mask, 'consequents'].values
  parameters = rule.loc[mask, parameter].values

  return suggestions, parameters

In [20]:
item = 'BREAD'
metric = 'confidence'
suggested_items, parameter = suggest_item(rule, item, parameter=metric)
for suggestion, value in zip(suggested_items, parameter):
  print(f'With {item}, I recommend buying {suggestion} ({metric}: {value}).')

With BREAD, I recommend buying CHEESE (confidence: 0.3323397913561848).
With BREAD, I recommend buying COFFEE (confidence: 0.3144560357675112).
With BREAD, I recommend buying EGG (confidence: 0.3278688524590164).
With BREAD, I recommend buying MILK (confidence: 0.34575260804769).
With BREAD, I recommend buying TEA (confidence: 0.3472429210134128).
With BREAD, I recommend buying BACON (confidence: 0.32488822652757077).
With BREAD, I recommend buying CHOCOLATE (confidence: 0.31892697466467956).
With BREAD, I recommend buying WATER (confidence: 0.338301043219076).


In [21]:
# Pair each suggested item with its metric value and pick the row with the highest value.
df2 = pd.DataFrame({'item': suggested_items, 'parameter': parameter})
best_parameter = df2['parameter'].max()
best_suggestion = df2.loc[df2['parameter'].idxmax(), 'item']
print(f'Best parameter: {best_parameter}')
print(f'Best suggestion: {best_suggestion}')

Best parameter: 0.3472429210134128
Best suggestion: TEA
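
<p>Confidence equals lift times the consequent's support, so on its own it tends to favor items that are popular overall. As a complementary view, the sketch below ranks the BREAD rules by lift as well. The values are copied from the BREAD rows of the rule table, and the variable names are illustrative.</p>

```python
# (consequent, confidence, lift) copied from the BREAD rows of the rule table.
bread_rules = [
    ("CHEESE",    0.332340, 0.969850),
    ("COFFEE",    0.314456, 0.976083),
    ("EGG",       0.327869, 1.020888),
    ("MILK",      0.345753, 1.022425),
    ("TEA",       0.347243, 1.020792),
    ("BACON",     0.324888, 1.025990),
    ("CHOCOLATE", 0.318927, 0.976317),
    ("WATER",     0.338301, 1.015411),
]

# Highest confidence: the consequent most often bought together with BREAD.
best_by_confidence = max(bread_rules, key=lambda r: r[1])

# Highest lift: the consequent whose co-occurrence with BREAD most exceeds chance.
best_by_lift = max(bread_rules, key=lambda r: r[2])

# Consequents that co-occur with BREAD more often than independence would predict.
above_chance = [name for name, _, lift in bread_rules if lift > 1.0]

print(best_by_confidence[0], best_by_lift[0])
print(above_chance)
```

<p>Here the two rankings disagree (TEA by confidence, BACON by lift), and every lift is within a few percent of 1, which is consistent with the data being randomly generated.</p>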


<br>

<h1>Contact Me</h1>
<p>If you have any comments or questions, please contact me:</p>

<ul>
  <li>Twitter: <a href="https://twitter.com/Doguilmak">Doguilmak</a></li>
  <li>Mail address: doguilmak@gmail.com</li>
</ul>

In [22]:
from datetime import datetime
print(f"Changes have been made to the project on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Changes have been made to the project on 2023-01-19 16:53:21
