**Start by loading in the required libraries: `Pandas`, `NumPy`, and `Mlxtend.frequent_patterns`. Then import the `crique_orders.csv` dataset. Perform any preprocessing needs (dummy coding transactions, dropping any unnecessary columns, etc.)**


In [13]:
import pandas as pd
import numpy as np
from google.colab import data_table
from mlxtend import frequent_patterns
from mlxtend.frequent_patterns import apriori, association_rules

In [5]:
df = pd.read_csv('crique_orders.csv')
df.head()

Unnamed: 0,OrderID,Order_Items
0,22914,SFry
1,22915,"BigCriq, LFry, MSoda"
2,22916,"Cito, Nacho, MSoda, TPop"
3,22917,"CriqBurg, SFry, LSoda, TPop"
4,22918,"CriqBurg, LFry, FSc, LSoda"


### Create Dummy Variables for all Order Items

In [6]:
df = df.dropna()
data = list(df['Order_Items'].apply(lambda x:x.split(","))) # split each order on commas: one list of items per order (items after the first keep a leading space)

In [7]:
from mlxtend.preprocessing import TransactionEncoder

a = TransactionEncoder()

a_data = a.fit(data).transform(data)
df = pd.DataFrame(a_data, columns=a.columns_)
df

Unnamed: 0,CarbOff,FSc,Hop,LFry,LSoda,MSoda,Nacho,SFry,SSoda,TPop,...,Cito,CriqBurg,Hop.1,LFry.1,MSoda.1,Nacho.1,SCito,SFry.1,SSoda.1,TPop.1
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
1,False,False,False,True,False,True,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,True,True,False,False,True,...,True,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,True,False,True,...,False,True,False,False,False,False,False,False,False,False
4,False,True,False,True,True,False,False,False,False,False,...,False,True,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1858,True,True,False,True,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
1859,False,False,False,False,False,False,False,True,True,False,...,False,False,False,False,False,False,False,False,False,False
1860,False,False,False,False,True,False,False,True,False,False,...,False,True,False,False,False,False,False,False,False,False
1861,True,False,False,False,False,False,True,False,True,False,...,False,False,False,False,False,False,False,False,False,False


### Convert T/F to 1/0 and clean up dataframe

In [8]:
df = df.astype(int) # convert True/False to 1/0
df

Unnamed: 0,CarbOff,FSc,Hop,LFry,LSoda,MSoda,Nacho,SFry,SSoda,TPop,...,Cito,CriqBurg,Hop.1,LFry.1,MSoda.1,Nacho.1,SCito,SFry.1,SSoda.1,TPop.1
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,0,0,0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,1,1,0,0,1,...,1,0,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,1,0,1,...,0,1,0,0,0,0,0,0,0,0
4,0,1,0,1,1,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1858,1,1,0,1,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1859,0,0,0,0,0,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,0
1860,0,0,0,0,1,0,0,1,0,0,...,0,1,0,0,0,0,0,0,0,0
1861,1,0,0,0,0,0,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [9]:
df.columns = df.columns.str.replace(' ', '') # remove spaces from column names so that ' LFry' and 'LFry' get the same name
df = df.loc[:,~df.columns.duplicated()] # keep the first column under each name and drop the later duplicates
df

Unnamed: 0,CarbOff,FSc,Hop,LFry,LSoda,MSoda,Nacho,SFry,SSoda,TPop,BigCriq,CBChs,Cito,CriqBurg,SCito
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0
2,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0
3,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0
4,0,1,0,1,1,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1858,1,1,0,1,1,0,0,0,0,0,0,0,1,0,0
1859,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0
1860,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0
1861,1,0,0,0,0,0,1,0,1,0,0,1,0,0,0
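
The duplicated columns above arise because splitting on `","` leaves a leading space on every item after the first, and `columns.duplicated()` then keeps only the first column under each name instead of merging the pair. A minimal sketch of one alternative is to strip each item at split time so that no duplicates are created. `parse_order` is a hypothetical helper, the sample orders are copied from the `df.head()` output above, and counts produced this way may differ from the tables in this notebook.

```python
def parse_order(order_items):
    """Split a comma-separated order string and strip whitespace around each item."""
    return [item.strip() for item in order_items.split(",") if item.strip()]

# Sample orders copied from the df.head() output
sample_orders = ["SFry", "BigCriq, LFry, MSoda", "CriqBurg, SFry, LSoda, TPop"]

# One clean list of items per order
transactions = [parse_order(order) for order in sample_orders]

# Unique item names; no name carries a leading space
items = sorted({item for order in transactions for item in order})

print(transactions[1])
print(items)
```

The resulting list of lists can be passed to `TransactionEncoder` in the same way as `data`.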


## A

**Generate frequent itemsets with a minimum support of 0.001. Show the table.**


In [15]:
freq_is = apriori(df, min_support = 0.001, use_colnames = True)
data_table.enable_dataframe_formatter()
freq_is

Unnamed: 0,support,itemsets
0,0.234568,(CarbOff)
1,0.107890,(FSc)
2,0.028986,(Hop)
3,0.207193,(LFry)
4,0.122920,(LSoda)
...,...,...
451,0.001074,"(FSc, CBChs, TPop, LSoda, SFry)"
452,0.001610,"(FSc, CriqBurg, TPop, LSoda, SFry)"
453,0.001074,"(FSc, CriqBurg, MSoda, TPop, SFry)"
454,0.001074,"(SSoda, FSc, CBChs, TPop, SFry)"


## B

**What is the second most frequently purchased combination of two or more items? In what percentage of orders does this combination occur? *Hint: sort support in descending order.***




{CriqBurg, SFry}, with a support of .12077 (about 12.08% of orders)
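
One way to follow the hint, sketched under the assumption that the table has the `support` and `itemsets` columns returned by `apriori`. The values in `toy_is` are hypothetical and only stand in for `freq_is`.

```python
import pandas as pd

# Hypothetical stand-in for the frequent itemsets table (not the notebook's real values)
toy_is = pd.DataFrame({
    "support": [0.30, 0.25, 0.12, 0.10, 0.05],
    "itemsets": [
        frozenset({"A"}),
        frozenset({"B"}),
        frozenset({"A", "B"}),
        frozenset({"A", "C"}),
        frozenset({"A", "B", "C"}),
    ],
})

# Number of items in each itemset
toy_is["length"] = toy_is["itemsets"].apply(len)

# Keep combinations of two or more items, most frequent first
combos = toy_is[toy_is["length"] >= 2].sort_values("support", ascending=False)
print(combos)
```

Changing the filter to `>= 3` gives the ranking for combinations of three or more items.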

## C 

**What is the most frequently purchased combination of three or more items? In what percentage of orders does this combination appear?** 




{SSoda, Cito, Nacho}, with a support of .06173 (about 6.17% of orders)

## D

**Confirm the support of this itemset (answer to 1C) by writing a formula to calculate support:**

In [22]:
sc = len(df[(df['SSoda'] == 1) & (df['Cito'] == 1) & (df['Nacho'] == 1)]) # support count: orders containing all three items
support = sc/len(df) # divide by the total number of orders
round(support, 5)

0.06173
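
The same calculation can be written for an arbitrary itemset. A minimal sketch on hypothetical toy transactions follows; `support_of` is an illustrative helper, not part of `mlxtend`.

```python
def support_of(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Hypothetical orders, each a set of item codes
toy_orders = [
    {"SSoda", "Cito", "Nacho"},
    {"SSoda", "Cito"},
    {"Nacho"},
    {"SSoda", "Cito", "Nacho", "TPop"},
]

# 2 of the 4 orders contain all three items
print(support_of(toy_orders, {"SSoda", "Cito", "Nacho"}))
```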

## E

**In what percentage of orders were CriqBurgers purchased?**

.31723 (about 31.72% of orders)

## F

**Apart from looking up the support of CriqBurgers in the frequent itemsets table, how could you have determined the answer to 1E using descriptive statistics? Show below:**



In [25]:
df.describe()

Unnamed: 0,CarbOff,FSc,Hop,LFry,LSoda,MSoda,Nacho,SFry,SSoda,TPop,BigCriq,CBChs,Cito,CriqBurg,SCito
count,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0,1863.0
mean,0.234568,0.10789,0.028986,0.207193,0.12292,0.239936,0.294149,0.213097,0.333333,0.195921,0.094471,0.149222,0.243693,0.31723,0.101449
std,0.423842,0.310325,0.167811,0.405404,0.328434,0.427159,0.455782,0.409606,0.471531,0.397014,0.292562,0.356403,0.429425,0.465523,0.302004
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
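
The relevant statistic in the output above is the `mean` row: for a column coded 0/1, the mean equals the share of orders containing that item, so the `CriqBurg` mean of 0.31723 matches its support. A small sketch with a hypothetical toy column:

```python
import pandas as pd

# Hypothetical 0/1 column: 1 means the order contained the item
toy = pd.DataFrame({"CriqBurg": [1, 0, 0, 1, 0]})

share = toy["CriqBurg"].mean()               # mean of a 0/1 column
by_count = toy["CriqBurg"].sum() / len(toy)  # orders with the item / all orders

print(share, by_count)
```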


# Q2

**Conduct an association rules analysis with a minimum confidence level of 0.001. Change the support of your frequent itemsets to 0.01.**


In [31]:
freq_is = apriori(df, min_support = 0.01, use_colnames = True)
df_ar = association_rules(freq_is, metric = "confidence", min_threshold = 0.001) # min_threshold sets the minimum confidence
df_ar

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(FSc),(CarbOff),0.107890,0.234568,0.019860,0.184080,0.784760,-0.005447,0.938121
1,(CarbOff),(FSc),0.234568,0.107890,0.019860,0.084668,0.784760,-0.005447,0.974630
2,(CarbOff),(LFry),0.234568,0.207193,0.054750,0.233410,1.126534,0.006150,1.034199
3,(LFry),(CarbOff),0.207193,0.234568,0.054750,0.264249,1.126534,0.006150,1.040341
4,(LSoda),(CarbOff),0.122920,0.234568,0.026302,0.213974,0.912204,-0.002531,0.973800
...,...,...,...,...,...,...,...,...,...
549,"(FSc, SFry)","(SSoda, CriqBurg)",0.045625,0.107354,0.010199,0.223529,2.082176,0.005301,1.149620
550,(SSoda),"(CriqBurg, FSc, SFry)",0.333333,0.028449,0.010199,0.030596,1.075472,0.000716,1.002215
551,(CriqBurg),"(SSoda, FSc, SFry)",0.317230,0.017177,0.010199,0.032149,1.871669,0.004750,1.015470
552,(FSc),"(SSoda, CriqBurg, SFry)",0.107890,0.045625,0.010199,0.094527,2.071817,0.005276,1.054007


# Q3

**Report the confidence values for the following prompts:**

## A
**List the rule and the confidence value for the three rules with the highest confidence using the following notation {itemset 1} → {itemset 2} to indicate the rules:**



{FSc, Cito} → {LFry}, 1.0

{SSoda, SCito} → {Nacho}, .73214

{SSoda, SCito} → {Nacho}, .72
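
Rankings like this can be read off by sorting the rules table. The sketch below assumes the column names produced by `association_rules`. The rows in `toy_rules` are hypothetical stand-ins for `df_ar`, and `as_rule` is an illustrative helper.

```python
import pandas as pd

# Hypothetical stand-in for the association rules table
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"A"}), frozenset({"C"}), frozenset({"A"})],
    "confidence": [0.40, 0.55, 0.90, 0.20],
    "lift": [1.2, 1.2, 3.0, 0.8],
})

def as_rule(row):
    """Format one row in {itemset 1} → {itemset 2} notation."""
    left = ", ".join(sorted(row["antecedents"]))
    right = ", ".join(sorted(row["consequents"]))
    return "{" + left + "} → {" + right + "}"

# Three rules with the highest confidence
top3 = toy_rules.nlargest(3, "confidence")
labels = [as_rule(row) for _, row in top3.iterrows()]

for label, conf in zip(labels, top3["confidence"]):
    print(label, conf)
```

Passing `"lift"` instead of `"confidence"` ranks the rules by lift.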

## B

**Report the confidence for the following rules:**

**{Cito, LFry} → {MSoda}** 

**{Cito} → {SSoda, Nacho}**  

{Cito, LFry} → {MSoda}, .22916

{Cito} → {SSoda, Nacho}, .2533
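
A specific rule can be looked up by matching both sides of the rule. This is a sketch assuming the `antecedents` and `consequents` columns hold frozensets, as in the `association_rules` output. `find_rule` is a hypothetical helper and the rows are toy values.

```python
import pandas as pd

# Hypothetical stand-in for the association rules table
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"A", "B"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"C"}), frozenset({"A", "B"})],
    "confidence": [0.40, 0.90, 0.30],
})

def find_rule(rules, antecedent, consequent):
    """Return the rows whose antecedent and consequent match the given itemsets."""
    left = rules["antecedents"].apply(lambda s: s == frozenset(antecedent))
    right = rules["consequents"].apply(lambda s: s == frozenset(consequent))
    return rules[left & right]

match = find_rule(toy_rules, {"A", "B"}, {"C"})
print(match)
```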

## C

**Replicate at least one of these results by writing the calculation for confidence with a formula below:**


In [38]:
sc_x = len(df[(df['Cito'] == 1) & (df['LFry'] == 1)]) #support count of antecedent with two items in itemset
sc_xy = len(df[(df['Cito'] == 1) & (df['LFry'] == 1) & (df['MSoda'] == 1)]) #support count of antecedent and consequent together
confidence = sc_xy/sc_x #divide support count of antecedent and consequent by the support count of the antecedent
confidence

0.22916666666666666

## D

**Interpret the meaning of these confidence values in general. Then specifically describe what one of the confidence values reported in 3b indicates about the pair of itemsets included in its rule.**


Confidence measures the conditional probability that the consequent occurs given that the antecedent occurs. In 3C I calculated this for {Cito, LFry} → {MSoda}: the value of .22916 means that about 22.9% of the orders containing both Cito and LFry also include MSoda.
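
Written as a formula, using the standard definition with $X$ as the antecedent and $Y$ as the consequent (symbols introduced here for illustration):

$$\text{confidence}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)} = P(Y \mid X)$$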

# Q4 

**Report on the lift values generated in the association rules table, specifically:**

## A

**Which three rules have the highest lift values? Write the rules in proper notation with their corresponding lift value.**



{FSc, Cito} → {LFry}, 4.82642

{LFry} → {FSc, Cito}, 4.82642

{LFry} → {LSoda, Cito}, 2.98103


## B

**Describe what lift indicates in general. Then specifically describe what one lift value from 4A indicates. Finally, the rules associated with the top two lift values have the same itemsets, just swapped antecedents and consequents. Does it make sense that they share the same lift value, why or why not?**


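Lift indicates how much more or less likely the consequent is when the antecedent occurs, compared with how often the consequent occurs across all orders. A lift above 1 indicates a positive association and a lift below 1 a negative one.

Someone who purchases FSc and Cito is 4.83 times as likely to buy LFry as the general customer.

Yes. Lift is symmetric: it divides the support of the combined itemset by the product of the supports of the two sides, and neither quantity changes when the antecedent and consequent are swapped. {FSc, Cito} → {LFry} and {LFry} → {FSc, Cito} both describe the same set of orders, those containing LFry, Cito, and FSc.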
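
The symmetry can be seen from the standard definition of lift, with $X$ and $Y$ as the two itemsets (symbols introduced here for illustration):

$$\text{lift}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)\,\text{support}(Y)} = \frac{\text{confidence}(X \rightarrow Y)}{\text{support}(Y)} = \text{lift}(Y \rightarrow X)$$

As a check against the tables above, {FSc, Cito} → {LFry} has a confidence of 1.0 and LFry has a support of 0.207193, so the lift is $1.0 / 0.207193 \approx 4.826$.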

# Q5

**Describe what effects increasing the minimum support and confidence levels would have on your analysis. Focus on the number of rules as well as the impacts to support and confidence levels.**

Increasing the minimum support will filter out the itemsets with low support. This will decrease the number of rules and increase the average support of the rules that remain.

Increasing the minimum confidence will keep only the rules whose consequent is more likely to occur given the antecedent. This will also decrease the number of rules and increase the average confidence of the rules that remain.
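
The effect of raising both thresholds can be illustrated by filtering a small set of rules. The rule values below are hypothetical, and `surviving` is an illustrative helper, not an `mlxtend` function.

```python
# Hypothetical rules as (rule, support, confidence)
toy_rules = [
    ("A → B", 0.20, 0.60),
    ("B → A", 0.20, 0.50),
    ("A → C", 0.05, 0.15),
    ("C → A", 0.05, 0.70),
    ("B → C", 0.02, 0.05),
]

def surviving(rules, min_support, min_confidence):
    """Keep the rules that meet both minimum thresholds."""
    return [r for r in rules if r[1] >= min_support and r[2] >= min_confidence]

def mean_support(rules):
    """Average support across the given rules."""
    return sum(r[1] for r in rules) / len(rules)

loose = surviving(toy_rules, 0.01, 0.001)   # low thresholds keep every rule
strict = surviving(toy_rules, 0.05, 0.50)   # higher thresholds keep fewer rules

print(len(loose), len(strict))
print(mean_support(loose), mean_support(strict))
```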

# Q6

**Finally, using your frequent itemset and/or association rules table, find an additional problem or opportunity for The Crique in terms of their menu.**

**To help guide you, try to find items that *should* be purchased together, but don’t seem to be very often. You might also want to look at items that *are* purchased together, but for which the combination may not make immediate sense.**

**Identify the opportunity, then in fewer than 50 words,what would you recommend that The Crique do about it?**  

When people buy a Smothered Currito (very hot habanero sauce), they are only 1.12 times as likely to buy a MSoda. They are going to need that soda to help their mouth cool off from the habanero sauce! The Crique should consider making a special combo of a M or L Soda and a Smothered Currito and advertise it as being something hot with something to cool you off. That way people will be aware of the benefit of buying both a Smothered Currito and a soda. This will increase soda sales by upselling Smothered Currito fans.