<a href="https://colab.research.google.com/github/adong-hood/dm-24/blob/main/module_aa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module AA: Association Analysis

The example below is adapted from this Kaggle notebook:

https://www.kaggle.com/code/sangwookchn/association-rule-learning-with-scikit-learn/notebook

In [None]:
# Install the apyori package. (mlxtend is another package that implements Apriori.)
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5953 sha256=050b824585433ee403425f14b06b92562c0a7d94947761e900aa9df91ee9dc14
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from apyori import apriori

In [None]:
dataset = pd.read_csv('http://pluto.hood.edu/~dong/datasets/Market_Basket_Optimisation.csv', header = None) # The file has no header row, so the first row must be read as data
print(dataset.shape)
dataset.head(10)

(7501, 20)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
5,low fat yogurt,,,,,,,,,,,,,,,,,,,
6,whole wheat pasta,french fries,,,,,,,,,,,,,,,,,,
7,soup,light cream,shallot,,,,,,,,,,,,,,,,,
8,frozen vegetables,spaghetti,green tea,,,,,,,,,,,,,,,,,
9,french fries,,,,,,,,,,,,,,,,,,,


In [None]:
# Turn the DataFrame into a list of lists so that each transaction can be indexed easily.
# Note: str() turns empty cells (NaN) into the string 'nan', which apriori then treats as an item.
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])

print(transactions[0:2])

[['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil'], ['burgers', 'meatballs', 'eggs', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']]
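Because `str()` is applied to every cell, each empty cell becomes the string `'nan'`, and `apriori` counts it as an ordinary item that appears in almost every basket. A minimal sketch of a NaN-free alternative, shown on a small hypothetical DataFrame `toy` rather than the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `dataset`: ragged baskets padded with NaN,
# which is what read_csv(header=None) produces for uneven rows.
toy = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
    ["turkey", "avocado", np.nan],
])

# Keep only the non-empty cells of each row, so no 'nan' pseudo-item is created.
clean_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values
]
print(clean_transactions)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```

Applied to `dataset` instead of `toy`, this would replace the loop above. The rules shown later in this notebook were produced from the original lists, which still contain `'nan'`.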


In [None]:
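# Support: number of transactions containing a set of items / total number of transactions
#   --> keep products bought at least 3 times a day over the 7 days of data: 3 * 7 = 21 --> 21 / 7501 ≈ 0.0028
# Confidence: should not be too high, as that leads to obvious rules
# Try many combinations of values to experiment with the model.
# Note: apyori's documented keyword arguments are min_support, min_confidence, min_lift and max_length,
# so min_length appears to have no effect. Single items still drop out: their only "rule" has lift 1 < min_lift.

rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)

# apriori returns a generator; convert it to a list to keep the rules
results = list(rules)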
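The `apriori` call hides the search itself. To make the idea concrete, here is a minimal pure-Python sketch of the level-wise frequent-itemset step of the Apriori algorithm (an illustration, not apyori's actual implementation), run on a hypothetical four-basket list:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for every itemset
    whose support is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    result = {}
    current = set()
    for item in {i for basket in baskets for i in basket}:
        s = support(frozenset([item]))
        if s >= min_support:
            current.add(frozenset([item]))
            result[frozenset([item])] = s

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: keep the candidates that meet the support threshold.
        current = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current.add(c)
                result[c] = s
        k += 1
    return result

toy_transactions = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
for itemset, s in frequent_itemsets(toy_transactions, min_support=0.5).items():
    print(sorted(itemset), s)
```

Here {bread, eggs} occurs in only one basket of four, so it is dropped, and the prune step then rules out {milk, bread, eggs} without counting it. apyori additionally turns the frequent itemsets into rules and filters them by confidence and lift.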

In [None]:
# Convert the list of rules into a table (DataFrame)

results = pd.DataFrame(results)
results.head(10)

Unnamed: 0,items,support,ordered_statistics
0,"(light cream, chicken)",0.004533,"[((light cream), (chicken), 0.2905982905982905..."
1,"(mushroom cream sauce, escalope)",0.005733,"[((mushroom cream sauce), (escalope), 0.300699..."
2,"(pasta, escalope)",0.005866,"[((pasta), (escalope), 0.3728813559322034, 4.7..."
3,"(honey, fromage blanc)",0.003333,"[((fromage blanc), (honey), 0.2450980392156863..."
4,"(herb & pepper, ground beef)",0.015998,"[((herb & pepper), (ground beef), 0.3234501347..."
5,"(ground beef, tomato sauce)",0.005333,"[((tomato sauce), (ground beef), 0.37735849056..."
6,"(light cream, olive oil)",0.0032,"[((light cream), (olive oil), 0.20512820512820..."
7,"(whole wheat pasta, olive oil)",0.007999,"[((whole wheat pasta), (olive oil), 0.27149321..."
8,"(shrimp, pasta)",0.005066,"[((pasta), (shrimp), 0.3220338983050847, 4.506..."
9,"(spaghetti, avocado, milk)",0.003333,"[((avocado, spaghetti), (milk), 0.416666666666..."
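The `ordered_statistics` column packs the antecedent, consequent, confidence, and lift into one cell, which makes the table hard to sort. A minimal sketch that flattens it to one row per rule, assuming the field names apyori uses for its records (`items`, `support`, `ordered_statistics`, and `items_base`, `items_add`, `confidence`, `lift`). The `RelationRecord`/`OrderedStatistic` definitions and the example record below are hypothetical stand-ins so the block runs without apyori:

```python
from collections import namedtuple
import pandas as pd

# Stand-ins mirroring the field names of apyori's result records.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def rules_to_frame(records):
    """One row per (antecedent -> consequent) rule instead of one row per itemset."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "antecedent": ", ".join(sorted(stat.items_base)),
                "consequent": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return pd.DataFrame(rows)

# Hypothetical record with made-up numbers, for illustration only.
records = [RelationRecord(frozenset({"A", "B"}), 0.1,
                          [OrderedStatistic(frozenset({"A"}), frozenset({"B"}), 0.5, 2.0)])]
print(rules_to_frame(records))
```

Since `results` is now a DataFrame with the same column names, something like `rules_to_frame(results.itertuples(index=False)).sort_values("lift", ascending=False)` should list the real rules from strongest to weakest lift.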


Each row of the table is one frequent itemset. The `items` column lists the grocery items involved, `support` is the support of the whole itemset, and `ordered_statistics` holds the rules derived from it, each given as antecedent (`items_base`), consequent (`items_add`), confidence, and lift.

For instance, the first row shows that light cream and chicken are commonly bought together. One possible explanation is that customers who buy light cream are careful about what they eat and so prefer white meat such as chicken over red meat such as beef; another is that light cream is a common ingredient in chicken recipes.

The support value for the first rule is 0.0045: the number of transactions containing both light cream and chicken divided by the total number of transactions. The confidence of the rule is 0.2906, which shows that of all the transactions that contain light cream, 29.06% also contain chicken. Finally, the lift of 4.84 tells us that customers who buy light cream are 4.84 times as likely to buy chicken as an average customer is.

Adapted from https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/
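The three measures can be computed by hand: support = count(A and C) / N, confidence = count(A and C) / count(A), and lift = confidence / support(C). A minimal sketch, using a hypothetical eight-basket list rather than the real data:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.
    Assumes the antecedent and consequent each occur in at least one basket."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= b for b in baskets)          # baskets containing the antecedent
    n_c = sum(c <= b for b in baskets)          # baskets containing the consequent
    n_ac = sum((a | c) <= b for b in baskets)   # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

toy_baskets = [
    ["light cream", "chicken"], ["light cream", "chicken", "eggs"],
    ["light cream"], ["light cream", "eggs"],
    ["eggs"], ["milk"], ["milk", "eggs"], ["bread"],
]
support, confidence, lift = rule_metrics(toy_baskets, ["light cream"], ["chicken"])
print(support, confidence, lift)  # 0.25 0.5 2.0
```

Called with the notebook's `transactions`, `["light cream"]` and `["chicken"]`, it should reproduce the support and confidence in the first row of the table above, since apyori divides by the same 7501 transactions.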

## In-Class: Association Analysis ##
Customer Computer Configuration

In [None]:
pc_purchase = pd.read_csv('http://pluto.hood.edu/~dong/datasets/PC-Purchase-Data.csv') # This file has a header row (the option names), so header=None is not used
print(pc_purchase.shape)
pc_purchase.head()

(67, 12)


Unnamed: 0,Intel Core i3,Intel Core i5,Intel Core i7,10 inch screen,12 inch screen,15 inch screen,2 GB,4 GB,8 GB,320 GB,500 GB,750 GB
0,0,1,0,0,1,0,0,1,0,0,1,0
1,0,1,0,0,0,1,0,0,1,0,0,1
2,0,1,0,0,1,0,0,1,0,1,0,0
3,1,0,0,0,1,0,0,0,1,0,1,0
4,0,0,1,0,0,1,0,0,1,0,0,1


The data represent the configurations of a small number of laptop orders placed over the web. The main options customers can choose from are the processor type, screen size, memory, and hard drive. A '1' signifies that a customer selected a particular option. If the manufacturer can better understand which components are often ordered together, it can speed up final assembly by having partially completed laptops with the more popular combinations of components configured before orders arrive.
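Because a '1' marks a selected option, the support of a set of options is simply the fraction of rows in which all of those columns are 1, and confidence and lift follow from it. A minimal pandas sketch under that reading; the `orders` frame reuses three column names from the file but its rows are made up, and `one_hot_rule` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical 0/1 order matrix: column names from PC-Purchase-Data.csv,
# rows invented for illustration.
orders = pd.DataFrame({
    "Intel Core i5": [1, 1, 1, 0],
    "4 GB":          [1, 0, 1, 0],
    "500 GB":        [1, 0, 0, 1],
})

def one_hot_rule(df, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent on 0/1 columns.
    Assumes the antecedent and consequent each occur in at least one row."""
    has_a = df[antecedent].astype(bool).all(axis=1)  # rows with every antecedent option
    has_c = df[consequent].astype(bool).all(axis=1)  # rows with every consequent option
    support = float((has_a & has_c).mean())
    confidence = support / float(has_a.mean())
    lift = confidence / float(has_c.mean())
    return support, confidence, lift

print(one_hot_rule(orders, ["4 GB"], ["Intel Core i5"]))
# support 0.5, confidence 1.0, lift ≈ 1.33
```

In this toy data every 4 GB order also has an i5 (confidence 1.0), but because i5 is already in 75% of orders the lift is only about 1.33.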

**Your task is to find the hidden patterns in popular configurations. Use support, confidence, and lift correctly to explain your findings.**

Hint: make sure the data are in the right format.
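apyori's `apriori` works on a list of transactions, each a list of item names, whereas this file is a 0/1 matrix. One possible conversion, sketched on a small hypothetical DataFrame `orders` (made-up rows, column names borrowed from the file):

```python
import pandas as pd

# Hypothetical stand-in for pc_purchase: same kind of 0/1 columns, invented rows.
orders = pd.DataFrame({
    "Intel Core i5":  [1, 0, 1],
    "Intel Core i7":  [0, 1, 0],
    "12 inch screen": [1, 0, 1],
    "8 GB":           [0, 1, 1],
})

# For each order, keep the names of the columns that hold a 1.
order_transactions = [
    list(orders.columns[row.astype(bool)])
    for row in orders.values
]
print(order_transactions)
```

With `pc_purchase` in place of `orders`, the result can be passed to `apriori(...)` as in the grocery example; the thresholds will likely need to be retuned for a dataset of only 67 orders.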
