In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [2]:
data = pd.read_csv('/content/drive/MyDrive/Market_Basket_Optimisation.csv')

### Dataset of 7500 grocery store transactions.
### The Apriori algorithm will find which items are commonly bought together.

### We do not fill in missing values, as that would give the algorithm incorrect input. Instead, we leave them blank.

In [3]:
data.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,
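
The column names in the output above (`shrimp`, `almonds`, ...) look like items rather than field names, which suggests the first line of the file was read as a header. The following is a minimal sketch, using a small made-up CSV rather than the real file, of how `read_csv` behaves by default and how `header=None` keeps the first line as a data row. Whether the real file lacks a header row is an assumption based on that output.

```python
import io
import pandas as pd

# Made-up sample in the same layout as the dataset: one transaction per line.
sample_csv = (
    "shrimp,almonds,avocado\n"
    "burgers,meatballs,eggs\n"
    "chutney,,\n"
)

# Default behaviour: the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(sample_csv))

# header=None: every line, including the first, is kept as a data row.
without_header = pd.read_csv(io.StringIO(sample_csv), header=None)

print(with_header.shape)     # (2, 3)
print(without_header.shape)  # (3, 3)
print(without_header.iloc[0].tolist())
```

If the same holds for the real file, loading it with `header=None` would add one more transaction to the data.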


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   shrimp             7500 non-null   object 
 1   almonds            5746 non-null   object 
 2   avocado            4388 non-null   object 
 3   vegetables mix     3344 non-null   object 
 4   green grapes       2528 non-null   object 
 5   whole weat flour   1863 non-null   object 
 6   yams               1368 non-null   object 
 7   cottage cheese     980 non-null    object 
 8   energy drink       653 non-null    object 
 9   tomato juice       394 non-null    object 
 10  low fat yogurt     255 non-null    object 
 11  green tea          153 non-null    object 
 12  honey              86 non-null     object 
 13  salad              46 non-null     object 
 14  mineral water      24 non-null     object 
 15  salmon             7 non-null      object 
 16  antioxydant juice  3 non

In [5]:
# Transform the DataFrame into a list of lists, so that each transaction can be indexed more easily.
# Note: str() turns missing values into the string 'nan', as the output below shows.
transactions = []
for i in range(data.shape[0]):
    transactions.append([str(data.values[i, j]) for j in range(data.shape[1])])

print(transactions[0])

['burgers', 'meatballs', 'eggs', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
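
Because every missing cell becomes the string `'nan'`, the algorithm treats `'nan'` as an item that appears in almost every transaction. The following is a minimal sketch of one way to skip missing cells while building the list. The helper name `rows_to_transactions` and the sample frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def rows_to_transactions(frame):
    """Return one list of item names per row, skipping missing cells."""
    return [
        [str(item) for item in row if pd.notna(item)]
        for row in frame.values.tolist()
    ]

# Made-up sample with the same shape of problem: short rows padded with missing values.
sample = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", None, None],
])
clean = rows_to_transactions(sample)
print(clean)
```

Applied to the notebook's `data`, this would give transactions of varying length with no `'nan'` entries, so rules involving `'nan'` could not appear.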


## Installing the dependency

In [13]:
!pip install apyori

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5975 sha256=b40a8df303c6dc25fec24f2874d5bf22a91fc3099589290d5066eefaa6228e34
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


### `Support`: number of transactions containing the itemset / total number of transactions. Basically the popularity of an itemset.

### `Confidence`: the likelihood of item Y being purchased when item X is purchased, i.e. support(X and Y) / support(X).

### `Lift`: the likelihood of item Y being purchased when item X is purchased, taking into account the popularity of Y, i.e. confidence(X → Y) / support(Y). A lift above 1 means X and Y appear together more often than they would if they were independent.
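
To make the three definitions concrete, here is a minimal sketch that computes them by hand on a made-up list of five transactions. The function names and the toy data are illustrative only and are not how `apyori` computes them internally.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """support(X and Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """confidence(X -> Y) / support(Y)."""
    return confidence(transactions, x, y) / support(transactions, y)

# Made-up transactions for illustration only.
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "milk"],
    ["pasta", "milk"],
    ["milk"],
    ["escalope"],
]
print(support(toy, ["pasta", "escalope"]))       # 2 of 5 transactions
print(confidence(toy, ["pasta"], ["escalope"]))  # (2/5) / (3/5)
print(lift(toy, ["pasta"], ["escalope"]))        # confidence / (3/5)
```

In this toy data the lift is only slightly above 1, so `pasta` and `escalope` are bought together a little more often than chance alone would predict.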

In [21]:
from apyori import apriori


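# Note: apyori 1.1.2 appears to read only min_support, min_confidence, min_lift and max_length,
# so min_length is likely ignored here.
rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)

# apriori returns a generator, so materialize the rules as a list
results = list(rules)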

In [22]:
# Convert the list of results to a DataFrame

results = pd.DataFrame(results)
results.head(5)

Unnamed: 0,items,support,ordered_statistics
0,"(chicken, light cream)",0.004533,"[((light cream), (chicken), 0.2905982905982906..."
1,"(escalope, mushroom cream sauce)",0.005733,"[((mushroom cream sauce), (escalope), 0.300699..."
2,"(pasta, escalope)",0.005867,"[((pasta), (escalope), 0.37288135593220345, 4...."
3,"(honey, fromage blanc)",0.003333,"[((fromage blanc), (honey), 0.2450980392156863..."
4,"(herb & pepper, ground beef)",0.016,"[((herb & pepper), (ground beef), 0.3234501347..."


### The first five itemsets that passed the support, confidence, and lift thresholds were `chicken` and `light cream`, `escalope` and `mushroom cream sauce`, `pasta` and `escalope`, `honey` and `fromage blanc`, `herb & pepper` and `ground beef`.



### Among these five, the rule `pasta` → `escalope` had the highest confidence: about 37% of the transactions containing `pasta` also contained `escalope`.
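
The `ordered_statistics` column is truncated in the table view, which makes the confidence and lift values hard to compare. The following is a minimal sketch that flattens each record into one row per rule. The `namedtuple` definitions are stand-ins that mirror the record layout visible in the output above (items, support, then a list of base, add, confidence, lift), and the helper name `flatten_rules` and the sample record are made up for illustration.

```python
from collections import namedtuple
import pandas as pd

# Stand-ins mirroring the record layout seen in the notebook output (assumption).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def flatten_rules(records):
    """Return a DataFrame with one row per (antecedent -> consequent) rule."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "antecedent": ", ".join(sorted(stat.items_base)),
                "consequent": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return pd.DataFrame(rows)

# Made-up record for illustration only; the numbers are not taken from the dataset.
toy_records = [
    RelationRecord(
        items=frozenset({"bread", "butter"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"bread"}), frozenset({"butter"}), 0.4, 3.5)
        ],
    )
]
flat = flatten_rules(toy_records).sort_values("lift", ascending=False)
print(flat)
```

In the notebook, the same helper would be given the list produced by `list(rules)` before it is converted to a DataFrame, so that rules can be sorted by lift or confidence directly.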

## Sources
https://pypi.org/project/apyori/

https://www.kaggle.com/code/sangwookchn/association-rule-learning-with-scikit-learn/notebook


