# Association Rules

The input for our data transformation is the cleaned dataset produced by running `cleaner.py`.

In [26]:
import numpy as np
import pandas as pd

df = pd.read_csv(
    "../../data_transformation/nft_transactions_cleaned.csv",
    parse_dates=["full_date"],
    dtype={"wei_price": float}
)

df

Unnamed: 0,full_date,transaction_id,asset_id,asset_name,collection_name,wei_price,payment_token_name,quantity,seller_address,seller_username,buyer_address,asset_category,day_name,day_of_week,day_of_month,month_name,month_number,year,time,hour
0,2020-11-28,75418817,17125992,. . ( + - . & - ),Unknown,1.000000e+08,BAEPAY,1,0x3d95d4a6dbae0cd0643a82b13a13b08921d6adf7,NORMANCOMICS,0x8f482e5fa7ca33d7734447cbb376df6a525fe750,Uncategorized,Saturday,5,28,November,11,2020,05:22:37,5
1,2021-01-24,77179836,17528062,NWSNI SIM SM s O INO,Unknown,1.000000e+17,Ether,1,0x3d95d4a6dbae0cd0643a82b13a13b08921d6adf7,NORMANCOMICS,0xedba5d56d0147aee8a227d284bcaac03b4a87ed4,Uncategorized,Sunday,6,24,January,1,2021,06:55:16,6
2,2021-08-12,490127224,36309701,DEAD -01,! DEAD !,5.000000e+15,Ether,1,0xd42b0f0b9c93f826281c45b3dab34e1827312274,DEADCLUB,0x64cd629e020dc1131bd18b7a80c0656341d89038,Uncategorized,Thursday,3,12,August,8,2021,08:20:14,8
3,2021-08-12,495251447,36330253,DEAD -07,! DEAD !,5.000000e+15,Ether,2,0xd42b0f0b9c93f826281c45b3dab34e1827312274,DEADCLUB,0x5eb3f21ef75dbfd5f8e2b2fcfad88c9497c94720,Uncategorized,Thursday,3,12,August,8,2021,19:47:55,19
4,2021-08-14,507555211,36781130,DEAD -13,! DEAD !,5.000000e+15,Ether,1,0xd42b0f0b9c93f826281c45b3dab34e1827312274,DEADCLUB,0xae06997625b4afbf88d537d1d416c8e1de27aac3,Uncategorized,Saturday,5,14,August,8,2021,03:56:11,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5203286,2021-11-29,2232084052,115052697,"#86 - ""Fret For Your Prozac""",~MyriadMeaning~,0.000000e+00,Ether,1,0x14b78b9fe0af98145e57f2a1b8a66044ab6a9eff,PiratesBootyGallery,0xaeb2707eda1e201cb13aa80b915bc3979d7e33a0,Uncategorized,Monday,0,29,November,11,2021,07:27:49,7
5203287,2021-12-03,2301205938,102476305,#67 - MOLD,~MyriadMeaning~,8.000000e+15,Ether,1,0x609b9fc66519fb5fa855b8427fb482bca63cd423,KillMePlease,0x42b018ffce9498915e530ca0f871d6f7144b9acc,Uncategorized,Friday,4,3,December,12,2021,07:06:44,7
5203288,2021-03-06,82596530,18310514,A Gentle Breeze,~~ A e s t h e t i c ~~,2.000000e+16,Wrapped Ether,1,0x974a344968786201a5f2e282014098f1333aa73b,VisualSensation,0xe7e30b515304b0d4e2cf919c449f1d2ec77b1319,Uncategorized,Saturday,5,6,March,3,2021,06:09:07,6
5203289,2021-03-29,94187279,19807908,Gas Fees,~~ A e s t h e t i c ~~,1.000000e+16,Ether,1,0x974a344968786201a5f2e282014098f1333aa73b,VisualSensation,0x263108604e0c9e1f6cdcf698e160cd563635a328,Uncategorized,Monday,0,29,March,3,2021,11:33:23,11


To build the input for the association rules, we only need to keep two columns: `buyer_address` and `asset_category`.

In [21]:
association = df[['buyer_address', 'asset_category']]
association

Unnamed: 0,buyer_address,asset_category
0,0x8f482e5fa7ca33d7734447cbb376df6a525fe750,Uncategorized
1,0xedba5d56d0147aee8a227d284bcaac03b4a87ed4,Uncategorized
2,0x64cd629e020dc1131bd18b7a80c0656341d89038,Uncategorized
3,0x5eb3f21ef75dbfd5f8e2b2fcfad88c9497c94720,Uncategorized
4,0xae06997625b4afbf88d537d1d416c8e1de27aac3,Uncategorized
...,...,...
5203286,0xaeb2707eda1e201cb13aa80b915bc3979d7e33a0,Uncategorized
5203287,0x42b018ffce9498915e530ca0f871d6f7144b9acc,Uncategorized
5203288,0xe7e30b515304b0d4e2cf919c449f1d2ec77b1319,Uncategorized
5203289,0x263108604e0c9e1f6cdcf698e160cd563635a328,Uncategorized


In [22]:
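# Add a constant column to count purchases. The original `quantity` column would also work,
# but the exact counts do not matter: the association rules only use whether a buyer
# purchased from a category at all (true/false).
# `assign` returns a new dataframe, which avoids pandas' SettingWithCopyWarning on the slice of `df`.
association = association.assign(purchase=1)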

# Pivot the table
pivot_table = association.pivot_table(
    index='buyer_address',
    columns='asset_category',
    values='purchase',
    aggfunc='sum',
    fill_value=0,
)

# Reset the index
pivot_table = pivot_table.reset_index()
# Remove the index name
pivot_table = pivot_table.rename_axis(None, axis=1)
pivot_table

Unnamed: 0,buyer_address,Art,Collectibles,Domain,Music,Photography,Sports,Trading Cards,Uncategorized,Utility,Virtual Worlds
0,0x00000000000040c8d72ad3a15ce5408f99cd61b4,0,0,1,0,0,0,0,0,0,0
1,0x0000000000004c0fd5233e6c14d0e10cf190de82,0,0,0,0,0,0,0,1,0,0
2,0x0000000000015b23c7e20b0ea5ebd84c39dcbe60,0,9,0,0,0,0,0,5,0,0
3,0x00000000000360176d958e11c140308cd0863679,9,3,0,0,0,0,8,0,12,0
4,0x000000000004d7463d0f9c77383600bc82d612f5,3,7,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
614130,0xffffe0f5e89ccedadb322fe4ca6bd3ea5badaff2,3,7,0,0,0,0,0,0,0,0
614131,0xffffe59e4ebefce216470864fd92407023288cb4,0,2,0,0,0,0,0,0,0,0
614132,0xfffff6e70842330948ca47254f2be673b1cb0db7,4,0,0,0,0,0,0,3,0,0
614133,0xfffffc03e9d2c7f163135d2b0f18fc7f27c43cce,0,5,0,0,0,0,0,1,0,0
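
The comment in the cell above notes that the association rules only need true/false values, while the pivot table holds purchase counts. The notebook does not show where that conversion happens (it may be handled inside the RapidMiner process), so the following is only a minimal pandas sketch of the idea. The dataframe `toy_pivot` and its addresses are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the pivot table above (hypothetical addresses and counts).
toy_pivot = pd.DataFrame({
    "buyer_address": ["0xaaa", "0xbbb", "0xccc"],
    "Art": [0, 3, 1],
    "Music": [2, 0, 0],
    "Utility": [0, 12, 0],
})

# Any positive purchase count becomes True; zero stays False.
basket = toy_pivot.set_index("buyer_address") > 0
print(basket)
```

Each row of `basket` then acts as one "transaction": the set of categories a buyer has purchased from at least once.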


To check that everything adds up, let's sum every value in the pivot table and compare the total against the number of rows in our initial dataframe. Each row contributed exactly one purchase, so the two numbers should match.

In [23]:
pivot_table[pivot_table.columns.drop('buyer_address')].sum().sum()

np.int64(5203291)

In [24]:
len(df)

5203291

Everything works as intended. Let's export our data.

In [25]:
pivot_table.to_csv("nft_association_input.csv", index=False)

The analysis was conducted in RapidMiner, using the following process.

![process](resources/association_process.png)

To obtain more meaningful results, the categories `Uncategorized` and `Collectibles` were excluded. `Uncategorized` doesn't provide any value, and `Collectibles` was considered too vague. This choice can of course be revisited if deemed necessary.
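
The exclusion itself was done in RapidMiner. For readers who prefer to do it in pandas before exporting, a rough equivalent could look like the sketch below. The dataframe `toy_pivot` is made-up data, and the final step (dropping buyers left with no purchases) is an assumption rather than something the notebook states. It matters because it changes the number of transactions that support is computed over.

```python
import pandas as pd

# Toy stand-in for the pivot table (hypothetical addresses and counts).
toy_pivot = pd.DataFrame({
    "buyer_address": ["0xaaa", "0xbbb", "0xccc"],
    "Art": [0, 3, 0],
    "Collectibles": [9, 7, 2],
    "Music": [1, 0, 0],
    "Uncategorized": [5, 0, 1],
})

# Drop the two categories that were excluded from the analysis.
excluded = ["Uncategorized", "Collectibles"]
filtered = toy_pivot.drop(columns=excluded)

# Optional (assumption): also drop buyers with no purchases in the remaining categories.
category_cols = filtered.columns.drop("buyer_address")
non_empty = filtered[filtered[category_cols].sum(axis=1) > 0]
print(non_empty)
```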

The association rules found with a minimum support of 2% and a minimum confidence of 10% are shown in the following picture.

![rules](resources/association_rules.png)
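
To make the two thresholds concrete: for a rule A → B, support is the share of buyers who purchased from both A and B, and confidence is the share of buyers of A who also purchased from B. The sketch below computes both for single-category rules on a small made-up boolean table. It is not a reproduction of the RapidMiner process, and the function name `pairwise_rules` and the toy data are hypothetical.

```python
from itertools import permutations

import pandas as pd

# Toy boolean table: one row per buyer, one column per category (made-up data).
basket = pd.DataFrame({
    "Art":     [True, True, True, False, False],
    "Music":   [True, False, True, False, False],
    "Utility": [False, False, True, True, False],
})

MIN_SUPPORT = 0.02     # 2%, as in the text above
MIN_CONFIDENCE = 0.10  # 10%, as in the text above


def pairwise_rules(basket, min_support, min_confidence):
    """Return all rules A -> B (single categories) that pass both thresholds."""
    n = len(basket)
    rows = []
    for antecedent, consequent in permutations(basket.columns, 2):
        antecedent_count = int(basket[antecedent].sum())
        if antecedent_count == 0:
            continue  # confidence is undefined when nobody bought the antecedent
        both = int((basket[antecedent] & basket[consequent]).sum())
        support = both / n                    # P(A and B)
        confidence = both / antecedent_count  # P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rows.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support,
                "confidence": confidence,
            })
    return pd.DataFrame(
        rows, columns=["antecedent", "consequent", "support", "confidence"]
    )


rules = pairwise_rules(basket, MIN_SUPPORT, MIN_CONFIDENCE)
print(rules)
```

Note that support is symmetric (Art → Music and Music → Art share the same value), while confidence depends on the direction of the rule.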