***ML Algorithm to Cluster DeFi Pools***

The aim of this script is to build a machine-learning pipeline that clusters DeFi pools.

The script is divided into 4 parts:

1. Data Collection and Processing: The data is collected from the DeFi Llama API, and loading it is the first step. New columns are then created: the underlying token, the average APY over 7 and 30 days, the change in TVL over 7 and 30 days, and the standard deviation of both APY and TVL over 7 and 30 days.
2. Data Exploration: The data will be explored to understand its distribution and the correlation between the different variables.
3. Machine Learning: The data will be clustered using K-Means and hierarchical clustering, and the resulting clusters will be inspected with t-SNE scatter plots. The optimal number of clusters will be determined using the elbow method.
4. Data Visualization: The data will be visualized using a scatter plot.
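
The elbow method mentioned in part 3 can be sketched as follows. This is a minimal illustration rather than the script's final implementation: the helper name `elbow_inertias`, the use of `StandardScaler`, and the synthetic three-blob data are all assumptions made for the example, and no real DeFi Llama data is involved.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_inertias(features, k_values, random_state=0):
    """Fit K-Means for each k and return the within-cluster sum of squares."""
    # Scale first so that no single feature dominates the distance metric
    scaled = StandardScaler().fit_transform(features)
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(scaled)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for per-pool features (NOT real pool data):
# three well-separated blobs in two dimensions
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo_features = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 2)) for c in centers]
)

k_values = list(range(1, 7))
inertias = elbow_inertias(demo_features, k_values)
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 2))
```

Plotting `inertias` against `k_values` should show a sharp bend at the number of clusters after which adding more clusters brings little improvement; for this synthetic data the bend is expected at k = 3.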



In [2]:
# importing all required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.manifold import TSNE


In [6]:
# load the data from the CSV file, using the first column as the index
filepath = r'/Users/karolk/Python_Work/Data_Sets/Global_Data/DeFi_Global_DB.csv'
data = pd.read_csv(filepath, index_col=0)

# sort the dataframe by 'pool', then by 'date' in ascending order, so that each
# rolling window and diff looks back over the preceding days rather than forward
data.sort_values(by=['pool', 'date'], ascending=[True, True], inplace=True)

# create 2 new columns with the rolling average of the APY over 7 days and 30 days
data['7_day_rolling_apy'] = data.groupby('pool')['apy'].rolling(7).mean().reset_index(0, drop=True)
data['30_day_rolling_apy'] = data.groupby('pool')['apy'].rolling(30).mean().reset_index(0, drop=True)

# create 2 new columns with the change in tvlUsd over 7 days and 30 days
data['7_day_change_tvlUsd'] = data.groupby('pool')['tvlUsd'].diff(7)
data['30_day_change_tvlUsd'] = data.groupby('pool')['tvlUsd'].diff(30)

# re-sort so that the newest row of each pool comes first
data.sort_values(by=['pool', 'date'], ascending=[True, False], inplace=True)
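
The standard deviation columns listed in part 1 could be added along the same lines. The sketch below is an assumption about how that might look, not the script's actual code: the helper `add_rolling_std`, the output column names, and the small synthetic frame are hypothetical. It also assumes one row per pool per day, so a window of N rows corresponds to N days.

```python
import pandas as pd

# Small synthetic frame reusing the data set's column names (values are made up)
demo = pd.DataFrame({
    'pool': ['a'] * 4 + ['b'] * 4,
    'date': pd.to_datetime(
        ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04'] * 2
    ),
    'apy': [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0],
    'tvlUsd': [100.0, 110.0, 120.0, 130.0, 50.0, 40.0, 30.0, 20.0],
})

# Oldest row first within each pool so every window looks back in time
demo = demo.sort_values(['pool', 'date']).reset_index(drop=True)


def add_rolling_std(frame, column, window):
    """Add a per-pool rolling standard deviation of `column` over `window` rows."""
    name = f'{window}_day_std_{column}'
    frame[name] = frame.groupby('pool')[column].transform(
        lambda s: s.rolling(window).std()
    )
    return frame


# A window of 3 keeps the demo short; the pipeline would use 7 and 30
for col in ('apy', 'tvlUsd'):
    demo = add_rolling_std(demo, col, window=3)

print(demo)
```

The first `window - 1` rows of each pool are `NaN` because the window is not yet full; these rows would need to be dropped or filled before clustering.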





id,chain,project,symbol,tvlUsd,apy,pool,stablecoin,ilRisk,exposure,outlier,apyMean30d,date,time added,new_upload,possible_error
1046401,Fantom,beethoven-x,USDC-DAI,15398.0,0.01458,00025e37-595f-48dd-a050-05d84fb2ce9f,True,no,multi,False,3.33380,2023-07-25,2023-07-25 07:01:34,False,False
1034472,Fantom,beethoven-x,USDC-DAI,15398.0,0.01458,00025e37-595f-48dd-a050-05d84fb2ce9f,True,no,multi,False,3.19825,2023-07-24,2023-07-24 07:01:17,False,False
1022470,Fantom,beethoven-x,USDC-DAI,15398.0,0.01458,00025e37-595f-48dd-a050-05d84fb2ce9f,True,no,multi,False,3.08742,2023-07-23,2023-07-23 07:03:39,False,False
1010455,Fantom,beethoven-x,USDC-DAI,15398.0,0.01458,00025e37-595f-48dd-a050-05d84fb2ce9f,True,no,multi,False,3.01907,2023-07-22,2023-07-22 11:49:58,False,False
998361,Fantom,beethoven-x,USDC-DAI,15398.0,0.01458,00025e37-595f-48dd-a050-05d84fb2ce9f,True,no,multi,False,2.90118,2023-07-21,2023-07-21 07:01:11,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1559955,Ethereum,uniswap-v2,BLOCK-WETH,144867.0,0.00000,ffff4226-4328-404f-be4c-428d01a06ccd,False,yes,multi,False,0.00000,2023-09-08,2023-09-08 10:06:36,False,False
1548650,Ethereum,uniswap-v2,BLOCK-WETH,145480.0,0.00000,ffff4226-4328-404f-be4c-428d01a06ccd,False,yes,multi,False,0.00000,2023-09-07,2023-09-07 11:38:08,False,False
1537439,Ethereum,uniswap-v2,BLOCK-WETH,147687.0,0.00000,ffff4226-4328-404f-be4c-428d01a06ccd,False,yes,multi,False,0.00000,2023-09-06,2023-09-06 07:01:15,False,False
1526186,Ethereum,uniswap-v2,BLOCK-WETH,137764.0,0.00000,ffff4226-4328-404f-be4c-428d01a06ccd,False,yes,multi,False,0.00000,2023-09-05,2023-09-05 07:01:12,False,False
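
The underlying token column mentioned in part 1 might be derived from `symbol`. The sketch below assumes that the tokens in a pool symbol are joined by `-`, as in `USDC-DAI` and `BLOCK-WETH` above; the column names `underlying_tokens` and `n_tokens` and the single-token symbol `WETH` are hypothetical and used for illustration only.

```python
import pandas as pd

# Two symbols taken from the output above plus one hypothetical single-token symbol
demo = pd.DataFrame({'symbol': ['USDC-DAI', 'BLOCK-WETH', 'WETH']})

# Assumption: tokens in a pool symbol are separated by '-'
demo['underlying_tokens'] = demo['symbol'].str.split('-')

# Number of tokens per pool, e.g. to tell single-asset pools from pairs
demo['n_tokens'] = demo['underlying_tokens'].str.len()

print(demo)
```

Symbols that use a different separator or contain a hyphen inside a token name would need extra handling.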
