<a href="https://colab.research.google.com/github/chainiqedu/chainiqedu/blob/main/volmex_user_bucketing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Volmex User Bucketing

Looking at users (addresses) that have interacted with the protocol, bucket them into 3 groups, e.g. Low/Medium/High.

Do this by considering users across all 3 chains, looking at Minters, LPs (Liquidity Providers) and Traders (users/addresses that performed swaps in the Uniswap/QuickSwap pools).

## Summary

Minters 

	- 5872  
		- ARBITRUM	443  
		- ETHEREUM	764  
		- POLYGON	4724  

Users  

	- Low: 1 Mint Action Only  
	- Medium: 2-7 Mint Actions  
	- High: 10+ Mint Actions (Highest is 25)  

Amounts:  

	- Low: 0-500  
	- Medium: 500-1500  
	- High: 1500+ (some users in 100K+ amounts)  


Traders  

	- 1396  
		- ARBITRUM	39  
		- ETHEREUM	150  
		- POLYGON	1214  

Users  

	- Low: 0-19 Trades  
	- Medium: 20-100 Trades  
	- High: 100+ Trades  

Amounts: Amount Traded in Number of Tokens

	- Low: 0-990  
	- Medium: 1000-5000  
	- High: 6000+ (some as high as 100K)  

Liquidity Providers 

	- 1702  
		- ARBITRUM	71  
		- ETHEREUM	170  
		- POLYGON	1476  

Users

	- Low: 1 Liquidity Add Action  
	- Medium: 2-9  
	- High: 9+ (highest user has 350)  

Amounts: 
 
	- Low: 0-999 Value of Liquidity  
	- Medium: 1000-2000  
	- High: 2000+ (highest user has 100K+)  

In [266]:
#Import libraries
import pandas as pd
import numpy as np
import plotly.express as px

## Minters

In [267]:
# read ethereum minter CSV files
eth_net_btcv_dai = pd.read_csv('/content/BTCV_DAI_export-address-token-0x187922d4235D10239b2c6CCb2217aDa724F56DDA.csv', index_col=False)
eth_net_ethv_dai = pd.read_csv('/content/ETHV_DAI_export-address-token-0xa57fC404f69fCE71CA26e26f0A4DF7F35C8cd5C3 (1).csv', index_col=False)
eth_net_btcv_usdc = pd.read_csv('/content/BTCV_USDC_export-address-token-0x054FBeBD2Cb17205B57fb56a426ccc54cAaBFaBC.csv', index_col=False)
eth_net_ethv_usdc = pd.read_csv('/content/ETHV_USDC_export-address-token-0x1BB632a08936e17Ee3971E6Eeb824910567e120B.csv', index_col=False)

# read polygon minter CSV files
polygon_btcv_dai = pd.read_csv('/content/BTCV_DAI_export-address-token-0x90E6c403c02f72986a98E8a361Ec7B7C8BC29259.csv', index_col=False)
polygon_ethv_dai = pd.read_csv('/content/ETHV_DAI_export-address-token-0x164c668204Ce54558431997A6DD636Ee4E758b19.csv', index_col=False)
polygon_btcv_usdc = pd.read_csv('/content/BTCV_USDC_export-address-token-0xA2b3501d34edA289F0bEF1cAf95E5D0111032F36.csv', index_col=False)
polygon_ethv_usdc = pd.read_csv('/content/ETHV_USDC_export-address-token-0xEeb6f0C2261E21b657A27582466e5aD9acC072D7.csv', index_col=False)

# read arbitrum minter CSV files
arbitrum_btcv_dai = pd.read_csv('/content/BTCV_DAI_export-address-token-0xe46277336d9cc2ebe7b24ba7268624f5f1495611.csv', index_col=False)
arbitrum_ethv_dai = pd.read_csv('/content/ETHV_DAI_export-address-token-0xf613b55131cf8a69c5b4f62d0d5e5d2c2d9c3280.csv', index_col=False)
arbitrum_btcv_usdc = pd.read_csv('/content/BTCV_USDC_export-address-token-0xdf87072ac4722431861837492edf7adbfec0efa9.csv', index_col=False)
arbitrum_ethv_usdc = pd.read_csv('/content/ETHV_USDC_export-address-token-0xf9b04aad2612d3d664f41e9af5711953e058ff52.csv', index_col=False)

In [268]:
# Add an event-type marker for collateralize events to identify minters: when an address/user collateralizes, they mint the volatility tokens (i.e. they are a minter)
eth_net_btcv_dai['type'] = np.where(eth_net_btcv_dai['To'] == '0x187922d4235d10239b2c6ccb2217ada724f56dda', 'COLLATERALIZE', 'REDEEM/OTHER')
eth_net_ethv_dai['type'] = np.where(eth_net_ethv_dai['To'] == '0xa57fc404f69fce71ca26e26f0a4df7f35c8cd5c3', 'COLLATERALIZE', 'REDEEM/OTHER')
eth_net_btcv_usdc['type'] = np.where(eth_net_btcv_usdc['To'] == '0x054fbebd2cb17205b57fb56a426ccc54caabfabc', 'COLLATERALIZE', 'REDEEM/OTHER')
eth_net_ethv_usdc['type'] = np.where(eth_net_ethv_usdc['To'] == '0x1bb632a08936e17ee3971e6eeb824910567e120b', 'COLLATERALIZE', 'REDEEM/OTHER')

polygon_btcv_dai['type'] = np.where(polygon_btcv_dai['To'] == '0x90e6c403c02f72986a98e8a361ec7b7c8bc29259', 'COLLATERALIZE', 'REDEEM/OTHER')
polygon_ethv_dai['type'] = np.where(polygon_ethv_dai['To'] == '0x164c668204ce54558431997a6dd636ee4e758b19', 'COLLATERALIZE', 'REDEEM/OTHER')
polygon_btcv_usdc['type'] = np.where(polygon_btcv_usdc['To'] == '0xa2b3501d34eda289f0bef1caf95e5d0111032f36', 'COLLATERALIZE', 'REDEEM/OTHER')
polygon_ethv_usdc['type'] = np.where(polygon_ethv_usdc['To'] == '0xeeb6f0c2261e21b657a27582466e5ad9acc072d7', 'COLLATERALIZE', 'REDEEM/OTHER')

arbitrum_btcv_dai['type'] = np.where(arbitrum_btcv_dai['To'] == '0xe46277336d9cc2ebe7b24ba7268624f5f1495611', 'COLLATERALIZE', 'REDEEM/OTHER')
arbitrum_ethv_dai['type'] = np.where(arbitrum_ethv_dai['To'] == '0xf613b55131cf8a69c5b4f62d0d5e5d2c2d9c3280', 'COLLATERALIZE', 'REDEEM/OTHER')
arbitrum_btcv_usdc['type'] = np.where(arbitrum_btcv_usdc['To'] == '0xdf87072ac4722431861837492edf7adbfec0efa9', 'COLLATERALIZE', 'REDEEM/OTHER')
arbitrum_ethv_usdc['type'] = np.where(arbitrum_ethv_usdc['To'] == '0xf9b04aad2612d3d664f41e9af5711953e058ff52', 'COLLATERALIZE', 'REDEEM/OTHER')
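
The twelve assignments above follow one pattern: a transfer whose `To` address equals the protocol contract is treated as a collateralize (mint) event. A minimal sketch of that pattern as a reusable helper is shown below. It assumes the `To` column may contain mixed-case addresses and so compares in lower case; the helper name `label_event_type` and the sample rows are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def label_event_type(df: pd.DataFrame, contract_address: str) -> pd.DataFrame:
    """Return a copy of df with a 'type' column marking transfers into the contract."""
    # Compare in lower case so checksummed and lower-case addresses both match.
    is_mint = df['To'].str.lower() == contract_address.lower()
    return df.assign(type=np.where(is_mint, 'COLLATERALIZE', 'REDEEM/OTHER'))


# Hypothetical sample rows, not taken from the real exports.
sample = pd.DataFrame({
    'From': ['0xaaa', '0x187922d4235D10239b2c6CCb2217aDa724F56DDA'],
    'To': ['0x187922d4235D10239b2c6CCb2217aDa724F56DDA', '0xaaa'],
})
labelled = label_event_type(sample, '0x187922d4235d10239b2c6ccb2217ada724f56dda')
print(labelled['type'].tolist())
```

Because the helper returns a new DataFrame instead of mutating its input, it could be applied in a loop over all twelve exports.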

In [269]:
# Add Chain Type
eth_net_btcv_dai['chain'] = 'ETHEREUM'
eth_net_ethv_dai['chain'] = 'ETHEREUM'
eth_net_btcv_usdc['chain'] = 'ETHEREUM'
eth_net_ethv_usdc['chain'] = 'ETHEREUM'

polygon_btcv_dai['chain'] = 'POLYGON'
polygon_ethv_dai['chain'] = 'POLYGON'
polygon_btcv_usdc['chain'] = 'POLYGON'
polygon_ethv_usdc['chain'] = 'POLYGON'

arbitrum_btcv_dai['chain'] = 'ARBITRUM'
arbitrum_ethv_dai['chain'] = 'ARBITRUM'
arbitrum_btcv_usdc['chain'] = 'ARBITRUM'
arbitrum_ethv_usdc['chain'] = 'ARBITRUM'

In [270]:
# Combine into single dataframe
minters_df = pd.concat([eth_net_btcv_dai, eth_net_ethv_dai, eth_net_btcv_usdc, eth_net_ethv_usdc, 
                        polygon_btcv_dai, polygon_ethv_dai, polygon_btcv_usdc, polygon_ethv_usdc,
                        arbitrum_btcv_dai, arbitrum_ethv_dai, arbitrum_btcv_usdc, arbitrum_ethv_usdc], ignore_index=True)

In [271]:
# Write to CSV
minters_df.to_csv('minters_v3_final.csv')

In [272]:
# Filter for collateralize events and addresses
minter_addresses_df = minters_df[(minters_df['type'] == 'COLLATERALIZE')]

In [273]:
# There are some collateral types that are not recognised so remove them
minter_addresses_dai_usdc_df = minter_addresses_df[(minter_addresses_df['TokenSymbol'] == 'DAI') |
                    (minter_addresses_df['TokenSymbol'] == 'USDC')]

## Unique Minters

In [195]:
# Number of unique minters overall and across all 3 chains
# This should also be pasted into the final Excel file
minter_addresses_dai_usdc_df['From'].nunique()

5872

In [196]:
minter_addresses_dai_usdc_df.groupby(by='chain', as_index=False).agg({'From': pd.Series.nunique})

| | chain | From |
|---|---|---|
| 0 | ARBITRUM | 443 |
| 1 | ETHEREUM | 764 |
| 2 | POLYGON | 4724 |
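
The per-chain counts sum to 5931 (443 + 764 + 4724), which is more than the 5872 unique minters overall. This is consistent with some addresses minting on more than one chain: `nunique` counts such an address once overall but once per chain in the grouped view. A minimal sketch of how that overlap could be measured, using hypothetical sample rows rather than the real exports:

```python
import pandas as pd

# Hypothetical sample: address 0xa minted on two chains.
sample = pd.DataFrame({
    'From': ['0xa', '0xa', '0xb', '0xc', '0xc'],
    'chain': ['ETHEREUM', 'POLYGON', 'POLYGON', 'ARBITRUM', 'ARBITRUM'],
})

overall_unique = sample['From'].nunique()
per_chain_unique = sample.groupby('chain')['From'].nunique()

# Number of distinct chains each address appears on.
chains_per_address = sample.groupby('From')['chain'].nunique()
multi_chain_addresses = chains_per_address[chains_per_address > 1]

print(overall_unique)              # 3
print(per_chain_unique.sum())      # 4
print(len(multi_chain_addresses))  # 1
```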


## User/Address Bucketing

In [197]:
# Histogram for number of times interacting with protocol and values
mint_address_counts_df = pd.Series(minter_addresses_dai_usdc_df["From"].value_counts(), name='Total Volmex Mints').to_frame()

### Histogram of Number of Mint Actions by Address

That is, how many times has each address performed a mint action? This is to create segments of low/medium/high users
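
Once thresholds have been read off the histogram, the segments could be assigned with `pd.cut`. The sketch below uses the mint-count thresholds from the Summary on hypothetical counts. The Summary leaves 8-9 mints unassigned, so placing that range in Medium here is an assumption.

```python
import pandas as pd

# Hypothetical mint counts per address, not the real data.
mint_counts = pd.Series([1, 1, 3, 7, 9, 12, 25], name='Total Volmex Mints')

# Bins are right-inclusive: (0, 1] -> Low, (1, 9] -> Medium, (9, inf) -> High.
# Assumption: 8-9 mints fall into Medium.
buckets = pd.cut(
    mint_counts,
    bins=[0, 1, 9, float('inf')],
    labels=['Low', 'Medium', 'High'],
)
print(buckets.value_counts().sort_index())
```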

In [206]:
fig = px.histogram(mint_address_counts_df, x='Total Volmex Mints', marginal="rug")
fig.show()

In [207]:
# Keep the integer part of each amount, drop thousands separators and convert to float.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice.
minter_addresses_dai_usdc_df = minter_addresses_dai_usdc_df.assign(
    Value=minter_addresses_dai_usdc_df['Value'].str.split('.').str[0].str.replace(",", "").astype(float)
)

In [208]:
collateral_address_counts_df = pd.Series(minter_addresses_dai_usdc_df.groupby(['From'])['Value'].sum(), name='Total Value Collateralized').to_frame()
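
Note that `str.split('.').str[0]` discards the decimal part of each amount before summing. Whether that truncation is intended is not stated in the notebook. The sketch below compares it with a variant that keeps decimals via `pd.to_numeric`, on hypothetical rows in the same string format:

```python
import pandas as pd

# Hypothetical rows; the real exports may format amounts differently.
sample = pd.DataFrame({
    'From': ['0xa', '0xa', '0xb'],
    'Value': ['1,250.75', '100', '2,000.5'],
})

# Approach used in the notebook: keep only the integer part.
truncated = (
    sample['Value'].str.split('.').str[0]
    .str.replace(',', '', regex=False)
    .astype(float)
)

# Alternative: strip separators only, so decimals survive.
exact = pd.to_numeric(sample['Value'].str.replace(',', '', regex=False))

print(truncated.tolist())  # [1250.0, 100.0, 2000.0]
print(exact.tolist())      # [1250.75, 100.0, 2000.5]

# Per-address totals from the exact values.
totals = sample.assign(Value=exact).groupby('From')['Value'].sum()
print(totals)
```

For amounts in the thousands the difference is small, but it could move addresses that sit close to a bucket boundary.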

### Histogram of Total Collateral Amount by Address

That is, how much collateral has each address deposited into Volmex?
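
The amount ranges in the Summary share their endpoints (500 appears in both Low and Medium), so a bucketing rule has to pick a side. The sketch below uses half-open bins with `right=False`, which sends a boundary value to the upper bucket. That choice and the sample totals are assumptions.

```python
import pandas as pd

# Hypothetical per-address totals, not the real data.
totals = pd.Series(
    [50.0, 500.0, 1499.0, 1500.0, 120000.0],
    name='Total Value Collateralized',
)

# right=False gives [0, 500) -> Low, [500, 1500) -> Medium, [1500, inf) -> High.
amount_buckets = pd.cut(
    totals,
    bins=[0, 500, 1500, float('inf')],
    labels=['Low', 'Medium', 'High'],
    right=False,
)
print(amount_buckets.tolist())
```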

In [210]:
fig = px.histogram(collateral_address_counts_df, x='Total Value Collateralized', marginal="rug")
fig.show()

## Liquidity Providers and Traders

In [274]:
# Read ethereum LP/Trader CSV files
eth_net_btcv_usdc = pd.read_csv('/content/btcv_usdc_uniswap_v3.csv', index_col=False)
eth_net_ethv_usdc = pd.read_csv('/content/ethv_usdc_uniswap_v3.csv', index_col=False)
eth_net_ibtcv_usdc = pd.read_csv('/content/ibtcv_usdc_uniswap_v3.csv', index_col=False)
eth_net_iethv_usdc = pd.read_csv('/content/iethv_usdc_uniswap_v3.csv', index_col=False)

# Read polygon LP/Trader CSV files
polygon_btcv_usdc = pd.read_csv('/content/btcv_usdc_quickswap_v3 (1).csv', index_col=False)
polygon_ethv_usdc = pd.read_csv('/content/ethv_usdc_quickswap_v3.csv', index_col=False)
polygon_ibtcv_usdc = pd.read_csv('/content/ibtcv_usdc_quickswap_v3.csv', index_col=False)
polygon_iethv_usdc = pd.read_csv('/content/iethv_usdc_quickswap_v3.csv', index_col=False)

# Read arbitrum LP/Trader CSV files
arbitrum_btcv_usdc = pd.read_csv('/content/btcv_usdc_arbitrum_uniswap_v3.csv', index_col=False)
arbitrum_ethv_usdc = pd.read_csv('/content/ethv_usdc_arbitrum_uniswap_v3.csv', index_col=False)
arbitrum_ibtcv_usdc = pd.read_csv('/content/ibtcv_usdc_arbitrum_uniswap_v3.csv', index_col=False)
arbitrum_iethv_usdc = pd.read_csv('/content/iethv_usdc_arbitrum_uniswap_v3.csv', index_col=False)

In [275]:
# Add Chain Type
eth_net_btcv_usdc['chain'] = 'ETHEREUM'
eth_net_ethv_usdc['chain'] = 'ETHEREUM'
eth_net_ibtcv_usdc['chain'] = 'ETHEREUM'
eth_net_iethv_usdc['chain'] = 'ETHEREUM'

polygon_btcv_usdc['chain'] = 'POLYGON'
polygon_ethv_usdc['chain'] = 'POLYGON'
polygon_ibtcv_usdc['chain'] = 'POLYGON'
polygon_iethv_usdc['chain'] = 'POLYGON'

arbitrum_btcv_usdc['chain'] = 'ARBITRUM'
arbitrum_ethv_usdc['chain'] = 'ARBITRUM'
arbitrum_ibtcv_usdc['chain'] = 'ARBITRUM'
arbitrum_iethv_usdc['chain'] = 'ARBITRUM'

In [276]:
# Combine into single dataframe
lps_traders_df = pd.concat([eth_net_btcv_usdc, eth_net_ethv_usdc, eth_net_ibtcv_usdc, eth_net_iethv_usdc, 
                            polygon_btcv_usdc, polygon_ethv_usdc, polygon_ibtcv_usdc, polygon_iethv_usdc,
                            arbitrum_btcv_usdc, arbitrum_ethv_usdc, arbitrum_ibtcv_usdc, arbitrum_iethv_usdc], ignore_index=True).drop(['Unnamed: 0'], axis=1)

In [277]:
# Write to CSV
lps_traders_df.to_csv('lps_traders_v3_final.csv')

## Unique Traders and LPs

In [219]:
# Filter for addresses that have performed at least 1 swap - these are known as the traders
# A Swap action is where one asset was swapped for another. The "Liquidity Added and Swap" / "Liquidity Removed and Swap" labels mark a tx_hash that includes
# both a liquidity action and a swap; these are included too, as the address has technically made a swap
trader_addresses_df = lps_traders_df[(lps_traders_df['tx_type_label'] == 'Swap') | 
                                     (lps_traders_df['tx_type_label'] == 'Liquidity Added and Swap') |
                                     (lps_traders_df['tx_type_label'] == 'Liquidity Removed and Swap')]

In [259]:
lps_addresses_df = lps_traders_df[(lps_traders_df['tx_type_label'] == 'Add Liquidity') |
                                  (lps_traders_df['tx_type_label'] == 'Liquidity Added and Swap')]
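
Because 'Liquidity Added and Swap' appears in both filters, an address can be counted as both a trader and an LP. A minimal sketch of the same two filters written with `isin`, plus the overlap between the groups, on hypothetical rows (the label strings are those used above; the addresses are made up):

```python
import pandas as pd

TRADER_LABELS = ['Swap', 'Liquidity Added and Swap', 'Liquidity Removed and Swap']
LP_LABELS = ['Add Liquidity', 'Liquidity Added and Swap']

# Hypothetical sample rows, not the real pool exports.
sample = pd.DataFrame({
    'From': ['0xa', '0xb', '0xc', '0xd'],
    'tx_type_label': [
        'Swap',
        'Add Liquidity',
        'Liquidity Added and Swap',
        'Liquidity Removed and Swap',
    ],
})

# .copy() makes each subset an independent DataFrame that is safe to modify later.
traders = sample[sample['tx_type_label'].isin(TRADER_LABELS)].copy()
lps = sample[sample['tx_type_label'].isin(LP_LABELS)].copy()

# Addresses that fall into both groups.
both = set(traders['From']) & set(lps['From'])
print(sorted(both))  # ['0xc']
```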

In [65]:
# Number of Unique Traders and Number of Unique LPs - overall and across all 3 chains
# This should also be pasted in the final excel file
trader_addresses_df['From'].nunique()

1396

In [66]:
trader_addresses_df.groupby(by='chain', as_index=False).agg({'From': pd.Series.nunique})

| | chain | From |
|---|---|---|
| 0 | ARBITRUM | 39 |
| 1 | ETHEREUM | 150 |
| 2 | POLYGON | 1214 |


In [67]:
lps_addresses_df['From'].nunique()

1702

In [68]:
lps_addresses_df.groupby(by='chain', as_index=False).agg({'From': pd.Series.nunique})

| | chain | From |
|---|---|---|
| 0 | ARBITRUM | 71 |
| 1 | ETHEREUM | 170 |
| 2 | POLYGON | 1476 |


## User/Address Bucketing Traders

In [221]:
trader_addresses_counts_df = pd.Series(trader_addresses_df["From"].value_counts(), name='Number of Trades').to_frame()

In [222]:
fig = px.histogram(trader_addresses_counts_df, x='Number of Trades', marginal="rug")
fig.show()

In [223]:
# Amounts
# Keep the integer part of each amount, drop thousands separators and convert to float.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice.
trader_addresses_df = trader_addresses_df.assign(
    Value=trader_addresses_df['Value'].str.split('.').str[0].str.replace(",", "").astype(float)
)

In [234]:
trader_addresses_amounts_df = pd.Series(trader_addresses_df.groupby(['From'])['Value'].sum(), name='Total Value Traded').to_frame()

In [236]:
fig = px.histogram(trader_addresses_amounts_df, x='Total Value Traded', marginal="rug")
fig.show()

## User/Address Bucketing Liquidity Providers

In [246]:
lps_addresses_counts_df = pd.Series(lps_addresses_df["From"].value_counts(), name='Liquidity Added Counts').to_frame()

In [247]:
fig = px.histogram(lps_addresses_counts_df, x='Liquidity Added Counts', marginal="rug")
fig.show()

In [260]:
# Amounts
# Keep the integer part of each amount, drop thousands separators and convert to float.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice.
lps_addresses_df = lps_addresses_df.assign(
    Value=lps_addresses_df['Value'].str.split('.').str[0].str.replace(",", "").astype(float)
)
# Keep only the recognised pool tokens
lps_addresses_df = lps_addresses_df[lps_addresses_df['TokenSymbol'].isin(['BTCV', 'iBTCV', 'ETHV', 'iETHV', 'USDC'])]

In [262]:
lp_address_amounts_df = pd.Series(lps_addresses_df.groupby(['From'])['Value'].sum(), name='Liquidity Added Amounts').to_frame()

In [265]:
fig = px.histogram(lp_address_amounts_df, x='Liquidity Added Amounts', marginal="rug")
fig.show()