This project can also produce actionable output for protocols, wallets, or VCs.
<br>
## 🎯 Goal:

Cluster users by how they interact with smart contracts over a fixed time window, taking into account:
<br> •	Contract type (DeFi, NFT, DEX, Staking, Gaming)
<br> •	Transaction frequency
<br> •	Transaction value (on-chain volume)
<br> •	Temporal order of interactions (session patterns)

## 🧠 Research Questions:
	1.	Do different users show similar interaction behavior with a particular type of contract?
	2.	Can groups such as "power users", "explorers", "testers", or "sleepers" be separated from one another?
	3.	Does the behavior of new users differ from that of long-standing users?
	4.	Are some clusters directly related to contract success metrics (TVL, number of users, longevity)?
    
    
    
## 🔧 Project Steps:
1. 	Fetch user transactions on EVM
2. Build the feature matrix
3. Normalize the features (StandardScaler)
4. Remove outliers (e.g. users with an extremely high transaction count)
5. Cluster with KMeans
6. Cluster with DBSCAN
7. Cluster with HDBSCAN
8. 	Use UMAP or t-SNE for 2D visualization
9. Analyze what characterizes each cluster
10. Examine the composition of each cluster (e.g. share of NFT vs. DeFi interactions)
11. Results and applications:
	<br> •	Output as a segmentation for marketing / product targeting
	<br> •	Analysis of the contracts that appear most often in the active clusters
	<br> •	Recommendations to VCs to invest in the contracts with the largest share of power users
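
Steps 3, 5, 6, and 9 can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the feature matrix is synthetic, and `n_clusters=2`, `eps=0.5`, and `min_samples=5` are assumed values chosen for the toy data. HDBSCAN and the UMAP/t-SNE projection are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the per-address feature matrix (hypothetical values):
# 50 low-activity addresses followed by 50 high-activity ones.
rng = np.random.default_rng(0)
feature_matrix = pd.DataFrame({
    "tx_count_week": np.concatenate([rng.normal(2, 0.5, 50), rng.normal(40, 5, 50)]),
    "swap_count": np.concatenate([rng.normal(0.5, 0.2, 50), rng.normal(20, 3, 50)]),
})

# Step 3: put every feature on a zero-mean, unit-variance scale.
X = StandardScaler().fit_transform(feature_matrix)

# Step 5: KMeans needs the number of clusters up front (2 is an assumption here).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 6: DBSCAN infers the number of clusters; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Step 9: describe each KMeans cluster by its mean feature values.
summary = feature_matrix.assign(cluster=kmeans_labels).groupby("cluster").mean()
print(summary)
```

On real data the number of clusters and the DBSCAN radius would need tuning, for example with silhouette scores or a k-distance plot.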


In [148]:
import pandas as pd
import numpy as np

In [149]:
data = pd.read_csv('Transactions.csv')
cat = pd.read_csv('Contract_Category.csv')

In [150]:
data.head()

Unnamed: 0,FROM_ADDRESS,TO_ADDRESS,VALUE,GAS,GAS_USED,BLOCK_TIMESTAMP,TX_HASH,FUNCTION_NAME,TYPE
0,0xb2ecfe4e4d61f8790bbb9de2d1259b9e2410cea5,0x5fa60726e62c50af45ff2f6280c468da438a7837,0.0,149479.0,143759.0,2024-09-25 19:14:11.000,0xadcd3bae13011aa44b416d45f78fc18016d63f85c140...,takeBidSingle,DELEGATECALL
1,0x7a250d5630b4cf539739df2c5dacb4c659f2488d,0x8b3d14ed2bf1285aa34ae394137872b12ad2feea,0.0,36913.0,617.0,2024-09-25 16:01:23.000,0x808294e069844895e090924d2bfbb7560f3e40fa9c6a...,balanceOf,STATICCALL
2,0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48,0x43506849d7c04f9138d1a2050bbf3a0c054402dd,0.0,150764.0,2553.0,2024-09-25 22:33:59.000,0xfec3c4db5d6e739f9014d7d34a82b62c8e0be7eb4e4c...,balanceOf,DELEGATECALL
3,0xdef1c0ded9bec7f1a1670819833240f027b25eff,0x5ebac8dbfbba22168471b0f914131d1976536a25,0.0,246130.0,5528.0,2024-09-25 22:23:35.000,0x57886983145b0c86ab1b41a608dd46eb6f6ce7873b7e...,fillOtcOrder,DELEGATECALL
4,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0.0,265794.0,255.0,2024-09-25 04:32:35.000,0x9444dc983a03df25b7a5f96e406c15e9aca923deb708...,refundETH,DELEGATECALL


In [171]:
data['FUNCTION_NAME'].value_counts().head(40)

balanceOf                                             29872
transfer                                              19036
swap                                                   6599
transferFrom                                           5758
getReserves                                            4803
deposit                                                3003
withdraw                                               2453
approve                                                2281
execute                                                1617
uniswapV3SwapCallback                                   757
token0                                                  756
latestRoundData                                         677
allowance                                               588
token1                                                  579
latestAnswer                                            573
swapExactTokensForETHSupportingFeeOnTransferTokens      451
permit                                  

In [None]:
# define category & sub category
function_mapping = {
    'swap': ('DeFi', 'DEX'),
    'swapExactETHForTokens': ('DeFi', 'DEX'),
    'swapExactETHForTokensSupportingFeeOnTransferTokens': ('DeFi', 'DEX'),
    'swapExactTokensForETHSupportingFeeOnTransferTokens': ('DeFi', 'DEX'),
    'uniswapV3SwapCallback': ('DeFi', 'DEX'),

    'transfer': ('DeFi', 'Token Transfer'),
    'transferFrom': ('DeFi', 'Token Transfer'),
    'approve': ('DeFi', 'Token Transfer'),
    'allowance': ('DeFi', 'Token Transfer'),
    'balanceOf': ('DeFi', 'Token Transfer'),

    'deposit': ('DeFi', 'Lending'),
    'withdraw': ('DeFi', 'Lending'),

    'getReserves': ('DeFi', 'Liquidity Pools'),
    'token0': ('DeFi', 'Liquidity Pools'),
    'token1': ('DeFi', 'Liquidity Pools'),
    'totalSupply': ('DeFi', 'Token Metadata'),
    'decimals': ('DeFi', 'Token Metadata'),

    'latestRoundData': ('Infrastructure', 'Oracle'),
    'latestAnswer': ('Infrastructure', 'Oracle'),

    'execute': ('Infrastructure', 'Smart Wallet'),
    'executeDelegateCall': ('Infrastructure', 'Smart Wallet'),
    'router': ('Infrastructure', 'Smart Wallet'),
    'WETH': ('Infrastructure', 'Smart Wallet'),
    'implementation': ('Infrastructure', 'Smart Wallet'),

    'resolve': ('Infrastructure', 'Proxy'),
    'getApp': ('Infrastructure', 'Proxy'),
    'getProvider': ('Infrastructure', 'Proxy'),

    'claimTokens': ('Rewards', 'Airdrop'),
    'mint': ('Rewards', 'Airdrop'),
    'permit': ('Rewards', 'Airdrop'),
    'claimMintRewardAndShare': ('Rewards', 'Airdrop'),
    'callClaimMintReward': ('Rewards', 'Airdrop'),

    'proposeBlock': ('Governance', 'Staking / L2 Infra'),
    'proveBlock': ('Governance', 'Staking / L2 Infra'),
    'powerDown': ('Governance', 'Staking / L2 Infra'),

    'transform': ('Utility', 'Other'),
    'fee': ('Utility', 'Other'),
    'max': ('Utility', 'Other'),
    'min': ('Utility', 'Other'),
}

default_category = ('Other', 'Uncategorized')

data[['category', 'sub_category']] = data['FUNCTION_NAME'].apply(
    lambda f: pd.Series(function_mapping.get(f, default_category))
)

# Inspect the result (the new columns were added to `data`; `df` is not defined yet)
data[['FUNCTION_NAME', 'category', 'sub_category']].head()
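
The row-wise `apply` with `pd.Series` above builds one Series per row, which can be slow on a large table. A possible alternative, sketched here on a tiny hypothetical sample rather than the real `Transactions.csv`, looks up each name once and builds both columns in a single step. The two-entry `function_mapping` and the sample function names are stand-ins for illustration only.

```python
import pandas as pd

# Tiny stand-ins for the notebook's objects (hypothetical sample values).
function_mapping = {
    "swap": ("DeFi", "DEX"),
    "transfer": ("DeFi", "Token Transfer"),
}
default_category = ("Other", "Uncategorized")
data = pd.DataFrame({"FUNCTION_NAME": ["swap", "transfer", "takeBidSingle", None]})

# Look up each function name once; unmapped or missing names fall back to the default.
pairs = [function_mapping.get(name, default_category) for name in data["FUNCTION_NAME"]]

# Build both label columns in one step, aligned to the original index.
labels = pd.DataFrame(pairs, columns=["category", "sub_category"], index=data.index)
data[["category", "sub_category"]] = labels

print(data)
```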

In [170]:
data['TYPE'].value_counts()

CALL            47101
STATICCALL      42839
DELEGATECALL    10060
Name: TYPE, dtype: int64

In [153]:
# Rename the join key so it matches the transactions table
cat = cat.rename(columns={'ADDRESS': 'TO_ADDRESS'})

data_main = pd.merge(data, cat, on='TO_ADDRESS', how='left')
data_main.head()

Unnamed: 0,FROM_ADDRESS,TO_ADDRESS,VALUE,GAS,GAS_USED,BLOCK_TIMESTAMP,TX_HASH,FUNCTION_NAME,TYPE,LABEL_TYPE,LABEL_SUBTYPE
0,0xb2ecfe4e4d61f8790bbb9de2d1259b9e2410cea5,0x5fa60726e62c50af45ff2f6280c468da438a7837,0.0,149479.0,143759.0,2024-09-25 19:14:11.000,0xadcd3bae13011aa44b416d45f78fc18016d63f85c140...,takeBidSingle,DELEGATECALL,,
1,0x7a250d5630b4cf539739df2c5dacb4c659f2488d,0x8b3d14ed2bf1285aa34ae394137872b12ad2feea,0.0,36913.0,617.0,2024-09-25 16:01:23.000,0x808294e069844895e090924d2bfbb7560f3e40fa9c6a...,balanceOf,STATICCALL,,
2,0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48,0x43506849d7c04f9138d1a2050bbf3a0c054402dd,0.0,150764.0,2553.0,2024-09-25 22:33:59.000,0xfec3c4db5d6e739f9014d7d34a82b62c8e0be7eb4e4c...,balanceOf,DELEGATECALL,,
3,0xdef1c0ded9bec7f1a1670819833240f027b25eff,0x5ebac8dbfbba22168471b0f914131d1976536a25,0.0,246130.0,5528.0,2024-09-25 22:23:35.000,0x57886983145b0c86ab1b41a608dd46eb6f6ce7873b7e...,fillOtcOrder,DELEGATECALL,,
4,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0.0,265794.0,255.0,2024-09-25 04:32:35.000,0x9444dc983a03df25b7a5f96e406c15e9aca923deb708...,refundETH,DELEGATECALL,,


In [154]:
data_main['LABEL_TYPE'].value_counts()

token    9
dex      4
Name: LABEL_TYPE, dtype: int64
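
The left join above attaches a label to only 13 rows (9 `token`, 4 `dex`). One possible cause, which the source does not confirm, is a formatting mismatch between the two address columns (mixed-case checksummed addresses versus lowercase, or stray whitespace). Another is simply that the category table covers few contracts. The sketch below, on made-up addresses, shows how the keys could be normalized and how `indicator=True` reports the match rate.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; the addresses are made up.
data = pd.DataFrame({"TO_ADDRESS": ["0xAbC1", "0xdef2 ", "0x0003"]})
cat = pd.DataFrame({
    "TO_ADDRESS": ["0xabc1", "0xDEF2"],
    "LABEL_TYPE": ["token", "dex"],
})

# Normalize both join keys: strip whitespace and lowercase the hex string.
for frame in (data, cat):
    frame["TO_ADDRESS"] = frame["TO_ADDRESS"].str.strip().str.lower()

# indicator=True adds a `_merge` column showing which rows found a label.
merged = data.merge(cat, on="TO_ADDRESS", how="left", indicator=True)

match_rate = (merged["_merge"] == "both").mean()
print(merged)
print(f"share of rows with a label: {match_rate:.2%}")
```

If the match rate stays low after normalization, the limiting factor is the coverage of `Contract_Category.csv`, and the `FUNCTION_NAME`-based categories become the main source of labels.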

# Making features

In [155]:
data['FUNCTION_NAME'].value_counts().head(10)

balanceOf                29872
transfer                 19036
swap                      6599
transferFrom              5758
getReserves               4803
deposit                   3003
withdraw                  2453
approve                   2281
execute                   1617
uniswapV3SwapCallback      757
Name: FUNCTION_NAME, dtype: int64

In [156]:
data['FROM_ADDRESS'].nunique()

15520

In [157]:
data['TO_ADDRESS'].nunique()

8664

In [158]:
data.head()

Unnamed: 0,FROM_ADDRESS,TO_ADDRESS,VALUE,GAS,GAS_USED,BLOCK_TIMESTAMP,TX_HASH,FUNCTION_NAME,TYPE
0,0xb2ecfe4e4d61f8790bbb9de2d1259b9e2410cea5,0x5fa60726e62c50af45ff2f6280c468da438a7837,0.0,149479.0,143759.0,2024-09-25 19:14:11.000,0xadcd3bae13011aa44b416d45f78fc18016d63f85c140...,takeBidSingle,DELEGATECALL
1,0x7a250d5630b4cf539739df2c5dacb4c659f2488d,0x8b3d14ed2bf1285aa34ae394137872b12ad2feea,0.0,36913.0,617.0,2024-09-25 16:01:23.000,0x808294e069844895e090924d2bfbb7560f3e40fa9c6a...,balanceOf,STATICCALL
2,0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48,0x43506849d7c04f9138d1a2050bbf3a0c054402dd,0.0,150764.0,2553.0,2024-09-25 22:33:59.000,0xfec3c4db5d6e739f9014d7d34a82b62c8e0be7eb4e4c...,balanceOf,DELEGATECALL
3,0xdef1c0ded9bec7f1a1670819833240f027b25eff,0x5ebac8dbfbba22168471b0f914131d1976536a25,0.0,246130.0,5528.0,2024-09-25 22:23:35.000,0x57886983145b0c86ab1b41a608dd46eb6f6ce7873b7e...,fillOtcOrder,DELEGATECALL
4,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45,0.0,265794.0,255.0,2024-09-25 04:32:35.000,0x9444dc983a03df25b7a5f96e406c15e9aca923deb708...,refundETH,DELEGATECALL


# Features

In [159]:
data['BLOCK_TIMESTAMP'] = pd.to_datetime(data['BLOCK_TIMESTAMP'])
data = data.sort_values(by=['FROM_ADDRESS', 'BLOCK_TIMESTAMP'])
df = data.copy()

## 🎯 Time-based features
	<br> 1.	Average number of transactions per active week (tx_count_week)
	<br> 2.	Average number of transactions per active month (tx_count_month)
	<br> 3.	Average number of active days per active week (active_days_week)
	<br> 4.	Average number of active days per active month (active_days_month)
	<br> 5.	Mean time between consecutive transactions, in days (mean_time_between_txs_days)

## User-behavior features
1. 

In [160]:
# Time buckets: start of the calendar week / month each transaction falls in

df['week'] = df['BLOCK_TIMESTAMP'].dt.to_period('W').dt.start_time
df['month'] = df['BLOCK_TIMESTAMP'].dt.to_period('M').dt.start_time
df['date'] = df['BLOCK_TIMESTAMP'].dt.date

### number of transactions
tx_count_week = df.groupby(['FROM_ADDRESS', 'week']).size().groupby('FROM_ADDRESS').mean().rename('tx_count_week')
tx_count_month = df.groupby(['FROM_ADDRESS', 'month']).size().groupby('FROM_ADDRESS').mean().rename('tx_count_month')

### active days
active_days_week = df.groupby(['FROM_ADDRESS', 'week'])['date'].nunique().groupby('FROM_ADDRESS').mean().rename('active_days_week')
active_days_month = df.groupby(['FROM_ADDRESS', 'month'])['date'].nunique().groupby('FROM_ADDRESS').mean().rename('active_days_month')

def mean_time_between_txs(timestamps):
    times = timestamps.sort_values()
    deltas = times.diff().dropna()
    if deltas.empty:
        return np.nan
    return deltas.mean().total_seconds() / (60 * 60 * 24)

mean_time_between = df.groupby('FROM_ADDRESS')['BLOCK_TIMESTAMP'].apply(mean_time_between_txs).rename('mean_time_between_txs_days')

features = pd.concat([
    tx_count_week,
    tx_count_month,
    active_days_week,
    active_days_month,
    mean_time_between
], axis=1).reset_index()
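
A small worked example may help in reading these features. Because the counts are grouped by `(FROM_ADDRESS, week)` before averaging, the mean is taken only over weeks in which an address appears, and idle weeks do not count as zeros. The toy addresses and timestamps below are hypothetical.

```python
import pandas as pd

# Hypothetical transactions: address A is active in two separate weeks, B in one.
toy = pd.DataFrame({
    "FROM_ADDRESS": ["A", "A", "A", "A", "B"],
    "BLOCK_TIMESTAMP": pd.to_datetime([
        "2024-09-02 10:00", "2024-09-02 12:00", "2024-09-03 09:00",  # one week: 3 txs on 2 days
        "2024-09-23 08:00",                                          # a later week: 1 tx on 1 day
        "2024-09-10 15:00",
    ]),
})
toy["week"] = toy["BLOCK_TIMESTAMP"].dt.to_period("W").dt.start_time
toy["date"] = toy["BLOCK_TIMESTAMP"].dt.date

# Mean over the weeks in which the address appears; idle weeks are not counted.
tx_count_week = toy.groupby(["FROM_ADDRESS", "week"]).size().groupby("FROM_ADDRESS").mean()
active_days_week = (
    toy.groupby(["FROM_ADDRESS", "week"])["date"].nunique().groupby("FROM_ADDRESS").mean()
)

print(tx_count_week)     # A: (3 + 1) / 2 = 2.0, B: 1.0
print(active_days_week)  # A: (2 + 1) / 2 = 1.5, B: 1.0
```

One consequence is that an address active in a single week and a single month gets identical weekly and monthly values, so the two columns carry the same information for such addresses.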


In [161]:
features.head()

Unnamed: 0,FROM_ADDRESS,tx_count_week,tx_count_month,active_days_week,active_days_month,mean_time_between_txs_days
0,0x0000000000000068f116a894984e2db1123eb395,31.0,31.0,1.0,1.0,0.245549
1,0x0000000000001ff3684f28c67538d4d072c22734,30.0,30.0,1.0,1.0,0.03307
2,0x000000000000ad05ccc4f10045630fb830b95127,1.0,1.0,1.0,1.0,
3,0x000000000000c9b3e2c3ec88b1b4c0cd853f4321,1.5,1.5,1.0,1.0,6.859306
4,0x0000000000085d4780b73119b644ae5ecd22b376,2.0,2.0,1.0,1.0,0.277639


In [163]:
features['mean_time_between_txs_days'].value_counts()

0.000000    250
0.000278     18
0.000139     18
0.000417     15
0.000833     13
           ... 
3.613229      1
0.852222      1
4.738935      1
2.429352      1
0.019583      1
Name: mean_time_between_txs_days, Length: 3677, dtype: int64

In [164]:
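# behavioral features (features2)

# Per-address call counts: one row per sender, one column per function name
func_counts = df.groupby(['FROM_ADDRESS', 'FUNCTION_NAME']).size().unstack(fill_value=0)

def count_of(name):
    # Count column for `name`, or all zeros if that function never appears in the data
    if name in func_counts.columns:
        return func_counts[name]
    return pd.Series(0, index=func_counts.index)

features2 = pd.DataFrame(index=func_counts.index)

features2['swap_count'] = count_of('swap')
features2['transfer_count'] = count_of('transfer')
features2['execute_count'] = count_of('execute')

# approve / transferFrom; a zero denominator is replaced by 1 to avoid division by zero,
# so for addresses with no transferFrom calls the ratio equals the raw approve count
approve = count_of('approve')
transfer_from = count_of('transferFrom').replace(0, 1)
features2['approve_transfer_ratio'] = approve / transfer_from

features2.reset_index(inplace=True)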


In [165]:
features2.head()

Unnamed: 0,FROM_ADDRESS,swap_count,transfer_count,execute_count,approve_transfer_ratio
0,0x0000000000000068f116a894984e2db1123eb395,0,0,24,0.0
1,0x0000000000001ff3684f28c67538d4d072c22734,0,0,15,0.0
2,0x000000000000ad05ccc4f10045630fb830b95127,0,0,0,0.0
3,0x000000000000c9b3e2c3ec88b1b4c0cd853f4321,0,1,0,0.0
4,0x0000000000085d4780b73119b644ae5ecd22b376,0,1,0,0.0


In [167]:
features2.describe()

Unnamed: 0,swap_count,transfer_count,execute_count,approve_transfer_ratio
count,15519.0,15519.0,15519.0,15519.0
mean,0.425221,1.226625,0.104195,0.12856
std,15.284999,24.448559,0.437815,0.512698
min,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0
75%,0.0,1.0,0.0,0.0
max,1531.0,2673.0,24.0,34.0
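
The summary above is heavily right-skewed: the median of every count is 0 while the maxima reach 1531 and 2673. Features like these tend to dominate distance-based clustering after standard scaling. One common remedy, shown here as a sketch on hypothetical values, is a `log1p` transform combined with a percentile cutoff for the outlier-removal step. The 99th-percentile threshold is an assumption, not a value taken from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical count features with one extreme address, mimicking the skew above.
features2 = pd.DataFrame({
    "FROM_ADDRESS": [f"0x{i:02d}" for i in range(10)],
    "swap_count": [0, 0, 0, 0, 0, 1, 1, 2, 3, 1531],
    "transfer_count": [0, 0, 1, 1, 1, 2, 2, 3, 5, 2673],
})
count_cols = ["swap_count", "transfer_count"]

# log1p(x) = log(1 + x) keeps zeros at zero and compresses the long right tail.
logged = np.log1p(features2[count_cols])

# Flag addresses above the 99th percentile on any count column (assumed threshold).
cutoff = features2[count_cols].quantile(0.99)
is_outlier = (features2[count_cols] > cutoff).any(axis=1)

trimmed = features2.loc[~is_outlier]
print(logged.max())
print(trimmed)
```

Whether to drop such addresses or keep them with transformed values depends on the question: very active addresses may be exactly the "power users" the clustering is meant to surface, or they may be contracts and bots.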
