# Wallet Risk Scoring From Scratch

In [46]:
import pandas as pd

url = "https://raw.githubusercontent.com/Himani-Barmase/Wallet-Risk-Scoring-From-Scratch/main/Wallet%20id%20-%20Sheet1.csv"
df = pd.read_csv(url)
df.head()


Unnamed: 0,wallet_id
0,0x0039f22efb07a647557c7c5d17854cfd6d489ef3
1,0x06b51c6882b27cb05e712185531c1f74996dd988
2,0x0795732aacc448030ef374374eaae57d2965c16c
3,0x0aaa79f1a86bc8136cd0d1ca0d51964f4e3766f9
4,0x0fe383e5abc200055a7f391f94a5f5d1f844b9ae


In [47]:
import requests
import time
import json
# Load wallet IDs from your CSV
url = "https://raw.githubusercontent.com/Himani-Barmase/Wallet-Risk-Scoring-From-Scratch/main/Wallet%20id%20-%20Sheet1.csv"
df = pd.read_csv(url)
wallet_ids = df['wallet_id'].tolist()


In [48]:
API_KEY = "cqt_rQgwmYRfX7j4ffhXfGbtk7DrvcJD"
CHAIN_ID = "1"  # Ethereum Mainnet


use this one wallet to print and study the structure

In [49]:
wallet = "0x0039f22efb07a647557c7c5d17854cfd6d489ef3"
transactions = get_wallet_transactions(wallet)

print(json.dumps(transactions[0], indent=2))  # print the first transaction neatly


{
  "block_signed_at": "2025-06-16T21:15:11Z",
  "block_height": 22719696,
  "block_hash": "0xff57a0c234e73c4897632e198caa0074e84d16080e0f99731a3da891faebb42b",
  "tx_hash": "0x98703fb4a7c6804d82e98f009ecc0e089abd53de94696088fb9675dde740c570",
  "tx_offset": 108,
  "successful": true,
  "miner_address": "0x95222290dd7278aa3ddd389cc1e1d165cc4bafe5",
  "from_address": "0xc6b602de080fc9ac9d96a431b2d749d38e77cbbc",
  "from_address_label": null,
  "to_address": "0x13173761e24c3708495b1dd314920f67f97011d0",
  "to_address_label": null,
  "value": "0",
  "value_quote": 0.0,
  "pretty_value_quote": "$0.00",
  "gas_metadata": {
    "contract_decimals": 18,
    "contract_name": "Ether",
    "contract_ticker_symbol": "ETH",
    "contract_address": "0xeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee",
    "supports_erc": [],
    "logo_url": "https://www.datocms-assets.com/86369/1669653891-eth.svg"
  },
  "gas_offered": 84929,
  "gas_spent": 55815,
  "gas_price": 4200000000,
  "fees_paid": "234423000000000"

Data Collection Method:

I utilized the Covalent API to fetch transaction-level blockchain data for a list of Ethereum wallet addresses. This included fields such as transaction value, gas fees, and decoded log events, which are crucial for understanding on-chain wallet activity. We collected data programmatically using Python for reproducibility and scalability.



In [50]:
def get_wallet_transactions(wallet_address):
    url = f"https://api.covalenthq.com/v1/{CHAIN_ID}/address/{wallet_address}/transactions_v3/?key={API_KEY}"
    response = requests.get(url)

    if response.status_code == 200:
        return response.json().get("data", {}).get("items", [])
    else:
        print(f"Error fetching {wallet_address} | Status: {response.status_code}")
        return []


Feature Selection Rationale:

The following features were selected for scoring, based on their relevance to transactional risk:

Transaction Value (value): High-value transfers may indicate high-risk behavior, especially in DeFi or token swaps.

Gas Fee (gas_quote): Extremely low gas fees may correlate with

In [None]:
#Step 3: Build a Scoring Function

Transaction Frequency:

More frequent transactions indicate active usage → lower risk.

Token Diversity: Wallets interacting with many different tokens or smart contracts may be higher risk (potential airdrop farmers or bots).

High-Value Transfers: Frequent large incoming/outgoing transfers may suggest suspicious or risky activity.

Age of Wallet: Older wallets are generally lower risk compared to newly created ones.

Gas Usage Patterns: Abnormally high gas fees or rapid transactions may hint at bot-like behavior or smart contract exploitation.

Each factor can be normalized and weighted to calculate a final risk score between 0 (low risk) to 1000 (high risk).

In [51]:
def compute_risk_score(transactions):
    score = 0

    for tx in transactions:
        try:
            value = int(tx['log_events'][0]['decoded']['params'][2]['value'])
            gas_fee = float(tx['gas_quote'])
            ticker = tx['log_events'][0]['sender_contract_ticker_symbol']

            if value > 1e18:
                score += 50
            if gas_fee < 0.01:
                score += 10
            if "FREEROMAN" in ticker:
                score += 100
        except:
            continue

    # Set max possible raw score (based on your rules)
    max_possible_score = 160  # adjust based on your actual logic
    normalized_score = min(score / max_possible_score, 1)  # get 0–1
    final_score = int(normalized_score * 1000)  # scale to 0–1000
    return final_score


In [52]:
print(df.columns)


Index(['wallet_id'], dtype='object')


In [53]:
# Apply risk scoring to all wallets
for wallet in wallet_ids:
    tx = get_wallet_transactions(wallet)
    score = compute_risk_score(tx)
    results.append({"wallet": wallet, "risk_score": score})

# Print result or convert to DataFrame
print(results)


[{'wallet': '0x0039f22efb07a647557c7c5d17854cfd6d489ef3', 'risk_score': 937}, {'wallet': '0x06b51c6882b27cb05e712185531c1f74996dd988', 'risk_score': 312}, {'wallet': '0x0795732aacc448030ef374374eaae57d2965c16c', 'risk_score': 312}, {'wallet': '0x0aaa79f1a86bc8136cd0d1ca0d51964f4e3766f9', 'risk_score': 1000}, {'wallet': '0x0fe383e5abc200055a7f391f94a5f5d1f844b9ae', 'risk_score': 312}, {'wallet': '0x0039f22efb07a647557c7c5d17854cfd6d489ef3', 'risk_score': 937}, {'wallet': '0x06b51c6882b27cb05e712185531c1f74996dd988', 'risk_score': 312}, {'wallet': '0x0795732aacc448030ef374374eaae57d2965c16c', 'risk_score': 312}, {'wallet': '0x0aaa79f1a86bc8136cd0d1ca0d51964f4e3766f9', 'risk_score': 1000}, {'wallet': '0x0fe383e5abc200055a7f391f94a5f5d1f844b9ae', 'risk_score': 312}, {'wallet': '0x104ae61d8d487ad689969a17807ddc338b445416', 'risk_score': 312}, {'wallet': '0x111c7208a7e2af345d36b6d4aace8740d61a3078', 'risk_score': 312}, {'wallet': '0x124853fecb522c57d9bd5c21231058696ca6d596', 'risk_score': 93

In [54]:
# Create DataFrame
df = pd.DataFrame(results)

# Show top 5 rows as preview (optional)
df.head(30)

Unnamed: 0,wallet,risk_score
0,0x0039f22efb07a647557c7c5d17854cfd6d489ef3,937
1,0x06b51c6882b27cb05e712185531c1f74996dd988,312
2,0x0795732aacc448030ef374374eaae57d2965c16c,312
3,0x0aaa79f1a86bc8136cd0d1ca0d51964f4e3766f9,1000
4,0x0fe383e5abc200055a7f391f94a5f5d1f844b9ae,312
5,0x0039f22efb07a647557c7c5d17854cfd6d489ef3,937
6,0x06b51c6882b27cb05e712185531c1f74996dd988,312
7,0x0795732aacc448030ef374374eaae57d2965c16c,312
8,0x0aaa79f1a86bc8136cd0d1ca0d51964f4e3766f9,1000
9,0x0fe383e5abc200055a7f391f94a5f5d1f844b9ae,312


In [55]:
# Save to CSV (without index and without extra columns)
df.to_csv("wallet_risk_scores_clean.csv", index=False)


In [56]:
from google.colab import files
files.download('wallet_risk_scores_clean.csv')


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

Scoring Method:

I assigned scores to each wallet by evaluating key transactional behaviors. Each feature was scored based on predefined thresholds and weighted accordingly. Then, the scores were normalized and scaled to a final range of 0 to 1000 to standardize risk interpretation.

Key steps:

Each wallet's transaction data was parsed.

For every transaction, I assessed:

High-value transfers

Unusually low gas fees

Suspicious contract interactions (e.g., known risky tokens)

Each behavior contributed positively or negatively to the total score.

The final score per wallet was the cumulative result, scaled proportionally.

Justification of the Risk Indicators Used:

We chose the following risk indicators based on commonly observed patterns in fraudulent or high-risk wallet activity:

High Transaction Value: Large crypto transfers (especially over 1 ETH) can be associated with scams, rug pulls, or money laundering.

Low Gas Fees: Extremely low gas usage may signal interactions with gas-optimized or suspicious smart contracts often used to avoid detection.

Token Type or Contract Pattern: Interaction with certain known risky tokens or contract names (e.g., “FREEROMAN”) often hints at involvement in unverified or malicious projects.