Phase 0 — Orientation & Tooling (Week 1)

* **Local dev environment ready**
    Repo folder: /Software_development/Python/projects/wallet_analytics
* **API key obtained**
    Alchemy API key: (redacted; load it from the ALCHEMY_API_KEY environment variable)
* **Git repo initialized**
    https://github.com/Dev-Uchiha/wallet_analytics.git


To start, I created a repo folder called wallet_analytics to hold everything related to the project.
I then installed a package manager called "uv". A package manager is a tool that installs the packages (reusable bundles of code) a project depends on. For example, to do numerical maths in Python, I would install "numpy", which already contains the code needed.
I chose uv specifically because it is much faster than the standard package manager, "pip", and it is also what we use at work.
I then used uv to create a virtual environment inside the wallet_analytics folder. A virtual environment is an isolated location where the packages installed for a project live. Giving each project its own virtual environment keeps its dependencies from conflicting with those of other projects.
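
One quick way to confirm that a notebook or script is really running inside the project's virtual environment is to compare the interpreter's prefixes. This is only a minimal sketch; the function name `in_virtualenv` is illustrative and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtualenv())
print("Interpreter location:", sys.executable)
```

If the first line prints False, the environment has probably not been activated in the current shell.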

I then created an account with Alchemy to get an API key (an API is a way for one piece of software to talk to another). This key lets me request data about Ethereum wallets from Alchemy's servers.
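
Because the key grants access to the account, it is safer to read it from an environment variable than to paste it into the notebook. The sketch below assumes the key is stored under the name ALCHEMY_API_KEY; the helpers `load_api_key` and `mask` are hypothetical names used only for illustration.

```python
import os


def load_api_key(env=None, name="ALCHEMY_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    # `env` defaults to the real process environment; a plain dict can be
    # passed instead, which makes the function easy to try out.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook")
    return key


def mask(key):
    """Hide all but the last four characters so the key is safe to print."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


# Example with a made-up key held in a dict instead of the real environment.
demo_key = load_api_key({"ALCHEMY_API_KEY": "demo-key-1234"})
print("Loaded key:", mask(demo_key))
```

Printing only the masked form keeps the real key out of saved notebook outputs.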

Lastly, I installed "git", which lets me keep track of the changes I make to my code. I also connected it to "GitHub" so that I can store my code on the GitHub website.

cd users/name/project
uv venv .venv        # once
uv sync              # whenever deps change
source .venv/bin/activate

In [1]:
import requests
import pandas as pd
import time
import os
import json
from datetime import datetime

In [8]:
# Cell 5 (Simple): ETH balance + last 10 transactions
# Get your API key from https://dashboard.alchemy.com/
ALCHEMY_API_KEY = os.getenv("ALCHEMY_API_KEY", "")  # read the key from the environment; never hard-code or commit it
ALCHEMY_BASE_URL = "https://eth-mainnet.g.alchemy.com/v2"


def get_alchemy_json(method, params):
    """Make a request to Alchemy JSON-RPC."""
    url = f"{ALCHEMY_BASE_URL}/{ALCHEMY_API_KEY}"
    payload = {"id": 1, "jsonrpc": "2.0", "method": method, "params": params}

    # A timeout keeps a stalled connection from hanging the notebook.
    r = requests.post(url, json=payload, headers={"Content-Type": "application/json"}, timeout=30)
    r.raise_for_status()

    data = r.json()
    if "error" in data:
        raise RuntimeError(f"Alchemy API error: {data['error']}")

    return data


def get_eth_balance(address):
    """Return ETH balance for an address (in ETH)."""
    data = get_alchemy_json("eth_getBalance", [address, "latest"])
    balance_wei = int(data["result"], 16)
    return balance_wei / 10**18


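def get_last_10_transactions(address):
    """Return up to 10 *normal* (external ETH) transfers involving the address, newest first.

    Caveat: alchemy_getAssetTransfers returns results oldest-first by default, so with
    fromBlock "0x0" each request below fetches the 10 oldest matching transfers.
    Adding "order": "desc" to base_params asks for the newest transfers instead.
    """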
    base_params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["external"],
        "maxCount": hex(10),
        "excludeZeroValue": False,
    }

    all_txs = []

    # Transfers TO the address
    params_to = base_params.copy()
    params_to["toAddress"] = address
    data_to = get_alchemy_json("alchemy_getAssetTransfers", [params_to])
    all_txs.extend(data_to.get("result", {}).get("transfers", []))

    time.sleep(0.1)  # rate limiting

    # Transfers FROM the address
    params_from = base_params.copy()
    params_from["fromAddress"] = address
    data_from = get_alchemy_json("alchemy_getAssetTransfers", [params_from])
    all_txs.extend(data_from.get("result", {}).get("transfers", []))

    df = pd.DataFrame(all_txs)
    if df.empty:
        return df

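    # Deduplicate, then sort the fetched transfers by block number (newest first).
    if "hash" in df.columns:
        df = df.drop_duplicates(subset=["hash"], keep="first")

    if "blockNum" in df.columns:
        # blockNum is a hex string such as "0x10efdd"; anything else sorts last.
        block_num = df["blockNum"].apply(
            lambda x: int(x, 16)
            if isinstance(x, str) and x.startswith("0x")
            else -1
        )
        # assign() returns a new DataFrame, which avoids writing into a possible copy.
        df = df.assign(_block_num=block_num).sort_values("_block_num", ascending=False).drop(columns=["_block_num"])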

    return df.head(10).reset_index(drop=True)


def list_relevant_data_headers(address):
    """Print the main field names returned by the API.

    Think of these as the "column headers" you can use once you load the data into a DataFrame.
    """
    params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["external"],
        "maxCount": hex(10),
        "excludeZeroValue": False,
        "toAddress": address,
    }

    data = get_alchemy_json("alchemy_getAssetTransfers", [params])
    transfers = (data.get("result", {}) or {}).get("transfers", []) or []

    headers = set()
    for t in transfers:
        if isinstance(t, dict):
            headers.update(t.keys())

    print("Relevant data headers:", sorted(list(headers)))


# Example usage
if ALCHEMY_API_KEY:
    eth_address = "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"  # example address

    balance = get_eth_balance(eth_address)
    print(f"ETH Balance: {balance:.6f} ETH")

    list_relevant_data_headers(eth_address)

    last_10 = get_last_10_transactions(eth_address)
    print(f"Last 10 transactions fetched: {len(last_10)}")
    display(last_10.head(10))
else:
    print("Please set your ALCHEMY_API_KEY to use the API")


ETH Balance: 32.112658 ETH
Relevant data headers: ['asset', 'blockNum', 'category', 'erc1155Metadata', 'erc721TokenId', 'from', 'hash', 'metadata', 'rawContract', 'to', 'tokenId', 'uniqueId', 'value']
Last 10 transactions fetched: 10
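
The balance printed above comes from a hex string of wei returned by eth_getBalance. The conversion can be sketched on its own as follows; this variant uses Decimal instead of float division so that very small amounts are not rounded, which is an assumption about what is wanted rather than what the notebook does. The name `wei_hex_to_eth` is illustrative.

```python
from decimal import Decimal

WEI_PER_ETH = 10**18  # 1 ETH is 10^18 wei


def wei_hex_to_eth(wei_hex: str) -> Decimal:
    """Convert a hex-encoded wei amount (e.g. "0xde0b6b3a7640000") to ETH."""
    wei = int(wei_hex, 16)  # int() accepts the "0x" prefix when the base is 16
    return Decimal(wei) / Decimal(WEI_PER_ETH)


# Raw values of the kind found in the `rawContract` field of a transfer.
for raw in ("0xde0b6b3a7640000", "0x16345785d8a0000", "0x2"):
    print(raw, "->", wei_hex_to_eth(raw), "ETH")
```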


Unnamed: 0,blockNum,uniqueId,hash,from,to,value,erc721TokenId,erc1155Metadata,tokenId,asset,category,rawContract,metadata
0,0x10efdd,0x05fe9ec8c7f6ba25688ddf68b49bbaee0e54e80255e2...,0x05fe9ec8c7f6ba25688ddf68b49bbaee0e54e80255e2...,0xbbdbbefb30aab65c7dd5e30f5f9358013bb6792f,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.006412,,,,ETH,external,"{'value': '0x16c7ae164cc000', 'address': None,...",
1,0x10efdb,0xbc4d158132d6db6378f09cda085541d6f8c3ff5adc00...,0xbc4d158132d6db6378f09cda085541d6f8c3ff5adc00...,0x0b54b2fc857c823a1afe788d2c90deb068cdf0fc,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.006076,,,,ETH,external,"{'value': '0x159616fda7c000', 'address': None,...",
2,0x10efd0,0x17b46ad5453494aa56dfba10c40bfa3fd934d436c2d2...,0x17b46ad5453494aa56dfba10c40bfa3fd934d436c2d2...,0xe2f2cb9d8c1bdd17f58a3a91569bff14b00f0044,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.005992,,,,ETH,external,"{'value': '0x1549b1377e8000', 'address': None,...",
3,0x10efd0,0x9be45b03c54234fd5a6c442c3ad47ac7681323107cbd...,0x9be45b03c54234fd5a6c442c3ad47ac7681323107cbd...,0x0bd9571f2e4a8cb9c9ef3290e2f7892a072e93a7,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.006034,,,,ETH,external,"{'value': '0x156fe41a932000', 'address': None,...",
4,0x6d6b7,0x9b43167ed68e5607024285dc5f2d91b95ca8d2a10472...,0x9b43167ed68e5607024285dc5f2d91b95ca8d2a10472...,0x1db3439a222c519ab44bb1144fc28167b4fa6ee6,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.1,,,,ETH,external,"{'value': '0x16345785d8a0000', 'address': None...",
5,0x69617,0x7ceb1538dbd5927ed406c59b178dd04e22c6fba0e641...,0x7ceb1538dbd5927ed406c59b178dd04e22c6fba0e641...,0x1db3439a222c519ab44bb1144fc28167b4fa6ee6,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.0,,,,ETH,external,"{'value': '0x0', 'address': None, 'decimal': '...",
6,0x6827d,0xa68c128877d8056f3ef0f008782cc6c94e8134e52806...,0xa68c128877d8056f3ef0f008782cc6c94e8134e52806...,0x1db3439a222c519ab44bb1144fc28167b4fa6ee6,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0.0,,,,ETH,external,"{'value': '0x0', 'address': None, 'decimal': '...",
7,0x5be50,0x4d4c6fc89ff719889314e5a46bcebc612a17c3db448b...,0x4d4c6fc89ff719889314e5a46bcebc612a17c3db448b...,0x1db3439a222c519ab44bb1144fc28167b4fa6ee6,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,1.0,,,,ETH,external,"{'value': '0xde0b6b3a7640000', 'address': None...",
8,0x50b49,0x197f702031f60ea46533fa979a2ef0daf94e0cbf5ca2...,0x197f702031f60ea46533fa979a2ef0daf94e0cbf5ca2...,0x1db3439a222c519ab44bb1144fc28167b4fa6ee6,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,2e-18,,,,ETH,external,"{'value': '0x2', 'address': None, 'decimal': '...",
9,0x4ea4a,0x55c749e2dfefb46060a455ea2037319c802c8f76008d...,0x55c749e2dfefb46060a455ea2037319c802c8f76008d...,0xd8da6bf26964af9d7eed9e03e53415d37aa96045,0x7e2d0fe0ffdd78c264f8d40d19acb7d04390c6e8,0.0,,,,ETH,external,"{'value': '0x0', 'address': None, 'decimal': '...",
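
The block numbers in the table above are low (the highest is 0x10efdd), which is consistent with the API returning transfers oldest-first by default. A minimal sketch of a request builder that asks for the newest transfers instead is shown below. It relies on the `order` parameter of alchemy_getAssetTransfers; the helper name `build_transfer_params` and its arguments are hypothetical.

```python
def build_transfer_params(address, direction, category="external", max_count=10, newest_first=True):
    """Build the params object for one alchemy_getAssetTransfers request."""
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")

    params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": [category],
        "maxCount": hex(max_count),
        "excludeZeroValue": False,
        # "desc" asks for the newest transfers first; the API default is "asc".
        "order": "desc" if newest_first else "asc",
    }
    # The API filters on either "toAddress" or "fromAddress".
    params[f"{direction}Address"] = address
    return params


incoming = build_transfer_params("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", "to")
print(incoming)
```

The resulting dictionary could be passed as `[incoming]` to get_alchemy_json in place of the hand-built `params_to`.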


Phase 1 — Raw On-Chain Data Ingestion (Weeks 2–3)
* Normal transactions
* ERC-20 token transfers

**Deliverable**

* Script that pulls data for 1 wallet
* Raw tables saved locally
* Re-runnable without manual edits
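
One way to make the script re-runnable without manual edits is to pass the wallet address and options on the command line instead of editing them in the code. The sketch below uses argparse from the standard library; the option names (`--output-dir`, `--max-count`) and the function names are assumptions chosen to mirror the notebook, not an existing interface.

```python
import argparse
import re

# 0x followed by exactly 40 hexadecimal digits.
ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")


def eth_address(value):
    """argparse type: accept only 0x-prefixed, 40-hex-digit addresses."""
    if not ADDRESS_PATTERN.fullmatch(value):
        raise argparse.ArgumentTypeError(f"not a valid Ethereum address: {value}")
    return value


def parse_args(argv=None):
    """Parse command-line arguments; `argv=None` means use sys.argv."""
    parser = argparse.ArgumentParser(description="Pull raw on-chain data for one wallet")
    parser.add_argument("address", type=eth_address, help="wallet address to fetch")
    parser.add_argument("--output-dir", default="outputs", help="folder for the raw tables")
    parser.add_argument("--max-count", type=int, default=1000, help="max transfers per request")
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
args = parse_args(["0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", "--max-count", "100"])
print(args.address, args.output_dir, args.max_count)
```

The parsed values could then be handed to a function such as get_wallet_data.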


I then wrote several functions that use the key to pull the data from Alchemy into Python and store it locally for later use (CSV files for the transaction tables, a JSON file for the balance). These functions return the balance of an Ethereum wallet, its normal transactions, and its ERC-20 token transfers.

In [9]:
# Cell 6 (Complex): use the simple functions from Cell 5, then add more datasets + saving
# NOTE: This cell assumes Cell 5 has already been run (so `ALCHEMY_API_KEY` and `get_alchemy_json` exist).


def get_eth_balance_and_save(address, save_to_file=False, output_dir="outputs"):
    """Get ETH balance (in ETH). Optionally save balance JSON to disk."""
    data = get_alchemy_json("eth_getBalance", [address, "latest"])
    balance_wei = int(data["result"], 16)
    balance_eth = balance_wei / 10**18

    if save_to_file:
        os.makedirs(output_dir, exist_ok=True)
        filename = f"{output_dir}/eth_balance_{address}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        with open(filename, "w") as f:
            json.dump(
                {
                    "address": address,
                    "balance_wei": str(balance_wei),
                    "balance_eth": balance_eth,
                    "timestamp": datetime.now().isoformat(),
                },
                f,
                indent=2,
            )
        print(f"Saved balance data to {filename}")

    return balance_eth


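def _sort_and_dedupe_transfers(df):
    """De-dupe by tx hash and sort by block number (newest first).

    Caveat: one transaction can emit several ERC-20 transfers that share a hash,
    so de-duping on "hash" keeps only the first of them. The "uniqueId" field in
    the API response identifies each transfer individually.
    """
    if df.empty:
        return df

    if "hash" in df.columns:
        df = df.drop_duplicates(subset=["hash"], keep="first")

    if "blockNum" in df.columns:
        # blockNum is a hex string such as "0x10efdd"; anything else sorts last.
        block_num = df["blockNum"].apply(
            lambda x: int(x, 16) if isinstance(x, str) and x.startswith("0x") else -1
        )
        # assign() returns a new DataFrame, which avoids writing into a possible copy.
        df = df.assign(_block_num=block_num).sort_values("_block_num", ascending=False).drop(columns=["_block_num"])

    return df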


def get_normal_transactions(
    address,
    from_block="0x0",
    to_block="latest",
    max_count=1000,
    save_to_file=False,
    output_dir="outputs",
):
    """Get normal ETH transactions (external transfers) for an address."""
    params = {
        "fromBlock": from_block,
        "toBlock": to_block,
        "category": ["external"],
        "maxCount": hex(max_count),
        "excludeZeroValue": False,
    }

    all_transactions = []

    params_to = params.copy()
    params_to["toAddress"] = address
    data_to = get_alchemy_json("alchemy_getAssetTransfers", [params_to])
    all_transactions.extend(data_to.get("result", {}).get("transfers", []))

    time.sleep(0.1)

    params_from = params.copy()
    params_from["fromAddress"] = address
    data_from = get_alchemy_json("alchemy_getAssetTransfers", [params_from])
    all_transactions.extend(data_from.get("result", {}).get("transfers", []))

    df = _sort_and_dedupe_transfers(pd.DataFrame(all_transactions)).reset_index(drop=True)

    if save_to_file and len(df) > 0:
        os.makedirs(output_dir, exist_ok=True)
        filename = f"{output_dir}/normal_transactions_{address}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        df.to_csv(filename, index=False)
        print(f"Saved {len(df)} normal transactions to {filename}")

    return df


def get_erc20_transfers(
    address,
    from_block="0x0",
    to_block="latest",
    max_count=1000,
    save_to_file=False,
    output_dir="outputs",
):
    """Get ERC-20 token transfers for an address."""
    params = {
        "fromBlock": from_block,
        "toBlock": to_block,
        "category": ["erc20"],
        "maxCount": hex(max_count),
        "excludeZeroValue": False,
    }

    all_transfers = []

    params_to = params.copy()
    params_to["toAddress"] = address
    data_to = get_alchemy_json("alchemy_getAssetTransfers", [params_to])
    all_transfers.extend(data_to.get("result", {}).get("transfers", []))

    time.sleep(0.1)

    params_from = params.copy()
    params_from["fromAddress"] = address
    data_from = get_alchemy_json("alchemy_getAssetTransfers", [params_from])
    all_transfers.extend(data_from.get("result", {}).get("transfers", []))

    df = _sort_and_dedupe_transfers(pd.DataFrame(all_transfers)).reset_index(drop=True)

    if save_to_file and len(df) > 0:
        os.makedirs(output_dir, exist_ok=True)
        filename = f"{output_dir}/erc20_transfers_{address}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        df.to_csv(filename, index=False)
        print(f"Saved {len(df)} ERC-20 token transfers to {filename}")

    return df


def get_wallet_data(
    address,
    include_normal_txs=True,
    include_erc20=True,
    from_block="0x0",
    to_block="latest",
    max_count=1000,
    save_to_file=False,
    output_dir="outputs",
):
    """Get wallet data: ETH balance + normal txs + ERC-20 transfers."""
    results = {}

    # Use the simple balance function for quick display,
    # and the extended one if you want to save raw output.
    results["balance"] = (
        get_eth_balance_and_save(address, save_to_file=save_to_file, output_dir=output_dir)
        if save_to_file
        else get_eth_balance(address)
    )

    if include_normal_txs:
        df_normal = get_normal_transactions(
            address,
            from_block=from_block,
            to_block=to_block,
            max_count=max_count,
            save_to_file=save_to_file,
            output_dir=output_dir,
        )
        results["normal_transactions"] = df_normal
        print(f"Fetched {len(df_normal)} normal transactions")

    if include_erc20:
        df_erc20 = get_erc20_transfers(
            address,
            from_block=from_block,
            to_block=to_block,
            max_count=max_count,
            save_to_file=save_to_file,
            output_dir=output_dir,
        )
        results["erc20_transfers"] = df_erc20
        print(f"Fetched {len(df_erc20)} ERC-20 token transfers")

    return results


# Example usage
if ALCHEMY_API_KEY:
    eth_address = "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"  # example address

    # Simple functions reused from Cell 5
    balance = get_eth_balance(eth_address)
    print(f"ETH Balance: {balance:.6f} ETH")

    last_10 = get_last_10_transactions(eth_address)
    print(f"Last 10 transactions fetched: {len(last_10)}")

    # Complex: fetch more data and save raw outputs
    wallet_data = get_wallet_data(
        eth_address,
        include_normal_txs=True,
        include_erc20=True,
        max_count=100,
        save_to_file=True,
        output_dir="outputs",
    )

    print(f"\nNormal Transactions: {len(wallet_data.get('normal_transactions', pd.DataFrame()))} records")
    print(f"ERC-20 Transfers: {len(wallet_data.get('erc20_transfers', pd.DataFrame()))} records")
else:
    print("Please set your ALCHEMY_API_KEY to use the API")


ETH Balance: 32.112658 ETH
Last 10 transactions fetched: 10
Saved balance data to outputs/eth_balance_0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045_20260125_175355.json
Saved 200 normal transactions to outputs/normal_transactions_0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045_20260125_175356.csv
Fetched 200 normal transactions
Saved 155 ERC-20 token transfers to outputs/erc20_transfers_0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045_20260125_175357.csv
Fetched 155 ERC-20 token transfers

Normal Transactions: 200 records
ERC-20 Transfers: 155 records
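
The run above requested max_count=100 and saved 200 normal transactions, i.e. 100 per direction, which suggests the per-request cap was reached and that more history may exist. alchemy_getAssetTransfers signals this with a `pageKey` in the result, which can be sent back to fetch the next page. The sketch below follows that key until it runs out; `fetch_all_transfers` and `fake_call` are hypothetical names, and `call` stands in for any function shaped like get_alchemy_json.

```python
def fetch_all_transfers(call, params, max_pages=50):
    """Collect transfers across pages by following the pageKey in each response.

    `call(method, params)` must return the decoded JSON-RPC response as a dict.
    `max_pages` is a safety limit so a misbehaving API cannot loop forever.
    """
    transfers = []
    page_params = dict(params)
    for _ in range(max_pages):
        result = call("alchemy_getAssetTransfers", [page_params]).get("result", {})
        transfers.extend(result.get("transfers", []))
        page_key = result.get("pageKey")
        if not page_key:
            break  # no pageKey means this was the last page
        page_params = {**params, "pageKey": page_key}
    return transfers


def fake_call(method, params):
    """Stand-in for the real API: serves two pages of made-up transfers."""
    pages = {
        None: {"transfers": [{"uniqueId": "a"}, {"uniqueId": "b"}], "pageKey": "k1"},
        "k1": {"transfers": [{"uniqueId": "c"}]},
    }
    return {"result": pages[params[0].get("pageKey")]}


all_transfers = fetch_all_transfers(fake_call, {"category": ["erc20"]})
print([t["uniqueId"] for t in all_transfers])
```

With the real API, a short pause between pages (as the notebook already does with time.sleep) would help stay within rate limits.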
