Phase 0 — Orientation & Tooling (Week 1)

* **Local dev environment ready**
    Repo folder: /Software_development/Python/projects/wallet_analytics
* **API key obtained**
    Alchemy API key: [redacted]
* **Git repo initialized**
    https://github.com/Dev-Uchiha/wallet_analytics.git


To start, I created a repository folder on my laptop called wallet_analytics, where everything related to the project is stored.
I then installed a Python package manager called "uv". A package manager is a tool that installs the packages (reusable bundles of code) a project needs. For example, to do maths in Python I would install "numpy", which already contains the code needed for numerical calculations.
I chose uv specifically because it is much faster at installing packages than the standard package manager "pip", and it is also what we use at work.
I then used uv to create a virtual environment inside the wallet_analytics folder. This virtual environment is where all the installed packages for the project live. Each project should have its own virtual environment so that the packages of one project do not interfere with those of another.
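
To confirm that code is really running inside the project's virtual environment, a small check like the one below can help. This is a minimal sketch using only the standard library; the function name `in_virtual_environment` is my own and not part of uv.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running from a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtual_environment())
```

If the second line prints False, the environment has probably not been activated yet.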

I then created an account with Alchemy to get an API key (an API is a way for one piece of software to talk to another). This key lets me request data about Ethereum wallets from Alchemy's servers.
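
To make "one piece of software talking to another" more concrete, here is a rough sketch of what a request and a response look like in the JSON-RPC format that Ethereum nodes use. Nothing is sent over the network; `build_rpc_payload` is a hypothetical helper and the response value is made up for illustration.

```python
import json


def build_rpc_payload(method: str, params: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body (hypothetical helper for illustration)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


# Ask for the latest ETH balance of an address; this only builds the message.
payload = build_rpc_payload(
    "eth_getBalance",
    ["0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", "latest"],
)
print(json.dumps(payload, indent=2))

# A JSON-RPC response carries the answer as a hex string in "result".
# The value below is made up purely to show the decoding step.
example_response = {"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}
balance_wei = int(example_response["result"], 16)  # hex string -> integer wei
print(balance_wei / 10**18, "ETH")
```

The API key itself is not part of the message body; with Alchemy it goes into the URL that the message is posted to.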

Lastly, I installed "git", which lets me keep track of the changes I make to my code. I also connected it to "GitHub" so that I can store my code on the GitHub website.

In [None]:
#Creating the virtual environment

cd users/name/project
uv venv .venv        # once
uv sync              # whenever deps change
source .venv/bin/activate

Phase 1 — Raw On-Chain Data Ingestion (Weeks 2–3)
* Normal transactions
* ERC-20 token transfers

**Deliverable**

* Script that pulls data for 1 wallet
* Raw tables saved locally
* Re-runnable without manual edits


I then created several functions (blocks of code that each have a specific purpose) to return the last 10 transactions from Vitalik Buterin's ETH wallet, displaying the transaction hash (a unique transaction ID), ETH amount and date. This can be verified by entering his wallet address in an Ethereum blockchain explorer (a website that lets you look up wallet data) and checking that the last 10 transactions match.

https://etherscan.io/address/0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045

The next step was to fetch all the relevant wallet data and save it locally as .csv files.
This included two tables:

**transactions**

* tx_hash
* block_number
* timestamp
* from_address
* to_address
* value_eth
* gas_used
* gas_price
* status

**token_transfers**

* tx_hash
* token_symbol
* token_contract
* from_address
* to_address
* value
* timestamp
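
As a rough illustration of how these two schemas can be enforced when writing files, the sketch below uses only the standard library `csv` module. The helper `rows_to_csv` and the sample row are made up for this example; the notebook itself uses pandas for the same job.

```python
import csv
import io

# Column order taken from the two schemas listed above.
TRANSACTION_COLUMNS = [
    "tx_hash", "block_number", "timestamp", "from_address", "to_address",
    "value_eth", "gas_used", "gas_price", "status",
]
TOKEN_TRANSFER_COLUMNS = [
    "tx_hash", "token_symbol", "token_contract", "from_address", "to_address",
    "value", "timestamp",
]


def rows_to_csv(rows: list[dict], columns: list[str]) -> str:
    """Serialise rows to CSV text, enforcing the given column order."""
    buffer = io.StringIO()
    # DictWriter raises ValueError for keys outside the schema (its default),
    # and restval="" fills in columns that a row does not provide.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up row, only to show the shape of the output.
sample = [{"tx_hash": "0xabc", "block_number": 1, "value_eth": 0.5, "status": "success"}]
print(rows_to_csv(sample, TRANSACTION_COLUMNS))
```

Fixing the column list in one place means every run produces files with the same headers, which helps keep the script re-runnable without manual edits.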

The differences between transactions and token transfers are:
    - A transaction is a signed instruction sent from a wallet. It can move ETH (the base asset) from one wallet to another (e.g. sending someone 5 ETH), call a smart contract, or both. It initiates execution and records who sent it and what was sent.
    - A token transfer changes the ownership of tokens from one wallet to another (e.g. sending someone USDC or an NFT). It records what the token contract reported happened.
    - Every token transfer happens inside a transaction, but a transaction does not need to contain a token transfer.
    - Transactions are recorded in the transaction object, whereas token transfers are recorded in the event logs (the contract's record of what happened), which are stored in the transaction receipt.


In [None]:
# Return the last 10 transactions from an ethereum wallet
# Return the most relevant data headers

import requests
import pandas as pd
import time
import os
import json
from datetime import datetime

ALCHEMY_API_KEY = os.getenv("ALCHEMY_API_KEY")  # read the key from the environment; never hard-code it
ALCHEMY_BASE_URL = "https://eth-mainnet.g.alchemy.com/v2"


def get_alchemy_json(method, params):
    """Make a request to Alchemy JSON-RPC."""
    url = f"{ALCHEMY_BASE_URL}/{ALCHEMY_API_KEY}"
    payload = {"id": 1, "jsonrpc": "2.0", "method": method, "params": params}

    # A timeout stops the notebook from hanging forever if the API does not respond.
    r = requests.post(url, json=payload, headers={"Content-Type": "application/json"}, timeout=30)
    r.raise_for_status()

    data = r.json()
    if "error" in data:
        raise Exception(f"Alchemy API error: {data['error']}")

    return data


def get_eth_balance(address):
    """Return ETH balance for an address (in ETH)."""
    data = get_alchemy_json("eth_getBalance", [address, "latest"])
    balance_wei = int(data["result"], 16)
    return balance_wei / 10**18


def get_last_10_transactions(address):
    """Return the last 10 *normal* (external ETH) transfers involving the address."""
    base_params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["external"],
        "maxCount": hex(10),
        "excludeZeroValue": False,
        "withMetadata": True,
        "order": "desc"
    }

    all_txs = []

    # Transfers TO the address
    params_to = base_params.copy()
    params_to["toAddress"] = address
    data_to = get_alchemy_json("alchemy_getAssetTransfers", [params_to])
    all_txs.extend(data_to.get("result", {}).get("transfers", []))

    time.sleep(0.1)  # rate limiting

    # Transfers FROM the address
    params_from = base_params.copy()
    params_from["fromAddress"] = address
    data_from = get_alchemy_json("alchemy_getAssetTransfers", [params_from])
    all_txs.extend(data_from.get("result", {}).get("transfers", []))

    df = pd.DataFrame(all_txs)
    if df.empty:
        return df

    # Deduplicate and sort so we truly return the most recent 10.
    if "hash" in df.columns:
        df = df.drop_duplicates(subset=["hash"], keep="first")

    if "blockNum" in df.columns:
        df["_block_num"] = df["blockNum"].apply(
            lambda x: int(x, 16)
            if isinstance(x, str) and x.startswith("0x")
            else -1
        )
        df = df.sort_values("_block_num", ascending=False).drop(columns=["_block_num"])

    return df.head(10).reset_index(drop=True)


def list_relevant_data_headers(address):
    """Print the main field names returned by the API.

    Think of these as the "column headers" you can use once you load the data into a DataFrame.
    """
    params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["external"],
        "maxCount": hex(10),
        "excludeZeroValue": False,
        "toAddress": address,
    }

    data = get_alchemy_json("alchemy_getAssetTransfers", [params])
    transfers = (data.get("result", {}) or {}).get("transfers", []) or []

    headers = set()
    for t in transfers:
        if isinstance(t, dict):
            headers.update(t.keys())

    print("Relevant data headers:", sorted(list(headers)))


# Example usage
if ALCHEMY_API_KEY:
    eth_address = "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"  # example address

    balance = get_eth_balance(eth_address)
    print(f"ETH Balance: {balance:.6f} ETH")

    list_relevant_data_headers(eth_address)

    last_10 = get_last_10_transactions(eth_address)
    print(f"Last 10 transactions fetched: {len(last_10)}")

    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.float_format', '{:.15f}'.format)

    # Show the transaction hash, ETH amount and block timestamp for each transfer.
    display(last_10[['hash', 'value']].assign(timestamp=last_10['metadata'].apply(lambda x: x['blockTimestamp'])))



else:
    print("Please set your ALCHEMY_API_KEY to use the API")
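
The balance above is converted with ordinary float division, which is fine for display but can lose the last few wei. If exact amounts matter later on, one option is `decimal.Decimal`, as in the sketch below. The helper name `wei_hex_to_eth` is my own, and the result is exact as long as the amount fits in Decimal's default 28 significant digits.

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def wei_hex_to_eth(value_hex: str) -> Decimal:
    """Convert a hex wei quantity (as returned by JSON-RPC) to an ETH amount."""
    return Decimal(int(value_hex, 16)) / WEI_PER_ETH


one_eth_plus_one_wei = hex(10**18 + 1)

# Float division cannot represent the extra wei; Decimal keeps it.
print(int(one_eth_plus_one_wei, 16) / 10**18)
print(wei_hex_to_eth(one_eth_plus_one_wei))
```

The trade-off is that Decimal values need converting before they go into numeric pandas columns, so floats may still be the more convenient choice for quick analysis.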


In [None]:
# Transactions and token_transfers in project schema — display last 10, CSV saves last 100

from datetime import datetime

OUTPUT_DIR = "output"
DISPLAY_COUNT = 10
CSV_COUNT = 100


def _fetch_external_transfers(address, max_count):
    """Fetch up to max_count external (ETH) transfers for address; reuses get_alchemy_json from above."""
    base_params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["external"],
        "maxCount": hex(max_count),
        "excludeZeroValue": False,
        "withMetadata": True,
        "order": "desc",
    }
    all_txs = []
    for key, val in [("toAddress", address), ("fromAddress", address)]:
        p = {**base_params, key: val}
        data = get_alchemy_json("alchemy_getAssetTransfers", [p])
        all_txs.extend(data.get("result", {}).get("transfers", []))
        time.sleep(0.1)
    df = pd.DataFrame(all_txs)
    if df.empty:
        return df
    if "hash" in df.columns:
        df = df.drop_duplicates(subset=["hash"], keep="first")
    if "blockNum" in df.columns:
        df["_bn"] = df["blockNum"].apply(
            lambda x: int(x, 16) if isinstance(x, str) and x.startswith("0x") else -1
        )
        df = df.sort_values("_bn", ascending=False).drop(columns=["_bn"])
    return df.head(max_count).reset_index(drop=True)


def get_transactions(address, save_csv=False):
    """
    Return normal (external ETH) transactions as a DataFrame with schema:
    tx_hash, block_number, timestamp, from_address, to_address, value_eth, gas_used, gas_price, status.
    Displays last 10; CSV (when save_csv=True) contains last 100. Saves to output/transactions_{date}.csv.
    """
    raw = _fetch_external_transfers(address, CSV_COUNT)
    if raw.empty:
        return pd.DataFrame(columns=[
            "tx_hash", "block_number", "timestamp", "from_address", "to_address",
            "value_eth", "gas_used", "gas_price", "status",
        ])

    rows = []
    for _, row in raw.iterrows():
        tx_hash = row.get("hash", "")
        block_hex = row.get("blockNum", "0x0")
        block_number = int(block_hex, 16) if isinstance(block_hex, str) and block_hex.startswith("0x") else 0
        meta = row.get("metadata") or {}
        timestamp = meta.get("blockTimestamp", "")
        from_addr = row.get("from", "")
        to_addr = row.get("to", "")
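        # Alchemy's `value` field is already denominated in ETH (not wei).
        # The exact on-chain amount is the hex wei string in rawContract.value, so prefer that.
        raw_contract = row.get("rawContract")
        raw_value = raw_contract.get("value") if isinstance(raw_contract, dict) else None
        if isinstance(raw_value, str) and raw_value.startswith("0x"):
            value_eth = int(raw_value, 16) / 10**18
        else:
            val = row.get("value")
            value_eth = float(val) if pd.notna(val) else 0.0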

        gas_used, gas_price, status = None, None, None
        try:
            tx_data = get_alchemy_json("eth_getTransactionByHash", [tx_hash])
            receipt = get_alchemy_json("eth_getTransactionReceipt", [tx_hash])
            time.sleep(0.05)
            if tx_data.get("result"):
                gas_price = int(tx_data["result"].get("gasPrice", "0x0"), 16)
            if receipt.get("result"):
                gas_used = int(receipt["result"].get("gasUsed", "0x0"), 16)
                status = "success" if int(receipt["result"].get("status", "0x0"), 16) == 1 else "failed"
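        except Exception as exc:
            # Keep the row (gas fields stay None) but report the failure instead of hiding it.
            print(f"Warning: could not fetch gas data for {tx_hash}: {exc}")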

        rows.append({
            "tx_hash": tx_hash,
            "block_number": block_number,
            "timestamp": timestamp,
            "from_address": from_addr,
            "to_address": to_addr,
            "value_eth": value_eth,
            "gas_used": gas_used,
            "gas_price": gas_price,
            "status": status,
        })

    df = pd.DataFrame(rows)
    display(df.head(DISPLAY_COUNT))
    if save_csv:
        os.makedirs(OUTPUT_DIR, exist_ok=True)
        date_str = datetime.now().strftime("%Y-%m-%d")
        path = os.path.join(OUTPUT_DIR, f"transactions_{date_str}.csv")
        df.to_csv(path, index=False)
        print(f"Saved {len(df)} rows to {path}")
    return df


def _fetch_token_transfers(address, max_count):
    """Fetch up to max_count ERC-20 token transfers for address."""
    base_params = {
        "fromBlock": "0x0",
        "toBlock": "latest",
        "category": ["erc20"],
        "maxCount": hex(max_count),
        "excludeZeroValue": False,
        "withMetadata": True,
        "order": "desc",
    }
    all_txs = []
    for key, val in [("toAddress", address), ("fromAddress", address)]:
        p = {**base_params, key: val}
        data = get_alchemy_json("alchemy_getAssetTransfers", [p])
        all_txs.extend(data.get("result", {}).get("transfers", []))
        time.sleep(0.1)
    df = pd.DataFrame(all_txs)
    if df.empty:
        return df
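    # One transaction can contain several token transfers, so deduplicate on
    # uniqueId (hash plus log index) rather than on the transaction hash alone.
    if "uniqueId" in df.columns:
        df = df.drop_duplicates(subset=["uniqueId"], keep="first")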
    if "blockNum" in df.columns:
        df["_bn"] = df["blockNum"].apply(
            lambda x: int(x, 16) if isinstance(x, str) and x.startswith("0x") else -1
        )
        df = df.sort_values("_bn", ascending=False).drop(columns=["_bn"])
    return df.head(max_count).reset_index(drop=True)


def get_token_transfers(address, save_csv=False):
    """
    Return ERC-20 token transfers as a DataFrame with schema:
    tx_hash, token_symbol, token_contract, from_address, to_address, value, timestamp.
    Displays last 10; CSV (when save_csv=True) contains last 100. Saves to output/token_transfers_{date}.csv.
    """
    raw = _fetch_token_transfers(address, CSV_COUNT)
    if raw.empty:
        return pd.DataFrame(columns=[
            "tx_hash", "token_symbol", "token_contract", "from_address", "to_address",
            "value", "timestamp",
        ])

    rows = []
    for _, row in raw.iterrows():
        meta = row.get("metadata") or {}
        raw_contract = row.get("rawContract") or {}
        token_contract = raw_contract.get("address", "") or ""
        # Alchemy may return 'asset' as symbol (e.g. "USDC"); fallback to contract or "N/A"
        token_symbol = row.get("asset") or row.get("symbol") or token_contract or "N/A"
        if isinstance(token_symbol, dict):
            token_symbol = token_symbol.get("symbol", "N/A") or "N/A"
        value_raw = row.get("value")
        if value_raw is None:
            value = None
        elif isinstance(value_raw, (int, float)):
            value = float(value_raw)
        else:
            try:
                value = int(str(value_raw), 16) if str(value_raw).startswith("0x") else float(value_raw)
            except (ValueError, TypeError):
                value = value_raw
        rows.append({
            "tx_hash": row.get("hash", ""),
            "token_symbol": str(token_symbol),
            "token_contract": token_contract,
            "from_address": row.get("from", ""),
            "to_address": row.get("to", ""),
            "value": value,
            "timestamp": meta.get("blockTimestamp", ""),
        })

    df = pd.DataFrame(rows)
    display(df.head(DISPLAY_COUNT))
    if save_csv:
        os.makedirs(OUTPUT_DIR, exist_ok=True)
        date_str = datetime.now().strftime("%Y-%m-%d")
        path = os.path.join(OUTPUT_DIR, f"token_transfers_{date_str}.csv")
        df.to_csv(path, index=False)
        print(f"Saved {len(df)} rows to {path}")
    return df


# Example: display last 10 for both; CSV files contain last 100
if ALCHEMY_API_KEY:
    eth_address = "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"

    print("--- Last 10 transactions ---")
    tx_df = get_transactions(eth_address, save_csv=True)

    print("\n--- Last 10 token transfers ---")
    token_df = get_token_transfers(eth_address, save_csv=True)
else:
    print("Please set ALCHEMY_API_KEY (run the cell above first).")