Exploring the Etherscan API as a data source for analysis.
Required libraries:

In [None]:
import pandas as pd
from etherscan import Etherscan
from dotenv import load_dotenv
import os
from datetime import datetime, timedelta
import requests

Load Etherscan API key from .env

In [39]:
load_dotenv(dotenv_path=".env")
api_key = os.environ.get("ETHERSCAN_API_KEY")
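
If the `.env` file is missing or the variable name is misspelled, `os.environ.get` quietly returns `None`, and the problem only shows up later as failed API calls. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name, not part of python-dotenv.

```python
import os

def require_env(name):
    """Return an environment variable's value, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of python-dotenv.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env before running the notebook")
    return value
```

In this notebook it could replace the lookup above, for example `api_key = require_env("ETHERSCAN_API_KEY")`.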



Initial exploration: fetching Ethereum transactions via Etherscan API

In [40]:
eth = Etherscan(api_key)


Example: Fetch transactions for a given address (this address was flagged by Scam Sniffer and labeled as Phish/Hack on Etherscan)

In [None]:
address = "0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14"

# Fetch all normal transactions for the address
transactions = eth.get_normal_txs_by_address(
    address=address,
    startblock=0,
    endblock=99999999,
    sort="asc"
)
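
As far as I know, Etherscan limits how many records a single transaction-list call returns, so a very active address may need several paged requests. The sketch below shows the paging loop only. `fetch_page` is a hypothetical callable that takes a page number and a page size and returns a list of transactions; wiring it to a real client call is left as an assumption.

```python
def fetch_all_pages(fetch_page, page_size=1000, max_pages=10):
    """Collect rows from a paged source until a short (or empty) page appears.

    fetch_page(page, page_size) is a hypothetical callable returning a list.
    max_pages is a safety cap so a misbehaving source cannot loop forever.
    """
    rows = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        rows.extend(batch)
        if len(batch) < page_size:
            break  # last page reached
    return rows
```

A short page signals the end of the data, so the loop usually stops well before `max_pages`.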

In [None]:
# Load transactions into a DataFrame
df = pd.DataFrame(transactions)

# Clean up the data into a usable format

# Convert value from wei to Ether (the API returns wei as a string)
df['value'] = df['value'].astype(float) / 1e18

# Convert blockNumber to integer
df['blockNumber'] = df['blockNumber'].astype(int)

# Convert timestamp (Unix seconds, returned as a string) to datetime;
# casting to int first avoids pandas parsing the strings as date text
df['timeStamp'] = pd.to_datetime(df['timeStamp'].astype(int), unit='s')
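
Dividing by `1e18` as a float is fine for exploration, but floats cannot represent every wei amount exactly. If exact amounts matter, a sketch using `decimal.Decimal` is shown below. `wei_to_ether` is a hypothetical helper, and the input value is illustrative rather than taken from the API response.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18

def wei_to_ether(wei):
    """Convert a wei amount (int or numeric string) to Ether as a Decimal."""
    return Decimal(str(wei)) / WEI_PER_ETHER

# Illustrative value only, not taken from the API response
amount = wei_to_ether("93008000000000000")
```

It could be applied per row with `df['value'].map(wei_to_ether)` before any float conversion, at the cost of slower arithmetic on that column.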



Select columns that are relevant

In [None]:
columns = ['blockNumber', 'timeStamp', 'from', 'to', 'value', 'hash']
df_clean = df[columns]
df_clean.head()

,blockNumber,timeStamp,from,to,value,hash
0,16672663,2023-02-20 22:14:59,0xf1da173228fcf015f43f3ea15abbb51f0d8f1123,0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14,0.093008,0x9ddb2d4a61c3c9ba6b73d7228198208564e80e7055de...
1,16673501,2023-02-21 01:04:35,0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14,,0.0,0x73d60992447f221505e6c362e74d7c6e24bfc3438326...
2,16673689,2023-02-21 01:42:47,0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14,,0.0,0xfa12d513e7c158b2b76360aed150190f157b5006b821...
3,16673689,2023-02-21 01:42:47,0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14,,0.0,0xf4214160cc3027f342935d896a67baa3d4987689eab1...
4,16680412,2023-02-22 00:23:23,0x3895c7e8c65c4ad1102e16689a9f83b56bc67c14,0xdac17f958d2ee523a2206206994597c13d831ec7,0.0,0x11c8e1da654f4a37a6c4ab922b31845de33182ce93f7...
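
In the rows above, the flagged address appears both as sender and as recipient, and some rows have an empty `to` field (typically a contract-creation transaction, though I did not verify that here). A minimal sketch for labeling each transaction's direction relative to the address follows. `label_direction` and its label names are hypothetical, and it works on plain dicts shaped like the API results.

```python
def label_direction(tx, address):
    """Label a transaction as seen from `address`.

    Returns 'out', 'out_no_recipient', 'in', or 'other' (hypothetical labels).
    """
    address = address.lower()
    sender = (tx.get("from") or "").lower()
    recipient = (tx.get("to") or "").lower()
    if sender == address:
        # An empty recipient is kept separate so it can be inspected later
        return "out" if recipient else "out_no_recipient"
    if recipient == address:
        return "in"
    return "other"
```

With the DataFrame above, `df.apply(lambda row: label_direction(row, address), axis=1)` would add the label per row, assuming the `from` and `to` columns are present.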


Importing a test file that contains a list of known scam addresses.
To expand the analysis beyond a single address, I imported this blacklist and checked which addresses had been active within the last 6 months:

In [57]:
with open('../Data/Raw/master_blacklist_set.txt', 'r') as b:
    scam_addresses = [line.strip() for line in b if line.strip()]
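
Blacklist files often mix letter case, repeat entries, or contain lines that are not addresses at all. A small sketch for normalizing the list before spending API calls on it follows, assuming every valid entry is a `0x`-prefixed, 40-hex-character address. `normalize_addresses` is a hypothetical helper.

```python
import re

# 0x followed by exactly 40 hex characters (checked after lowercasing)
ADDRESS_RE = re.compile(r"0x[0-9a-f]{40}")

def normalize_addresses(lines):
    """Lowercase, validate, and de-duplicate addresses, preserving order."""
    seen = set()
    cleaned = []
    for line in lines:
        addr = line.strip().lower()
        if ADDRESS_RE.fullmatch(addr) and addr not in seen:
            seen.add(addr)
            cleaned.append(addr)
    return cleaned
```

Calling `normalize_addresses(scam_addresses)` would give a cleaned list to sample from; lowercasing discards checksum casing, which is acceptable here because the addresses are only used as query parameters.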

Checking active addresses in the past 6 months

In [None]:

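ETHERSCAN_API_KEY = api_key  # Reuse the key loaded from .env above
recent_cutoff = datetime.now() - timedelta(days=180)  # Last 6 months

def is_address_active(address):
    """Return True if the address has a transaction newer than recent_cutoff."""
    params = {
        'module': 'account', 'action': 'txlist', 'address': address,
        'page': 1, 'offset': 1,  # Only the newest transaction is needed
        'sort': 'desc', 'apikey': ETHERSCAN_API_KEY,
    }
    resp = requests.get('https://api.etherscan.io/api', params=params, timeout=10).json()
    if resp['status'] != '1':
        return False  # No transactions, or an API error (bad key, rate limit)
    # Results are sorted newest first, so the first entry is the latest transaction
    latest = datetime.fromtimestamp(int(resp['result'][0]['timeStamp']))
    return latest > recent_cutoff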
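
The function above treats every non-`'1'` status as "inactive", so an invalid key or a rate-limit rejection looks the same as an address with no transactions. The sketch below separates the two cases. It assumes that an empty result arrives as a list and an error arrives as a message string in `result`; that matches my understanding of Etherscan's responses but should be checked against the API documentation. `classify_response` is a hypothetical helper.

```python
def classify_response(resp):
    """Classify a decoded API response as 'ok', 'empty', or 'error'.

    Assumption: errors carry a string in 'result'; empty results carry a list.
    """
    if resp.get("status") == "1":
        return "ok"
    if isinstance(resp.get("result"), list):
        return "empty"  # the address simply has no transactions
    return "error"      # e.g. invalid key or rate limit; worth retrying or logging
```

Counting the `'error'` cases separately makes it possible to tell whether an empty list of active addresses reflects the data or a failed request.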

In [None]:
# Sample test

sample_addresses = scam_addresses[:20]
active_addresses = [addr for addr in sample_addresses if is_address_active(addr)]
print(f"Active scam addresses: {active_addresses}")

Active scam addresses: []
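
Checking more than a small sample means staying under the API's request rate. A minimal sketch of a throttled loop follows. `throttled` is a hypothetical helper, and `min_interval` is an assumed spacing that should be set from the rate limit of the plan in use.

```python
import time

def throttled(items, func, min_interval=0.25, sleep=time.sleep, clock=time.monotonic):
    """Call func on each item, leaving at least min_interval seconds between calls.

    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(func(item))
    return results
```

For example, `throttled(scam_addresses[:20], is_address_active)` would return one boolean per address while pacing the requests.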


### Conclusion:

In this notebook, I explored using the Etherscan API to retrieve Ethereum wallet activity, including testing a known scam address list for recent activity.

However, API rate limits and the high share of inactive addresses meant this method did not scale to deeper behavioral analysis.

While the API calls themselves worked, the approach surfaced several issues:
- API rate limits restrict how many addresses can be checked in a reasonable time.
- Many known scam addresses were inactive: none of the 20 sampled addresses showed a transaction in the last 180 days.

Together, these made the approach unsuitable for identifying network-wide scam patterns.

In the next notebook, I pivot to a Kaggle dataset with full Ethereum transaction history to support more robust scam pattern detection.