# Q7.1: CryptoPunks Graph EDA (Neo4j prep)

This notebook performs a brief exploratory data analysis (EDA) focused on the **relationships** among wallets (buyers and sellers) and tokens in the **CryptoPunks** project.

**Data file:** `punks.csv`  
**Goal:** understand the graph-shaped aspects of the data (who trades with whom, token liquidity, wallet activity by degree) to inform a **Neo4j** model of the form `(:Wallet)-[:TRADED]->(:Wallet)`, with relationship properties such as `token_id`, `timestamp`, and `price`.

## Inferred schema (from CSV)
- Buyer column: `to`  
- Seller column: `from`  
- Token column: `punk`  
- Timestamp column: none detected (`None`)  
- Price column: `amount`

## High-level stats
- Rows (transactions): **20002**
- Columns: **8**
- Unique wallets: **2476**
- Unique tokens: **1635**
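
The counts above can be recomputed with a few lines of plain Python. The sketch below is illustrative only: `summarize` is a hypothetical helper, the toy rows are made up rather than taken from `punks.csv`, and it assumes that "unique wallets" means the union of addresses seen in either the `from` or the `to` column.

```python
def summarize(rows, buyer_col="to", seller_col="from", token_col="punk"):
    """Count transactions, distinct wallets, and distinct tokens.

    `rows` is an iterable of dicts, one per transaction.
    Assumption: a wallet is counted once whether it appears as buyer,
    seller, or both.
    """
    rows = list(rows)
    wallets = set()
    tokens = set()
    for row in rows:
        for col in (buyer_col, seller_col):
            if row.get(col) is not None:
                wallets.add(row[col])
        if row.get(token_col) is not None:
            tokens.add(row[token_col])
    return {
        "rows": len(rows),
        "unique_wallets": len(wallets),
        "unique_tokens": len(tokens),
    }


# Hypothetical toy rows, not taken from punks.csv.
toy_rows = [
    {"from": "0xA", "to": "0xB", "punk": 1, "amount": 10.0},
    {"from": "0xB", "to": "0xC", "punk": 1, "amount": 12.5},
    {"from": "0xA", "to": "0xC", "punk": 2, "amount": 7.0},
]
stats = summarize(toy_rows)
print(stats)
```

With the real data, the rows could come from `df.to_dict("records")` once the CSV is loaded. Missing values should be dropped first, because pandas represents them as `NaN` rather than `None`.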


In [None]:
import pandas as pd
df = pd.read_csv("punks.csv")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
buyer_col = "to"
seller_col = "from"
token_col  = "punk"
ts_col     = None  # no timestamp column was inferred; the string "None" would be truthy
price_col  = "amount"
df.head(5)

## Top buyers and sellers

In [None]:
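if buyer_col:
    # Count rows per buyer address and keep the ten most frequent buyers.
    top_buyers = (df.groupby(buyer_col).size().sort_values(ascending=False).head(10)
                  .rename("purchases").reset_index())
    # A bare expression inside an `if` block is not rendered by Jupyter,
    # so show the table explicitly (display() is provided by IPython).
    display(top_buyers)
else:
    print("No buyer column inferred")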


In [None]:
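if seller_col:
    # Count rows per seller address and keep the ten most frequent sellers.
    top_sellers = (df.groupby(seller_col).size().sort_values(ascending=False).head(10)
                   .rename("sales").reset_index())
    # A bare expression inside an `if` block is not rendered by Jupyter,
    # so show the table explicitly (display() is provided by IPython).
    display(top_sellers)
else:
    print("No seller column inferred")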
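
In graph terms, the buyer and seller counts above are per-wallet degrees. The following is a minimal sketch of that view, assuming edges point from seller to buyer (following the `from` and `to` columns); the direction is an assumption, and `wallet_degrees` and the toy rows are hypothetical.

```python
from collections import Counter


def wallet_degrees(rows, buyer_col="to", seller_col="from"):
    """Return (out_degree, in_degree) Counters keyed by wallet address.

    Assumption: each trade is an edge seller -> buyer, so a sale adds to
    the seller's out-degree and a purchase adds to the buyer's in-degree.
    """
    out_degree = Counter()
    in_degree = Counter()
    for row in rows:
        out_degree[row[seller_col]] += 1
        in_degree[row[buyer_col]] += 1
    return out_degree, in_degree


# Hypothetical toy rows, not taken from punks.csv.
toy_rows = [
    {"from": "0xA", "to": "0xB", "punk": 1},
    {"from": "0xB", "to": "0xC", "punk": 1},
    {"from": "0xA", "to": "0xC", "punk": 2},
    {"from": "0xA", "to": "0xB", "punk": 3},
]
out_degree, in_degree = wallet_degrees(toy_rows)

# Total degree = sales + purchases per wallet.
total_degree = out_degree + in_degree
print(total_degree.most_common(3))
```

Sorting `total_degree` in descending order gives a quick look at how concentrated activity is in a handful of wallets.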


## Visuals
![Top buyers](q7_figs/top_buyers.png)

![Top sellers](q7_figs/top_sellers.png)

![Trades per token](q7_figs/trades_per_token.png)
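
The numbers behind a trades-per-token chart can be derived in two steps: count the trades for each punk, then count how many punks share each trade count. This is a sketch under the assumption that every row is one trade of one token; both function names and the toy rows are hypothetical, and it is not necessarily how the figure above was produced.

```python
from collections import Counter


def trades_per_token(rows, token_col="punk"):
    """Map token id -> number of rows (trades) that mention it."""
    return Counter(row[token_col] for row in rows)


def trade_count_histogram(per_token):
    """Map 'number of trades' -> 'how many tokens traded that many times'."""
    return Counter(per_token.values())


# Hypothetical toy rows, not taken from punks.csv.
toy_rows = [
    {"punk": 1}, {"punk": 1}, {"punk": 1},
    {"punk": 2},
    {"punk": 3},
]
per_token = trades_per_token(toy_rows)
histogram = trade_count_histogram(per_token)
print(per_token.most_common(1))
print(sorted(histogram.items()))
```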



## Takeaways

- **Wallet relationships:** Buyers and sellers form a directed transaction graph. Its degree distribution is skewed: a few wallets are far more active than the rest.  
- **Token liquidity:** Some punks change hands multiple times (high trades-per-token), which is a useful signal of market dynamics.  
- **Pricing:** Where prices are present in `amount`, the distribution is heavy-tailed, so compare the median with the tail before choosing an outlier-handling rule.  
- **Neo4j mapping (proposed):**
  - Nodes: `Wallet(address)`, `Punk(token_id)` (optional as separate node type)
  - Relationship: `(:Wallet)-[:TRADED {token_id, price, timestamp, tx_hash?}]->(:Wallet)` (`timestamp` cannot be populated from this CSV, since no timestamp column was inferred)
  - Useful indexes: `Wallet(address)`, `Punk(token_id)`
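
One way to turn the proposed mapping into a load step is to pair a parameterized Cypher statement with one parameter dictionary per CSV row. The sketch below makes several assumptions: edges point from seller to buyer, `timestamp` and `tx_hash` are omitted because the inferred schema has no column for them, the index statement uses Neo4j 4.x or later syntax, and `TRADE_QUERY`, `INDEX_QUERY`, and `trade_params` are hypothetical names. No database driver is used here.

```python
# Cypher: upsert both wallets, then add one TRADED relationship per row.
TRADE_QUERY = """
MERGE (s:Wallet {address: $seller})
MERGE (b:Wallet {address: $buyer})
CREATE (s)-[:TRADED {token_id: $token_id, price: $price}]->(b)
""".strip()

# Cypher: index on Wallet.address (assumed Neo4j 4.x+ syntax).
INDEX_QUERY = (
    "CREATE INDEX wallet_address IF NOT EXISTS "
    "FOR (w:Wallet) ON (w.address)"
)


def trade_params(row, buyer_col="to", seller_col="from",
                 token_col="punk", price_col="amount"):
    """Build the parameter dict for TRADE_QUERY from one transaction row."""
    return {
        "seller": row[seller_col],
        "buyer": row[buyer_col],
        "token_id": row[token_col],
        "price": row[price_col],
    }


# Hypothetical toy rows, not taken from punks.csv.
toy_rows = [
    {"from": "0xA", "to": "0xB", "punk": 1, "amount": 10.0},
    {"from": "0xB", "to": "0xC", "punk": 1, "amount": 12.5},
]
params = [trade_params(row) for row in toy_rows]
for p in params:
    print(p)
```

Using `MERGE` for wallets keeps each address as a single node, while `CREATE` for the relationship preserves repeated trades between the same pair of wallets as separate edges.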
