Crypto tax engine validated against 660K+ events from public Hyperliquid whale wallets. Three-way IRS treatment election, §1092 straddle detection, cross-source HL+EVM reconciliation.
A crypto trader hands their accountant a pile of CSVs from Hyperliquid and exports from one or more on-chain wallets at year-end. The accountant has to reconcile them into a Form 8949 the IRS will accept. The hard parts are matching cross-chain bridges (so the same dollar is not taxed twice), classifying perpetual funding payments (where the IRS has not given clean guidance), detecting §1092 straddles (which defer losses), and processing hundreds of thousands of events from a single year of active trading.
Generic crypto tax tools handle common cases but break on the edge cases that matter most for active futures traders. ChainTax handles those edge cases and produces a partner-review-ready deliverable.
The core deliverable is a partner-review-ready PDF with an executive summary, per-item authority citations, and a three-way treatment comparison. Each flagged item gets a dedicated detail page with explicit IRS authority citations and a recommended treatment based on §163(d) NII cap analysis and §1012 basis adjustment timing.
Page 1: executive summary with the top three items ranked by dollar impact. The differential column shows the Conservative-vs-Moderate delta computed by the three-way treatment election engine in src/classification/funding_classifier.py. The Funding Treatment Comparison table at the bottom shows the same wynn 2025 dataset under three IRS treatments: Conservative $18.1M, Moderate (recommended) $10.5M, Aggressive $10.4M.
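The top-three ranking can be pictured as a sort on the absolute Conservative-vs-Moderate differential. This is a minimal sketch; the `FlaggedItem` record, its field names, and `top_items` are hypothetical illustrations, not the schema used in funding_classifier.py.

```python
from dataclasses import dataclass


@dataclass
class FlaggedItem:
    # Hypothetical record: one flagged item with its dollar effect under each treatment.
    label: str
    conservative: float
    moderate: float
    aggressive: float

    @property
    def differential(self) -> float:
        # Conservative-vs-Moderate delta, the value shown in the differential column.
        return self.conservative - self.moderate


def top_items(items: list[FlaggedItem], n: int = 3) -> list[FlaggedItem]:
    """Rank flagged items by absolute dollar impact and keep the first n."""
    return sorted(items, key=lambda item: abs(item.differential), reverse=True)[:n]
```

Ranking on the absolute value keeps items where Moderate produces the larger figure from dropping out of the summary.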
Per-item detail for the BTC funding payment ($5.8M at issue). Conservative paragraph cites §163(d), §61, Rev. Rul. 78-149 with NII cap analysis. Moderate (recommended) cites economic substance doctrine and §1012 cost basis with no cap. Aggressive cites §165(c)(2) ordinary expense, flagged as requiring Form 8275 disclosure. Authority citations are generated per item in src/output/memo_template.py.
Full sample PDF (7 pages): output/wynn/2025/wynn_ChainTax_Report_2025.pdf
To inspect the pipeline interactively, launch the Streamlit app on the included wynn 2025 demo data. The dashboard surfaces both the headline metrics and the underlying reasoning behind each treatment recommendation.
The executive summary loads automatically for the wynn demo (precomputed in output/wynn/2025/summary.json). The pipeline reads 538,740 HL events plus 2,887 EVM events into a unified timeline, runs five classifiers (liquidation, funding, spot, §1092 straddle, §475(f) MTM simulation), and writes one set of Form 8949 CSVs per IRS treatment. The Funding Treatment Comparison table below the metrics shows the same dataset under each treatment: Conservative $18.1M, Moderate $10.5M, Aggressive $10.4M.
Tax Benefit Summary tab: the engine quantifies §163(d) suspension risk ($2.9M for this wallet), computes the timing-deferred Moderate benefit, and projects the additional savings from Aggressive treatment if the NII cap fully binds. The per-coin breakdown shows the same analysis applied granularly across BTC, kPEPE, FARTCOIN, XRP, TRUMP, ETH, SUI, DOGE, and HYPE. The recommendation footer reasons through which treatment captures the same economic result with lower audit risk.
The demo ships a precomputed summary.json alongside sampled transaction-level CSVs (~3,500 rows) so first-time viewers see real numbers from the full 541k-event dataset instantly. Clicking "Re-run Pipeline" regenerates everything from raw data (5-10 minutes on commodity hardware) and overwrites the summary.
Hyperliquid CSVs               EVM exports (Alchemy, DeBank, Zerion)
       |                                   |
       v                                   v
src/ingestion/                 postprocess/evm_parser.py
       |                                   |
       +-----------------+-----------------+
                         |
                         v
                   src/ledger/            <-- FIFO + SpecID, exchange-agnostic
                         |
                         v
                   src/classification/    <-- 5 classifiers
                         |
                         v
                   src/output/            <-- Form 8949, flagged report, partner PDF
                         |
                         v
                   postprocess/merger.py  <-- HL+EVM unified timeline, bridge detection
                         |
                         v
   partner PDF + flagged CSV + 3x Form 8949 + treatment_comparison.csv
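Bridge detection is what keeps a USDC withdrawal from Hyperliquid and the matching on-chain deposit from being counted as two separate events. The sketch below shows one plausible approach, greedy matching by asset, time window, and amount tolerance. The `Transfer` type, the 1-hour window, and the 1% fee tolerance are assumptions for illustration, not the logic in postprocess/merger.py; only the HL-to-EVM direction is shown, and the reverse direction would be symmetric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transfer:
    # Hypothetical normalized transfer record from either data source.
    source: str        # "HL" or "EVM"
    time: datetime
    asset: str
    amount: float


def match_bridges(hl_withdrawals, evm_deposits,
                  window=timedelta(hours=1), tolerance=0.01):
    """Greedily pair each HL withdrawal with the earliest unmatched EVM deposit
    of the same asset that arrives within `window` and whose amount is within
    `tolerance` (relative) of the withdrawal, leaving room for bridge fees.
    Matched pairs are transfers between the user's own accounts, not disposals."""
    deposits = sorted(evm_deposits, key=lambda t: t.time)
    used: set[int] = set()
    matches = []
    for w in sorted(hl_withdrawals, key=lambda t: t.time):
        for i, d in enumerate(deposits):
            if i in used or d.asset != w.asset:
                continue
            if not (w.time <= d.time <= w.time + window):
                continue
            if abs(d.amount - w.amount) <= tolerance * w.amount:
                used.add(i)
                matches.append((w, d))
                break
    return matches
```

The nested loop mirrors the row-by-row style the limitations section describes; a vectorized join would be the natural optimization for very large wallets.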
The five classifiers: liquidation_classifier.py (capital vs. ordinary loss), funding_classifier.py (three-way treatment election), spot_classifier.py (realized G/L with ST/LT), straddle_detector.py (§1092 offsetting positions), mtm_simulator.py (§475(f) mark-to-market hypothetical).
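The ST/LT split in spot_classifier.py rests on the holding-period test: a lot is long-term only if held more than one year. Here is a minimal sketch of that test; the function names are hypothetical, and the Feb 29 fallback to Feb 28 is a simplifying assumption rather than a statement of the rule for leap-day acquisitions.

```python
from datetime import date


def one_year_after(d: date) -> date:
    # Same calendar day next year; Feb 29 falls back to Feb 28 (simplifying assumption).
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def holding_term(acquired: date, disposed: date) -> str:
    """Return 'LT' if the lot was held more than one year, else 'ST'.
    A disposal on the one-year anniversary is still short-term."""
    return "LT" if disposed > one_year_after(acquired) else "ST"
```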
The two layers, src/ and postprocess/, are loosely coupled by CSV files on disk; postprocess/ does not import from src/. The ledger layer (src/ledger/) is genuinely exchange-agnostic: the FIFO and SpecID engines operate on a normalized fill schema with no Hyperliquid-specific concepts, so feeding Coinbase, Binance, or Kraken fills would not require any engine changes. Domain specificity lives in the ingestion adapters and the merger's bridge-detection logic. The classifiers are mostly generic in their tax math but receive domain knowledge through configuration (mark-price dicts, manually flagged closes, correlated-pair maps).
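To make "a normalized fill schema with no Hyperliquid-specific concepts" concrete, here is a minimal long-only FIFO sketch. The `Fill` fields and the `FifoLedger` class are hypothetical and are not the src/ledger/ API; short perp positions are not handled, and a SpecID engine would differ mainly in which lot a sell consumes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Fill:
    # Hypothetical normalized fill: nothing here is exchange-specific.
    coin: str
    side: str          # "buy" or "sell"
    qty: float
    price: float
    fee: float = 0.0


class FifoLedger:
    """Minimal FIFO lot engine for long positions.
    Buys open lots; sells consume the oldest lots first and return realized G/L."""

    def __init__(self) -> None:
        # coin -> deque of [remaining_qty, unit_cost] lots, oldest first
        self.lots: dict[str, deque] = {}

    def apply(self, fill: Fill) -> float:
        lots = self.lots.setdefault(fill.coin, deque())
        if fill.side == "buy":
            # Buy-side fees are capitalized into the lot's per-unit cost basis.
            unit_cost = (fill.qty * fill.price + fill.fee) / fill.qty
            lots.append([fill.qty, unit_cost])
            return 0.0
        remaining, basis = fill.qty, 0.0
        while remaining > 1e-12:
            if not lots:
                raise ValueError(f"sell of {fill.coin} exceeds open lots")
            lot = lots[0]
            take = min(remaining, lot[0])
            basis += take * lot[1]
            lot[0] -= take
            remaining -= take
            if lot[0] <= 1e-12:
                lots.popleft()  # oldest lot fully consumed
        # Sell-side fees reduce proceeds.
        proceeds = fill.qty * fill.price - fill.fee
        return proceeds - basis
```

Because the engine only sees `coin`, `side`, `qty`, `price`, and `fee`, supporting another exchange reduces to writing an adapter that emits these records.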
src/classification/funding_classifier.py
Hyperliquid perpetual funding payments have no settled IRS authority, so ChainTax computes three competing treatments in a single pipeline run and shows the dollar differential per item:
- Conservative (§163(d)): Investment interest expense deduction capped at net investment income; the excess is suspended and carried forward.
- Moderate (basis adjustment): Funding payments are allocated proportionally across open lots and deferred until each lot closes; this is the recommended position for the canonical Form 8949.
- Aggressive (§165(c)(2)): Ordinary expense deduction with no cap; Form 8275 disclosure required.
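The three treatments can be sketched as three small functions over a single funding amount. This is an illustration under stated assumptions, not the funding_classifier.py implementation: funding is treated as one positive amount paid, NII is given as a number, and Moderate weights lots by open quantity (whether ChainTax weights by quantity or notional is not stated). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TreatmentResult:
    deducted_now: float                  # deduction taken in the current year
    carried_forward: float               # amount suspended by the §163(d) cap
    basis_adjustments: dict[str, float]  # lot_id -> amount added to basis (Moderate only)


def conservative(funding_paid: float, nii: float) -> TreatmentResult:
    # §163(d)-style: deduction capped at net investment income; the excess carries forward.
    allowed = min(funding_paid, max(nii, 0.0))
    return TreatmentResult(allowed, funding_paid - allowed, {})


def moderate(funding_paid: float, open_lots: dict[str, float]) -> TreatmentResult:
    # Basis adjustment: spread the payment across open lots in proportion to size.
    # Nothing is deducted now; each share is realized when its lot closes.
    total = sum(open_lots.values())
    if total <= 0:
        raise ValueError("no open lots to allocate funding to")
    shares = {lot: funding_paid * qty / total for lot, qty in open_lots.items()}
    return TreatmentResult(0.0, 0.0, shares)


def aggressive(funding_paid: float) -> TreatmentResult:
    # Uncapped current-year ordinary deduction; the item is flagged for Form 8275.
    return TreatmentResult(funding_paid, 0.0, {})
```

The sketch shows why Conservative and Moderate diverge: when NII is small the cap suspends most of the deduction, while Moderate defers nothing beyond the lot's own holding period.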
The treatment_comparison_2025.csv output lets a preparer compare net G/L across all three in a single read. For wynn 2025: Conservative $18.1M, Moderate $10.5M, Aggressive $10.4M.
src/classification/straddle_detector.py
Identifies three categories of offsetting positions that trigger §1092 loss deferral:
- Same-coin straddle: simultaneous long and short perp in the same coin.
- Spot-vs-perp straddle: long spot plus short perp in the same coin (IRS Notice 2003-54). Architecturally non-trivial because it correlates across two entirely separate data streams.
- Correlated straddle: opposing positions in correlated pairs (BTC/ETH, BTC/SOL, SOL/AVAX) via a static correlation map.
Each detection rebuilds position intervals from raw fills, keeps temporal overlaps that last at least 3 days and meet a $100k minimum exposure threshold, then estimates deferred-loss exposure by summing closedPnl during the overlap window.
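The overlap test described above can be sketched for the same-coin case as a pairwise interval check. The `Interval` type and function names are hypothetical, and applying the $100k threshold to the smaller leg is an assumption; the spot-vs-perp case would run the same test across the two data streams, and the correlated case would first map coins through the correlation map. The loss estimate follows the description literally and sums all closedPnl inside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Interval:
    # Hypothetical position interval rebuilt from fills.
    coin: str
    direction: int      # +1 long, -1 short
    start: datetime
    end: datetime
    notional: float     # exposure in USD


def find_straddles(intervals, min_overlap=timedelta(days=3), min_exposure=100_000.0):
    """Return (a, b, overlap_start, overlap_end) for opposing same-coin intervals
    that overlap for at least min_overlap, with the smaller leg >= min_exposure."""
    hits = []
    for i, a in enumerate(intervals):
        for b in intervals[i + 1:]:
            if a.coin != b.coin or a.direction == b.direction:
                continue
            start, end = max(a.start, b.start), min(a.end, b.end)
            if end - start < min_overlap:
                continue  # no overlap, or too short
            if min(a.notional, b.notional) < min_exposure:
                continue
            hits.append((a, b, start, end))
    return hits


def deferred_loss_estimate(closed_pnl, start, end):
    """Sum closedPnl realized inside [start, end]; closed_pnl is (timestamp, pnl) pairs."""
    return sum(p for t, p in closed_pnl if start <= t <= end)
```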
Prerequisites: Python 3.11+, pip. Optionally set COINGECKO_API_KEY for EVM swap classification (free tier works).
# Install
pip install -r requirements.txt
# Launch interactive demo (loads wynn demo data instantly)
streamlit run src/app.py
# Or run pipeline from CLI
python -m src.main wynn --year 2025 --full
CLI flags: --no-straddle (skip §1092), --no-specid (skip SpecID simulation), --no-fees (skip fee extraction), --year YYYY (select a different tax year).
- Not a production tax-filing service. The PDF says "Partner review required before filing" because that is what it is — a preparer's worksheet, not a final return.
- Not legal or tax advice. The treatment elections (§163(d), §165(c)(2), §1092) are real tax positions; their application to any specific situation requires a credentialed tax professional.
- Not validated by IRS audit. The treatment analysis reflects published authority as of build time; positions taken on actual returns should be confirmed by a credentialed tax professional reviewing current law.
- Not generally available beyond Hyperliquid + EVM. Other exchanges would require new ingestion adapters; the engine layer is exchange-agnostic, but the ingestion layer is not.
- The October 10, 2025 cascade liquidation event is hardcoded in src/utils/cascade_detector.py and will need updating for future tax years.
- The SpecID engine produces simulation-only output. Voluntary lot selection appears in the flagged report as advisory but does not drive Form 8949 generation (FIFO still drives).
- The §475(f) MTM simulator works from realized G/L only. A full election would also require year-end position marking of unrealized gains — not implemented.
- No automated test suite. The pipeline was validated against three public high-volume Hyperliquid wallets at scale; behavior was not formalized in unit tests.
- The merger uses row-by-row Python processing instead of vectorized pandas for readability. First-run latency on 500k+ event wallets is 5-10 minutes; the View Results path bypasses this by loading the precomputed summary.json.
Built solo over approximately 7-8 days as a working tax preparation tool. The PDF output was designed to be reviewable by a tax partner in 1-2 hours instead of the 15-30 hours an unstructured Hyperliquid wallet typically requires. Presented here as an engineering portfolio piece; the tax treatment analysis reflects published authority at build time and should be reviewed by a credentialed professional before any actual filing.
ChainTax surfaced cross-source data reconciliation as the harder problem behind crypto tax filing. That observation became the basis for two follow-on projects:
- aether — a workflow reasoning engine for financial documents. Where ChainTax built a domain-specific pipeline for crypto tax, aether generalizes into LLM-assisted document workflows with a planner/executor/critic loop over hybrid RAG.
- polymarket-autopsy — a 45-day systematic trading bot project that applied aether's LLM-assisted methodology to reverse-engineer Polymarket operator behavior. 15-page technical autopsy with a companion live execution repo.
The three projects share an underlying interest: reconciling cross-source data where the answer is not obvious from any single source.



