# Automating Non-GAAP Metric Extraction from 8-Ks

**Adjusted EBITDA, Free Cash Flow, and Core Earnings** are central to financial analysis—often more informative than GAAP figures. But extracting these metrics from earnings releases and 8-Ks is difficult. Formats are inconsistent, terminology varies, and key tables are often buried deep in unstructured disclosures.

**Traditional tools** (regex, templates, table parsers) struggle with this inconsistency, making extraction slow, manual, and error-prone.

Captide uses a **retrieval-augmented generation (RAG) API** to pull metrics like Adjusted EBITDA from SEC filings. It returns structured, schema-consistent JSON—even from messy source documents—making the data ready for modeling or analysis.

In [18]:
# 📦 Install required packages
%pip install requests pandas python-dotenv


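We first collect 8-Ks filed under Item 2.02 (Results of Operations and Financial Condition) for fiscal periods from Q1 2023 onward, using a `fetch_documents` function. Then we send a structured prompt to Captide to extract each filing's Net Income to Adjusted EBITDA reconciliation.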

In [19]:
# 🔐 Load API key from .env or environment variable
import os
from IPython.display import display
from dotenv import load_dotenv

load_dotenv()
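CAPTIDE_API_KEY = os.getenv("CAPTIDE_API_KEY")
if not CAPTIDE_API_KEY:
    # Fail early rather than sending requests with a missing key.
    raise RuntimeError("CAPTIDE_API_KEY is not set; add it to .env or the environment.")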

HEADERS = {
    "X-API-Key": CAPTIDE_API_KEY,
    "Content-Type": "application/json",
    "Accept": "application/json"
}

TICKERS = ["SNAP", "PLTR", "UBER"]

In [20]:
# 🛠️ Helper functions for document filtering and API parsing
import json
import re

import pandas as pd
import requests
from typing import Dict, List

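def is_valid_fiscal_period(fp: str) -> bool:
    """True for quarterly periods such as "Q1 2023" in fiscal year 2023 or later."""
    m = re.match(r"Q([1-4]) (\d{4})", fp)
    return bool(m and int(m.group(2)) > 2022)

def is_valid_document(doc: Dict) -> bool:
    """For 8-Ks, keep only filings reported under Item 2.02 (earnings releases)."""
    if doc["sourceType"] == "8-K":
        # Tolerate a missing or null additionalKwargs / item field.
        item = (doc.get("additionalKwargs") or {}).get("item") or ""
        return "2.02" in item
    return True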

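def fetch_documents(ticker: str) -> List[Dict]:
    """Return the ticker's Item 2.02 8-Ks for fiscal periods from Q1 2023 onward."""
    url = f"https://rest-api.captide.co/api/v1/companies/ticker/{ticker}/documents"
    resp = requests.get(url, headers=HEADERS, timeout=60)
    resp.raise_for_status()  # surface HTTP errors instead of iterating over an error body
    docs = resp.json()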
    return [
        {"ticker": doc["ticker"],
         "fiscalPeriod": doc["fiscalPeriod"],
         "sourceLink": doc["sourceLink"]}
        for doc in docs
        if doc["sourceType"] == "8-K"
        and "fiscalPeriod" in doc
        and is_valid_fiscal_period(doc["fiscalPeriod"])
        and is_valid_document(doc)
    ]
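
To make the filtering rules concrete, here is a small offline sketch that applies the two predicates from the cell above to a few made-up records. The record shapes, the `"2.02, 9.01"` item string, and the `example.com` links are assumptions for illustration only; real API responses may look different.

```python
import re
from typing import Dict

def is_valid_fiscal_period(fp: str) -> bool:
    m = re.match(r"Q([1-4]) (\d{4})", fp)
    return bool(m and int(m.group(2)) > 2022)

def is_valid_document(doc: Dict) -> bool:
    if doc["sourceType"] == "8-K":
        return "2.02" in doc.get("additionalKwargs", {}).get("item", "")
    return True

# Hypothetical records (not real API output).
sample_docs = [
    {"sourceType": "8-K", "fiscalPeriod": "Q1 2024",
     "additionalKwargs": {"item": "2.02, 9.01"}, "sourceLink": "https://example.com/a"},
    {"sourceType": "8-K", "fiscalPeriod": "Q4 2022",   # fiscal year too early
     "additionalKwargs": {"item": "2.02"}, "sourceLink": "https://example.com/b"},
    {"sourceType": "8-K", "fiscalPeriod": "Q2 2024",   # 8-K without Item 2.02
     "additionalKwargs": {"item": "5.02"}, "sourceLink": "https://example.com/c"},
    {"sourceType": "10-Q", "fiscalPeriod": "Q2 2024",  # not an 8-K
     "additionalKwargs": {}, "sourceLink": "https://example.com/d"},
]

def keep(doc: Dict) -> bool:
    # Same conditions as the list comprehension in fetch_documents.
    return (
        doc["sourceType"] == "8-K"
        and "fiscalPeriod" in doc
        and is_valid_fiscal_period(doc["fiscalPeriod"])
        and is_valid_document(doc)
    )

kept = [d["sourceLink"] for d in sample_docs if keep(d)]
print(kept)
```

Only the first record passes all four conditions, so only its link is kept.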

In [21]:
# 📊 SSE Response Parsing and Prompt Generation
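def parse_sse_response(sse_text: str) -> Dict:
    """Return the JSON object embedded in the stream's `full_answer` event, or {}."""
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        try:
            obj = json.loads(line[6:])
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads instead of abandoning the whole stream
        if not isinstance(obj, dict) or obj.get("type") != "full_answer":
            continue
        # Strip inline citation markers such as " [#ab12]" before parsing.
        content = re.sub(r"\s*\[#\w+\]", "", obj.get("content") or "")
        m = re.search(r"\{.*\}", content, re.DOTALL)
        if not m:
            return {}
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            return {}
    return {}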

def fetch_metrics_with_prompt(source_links: List[str], prompt: str) -> Dict:
    payload = {"query": prompt, "sourceLink": source_links}
    r = requests.post(
        "https://rest-api.captide.co/api/v1/rag/agent-query-stream",
        json=payload, headers=HEADERS, timeout=120
    )
    return parse_sse_response(r.text)
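
The parsing steps can be tried offline on a synthetic stream. This is a minimal sketch: the event shapes (a `status` event, a `full_answer` event, citation markers like `[#a1b2]`) are inferred from the parser above rather than taken from Captide's documentation, and `extract_answer` is a hypothetical stand-in for `parse_sse_response`.

```python
import json
import re

def extract_answer(sse_text: str) -> dict:
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # ignore "event:" lines and blank separators
        event = json.loads(line[len("data: "):])
        if event.get("type") == "full_answer":
            # 1) drop citation markers, 2) grab the outermost {...}, 3) parse it
            cleaned = re.sub(r"\s*\[#\w+\]", "", event["content"])
            match = re.search(r"\{.*\}", cleaned, re.DOTALL)
            return json.loads(match.group(0)) if match else {}
    return {}

# Synthetic stream (assumed format, not captured from the real API).
answer_text = 'Here is the reconciliation: {"Net income": 100 [#a1b2], "Adjusted EBITDA": 150 [#c3d4]}'
sse_text = "\n".join([
    "event: message",
    'data: {"type": "status", "content": "retrieving"}',
    "data: " + json.dumps({"type": "full_answer", "content": answer_text}),
])

result = extract_answer(sse_text)
print(result)
```

Removing the markers first matters: with `[#a1b2]` still inside the braces, the matched text would not be valid JSON.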

Reconciliation formats change across quarters. To handle this, we dynamically learn a stable schema using previous reconciliations as a guide. This avoids rigid templates while maintaining consistency—critical for time-series analysis.

In [22]:
# 🔁 Prompt building and reconciliation merging logic
BASE_PROMPT = (
    "Return a single valid JSON object with double-quoted keys and numeric values. Values must be stored in thousands. "
    "The object must represent the reconciliation from Net Income to Adjusted EBITDA, including all reported line items. "
    "Use positive values for metrics that are added to Net Income in the reconciliation and negative values for metrics "
    "that are subtracted. Do not include words like 'add' or 'less' in the keys. Output only the JSON object—no commentary "
    "or extra text."
)

def build_prompt(prev_keys: List[str]) -> str:
    if not prev_keys:
        return BASE_PROMPT
    joined = ", ".join(f'"{k}"' for k in prev_keys)
    return (
        BASE_PROMPT +
        f" Use the following keys in this order if they appear: [{joined}]."
        " If the document contains additional reconciliation line items, insert "
        "them at the correct position relative to the list above."
    )

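def merge_key_lists(master: list[str], this_quarter: list[str]) -> list[str]:
    """Merge keys that are new in `this_quarter` into `master` (modified in place).

    A new key goes right after its nearest preceding key already in `master`,
    else right before its nearest following known key, else at the end.
    """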
    for i, k in enumerate(this_quarter):
        if k in master:
            continue
        insert_pos = None
        for j in range(i - 1, -1, -1):
            prev_key = this_quarter[j]
            if prev_key in master:
                insert_pos = master.index(prev_key) + 1
                break
        if insert_pos is None:
            for j in range(i + 1, len(this_quarter)):
                nxt_key = this_quarter[j]
                if nxt_key in master:
                    insert_pos = master.index(nxt_key)
                    break
        if insert_pos is None:
            insert_pos = len(master)
        master.insert(insert_pos, k)
    return master
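
As a quick illustration of the merge rule, the sketch below copies `merge_key_lists` and runs it on two made-up key lists. The line-item names are examples only, not values returned by the API.

```python
from typing import List

def merge_key_lists(master: List[str], this_quarter: List[str]) -> List[str]:
    for i, k in enumerate(this_quarter):
        if k in master:
            continue
        insert_pos = None
        # Prefer the slot right after the nearest preceding known key.
        for j in range(i - 1, -1, -1):
            if this_quarter[j] in master:
                insert_pos = master.index(this_quarter[j]) + 1
                break
        # Otherwise use the slot right before the nearest following known key.
        if insert_pos is None:
            for j in range(i + 1, len(this_quarter)):
                if this_quarter[j] in master:
                    insert_pos = master.index(this_quarter[j])
                    break
        if insert_pos is None:
            insert_pos = len(master)
        master.insert(insert_pos, k)
    return master

# Keys learned from earlier quarters (hypothetical).
master = ["Net loss", "Interest expense", "Depreciation and amortization", "Adjusted EBITDA"]

# A later quarter introduces two new line items in the middle.
this_quarter = [
    "Net loss", "Interest expense", "Income tax expense",
    "Depreciation and amortization", "Restructuring charges", "Adjusted EBITDA",
]
merged = merge_key_lists(list(master), this_quarter)
print(merged)

# A new key with no known predecessor lands before its first known successor.
leading = merge_key_lists(["Net loss", "Adjusted EBITDA"], ["Segment adjustment", "Net loss"])
print(leading)
```

Passing `list(master)` keeps the original list untouched; the function itself mutates the list it is given.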

Using `run_one_ticker`, we batch process filings for each company, normalize the results, and align the schema. This creates a per-ticker dictionary of clean, time-indexed financial data that is ready for modeling or dashboards.

In [23]:
# 🧠 Execute the API logic for each ticker
def fiscal_sort_key(fp: str) -> tuple[int, int]:
    """Sort key (year, quarter); periods that do not parse sort last."""
    m = re.match(r"Q([1-4]) (\d{4})", fp)
    if not m:
        return (9999, 9)
    q, yr = int(m.group(1)), int(m.group(2))
    return (yr, q)

def run_one_ticker(ticker: str) -> Dict[str, object]:
    """Return {"keys": ordered line items, "data": {fiscal period: {line item: value}}}."""
    docs = fetch_documents(ticker)
    docs.sort(key=lambda d: fiscal_sort_key(d["fiscalPeriod"]))

    key_order: List[str] = []
    results: Dict[str, Dict[str, float]] = {}

    for doc in docs:
        prompt = build_prompt(key_order)
        data = fetch_metrics_with_prompt([doc["sourceLink"]], prompt)
        if not data:
            continue
        results[doc["fiscalPeriod"]] = data
        key_order = merge_key_lists(key_order, list(data.keys()))

    return {"keys": key_order, "data": results}
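
Chronological order matters here because each prompt reuses the keys learned from earlier quarters. The short sketch below copies `fiscal_sort_key` and compares it with a plain string sort. The period labels are made up, and `"FY 2023"` is included only to show the fallback for labels that do not parse.

```python
import re
from typing import List, Tuple

def fiscal_sort_key(fp: str) -> Tuple[int, int]:
    m = re.match(r"Q([1-4]) (\d{4})", fp)
    if not m:
        return (9999, 9)  # unparseable labels sort after every real quarter
    q, yr = int(m.group(1)), int(m.group(2))
    return (yr, q)

periods: List[str] = ["Q1 2024", "Q4 2023", "FY 2023", "Q2 2023"]

by_string = sorted(periods)                       # alphabetical, not chronological
by_fiscal = sorted(periods, key=fiscal_sort_key)  # year first, then quarter

print(by_string)
print(by_fiscal)
```

The string sort puts `"Q1 2024"` ahead of `"Q2 2023"`, which would feed a later quarter's keys into an earlier quarter's prompt.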

In [24]:
# 🚀 Run the notebook for selected tickers

from concurrent.futures import ThreadPoolExecutor, as_completed

per_ticker_output = {}
with ThreadPoolExecutor(max_workers=len(TICKERS)) as pool:
    futures = {pool.submit(run_one_ticker, t): t for t in TICKERS}
    for fut in as_completed(futures):
        ticker = futures[fut]
        per_ticker_output[ticker] = fut.result()
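
One thing to be aware of: `fut.result()` re-raises any exception from the worker, so a single failing ticker stops the loop above. A possible variation, sketched here with a hypothetical `fake_run_one_ticker` in place of the real API-backed function, records failures per ticker and keeps the successful results.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_run_one_ticker(ticker: str) -> dict:
    # Hypothetical stand-in for run_one_ticker; no network calls.
    if ticker == "FAIL":
        raise RuntimeError("simulated API failure")
    return {"keys": ["Net income"], "data": {"Q1 2024": {"Net income": 1.0}}}

tickers = ["AAA", "FAIL", "BBB"]
outputs, errors = {}, {}

with ThreadPoolExecutor(max_workers=len(tickers)) as pool:
    futures = {pool.submit(fake_run_one_ticker, t): t for t in tickers}
    for fut in as_completed(futures):
        t = futures[fut]
        try:
            outputs[t] = fut.result()
        except Exception as exc:  # keep going; remember what failed
            errors[t] = str(exc)

print(sorted(outputs))
print(errors)
```

Whether to collect errors or fail fast is a design choice; collecting them suits exploratory runs where partial results are still useful.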

We convert each company's results into a `pandas.DataFrame` with one row per line item and one column per fiscal period:

In [25]:
# 📊 Format and display as dataframes
tables = {}
for ticker, payload in per_ticker_output.items():
    key_order = payload["keys"]
    series_by_q = payload["data"]
    df = pd.DataFrame(series_by_q).reindex(key_order)
    df.index.name = "Line item"
    tables[ticker] = df

for t, frame in tables.items():
    print(f"\n📊 {t}")
    display(frame)


📊 PLTR


| Line item | Q1 2023 | Q2 2023 | Q3 2023 | Q4 2023 | Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 | Q1 2025 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Net income attributable to common stockholders | 16802 | 28127 | 71505 | 93391 | 105530.0 | 134126.0 | 143525.0 | 79009.0 | 214031.0 |
| Net income attributable to noncontrolling interests | 2349 | -255 | 1934 | 3522 | 541.0 | 1444.0 | 5816.0 | -2073.0 | 3686.0 |
| Interest income | -20853 | -30310 | -36864 | -44545 | -43352.0 | -46593.0 | -52120.0 | -54727.0 | -50441.0 |
| Interest expense | 1275 | 1317 | 742 | 136 |  |  |  |  |  |
| Other (income) expense, net | 2861 | 9024 | -3864 | 3956 | 13507.0 | 11173.0 | 8110.0 | -14768.0 | 3173.0 |
| Provision for income taxes | 1681 | 2171 | 6530 | 9334 | 4655.0 | 5189.0 | 7809.0 | 3602.0 | 5599.0 |
| Depreciation and amortization | 8320 | 8399 | 8663 | 7972 | 8438.0 | 8056.0 | 8087.0 | 7006.0 | 6622.0 |
| Stock-based compensation | 114714 | 114201 | 114380 | 132608 | 125651.0 | 141764.0 | 142425.0 | 281798.0 | 155339.0 |
| Employer payroll taxes related to stock-based compensation | 6285 | 10760 | 8909 | 10953 | 19926.0 | 6464.0 | 19950.0 | 79681.0 | 59323.0 |



📊 UBER


| Line item | Q1 2023 | Q2 2023 | Q3 2023 | Q4 2023 | Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 | Q1 2025 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Net income attributable to Uber Technologies, Inc. | -157000.0 | 394.0 | 221.0 | 1429000.0 | -654000.0 | 1015000.0 | 2612000.0 | 6883000.0 | 1776000.0 |
| Net income attributable to non-controlling interests, net of tax | 0.0 | 0.0 | -2.0 | 271000.0 | -9000.0 | -7000.0 | -13000.0 | 18000.0 | -2000.0 |
| Provision for income taxes | 55000.0 | 65.0 | -40.0 | 133000.0 | 29000.0 | 57000.0 | 158000.0 | -6002000.0 | -402000.0 |
| Income from equity method investments | -36000.0 | -4.0 | -3.0 | -5000.0 | 4000.0 | 12000.0 | 12000.0 | 10000.0 | -13000.0 |
| Interest expense | 168000.0 | 144.0 | 166.0 | 155000.0 | 124000.0 | 139000.0 | 143000.0 | 117000.0 | 105000.0 |
| Other income (expense), net | -292000.0 | -273.0 | 52.0 | -1331000.0 | 678000.0 | -420000.0 | -1851000.0 | -256000.0 | -262000.0 |
| Depreciation and amortization | 207000.0 | 208.0 | 205.0 | 203000.0 | 190000.0 | 173000.0 | 179000.0 | 169000.0 | 171000.0 |
| Stock-based compensation expense | 470000.0 | 504.0 | 492.0 | 469000.0 | 484000.0 | 455000.0 | 438000.0 | 419000.0 | 435000.0 |
| Legal, tax, and regulatory reserve changes and settlements | 250000.0 | -155.0 | -13.0 | -73000.0 | 527000.0 | 134000.0 | 0.0 | 462000.0 | 28000.0 |
| Goodwill and asset impairments/loss on sale of assets | 67000.0 | 16.0 | 2.0 | -1000.0 | -3000.0 | 0.0 | 0.0 | 6000.0 | 0.0 |



📊 SNAP


| Line item | Q1 2023 | Q2 2023 | Q3 2023 | Q4 2023 | Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Net loss | -328674.0 | -377308.0 | -368256 | -248247 | -305090 | -249000 | 153000.0 | 0 |
| Interest income | -37948.0 | -43144.0 | -43839 | -43463 | -39898 | -18000 |  | 0 |
| Interest expense | 5885.0 | 5343.0 | 5521 | 5275 | 4743 | 21000 |  | 0 |
| Other (income) expense, net | -11372.0 | -1323.0 | 20662 | 34447 | 81 | 16000 |  | 0 |
| Income tax (benefit) expense | 6845.0 | 12093.0 | 5849 | 3275 | 6932 | 2000 |  | 0 |
| Depreciation and amortization | 35220.0 | 39688.0 | 41209 | 43882 | 38098 | 54000 |  | 0 |
| Stock-based compensation expense | 314931.0 | 317943.0 | 353846 | 333063 | 254715 | 255000 | 266000.0 | 0 |
| Payroll and other tax expense related to stock-based compensation | 15926.0 | 8229.0 | 6463 | 8706 | 15970 | 14000 |  | 0 |
| Restructuring charges (1) |  |  | 18639 | 22211 | 70108 | 17000 | 0.0 | 0 |
| Adjusted EBITDA | 813.0 | -38479.0 | 40094 | 159149 | 45659 | 55000 | 132000.0 | 276000 |
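
Because the prompt asks for signed values, a reconciliation that was extracted completely should sum to its reported total. The sketch below is one possible check, not part of the original notebook: `reconciliation_gap` is a hypothetical helper, and the two columns are copied from the SNAP output above with missing cells written as `None`. It returns `None` when no total row is present, as in the PLTR and UBER outputs above.

```python
import math
from typing import Dict, Optional

def _is_number(v) -> bool:
    return isinstance(v, (int, float)) and not math.isnan(v)

def reconciliation_gap(items: Dict[str, Optional[float]],
                       total_key: str = "Adjusted EBITDA") -> Optional[float]:
    """Return sum(line items) - reported total, or None if the total is missing."""
    total = items.get(total_key)
    if not _is_number(total):
        return None
    parts = sum(v for k, v in items.items() if k != total_key and _is_number(v))
    return parts - total

# Two SNAP columns copied from the output table (values as extracted).
snap = {
    "Q1 2023": {
        "Net loss": -328674.0, "Interest income": -37948.0, "Interest expense": 5885.0,
        "Other (income) expense, net": -11372.0, "Income tax (benefit) expense": 6845.0,
        "Depreciation and amortization": 35220.0,
        "Stock-based compensation expense": 314931.0,
        "Payroll and other tax expense related to stock-based compensation": 15926.0,
        "Restructuring charges (1)": None, "Adjusted EBITDA": 813.0,
    },
    "Q3 2024": {
        "Net loss": 153000.0, "Interest income": None, "Interest expense": None,
        "Stock-based compensation expense": 266000.0,
        "Restructuring charges (1)": 0.0, "Adjusted EBITDA": 132000.0,
    },
}

gaps = {period: reconciliation_gap(items) for period, items in snap.items()}
print(gaps)
```

A zero gap (Q1 2023) indicates the line items tie out to Adjusted EBITDA; a large gap (Q3 2024) flags a column worth re-extracting or reviewing by hand. A similar per-column comparison of magnitudes could help spot scale mismatches, such as the UBER Q2 2023 and Q3 2023 values, which appear to be in millions rather than thousands.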
