# EDGAR Financial RAG System
 
## Overview
 
This notebook provides a natural language interface to analyze SEC filings (10-K and 10-Q) for five major tech companies:
- **AMZN** - Amazon
- **GOOGL** - Alphabet/Google
- **META** - Meta/Facebook
- **MSFT** - Microsoft
- **ORCL** - Oracle
 
**Data Coverage:** January 2023 - Present (64 filings total)
 
## Architecture
 
The system combines two complementary data sources, coordinated by a query router:
 
### 1. Structured Financial Data (XBRL)
- **10,847 financial facts** extracted from balance sheets, income statements, and cash flow statements
- **Standardized XBRL concepts** - consistent metric names across all companies
- **SQL queryable** - precise numerical queries with date ranges and filtering
- **Use for:** Revenue, expenses, equity, assets, specific financial metrics
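
As a rough illustration of the kind of keyword-plus-date-range lookup this enables, the sketch below uses an in-memory SQLite table as a stand-in for the PostgreSQL table. The reduced column set and the `find_facts` helper are assumptions made for the example; the rows are copied from the Example 1 output in this notebook. SQLite's `LIKE` is case-insensitive for ASCII, which approximates PostgreSQL's `ILIKE` here.

```python
import sqlite3

# Rows copied from the Example 1 output in this notebook (values in millions).
ROWS = [
    ("AMZN", "2025-12-31", "Technology and infrastructure", 108521.0),
    ("AMZN", "2025-09-30", "Technology and infrastructure", 79122.0),
    ("AMZN", "2025-06-30", "Technology and infrastructure", 50160.0),
    ("AMZN", "2025-03-31", "Technology and infrastructure", 22994.0),
    ("AMZN", "2025-12-31", "Purchases of property and equipment", 131819.0),
]


def find_facts(conn, keyword, ticker, start, end):
    """Hypothetical helper: parameterized keyword search within a date range."""
    sql = """
        SELECT period_date, label, value
        FROM financial_facts_clean
        WHERE label LIKE :kw
          AND ticker = :ticker
          AND period_date BETWEEN :start AND :end
        ORDER BY period_date DESC
    """
    params = {"kw": f"%{keyword}%", "ticker": ticker, "start": start, "end": end}
    return conn.execute(sql, params).fetchall()


# Simplified stand-in schema; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE financial_facts_clean "
    "(ticker TEXT, period_date TEXT, label TEXT, value REAL)"
)
conn.executemany("INSERT INTO financial_facts_clean VALUES (?, ?, ?, ?)", ROWS)

facts = find_facts(conn, "technology and infrastructure", "AMZN",
                   "2025-01-01", "2025-12-31")
for period_date, label, value in facts:
    print(period_date, label, value)
```

ISO-formatted date strings sort lexicographically, so the `BETWEEN` filter works on text columns in this stand-in; the real table presumably uses a proper date type.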
 
### 2. Narrative Content (Vector RAG)
- **17,730 text chunks** from risk factors, MD&A, business descriptions
- **Vector embeddings** via Voyage AI for semantic search
- **PostgreSQL pgvector** - efficient similarity search
- **Use for:** Risks, strategies, qualitative discussions, business context
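
To make the similarity search concrete, here is a brute-force sketch of cosine-similarity ranking with a minimum-similarity cutoff. pgvector's `<=>` operator returns cosine distance, so `1 - distance` is the similarity score computed below. The two-dimensional vectors and chunk texts are toy values invented for illustration; real embeddings have far more dimensions and come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5, threshold=0.5):
    """Score every chunk, drop weak matches, return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data (hypothetical): text plus a made-up 2-D "embedding".
toy_chunks = [
    ("power risk", [1.0, 0.0]),
    ("revenue mix", [0.0, 1.0]),
    ("energy costs", [0.8, 0.6]),
]
results = top_k([1.0, 0.0], toy_chunks, k=5, threshold=0.5)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A database index avoids scoring every row, but the ranking criterion is the same.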
 
### 3. Intelligent Query Router
- **Claude classifies** each query as quantitative, qualitative, or both
- **Automatically retrieves** from appropriate data sources
- **Synthesizes answers** combining numbers with narrative context
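
The routing step can be pictured as a small lookup from the classified query type to the data sources to consult. This is a minimal sketch; the names `ROUTES`, `sources_for`, and the source labels are hypothetical and do not appear in the notebook's code.

```python
# Hypothetical source labels for the two retrieval paths.
ROUTES = {
    "quantitative": ("financial_facts",),
    "qualitative": ("narrative_chunks",),
    "both": ("financial_facts", "narrative_chunks"),
}


def sources_for(query_type):
    """Return the data sources for a query type, defaulting to both."""
    return ROUTES.get(query_type, ROUTES["both"])


for qt in ("quantitative", "qualitative", "both", "unexpected"):
    print(qt, "->", sources_for(qt))
```

Falling back to both sources on an unrecognized label trades some extra retrieval cost for not silently dropping relevant data.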

## Quick Start

The simplest way to use this system is the `ask()` function:
 
```python
ask("How much did Amazon spend on infrastructure in 2025?")
ask("What risks does Microsoft mention about AI and power?")
ask("Compare data center spending across the hyperscalers")
```
 
## Example Use Cases
 
**Quantitative Analysis:**
- Track infrastructure spending trends across companies
- Compare equity growth rates
- Analyze CapEx trajectories
 
**Qualitative Analysis:**
- Identify energy and power-related risks
- Understand data center expansion strategies
- Compare AI investment narratives

**Hybrid Analysis:**
- "How much is Microsoft spending on AI infrastructure and what do they say about power requirements?"
- "Compare AWS vs Azure capital investments and their growth strategies"
 
## Data Freshness
 
This system contains filings through early February 2026. To update:
1. Run `python edgar_parser_v2.py` to fetch new filings
2. Run `python generate_embeddings.py` to embed new chunks
3. Restart this notebook kernel

---

# EDGAR Financial RAG System
 
This notebook provides a natural language interface to query:
- **Structured financial data** (XBRL facts via SQL)
- **Narrative content** (risk factors, MD&A via vector search)
- **Synthesized answers** (Claude combines both)


In [9]:
import os
import json
import re
from datetime import date
import pandas as pd
from sqlalchemy import create_engine, text
from dotenv import load_dotenv
import voyageai
from anthropic import Anthropic

load_dotenv()

# Runtime config (keep in sync with generate_embeddings.py)
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "voyage-3-lite")
TODAY_STR = date.today().isoformat()

# Initialize clients
engine = create_engine(
    f"postgresql://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@"
    f"{os.getenv('POSTGRES_HOST')}:{os.getenv('POSTGRES_PORT')}/{os.getenv('POSTGRES_DB')}"
)
voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY"))
anthropic_client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

print("✓ Clients initialized")
print(f"✓ Embedding model: {EMBEDDING_MODEL}")
print(f"✓ Today: {TODAY_STR}")

✓ Clients initialized
✓ Embedding model: voyage-3-lite
✓ Today: 2026-02-18


## Helper Functions

In [25]:
def analyze_query(question):
    """
    Use Claude to analyze the query and extract structured search parameters.
    Replaces classify_query_type, extract_company_ticker, and extract_keywords.
    """
    prompt = f"""Analyze this financial question and extract structured search parameters.

Question: {question}

Available companies and tickers:
- AMZN (Amazon)
- GOOGL (Alphabet/Google)
- META (Meta/Facebook)
- MSFT (Microsoft)
- ORCL (Oracle)

Available financial statement types: balance_sheet, income_statement, cashflow

Example XBRL labels in our database (use these to generate precise sql_keywords):
- "Technology and infrastructure", "Cost of sales", "Total operating expenses"
- "Property and equipment, net", "Capital expenditures", "Purchases of property and equipment"
- "Total assets", "Total current assets", "Cash and cash equivalents"
- "Total stockholders equity", "Retained earnings", "Additional paid-in capital"
- "Revenue", "Net income", "Operating income", "Net sales"
- "Long-term debt", "Total liabilities", "Accounts payable"
- "Depreciation and amortization", "Stock-based compensation"
- "Research and development", "General and administrative"

Respond with ONLY valid JSON (no markdown, no explanation):
{{
  "query_type": "quantitative | qualitative | both",
  "tickers": ["TICKER1", "TICKER2"],
  "sql_keywords": ["keyword phrase 1", "keyword phrase 2"],
  "date_range": {{"start": "YYYY-MM-DD", "end": "YYYY-MM-DD"}} or null,
  "narrative_search_query": "optimized search query for vector similarity" or ""
}}

Rules:
- query_type: "quantitative" for numbers/metrics, "qualitative" for risks/strategies/narratives, "both" for combined
- tickers: list ALL companies mentioned. Empty list [] if no specific company mentioned (searches all)
- sql_keywords: 2-5 precise phrases that would match XBRL labels or concepts via SQL ILIKE. Think about what the financial line items would actually be called in SEC filings. Do NOT include apostrophes in keywords.
- date_range: extract if the question mentions a time period (e.g., "last 2 years" means start 2024-01-01, end 2026-12-31). null if no time constraint
- narrative_search_query: a concise, semantically rich query for vector search. Empty string "" if purely quantitative
- Today's date is {TODAY_STR} for calculating relative date ranges"""

    response = anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )

    raw = response.content[0].text.strip()

    # Parse JSON with fallback
    try:
        analysis = json.loads(raw)
    except json.JSONDecodeError:
        # Try to extract JSON from markdown code blocks or surrounding text
        match = re.search(r'\{.*\}', raw, re.DOTALL)
        if match:
            try:
                analysis = json.loads(match.group())
            except json.JSONDecodeError:
                analysis = None
        else:
            analysis = None

        if analysis is None:
            # Fallback to safe defaults
            analysis = {
                "query_type": "both",
                "tickers": [],
                "sql_keywords": ["revenue", "income"],
                "date_range": None,
                "narrative_search_query": question
            }

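    # Validate tickers (tolerate a missing or null list and lowercase symbols)
    valid_tickers = {'AMZN', 'GOOGL', 'META', 'MSFT', 'ORCL'}
    analysis['tickers'] = [
        t.upper() for t in (analysis.get('tickers') or [])
        if isinstance(t, str) and t.upper() in valid_tickers
    ]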

    return analysis


# Apostrophe characters to strip from both DB labels and search keywords.
# DB labels use Unicode right single quote (U+2019), but keywords from Claude
# typically omit apostrophes entirely. Stripping both sides ensures matching.
_APOSTROPHES = "\u2018\u2019\u0027\u0060\u00B4"
_APOSTROPHE_RE = re.compile(f"[{_APOSTROPHES}]")


def search_financial_facts(keywords, tickers=None, date_range=None, limit=20):
    """
    Search structured financial facts using keyword matching.
    Supports multiple keywords, multiple tickers, and date range filtering.
    Ensures balanced per-ticker results when multiple tickers are requested.
    """
    if not keywords:
        return pd.DataFrame()

    keyword_conditions = []
    base_params = {}
    for i, kw in enumerate(keywords[:8]):
        kw_clean = _APOSTROPHE_RE.sub("", kw)
        param_name = f'keyword{i}'
        keyword_conditions.append(
            f"(REPLACE(REPLACE(label, E'\\u2019', ''), '''', '') ILIKE :{param_name}"
            f" OR REPLACE(REPLACE(concept, E'\\u2019', ''), '''', '') ILIKE :{param_name})"
        )
        base_params[param_name] = f'%{kw_clean}%'

    keyword_clause = " OR ".join(keyword_conditions)

    base_query = f"""
        SELECT 
            ticker,
            period_date,
            statement_type,
            label,
            value,
            unit,
            filing_type,
            filing_date
        FROM financial_facts_clean
        WHERE ({keyword_clause})
    """

    if date_range:
        if date_range.get('start'):
            base_query += " AND period_date >= :date_start"
            base_params['date_start'] = date_range['start']
        if date_range.get('end'):
            base_query += " AND period_date <= :date_end"
            base_params['date_end'] = date_range['end']

    # Single-ticker or all-ticker behavior
    if not tickers or len(tickers) <= 1:
        query = base_query
        params = dict(base_params)

        if tickers:
            params['ticker0'] = tickers[0].upper()
            query += " AND ticker = :ticker0"

        effective_limit = limit
        if date_range:
            effective_limit *= 3

        query += " ORDER BY ticker, label, period_date DESC LIMIT :limit"
        params['limit'] = effective_limit

        with engine.connect() as conn:
            return pd.read_sql(text(query), conn, params=params)

    # Multi-ticker behavior: query each ticker separately for balanced coverage
    per_ticker_limit = max(5, limit)
    if date_range:
        per_ticker_limit *= 3

    frames = []
    with engine.connect() as conn:
        for ticker in tickers:
            ticker_query = base_query + " AND ticker = :ticker ORDER BY label, period_date DESC LIMIT :limit"
            params = dict(base_params)
            params['ticker'] = ticker.upper()
            params['limit'] = per_ticker_limit

            df = pd.read_sql(text(ticker_query), conn, params=params)
            if not df.empty:
                frames.append(df)

    if not frames:
        return pd.DataFrame()

    out = pd.concat(frames, ignore_index=True)
    out = out.sort_values(['ticker', 'label', 'period_date'], ascending=[True, True, False])
    return out.reset_index(drop=True)


def search_narrative(query_text, tickers=None, limit=5, similarity_threshold=0.5):
    """
    Search narrative chunks using vector similarity.
    Supports multiple tickers and filters out low-similarity results.
    """
    if not query_text:
        return pd.DataFrame()

    # Generate query embedding
    query_embedding = voyage_client.embed(
        texts=[query_text],
        model=EMBEDDING_MODEL,
        input_type="query"
    ).embeddings[0]

    embedding_str = "[" + ",".join(map(str, query_embedding)) + "]"

    # Inner query fetches extra candidates to allow threshold filtering
    inner_limit = limit * 3

    inner_query = """
        SELECT 
            ticker,
            filing_type,
            filing_date,
            section,
            chunk_text,
            1 - (embedding <=> CAST(:query_embedding AS vector)) as similarity
        FROM document_chunks
        WHERE embedding IS NOT NULL
    """

    params = {'query_embedding': embedding_str}

    # Multi-ticker support
    if tickers:
        ticker_placeholders = []
        for i, t in enumerate(tickers):
            param_name = f'ticker{i}'
            ticker_placeholders.append(f":{param_name}")
            params[param_name] = t.upper()
        inner_query += f" AND ticker IN ({', '.join(ticker_placeholders)})"

    inner_query += """
        ORDER BY embedding <=> CAST(:query_embedding AS vector)
        LIMIT :inner_limit
    """
    params['inner_limit'] = inner_limit

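    # Wrap with similarity threshold filter. Re-sort explicitly: SQL does not
    # guarantee that a subquery's ORDER BY carries over to the outer SELECT.
    sql_query = f"""
        SELECT * FROM ({inner_query}) sub
        WHERE similarity >= :threshold
        ORDER BY similarity DESC
        LIMIT :outer_limit
    """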
    params['threshold'] = similarity_threshold
    params['outer_limit'] = limit

    with engine.connect() as conn:
        return pd.read_sql(text(sql_query), conn, params=params)
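
The apostrophe handling in `search_financial_facts` can be mirrored in plain Python to see which labels a keyword would hit. This is a sketch under assumptions: `normalize` and `label_matches` are hypothetical helpers, the label list is illustrative, and substring containment only approximates `ILIKE '%keyword%'` (a keyword containing `%` or `_` would behave as a wildcard in SQL but not here).

```python
import re

# Same apostrophe-like characters the notebook strips before matching.
APOSTROPHE_RE = re.compile("[\u2018\u2019\u0027\u0060\u00B4]")


def normalize(text):
    """Drop apostrophe variants and fold case so both sides compare equal."""
    return APOSTROPHE_RE.sub("", text).casefold()


def label_matches(label, keyword):
    """Approximate ILIKE '%keyword%' on apostrophe-stripped text."""
    return normalize(keyword) in normalize(label)


# Illustrative labels; the curly apostrophe is U+2019.
labels = [
    "Total stockholders\u2019 equity",
    "Total liabilities and stockholders\u2019 equity",
    "Retained earnings",
]
hits = [label for label in labels if label_matches(label, "stockholders equity")]
print(hits)
```

Note that the keyword also matches the "Total liabilities and stockholders’ equity" line, so substring matching can return broader rows than the question intended.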

## Main Query Function

In [17]:
def query_edgar(question, max_financial_results=20, max_narrative_results=5, synthesize=True):
    """
    Main query interface - routes to appropriate data sources and optionally synthesizes answer.
    
    Args:
        question: Natural language question
        max_financial_results: Max financial facts to retrieve
        max_narrative_results: Max narrative chunks to retrieve
        synthesize: Whether to use Claude to synthesize final answer
    
    Returns:
        dict with 'financial_data', 'narrative_data', and optionally 'answer'
    """
    print(f"\n{'='*60}")
    print(f"QUERY: {question}")
    print('='*60)
    
    # Analyze query with Claude
    analysis = analyze_query(question)
    
    # Use .get() with defaults so a partial JSON response does not raise KeyError
    query_type = analysis.get('query_type', 'both')
    tickers = analysis.get('tickers', [])
    sql_keywords = analysis.get('sql_keywords', [])
    date_range = analysis.get('date_range')
    narrative_search_query = analysis.get('narrative_search_query', '')
    
    print(f"\nQuery type: {query_type}")
    print(f"Tickers: {tickers if tickers else 'all'}")
    print(f"SQL keywords: {sql_keywords}")
    if date_range:
        print(f"Date range: {date_range.get('start')} to {date_range.get('end')}")
    if narrative_search_query:
        print(f"Narrative query: {narrative_search_query}")
    
    results = {}
    
    # Fetch financial data if needed
    if query_type in ['quantitative', 'both']:
        print("\n🔢 Searching financial facts...")
        financial_df = search_financial_facts(sql_keywords, tickers, date_range, max_financial_results)
        results['financial_data'] = financial_df
        print(f"   Found {len(financial_df)} financial facts")
    
    # Fetch narrative data if needed
    if query_type in ['qualitative', 'both']:
        print("\n📄 Searching narrative chunks...")
        narrative_df = search_narrative(narrative_search_query, tickers, max_narrative_results)
        results['narrative_data'] = narrative_df
        print(f"   Found {len(narrative_df)} relevant chunks")
    
    # Synthesize answer with Claude
    if synthesize:
        print("\n🤖 Synthesizing answer with Claude...")
        results['answer'] = synthesize_answer(question, results)
    
    return results


def synthesize_answer(question, data):
    """Use Claude to synthesize a comprehensive answer from retrieved data."""
    
    # Build context from retrieved data
    context_parts = []
    
    if 'financial_data' in data and not data['financial_data'].empty:
        df = data['financial_data']
        context_parts.append("# Financial Data\n\n")
        # Group by ticker and show key metrics (up to 20 rows per ticker for trend coverage)
        for ticker in df['ticker'].unique():
            ticker_data = df[df['ticker'] == ticker].head(20)
            context_parts.append(f"## {ticker}\n")
            for _, row in ticker_data.iterrows():
                context_parts.append(
                    # Report the row's own unit instead of assuming USD millions
                    f"- {row['label']}: {row['value']:,.0f} {row['unit']} ({row['period_date']}, {row['filing_type']})\n"
                )
            context_parts.append("\n")
    
    if 'narrative_data' in data and not data['narrative_data'].empty:
        df = data['narrative_data']
        context_parts.append("\n# Narrative Context\n\n")
        for idx, row in df.head(5).iterrows():
            context_parts.append(
                f"## {row['ticker']} - {row['section'][:60]}... "
                f"({row['filing_type']} {row['filing_date']}, similarity: {row['similarity']:.3f})\n\n"
                f"{row['chunk_text'][:800]}...\n\n"
            )
    
    context = "".join(context_parts)
    
    if not context.strip():
        return "No relevant data found to answer this question."
    
    # Generate answer
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""You are a financial analyst assistant. Answer the following question using ONLY the provided data from SEC filings.

Question: {question}

Retrieved Data:
{context}

Instructions:
- Provide a clear, concise answer
- Cite specific numbers and dates from the data
- If comparing companies, create a clear comparison
- If the data doesn't fully answer the question, say so
- Do not make up information not in the data"""
        }]
    )
    
    return response.content[0].text
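
`query_edgar` reads several keys from the dict returned by `analyze_query`, so a defensive normalization step between the two can be useful. The sketch below is one possible shape, not the notebook's implementation: `normalize_analysis`, `VALID_TYPES`, and the chosen defaults are assumptions for illustration.

```python
import json
import re

VALID_TICKERS = {"AMZN", "GOOGL", "META", "MSFT", "ORCL"}
VALID_TYPES = {"quantitative", "qualitative", "both"}


def normalize_analysis(raw, question):
    """Parse the model's reply and fill in safe defaults for missing fields."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the first {...} span, e.g. inside a markdown code fence.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        try:
            parsed = json.loads(match.group()) if match else None
        except json.JSONDecodeError:
            parsed = None
    if not isinstance(parsed, dict):
        parsed = {}

    query_type = parsed.get("query_type")
    if query_type not in VALID_TYPES:
        query_type = "both"

    tickers = [
        t.upper() for t in (parsed.get("tickers") or [])
        if isinstance(t, str) and t.upper() in VALID_TICKERS
    ]
    # Only fall back to the raw question when narrative search will run.
    default_narrative = "" if query_type == "quantitative" else question
    return {
        "query_type": query_type,
        "tickers": tickers,
        "sql_keywords": list(parsed.get("sql_keywords") or []),
        "date_range": parsed.get("date_range") or None,
        "narrative_search_query": parsed.get("narrative_search_query") or default_narrative,
    }


fenced = '```json\n{"query_type": "quantitative", "tickers": ["amzn", "TSLA"]}\n```'
print(normalize_analysis(fenced, "How much did Amazon spend?"))
print(normalize_analysis("not json at all", "What risks are mentioned?"))
```

Every key is always present in the result, so downstream code can index the dict directly.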

## Example Queries

In [18]:
# Example 1: Quantitative query
result = query_edgar(
    "How much did Amazon spend on technology and infrastructure in 2025?",
    synthesize=True
)

if 'financial_data' in result:
    print("\n" + "="*60)
    print("FINANCIAL DATA")
    print("="*60)
    display(result['financial_data'])

if 'answer' in result:
    print("\n" + "="*60)
    print("SYNTHESIZED ANSWER")
    print("="*60)
    print(result['answer'])



QUERY: How much did Amazon spend on technology and infrastructure in 2025?

Query type: quantitative
Tickers: ['AMZN']
SQL keywords: ['Technology and infrastructure', 'Capital expenditures', 'Purchases of property and equipment']
Date range: 2025-01-01 to 2025-12-31

🔢 Searching financial facts...
   Found 8 financial facts

🤖 Synthesizing answer with Claude...

FINANCIAL DATA


| | ticker | period_date | statement_type | label | value | unit | filing_type | filing_date |
|---|---|---|---|---|---|---|---|---|
| 0 | AMZN | 2025-12-31 | cashflow | Purchases of property and equipment | 131819.0 | millions | 10-K | 2026-02-06 |
| 1 | AMZN | 2025-09-30 | cashflow | Purchases of property and equipment | 92297.0 | millions | 10-Q | 2025-10-31 |
| 2 | AMZN | 2025-06-30 | cashflow | Purchases of property and equipment | 57202.0 | millions | 10-Q | 2025-08-01 |
| 3 | AMZN | 2025-03-31 | cashflow | Purchases of property and equipment | 25019.0 | millions | 10-Q | 2025-05-02 |
| 4 | AMZN | 2025-12-31 | income_statement | Technology and infrastructure | 108521.0 | millions | 10-K | 2026-02-06 |
| 5 | AMZN | 2025-09-30 | income_statement | Technology and infrastructure | 79122.0 | millions | 10-Q | 2025-10-31 |
| 6 | AMZN | 2025-06-30 | income_statement | Technology and infrastructure | 50160.0 | millions | 10-Q | 2025-08-01 |
| 7 | AMZN | 2025-03-31 | income_statement | Technology and infrastructure | 22994.0 | millions | 10-Q | 2025-05-02 |



SYNTHESIZED ANSWER
Based on the SEC filing data provided, Amazon spent **$108,521 million** on technology and infrastructure in 2025, according to the 2025-12-31 10-K filing.

This represents the full-year 2025 technology and infrastructure spending as reported in Amazon's annual 10-K filing.
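
The quarterly values in the table above increase through the year, which suggests the 10-Q figures are cumulative year-to-date rather than standalone quarters. Assuming that reading is correct, standalone quarterly amounts can be recovered by differencing consecutive periods. The helper name `ytd_to_quarterly` is hypothetical; the input values are the "Purchases of property and equipment" rows from the output above.

```python
# Year-to-date values in millions, copied from the Example 1 output,
# listed in chronological order within a single fiscal year.
ytd_capex = [
    ("2025-03-31", 25019.0),
    ("2025-06-30", 57202.0),
    ("2025-09-30", 92297.0),
    ("2025-12-31", 131819.0),
]


def ytd_to_quarterly(ytd_values):
    """Convert cumulative year-to-date values into per-quarter amounts."""
    quarterly = []
    previous = 0.0
    for period, value in ytd_values:
        quarterly.append((period, value - previous))
        previous = value
    return quarterly


quarterly_capex = ytd_to_quarterly(ytd_capex)
for period, amount in quarterly_capex:
    print(period, amount)
```

The per-quarter amounts sum back to the full-year figure, which is a quick sanity check on the cumulative assumption.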


In [20]:
# Example 2: Qualitative query
result = query_edgar(
    "What does Amazon mention related to data center power and energy?",
    synthesize=True
)

if 'narrative_data' in result:
    print("\n" + "="*60)
    print("NARRATIVE DATA")
    print("="*60)
    for idx, row in result['narrative_data'].head(3).iterrows():
        print(f"\n{idx+1}. {row['ticker']} {row['filing_type']} ({row['filing_date']})")
        print(f"   Section: {row['section'][:60]}...")
        print(f"   Similarity: {row['similarity']:.3f}")
        print(f"   {row['chunk_text'][:400]}...\n")

if 'answer' in result:
    print("\n" + "="*60)
    print("SYNTHESIZED ANSWER")
    print("="*60)
    print(result['answer'])


QUERY: What does Amazon mention related to data center power and energy?

Query type: both
Tickers: ['AMZN']
SQL keywords: ['Technology and infrastructure', 'Capital expenditures', 'Property and equipment', 'Purchases of property and equipment', 'Depreciation and amortization']
Narrative query: data center power energy infrastructure electricity consumption renewable energy sustainability

🔢 Searching financial facts...
   Found 20 financial facts

📄 Searching narrative chunks...
   Found 1 relevant chunks

🤖 Synthesizing answer with Claude...

NARRATIVE DATA

1. AMZN 10-K (2026-02-06)
   Section: Operating Risks...
   Similarity: 0.506
   <div align='center'>10</div>

• potential negative impacts of climate change, including: increased operating costs due to more frequent extreme weather events or climate-related changes, such as rising temperatures and water scarcity; increased investment requirements associated with the transition to a low-carbon economy; decreased deman

In [26]:
# Example 3: Comparative query (both quantitative and qualitative)
result = query_edgar(
    "Compare data center infrastructure spending across Amazon, Microsoft, and Google",
    max_financial_results=30,
    synthesize=True
)

if 'financial_data' in result:
    print("\n" + "="*60)
    print("FINANCIAL DATA")
    print("="*60)
    display(result['financial_data'])

if 'answer' in result:
    print("\n" + "="*60)
    print("SYNTHESIZED ANSWER")
    print("="*60)
    print(result['answer'])



QUERY: Compare data center infrastructure spending across Amazon, Microsoft, and Google

Query type: quantitative
Tickers: ['AMZN', 'MSFT', 'GOOGL']
SQL keywords: ['Technology and infrastructure', 'Capital expenditures', 'Purchases of property and equipment', 'Property and equipment']

🔢 Searching financial facts...
   Found 90 financial facts

🤖 Synthesizing answer with Claude...

FINANCIAL DATA


| | ticker | period_date | statement_type | label | value | unit | filing_type | filing_date |
|---|---|---|---|---|---|---|---|---|
| 0 | AMZN | 2025-12-31 | cashflow | Depreciation and amortization of property and ... | 65756.0 | millions | 10-K | 2026-02-06 |
| 1 | AMZN | 2025-09-30 | cashflow | Depreciation and amortization of property and ... | 46285.0 | millions | 10-Q | 2025-10-31 |
| 2 | AMZN | 2025-06-30 | cashflow | Depreciation and amortization of property and ... | 29489.0 | millions | 10-Q | 2025-08-01 |
| 3 | AMZN | 2025-03-31 | cashflow | Depreciation and amortization of property and ... | 14262.0 | millions | 10-Q | 2025-05-02 |
| 4 | AMZN | 2024-12-31 | cashflow | Depreciation and amortization of property and ... | 52795.0 | millions | 10-K | 2026-02-06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | MSFT | 2023-06-30 | balance_sheet | Property and equipment, net of accumulated dep... | 95641.0 | millions | 10-K | 2024-07-30 |
| 86 | MSFT | 2024-09-30 | balance_sheet | Property and equipment, net of accumulated dep... | 152863.0 | millions | 10-Q | 2024-10-30 |
| 87 | MSFT | 2024-12-31 | balance_sheet | Property and equipment, net of accumulated dep... | 166902.0 | millions | 10-Q | 2025-01-29 |
| 88 | MSFT | 2025-03-31 | balance_sheet | Property and equipment, net of accumulated dep... | 183939.0 | millions | 10-Q | 2025-04-30 |



SYNTHESIZED ANSWER
Based on the provided SEC filing data, I can compare some aspects of infrastructure spending across these three companies, though the data types vary by company:

## Microsoft - Capital Expenditures (Additions to Property & Equipment)
Microsoft shows the most comprehensive capital expenditure data:
- **2025**: $49.3B (Q2) + $19.4B (Q1) = $68.7B (6 months)
- **2024 Full Year**: $44.5B 
- **2023 Full Year**: $28.1B
- **2022 Full Year**: $23.9B

Microsoft demonstrates dramatic growth in infrastructure spending, with 2024 capex ($44.5B) nearly doubling from 2023 ($28.1B).

## Amazon - Depreciation & Amortization
Amazon's data shows depreciation of existing infrastructure:
- **2024 Full Year**: $52.8B
- **2023 Full Year**: $48.7B  
- **2022 Full Year**: $41.9B
- **2021 Full Year**: $34.4B

## Google - Depreciation of Property & Equipment
Google's depreciation data:
- **2024 Full Year**: $15.3B
- **2023 Full Year**: $11.9B
- **Historical**: $15.3B (2022), $11.6B (2021)

#

## Interactive Query Interface

In [22]:
def ask(question):
    """Simple interface - just ask a question!"""
    result = query_edgar(question, synthesize=True)
    
    if 'answer' in result:
        print("\n" + "="*60)
        print("ANSWER")
        print("="*60)
        print(result['answer'])
        print()
    
    return result


Example usage:

ask("What is Microsoft's total stockholders equity trend over the last 2 years?")

ask("What does Meta say about AI infrastructure investments?")

In [23]:
ask("What is Microsoft's total stockholders equity trend over the last 2 years?")


QUERY: What is Microsoft's total stockholders equity trend over the last 2 years?

Query type: quantitative
Tickers: ['MSFT']
SQL keywords: ['Total stockholders equity', 'stockholders equity', 'shareholders equity', 'total equity']
Date range: 2024-01-01 to 2026-12-31

🔢 Searching financial facts...
   Found 16 financial facts

🤖 Synthesizing answer with Claude...

ANSWER
Based on the SEC filing data provided, Microsoft's total stockholders' equity has shown a consistent upward trend over the last 2 years:

**2-Year Trend Analysis:**
- **March 31, 2024**: $253,152M
- **December 31, 2025**: $390,875M
- **Total Growth**: $137,723M (54.4% increase over ~21 months)

**Quarterly Progression:**
- Q1 2024 (Mar 31): $253,152M
- Q2 2024 (Jun 30): $268,477M 
- Q3 2024 (Sep 30): $287,723M
- Q4 2024 (Dec 31): $302,695M
- Q1 2025 (Mar 31): $321,891M
- Q2 2025 (Jun 30): $343,479M
- Q3 2025 (Sep 30): $363,076M
- Q4 2025 (Dec 31): $390,875M

The trend shows **consistent quarter-over-quarter gro

{'financial_data':    ticker period_date statement_type  \
 0    MSFT  2025-12-31  balance_sheet   
 1    MSFT  2025-09-30  balance_sheet   
 2    MSFT  2025-06-30  balance_sheet   
 3    MSFT  2025-03-31  balance_sheet   
 4    MSFT  2024-12-31  balance_sheet   
 5    MSFT  2024-09-30  balance_sheet   
 6    MSFT  2024-06-30  balance_sheet   
 7    MSFT  2024-03-31  balance_sheet   
 8    MSFT  2025-12-31  balance_sheet   
 9    MSFT  2025-09-30  balance_sheet   
 10   MSFT  2025-06-30  balance_sheet   
 11   MSFT  2025-03-31  balance_sheet   
 12   MSFT  2024-12-31  balance_sheet   
 13   MSFT  2024-09-30  balance_sheet   
 14   MSFT  2024-06-30  balance_sheet   
 15   MSFT  2024-03-31  balance_sheet   
 
                                          label     value      unit  \
 0   Total liabilities and stockholders’ equity  665302.0  millions   
 1   Total liabilities and stockholders’ equity  636351.0  millions   
 2   Total liabilities and stockholders’ equity  619003.0  milli