SEC-MCP

MCP server for analyzing SEC filings (10-K, 10-Q, 8-K) with industry-aware financial extraction and transformer-based NLP (FinBERT, BART, BERT NER).

Features

Company Search — Look up companies by ticker or name via SEC EDGAR
Standardized Financials — Industry-aware XBRL extraction with ~250 concept mappings across 5 industry classes (standard, bank, insurance, REIT, utility)
Validation — Automatic sanity checks (revenue ≥ net income, accounting equation, segment vs total detection)
Filing Access — Fetch filing text and specific sections (Risk Factors, MD&A, etc.)
Sentiment Analysis — FinBERT financial sentiment (positive/negative/neutral)
Summarization — BART-based hierarchical summarization for long filing sections
Entity Extraction — NER for companies, people, locations + regex for monetary values, dates, percentages

Setup

# Clone
git clone https://github.com/YOUR_USERNAME/SEC-MCP.git
cd SEC-MCP

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e ".[dev]"

# Configure EDGAR identity (required by SEC)
cp .env.example .env
# Edit .env and set EDGAR_IDENTITY="Your Name your@email.com"
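
The identity matters because SEC EDGAR asks automated clients to identify themselves in the `User-Agent` header. A minimal sketch of what that looks like at the HTTP level is below. It only builds a request against the public `data.sec.gov` submissions endpoint and does not send it. The helper name `build_submissions_request` is hypothetical, and SEC-MCP itself may go through a client library rather than raw HTTP.

```python
import os
import urllib.request


def build_submissions_request(cik: int) -> urllib.request.Request:
    """Build (but do not send) a request for a company's EDGAR submissions JSON."""
    # Fallback value is a placeholder; set EDGAR_IDENTITY to your own name and email.
    identity = os.environ.get("EDGAR_IDENTITY", "Your Name your@email.com")
    # EDGAR expects the CIK zero-padded to 10 digits.
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    return urllib.request.Request(url, headers={"User-Agent": identity})


req = build_submissions_request(320193)  # 320193 is Apple's CIK
```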

Available Tools

Base / Discovery

Tool	Description
`search_company`	Search by ticker/name → CIK, ticker, SIC code, industry
`get_filing_list`	List filings, filter by form type (10-K, 10-Q, 8-K)

Financials (standardized, industry-aware, validated)

Tool	Description
`get_financials`	Full standardized extraction: metrics, ratios, validation, optional statements
`get_financials_batch`	Same as above for N tickers in parallel
`get_income_statement`	Just the income statement rows
`get_balance_sheet`	Just the balance sheet rows
`get_cash_flow`	Just the cash flow rows
`get_financial_ratios`	Just computed ratios (margins, ROA, ROE, leverage, etc.)
`compare_companies`	Side-by-side metrics + ratios for multiple tickers
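
To give a feel for the kind of output `get_financial_ratios` describes, here is a small sketch that derives a few textbook ratios from a metrics dictionary. The function name `compute_ratios` and the metric keys are hypothetical, and the formulas are the standard definitions, which may differ from what the tool actually computes.

```python
from typing import Optional


def safe_div(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None if an input is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def compute_ratios(m: dict) -> dict:
    """Compute textbook ratios from standardized metrics (hypothetical key names)."""
    return {
        "net_margin": safe_div(m.get("net_income"), m.get("revenue")),
        "roa": safe_div(m.get("net_income"), m.get("total_assets")),
        "roe": safe_div(m.get("net_income"), m.get("total_equity")),
        "leverage": safe_div(m.get("total_liabilities"), m.get("total_equity")),
    }


ratios = compute_ratios({
    "revenue": 1000.0, "net_income": 100.0,
    "total_assets": 2000.0, "total_liabilities": 1200.0, "total_equity": 800.0,
})
```

Returning `None` instead of raising keeps a single missing metric from failing the whole response.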

Filing Text

Tool	Description
`get_filing_text`	Full filing or specific section text (supports aliases like 'risk factors')
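
Section aliases can be pictured as a small lookup from free-form names to 10-K item labels. The sketch below is an assumption about how such a lookup might work, not the project's code. The alias table and the name `resolve_section` are hypothetical, although the item numbers (Item 1 Business, Item 1A Risk Factors, Item 7 MD&A) follow the standard 10-K layout.

```python
import re
from typing import Optional

# Hypothetical alias table for 10-K sections; the project's real aliases may differ.
SECTION_ALIASES = {
    "risk factors": "Item 1A",
    "risks": "Item 1A",
    "md&a": "Item 7",
    "management discussion": "Item 7",
    "business": "Item 1",
}


def resolve_section(name: str) -> Optional[str]:
    """Map a free-form section name to a 10-K item label, or None if unknown."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    if re.fullmatch(r"item \d+[a-c]?", key):
        # Already an item label such as "item 1a": just normalize its casing.
        return "Item " + key.split(" ", 1)[1].upper()
    return SECTION_ALIASES.get(key)
```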

NLP Analysis

Tool	Description
`analyze_sentiment`	FinBERT sentiment on text or filing section
`summarize_filing`	Hierarchical BART summarization
`extract_entities`	NER (ORG, PER, LOC, MONEY, DATE, PERCENT)
`analyze_filing`	Combined sentiment + summary + entities in one call
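
The MONEY, DATE and PERCENT labels come from regular expressions rather than the NER model. The patterns below are a rough sketch of that idea. They are illustrative assumptions, cover only a few common formats, and are not the expressions used in `ner.py`.

```python
import re

# Illustrative patterns only; real filings contain many more formats.
PATTERNS = {
    "MONEY": re.compile(
        r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:thousand|million|billion|trillion))?",
        re.IGNORECASE,
    ),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "DATE": re.compile(
        r"(?:January|February|March|April|May|June|July|August|September"
        r"|October|November|December)\s+\d{1,2},\s+\d{4}"
    ),
}


def extract_pattern_entities(text: str) -> list:
    """Return (label, matched_text) pairs ordered by position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group().strip()))
    return [(label, value) for _, label, value in sorted(found)]


sample = "Revenue rose 12.5% to $3.2 billion for the year ended December 31, 2023."
entities = extract_pattern_entities(sample)
```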

How financials extraction works

Industry detection

The SIC code is used to classify a company into one of 5 industry classes:

Class	SIC Range	Revenue Strategy
standard	Everything else	First match: `Revenues`, `RevenueFromContractWithCustomer`, `SalesRevenueNet`, …
bank	6020–6299	Try total (`Revenues`, `NetRevenues`), then aggregate NII + non-interest + trading + fees
insurance	6310–6411	Try total, then aggregate premiums + investment income + fees
reit	6500–6553	Lease revenue + other income
utility	4900–4991	Electric + gas utility revenue
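
The table above translates into a simple range lookup. The sketch below uses the SIC ranges exactly as listed. The function name `classify_industry`, the inclusive bounds, and the fallback to `standard` for missing or malformed codes are assumptions.

```python
# SIC ranges copied from the table above; they do not overlap, so order is irrelevant.
INDUSTRY_SIC_RANGES = {
    "bank": (6020, 6299),
    "insurance": (6310, 6411),
    "reit": (6500, 6553),
    "utility": (4900, 4991),
}


def classify_industry(sic_code) -> str:
    """Return the industry class for a SIC code, falling back to 'standard'."""
    try:
        sic = int(sic_code)
    except (TypeError, ValueError):
        return "standard"  # assumption: missing or malformed codes use the default class
    for industry, (low, high) in INDUSTRY_SIC_RANGES.items():
        if low <= sic <= high:  # assumption: both bounds are inclusive
            return industry
    return "standard"
```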

XBRL concept dictionary

xbrl_mappings.py maps ~250 XBRL concepts to 20+ standardized metrics. Each metric has an ordered list of concepts to try; earlier entries are preferred, and the first concept present in the filing wins. Some entries are marked aggregate=True: all matching aggregate concepts are summed, which is how multi-component revenue (for example, bank revenue) is assembled.
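
A minimal sketch of the "try a total first, otherwise sum the aggregate parts" lookup might look like this. The `ConceptEntry` class, the `resolve_metric` function and the example mapping are hypothetical and only illustrate the idea. The real structures in `xbrl_mappings.py` may be organized differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptEntry:
    concept: str
    aggregate: bool = False  # True: sum this concept with the other aggregate entries


# Hypothetical mapping for bank revenue; the concept choice is illustrative.
BANK_REVENUE = [
    ConceptEntry("Revenues"),
    ConceptEntry("InterestIncomeExpenseNet", aggregate=True),
    ConceptEntry("NoninterestIncome", aggregate=True),
]


def resolve_metric(facts: dict, entries: list) -> Optional[float]:
    """Return the first non-aggregate match, else the sum of aggregate matches."""
    for entry in entries:  # list order encodes preference
        if not entry.aggregate and entry.concept in facts:
            return facts[entry.concept]
    parts = [facts[e.concept] for e in entries if e.aggregate and e.concept in facts]
    return sum(parts) if parts else None
```

Returning `None` when nothing matches lets a later validation step report the missing metric instead of silently treating it as zero.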

Validation rules

Every extraction runs these checks:

revenue ≥ net income (when both positive) — catches segment-only revenue
Assets = Liabilities + Equity (within 5%) — catches mismatched concepts
Revenue not null — warns if no concept matched
Bank segment check — flags if bank revenue < 80% of net income
Gross margin 0–100% — for standard companies

Warnings are returned in the validation array so the AI can explain or retry.
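
The rules above can be sketched as a function that collects warning strings. This is an illustration under stated assumptions, not the code in `financials.py`. The metric keys and warning names are hypothetical, and the bank check is applied only when both values are positive.

```python
def validate(m: dict, industry: str = "standard", tolerance: float = 0.05) -> list:
    """Return warning strings for a dict of extracted metrics (hypothetical keys)."""
    warnings = []
    revenue, net_income = m.get("revenue"), m.get("net_income")
    assets = m.get("total_assets")
    liabilities, equity = m.get("total_liabilities"), m.get("total_equity")
    gross_profit = m.get("gross_profit")

    if revenue is None:
        warnings.append("revenue_missing")
    elif net_income is not None and revenue > 0 and net_income > 0:
        if revenue < net_income:
            warnings.append("revenue_below_net_income")
        if industry == "bank" and revenue < 0.8 * net_income:
            warnings.append("bank_revenue_looks_segment_only")

    # Assets = Liabilities + Equity, within the relative tolerance (5% by default).
    if None not in (assets, liabilities, equity) and assets:
        if abs(assets - (liabilities + equity)) / abs(assets) > tolerance:
            warnings.append("accounting_equation_mismatch")

    # Gross margin must fall between 0% and 100% for standard companies.
    if industry == "standard" and revenue and gross_profit is not None:
        if not 0.0 <= gross_profit / revenue <= 1.0:
            warnings.append("gross_margin_out_of_range")
    return warnings
```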

Usage

Run as MCP server (STDIO)

python -m sec_mcp.server

Using with your app (Cursor, Claude Desktop, etc.)

1. Configure MCP so your app starts the SEC-MCP server (see below).
2. Set EDGAR_IDENTITY in .env or in the MCP server env.
3. The AI chooses the right tool per request:
- "Apple's financials" → get_financials("AAPL")
- "Compare AAPL vs MSFT vs GOOGL" → compare_companies(["AAPL","MSFT","GOOGL"])
- "Morgan Stanley income statement" → get_income_statement("MS")
- "What are Apple's risk factors?" → get_filing_text with section='risk factors'
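
Under the hood, each of these tool choices reaches the server as an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below builds such a message by hand to show its shape. The helper `build_tool_call` is hypothetical, and the argument names `ticker` and `tickers` are assumptions, since the tool schemas are not listed here. In practice the MCP client in your app produces these messages for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The argument name `ticker` is an assumption, not a confirmed tool schema.
request = build_tool_call("get_financials", {"ticker": "AAPL"})
```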

Cursor / Claude Desktop configuration

{
  "mcpServers": {
    "sec-mcp": {
      "command": "python",
      "args": ["-m", "sec_mcp.server"],
      "cwd": "/path/to/SEC-MCP",
      "env": {
        "EDGAR_IDENTITY": "Your Name your@email.com"
      }
    }
  }
}

Configuration

Variable	Default	Description
`EDGAR_IDENTITY`	`SEC-MCP sec-mcp@example.com`	Your identity for SEC EDGAR API
`SENTIMENT_MODEL`	`ProsusAI/finbert`	Sentiment analysis model
`SUMMARIZATION_MODEL`	`facebook/bart-large-cnn`	Summarization model
`NER_MODEL`	`dslim/bert-base-NER`	NER model
`MAX_CHUNK_TOKENS`	`512`	Max tokens per chunk
`CHUNK_OVERLAP_TOKENS`	`128`	Overlap between chunks
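
The two chunking settings interact as a sliding window: each chunk holds at most `MAX_CHUNK_TOKENS` tokens and repeats the last `CHUNK_OVERLAP_TOKENS` tokens of the previous chunk. The sketch below shows that arithmetic on a plain list of tokens. The function `chunk_tokens` is hypothetical, and the project's chunker may work on tokenizer output instead.

```python
import os

# Defaults mirror the table above; environment variables override them.
MAX_CHUNK_TOKENS = int(os.environ.get("MAX_CHUNK_TOKENS", "512"))
CHUNK_OVERLAP_TOKENS = int(os.environ.get("CHUNK_OVERLAP_TOKENS", "128"))


def chunk_tokens(tokens, max_tokens=MAX_CHUNK_TOKENS, overlap=CHUNK_OVERLAP_TOKENS):
    """Split a token list into windows of at most max_tokens sharing `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks
```

With the defaults, each window advances by 512 - 128 = 384 tokens, so a 1,000-token section yields three chunks starting at tokens 0, 384 and 768.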

Architecture

src/sec_mcp/
├── server.py           # MCP tool definitions (14 tools)
├── edgar_client.py     # EDGAR API wrapper (company search, filings, text)
├── financials.py       # Standardized extraction engine + validation
├── xbrl_mappings.py    # XBRL concept → metric dictionary (5 industry classes)
├── models.py           # Pydantic models (StandardizedFinancials, ratios, etc.)
├── config.py           # Environment config
└── nlp/
    ├── sentiment.py    # FinBERT
    ├── summarizer.py   # BART
    └── ner.py          # NER

NLP Models

Models are lazy-loaded (downloaded on first use, ~2.5GB total):

ProsusAI/finbert — Financial sentiment; BERT further pre-trained on financial text and fine-tuned on the Financial PhraseBank dataset
facebook/bart-large-cnn — Abstractive summarization
dslim/bert-base-NER — Named entity recognition
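
Lazy loading here means nothing is downloaded or held in memory until a tool first needs the model. One common way to get that behavior is sketched below. The `LazyModel` wrapper and `load_sentiment_pipeline` are hypothetical names, and the sketch assumes the models are loaded through the Hugging Face `transformers` `pipeline` API, which this README does not confirm.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load until the first call, then reuse the loaded object."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._model is None:
            with self._lock:  # avoid loading twice if two threads race here
                if self._model is None:
                    self._model = self._loader()
        return self._model


def load_sentiment_pipeline():
    # Imported inside the function so that importing this module stays cheap.
    from transformers import pipeline
    return pipeline("text-classification", model="ProsusAI/finbert")


sentiment = LazyModel(load_sentiment_pipeline)  # nothing is downloaded yet
```

The first `sentiment.get()` call triggers the download and load. Later calls return the cached pipeline.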

Development

# Run tests
pytest

# Run tests (skip slow model tests)
pytest -m "not slow"

# Lint
ruff check src/ tests/

License

MIT
