MCP server for analyzing SEC filings (10-K, 10-Q, 8-K) with industry-aware financial extraction and BERT-based NLP.
- Company Search — Look up companies by ticker or name via SEC EDGAR
- Standardized Financials — Industry-aware XBRL extraction with ~250 concept mappings across 5 industry classes (standard, bank, insurance, REIT, utility)
- Validation — Automatic sanity checks (revenue ≥ net income, accounting equation, segment vs total detection)
- Filing Access — Fetch filing text and specific sections (Risk Factors, MD&A, etc.)
- Sentiment Analysis — FinBERT financial sentiment (positive/negative/neutral)
- Summarization — BART-based hierarchical summarization for long filing sections
- Entity Extraction — NER for companies, people, locations + regex for monetary values, dates, percentages
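The regex half of entity extraction can be pictured as a few compiled patterns run over the text alongside the NER model. The sketch below is illustrative only. The pattern set and the `extract_regex_entities` name are hypothetical, and the project's actual regexes may cover more formats (negative amounts, other currencies, numeric dates).

```python
from __future__ import annotations

import re

# Hypothetical patterns; not the project's real regexes.
# MONEY: "$394.3 billion", "$1,234,567", "$12 million"
MONEY_RE = re.compile(
    r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:thousand|million|billion))?",
    re.IGNORECASE,
)
# PERCENT: "8.1%", "12 %"
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?\s?%")
# DATE: long-form US dates such as "September 30, 2023"
DATE_RE = re.compile(
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}"
)


def extract_regex_entities(text: str) -> dict[str, list[str]]:
    """Return MONEY, PERCENT and DATE strings matched by simple regexes."""
    return {
        "MONEY": MONEY_RE.findall(text),
        "PERCENT": PERCENT_RE.findall(text),
        "DATE": DATE_RE.findall(text),
    }


print(extract_regex_entities(
    "Revenue rose 8.1% to $394.3 billion for the year ended September 30, 2023."
))
```

All groups are non-capturing, so `findall` returns whole matches rather than group tuples.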
```bash
# Clone
git clone https://github.com/YOUR_USERNAME/SEC-MCP.git
cd SEC-MCP

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e ".[dev]"

# Configure EDGAR identity (required by SEC)
cp .env.example .env
# Edit .env and set EDGAR_IDENTITY="Your Name your@email.com"
```

| Tool | Description |
|---|---|
| `search_company` | Search by ticker/name → CIK, ticker, SIC code, industry |
| `get_filing_list` | List filings, filter by form type (10-K, 10-Q, 8-K) |
| Tool | Description |
|---|---|
| `get_financials` | Full standardized extraction: metrics, ratios, validation, optional statements |
| `get_financials_batch` | Same as above for N tickers in parallel |
| `get_income_statement` | Just the income statement rows |
| `get_balance_sheet` | Just the balance sheet rows |
| `get_cash_flow` | Just the cash flow rows |
| `get_financial_ratios` | Just the computed ratios (margins, ROA, ROE, leverage, etc.) |
| `compare_companies` | Side-by-side metrics + ratios for multiple tickers |
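The ratios that `get_financial_ratios` reports are plain quotients of standardized metrics. Below is a minimal sketch under assumptions: the metric keys (`revenue`, `net_income`, `total_equity`, …) are hypothetical, and "leverage" is taken here to mean total liabilities divided by total equity, which may not match the project's definition.

```python
from __future__ import annotations

from typing import Optional


def safe_div(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """Divide, returning None when either side is missing or the divisor is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def compute_ratios(m: dict[str, Optional[float]]) -> dict[str, Optional[float]]:
    """Compute common ratios from a dict of standardized metrics (hypothetical keys)."""
    return {
        "gross_margin": safe_div(m.get("gross_profit"), m.get("revenue")),
        "operating_margin": safe_div(m.get("operating_income"), m.get("revenue")),
        "net_margin": safe_div(m.get("net_income"), m.get("revenue")),
        "roa": safe_div(m.get("net_income"), m.get("total_assets")),
        "roe": safe_div(m.get("net_income"), m.get("total_equity")),
        # Assumed leverage definition: liabilities / equity.
        "debt_to_equity": safe_div(m.get("total_liabilities"), m.get("total_equity")),
    }
```

Returning `None` instead of raising lets a caller such as a side-by-side comparison show a blank cell when one company lacks a metric.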
| Tool | Description |
|---|---|
| `get_filing_text` | Full filing or specific section text (supports aliases like `'risk factors'`) |
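Section aliases boil down to normalizing the user's phrase and looking it up. This sketch assumes 10-K item numbering (Item 1A Risk Factors, Item 7 MD&A, and so on). 10-Q filings number their items differently. The `SECTION_ALIASES` table and the `resolve_section` name are hypothetical and do not reflect the project's actual alias list.

```python
from __future__ import annotations

import re

# Hypothetical alias table using 10-K item numbers.
SECTION_ALIASES: dict[str, str] = {
    "business": "Item 1",
    "risk factors": "Item 1A",
    "md&a": "Item 7",
    "mda": "Item 7",
    "management's discussion and analysis": "Item 7",
    "market risk": "Item 7A",
    "financial statements": "Item 8",
}


def resolve_section(name: str) -> str:
    """Map a loose section name (or 'item 1a') to a canonical 10-K item label."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    if key in SECTION_ALIASES:
        return SECTION_ALIASES[key]
    # Also accept explicit item references such as "Item 7A".
    match = re.fullmatch(r"item\s+(\d+[a-z]?)", key)
    if match:
        return f"Item {match.group(1).upper()}"
    raise KeyError(f"Unknown section: {name!r}")
```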
| Tool | Description |
|---|---|
| `analyze_sentiment` | FinBERT sentiment on text or filing section |
| `summarize_filing` | Hierarchical BART summarization |
| `extract_entities` | NER (ORG, PER, LOC, MONEY, DATE, PERCENT) |
| `analyze_filing` | Combined sentiment + summary + entities in one call |
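Hierarchical summarization works around BART's limited input window. Each chunk is summarized on its own, the partial summaries are joined, and the join is summarized again until it fits. The following is a minimal sketch assuming the summarizer is any `str -> str` callable. The `hierarchical_summarize` name, the word-based `max_words` limit, and the `max_rounds` guard are assumptions, not the project's parameters.

```python
from __future__ import annotations

from typing import Callable


def hierarchical_summarize(
    chunks: list[str],
    summarize: Callable[[str], str],
    max_words: int = 400,
    max_rounds: int = 5,
) -> str:
    """Summarize chunks, then summarize the combined summaries until they fit."""
    if not chunks:
        return ""
    # Level 1: summarize every chunk independently.
    partials = [summarize(c) for c in chunks]
    for _ in range(max_rounds):
        combined = " ".join(partials)
        words = combined.split()
        if len(words) <= max_words:
            # Fits: one final pass over the joined summaries (skip if only one).
            return summarize(combined) if len(partials) > 1 else combined
        # Too long: regroup into max_words-sized blocks and summarize again.
        partials = [
            summarize(" ".join(words[i:i + max_words]))
            for i in range(0, len(words), max_words)
        ]
    # Guard against a summarizer that never shrinks its input.
    return " ".join(partials)
```

With Hugging Face `transformers`, a real summarizer could be wrapped as `lambda t: pipe(t)[0]["summary_text"]`, where `pipe = pipeline("summarization", model="facebook/bart-large-cnn")`. Whether the project wraps it this way is not stated.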
The SIC code is used to classify a company into one of 5 industry classes:
| Class | SIC Range | Revenue Strategy |
|---|---|---|
| standard | Everything else | First match: Revenues, RevenueFromContractWithCustomer, SalesRevenueNet, … |
| bank | 6020–6299 | Try total (Revenues, NetRevenues), then aggregate NII + non-interest + trading + fees |
| insurance | 6310–6411 | Try total, then aggregate premiums + investment income + fees |
| reit | 6500–6553 | Lease revenue + other income |
| utility | 4900–4991 | Electric + gas utility revenue |
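The table maps onto a few inclusive range checks, with `standard` as the fallback. This is a minimal sketch. The `classify_industry` name is hypothetical, and it assumes the SIC code may arrive as an int or as a numeric string, as EDGAR metadata often provides it.

```python
from __future__ import annotations

from typing import Optional, Union

# (class, lowest SIC, highest SIC), inclusive, copied from the table above.
INDUSTRY_RANGES = (
    ("bank", 6020, 6299),
    ("insurance", 6310, 6411),
    ("reit", 6500, 6553),
    ("utility", 4900, 4991),
)


def classify_industry(sic: Optional[Union[int, str]]) -> str:
    """Return one of: standard, bank, insurance, reit, utility."""
    if sic is None or str(sic).strip() == "":
        return "standard"
    code = int(sic)  # int() tolerates surrounding whitespace in strings
    for name, low, high in INDUSTRY_RANGES:
        if low <= code <= high:
            return name
    return "standard"  # "Everything else"
```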
`xbrl_mappings.py` maps ~250 XBRL concepts to 20+ standardized metrics. Each metric has an ordered list of concepts to try, and earlier entries are preferred. Some entries are marked `aggregate=True`: every matching concept in that entry is summed, which handles multi-component revenue such as banks'.
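The ordered-fallback lookup with aggregate entries can be sketched as follows, assuming the facts for one period are already a flat `{concept: value}` dict. The `MappingEntry` and `resolve_metric` names are hypothetical, and the bank concepts in the usage example are illustrative rather than copied from `xbrl_mappings.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MappingEntry:
    """One step in a metric's fallback list (hypothetical structure)."""
    concepts: tuple[str, ...]
    aggregate: bool = False  # True: sum every matching concept in this entry


def resolve_metric(facts: dict[str, float], entries: Sequence[MappingEntry]) -> Optional[float]:
    """Try entries in order; the first entry with any matching concept wins."""
    for entry in entries:
        found = [facts[c] for c in entry.concepts if c in facts]
        if not found:
            continue  # nothing matched; fall through to the next entry
        return sum(found) if entry.aggregate else found[0]
    return None  # no concept matched, so validation will flag revenue as null


# Illustrative bank revenue strategy: totals first, then sum the components.
BANK_REVENUE = [
    MappingEntry(("Revenues", "NetRevenues")),
    MappingEntry(("InterestIncomeExpenseNet", "NoninterestIncome"), aggregate=True),
]
```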
Every extraction runs these checks:
- revenue ≥ net income (when both positive) — catches segment-only revenue
- Assets = Liabilities + Equity (within 5%) — catches mismatched concepts
- Revenue not null — warns if no concept matched
- Bank segment check — flags if bank revenue < 80% of net income
- Gross margin 0–100% — for standard companies
Warnings are returned in the `validation` array so the AI can explain or retry.
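The checks listed above translate into a handful of guarded comparisons that append warning strings. This is a minimal sketch under assumptions: the metric keys and warning texts are hypothetical, and the 5% accounting-equation tolerance is measured here relative to total assets, which the list above does not specify.

```python
from __future__ import annotations

from typing import Optional


def validate_financials(m: dict[str, Optional[float]], industry: str) -> list[str]:
    """Return warning strings; an empty list means every check passed."""
    warnings: list[str] = []
    revenue = m.get("revenue")
    net_income = m.get("net_income")

    # Revenue not null: no revenue concept matched.
    if revenue is None:
        warnings.append("revenue is null: no revenue concept matched")
    # Revenue >= net income when both are positive: catches segment-only revenue.
    elif net_income is not None and revenue > 0 and net_income > 0 and revenue < net_income:
        warnings.append("revenue < net income: revenue may be a single segment")

    # Assets = Liabilities + Equity within 5% (tolerance relative to assets: assumption).
    assets = m.get("total_assets")
    liabilities = m.get("total_liabilities")
    equity = m.get("total_equity")
    if assets and liabilities is not None and equity is not None:
        if abs(assets - (liabilities + equity)) / abs(assets) > 0.05:
            warnings.append("assets differ from liabilities + equity by more than 5%")

    # Bank segment check: revenue below 80% of net income.
    if (industry == "bank" and revenue is not None and net_income is not None
            and net_income > 0 and revenue < 0.8 * net_income):
        warnings.append("bank revenue < 80% of net income: possible segment-only revenue")

    # Gross margin within 0-100%, standard companies only.
    gross_profit = m.get("gross_profit")
    if industry == "standard" and gross_profit is not None and revenue:
        margin = gross_profit / revenue
        if not 0.0 <= margin <= 1.0:
            warnings.append(f"gross margin {margin:.1%} is outside 0-100%")
    return warnings
```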
```bash
python -m sec_mcp.server
```

- Configure MCP so your app starts the SEC-MCP server (see below).
- Set `EDGAR_IDENTITY` in `.env` or in the MCP server env.
- The AI chooses the right tool per request:
  - "Apple's financials" → `get_financials("AAPL")`
  - "Compare AAPL vs MSFT vs GOOGL" → `compare_companies(["AAPL","MSFT","GOOGL"])`
  - "Morgan Stanley income statement" → `get_income_statement("MS")`
  - "What are Apple's risk factors?" → `get_filing_text` with `section='risk factors'`
```json
{
  "mcpServers": {
    "sec-mcp": {
      "command": "python",
      "args": ["-m", "sec_mcp.server"],
      "cwd": "/path/to/SEC-MCP",
      "env": {
        "EDGAR_IDENTITY": "Your Name your@email.com"
      }
    }
  }
}
```

| Variable | Default | Description |
|---|---|---|
| `EDGAR_IDENTITY` | `SEC-MCP sec-mcp@example.com` | Your identity for the SEC EDGAR API |
| `SENTIMENT_MODEL` | `ProsusAI/finbert` | Sentiment analysis model |
| `SUMMARIZATION_MODEL` | `facebook/bart-large-cnn` | Summarization model |
| `NER_MODEL` | `dslim/bert-base-NER` | NER model |
| `MAX_CHUNK_TOKENS` | `512` | Max tokens per chunk |
| `CHUNK_OVERLAP_TOKENS` | `128` | Overlap between chunks |
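`MAX_CHUNK_TOKENS` and `CHUNK_OVERLAP_TOKENS` describe a sliding window. With the defaults, each window holds 512 tokens and the next one starts 384 tokens later, so neighbouring chunks share 128 tokens of context. The following is a minimal sketch assuming the text is already tokenized into a sequence. `chunk_tokens` is a hypothetical name, and the project may split on tokenizer output or sentence boundaries instead.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], max_tokens: int = 512, overlap: int = 128) -> list[Sequence[T]]:
    """Split tokens into windows of max_tokens, each overlapping the previous by `overlap`."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap  # 384 with the defaults
    chunks: list[Sequence[T]] = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # last window reached the end; avoid a redundant tail chunk
        start += step
    return chunks
```

For 1,000 tokens this produces windows starting at 0, 384 and 768. The last window is shorter (232 tokens).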
```text
src/sec_mcp/
├── server.py          # MCP tool definitions (14 tools)
├── edgar_client.py    # EDGAR API wrapper (company search, filings, text)
├── financials.py      # Standardized extraction engine + validation
├── xbrl_mappings.py   # XBRL concept → metric dictionary (5 industry classes)
├── models.py          # Pydantic models (StandardizedFinancials, ratios, etc.)
├── config.py          # Environment config
└── nlp/
    ├── sentiment.py   # FinBERT
    ├── summarizer.py  # BART
    └── ner.py         # NER
```
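An environment config like `config.py` typically reads each variable from the table above and falls back to its default. Here is a minimal sketch under assumptions: the `Settings` and `load_settings` names are hypothetical, and passing an explicit mapping is only there to make the function easy to test. To pick up `.env`, a library such as python-dotenv (`load_dotenv()`) could populate `os.environ` first. Whether `config.py` does this is not stated.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    edgar_identity: str
    sentiment_model: str
    summarization_model: str
    ner_model: str
    max_chunk_tokens: int
    chunk_overlap_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        edgar_identity=env.get("EDGAR_IDENTITY", "SEC-MCP sec-mcp@example.com"),
        sentiment_model=env.get("SENTIMENT_MODEL", "ProsusAI/finbert"),
        summarization_model=env.get("SUMMARIZATION_MODEL", "facebook/bart-large-cnn"),
        ner_model=env.get("NER_MODEL", "dslim/bert-base-NER"),
        # Environment values are strings; convert numeric settings explicitly.
        max_chunk_tokens=int(env.get("MAX_CHUNK_TOKENS", "512")),
        chunk_overlap_tokens=int(env.get("CHUNK_OVERLAP_TOKENS", "128")),
    )
```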
Models are lazy-loaded (downloaded on first use, ~2.5GB total):
- ProsusAI/finbert — Financial sentiment, fine-tuned on financial news sentences (Financial PhraseBank) rather than SEC filings
- facebook/bart-large-cnn — Abstractive summarization
- dslim/bert-base-NER — Named entity recognition
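Lazy loading means the expensive download and initialization happen on the first request, not at server start. A minimal, thread-safe sketch of the pattern follows. The `LazyModel` class is hypothetical, and the project may use a different mechanism, such as module-level globals or `functools.lru_cache`.

```python
from __future__ import annotations

import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Call `factory` once, on first use, and cache the result."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        # Double-checked locking: concurrent first calls still load only once.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._factory()
                    self._loaded = True
        return self._value  # type: ignore[return-value]
```

With `transformers` installed, a sentiment model could be wrapped as `LazyModel(lambda: pipeline("sentiment-analysis", model="ProsusAI/finbert"))`, so FinBERT is downloaded only when sentiment is first requested.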
```bash
# Run tests
pytest

# Run tests (skip slow model tests)
pytest -m "not slow"

# Lint
ruff check src/ tests/
```

License: MIT