Defense Intelligence Platform

A defense technology intelligence platform that tracks federal procurement, maps the defense industrial base, and identifies investment signals across national security technology.

3M+ contract awards. 130K+ companies. 200+ capability categories. Built solo.

Live Demo — static demo with sample data (no backend required)

What It Does

Ingests public federal procurement data from 10+ sources, resolves entities across messy government records, enriches with LLM analysis, and surfaces actionable intelligence through an API, React dashboard, and MCP server for Claude Desktop.

Two core questions:

  1. What is the government buying? — from whom, how much, in what capability areas, how it's changing
  2. What should it be buying but isn't? — capability gaps, unfunded requirements, companies positioned to fill them

Scale

| Metric                  | Count |
|-------------------------|-------|
| Companies               | 130K+ |
| Awards (total)          | 3M+   |
| Contracts (USASpending) | 2M+   |
| Grants/Assistance       | 350K+ |
| SBIR Awards             | 170K+ |
| Subawards               | 400K+ |
| Entity Aliases Resolved | 100K+ |
| Capabilities Tracked    | 200+  |
| API Endpoints           | 40+   |

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        INGESTION LAYER                              │
│  USASpending │ SBIR.gov │ SAM.gov │ SEC EDGAR │ Crunchbase │ GitHub│
│  Subawards   │ Grants   │ PatentsView │ Congress.gov │ GDELT      │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      PROCESSING LAYER                               │
│  Entity Resolution (UEI-first) → Alias Matching → Capability Mapper │
│  Deduplication → Aggregate Calculation → Quality Scoring            │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     INTELLIGENCE LAYER                              │
│                                                                     │
│  ANALYST: Company profiles, funding, awards, capabilities           │
│  TACHYON: Weak signals, forecasts, capability gaps                  │
│  STRATEGIST: Investment implications, portfolio construction        │
│  BOARDROOM: Executive briefings, thesis generation                  │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       OUTPUT LAYER                                  │
│  FastAPI (42 endpoints) │ React Dashboard │ MCP Server │ Agents    │
└─────────────────────────────────────────────────────────────────────┘

Tech Stack

  • Backend: Python 3.11+, FastAPI, SQLAlchemy, Alembic
  • Database: PostgreSQL 15
  • Frontend: React + TypeScript + Tailwind CSS, Vite
  • LLM Integration: Anthropic Claude API (enrichment, capability tagging, thesis generation)
  • MCP Server: Claude Desktop integration with 6 query tools for natural language contract exploration
  • Deployment: Docker, Cloudflare Tunnel
  • Hardware: Runs on a Raspberry Pi 5 (8GB) + Hailo-8L AI Hat (13 TOPS)

Key Features

Entity Resolution Pipeline

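Government procurement data is messy: the same company can appear under dozens of names and spellings, with different DUNS numbers and UEIs attached. The platform resolves each record to a single entity using a 5-step cascade, falling through to the next step only when the previous one finds no match: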

1. UEI lookup (deterministic, highest confidence)
2. RecipientAlias exact match (120K+ mappings)
3. DUNS/CAGE lookup (legacy identifiers)
4. Slug/normalized name match
5. Create new company + alias (last resort)
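
The cascade above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the platform's actual code: the `Resolver` class, the `slugify` rules, and the choice to record every identifier seen on a match are all assumptions made for the example, and the identifiers in the usage lines are made up.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical suffix list; a real normalizer would likely be more thorough.
SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "ltd"}


def slugify(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return "-".join(words)


@dataclass
class Company:
    id: int
    name: str


@dataclass
class Resolver:
    by_uei: Dict[str, Company] = field(default_factory=dict)
    by_alias: Dict[str, Company] = field(default_factory=dict)   # exact raw names
    by_legacy: Dict[str, Company] = field(default_factory=dict)  # "duns:..." / "cage:..."
    by_slug: Dict[str, Company] = field(default_factory=dict)
    next_id: int = 1

    def resolve(self, raw_name: str, uei: Optional[str] = None,
                duns: Optional[str] = None,
                cage: Optional[str] = None) -> Tuple[Company, str]:
        """Return the matched company and the name of the step that matched."""
        legacy_keys = [f"{kind}:{value}"
                       for kind, value in (("duns", duns), ("cage", cage)) if value]
        legacy_hit = next((k for k in legacy_keys if k in self.by_legacy), None)
        slug = slugify(raw_name)

        if uei and uei in self.by_uei:            # 1. UEI lookup
            company, step = self.by_uei[uei], "uei"
        elif raw_name in self.by_alias:           # 2. exact alias match
            company, step = self.by_alias[raw_name], "alias"
        elif legacy_hit:                          # 3. DUNS/CAGE lookup
            company, step = self.by_legacy[legacy_hit], "legacy_id"
        elif slug in self.by_slug:                # 4. normalized name match
            company, step = self.by_slug[slug], "slug"
        else:                                     # 5. create new company
            company, step = Company(self.next_id, raw_name), "created"
            self.next_id += 1

        # Assumption: remember every identifier seen, so later records
        # resolve at an earlier (higher-confidence) step.
        if uei:
            self.by_uei.setdefault(uei, company)
        self.by_alias.setdefault(raw_name, company)
        for key in legacy_keys:
            self.by_legacy.setdefault(key, company)
        self.by_slug.setdefault(slug, company)
        return company, step


if __name__ == "__main__":
    r = Resolver()
    print(r.resolve("Acme Robotics, Inc.", uei="ABC123DEF456"))
    print(r.resolve("Acme Robotics LLC"))
```

Returning the step name alongside the company keeps the confidence of each match visible, which is useful when auditing merges made by the lower-confidence name-based steps.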

AI-Powered Enrichment

Each contract is enriched using Claude to extract structured intelligence: capability classification, technology identification, agency relationships, partner ecosystems, and strategic summaries. 92K+ contracts enriched to date.
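
One way to keep LLM output usable downstream is to request JSON and validate it before storing anything. The sketch below illustrates that idea only: the `ContractEnrichment` fields mirror the categories listed above, but the field names, the prompt wording, and the validation rules are assumptions, and the model call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical field names, one per enrichment category described above.
LIST_FIELDS = ("capabilities", "technologies", "agencies", "partners")


@dataclass
class ContractEnrichment:
    capabilities: List[str]
    technologies: List[str]
    agencies: List[str]
    partners: List[str]
    summary: str


def build_prompt(description: str) -> str:
    """Build an extraction prompt that asks for a single JSON object."""
    keys = ", ".join(LIST_FIELDS + ("summary",))
    return (
        "Extract structured intelligence from this contract description. "
        f"Reply with a single JSON object with keys: {keys}.\n\n"
        f"Contract description:\n{description}"
    )


def parse_enrichment(raw: str) -> ContractEnrichment:
    """Validate a model response; raise ValueError on malformed output."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    lists = {}
    for key in LIST_FIELDS:
        value = data.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key!r} must be a list of strings")
        # Trim, drop empties, and deduplicate for stable storage.
        lists[key] = sorted({v.strip() for v in value if v.strip()})
    summary = data.get("summary", "")
    if not isinstance(summary, str):
        raise ValueError("'summary' must be a string")
    return ContractEnrichment(summary=summary.strip(), **lists)


if __name__ == "__main__":
    sample = ('{"capabilities": ["Autonomy"], "technologies": ["LiDAR", "LiDAR"], '
              '"agencies": ["Department of the Navy"], "partners": [], '
              '"summary": "Prototype autonomy kit."}')
    print(parse_enrichment(sample))
```

Rejecting malformed responses at this boundary means a failed enrichment can be retried later instead of leaving partial records in the database.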

MCP Server for Claude Desktop

A Model Context Protocol server exposes 6 tools that let Claude query the database conversationally:

  • search_contracts — keyword search across enriched contract data
  • get_company_contracts — full contract history for any company
  • search_companies — find companies by name with aggregate stats
  • find_agency_contractors — who works for a specific agency
  • find_technology_contracts — contracts involving specific technologies
  • get_enrichment_stats — database coverage and quality metrics
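
To give a feel for what sits behind a tool such as `search_contracts`, here is a rough sketch of a keyword query that returns plain dictionaries. It uses an in-memory SQLite database as a stand-in for PostgreSQL; the `enriched_contracts` table, its columns, and the sample rows are invented for the example, and registering the function with an MCP SDK is omitted.

```python
import sqlite3
from typing import Dict, List


def search_contracts(conn: sqlite3.Connection, keyword: str,
                     limit: int = 10) -> List[Dict]:
    """Keyword search over contract summaries, largest awards first.

    Note: '%' and '_' in the keyword act as LIKE wildcards in this sketch.
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        """
        SELECT company, agency, amount, summary
        FROM enriched_contracts
        WHERE summary LIKE ? OR company LIKE ?
        ORDER BY amount DESC
        LIMIT ?
        """,
        (pattern, pattern, limit),
    ).fetchall()
    return [dict(row) for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become dict-convertible
    conn.execute(
        "CREATE TABLE enriched_contracts "
        "(company TEXT, agency TEXT, amount REAL, summary TEXT)"
    )
    conn.executemany(
        "INSERT INTO enriched_contracts VALUES (?, ?, ?, ?)",
        [
            ("Example Robotics", "Department of the Navy", 1200000.0,
             "Autonomy software for unmanned surface vessels"),
            ("Sample Sensors LLC", "Department of the Air Force", 450000.0,
             "Infrared sensor prototypes"),
            ("Demo Dynamics", "Department of the Army", 3000000.0,
             "Ground vehicle autonomy kit integration"),
        ],
    )
    return conn


if __name__ == "__main__":
    for hit in search_contracts(demo_connection(), "autonomy"):
        print(hit["company"], hit["amount"])
```

Returning small, JSON-serializable dictionaries with a hard `limit` keeps tool responses compact enough for a conversational client to read and summarize.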

LLM Agent System

Four specialized agents for different analytical lenses:

  • Analyst: Company deep-dives, capability mapping, competitive landscape
  • Tachyon: Weak signal detection, trend forecasting, emerging capability gaps
  • Strategist: Investment thesis generation, portfolio construction
  • Boardroom: Executive briefing generation, decision support

Data Sources

| Source                | Type                 | Records               |
|-----------------------|----------------------|-----------------------|
| USASpending Contracts | Federal contracts    | 2.15M                 |
| USASpending Grants    | Assistance awards    | 360K                  |
| USASpending Subawards | Sub-tier awards      | 410K                  |
| SBIR.gov              | Innovation awards    | 171K                  |
| SAM.gov               | Entity enrichment    | Lookup                |
| SEC EDGAR             | 10-K/10-Q filings    | Structured extraction |
| Crunchbase            | VC/funding data      | Enrichment            |
| GitHub                | Open source activity | Enrichment            |

Data Model

Core Tables

  • companies — 130K+ entities with UEI, DUNS, CAGE, SEC CIK, ticker, NAICS codes
  • awards — 3M+ records with NAICS/PSC codes, subaward-to-prime linking
  • capabilities — Hierarchical taxonomy (domains, functions, enablers)
  • company_capability — Many-to-many with confidence scores
  • company_relationships — Graph edges with evidence tracking
  • recipient_aliases — Entity resolution mappings
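
The relationships between these tables can be illustrated with a small schema sketch. It uses SQLite from the Python standard library rather than the project's PostgreSQL and SQLAlchemy models, and every column name (for example `prime_award_id` and `confidence`) is an assumption chosen to show subaward-to-prime linking and confidence-scored capability mappings, not the real schema.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE companies (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, uei TEXT UNIQUE
);
CREATE TABLE awards (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES companies(id),
    amount REAL NOT NULL,
    prime_award_id INTEGER REFERENCES awards(id)  -- set only for subawards
);
CREATE TABLE capabilities (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    parent_id INTEGER REFERENCES capabilities(id)  -- hierarchy
);
CREATE TABLE company_capability (
    company_id INTEGER NOT NULL REFERENCES companies(id),
    capability_id INTEGER NOT NULL REFERENCES capabilities(id),
    confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    PRIMARY KEY (company_id, capability_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?)", [
        (1, "Example Prime Corp", "AAAA1111BBBB"),
        (2, "Sample Subsystems LLC", "CCCC2222DDDD"),
    ])
    conn.executemany("INSERT INTO awards VALUES (?, ?, ?, ?)", [
        (10, 1, 5000000.0, None),   # prime award
        (11, 2, 750000.0, 10),      # subaward under award 10
    ])
    conn.executemany("INSERT INTO capabilities VALUES (?, ?, ?)", [
        (100, "Enablers", None), (101, "Autonomy", 100),
    ])
    conn.executemany("INSERT INTO company_capability VALUES (?, ?, ?)", [
        (2, 101, 0.9), (1, 101, 0.4),
    ])
    return conn


def subcontractors_of(conn: sqlite3.Connection,
                      prime_company_id: int) -> List[Tuple[str, float]]:
    """Companies holding subawards under a prime's awards, with totals."""
    return conn.execute(
        """
        SELECT sub_co.name, SUM(sub.amount)
        FROM awards AS sub
        JOIN awards AS prime ON sub.prime_award_id = prime.id
        JOIN companies AS sub_co ON sub_co.id = sub.company_id
        WHERE prime.company_id = ?
        GROUP BY sub_co.id
        ORDER BY SUM(sub.amount) DESC
        """,
        (prime_company_id,),
    ).fetchall()


def companies_with_capability(conn: sqlite3.Connection, capability: str,
                              min_confidence: float = 0.5) -> List[str]:
    """Company names mapped to a capability at or above a confidence floor."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM company_capability AS cc
        JOIN companies AS c ON c.id = cc.company_id
        JOIN capabilities AS cap ON cap.id = cc.capability_id
        WHERE cap.name = ? AND cc.confidence >= ?
        ORDER BY cc.confidence DESC
        """,
        (capability, min_confidence),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(subcontractors_of(db, 1))
    print(companies_with_capability(db, "Autonomy"))
```

Storing the confidence on the join row lets a query choose its own threshold, so a low-confidence mapping can stay in the database without showing up in stricter views.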

Extended Tables

  • solicitations — SAM.gov demand signals
  • sec_filings — Structured 10-K/10-Q intelligence
  • patents — USPTO data via PatentsView
  • lobbying_records — OpenSecrets lobbying activity
  • legislation — Defense bills and appropriations
  • persons / person_company_roles — Executive tracking
  • world_events — Geopolitical event signals

Capability Taxonomy

Domains: Land, Maritime, Air, Space, Cyber, Cognitive, Electromagnetic

Functions: ISR, Strike, Protection, Mobility, C2, Sustainment, CBRN

Enablers: Autonomy, Connectivity, Computing, Energy, Manufacturing, Human Performance, Materials

Emerging: Swarm Intelligence, Bio-Integration, Cognitive Security, Space Dominance, Hypersonics, Autonomous Logistics, Gray Zone Tech
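
As a rough illustration, the top-level groups listed above fit in a plain mapping with a reverse index for lookups. The `TAXONOMY` and `category_of` names are hypothetical, and this sketch covers only the top-level entries named here, not the full set of 200+ tracked capabilities.

```python
from typing import Dict, List, Optional

TAXONOMY: Dict[str, List[str]] = {
    "Domains": ["Land", "Maritime", "Air", "Space", "Cyber",
                "Cognitive", "Electromagnetic"],
    "Functions": ["ISR", "Strike", "Protection", "Mobility", "C2",
                  "Sustainment", "CBRN"],
    "Enablers": ["Autonomy", "Connectivity", "Computing", "Energy",
                 "Manufacturing", "Human Performance", "Materials"],
    "Emerging": ["Swarm Intelligence", "Bio-Integration", "Cognitive Security",
                 "Space Dominance", "Hypersonics", "Autonomous Logistics",
                 "Gray Zone Tech"],
}

# Reverse index: lowercase capability name -> group it belongs to.
INDEX: Dict[str, str] = {
    name.lower(): group for group, names in TAXONOMY.items() for name in names
}


def category_of(capability: str) -> Optional[str]:
    """Return the taxonomy group for a capability name, or None if unknown."""
    return INDEX.get(capability.strip().lower())


if __name__ == "__main__":
    print(category_of("Cyber"))
    print(category_of("hypersonics"))
```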

Project Structure

defense-intel-platform/
├── src/
│   ├── api/              # FastAPI routes (42 endpoints)
│   ├── agents/           # LLM agents (Analyst, Tachyon, Strategist, Boardroom)
│   ├── ingestion/        # Data scrapers and API clients
│   ├── models/           # SQLAlchemy models (20+ tables)
│   ├── processing/       # Document parsing, entity extraction
│   └── visualization/    # ORBAT data generation
├── frontend/             # React + TypeScript dashboard
├── scripts/              # Import, enrich, backfill, analysis scripts
├── alembic/              # Database migrations
├── mcp_server/           # MCP server for Claude Desktop
├── docker-compose.yml
├── Dockerfile
└── requirements.txt

Getting Started

git clone https://github.com/MikeSchaef/defense-intel-platform.git
cd defense-intel-platform
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys and database URL
alembic upgrade head
uvicorn src.api.main:app --reload

Author

Built by Mike Schaefer
