Not a UFO blog. An AI investigation system.
ARGUS is a full-stack data science platform that analyzes 79,621 publicly reported sightings from the National UFO Reporting Center (NUFORC), runs them through an NLP pipeline, and surfaces patterns humans miss at scale, including bias corrections, per-capita normalization, behavioral clustering, and credibility scoring.
| Finding | Stat | Method |
|---|---|---|
| Night dominance | 61% of sightings occur 8pm–2am | Computed from hourly breakdown across all 79,621 records |
| 2012 spike is reporting bias | Flattens when normalized | Corrected using World Bank internet penetration data |
| True hotspot | Washington State #1 at 58.6/100k | Sightings per 100,000 residents, using 2010 US Census state populations |
| Hardest to explain | 77 cases combine silent flight + instant acceleration | Rule-based NLP across full corpus |
| Shape shift | Triangle reports 3× more common since 1980s | Computed from decade-level shape breakdown |
| Physics language | 207 reports use explicit violation terms | Keyword extraction from raw witness text |
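As an illustration of how a figure like the night-dominance share could be computed from an hourly breakdown, here is a minimal Python sketch. The timestamp format, the `NIGHT_HOURS` set, and the `night_share` helper are assumptions made for this example, not the project's actual code, and the sample rows are toy data rather than NUFORC records.

```python
from datetime import datetime

# Hours treated as the 8pm-2am window: 20:00 through 01:59 (assumed boundary handling).
NIGHT_HOURS = {20, 21, 22, 23, 0, 1}

def night_share(timestamps):
    """Return the fraction of sightings whose hour falls inside NIGHT_HOURS."""
    hours = [datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps]
    if not hours:
        return 0.0
    return sum(h in NIGHT_HOURS for h in hours) / len(hours)

# Toy input, not real NUFORC records.
sample = [
    "2012-07-04 21:30",
    "2012-07-04 23:05",
    "2012-07-05 01:15",
    "2012-07-05 14:00",
]
print(f"{night_share(sample):.0%}")  # 75%
```

Whether 2:00am itself counts as "night" is a boundary choice; the sketch excludes it.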
NUFORC Raw Data (79,621 reports)
↓
COLLECT — ingest from public NUFORC database
↓
EXTRACT — rule-based NLP: shape, speed, movement, duration,
military context, physics-violation language
↓
ANALYZE — bias correction, per-capita normalization,
temporal and geographic pattern detection
↓
CLUSTER — behavioral grouping: 6 clusters by signature
(silent+instant, high-altitude hover, formation,
military proximity, EM interference, trace evidence)
↓
SCORE — multi-factor credibility ranking: radar confirmation,
witness type, corroboration, physical traces
↓
VISUALIZE — interactive global map, research dashboard,
live anomaly feed, AI analyst
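The SCORE stage names four factors but not how they are weighted. The sketch below shows one simple way a multi-factor ranking could work: an additive score over boolean flags. The weights, the flag names, and the reduction of "witness type" to a single `trained_witness` flag are all hypothetical and chosen only for illustration.

```python
# Hypothetical integer weights (sum to 100); the project's real weights are not documented here.
WEIGHTS = {
    "radar_confirmation": 40,
    "trained_witness": 25,   # assumed stand-in for "witness type", e.g. pilot or officer
    "corroboration": 20,
    "physical_trace": 15,
}

def credibility_score(flags):
    """Sum the weights of every factor that is present in the flags dict."""
    return sum(weight for name, weight in WEIGHTS.items() if flags.get(name))

def rank(incidents):
    """Sort incidents from highest to lowest score."""
    return sorted(incidents, key=lambda inc: credibility_score(inc["flags"]), reverse=True)

# Toy incidents, not taken from the dataset.
incidents = [
    {"id": "a", "flags": {"corroboration": True}},
    {"id": "b", "flags": {"radar_confirmation": True, "corroboration": True}},
]
print([inc["id"] for inc in rank(incidents)])  # ['b', 'a']
```

An additive score keeps each factor's contribution visible, which fits the auditable, rule-based approach used elsewhere in the pipeline.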
| Page | What it shows |
|---|---|
| Mission Overview | Live command center — global map, anomaly feed, headline stats |
| Collect | Data sources: NUFORC, Pentagon files, pilot testimonies, radar logs |
| Extract | NLP pipeline output — extracted fields, 77 anomalous cases, witness cards |
| Analyze | Core research dashboard — year chart with bias toggle, per-capita state ranking, 8 key questions answered with data |
| Cluster | 6 behavioral clusters with counts and confidence scores |
| Score | Top-ranked incidents by multi-factor credibility model |
| Visualize | Interactive global heat map — filterable by shape, year, severity |
| Insights | 6 key stats computed from raw data, nothing fabricated |
| Live Feed | Real-time anomaly stream — filter by severity, shape, behavior tag |
| AI Analyst | Bring your own OpenAI key — query the dataset in plain English |
Raw NUFORC counts are heavily inflated post-2010 due to smartphone adoption. We normalize using World Bank internet penetration data:
correction_factor = baseline_internet_pct / year_internet_pct
corrected_count = raw_count * correction_factor

This deflates the 2012 spike from ~8,000 reports to a much flatter trend that is more representative of actual sighting frequency.
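A minimal Python sketch of the correction formula above, applied per year. The choice of baseline year is an assumption (the formula only requires some fixed `baseline_internet_pct`), and the numbers are illustrative rather than real NUFORC or World Bank values.

```python
def correct_counts(raw_counts, internet_pct, baseline_year):
    """Scale each year's raw count by baseline_internet_pct / year_internet_pct."""
    baseline_pct = internet_pct[baseline_year]
    corrected = {}
    for year, raw in raw_counts.items():
        correction_factor = baseline_pct / internet_pct[year]
        corrected[year] = raw * correction_factor
    return corrected

# Illustrative numbers only.
raw_counts = {2000: 2000, 2012: 8000}
internet_pct = {2000: 40.0, 2012: 80.0}

corrected = correct_counts(raw_counts, internet_pct, baseline_year=2000)
print(corrected)  # {2000: 2000.0, 2012: 4000.0}
```

In this toy case internet penetration doubled between the two years, so the 2012 count is halved. Years with zero or missing penetration data would need separate handling, which the sketch does not attempt.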
State-level counts are normalized using 2010 US Census population data. California drops significantly; Washington State emerges as the true #1 at 58.6 sightings per 100,000 people.
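The per-capita step can be sketched as follows. The `per_100k` helper and the state labels are hypothetical, and the figures are toy values chosen to show how a state with fewer raw reports can still rank first.

```python
def per_100k(sightings_by_state, population_by_state):
    """Return (state, sightings per 100,000 residents) pairs, highest rate first."""
    rates = {
        state: count * 100_000 / population_by_state[state]
        for state, count in sightings_by_state.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Toy figures, not the 2010 Census or NUFORC values.
sightings = {"State A": 9000, "State B": 4000}
population = {"State A": 36_000_000, "State B": 8_000_000}

ranking = per_100k(sightings, population)
print(ranking)  # [('State B', 50.0), ('State A', 25.0)]
```

State A has more than twice the raw reports but half the rate, which is the same effect that moves California down and Washington up in the real ranking.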
Rule-based regex matching across all 79,621 raw report texts — no external API, no LLM hallucinations, fully auditable:
- Shape extraction: light, triangle, disc, fireball, sphere, cylinder
- Behavior flags: silent flight, instant acceleration, hovering, formation
- Context flags: military mention, radar reference, multiple witnesses, physical trace
- Physics-violation language: 207 reports flagged
- NUFORC is self-reported — no independent verification of individual accounts
- 81% US-centric — international coverage is sparse and inconsistent
- Reporting bias increases with internet access — pre-1990 data is underrepresented
- NLP is rule-based — complex descriptions may be missed or miscategorized
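To make the rule-based approach and its limits concrete, here is a small Python sketch of regex extraction for shape and behavior flags. The patterns, flag names, and the `extract` and `is_anomalous` helpers are assumptions written for this example; the project's actual rules may differ.

```python
import re

# Shape vocabulary taken from the list above; the first match in the text wins.
SHAPES = ["light", "triangle", "disc", "fireball", "sphere", "cylinder"]
SHAPE_PATTERN = re.compile(r"\b(" + "|".join(SHAPES) + r")s?\b", re.IGNORECASE)

# Hypothetical behavior patterns, one per flag.
BEHAVIOR_PATTERNS = {
    "silent_flight": re.compile(r"\b(silent(ly)?|no sound|noiseless)\b", re.IGNORECASE),
    "instant_acceleration": re.compile(
        r"\b(instant(ly)?|sudden(ly)?) (accelerat\w+|shot|sped)\b", re.IGNORECASE
    ),
    "hovering": re.compile(r"\bhover(ed|ing|s)?\b", re.IGNORECASE),
    "formation": re.compile(r"\bformation\b", re.IGNORECASE),
}

def extract(text):
    """Return the first matching shape (or None) and a sorted list of behavior flags."""
    shape_match = SHAPE_PATTERN.search(text)
    flags = sorted(name for name, pat in BEHAVIOR_PATTERNS.items() if pat.search(text))
    return {"shape": shape_match.group(1).lower() if shape_match else None, "flags": flags}

def is_anomalous(record):
    """Flag records that combine silent flight with instant acceleration."""
    return {"silent_flight", "instant_acceleration"} <= set(record["flags"])

report = "A silent triangle hovered over the field, then instantly accelerated out of sight."
record = extract(report)
print(record["shape"], record["flags"], is_anomalous(record))
```

A phrasing such as "it made absolutely no noise" would slip past these patterns, which is the kind of miss the last limitation describes.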
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, TypeScript, Recharts |
| Styling | Inline styles, dark military aesthetic |
| Data pipeline | Python 3 — Pandas, Pillow, regex NLP |
| Data format | Static JSON served from public/data/ |
| AI Analyst | OpenAI API (user-supplied key, no server-side key required) |
| Deployment | Vercel |
git clone https://github.com/Mugeshgithub/Argus_UFO_AI_Data
cd Argus_UFO_AI_Data
npm install
npm run dev

No environment variables required. All data is pre-computed and shipped as static JSON.
The AI Analyst tab requires your own OpenAI API key — enter it at runtime via the sidebar. Nothing is stored server-side.
| Source | Records | Used For |
|---|---|---|
| NUFORC — National UFO Reporting Center | 79,621 reports | Primary sighting database (1941–2014) |
| World Bank — Internet users (% of population) | Yearly 1990–2024 | Bias correction for reporting inflation |
| US Census 2010 — State population estimates | 50 states | Per-capita normalization |
| Wikipedia — Pentagon UAP disclosures | Reference | Context for high-credibility cases |
| FAA — Airspace & pilot report conventions | Reference | Military proximity and radar flag context |
MIT — open source, use freely, attribution appreciated.
Data sourced from NUFORC public database. This project makes no claims about the nature of reported phenomena.
