IntelliNews is a full-stack news intelligence platform. It continuously polls RSS feeds, analyses each item with LLMs, optionally geocodes the location, stores results in PostgreSQL, and exposes read-only APIs plus React map/admin UIs with clustered markers, filters, link lines, and analytics.
- LLM-assisted news enrichment pipeline with provider abstraction for local and hosted models.
- PostgreSQL-backed event assignment, consolidation, contradiction analysis, and analytics.
- FastAPI read APIs and admin endpoints for moderation, operations, billing, and observability.
- React/Vite map and admin UI with Playwright coverage for critical map layouts.
- Documented fast check path that combines backend smoke checks, pytest, and frontend build.
All Python commands must run inside conda env: env7.
- Pipeline: `/home/jupiter/miniconda3/envs/env7/bin/python main.py`
- API: `/home/jupiter/miniconda3/envs/env7/bin/python -m app.api.server`
- Frontend: `cd web && npm install && npm run dev -- --host`
Database:
- PostgreSQL only (`DATABASE_URL`, optional `DATABASE_URL_API`, `DATABASE_URL_PIPELINE`)
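
One way the three variables above could fit together is sketched below. The fallback from the role-specific variable to `DATABASE_URL` is an assumption, not confirmed behaviour of the application, and `resolve_database_url` is a hypothetical helper name.

```python
import os


def resolve_database_url(role, env=None):
    """Pick the connection URL for a role ("api" or "pipeline").

    Assumption: DATABASE_URL_<ROLE> overrides DATABASE_URL when set.
    """
    env = os.environ if env is None else env
    url = env.get(f"DATABASE_URL_{role.upper()}") or env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    # The project is PostgreSQL-only, so reject any other URL scheme early.
    if not url.startswith(("postgresql://", "postgres://", "postgresql+")):
        raise ValueError("only PostgreSQL URLs are supported")
    return url
```

With only `DATABASE_URL` set, both roles would resolve to the same URL under this sketch.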
Single entrypoint:
`PYTHONNOUSERSITE=1 TOOLS_CHECK_MONITOR=1 TOOLS_CHECK_MONITOR_STALL_S=180 TOOLS_CHECK_MONITOR_KILL_ON_STALL=1 TOOLS_CHECK_PYTEST_WORKERS=auto TOOLS_CHECK_PYTEST_DIST=loadscope TOOLS_CHECK_EXTENDED=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.check`
Use this full command (with env vars) for the full test run; it is the documented fast path for the authoritative check.
It runs smoke imports, pytest, and `npm run build` if `web/` exists.
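
The step order described above can be illustrated with a small planner. This is a sketch only, not the code in `tools.check`: the function name `plan_check_steps` and the exact smoke-import command (`import app`) are assumptions.

```python
from pathlib import Path


def plan_check_steps(repo_root, python="python"):
    """Return (argv, working_dir) pairs in the order the check would run them."""
    root = Path(repo_root)
    steps = [
        ([python, "-c", "import app"], root),  # smoke import (assumed form)
        ([python, "-m", "pytest"], root),      # backend test suite
    ]
    web = root / "web"
    if web.is_dir():
        # The frontend build is only planned when web/ exists.
        steps.append((["npm", "run", "build"], web))
    return steps
```

Each pair could then be passed to `subprocess.run(argv, cwd=working_dir, check=True)`.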
- Project map: `docs/PROJECT_MAP.md`
- System overview: `docs/ABOUT.md`
- Development workflow: `docs/DEVELOPMENT.md`
- Agent instructions: `AGENTS.md`
- Pipeline: `app/pipeline.py`
- LLM analysis schema: `app/llm/analyser.py`
- Link analysis: `app/links/analyser.py`
- API server: `app/api/server.py`
Pipeline:
Article -> Gate 1 (Storyline LLM) -> Gate 2 (Incident Matcher) -> Gate 3 (Two-Pass Consolidation) -> Gate 3.5 (Deterministic Family Consolidation) -> Gate 4 (Selective LLM, optional) -> Gate 5 (Global Reclusterer, optional)
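
The gate order above can be written down as data, which makes the optional stages explicit. This is a minimal sketch, not how `app/pipeline.py` is implemented; the `Gate` dataclass and `active_gates` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    label: str
    optional: bool = False


# Order and optional markers copied from the flow line above.
GATES = [
    Gate("1", "Storyline LLM"),
    Gate("2", "Incident Matcher"),
    Gate("3", "Two-Pass Consolidation"),
    Gate("3.5", "Deterministic Family Consolidation"),
    Gate("4", "Selective LLM", optional=True),
    Gate("5", "Global Reclusterer", optional=True),
]


def active_gates(enabled_optional=frozenset()):
    """Keep mandatory gates plus any optional gate that was switched on."""
    return [g for g in GATES if not g.optional or g.name in enabled_optional]
```

For example, `active_gates({"4"})` would keep Gates 1 through 4 and leave Gate 5 out.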
Key components:
| Component | File | Description |
|---|---|---|
| Gate 1 | `app/events/builder/storyline.py` | Storyline keying, shortlist linking, affinity lookup |
| Gate 2 | `app/events/builder/match/api.py` | Article-to-event assignment with anti-merge guards |
| Gate 3 | `app/events/consolidation/runner.py` | Two-pass deterministic consolidation |
| Gate 3.5 | `app/events/consolidation/family_consolidator.py` | Deterministic same-canonical-family merge with blacklist and size guard |
| Gate 4 | `app/events/consolidation/selective_llm_merger.py` | Optional LLM verify for top umbrella cross-family event pairs |
| Gate 5 | `app/events/consolidation/global_reclusterer.py` | Optional article-level reclustering behind feature flag |
| Entity norm | `app/events/consolidation/entity_normalizer.py` | Entity normalization for classifier features |
| Family build | `tools/scripts/build_canonical_families.py` | Canonical families and umbrella hierarchy |
Current architecture decisions:
- `Gate 1` stays enabled in the practical stack.
- `Gate 2 attach_verify` is not part of the accepted default.
- `Gate 3.5` exists and is production-available, but broad Iran families are blacklisted because a full-data audit showed dangerous same-family contamination.
- `Gate 4 selective LLM` exists and is additive, not gatekeeping.
- `Gate 5 global recluster` remains experimental and is not part of the accepted production ceiling.
Performance:
- latest measured dry-run consolidation benchmark on the `War`, `168h` window:
  - command: `EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_FAMILY_CONSOLIDATION=1 EVENT_CONSOLIDATION_SELECTIVE_LLM=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation`
  - wall clock: `14.71s`
- latest dry-run result on that window:
  - Gate 3 merge plan: `6` pairs
  - Gate 3.5 family merges: `0`
  - selective LLM candidates: `0`
Registry support paths:
- full population: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry`
- canonical families: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families`
- consolidation dry-run: `EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation`
- registry health: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health`
- detailed research log: `docs/research/EVENT_ASSIGNMENT_GOLD_CYCLE_LOG_2026-03-08.md`
Full gold compare metrics (495 articles, 343 gold events):
| Configuration | false_splits | false_merges | exact_matches |
|---|---|---|---|
| Baseline (`20260308T123308Z_manual_gold_compare`) | 1357 | 0 | 248 |
| Accepted production ceiling (`20260309T034357Z_two_pass_negative_pairs_full_v1`) | 714 | 125 | 248 |
| Final experimental full stack (`20260310T142951Z_new_stack_full_v1`) | 541 | 245 | 242 |
| Active production default | 714 | 125 | 248 |
Interpretation:
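- the final experimental stack improved recall materially (false_splits fell from 714 to 541 relative to the accepted ceiling)
- but contamination rose too far above the accepted safety ceiling (false_merges rose from 125 to 245, and exact_matches slipped from 248 to 242)
- therefore the accepted production default did not change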
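
The trade-off behind this interpretation can be checked with simple arithmetic on the gold compare table. The snippet below is only a worked calculation; the dictionary keys and the `delta` helper are illustrative names, while the numbers are copied from the table.

```python
RUNS = {
    "baseline": {"false_splits": 1357, "false_merges": 0, "exact_matches": 248},
    "accepted_ceiling": {"false_splits": 714, "false_merges": 125, "exact_matches": 248},
    "experimental_full_stack": {"false_splits": 541, "false_merges": 245, "exact_matches": 242},
}


def delta(run, reference="baseline"):
    """Per-metric change of `run` relative to `reference` (negative means fewer)."""
    return {k: RUNS[run][k] - RUNS[reference][k] for k in RUNS[run]}


# Experimental stack vs accepted ceiling: fewer splits, but far more merges.
print(delta("experimental_full_stack", "accepted_ceiling"))
# {'false_splits': -173, 'false_merges': 120, 'exact_matches': -6}
```

Relative to the accepted ceiling, the experimental stack removes 173 false splits at the cost of 120 additional false merges and 6 fewer exact matches.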
Current bottleneck:
- the dominant failure mode is upstream canonical-family purity, especially in the Iran umbrella space
- downstream threshold sweeps and extra merge stages no longer move the safe ceiling enough
Current evidence after the final replay cycle:
- `495` gold articles map to `343` gold events
- average density is only `1.44` articles per gold event
- a single threshold does not solve both regimes
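
The density figure also gives a lower bound on how many gold events must be singletons. The calculation below is a worked bound, not a measurement from the gold set: if every non-singleton event holds at least two articles, then `articles >= s + 2 * (events - s)`, so `s >= 2 * events - articles`. The helper name `min_singletons` is illustrative.

```python
def min_singletons(articles, events):
    """Smallest possible number of one-article events for the given totals."""
    # Each non-singleton event needs at least 2 articles, so
    # articles >= s + 2 * (events - s), which rearranges to s >= 2 * events - articles.
    return max(0, 2 * events - articles)


density = 495 / 343               # about 1.44 articles per gold event
bound = min_singletons(495, 343)  # 191
print(round(density, 2), bound, round(bound / 343, 3))
# 1.44 191 0.557
```

So at least 191 of the 343 gold events (about 56%) contain a single article, whatever the exact distribution is.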
Confirmed lessons:
- threshold sweeps do not fix broken upstream topology
- event-level classifier improvements help recall, but purity collapses without clean family partitioning
- full Gate 2 LLM gating is too expensive and hurts recall
- selective LLM as a post-step is operationally viable, but it did not become the dominant lever in the final stack
- deterministic same-family rules are useful only when canonical families are already pure
Next serious direction:
- improve canonical-family purity upstream before consolidation
- treat broad Iran umbrella families as partitioning problems, not threshold problems
- Refresh storyline registry: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry`
- Refresh canonical families from the gold standard: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families`
- Consolidation dry-run: `EVENT_CONSOLIDATION_DRY_RUN=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation`
- Consolidation apply: `EVENT_CONSOLIDATION_APPLY=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_MAX_MERGES=8 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation`
- Registry and consolidation health: `/home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health`
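
When the consolidation commands above are scripted rather than typed, the environment variables can be assembled in Python. This is a sketch under assumptions: `consolidation_command` is a hypothetical wrapper, and only the variable names, interpreter path, and module name come from the commands listed above.

```python
import os

PYTHON = "/home/jupiter/miniconda3/envs/env7/bin/python"


def consolidation_command(topic, since_hours, apply=False, max_merges=None, base_env=None):
    """Build (argv, env) for a dry-run or apply consolidation run."""
    env = dict(os.environ if base_env is None else base_env)
    env["EVENT_CONSOLIDATION_TOPIC"] = topic
    env["EVENT_CONSOLIDATION_SINCE_HOURS"] = str(since_hours)
    if apply:
        env["EVENT_CONSOLIDATION_APPLY"] = "1"
        if max_merges is not None:
            # Cap the number of merges applied in one run.
            env["EVENT_CONSOLIDATION_MAX_MERGES"] = str(max_merges)
    else:
        env["EVENT_CONSOLIDATION_DRY_RUN"] = "1"
    argv = [PYTHON, "-m", "tools.scripts.run_event_consolidation"]
    return argv, env
```

The result could be executed with `subprocess.run(argv, env=env, check=True)`. Keeping dry-run as the default means an apply run has to be requested explicitly.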
- storyline registry and canonical-family infrastructure are deployed
- replay tooling and consolidation health tooling are working
- the accepted production compromise remains the deterministic two-pass negative-pairs stack
- `Gate 3.5`, selective LLM, and global recluster exist in code, but the final replay series did not justify switching the production default
- `Gate 5` global recluster remains experimental and flag-gated