Andrem19/IntelliNews

IntelliNews — RSS → LLM → Geo → Links → API → Map/Admin UI

What it does

IntelliNews is a full-stack news intelligence platform. It continuously polls RSS feeds, analyses each item with LLMs, optionally geocodes the location, stores results in PostgreSQL, and exposes read-only APIs plus React map/admin UIs with clustered markers, filters, link lines, and analytics.

Portfolio highlights

  • LLM-assisted news enrichment pipeline with provider abstraction for local and hosted models.
  • PostgreSQL-backed event assignment, consolidation, contradiction analysis, and analytics.
  • FastAPI read APIs and admin endpoints for moderation, operations, billing, and observability.
  • React/Vite map and admin UI with Playwright coverage for critical map layouts.
  • Documented fast check path that combines backend smoke checks, pytest, and frontend build.

Environment

Run all Python commands inside the conda environment env7.

Quickstart (current)

  • Pipeline: /home/jupiter/miniconda3/envs/env7/bin/python main.py
  • API: /home/jupiter/miniconda3/envs/env7/bin/python -m app.api.server
  • Frontend: cd web && npm install && npm run dev -- --host

Database:

  • PostgreSQL only (DATABASE_URL, optional DATABASE_URL_API, DATABASE_URL_PIPELINE)
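
A minimal sketch of how these three variables might be resolved per process. The fallback rule (a role-specific URL wins, otherwise DATABASE_URL is used) and the helper name resolve_database_url are assumptions for illustration; the repository's actual resolution logic is not shown in this README.

```python
import os
from typing import Mapping, Optional

# Role-specific variables named in this README; the role keys are hypothetical.
ROLE_VARS = {"api": "DATABASE_URL_API", "pipeline": "DATABASE_URL_PIPELINE"}


def resolve_database_url(role: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the role-specific URL if set, else DATABASE_URL (assumed fallback)."""
    env = os.environ if env is None else env
    var = ROLE_VARS.get(role)
    specific = env.get(var, "") if var else ""
    url = specific or env.get("DATABASE_URL", "")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    # The project is PostgreSQL only, so reject other schemes early.
    if not url.startswith(("postgresql://", "postgres://", "postgresql+")):
        raise RuntimeError("only PostgreSQL URLs are supported")
    return url
```

With only DATABASE_URL set, both roles would share one connection string under this assumption; setting DATABASE_URL_API would redirect just the API process.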

Testing

Single entrypoint:

  • PYTHONNOUSERSITE=1 TOOLS_CHECK_MONITOR=1 TOOLS_CHECK_MONITOR_STALL_S=180 TOOLS_CHECK_MONITOR_KILL_ON_STALL=1 TOOLS_CHECK_PYTEST_WORKERS=auto TOOLS_CHECK_PYTEST_DIST=loadscope TOOLS_CHECK_EXTENDED=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.check

Use this full command, including the environment variables, for the full test run; it is the documented fast path for the authoritative check.

It runs smoke imports, then pytest, then npm run build if web/ exists.
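
The step selection described above can be sketched as follows. This is not the code of tools.check; the function name plan_check_steps and the step labels are hypothetical, and only the "build the frontend if web/ exists" rule comes from this README.

```python
from pathlib import Path


def plan_check_steps(repo_root: Path) -> list[str]:
    """List the check steps in order; the frontend build is conditional."""
    steps = ["smoke imports", "pytest"]
    # The frontend build is only planned when a web/ directory is present.
    if (repo_root / "web").is_dir():
        steps.append("npm run build")
    return steps
```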

Docs

  • Project map: docs/PROJECT_MAP.md
  • System overview: docs/ABOUT.md
  • Development workflow: docs/DEVELOPMENT.md
  • Agent instructions: AGENTS.md

Repo notes

  • Pipeline: app/pipeline.py
  • LLM analysis schema: app/llm/analyser.py
  • Link analysis: app/links/analyser.py
  • API server: app/api/server.py

Event Assignment Architecture (Final)

Pipeline: Article -> Gate 1 (Storyline LLM) -> Gate 2 (Incident Matcher) -> Gate 3 (Two-Pass Consolidation) -> Gate 3.5 (Deterministic Family Consolidation) -> Gate 4 (Selective LLM, optional) -> Gate 5 (Global Reclusterer, optional)

Key components:

| Component | File | Description |
| --- | --- | --- |
| Gate 1 | app/events/builder/storyline.py | Storyline keying, shortlist linking, affinity lookup |
| Gate 2 | app/events/builder/match/api.py | Article-to-event assignment with anti-merge guards |
| Gate 3 | app/events/consolidation/runner.py | Two-pass deterministic consolidation |
| Gate 3.5 | app/events/consolidation/family_consolidator.py | Deterministic same-canonical-family merge with blacklist and size guard |
| Gate 4 | app/events/consolidation/selective_llm_merger.py | Optional LLM verify for top umbrella cross-family event pairs |
| Gate 5 | app/events/consolidation/global_reclusterer.py | Optional article-level reclustering behind feature flag |
| Entity norm | app/events/consolidation/entity_normalizer.py | Entity normalization for classifier features |
| Family build | tools/scripts/build_canonical_families.py | Canonical families and umbrella hierarchy |

Current architecture decisions:

  • Gate 1 stays enabled in the practical stack.
  • Gate 2 attach_verify is not part of the accepted default.
  • Gate 3.5 exists and is production-available, but broad Iran families are blacklisted because a full-data audit showed dangerous same-family contamination.
  • Gate 4 selective LLM exists and is additive, not gatekeeping.
  • Gate 5 global recluster remains experimental and is not part of the accepted production ceiling.
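
As a rough illustration of the decisions above, the sketch below derives the ordered list of active gates from three switches. The GateFlags class, its field names, and the off-by-default values are assumptions made for this example; the real code reads its own configuration (for instance the EVENT_CONSOLIDATION_FAMILY_CONSOLIDATION and EVENT_CONSOLIDATION_SELECTIVE_LLM variables used in the benchmark command).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateFlags:
    """Hypothetical switches for the optional gates (all assumed off by default)."""
    family_consolidation: bool = False  # Gate 3.5
    selective_llm: bool = False         # Gate 4
    global_recluster: bool = False      # Gate 5, experimental


def active_gates(flags: GateFlags) -> list[str]:
    """Return gate names in pipeline order; Gates 1 to 3 are always present."""
    gates = [
        "Gate 1 (Storyline LLM)",
        "Gate 2 (Incident Matcher)",
        "Gate 3 (Two-Pass Consolidation)",
    ]
    if flags.family_consolidation:
        gates.append("Gate 3.5 (Deterministic Family Consolidation)")
    if flags.selective_llm:
        # Additive post-step: it proposes extra merges and never blocks earlier gates.
        gates.append("Gate 4 (Selective LLM)")
    if flags.global_recluster:
        gates.append("Gate 5 (Global Reclusterer)")
    return gates
```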

Performance:

  • latest measured dry-run consolidation benchmark on War, 168h window:
    • command: EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_FAMILY_CONSOLIDATION=1 EVENT_CONSOLIDATION_SELECTIVE_LLM=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
    • wall clock: 14.71s
  • latest dry-run result on that window:
    • Gate 3 merge plan: 6 pairs
    • Gate 3.5 family merges: 0
    • selective LLM candidates: 0

Registry support paths:

  • full population: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry
  • canonical families: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families
  • consolidation dry-run: EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
  • registry health: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health
  • detailed research log: docs/research/EVENT_ASSIGNMENT_GOLD_CYCLE_LOG_2026-03-08.md

Final Metrics

Full gold compare metrics (495 articles, 343 gold events):

| Configuration | false_splits | false_merges | exact_matches |
| --- | --- | --- | --- |
| Baseline (20260308T123308Z_manual_gold_compare) | 1357 | 0 | 248 |
| Accepted production ceiling (20260309T034357Z_two_pass_negative_pairs_full_v1) | 714 | 125 | 248 |
| Final experimental full stack (20260310T142951Z_new_stack_full_v1) | 541 | 245 | 242 |
| Active production default | 714 | 125 | 248 |

Interpretation:

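  • the final experimental stack improved recall materially: false_splits fell from 714 to 541 relative to the accepted production ceiling
  • but contamination rose too far above the accepted safety ceiling: false_merges rose from 125 to 245, and exact_matches slipped from 248 to 242
  • therefore the accepted production default did not change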
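
The trade-off can be recomputed directly from the metrics table. The snippet below is only a convenience sketch; the short keys (baseline, accepted, experimental) are labels chosen here, while the numbers are copied from the table.

```python
# Metrics copied from the gold compare table (495 articles, 343 gold events).
RUNS = {
    "baseline": {"false_splits": 1357, "false_merges": 0, "exact_matches": 248},
    "accepted": {"false_splits": 714, "false_merges": 125, "exact_matches": 248},
    "experimental": {"false_splits": 541, "false_merges": 245, "exact_matches": 242},
}


def delta(before: str, after: str) -> dict[str, int]:
    """Per-metric change when moving from one configuration to another."""
    return {k: RUNS[after][k] - RUNS[before][k] for k in RUNS[before]}


print(delta("baseline", "accepted"))
# {'false_splits': -643, 'false_merges': 125, 'exact_matches': 0}
print(delta("accepted", "experimental"))
# {'false_splits': -173, 'false_merges': 120, 'exact_matches': -6}
```

Moving from the accepted ceiling to the experimental stack removes 173 false splits at the cost of 120 additional false merges and 6 fewer exact matches.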

Current bottleneck:

  • the dominant failure mode is upstream canonical-family purity, especially in the Iran umbrella space
  • downstream threshold sweeps and extra merge stages no longer move the safe ceiling enough

Structural Ceiling Analysis

Current evidence after the final replay cycle:

  • 495 gold articles map to 343 gold events
  • average density is only 1.44 articles per gold event
  • broad umbrella wars need aggressive merging, while most events are singletons or pairs
  • a single threshold does not solve both regimes
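
The density figure follows from the two counts above. The second quantity is an observation added here rather than a reported metric: any clustering of 495 articles into 343 events contains exactly 495 - 343 article-to-event attachments beyond each event's first article.

```python
GOLD_ARTICLES = 495
GOLD_EVENTS = 343

# Average number of articles per gold event.
density = GOLD_ARTICLES / GOLD_EVENTS
print(round(density, 2))  # 1.44

# Articles that join an already-started event: sum of (event size - 1) over events.
extra_attachments = GOLD_ARTICLES - GOLD_EVENTS
print(extra_attachments)  # 152
```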

Confirmed lessons:

  • threshold sweeps do not fix broken upstream topology
  • event-level classifier improvements help recall, but purity collapses without clean family partitioning
  • full Gate 2 LLM gating is too expensive and hurts recall
  • selective LLM as a post-step is operationally viable, but it did not become the dominant lever in the final stack
  • deterministic same-family rules are useful only when canonical families are already pure

Next serious direction:

  1. improve canonical-family purity upstream before consolidation
  2. treat broad Iran umbrella families as partitioning problems, not threshold problems

Production Commands

  • Refresh storyline registry:
    • /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry
  • Refresh canonical families from the gold standard:
    • /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families
  • Consolidation dry-run:
    • EVENT_CONSOLIDATION_DRY_RUN=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
  • Consolidation apply:
    • EVENT_CONSOLIDATION_APPLY=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_MAX_MERGES=8 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
  • Registry and consolidation health:
    • /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health
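
For scripted use, the dry-run and apply invocations above can be assembled programmatically. This is a sketch under assumptions: the helper name consolidation_invocation is hypothetical, and sys.executable stands in for the env7 interpreter path, which only holds if the calling script itself runs inside env7. The environment variable names are the ones listed in the commands above.

```python
import os
import sys
from typing import Mapping, Optional


def consolidation_invocation(
    topic: str,
    since_hours: int,
    apply: bool = False,
    max_merges: Optional[int] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> tuple[list[str], dict[str, str]]:
    """Build (command, environment) for tools.scripts.run_event_consolidation."""
    env = dict(os.environ if base_env is None else base_env)
    env["EVENT_CONSOLIDATION_TOPIC"] = topic
    env["EVENT_CONSOLIDATION_SINCE_HOURS"] = str(since_hours)
    if apply:
        env["EVENT_CONSOLIDATION_APPLY"] = "1"
        if max_merges is not None:
            # Caps the number of merges in a single apply run.
            env["EVENT_CONSOLIDATION_MAX_MERGES"] = str(max_merges)
    else:
        env["EVENT_CONSOLIDATION_DRY_RUN"] = "1"
    cmd = [sys.executable, "-m", "tools.scripts.run_event_consolidation"]
    return cmd, env


# Example: run from the repository root.
# import subprocess
# cmd, env = consolidation_invocation("War", 168)
# subprocess.run(cmd, env=env, check=True)
```

Keeping dry-run as the default of the helper means an apply run has to be requested explicitly.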

Current Deployment Status

  • storyline registry and canonical-family infrastructure are deployed
  • replay tooling and consolidation health tooling are working
  • the accepted production compromise remains the deterministic two-pass negative-pairs stack
  • Gate 3.5, selective LLM, and global recluster exist in code, but the final replay series did not justify switching the production default
  • Gate 5 global recluster remains experimental and flag-gated
