IntelliNews — RSS → LLM → Geo → Links → API → Map/Admin UI

What it does

IntelliNews is a full-stack news intelligence platform. It continuously polls RSS feeds, analyses each item with LLMs, optionally geocodes the location, stores results in PostgreSQL, and exposes read-only APIs plus React map/admin UIs with clustered markers, filters, link lines, and analytics.

Portfolio highlights

LLM-assisted news enrichment pipeline with provider abstraction for local and hosted models.
PostgreSQL-backed event assignment, consolidation, contradiction analysis, and analytics.
FastAPI read APIs and admin endpoints for moderation, operations, billing, and observability.
React/Vite map and admin UI with Playwright coverage for critical map layouts.
Documented fast check path that combines backend smoke checks, pytest, and frontend build.

Environment

All Python commands must run inside the conda environment env7.

Quickstart (current)

Pipeline: /home/jupiter/miniconda3/envs/env7/bin/python main.py
API: /home/jupiter/miniconda3/envs/env7/bin/python -m app.api.server
Frontend: cd web && npm install && npm run dev -- --host

Database:

PostgreSQL only (DATABASE_URL, optional DATABASE_URL_API, DATABASE_URL_PIPELINE)
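A minimal sketch of how a process might pick its connection string from these variables. The fallback order (role-specific variable first, then DATABASE_URL) is an assumption, and `resolve_database_url` is a hypothetical helper, not a function from this repository.

```python
import os


def resolve_database_url(role, env=None):
    """Return the PostgreSQL URL for a role ("api" or "pipeline").

    Assumption: a role-specific variable, when set, overrides DATABASE_URL.
    """
    env = os.environ if env is None else env
    override_key = {
        "api": "DATABASE_URL_API",
        "pipeline": "DATABASE_URL_PIPELINE",
    }.get(role)
    # Prefer the optional role-specific URL, otherwise use the shared one.
    url = (env.get(override_key) if override_key else None) or env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    # The project is PostgreSQL only, so reject any other scheme early.
    if not url.startswith(("postgresql://", "postgresql+", "postgres://")):
        raise ValueError("only PostgreSQL URLs are supported")
    return url
```

With only DATABASE_URL set, both roles resolve to the same URL; setting DATABASE_URL_API would, under this assumption, let the API use a separate (for example read-only) account.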

Testing

Single entrypoint:

PYTHONNOUSERSITE=1 TOOLS_CHECK_MONITOR=1 TOOLS_CHECK_MONITOR_STALL_S=180 TOOLS_CHECK_MONITOR_KILL_ON_STALL=1 TOOLS_CHECK_PYTEST_WORKERS=auto TOOLS_CHECK_PYTEST_DIST=loadscope TOOLS_CHECK_EXTENDED=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.check

Use this full command, including the environment variables, for the full test run; it is the documented fast path for the authoritative check.

It runs smoke imports, pytest, and npm run build if web/ exists.
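Because the check command carries several environment variables, it can be convenient to wrap it in a small launcher. The sketch below only reuses the variables and the module name from the command above; `build_check_invocation` and `run_check` are hypothetical helpers, and defaulting to `sys.executable` assumes the launcher itself is started with the env7 interpreter.

```python
import os
import subprocess
import sys

# Environment variables copied verbatim from the documented check command.
CHECK_ENV = {
    "PYTHONNOUSERSITE": "1",
    "TOOLS_CHECK_MONITOR": "1",
    "TOOLS_CHECK_MONITOR_STALL_S": "180",
    "TOOLS_CHECK_MONITOR_KILL_ON_STALL": "1",
    "TOOLS_CHECK_PYTEST_WORKERS": "auto",
    "TOOLS_CHECK_PYTEST_DIST": "loadscope",
    "TOOLS_CHECK_EXTENDED": "1",
}


def build_check_invocation(python=sys.executable, base_env=None):
    """Return (argv, env) for the full check without running it."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(CHECK_ENV)
    return [python, "-m", "tools.check"], env


def run_check():
    """Run the check from the repository root and return its exit code."""
    argv, env = build_check_invocation()
    return subprocess.run(argv, env=env).returncode
```

Calling `run_check()` from the repository root should be equivalent to the one-line command, provided the interpreter is the env7 one.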

Docs

Project map: docs/PROJECT_MAP.md
System overview: docs/ABOUT.md
Development workflow: docs/DEVELOPMENT.md
Agent instructions: AGENTS.md

Repo notes

Pipeline: app/pipeline.py
LLM analysis schema: app/llm/analyser.py
Link analysis: app/links/analyser.py
API server: app/api/server.py

Event Assignment Architecture (Final)

Pipeline: Article -> Gate 1 (Storyline LLM) -> Gate 2 (Incident Matcher) -> Gate 3 (Two-Pass Consolidation) -> Gate 3.5 (Deterministic Family Consolidation) -> Gate 4 (Selective LLM, optional) -> Gate 5 (Global Reclusterer, optional)

Key components:

| Component | File | Description |
| --- | --- | --- |
| Gate 1 | `app/events/builder/storyline.py` | Storyline keying, shortlist linking, affinity lookup |
| Gate 2 | `app/events/builder/match/api.py` | Article-to-event assignment with anti-merge guards |
| Gate 3 | `app/events/consolidation/runner.py` | Two-pass deterministic consolidation |
| Gate 3.5 | `app/events/consolidation/family_consolidator.py` | Deterministic same-canonical-family merge with blacklist and size guard |
| Gate 4 | `app/events/consolidation/selective_llm_merger.py` | Optional LLM verify for top umbrella cross-family event pairs |
| Gate 5 | `app/events/consolidation/global_reclusterer.py` | Optional article-level reclustering behind feature flag |
| Entity norm | `app/events/consolidation/entity_normalizer.py` | Entity normalization for classifier features |
| Family build | `tools/scripts/build_canonical_families.py` | Canonical families and umbrella hierarchy |

Current architecture decisions:

Gate 1 stays enabled in the practical stack.
Gate 2 attach_verify is not part of the accepted default.
Gate 3.5 exists and is production-available, but broad Iran families are blacklisted because a full-data audit showed dangerous same-family contamination.
Gate 4 selective LLM exists and is additive, not gatekeeping.
Gate 5 global recluster remains experimental and is not part of the accepted production ceiling.
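The decisions above amount to a fixed deterministic core plus optional, flag-gated stages. A minimal sketch of that selection logic follows. The two EVENT_CONSOLIDATION_* flags are the ones used in this README's benchmark command; the Gate 5 flag name, the stage labels, and the truthy-value parsing are assumptions for illustration, not the project's actual code.

```python
import os


def _flag(env, name):
    """Treat "1", "true" or "yes" as enabled (assumed convention)."""
    return env.get(name, "0").strip().lower() in {"1", "true", "yes"}


def consolidation_stages(env=None):
    """Return the ordered consolidation stages enabled by the environment."""
    env = os.environ if env is None else env
    # Gate 3 (two-pass deterministic consolidation) always runs.
    stages = ["gate3_two_pass"]
    if _flag(env, "EVENT_CONSOLIDATION_FAMILY_CONSOLIDATION"):
        stages.append("gate3_5_family")
    if _flag(env, "EVENT_CONSOLIDATION_SELECTIVE_LLM"):
        # Additive: may propose extra merges, never blocks earlier stages.
        stages.append("gate4_selective_llm")
    # Hypothetical flag name; the README only says Gate 5 is flag-gated.
    if _flag(env, "EVENT_CONSOLIDATION_GLOBAL_RECLUSTER"):
        stages.append("gate5_global_recluster")
    return stages
```

With no flags set, this sketch yields only the Gate 3 stage, which matches the accepted production default described here.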

Performance:

latest measured dry-run consolidation benchmark on War, 168h window:
- command: EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_FAMILY_CONSOLIDATION=1 EVENT_CONSOLIDATION_SELECTIVE_LLM=1 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
- wall clock: 14.71s
latest dry-run result on that window:
- Gate 3 merge plan: 6 pairs
- Gate 3.5 family merges: 0
- selective LLM candidates: 0

Registry support paths:

full population: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry
canonical families: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families
consolidation dry-run: EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
registry health: /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health
detailed research log: docs/research/EVENT_ASSIGNMENT_GOLD_CYCLE_LOG_2026-03-08.md

Final Metrics

Full gold compare metrics (495 articles, 343 gold events):

| Configuration | false_splits | false_merges | exact_matches |
| --- | --- | --- | --- |
| Baseline (`20260308T123308Z_manual_gold_compare`) | 1357 | 0 | 248 |
| Accepted production ceiling (`20260309T034357Z_two_pass_negative_pairs_full_v1`) | 714 | 125 | 248 |
| Final experimental full stack (`20260310T142951Z_new_stack_full_v1`) | 541 | 245 | 242 |
| Active production default | 714 | 125 | 248 |

Interpretation:

the final experimental stack improved recall materially
but contamination rose too far above the accepted safety ceiling
therefore the accepted production default did not change
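The trade-off behind this interpretation can be read directly off the table. The sketch below hard-codes the three measured rows and computes the change from the accepted production ceiling to the experimental stack; the dictionary keys and the `delta` helper are illustrative names only.

```python
# Rows copied from the gold compare table (495 articles, 343 gold events).
RUNS = {
    "baseline": {"false_splits": 1357, "false_merges": 0, "exact_matches": 248},
    "production": {"false_splits": 714, "false_merges": 125, "exact_matches": 248},
    "experimental": {"false_splits": 541, "false_merges": 245, "exact_matches": 242},
}


def delta(before, after):
    """Per-metric change when moving from one configuration to another."""
    return {k: RUNS[after][k] - RUNS[before][k] for k in RUNS[before]}


change = delta("production", "experimental")
print(change)
# {'false_splits': -173, 'false_merges': 120, 'exact_matches': -6}

# Relative change against the production ceiling, in percent.
splits_pct = round(100 * change["false_splits"] / RUNS["production"]["false_splits"], 1)
merges_pct = round(100 * change["false_merges"] / RUNS["production"]["false_merges"], 1)
print(splits_pct, merges_pct)  # -24.2 96.0
```

In other words, the experimental stack removed about a quarter of the remaining false splits but nearly doubled false merges and lost 6 exact matches.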

Current bottleneck:

the dominant failure mode is upstream canonical-family purity, especially in the Iran umbrella space
downstream threshold sweeps and extra merge stages no longer move the safe ceiling enough

Structural Ceiling Analysis

Current evidence after the final replay cycle:

495 gold articles map to 343 gold events
average density is only 1.44 articles per gold event
broad umbrella wars need aggressive merging, while most events are singletons or pairs
a single threshold does not solve both regimes
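The density figure also puts a hard lower bound on how many gold events are singletons, which follows from the two counts alone. A short worked calculation, with no assumptions beyond the numbers above:

```python
articles, gold_events = 495, 343

# Average number of articles per gold event.
density = articles / gold_events
print(round(density, 2))  # 1.44

# Every event has at least one article, so only the surplus articles
# can turn a singleton into a multi-article event.
surplus = articles - gold_events          # 152
max_multi_article_events = surplus        # each needs at least one surplus article
min_singletons = gold_events - max_multi_article_events
print(min_singletons)  # 191

singleton_share = round(100 * min_singletons / gold_events, 1)
print(singleton_share)  # 55.7
```

So at least 191 of the 343 gold events (55.7%) contain exactly one article, regardless of how the remaining articles are distributed. Any merge threshold loose enough for broad umbrella events is therefore applied mostly to events that should never merge.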

Confirmed lessons:

threshold sweeps do not fix broken upstream topology
event-level classifier improvements help recall, but purity collapses without clean family partitioning
full Gate 2 LLM gating is too expensive and hurts recall
selective LLM as a post-step is operationally viable, but it did not become the dominant lever in the final stack
deterministic same-family rules are useful only when canonical families are already pure

Next serious direction:

improve canonical-family purity upstream before consolidation
treat broad Iran umbrella families as partitioning problems, not threshold problems

Production Commands

Refresh storyline registry:
- /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.populate_storyline_registry
Refresh canonical families from the gold standard:
- /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.build_canonical_families
Consolidation dry-run:
- EVENT_CONSOLIDATION_DRY_RUN=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
Consolidation apply:
- EVENT_CONSOLIDATION_APPLY=1 EVENT_CONSOLIDATION_TOPIC=War EVENT_CONSOLIDATION_SINCE_HOURS=168 EVENT_CONSOLIDATION_MAX_MERGES=8 /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.run_event_consolidation
Registry and consolidation health:
- /home/jupiter/miniconda3/envs/env7/bin/python -m tools.scripts.check_storyline_registry_health

Current Deployment Status

storyline registry and canonical-family infrastructure are deployed
replay tooling and consolidation health tooling are working
the accepted production compromise remains the deterministic two-pass negative-pairs stack
Gate 3.5, selective LLM, and global recluster exist in code, but the final replay series did not justify switching the production default
Gate 5 global recluster remains experimental and flag-gated
