Observer v1.0.0

Field-portable intelligence aggregation system. Collects articles from RSS feeds across global news sources, translates non-English content via NLLB-200, screens entities against sanctions databases, and delivers signals through a real-time WebSocket dashboard.

No LLMs, no GPU, and no cloud APIs required; the optional FBI and Interpol lookups are disabled by default.

Features

RSS Collection: 200+ curated feeds across global news sources with configurable check intervals
NLLB Translation: Translates non-English articles to English using NLLB-200 via CTranslate2 (CPU-friendly, int8 quantization)
Content Filtering: Whitelist/blacklist keyword filtering with regex support (13 filter lists included)
Entity Screening: Matches against OpenSanctions database locally via pg_trgm fuzzy matching. Optional FBI/Interpol API lookups
Field Extraction: Regex-based location and casualty extraction from headlines
On-Demand Scraping: Full-text article fetch via trafilatura with paywall detection
Real-Time Dashboard: WebSocket-powered feed with time filtering, full-text search, CSV/JSON export, keyboard shortcuts
Admin Console: Built-in management interface for system monitoring, feed management, collector controls, filter editing, translation tuning, and entity screening status
Feed Management: Add/remove/disable feeds individually or by group/region from the admin console
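
As a rough illustration of the regex-based field extraction listed above, the sketch below pulls a casualty count and a location out of a headline. The patterns, the `extract_fields` name, and the returned keys are assumptions made for this example; Observer's actual patterns are not shown in this README and may differ.

```python
import re

# Hypothetical patterns for illustration; not Observer's actual rules.
# Matches e.g. "12 killed", "at least 30 people injured", "1,200 dead".
CASUALTY_RE = re.compile(
    r"\b(?P<count>\d{1,3}(?:,\d{3})*|\d+)\s+(?:people\s+|civilians\s+|soldiers\s+)?"
    r"(?P<kind>killed|dead|injured|wounded)\b",
    re.IGNORECASE,
)
# Matches a capitalized place name after "in" or "near", e.g. "in Kharkiv".
LOCATION_RE = re.compile(r"\b(?:in|near)\s+(?P<place>[A-Z][\w-]+(?:\s+[A-Z][\w-]+)*)")


def extract_fields(headline: str) -> dict:
    """Return a best-effort casualty count, casualty kind, and location."""
    fields = {"casualties": None, "kind": None, "location": None}
    m = CASUALTY_RE.search(headline)
    if m:
        # Strip thousands separators before converting to int.
        fields["casualties"] = int(m.group("count").replace(",", ""))
        fields["kind"] = m.group("kind").lower()
    m = LOCATION_RE.search(headline)
    if m:
        fields["location"] = m.group("place")
    return fields
```

Headline regexes like these are cheap and run without a model, but they will miss phrasings the patterns do not anticipate, so treat the output as a hint rather than a verified fact.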

Requirements

Python 3.11+
PostgreSQL 14+ with pg_trgm extension
~2 GB disk for NLLB translation model
~4 GB RAM minimum
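
Because entity screening depends on pg_trgm, it may help to see what trigram similarity measures. The pure-Python sketch below approximates the documented behavior of pg_trgm's `similarity()`: lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, then compare the sets of three-character substrings. The function names are made up for this example, and PostgreSQL's own implementation remains the source of truth.

```python
import re


def trigrams(text: str) -> set:
    """Build the set of trigrams roughly the way pg_trgm documents it."""
    grams = set()
    # Non-alphanumeric characters are treated as word breaks.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Inside PostgreSQL the equivalent call is `SELECT similarity('a', 'b')`, and the extension's `%` operator applies a configurable threshold to the same score.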

Quick Start

# Clone and enter
git clone git@github.com:Rybatter50-cloud/Observer.git && cd Observer

# Run setup (creates venv, installs deps, configures PostgreSQL, downloads NLLB model)
python setup_observer.py

# Start
source venv/bin/activate
python main.py

News feed at http://localhost:8999. Admin console at http://localhost:8999/dev.

On first run, Observer automatically seeds the database with 200+ RSS feed sources covering all UN member states.

Admin Console

The admin console (/dev) provides full system management:

System — Active feed count, accept rate, queue depth, translator status, pipeline and app restart controls
Database — DB size, signal counts, pool status, max signals limit, backup/restore
Collectors — RSS and Trafilatura collector status, 24h counts, error tracking, on/off toggle and manual collect
Content Filters — Switch between blacklist/whitelist/both modes, select filter files, edit patterns inline
NLLB Translation — Configure device, compute type, workers, beam size, length/repetition penalty, temperature, top-k, batch size
Entity Screening — Live status for FBI, Interpol, Sanctions Network, and OpenSanctions screeners with hit counts
Signals — Searchable, time-filtered signal table with score, source, title, and location
Feed Groups — Group-level stats, region presets (Ukraine, Middle East, Asia, Africa, Americas, Caucasus/Central Asia), bulk enable/disable
Feed Sites — Individual feed search/filter by name, URL, group, or type (RSS/Scraper), with per-feed toggle and delete
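
The blacklist/whitelist/both modes for content filters could work roughly as follows. This is a sketch under assumptions: the function names, the mode strings, the `#` comment convention in filter files, and the rule that "both" requires a whitelist hit and no blacklist hit are guesses for illustration, not taken from Observer's code.

```python
import re
from typing import Iterable, List, Pattern


def compile_patterns(lines: Iterable[str]) -> List[Pattern]:
    """Compile one case-insensitive regex per non-empty, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):  # assumed comment syntax
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def passes_filter(text: str, mode: str, whitelist, blacklist) -> bool:
    """Decide whether an article title or description should be accepted.

    mode is assumed to be one of "blacklist", "whitelist", or "both".
    """
    hit_white = any(p.search(text) for p in whitelist)
    hit_black = any(p.search(text) for p in blacklist)
    if mode == "blacklist":
        return not hit_black
    if mode == "whitelist":
        return hit_white
    if mode == "both":
        return hit_white and not hit_black
    raise ValueError(f"unknown filter mode: {mode!r}")
```

Compiling the patterns once and reusing them for every article keeps the per-article cost to a handful of regex searches.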

Windows Quick Start

# Run the installer (creates venv, installs deps, sets up .env)
.\install.ps1

# Start the server
.\start.bat

NLLB Translation Model

The setup script (python setup_observer.py or .\install.ps1) automatically downloads and converts the NLLB-200 model to CTranslate2 int8 format. This requires a one-time download of ~1.2 GB.

If you skipped setup or need to install the model manually:

pip install transformers torch huggingface_hub   # build-time only
python scripts/download_nllb.py                  # download + convert
pip uninstall transformers torch                  # free ~2 GB disk

To run on CPU with int8 quantization (recommended, set in .env):

NLLB_DEVICE=cpu
NLLB_COMPUTE_TYPE=int8

Without the model, the app still runs but translation is disabled.
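
The fallback described above (no model, no translation) can be sketched as a startup check. The `resolve_translator_mode` helper and whatever model directory is passed to it are hypothetical; the only outside fact relied on is that CTranslate2's converter writes a `model.bin` file into its output directory.

```python
from pathlib import Path


def model_is_present(model_dir: Path) -> bool:
    """True if the directory looks like a converted CTranslate2 model."""
    return (model_dir / "model.bin").is_file()


def resolve_translator_mode(requested: str, model_dir: Path) -> str:
    """Fall back to "off" when "nllb" is requested but no model is installed."""
    if requested == "nllb" and not model_is_present(model_dir):
        return "off"
    return requested
```

Checking once at startup lets the rest of the pipeline treat translation as an ordinary on/off setting instead of handling a missing model on every article.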

Configuration

The setup script handles DATABASE_URL automatically. Everything else has working defaults.

Variable	Default	Description
`DATABASE_URL`	—	PostgreSQL connection string (set by setup)
`AI_TRANSLATOR_MODE`	`nllb`	Translation mode: `nllb` or `off`
`FEED_COLLECTION_ENABLED`	`true`	Enable RSS feed collection
`FEED_CHECK_INTERVAL`	`300`	Seconds between collection cycles
`CONTENT_FILTER_ENABLED`	`true`	Enable content filtering
`SANCTIONS_NET_ENABLED`	`true`	Enable OpenSanctions screening
`FBI_ENABLED`	`false`	Enable FBI Most Wanted API
`INTERPOL_ENABLED`	`false`	Enable Interpol Notices API
`ENTITY_EXTRACTION_ENABLED`	`true`	Enable GLiNER entity extraction
`ENTITY_AUTO_SCREEN`	`true`	Auto-screen persons against watchlists
`HOST`	`0.0.0.0`	Server bind address
`PORT`	`8999`	Server port

See .env.example for all available settings.
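
For readers wiring their own tooling around these settings, here is a minimal sketch of reading a few of the variables above with the documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical, and the accepted truthy spellings are an assumption; Observer's own config.py may parse values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not confirmed by the README


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    translator_mode: str
    feed_check_interval: int
    fbi_enabled: bool
    host: str
    port: int


def _as_bool(value: str) -> bool:
    return value.strip().lower() in TRUTHY


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of Observer's variables, applying the README defaults."""
    mode = env.get("AI_TRANSLATOR_MODE", "nllb")
    if mode not in ("nllb", "off"):
        raise ValueError(f"AI_TRANSLATOR_MODE must be 'nllb' or 'off', got {mode!r}")
    return Settings(
        database_url=env.get("DATABASE_URL"),  # no default; set by the setup script
        translator_mode=mode,
        feed_check_interval=int(env.get("FEED_CHECK_INTERVAL", "300")),
        fbi_enabled=_as_bool(env.get("FBI_ENABLED", "false")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8999")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to exercise in tests without touching the real environment.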

Entity Extraction (GLiNER)

Observer uses GLiNER (urchade/gliner_medium-v2.1, ~80 MB) to extract named entities from article titles and descriptions. Entity types: Person, Organization, Location, Country, Military Unit, Weapon.

Entity extraction runs as a batch script — not inline during collection:

# Extract entities from unprocessed signals (default: 500)
python scripts/extract_entities.py

# Extract + auto-screen persons against sanctions/FBI/Interpol
python scripts/extract_entities.py --auto-screen

# Process more signals
python scripts/extract_entities.py --limit 2000 --auto-screen

# Re-extract all signals (e.g. after model update)
python scripts/extract_entities.py --reprocess

Extracted entities appear as color-coded pills on article cards. When auto-screening finds a 100% match, a red shield badge appears on the card and screening details are shown in the article detail modal.
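
A minimal sketch of what the extraction step might look like with the `gliner` Python package, using the model and entity types named above. The helper names, the lowercase label strings, and the 0.5 threshold are assumptions for illustration; scripts/extract_entities.py may be organized quite differently. The import is deferred so the grouping helper can be used without the package installed.

```python
from collections import defaultdict

# Labels mirror the entity types listed above; the exact label strings are assumed.
LABELS = ["person", "organization", "location", "country", "military unit", "weapon"]


def extract_entities(text: str, threshold: float = 0.5) -> list:
    """Run GLiNER on one text; returns dicts with "text", "label", "score", etc."""
    from gliner import GLiNER  # third-party; fetches the model on first use

    model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
    return model.predict_entities(text, LABELS, threshold=threshold)


def group_by_label(entities: list) -> dict:
    """Collapse raw predictions into {label: [unique entity texts]}, keeping order."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)
```

A real batch script would load the model once and reuse it across all signals rather than reloading it per call as this sketch does.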

Project Structure

Observer/
├── main.py                     # FastAPI app, lifespan, middleware
├── config.py                   # Environment variables and validation
├── api/                        # Route handlers
├── services/                   # Business logic
│   └── collectors/             # RSS, NP4K collectors
├── database/                   # Schema, connection pool, repositories
├── templates/                  # Jinja2 HTML (news feed + admin console)
├── static/                     # CSS and JS (modular, per-view)
├── filters/                    # Whitelist/blacklist keyword files
├── models/                     # NLLB CTranslate2 model (auto-downloaded)
├── data/                       # Feed seed data, runtime state, cache
├── scripts/                    # Setup and batch utilities
│   └── download_nllb.py        # NLLB model download & conversion
├── requirements.txt            # Python dependencies
├── setup_observer.py           # Cross-platform setup script
├── install.ps1                 # Windows PowerShell installer
├── start.bat                   # Windows quick-start script
└── .env.example                # Configuration template

Tech Stack

Layer	Technology
Web framework	FastAPI + Uvicorn
Database	PostgreSQL 14+ via asyncpg
Translation	CTranslate2 (NLLB-200) + sentencepiece
Language detection	langdetect
Web scraping	trafilatura + lxml
RSS parsing	feedparser
Rate limiting	slowapi
Frontend	HTML5/Jinja2, WebSocket, vanilla JS/CSS

License

MIT
