# GDELT Intelligence Pipeline v2.4 (Reference / Archive)

> **NOTE:** This notebook is the archived reference implementation that preceded the  
> `geoeventfusion` Python package. It is retained for historical reference only.
>
> **Do not use this notebook for new analysis.**  
> Use `quickstart.ipynb` instead — it calls the production package directly.

## What this notebook represents

Version 2.4 was the final single-file notebook implementation before the codebase was
refactored into the `geoeventfusion` modular Python package. Key issues resolved in the
package (documented in `AGENTS.md §8`):

| Issue | Resolution |
|---|---|
| `from collections import Count` (typo; raises `ImportError`) | Fixed to `from collections import Counter` |
| Anthropic model hardcoded | Moved to `PipelineConfig.anthropic_model` |
| All logic in one file — untestable | Separated into package modules |
| Phase outputs to flat `data/` dir | Moved to `outputs/runs/<run_id>/` |
| Only 3 article pools | Expanded to 6 core + 3 conditional |
| No `HybridRel` sort | Added `articles_relevant` pool |
| No `TimelineVolRaw` | Added with `norm` field for vol_ratio |
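The last two rows refer to features of the GDELT DOC 2.0 API. Below is a minimal sketch of building an `ArtList` request sorted by `HybridRel` and deriving a per-bucket ratio from `TimelineVolRaw` output. It assumes the public endpoint `https://api.gdeltproject.org/api/v2/doc/doc` and a JSON payload shaped like `{"timeline": [{"data": [{"date", "value", "norm"}]}]}`. The helpers `build_doc_url` and `vol_ratios` are hypothetical, not the package's API, and the package's exact `vol_ratio` definition may differ from the plain `value / norm` used here.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode

# Assumed public GDELT DOC 2.0 endpoint.
DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_doc_url(query: str, mode: str, days_back: int,
                  max_records: int = 250, sort: Optional[str] = None) -> str:
    """Build a GDELT DOC API URL (hypothetical helper; no request is sent)."""
    params = {
        "query": query,
        "mode": mode,                 # e.g. "ArtList" or "TimelineVolRaw"
        "format": "json",
        "timespan": f"{days_back}d",  # relative window, e.g. "90d"
        "maxrecords": max_records,
    }
    if sort is not None:
        params["sort"] = sort         # e.g. "HybridRel" for relevance-weighted order
    return f"{DOC_API}?{urlencode(params)}"


def vol_ratios(payload: dict) -> list[tuple[str, float]]:
    """Return (date, value / norm) for each bucket of an assumed TimelineVolRaw payload.

    `value` is the count of matching articles and `norm` the total number of
    articles monitored in that bucket; buckets with norm == 0 get 0.0.
    """
    series = payload["timeline"][0]["data"]
    return [
        (point["date"], point["value"] / point["norm"] if point["norm"] else 0.0)
        for point in series
    ]
```

For example, `build_doc_url(QUERY, "ArtList", DAYS_BACK, sort="HybridRel")` produces the kind of relevance-sorted request an `articles_relevant` pool would issue. Dividing by `norm` rather than using raw counts keeps buckets comparable when GDELT's overall monitoring volume changes from day to day.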

This notebook is provided as a reference for understanding the original implementation
and the design decisions that led to the current package architecture.

## Cell 1 — Configuration (v2.4 Reference)

In [None]:
# ─────────────────────────────────────────────────────────────────
# GDELT Intelligence Pipeline v2.4 — Reference Implementation
# This is the archived single-file version.
# Use geoeventfusion package (quickstart.ipynb) for production use.
# ─────────────────────────────────────────────────────────────────

# Configuration
QUERY = 'Houthi Red Sea attacks'
DAYS_BACK = 90
MAX_RECORDS = 250
LLM_BACKEND = 'anthropic'   # 'anthropic' or 'ollama'
ANTHROPIC_MODEL = 'claude-sonnet-4-6'  # Moved to PipelineConfig in package
OLLAMA_MODEL = 'gemma3:27b'
SPIKE_Z_THRESHOLD = 1.5
MAX_CONFIDENCE = 0.82  # Hard cap on any reported confidence score; never exceed it
MIN_CITATIONS = 3

print('Configuration loaded.')
print(f'Query: {QUERY}')
print(f'LLM Backend: {LLM_BACKEND}')
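`MAX_CONFIDENCE` and `MIN_CITATIONS` describe two guards on extracted claims, but the cells that applied them are not part of this archive. The sketch below assumes the simplest reading: claims backed by fewer than `MIN_CITATIONS` distinct sources are discarded, and the remaining confidences are clamped to `[0, MAX_CONFIDENCE]`. The `Claim` type and `apply_guards` function are hypothetical illustrations, not code from v2.4 or the package.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

MAX_CONFIDENCE = 0.82  # hard cap, as in the configuration cell
MIN_CITATIONS = 3      # minimum distinct sources, as in the configuration cell


@dataclass(frozen=True)
class Claim:
    """Hypothetical container for one LLM-extracted claim."""
    text: str
    confidence: float
    citations: tuple[str, ...]


def apply_guards(claims: list[Claim],
                 max_confidence: float = MAX_CONFIDENCE,
                 min_citations: int = MIN_CITATIONS) -> list[Claim]:
    """Drop under-cited claims and clamp the rest to [0, max_confidence]."""
    kept = []
    for claim in claims:
        if len(set(claim.citations)) < min_citations:
            continue  # assumed policy: too few distinct sources, so discard
        clamped = min(max(claim.confidence, 0.0), max_confidence)
        kept.append(replace(claim, confidence=clamped))  # original stays unchanged
    return kept
```

Counting distinct citations (via `set`) keeps a single source cited repeatedly from satisfying the threshold on its own.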

## Cell 2 — API Keys (v2.4 Reference)

In [None]:
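import os

# Prefer Colab Secrets when running in Google Colab; otherwise fall back to
# the ANTHROPIC_API_KEY environment variable.
try:
    from google.colab import userdata
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
    os.environ['ANTHROPIC_API_KEY'] = ANTHROPIC_API_KEY
    print('API keys loaded from Colab Secrets.')
except Exception:  # not running in Colab, or the secret could not be read
    ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY', '')
    if ANTHROPIC_API_KEY:
        print('API keys loaded from environment.')
    else:
        print('WARNING: ANTHROPIC_API_KEY is not set; the anthropic backend will fail.')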

## Cells 3–36 (Archived)

The remaining implementation cells (GDELT fetching, spike detection, LLM extraction,
storyboard generation, etc.) are omitted from this archive to avoid divergence from
the canonical package implementation.

**For the production implementation, see:**
- `geoeventfusion/agents/gdelt_agent.py` — GDELT multi-mode fetch
- `geoeventfusion/analysis/spike_detector.py` — Z-score spike detection
- `geoeventfusion/agents/llm_extraction_agent.py` — LLM structured extraction
- `geoeventfusion/agents/storyboard_agent.py` — Narrative panel generation
- `geoeventfusion/pipeline.py` — Full orchestration
- `notebooks/quickstart.ipynb` — Thin notebook calling the package
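The spike detector listed above flags days with unusually high coverage. Here is a minimal sketch of z-score spike detection over a daily volume series. It assumes each point is compared against the mean and population standard deviation of the whole series and flagged when its z-score is at least `SPIKE_Z_THRESHOLD` (1.5 in the v2.4 configuration). `detect_spikes` is a hypothetical name; the package's detector may use a rolling window or a different baseline.

```python
from __future__ import annotations

from statistics import mean, pstdev

SPIKE_Z_THRESHOLD = 1.5  # value from the v2.4 configuration cell


def detect_spikes(values: list[float],
                  threshold: float = SPIKE_Z_THRESHOLD) -> list[tuple[int, float]]:
    """Return (index, z) for every point whose z-score is >= threshold.

    The baseline is the whole-series mean and population standard deviation;
    a series that is too short or perfectly flat has no spikes.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        return []
    spikes = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if z >= threshold:
            spikes.append((i, z))
    return spikes
```

A trailing-window baseline is a common alternative when volume drifts across a 90-day window, because a single whole-series mean can be pulled up by a sustained high-coverage period and hide shorter spikes.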

All known issues from v2.4 have been resolved in the package.  
See `AGENTS.md §8` for the full list of resolved issues.

In [None]:
# Migration helper — run the same query via the production package:
print('To run the same analysis using the production package:')
print()
print('    from config.settings import PipelineConfig')
print('    from geoeventfusion.pipeline import run_pipeline')
print()
print(f'    config = PipelineConfig(query={QUERY!r}, days_back={DAYS_BACK})')
print('    context = run_pipeline(config)')
print()
print('Or use notebooks/quickstart.ipynb for the guided entry point.')