# Crime Analysis Pipeline (LLM-1 → LLM-4)

This notebook demonstrates the multi-stage pipeline by calling `run_full_pipeline` from `run_pipeline.py` on preprocessed datasets. No preprocessing happens here; all data is already prepared.

In [None]:
import os, json
import pandas as pd

print("Working dir:", os.getcwd())
print("Available data files:", [f for f in os.listdir('.') if f.endswith(('.csv', '.json'))])

In [None]:
import pprint

from run_pipeline import run_full_pipeline

# Helper to pretty-print nested dicts
pp = pprint.PrettyPrinter(indent=2, width=120)

In [None]:
sample_record = {
    "crime_text": "On 2020-05-10 at 22 hours, in central area, a 21-year-old M was involved in robbery at street. Weapon used: verbal threat. Case status: invest cont.",
    "crime_type": "robbery",
    "weapon_desc": "verbal threat",
    "premis_desc": "street",
    "vict_age": "21",
    "vict_sex": "M",
    "area_name": "central",
    "domestic": "false",
    "status_desc": "invest cont",
    "arrest": "false",
    "state": "ANDHRA PRADESH",
    "year": 2020,
    "district": "UNKNOWN",
}

result = run_full_pipeline(sample_record)

print("LLM-1 motivation:")
pp.pprint(result["llm1"])
print("\nLLM-2 historical:")
pp.pprint(result["llm2"])
print("\nLLM-3 pattern:")
pp.pprint(result["llm3"])
print("\nFusion:")
pp.pprint(result["fusion"])
print("\nLLM-4 report:\n")
print(result["report"])
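
The cell above reads five keys from the pipeline result. The full return shape of `run_full_pipeline` is not documented in this notebook, so the following is only a minimal sketch, assuming the result is a dict with those keys. The helper `missing_stage_keys` and the constant `EXPECTED_KEYS` are hypothetical names introduced for illustration; they report which stage outputs are absent so that a failure shows up before the print statements raise a `KeyError`.

```python
# Keys the demo cell reads from the pipeline result. That the result is a
# plain dict containing exactly these keys is an assumption of this sketch.
EXPECTED_KEYS = ("llm1", "llm2", "llm3", "fusion", "report")


def missing_stage_keys(result, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from a result dict."""
    return [key for key in expected if key not in result]


# Stand-in result for illustration only; this is not real pipeline output.
partial_result = {"llm1": {}, "llm2": {}, "report": ""}
print(missing_stage_keys(partial_result))  # → ['llm3', 'fusion']
```

Calling `missing_stage_keys(result)` right after `run_full_pipeline(sample_record)` would return an empty list when every stage produced output.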

## Notes
- Set the `GEMINI_API_KEY` environment variable before running the cells: `export GEMINI_API_KEY=your_key`.
- To run multiple samples quickly, use the test harness in `run_pipeline.py`: `python run_pipeline.py`.
- A Streamlit UI that uses the same pipeline is available via `streamlit run streamlit_app.py`.
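
Since the pipeline depends on `GEMINI_API_KEY`, it can help to fail early with a clear message when the variable is missing. The sketch below uses only the standard library; `require_env` is a hypothetical helper that is not part of `run_pipeline.py`, and how the pipeline itself reads the key is not shown in this notebook.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before starting the notebook kernel."
        )
    return value


# Hypothetical usage in the first cell of the notebook:
# api_key = require_env("GEMINI_API_KEY")
```

The variable has to be exported in the shell that launches Jupyter, because a running kernel does not pick up changes made in another terminal afterwards.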