# demo_1_entity_annotation.ipynb

_Entity annotation demo with Q↔A-aware testimony segmentation_


## Table of Contents
1. [Setup](#setup)
2. [Imports](#imports)
3. [Data & Quick Demo](#data-demo)
4. [Main Tutorial](#main)
5. [Tips & Troubleshooting](#tips)
6. [Summary](#summary)


## Setup  <a id='setup'></a>

In [None]:
# If running on Colab, install dependencies.
# Note: Uncomment as needed.
# %pip -q install spacy geonamescache tqdm folium
# Optional backends (uncomment where relevant):
# %pip -q install transformers torch openai

# Download at least one spaCy model
# !python -m spacy download en_core_web_sm
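
Before loading a model, it can help to check whether it is already installed. spaCy's downloadable pipelines are installed as regular Python packages, so a standard-library lookup is enough. The helper name `model_installed` below is hypothetical and not part of `spatio_textual` or spaCy; treat this as a minimal sketch.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable here."""
    return importlib.util.find_spec(name) is not None


MODEL = "en_core_web_sm"
if not model_installed(MODEL):
    # Nothing is downloaded automatically; this only tells you what to run.
    print(f"{MODEL} is not installed; run: python -m spacy download {MODEL}")
```

If the message is printed, uncomment and run the download line in the cell above, then restart the kernel so the new package is picked up.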


## Imports  <a id='imports'></a>

In [None]:
from spatio_textual.utils import load_spacy_model, Annotator, save_annotations, load_annotations
from spatio_textual.qa import segment_testimony
from spatio_textual.sentiment import SentimentAnalyzer
from spatio_textual.emotion import EmotionAnalyzer
from spatio_textual.analysis import analyze_records
from spatio_textual.viz import to_geojson, make_map_geojson, build_cooccurrence

import pandas as pd


## Data & Quick Demo  <a id='data-demo'></a>

In [None]:
text = "Anne Frank was taken from Amsterdam to Auschwitz."

nlp = load_spacy_model("en_core_web_sm")
annotator = Annotator(nlp)

# Basic annotation (entities + optional verbs)
rec = annotator.annotate(text, include_verbs=True)
rec.update({"fileId": "example", "segId": 1, "segCount": 1, "text": text})
rec


## Main Tutorial  <a id='main'></a>

### 1) Annotate a list of segments

In [None]:
segments = [text, text]
recs = annotator.annotate_texts(segments, file_id="sample", include_text=True, include_verbs=True)
# One row per segment: how many entities and verbs were found
pd.DataFrame([
    {"segId": r["segId"], "entities": len(r["entities"]), "verbs": len(r["verb_data"])}
    for r in recs
])


### 2) Q↔A-aware segmentation of testimony with `qa.py`

In [None]:
raw = f"Q: {text}\nA: I was separated from my family in Amsterdam."
qa_segments = segment_testimony(raw, nlp=nlp)
qa_segments[:3]


In [None]:
# Annotate QA segments
qa_texts = [s.text for s in qa_segments]
qa_recs = annotator.annotate_texts(qa_texts, file_id="testimony", include_text=True)
# Attach QA metadata
for r, s in zip(qa_recs, qa_segments):
    r.update({
        "role": s.role,
        "turnId": s.turn_id,
        "isQuestion": s.is_question,
        "isAnswer": s.is_answer,
        "qaPairId": s.qa_pair_id,
    })
# Tabulate the QA-related fields for a quick look
cols = ["segId", "role", "isQuestion", "isAnswer", "text"]
qa_df = pd.DataFrame([{k: r.get(k) for k in cols} for r in qa_recs])
qa_df
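
Once the QA metadata is attached, a common next step is to put each question next to its answer. The sketch below works on plain dicts and assumes that segments from the same exchange share a `qaPairId`; that is a guess about what the field means, not documented behaviour. The `sample` records are hand-written stand-ins for `qa_recs`, and `pair_questions_answers` is a hypothetical helper.

```python
from collections import defaultdict


def pair_questions_answers(records):
    """Group segment texts by qaPairId into one question/answer string each."""
    pairs = defaultdict(lambda: {"question": [], "answer": []})
    for r in records:
        pid = r.get("qaPairId")
        if pid is None:
            continue  # segment does not belong to any Q/A pair
        if r.get("isQuestion"):
            pairs[pid]["question"].append(r["text"])
        elif r.get("isAnswer"):
            pairs[pid]["answer"].append(r["text"])
    # Join multi-segment turns back into a single string per side
    return {
        pid: {side: " ".join(texts) for side, texts in parts.items()}
        for pid, parts in pairs.items()
    }


# Hand-written stand-ins for qa_recs (field values are illustrative only)
sample = [
    {"qaPairId": 0, "isQuestion": True, "isAnswer": False,
     "text": "Anne Frank was taken from Amsterdam to Auschwitz."},
    {"qaPairId": 0, "isQuestion": False, "isAnswer": True,
     "text": "I was separated from my family in Amsterdam."},
]
pairs = pair_questions_answers(sample)
```

With real data, passing `qa_recs` instead of `sample` should give one entry per pair, provided the assumption about `qaPairId` holds.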


### 3) Save & reload

In [None]:
save_annotations(qa_recs, "entity_demo.jsonl")
df = load_annotations("entity_demo.jsonl")
df.head()


## Tips & Troubleshooting  <a id='tips'></a>
- If a spaCy model is missing, run the download cell in *Setup*.
- If you use custom pattern lists, pass the same `--resources-dir` on every run so results stay comparable.
- For very large corpora, prefer JSONL and use chunked processing.
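
The last tip can be sketched with the standard library alone. JSONL stores one JSON object per line, so a file can be read in fixed-size batches without loading it all into memory. `iter_jsonl_chunks` is a hypothetical helper, not part of `spatio_textual`, and the demo file is written to a temporary directory.

```python
import json
import os
import tempfile
from itertools import islice


def iter_jsonl_chunks(path, chunk_size=1000):
    """Yield lists of up to `chunk_size` records from a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        # Lazily parse non-empty lines; nothing is read until requested
        records = (json.loads(line) for line in fh if line.strip())
        while True:
            chunk = list(islice(records, chunk_size))
            if not chunk:
                break
            yield chunk


# Demo: write five tiny records, then read them back two at a time
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for i in range(5):
            fh.write(json.dumps({"segId": i + 1}) + "\n")
    sizes = [len(chunk) for chunk in iter_jsonl_chunks(path, chunk_size=2)]
```

Each chunk can then be processed and discarded before the next one is read, which keeps memory use roughly proportional to `chunk_size` rather than to the size of the corpus.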


## Summary  <a id='summary'></a>
We covered single-text and list annotation, Q↔A-aware testimony segmentation with sentence-safe splits and per-segment metadata, and saving and reloading annotations as JSONL.