# Annotating Historical Texts (Spacy)

**Task Description:**
- Develop annotation guidelines for named entities in your chosen historical text type.
- Annotate the sample text according to your guidelines (manually annotate 5 person names from the text).
- Calculate inter-annotator agreement (if you have multiple annotators) using a metric like Cohen's kappa.

**Key points to discuss:**
- The effectiveness of combining manual annotation with automated detection. - - This approach allows us to leverage both human expertise and the model's capabilities.
- The potential for the model to identify names that a human might miss, especially in longer texts.
- The possibility of false positives in the model's output. Are there any detected "names" that aren't actually person names?
- The challenges of historical named entity recognition, such as archaic names or titles that modern models might not recognize.
- The importance of context in identifying person names in historical texts.
- This approach provides a good balance between manual annotation and automated detection, allowing for a more comprehensive identification of person names in the text. It can be easily expanded to include other types of named entities or applied to larger historical texts.

In [1]:
import spacy

# Sample historical text (snippet from the Anglo-Saxon Chronicle)
historical_text = """
A.D. 871. This year came the army to Reading in Wessex; and in the course of three
nights after rode two earls up, who were met by Alderman Ethelwulf at Englefield;
where he fought with them, and obtained the victory. There one of them was slain,
whose name was Sidrac. About four nights after this, King Ethelred and Alfred his
brother led their main army to Reading, where they fought
NameError: name 'annotations_1' is not definedwith the enemy; and there
was much slaughter on either hand, Alderman Ethelwulf being among the slain; but
the Danes kept possession of the field.
"""

print("Sample Historical Text:")
print(historical_text)

# Manual annotation of a few person names
annotated_names = ["Ethelwulf", "Sidrac", "Ethelred", "Alfred"]

print("\nManually Annotated Names:")
for name in annotated_names:
    print(name)

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Process the text
doc = nlp(historical_text)

# Detect person names
detected_names = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]

print("\nNames Detected by the Model:")
for name in detected_names:
    print(name)

# Compare manual annotations with model detections
correctly_detected = set(annotated_names) & set(detected_names)
missed = set(annotated_names) - set(detected_names)
extra = set(detected_names) - set(annotated_names)

print("\nEvaluation:")
print(f"Correctly detected: {correctly_detected}")
print(f"Missed: {missed}")
print(f"Extra detections: {extra}")

# Calculate simple metrics
precision = len(correctly_detected) / len(detected_names) if detected_names else 0
recall = len(correctly_detected) / len(annotated_names) if annotated_names else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"\nPrecision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

Sample Historical Text:

A.D. 871. This year came the army to Reading in Wessex; and in the course of three
nights after rode two earls up, who were met by Alderman Ethelwulf at Englefield;
where he fought with them, and obtained the victory. There one of them was slain,
whose name was Sidrac. About four nights after this, King Ethelred and Alfred his
brother led their main army to Reading, where they fought
NameError: name 'annotations_1' is not definedwith the enemy; and there
was much slaughter on either hand, Alderman Ethelwulf being among the slain; but
the Danes kept possession of the field.


Manually Annotated Names:
Ethelwulf
Sidrac
Ethelred
Alfred

Names Detected by the Model:
Sidrac
Alfred
annotations_1

Evaluation:
Correctly detected: {'Sidrac', 'Alfred'}
Missed: {'Ethelred', 'Ethelwulf'}
Extra detections: {'annotations_1'}

Precision: 0.67
Recall: 0.50
F1 Score: 0.57
