<a href="https://colab.research.google.com/github/IgnatiusEzeani/spatio-textual/blob/main/spatio_textual_package_a_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introducing the `spatio-textual` Python package

**spatio-textual** is a Python package for spatial entity recognition and verb relation extraction from text. It was created by the [Spatial Narratives Project](https://spacetimenarratives.github.io/) to support spatio-textual annotation, analysis and visualization in digital humanities projects, with initial applications to:

- the *Corpus of Lake District Writing* (CLDW)
- Holocaust survivors' testimonies (e.g., USC Shoah Foundation archives)

This package leverages spaCy and gazetteer-based classification to identify and label spatial entities such as cities, countries, camps, and geographic nouns, and also extracts action-verb contexts involving these entities.
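
To make the idea of gazetteer-based classification concrete, here is a minimal sketch of a lookup that maps an entity span to a label. It is not the package's implementation: the `GAZETTEERS` dictionary, its tiny word lists, and the `classify` function are hypothetical names invented for illustration, and the fallback to `PLACE` is an assumption based on the tag table later in this notebook.

```python
# Hypothetical mini-gazetteers for illustration only.
# The real package uses its own, much larger lists.
GAZETTEERS = {
    "COUNTRY": {"germany", "poland"},
    "CITY": {"berlin", "krakow"},
    "CAMP": {"auschwitz", "plaszow"},
    "GEONOUN": {"valley", "moor"},
}


def classify(span, default="PLACE"):
    """Return the first gazetteer label whose list contains `span`.

    Matching is case-insensitive; unmatched spans get `default`.
    """
    key = span.strip().lower()
    for label, names in GAZETTEERS.items():
        if key in names:
            return label
    return default


for span in ["Krakow", "Plaszow", "valley", "Windermere"]:
    print(span, "->", classify(span))
```

In a pipeline of this kind, a model such as spaCy would first propose candidate spans, and a lookup like this would then refine their labels.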


---
## Setting up
Download the `en_core_web_trf` spaCy model and install the `spatio-textual` package.

**_Note:_** *Please wait; this may take about 2 minutes* 🕐


In [None]:
!python -m spacy download en_core_web_trf
!pip install -q git+https://github.com/SpaceTimeNarratives/spatio-textual.git
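
If you want to confirm that the setup succeeded before moving on, a quick check like the one below may help. It is a sketch that uses only the standard library; the helper name `is_installed` is made up for this example, and it assumes the spaCy model is importable under the name `en_core_web_trf`, which is how models installed via `spacy download` are normally packaged.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of each dependency without importing it.
for name in ("spacy", "en_core_web_trf", "spatio_textual"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If anything is reported as missing, re-run the setup cell above (in Colab, a runtime restart is sometimes needed after installation).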

---
## Importing the `spatio-textual` package
Once the spaCy model has been downloaded and the `spatio-textual` package installed, the package can be imported and used in a Python environment to process text.

*Again, this may take about a minute, sorry...*

In [None]:
import spatio_textual

The `spatio-textual` package has an `annotate` module whose `annotate_text` function identifies and labels spatial entities. We can import the function directly, as below...

In [None]:
from spatio_textual.annotate import annotate_text

---
## Annotating spatial entities

Beyond the typical labels for the named entity recognition task [`PERSON`, `ORG`, `LOC`, `DATE`], we have defined a set of entity labels that are relevant for our work as shown below:

| Tag          | Description                                                  |
| ------------ | ------------------------------------------------------------ |
| `PERSON`     | A named person                                               |
| `CONTINENT`  | A continent name (e.g. “Europe”, “Asia”)                     |
| `COUNTRY`    | A country name (e.g. “Germany”, “Czechoslovakia”)            |
| `US-STATE`   | A U.S. state name (e.g. “California”, “New York”)            |
| `CITY`       | A city name (e.g. “Berlin”, “London”), when classified       |
| `CAMP`       | A Holocaust camp name (e.g. “Auschwitz”), from a custom list |
| `PLACE`      | Other place-type entities not matched above                  |
| `GEONOUN`    | Generic geographic nouns (e.g. “valley”, “moor”)             |
| `NON-VERBAL` | Non-verbal markers (e.g. [PAUSES], [LAUGHS]) from a list     |
| `FAMILY`     | Kinship terms (e.g. “mother”, “uncle”)                       |
| `DATE`       | Temporal expressions (e.g. “March 9, 1996”)                  |
| `TIME`       | Time-of-day expressions (e.g. “3 PM”)                        |
| `EVENT`      | Named events (e.g. “D-Day”)                                  |
| `QUANTITY`   | Numeric/measure expressions (e.g. “100 miles”)               |

With the `annotate_text` function, we can now label these entities in a given text, as shown below.

### Annotating an example text

In [None]:
text = """
"During the summer of 1942, my family and I were deported from our home in Krakow to the Plaszow labor camp.
We spent several difficult months there before being transferred to Auschwitz-Birkenau."
"""

result = annotate_text(text)

In the code above, the output of the `annotate_text` function is stored in the variable `result`, which is a dictionary with the keys `'entities'` and `'verb_data'`. We can look at the individual elements in each of them.

In [None]:
#@title ##### Let's look at `'entities'`...
print("Entities:")
display(result['entities'])

As you can see, it contains a list of all entities identified in the text. Each one is a dictionary containing the starting character position, the entity span, and its label.
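
Because each entity is a plain dictionary, ordinary Python is enough to summarise the list. The sketch below groups entity spans by label. It runs on hand-written sample data, not on real package output, and the key names `start_char`, `token` and `tag` are assumptions; check `result['entities'][0].keys()` and adjust the `span_key` and `label_key` arguments if the actual keys differ.

```python
from collections import defaultdict

text = "We were deported from our home in Krakow to the Plaszow labor camp."


def make_entity(span, label):
    """Build a sample entity dict; the key names here are assumed."""
    return {"start_char": text.index(span), "token": span, "tag": label}


# Hand-written sample standing in for result['entities'].
sample_entities = [make_entity("Krakow", "CITY"), make_entity("Plaszow", "CAMP")]


def group_by_label(entities, span_key="token", label_key="tag"):
    """Map each label to the list of entity spans carrying it."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent[label_key]].append(ent[span_key])
    return dict(grouped)


print(group_by_label(sample_entities))

# The start position locates the span in the original text.
for ent in sample_entities:
    start = ent["start_char"]
    assert text[start:start + len(ent["token"])] == ent["token"]
```

With real output, you would call `group_by_label(result['entities'])` instead of passing the sample list.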

In [None]:
#@title ##### Now let's see the `'verb_data'`...

print("\nVerb Data:")
display(result['verb_data'])

## Annotating text from file

You can read the content of a text file for annotation.

The code below downloads the example text file, `example-text`, from the source repo [here](https://github.com/SpaceTimeNarratives/spatio-textual/blob/main/example-text) and annotates it.

In [None]:
#@title ##### Download the example text:
!wget -c -q "https://raw.githubusercontent.com/SpaceTimeNarratives/spatio-textual/refs/heads/main/example-text"

In [None]:
#@title ##### Read and annotate the text:
# Use a context manager so the file is closed after reading
with open("example-text", "r", encoding="utf-8") as f:
    example_text = f.read()
file_result = annotate_text(example_text)

In [None]:
#@title ##### Display annotation results:
print("Entities from file:")
display(file_result['entities'])

print("\nVerb Data from file:")
display(file_result['verb_data'])


Another way to do this is to upload your file using the file icon in the left sidebar. Then replace `"your_file.txt"` with the name of your uploaded file and run the code cell.
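
A cell for this could look like the sketch below. The helper `annotate_file` is a hypothetical convenience function, not part of the package; it simply reads the file and passes its contents to whichever annotation function you give it. The file is assumed to be UTF-8 encoded.

```python
from pathlib import Path


def annotate_file(path, annotate):
    """Read a UTF-8 text file and pass its contents to `annotate`."""
    text = Path(path).read_text(encoding="utf-8")
    return annotate(text)


path = Path("your_file.txt")  # replace with the name of your uploaded file

if path.exists():
    from spatio_textual.annotate import annotate_text

    file_result = annotate_file(path, annotate_text)
    print(file_result["entities"])
    print(file_result["verb_data"])
else:
    print(f"{path} not found; upload it first.")
```

Passing the annotation function as an argument keeps the helper independent of the package, so the same helper works with any function that takes a string.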

## Annotating a list of texts