<a href="https://colab.research.google.com/github/IgnatiusEzeani/spatio-textual-colab-demos/blob/main/demo_2_sentiment_emotions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classifying Sentiment and Emotion with `spatio-textual`

In this demo, we explore the sentiment and emotion classification features within the `spatio-textual` package.

It defaults to a rule-based approach but also supports HuggingFace models and large language models (LLMs).

---

## Setting up

### Downloads
As in the earlier demo, download [spaCy](https://spacy.io/)'s NLP model, `en_core_web_trf`, and install the `spatio-textual` package.

In [None]:
!python -m spacy download en_core_web_trf -q
!pip install -q git+https://github.com/SpaceTimeNarratives/spatio-textual.git

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

### Imports  <a id='imports'></a>
Let's import the necessary tools: `load_spacy_model` and `Annotator` from `spatio_textual.utils`.

We also need `pandas` for working with data frames.

In [None]:
import spatio_textual
from spatio_textual.utils import load_spacy_model, Annotator
import pandas as pd

### Annotating entities

As in Demo 1, we need the `spaCy` model and the `Annotator` class for the spatial entity annotations.

In [None]:
#@title ###### Use `spaCy` nlp model to instantiate `Annotator`
nlp = load_spacy_model("en_core_web_trf")
ann = Annotator(nlp)

In [None]:
#@title ###### Consider these example texts
texts = [
    "I felt safe and relieved when we reached the farmhouse.",
    "We were afraid, hungry, and cold during the march.",
    "They asked us questions.",
]

In [None]:
#@title ###### We start by annotating the `entities` (see [Demo 1](https://github.com/SpaceTimeNarratives/spatio-textual-colab-demos/blob/main/demo_1_entity_annotation.ipynb))
entities = ann.annotate_texts(
    texts,
    file_id="sent_demo",  # Use what is relevant for your work
    include_text=True,    # Lets you include the text in the result
    include_verbs=True)   # Lets you extract verbs
entities

[{'entities': [{'start_char': 45, 'token': 'farmhouse', 'tag': 'GEONOUN'}],
  'verb_data': [{'sent-id': 0,
    'verb': 'felt',
    'subject': 'I',
    'object': '',
    'sentence': 'I felt safe and relieved when we reached the farmhouse.'},
   {'sent-id': 0,
    'verb': 'reached',
    'subject': 'we',
    'object': 'farmhouse',
    'sentence': 'I felt safe and relieved when we reached the farmhouse.'}],
  'fileId': 'sent_demo',
  'segId': 1,
  'text': 'I felt safe and relieved when we reached the farmhouse.',
  'segCount': 3},
 {'entities': [],
  'verb_data': [],
  'fileId': 'sent_demo',
  'segId': 2,
  'text': 'We were afraid, hungry, and cold during the march.',
  'segCount': 3},
 {'entities': [],
  'verb_data': [{'sent-id': 0,
    'verb': 'asked',
    'subject': 'They',
    'object': 'questions',
    'sentence': 'They asked us questions.'}],
  'fileId': 'sent_demo',
  'segId': 3,
  'text': 'They asked us questions.',
  'segCount': 3}]

In [None]:
pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data"]} for row in entities])

| | segId | text | entities | verb_data |
|---|---|---|---|---|
| 0 | 1 | I felt safe and relieved when we reached the f... | [{'start_char': 45, 'token': 'farmhouse', 'tag... | [{'sent-id': 0, 'verb': 'felt', 'subject': 'I'... |
| 1 | 2 | We were afraid, hungry, and cold during the ma... | [] | [] |
| 2 | 3 | They asked us questions. | [] | [{'sent-id': 0, 'verb': 'asked', 'subject': 'T... |


---
## Adding Sentiment

Now we need the `SentimentAnalyzer` class from `spatio_textual.sentiment`. Its backend supports three distinct approaches to assigning sentiment to text:
- `rule`: uses a **rule-based** method with a sentiment lexicon to estimate a sentiment score for the text.
- `hf`: uses **HuggingFace** models via the `sentiment-analysis` pipeline.
- `llm`: uses large language models (**LLMs**) and supports some of the common providers and models:
  - providers: `openai`, `anthropic`, `google`, `groq`, `xai`, `ollama`
  - models: `gpt-4o-mini`, `claude-3-5-sonnet-20240620`, `gemini-1.5-pro`, `llama3:8b`

### 1. Rule-based Sentiment Analysis


This approach is quite basic. The key steps include:
1. Split the text into words (lowercase; strip punctuation).
2. Count how many words are in a **positive** lexicon and how many are in a **negative** lexicon.
    * `pos = #positive words in text`
    * `neg = #negative words in text`

3. Compute a raw balance:
    * `raw = pos − neg`.
4. Convert that raw number into a bounded score in $[-1, 1]$ using a smooth squash (`tanh`):

$$
\text{score} = \tanh\!\left(\frac{\text{raw}}{3}\right) \in [-1, 1]
$$


5. Assign a label based on the score:
$$
\text{label} =
\begin{cases}
\text{positive} & \text{if } \text{score} > 0.15 \\
\text{negative} & \text{if } \text{score} < -0.15 \\
\text{neutral} & \text{otherwise}
\end{cases}
$$

**Quick examples:**

* 2 positive words, 1 negative: $\text{raw}=1\Rightarrow\text{score}=\tanh(1/3)\approx 0.32\Rightarrow$ **positive**.
* 1 positive, 1 negative: $\text{raw}=0\Rightarrow\text{score}=0\Rightarrow$ **neutral**.
* 0 positive, 3 negative: $\text{raw}=-3\Rightarrow\text{score}=\tanh(-1)\approx -0.76\Rightarrow$ **negative**.
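
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the counting, `tanh` squash and thresholding described here, not the package's implementation: the two lexicons and the function name `rule_sentiment` are hypothetical, and the real lexicons are much larger.

```python
import math
import string

# Hypothetical toy lexicons, for illustration only (not the package's lexicons).
POSITIVE = {"safe", "relieved", "happy", "welcomed"}
NEGATIVE = {"afraid", "hungry", "cold", "terrified"}


def rule_sentiment(text, scale=3.0, threshold=0.15):
    """Count lexicon hits, squash the balance with tanh, then threshold."""
    # Lowercase, strip punctuation, and split on whitespace.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()

    pos = sum(tok in POSITIVE for tok in tokens)   # number of positive words
    neg = sum(tok in NEGATIVE for tok in tokens)   # number of negative words
    score = math.tanh((pos - neg) / scale)         # bounded score

    if score > threshold:
        label = "positive"
    elif score < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return {"label": label, "score": score}


# Two positive hits and one negative hit give raw = 1, as in the first quick example.
print(rule_sentiment("Safe and relieved, but still afraid."))
```

Dividing by 3 before applying `tanh` keeps a single matched word from saturating the score, so one hit lands near ±0.32 and three hits near ±0.76.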


In [None]:
#@title ###### So let's import `SentimentAnalyzer`...
from spatio_textual.sentiment import SentimentAnalyzer

In [None]:
#@title ###### ... and then classify the example...
sa = SentimentAnalyzer("rule")
sentiment_scores = sa.predict(texts)
sentiment_scores

[{'label': 'positive', 'score': 0.32151273753163434},
 {'label': 'negative', 'score': -0.5827829453479102},
 {'label': 'neutral', 'score': 0.0}]

In [None]:
#@title ###### ...and combine it with `entities`.
results = entities
for r, p in zip(results, sentiment_scores):
    r.update({"sentiment_label": p["label"], "sentiment_score": p["score"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","sentiment_label","sentiment_score"
    ]} for row in results])

| | segId | text | entities | verb_data | sentiment_label | sentiment_score |
|---|---|---|---|---|---|---|
| 0 | 1 | I felt safe and relieved when we reached the f... | [{'start_char': 45, 'token': 'farmhouse', 'tag... | [{'sent-id': 0, 'verb': 'felt', 'subject': 'I'... | positive | 0.321513 |
| 1 | 2 | We were afraid, hungry, and cold during the ma... | [] | [] | negative | -0.582783 |
| 2 | 3 | They asked us questions. | [] | [{'sent-id': 0, 'verb': 'asked', 'subject': 'T... | neutral | 0.0 |


### 2. Sentiment Analysis with a transformer model

To use the HuggingFace pipeline as the sentiment backend, pass `"hf"` when initialising the `SentimentAnalyzer` object.

In [None]:
sa = SentimentAnalyzer("hf")

The default model is [CardiffNLP](https://cardiffnlp.github.io/)'s [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest), but you can pass any other sentiment model hosted on HuggingFace, e.g.


>```python
>sa = SentimentAnalyzer("hf", model_name="siebert/sentiment-roberta-large-english")
>```

In [None]:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    hf_sentiment_scores = sa.predict(texts)

hf_sentiment_scores

In [None]:
#@title ###### As earlier, we can combine the results with the extracted `entities`.
results = entities
for r, p in zip(results, hf_sentiment_scores):
    r.update({"hf_sentiment_label": p["label"], "hf_sentiment_score": p["score"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","hf_sentiment_label","hf_sentiment_score"
    ]} for row in results])

> Note that, unlike the rule-based scores, which are signed to indicate positive or negative sentiment, the HuggingFace (transformer) scores are probabilities indicating how confident the model is in its label.

The `spatio-textual` package allows us to convert the scores to signed values if required by setting the `include_signed` parameter to `True`.

In [None]:
#@title ###### So let's try that...
hf_sentiment_scores = sa.predict(texts, include_signed=True)
hf_sentiment_scores
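
To make the idea of a signed score concrete, here is a minimal sketch of one plausible conversion from a label and confidence pair to a signed value. How the package derives `signed` is not shown in this demo, so the convention below (positive keeps the confidence, negative flips its sign, anything else maps to 0) is an assumption, and both the helper name `to_signed` and the example values are made up for illustration.

```python
def to_signed(label, score):
    """Map a (label, confidence) pair to a signed score in [-1, 1].

    Assumed convention, not necessarily what spatio-textual does.
    """
    sign = {"positive": 1.0, "negative": -1.0}.get(label.lower(), 0.0)
    return sign * score


# Illustrative values only, not real model output.
examples = [
    {"label": "positive", "score": 0.9},
    {"label": "negative", "score": 0.8},
    {"label": "neutral", "score": 0.6},
]
for ex in examples:
    print(ex["label"], to_signed(ex["label"], ex["score"]))
```

A signed value of this kind puts transformer results on the same axis as the rule-based scores, which makes the two backends easier to compare side by side.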

In [None]:
#@title ###### ...and of course combine it with `entities`.
results = entities
for r, p in zip(results, hf_sentiment_scores):
    r.update({"hf_sentiment_label": p["label"], "hf_sentiment_score": p["score"],
              "hf_signed_score": p["signed"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","hf_sentiment_label",
    "hf_sentiment_score", "hf_signed_score"
    ]} for row in results])

### 3. LLM-based Sentiment Analysis

`spatio_textual.sentiment` has built-in LLM support for these providers and models:
- **openai**: `gpt-4o-mini`
- **anthropic**: `claude-3-5-sonnet-20240620`
- **google**: `gemini-1.5-pro`
- **groq**: `llama3-70b-8192` (or mixtral, etc.)
- **xai**: `grok-beta` (OpenAI-compatible; use `base_url="https://api.x.ai"`)
- **ollama**: `llama3:8b` (local)


We will need `LLMRouter` from the `spatio_textual.llm` module to define the LLM provider (e.g. `openai`), the specific model (e.g. `gpt-4o-mini`), the API key and other parameters.
> You can store your API keys on Colab for easy access, or paste the key when prompted in this demo.

In [None]:
#@title ###### So let's import `LLMRouter` and set the API key.
from spatio_textual.llm import LLMRouter
from google.colab import userdata

try:
    api_key = userdata.get('OPENAI_API_KEY')
except Exception:  # secret not found, or notebook access to it not granted
    api_key = input('API KEY: ')

In [None]:
#@title ###### We can now set up the router and instantiate `SentimentAnalyzer` with `llm` for the backend.
router = LLMRouter(provider="openai", model="gpt-4o-mini",
    api_key= api_key, # else OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
    # base_url="https://api.x.ai",  # for OpenAI-compatible endpoints like xAI/Together
    temperature=0.0,
    max_tokens=64,
)
sa = SentimentAnalyzer(backend="llm", llm_fn=router.sentiment)

In [None]:
#@title ###### Now we are ready to predict texts.
llm__sentiment_scores = sa.predict(texts)
llm__sentiment_scores

In [None]:
#@title ###### Oh 🤔, we probably need the signed scores as well.
llm__sentiment_scores = sa.predict(texts, include_signed=True)
llm__sentiment_scores

In [None]:
#@title ###### Combining it with `entities`...
results = entities
for r, p in zip(results, llm__sentiment_scores):
    r.update({"llm_sentiment_label": p["label"], "llm_sentiment_score": p["score"],
              "llm_signed_score": p["signed"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","llm_sentiment_label",
    "llm_sentiment_score", "llm_signed_score"
    ]} for row in results])

---
## Adding Emotion

### Expanding `texts` with Emotion Classes

In [None]:
#@title ###### Let's expand `texts` a bit to reflect other emotion classes
texts = [
    # JOY (positive valence)
    "I felt safe and relieved when we reached the farmhouse.",
    "We were reunited with my sister and welcomed inside.",

    # SURPRISE (slightly positive/neutral valence)
    "Suddenly, the guards announced a change we did not expect.",
    "To my surprise, the train stopped before dawn.",

    # SADNESS (negative valence)
    "I cried for days after the loss of my friend.",
    "We mourned in silence, thinking about those who were gone.",

    # FEAR (negative valence)
    "We were afraid, hungry, and cold during the march.",
    "I was terrified when the sirens sounded across the camp.",

    # ANGER (negative valence)
    "I was furious at the cruelty we faced.",
    "He spoke with rage about the injustice they suffered.",

    # DISGUST (negative valence)
    "We were disgusted by the filth in the barracks.",
    "The stench made us nauseated and we looked away.",

    # NEUTRAL (baseline)
    "They asked us questions.",
    "We walked along the road and waited in line.",
]

In [None]:
#@title ###### As in the sentiment example, we want to annotate the `entities` in `texts`
entities = ann.annotate_texts(
    texts,
    include_text=True,    # Lets you include the text in the result
    include_verbs=True)   # Lets you extract verbs
entities

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data"]} for row in entities])

Now, let's label each text with one of the emotion classes.

We will start by importing the `EmotionAnalyzer` from `spatio_textual`'s `emotion` module.

In [None]:
from spatio_textual.emotion import EmotionAnalyzer

### 1. Rule-based Emotion Analysis

**How it works**

1. **Tokenize** the text into lowercase words (strip punctuation).
2. **Count matches** against each emotion lexicon: `joy`, `surprise`, `sadness`, `fear`, `anger`, `disgust`.
   A `neutral` bucket is kept as baseline (no counts).
3. **Convert counts to probabilities** with a softmax. If no matches at all, set `neutral = 1`.
4. **Pick the label** as the emotion with the highest probability; the **score** is that top probability.
5. Optionally compute a **valence** (signed score in $[-1,1]$) by weighting the distribution with fixed positivity/negativity weights.

---

**Notation:**

Let the emotion set be
$\mathcal{E}=\{\text{neutral},\ \text{joy},\ \text{surprise},\ \text{sadness},\ \text{fear},\ \text{anger},\ \text{disgust}\}.$

**Counts (per emotion $e$)**
$c_e=\#\{\text{tokens matching lexicon for } e\},\quad c_{\text{neutral}}=0.$

**Distribution $d_e$ (probabilities)**

* If $\sum_{e\neq \text{neutral}} c_e = 0$:
  $d_{\text{neutral}}=1,\quad d_{e}=0\ \text{for } e\neq \text{neutral}.$
* Else (softmax over counts):
  $d_e=\frac{\exp(c_e)}{\sum\limits_{e'\in\mathcal{E}}\exp(c_{e'})}.$

**Label & confidence**

$$
\text{label}=\arg\max_{e\in\mathcal{E}} d_e,\qquad
\text{score}=\max_{e\in\mathcal{E}} d_e.
$$
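
The counting, softmax and arg-max steps can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: the lexicons are tiny hypothetical stand-ins, and `emotion_distribution` and `classify_emotion` are names invented for this example.

```python
import math
import string

EMOTIONS = ["neutral", "joy", "surprise", "sadness", "fear", "anger", "disgust"]

# Hypothetical toy lexicons, for illustration only (not the package's lexicons).
LEXICONS = {
    "joy": {"safe", "relieved", "welcomed"},
    "surprise": {"suddenly", "surprise"},
    "sadness": {"cried", "mourned", "loss"},
    "fear": {"afraid", "terrified"},
    "anger": {"furious", "rage"},
    "disgust": {"disgusted", "stench", "filth"},
}


def emotion_distribution(text):
    """Return d_e for every emotion e, following the two cases above."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()

    # c_e: lexicon hits per emotion; the neutral bucket always stays at 0.
    counts = {e: 0 for e in EMOTIONS}
    for emotion, lexicon in LEXICONS.items():
        counts[emotion] = sum(tok in lexicon for tok in tokens)

    # No matches at all: put all probability mass on neutral.
    if sum(counts.values()) == 0:
        return {e: (1.0 if e == "neutral" else 0.0) for e in EMOTIONS}

    # Otherwise apply a softmax over the counts (neutral contributes exp(0) = 1).
    exps = {e: math.exp(c) for e, c in counts.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def classify_emotion(text):
    """Pick the arg-max emotion; the score is its probability."""
    dist = emotion_distribution(text)
    label = max(dist, key=dist.get)
    return {"label": label, "score": dist[label], "distribution": dist}


print(classify_emotion("I felt safe and relieved when we reached the farmhouse."))
```

Because every emotion with zero hits still contributes $\exp(0)=1$ to the denominator, a single match gives a fairly modest top probability; more matches for the same emotion sharpen the distribution.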

---

**Optional: valence (signed score in $[-1,1]$)**

Choose fixed weights $w(e)$:

$$
w(\text{neutral})=0,\quad w(\text{joy})=1,\quad w(\text{surprise})=0.2,\quad w(\text{sadness})=-1,\quad w(\text{fear})=-0.8,\quad w(\text{anger})=-0.7,\quad w(\text{disgust})=-0.7
$$

Then

$$
\text{signed}=\sum_{e\in\mathcal{E}} d_e\,w(e)\ \in [-1,1]
$$

(Clip to $[-1,1]$ if rounding pushes the value out of range.)

**Optional coarse label from valence** (mirrors the sentiment thresholds):

$$
\text{coarse label} =
\begin{cases}
\text{positive} & \text{if } \text{signed} > 0.15 \\
\text{negative} & \text{if } \text{signed} < -0.15 \\
\text{neutral} & \text{otherwise}
\end{cases}
$$
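
As a small sketch of the valence computation, the snippet below applies the fixed weights listed above to a probability distribution and then thresholds the result at ±0.15. The function names `valence` and `coarse_label` and the example distribution are hypothetical, chosen only for illustration.

```python
# Fixed valence weights w(e), as listed above.
WEIGHTS = {
    "neutral": 0.0,
    "joy": 1.0,
    "surprise": 0.2,
    "sadness": -1.0,
    "fear": -0.8,
    "anger": -0.7,
    "disgust": -0.7,
}


def valence(dist):
    """Weighted sum of the distribution, clipped to [-1, 1]."""
    signed = sum(prob * WEIGHTS[emotion] for emotion, prob in dist.items())
    return max(-1.0, min(1.0, signed))


def coarse_label(signed, threshold=0.15):
    """Collapse a signed valence into positive / negative / neutral."""
    if signed > threshold:
        return "positive"
    if signed < -threshold:
        return "negative"
    return "neutral"


# Made-up distribution: mostly fear, with the remainder on neutral.
dist = {"fear": 0.6, "neutral": 0.4}
signed = valence(dist)
print(signed, coarse_label(signed))
```

Since the probabilities sum to 1 and every weight lies in $[-1,1]$, the weighted sum cannot leave that range in exact arithmetic; the clip only guards against floating-point drift.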

---

**Quick examples:**

- “**I felt safe and relieved when we reached the farmhouse.**”
  $c_{\text{joy}}>0$, others $\approx 0$ → $d_{\text{joy}}$ highest → **label = joy**,
  **signed** $\approx +d_{\text{joy}}$ (positive).

- “**We were afraid, hungry, and cold during the march.**”
  $c_{\text{fear}}>0$ → $d_{\text{fear}}$ highest → **label = fear**,
  **signed** $\approx -0.8\times d_{\text{fear}}$ (negative).

- “**They asked us questions.**”
  No matches → **neutral** distribution → **label = neutral**, **signed = 0**.


In [None]:
# Rule backend with valence and distribution
emo = EmotionAnalyzer(backend="rule")
emotion_scores = emo.predict(texts, include_signed=True)

In [None]:
#@title ###### Annotate `entities` and add emotion scores...
results = entities
for r, p in zip(results, emotion_scores):
    # Store the signed valence under its own key so it does not overwrite the confidence score.
    r.update({"emotion_label": p["label"], "emotion_score": p["score"],
              "emotion_signed_score": p["signed"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","emotion_label",
    "emotion_score", "emotion_signed_score"
    ]} for row in results])

## Tips & Troubleshooting  <a id='tips'></a>
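- The rule backend runs offline and returns results immediately, but it is simplistic; the HF and LLM backends provide richer signals at the cost of a model download or API calls.
- Keep inputs as short segments (a sentence or two) for better classifier performance.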


## Summary  <a id='summary'></a>
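You ran sentiment classification with the rule backend, saw how to plug in a HuggingFace pipeline or an LLM provider instead, and labelled texts with rule-based emotion classes and a signed valence score.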
