<a href="https://colab.research.google.com/github/IgnatiusEzeani/spatio-textual-colab-demos/blob/main/demo_2_sentiment_emotions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classifying Sentiment and Emotion with `spatio-textual`

In this demo, we explore the sentiment classification and analysis features within the `spatio-textual` package.

It defaults to a rule-based approach but also supports HuggingFace transformer models and large language models (LLMs).

---

## Setting up

### Downloads
As in Demo 1, download the `spaCy` model and install the `spatio-textual` package.

In [1]:
!python -m spacy download en_core_web_trf
!pip install -q git+https://github.com/SpaceTimeNarratives/spatio-textual.git


### Imports  <a id='imports'></a>
Let's import the necessary tools: `load_spacy_model` and `Annotator` from `spatio_textual.utils`.

We also need `pandas` for working with data frames.

In [2]:
import spatio_textual
from spatio_textual.utils import load_spacy_model, Annotator
import pandas as pd

## Annotating entities

As in Demo 1, we need the `spaCy` model and the `Annotator` module for the spatial entity annotations.

In [3]:
#@title ###### Use `spaCy` to instantiate `Annotator`
nlp = load_spacy_model("en_core_web_trf")
ann = Annotator(nlp)

In [4]:
#@title ###### Using example texts
texts = [
    "I felt safe and relieved when we reached the farmhouse.",
    "We were afraid, hungry, and cold during the march.",
    "They asked us questions.",
]

In [5]:
#@title ###### Annotating texts
entities = ann.annotate_texts(
    texts,
    file_id="sent_demo",  # Use what is relevant for your work
    include_text=True,    # Lets you include the text in the result
    include_verbs=True)   # Lets you extract verbs
entities

[{'entities': [{'start_char': 45, 'token': 'farmhouse', 'tag': 'GEONOUN'}],
  'verb_data': [{'sent-id': 0,
    'verb': 'felt',
    'subject': 'I',
    'object': '',
    'sentence': 'I felt safe and relieved when we reached the farmhouse.'},
   {'sent-id': 0,
    'verb': 'reached',
    'subject': 'we',
    'object': 'farmhouse',
    'sentence': 'I felt safe and relieved when we reached the farmhouse.'}],
  'fileId': 'sent_demo',
  'segId': 1,
  'text': 'I felt safe and relieved when we reached the farmhouse.',
  'segCount': 3},
 {'entities': [],
  'verb_data': [],
  'fileId': 'sent_demo',
  'segId': 2,
  'text': 'We were afraid, hungry, and cold during the march.',
  'segCount': 3},
 {'entities': [],
  'verb_data': [{'sent-id': 0,
    'verb': 'asked',
    'subject': 'They',
    'object': 'questions',
    'sentence': 'They asked us questions.'}],
  'fileId': 'sent_demo',
  'segId': 3,
  'text': 'They asked us questions.',
  'segCount': 3}]

In [6]:
pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data"]} for row in entities])

|   | segId | text | entities | verb_data |
|---|---|---|---|---|
| 0 | 1 | I felt safe and relieved when we reached the f... | [{'start_char': 45, 'token': 'farmhouse', 'tag... | [{'sent-id': 0, 'verb': 'felt', 'subject': 'I'... |
| 1 | 2 | We were afraid, hungry, and cold during the ma... | [] | [] |
| 2 | 3 | They asked us questions. | [] | [{'sent-id': 0, 'verb': 'asked', 'subject': 'T... |


---
## Adding Sentiment

Now we need the `SentimentAnalyzer` class from `spatio_textual.sentiment`. Its backend supports three distinct approaches to assigning sentiment to text:
- `rule`: uses a **rule-based** method with a sentiment lexicon to estimate a sentiment score for the text.
- `hf`: uses **HuggingFace** models via the `transformers` `sentiment-analysis` pipeline.
- `llm`: uses **large language models (LLMs)** and supports some of the common providers and models:
  - providers: `openai`, `anthropic`, `google`, `groq`, `xai`, `ollama`
  - models: `gpt-4o-mini`, `claude-3-5-sonnet-20240620`, `gemini-1.5-pro`, `llama3:8b`

### 1. Rule-based Sentiment Analysis

This approach is deliberately basic. The sentiment score is easy to read: positive values indicate positive sentiment, negative values indicate negative sentiment, and zero (or values very close to zero) indicates neutral sentiment.
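To make the idea concrete, here is a minimal lexicon-based scorer. It is an illustrative sketch only: the tiny `LEXICON`, the VADER-style normalisation `raw / sqrt(raw^2 + alpha)` and the `0.05` neutral threshold are all assumptions, not the lexicon or formula `spatio-textual` actually uses, so its scores will not match the package's.

```python
import math
import re

# Hypothetical toy lexicon (an assumption, NOT the lexicon used by spatio-textual):
# positive words carry positive weights, negative words negative weights.
LEXICON = {"safe": 1.0, "relieved": 1.5, "afraid": -1.5, "hungry": -1.0, "cold": -0.5}


def rule_sentiment(text, threshold=0.05, alpha=15.0):
    """Sum lexicon weights over the tokens, squash the sum into (-1, 1) and label it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    raw = sum(LEXICON.get(tok, 0.0) for tok in tokens)
    # VADER-style normalisation: keeps the sign, bounds the magnitude below 1
    score = raw / math.sqrt(raw * raw + alpha)
    if score > threshold:
        label = "positive"
    elif score < -threshold:
        label = "negative"
    else:
        label = "neutral"  # zero or near-zero scores count as neutral
    return {"label": label, "score": score}


examples = [
    "I felt safe and relieved when we reached the farmhouse.",
    "We were afraid, hungry, and cold during the march.",
    "They asked us questions.",
]
for t in examples:
    print(rule_sentiment(t))
```

The third example contains no lexicon words, so its raw sum is zero and it is labelled neutral, mirroring how the rule backend treats text without sentiment-bearing words.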

In [7]:
#@title ###### So let's import `SentimentAnalyzer`...
from spatio_textual.sentiment import SentimentAnalyzer

In [8]:
#@title ###### ... and then classify the examples...
sa = SentimentAnalyzer("rule")
sentiment_scores = sa.predict(texts)
sentiment_scores

[{'label': 'positive', 'score': 0.32151273753163434},
 {'label': 'negative', 'score': -0.5827829453479102},
 {'label': 'neutral', 'score': 0.0}]

In [9]:
#@title ###### ...and combine it with the entities
results = [dict(r) for r in entities]  # shallow copies, so `entities` itself is left unchanged
for r, p in zip(results, sentiment_scores):
    r.update({"sentiment_label": p["label"], "sentiment_score": p["score"]})

pd.DataFrame([{k:row.get(k) for k in [
    "segId","text","entities","verb_data","sentiment_label","sentiment_score"
    ]} for row in results])

|   | segId | text | entities | verb_data | sentiment_label | sentiment_score |
|---|---|---|---|---|---|---|
| 0 | 1 | I felt safe and relieved when we reached the f... | [{'start_char': 45, 'token': 'farmhouse', 'tag... | [{'sent-id': 0, 'verb': 'felt', 'subject': 'I'... | positive | 0.321513 |
| 1 | 2 | We were afraid, hungry, and cold during the ma... | [] | [] | negative | -0.582783 |
| 2 | 3 | They asked us questions. | [] | [{'sent-id': 0, 'verb': 'asked', 'subject': 'T... | neutral | 0.0 |
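Because each record keeps its entities alongside the segment-level sentiment, the results can also be flattened to one row per place mention, e.g. to see which places occur in positive or negative passages. A minimal sketch in plain Python; the helper name `entity_sentiment_rows` is hypothetical, and it assumes the record keys shown above (`segId`, `entities`, `sentiment_label`, `sentiment_score`).

```python
def entity_sentiment_rows(records):
    """Return one dict per entity, carrying the sentiment of the segment it occurs in."""
    rows = []
    for rec in records:
        for ent in rec.get("entities", []):
            rows.append({
                "segId": rec.get("segId"),
                "token": ent.get("token"),
                "tag": ent.get("tag"),
                "sentiment_label": rec.get("sentiment_label"),
                "sentiment_score": rec.get("sentiment_score"),
            })
    return rows


# Records trimmed to the fields used here, with values taken from the table above
sample = [
    {"segId": 1,
     "entities": [{"start_char": 45, "token": "farmhouse", "tag": "GEONOUN"}],
     "sentiment_label": "positive", "sentiment_score": 0.321513},
    {"segId": 2, "entities": [],
     "sentiment_label": "negative", "sentiment_score": -0.582783},
]
print(entity_sentiment_rows(sample))
```

In the notebook, `pd.DataFrame(entity_sentiment_rows(results))` would turn this into a table; segments without entities simply contribute no rows.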


### 2. Sentiment Analysis with a Transformer Model

To use the HuggingFace `sentiment-analysis` pipeline as the backend, pass `"hf"` when initialising the `SentimentAnalyzer` object.

The default model is [CardiffNLP](https://cardiffnlp.github.io/)'s [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest), but you can pass any other sentiment model from the HuggingFace Hub (e.g. via the `model_name` argument).

In [10]:
sa = SentimentAnalyzer("hf")

In [12]:
sentiment_scores = sa.predict(texts)
sentiment_scores

[{'label': 'positive', 'score': 0.8691964745521545},
 {'label': 'negative', 'score': 0.8452902436256409},
 {'label': 'neutral', 'score': 0.8955906629562378}]
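Note that these scores appear to mean something different from the rule-based ones: here `score` is the model's confidence in the predicted label (between 0 and 1), which is why the negative example still gets a positive number. To put both backends on one signed scale, one simple convention (an assumption, not something `spatio-textual` does for you) is to negate the confidence for negative labels and map neutral to zero. The helper name `to_signed` is hypothetical.

```python
def to_signed(pred):
    """Map a {label, score} prediction with a 0-1 confidence onto a signed [-1, 1] scale."""
    label = pred["label"].lower()  # handles both "positive" and "POSITIVE" style labels
    if label.startswith("pos"):
        return pred["score"]
    if label.startswith("neg"):
        return -pred["score"]
    return 0.0  # neutral (or any unrecognised label) is treated as zero


# Predictions copied from the output above
hf_preds = [
    {"label": "positive", "score": 0.8691964745521545},
    {"label": "negative", "score": 0.8452902436256409},
    {"label": "neutral", "score": 0.8955906629562378},
]
print([to_signed(p) for p in hf_preds])
```

This discards how confident the model was about a neutral label, which is a deliberate simplification; keep the raw label and score as well if you need them.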

In [None]:
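from transformers import pipeline

# Same model as the default "hf" backend; `hf` is reused in the tutorial below
hf = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
# An alternative, larger English sentiment model (not used further in this notebook)
sentiment_analysis = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")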

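### 3. LLM-based Sentiment Analysis

LLM-based classification is handled by the `LLMRouter` class from `spatio_textual.llm`; a complete example appears in the main tutorial below.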

In [None]:
from spatio_textual.llm import LLMRouter

### Quick Demo  <a id='data-demo'></a>

In [None]:
texts = [
    "I felt safe and relieved when we reached the farmhouse.",
    "We were afraid, hungry, and cold during the march.",
    "They asked us questions.",
]
sa = SentimentAnalyzer("rule")
sa.predict(texts)


### Main Tutorial
#### 1. Annotate + attach sentiment
We can annotate the texts and attach a sentiment score to each record using `SentimentAnalyzer("rule")`, i.e. the default rule-based approach in `spatio-textual`.

In [None]:
recs = ann.annotate_texts(
    texts,
    file_id="sent_demo", # Use what is relevant for your work
    include_text=True, # Lets you include the text in the result
    include_verbs=True) # Lets you extract verbs

sa = SentimentAnalyzer("rule")
preds = sa.predict([r["text"] for r in recs])

for r, p in zip(recs, preds):
    r.update({"sentiment_label": p["label"], "sentiment_score": p["score"]})


#### 2. Using a HuggingFace pipeline
We can also use a transformer-based sentiment analysis model from HuggingFace.

Here we use the [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) model from the [CardiffNLP](https://cardiffnlp.github.io/) team.

In [None]:
recs = ann.annotate_texts(
    texts,
    file_id="sent_demo",  # Use what is relevant for your work
    include_text=True,    # Lets you include the text in the result
    include_verbs=True)   # Lets you extract verbs

hf_sentiments = hf(texts)
for r, p in zip(recs, hf_sentiments):
    r.update({"hf_sentiment_label": p["label"],
              "hf_sentiment_score": p["score"]})

pd.DataFrame([{k:r.get(k) for k in [
    "segId","entities","verb_data","text",
    "hf_sentiment_label","hf_sentiment_score"]}
              for r in recs])

#### 3. Hooking up an LLM for sentiment classification
`spatio-textual` has built-in LLM support for these providers and models:

* **openai**: `gpt-4o-mini`
* **anthropic**: `claude-3-5-sonnet-20240620`
* **google**: `gemini-1.5-pro`
* **groq**: `llama3-70b-8192` (or Mixtral, etc.)
* **xai**: `grok-beta` (OpenAI-compatible; use `base_url="https://api.x.ai"`)
* **ollama**: `llama3:8b` (local)


In [None]:
router = LLMRouter(
    provider="openai",
    model="gpt-4o-mini",
    # API keys are read from env vars (OPENAI_API_KEY / ANTHROPIC_API_KEY /
    # GOOGLE_API_KEY / GROQ_API_KEY) unless you pass one explicitly.
    # Optional overrides:
    # api_key="...",
    # base_url="https://api.x.ai",  # for OpenAI-compatible endpoints like xAI/Together
    temperature=0.0,
    max_tokens=64,
)

# Annotate the texts as before
recs = ann.annotate_texts(
    texts,
    file_id="sent_demo",
    include_text=True,
    include_verbs=True
)

# Drop-in LLM sentiment
llm_sentiments = router.sentiment(texts, rate_limit_s=0.0)

for r, p in zip(recs, llm_sentiments):
    r.update({"llm_sentiment_label": p["label"], "llm_sentiment_score": p["score"]})

pd.DataFrame([{k:r.get(k) for k in [
    "segId","entities","verb_data","text",
    "llm_sentiment_label","llm_sentiment_score"]}
              for r in recs])
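An LLM backend has to turn free-text model replies into the same `{label, score}` shape the other backends return. The sketch below shows one way to do that; the JSON reply format, the `parse_llm_sentiment` name and the fallback rules are assumptions for illustration, not `LLMRouter`'s actual prompt or parser.

```python
import json
import re

LABELS = ("positive", "negative", "neutral")


def parse_llm_sentiment(reply):
    """Normalise an LLM reply into {"label": ..., "score": ...} (hypothetical format)."""
    # Preferred case: the model was asked to answer in JSON,
    # e.g. {"label": "negative", "score": -0.7}
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            label = str(data.get("label", "")).lower()
            if label in LABELS:
                return {"label": label, "score": float(data.get("score", 0.0))}
        except (json.JSONDecodeError, TypeError, ValueError, AttributeError):
            pass  # malformed JSON or score: fall through to keyword matching
    # Fallback: look for a bare label word; no score is available, so use 0.0
    lowered = reply.lower()
    for label in LABELS:
        if re.search(rf"\b{label}\b", lowered):
            return {"label": label, "score": 0.0}
    return {"label": "neutral", "score": 0.0}  # unparseable replies default to neutral


print(parse_llm_sentiment('{"label": "Negative", "score": -0.7}'))
print(parse_llm_sentiment("Neutral."))
```

Setting `temperature=0.0` and a small `max_tokens`, as in the cell above, helps keep replies short and consistent enough for this kind of parsing.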

## Tips & Troubleshooting  <a id='tips'></a>
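- The rule backend runs offline and instantly but is simplistic; the HF and LLM backends give richer signals at the cost of downloading a model or making API calls.
- Keep inputs to short segments (e.g. single sentences or short passages) for better classifier performance.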
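When a source passage is long, one option is to split it into sentence-sized segments, score each, and aggregate. A rough sketch, assuming a naive regex sentence splitter (in the notebook, the loaded spaCy pipeline's `nlp(text).sents` would give better boundaries) and a plain mean as the aggregate. `score_long_text` and `toy_scorer` are hypothetical; `scorer` stands in for any function mapping a list of texts to `{label, score}` dicts, such as `sa.predict`. A mean is only meaningful for signed scores like the rule backend's.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def score_long_text(text, scorer):
    """Score each sentence with `scorer`; return per-sentence results and their mean score."""
    sentences = split_sentences(text)
    preds = scorer(sentences)
    mean = sum(p["score"] for p in preds) / len(preds) if preds else 0.0
    return {"sentences": list(zip(sentences, preds)), "mean_score": mean}


# Toy stand-in scorer for demonstration only (use e.g. sa.predict in the notebook)
def toy_scorer(sentences):
    return [{"label": "negative" if "afraid" in s else "neutral",
             "score": -0.5 if "afraid" in s else 0.0} for s in sentences]


result = score_long_text("They asked us questions. We were afraid.", toy_scorer)
print(result["mean_score"])  # -0.25
```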


## Summary  <a id='summary'></a>
You ran sentiment classification with the rule-based backend, plugged in a HuggingFace pipeline, and routed the same texts through an LLM provider with `LLMRouter`.


### Draft usage examples

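A. Select the LLM provider via environment variables (no code changes beyond `backend="llm"`)

In [None]:
# Example for OpenAI. In a terminal you would run:
#   export LLM_PROVIDER=openai
#   export LLM_MODEL=gpt-4o-mini
#   export OPENAI_API_KEY=sk-...
# Inside a notebook, set the same variables through os.environ instead:
import os
os.environ["LLM_PROVIDER"] = "openai"
os.environ["LLM_MODEL"] = "gpt-4o-mini"
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key

from spatio_textual.sentiment import SentimentAnalyzer
sa = SentimentAnalyzer(backend="llm")
sa.predict(["Anne Frank was taken from Amsterdam to Auschwitz."])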

B. Pass an `LLMRouter` instance explicitly

In [None]:
from spatio_textual.llm import LLMRouter
from spatio_textual.sentiment import SentimentAnalyzer

router = LLMRouter(provider="openai", model="gpt-4o-mini")
sa = SentimentAnalyzer(backend="llm", llm_fn=router)
sa.predict(["Anne Frank was taken from Amsterdam to Auschwitz."])

C. Use a Hugging Face Transformers model

In [None]:
from spatio_textual.sentiment import SentimentAnalyzer
sa = SentimentAnalyzer(backend="hf", model_name="cardiffnlp/twitter-roberta-base-sentiment-latest")
sa.predict(["Anne Frank was taken from Amsterdam to Auschwitz."])