# **Sentiment Analysis of Cancer Metaphors**
---
Metaphors are pervasive in healthcare discourse, especially in oncology, where they shape how patients articulate personal and clinical experiences of illness. Common metaphors such as
⚔️ battle, 🥊 fight, 👹 enemy, 🛣️ journey, 🎢 roller coaster, and 💣 war frame cancer in ways that strongly influence emotions, perceptions, and coping strategies.

---
Understanding sentiment embedded in metaphorical language is essential for:

🤝 Enhancing human-computer interaction

💜 Enabling empathic AI systems

🏥 Improving clinical communication

---

## **RoBERTa**

### **1. Install dependencies**

In [1]:
!pip install transformers
!pip install pytorch_lightning


In [2]:
!pip install sentencepiece



In [3]:
!pip install demoji



In [4]:
!pip install emoji



### **2. Imports & NLTK setup**

In [19]:
import nltk
nltk.download('punkt_tab')
from nltk.tokenize import sent_tokenize
import emoji
import demoji
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
import re

from transformers import pipeline
sentiment_task = pipeline("sentiment-analysis")
sentiment_task("Covid cases are increasing fast!")

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9981315732002258}]

### **3. Data Preprocessing**

In [7]:
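def emoji_free_text(text):
    # Accept bytes or str; decode only when bytes are passed (str has no .decode in Python 3)
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    # emoji.UNICODE_EMOJI was removed in emoji 2.x; emoji.is_emoji() replaces the lookup
    emoji_list = [c for c in text if emoji.is_emoji(c)]
    # Drop every whitespace-separated token that contains an emoji character
    clean_text = ' '.join([tok for tok in text.split() if not any(e in tok for e in emoji_list)])
    return clean_text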

In [8]:
def deEmojify(text):
    regrex_pattern = re.compile(pattern = "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags = re.UNICODE)
    return regrex_pattern.sub(r'',text)
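
As a quick check of the range-based approach in `deEmojify`, the sketch below restates the same four Unicode ranges and applies them to two made-up posts. It suggests that these ranges do not cover every emoji (code points from U+1F900 upward fall outside them), which is presumably why the cleaning step also calls `demoji.replace`. The names `EMOJI_RANGES` and `strip_emoji` are illustrative and not part of the notebook.

```python
import re

# Same four Unicode ranges as deEmojify above (illustrative re-statement).
EMOJI_RANGES = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "]+"
)

def strip_emoji(text: str) -> str:
    """Remove characters that fall inside the ranges listed above."""
    return EMOJI_RANGES.sub("", text)

# U+1F600 (grinning face) is inside the first range and is removed.
covered = strip_emoji("Feeling strong today \U0001F600")
# U+1F92F (exploding head) is outside all four ranges and survives.
not_covered = strip_emoji("Roller coaster week \U0001F92F")
print(repr(covered))
print(repr(not_covered))
```

If full coverage matters, a library-backed check such as `emoji.is_emoji` or `demoji.replace` is likely a safer choice than hand-maintained ranges.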

In [18]:
def filter(post):
  try:
    post = re.sub(r"^https://ibb.co/[a-zA-Z0-9]*\s", " ", post)
    post = re.sub(r"\s+https://ibb.co/[a-zA-Z0-9]*\s", " ", post)
    post = re.sub(r"(https://ibb.co/[a-zA-Z0-9]*)", " ", post)
    post = re.sub(r"(https://www.reddit.com/r/[a-zA-Z0-9]*)", " ", post)
    post = re.sub(r"(https://[a-zA-Z0-9//._@-]*)", " ", post)
    post = re.sub(r"\s+https://ibb.co/[a-zA-Z0-9]*$", " ", post)
    post = re.sub(r"(https://cbcn.bandcamp.com/[a-zA-Z0-9//]*)", " ", post)
    post = re.sub("@[A-Za-z0-9_]+","", post)
    post = re.sub("#[A-Za-z0-9_]+","", post)
    post = re.sub(r'//www.reddit.com/r/[a-zA-Z0-9]*\s', ' ', post)
    # post = clean(post.encode('utf8'), no_emoji=True)
    # post = emoji.replace_emoji(post, replace="")
    post = re.sub(r"that's","that is", post)
    post = re.sub(r"there's","there is", post)
    post = re.sub(r"what's","what is", post)
    post = re.sub(r"where's","where is", post)
    post = re.sub(r"it's","it is", post)
    post = re.sub(r"It's","it is", post)
    post = re.sub(r"I'm","I am", post)
    post = re.sub(r"who's","who is", post)
    post = re.sub(r"i'm","i am", post)
    post = re.sub(r"she's","she is", post)
    post = re.sub(r"he's","he is", post)
    post = re.sub(r"you're","you are", post)
    post = re.sub(r"they're","they are", post)
    post = re.sub(r"who're","who are", post)
    post = re.sub(r"ain't","am not", post)
    post = re.sub(r"wouldn't","would not", post)
    post = re.sub(r"shouldn't","should not", post)
    post = re.sub(r"can't","can not", post)
    post = re.sub(r"couldn't","could not", post)
    post = re.sub(r"won't","will not", post)
    post = re.sub(r"don't","do not", post)
    post = re.sub(r"\s+[s]\s+"," ", post)
    post = re.sub(r"\s*[\[\]\(\)\*#<>\'\":]\s*"," ", post)
    post = re.sub(r"\s+"," ", post)
    post = deEmojify(post)
    post = demoji.replace(post)
  except TypeError:
      pass
  return post
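
The long run of `re.sub` calls for contractions could also be expressed as a lookup table, which makes the list easier to extend and audit. The sketch below is one possible arrangement, not the notebook's implementation: `CONTRACTIONS` and `expand_contractions` are hypothetical names, only a subset of the pairs is shown, and, unlike the original, matches are case-insensitive and always replaced with the lower-case expansion.

```python
import re

# Subset of the contraction pairs used in the cleaning function above.
CONTRACTIONS = {
    "that's": "that is",
    "it's": "it is",
    "i'm": "i am",
    "you're": "you are",
    "can't": "can not",
    "won't": "will not",
    "don't": "do not",
}

# One alternation, longest keys first, matched on word boundaries so that
# a contraction is never replaced in the middle of a longer word.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)

def expand_contractions(text: str) -> str:
    """Replace each known contraction with its lower-case expanded form."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(1).lower()], text)

example = expand_contractions("I'm sure you're right, don't give up.")
print(example)
```

A single compiled pattern scans the text once instead of once per contraction, and the word-boundary anchors avoid partial-word replacements that plain substring patterns can produce.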

### **4. Example data**

In [41]:
sentences = [
    "I struggle this time of year because I miss my Mummy who was my best friend but who sadly lost her battle to cancer at the young age of 64 in Apr 2013.",
    "You have fought a long and hard battle but I hope that you are able to enjoy Christmas with your family.",
    "What you say resonates with me even tho I'm at the tail end of my first battle rather than facing my second.",
    "Sorry for your re occurance but you look fabulously healthy and strong so I feel you are up to another battle.",
    "You find out another personal battle raged.",
    "I have a close friend who had to battle her GP for an awfully long time before finally being diagnosed at stage 4  whose to say if they had been more proactive and knowledgeable that she could have been helped much sooner when her long term prognosis could have been so much more positive?.",
    "Nothing sad about that at all sweet I can imagine how you feel once you're told you have OC I admire everyone on this site dealing with this battle everyday you're all amazing and don't let anyone tell you different xx",
    "Now I'm in recurrence so starting battle again.",
    "You've fought another big battle and triumphed.",
    "If it comes back we have to face it  we've done it before and won the battle.",
    "Wear your bald head with pride you have come through a battle and won just remember sunscreen on that head!!!",
    "Good to hear your news  sorry you are having the joint pains but I think sometimes a small price to pay for our fight in this battle.",
    "Try not to worry if you are starting a new regime  once the plan is in place and you start you begin to feel the battle is on and its all systems go.",
    "Yes I just joined this scary battle!",
]


In [47]:
df = pd.DataFrame(sentences, columns=["Sentence"])

### **5. Run RoBERTa sentiment**

In [10]:
# Only the PyTorch model class is used below, so the TensorFlow variant is not imported
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
#model.save_pretrained(MODEL)


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
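
To make the effect of the `preprocess` helper defined above concrete, the sketch below repeats its logic and applies it to a made-up post (the handle and URL are invented for illustration). Because the cleaning function from the preprocessing section already strips `@` mentions and `https://` links, these placeholders would mostly apply to whatever that step misses, for example `http://` links.

```python
def preprocess(text):
    """Replace @mentions with '@user' and links with 'http' (same logic as above)."""
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)

# Hypothetical input: the handle and the URL are placeholders, not real data.
sample = "@nurse_amy thank you, see https://example.org/support for details"
masked = preprocess(sample)
print(masked)
```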


In [13]:
def extract_emotions(data):
    posts_emo = {}
    for i in range(len(data)):
        scores_list = []
        posts = []
        sentences = nltk.sent_tokenize(data[i])
        for sent in sentences:
            if(sent):
                sent = filter(sent)
                text = preprocess(sent)
                # Truncate long inputs to at most 512 tokens during tokenization
                encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
                # Inference only: skip gradient tracking
                with torch.no_grad():
                    output = model(**encoded_input)
                scores = output[0][0].detach().numpy()
                scores = softmax(scores)
                ranking = np.argsort(scores)
                ranking = ranking[::-1]
                sentiment_scores = []
                for j in range(scores.shape[0]):
                    l = model.config.id2label[ranking[j]]
                    s = scores[ranking[j]]
                    # print(f"{text}{i+1}) {l} {np.round(float(s), 4)}")
                    sentiment = l + '=' + str(np.round(float(s), 4))
                    sentiment_scores.append(sentiment)
                    # print(sentiment_scores)
                scores_list.append(sentiment_scores)
        posts_emo[i] = scores_list
    return posts_emo
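
The core of `extract_emotions` is a softmax over the model's three logits followed by a descending sort. The sketch below reproduces that arithmetic without any model or third-party library. The logits are invented, and the `ID2LABEL` mapping is an assumption made for illustration; the notebook reads the real mapping from `model.config.id2label`.

```python
import math

# Assumed label order for illustration; the notebook uses model.config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for one sentence

def softmax(values):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
# Indices ordered from most to least probable, like np.argsort(scores)[::-1].
ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
# Same "label=score" string format that extract_emotions stores.
ranked = [f"{ID2LABEL[i]}={round(probs[i], 4)}" for i in ranking]
print(ranked)
```

Storing the scores as strings is convenient for display, but keeping the numeric probabilities alongside them would make later comparisons easier.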

In [49]:
df_roberta_sentiments = pd.DataFrame(columns = ['Post Text', 'Emotion Score'])

### **6. RoBERTa Results**

In [50]:
df_roberta_sentiments['Post Text'] = df['Sentence']
df_roberta_sentiments['Emotion Score'] = extract_emotions(df['Sentence'].tolist())

In [51]:
df_roberta_sentiments

| | Post Text | Emotion Score |
|---|---|---|
| 0 | I struggle this time of year because I miss my... | [[negative=0.8794, neutral=0.1066, positive=0.... |
| 1 | You have fought a long and hard battle but I h... | [[positive=0.8332, neutral=0.1351, negative=0.... |
| 2 | What you say resonates with me even tho I'm at... | [[neutral=0.4861, positive=0.4676, negative=0.... |
| 3 | Sorry for your re occurance but you look fabul... | [[positive=0.8443, neutral=0.1128, negative=0.... |
| 4 | You find out another personal battle raged. | [[neutral=0.7893, negative=0.1474, positive=0.... |
| 5 | I have a close friend who had to battle her GP... | [[negative=0.6423, neutral=0.3146, positive=0.... |
| 6 | Nothing sad about that at all sweet I can imag... | [[positive=0.6456, neutral=0.2081, negative=0.... |
| 7 | Now I'm in recurrence so starting battle again. | [[negative=0.6207, neutral=0.3585, positive=0.... |
| 8 | You've fought another big battle and triumphed. | [[neutral=0.5737, positive=0.3408, negative=0.... |
| 9 | If it comes back we have to face it we've don... | [[neutral=0.5841, negative=0.2645, positive=0.... |
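
Each `Emotion Score` cell holds one list per sentence, with `label=score` strings sorted from highest to lowest. A small helper can reduce that to a single top label per post, which is easier to compare across models. The sketch below assumes that structure; the scores are made up and `top_label` is a hypothetical name.

```python
# Two rows in the same "label=score" format produced above (scores are made up).
emotion_scores = [
    [["negative=0.9", "neutral=0.08", "positive=0.02"]],
    [["neutral=0.5", "positive=0.4", "negative=0.1"]],
]

def top_label(post_scores):
    """Return (label, score) for the highest-scoring entry of the first sentence."""
    first_sentence = post_scores[0]  # each post holds one list per sentence
    # Entries are already sorted in descending order, so index 0 is the top label.
    label, score = first_sentence[0].split("=")
    return label, float(score)

top = [top_label(row) for row in emotion_scores]
print(top)
```

For posts that split into several sentences, taking only the first sentence is a simplification; averaging or majority voting across sentences would be alternatives.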


## **GPT-3.5**

---

### **7. Import & Authenticate**

In [55]:
import openai
import getpass

openai.api_key = getpass.getpass("Please enter your OpenAI Key:")

Please enter your OpenAI Key:··········


In [56]:
def complete(prompt):
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
            "role": "user",
            "content": prompt
            }
        ],
    )
    return response.choices[0].message.content

complete("is this working?")

'Yes, it appears to be working as I am able to respond to your question. Let me know if you need any further assistance.'

### **8. Define the prompt**

In [57]:
prompt = "Classify the sentiment of the sentence as Positive, Neutral, or Negative."

### **9. GPT Sentiment**

In [58]:
gpt_sentiments = []

for sent in df['Sentence']:
    base_prompt = prompt + f" Sentence: {sent}"
    result = complete(base_prompt)
    gpt_sentiments.append(result)
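
Chat models do not always answer with a bare label; replies such as "Sentiment: Negative." are common. One way to make the collected replies comparable is to normalize them to a fixed label set. The sketch below is a minimal example under that assumption; `LABELS`, `normalize_reply`, and the sample replies are hypothetical and not taken from the notebook's actual output.

```python
import re

LABELS = ("positive", "neutral", "negative")

def normalize_reply(reply: str) -> str:
    """Map a free-text model reply to one of the three labels, or 'unknown'."""
    lowered = reply.lower()
    found = [lab for lab in LABELS if re.search(rf"\b{lab}\b", lowered)]
    # Accept the reply only when exactly one label is mentioned.
    return found[0] if len(found) == 1 else "unknown"

# Invented replies illustrating the three cases.
replies = ["Positive", "Sentiment: Negative.", "It could be positive or negative."]
normalized = [normalize_reply(r) for r in replies]
print(normalized)
```

Returning `unknown` for ambiguous replies keeps them visible for manual review instead of silently assigning a label.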

In [59]:
df_gpt_sentiments = pd.DataFrame(columns = ['Post Text', 'Emotion Score'])

### **10. GPT Results**

In [60]:
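df_gpt_sentiments['Post Text'] = df['Sentence']
# Note: this scores GPT's text replies (not the original sentences) with RoBERTa, so identical
# replies get identical scores; assign gpt_sentiments directly to keep GPT's own labels.
df_gpt_sentiments['Emotion Score'] = extract_emotions(gpt_sentiments)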

In [61]:
df_gpt_sentiments

| | Post Text | Emotion Score |
|---|---|---|
| 0 | I struggle this time of year because I miss my... | [[neutral=0.6068, negative=0.2858, positive=0.... |
| 1 | You have fought a long and hard battle but I h... | [[positive=0.779, neutral=0.201, negative=0.02]] |
| 2 | What you say resonates with me even tho I'm at... | [[positive=0.779, neutral=0.201, negative=0.02]] |
| 3 | Sorry for your re occurance but you look fabul... | [[positive=0.779, neutral=0.201, negative=0.02]] |
| 4 | You find out another personal battle raged. | [[neutral=0.5295, positive=0.4071, negative=0.... |
| 5 | I have a close friend who had to battle her GP... | [[neutral=0.6068, negative=0.2858, positive=0.... |
| 6 | Nothing sad about that at all sweet I can imag... | [[positive=0.779, neutral=0.201, negative=0.02]] |
| 7 | Now I'm in recurrence so starting battle again. | [[neutral=0.5295, positive=0.4071, negative=0.... |
| 8 | You've fought another big battle and triumphed. | [[positive=0.779, neutral=0.201, negative=0.02]] |
| 9 | If it comes back we have to face it we've don... | [[positive=0.779, neutral=0.201, negative=0.02]] |
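
With one top label per post from each model, a simple agreement rate gives a first impression of how often the two approaches coincide on metaphorical sentences. The sketch below uses invented label lists purely to show the calculation; `agreement_rate` is a hypothetical helper and the numbers are not results from this notebook.

```python
# Hypothetical top labels from the two models for the same five sentences.
roberta_labels = ["negative", "positive", "neutral", "positive", "neutral"]
gpt_labels = ["negative", "positive", "positive", "positive", "negative"]

def agreement_rate(a, b):
    """Fraction of positions where both label lists agree."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

rate = agreement_rate(roberta_labels, gpt_labels)
# Indices where the models disagree, useful for manual inspection.
disagreements = [
    i for i, (x, y) in enumerate(zip(roberta_labels, gpt_labels)) if x != y
]
print(rate, disagreements)
```

Raw agreement does not correct for chance; a statistic such as Cohen's kappa may be more informative on larger samples.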


**End.**

---