<a href="https://colab.research.google.com/github/brianchirn/text_sentiment_huggingface/blob/main/TextSentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Brian Chirn 5/25/2021

Uses pretrained NLP models to detect the sentiment of text messages from patients responding to DPPP automated messages.

Uses the twitter-roberta-base-sentiment model from Hugging Face,
a RoBERTa-base model trained on 58M tweets and fine-tuned for sentiment analysis.
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment


How to use:
Click the play button on the side of each code cell to run that portion of the code, working from top to bottom.
To see how the model classifies an individual text message, change the 'text' string in the second-to-last code cell.


In [None]:
!pip install transformers

In [None]:
import csv
import urllib.request

import numpy as np
from scipy.special import softmax
from transformers import AutoModelForSequenceClassification, AutoTokenizer



In [None]:
# Select the appropriate task:
# emoji, emotion, hate, irony, offensive, sentiment

task='sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"

tokenizer = AutoTokenizer.from_pretrained('cardiffnlp/twitter-roberta-base')

In [None]:
# Download the label mapping (tab-separated "index<TAB>label" lines)
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    lines = f.read().decode('utf-8').split("\n")
csvreader = csv.reader(lines, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]

# PT (PyTorch): load the pretrained model, then save a copy to a local
# directory named after MODEL
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)
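
The label-mapping step above can be checked offline. The sketch below assumes the mapping file consists of tab-separated `index<TAB>label` lines, which is the format the parsing code above expects. The sample string and the `parse_label_mapping` helper name are hypothetical and used only for illustration.

```python
import csv

def parse_label_mapping(raw_text):
    """Return label names from tab-separated 'index<TAB>label' lines."""
    reader = csv.reader(raw_text.split("\n"), delimiter="\t")
    # Skip blank or malformed rows, as the download cell does.
    return [row[1] for row in reader if len(row) > 1]

# Hypothetical sample in the assumed format (nothing is downloaded here).
sample = "0\tnegative\n1\tneutral\n2\tpositive\n"
print(parse_label_mapping(sample))  # ['negative', 'neutral', 'positive']
```

The position of each label in the returned list is what later cells rely on when they index into the score array.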




The model predicts positive, neutral, or negative sentiment for the string in the 'text' variable.

In [None]:
# Model output

text = 'thanks for the reminder doctor'
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # raw logits for the single input
scores = softmax(scores)  # convert logits to probabilities

# Print the labels from most to least probable
ranking = np.argsort(scores)[::-1]
for i in range(scores.shape[0]):
    label = labels[ranking[i]]
    score = scores[ranking[i]]
    print(f"{i+1}) {label} {np.round(float(score), 4)}")
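
To make the scoring step concrete without loading the model, here is a minimal standard-library sketch of the same softmax-and-rank logic. The logits are made-up values for illustration, and `rank_labels` is a hypothetical helper name; the notebook itself uses `scipy.special.softmax` and `np.argsort`.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(logits, labels):
    """Return (label, probability) pairs, most probable first."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

# Hypothetical logits; real values come from the model output.
example_logits = [-1.0, 0.5, 2.0]
example_labels = ["negative", "neutral", "positive"]
ranked = rank_labels(example_logits, example_labels)
for position, (label, prob) in enumerate(ranked, start=1):
    print(f"{position}) {label} {round(prob, 4)}")
```

The largest logit always maps to the largest probability, so ranking by probability gives the same order as ranking by raw logit.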

In [None]:
# Score multiple texts in a list

texts = [
  'hello doctor, i wanted to thank you for the message',
  'thank you for the text message, but i do not quite understand how to use it',
  'please stop sending me text messages',
]

def sentiment(texts):
  """Return one array of softmax scores per text."""
  all_scores = []
  for text in texts:
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)  # scores are ordered negative, neutral, positive
    all_scores.append(scores)
  return all_scores

texts_sentiments = sentiment(texts)

# Mark a text 'positive' when its positive score exceeds 0.8;
# otherwise flag it for manual review
review = []
for scores in texts_sentiments:
  if scores[2] > 0.8:
    review.append('positive')
  else:
    review.append('review')

print(texts)
print(review)
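
The thresholding rule in the cell above can be tried on its own. This is a minimal sketch assuming scores arrive in (negative, neutral, positive) order, as noted in the code. The `triage` helper and the example probabilities are hypothetical, and the 0.8 cutoff is simply the value used in this notebook.

```python
def triage(score_rows, threshold=0.8, positive_index=2):
    """Flag each message as 'positive' or 'review' from its score row."""
    return [
        "positive" if row[positive_index] > threshold else "review"
        for row in score_rows
    ]

# Hypothetical probabilities in (negative, neutral, positive) order.
example_scores = [
    [0.02, 0.08, 0.90],
    [0.30, 0.50, 0.20],
    [0.85, 0.10, 0.05],
]
print(triage(example_scores))  # ['positive', 'review', 'review']
```

Raising `threshold` sends more messages to manual review, while lowering it accepts more messages as positive without review.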
