# Sentiment Analysis

In this notebook, you'll see how to perform sentiment analysis in Python with pretrained transformer models from the Hugging Face `transformers` and DaNLP libraries.

In [1]:
!bash setup.sh
print("Installation complete.")

Installation complete.


The first thing we want to do is load the model that we're going to use for analysis:

In [2]:
# Instantiate the model
from transformers import pipeline
from IPython.display import clear_output
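# return_all_scores=True makes the pipeline return a score for every label,
# not only the most probable one. Recent transformers releases deprecate
# this flag in favour of the top_k argument.
distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True
)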
clear_output(wait=True)
print("Model successfully loaded.")

Model successfully loaded.


We can then define a few sentences on which to perform sentiment analysis.

For every sentence in the list, the model returns a predicted sentiment together with a score indicating how confident it is in that prediction:

In [3]:
# English
english_sentences = ["I love this film!", "I hate fishing!"]
for english_sentence in english_sentences:
    english_results = distilled_student_sentiment_classifier(english_sentence)
    print(f"English Sentence: {english_sentence}")
    most_probable = sorted(english_results[0], key=lambda d: d['score'], reverse=True)[0]
    print(f"Sentiment: {most_probable['label']} ({int(most_probable['score']*100)}%)")

English Sentence: I love this film!
Sentiment: positive (99%)
English Sentence: I hate fishing!
Sentiment: negative (96%)
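
Because the pipeline is created with `return_all_scores=True`, each result contains a score for every label, not only the winning one. The sketch below ranks a full set of scores. The `rank_labels` helper and the example scores are made up for illustration and are not real model output.

```python
def rank_labels(scores):
    """Sort a list of {'label', 'score'} dicts from most to least probable."""
    return sorted(scores, key=lambda d: d["score"], reverse=True)


# Made-up scores for illustration only; a real call would supply these via
# distilled_student_sentiment_classifier(sentence)[0].
example_scores = [
    {"label": "neutral", "score": 0.07},
    {"label": "positive", "score": 0.90},
    {"label": "negative", "score": 0.03},
]

ranked = rank_labels(example_scores)
for entry in ranked:
    # ".0%" rounds to the nearest whole percent.
    print(f"{entry['label']:>8}: {entry['score']:.0%}")
```

Note that the `.0%` format rounds, whereas `int(score*100)` in the cell above truncates, so a score of 0.996 would print as 100% here and as 99% there.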


The same model is multilingual, so it can also predict sentiment for Danish text:

In [4]:
# Danish
danish_sentence = "Jeg elsker denne film!"  # "I love this film!"
danish_results = distilled_student_sentiment_classifier(danish_sentence)
print(f"Danish sentence: {danish_sentence}")
most_probable = sorted(danish_results[0], key=lambda d: d['score'], reverse=True)[0]
print(f"Sentiment: {most_probable['label']} ({int(most_probable['score']*100)}%)")

Danish sentence: Jeg elsker denne film!
Sentiment: positive (95%)
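
The English and Danish cells repeat the same steps, so they could be folded into one helper. This is a minimal sketch, assuming the classifier returns a nested list with one list of score dicts per input, as the cells above rely on. `describe_sentiment` and `fake_classifier` are hypothetical names, and the stub returns fixed, made-up scores so the example runs without downloading a model.

```python
def describe_sentiment(classifier, sentence):
    """Format the most probable label for one sentence."""
    results = classifier(sentence)  # assumed shape: [[{'label', 'score'}, ...]]
    best = max(results[0], key=lambda d: d["score"])
    return f"{sentence} -> {best['label']} ({int(best['score'] * 100)}%)"


def fake_classifier(sentence):
    """Stand-in for the real pipeline; always returns the same made-up scores."""
    return [[
        {"label": "positive", "score": 0.75},
        {"label": "neutral", "score": 0.20},
        {"label": "negative", "score": 0.05},
    ]]


for sentence in ["I love this film!", "Jeg elsker denne film!"]:
    print(describe_sentiment(fake_classifier, sentence))
```

In the notebook, passing `distilled_student_sentiment_classifier` in place of `fake_classifier` would apply the same formatting to real predictions.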


## Emotion classification

Sometimes a simple positive/negative distinction is too coarse to be useful. Instead, we want to categorise text into a wider range of emotions.

In [5]:
from danlp.models import load_bert_emotion_model
classifier = load_bert_emotion_model()
clear_output(wait=True)

You passed along `num_labels=8` with an incompatible id to label map: {'0': 'LABEL_0', '1': 'LABEL_1'}. The number of labels wil be overwritten to 2.


We can list the emotions that this model can predict. The labels are printed in Danish, as the model defines them:

In [6]:
for class_label in classifier._classes()[0]:
    print(class_label)

Glæde/Sindsro
Tillid/Accept
Forventning/Interrese
Overasket/Målløs
Vrede/Irritation
Foragt/Modvilje
Sorg/trist
Frygt/Bekymret
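
For readers who do not know Danish, the labels can be mapped to English. The glosses below are informal translations added here for convenience, not part of the DaNLP API, and `EMOTION_GLOSSES` and `gloss` are hypothetical names. The keys are copied verbatim from the output above, including its spelling.

```python
# Informal English glosses (an assumption, not provided by DaNLP).
# Keys match the model's labels exactly as printed above.
EMOTION_GLOSSES = {
    "Glæde/Sindsro": "Joy/Serenity",
    "Tillid/Accept": "Trust/Acceptance",
    "Forventning/Interrese": "Expectation/Interest",
    "Overasket/Målløs": "Surprised/Speechless",
    "Vrede/Irritation": "Anger/Irritation",
    "Foragt/Modvilje": "Contempt/Aversion",
    "Sorg/trist": "Grief/Sad",
    "Frygt/Bekymret": "Fear/Worried",
}


def gloss(label):
    """Return the English gloss for a label, or the label itself if unknown."""
    return EMOTION_GLOSSES.get(label, label)


print(gloss("Tillid/Accept"))
print(gloss("No emotion"))  # not in the mapping, so returned unchanged
```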


We can then make predictions for individual sentences:

In [7]:
sentence1 = 'der er et træ i haven'  # "there is a tree in the garden"
sentence1_result = classifier.predict(sentence1)
print(f"{sentence1}: {sentence1_result}")

der er et træ i haven: No emotion


In [8]:
sentence2 = 'jeg ejer en rød bil og det er en god bil'  # "I own a red car and it is a good car"
sentence2_result = classifier.predict(sentence2)
print(f"{sentence2}: {sentence2_result}")

jeg ejer en rød bil og det er en god bil: Tillid/Accept


In [9]:
sentence3 = 'jeg ejer en rød bil men den er gået i stykker'  # "I own a red car but it has broken down"
sentence3_result = classifier.predict(sentence3)
print(f"{sentence3}: {sentence3_result}")

jeg ejer en rød bil men den er gået i stykker: Sorg/trist
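
With more than a handful of sentences, it is often more useful to tally the predicted emotions than to print them one by one. This is a minimal sketch that relies only on the `predict` method used above. `count_emotions` and `KeywordStub` are hypothetical names, and the stub is a crude keyword rule standing in for the real model so the example runs without DaNLP installed.

```python
from collections import Counter


def count_emotions(classifier, sentences):
    """Tally the predicted emotion label for each sentence."""
    return Counter(classifier.predict(sentence) for sentence in sentences)


class KeywordStub:
    """Hypothetical stand-in for the emotion model; NOT the DaNLP classifier."""

    def predict(self, sentence):
        # Crude rule for illustration only.
        return "Sorg/trist" if "stykker" in sentence else "No emotion"


sentences = [
    "der er et træ i haven",
    "jeg ejer en rød bil og det er en god bil",
    "jeg ejer en rød bil men den er gået i stykker",
]

counts = count_emotions(KeywordStub(), sentences)
for emotion, n in counts.most_common():
    print(f"{emotion}: {n}")
```

In the notebook, passing the loaded `classifier` instead of `KeywordStub()` would tally the model's actual predictions.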
