# DaCy and Sentiment
DaCy currently does not include its own tools for sentiment extraction, but a couple of good tools already exist. DaCy provides wrappers for these so that they can be used within the SpaCy/DaCy framework.

In [1]:
#!pip install dacy[all]
#!python -m spacy download da_core_news_sm



In [2]:
import dacy
import spacy

# Overview of Models
--- 

| Name  | Creator   | Domain   | Output Type  | Model Type | 
|:---|:-------------|:------|:------|:------|
| Senda | Ekstra Bladet  | Twitter  | `['positive', 'neutral', 'negative']` | Danish Transformer v2 by BotXO | 
| BertTone | DaNLP  | Europarl and Twitter  | `['positive', 'neutral', 'negative']` and `['subjective', 'objective']` | Danish Transformer v2 by BotXO | 
| BertEmotion | DaNLP | Social Media   | `["Emotional", "No emotion"]` and `["Glæde/Sindsro", "Tillid/Accept", ... ]` | Danish Transformer v2 by BotXO | 
| DaVader | Sentida    | Microblogs and Social media  | `Polarity score (continuous)`     | Rule-based | 

*Note* that DaVader is a variation of Sentida and not the original implementation.


## Senda

Senda is a model trained by Ekstra Bladet on a [Danish Twitter corpus](https://github.com/alexandrainst/danlp/blob/master/docs/docs/tasks/sentiment_analysis.md) tagged for polarity. Because it was trained on tweets, Senda is expected to perform better on Twitter data than BertTone. Read more about `senda` in its [GitHub repository](https://github.com/ebanalyse/senda).

Here I will show a simple use case of the model and how to add it to your pipeline:

In [4]:
from dacy.sentiment import add_senda

# an empty pipeline - replace it with your pipeline of choice
nlp = spacy.blank("da")

nlp = add_senda(nlp)

In [5]:
texts = ["Sikke en dejlig dag det er i dag", "Sikke noget forfærdeligt møgvejr det er i dag", "FC København og Brøndby IF i duel om mesterskabet"]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.polarity)
    print(doc._.polarity_prop)

positive
{'prop': array([0.063, 0.169, 0.768], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}
negative
{'prop': array([0.718, 0.194, 0.088], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}
neutral
{'prop': array([0.041, 0.869, 0.09 ], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}
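The `polarity_prop` dictionary keeps probabilities and labels in two parallel sequences. A minimal sketch of pairing them up is shown here; the helper name `label_probabilities` is hypothetical and not part of DaCy, and the values are copied from the first output above as a plain list rather than a NumPy array:

```python
def label_probabilities(prop_dict):
    """Pair each label with its probability in a plain dict."""
    return {
        label: float(p)
        for label, p in zip(prop_dict["labels"], prop_dict["prop"])
    }


# Values copied from the first Senda output above (as a plain list).
example = {
    "prop": [0.063, 0.169, 0.768],
    "labels": ["negative", "neutral", "positive"],
}

probs = label_probabilities(example)
best = max(probs, key=probs.get)  # label with the highest probability

print(probs)
print(best)
```

Since `zip` only needs an iterable, the same helper should also accept the NumPy array that `doc._.polarity_prop` actually contains.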


## BertTone
---

BertTone is a model trained by DaNLP (well, two models, to be exact): one for classifying polarity (whether a text is positive, negative or neutral) and one for classifying subjectivity (whether a text is subjective or objective).

To read more about BertTone and how it performs compared with other models, see DaNLP's [GitHub](https://github.com/alexandrainst/danlp/blob/master/docs/docs/tasks/sentiment_analysis.md).

Here I will show a simple use case of both models. If you wish to inspect the TransformerData, e.g. to see which wordpieces were used, you can check out `doc._.berttone_subj_trf_data` or `doc._.berttone_pol_trf_data`.

In [6]:
from dacy.sentiment import add_berttone_subjectivity

nlp = spacy.blank("da")
nlp = add_berttone_subjectivity(nlp)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Downloading file /var/folders/7m/95dm37bj475fgzncclln12fcyj4nph/T/tmpv2l07ewu: : 411MB [01:37, 4.22MB/s]                         
Unzipping bert.subjective 


In [7]:
texts = ["Analysen viser, at økonomien bliver forfærdelig dårlig", 
         "Jeg tror alligevel, det bliver godt"]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.subjectivity)
    print(doc._.subjectivity_prop)

objective
{'prop': array([1., 0.], dtype=float32), 'labels': ['objective', 'subjective']}
subjective
{'prop': array([0., 1.], dtype=float32), 'labels': ['objective', 'subjective']}


In [9]:
from dacy.sentiment import add_berttone_polarity
# force_extension=True lets us overwrite the polarity extension already set by Senda
nlp = add_berttone_polarity(nlp, force_extension=True)

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.polarity)
    print(doc._.polarity_prop)

Model bert.polarity exists in /Users/au561649/.danlp/bert.polarity
negative
{'prop': array([0.002, 0.008, 0.99 ], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}
positive
{'prop': array([0.981, 0.019, 0.   ], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}
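Note that the label order in the BertTone output (`positive, neutral, negative`) is the reverse of the order in the Senda output (`negative, neutral, positive`), so comparing the two arrays by index would be misleading. A small sketch of aligning them by label instead follows; the helper `reorder` is hypothetical, and the two dictionaries are copied from outputs above as plain lists (they belong to different texts and only illustrate the ordering):

```python
def reorder(prop_dict, order):
    """Return the probabilities arranged according to `order`."""
    lookup = dict(zip(prop_dict["labels"], prop_dict["prop"]))
    return [float(lookup[label]) for label in order]


# Copied from the Senda and BertTone outputs above (different input texts).
senda = {
    "prop": [0.718, 0.194, 0.088],
    "labels": ["negative", "neutral", "positive"],
}
berttone = {
    "prop": [0.002, 0.008, 0.99],
    "labels": ["positive", "neutral", "negative"],
}

order = ["negative", "neutral", "positive"]
senda_aligned = reorder(senda, order)
berttone_aligned = reorder(berttone, order)

print(senda_aligned)
print(berttone_aligned)
```

Looking values up by label rather than by position keeps downstream code correct even if a model reports its labels in a different order.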


## BertEmotion
---

Similar to BertTone, BertEmotion is a model trained by DaNLP (again, two models, to be exact): one for classifying whether a text is emotionally laden or not, and one for classifying the specific emotion from the following list:

- "Glæde/Sindsro"
- "Tillid/Accept"
- "Forventning/Interrese"
- "Overasket/Målløs"
- "Vrede/Irritation"
- "Foragt/Modvilje"
- "Sorg/trist"
- "Frygt/Bekymret"

Their TransformerData can be accessed using `bertemotion_laden_trf_data` for the model predicting whether a text is emotionally laden and `bertemotion_emo_trf_data` for the model predicting the emotion. As above, you can always use the `*_prop` suffix to extract the probabilities of each label.

In [10]:
from dacy.sentiment import add_bertemotion_emo, add_bertemotion_laden
nlp = add_bertemotion_laden(nlp)  # whether a text is emotionally laden
nlp = add_bertemotion_emo(nlp)    # what emotion is portrayed

Downloading file /var/folders/7m/95dm37bj475fgzncclln12fcyj4nph/T/tmpky9rq3r3: : 411MB [01:28, 4.65MB/s]                         
Unzipping bert.noemotion 
Downloading file /var/folders/7m/95dm37bj475fgzncclln12fcyj4nph/T/tmpdpfuybr4: : 411MB [01:36, 4.25MB/s]                         
Unzipping bert.emotion 


In [6]:
texts = ['bilen er flot', 
         'jeg ejer en rød bil og det er en god bil', 
         'jeg ejer en rød bil men den er gået i stykker', 
         "Ifølge TV udsendelsen så bliver vejret skidt imorgen",  
         "Fuck jeg hader bare Hitler. Han er bare så FUCKING træls!",
         "Har i set at Tesla har landet en raket på månen? Det er vildt!!",
         "Nu må vi altså få ændret noget",
         "En sten kan ikke flyve. Morlille kan ikke flyve. Ergo er morlille en sten!"]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.laden)
    print("\t", doc._.emotion)

Emotional
	 Tillid/Accept
Emotional
	 Tillid/Accept
Emotional
	 Sorg/trist
Emotional
	 Frygt/Bekymret
Emotional
	 Sorg/trist
Emotional
	 Overasket/Målløs
Emotional
	 Forventning/Interrese
Emotional
	 Foragt/Modvilje


Unfortunately, it seems to be difficult to construct a sentence that is predicted as "No emotion" (edit: I opened an [issue](https://github.com/alexandrainst/danlp/issues/122) about this on DaNLP's GitHub; feel free to check it out for more information). As with any ML model, use with care and evaluate thoroughly.
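The two BertEmotion models predict independently, so the emotion model returns a label even when the first model does not flag the text as emotional. One possible way to combine them, sketched here under the assumption that the label strings are exactly `"Emotional"` and `"No emotion"` as listed in the overview table, is to report the emotion only for emotionally laden texts. The function `combined_emotion` is hypothetical and not part of DaCy:

```python
def combined_emotion(laden, emotion):
    """Return the emotion label only when the text is flagged as emotional."""
    return emotion if laden == "Emotional" else None


# Label pairs in the form printed above: (doc._.laden, doc._.emotion).
print(combined_emotion("Emotional", "Tillid/Accept"))
print(combined_emotion("No emotion", "Tillid/Accept"))
```

With a real pipeline the call would be `combined_emotion(doc._.laden, doc._.emotion)`.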

## DaVader

---

DaVader is a Danish sentiment model developed using [VADER](https://github.com/cjhutto/vaderSentiment) and the dictionary lists from [SentiDa](https://github.com/guscode/sentida) and [AFINN](https://github.com/fnielsen/afinn). This adaptation was developed by Center for Humanities Computing Aarhus and Kenneth Enevoldsen. It is a lexicon- and rule-based sentiment analysis tool that predicts sentiment valence, i.e. the degree to which a text is positive or negative, as opposed to BertTone, which only predicts whether a text is positive, neutral or negative.

An additional advantage of it being rule-based is that it is transparent (the entire lexicon can be found in the sentiment folder) and very fast compared to transformer-based approaches.

In [11]:
from spacy.tokens import Doc
from dacy.sentiment import da_vader_getter

Doc.set_extension("vader_da", getter=da_vader_getter)

In [12]:
nlp = spacy.load("da_core_news_sm")
texts = ['Jeg er så glad', 'jeg ejer en rød bil og det er en god bil', 'jeg ejer en rød bil men den er gået i stykker']

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.vader_da)

{'neg': 0.0, 'neu': 0.36, 'pos': 0.64, 'compound': 0.7456}
{'neg': 0.088, 'neu': 0.395, 'pos': 0.518, 'compound': 0.674}
{'neg': 0.1, 'neu': 0.688, 'pos': 0.212, 'compound': 0.0772}


If you have never used a VADER model before, we suggest you read the ["about the scoring"](https://github.com/cjhutto/vaderSentiment#about-the-scoring) section on the website for the original (English) VADER implementation.
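If a categorical label is needed, the documentation of the English VADER suggests thresholding the `compound` score at ±0.05. The sketch below applies that convention to the DaVader outputs shown above; whether these thresholds transfer well to Danish is an assumption that should be checked on your own data, and `compound_to_label` is a hypothetical helper, not part of DaCy:

```python
def compound_to_label(scores, threshold=0.05):
    """Map a VADER-style score dict to a label using its compound score."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dicts copied from the DaVader output above.
outputs = [
    {"neg": 0.0, "neu": 0.36, "pos": 0.64, "compound": 0.7456},
    {"neg": 0.088, "neu": 0.395, "pos": 0.518, "compound": 0.674},
    {"neg": 0.1, "neu": 0.688, "pos": 0.212, "compound": 0.0772},
]

labels = [compound_to_label(scores) for scores in outputs]
print(labels)
```

The third text only just clears the threshold, which illustrates why the continuous score is often more informative than the label.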