<a href="https://github.com/kennethenevoldsen/asent"><img src="https://github.com/KennethEnevoldsen/asent/blob/main/docs/img/logo_black_font.png?raw=true" width="300" /></a>

## Installation
Before we start, we need to install asent. This can be done by uncommenting the following lines:

In [1]:
#!pip install asent
#!python -m spacy download en_core_web_lg

# Tutorial

> *Note*: This tutorial is in English, but the library also supports several other languages. To see all available languages, check out the [Languages section](https://kennethenevoldsen.github.io/asent/languages/index.html) on the website.

Asent is a package for fast and transparent sentiment analysis. The package uses a dictionary of words rated as either positive or negative, together with a series of rules, to determine whether a word, a sentence, or a document is positive or negative. The current rules account for negations (e.g. "not happy"), intensifiers (e.g. "very happy"), and contrastive conjunctions (e.g. "but"), as well as other emphasis markers such as exclamation marks, casing, and question marks. The following takes you step by step through how the sentiment is calculated.

To start off, we need a spaCy pipeline, to which we add the asent component `asent_en_v1`, where `en` indicates that it is the English pipeline and `v1` indicates that it is version 1.


In [2]:
import asent
import spacy

# load spacy pipeline
nlp = spacy.load("en_core_web_lg")

# add the rule-based sentiment model
nlp.add_pipe("asent_en_v1")

<asent.component.Asent at 0x11282f9d0>

If you want to see all the available components you can simply run:

In [4]:
for c in asent.components.get_all():
    print(c)

asent_da_v1
asent_en_v1
asent_no_v1
asent_sv_v1


## Token valence and polarity
As seen in Figure 1, token valence is simply the value obtained from a lookup in a rated dictionary. For instance, in the example sentence "I am not very happy", the word "happy" has a positive human rating of 2.7, which is not amplified because the word is not written in all-caps.


<h3 align="center">
<figure>
<img src="https://raw.githubusercontent.com/KennethEnevoldsen/asent/main/docs/img/token_polarity.png" width="700" />
</figure>
  <small>
  Figure 1: Calculation of token polarity and valence
  </small>
</h3>

We can extract valence quite easily using the `valence` extension:

In [4]:
doc = nlp("I am not very happy.")

for token in doc:
    print(token, "\t", token._.valence)

I 	 0.0
am 	 0.0
not 	 0.0
very 	 0.0
happy 	 2.7
. 	 0.0
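The lookup described above can be sketched in plain Python. This is only an illustration, not asent's implementation: `TOY_LEXICON`, `CAPS_INCREMENT`, and `toy_valence` are hypothetical names, the rating for "horrible" is made up for the example, and the all-caps increment of 0.733 is borrowed from the VADER reference implementation on the assumption that asent behaves similarly.

```python
# Toy lexicon: only the rating for "happy" (2.7) comes from the tutorial;
# the rating for "horrible" is invented for illustration.
TOY_LEXICON = {"happy": 2.7, "horrible": -2.5}

# Assumed all-caps increment (value used by the VADER reference implementation).
CAPS_INCREMENT = 0.733


def toy_valence(token: str, sentence_is_all_caps: bool = False) -> float:
    """Look up a token's rating and amplify it if the token is in all-caps."""
    rating = TOY_LEXICON.get(token.lower(), 0.0)
    if rating == 0.0:
        # Words that are not in the lexicon carry no valence.
        return 0.0
    # Casing is only informative when the rest of the sentence is not all-caps.
    if token.isupper() and not sentence_is_all_caps:
        rating += CAPS_INCREMENT if rating > 0 else -CAPS_INCREMENT
    return rating


for word in ["I", "am", "not", "very", "happy", "HAPPY"]:
    print(word, "\t", toy_valence(word))
```

In this sketch, "happy" keeps its dictionary rating of 2.7, while the all-caps "HAPPY" is pushed further from zero.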


Naturally, "happy" should not be perceived positively in this context, as it is negated, so we should look at token polarity instead. Token polarity examines whether a word is negated and, if so, multiplies its value by a negative constant. This constant is empirically derived to be -0.74 [(Hutto and Gilbert, 2014)](https://ojs.aaai.org/index.php/ICWSM/article/view/14550). In this specific example we can also see that "happy" is intensified by the word "very", which increases its polarity by a constant of 0.293, likewise empirically derived by Hutto and Gilbert. We can extract the polarity using the `polarity` extension:

In [5]:
for token in doc:
    print(token._.polarity)

polarity=0.0 token=I span=I
polarity=0.0 token=am span=am
polarity=0.0 token=not span=not
polarity=0.0 token=very span=very
polarity=-2.215 token=happy span=not very happy
polarity=0.0 token=. span=.


Notice that here we get even more information: the token "happy" has a polarity of -2.215, and this polarity is based on the span (sequence of tokens) "not very happy".
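The value of -2.215 can be reproduced by hand from the numbers given above. The following is a minimal sketch of that arithmetic, assuming the intensifier constant is added to the valence before the negation factor is applied (this ordering is consistent with the printed result, but it is an assumption about asent's internals). The function name `toy_polarity` and the constant names are hypothetical.

```python
INTENSIFIER_BOOST = 0.293  # constant for the intensifier "very"
NEGATION_SCALAR = -0.74    # negative constant applied to negated words


def toy_polarity(valence: float, intensified: bool, negated: bool) -> float:
    """Apply the intensifier and negation rules to a single token valence."""
    polarity = valence
    if intensified:
        # Push the value away from zero, whatever its sign.
        polarity += INTENSIFIER_BOOST if valence > 0 else -INTENSIFIER_BOOST
    if negated:
        # Flip the sign and dampen the magnitude.
        polarity *= NEGATION_SCALAR
    return polarity


# "not very happy": valence 2.7, intensified by "very", negated by "not"
print(round(toy_polarity(2.7, intensified=True, negated=True), 3))  # -2.215
```

That is, (2.7 + 0.293) * -0.74 = -2.21482, which rounds to the -2.215 shown in the output.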

## Visualizing polarity
Asent also includes a series of methods to visualize the token polarity:


In [6]:
asent.visualize(doc)

## Document and Span Polarity

We want to do more than simply calculate the polarity of each token: we want to extract information about the entire sentence (span) and aggregate it across the entire document.

<h3 align="center">
<figure>
<img src="https://raw.githubusercontent.com/KennethEnevoldsen/asent/main/docs/img/doc_polarity.png" width="600" />
</figure>
  <small>
  Figure 2: Calculation of document polarity
  </small>
</h3>

The calculation of the sentence polarity includes a couple of steps.
First, it checks whether the sentence contains a contrastive conjunction (e.g. "but"); if so, it gives more weight to what comes after the "but" and less weight to what comes before. This seems quite natural: for example, the sentence "The movie was great, but the acting was horrible" noticeably puts more weight on the second statement. This has also been shown empirically by [(Hutto and Gilbert, 2014)](https://ojs.aaai.org/index.php/ICWSM/article/view/14550). Afterwards, the model takes question marks and exclamation marks into account, both of which increase the magnitude of the sentence polarity, with negative sentences becoming more negative and positive sentences becoming more positive. Lastly, the polarity is normalized to lie between approximately -1 and 1.
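Two of these steps, the reweighting around "but" and the final normalization, can be sketched as follows. This is not asent's code: the weights 0.5 and 1.5 and the normalization `x / sqrt(x^2 + 15)` are taken from the VADER reference implementation and are assumed here to carry over. The function names and the token polarities in the "but" example are hypothetical, and the punctuation step is left out.

```python
import math

BEFORE_BUT_WEIGHT = 0.5  # assumed: down-weight tokens before "but"
AFTER_BUT_WEIGHT = 1.5   # assumed: up-weight tokens after "but"
ALPHA = 15               # assumed normalization constant


def weight_around_but(tokens, polarities):
    """Rescale token polarities on either side of the first "but"."""
    lowered = [t.lower() for t in tokens]
    if "but" not in lowered:
        return list(polarities)
    but_index = lowered.index("but")
    weighted = []
    for i, polarity in enumerate(polarities):
        if i < but_index:
            weighted.append(polarity * BEFORE_BUT_WEIGHT)
        elif i > but_index:
            weighted.append(polarity * AFTER_BUT_WEIGHT)
        else:
            weighted.append(polarity)
    return weighted


def normalize(score, alpha=ALPHA):
    """Squash an unbounded score into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Made-up polarities for "great but horrible"
print(weight_around_but(["great", "but", "horrible"], [3.1, 0.0, -2.5]))

# Unrounded polarity of "not very happy": (2.7 + 0.293) * -0.74
print(round(normalize(-2.21482), 4))
```

Under these assumptions, normalizing the summed token polarities of "I am not very happy." gives roughly -0.4964.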

You can easily extract the sentence polarity and the document polarity using: 

In [7]:
for sentence in doc.sents:
    print(sentence._.polarity)

neg=0.391 neu=0.609 pos=0.0 compound=-0.4964 span=I am not very happy.


In [8]:
# or for multiple sentences:
print(doc._.polarity)

neg=0.391 neu=0.609 pos=0.0 compound=-0.4964


Here we see the normalized score for the `compound`, or aggregated, polarity, as well as the neutral `neu`, negative `neg`, and positive `pos` scores.
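The `neg`, `neu`, and `pos` values above are consistent with the proportion scheme used in the VADER reference implementation, where each non-zero token contributes its magnitude plus one and each neutral token counts as one. The sketch below assumes asent follows that scheme; `toy_proportions` is a hypothetical helper, not part of the asent API.

```python
def toy_proportions(token_polarities):
    """Split a sentence into negative, neutral, and positive shares."""
    pos_sum = 0.0
    neg_sum = 0.0
    neu_count = 0
    for polarity in token_polarities:
        if polarity > 0:
            pos_sum += polarity + 1  # magnitude plus one for the token itself
        elif polarity < 0:
            neg_sum += polarity - 1
        else:
            neu_count += 1
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


# Token polarities for "I am not very happy." (six tokens, including ".")
print(toy_proportions([0.0, 0.0, 0.0, 0.0, -2.21482, 0.0]))
```

With the six token polarities from the example, this yields a negative share of 0.391 and a neutral share of 0.609, matching the output above.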