<a href="https://github.com/kennethenevoldsen/asent"><img src="https://github.com/KennethEnevoldsen/asent/blob/main/docs/img/logo_black_font.png?raw=true" width="300" /></a>

## Installation
Before we start, we need to install asent. You can do this by uncommenting the following line:

In [2]:
# !pip install asent

# Tutorial

> *Note*: This tutorial uses English, but the library also supports many other languages. To see all available languages, check out the [Languages section](https://kennethenevoldsen.github.io/asent/languages/index.html) on the website.

Asent is a package for fast and transparent sentiment analysis. The package uses a dictionary of words rated as either positive or negative and a series of rules to determine whether a word, a sentence or a document is positive or negative. The current rules account for negations (e.g. "not happy"), intensifiers (e.g. "very happy"), contrastive conjunctions (e.g. "but"), and other emphasis markers such as exclamation marks, casing and question marks. The following takes you through how the sentiment is calculated, step by step.

To start with, we need a spaCy pipeline, to which we add the asent component `asent_en_v1`, where `en` indicates that it is the English pipeline and `v1` indicates that it is version 1.


In [3]:
import asent
import spacy

# create (or load) spacy pipeline
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# add the rule-based sentiment model
nlp.add_pipe("asent_en_v1")



<asent.component.Asent at 0x17ac0a310>

If you want to see all the available components you can simply run:

In [4]:
for c in asent.components.get_all():
    print(c)

asent_yi_v1
asent_io_v1
asent_ku_v1
asent_kn_v1
asent_it_v1
asent_uz_v1
asent_de_v1
asent_mt_v1
asent_fa_v1
asent_nl_v1
asent_lv_v1
asent_gu_v1
asent_br_v1
asent_eo_v1
asent_he_v1
asent_tk_v1
asent_et_v1
asent_ro_v1
asent_ur_v1
asent_ja_v1
asent_zhw_v1
asent_vo_v1
asent_fi_v1
asent_fr_v1
asent_ar_v1
asent_fo_v1
asent_vi_v1
asent_id_v1
asent_el_v1
asent_cs_v1
asent_th_v1
asent_ia_v1
asent_sk_v1
asent_hu_v1
asent_az_v1
asent_mr_v1
asent_km_v1
asent_te_v1
asent_bg_v1
asent_fy_v1
asent_bn_v1
asent_es_v1
asent_tl_v1
asent_sq_v1
asent_ka_v1
asent_hy_v1
asent_gl_v1
asent_rm_v1
asent_tr_v1
asent_uk_v1
asent_lb_v1
asent_pl_v1
asent_ga_v1
asent_lt_v1
asent_nn_v1
asent_is_v1
asent_ta_v1
asent_gd_v1
asent_ms_v1
asent_zh_v1
asent_ca_v1
asent_ht_v1
asent_af_v1
asent_be_v1
asent_hi_v1
asent_hr_v1
asent_cy_v1
asent_mk_v1
asent_ko_v1
asent_wa_v1
asent_sr_v1
asent_an_v1
asent_ky_v1
asent_pt_v1
asent_eu_v1
asent_sw_v1
asent_ru_v1
asent_bs_v1
asent_sl_v1
asent_la_v1
asent_da_v1
asent_en_v1
asent_no_v1
ase

## Token valence and polarity
As seen in Figure 1, token valence is simply the value obtained from a lookup in a rated dictionary. For instance, in the example sentence "I am not very happy", the word "happy" has a positive human rating of 2.7, which is not amplified here because the word is not written in all caps.


<h3 align="center">
<figure>
<img src="https://raw.githubusercontent.com/KennethEnevoldsen/asent/main/docs/img/token_polarity.png" width="700" />
</figure>
  <small>
  Figure 1: Calculation of token polarity and valence
  </small>
</h3>

We can extract valence quite easily using the `valence` extension:

In [5]:
doc = nlp("I am not very happy.")

for token in doc:
    print(token, "\t", token._.valence)

I 	 0.0
am 	 0.0
not 	 0.0
very 	 0.0
happy 	 2.7
. 	 0.0
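The lookup itself can be pictured with a plain dictionary. The following is only a minimal sketch, not asent's implementation: `TOY_LEXICON` and `toy_valence` are hypothetical names, the lexicon contains just the one rating quoted above, the lookup is assumed to be case-insensitive, and the all-caps amplification is left out.

```python
# Toy lexicon: the rating for "happy" is the one quoted in the text;
# a real rated dictionary holds thousands of words.
TOY_LEXICON = {"happy": 2.7}


def toy_valence(word: str) -> float:
    """Return the rated valence of a word, or 0.0 if the word is unrated."""
    # Assumption: the lookup ignores casing.
    return TOY_LEXICON.get(word.lower(), 0.0)


for word in ["I", "am", "not", "very", "happy", "."]:
    print(word, "\t", toy_valence(word))
```

Every word missing from the dictionary falls back to a valence of 0.0, which is why only "happy" carries a non-zero value.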


Naturally, in this context "happy" should not be perceived positively, as it is negated, so we should look at token polarity instead. Token polarity examines whether a word is negated and, if so, multiplies its value by a negative constant. This constant is empirically derived to be -0.74 [(Hutto and Gilbert, 2014)](https://ojs.aaai.org/index.php/ICWSM/article/view/14550). In the example we chose, "happy" is also intensified by the word "very", which increases its polarity. The intensifier constant, 0.293, is likewise empirically derived by Hutto and Gilbert. We can extract the polarity using the `polarity` extension:

In [6]:
for token in doc:
    print(token._.polarity)

polarity=0.0 token=I span=I
polarity=0.0 token=am span=am
polarity=0.0 token=not span=not
polarity=0.0 token=very span=very
polarity=-2.215 token=happy span=not very happy
polarity=0.0 token=. span=.


Notice that here we get even more information: the token "happy" has a polarity of -2.215, and this value takes into account the span (sequence of tokens) "not very happy".
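The value -2.215 can be reproduced by hand from the two constants mentioned above. The sketch below is a simplification under stated assumptions, not asent's actual code: `toy_token_polarity` is a hypothetical helper, the negation constant is taken as -0.74, the intensifier is assumed to add 0.293 in the direction of the valence, and any effect of the distance between the modifier and the word is ignored.

```python
import math

# Constants quoted in the text (Hutto and Gilbert, 2014).
NEGATION_SCALAR = -0.74
INTENSIFIER_INCREMENT = 0.293


def toy_token_polarity(valence: float, intensified: bool = False, negated: bool = False) -> float:
    """Apply an intensifier and a negation to a raw valence score."""
    polarity = valence
    if intensified and polarity != 0:
        # Push the value further away from zero, keeping its sign.
        polarity += math.copysign(INTENSIFIER_INCREMENT, polarity)
    if negated:
        # Flip the sign and dampen the magnitude.
        polarity *= NEGATION_SCALAR
    return polarity


# "happy" (2.7), intensified by "very" and negated by "not":
print(round(toy_token_polarity(2.7, intensified=True, negated=True), 3))  # -2.215
```

In other words, (2.7 + 0.293) × -0.74 ≈ -2.215, which matches the polarity printed for "happy".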

## Visualizing polarity
Asent also includes a series of methods to visualize the token polarity:


In [7]:
doc = nlp("I am not very happy, but also not very especially sad")
asent.visualize(doc, style="prediction")


And if you want more information about why it obtains the score it does:

In [8]:
asent.visualize(doc[:5], style="analysis")

## Document and Span Polarity

We want to do more than simply calculate the polarity of a token: we want to extract information about the entire sentence (span) and aggregate this across the entire document.

<h3 align="center">
<figure>
<img src="https://raw.githubusercontent.com/KennethEnevoldsen/asent/main/docs/img/doc_polarity.png" width="600" />
</figure>
  <small>
  Figure 2: Calculation of document polarity
  </small>
</h3>

The calculation of the sentence polarity includes a couple of steps. 

First, it checks whether the sentence contains a contrastive conjunction (e.g. "but"). If so, it gives more weight to what comes after the "but" and less weight to what comes before it. This seems quite natural for a sentence such as "The movie was great, but the acting was horrible", where the second statement is noticeably more important. This has also been shown empirically by [Hutto and Gilbert (2014)](https://ojs.aaai.org/index.php/ICWSM/article/view/14550).

Afterwards, the model takes into account question marks and exclamation marks, which both increase the polarity of the sentence: negative sentences become more negative and positive sentences become more positive. Lastly, the polarity is normalized to lie approximately between -1 and 1.
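The contrastive-conjunction step can be sketched as follows. This is an illustration under assumptions rather than asent's own code: `weight_around_but` is a hypothetical helper, the weights 0.5 (before "but") and 1.5 (after "but") are the ones used in the VADER reference implementation and are assumed here, and the token polarities are made-up values rather than real lexicon ratings.

```python
def weight_around_but(tokens, polarities, before=0.5, after=1.5):
    """Down-weight polarities before "but" and up-weight those after it."""
    lowered = [token.lower() for token in tokens]
    if "but" not in lowered:
        return list(polarities)
    but_index = lowered.index("but")
    weighted = []
    for i, polarity in enumerate(polarities):
        if i < but_index:
            weighted.append(polarity * before)
        elif i > but_index:
            weighted.append(polarity * after)
        else:
            weighted.append(polarity)
    return weighted


tokens = ["The", "movie", "was", "great", ",", "but", "the", "acting", "was", "horrible"]
# Illustrative polarities only; not looked up from the asent lexicon.
polarities = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, -2.5]

weighted = weight_around_but(tokens, polarities)
print(sum(polarities), sum(weighted))  # 0.5 -2.25
```

Without the reweighting, the toy sentence would sum to a mildly positive score. With it, the clause after "but" dominates and the total becomes clearly negative.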

You can easily extract the sentence polarity and the document polarity using: 

In [9]:
doc = nlp("I am not very happy.")
for sentence in doc.sents:
    print(sentence._.polarity)

neg=0.391 neu=0.609 pos=0.0 compound=-0.4964 span=I am not very happy.


In [10]:
# or for multiple sentences:
print(doc._.polarity)

neg=0.391 neu=0.609 pos=0.0 compound=-0.4964 n_sentences=1


Here we see the normalized score for the `compound`, or aggregated, polarity as well as the neutral `neu`, negative `neg`, and positive `pos` scores.
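The `compound` value printed above is consistent with the normalization used in the VADER reference implementation, x / sqrt(x² + α) with α = 15, applied to the summed token polarities of the sentence. The sketch below assumes that formula and that value of α; `normalize` is a hypothetical name, not a documented asent function.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded score into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# The only non-zero token polarity in "I am not very happy." is that of
# "happy": intensified by 0.293 and negated with -0.74.
sentence_score = (2.7 + 0.293) * -0.74
print(round(normalize(sentence_score), 4))  # -0.4964
```

Because of the square root in the denominator, the result approaches -1 or 1 for very strong sentences but never reaches them.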

## Processing multiple Sentences
So far we have only looked at a single sentence. However, most documents contain multiple sentences. Here asent treats each sentence separately and then aggregates the polarity across all sentences. This also means that checks for contrastive conjunctions and negations are only done within a sentence. This is illustrated in Figure 3.

<h3 align="center">
<figure>
<img src="https://raw.githubusercontent.com/KennethEnevoldsen/asent/main/docs/img/multi_sentence.png" width="600" />
</figure>
  <small>
  Figure 3: Calculation of document polarity for multiple sentences
  </small>
</h3>

We can also examine this in practice:

In [11]:
sentence = "Product looks nice. However some apps crash from time to time"
doc = nlp(sentence)
asent.visualize(doc, style="sentence-prediction")

In [12]:
for sentence in doc.sents:
    print(sentence._.polarity)

neg=0.0 neu=0.517 pos=0.483 compound=0.4215 span=Product looks nice.
neg=0.278 neu=0.722 pos=0.0 compound=-0.4019 span=However some apps crash from time to time


In [13]:
print(doc._.polarity)

neg=0.139 neu=0.619 pos=0.241 compound=0.0098 n_sentences=2
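The document-level numbers above are consistent with a plain average of the sentence-level scores. The following sketch assumes that this is how the aggregation works. It does not call asent and simply reuses the printed sentence values.

```python
from statistics import mean

# Sentence-level scores copied from the output above.
sentence_scores = [
    {"neg": 0.0, "neu": 0.517, "pos": 0.483, "compound": 0.4215},
    {"neg": 0.278, "neu": 0.722, "pos": 0.0, "compound": -0.4019},
]

# Assumption: the document score is the mean of the sentence scores.
doc_scores = {
    key: mean(scores[key] for scores in sentence_scores)
    for key in ("neg", "neu", "pos", "compound")
}

print("compound:", round(doc_scores["compound"], 4))  # 0.0098
print("neg:", round(doc_scores["neg"], 3))  # 0.139
```

Since the inputs here are the already rounded printed values, the last digit of `neu` and `pos` computed this way can differ slightly from the document-level output.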
