<img src="https://github.com/Christina1281995/demo-repo/blob/main/header_absa1.png?raw=true" width="95%">

#### What could be improved in this sentiment analysis?

<img src="https://github.com/Christina1281995/demo-repo/blob/main/limitsofSA.png?raw=true" width="75%">

#### Document Level

What we have been doing so far is called "document-level" analysis.

<img src="https://github.com/Christina1281995/demo-repo/blob/main/document level.png?raw=true" width="75%">

Let's take a look at the sentiment result in this tweet:

In [2]:
from transformers import pipeline

classifier = pipeline("text-classification")
classifier("This disease #covid19 is REALLY starting to annoy me, but at least the lockdown lets me spend more time with the family which has been amazing! Work-from-home is tough though...")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.7361530661582947}]

#### Sentence Level

<img src="https://github.com/Christina1281995/demo-repo/blob/main/sentence level.png?raw=true" width="75%">

In [3]:
# Sentence 1:
classifier("This disease #covid19 is REALLY starting to annoy me, but at least the lockdown lets me spend more time with the family which has been amazing!")

[{'label': 'POSITIVE', 'score': 0.9957256317138672}]

In [4]:
# Sentence 2:
classifier("Work-from-home is tough though...")

[{'label': 'NEGATIVE', 'score': 0.994464099407196}]
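The manual split above can also be automated. The following is a minimal sketch, assuming a naive regex splitter is good enough for short tweets. `split_sentences` and `classify_sentences` are hypothetical helper names (not part of `transformers`), and `classify` can be any callable that accepts a string, for example the `classifier` pipeline created earlier.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def classify_sentences(text, classify):
    """Apply `classify` to every sentence and pair each sentence with its result."""
    return [(sentence, classify(sentence)) for sentence in split_sentences(text)]


tweet = (
    "This disease #covid19 is REALLY starting to annoy me, but at least the "
    "lockdown lets me spend more time with the family which has been amazing! "
    "Work-from-home is tough though..."
)
sentences = split_sentences(tweet)
```

Calling `classify_sentences(tweet, classifier)` would then return one result per sentence. A regex splitter like this one will mis-split abbreviations and URLs, so a dedicated sentence tokenizer may be preferable for real data.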

#### Aspect Level

In the <b>ABSA problem</b>, the target on which the sentiment is expressed shifts from an entire document to an <b>entity or a certain aspect</b> of an entity.


<img src="https://github.com/Christina1281995/demo-repo/blob/main/ABSA.png?raw=true" width="95%">

#### The ABSA-problem and tasks

<img src="https://github.com/Christina1281995/demo-repo/blob/main/an%20absa%20system.png?raw=true" width="50%" align="right">

ABSA encompasses the identification of one or more of <b>four sentiment elements</b>. Depending on the goal, Zhang et al. (2022) divide ABSA methods into <b>Single ABSA</b> tasks (the more conventional approach, in which a method tackles one single ABSA goal) and <b>Compound ABSA</b> tasks (a more recent trend, in which a single method addresses two or more sentiment elements and thereby captures the dependency between them).

The four sentiment elements are:

- <b>aspect category</b> c defines a unique aspect of an entity and is supposed to fall into a category set C, predefined for each specific domain of interest. For example, ```food``` and ```service``` can be aspect categories for the restaurant domain.

- <b>aspect term</b> a is the opinion target which explicitly appears in the given text, e.g., ```“pizza”``` in the sentence “The pizza is delicious.” When the target is implicitly expressed (e.g., “It is overpriced!”), we denote the aspect term as a special one named “null”.

- <b>opinion term</b> o is the expression given by the opinion holder to express his/her sentiment towards the target. For instance, ```“delicious”``` is the opinion term in the running example “The pizza is delicious”.

- <b>sentiment polarity</b> p describes the orientation of the sentiment over an aspect category or an aspect term, which usually includes ```positive```, ```negative```, and ```neutral```.
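To make the four elements concrete, they can be held together in a small record. This is only an illustrative sketch: the class name `SentimentQuad`, the use of `None` for the implicit "null" aspect term, and the category `price` for the "It is overpriced!" example are assumptions made here, not definitions from Zhang et al. (2022).

```python
from dataclasses import dataclass
from typing import Optional

POLARITIES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class SentimentQuad:
    aspect_category: str           # c: drawn from a predefined, domain-specific set C
    aspect_term: Optional[str]     # a: None stands in for the implicit "null" target
    opinion_term: Optional[str]    # o: the expression carrying the sentiment
    polarity: str                  # p: positive, negative or neutral

    def __post_init__(self):
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")


# "The pizza is delicious."
explicit = SentimentQuad("food", "pizza", "delicious", "positive")

# "It is overpriced!" has no explicit aspect term (category assumed for illustration)
implicit = SentimentQuad("price", None, "overpriced", "negative")
```

A Single ABSA method would predict one of these fields, while a Compound ABSA method would predict two or more of them together.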

<img src="https://github.com/Christina1281995/demo-repo/blob/main/e2e.png?raw=true" width="60%" align="center">

But how can we do that?

#### GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-Based Sentiment Analysis

The method designed by Luo et al. (2020), described in the paper [GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis. Huaishao Luo, Lei Ji, Tianrui Li, Nan Duan, Daxin Jiang. Findings of EMNLP, 2020.](https://arxiv.org/abs/2009.10557), implements a gradient harmonized and cascaded labeling model.

The method falls into the “End-to-End” (E2E) category of aspect-based sentiment analysis tasks, meaning it solves two ABSA sub-tasks, ATE (aspect term extraction) and ASC (aspect sentiment classification), in one model or methodology. Recent E2E methods leverage the interdependencies between aspect term detection and sentiment classification to enhance model performance. This stands in contrast to pipeline approaches, which tackle one ABSA sub-task after the other in isolation.
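To illustrate what solving ATE and ASC together produces, the sketch below merges two token-level label sequences into (aspect term, polarity) pairs. The label names (`B`/`I`/`O` for ATE, `POS`/`NEG` for ASC), the function name `extract_aspect_sentiments`, and the rule of taking the polarity from the first token of a term are assumptions for illustration; they are not taken from the GRACE code.

```python
def extract_aspect_sentiments(tokens, ate_tags, asc_tags):
    """Merge BIO aspect tags and per-token polarity tags into (term, polarity) pairs."""
    pairs = []
    term_tokens, term_polarity = [], None

    def flush():
        # Store the aspect term collected so far, if any.
        if term_tokens:
            pairs.append((" ".join(term_tokens), term_polarity))

    for token, ate, asc in zip(tokens, ate_tags, asc_tags):
        if ate == "B":                      # a new aspect term starts here
            flush()
            term_tokens, term_polarity = [token], asc
        elif ate == "I" and term_tokens:    # continuation of the current term
            term_tokens.append(token)
        else:                               # outside any aspect term
            flush()
            term_tokens, term_polarity = [], None
    flush()
    return pairs


tokens = "The battery life is great but the screen is dim".split()
ate_tags = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
asc_tags = ["O", "POS", "POS", "O", "O", "O", "O", "NEG", "O", "O"]
pairs = extract_aspect_sentiments(tokens, ate_tags, asc_tags)
# pairs == [("battery life", "POS"), ("screen", "NEG")]
```

A pipeline approach would produce `ate_tags` first and classify each extracted term afterwards, whereas an E2E model predicts both sequences in one forward pass.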

<img src="https://github.com/Christina1281995/demo-repo/blob/main/grace_structure.png?raw=true" width="90%">

<img src="https://github.com/Christina1281995/demo-repo/blob/main/joint.PNG?raw=true" align="right" width="35%">

<u>Key Characteristics:</u>

- Co-extraction of ATE and ASC
- 2 cascading modules
   - 12 stacked transformer encoder blocks for ATE
   - 3 shared transformer encoder blocks and 2 transformer decoder blocks for ASC
- Focus on interaction
- Joint approach
- Shared shallow layers (n=3)
   - higher layers in BERT are usually task-specific
   - it is assumed that sharing the shallow layers can be useful
   - generates a shared “baseline understanding”

<img src="https://github.com/Christina1281995/demo-repo/blob/main/grad.PNG?raw=true" align="right" width="35%">

- <b>Virtual adversarial training</b>: the robustness of the model is improved by perturbing the input data in small ways so that it becomes difficult for the model to classify (to implement this, the direction and distance of the perturbations are calculated)

- <b>Gradient harmonized loss</b>: the model is trained with cross entropy loss. To make the model “focus” more on the “hard” labels, a gradient norm is calculated for each label (“easy” labels have low gradient norms), and each label is assigned a loss weight based on the gradient density (a histogram statistic). The idea is to decrease the weight of the loss from labels with low gradient norms.
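The idea behind virtual adversarial training can be sketched on a toy model. The example below assumes a two-feature logistic classifier and approximates the gradient with finite differences instead of backpropagation; all names (`predict`, `adversarial_perturbation`, `epsilon`, `probe_scale`) are hypothetical, and this is not how GRACE implements it. It only shows the two ingredients mentioned above: a perturbation direction and a perturbation distance.

```python
import math
import random


def predict(weights, x):
    """Toy binary classifier: sigmoid of a linear score."""
    z = sum(w * value for w, value in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def kl_bernoulli(p, q):
    """KL divergence between two Bernoulli distributions."""
    tiny = 1e-12
    return (p * math.log((p + tiny) / (q + tiny))
            + (1 - p) * math.log((1 - p + tiny) / (1 - q + tiny)))


def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def adversarial_perturbation(weights, x, epsilon=0.1, probe_scale=1e-3, h=1e-5, seed=0):
    """Find the direction that changes the prediction most, scaled to length epsilon."""
    rng = random.Random(seed)
    d = unit([rng.gauss(0.0, 1.0) for _ in x])      # random starting direction
    p_clean = predict(weights, x)

    def divergence(r):
        shifted = [value + delta for value, delta in zip(x, r)]
        return kl_bernoulli(p_clean, predict(weights, shifted))

    probe = [probe_scale * di for di in d]
    grad = []
    for i in range(len(x)):                         # central finite differences
        plus, minus = list(probe), list(probe)
        plus[i] += h
        minus[i] -= h
        grad.append((divergence(plus) - divergence(minus)) / (2 * h))
    return [epsilon * g for g in unit(grad)]        # direction * distance


weights, x = [2.0, -1.0], [0.5, 0.3]
r_adv = adversarial_perturbation(weights, x)
x_adv = [value + delta for value, delta in zip(x, r_adv)]
vat_loss = kl_bernoulli(predict(weights, x), predict(weights, x_adv))
```

During training, `vat_loss` would be added to the supervised loss so that the model is penalised when a small perturbation changes its prediction.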

<br>
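The gradient harmonized loss described above can be sketched as follows. This is a simplified illustration in the spirit of gradient-density weighting, assuming a fixed number of equal-width histogram bins and using `1 - p_true` as the gradient norm of the cross entropy loss with respect to the true-class logit. The function names and the bin count are assumptions, and the exact formulation in GRACE may differ.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ghm_weights(grad_norms, num_bins=10):
    """Weight each label by the inverse density of its gradient-norm bin."""
    n = len(grad_norms)
    bin_ids = [min(int(g * num_bins), num_bins - 1) for g in grad_norms]
    counts = [0] * num_bins
    for b in bin_ids:
        counts[b] += 1
    # density = count / bin width = count * num_bins; weight = n / density
    return [n / (counts[b] * num_bins) for b in bin_ids]


def harmonized_cross_entropy(logits_batch, targets, num_bins=10):
    """Cross entropy in which labels from crowded gradient bins count less."""
    p_true = [softmax(logits)[t] for logits, t in zip(logits_batch, targets)]
    grad_norms = [1.0 - p for p in p_true]          # low value = "easy" label
    weights = ghm_weights(grad_norms, num_bins)
    losses = [-math.log(p) for p in p_true]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Three easy labels and one hard label, all with true class 0.
logits_batch = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 2.0]]
targets = [0, 0, 0, 0]
loss = harmonized_cross_entropy(logits_batch, targets)
```

Because the three easy labels share one histogram bin, each of them receives a smaller weight than the single hard label, which is the behaviour the bullet point describes.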

<u>Architecture:</u>

- <b>Activation</b> function: GeLU (Gaussian Error Linear Unit, a smooth non-linear function that weights each input by the standard Gaussian CDF, so positive inputs map to positive outputs and negative inputs map to small negative outputs)
- Initial <b>tokenization and embeddings</b> (WordPiece, a subword tokenization method used for the original BERT model)
   - nn.Embedding layers provide word embeddings, positional embeddings and token type embeddings (n=2), which are combined by summation
- n x the <b>encoder block</b> (12 in this configuration, same as original BERT model)
   - Multi-head scaled dot-product attention with Softmax to generate the context layer
   - ‘Intermediate’: linear layer and activation function
   - ‘Output’: linear layer, layer normalisation, dropout
- The <b>classification head</b> for ATE (nn.Linear, Softmax)
- n x the <b>decoder block</b> (2 in this configuration)
- The <b>classification head</b> for ASC (nn.Linear, Softmax)
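Two of the building blocks listed above, the GeLU activation and scaled dot-product attention with Softmax, can be written out in a few lines. The sketch below uses plain Python lists and a single attention head for readability; the actual model uses batched multi-head tensors in PyTorch, so the function names and shapes here are illustrative assumptions only.

```python
import math


def gelu(x):
    """Exact GeLU: x times the standard Gaussian CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    context = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)                   # attention over all keys
        context.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return context


# Identical keys give uniform attention, so the context is the mean of the values.
context = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
```

In the encoder block, the attention output (the context layer) is passed through the ‘Intermediate’ linear layer with GeLU and then through the ‘Output’ linear layer with layer normalisation and dropout.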