# Aspect term extraction and sentiment classification

This notebook was created and run in Colab by Devansh Mistry. The demo is deployed with Gradio.

This notebook uses the [PyABSA](https://github.com/yangheng95/PyABSA) library. Citation:

In [None]:
@article{YangL22,
  author    = {Heng Yang and
               Ke Li},
  title     = {PyABSA: Open Framework for Aspect-based Sentiment Analysis},
  journal   = {CoRR},
  volume    = {abs/2208.01368},
  year      = {2022},
  url       = {https://doi.org/10.48550/arXiv.2208.01368},
  doi       = {10.48550/arXiv.2208.01368},
  eprinttype = {arXiv},
  eprint    = {2208.01368},
  timestamp = {Tue, 08 Nov 2022 21:46:32 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2208-01368.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

In [None]:
! pip install pyabsa==1.16.27 

In [3]:
from pyabsa import available_checkpoints
checkpoint_map = available_checkpoints('atepc', show_ckpts=True)

********** Available atepc model checkpoints for Version:1.16.27 (this version) **********
----------------------------------------------------------------------------------------------------
Checkpoint Name: english
id: 
Training Model: FAST-LCFS-ATEPC
Training Dataset: English
Language: English
Description: Trained on RTX3090, this checkpoint use bert-spc in ATEPC training
Available Version: 1.16.0+
Checkpoint File: fast_lcf_atepc_English_cdw_apcacc_85.4_apcf1_82.53_atef1_80.19.zip
Author: H, Yang (yangheng@m.scnu.edu.cn)
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Checkpoint Name: chinese
id: 
Training Model: FAST-LCF-ATEPC
Training Dataset: Chinese
Language: Chinese
Description: Trained on RTX3090 BERT-BASE-CHINESE
Available Version: 1.16.0+
Checkpoint File: fast_lcf_atepc_Chinese_cdw_apcacc_96.09_apcf1_95.14_atef1_83.69.zip
A
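
The checkpoint file names listed above appear to embed the evaluation scores of each model (`apcacc`, `apcf1`, `atef1`, which presumably stand for sentiment-classification accuracy, sentiment-classification F1 and aspect-extraction F1). As a minimal sketch, assuming that naming pattern holds, the scores can be pulled out of a file name with a regular expression. The helper `parse_checkpoint_metrics` is hypothetical and not part of PyABSA.

```python
import re

def parse_checkpoint_metrics(filename):
    """Return the metric values embedded in a checkpoint file name.

    Hypothetical helper, not part of PyABSA. It assumes each metric
    is written as <name>_<number>, as in the listing above.
    """
    pairs = re.findall(r"(apcacc|apcf1|atef1)_(\d+(?:\.\d+)?)", filename)
    return {name: float(value) for name, value in pairs}

name = "fast_lcf_atepc_English_cdw_apcacc_85.4_apcf1_82.53_atef1_80.19.zip"
metrics = parse_checkpoint_metrics(name)
print(metrics)
```

This can help when comparing checkpoints before downloading one, since each archive is several hundred megabytes.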

In [4]:
from pyabsa import ABSADatasetList, available_checkpoints
from pyabsa import ATEPCCheckpointManager

aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='english',
                                   auto_device=False  # False means load model on CPU
                                   )

There may be some checkpoints available for early versions of PyABSA, see ATEPC
Downloading checkpoint:english ...
Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets


577MB [00:07, 80.91MB/s, Downloading checkpoint...]                         

Find zipped checkpoint: ./checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_85.4_apcf1_82.53_atef1_80.19.zip, unzipping...





Done.
If the auto-downloading failed, please download it via browser: https://huggingface.co/spaces/yangheng/PyABSA-ATEPC/resolve/main/checkpoint/English/ATEPC/fast_lcf_atepc_English_cdw_apcacc_85.4_apcf1_82.53_atef1_80.19.zip 
Load aspect extractor from ./checkpoints/ATEPC_ENGLISH_CHECKPOINT
config: ./checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc.config
state_dict: ./checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc.state_dict
model: None
tokenizer: ./checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc.tokenizer



Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.classifier.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'mask_predictions.dense.bias', 'mask_predictions.dense.weight', 'mask_predictions.classifier.bias', 'lm_predictions.lm_head.bias', 'mask_predictions.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [5]:
# Single-sentence example
examples = ['But the staff was so perfect to us, but the service was bad .']
inference_source = examples
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                          save_result=True,  # save the result to a JSON file
                          print_result=True,  # print the result
                          pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                          )

  lcf_cdm_vec = torch.tensor([f.lcf_cdm_vec for f in infer_features], dtype=torch.float32)


The results of aspect term extraction have been saved in /content/atepc_inference.result.json
Example 0: But the <staff:Positive Confidence:0.9991143345832825> was so perfect to us , but the <service:Negative Confidence:0.9997571110725403> was bad .


  probs = [float(x) for x in F.softmax(i_apc_logits).cpu().numpy().tolist()]
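
The printed example marks each aspect inline as `<term:Polarity Confidence:score>`. As a rough sketch, assuming the printed line always follows that layout, the annotations can be recovered from the text with a regular expression. The helper `parse_annotated` is hypothetical and not part of PyABSA. The structured values are also available in the returned `atepc_result`, so this is mainly useful when only the printed log is at hand.

```python
import re

# Matches spans such as <staff:Positive Confidence:0.9991143345832825>
ASPECT_PATTERN = re.compile(r"<([^:<>]+):(\w+) Confidence:([0-9.]+)>")

def parse_annotated(line):
    """Return (term, polarity, confidence) tuples found in a printed result line.

    Hypothetical helper; assumes the inline format shown in the output above.
    """
    return [(term, polarity, float(conf))
            for term, polarity, conf in ASPECT_PATTERN.findall(line)]

line = ("But the <staff:Positive Confidence:0.9991143345832825> was so perfect to us , "
        "but the <service:Negative Confidence:0.9997571110725403> was bad .")
for term, polarity, conf in parse_annotated(line):
    print(term, polarity, round(conf, 4))
```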


In [None]:
inference_source = ABSADatasetList.SemEval
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                          save_result=True,
                          print_result=True,  # print the result
                          pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                          )



---

## Deployment

In [None]:
! pip install gradio

In [7]:
import gradio as gr
import pandas as pd

In [8]:
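def inference(text):
    """Extract aspect terms and their sentiments from one input string."""
    # extract_aspect takes a list of texts, so wrap the single input
    result = aspect_extractor.extract_aspect(inference_source=[text],
                                             pred_sentiment=True)

    # One row per extracted aspect term, for Gradio's dataframe output
    result = pd.DataFrame({
        'aspect': result[0]['aspect'],
        'sentiment': result[0]['sentiment']
    })

    return result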
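
The `inference` function indexes `result[0]` directly and relies on the `'aspect'` and `'sentiment'` lists having equal length, which `pd.DataFrame` requires when given a dict of lists. A more defensive variant might build row dictionaries first. The sketch below uses only the standard library and a hand-written stand-in for the extractor output. Both `result_to_rows` and `fake_result` are hypothetical names, and the stand-in only mimics the two keys used above.

```python
def result_to_rows(result):
    """Pair each aspect with its sentiment; return [] if nothing was extracted.

    Hypothetical helper. Assumes `result` is a list whose first item has
    'aspect' and 'sentiment' lists, as used in `inference` above.
    """
    if not result:
        return []
    first = result[0]
    aspects = first.get("aspect", [])
    sentiments = first.get("sentiment", [])
    # zip stops at the shorter list, so mismatched lengths cannot raise here
    return [{"aspect": a, "sentiment": s} for a, s in zip(aspects, sentiments)]

# Hand-written stand-in for the list returned by extract_aspect
fake_result = [{"aspect": ["staff", "service"],
                "sentiment": ["Positive", "Negative"]}]
rows = result_to_rows(fake_result)
print(rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` to get the same two-column table.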

In [9]:
if __name__ == '__main__':
    iface = gr.Interface(
        fn=inference,
        inputs=["text"],
        examples=[['The wine list is incredible and extensive and diverse , the food is all incredible and the staff was all very nice ,'
                   ' good at their jobs and cultured .'],
                  ['Though the menu includes some unorthodox offerings (a peanut butter roll, for instance), the classics are pure and '
                   'great--we have never had better sushi anywhere, including Japan.'],
                  ['Everything, from the soft bread, soggy salad, and 50 minute wait time, with an incredibly rude service to deliver'
                   ' below average food .'],
                  ['Even though it is running Snow Leopard, 2.4 GHz C2D is a bit of an antiquated CPU and thus the occasional spinning '
                   'wheel would appear when running Office Mac applications such as Word or Excel .'],
                  ['This demo is trained on the laptop and restaurant and other review datasets from ABSADatasets (https://github.com/yangheng95/ABSADatasets)'],
                  ['To fit on your data, please train the model on your own data, see the PyABSA (https://github.com/yangheng95/PyABSA)'],
                  ],
        outputs="dataframe",
        title='Aspect Term Extraction for Short Texts (powered by PyABSA)'
    )

    iface.launch()

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.

