# Aspect-based Sentiment Classification
This is the tutorials of using PyABSA for aspect sentiment classification
Drafted for v2.0 and higher versions. Note there are many breaking changes in v2.0,
so you do not need to upgrade to v2.0 and higher versions if you are using code, API, checkpoints,
datasets or anything from v1.0. Let's begin the introduction.

In [3]:
!pip install pyabsa >= 2.0.0
from pyabsa import AspectPolarityClassification as APC


[notice] A new release of pip available: 22.3 -> 22.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
  return process_handler(cmd, _system_body)
  return process_handler(cmd, _system_body)
  return process_handler(cmd, _system_body)


### APCModelList
There are three types of APC models for aspect sentiment classification:
1. LCF-based APC models, there models are available at APCModelList. These models are proposed by the authors.
2. GloVe or Word2Vec based embedding-based model, they are called GloVeAPCModelList
3. BERT-adapted models, which replace GloVe or Word2Vec embedding with Pretrained models, a.k.a, BERTAPCModelList
Notice: when you select to use a model, please make sure to carefully manage the configurations, e.g., for glove-based models, you need to set hidden dim and embed_dim manually.
We already provide some pre-defined configurations. Refer to the source code if you have any question
e.g.,

In [4]:
# config = APC.APCConfigManager.get_apc_config_glove()  # get pre-defined configuration for GloVe model, the default embed_dim=300
# config = APC.APCConfigManager.get_apc_config_multilingual()  # this config contains 'pretrained_bert', it is based on pretrained models
config = APC.APCConfigManager.get_apc_config_english()


### APCDatasetList
There are the [datasets](https://github.com/yangheng95/ABSADatasets) from publication or third-party contribution. There dataset can be downloaded and processed automatically.
In pyabsa, you can pass a set of datasets to train a model.
e.g., for using integrated datasets:


In [5]:
from pyabsa import DatasetItem

dataset = APC.APCDatasetList.Laptop14
# now the dataset is a DatasetItem object, which has a name and a list of subdatasets
# e.g., SemEval dataset contains Laptop14, Restaurant14, Restaurant16 datasets

You can use your own dataset provided that it is formatted according to [ABSADatasets](https://github.com/yangheng95/ABSADatasets#important-rename-your-dataset-filename-before-use-it-in-pyabsa)

In [None]:
# Put your dataset into integrated_datasets folder, it this folder does not exist, you need to call:
from pyabsa import download_all_available_datasets
download_all_available_datasets()

to pass datasets to PyABSA trainers, you can

In [None]:
my_dataset = DatasetItem('my_dataset', ['my_dataset1', 'my_dataset2'])
# my_dataset1 and my_dataset2 are the dataset folders. In there folders, the train dataset is necessary


### Training
Let's prepare to train

In [7]:
from pyabsa import ModelSaveOption, DeviceTypeOption

config.num_epoch = 1

trainer = APC.APCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint='english',
    # if you want to resume training from our pretrained checkpoints, you can pass the checkpoint name here
    auto_device=DeviceTypeOption.AUTO,
    path_to_save=None,  # set a path to save checkpoints, if it is None, save checkpoints at 'checkpoints' folder
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    # there are some augmentation dataset for integrated datasets, you use them by setting load_aug=True to improve performance
)


  _warn("subprocess %s is still running" % self.pid,
  results = os.popen(cmd).readlines()
  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,


2022-11-17 15:16:42,069 INFO: PyABSA version: 2.0.4
2022-11-17 15:16:42,072 INFO: Transformers version: 4.21.1
2022-11-17 15:16:42,073 INFO: Torch version: 1.12.1+cuda11.6
2022-11-17 15:16:42,073 INFO: Device: NVIDIA GeForce RTX 3070
2022-11-17 15:16:42,162 INFO: Local dataset version: 2022.11.07
2022-11-17 15:16:42,164 INFO: Remote dataset version: 2022.10.25
2022-11-17 15:16:42,165 INFO: Searching dataset 113.Laptop14 in local disk...
2022-11-17 15:16:42,208 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2022-11-17 15:16:42,209 INFO: Please use a new folder to perform new text augment if the former augment exited unexpectedly


  _warn(f"unclosed running multiprocessing pool {self!r}",
Some weights of the model checkpoint at yangheng/deberta-v3-base-absa-v1.1 were not used when initializing DebertaV2Model: ['pooler.dense.weight', 'classifier.weight', 'classifier.bias', 'pooler.dense.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Load dataset from integrated_datasets\apc_datasets\110.SemEval\113.laptop14\Laptops_Train.xml.seg


100%|██████████| 2328/2328 [00:01<00:00, 1669.08it/s, preparing dataloader...]

2022-11-17 15:16:49,240 INFO: Dataset Label Details: {'Neutral': 464, 'Positive': 994, 'Negative': 870, 'Sum': 2328}



  d[key] = torch.tensor(value)


Load dataset from integrated_datasets\apc_datasets\110.SemEval\113.laptop14\Laptops_Test_Gold.xml.seg


100%|██████████| 638/638 [00:00<00:00, 2580.90it/s, preparing dataloader...]

2022-11-17 15:16:50,949 INFO: Dataset Label Details: {'Neutral': 169, 'Positive': 341, 'Negative': 128, 'Sum': 638}





[31mCaching dataset... please remove cached dataset if any problem happens.[0m
2022-11-17 15:16:54,282 INFO: cuda memory allocated:753143296
2022-11-17 15:16:54,283 INFO: ABSADatasetsVersion:2022.11.07	-->	Calling Count:0
2022-11-17 15:16:54,284 INFO: MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x0000020A0C2B5EE0>	-->	Calling Count:0
2022-11-17 15:16:54,284 INFO: PyABSAVersion:2.0.4	-->	Calling Count:2
2022-11-17 15:16:54,284 INFO: SRD:3	-->	Calling Count:0
2022-11-17 15:16:54,285 INFO: TorchVersion:1.12.1+cuda11.6	-->	Calling Count:2
2022-11-17 15:16:54,285 INFO: TransformersVersion:4.21.1	-->	Calling Count:2
2022-11-17 15:16:54,286 INFO: auto_device:True	-->	Calling Count:5
2022-11-17 15:16:54,286 INFO: batch_size:16	-->	Calling Count:2
2022-11-17 15:16:54,287 INFO: cache_dataset:True	-->	Calling Count:1
2022-11-17 15:16:54,287 INFO: checkpoint_save_mode:1	-->	Calling Count:8
2022-11-17 15:16:54,288 INFO: cross_validate_fold:-1	-->	Calling Count:1
2022-11-17 

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_p

2022-11-17 15:19:07,969 INFO:  ------------------------------------- Metric Visualizer ------------------------------------- 
╒════════════════════════════╤═══════════════════╤══════════╤═════════════════════════════════════════════════════════════════════════╕
│ Metric                     │ Model & Dataset   │ Values   │ Summary                                                                 │
╞════════════════════════════╪═══════════════════╪══════════╪═════════════════════════════════════════════════════════════════════════╡
│ Max-Test-Acc w/o Valid Set │ BERT_SPC-Laptop14 │ [84.17]  │ ['Avg:84.17, Median: 84.17, IQR: 0.0, STD:0.0, Max: 84.17, Min: 84.17'] │
├────────────────────────────┼───────────────────┼──────────┼─────────────────────────────────────────────────────────────────────────┤
│ Max-Test-F1 w/o Valid Set  │ BERT_SPC-Laptop14 │ [81.33]  │ ['Avg:81.33, Median: 81.33, IQR: 0.0, STD:0.0, Max: 81.33, Min: 81.33'] │
╘════════════════════════════╧═══════════════════╧════════


  pickle.dump(self, open(self.dump_pointer, mode='wb'))


Training finished, we hope you can share your checkpoint with community, please see: https://github.com/yangheng95/PyABSA/blob/release/demos/documents/share-checkpoint.md
ABSADatasetsVersion:2022.11.07	-->	Calling Count:0
MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x0000020A0C2B5EE0>	-->	Calling Count:3
PyABSAVersion:2.0.4	-->	Calling Count:2
SRD:3	-->	Calling Count:0
TorchVersion:1.12.1+cuda11.6	-->	Calling Count:2
TransformersVersion:4.21.1	-->	Calling Count:2
auto_device:True	-->	Calling Count:151
batch_size:16	-->	Calling Count:5
cache_dataset:True	-->	Calling Count:1
checkpoint_save_mode:1	-->	Calling Count:8
cross_validate_fold:-1	-->	Calling Count:2
dataset_file:{'train': ['integrated_datasets\\apc_datasets\\110.SemEval\\113.laptop14\\Laptops_Train.xml.seg'], 'test': ['integrated_datasets\\apc_datasets\\110.SemEval\\113.laptop14\\Laptops_Test_Gold.xml.seg'], 'valid': []}	-->	Calling Count:17
dataset_name:Laptop14	-->	Calling Count:11
dca_layer:3	-->	Calli

  self.config.logger.removeHandler(self.config.logger.handlers[0])
  _warn("subprocess %s is still running" % self.pid,
  results = os.popen(cmd).readlines()
  _warn("subprocess %s is still running" % self.pid,


Load sentiment classifier from checkpoints/bert_spc_Laptop14_acc_84.17_f1_81.33
config: checkpoints/bert_spc_Laptop14_acc_84.17_f1_81.33\bert_spc.config
state_dict: checkpoints/bert_spc_Laptop14_acc_84.17_f1_81.33\bert_spc.state_dict
model: None
tokenizer: checkpoints/bert_spc_Laptop14_acc_84.17_f1_81.33\bert_spc.tokenizer


  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,
Some weights of the model checkpoint at yangheng/deberta-v3-base-absa-v1.1 were not used when initializing DebertaV2Model: ['pooler.dense.weight', 'classifier.weight', 'classifier.bias', 'pooler.dense.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[31mCaching dataset... please remove cached dataset if any problem happens.[0m


to load trained model for inference:

In [8]:
from pyabsa.tasks.AspectPolarityClassification import SentimentClassifier

sentiment_classifier = trainer.load_trained_model()
assert isinstance(sentiment_classifier, SentimentClassifier)

### Inference

Use our checkpoints to initialize a SentimentClassifier

In [9]:
from pyabsa import available_checkpoints
ckpts = available_checkpoints()
# find a suitable checkpoint and use the name:
sentiment_classifier = APC.SentimentClassifier(checkpoint='english')  # here I use the english checkpoint which is trained on all English datasets in PyABSA

Load sentiment classifier from checkpoints\APC_ENGLISH_CHECKPOINT\fast_lsa_t_v2_English_acc_82.21_f1_81.81
config: checkpoints\APC_ENGLISH_CHECKPOINT\fast_lsa_t_v2_English_acc_82.21_f1_81.81\fast_lsa_t_v2.config
state_dict: checkpoints\APC_ENGLISH_CHECKPOINT\fast_lsa_t_v2_English_acc_82.21_f1_81.81\fast_lsa_t_v2.state_dict
model: None
tokenizer: checkpoints\APC_ENGLISH_CHECKPOINT\fast_lsa_t_v2_English_acc_82.21_f1_81.81\fast_lsa_t_v2.tokenizer


  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,
  _warn("subprocess %s is still running" % self.pid,
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['mask_predictions.classifier.weight', 'lm_predictions.lm_head.dense.weight', 'mask_predictions.classifier.bias', 'mask_predictions.LayerNorm.weight', 'mask_predictions.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.dense.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initia

### Simple Prediction

In [8]:
examples = [
    'everything is always cooked to perfection , the [B-ASP]service[E-ASP] is excellent , the [B-ASP]decor[E-ASP] cool and understated . $LABEL$ Positive, Positive',
    'Great [B-ASP]taste[E-ASP] ever tried.'
    'I think this laptop is great!'  # if you dont label an aspect, then pyabsa try to give you a 'global sentiment'. But please avoid doing that.
]
for ex in examples:
    sentiment_classifier.predict(
        text=ex,
        print_result=True,
        ignore_error=True,  # ignore an invalid example, if it is False, invalid examples will raise Exceptions
        eval_batch_size=32
    )


Example 0: everything is always cooked to perfection , the [32m<service:Positive(confidence:0.999, ref:Positive)>[0m is excellent , the [32m<decor:Positive(confidence:0.999, ref:Positive)>[0m cool and understated .
Example 0: Great [31m<taste:Positive(confidence:0.999, ref:-100)>[0m ever tried.I think this laptop is great!


### Batch Inference

In [10]:
sentiment_classifier.batch_predict(
    target_file=APC.APCDatasetList.Laptop14,  # the batch_predict() is only available for a file only, please put the examples in a file
    print_result=True,
    save_result=False,
    ignore_error=True,
    eval_batch_size=32
)

Try to load 113.Laptop14 dataset from local disk


100%|██████████| 638/638 [00:00<00:00, 1888.37it/s, preparing apc inference dataloader...]
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  bucket_pos = np.where(abs_pos <= mid, relative_pos, log_pos * sign).astype(np.int)
100%|██████████| 20/20 [00:06<00:00,  3.18it/s, run inference...]

Example 0:  [32m<Boot time:Positive(confidence:0.998, ref:Positive)>[0m is super fast , around anywhere from 35 seconds to 1 minute .
Example 1:  [32m<tech support:Negative(confidence:0.99, ref:Negative)>[0m would not fix the problem unless I bought your plan for $ 150 plus .
Example 2:  [32m<Set up:Positive(confidence:0.997, ref:Positive)>[0m was easy .
Example 3: Did not enjoy the new [32m<Windows 8:Negative(confidence:0.991, ref:Negative)>[0m and [32m<touchscreen functions:Negative(confidence:0.992, ref:Negative)>[0m .
Example 4: Other than not being a fan of [32m<click pads:Negative(confidence:0.988, ref:Negative)>[0m -LRB- industry standard these days -RRB- and the lousy [32m<internal speakers:Negative(confidence:0.992, ref:Negative)>[0m , it 's hard for me to find things about this notebook I do n't like , especially considering the $ 350 [31m<price tag:Negative(confidence:0.962, ref:Positive)>[0m .
Example 5: No [31m<installation disk (DVD):Negative(confidence:0




[{'text': ' Boot time is super fast , around anywhere from 35 seconds to 1 minute .',
  'aspect': ['Boot time'],
  'sentiment': ['Positive'],
  'confidence': [0.9975171089172363],
  'probs': [array([4.7845370e-04, 2.0043901e-03, 9.9751711e-01], dtype=float32)],
  'ref_sentiment': ['Positive'],
  'ref_check': ['Correct'],
  'perplexity': 'N.A.'},
 {'text': ' tech support would not fix the problem unless I bought your plan for $ 150 plus .',
  'aspect': ['tech support'],
  'sentiment': ['Negative'],
  'confidence': [0.9895620942115784],
  'probs': [array([0.9895621 , 0.00678456, 0.00365334], dtype=float32)],
  'ref_sentiment': ['Negative'],
  'ref_check': ['Correct'],
  'perplexity': 'N.A.'},
 {'text': ' Set up was easy .',
  'aspect': ['Set up'],
  'sentiment': ['Positive'],
  'confidence': [0.9974567294120789],
  'probs': [array([8.5592648e-04, 1.6873435e-03, 9.9745673e-01], dtype=float32)],
  'ref_sentiment': ['Positive'],
  'ref_check': ['Correct'],
  'perplexity': 'N.A.'},
 {'text':

### Annotate your own datasets via PyABSA
[Auto-Annotation](https://github.com/yangheng95/ABSADatasets#auto-annoate-your-datasets-via-pyabsa)  # available for v1.0 currently
[Manually-Annotation](https://github.com/yangheng95/ABSADatasets/tree/v1.2/DPT)

### Deploy a APC demo
TBC ...
