## Sentiment Classification based on PaddleNLP
[PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) is a large NLP library that bundles pretrained models and a variety of datasets. We can use it for NLP tasks such as measuring stock sentiment across a large collection of comments.

For a quick implementation of some features, see the [Taskflow documentation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md).

To build a model that is both fast and accurate, we can also fine-tune a pre-trained model on our own task (transfer learning). See the [API documentation](https://paddlenlp.readthedocs.io/zh/latest/).

On this page I record an example of how to analyze sentiment in stock comments.

## Install
As versions change, you can check out and update the source as follows:

git clone https://github.com/PaddlePaddle/PaddleNLP.git

cd PaddleNLP

git checkout develop

In [1]:
## requirements
# paddlepaddle>=2.4.1
# paddleocr
# pre-commit
# pytest
# parameterized
# pytest-cov
# regex
# pytest-xdist
# fast_tokenizer_python
# emoji
# ftfy
# unidecode

# ! pip install --upgrade paddlenlp -i https://pypi.tuna.tsinghua.edu.cn/simple
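
Before running the demos, it can help to confirm which versions are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of PaddleNLP.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two core packages from the requirements list above.
for name in ("paddlepaddle", "paddlenlp"):
    print(name, installed_version(name) or "not installed")
```

The reported `paddlepaddle` version can then be compared by hand against the `>=2.4.1` requirement listed above.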

## Demo 1
Any term can serve as an extraction dimension, such as '情绪词' (sentiment word), '观点词' (opinion word), or '量词' (quantifier).

In [12]:
from paddlenlp import Taskflow

# Aspect ("评价维度") -> sentiment word ("情绪词") and polarity: positive / negative / not mentioned
schema = [{"评价维度": ["情绪词", "情感倾向[正向,负向,未提及]"]}]
senta = Taskflow("sentiment_analysis", model="uie-senta-base", schema=schema)

E0310 21:58:43.314036 239740416 analysis_config.cc:579] Please compile with MKLDNN first to use MKLDNN
[2023-03-10 21:58:44,330] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load '/Users/jiaruiming/.paddlenlp/taskflow/sentiment_analysis/uie-senta-base'.


In [13]:
senta('人家自己企业都不在意我们瞎操什么心。大不了一起没咯。')

[{'评价维度': [{'text': '企业',
    'start': 4,
    'end': 6,
    'probability': 0.8930553177697718,
    'relations': {'情绪词': [{'text': '在意',
       'start': 8,
       'end': 10,
       'probability': 0.5746942925425458}],
     '情感倾向[正向,负向,未提及]': [{'text': '负向',
       'probability': 0.9995962784110795}]}}]}]
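
The nested result above is awkward to work with directly. As a rough sketch, it can be flattened into one row per aspect. The helper `flatten_aspects` and its output field names are hypothetical, and the sample is copied from the output shown above so the block runs without loading a model.

```python
def flatten_aspects(results,
                    aspect_key="评价维度",
                    word_key="情绪词",
                    polarity_key="情感倾向[正向,负向,未提及]"):
    """Turn nested Taskflow output into one flat dict per extracted aspect."""
    rows = []
    for item in results:
        for aspect in item.get(aspect_key, []):
            relations = aspect.get("relations", {})
            words = [w["text"] for w in relations.get(word_key, [])]
            polarities = relations.get(polarity_key, [])
            # Keep the most probable polarity, if any was predicted.
            top = max(polarities, key=lambda p: p["probability"], default=None)
            rows.append({
                "aspect": aspect["text"],
                "sentiment_words": words,
                "polarity": top["text"] if top else None,
                "probability": top["probability"] if top else None,
            })
    return rows


# Sample copied from the Demo 1 output above.
sample = [{'评价维度': [{'text': '企业',
    'start': 4,
    'end': 6,
    'probability': 0.8930553177697718,
    'relations': {'情绪词': [{'text': '在意',
       'start': 8,
       'end': 10,
       'probability': 0.5746942925425458}],
     '情感倾向[正向,负向,未提及]': [{'text': '负向',
       'probability': 0.9995962784110795}]}}]}]

rows = flatten_aspects(sample)
print(rows)
```

In practice `sample` would be replaced by the return value of `senta(...)`.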

## Demo 2
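Without a `model` argument, Taskflow loads its default sentiment model, which is fast and returns only a label and a score for each text.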

In [23]:
senta_fast = Taskflow("sentiment_analysis")
senta_fast("跌得好，顶哌哌。")

E0310 23:12:36.725665 239740416 analysis_config.cc:579] Please compile with MKLDNN first to use MKLDNN


[{'text': '跌得好，顶哌哌。', 'label': 'negative', 'score': 0.624422550201416}]
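
For later aggregation, the label and score can be folded into a single signed number. This is only a sketch: the helper `signed_score` is hypothetical, and it assumes the positive class is labelled `'positive'`, which the output above does not show (only `'negative'` appears).

```python
def signed_score(result):
    """Map a {'label', 'score'} result to a value in [-1, 1]."""
    # Assumption: the labels are 'positive' and 'negative'; anything else maps to 0.
    sign = {"positive": 1.0, "negative": -1.0}.get(result["label"], 0.0)
    return sign * result["score"]


# Sample copied from the Demo 2 output above.
sample = [{'text': '跌得好，顶哌哌。', 'label': 'negative', 'score': 0.624422550201416}]
scores = [signed_score(r) for r in sample]
print(scores)
```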

## Demo 3
By setting `batch_size`, we can pass a list of texts and have them processed in batches.

In [17]:
sentiment_list = [
    "总有那些人制造恐慌，小美到‖块，我把头给板登",
    "春江未暖鸭先知，即将一反常态轰轰烈烈涨起来",
    "接下来，需要调整日线的ADX指标了，跌幅会更大，或者需要更长时间的盘整。要扭转颓势的另一种方式是放量。没办法的事",
    "板块表现看特一药业，周K承进攻态势\n美诺华本周收小十字星，下周应跟随板块有所表现，图形指标都支持等待周kdj金叉，只待放量。"
]
schema = ["情感倾向[正向,负向,未提及]"]
senta_fast = Taskflow("sentiment_analysis", model="uie-senta-base", batch_size=50, schema=schema)

E0310 22:10:48.471343 239740416 analysis_config.cc:579] Please compile with MKLDNN first to use MKLDNN
[2023-03-10 22:10:49,855] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load '/Users/jiaruiming/.paddlenlp/taskflow/sentiment_analysis/uie-senta-base'.


In [18]:
senta_fast(sentiment_list)

[{'情感倾向[正向,负向,未提及]': [{'text': '正向', 'probability': 0.9968331835140134}]},
 {'情感倾向[正向,负向,未提及]': [{'text': '正向', 'probability': 0.9824898061436187}]},
 {'情感倾向[正向,负向,未提及]': [{'text': '负向', 'probability': 0.5023584285346772}]},
 {'情感倾向[正向,负向,未提及]': [{'text': '正向', 'probability': 0.9941705345460861}]}]
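
To summarize sentiment over many comments, the batch output can be tallied by polarity. The sketch below is one possible approach, not part of PaddleNLP: the helper `tally_polarity` and the `min_probability=0.6` cutoff are hypothetical choices, used here so that near-coin-flip predictions (such as the third result, at about 0.50) are counted separately.

```python
from collections import Counter

POLARITY_KEY = "情感倾向[正向,负向,未提及]"


def tally_polarity(results, min_probability=0.6):
    """Count polarity labels, setting aside low-confidence predictions."""
    counts = Counter()
    for item in results:
        predictions = item.get(POLARITY_KEY, [])
        if not predictions:
            counts["no_prediction"] += 1
            continue
        best = max(predictions, key=lambda p: p["probability"])
        if best["probability"] < min_probability:
            counts["uncertain"] += 1
        else:
            counts[best["text"]] += 1
    return counts


# Sample copied from the Demo 3 output above.
sample = [
    {POLARITY_KEY: [{'text': '正向', 'probability': 0.9968331835140134}]},
    {POLARITY_KEY: [{'text': '正向', 'probability': 0.9824898061436187}]},
    {POLARITY_KEY: [{'text': '负向', 'probability': 0.5023584285346772}]},
    {POLARITY_KEY: [{'text': '正向', 'probability': 0.9941705345460861}]},
]

counts = tally_polarity(sample)
print(dict(counts))
```

The cutoff is a tuning knob; lowering it to 0.5 would count the third comment as '负向' instead of uncertain.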

In [21]:
# import paddlenlp
# tokenizer = paddlenlp.transformers.ErnieTokenizer.from_pretrained('uie-senta-base')
# encoded_text = tokenizer(text="板块表现看特一药业，周K承进攻态势\n美诺华本周收小十字星，\
#                         下周应跟随板块有所表现，图形指标都支持等待周kdj金叉，只待放量。")

[2023-03-10 22:23:05,868] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh_vocab.txt and saved to /Users/jiaruiming/.paddlenlp/models/uie-senta-base
[2023-03-10 22:23:06,320] [    INFO] - Downloading ernie_3.0_base_zh_vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh_vocab.txt
100%|████████████████████████████████████████| 182k/182k [00:00<00:00, 1.71MB/s]
[2023-03-10 22:23:06,768] [    INFO] - tokenizer config file saved in /Users/jiaruiming/.paddlenlp/models/uie-senta-base/tokenizer_config.json
[2023-03-10 22:23:06,772] [    INFO] - Special tokens file saved in /Users/jiaruiming/.paddlenlp/models/uie-senta-base/special_tokens_map.json
