
Chinese_Sentiment

Chinese Sentiment Analysis Model (Three-class Classification)


Table of Contents

  1. Model Purpose
  2. References
  3. Training Dataset
  4. Training Process
  5. Results
  6. General usage
  7. Simple docker usage
  8. Todo
  9. Citation

Model Purpose

  • Identify the sentiment (positive, neutral, or negative) of social media comments.

Example:

Comment                                      Label
這個食物很好吃 (This food is delicious)       positive
這個食物很難吃 (This food tastes terrible)    negative
這個食物味道普通 (This food tastes average)    neutral

References


Training Dataset

Test & train dataset source:

Comments are crawled and extracted from social media, then cleaned and labeled with sentiment analysis from Azure AI services (in testing, Azure produced the most accurate labels).
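
A minimal sketch of that labeling step, assuming the azure-ai-textanalytics Python SDK; the endpoint, key, and comment list below are placeholders, not the repo's actual pipeline:

    # pip install azure-ai-textanalytics
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    # Placeholder endpoint/key -- substitute your own Azure Language resource.
    client = TextAnalyticsClient(
        endpoint="https://<your-resource>.cognitiveservices.azure.com/",
        credential=AzureKeyCredential("<your-key>"),
    )

    comments = ["這個食物很好吃", "這個食物很難吃", "這個食物味道普通"]

    # Each result carries a document-level sentiment: positive, neutral,
    # negative, or mixed ("mixed" would have to be remapped or dropped
    # for a three-class dataset).
    for text, doc in zip(comments, client.analyze_sentiment(comments, language="zh-Hant")):
        if not doc.is_error:
            print(f"{text},{doc.sentiment}")  # same comment,label shape as the csv files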

Validation dataset source:

The validation set is labeled manually, so each comment carries the most accurate sentiment.

Annotation format: csv

Split    test     train     val
Rows     91393    463708    1496

Data example (one comment,label pair per line):

食物不好吃,negative
食物很好吃,positive
食物味道普通,neutral
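
For reference, a small sketch of loading this annotation format, assuming headerless two-column csv files; the file name train.csv is a placeholder:

    import csv
    from collections import Counter

    def load_annotations(path):
        """Read a headerless `comment,label` csv into parallel lists."""
        texts, labels = [], []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) != 2:   # skip blank or malformed lines
                    continue
                texts.append(row[0])
                labels.append(row[1])
        return texts, labels

    texts, labels = load_annotations("train.csv")   # placeholder path
    print(Counter(labels))   # class balance across negative/neutral/positive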


Training Process

Environment: Kaggle platform

  1. Download the base model: RoBERTa-wwm-ext, Chinese

  2. Train on the Kaggle platform. (Note: if the dataset is too large, a run may exceed Kaggle's maximum execution time limit, making training there impossible.)

  3. Place the bert-code-new folder, base model, and sentiment-data folder into the Kaggle dataset.

  4. Run the following command:

!python /kaggle/input/bert-code-new/run_classifier.py \
  --task_name=mytask \
  --do_train=true \
  --do_eval=true \
  --data_dir=/kaggle/input/sentiment-data \
  --vocab_file=/kaggle/input/chinese-roberta-wwm-ext-l-12-h-768-a-12/vocab.txt \
  --bert_config_file=/kaggle/input/chinese-roberta-wwm-ext-l-12-h-768-a-12/bert_config.json \
  --init_checkpoint=/kaggle/input/chinese-roberta-wwm-ext-l-12-h-768-a-12/bert_model.ckpt \
  --max_seq_length=300 \
  --train_batch_size=16 \
  --learning_rate=1e-5 \
  --num_train_epochs=2.0 \
  --output_dir=output
  5. Hardware and environment (Kaggle):
  • Training time: 11 hours
  • Hardware: GPU P100
  • Environment: Python 3.10.12
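
Since the command passes --task_name=mytask, run_classifier.py must register a matching processor. The repo's exact implementation isn't shown in this README; below is a sketch of the standard BERT DataProcessor pattern such a task would follow (class name, file names, and label order are assumptions):

    import csv
    import tokenization   # ships with the BERT codebase

    # DataProcessor and InputExample are defined in run_classifier.py;
    # a class like this would be added there and registered under "mytask".
    class MyTaskProcessor(DataProcessor):
        """Three-class sentiment task over `comment,label` csv files."""

        def get_labels(self):
            # This order fixes the probability columns in the output tsv.
            return ["negative", "neutral", "positive"]

        def _create_examples(self, path, set_type):
            examples = []
            with open(path, newline="", encoding="utf-8") as f:
                for i, row in enumerate(csv.reader(f)):
                    if len(row) != 2:
                        continue
                    examples.append(InputExample(
                        guid=f"{set_type}-{i}",
                        text_a=tokenization.convert_to_unicode(row[0]),
                        label=row[1],
                    ))
            return examples

        def get_train_examples(self, data_dir):
            return self._create_examples(f"{data_dir}/train.csv", "train")

        def get_dev_examples(self, data_dir):
            return self._create_examples(f"{data_dir}/test.csv", "dev")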

Results

  • Model performance:
    • eval_accuracy: 0.796531
    • global_step: 59292

General usage

Trained model

  • Environment (local): Python 3.8.16, macOS 14.6.1
  1. Install dependencies:
    pip install -r ./bert-code-new/requirements.txt
  2. Run inference:
    python ./bert-code-new/run_inference.py --task_name=mytask \
      --do_predict=true \
      --data_dir=test-for-inference \
      --vocab_file=chinese-roberta-wwm-ext-l12-h768-a12/vocab.txt \
      --bert_config_file=chinese-roberta-wwm-ext-l12-h768-a12/bert_config.json \
      --init_checkpoint=chinese-roberta-wwm-ext-l12-h768-a12 \
      --max_seq_length=512 \
      --output_dir=output_predict
  3. Example output:
    • Output format: tsv, one row per input with a probability per class
    • Example row: 0.04118731 (negative)  0.9210373 (neutral)  0.037775367 (positive)
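
A short sketch of mapping such a row back to a label; the file name test_results.tsv and the negative/neutral/positive column order are assumptions based on the example above:

    import csv

    LABELS = ["negative", "neutral", "positive"]   # assumed column order

    with open("output_predict/test_results.tsv", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            probs = [float(p) for p in row]
            best = max(range(len(LABELS)), key=lambda i: probs[i])   # argmax
            print(LABELS[best], probs[best])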

Simple docker usage

  1. Build the Docker image:
    ./build_images.sh
  2. Run inference:
    ./run_predict.sh
  3. Example output:
    • Output file: output_predict
    • Output format: tsv
    • Example output: 0.04118731 0.9210373 0.037775367

Todo

  • FastAPI & Docker Compose (a rough sketch of the service follows)
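
One possible shape for the planned FastAPI service, purely as a sketch: run_model below is a hypothetical stand-in for the inference pipeline above, not code that exists in the repo yet:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Comment(BaseModel):
        text: str

    def run_model(text: str) -> dict:
        # Hypothetical stand-in for the run_inference.py pipeline.
        return {"negative": 0.04, "neutral": 0.92, "positive": 0.04}

    @app.post("/sentiment")
    def sentiment(comment: Comment):
        scores = run_model(comment.text)
        return {"label": max(scores, key=scores.get), "scores": scores}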

Citation

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
