NlpSentimentAnalysis

Introduction

Fine-grained sentiment analysis of online reviews helps in understanding businesses and users and in mining users' emotions. It is widely used across the Internet industry, mainly for personalized recommendation, intelligent search, product feedback, and business security. This project performs fine-grained sentiment analysis on a large, high-quality dataset that covers six categories and 20 fine-grained elements. The goal is to build an algorithm that predicts the annotated sentiment tendency of each fine-grained element from user comments, and to evaluate it by measuring the error between the predicted and true values.
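The introduction does not name the exact evaluation metric, but the TODO list refers to F1 scores. As a rough illustration only, the sketch below assumes the score is the macro F1 of each fine-grained element, averaged over all elements. The function names macro_f1 and mean_element_f1 are hypothetical and do not come from this repository.

```python
from typing import Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = set(y_true) | set(y_pred)
    scores: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_element_f1(true_rows: Sequence[Sequence[Hashable]],
                    pred_rows: Sequence[Sequence[Hashable]]) -> float:
    """Average the per-element macro F1.

    Each row is one review; each column is one fine-grained element
    (20 columns for the dataset described above).
    """
    if not true_rows or len(true_rows) != len(pred_rows):
        raise ValueError("true_rows and pred_rows must be non-empty and equal in length")
    true_cols = list(zip(*true_rows))
    pred_cols = list(zip(*pred_rows))
    per_element = [macro_f1(t, p) for t, p in zip(true_cols, pred_cols)]
    return sum(per_element) / len(per_element)
```

Averaging per element rather than pooling all labels keeps a rarely mentioned element from being drowned out by frequent ones; whether the project scores its model this way is an assumption.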

Usage

Install PyTorch 1.0 for Python 3.6+.
Run pip3 install -r requirements.txt to install the Python dependencies.
Run python main.py --mode data to build tensors from the raw dataset.
Run python main.py --mode train to train the model. After training, log/model.pt will be generated.
Run python main.py --mode test to test a pretrained model. The default model file is log/model.pt.
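The three --mode values above suggest a simple dispatch in the entry script. The following is a minimal sketch of how such a dispatch could look with argparse. It is not the actual content of main.py: the handler functions are placeholders, and only the --mode flag and its three values are taken from the commands above.

```python
import argparse
from typing import Callable, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the single --mode flag used in the commands above."""
    parser = argparse.ArgumentParser(description="Fine-grained sentiment analysis")
    parser.add_argument("--mode", choices=["data", "train", "test"], required=True,
                        help="data: build tensors, train: train the model, test: evaluate it")
    return parser


# Placeholder handlers (hypothetical); the real project would call its own code here.
def run_data() -> str:
    return "build tensors from the raw dataset"


def run_train() -> str:
    return "train the model and write log/model.pt"


def run_test() -> str:
    return "evaluate the model stored in log/model.pt"


HANDLERS: Dict[str, Callable[[], str]] = {
    "data": run_data,
    "train": run_train,
    "test": run_test,
}


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the arguments and run the handler that matches --mode."""
    args = build_parser().parse_args(argv)
    return HANDLERS[args.mode]()
```

In a script, main() would typically be called under an if __name__ == "__main__": guard, so that python main.py --mode train reaches run_train.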

Preparing

We used the following word segmentation tools:

pyhanlp: Python interfaces for HanLP

pyltp: Python interface for LTP, the language technology platform from Harbin Institute of Technology

Word vectors: Chinese Word Vectors

Reference: Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du, Analogical Reasoning on Chinese Morphological and Semantic Relations, ACL 2018.
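Pretrained vectors such as these are commonly shipped as plain text in the word2vec layout: a header line with the vocabulary size and the dimension, then one word and its values per line. The sketch below assumes that layout and shows one way to read only the words a model needs. The function name load_text_vectors and the sample file name in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def load_text_vectors(lines: Iterable[str],
                      vocab: Optional[Set[str]] = None) -> Tuple[int, Dict[str, List[float]]]:
    """Read word2vec-style text vectors.

    The first line is assumed to be "<vocab_size> <dim>"; every later line is
    a word followed by <dim> space-separated floats. If vocab is given, only
    the words it contains are kept, which saves memory on large files.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            # Skip malformed lines, e.g. tokens that themselves contain spaces.
            continue
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(x) for x in parts[1:]]
    return dim, vectors
```

With a real file this could be called as load_text_vectors(open("vectors.txt", encoding="utf-8"), vocab=my_vocab), where both the file name and my_vocab are placeholders.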

Structure

preproc.py: downloads dataset and builds input tensors.

main.py: program entry; functions for training and testing.

models.py: the neural network structure for sentiment analysis.

config.py: configurations.

utils.py: basic utilities for the task.

thread_sepwords.py: segments the raw data into words using multiple workers.

thread_sepsentences.py: segments the raw data into sentences using multiple workers.
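The two thread_* scripts split the raw data in parallel. The sketch below illustrates the general pattern of chunking a corpus and mapping a segmenter over the chunks. It uses a thread pool from concurrent.futures as a stand-in, since this README does not say whether the scripts use threads or processes, and it accepts any callable as the segmenter instead of calling pyhanlp or pyltp. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def segment_corpus(texts: Sequence[str],
                   segment: Callable[[str], List[str]],
                   workers: int = 4,
                   chunk_size: int = 1000) -> List[List[str]]:
    """Apply `segment` to every text, one chunk per task, preserving order."""
    chunks = list(chunked(texts, chunk_size))

    def work(chunk: Sequence[str]) -> List[List[str]]:
        return [segment(text) for text in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        results = list(pool.map(work, chunks))
    return [tokens for chunk in results for tokens in chunk]
```

For example, segment_corpus(reviews, str.split) splits on whitespace; a real run would pass a function wrapping the chosen segmentation tool. Threads only help if the segmenter releases the GIL, so for pure-Python segmenters a process pool would be the usual substitute.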

Differences from the paper

The paper doesn't mention which activation function was used. I use ReLU.
The embedding of <UNK> is not trainable.
The connector between the embedding layers and the embedding encoders may differ from Google's implementation, since the description in the paper is inconsistent (a residual block can't be used because the input and output dimensions differ) and the paper doesn't say how it was implemented.

TODO

Reduce memory usage
Improve convergence speed (reach an F1 score of 60 within 1000 iterations)
Reach the state-of-the-art scores of the original paper
Performance analysis
Test on AI-Challenger2018 dataset


Repository: MobtgZhang/NlpSentimentAnalysis
