This task is a basic NLP task: Text Classification for Sentiment Analysis.

blaire101/Sentiment-Analysis-Project

Sentiment Analysis Project

About datasets, please auther...

Test Description

This task is a traditional NLP task: Text Classification for Sentiment Analysis.

  • Process the raw data to extract features.
  • Build one or more models (e.g. Logistic Regression, Decision Tree, Neural Network) and train them on the data set.
  • Evaluate the trained models with standard metrics (e.g. Precision, Recall, AUC).
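
As an illustration of the evaluation step, here is a minimal pure-Python sketch of precision, recall and F1 for a single positive class. The function name `precision_recall_f1` and the toy labels are assumptions made for this example; the project's own scripts may compute and average these metrics differently.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```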

Development Environment

The project depends mainly on the tools and libraries listed in requirements.txt; check that file for details.

Project Structure

This is a detailed description of the organizational structure of the project.

1. organization structure

directory structure  description
data                 data files and glove.840B.300d.txt
ipynb                notebook code
model                model files by deep learning
model_best           best model files by a method of deep learning
predict_result       prediction result for test.json
images               some pictures for data analysis
src                  main code
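
The data directory holds a GloVe vector file. GloVe text files store one token per line followed by its vector components, so a loader can be sketched as below. This is only a sketch: the function name `load_glove`, the `vocab` filter and the three-dimensional sample are assumptions for illustration, and the loading code in src may differ.

```python
import io
from typing import Dict, Iterable, List, Optional, Set


def load_glove(
    lines: Iterable[str], dim: int, vocab: Optional[Set[str]] = None
) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines into a token -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        # Split from the right so the token stays intact even if it contains spaces.
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        token = parts[0]
        if vocab is not None and token not in vocab:
            continue  # keep only the words the data set actually uses
        vectors[token] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory sample standing in for the real 300-dimensional file.
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
embeddings = load_glove(sample, dim=3, vocab={"good"})
print(embeddings)
```

For the real file, one would pass an open file handle (opened with `encoding="utf-8"`) and `dim=300`. Filtering by vocabulary keeps memory use manageable, since the full file is large.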

2. code structure

src, core code   description
src/config.py    project configuration information module, mainly including file reading or storage path information
src/constant.py  constant variables or infrequently changing variables
src/util.py      data processing module, mainly including data reading and processing functions
src/model.py     deep learning model definition
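
To make the role of a path configuration module concrete, here is a hedged sketch that derives the directories listed earlier from a single project root. The function `build_paths` and its dictionary keys are hypothetical; the actual contents of src/config.py are not shown in this README.

```python
from pathlib import Path
from typing import Dict


def build_paths(project_root: str) -> Dict[str, Path]:
    """Derive the project's data and output locations from one root directory."""
    root = Path(project_root)
    return {
        "data": root / "data",
        "glove": root / "data" / "glove.840B.300d.txt",
        "model": root / "model",
        "model_best": root / "model_best",
        "predict_result": root / "predict_result",
    }


# Hypothetical usage: resolve everything relative to the current directory.
paths = build_paths(".")
for name, path in paths.items():
    print(f"{name}: {path}")
```

Keeping all paths in one place means that moving the data only requires changing the root.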

3. deep learning model

src, main train and predict code                         description
src/main_train_dl_...                                    model training module; the training process includes data processing, feature extraction, model training, model validation and other steps
src/main_train_dl_1_rnn_simple_and_predict.py            rnn simple non glove_embedding
src/main_train_dl_2_rnn_glove_embedding_and_predict.py   rnn_glove_embedding
src/main_train_dl_3_cnn_and_predict.py                   cnn_glove_embedding
src/main_train_dl_4_rcnn_glove_embedding_and_predict.py  rcnn_glove_embedding
src/main_train_dl_5_elmo_like_and_predict.py             elmo_like_glove_embedding

4. machine learning model

src, main train and predict code                 description
src/main_train_ml_...                            model training module; the training process includes data processing, feature extraction, model training, model validation and other steps
src/main_train_ml_1_lr_word_and_predict.py       logistic regression + word ngram
src/main_train_ml_2_lr_char_and_predict.py       logistic regression + char ngram
src/main_train_ml_3_lr_word_char_and_predict.py  logistic regression + word_char ngram
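
The three logistic regression variants differ only in their features: word n-grams, character n-grams, or both combined. The sketch below shows what those features look like in plain Python. The function names, the n-gram ranges and the `w:` / `c:` prefixes are assumptions for illustration; the scripts themselves may rely on a library vectorizer with different settings.

```python
from collections import Counter
from typing import List


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> List[str]:
    """Return all word n-grams with n_min <= n <= n_max."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return all character n-grams with n_min <= n <= n_max."""
    s = text.lower()
    return [
        s[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(s) - n + 1)
    ]


def word_char_features(text: str) -> Counter:
    """Combine both feature families, prefixed so they cannot collide."""
    feats = Counter("w:" + g for g in word_ngrams(text))
    feats.update("c:" + g for g in char_ngrams(text))
    return feats


features = word_char_features("not good")
print(word_ngrams("not good"))
print(features.most_common(5))
```

Word bigrams such as "not good" capture negation that single words miss, while character n-grams are more tolerant of misspellings.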

Instructions

  • Prepare a pyenv environment and run pip install -r requirement.txt
  • Set the data file storage paths in config.py
  • Run the script run.sh. The trained model is saved, and the precision_score, recall_score and f1-score on the test set are written to the log.

Tips:

  • sh run.sh runs only the best model:
python src/main_train_dl_4_rcnn_glove_embedding_and_predict.py -ep 10

RCNN model metrics:

=====test run result=====:

test f1_score: 0.8123589611797799
test precision_score: 0.8125411611403617
test recall_score: 0.8123827392120075

Other machine learning models (train and predict):

python src/main_train_ml_1_lr_word_and_predict.py
python src/main_train_ml_2_lr_char_and_predict.py
python src/main_train_ml_3_lr_word_char_and_predict.py

Other deep learning models (train and predict):

  1. Before training one of the following models, make sure the prediction code and the metrics code are commented out.
  2. After training completes, uncomment the prediction code and the metrics code, and load the best model you have produced.
  3. To run prediction and evaluation only, comment out the training code or pass -ep 0.
python src/main_train_dl_1_rnn_simple_and_predict.py -ep 10
python src/main_train_dl_2_rnn_glove_embedding_and_predict.py -ep 10
python src/main_train_dl_3_cnn_and_predict.py -ep 10
python src/main_train_dl_5_elmo_like_and_predict.py -ep 10
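
The -ep flag used above sets the number of training epochs, and -ep 0 skips training. One way a script could wire this up with argparse is sketched below. Only the -ep flag comes from this README; the `--epochs` alias, the default value and the `run` function are hypothetical, and the real scripts may be structured differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train and/or evaluate a model (illustrative)")
    # "-ep" matches the commands in this README; "--epochs" and the default are assumptions.
    parser.add_argument("-ep", "--epochs", type=int, default=10,
                        help="number of training epochs; 0 skips training")
    return parser.parse_args(argv)


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Return the steps that would be executed for the given arguments."""
    args = parse_args(argv)
    steps = []
    if args.epochs > 0:
        steps.append(f"train for {args.epochs} epochs")
    # Prediction and metrics always run, using the best saved model.
    steps.append("load best model, predict, report metrics")
    return steps


print(run(["-ep", "0"]))
print(run(["-ep", "10"]))
```

With a guard like this, switching between training and evaluation-only runs needs no commenting or uncommenting of code.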

Submission Requirements

The output should include:

  • prediction result for test.json
  • evaluation result for training and predicting performance
