For questions about the datasets, please contact the author...
This task is a traditional NLP task: Text Classification for Sentiment Analysis.
- Process the raw data to extract features.
- Build one or more models (e.g. Logistic Regression, Decision Tree, Neural Network) and train them on the dataset.
- Evaluate the trained models with metrics such as Precision, Recall, or AUC.
The main tools and libraries the project depends on are listed in requirements.txt.
This is a detailed description of the organizational structure of the project.
directory structure | description |
---|---|
data | data files and glove.840B.300d.txt |
ipynb | notebook code |
model | model files saved by the deep learning scripts |
model_best | best model files from the deep learning methods |
predict_result | prediction results for test.json |
images | pictures for data analysis |
src | main code |
src, core code | description |
---|---|
src/config.py | project configuration information module, mainly including file reading or storage path information |
src/constant.py | constant variables or infrequently changing variables |
src/util.py | data processing module, mainly including data reading and processing functions |
src/model.py | deep learning model definition |
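
As an illustration of the kind of path configuration described for src/config.py, here is a minimal sketch. The function name `build_paths` and the dictionary keys are hypothetical and not taken from the project; only the directory names and the GloVe file name come from the tables above.

```python
from pathlib import Path


def build_paths(project_root):
    """Return data and output locations under project_root.

    Hypothetical helper: the real src/config.py may be organised differently.
    """
    root = Path(project_root)
    return {
        "data_dir": root / "data",
        "glove_file": root / "data" / "glove.840B.300d.txt",
        "model_dir": root / "model",
        "best_model_dir": root / "model_best",
        "predict_dir": root / "predict_result",
    }


paths = build_paths(".")
print(paths["glove_file"])
```

Keeping every path in one place means that moving the data directory only requires a change in the configuration module.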
src, deep learning train and predict code | description |
---|---|
src/main_train_dl_... | model training modules; the training process includes data processing, feature extraction, model training, model validation, and other steps |
src/main_train_dl_1_rnn_simple_and_predict.py | simple RNN without GloVe embedding |
src/main_train_dl_2_rnn_glove_embedding_and_predict.py | RNN with GloVe embedding |
src/main_train_dl_3_cnn_and_predict.py | CNN with GloVe embedding |
src/main_train_dl_4_rcnn_glove_embedding_and_predict.py | RCNN with GloVe embedding |
src/main_train_dl_5_elmo_like_and_predict.py | ELMo-like model with GloVe embedding |
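
Most of the deep learning scripts rely on pretrained GloVe vectors. The following is a minimal sketch of how a GloVe text file could be parsed into a dictionary; it is not the project's actual loader. The function name `load_glove` and its parameters are assumptions. It splits each line from the right because some tokens in large GloVe files reportedly contain spaces.

```python
import io


def load_glove(lines, dim=300, vocab=None):
    """Parse GloVe text lines into a {token: [float, ...]} dict.

    Hypothetical helper; `vocab`, if given, restricts which tokens are kept.
    """
    embeddings = {}
    for line in lines:
        # Split from the right so that a token containing spaces stays intact.
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        token = parts[0]
        if vocab is not None and token not in vocab:
            continue
        embeddings[token] = [float(x) for x in parts[1:]]
    return embeddings


# Tiny in-memory stand-in for the real file, using 3-dimensional vectors.
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n. . . 0.0 0.5 1.0\n")
vectors = load_glove(sample, dim=3)
print(sorted(vectors))
```

For the real file, the same function could be called on an open file handle (for example `open(path, encoding="utf-8")`) with `dim=300`, passing the training vocabulary as `vocab` to keep memory use down.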
src, machine learning train and predict code | description |
---|---|
src/main_train_ml_... | model training modules; the training process includes data processing, feature extraction, model training, model validation, and other steps |
src/main_train_ml_1_lr_word_and_predict.py | logistic regression + word n-grams |
src/main_train_ml_2_lr_char_and_predict.py | logistic regression + char n-grams |
src/main_train_ml_3_lr_word_char_and_predict.py | logistic regression + word and char n-grams |
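
To make the difference between the three feature sets concrete, here is a small sketch of word and character n-gram extraction in plain Python. The function names, the n-gram ranges, and the use of raw counts are assumptions for illustration; the scripts themselves may use a library vectorizer with different settings.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of length n_min..n_max from whitespace tokens."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n_min=2, n_max=3):
    """Return character n-grams of length n_min..n_max."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def featurize(text):
    """Count word and char n-grams together, as a word_char feature set."""
    counts = Counter("w:" + g for g in word_ngrams(text))
    counts.update("c:" + g for g in char_ngrams(text))
    return counts


print(word_ngrams("not good"))
print(featurize("not good").most_common(3))
```

Word n-grams capture short phrases such as negations, while character n-grams are more tolerant of misspellings; combining them gives the classifier both views of the text.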
- Prepare a pyenv environment and run pip install -r requirements.txt
- Set the data file storage paths in config.py
- Run the script run.sh. The trained model is saved, and the test-set precision_score, recall_score, and f1_score are written to the log.
Tips:
- sh run.sh runs only the best model:
python src/main_train_dl_4_rcnn_glove_embedding_and_predict.py -ep 10
rcnn model metrics result:
=====test run result=====:
test f1_score: 0.8123589611797799
test precision_score: 0.8125411611403617
test recall_score: 0.8123827392120075
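
The log above does not say how the three scores are averaged across classes. As a reference for reading such numbers, here is a sketch that computes macro-averaged precision, recall, and F1 from scratch. Macro averaging and the function name `macro_scores` are assumptions, not something confirmed by the project.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each macro-averaged over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, not project data.
precision, recall, f1 = macro_scores([1, 1, 0, 0], [1, 0, 0, 0])
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```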
Other machine learning models, train and predict:
python src/main_train_ml_1_lr_word_and_predict.py
python src/main_train_ml_2_lr_char_and_predict.py
python src/main_train_ml_3_lr_word_char_and_predict.py
Other deep learning models, train and predict:
- When training one of the following models, you may need to comment out the prediction and metrics code.
- After training has finished, uncomment the prediction and metrics code and load the best model you produced.
- To run prediction and evaluation only, either comment out the training code or set the parameter -ep 0.
python src/main_train_dl_1_rnn_simple_and_predict.py -ep 10
python src/main_train_dl_2_rnn_glove_embedding_and_predict.py -ep 10
python src/main_train_dl_3_cnn_and_predict.py -ep 10
python src/main_train_dl_5_elmo_like_and_predict.py -ep 10
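
The effect of -ep 0 described in the notes above can be sketched with argparse. This is an illustrative outline only: the long option name `--epochs`, the default value, and the `run` function are assumptions, and the real scripts may handle the flag differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; only `-ep` is taken from the README commands."""
    parser = argparse.ArgumentParser(description="Train and/or evaluate a model.")
    # The long name `--epochs` and the default of 10 are assumptions.
    parser.add_argument("-ep", "--epochs", type=int, default=10,
                        help="number of training epochs; 0 skips training")
    return parser.parse_args(argv)


def run(argv=None):
    """Return the list of steps that would be executed for the given arguments."""
    args = parse_args(argv)
    steps = []
    if args.epochs > 0:
        steps.append("train")  # skipped entirely when -ep 0 is given
    steps.append("predict")
    steps.append("evaluate")
    return steps


print(run(["-ep", "0"]))
print(run(["-ep", "10"]))
```

Guarding the training step on the epoch count avoids having to comment code in and out between training and evaluation runs.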
The output should include:
- prediction results for test.json
- evaluation results for training and prediction performance