
Semi-supervised_learning_for_part_of_speech_tagging

NLP Capstone Project with Columbia University and J.P. Morgan

Please check the dev branch for the latest version.

Preparations

Dependencies Preparations

  1. (Optional, can be skipped) Check the CUDA version on GCP:
sudo nvidia-smi
  2. There is no need to configure a virtual environment on this machine. Use the base environment and install the dependencies:
pip install -r requirements.txt
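
If PyTorch is among the installed requirements (an assumption, not confirmed by this README), the GPU setup can be sanity-checked from Python with a minimal sketch like this:

# Sketch: verify that PyTorch can see the GPU (assumes torch is installed).
import torch

if torch.cuda.is_available():
    # Report the GPU and the CUDA version this PyTorch build targets.
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA (per PyTorch build):", torch.version.cuda)
else:
    print("No GPU detected; training will fall back to CPU.")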

Model/Data Preparations

  1. Download model/base_model.pt from Google Drive into your model/ directory.
  2. Download data/gweb_sancl/ from Google Drive into your data/ directory.
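
After downloading, you can quickly confirm the files landed in the expected locations with a small sketch like the following (paths taken from the two steps above):

# Sketch: check that the downloaded model and data are where the scripts expect them.
import os

expected = [
    "model/base_model.pt",
    "data/gweb_sancl",
]

for path in expected:
    status = "OK" if os.path.exists(path) else "MISSING"
    print(status, path)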

Directory Preparation

Make sure you have the following directories in the repository root before running the scripts:

.
├── Analysis_int_res.ipynb
├── Analysis_output_Online_fixed_self_learning.ipynb
├── Analysis_output_Online_nonfixed_self_learning.ipynb
├── LICENSE
├── Online_fixed_self_learning_v5.ipynb
├── Online_nonfixed_self_learning_v5.ipynb
├── Online_token_self_learning_v5.ipynb
├── README.md
├── Scratch_fixed_self_learning_v5.ipynb
├── Scratch_nonfixed_self_learning_v5.ipynb
├── Scratch_token_self_learning_v5.ipynb
├── analysis.py
├── build_model.py
├── create_pseudo_data.py
├── create_pseudo_data_by_tokens.py
├── data
│   └── gweb_sancl
│       ├── pos_fine
│       │   ├── answers
│       │   ├── emails
│       │   ├── newsgroups
│       │   ├── reviews
│       │   ├── weblogs
│       │   └── wsj
│       └── unlabeled
│           └── gweb-answers.unlabeled.txt
├── docs
├── intermediate_result
├── metrics
├── model
├── plots_tags
├── requirements.txt
├── result
├── scripts
├── setup.sh
└── utils.py
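
If any of the output directories are missing, they can be created up front with a minimal sketch like the one below (directory names taken from the tree above; setup.sh may already handle this):

# Sketch: create the expected output directories if they do not exist yet.
import os

for d in ["data", "docs", "intermediate_result", "metrics", "model", "plots_tags", "result"]:
    os.makedirs(d, exist_ok=True)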

Directory Structure

A brief description of each directory:

  1. metrics: stores the metrics (precision, recall, and F1) recorded at each self-training loop
  2. plots_tags: stores plots of the metrics under different parameter settings
  3. model: stores the saved model settings, to save time
  4. data: stores the data used by the project
  5. docs: stores the meeting records for the project
  6. pickles: stores serialized Python objects produced by self-training, for later reuse
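
For reference, per-loop metrics of this kind can be computed roughly as follows; this is only a sketch assuming scikit-learn is available, and the exact format written to metrics/ by the scripts in this repository may differ:

# Sketch: compute precision, recall, and F1 over flattened POS tag sequences.
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical example data: gold tags vs. tags predicted by the tagger.
y_true = ["NN", "VBZ", "DT", "NN"]
y_pred = ["NN", "VBZ", "DT", "JJ"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print("precision=%.3f recall=%.3f f1=%.3f" % (precision, recall, f1))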

Results
