# Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model
Requires the `transformers` and `inflect` Python packages (install with `pip install transformers inflect`).
1. Set `DATA_DIR` in `config.py` to your data directory.
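
   For orientation, a minimal sketch of what this setting might look like, assuming `config.py` exposes `DATA_DIR` as a module-level string (check the actual file for its exact layout):

   ```python
   # config.py -- point DATA_DIR at the directory that holds all datasets
   DATA_DIR = '/path/to/your/data'
   ```
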
2. Put the English Gigaword data in `DATA_DIR/gigaword_eng_5` (https://catalog.ldc.upenn.edu/LDC2011T07).
3. Put the full Ultra-Fine Entity Typing data in `DATA_DIR/ultrafine/uf_data` (https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html).
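
   To sanity-check the download, you can inspect one example. The split path and JSON field names below follow the public Ultra-Fine release (one JSON object per line) and are our assumptions, not something this repo guarantees:

   ```python
   import json
   import os
   from config import DATA_DIR  # set in step 1

   # Assumed location of the crowd-annotated training split.
   path = os.path.join(DATA_DIR, 'ultrafine/uf_data/crowd/train.json')
   with open(path, encoding='utf-8') as f:
       example = json.loads(f.readline())
   # 'mention_span' is the mention text; 'y_str' lists its gold types.
   print(example['mention_span'], example['y_str'])
   ```
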
4. Create the following directories: `DATA_DIR/ultrafine/output/models`, `DATA_DIR/ultrafine/log`, `DATA_DIR/ultrafine/bert_labels`.
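
   For example, these can be created with a few lines of Python (a convenience snippet of ours, not part of the repo):

   ```python
   import os
   from config import DATA_DIR  # set in step 1

   # Create the output, log, and BERT-label directories used by the scripts.
   for sub in ('ultrafine/output/models', 'ultrafine/log', 'ultrafine/bert_labels'):
       os.makedirs(os.path.join(DATA_DIR, sub), exist_ok=True)
   ```
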
5. Run `python prep.py` to generate pronoun mentions.
6. Run `python trainweak.py 0` to pretrain a model with the existing weak samples.
7. Run `python trainbertuf.py 0` to fine-tune the model trained in step 6.
8. Run `python genlabels.py` to generate labels with the BERT MLM. This can take a long time; you may want to adjust the code to run it in parallel, e.g. by sharding the input across processes and merging the outputs. Pre-generated labels for non-pronoun mention examples are available at https://drive.google.com/file/d/1FeaPyIovdkkumVZteeNWyWYErCd6kbB1/view?usp=sharing. Pre-generated labels for pronoun mention examples are in `data/gigaword5_pronoun_s005_ama_ms_10types.zip`. Before using them, run `python verifypronoun.py` to verify that your pronoun mentions match these labels. If it prints `Verification FAILED!`, try running `python genpronfixed.py` to generate the pronoun mention examples.
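
   The sketch below shows the general idea of MLM-based label generation, not the exact prompts or filtering in `genlabels.py`: BERT's masked LM fills a hypernym-style pattern built around a mention, and its top predictions serve as candidate type labels. The model name and example prompt here are our assumptions for illustration.

   ```python
   from transformers import pipeline

   # Fill-mask with BERT: the [MASK] in a "<mention> and any other [MASK]"
   # style pattern tends to be completed with type-like words.
   unmasker = pipeline('fill-mask', model='bert-base-uncased')
   prompt = 'Gates, and any other [MASK], attended the meeting.'
   for cand in unmasker(prompt, top_k=10):
       print(cand['token_str'], round(cand['score'], 3))
   ```
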
9. Run `python trainweak.py 1` to pretrain a model with the newly generated weak labels.
10. Run `python trainbertuf.py 1` to fine-tune the model trained in step 9.
11. Run `python trainufst.py` to do self-training.
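
    For orientation, the generic self-training pattern looks roughly like the sketch below; the actual thresholds, rounds, and re-training procedure in `trainufst.py` may differ, and `predict`/`fit` are hypothetical placeholder APIs.

    ```python
    # Generic self-training loop (illustrative only): pseudo-label unlabeled
    # examples with the current model, keep the confident ones, and retrain.
    def self_train(model, labeled, unlabeled, rounds=3, threshold=0.9):
        for _ in range(rounds):
            pseudo = []
            for x in unlabeled:
                types, conf = model.predict(x)   # hypothetical API
                if conf >= threshold:
                    pseudo.append((x, types))
            model = model.fit(labeled + pseudo)  # hypothetical API
        return model
    ```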