Latest commit: b629f74 (Mar 12, 2018)

Repository contents (last commit message and date, where shown):
geniass/
keras/
tools/
word2vec/ (upload code, Jun 22, 2016)
wvlib/ (upload code, Jun 22, 2016)
.gitattributes (change vector location, Nov 11, 2016)
ExtrinsicEva.sh
README.md (Update README.md, Mar 12, 2018)
createModel.sh
create_shf_low_text.sh (upload code, Jun 22, 2016)
evaluate.py (upload code, Jun 22, 2016)
intrinsicEva.sh
lower_shuffled_combine_tokenized.txt (upload code, Jun 22, 2016)
pre-process.sh
tokenize_Text.py


BioNLP-2016

This repository contains the scripts, code, and word vectors for the ACL BioNLP 2016 workshop paper:

Chiu et al. (2016). How to Train Good Word Embeddings for Biomedical NLP.

Bundled packages

word2vec: the original word2vec implementation by Mikolov et al.: https://code.google.com/archive/p/word2vec/
wvlib: a library for reading word2vec files: https://github.com/spyysalo/wvlib
geniass: a sentence splitter for biomedical text: http://www.nactem.ac.uk/y-matsu/geniass/
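wvlib is the library the repository uses to read the vectors. As a hedged illustration of what it has to parse, the stand-alone sketch below (not wvlib's API) reads the binary format written by the original word2vec tool with `-binary 1`: a header line `<vocab_size> <dim>`, then each word followed by a space and `dim` raw float32 values. It assumes little-endian floats (what x86 machines write) and UTF-8 words; the function name `read_word2vec_bin` and the toy data are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict, List


def read_word2vec_bin(f: BinaryIO) -> Dict[str, List[float]]:
    """Read vectors from a word2vec binary file (sketch; assumes little-endian float32)."""
    # Header line: b"<vocab_size> <dim>\n"
    vocab_size, dim = (int(x) for x in f.readline().split())
    vectors: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            c = f.read(1)
            if c in (b" ", b""):  # a space ends the word; b"" means end of file
                break
            if c != b"\n":  # skip the newline left after the previous vector
                word.extend(c)
        raw = f.read(4 * dim)  # dim float32 values, 4 bytes each
        vectors[word.decode("utf-8", errors="replace")] = list(struct.unpack(f"<{dim}f", raw))
    return vectors


# Toy in-memory file with two 3-dimensional vectors (hypothetical data).
buf = io.BytesIO()
buf.write(b"2 3\n")
buf.write(b"gene " + struct.pack("<3f", 1.0, 0.0, 0.5) + b"\n")
buf.write(b"protein " + struct.pack("<3f", 0.0, 1.0, 0.25) + b"\n")
buf.seek(0)
vecs = read_word2vec_bin(buf)
```

For a real file, open it in binary mode, e.g. `with open("vectors.bin", "rb") as f: vecs = read_word2vec_bin(f)` (the file name is hypothetical). Holding every vector in a Python dict is fine for a sketch but memory-hungry for large vocabularies; wvlib itself is the better choice for real use.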

Scripts

pre-process.sh: splits input text (e.g. raw PubMed or PMC text) into sentences and tokenizes it
create_shf_low_text.sh: creates lowercased, sentence-shuffled text (input: tokenized text)
createModel.sh: creates word2vec .bin files using different parameter settings
intrinsicEva.sh: runs intrinsic evaluation on the UMNSRS and Mayo datasets (input: directory containing the vectors to test)
ExtrinsicEva.sh: runs extrinsic evaluation
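createModel.sh trains word2vec with different parameter settings. The sketch below shows one way to generate such a sweep as command lines for the original word2vec binary. The flags (`-train`, `-output`, `-cbow`, `-size`, `-window`, `-negative`, `-binary`, `-min-count`) are word2vec's own, but the grid values, the `./word2vec` path, the `corpus.txt` input and the output naming scheme are hypothetical and are not taken from createModel.sh.

```python
import itertools
from typing import Dict, List

# Hypothetical grid: illustrative values, not the settings used in createModel.sh.
GRID: Dict[str, List[str]] = {
    "-window": ["2", "5"],
    "-size": ["200"],
    "-negative": ["5", "10"],
}
# Hypothetical fixed settings shared by every run (skip-gram, binary output).
BASE: Dict[str, str] = {"-cbow": "0", "-binary": "1", "-min-count": "5"}


def build_commands(train_file: str, word2vec_bin: str = "./word2vec") -> List[List[str]]:
    """Return one word2vec command line (as an argument list) per grid point."""
    keys = list(GRID)
    commands: List[List[str]] = []
    for values in itertools.product(*(GRID[k] for k in keys)):
        params = dict(BASE)
        params.update(zip(keys, values))
        # Encode the grid point in the output name, e.g. window2_size200_negative5.bin
        tag = "_".join(f"{k.lstrip('-')}{v}" for k, v in zip(keys, values))
        cmd = [word2vec_bin, "-train", train_file, "-output", f"{tag}.bin"]
        for flag, value in params.items():
            cmd += [flag, value]
        commands.append(cmd)
    return commands


cmds = build_commands("corpus.txt")
```

Each command can then be run with `subprocess.run(cmd, check=True)`. Encoding the grid point in the output file name keeps the resulting vector files easy to match to their settings during evaluation.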

Code

Pre-processing:
tokenize_Text.py: tokenizes text (requires NLTK)
geniass: splits text into sentences
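create_shf_low_text.sh turns the tokenized output of this step into lowercased, sentence-shuffled text. A minimal Python sketch of that transformation, assuming one sentence per line; the function name and the fixed seed are hypothetical choices, not the script's actual implementation:

```python
import random
from typing import List


def lower_and_shuffle(sentences: List[str], seed: int = 0) -> List[str]:
    """Lowercase one-sentence-per-line text and shuffle the sentence order."""
    lowered = [s.lower() for s in sentences]
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    rng.shuffle(lowered)  # shuffles whole sentences; tokens inside each stay in order
    return lowered


sample = ["BRCA1 is a Gene .", "The Protein binds DNA .", "Cells divide ."]
shuffled = lower_and_shuffle(sample)
```

To apply it to a file, read the lines with `open(path, encoding="utf-8").read().splitlines()` and write the result back joined with newlines. Shuffling at the sentence level keeps each sentence intact while breaking up document order.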

Intrinsic evaluation:
evaluate.py: performs intrinsic evaluation
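UMNSRS and Mayo consist of term pairs annotated with human similarity or relatedness scores. A common way to score vectors on such data is Spearman's rank correlation between the human scores and the cosine similarities of the corresponding vectors. The sketch below implements that in plain Python; skipping pairs with out-of-vocabulary terms, the function names and the toy vectors are assumptions, and evaluate.py may handle these details differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(pairs: Sequence[Tuple[str, str, float]],
             vectors: Dict[str, List[float]]) -> Tuple[float, int]:
    """Return (rho, number of pairs used); pairs with an OOV term are skipped."""
    gold: List[float] = []
    predicted: List[float] = []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted), len(gold)


# Hypothetical toy data; "aorta" is out of vocabulary, so its pair is skipped.
toy_vectors = {"heart": [1.0, 0.0], "cardiac": [0.9, 0.1],
               "kidney": [0.0, 1.0], "renal": [0.2, 0.8]}
toy_pairs = [("heart", "cardiac", 9.0), ("kidney", "renal", 8.5),
             ("heart", "kidney", 1.0), ("heart", "aorta", 5.0)]
rho, used = evaluate(toy_pairs, toy_vectors)
```

Reporting the number of pairs actually used alongside the correlation matters, since vector sets with different vocabularies may be scored on different subsets of the data.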

Extrinsic evaluation (keras folder; requires either TensorFlow or Theano):
mlp.py: a simple feed-forward neural network
setting.py: parameters for the neural network
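How mlp.py builds its input is not described here. One common design for feed-forward taggers, shown below as a hypothetical sketch, represents each token by concatenating the vectors of the tokens in a fixed window around it, using zero vectors past the sentence edges and for out-of-vocabulary words. Lowercasing tokens before lookup assumes the vectors were trained on lowercased text, as produced by create_shf_low_text.sh; the function name and window size are illustrative.

```python
from typing import Dict, List, Sequence


def window_features(tokens: Sequence[str], vectors: Dict[str, List[float]],
                    dim: int, window: int = 2) -> List[List[float]]:
    """Concatenate the vectors of tokens[i - window .. i + window] for each position i."""
    zero = [0.0] * dim
    features: List[List[float]] = []
    for i in range(len(tokens)):
        row: List[float] = []
        for j in range(i - window, i + window + 1):
            if 0 <= j < len(tokens):
                # Lowercase before lookup (assumes lowercased training text); OOV -> zeros.
                row.extend(vectors.get(tokens[j].lower(), zero))
            else:
                row.extend(zero)  # padding beyond the sentence edges
        features.append(row)
    return features


toy_vectors = {"brca1": [1.0, 2.0], "binds": [3.0, 4.0]}
feats = window_features(["BRCA1", "binds", "DNA"], toy_vectors, dim=2, window=1)
```

Each row, of length `(2 * window + 1) * dim`, would then be one input example for a network like the one in mlp.py.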

Word vectors

https://drive.google.com/open?id=0BzMCqpcgEJgiUWs0ZnU0NlFTam8

License

All data on this page is made available under the Creative Commons Attribution (CC BY) license.
