

Here are the scripts, code and vectors for the ACL BioNLP 2016 workshop paper:

Chiu et al. How to Train Good Word Embeddings for Biomedical NLP

API Package

word2vec: the original word2vec implementation from Mikolov
wvlib: library for reading word2vec files
geniass: library for segmenting biomedical text into sentences
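
As an illustration of what reading a word2vec file involves, the sketch below parses the binary format written by the original word2vec tool using only the Python standard library. It is not wvlib's API: the function name `read_word2vec_bin` is hypothetical, and the code assumes little-endian float32 values and UTF-8 encoded words.

```python
import struct

def read_word2vec_bin(path):
    """Read a word2vec binary file into a {word: [float, ...]} dict."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dimension>\n" in ASCII.
        vocab_size, dim = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            # Each word is a byte string terminated by a single space.
            chars = []
            while True:
                c = f.read(1)
                if c == b" " or c == b"":
                    break
                if c != b"\n":  # skip the newline left after the previous vector
                    chars.append(c)
            word = b"".join(chars).decode("utf-8", errors="replace")
            # The vector is `dim` float32 values (assumed little-endian).
            vectors[word] = list(struct.unpack("<%df" % dim, f.read(4 * dim)))
    return vectors
```

Loading the whole vocabulary into a dict is fine for small files; for large vector sets a library such as wvlib is the better choice.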

Scripts

Segment and tokenize the input text (e.g. raw PubMed or PMC text)
Create lowercased and sentence-shuffled text (input: tokenized text)
Create word2vec.bin files with different parameters
Run intrinsic evaluation on the UMNSRS and Mayo datasets (input: directory of vectors to test)
Run extrinsic evaluation
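
The lowercasing and sentence-shuffling step can be sketched roughly as follows. This is a minimal illustration, not the repository's script: the function name `lowercase_and_shuffle` and the `seed` parameter are hypothetical, and the input is assumed to hold one tokenized sentence per entry.

```python
import random

def lowercase_and_shuffle(sentences, seed=0):
    """Return the sentences lowercased and in shuffled order."""
    lowered = [s.lower() for s in sentences]
    rng = random.Random(seed)  # fixed seed so the shuffle is repeatable
    rng.shuffle(lowered)
    return lowered

if __name__ == "__main__":
    sample = ["IL-2 binds the receptor .", "The Cells were washed ."]
    for line in lowercase_and_shuffle(sample):
        print(line)
```

This version holds every sentence in memory, which may not be practical for a corpus the size of PubMed.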


Pre-processing: tokenizes text (requires NLTK)
geniass: segments text into sentences
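
A tokenization step of this kind might look like the sketch below. It assumes NLTK's `TreebankWordTokenizer`, which may not be the tokenizer the repository's script uses, and it falls back to a simple regular expression when NLTK is not installed; the function name `tokenize_sentence` is hypothetical.

```python
import re

try:
    from nltk.tokenize import TreebankWordTokenizer
    _tokenizer = TreebankWordTokenizer()
except ImportError:
    _tokenizer = None  # fallback so the sketch still runs without NLTK

def tokenize_sentence(sentence):
    """Split one sentence into a list of tokens."""
    if _tokenizer is not None:
        return _tokenizer.tokenize(sentence)
    # Crude fallback: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    print(" ".join(tokenize_sentence("The protein binds DNA.")))
```

The two code paths can differ on hyphenated terms such as gene names, so the fallback is only a stand-in for experimentation.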

Intrinsic evaluation: performs the intrinsic evaluation
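
An intrinsic evaluation on word-pair datasets such as UMNSRS and Mayo is commonly done by comparing the cosine similarity of each pair's vectors against the human ratings with Spearman's rank correlation. The sketch below assumes that approach; it is not the repository's evaluation script, and the function names and the `(word1, word2, score)` input layout are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

def evaluate(vectors, rated_pairs):
    """Return (rho, pairs_used); pairs with an out-of-vocabulary word are skipped."""
    model, gold = [], []
    for w1, w2, score in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    if not gold:
        return 0.0, 0
    return spearman(model, gold), len(gold)
```

Reporting the number of pairs actually used alongside the correlation matters, because vocabulary coverage can differ between vector sets.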

Extrinsic evaluation (Keras folder; requires either TensorFlow or Theano to be installed): a simple feed-forward neural network and the parameters for that network

Word vectors


All data on this page is made available under the Creative Commons Attribution (CC BY) license
