BioReddit Embeddings

This repository contains word embeddings trained on medical subreddits. We provide embeddings for GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018), and Flair (Akbik et al., 2018).

The embeddings are trained on ~800,000 Reddit posts from over 60 medical-themed communities. We describe the training and evaluation process in Basaldella and Collier, "BioReddit: Word Embeddings for User-Generated Biomedical NLP", presented at the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), co-located with the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).

Embeddings

You can download the embeddings in the release section of this repository or using the links in the table below:

| Embedding | Download Link |
| --- | --- |
| ELMo | options, weights |
| Flair | forward, backward |
| GloVe 50 | txt, bin |
| GloVe 100 | txt, bin |
| GloVe 200 | txt, bin |
| FastText | See COMETA |
| BERT | See COMETA |
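
As a minimal sketch of how the GloVe txt files might be used once downloaded, the snippet below parses the usual GloVe text layout (one word per line, followed by space-separated floats) and compares two vectors. It assumes the files in this repository follow that layout; the function names `load_glove_txt` and `cosine`, the sample words and values, and the file name in the comment are hypothetical and only serve as illustration.

```python
import io
import math


def load_glove_txt(lines):
    """Parse GloVe-style text lines into a {word: [floats]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Tiny in-memory stand-in for a downloaded file (made-up 3-dimensional values).
# With a real download, pass an open file instead, for example:
#   with open("glove_50.txt", encoding="utf-8") as f:  # hypothetical file name
#       embeddings = load_glove_txt(f)
sample = io.StringIO("headache 0.1 0.2 0.3\nmigraine 0.1 0.2 0.4\n")
embeddings = load_glove_txt(sample)
similarity = cosine(embeddings["headache"], embeddings["migraine"])
```

Plain Python lists keep the sketch dependency-free; for the full vocabulary, loading the vectors into an array library would likely be faster and use less memory.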

Code

You can find the code used to download the subreddits here.