
Word embeddings trained on medical subreddits.


basaldella/bioreddit


BioReddit Embeddings

This repository contains word embeddings trained on medical subreddits. We provide embeddings for GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018), and Flair (Akbik et al., 2018).

The embeddings are trained on ~800,000 Reddit posts from over 60 medical-themed communities. We describe the training and evaluation process of the embeddings in Basaldella and Collier, BioReddit: Word Embeddings for User-Generated Biomedical NLP, presented at the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), co-located with the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).

Embeddings

You can download the embeddings from the release section of this repository or via the links in the table below:

Embedding | Download Link
ELMo | options, weights
Flair | forward, backward
GloVe 50 | txt, bin
GloVe 100 | txt, bin
GloVe 200 | txt, bin
FastText | See COMETA
BERT | See COMETA
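
As an illustration of how the GloVe txt downloads might be used, the sketch below reads a vector file into a dictionary and compares two words by cosine similarity. It assumes the txt files follow the usual GloVe text layout (one word per line, followed by its space-separated vector components), which this README does not state explicitly. The function names `load_glove_txt` and `cosine_similarity` and the file name in the usage comment are hypothetical, chosen only for this example.

```python
import math
from pathlib import Path


def load_glove_txt(path):
    """Read a GloVe-style text file into a dict mapping word -> list of floats.

    Assumption: each line is "<word> <v1> <v2> ... <vd>", separated by single spaces.
    """
    vectors = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Usage (hypothetical file name; substitute the path of the txt file you downloaded):
#   vectors = load_glove_txt("bioreddit_glove_50.txt")
#   print(cosine_similarity(vectors["headache"], vectors["migraine"]))
```

A plain dictionary keeps the example free of dependencies. For the larger vocabularies a NumPy array or a dedicated embedding library is likely to be more memory-efficient.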

Code

You can find the code used to download the subreddits here.
