I was recently trying to reproduce the main paper of this repo (https://www.ncbi.nlm.nih.gov/pubmed/28736769), but I am having a hard time obtaining the train/test datasets, or to be more precise:
MEDLINE dataset, containing 89,942 training documents, 500 for validation, and 46,911 for testing
You will need to pull the titles and abstracts for each PMID. For the validation split, I think we simply removed 5000 documents from the training set at random.
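For anyone attempting this, here is a rough sketch of how the two steps might look in Python. It is not the code used for the paper. It assumes you already have the PMID lists, fetches records through NCBI's public E-utilities `efetch` endpoint, and uses helper names (`build_efetch_url`, `parse_articles`, `fetch_articles`, `split_validation`) that are made up for illustration. Since this thread mentions both 500 and 5000 validation documents, the validation size is left as a parameter.

```python
import random
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching PubMed records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch URL that requests PubMed XML for a batch of PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_articles(xml_text):
    """Map each PMID in a PubMed XML response to its title and abstract."""
    root = ET.fromstring(xml_text)
    records = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        # itertext() keeps text nested in inline markup such as <i> or <sub>.
        title = "".join(title_el.itertext()) if title_el is not None else ""
        # Structured abstracts come as several AbstractText elements.
        abstract = " ".join(
            "".join(el.itertext())
            for el in article.iterfind(
                "MedlineCitation/Article/Abstract/AbstractText"
            )
        )
        records[pmid] = {"title": title, "abstract": abstract}
    return records


def fetch_articles(pmids):
    """Download and parse one batch of PMIDs (requires network access)."""
    with urlopen(build_efetch_url(pmids)) as resp:
        return parse_articles(resp.read().decode("utf-8"))


def split_validation(train_pmids, n_val, seed=0):
    """Hold out n_val randomly chosen PMIDs from the training list."""
    rng = random.Random(seed)
    val = set(rng.sample(list(train_pmids), n_val))
    train = [p for p in train_pmids if p not in val]
    return train, sorted(val)
```

With the full lists you would probably want to call `fetch_articles` in batches and follow NCBI's usage guidelines, then run something like `train, val = split_validation(train_pmids, n_val)` once on the training PMIDs. A random split done this way will not reproduce the exact validation set from the paper.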
If you are interested in MeSH indexing, I recommend taking a look at the BioASQ competition. There are more than 27k MeSH terms, not just 29. The paper you are referring to was a proof of concept on a small number of MeSH terms. I have linked to our paper using all MeSH terms below, as well as the BioASQ competition website:
Dear authors of bio-CNN,
If I have simply missed the available datasets, I apologize, but I cannot seem to find them anywhere.
Thank you very much.