Original datasets - reproducibility #1

SkBlaz · 2018-10-04T12:39:08Z

Dear authors of bio-CNN,

I have recently been trying to reproduce the main paper of this repo (https://www.ncbi.nlm.nih.gov/pubmed/28736769), but I am having a hard time obtaining the train/test datasets. To be more precise, I am looking for:

the MEDLINE dataset, containing 89,942 training docs, 500 validation docs and 46,911 test docs
the check tags (12), low-recall terms (7) and low-precision terms (10)
Hence, a total of 29 target terms (if I understand correctly; this is still new to me).

If I have simply missed the available datasets, I apologize, but I cannot seem to find them anywhere.

Thank you very much.

AnthonyMRios · 2018-10-04T14:12:53Z

Hi SkBlaz,

Here is a direct link to the dataset PMIDs: https://ii.nlm.nih.gov/DataSets/2013_MTI_ML_DataSet.tar

You will need to pull the titles and abstracts for each PMID. For the validation split, I think we simply removed 5000 documents from the training set at random.
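A minimal sketch of those two steps, not the code used for the paper. It assumes the PMIDs have already been read out of the tar file into a list of strings (the file layout is not described here), and it uses the public NCBI E-utilities `efetch` endpoint to pull titles and abstracts. The function names and the `seed` parameter are hypothetical. The held-out size is left as a parameter, since this thread mentions both 500 and 5000.

```python
import random
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public NCBI E-utilities endpoint for fetching records by ID.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_pubmed_xml(xml_data):
    """Return {pmid: (title, abstract)} from an efetch PubMed XML response."""
    records = {}
    root = ET.fromstring(xml_data)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        title = "".join(title_el.itertext()) if title_el is not None else ""
        # Structured abstracts have several AbstractText sections; join them.
        parts = article.findall("MedlineCitation/Article/Abstract/AbstractText")
        abstract = " ".join("".join(p.itertext()).strip() for p in parts)
        records[pmid] = (title.strip(), abstract)
    return records


def fetch_batch(pmids):
    """Download one batch of PMIDs (network call; keep batches modest)."""
    data = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    ).encode()
    with urllib.request.urlopen(EFETCH_URL, data=data) as resp:
        return parse_pubmed_xml(resp.read())


def split_validation(train_pmids, n_val, seed=0):
    """Randomly hold out n_val PMIDs from the training list (assumed procedure)."""
    shuffled = list(train_pmids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]
```

Calling `fetch_batch` over the PMID list in chunks of a few hundred, with a short pause between requests, should stay within NCBI's usage guidelines. A random hold-out will not reproduce the exact validation split from the paper.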

If you are interested in MeSH indexing, I recommend taking a look at the BioASQ competition. There are more than 27k MeSH terms, not just 29. The paper you are referring to was a proof of concept on a small number of MeSH terms. I have linked to our paper using all MeSH terms below, as well as the BioASQ competition website:

http://participants-area.bioasq.org/general_information/Task6a/
https://ieeexplore.ieee.org/abstract/document/7349667

SkBlaz · 2018-10-05T05:00:32Z

Thank you very much! This is exactly what I was looking for.

SkBlaz closed this as completed Oct 5, 2018
