Original datasets - reproducibility #1

Closed
SkBlaz opened this issue Oct 4, 2018 · 2 comments

SkBlaz commented Oct 4, 2018

Dear authors of bio-CNN,

I was recently trying to reproduce the main paper of this repo (https://www.ncbi.nlm.nih.gov/pubmed/28736769), but I am having a hard time obtaining the train/test datasets. More precisely, I am looking for:

  1. The MEDLINE dataset, containing 89,942 training documents, 500 for validation, and 46,911 for testing
  2. The check tags (12), low-recall terms (7), and low-precision terms (10)
  3. Hence, a total of 29 target terms (if I understand correctly; this is still new to me)

If I have simply missed the available datasets, I apologize, but I cannot seem to find them anywhere.

Thank you very much.

AnthonyMRios (Owner) commented Oct 4, 2018

Hi SkBlaz,

Here is a direct link to the dataset PMIDs: https://ii.nlm.nih.gov/DataSets/2013_MTI_ML_DataSet.tar

You will need to pull the titles and abstracts for each PMID. For the validation split, I think we simply removed 5000 documents from the training set at random.
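For anyone following the same route, here is a rough sketch of those two steps, not the code used for the paper. It assumes the PMIDs from the tar file have already been read into Python lists (the archive's internal layout is not described in this thread), and it uses the NCBI E-utilities `efetch` endpoint to get PubMed XML. The function names, the `seed` argument, and the `n_validation` parameter are illustrative choices of mine.

```python
import random
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (returns PubMed records for given IDs).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch request URL for one batch of PMIDs, asking for XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_pubmed_xml(xml_text):
    """Return {pmid: (title, abstract)} from a PubmedArticleSet XML string."""
    records = {}
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        # itertext() keeps text nested inside inline markup such as <i>.
        title = "".join(title_el.itertext()) if title_el is not None else ""
        # Structured abstracts have several AbstractText sections; join them.
        parts = article.findall("MedlineCitation/Article/Abstract/AbstractText")
        abstract = " ".join("".join(p.itertext()).strip() for p in parts)
        records[pmid] = (title, abstract)
    return records


def fetch_batch(pmids):
    """Download and parse one batch of PMIDs (requires network access)."""
    with urlopen(build_efetch_url(pmids)) as response:
        return parse_pubmed_xml(response.read().decode("utf-8"))


def holdout_split(train_pmids, n_validation, seed=0):
    """Randomly move n_validation PMIDs out of the training set.

    Returns (remaining_train, validation). The seed is an assumption added
    here so the split is repeatable; the original split is not recoverable.
    """
    shuffled = list(train_pmids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]
```

In practice you would call `fetch_batch` on modest-sized chunks of PMIDs and pause between requests to stay within NCBI's usage guidelines, then call `holdout_split` on the training PMIDs with whatever validation size you settle on.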

If you are interested in MeSH indexing, I recommend taking a look at the BioASQ competition. There are more than 27k MeSH terms, not just 29. The paper you are referring to was a proof of concept on a small number of MeSH terms. I have linked to our paper using all MeSH terms below, as well as the BioASQ competition website:

http://participants-area.bioasq.org/general_information/Task6a/
https://ieeexplore.ieee.org/abstract/document/7349667

SkBlaz (Author) commented Oct 5, 2018

Thank you very much! This is exactly what I was looking for.

SkBlaz closed this as completed Oct 5, 2018