Closes #208 (#426)

Merged
sg-wbi merged 10 commits into bigscience-workshop:master from nbroad1881:bioasq-2021-mesinesp on Apr 12, 2022

@nbroad1881
Contributor

Name: BioASQ Task 2021 MESINESP / MESINESP2
Description: None provided
Task: DOC_CLASS
Paper: http://ceur-ws.org/Vol-2936/paper-11.pdf
Data: https://zenodo.org/record/5602914#.YhSXJ5PMKWt
License: CC BY 4.0

Checkbox

  • Confirm that this PR is linked to the dataset issue.
  • Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • Confirm dataloader script works with datasets.load_dataset function.
  • Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
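The naming rule in the checklist can be checked mechanically. The following is only a minimal sketch: the helper names `is_valid_dataset_name` and `expected_script_path` are hypothetical and not part of the repository, and allowing digits is an assumption based on this PR's own script name, `bioasq_2021_mesinesp`.

```python
import re

# Assumption: digits are accepted alongside lowercase letters and underscores,
# since this PR's own script is named bioasq_2021_mesinesp.
_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def expected_script_path(name: str) -> str:
    """Build the dataloader path following the biodatasets/<name>/<name>.py layout."""
    if not is_valid_dataset_name(name):
        raise ValueError(f"invalid dataset name: {name!r}")
    return f"biodatasets/{name}/{name}.py"


if __name__ == "__main__":
    print(expected_script_path("bioasq_2021_mesinesp"))
```

The resulting path can then be passed to the test suite command quoted in the checklist.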

data_dir as dict
pass ner_filepath, corpus_filepath

use _corpus_to_dict function
need to fix entity offsets
fix entities to match schema
add citation, description

change year to string because some values are 'Not Available'
check for tracks to get top level folder
fix formatting
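
The commit that changes the year to a string can be illustrated with a small sketch. It is not the code from this PR: the record layout and the helper `normalize_year` are hypothetical, and only the `year` field and the `'Not Available'` placeholder come from the commit message.

```python
def normalize_year(raw) -> str:
    """Coerce a year value to a string so that placeholders fit the same field type."""
    return str(raw).strip()


# Hypothetical records: one with a numeric year, one with the placeholder value.
records = [
    {"id": "doc-1", "year": 2019},
    {"id": "doc-2", "year": "Not Available"},
]

normalized = [{**record, "year": normalize_year(record["year"])} for record in records]

if __name__ == "__main__":
    for record in normalized:
        print(record["id"], repr(record["year"]))
```

Declaring the field as a string avoids a type error when a record carries the placeholder instead of a number.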
Collaborator

@sg-wbi left a comment

@nbroad1881 Thank you for your contribution! LGTM! I think there's only one minor fix to be done regarding the different subset_ids. Could you please check this? Thank you!

Comment thread biodatasets/bioasq_2021_mesinesp/bioasq_2021_mesinesp.py Outdated
@sg-wbi self-assigned this on Apr 12, 2022
Collaborator

@sg-wbi left a comment

Great submission @nbroad1881! Thank you for your contribution!

@sg-wbi merged commit 7c0be77 into bigscience-workshop:master on Apr 12, 2022
@nbroad1881 deleted the bioasq-2021-mesinesp branch on April 12, 2022 13:58