
Closes #161 #460

Closed
trishalaneeraj wants to merge 20 commits into bigscience-workshop:master from trishalaneeraj:mayosrs
Conversation

@trishalaneeraj
Contributor

Closes #161

Checkbox

  • Confirm that this PR is linked to the dataset issue.
  • Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.

NOTE: _LICENSE is "Unknown" in this case

  • Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • Confirm dataloader script works with datasets.load_dataset function.
  • Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files. (NOT APPLICABLE)
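Two of the checklist items, the lowercase-and-underscore naming rule and the list of required module-level variables, can be sanity-checked before running the test suite. The following is a minimal sketch and not part of the repository's tests: the helper names `is_valid_dataset_name` and `missing_vars` are hypothetical, and allowing digits in dataset names is an assumption.

```python
import re

# Variable names copied from the checklist above.
REQUIRED_VARS = [
    "_CITATION",
    "_DATASETNAME",
    "_DESCRIPTION",
    "_HOMEPAGE",
    "_LICENSE",
    "_URLs",
    "_SUPPORTED_TASKS",
    "_SOURCE_VERSION",
    "_BIGBIO_VERSION",
]


def is_valid_dataset_name(name: str) -> bool:
    """Accept only lowercase letters, underscores, and (by assumption) digits."""
    return re.fullmatch(r"[a-z0-9_]+", name) is not None


def missing_vars(namespace: dict) -> list:
    """Return the required variable names that are absent from a module namespace."""
    return [var for var in REQUIRED_VARS if var not in namespace]
```

For a loader module imported as `mod`, calling `missing_vars(vars(mod))` would list any variables still to be provided.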

Daniel León Periñán and others added 2 commits April 14, 2022 12:20
* Create dataset loader for PsyTAR

* Fix config instance when only data_dir passed

* Style and remove unused dependency

* fix: updates psytarclass to inherit from bb

Co-authored-by: Natasha Seelam <nseelam1@gmail.com>
* Add support for n2c2 2010

* Add support for n2c2 2010

* Format code

* Import BigBioConfig from utils; cosmetic refactor

Co-authored-by: Ayush Singh <singh.ay@northeastern.edu>
@debajyotidatta debajyotidatta self-assigned this Apr 15, 2022
Comment thread biodatasets/mayosrs/mayosrs.py Outdated
{
"text_1": datasets.Value("string"),
"text_2": datasets.Value("string"),
"label": datasets.Value("string"),
Collaborator

Change to float in the source schema?
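One way to act on this suggestion is sketched below, under stated assumptions. The field names come from the reviewed snippet; the `float32` dtype and the helper `parse_row` are hypothetical and do not reflect the PR's final code. In the loader itself, each dtype string would be passed to `datasets.Value(...)`.

```python
# Field names come from the reviewed snippet; "float32" for the label is an
# assumed dtype, chosen only to illustrate the reviewer's suggestion.
SOURCE_FIELD_DTYPES = {
    "text_1": "string",
    "text_2": "string",
    "label": "float32",
}


def parse_row(text_1: str, text_2: str, raw_label: str) -> dict:
    """Build one source-schema example, converting the raw label to a float."""
    return {
        "text_1": text_1,
        "text_2": text_2,
        "label": float(raw_label),
    }
```

Converting the label while the example is generated keeps the yielded values consistent with the declared feature type.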

leonweber and others added 16 commits April 15, 2022 19:45
* update README - add images for task assignment

* add pubmed_qa

* update pubmed_qa.py

* update question type to yesnomaybe, use [maybe, LONG_ANSWER] as the answer for PQA-Unlabeled

* fix: update to new qa schema

* remove unused files

* add 10-fold data for pqal (subset_id pqal_fold{k}_[source|bigbio]), remove LONG_ANSWER, update question type to yesno


* update pubmed_qa.py - add description for each dataset subset, change naming for the subset_id following bigbio convention, update None to BigBioValues.NULL on the bigbio schema

* format, remove print, add TODO


Co-authored-by: Natasha Seelam <nseelam1@gmail.com>
Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>
* add back images that were removed in bigscience-workshop#357

* oops! rename images
* run tests by config name, cleaned up a bit

* referenced -> existing

* n2c2 2006 de-identification task

* remove name == main block


Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>
* Initial NLM-WSD commit

* Further development

* Further development

* Reformat

* add custom local config 


* import dataclass


* fix custom config typos


Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>
@trishalaneeraj
Contributor Author

Hi @debajyotidatta, I've addressed your comments in a new PR: #479

Could you please review the changes there? I'll go ahead and close this one.

@trishalaneeraj trishalaneeraj mentioned this pull request Apr 17, 2022
8 tasks

Development

Successfully merging this pull request may close these issues.

Create dataset loader for MayoSRS Reference Standard

9 participants