Closes #25 #357

SamuelCahyawijaya · 2022-04-06T18:23:09Z

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

If the following information is NOT present in the issue, please populate:

Name: PubmedQA
Description: https://pubmedqa.github.io/
Paper: https://aclanthology.org/D19-1259/
Data: https://github.com/pubmedqa/pubmedqa

Checkbox

- Confirm that this PR is linked to the dataset issue.
- Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_BIGBIO_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `BigBioConfig` for the source schema and one for a bigbio schema.
- Confirm dataloader script works with `datasets.load_dataset` function.
- Confirm that your dataloader script passes the test suite run with `python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py`.

Note: A specific subset_id (pqal, pqau, or pqaa) must be given to run the unit test.
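
As a rough illustration of the `BUILDER_CONFIGS` checklist item and the subset note, the sketch below builds one source config and one bigbio config per subset. `ConfigSketch` is a hypothetical stand-in for `BigBioConfig` (its fields are assumptions, not the real class definition), and the `{subset}_{schema}` naming follows the `pqal_[source|bigbio]` pattern mentioned later in this thread, not necessarily the merged code.

```python
from dataclasses import dataclass

# Subsets named in the note above; schemas named in the checklist.
SUBSETS = ("pqal", "pqau", "pqaa")
SCHEMAS = ("source", "bigbio")


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for BigBioConfig (fields are assumed)."""

    name: str
    schema: str
    subset_id: str


def build_configs():
    """Return one config per (subset, schema) pair."""
    return [
        ConfigSketch(name=f"{subset}_{schema}", schema=schema, subset_id=subset)
        for subset in SUBSETS
        for schema in SCHEMAS
    ]


BUILDER_CONFIGS = build_configs()
```

With this layout every subset has both a source and a bigbio entry, which is what the checklist asks for at minimum.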

hakunanatasha

Hi @SamuelCahyawijaya

Please remove the images.

I'm having trouble running the unit tests for pqaa and pqau, but pqal passes. When I look at Google, I see it limits the download because of size. Can you check whether these work?

SamuelCahyawijaya · 2022-04-08T05:17:51Z

Hi @hakunanatasha, same as in the other PR, I have deleted the two image files here.
For pqau and pqaa, I think this is a bug in HF datasets, as mentioned in huggingface/datasets#3787; the fix has been included in datasets==2.0.0 (https://github.com/huggingface/datasets/releases/tag/2.0.0).

I tested on datasets==2.0.0 and it works just fine. Is it possible to update the datasets requirement in the requirements.txt to datasets==2.0.0 to cope with this problem?
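
A minimal sketch of how one might check an installed version against the 2.0.0 release discussed above. The helper names `parse_version` and `meets_minimum` are hypothetical, and the sketch tests for "at least 2.0.0" instead of the exact `==2.0.0` pin proposed for requirements.txt.

```python
def parse_version(text):
    """Turn '2.0.0' into (2, 0, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="2.0.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In practice the installed version string could be obtained with `importlib.metadata.version("datasets")` and passed to `meets_minimum`.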

I tested some other datasets (mediqa_qa, mediqa_rqe, pubmed_qa, paramed, pico_extraction, medhop, scital, and mqp) using datasets==2.0.0 and all of them seem to work just fine.

hakunanatasha · 2022-04-08T13:54:10Z

@SamuelCahyawijaya very interesting - yes if that's the case, let's update the reqs.

SamuelCahyawijaya · 2022-04-08T17:13:15Z

@hakunanatasha : I have added the 10-fold configs for source and bigbio schemas. The subset_id for pqal changes from pqal_[source|bigbio] to pqal_fold{k}_[source|bigbio], k in [0..9].
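
The naming scheme described above can be sketched as follows. The function name `pqal_fold_config_names` is hypothetical; the pattern `pqal_fold{k}_[source|bigbio]` with k in 0..9 is taken from the comment, and a later commit in this PR renames the subset ids, so the merged names may differ.

```python
def pqal_fold_config_names(num_folds=10, schemas=("source", "bigbio")):
    """Enumerate config names of the form pqal_fold{k}_{schema}."""
    return [
        f"pqal_fold{k}_{schema}"
        for k in range(num_folds)
        for schema in schemas
    ]


names = pqal_fold_config_names()
```

One of these names would then be passed as the `name` argument of `datasets.load_dataset`, together with the path to the dataloader script.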

galtay · 2022-04-15T00:09:02Z

don't worry about the images (we can just merge this and then add the images back right after)
I'll add some comments to the code though

galtay · 2022-04-16T01:16:35Z

Hi @SamuelCahyawijaya, I think I understand this dataset a bit better now. Are all the BigBio Q/A schemas using the yes/no answers? It seems that we might also be able to support long_answer Q/A and maybe even text classification with the MESH labels. I think we can wrap it up for now, but let's flag this for later development.

SamuelCahyawijaya · 2022-04-17T04:40:16Z

Hi @galtay, yes, you are right. The dataset can be used to support long_answer Q/A and some other possible tasks. Previously I included the long answer as the label in the bigbio schema, but I then removed it, since we need to maintain the QA type.

Let me know if later we plan to implement a different task / QA type for this. I can help to implement that part.

SamuelCahyawijaya added 4 commits March 30, 2022 11:26

update README - add images for task assigment

fa34f96

merge with upstream master

e117f76

add pubmed_qa

b73671e

update pubmed_qa.py

f7cc9a9

SamuelCahyawijaya requested review from galtay, hakunanatasha, jason-fries, leonweber, ruisi-su and sunnnymskang as code owners April 6, 2022 18:23

update question type to yesnomaybe, use [maybe, LONG_ANSWER] as the answer for PQA-Unlabeled

1596cde

jason-fries self-assigned this Apr 7, 2022

fix: update to new qa schema

1eebdc2

hakunanatasha requested changes Apr 8, 2022

SamuelCahyawijaya added 3 commits April 8, 2022 11:43

Merge branch 'master' of github.com:bigscience-workshop/biomedical into pubmed_qa

26f0301

Merge branch 'pubmed_qa' of github.com:SamuelCahyawijaya/biomedical into pubmed_qa

7bc88fe

remove unused files

be09b35

add 10-fold data for pqal (subset_id pqal_fold{k}_[source|bigbio]), remove LONG_ANSWER, update question type to yesno

eff7494

SamuelCahyawijaya requested a review from sg-wbi as a code owner April 8, 2022 17:09

add 10-fold data for pqal (subset_id pqal_fold{k}_[source|bigbio]), remove LONG_ANSWER, update question type to yesno

4852e8b

galtay self-assigned this Apr 14, 2022

galtay requested changes Apr 15, 2022

SamuelCahyawijaya added 2 commits April 15, 2022 10:08

Merge branch 'master' of github.com:bigscience-workshop/biomedical into pubmed_qa

d8ec445

update pubmed_qa.py - add description for each dataset subset, change naming for the subset_id following bigbio convention, update None to BigBioValues.NULL on the bigbio schema

8dbeac4

SamuelCahyawijaya requested a review from debajyotidatta as a code owner April 15, 2022 08:18

format, remove print, add TODO

4eb320c

galtay approved these changes Apr 16, 2022

galtay merged commit 8e7c461 into bigscience-workshop:master Apr 16, 2022

galtay added a commit to galtay/biomedical that referenced this pull request Apr 16, 2022

add back images that were removed in bigscience-workshop#357

aa8f74e

galtay mentioned this pull request Apr 16, 2022

add back images that were removed in #357 #472

Merged

galtay added a commit that referenced this pull request Apr 16, 2022

add back images that were removed in #357 (#472)

4e3e724

galtay added a commit that referenced this pull request Apr 16, 2022

Add imgs back (#473)

7b88e03

* add back images that were removed in #357 * oops! rename images

SamuelCahyawijaya mentioned this pull request May 2, 2022

Create dataset loader for PubMedQA #25

Closed
