jaketae
Member

@jaketae jaketae commented Dec 6, 2021

Fixes #680 by adding a raise statement after the directory check.

Using the same examples from the original issue thread:

>>> DatasetTemplates('ag_news').__dict__
{'dataset_name': 'ag_news', 'subset_name': None, 'templates': {'24e44a81-a18a-42dd-a71c-5b31b2d2cb39': <promptsource.templates.Template object at 0x7f06c1d61bb0>, '8fdc1056-1029-41a1-9c67-354fc2b8ceaf': <promptsource.templates.Template object at 0x7f06c1d61b80>, '918267e0-af68-4117-892d-2dbe66a58ce9': <promptsource.templates.Template object at 0x7f06c1d64b50>, '9345df33-4f23-4944-a33c-eef94e626862': <promptsource.templates.Template object at 0x7f06c1d64b80>, '98534347-fff7-4c39-a795-4e69a44791f7': <promptsource.templates.Template object at 0x7f06c1d64bb0>, 'b401b0ee-6ffe-4a91-8e15-77ee073cd858': <promptsource.templates.Template object at 0x7f06c1d64af0>, 'cb355f33-7e8c-4455-a72b-48d315bd4f60': <promptsource.templates.Template object at 0x7f06c1d64b20>}, 'name_to_id_mapping': {'classify_question_first': '24e44a81-a18a-42dd-a71c-5b31b2d2cb39', 'classify_with_choices_question_first': '8fdc1056-1029-41a1-9c67-354fc2b8ceaf', 'recommend': '918267e0-af68-4117-892d-2dbe66a58ce9', 'which_section_choices': '9345df33-4f23-4944-a33c-eef94e626862', 'which_section': '98534347-fff7-4c39-a795-4e69a44791f7', 'classify_with_choices': 'b401b0ee-6ffe-4a91-8e15-77ee073cd858', 'classify': 'cb355f33-7e8c-4455-a72b-48d315bd4f60'}}
>>> DatasetTemplates('superglue').__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jaketae/documents/promptsource/promptsource/templates.py", line 338, in __init__
    self.templates: Dict = self.read_from_file()
  File "/home/jaketae/documents/promptsource/promptsource/templates.py", line 383, in read_from_file
    raise ValueError(
ValueError: Dataset superglue not found
>>> DatasetTemplates('super_glue').__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jaketae/documents/promptsource/promptsource/templates.py", line 338, in __init__
    self.templates: Dict = self.read_from_file()
  File "/home/jaketae/documents/promptsource/promptsource/templates.py", line 383, in read_from_file
    raise ValueError(
ValueError: Dataset super_glue not found
>>> DatasetTemplates('super_glue/rte').__dict__
{'dataset_name': 'super_glue/rte', 'subset_name': None, 'templates': {'2b52a83c-0021-41fe-b44c-5aaa076d71a2': <promptsource.templates.Template object at 0x7f06c1d73250>, '2d0d63da-ffcf-4f6e-941a-b8da922be43e': <promptsource.templates.Template object at 0x7f06c1d73220>, '4163e6f1-1a83-4c73-b867-02eb7ac80316': <promptsource.templates.Template object at 0x7f06c1d761f0>, '8fb1c6aa-20e9-438c-bece-c6af1c746449': <promptsource.templates.Template object at 0x7f06c1d76220>, '9e078fb4-505b-413c-bb5e-3cd16ddcf5d7': <promptsource.templates.Template object at 0x7f06c1d76250>, 'b8dc85c6-28b6-4340-979a-8e77c2a0dde8': <promptsource.templates.Template object at 0x7f06c1d76190>, 'e2fb58f2-b1f2-4aef-b74b-c4ee1c571fff': <promptsource.templates.Template object at 0x7f06c1d761c0>, 'ed1f4b75-8826-4852-9bd6-aedf368678f5': <promptsource.templates.Template object at 0x7f06c1d76040>, 'ee0ce095-122a-4509-bf0b-33d1495295f7': <promptsource.templates.Template object at 0x7f06c1d76130>, 'fb4f8144-37f5-4977-88da-37a5d0bfd0e8': <promptsource.templates.Template object at 0x7f06c1d76280>}, 'name_to_id_mapping': {'MNLI crowdsource': '2b52a83c-0021-41fe-b44c-5aaa076d71a2', 'guaranteed true': '2d0d63da-ffcf-4f6e-941a-b8da922be43e', 'can we infer': '4163e6f1-1a83-4c73-b867-02eb7ac80316', 'GPT-3 style': '8fb1c6aa-20e9-438c-bece-c6af1c746449', 'does this imply': '9e078fb4-505b-413c-bb5e-3cd16ddcf5d7', 'should assume': 'b8dc85c6-28b6-4340-979a-8e77c2a0dde8', 'does it follow that': 'e2fb58f2-b1f2-4aef-b74b-c4ee1c571fff', 'based on the previous passage': 'ed1f4b75-8826-4852-9bd6-aedf368678f5', 'justified in saying': 'ee0ce095-122a-4509-bf0b-33d1495295f7', 'must be true': 'fb4f8144-37f5-4977-88da-37a5d0bfd0e8'}}

cc @arnaudstiegler @stephenbach @VictorSanh

@VictorSanh
Member

lgtm, thanks @jaketae !
I'd like to have @stephenbach's review just to make sure there is no downstream impact I missed

@stephenbach stephenbach self-assigned this Dec 20, 2021
@stephenbach
Member

Thanks @jaketae! This would be a really nice thing to add. The challenge is that a templates.yaml might not exist for two reasons:

  1. Someone had a typo in the dataset name they wanted, or
  2. No one has written prompts for that dataset yet

So with the current PR, if you go to prompt a new dataset, it also raises the exception. I'm not sure how best to fix this. One option is to have the DatasetTemplates constructor call list_datasets in utils.py. I don't love this, though, because we wrap that function in streamlit's caching system in app.py. It might not matter in practice, but it's an extra call to the HF API every time the class is instantiated.
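
For illustration only, here is a rough sketch of how a process-level cache could keep the lookup to a single remote call. `fetch_dataset_names` is a hypothetical stand-in for the listing function, not the real `list_datasets` in utils.py, and `functools.lru_cache` is swapped in for streamlit's caching; none of these names come from the codebase.

```python
from functools import lru_cache

# Counts how often the (pretend) remote listing is hit.
CALLS = {"count": 0}


def fetch_dataset_names():
    """Hypothetical stand-in for the remote listing call."""
    CALLS["count"] += 1
    return ["ag_news", "super_glue"]


@lru_cache(maxsize=1)
def known_dataset_names() -> frozenset:
    # The body runs once per process; later calls reuse the cached set.
    return frozenset(fetch_dataset_names())


def is_known_dataset(name: str) -> bool:
    """Check a dataset name against the cached listing."""
    return name in known_dataset_names()


# Usage: repeated checks trigger only one listing call.
for candidate in ("ag_news", "superglue", "super_glue"):
    print(candidate, is_known_dataset(candidate))
print("listing calls:", CALLS["count"])
```

The trade-off is that the cached listing can go stale within a long-running process, which may or may not matter here.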

Any ideas?

@VictorSanh
Member

ah, that's a very good point @stephenbach

here's what I suggest: instead of an error, we raise a warning, something like "You are instantiating DatasetTemplates for XX, but XX doesn't have any prompts yet. Please ignore if you are creating new prompts for this dataset."
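
A minimal sketch of the idea, assuming the check is a plain file-existence test on `templates.yaml`. The helper name `templates_exist` and its arguments are hypothetical and are not taken from templates.py; only the warning text follows the wording suggested above.

```python
import os
import tempfile
import warnings


def templates_exist(templates_root: str, dataset_name: str) -> bool:
    """Return True if templates.yaml exists for the dataset; warn otherwise."""
    yaml_path = os.path.join(templates_root, dataset_name, "templates.yaml")
    if os.path.exists(yaml_path):
        return True
    # Warn instead of raising, so prompting a brand-new dataset still works.
    warnings.warn(
        f"You are instantiating DatasetTemplates for {dataset_name}, "
        f"but {dataset_name} doesn't have any prompts yet. "
        "Please ignore if you are creating new prompts for this dataset.",
        UserWarning,
    )
    return False


# Usage: a dataset without templates triggers a warning, not an exception.
with tempfile.TemporaryDirectory() as root:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        found = templates_exist(root, "superglue")
    print(found, [str(w.message) for w in caught])
```

With this shape, a typo in the dataset name and a dataset with no prompts yet are both surfaced to the user, but neither blocks instantiation.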

@jaketae
Member Author

jaketae commented Jan 5, 2022

@VictorSanh @stephenbach Thank you for pointing out the edge case and the proposal! I'll implement your suggestion and keep you posted.

@VictorSanh
Member

Did you get a chance to look at this, @jaketae? We will pin v0.2 very soon...

@jaketae
Member Author

jaketae commented Jan 27, 2022

Hey @VictorSanh, apologies for the delay. I replaced the ValueError with a warning as suggested. Hope this isn't a blocker, let me know if there's anything else that needs to be done to wrap up!

@VictorSanh VictorSanh merged commit 18436d0 into bigscience-workshop:main Feb 4, 2022
Successfully merging this pull request may close these issues.

Instantiating DatasetTemplates - Raising an error when dataset name is not found