Accelerate get_infos by caching the DataseInfoDicts #778

Merged: 3 commits merged into main from vs/accelerate_get_infos on May 22, 2022

@VictorSanh VictorSanh commented May 20, 2022

  • Accelerate the download of the dataset info dictionaries by caching them.
    This reduced the download time from 60 s to 6-7 s on a Linux instance, and from 20 s to 2 s on my MacBook :)

  • Also fixed a bug with some community datasets.
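
For readers unfamiliar with the idea, the caching described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `fetch_infos` is a hypothetical stand-in for the slow per-dataset download, `get_infos_cached` is an illustrative name, and the returned dictionary shape is invented for the example.

```python
from functools import lru_cache

# Counts how many times the (simulated) slow download actually runs.
CALLS = {"count": 0}


def fetch_infos(dataset_name: str) -> dict:
    """Hypothetical stand-in for a slow download of a dataset's info dict."""
    CALLS["count"] += 1
    # Assumed shape, for illustration only: one entry per configuration.
    return {"default": {"dataset_name": dataset_name}}


@lru_cache(maxsize=None)
def get_infos_cached(dataset_name: str) -> dict:
    """Return the info dict, downloading it at most once per dataset name."""
    return fetch_infos(dataset_name)
```

Repeated calls with the same dataset name return the stored result instead of downloading again. Note that `lru_cache` hands back the same dictionary object each time, so callers should treat it as read-only.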

@stephenbach stephenbach self-assigned this May 22, 2022

@stephenbach stephenbach left a comment

LGTM! Thanks!

@VictorSanh VictorSanh merged commit d1f16cf into main May 22, 2022
@stephenbach stephenbach deleted the vs/accelerate_get_infos branch May 23, 2022 17:20
stephenbach added a commit that referenced this pull request May 24, 2022
* Add templates for schema_guided_dstc8 response generation

* Remove extra newlines at the end of targets for schema_guided_dstc8

* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* Revert changes to app.py.

* Update promptsource/app.py

Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Stephen Bach <stephenhbach@gmail.com>
jzf2101 added a commit that referenced this pull request May 27, 2022
* Added prompts for English crows_pairs_multilingual

* Added prompts for English crows_pairs_multilingual minor change

* Added prompts for English crows_pairs_multilingual minor change

* Added prompts for English crows_pairs_multilingual change target label

* Added prompts for English crows_pairs_multilingual fix target

* Added prompts for English crows_pairs_multilingual added A. prompts

* Added prompts for French crows_pairs_multilingual added A. prompts

* Change crows_pairs_multilingual metric to Accuracy

* Added randomness to CrowsPairsMultilingual prompts choice order+integrated other suggestions

* Fixed removed newlines from prompts

* Adding extra prompts for CrowS-Pairs French

* Update templates.py

* Indicate which prompts are reflecting the original task

* Moved CrowS-Pairs-Multilingual to Bias WG organisation

* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: J Forde <jzf2101@users.noreply.github.com>
jzf2101 added a commit that referenced this pull request May 28, 2022
* Added prompts for English crows_pairs_multilingual

* Added prompts for English crows_pairs_multilingual minor change

* Added prompts for English crows_pairs_multilingual minor change

* Added prompts for English crows_pairs_multilingual change target label

* Added prompts for English crows_pairs_multilingual fix target

* Added prompts for English crows_pairs_multilingual added A. prompts

* Added prompts for French crows_pairs_multilingual added A. prompts

* Change crows_pairs_multilingual metric to Accuracy

* Added randomness to CrowsPairsMultilingual prompts choice order+integrated other suggestions

* Fixed removed newlines from prompts

* Adding extra prompts for CrowS-Pairs French

* Update templates.py

* Indicate which prompts are reflecting the original task

* Moved CrowS-Pairs-Multilingual to Bias WG organisation

* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* Make targets one-token answers

* Make targets one-token answers for FR

Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: J Forde <jzf2101@users.noreply.github.com>
stephenbach added a commit that referenced this pull request Jul 12, 2022
* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* fix `filter_english_datasets` since `languages` became `language` in dataset metadatas

* fix empty documents - multi_news (#793)

* fix empty documents - multi_news

* fix test - unrecognized variable

* Language tags (#771)

* Added languages widget to UI.

* Style fixes.

* Added English tag to existing datasets.

* Add languages to viewer mode.

* Update language codes.

* Update CONTRIBUTING.md.

* Update screenshot.

* Add "Prompt" to UI to clarify languages tag usage.

* Add blank languages list.

Co-authored-by: Victor SANH <victorsanh@gmail.com>
stephenbach added a commit that referenced this pull request Oct 26, 2022
* remove language restrictions

* add arabic dataset to primary_task

* Accelerate `get_infos` by caching the `DataseInfoDict`s (#778)

* accelerate `get_infos` by caching the `DataseInfoDict`s

* quality

* consistency

* add arabic prompts

* cleaning

* Consistency in prompt naming.

* cleaning

* fix `filter_english_datasets` since `languages` became `language` in dataset metadatas

* fix empty documents - multi_news (#793)

* fix empty documents - multi_news

* fix test - unrecognized variable

* Language tags (#771)

* Added languages widget to UI.

* Style fixes.

* Added English tag to existing datasets.

* Add languages to viewer mode.

* Update language codes.

* Update CONTRIBUTING.md.

* Update screenshot.

* Add "Prompt" to UI to clarify languages tag usage.

* update

* update prompts

* Remove duplicates lines

* update

* regenerate prompts

* cleaning

* lang tag missing

Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Stephen Bach <stephenhbach@gmail.com>