add check dataset appearances, update vicuna to dataset entry #2818

Merged

CloseChoice commented Apr 21, 2023

  • add a script that checks whether any dataset contains specific regular expressions or words
  • update vicuna so that it can be used with the new DatasetEntry class
  • remove only single references from vicuna (so [1] is removed). The script mentioned above showed a couple of occurrences where our re_reference_remove regex matches something that is actually a list (e.g. in Python code), so only single references are removed
  • if the human response is None, sample a replacement from several short prompts such as ['please continue', '...']
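The first and third points can be illustrated with a minimal sketch. This is not the script added in this PR: `find_matches`, `remove_single_references` and the `SINGLE_REFERENCE` pattern are hypothetical names, and the pattern is only an assumption about what "single reference" means (a lone bracketed number such as `[1]` that does not directly follow an identifier or another bracket).

```python
import re
from typing import Dict, Iterable, List, Tuple


def find_matches(texts: Iterable[str], patterns: List[str]) -> Dict[str, List[Tuple[int, str]]]:
    """For each pattern, collect (text index, matched string) pairs."""
    compiled = {p: re.compile(p) for p in patterns}
    hits: Dict[str, List[Tuple[int, str]]] = {p: [] for p in patterns}
    for idx, text in enumerate(texts):
        for pattern, rx in compiled.items():
            for match in rx.finditer(text):
                hits[pattern].append((idx, match.group(0)))
    return hits


# Assumed pattern (not the project's re_reference_remove): a lone "[<digits>]",
# together with any whitespace before it, unless it directly follows a word
# character or "]" (which would suggest indexing such as a[1]).
SINGLE_REFERENCE = re.compile(r"\s*(?<![\w\]])\[\d+\]")


def remove_single_references(text: str) -> str:
    """Strip single numeric references but leave multi-element lists untouched."""
    return SINGLE_REFERENCE.sub("", text)
```

Running `find_matches` with a broad bracket pattern over a few samples shows why a broad regex is risky: it reports both citation markers and Python lists, while `remove_single_references` only touches the former. A one-element list literal such as `[1]` would still be removed, so this sketch shares the limitation described above.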

This PR depends on #2809


andreaskoepf left a comment


Super nice that we have this dataset checker tool now!

    return None
elif speaker == "human":
    # replace empty messages with one of the following
    message = random.choice(["...", "Please continue", "Go on", ""])

nice!
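For readers without the full diff, the fallback in the reviewed snippet can be sketched roughly as follows. The function name `fill_empty_human_message` and the exact emptiness check are assumptions, since the excerpt does not show the surrounding condition. The candidate list is copied from the snippet, including its empty string.

```python
import random
from typing import Optional

# Copied from the reviewed snippet; note that the list itself contains "".
FALLBACK_MESSAGES = ["...", "Please continue", "Go on", ""]


def fill_empty_human_message(message: Optional[str], rng: random.Random) -> str:
    """Return the message unchanged, or a sampled filler if it is missing.

    Assumption: a message counts as empty when it is None or whitespace only.
    """
    if message is None or message.strip() == "":
        return rng.choice(FALLBACK_MESSAGES)
    return message


# Usage: pass a seeded generator so the sampled filler is reproducible.
rng = random.Random(0)
filled = fill_empty_human_message(None, rng)
```

Passing an explicit `random.Random` instance instead of calling the module-level `random.choice` is a design choice of this sketch, made so that dataset construction can be repeated deterministically.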

andreaskoepf merged commit cd3f07f into LAION-AI:main Apr 21, 2023
1 check passed