Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joke explanation generalization #2899

Merged

Conversation

sampatkalyan
Copy link
Contributor

For the Issue #2827.
I have made changes to JokeExplaniation Class.
This PR implements the DatasetEntry class in the JokeExplaination class to generalize the data. The DatasetEntry class provides a consistent data structure for storing joke-explanation pairs, making it easier to work with the data.
and made changes in AlpacaGpt4 to correct the annotation in one of its methods.
The changes in this PR include:

  • Adding a new DatasetEntry class to represent joke-explanation pairs
  • Updating the JokeExplaination class to use DatasetEntry objects to store data
  • Replacing the AlpacaGpt4 class getitem method with correct annotation

@@ -343,16 +343,16 @@ def __init__(self, cache_dir) -> None:
# DO NOT change this
# its the data that had syntax error
explanation = data["explaination"]
self.pairs.append((joke, explanation))
self.pairs.append(DatasetEntry([joke], [explanation]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you rely here on the order of the keywords. Could we make this explicit by using

Suggested change
self.pairs.append(DatasetEntry([joke], [explanation]))
self.pairs.append(DatasetEntry(questions=[joke], answers=[explanation]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it. Thank you

model/model_training/custom_datasets/qa_datasets.py Outdated Show resolved Hide resolved
@@ -610,6 +610,6 @@ def _process_instruction(self, row: dict[str, str], input_max_length: int) -> Da
def __len__(self) -> int:
return len(self.rows)

def __getitem__(self, index: int) -> list[str] | tuple[str]:
def __getitem__(self, index: int) -> DatasetEntry:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating

Added named parameters while creating DatasetEntry objects.
Removed redundant variables like question and answer and the if condition in __init__function which depended on  question and answer which is never used or changed.
Copy link
Collaborator

@CloseChoice CloseChoice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andreaskoepf andreaskoepf merged commit cab4b58 into LAION-AI:main Apr 27, 2023
1 check passed
@sampatkalyan sampatkalyan deleted the JokeExplaination-generalization branch April 28, 2023 06:19
grgau pushed a commit to grgau/Open-Assistant that referenced this pull request May 8, 2023
For the Issue LAION-AI#2827.
I have made changes to JokeExplaniation Class.
This PR implements the DatasetEntry class in the JokeExplaination class
to generalize the data. The DatasetEntry class provides a consistent
data structure for storing joke-explanation pairs, making it easier to
work with the data.
and made changes in AlpacaGpt4 to correct the annotation in one of its
methods.
The changes in this PR include:
- Adding a new DatasetEntry class to represent joke-explanation pairs
- Updating the JokeExplaination class to use DatasetEntry objects to
store data
- Replacing the AlpacaGpt4 class __getitem__ method with correct
annotation

---------

Co-authored-by: sampatkalyan <120446217+Andavarapu-Sampat-Kalyan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants