Joke explanation generalization #2899

sampatkalyan · 2023-04-25T12:04:15Z

For issue #2827.
I have made changes to the JokeExplaination class.
This PR adopts the DatasetEntry class in the JokeExplaination class to generalize the data. The DatasetEntry class provides a consistent data structure for storing joke-explanation pairs, making it easier to work with the data.
I also changed AlpacaGpt4 to correct the return annotation of one of its methods.
The changes in this PR include:

- Adding a new DatasetEntry class to represent joke-explanation pairs
- Updating the JokeExplaination class to use DatasetEntry objects to store data
- Correcting the return annotation of the AlpacaGpt4 class's __getitem__ method
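
As a rough illustration of the change described above, here is a minimal sketch of a dataset that stores DatasetEntry objects instead of plain tuples. It is not the repository's code: the DatasetEntry shown is a simplified stand-in with only the `questions` and `answers` fields named in the review, `JokeExplanationSketch` is a hypothetical class, and the `"joke"` key is an assumption (only the misspelled `"explaination"` key appears in the diff).

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    # Simplified stand-in; the real class in the repository may carry more fields.
    questions: list[str]
    answers: list[str]


class JokeExplanationSketch:
    """Hypothetical in-memory stand-in for the JokeExplaination dataset."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.pairs: list[DatasetEntry] = []
        for row in rows:
            joke = row["joke"]  # assumed key name
            # The source data misspells this key, so it is read as-is.
            explanation = row["explaination"]
            self.pairs.append(DatasetEntry(questions=[joke], answers=[explanation]))

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, index: int) -> DatasetEntry:
        return self.pairs[index]
```

With this shape, callers read `entry.questions` and `entry.answers` by name rather than unpacking a tuple by position.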

CloseChoice · 2023-04-25T12:13:55Z

model/model_training/custom_datasets/qa_datasets.py

@@ -343,16 +343,16 @@ def __init__(self, cache_dir) -> None:
                # DO NOT change this
                # its the data that had syntax error
                explanation = data["explaination"]
-                self.pairs.append((joke, explanation))
+                self.pairs.append(DatasetEntry([joke], [explanation]))


You rely here on the order of the positional arguments. Could we make this explicit by using keyword arguments?

Suggested change

- self.pairs.append(DatasetEntry([joke], [explanation]))
+ self.pairs.append(DatasetEntry(questions=[joke], answers=[explanation]))

Will fix it. Thank you
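
To illustrate the concern raised in this thread, the sketch below shows how positional construction depends on field order while keyword construction does not. `Entry` is a hypothetical stand-in for DatasetEntry with the two field names taken from the suggestion; the example strings are made up.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for DatasetEntry, reduced to the two fields
    # named in the suggested change.
    questions: list[str]
    answers: list[str]


joke = "Why did the chicken cross the road?"
explanation = "The humor comes from the anticlimactic, literal answer."

# Both forms build the same object as long as the field order stays fixed.
positional = Entry([joke], [explanation])
keyword = Entry(questions=[joke], answers=[explanation])

# Swapping positional arguments is still a valid call, so the mistake
# goes unnoticed and the data ends up in the wrong fields.
swapped = Entry([explanation], [joke])

# Keyword arguments can be written in any order without changing the result.
reordered = Entry(answers=[explanation], questions=[joke])
```

On Python 3.10 or later, declaring the class with `@dataclass(kw_only=True)` would reject positional construction outright; whether the real DatasetEntry does this is not stated in the PR.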

CloseChoice · 2023-04-25T12:15:39Z

model/model_training/custom_datasets/qa_datasets.py

@@ -610,6 +610,6 @@ def _process_instruction(self, row: dict[str, str], input_max_length: int) -> Da
    def __len__(self) -> int:
        return len(self.rows)

-    def __getitem__(self, index: int) -> list[str] | tuple[str]:
+    def __getitem__(self, index: int) -> DatasetEntry:


thanks for updating
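
For context on why the annotation matters, here is a small sketch, under assumptions, of a dataset whose `__getitem__` declares DatasetEntry as its return type. `RowsDataset` is a hypothetical class and DatasetEntry is again a simplified stand-in; the point is only that the declared type is what type checkers and introspection tools report to callers.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class DatasetEntry:
    # Simplified stand-in for the repository's class.
    questions: list[str]
    answers: list[str]


class RowsDataset:
    """Hypothetical dataset that already holds DatasetEntry rows."""

    def __init__(self, rows: list[DatasetEntry]) -> None:
        self.rows = rows

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> DatasetEntry:
        # The annotation now matches what is actually returned.
        return self.rows[index]


# The declared return type can be inspected at runtime.
hints = get_type_hints(RowsDataset.__getitem__)
dataset = RowsDataset([DatasetEntry(questions=["q"], answers=["a"])])
first = dataset[0]
```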

Added named parameters when creating DatasetEntry objects. Removed the redundant variables question and answer, along with the if condition in the __init__ function that depended on them; they were never used or changed.

CloseChoice

LGTM

For issue LAION-AI#2827.
I have made changes to the JokeExplaination class. This PR adopts the DatasetEntry class in the JokeExplaination class to generalize the data. The DatasetEntry class provides a consistent data structure for storing joke-explanation pairs, making it easier to work with the data. I also changed AlpacaGpt4 to correct the return annotation of one of its methods.

The changes in this PR include:
- Adding a new DatasetEntry class to represent joke-explanation pairs
- Updating the JokeExplaination class to use DatasetEntry objects to store data
- Correcting the return annotation of the AlpacaGpt4 class's __getitem__ method

---------

Co-authored-by: sampatkalyan <120446217+Andavarapu-Sampat-Kalyan@users.noreply.github.com>

ASampatKalyan added 2 commits April 25, 2023 16:56

generalized JokeExplaniation dataset class with DataSetEntry

cb356df

corrected annotation for AlpacaGpt4 __getitem__ dunder method

85bd103

sampatkalyan requested review from theblackcat102, sanagno, dvruette, andreaskoepf and yk as code owners April 25, 2023 12:04

CloseChoice reviewed Apr 25, 2023

Suggested Changes

6d86e89

CloseChoice approved these changes Apr 25, 2023

andreaskoepf approved these changes Apr 27, 2023

andreaskoepf merged commit cab4b58 into LAION-AI:main Apr 27, 2023
1 check passed

sampatkalyan deleted the JokeExplaination-generalization branch April 28, 2023 06:19
