added default_label argument and functionality + isort formatting #27

Nadav-Barak · 2023-06-03T14:08:21Z

added needed requirements
isort the imports
Added default_label argument + docstring + functionality

Resolves #14

OKUA1 · 2023-06-03T15:20:34Z

skllm/models/gpt_zero_shot_clf.py

+    openai_model : str , default : "gpt-3.5-turbo"
+        The OpenAI model to use. See https://beta.openai.com/docs/api-reference/available-models for a list of
+        available models.
+    default_label : Optional[str] , default : None


It should be set to "Random" by default. Otherwise the existing default behaviour changes, which is undesirable and non-intuitive.

OKUA1 · 2023-06-04T09:55:51Z

skllm/models/gpt_zero_shot_clf.py

+                if coin_flip == 1:
+                    result.append(cls)
+        else:
+            result = self.default_label


I think this behaviour might still be confusing for users. If a string other than "Random" is provided instead of a list, the label will again be a string. So I would still add a type check and convert to a list whenever applicable.

If you intentionally want the flexibility of non-list outputs, this could be done for default_label = None as a special case (which should then be properly documented). But in my opinion, it is always better to have a list as the output.

See lines 221-222: the output can be either a list or None.
The main point of this PR is to provide a way to ignore the model's prediction when it fails,
which is most commonly achieved by setting label = None.

@Nadav-Barak you mean lines 210-211? I must have missed that change.
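
One way to reconcile the two positions above can be sketched as follows. The helper name `resolve_default_label` and its signature are hypothetical and not taken from the PR. The sketch assumes the multi-label case, where "Random" keeps each candidate class on an independent coin flip, `None` is passed through as a documented special case, and any other string is wrapped in a list.

```python
import random
from typing import List, Optional, Union


def resolve_default_label(
    default_label: Optional[Union[str, List[str]]],
    classes: List[str],
    rng: random.Random,
) -> Optional[List[str]]:
    """Hypothetical helper: return a list of labels, or None for 'no prediction'."""
    if default_label is None:
        # Special case: lets the caller ignore predictions that failed.
        return None
    if default_label == "Random":
        # Keep each class on an independent fair coin flip.
        return [cls for cls in classes if rng.randint(0, 1) == 1]
    if isinstance(default_label, str):
        # Wrap a single label so the output type stays a list.
        return [default_label]
    # Already a list (or other iterable of labels): return a copy.
    return list(default_label)


rng = random.Random(0)
print(resolve_default_label("unknown", ["a", "b"], rng))  # ['unknown']
print(resolve_default_label(None, ["a", "b"], rng))       # None
```

With this shape, callers only ever need to handle two output types: a list of labels or `None`.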

OKUA1 · 2023-06-04T10:03:31Z

skllm/models/gpt_zero_shot_clf.py

+        if len(labels) == 0:
+            labels = self._get_default_label()
+        if labels is not None and len(labels) > self.max_labels:
+            labels = random.choices(labels, k=self.max_labels)


I would simply truncate the list, as before. We assume that the entries in the list returned by GPT are sorted by certainty. Under this assumption, if the maximum number of labels is exceeded, it is better to take the ones from the beginning of the list. However, there is no quantitative evidence for this.
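
The difference between the two approaches can be illustrated with a small sketch. The label values and their order below are made up for illustration. Note also that `random.choices` samples with replacement, so it can return the same label more than once, while `random.sample` draws without replacement.

```python
import random

# Assumed model output, in the order the model returned it.
labels = ["positive", "neutral", "negative", "mixed"]
max_labels = 2

# Truncation: deterministic, keeps the leading entries in their original order.
truncated = labels[:max_labels]
print(truncated)  # ['positive', 'neutral']

# random.choices samples WITH replacement, so duplicates are possible.
rng = random.Random(0)
sampled = rng.choices(labels, k=max_labels)

# random.sample draws WITHOUT replacement, if random selection is really wanted.
unique_sample = rng.sample(labels, k=max_labels)
```

Truncation also keeps `predict` reproducible without any seeding, which sidesteps the random-state question entirely for this step.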

OKUA1 · 2023-06-04T10:13:51Z

skllm/models/gpt_zero_shot_clf.py

        self._set_keys(openai_key, openai_org)
        self.openai_model = openai_model
+        self.default_label = default_label
+        random.seed(random_state)


I would not set a seed like this as there are plenty of edge cases where this could break.

Imagine an extreme example:

    clf1 = Classifier(seed=41)
    clf2 = Classifier(seed=42)
    clf1.fit(X, y).predict(X)  # a wrong seed will be used

The official sklearn guidelines also say that, preferably, there should be no logic inside the __init__ method; its only purpose should be to store the arguments.
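
The edge case above can be reproduced with a minimal sketch. Both classes are illustrative stand-ins rather than the classifier from this PR, and the `draw` method is a hypothetical placeholder for any code path that consumes random numbers. The second class shows one possible alternative: store `random_state` in `__init__` and build a private `random.Random` instance when it is needed.

```python
import random


class GlobalSeedClassifier:
    """Illustrative stand-in: seeds the shared module-level generator in __init__."""

    def __init__(self, random_state=None):
        random.seed(random_state)  # side effect on global state

    def draw(self):
        return random.random()


class LocalRngClassifier:
    """Illustrative stand-in: __init__ only stores the argument."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def draw(self):
        # Build a private generator at the point of use.
        rng = random.Random(self.random_state)
        return rng.random()


clf1 = GlobalSeedClassifier(random_state=41)
clf2 = GlobalSeedClassifier(random_state=42)
# clf1 now draws from the stream seeded with 42, not 41.
global_draw = clf1.draw()
print(global_draw == random.Random(42).random())  # True

fixed1 = LocalRngClassifier(random_state=41)
fixed2 = LocalRngClassifier(random_state=42)
# Each instance is unaffected by the other.
print(fixed1.draw() == random.Random(41).random())  # True
```

Recreating the generator on every call is a simplification that makes each `draw` return the same value. A real estimator would more likely create it once per `fit` or `predict` call.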

Nadav-Barak added 2 commits June 3, 2023 17:00

added default_label argument and functionality + isort formatting

fb68b55

formatting

aba2930

OKUA1 reviewed Jun 3, 2023

OKUA1 reviewed Jun 3, 2023

CR Comments - changed default to 'Random'

21484d4

Nadav-Barak requested a review from OKUA1 June 4, 2023 08:48

OKUA1 reviewed Jun 4, 2023

OKUA1 requested changes Jun 4, 2023

CR Comments v2

be7b50f

Nadav-Barak requested a review from OKUA1 June 4, 2023 12:41

OKUA1 approved these changes Jun 4, 2023

OKUA1 merged commit 54f11ad into BeastByteAI:main Jun 4, 2023
