@Nadav-Barak
Contributor

@Nadav-Barak Nadav-Barak commented Jun 3, 2023

  1. Added the needed requirements.
  2. Sorted the imports with isort.
  3. Added the default_label argument, with its docstring and functionality.

Resolves #14

```
openai_model : str , default : "gpt-3.5-turbo"
    The OpenAI model to use. See https://beta.openai.com/docs/api-reference/available-models for a list of
    available models.
default_label : Optional[str] , default : None
```
Collaborator

It should default to "Random"; otherwise the existing default behaviour changes, which is both undesired and non-intuitive.
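For reference, here is a minimal sketch of why a "Random" default keeps the earlier behaviour while None explicitly opts out. The helper `resolve_default_label` and the frequency-weighted sampling are assumptions for illustration, not the PR's actual implementation.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def resolve_default_label(
    default_label: Optional[str],
    y_train: Sequence[str],
    rng: random.Random,
) -> Optional[str]:
    """Hypothetical helper: choose the label used when the model's answer is unusable."""
    if default_label == "Random":
        # Assumed earlier behaviour: sample a class weighted by its training frequency.
        counts = Counter(y_train)
        classes = list(counts)
        weights = [counts[c] for c in classes]
        return rng.choices(classes, weights=weights, k=1)[0]
    # Any other value, including None, is returned unchanged.
    return default_label


rng = random.Random(0)
y = ["positive", "negative", "positive"]
print(resolve_default_label("Random", y, rng))  # either "positive" or "negative"
print(resolve_default_label(None, y, rng))      # None
```

With "Random" as the default, users who never pass default_label see the same fallback as before the PR; only those who pass None (or another value) opt in to the new behaviour.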

Contributor Author

Changed

@Nadav-Barak Nadav-Barak requested a review from OKUA1 June 4, 2023 08:48
```python
        if coin_flip == 1:
            result.append(cls)
else:
    result = self.default_label
```
Collaborator

I think this behaviour might still confuse users: if a string other than "Random" is passed instead of a list, the label will again be a string rather than a list. So I would still add a type check and convert the value to a list where applicable.

If you intentionally want the flexibility of non-list outputs, this could be allowed only for default_label = None as a special case (which should then be documented properly). In my opinion, though, it is always better to return a list.
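A sketch of the suggestion above: a scalar fallback is wrapped in a list so multi-label output stays list-typed, and None is kept as the one documented exception. The helper name `as_label_list` is hypothetical.

```python
from typing import List, Optional, Sequence, Union


def as_label_list(value: Optional[Union[str, Sequence[str]]]) -> Optional[List[str]]:
    """Hypothetical helper: normalise a fallback label for a multi-label classifier."""
    if value is None:
        # Documented special case: None means "no prediction".
        return None
    if isinstance(value, str):
        # A single string becomes a one-element list instead of leaking through as a str.
        return [value]
    # Any other sequence of labels is copied into a plain list.
    return list(value)


print(as_label_list("other"))     # ['other']
print(as_label_list(["a", "b"]))  # ['a', 'b']
print(as_label_list(None))        # None
```

The `isinstance(value, str)` check must come before `list(value)`, because a `str` is itself a sequence and `list("other")` would split it into characters.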

Contributor Author

See lines 221-222: the output can be either a list or None.
The main point of this PR is to provide a way to ignore the model's predictions when it fails,
which is most commonly achieved by setting label = None.
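As an illustration of that use case, a minimal sketch of how downstream code could drop the samples the model failed to label. It assumes predictions arrive as a plain Python list in which None marks a failure; `drop_failed` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")


def drop_failed(
    samples: Sequence[str], predictions: Sequence[Optional[P]]
) -> Tuple[List[str], List[P]]:
    """Keep only (sample, prediction) pairs where the model returned a usable label."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    kept = [(x, p) for x, p in zip(samples, predictions) if p is not None]
    return [x for x, _ in kept], [p for _, p in kept]


texts = ["great film", "???", "awful plot"]
preds = ["positive", None, "negative"]
print(drop_failed(texts, preds))  # (['great film', 'awful plot'], ['positive', 'negative'])
```

The same filter works for multi-label output, where each non-failed prediction is a list of labels.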

Collaborator

@Nadav-Barak, do you mean lines 210-211? I must have missed that change.

```python
if len(labels) == 0:
    labels = self._get_default_label()
if labels is not None and len(labels) > self.max_labels:
    labels = random.choices(labels, k=self.max_labels)
```
Collaborator

I would simply truncate the list, as before. We assume that the entries in the list returned by GPT are sorted by certainty, so if the maximum number of labels is exceeded, it is better to keep the ones at the beginning of the list. However, there is no quantitative evidence for this assumption.
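A related detail: `random.choices` samples with replacement, so besides discarding the model's ordering it can return the same label twice, whereas slicing keeps the first `max_labels` entries unchanged. A small comparison; the label list and `max_labels = 2` are illustrative values.

```python
import random

labels = ["sports", "politics", "tech", "health"]  # assumed ordered by model confidence
max_labels = 2

# Truncation: deterministic, keeps the labels the model listed first.
truncated = labels[:max_labels]

# Sampling with replacement: order is lost and duplicates are possible.
rng = random.Random(0)
sampled = rng.choices(labels, k=max_labels)

print(truncated)  # ['sports', 'politics']
print(sampled)    # two labels drawn at random, possibly the same one twice
```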

Contributor Author

Changed

```python
self._set_keys(openai_key, openai_org)
self.openai_model = openai_model
self.default_label = default_label
random.seed(random_state)
```
Collaborator

I would not set the seed like this, as there are plenty of edge cases where it could break: random.seed() seeds the global, module-level generator shared by every instance.

Imagine an extreme example:

```python
clf1 = Classifier(random_state=41)
clf2 = Classifier(random_state=42)
clf1.fit(X, y).predict(X)  # a wrong seed is used: clf2's __init__ reseeded the global RNG with 42
```

The official scikit-learn guidelines also say that __init__ should contain no logic; its only purpose should be to store the arguments.
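The problem and the usual fix can be reproduced with the standard library alone. In this sketch, `GlobalSeedClf` reseeds the shared module-level generator in `__init__`, so constructing a second instance silently changes the first one's random stream; `LocalRngClf` only stores `random_state` and builds a private `random.Random` in `fit`. Both classes are hypothetical stand-ins, not the library's classifiers, and `random.Random` is used here in place of scikit-learn's own `check_random_state` helper.

```python
import random


class GlobalSeedClf:
    """Anti-pattern: seeds the global RNG inside __init__."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        random.seed(random_state)  # affects every user of the `random` module

    def draw(self):
        return random.random()


class LocalRngClf:
    """sklearn-style: __init__ only stores arguments; the RNG is created in fit."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self):
        # A private generator per fit, so other instances cannot interfere.
        self.rng_ = random.Random(self.random_state)
        return self

    def draw(self):
        return self.rng_.random()


clf1 = GlobalSeedClf(random_state=41)
clf2 = GlobalSeedClf(random_state=42)
print(clf1.draw() == random.Random(42).random())  # True: clf1 ends up using seed 42

ok1 = LocalRngClf(random_state=41).fit()
ok2 = LocalRngClf(random_state=42).fit()
print(ok1.draw() == random.Random(41).random())   # True: each instance keeps its own seed
```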

Contributor Author

Changed

@Nadav-Barak Nadav-Barak requested a review from OKUA1 June 4, 2023 12:41
@OKUA1 OKUA1 merged commit 54f11ad into BeastByteAI:main Jun 4, 2023

Development

Successfully merging this pull request may close these issues.

Add flag to control whether unknown labels are returned as None

2 participants