-
Notifications
You must be signed in to change notification settings - Fork 281
added default_label argument and functionality + isort formatting #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
skllm/models/gpt_zero_shot_clf.py
Outdated
| openai_model : str , default : "gpt-3.5-turbo" | ||
| The OpenAI model to use. See https://beta.openai.com/docs/api-reference/available-models for a list of | ||
| available models. | ||
| default_label : Optional[str] , default : None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be set to "Random" by default, otherwise the default behaviour is changed, which is undesired and also non-intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
| if coin_flip == 1: | ||
| result.append(cls) | ||
| else: | ||
| result = self.default_label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this behaviour might still be a bit confusing for the users. If a string != "Random" is provided instead of a list, the label will again be a string. So, I would still add an additional type check and convert to list whenever applicable.
If you intentionally want to have a flexibility of having non-list outputs, maybe this could be done for default_label = None as a special case (this should be properly documented then). But in my opinion, it would be always better to have a list as an output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See lines 221-222, The output can be either a list or a None.
They main point of this PR is to facilitate a way to ignore the model predictions when he fails,
that is most commonly achieved by setting label = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nadav-Barak you mean lines 210-211? I must have missed that change.
skllm/models/gpt_zero_shot_clf.py
Outdated
| if len(labels) == 0: | ||
| labels = self._get_default_label() | ||
| if labels is not None and len(labels) > self.max_labels: | ||
| labels = random.choices(labels, k=self.max_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simply truncate a list as it was before. We assume that the entries in the list returned by GPT are sorted based on certainty. So, under this assumption, if the max number of labels is exceeded, it is better to take the ones from the beginning of the list. However, there is no quantitative evidence for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
skllm/models/gpt_zero_shot_clf.py
Outdated
| self._set_keys(openai_key, openai_org) | ||
| self.openai_model = openai_model | ||
| self.default_label = default_label | ||
| random.seed(random_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not set a seed like this as there are plenty of edge cases where this could break.
Imagine an extreme example:
clf1 = Classifier(seed = 41)
clf2 = Classifier(seed = 42)
clf1.fit(X, y).predict(X) # a wrong seed will be usedEven per sklearn official guidelines, preferably, there should be no logic inside the __init__ method and its only purpose should be to store the arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
Resolves #14