Additionally, Scikit-LLM will ensure that the obtained response contains a valid label. If this is not the case, a label will be selected randomly (label probabilities are proportional to label occurrences in the training set).
In many pipelines it is better to assign a None label to such examples instead of choosing one at random.
It would be useful to have a flag that controls this behavior, with three options:
assign a specific label (such as -1) in those cases / assign None / select a label at random (the current behavior).
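
To make the request concrete, here is a minimal sketch of the fallback logic in plain Python. It is not Scikit-LLM code: the function name `resolve_label`, the `default_label` parameter, and the `"random"` sentinel are all hypothetical names chosen for illustration, and the random branch only approximates the behavior described in the quoted docs.

```python
import random
from collections import Counter

# Hypothetical sentinel meaning "keep the current random-selection behavior".
RANDOM = "random"


def resolve_label(response, training_labels, default_label=RANDOM, rng=None):
    """Return `response` if it is a known label, else apply the fallback policy.

    default_label=RANDOM -> sample a label, weighted by training-set frequency
    default_label=None   -> return None for invalid responses
    default_label=<any>  -> return that fixed value (for example -1)
    """
    counts = Counter(training_labels)
    if response in counts:
        return response
    if default_label == RANDOM:
        rng = rng if rng is not None else random
        labels = list(counts)
        weights = [counts[label] for label in labels]
        return rng.choices(labels, weights=weights, k=1)[0]
    return default_label


train = ["positive", "negative", "positive"]
print(resolve_label("positive", train))                      # valid label, returned as-is
print(resolve_label("not a label", train, default_label=None))
print(resolve_label("not a label", train, default_label=-1))
```

One limitation of this sketch: using a string sentinel means a real class named "random" could not be used as a fixed fallback, so an actual implementation might prefer a dedicated constant or a separate flag.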