Collaborator

@voorhs voorhs commented Feb 18, 2025

No description provided.

Comment on lines +51 to +57
in_domain_samples = self.dataset[self.split].filter(lambda sample: sample[Dataset.label_feature] is not None)
if self.dataset.multilabel:
    filter_fn = lambda sample: sample[Dataset.label_feature][intent_data.id] == 1  # noqa: E731
else:
    filter_fn = lambda sample: sample[Dataset.label_feature] == intent_data.id  # noqa: E731

filtered_split = in_domain_samples.filter(filter_fn)
Collaborator


if self.dataset.multilabel:
    filter_fn = lambda sample: sample[Dataset.label_feature].get(intent_data.id, 0) == 1
else:
    filter_fn = lambda sample: (sample[Dataset.label_feature] is not None) and (sample[Dataset.label_feature] == intent_data.id)

filtered_split = self.dataset[self.split].filter(filter_fn)

The not None filter is needed only in the single-label case.

Collaborator Author

@voorhs voorhs Feb 19, 2025


OOS examples can also occur in a multilabel dataset. In addition, regardless of whether the dataset is multilabel or single-label, OOS examples are always marked with label=None.

This is guaranteed here: https://github.com/deeppavlov/AutoIntent/blob/dev/autointent/schemas/_schemas.py#L100
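
To illustrate the point, here is a minimal sketch that uses plain dicts instead of the real dataset split. The column name "label", the helper select_intent, and the sample utterances are hypothetical and only serve the illustration; the 0/1 vector format for multilabel labels is inferred from the snippet under review. It shows that indexing a None label fails in the multilabel branch, so the None filter has to run before either comparison.

```python
# Hypothetical stand-in for Dataset.label_feature; the real column name may differ.
LABEL = "label"


def select_intent(samples, intent_id, multilabel):
    """Return the samples of one intent, skipping OOS samples (label is None)."""
    # Drop OOS samples first: this step is needed for both label formats.
    in_domain = [s for s in samples if s[LABEL] is not None]
    if multilabel:
        # Multilabel: the label is assumed to be a 0/1 vector indexed by intent id.
        return [s for s in in_domain if s[LABEL][intent_id] == 1]
    # Single-label: the label is the intent id itself.
    return [s for s in in_domain if s[LABEL] == intent_id]


multilabel_samples = [
    {"utterance": "book a table", LABEL: [1, 0]},
    {"utterance": "book a table and a taxi", LABEL: [1, 1]},
    {"utterance": "what is the meaning of life", LABEL: None},  # OOS sample
]
single_label_samples = [
    {"utterance": "book a table", LABEL: 0},
    {"utterance": "call a taxi", LABEL: 1},
    {"utterance": "what is the meaning of life", LABEL: None},  # OOS sample
]

multilabel_hits = select_intent(multilabel_samples, intent_id=1, multilabel=True)
single_label_hits = select_intent(single_label_samples, intent_id=1, multilabel=False)

# Without the None filter, the multilabel comparison fails on the OOS sample.
try:
    [s for s in multilabel_samples if s[LABEL][1] == 1]
    unfiltered_multilabel_fails = False
except TypeError:
    unfiltered_multilabel_fails = True
```

In the single-label branch, None == intent_id is simply False, so the explicit check is redundant there but harmless; filtering once up front keeps both branches uniform.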

@voorhs voorhs merged commit 6b07e3b into dev Feb 22, 2025
22 checks passed
@voorhs voorhs deleted the feat/augment-multilabel-datasets branch February 22, 2025 10:02
3 participants