@sfluegel05
Collaborator

In partial datasets (e.g. ChEBIOver50Partial), a top class is selected and only subclasses of this top class are used as labels.
The samples are filtered such that only samples with at least 1 positive label (i.e., samples that belong to a subclass of the top class) are kept.
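
For readers unfamiliar with the partial datasets, the label selection and sample filter described above can be sketched roughly as follows. This is an illustration only, not the repository's code: the function names `subclasses_of` and `filter_partial`, the `parents` mapping and the sample layout are all hypothetical, and the sketch assumes the top class itself is not used as a label.

```python
from collections import defaultdict


def subclasses_of(top_class, parents):
    """Return all transitive subclasses of top_class (top_class itself excluded).

    parents: hypothetical mapping class id -> list of direct parent ids.
    """
    # Invert the child -> parents mapping so we can walk downwards.
    children = defaultdict(set)
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children[parent].add(child)

    found, stack = set(), [top_class]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def filter_partial(samples, labels):
    """Keep only samples with at least one positive label.

    samples: hypothetical mapping sample id -> set of class ids.
    Returns sample id -> positive labels, dropping samples with none.
    """
    return {
        sample_id: classes & labels
        for sample_id, classes in samples.items()
        if classes & labels
    }
```

With this filter, a sample whose classes all lie outside the top class never reaches the model, which is the behaviour the rest of this description is concerned with.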

The problem

No generalisation beyond the top class: I trained a model on 30 labels (subclasses of 22712) and got ~70% macro-F1 when evaluating only on samples that are subclasses of 22712, but 10% macro-F1 when evaluating on samples from the whole of ChEBI.

Solution

Remove the filters that only allow samples with at least 1 positive label, and add a ratio parameter, external_data_ratio, that determines the amount of negative samples (0 for no negative samples, which is the current behaviour; 1 for all possible negative samples).
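
A minimal sketch of how such a ratio could act on the sample selection is shown below. It is not the merged implementation: `select_samples`, the sample layout and the `seed` argument are hypothetical, and the sketch assumes that values between 0 and 1 mean "this fraction of all available external samples", which the description above only fixes at the two endpoints.

```python
import random


def select_samples(samples, labels, external_data_ratio=0.0, seed=0):
    """Select sample ids for a partial dataset.

    samples: hypothetical mapping sample id -> set of class ids.
    labels: set of label ids (subclasses of the top class).
    external_data_ratio: assumed to be the fraction of external samples
        (those without any positive label) that is kept.
    """
    if not 0.0 <= external_data_ratio <= 1.0:
        raise ValueError("external_data_ratio must be between 0 and 1")

    # Samples with at least one positive label are always kept.
    internal = [sid for sid, classes in samples.items() if classes & labels]
    # Samples without any positive label act as pure negatives.
    external = [sid for sid, classes in samples.items() if not classes & labels]

    # Draw a reproducible random subset of the external samples.
    n_external = int(external_data_ratio * len(external))
    rng = random.Random(seed)
    return internal + rng.sample(external, n_external)
```

Under these assumptions, `external_data_ratio=0` reproduces the old filter, while `external_data_ratio=1` keeps every sample, so the model also sees molecules outside the top class during training.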

@sfluegel05 sfluegel05 marked this pull request as ready for review August 1, 2025 12:37
@sfluegel05 sfluegel05 merged commit fe01f5a into dev Aug 1, 2025
6 checks passed
@sfluegel05 sfluegel05 deleted the feature/better-partial-data branch August 1, 2025 12:37
