allow negative samples (and regulate their amount for partial data) #116

sfluegel05 · 2025-08-01T08:36:51Z

In partial data (e.g. ChEBIOver50Partial) a top class is selected and only subclasses of this top class are used as labels.
The samples are filtered so that only those with at least 1 positive label (i.e., subclasses of the top class) are used.

The problem

No generalisation beyond the top class: I trained a model on 30 labels (subclasses of 22712) and got ~70% macro-F1 when evaluating only on samples that are subclasses of 22712, but 10% macro-F1 when evaluating on samples from the whole of ChEBI.

Solution

Remove the filters that only allow samples with at least 1 positive label, and add a ratio parameter external_data_ratio that determines the amount of negative samples (0 for no negative samples, which is the current behaviour; 1 for all possible negative samples).
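To illustrate the idea, here is a minimal sketch of how such a ratio could control the number of negative samples. It is not the code from this PR: the function select_samples, its arguments, and the reading of intermediate ratio values as "fraction of the available negatives to keep" are assumptions made for illustration. Only the endpoints (0 keeps no negatives, 1 keeps all of them) come from the description above.

```python
import random
from typing import Sequence


def select_samples(
    labels: Sequence[Sequence[bool]],
    external_data_ratio: float,
    seed: int = 0,
) -> list[int]:
    """Return the indices of the samples to keep.

    Hypothetical helper, not taken from the PR. `labels` holds one row of
    boolean labels per sample (one entry per subclass of the top class).
    """
    if not 0.0 <= external_data_ratio <= 1.0:
        raise ValueError("external_data_ratio must be in [0, 1]")

    # Positive samples carry at least one positive label and are always kept.
    positives = [i for i, row in enumerate(labels) if any(row)]
    # Negative samples have no positive label (they lie outside the top class).
    negatives = [i for i, row in enumerate(labels) if not any(row)]

    # Assumption: the ratio is the fraction of available negatives to keep.
    n_keep = round(external_data_ratio * len(negatives))
    kept_negatives = random.Random(seed).sample(negatives, n_keep)

    return sorted(positives + kept_negatives)
```

With a ratio of 0 this reproduces the previous filter (positives only); with a ratio of 1 every sample is kept.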

sfluegel05 added 4 commits July 31, 2025 18:20

use ratio parameter to add external data

0a00d11

remove non-positive filter (allow samples with no positive labels)

f7c4eb7

fix unit tests

3702b5a

set external data ratio to 0 for test

a47a675

sfluegel05 marked this pull request as ready for review August 1, 2025 12:37

sfluegel05 merged commit fe01f5a into dev Aug 1, 2025
6 checks passed

sfluegel05 deleted the feature/better-partial-data branch August 1, 2025 12:37

2 participants
