Data preprocessing is currently not as efficient as it could or should be. On my machine, creating a new dataset (for ChEBI version 231) took several minutes, and a large part of that time went into the creation of the data splits.
For the most part this is not a problem, since datasets can be reused between training runs. However, since the introduction of dynamic data splits in PR #29, the split creation is repeated at the start of each run.
Tasks
- find out which preprocessing steps take up the most time (see the profiling sketch below)
- where possible, find more efficient solutions for the steps that are currently inefficient
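A minimal profiling sketch for the first task, assuming a single preprocessing entry point (the `prepare_data()` function below is a hypothetical placeholder for the actual ChEBI parsing, dataset creation and split creation). Running it once under `cProfile` and sorting by cumulative time should show which steps dominate:

```python
import cProfile
import io
import pstats


def prepare_data():
    """Hypothetical placeholder for the real preprocessing pipeline
    (ChEBI parsing, feature encoding, split creation, ...)."""
    pass


profiler = cProfile.Profile()
profiler.enable()
prepare_data()
profiler.disable()

# Print the 20 calls with the highest cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(20)
print(stream.getvalue())
```

Alternatively, running the preprocessing script via `python -m cProfile -o profile.out <script>` and inspecting `profile.out` (e.g. with snakeviz) gives the same information without code changes.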