Read dataset metadata #350

Merged
Gautzilla merged 6 commits into Project-OSmOSE:main from Gautzilla:read-dataset-metadata on Mar 19, 2026

@Gautzilla
Contributor

🐳 WHATISHAPPENING

We're having some trouble deserializing osekit.public_api.dataset.Dataset instances from .json files that are linked to a lot of audio files: doing so fully deserializes all the analysis datasets, which can take quite some time (each audio file has to be opened to read its metadata, because the AudioFiles are deserialized too when the dataset is deserialized).

🐳 HOWDOWEFIXIT

This is a first (naive) step: deserializing the public Dataset now only stores the paths of the analysis datasets' JSON files:

image

Then, these analysis datasets will only be deserialized on request, e.g.:

sds = dataset.get_dataset("maxicoolyo") # Deserializes, stores and returns the SpectroDataset
image
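
For readers unfamiliar with the pattern, here is a minimal sketch of the lazy loading described above. It is not the osekit implementation: the `LazyDataset` class, the `"datasets"` key and the JSON layout are hypothetical, and a plain dict stands in for the analysis dataset (e.g. a SpectroDataset). It only illustrates keeping the paths at deserialization time and reading each analysis JSON on the first request.

```python
import json
from pathlib import Path
import tempfile


class LazyDataset:
    """Hypothetical stand-in for the public Dataset (not the osekit class)."""

    def __init__(self, analysis_paths: dict[str, Path]) -> None:
        # Only the paths to the analysis JSON files are kept at this point.
        self._paths = dict(analysis_paths)
        # Analysis datasets are cached here once they have been requested.
        self._loaded: dict[str, dict] = {}

    @classmethod
    def from_json(cls, file: Path) -> "LazyDataset":
        # Assumed layout: {"datasets": {"<name>": "<path to analysis json>"}}
        content = json.loads(file.read_text())
        return cls({name: Path(p) for name, p in content["datasets"].items()})

    def get_dataset(self, name: str) -> dict:
        # Deserialize on first access, then return the cached object.
        if name not in self._loaded:
            self._loaded[name] = json.loads(self._paths[name].read_text())
        return self._loaded[name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        analysis = root / "maxicoolyo.json"
        analysis.write_text(json.dumps({"kind": "spectro"}))
        public = root / "dataset.json"
        public.write_text(json.dumps({"datasets": {"maxicoolyo": str(analysis)}}))

        dataset = LazyDataset.from_json(public)  # no analysis file is read here
        sds = dataset.get_dataset("maxicoolyo")  # read and cached on request
        print(sds)
```

With this shape, the cost of opening a large analysis dataset is only paid by callers that actually ask for it, and repeated calls to `get_dataset` with the same name return the cached object.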

@Gautzilla Gautzilla self-assigned this Mar 16, 2026
@Gautzilla Gautzilla added the "APLOSE related" label (description: "The changes are impacted APLOSE behavior") Mar 16, 2026
@coveralls
Collaborator

coveralls commented Mar 16, 2026

Coverage Status

coverage: 98.805% (+0.009%) from 98.796%
when pulling 4135adb on Gautzilla:read-dataset-metadata
into 80e88b1 on Project-OSmOSE:main.

@ElodieENSTA
Member

I tested it in APLOSE and it works!

I just found that calling the path to the json "dataset" is a bit misleading as this is not a dataset we can get with get_dataset

@Gautzilla
Contributor Author

I just found that calling the path to the json "dataset" is a bit misleading as this is not a dataset we can get with get_dataset

Yup, I agree that the distinction between the public dataset and the core/analysis datasets really is unclear...

Maybe I should take a dive in the code and rename everything more explicitly? e.g. systematically rename dataset with analysis_dataset when it concerns an analysis (core) dataset, and leave the raw dataset for everything that concerns the Public API dataset?

@mathieudpnt might have an opinion on this too

@mathieudpnt
Contributor

I just found that calling the path to the json "dataset" is a bit misleading as this is not a dataset we can get with get_dataset

Yup, I agree that the distinction between the public dataset and the core/analysis datasets really is unclear...

Maybe I should take a dive in the code and rename everything more explicitly? e.g. systematically rename dataset with analysis_dataset when it concerns an analysis (core) dataset, and leave the raw dataset for everything that concerns the Public API dataset?

@mathieudpnt might have an opinion on this too

IMO, dataset belongs to public_API. I would call analysis what we currently call "dataset" in core_API, and analysis_config what we currently call "analysis" in core_API.

@Gautzilla
Contributor Author

I opened #351 to address the renaming issues!

@Gautzilla
Contributor Author

I tested it in APLOSE and it works!

Cool! So do we merge it as it is? Or do you still need anything else?

@Gautzilla Gautzilla changed the title from "[DRAFT] Read dataset metadata" to "Read dataset metadata" Mar 19, 2026
@Gautzilla Gautzilla marked this pull request as ready for review March 19, 2026 09:43
@Gautzilla Gautzilla merged commit cfc3d36 into Project-OSmOSE:main Mar 19, 2026
2 checks passed
@Gautzilla Gautzilla deleted the read-dataset-metadata branch March 19, 2026 10:01