Custom Output Format #3

Merged
artpelling merged 23 commits into main from feature/output-format
Apr 14, 2026

Conversation

@siavashzaid (Collaborator) commented Feb 24, 2026

Adds an output_format parameter to both dataset getters, allowing users to choose between three return formats: 'pyfar' (default), 'hdf5', and 'numpy'.

Changes in src/irdl/fabian.py:

  • Added output_format parameter to get_fabian() with validation
    • 'pyfar': existing behaviour, unchanged
    • 'hdf5': converts SOFA to an HDF5 file and deletes the intermediate SOFA file afterwards (ZIP remains as source of truth)
    • 'numpy': returns a dict of NumPy arrays including sampling_rate
  • Added load_sofa() helper to extract data from SOFA files
  • Updated docstring: improved output_format parameter description and expanded Returns section to cover all three formats

Changes in src/irdl/miracle.py:

  • Added output_format parameter to get_miracle() with validation
    • 'hdf5': returns the path to the existing HDF5 file directly (no deletion, as it is the downloaded source)
    • 'pyfar': reads from HDF5 and returns pyfar objects
    • 'numpy': reads from HDF5 and returns a dict of NumPy arrays including sampling_rate
  • Added load_h5() helper to read data from HDF5 files
  • Updated docstring to match fabian.py
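
The validation and dispatch described in the two lists above could look roughly like the following. This is only a sketch and not the code from this PR: `dispatch_output`, `VALID_FORMATS`, and the `loaders` mapping are hypothetical names used for illustration, and it raises `ValueError` where the PR itself uses `assert` statements.

```python
from pathlib import Path

# The three formats named in the PR description.
VALID_FORMATS = ("pyfar", "hdf5", "numpy")


def dispatch_output(path, output_format="pyfar", loaders=None):
    """Return `path` itself for 'hdf5', otherwise the result of a loader.

    `loaders` is a hypothetical mapping from format name to a callable that
    takes the file path and returns the loaded data (for example a dict of
    arrays including `sampling_rate` for 'numpy').
    """
    if output_format not in VALID_FORMATS:
        raise ValueError(
            f"unknown output format {output_format!r}, expected one of {VALID_FORMATS}"
        )
    path = Path(path)
    if output_format == "hdf5":
        # No conversion needed: hand back the file path directly.
        return path
    loader = (loaders or {}).get(output_format)
    if loader is None:
        raise NotImplementedError(f"no loader registered for {output_format!r}")
    return loader(path)
```

Validating before any file access means a typo in `output_format` fails early, before anything is downloaded or converted.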

Notes:

The @process decorator in miracle.py is redundant since pooch.fetch() already handles caching

@artpelling (Owner)

Looks like there are some issues with the docs: https://github.com/artpelling/irdl/actions/runs/22352522870/job/64683113044?pr=3#step:4:97

@artpelling (Owner) left a comment

Thanks, looking good already. I've requested some minor changes regarding docstring formatting. I also did a push beforehand to fix the docs building, so don't forget to pull first.

I also raised some conceptual questions I would like to get your input on (we can also defer them to a later PR).

Comment thread src/irdl/miracle.py Outdated
Comment thread src/irdl/sofa.py
Comment thread src/irdl/fabian.py Outdated
Comment thread src/irdl/miracle.py Outdated
pup.fetch(scenario, progressbar=True)

@process
@process # is always true because we dont extract and pup.fetch checks if file exists already => remove?
Owner

Good catch! Let's leave it like this for now. I am not super happy with the decorator (its docstring is also confusing). I will think a bit about how to deal with this.

@artpelling artpelling linked an issue Feb 25, 2026 that may be closed by this pull request
@artpelling artpelling force-pushed the feature/output-format branch from ed70909 to f0a994f Compare March 19, 2026 09:16
@artpelling (Owner)

Looks good, I think we can merge it soon. Could you also rename fabian.py to sofa.py?

@artpelling (Owner)

And add yourself to authors in pyproject.toml :)

Comment thread src/irdl/ista.py
Comment thread src/irdl/ista.py Outdated

assert dataset_split in [None, "C1", "C2", "C3", "C4"], "datasetsplit must be None or in [C1, C2, C3, C4]"

assert not (scenario[-1] != "D" and dataset_split is None), "full datasets need a split"
Owner

Full datasets are allowed. If a full set is queried, all splits need to be fetched and assembled.

Owner

This should be done in such a way that not all data needs to be loaded into memory. Maybe you can get some ideas from here: https://stackoverflow.com/questions/18492273/combining-hdf5-files

External linking probably won't work, though, because we also need to interlace the data. It should work, however, to create a large h5 file and fill it up sequentially.
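
A minimal sketch of the "create a large h5 file and fill it up sequentially" idea, using h5py. The actual layout of the split files is not shown in this thread, so the following are assumptions made purely for illustration: each split file holds one dataset called `data`, all splits have the same shape, and row `i` of split `k` belongs at row `k + i * n_splits` of the full set. The function name `merge_interlaced` and the `chunk_rows` parameter are hypothetical.

```python
import h5py


def merge_interlaced(split_paths, output_path, name="data", chunk_rows=1024):
    """Interlace equally shaped split files into one HDF5 file, block by block.

    Only `chunk_rows` rows of one split are held in memory at any time.
    Assumed layout (hypothetical): row i of split k goes to row k + i * n.
    """
    n = len(split_paths)
    with h5py.File(split_paths[0], "r") as first:
        rows, *rest = first[name].shape
        dtype = first[name].dtype

    with h5py.File(output_path, "w") as out:
        # Allocate the full-size dataset up front, then fill it sequentially.
        full = out.create_dataset(name, shape=(rows * n, *rest), dtype=dtype)
        for k, split_path in enumerate(split_paths):
            with h5py.File(split_path, "r") as src_file:
                src = src_file[name]
                for start in range(0, rows, chunk_rows):
                    stop = min(start + chunk_rows, rows)
                    # Target rows: k + start*n, k + (start+1)*n, ..., k + (stop-1)*n
                    first_row = k + start * n
                    last_row = k + (stop - 1) * n
                    full[first_row:last_row + 1:n] = src[start:stop]
    return output_path
```

Strided writes like this can be slow on large files; writing contiguous blocks of the output and reading the matching rows from each split is an alternative if performance becomes a concern.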

Comment thread src/irdl/ista.py Outdated
Comment thread src/irdl/ista.py Outdated
@artpelling artpelling force-pushed the feature/output-format branch 2 times, most recently from 0d0edd1 to 7a5c002 Compare April 2, 2026 12:04
Comment thread src/irdl/ista.py
assert output_format in ["pyfar", "hdf5", "numpy"], "unknown output format"
assert scenario in ["A1", "A2", "D1", "R2"], "scenario must be one of ['A1', 'A2', 'D1', 'R2']"
assert dataset_split in [None, "C1", "C2", "C3", "C4"], "dataset_split must be None or in [C1, C2, C3, C4]"

Owner

Check also for splits of D1 here.

Collaborator Author

You mean D1 should not be split because 33x33 leads to uneven splits?

Owner

Yes. D1 is compatible with SR(A)1-D and should be treated in this way :)

Owner

What I meant with my comment was that an assertion similar to the one in sriracha should be added that does not allow splits for D1.
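
The check requested here could be sketched as follows. The scenario and split names are taken from the assertions quoted above; the function name `validate_ista_request` is hypothetical, the sriracha assertion it is meant to mirror is not shown in this thread, and `ValueError` is used instead of `assert` only to keep the example testable.

```python
SCENARIOS = ("A1", "A2", "D1", "R2")
SPLITS = ("C1", "C2", "C3", "C4")


def validate_ista_request(scenario, dataset_split=None):
    """Reject unknown scenarios and splits, and any split requested for D1."""
    if scenario not in SCENARIOS:
        raise ValueError(f"scenario must be one of {list(SCENARIOS)}")
    if dataset_split is not None and dataset_split not in SPLITS:
        raise ValueError(f"dataset_split must be None or one of {list(SPLITS)}")
    # D1 is only available as a whole; dataset_split=None means the full set.
    if scenario == "D1" and dataset_split is not None:
        raise ValueError("scenario 'D1' cannot be split, use dataset_split=None")
```

With this, `validate_ista_request("A1", "C2")` and `validate_ista_request("D1")` pass silently, while `validate_ista_request("D1", "C1")` raises.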

@artpelling artpelling force-pushed the feature/output-format branch from 7a5c002 to 4b2e58b Compare April 10, 2026 13:12
@artpelling artpelling force-pushed the feature/output-format branch from 0e34170 to dd0c8eb Compare April 13, 2026 14:39
Comment thread src/irdl/ista.py Outdated
return output_path


def check_memory(file_path):
Owner

Function should probably be renamed to reflect the return type. Something like fits_in_memory or free_memory_available.

Also, it should be defined somewhere else, since we can use it for all datasets!

Comment thread src/irdl/ista.py Outdated
# check if the file can be loaded into memory for pyfar or numpy output formats.
# if not, fall back to returning the HDF5 file path
if output_format in ["pyfar", "numpy"] and not check_memory(path / scenario):
print(
Owner

This should ideally use warn and be emitted in the check_memory function.
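
Taken together, the two review comments on `check_memory` suggest something along these lines. This is a sketch under assumptions, not the merged code: the name `fits_in_memory` comes from the suggestion above, while the `available_bytes` and `safety_factor` parameters are hypothetical additions. When `available_bytes` is not given, free memory is queried via `psutil.virtual_memory().available`, which assumes psutil is installed.

```python
import os
import warnings


def fits_in_memory(file_path, available_bytes=None, safety_factor=1.0):
    """Return True if the file at `file_path` is expected to fit into free memory.

    The on-disk size is used as the estimate. For compressed HDF5 data the
    in-memory size can be larger, which `safety_factor` can account for.
    """
    if available_bytes is None:
        # Imported lazily so callers passing `available_bytes` do not need psutil.
        import psutil

        available_bytes = psutil.virtual_memory().available
    required = os.path.getsize(file_path) * safety_factor
    fits = required <= available_bytes
    if not fits:
        # The warning is emitted here rather than by every caller.
        warnings.warn(
            f"{file_path} needs about {required:.0f} bytes but only "
            f"{available_bytes} bytes are available; "
            "consider the 'hdf5' output format instead.",
            RuntimeWarning,
            stacklevel=2,
        )
    return fits
```

Because the helper takes only a file path, it could live in a shared module and be reused by all dataset getters, as suggested in the first comment.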

@artpelling artpelling merged commit c9edc89 into main Apr 14, 2026
20 checks passed
Development

Successfully merging this pull request may close these issues.

Feature Request: Output format

2 participants