Custom Output Format by siavashzaid · Pull Request #3 · artpelling/irdl

siavashzaid · 2026-02-24T12:56:29Z

Adds an output_format parameter to both dataset getters, allowing users to choose between three return formats: 'pyfar' (default), 'hdf5', and 'numpy'.

Changes in src/irdl/fabian.py:

Added output_format parameter to get_fabian() with validation
- 'pyfar': existing behaviour, unchanged
- 'hdf5': converts SOFA to an HDF5 file and deletes the intermediate SOFA file afterwards (ZIP remains as source of truth)
- 'numpy': returns a dict of NumPy arrays including sampling_rate
Added load_sofa() helper to extract data from SOFA files
Updated docstring: improved output_format parameter description and expanded Returns section to cover all three formats

Changes in src/irdl/miracle.py:

Added output_format parameter to get_miracle() with validation
- 'hdf5': returns the path to the existing HDF5 file directly (no deletion, as it is the downloaded source)
- 'pyfar': reads from HDF5 and returns pyfar objects
- 'numpy': reads from HDF5 and returns a dict of NumPy arrays including sampling_rate
Added load_h5() helper to read data from HDF5 files
Updated docstring to match fabian.py
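
The validation step shared by both getters could look roughly like the sketch below. The helper name `validate_output_format` and the constant `VALID_OUTPUT_FORMATS` are hypothetical and not part of irdl; the PR itself uses a plain `assert`. Raising `ValueError` is a swapped-in choice here, since `assert` statements are skipped when Python runs with `-O`.

```python
# Hypothetical helper, not the code from this PR: the three format names
# come from the PR description, everything else is an assumption.
VALID_OUTPUT_FORMATS = ("pyfar", "hdf5", "numpy")


def validate_output_format(output_format="pyfar"):
    """Return the format unchanged if it is supported, else raise ValueError."""
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(
            f"unknown output format {output_format!r}; "
            f"expected one of {VALID_OUTPUT_FORMATS}"
        )
    return output_format


# 'pyfar' stays the default, so existing callers are unaffected.
chosen = validate_output_format()
```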

Notes:

The @process decorator in miracle.py is redundant since pooch.fetch() already handles caching

artpelling · 2026-02-25T17:37:25Z

Looks like there are some issues with the docs build: https://github.com/artpelling/irdl/actions/runs/22352522870/job/64683113044?pr=3#step:4:97

artpelling

Thanks, looking good already. I've requested some minor changes regarding docstring formatting. I also did a push beforehand to fix the docs building, so don't forget to pull first.

I also raised some conceptual questions I would like to get your input on (we can also defer them to a later PR).

artpelling · 2026-02-25T18:10:11Z

    pup.fetch(scenario, progressbar=True)

-    @process
+    @process  # is always true because we dont extract and pup.fetch checks if file exists already => remove?


Good catch! Let's leave it like this for now. I am not super happy with the decorator (its docstring is also confusing). I will think a bit about how to deal with this.

artpelling · 2026-03-19T13:24:30Z

Looks good, I think we can merge it soon. Could you also rename fabian.py to sofa.py?

artpelling · 2026-03-19T13:25:27Z

And add yourself to authors in pyproject.toml :)

artpelling · 2026-03-30T11:11:52Z

+
+    assert dataset_split in [None, "C1", "C2", "C3", "C4"], "datasetsplit must be None or in [C1, C2, C3, C4]"
+
+    assert not (scenario[-1] != "D" and dataset_split is None), "full datasets need a split"


Full datasets are allowed. If a full set is queried, all splits need to be fetched and assembled.

This should be done in a way such that not all data needs to be loaded in memory. Maybe you can get some ideas from here: https://stackoverflow.com/questions/18492273/combining-hdf5-files

External linking probably won't work, though, because we also need to interlace the data. It should work, however, to create a large h5 file and fill it up sequentially.
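
A minimal sketch of the "fill it up sequentially" idea, under loud assumptions: the function name `fill_interlaced` is hypothetical, and the round-robin pattern (split k occupies every n-th slot, starting at offset k) is only a guess at what "interlace" means for these datasets. The destination only needs to support strided slice assignment. The demo uses a plain list so it runs without dependencies; a preallocated h5py dataset should accept the same `dest[offset::n]` writes, but that is untested here.

```python
def fill_interlaced(dest, split_loaders):
    """Write each split into every n-th slot of dest, one split at a time.

    dest          -- preallocated container supporting strided slice assignment
    split_loaders -- zero-argument callables, each returning one split's data
    """
    n_splits = len(split_loaders)
    for offset, load_split in enumerate(split_loaders):
        block = load_split()              # only this split is held in memory
        dest[offset::n_splits] = block    # strided write into the full set
        del block                         # drop the reference before the next split
    return dest


# Demo with four tiny "splits" standing in for C1..C4 (made-up values).
loaders = [
    lambda: ["a0", "a1"],
    lambda: ["b0", "b1"],
    lambda: ["c0", "c1"],
    lambda: ["d0", "d1"],
]
full = fill_interlaced([None] * 8, loaders)
```

Passing loaders instead of loaded arrays keeps peak memory at roughly one split, which is the point of the sequential fill.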

artpelling · 2026-04-02T12:20:14Z

+    assert output_format in ["pyfar", "hdf5", "numpy"], "unknown output format"
+    assert scenario in ["A1", "A2", "D1", "R2"], "scenario must be one of ['A1', 'A2', 'D1', 'R2']"
+    assert dataset_split in [None, "C1", "C2", "C3", "C4"], "dataset_split must be None or in [C1, C2, C3, C4]"
+


Also check for splits of D1 here.

You mean D1 should not be split because 33x33 leads to uneven splits?

Yes. D1 is compatible with SR(A)1-D and should be treated in this way :)

What I meant with my comment was that an assertion similar to the one in sriracha should be added that does not allow splits for D1.
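
The requested check could be sketched as follows. The function name `check_miracle_args` is hypothetical; the scenario and split values are taken from the diff above, and the final assertion is only one possible way to phrase the D1 rule, not the code that was merged.

```python
SCENARIOS = ("A1", "A2", "D1", "R2")
SPLITS = (None, "C1", "C2", "C3", "C4")


def check_miracle_args(scenario, dataset_split=None):
    """Validate scenario and split; D1 is only available as a full set."""
    assert scenario in SCENARIOS, f"scenario must be one of {list(SCENARIOS)}"
    assert dataset_split in SPLITS, "dataset_split must be None or in [C1, C2, C3, C4]"
    # Assumed rule from the review thread: no splits for D1.
    assert not (scenario == "D1" and dataset_split is not None), "D1 cannot be split"


check_miracle_args("A1", "C2")   # a split of A1 passes
check_miracle_args("D1")         # the full D1 set passes
```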

artpelling · 2026-04-13T14:46:13Z

+    return output_path
+
+
+def check_memory(file_path):


Function should probably be renamed to reflect the return type. Something like fits_in_memory or free_memory_available.

Also, it should be defined somewhere else, since we can use it for all datasets!

artpelling · 2026-04-13T14:50:17Z

+    # check if the file can be loaded into memory for pyfar or numpy output formats.
+    # if not, fall back to returning the HDF5 file path
+    if output_format in ["pyfar", "numpy"] and not check_memory(path / scenario):
+        print(


This should ideally use warn and be emitted in the check_memory function.
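
Both suggestions (a name that reflects the boolean return value, and a warning emitted inside the function) might be combined as in this sketch. The signature is an assumption: `available_bytes` is passed in by the caller (one option would be `psutil.virtual_memory().available`), and `headroom` is a made-up safety factor. The on-disk size is only a rough proxy, since compressed HDF5 data can take more space once loaded.

```python
import os
import warnings


def fits_in_memory(file_path, available_bytes, headroom=2.0):
    """Return True if headroom * file size fits into available_bytes.

    Emits a UserWarning and returns False otherwise, so callers can fall
    back to returning the HDF5 file path without printing anything.
    """
    needed = os.path.getsize(file_path) * headroom
    if needed > available_bytes:
        warnings.warn(
            f"{file_path} needs about {needed:.0f} bytes but only "
            f"{available_bytes} are available; falling back to the HDF5 file path.",
            stacklevel=2,
        )
        return False
    return True
```

Because the function takes only a path and a byte budget, it does not depend on any particular dataset and could live in a shared module.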

siavashzaid mentioned this pull request Feb 24, 2026

Feature Request: Output format #2

Closed

siavashzaid requested a review from artpelling February 24, 2026 13:18

artpelling requested changes Feb 25, 2026

artpelling linked an issue Feb 25, 2026 that may be closed by this pull request

Feature Request: Output format #2

Closed

artpelling force-pushed the feature/output-format branch from ed70909 to f0a994f Compare March 19, 2026 09:16

artpelling requested changes Mar 30, 2026

artpelling force-pushed the feature/output-format branch 2 times, most recently from 0d0edd1 to 7a5c002 Compare April 2, 2026 12:04

artpelling requested changes Apr 2, 2026

siavashzaid and others added 16 commits April 10, 2026 15:11

implemented custom output format

654415d

fix linting

1639011

fix format

69fa25b

fix docstring format

5877cb7

Apply suggestions from code review

684a747

added sriracha, ista dataset split for miracle and big refactor

169151c

linting and api reference fix

dd30e3f

sphinx: fail on warning

bbcea4b

rename to sofa and update docs

3a49c66

linting

4b37e6d

update make

e34399a

downgrade h5py due to conflict with necdf4

8f3b88f

update packages

34bc791

update README

4fb82f3

remove uncorrected

a2c646c

Update src/irdl/ista.py

4b2e58b

artpelling force-pushed the feature/output-format branch from 7a5c002 to 4b2e58b Compare April 10, 2026 13:12

update h5py

db3a21b

siavashzaid added 3 commits April 12, 2026 17:25

first draft for merging functions

7b3b394

linting and formatting :-)

5f9d122

check RAM size

dd0c8eb

artpelling force-pushed the feature/output-format branch from 0e34170 to dd0c8eb Compare April 13, 2026 14:39

artpelling requested changes Apr 13, 2026

artpelling and others added 3 commits April 13, 2026 17:03

Merge branch 'main' into feature/output-format

58b1221

ignores

e0bed5f

small refactor

15816fd

artpelling merged commit c9edc89 into main Apr 14, 2026
20 checks passed

