Conversation
Looks like there are some issues with the docs: https://github.com/artpelling/irdl/actions/runs/22352522870/job/64683113044?pr=3#step:4:97
artpelling left a comment:
Thanks, looking good already. I've requested some minor changes regarding docstring formatting. I also pushed beforehand to fix the docs build, so don't forget to pull first.
I also raised some conceptual questions I would like to get your input on (we can also defer them to a later PR).
pup.fetch(scenario, progressbar=True)
@process  # is always true because we don't extract and pup.fetch checks if the file exists already => remove?
Good catch! Let's leave it like this for now. I am not super happy with the decorator (also, its docstring is confusing); I will think a bit about how to deal with this.
Force-pushed from ed70909 to f0a994f.
Looks good, I think we can merge it soon. Could you also rename …
And add yourself to …
assert dataset_split in [None, "C1", "C2", "C3", "C4"], "dataset_split must be None or in [C1, C2, C3, C4]"
assert not (scenario[-1] != "D" and dataset_split is None), "full datasets need a split"
Full datasets are allowed. If a full set is queried, all splits need to be fetched and assembled.
This should be done in a way such that not all data needs to be loaded in memory. Maybe you can get some ideas from here: https://stackoverflow.com/questions/18492273/combining-hdf5-files
External linking probably won't work, though, because we also need to interlace the data. It should work, however, to create a large h5 file and fill it up sequentially.
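A rough sketch of that sequential assembly with h5py, assuming equally sized splits and an "ir" dataset (the file and dataset names are placeholders, not the actual irdl layout):

```python
import h5py

splits = ["C1.h5", "C2.h5", "C3.h5", "C4.h5"]  # hypothetical per-split files
n_splits = len(splits)

# read shape and dtype once; all splits are assumed to have the same shape
with h5py.File(splits[0], "r") as f:
    n_per_split, n_samples = f["ir"].shape
    dtype = f["ir"].dtype

with h5py.File("full.h5", "w") as out:
    ds = out.create_dataset(
        "ir", shape=(n_splits * n_per_split, n_samples), dtype=dtype
    )
    for i, split in enumerate(splits):
        with h5py.File(split, "r") as f:
            # strided write interlaces the splits instead of concatenating them,
            # and only one split is held in memory at a time
            ds[i::n_splits] = f["ir"][...]
```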
Force-pushed from 0d0edd1 to 7a5c002.
assert output_format in ["pyfar", "hdf5", "numpy"], "unknown output format"
assert scenario in ["A1", "A2", "D1", "R2"], "scenario must be one of ['A1', 'A2', 'D1', 'R2']"
assert dataset_split in [None, "C1", "C2", "C3", "C4"], "dataset_split must be None or in [C1, C2, C3, C4]"
check also for splits of D1 here
you mean D1 should not be split because 33x33 leads to uneven splits?
yes. D1 is compatible with SR(A)1-D and should be treated in this way :)
What I meant with my comment was that an assertion similar to the one in sriracha should be added that does not allow splits for D1.
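For concreteness, a minimal sketch of such a guard, mirroring the existing asserts (the exact message is illustrative):

```python
# disallow splits for D1, analogous to the assertion in sriracha
assert not (scenario == "D1" and dataset_split is not None), (
    "D1 is not split; dataset_split must be None for D1"
)
```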
Force-pushed from 7a5c002 to 4b2e58b.
Force-pushed from 0e34170 to dd0c8eb.
return output_path

def check_memory(file_path):
Function should probably be renamed to reflect the return type. Something like fits_in_memory or free_memory_available.
Also, it should be defined somewhere else, since we can use it for all datasets!
# check if the file can be loaded into memory for pyfar or numpy output formats.
# if not, fall back to returning the HDF5 file path
if output_format in ["pyfar", "numpy"] and not check_memory(path / scenario):
    print(
This should ideally use warn and be emitted in the check_memory function.
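Putting both suggestions together, a minimal sketch of a renamed, shared helper that emits the warning itself; the name fits_in_memory, the psutil dependency, and the message are suggestions, not the actual irdl implementation:

```python
import os
import warnings

import psutil


def fits_in_memory(file_path):
    """Return True if the file at file_path fits into the available memory."""
    fits = os.path.getsize(file_path) <= psutil.virtual_memory().available
    if not fits:
        warnings.warn(
            f"{file_path} does not fit into memory, "
            "falling back to returning the HDF5 file path.",
            ResourceWarning,
        )
    return fits
```

The getters would then only branch on the returned boolean and fall back to the HDF5 path when it is False.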
Adds an output_format parameter to both dataset getters, allowing users to choose between three return formats: 'pyfar' (default), 'hdf5', and 'numpy'.
Changes in src/irdl/fabian.py:
Changes in src/irdl/miracle.py:
Notes:
The @process decorator in miracle.py is redundant since pooch.fetch() already handles caching.
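As a rough illustration of the output_format dispatch described above (the function name load, the "ir" dataset name, and the "sampling_rate" attribute are assumptions about the file layout, not the actual irdl schema):

```python
import h5py
import numpy as np
import pyfar


def load(path, output_format="pyfar"):
    assert output_format in ["pyfar", "hdf5", "numpy"], "unknown output format"
    if output_format == "hdf5":
        # return the file path so the caller can read the data lazily
        return path
    with h5py.File(path, "r") as f:
        data = np.asarray(f["ir"])
        sampling_rate = f["ir"].attrs["sampling_rate"]
    if output_format == "numpy":
        return data
    return pyfar.Signal(data, sampling_rate)
```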