Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weldx.asdf.util.read_buffer relies on reading from a 'closed' buffer #873

Closed
braingram opened this issue May 4, 2023 · 2 comments · Fixed by #875
Closed

weldx.asdf.util.read_buffer relies on reading from a 'closed' buffer #873

braingram opened this issue May 4, 2023 · 2 comments · Fixed by #875

Comments

@braingram
Copy link
Contributor

ASDF was allowing lazily loaded arrays to be read from an AsdfFile created by reading an io.BytesIO instance. This results in unexpected changes to the position of the read/write pointer within the stream.

See asdf-format/asdf#1539 for the relevant ASDF issue.

It appears that weldx relies upon this behavior for any tests using weldx.asdf.util.read_buffer. The relevant code is as follows:

weldx/weldx/asdf/util.py

Lines 171 to 189 in c2ee9b1

if open_kwargs is None:
open_kwargs = {"copy_arrays": True}
buffer.seek(0)
if _use_weldx_file is None:
_use_weldx_file = _USE_WELDX_FILE
if _use_weldx_file:
from weldx.asdf.file import WeldxFile
return WeldxFile(buffer, asdffile_kwargs=open_kwargs)
with asdf.open(
buffer,
extensions=None,
**open_kwargs,
) as af:
data = af.tree
return data

Note that data is returned outside of the asdf.open context (which closes the underlying 'file' when the context exits). As lazy_load=True was not provided to asdf.open the arrays within the AsdfFile are 'lazy loaded' (their data is not read from the file until the array is sliced or in other ways accessed). One example of a test that will fail when the above issue is fixed in ASDF is test_shape_validator:
def test_shape_validator(test_input):
result = write_read_buffer(
{"root": test_input},
)["root"]
assert compare_nested(test_input.__dict__, result.__dict__)
assert compare_nested(result.__dict__, test_input.__dict__)

Here result contains references to 'lazy loaded' arrays that hold references to the now closed file. The call to compare_nested will then attempt to load the arrays which when the issue is fixed in ASDF will result in an error.

I see one usage of read_buffer outside of tests in weldx.asdf.cli.welding_schema:

write_read_buffer(
tree,
asdffile_kwargs=dict(custom_schema=str(model_path)),
)

However this does not appear to use the result of write_read_buffer.

I would imagine that one of two solutions might work:

  1. used lazy_load=True with asdf.open to force loading of all arrays when the file is opened
  2. convert read_buffer (and write_read_buffer) to context managers to allow the file to stay open while the results are used

The first option seems like the easiest but will result in all arrays being loaded into memory (which might not be necessary for every call to read_buffer).

@CagtayFabry
Copy link
Member

thank you for reporting this, I like your suggestion using the context manager.

As you mentioned in your comments, the function is mostly used in our testing pipelines. I will try to look into the best way to prevent running into this for now.

braingram added a commit to braingram/weldx that referenced this issue May 5, 2023
braingram added a commit to braingram/weldx that referenced this issue May 5, 2023
braingram added a commit to braingram/weldx that referenced this issue May 5, 2023
braingram added a commit to braingram/weldx that referenced this issue May 5, 2023
@braingram
Copy link
Contributor Author

Thanks! I opened a PR with updates that should be compatible with a fix to asdf-format/asdf#1539

I looked a bit at the ASDF issue yesterday and it affects more than just BytesIO files and may require more extensive changes to fix. However I do expect that a fix for the ASDF issue should be included in the next release (currently planned as ASDF 3.0, with no estimate for when that will be released).

marscher pushed a commit to braingram/weldx that referenced this issue May 16, 2023
CagtayFabry added a commit that referenced this issue May 16, 2023
* add read_buffer_context and write_read_buffer_context

closes #873

* update changelog

* update dummy test

* add read_buffer_context and write_read_buffer_context

closes #873

* update changelog

* update dummy test

* removed internal keyword dummy_inline_arrays

---------

Co-authored-by: Çağtay Fabry <43667554+CagtayFabry@users.noreply.github.com>
Co-authored-by: Martin K. Scherer <m.scherer@fu-berlin.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants