Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI implementation deviates from specification for array release #595

Closed
roee88 opened this issue Jul 22, 2021 · 8 comments
Closed

FFI implementation deviates from specification for array release #595

roee88 opened this issue Jul 22, 2021 · 8 comments
Labels

Comments

@roee88
Copy link
Contributor

roee88 commented Jul 22, 2021

The FFI implementation deviates from the C Data Interface specification in Release callback semantics – for producers.

The release callback MUST walk all children structures (including the optional dictionary) and call their own release callbacks.

The current implementation doesn't walk through the direct children but rather only through the children in private data. This blocks support for moving child arrays.

It also assumes that child arrays are not already released. This assumption is not true for the case of child array move. The release member must be checked to be not null before dropping/releasing an array.

The current implementation doesn't seem to release the dictionary array.

Please see the reference implementation in the spec (the first line seems to be a mistake and should check for the release member):

static void ReleaseExportedArray(struct ArrowArray* array) {
  // This should not be called on already released array
  assert(array->format != NULL);

  // Release children
  for (int64_t i = 0; i < array->n_children; ++i) {
    struct ArrowArray* child = array->children[i];
    if (child->release != NULL) {
      child->release(child);
      assert(child->release == NULL);
    }
  }

  // Release dictionary
  struct ArrowArray* dict = array->dictionary;
  if (dict != NULL && dict->release != NULL) {
    dict->release(dict);
    assert(dict->release == NULL);
  }

  // TODO here: release and/or deallocate all data directly owned by
  // the ArrowArray struct, such as the private_data.

  // Mark array released
  array->release = NULL;
}
@roee88 roee88 added the bug label Jul 22, 2021
@roee88
Copy link
Contributor Author

roee88 commented Jul 22, 2021

The same is true for schema release. As part of a fix, the allocated format, name, (and in the future: metadata, dictionary) should also go into private data. However, currently the code has a with_name function that would probably need to go away because private_data is already a raw pointer at that point. @kszucs what's the rational behind these with_* methods and could you suggest a clean alternative given the above?

@kszucs
Copy link
Member

kszucs commented Jul 22, 2021

If I recall correctly metadata and dictionary has not been implemented so far, though we should definitely support those.

The with_name and with_flags methods are used as setters here (perhaps we should use a set_ prefix instead). These are separate fields in the c-struct from the private_data field.

Regarding the ArrowArray abstraction I'd need to check the implementation again (which we should clean up eventually), but I'm currently busy with the release procedure. Hopefully I can have a more thorough look at the issue after that.

@roee88
Copy link
Contributor Author

roee88 commented Jul 23, 2021

The with_name and with_flags methods are used as setters here (perhaps we should use a set_ prefix instead). These are separate fields in the c-struct from the private_data field.

My point was that own allocations (like name) must go to private data, not sure if that was expressed clearly in my message. I'll create a draft PR to make it clearer. I can help contribute here but will probably need your guidelines for the with_name setters.

Regarding the ArrowArray abstraction I'd need to check the implementation again (which we should clean up eventually), but I'm currently busy with the release procedure. Hopefully I can have a more thorough look at the issue after that.

In the code and text above whenever I wrote ArrowArray I meant FFI_ArrowArray, I just used the spec name sorry for the confusion. I do agree that the ArrowArray abstraction is confusing and probably need to go away. The API should probably be similar to the bridge implementation in arrow cpp. This is more related to #596.

@roee88
Copy link
Contributor Author

roee88 commented Jul 24, 2021

On a second thought I'm not sure about this one. The rust implementation has an assumption that apart from nullifying the release callback everything is immutable. This might be a reasonable assumption even though the cpp implementation doesn't do the same and (at least to me) the spec aren't clear enough here. In contrast to the original description, the current release implementation doesn't block the possibility of (child) array move. I will verify this use case via a test case separately (e.g., to ensure that buffer lifetime is correct).

@roee88 roee88 closed this as completed Jul 24, 2021
@jorgecarleitao
Copy link
Member

cc @pitrou , that may help us further here.

@pitrou
Copy link
Member

pitrou commented Aug 30, 2021

@jorgecarleitao Sorry for the large delay, is there something you'd like me to take a look at?

@roee88
Copy link
Contributor Author

roee88 commented Sep 2, 2021

@pitrou the question is whether it's safe to assume that other than nullifying the release callback the consumer doesn't change anything in the structures.

Two examples to demonstrate the difference in implementation:

  1. In C++ the allocated format field is released based on a pointer stored in private data. In Rust it's released based on the format field itself (in the ArrowSchema structure).
  2. In C++ the release callback of every child in the children field is called. In Rust the release callback of every child in private data is called.

@pitrou
Copy link
Member

pitrou commented Sep 3, 2021

@pitrou the question is whether it's safe to assume that other than nullifying the release callback the consumer doesn't change anything in the structures.

It is, indeed.

In C++ the allocated format field is released based on a pointer stored in private data. In Rust it's released based on the format field itself (in the ArrowSchema structure).

I'm not sure what difference it makes. In both cases it is released (directly or indirectly) by calling the release callback, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants