ARROW-16913: [Java] Implement ArrowArrayStream #13465
java/c/src/main/cpp/jni_wrapper.cc
Outdated
  ThrowPendingException(message);
}
jclass global_class = (jclass)env->NewGlobalRef(local_class);
if (!local_class) {
Is this a mistake?
- if (!local_class) {
+ if (!global_class) {
java/c/src/main/cpp/jni_wrapper.cc
Outdated
const int err_code = env->CallIntMethod(private_data->j_private_data_,
                                        kPrivateDataGetSchemaMethod, out_addr);
if (env->ExceptionCheck()) {
  env->ExceptionDescribe();
If there's an exception, should it perhaps participate in last_error_?
Normally the JNI side sets the last error; the check here is just a last-resort safeguard. I suppose this can be refactored, though: copy the Java-side error to the C++ side after get_next/get_stream, so that get_last_error only has to return the C++-side error. Then get_next/get_stream can also update last_error_ if it ends up catching a stray error.
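The refactoring described in this reply could look roughly like the sketch below. It is not the code from the PR: `StreamPrivateData`, `CallAndRecordError`, and `GetLastError` are hypothetical names, the callable stands in for the JNI call into Java, and a thrown C++ exception stands in for a stray Java exception, so that the snippet compiles without `jni.h`.

```cpp
#include <cerrno>
#include <exception>
#include <string>

// Hypothetical stand-in for the C++-side private data of an exported stream.
struct StreamPrivateData {
  std::string last_error_;
};

// Hypothetical helper: run `call`, which returns an errno-style code and may
// write an error message through its argument (standing in for copying the
// Java-side error). A thrown exception stands in for a stray Java exception.
// Either way the failure ends up in last_error_.
template <typename Call>
int CallAndRecordError(StreamPrivateData* data, Call&& call) {
  data->last_error_.clear();
  try {
    const int err_code = call(&data->last_error_);
    if (err_code != 0 && data->last_error_.empty()) {
      data->last_error_ = "Unknown error";
    }
    return err_code;
  } catch (const std::exception& e) {
    data->last_error_ = e.what();
    return EIO;
  }
}

// Equivalent of get_last_error: reads only C++-side state and returns
// nullptr when no error has been recorded.
const char* GetLastError(const StreamPrivateData* data) {
  return data->last_error_.empty() ? nullptr : data->last_error_.c_str();
}
```

With this shape, the error lookup never has to call back into Java, which also keeps it usable after a failed call has left the Java side in an unknown state.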
java/c/src/main/cpp/jni_wrapper.cc
Outdated
if (env->ExceptionCheck()) {
  env->ExceptionDescribe();
  env->ExceptionClear();
  ThrowPendingException("Error calling close of private data");
Is this right? The release callback could be called from any context, such as a Python thread or R interpreter. In those contexts, a C++ exception would probably crash the process (or silently exit the thread)?
ah, you're right. The existing handler has this issue too. I'll remove the throw. (Actually here I suppose we should do our best to free resources in C++/Java regardless.)
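A minimal sketch of a release callback that never throws and still attempts every cleanup step, as suggested in this exchange. The types `ExportedStream` and `ExportedPrivateData` and the function `ReleaseExportedStream` are hypothetical stand-ins, not the structs from the PR. A real callback would also delete the private data, whereas here ownership stays with the caller to keep the example small.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the private data attached to an exported stream.
struct ExportedPrivateData {
  std::vector<std::function<void()>> cleanup_steps;  // each step may throw
  int failed_steps = 0;  // failures are counted instead of propagated
};

// Hypothetical stand-in for the C struct that carries the release callback.
struct ExportedStream {
  void (*release)(ExportedStream*);
  void* private_data;
};

// Release callback: runs every cleanup step even if an earlier one fails,
// and never lets a C++ exception escape into the (possibly non-C++) caller.
void ReleaseExportedStream(ExportedStream* stream) noexcept {
  if (stream == nullptr || stream->release == nullptr) return;
  auto* data = static_cast<ExportedPrivateData*>(stream->private_data);
  for (auto& step : data->cleanup_steps) {
    try {
      step();
    } catch (...) {
      ++data->failed_steps;
    }
  }
  // Mark the struct as released by clearing the callback pointer.
  stream->release = nullptr;
}
```

Counting failures rather than rethrowing means a caller such as a Python thread or an R interpreter only ever sees a normal return from the callback.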
java/c/src/main/cpp/jni_wrapper.cc
Outdated
JNIEnvGuard guard(private_data->vm_);
JNIEnv* env = guard.env();

const long out_addr = static_cast<long>(reinterpret_cast<uintptr_t>(out));
I suppose this doesn't work on 64-bit Windows? long is 32 bits there...
Also, according to the JNI spec, a jlong is always 64 bits, so perhaps we should use jlong or simply int64_t here?
Good catch, thanks.
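A minimal sketch of the width-safe conversion discussed in this thread, using only the C++ standard library. `JavaLong`, `AddressToJavaLong`, and `JavaLongToAddress` are hypothetical names; `std::int64_t` stands in for `jlong` so that the snippet compiles without `jni.h`.

```cpp
#include <cstdint>

// Stand-in for JNI's jlong, which is a signed 64-bit integer on every
// platform (unlike `long`, which is 32 bits on 64-bit Windows).
using JavaLong = std::int64_t;

static_assert(sizeof(JavaLong) >= sizeof(std::uintptr_t),
              "a native pointer must fit in a Java long");

// Hypothetical helper: convert a native pointer to a value that can be
// passed to Java as a long.
inline JavaLong AddressToJavaLong(const void* ptr) {
  return static_cast<JavaLong>(reinterpret_cast<std::uintptr_t>(ptr));
}

// Hypothetical helper: the reverse conversion, for addresses coming back
// from Java.
inline void* JavaLongToAddress(JavaLong address) {
  return reinterpret_cast<void*>(static_cast<std::uintptr_t>(address));
}
```

The `static_assert` turns a silent truncation into a compile-time failure on any platform where a pointer would not fit.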
   * @param stream C stream interface struct to import.
   * @return Imported reader
   */
  public static ArrowReader importStream(BufferAllocator allocator, ArrowArrayStream stream) {
Is there a reason for the naming discrepancy (importStream vs. exportArrayStream)?
static class ExportedArrayStreamPrivateData implements PrivateData {
  final BufferAllocator allocator;
  final ArrowReader reader;
  int nextDictionary;
This member doesn't seem used anymore, or am I missing something?
Ah it's not used. I missed this when backing out a change.
+1. For the record, did you try to use this to communicate with e.g. PyArrow?
@lwhite1 Would you like to take a look?
I have not yet, I need to give this a try: https://arrow.apache.org/docs/dev/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface and actually, I'll extend the doc page there as well.
…wow, whatever GitHub did to their UI is rather frustrating.
Hmm, there's a possible minor bug between PyArrow/C++/Java: Python can keep a reference to the reader until interpreter shutdown (at which point the JVM has been shut down), and then collects the reader. This frees the Changes needed:
Can Python perhaps release that reference once close() is called?
Well, the Python-side reference is the Python reader object itself. But close() should be wired up to call the new RecordBatchReader::Close() so we can at least explicitly call the release callback at a suitable time.
Though the Java improvements are welcome as well. We should probably try to do both.
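The idea of calling the release callback explicitly, rather than waiting for the reader object to be collected, can be sketched as follows. This is not Arrow's RecordBatchReader implementation: `ImportedStream` and `StreamReader` are hypothetical stand-ins used to show an idempotent `Close()` that the destructor falls back on.

```cpp
// Hypothetical stand-in for an imported C stream struct.
struct ImportedStream {
  void (*release)(ImportedStream*);
  void* private_data;
};

// Hypothetical reader wrapper that owns an imported stream.
class StreamReader {
 public:
  explicit StreamReader(ImportedStream stream) : stream_(stream) {}
  StreamReader(const StreamReader&) = delete;
  StreamReader& operator=(const StreamReader&) = delete;

  // If the caller never closed the reader, release at destruction time.
  ~StreamReader() { Close(); }

  // Idempotent: invokes the release callback at most once, at a time the
  // caller chooses (for example, while the exporting runtime is still alive).
  void Close() {
    if (stream_.release != nullptr) {
      stream_.release(&stream_);
      stream_.release = nullptr;
    }
  }

  bool closed() const { return stream_.release == nullptr; }

 private:
  ImportedStream stream_;
};
```

Once `Close()` has run, a later destructor call is a no-op, so collecting the reader after the exporting side has shut down no longer touches the callback.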
…
Just a question, this is great otherwise.
@amol- Do you want to take a look at the doc additions?
Implements ArrowArrayStream for Java. The equivalent Java-side interface chosen is ArrowReader.

Also:
- Fixes a couple of JDK9 compatibility issues I ran into. I _think_ these will not normally affect people except during development (I think because I was mixing IntelliJ and Maven).
- Manually clang-format the C++ code. Clean up some things to match Arrow convention and remove some unused declarations.
- Extends the DictionaryProvider interface. This is a potentially breaking change; we could make the method default (and raise an exception) instead.

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Alessandro Molina <amol@turbogears.org>
Implements ArrowArrayStream for Java. The equivalent Java-side interface chosen is ArrowReader.
Also: