Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16566: [Java] Initialize JNI components on use instead of statically #13146

Merged
merged 2 commits into from
Jun 8, 2022

Conversation

jackylee-ch
Copy link
Contributor

@jackylee-ch jackylee-ch commented May 13, 2022

We are using Arrow in Spark. When speculation is turned on, it is possible for the task to be forcibly killed during the running process. It means, JniLoader.get().ensureLoaded() maybe failed with ClosedByInterruptException when the task was killed, and it should be accepted with this.
However, NativeMemoryPool use it as a static code, this leads to an init failure every time the NativeMemroyPool is trying to initialized and can never succeed. It would always fail to init NativeMemoryPool with java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool.

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@jackylee-ch jackylee-ch changed the title [ARROW-16566] Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool ARROW-16566: [JAVA] Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool May 13, 2022
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has no components in JIRA, make sure you assign one.

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@jackylee-ch
Copy link
Contributor Author

⚠️ Ticket has no components in JIRA, make sure you assign one.

Done

@lidavidm
Copy link
Member

lidavidm commented May 13, 2022

Could you elaborate on

When the specification is turned on

?

@@ -23,7 +23,7 @@
public class NativeMemoryPool implements AutoCloseable {
private final long nativeInstanceId;

static {
private static void ensureLoaded() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to inline this instead?

@jackylee-ch
Copy link
Contributor Author

Could you elaborate on

When the specification is turned on

?

Thanks for your attension.
When spark speculation is turned on, the spark program allows the user's logic to be executed in parelle, which means there will be two same logic running at the same time. spark will interrupt the unfinished task once one of them finished and then run other tasks.

Back to current PR, if it is killed when calling JniLoader.get.ensureLoaded in NativeMemoryPool initlization, it will cause the NativeMemoryPool to never be initialized successfully in the subsequent execution process.

@pitrou
Copy link
Member

pitrou commented May 13, 2022

cc @lwhite1 , who has started working on various JNI aspects.

@pitrou pitrou changed the title ARROW-16566: [JAVA] Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool ARROW-16566: [Java] Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool May 16, 2022
@pitrou
Copy link
Member

pitrou commented Jun 7, 2022

@jackylee-ch Would you like to follow up on @lidavidm 's suggestion above?

@jackylee-ch
Copy link
Contributor Author

@jackylee-ch Would you like to follow up on @lidavidm 's suggestion above?

Sure.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@lidavidm lidavidm changed the title ARROW-16566: [Java] Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool ARROW-16566: [Java] Initialize JNI components on use instead of statically Jun 8, 2022
@lidavidm lidavidm merged commit 982960a into apache:master Jun 8, 2022
@jackylee-ch jackylee-ch deleted the fix_jni_load_failed branch June 9, 2022 00:32
@jackylee-ch
Copy link
Contributor Author

Thanks for your help @lidavidm @pitrou

jackylee-ch added a commit to jackylee-ch/arrow that referenced this pull request Jun 23, 2022
…cally (apache#13146)

We are using Arrow in Spark. When speculation is turned on, it is possible for the task to be forcibly killed during the running process. It means, `JniLoader.get().ensureLoaded()` maybe failed with `ClosedByInterruptException` when the task was killed, and it should be accepted with this.
However, `NativeMemoryPool` use it as a static code, this leads to an init failure every time the NativeMemroyPool is trying to initialized and can never succeed. It would always fail to init `NativeMemoryPool` with `java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool`.

Lead-authored-by: jackylee-ch <lijunqing@baidu.com>
Co-authored-by: stczwd <qcsd2011@163.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
zhouyuan pushed a commit to oap-project/arrow that referenced this pull request Jun 27, 2022
…cally (apache#13146) (#119)

We are using Arrow in Spark. When speculation is turned on, it is possible for the task to be forcibly killed during the running process. It means, `JniLoader.get().ensureLoaded()` maybe failed with `ClosedByInterruptException` when the task was killed, and it should be accepted with this.
However, `NativeMemoryPool` use it as a static code, this leads to an init failure every time the NativeMemroyPool is trying to initialized and can never succeed. It would always fail to init `NativeMemoryPool` with `java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.dataset.jni.NativeMemoryPool`.

Lead-authored-by: jackylee-ch <lijunqing@baidu.com>
Co-authored-by: stczwd <qcsd2011@163.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants