[Java][Linux] Enable GCS for Arrow Dataset #35245
Enables GCS when building the Arrow Dataset for Java and also fixes various Java build failures.

Without the changes to flight-sql-jdbc-driver/pom.xml, the flight-sql-jdbc-driver build fails with the following errors:

    [WARNING] Used undeclared dependencies found:
    [WARNING]    org.bouncycastle:bcpkix-jdk15on:jar:1.61:runtime
    [WARNING]    org.apache.arrow:arrow-memory-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.hamcrest:hamcrest:jar:2.2:runtime
    [WARNING]    org.apache.arrow:flight-sql:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.mockito:mockito-core:jar:2.25.1:test
    [WARNING]    org.apache.arrow:flight-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.slf4j:slf4j-api:jar:1.7.25:runtime
    [WARNING]    io.netty:netty-common:jar:4.1.82.Final:runtime
    [WARNING]    joda-time:joda-time:jar:2.10.14:runtime
    [WARNING]    org.apache.calcite.avatica:avatica:jar:1.18.0:runtime
    [WARNING]    com.google.protobuf:protobuf-java:jar:3.21.6:runtime
    [WARNING]    org.apache.arrow:arrow-vector:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    com.google.guava:guava:jar:31.1-jre:runtime
    [...]
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) on project flight-sql-jdbc-driver: Dependency problems found -> [Help 1]

The build also fails with:

    Caused by: java.lang.NullPointerException: Could not find test data path. Set the environment variable ARROW_TEST_DATA or the JVM property arrow.test.dataRoot.
        at java.util.Objects.requireNonNull(Objects.java:228)
        at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getTestDataRoot(FlightSqlTestCertificates.java:40)
        at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getFlightTestDataRoot(FlightSqlTestCertificates.java:51)
        at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.exampleTlsCerts(FlightSqlTestCertificates.java:60)
        at org.apache.arrow.driver.jdbc.ConnectionTlsTest.<clinit>(ConnectionTlsTest.java:59)
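For readers hitting the `NullPointerException` above: the message and the `Objects.requireNonNull` frame suggest a lookup that finds neither the `ARROW_TEST_DATA` environment variable nor the `arrow.test.dataRoot` JVM property. The following is only a rough sketch of that kind of lookup, not the actual `FlightSqlTestCertificates` code. The class name `TestDataRootSketch`, the `resolve` helper, and the order in which the two settings are checked are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

public class TestDataRootSketch {
  // Both names are taken from the error message quoted in this thread.
  static final String ENV_VAR = "ARROW_TEST_DATA";
  static final String JVM_PROPERTY = "arrow.test.dataRoot";

  // Hypothetical helper: checks the JVM property first, then the environment
  // variable. The real lookup order is not shown in the thread and is assumed.
  static String resolve(Properties props, Map<String, String> env) {
    String path = props.getProperty(JVM_PROPERTY);
    if (path == null) {
      path = env.get(ENV_VAR);
    }
    // requireNonNull throws NullPointerException with this message when
    // neither setting is present, matching the shape of the failure above.
    return Objects.requireNonNull(
        path,
        "Could not find test data path. Set the environment variable "
            + ENV_VAR + " or the JVM property " + JVM_PROPERTY + ".");
  }

  public static void main(String[] args) {
    try {
      String root = resolve(System.getProperties(), System.getenv());
      System.out.println("Test data root: " + root);
    } catch (NullPointerException e) {
      System.out.println("Not configured: " + e.getMessage());
    }
  }
}
```

Under this reading, either exporting `ARROW_TEST_DATA` before running the tests or passing `-Darrow.test.dataRoot=<path>` to the JVM should be enough to get past the exception, where `<path>` is wherever the Arrow test data is checked out locally.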
### Rationale for this change

Enables GCS when building the Arrow Dataset for Java and also fixes various Java build failures.

Currently we are using our own custom Arrow Dataset build with GCS turned on, but we would rather this be enabled in the official releases from Arrow.

GCS support is already enabled for C++, Python, Ruby, and R, so there should be no reason not to enable it for Java as well.

### What changes are included in this PR?

- Changes to enable GCS for Java Arrow Dataset on just Linux for now.
- Fixes to flight-sql-jdbc-driver/pom.xml. Without these fixes the flight-sql-jdbc-driver build will fail with the following errors:

```
[WARNING] Used undeclared dependencies found:
[WARNING]    org.bouncycastle:bcpkix-jdk15on:jar:1.61:runtime
[WARNING]    org.apache.arrow:arrow-memory-core:jar:12.0.0-SNAPSHOT:runtime
[WARNING]    org.hamcrest:hamcrest:jar:2.2:runtime
[WARNING]    org.apache.arrow:flight-sql:jar:12.0.0-SNAPSHOT:runtime
[WARNING]    org.mockito:mockito-core:jar:2.25.1:test
[WARNING]    org.apache.arrow:flight-core:jar:12.0.0-SNAPSHOT:runtime
[WARNING]    org.slf4j:slf4j-api:jar:1.7.25:runtime
[WARNING]    io.netty:netty-common:jar:4.1.82.Final:runtime
[WARNING]    joda-time:joda-time:jar:2.10.14:runtime
[WARNING]    org.apache.calcite.avatica:avatica:jar:1.18.0:runtime
[WARNING]    com.google.protobuf:protobuf-java:jar:3.21.6:runtime
[WARNING]    org.apache.arrow:arrow-vector:jar:12.0.0-SNAPSHOT:runtime
[WARNING]    com.google.guava:guava:jar:31.1-jre:runtime
[...]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) on project flight-sql-jdbc-driver: Dependency problems found -> [Help 1]
```

```
Caused by: java.lang.NullPointerException: Could not find test data path. Set the environment variable ARROW_TEST_DATA or the JVM property arrow.test.dataRoot.
    at java.util.Objects.requireNonNull(Objects.java:228)
    at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getTestDataRoot(FlightSqlTestCertificates.java:40)
    at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getFlightTestDataRoot(FlightSqlTestCertificates.java:51)
    at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.exampleTlsCerts(FlightSqlTestCertificates.java:60)
    at org.apache.arrow.driver.jdbc.ConnectionTlsTest.<clinit>(ConnectionTlsTest.java:59)
```

### Are these changes tested?

I've tested the build by running:

```
$HOME/.local/bin/archery docker run java-jni-manylinux-2014
```

I've also tested the resulting `./java/dataset/target/arrow-dataset-12.0.0-SNAPSHOT.jar` from running the command and have verified that GCS support is enabled.

### Are there any user-facing changes?

Yes, Java Arrow Dataset will now work with GCS.

* Closes: #35245

Authored-by: Henry Mai <henrymai@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Any chance this can be added to the 12.0.0 milestone, since it was merged on 4/20/2023 (which looks like the deadline for 12.0.0)? I noticed that other issues closed even later than this one made it into 12.0.0: https://github.com/apache/arrow/milestone/51?closed=1

It's too late for 12.0.0. :< https://arrow.apache.org/docs/dev/developers/release.html#creating-a-release-candidate

Alright, no problem. Thanks.
Describe the enhancement requested
Enable GCS support in Arrow Dataset for Java.
Currently we are using our own custom Arrow Dataset build with GCS turned on, but we would rather this be enabled in the official releases from Arrow.
GCS support is already enabled for C++, Python, Ruby, and R, so there should be no reason not to enable it for Java as well.
Component(s)
Java