[fix](jdbc) Add shared JDBC driver checksum loader #63676

Open
xylaaaaa wants to merge 1 commit into apache:master from xylaaaaa:codex/jdbc-driver-checksum-loader

@xylaaaaa
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: #61094

Problem Summary: Add shared FE and Java extension utilities to validate JDBC driver checksums before dynamically loading JDBC driver jars. This provides a common path for future JDBC-based connectors to verify driver jars before class loading.
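
The validate-then-load flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the class and method names are hypothetical, SHA-256 is an assumed digest algorithm (the description does not name one), and treating an empty expected checksum as "skip the comparison and return the computed value" is inferred from the unit tests quoted later in this conversation.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.security.MessageDigest;

// Hypothetical sketch: verify a driver jar's checksum before creating a class loader for it.
public class ChecksumBeforeLoadSketch {

    // Streams the resource behind the URL and returns its digest as lowercase hex.
    // SHA-256 is an assumption; the PR may use a different algorithm.
    static String computeChecksum(URL url) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = url.openStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns the computed checksum; throws if a non-empty expected value does not match.
    static String validateChecksum(URL url, String expected) throws Exception {
        String actual = computeChecksum(url);
        if (expected != null && !expected.isEmpty() && !expected.equalsIgnoreCase(actual)) {
            throw new IllegalArgumentException("The provided checksum (" + expected
                    + ") does not match the computed checksum (" + actual + ")");
        }
        return actual;
    }

    // Only builds the class loader after validation has succeeded.
    static ClassLoader loadVerified(URL url, String expected, ClassLoader parent) throws Exception {
        validateChecksum(url, expected);
        return URLClassLoader.newInstance(new URL[] {url}, parent);
    }
}
```

Note that a check of this shape does not by itself protect against the jar changing between validation and class loading; whether that matters depends on where the jar is stored.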

Release note

None

Check List (For Author)

  • Test: Unit Test
    • MAVEN_OPTS='-Dmaven.build.cache.enabled=false' ./run-fe-ut.sh --run org.apache.doris.catalog.JdbcDriverLoaderTest,org.apache.doris.common.jdbc.JdbcDriverUtilsTest,org.apache.doris.paimon.PaimonJdbcDriverUtilsTest
    • MAVEN_OPTS='-Dmaven.build.cache.enabled=false' ./run-fe-ut.sh --run org.apache.doris.catalog.JdbcDriverLoaderTest
  • Behavior changed: No
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings May 26, 2026 09:11
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for selectively bootstrapping Hive2/Hive3 datasets in the thirdparties Docker environment, and introduces JDBC driver checksum validation + driver registration helpers (with unit tests).

Changes:

  • Add Hive “bootstrap groups” helper + group list files, and wire group selection into hive data extraction/download, HDFS copy, and HQL/run.sh execution.
  • Add JdbcDriverLoader (FE) and JdbcDriverUtils (java-common) to validate driver checksums before registering JDBC drivers, with accompanying tests.
  • Add a small pipeline helper script to output bootstrap group strings from a simple CLI input.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 14 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `regression-test/pipeline/common/get-hive-bootstrap-groups.sh` | New CLI helper to emit hive bootstrap group strings. |
| `fe/fe-core/src/main/java/org/apache/doris/catalog/JdbcDriverLoader.java` | New FE-side driver checksum validation + dynamic driver registration w/ caching. |
| `fe/fe-core/src/test/java/org/apache/doris/catalog/JdbcDriverLoaderTest.java` | New tests for checksum validation and registration flow. |
| `fe/be-java-extensions/java-common/src/main/java/org/apache/doris/common/jdbc/JdbcDriverUtils.java` | New BE/java-common checksum computation + driver registration utility. |
| `fe/be-java-extensions/java-common/src/test/java/org/apache/doris/common/jdbc/JdbcDriverUtilsTest.java` | New tests for checksum validation behavior. |
| `docker/thirdparties/run-thirdparties-docker.sh` | Plumbs bootstrap groups env into hive2/hive3 startup and data preparation. |
| `docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh` | Filters which archives/downloads run based on selected bootstrap groups. |
| `docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh` | Filters run.sh/HDFS copy/HQL creation steps based on selected bootstrap groups. |
| `docker/thirdparties/docker-compose/hive/scripts/bootstrap/*.list` | New mapping lists used to categorize items as hive2_only/hive3_only. |
| `docker/thirdparties/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh` | New shared bash library for group normalization/selection logic. |
| `docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl` | Passes HIVE_BOOTSTRAP_GROUPS into hive containers. |

cd -
else
echo "${CUR_DIR}/tpch1.db exist, continue !"
BOOTSTRAP_GROUPS="$(bootstrap_normalize_groups "${HIVE_BOOTSTRAP_GROUPS:-}")"
set -e -x

. /mnt/scripts/bootstrap/bootstrap-groups.sh
BOOTSTRAP_GROUPS="$(bootstrap_normalize_groups "${HIVE_BOOTSTRAP_GROUPS:-}")"
fi

if [[ -d "${local_path}" && -z "$(ls "${local_path}")" ]]; then
echo "${local_path} does not exist"
Comment on lines +65 to +71
validateDriverChecksum(driverUrl, expectedChecksum);
} catch (DdlException e) {
throw new IllegalArgumentException(e.getMessage(), e);
}

try {
String fullDriverUrl = JdbcResource.getFullDriverUrl(driverUrl);
public class JdbcDriverUtils {
private static final Logger LOG = Logger.getLogger(JdbcDriverUtils.class);
private static final int HTTP_TIMEOUT_MS = 10000;
private static final Map<URL, ClassLoader> DRIVER_CLASS_LOADER_CACHE = new ConcurrentHashMap<>();
Comment on lines +71 to +72
ClassLoader classLoader = DRIVER_CLASS_LOADER_CACHE.computeIfAbsent(url, u ->
URLClassLoader.newInstance(new URL[] {u}, parentClassLoader));
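
One property of a `Map<URL, ...>` cache worth keeping in mind: `java.net.URL#equals` and `#hashCode` may resolve host names, so lookups can block on DNS and two URLs can compare equal based on IP address. A possible alternative, sketched below under stated assumptions, keys the cache on a normalized URI string plus the validated checksum. The class name, method name, and key format are hypothetical and are not taken from the PR.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: class loader cache keyed by string instead of java.net.URL.
public class DriverClassLoaderCacheSketch {
    private static final Map<String, ClassLoader> CACHE = new ConcurrentHashMap<>();

    // driverUri must be absolute; checksum is the already validated checksum of the jar.
    static ClassLoader getOrCreate(URI driverUri, String checksum, ClassLoader parent) {
        // Including the checksum means a replaced jar at the same location gets a new loader.
        String key = driverUri.normalize() + "#" + checksum;
        return CACHE.computeIfAbsent(key, k -> {
            try {
                return URLClassLoader.newInstance(new URL[] {driverUri.toURL()}, parent);
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Invalid driver URI: " + driverUri, e);
            }
        });
    }
}
```

As in the quoted snippet, the parent class loader is captured by whichever caller populates the entry first; later callers passing a different parent still receive the cached loader.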
Comment on lines +36 to +76
boolean oldRunningUnitTest = FeConstants.runningUnitTest;
FeConstants.runningUnitTest = false;
try {
String driverUrl = createDriverUrl();

DdlException exception = Assert.assertThrows(DdlException.class,
() -> JdbcDriverLoader.validateDriverChecksum(driverUrl, "bad-checksum"));
Assert.assertTrue(exception.getMessage().contains("does not match the computed checksum"));
} finally {
FeConstants.runningUnitTest = oldRunningUnitTest;
}
}

@Test
public void testRegisterDriverValidatesChecksumBeforeLoadingClass() throws Exception {
boolean oldRunningUnitTest = FeConstants.runningUnitTest;
FeConstants.runningUnitTest = false;
try {
String driverUrl = createDriverUrl();

IllegalArgumentException exception = Assert.assertThrows(IllegalArgumentException.class,
() -> JdbcDriverLoader.registerDriver(driverUrl,
"org.apache.doris.catalog.NotExistingDriver", "bad-checksum",
getClass().getClassLoader()));
Assert.assertTrue(exception.getMessage().contains("does not match the computed checksum"));
} finally {
FeConstants.runningUnitTest = oldRunningUnitTest;
}
}

@Test
public void testValidateDriverChecksumReturnsComputedChecksum() throws Exception {
boolean oldRunningUnitTest = FeConstants.runningUnitTest;
FeConstants.runningUnitTest = false;
try {
String driverUrl = createDriverUrl();

Assert.assertEquals(DRIVER_CHECKSUM, JdbcDriverLoader.validateDriverChecksum(driverUrl, DRIVER_CHECKSUM));
Assert.assertEquals(DRIVER_CHECKSUM, JdbcDriverLoader.validateDriverChecksum(driverUrl, ""));
} finally {
FeConstants.runningUnitTest = oldRunningUnitTest;
Comment on lines +82 to +83
Files.write(driverPath, DRIVER_BYTES.getBytes(StandardCharsets.UTF_8));
return "file://" + driverPath.toAbsolutePath();
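
Concatenating `"file://"` with an absolute path yields a valid URL on Unix-like systems, but it does not escape special characters and produces a malformed URL for Windows paths. A more portable way to build the test URL is `Path.toUri()`, sketched below. The class name and the temp-file naming are hypothetical; only the `toUri()` conversion is the point.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: build a file URL for a temporary driver jar via Path.toUri().
public class DriverUrlSketch {

    // Writes the given bytes to a temp file and returns its file: URL as a string.
    static String createDriverUrl(byte[] content) throws IOException {
        Path driverPath = Files.createTempFile("test-driver", ".jar");
        driverPath.toFile().deleteOnExit();
        Files.write(driverPath, content);
        // toUri() handles the scheme, slashes, and percent-encoding for the platform.
        return driverPath.toUri().toString();
    }
}
```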
Comment on lines +26 to +38
case "${1:-}" in
hive2)
echo "common,hive2_only"
;;
hive3)
echo "common,hive3_only"
;;
both)
echo "common,hive2_only,hive3_only"
;;
all)
echo "all"
;;
xylaaaaa force-pushed the codex/jdbc-driver-checksum-loader branch 2 times, most recently from f77eb11 to ae3fedb on May 26, 2026 09:51
### What problem does this PR solve?

Issue Number: None

Related PR: apache#61094

Problem Summary: Add shared FE and Java extension utilities to validate JDBC driver checksums before dynamically loading JDBC driver jars.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - MAVEN_OPTS='-Dmaven.build.cache.enabled=false' ./run-fe-ut.sh --run org.apache.doris.catalog.JdbcDriverLoaderTest,org.apache.doris.common.jdbc.JdbcDriverUtilsTest
- Behavior changed: No
- Does this need documentation: No
xylaaaaa force-pushed the codex/jdbc-driver-checksum-loader branch from ae3fedb to 408141c on May 26, 2026 11:26
@xylaaaaa
Contributor Author

run buildall

3 participants