
#757 Add support for dynamically loaded JDBC drivers and ensure connections are reused in JDBC Native#760

Open
yruslan wants to merge 5 commits into main from feature/757-add-support-for-dynamically-loaded-jdbc-drivers

Conversation

@yruslan
Collaborator

@yruslan yruslan commented May 26, 2026

Closes #757

Summary by CodeRabbit

  • New Features

    • Resource-managed table readers (closeable) and support for external JDBC driver JARs.
  • Improvements

    • Broader Delta schema error recovery to handle more failure cases.
    • Updated JDBC connection selection and retry behavior for more reliable connections.
    • JDBC operations now use selector-based workflows and ensure readers are closed after use.
  • Chores

    • Project version bumped to 1.14.0-SNAPSHOT.


@coderabbitai
Contributor

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 23188789-5128-452d-9d57-c83900fd320c

📥 Commits

Reviewing files that changed from the base of the PR and between ed9e9b2 and f17ff94.

📒 Files selected for processing (6)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/DynamicDriver.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelectorImpl.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/impl/ResultSetToRowIterator.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/JdbcNativeUtilsSuite.scala
✅ Files skipped from review due to trivial changes (1)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/DynamicDriver.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelectorImpl.scala

Walkthrough

Add dynamic JDBC driver loading and selector-driven connection management; refactor JDBC utilities/readers to accept a driver-aware JdbcUrlSelector, introduce AutoCloseable contracts for readers/iterators, ensure readers are closed, expand Delta AnalysisException checks, and bump project versions to 1.14.0-SNAPSHOT.

Changes

JDBC Driver Support and Resource Lifecycle

| Layer | File(s) | Summary |
|---|---|---|
| AutoCloseable contracts and DynamicDriver | pramen/api/src/main/scala/za/co/absa/pramen/api/TableReader.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/reader/DynamicDriver.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/utils/impl/ResultSetToRowIterator.scala | Add AutoCloseable to TableReader and ResultSetToRowIterator, and introduce DynamicDriver as a holder for a Driver plus its URLClassLoader. |
| JdbcUrlSelector redesign with driver support and caching | pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelectorImpl.scala | Replace retry-parameterized working methods with getWorkingUrl, cached getConnection, and getNewConnection(retriesLeft). Add optional jdbcDriverJarPath, loadedDriver, companion apply overload, and loadDriver logic; implement cached connection lifecycle and close behavior. |
| JdbcNativeUtils refactor (driver-aware, selector-based) | pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala | Public methods now accept JdbcUrlSelector; getJdbcConnection accepts optional Driver; getJdbcNativeDataFrame/getJdbcNativeRecordCount use selector-based flows; withResultSet derives retries from selector and uses loop-based retry. |
| JdbcSparkUtils metadata helper | pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala | The withJdbcMetadata overload now accepts JdbcUrlSelector and delegates metadata access to selector-managed connection handling (no explicit close in this overload). |
| JDBC readers adopt selector and lifecycle | pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala, .../TableReaderJdbcNative.scala, .../TableReaderJdbc.scala, .../TableReaderSpark.scala | Readers now obtain connections via the selector (getNewConnection/getConnection), use selector-based JdbcNative utilities and metadata helpers, and override close() to delegate to the selector where applicable; TableReaderJdbcNative adds DRIVER_JAR_PATH and reads driver JAR config. |
| JdbcSource lifecycle and driver JAR config | pramen/core/src/main/scala/za/co/absa/pramen/core/source/JdbcSource.scala | getRecordCount, getData, and getDataIncremental wrap reader calls in try/finally to always close readers; getReader reads optional DRIVER_JAR_PATH and constructs JdbcUrlSelector(jdbcDriverJarPath, jdbcConfig). |
| Enhanced Delta schema migration detection | pramen/core/src/main/scala/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaPath.scala, .../BookkeeperDeltaTable.scala | getBkDf now treats AnalysisException messages containing either “cannot resolve” or “does not exist” as triggers to call migrateModel() and retry reading the records table. |
| Connection selection strategy updates | pramen/core/src/main/scala/za/co/absa/pramen/core/rdb/PramenDb.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/rdb/RdbJdbc.scala, pramen/core/src/main/scala/za/co/absa/pramen/core/utils/hive/QueryExecutorJdbc.scala | Switch call sites from getWorkingConnection(...) to getNewConnection(...) when obtaining (Connection, url) pairs. |
| Maven version bumps | pramen/pom.xml, pramen/api/pom.xml, pramen/core/pom.xml, pramen/extras/pom.xml | Update project/parent versions to 1.14.0-SNAPSHOT. |
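
The JdbcNativeUtils row notes that withResultSet now takes its retry count from the selector and retries in a loop. A minimal Scala sketch of that loop-based pattern follows; `RetrySketch.withRetries` is a hypothetical helper written for illustration, not Pramen's API, and it assumes every non-fatal exception is worth retrying.

```scala
import scala.util.control.NonFatal

// Hypothetical helper illustrating loop-based retries; not part of Pramen.
object RetrySketch {
  /** Runs `action` once, then retries up to `retries` more times on non-fatal failures.
    * The last failure is rethrown when no retries are left. */
  def withRetries[T](retries: Int)(action: => T): T = {
    require(retries >= 0, "retries must be non-negative")
    var retriesLeft = retries
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(action)
      } catch {
        // The guard lets the final failure propagate unchanged.
        case NonFatal(_) if retriesLeft > 0 =>
          retriesLeft -= 1
      }
    }
    result.get
  }
}
```

Compared with recursion, the loop keeps the stack flat regardless of the retry count; a JDBC version would also need to release whatever a failed attempt opened before trying again.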

Test Updates and Version Alignment

| Layer | File(s) | Summary |
|---|---|---|
| Test imports & helpers for JdbcUrlSelector | pramen/core/src/test/.../IncrementalPipelineJdbcLongSuite.scala, .../IncrementalPipelineLongFixture.scala, .../JdbcUrlSelectorImplSuite.scala | Replace JdbcUrlSelectorImpl direct constructions with JdbcUrlSelector(...) factory usage and remove obsolete numeric retries args from withResultSet calls. |
| JdbcNativeUtilsSuite and iterator constructor updates | pramen/core/src/test/.../utils/JdbcNativeUtilsSuite.scala | Rewire tests to use selector-based overloads, pass optional None driver where required, and update many ResultSetToRowIterator constructor usages to the new signature. |
| JdbcSparkUtilsSuite and QueryExecutorJdbcSuite updates | pramen/core/src/test/.../utils/JdbcSparkUtilsSuite.scala, .../utils/hive/QueryExecutorJdbcSuite.scala | Use JdbcUrlSelector in metadata tests and adapt query-executor retry tests to mock/stub getNewConnection(...). |
| Driver JAR config testing | pramen/core/src/test/.../reader/TableReaderJdbcNativeSuite.scala | Add reader_jar test config with driver.jar.path and a factory test asserting selector JAR path and derived settings. |
| Test mock reader close() implementations | pramen/core/src/test/.../mocks/ReaderSpy.scala, ReaderStub.scala, pramen/extras/src/test/.../mocks/ReaderSpy.scala | Add no-op close() overrides to test reader mocks to match AutoCloseable contract. |
| Test cleanup helper | pramen/core/src/test/.../journal/JournalHadoopDeltaTableSuite.scala | Replace non-recursive deletion with a recursive deleteRecursively(File) helper for spark-warehouse cleanup. |
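
The last row mentions a recursive deleteRecursively(File) helper for cleaning up spark-warehouse. A plausible shape for such a helper is sketched below; `FsCleanup` is a hypothetical name and this is not the test suite's actual code. It assumes plain local files and deliberately does not descend into symbolic links.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical cleanup helper, similar in intent to the test suite's deleteRecursively(File).
object FsCleanup {
  /** Deletes `file` and, if it is a real directory, everything beneath it.
    * Returns true when nothing is left at that path. */
  def deleteRecursively(file: File): Boolean = {
    if (file.isDirectory && !Files.isSymbolicLink(file.toPath)) {
      // listFiles() returns null on I/O errors, so treat that as "no children".
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    file.delete() || !file.exists()
  }
}
```

A non-recursive `File.delete()` fails on a non-empty directory, which is why the children have to be removed first.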

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AbsaOSS/pramen#753: Overlaps on changes to JDBC driver-loading and driver acquisition logic in JdbcNativeUtils and selector wiring.
  • AbsaOSS/pramen#614: Related retry and JDBC count/query handling refactors that touch similar native JDBC retry logic.
  • AbsaOSS/pramen#628: Overlaps with ResultSetToRowIterator constructor and iterator behavior changes.

"I'm a rabbit with a tiny load,
I hop to jars along the road.
Drivers loaded, connections neat,
Close calls tidy—no resource feat.
Hooray for selectors kept in code!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding support for dynamically loaded JDBC drivers and ensuring connection reuse in JDBC Native. |
| Linked Issues check | ✅ Passed | The PR implementation fully addresses issue #757 by enabling dynamic JDBC driver loading, implementing dedicated class loaders, and ensuring connections use the corresponding class loader. |
| Out of Scope Changes check | ✅ Passed | All changes align with the scope of issue #757; version bumps and AutoCloseable implementations support the core objective of dynamic driver loading and resource management. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pramen/api/src/main/scala/za/co/absa/pramen/api/TableReader.scala`:
- Line 24: The TableReader trait was changed to extend AutoCloseable but doesn't
provide a default close implementation, breaking downstream implementations; add
a default no-op override def close(): Unit = () in the TableReader trait so
existing external implementations remain binary/source compatible and
resource-owning readers can still override close() as needed, referencing the
TableReader trait and its close() method for where to add this default.

In
`@pramen/core/src/main/scala_2.12/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaTable.scala`:
- Line 57: The AnalysisException pattern in BookkeeperDeltaTable (line with case
ex: AnalysisException) is too broad and can match "table/database does not
exist" errors; narrow the match to only column/field resolution failures (e.g.,
check for "cannot resolve" or a message pattern like "column .* does not exist")
or, better, proactively verify the table exists before calling migrateModel()
using Spark's catalog (spark.catalog.tableExists) to avoid attempting
SaveMode.Append on a missing table; update the match in the exception handler
and/or add a table existence check in init()/migrateModel() so migrateModel()
only runs when the target table is present.

In
`@pramen/core/src/main/scala_2.13/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaTable.scala`:
- Around line 57-60: The current catch in BookkeeperDeltaTable matches
AnalysisException messages containing "does not exist", which is too broad and
can catch missing-table/database errors and lead migrateModel() to attempt
SaveMode.Append on a non-existent table; update the handling in the code that
calls migrateModel()/spark.table(recordsFullTableName) to either (a) narrow the
pattern to column-resolution errors (e.g. match "cannot resolve" or messages
that mention "column ... does not exist") or (b) check
spark.catalog.tableExists(recordsFullTableName) before calling migrateModel(),
and only call migrateModel() when the table exists but the schema indicates
missing columns (use the BookkeeperDeltaTable init/migrateModel() flow
accordingly so you don’t mask genuine missing-table/database errors).

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaPath.scala`:
- Line 63: The catch in BookkeeperDeltaPath currently matches AnalysisException
messages containing the generic phrase "does not exist", which can accidentally
swallow unrelated errors (e.g., missing table or path) before migrateModel()
runs; update the match in the exception handler in BookkeeperDeltaPath (the
AnalysisException branch used around migrateModel/init logic) to only target
schema/column resolution failures (e.g., check ex.getMessage contains "cannot
resolve" or a case-insensitive regex like ".*column.*does not exist.*") or add
an explicit path existence check (call the path existence check used in init()
before performing migrateModel()) so that non-schema errors are not
misclassified and are propagated instead of being masked.

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala`:
- Around line 71-83: The loadDriver method in JdbcUrlSelector currently creates
a URLClassLoader and returns only the Driver, leaking the loader; modify
loadDriver to return/encapsulate both the Driver and its URLClassLoader (e.g., a
small Closeable wrapper class like DriverWithClassLoader), update callers to
store that wrapper, and ensure JdbcUrlSelectorImpl.close() calls loader.close()
(via wrapper.close()) in addition to closing the JDBC Connection so the jar
classloader is properly released.

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelectorImpl.scala`:
- Around line 120-125: The cached JDBC connection is replaced without closing
the old one; before assigning connection = newConnection in the branch that
checks connection validity, attempt to close the existing connection in a
best-effort block: if connection != null then try connection.close() (guarding
with !connection.isClosed if desired) and ignore/log any SQLException, then
proceed to assign the new handle from getNewConnection(...) and register it with
ThreadClosableRegistry.registerCloseable(connection); perform the close inside a
try/catch so failures don't prevent creating/assigning the new connection.

In
`@pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala`:
- Around line 177-184: getResultSet(jdbcConfig: JdbcConfig, url: String, query:
String, jdbcDriverJarPath: Option[String]) and its overload
getResultSet(connection, jdbcConfig, query) currently return only a ResultSet
which leaks the owning Statement and Connection; change them to return a
closeable resource object (e.g., JdbcResultSetWrapper or similar) that holds the
Connection, Statement and ResultSet and implements AutoCloseable/Closeable so its
close() closes rs, stmt, and conn in the correct order; update both getResultSet
overloads to construct and return this wrapper (the first to open the Connection
and pass it into the second), and update all callers to use the new wrapper with
try-with-resources / try-finally so the whole (Connection, Statement, ResultSet)
stack is always closed.
- Around line 121-130: The current for-comprehension scopes the selector-owned
Connection into the Using/cleanup logic (via "connection <- conn"), causing the
cached connection to be closed; change withResultSet so it does NOT put the
selector-owned Connection into the Using scope: obtain the connection reference
directly from jdbcUrlSelector.getConnection (e.g. assign the returned conn to a
plain val) and only wrap/using the Statement and ResultSet (created via
connection.createStatement and executeQuery) so that Statement/ResultSet are
closed but the selector retains ownership and lifetime of the Connection; keep
references to executeQuery and action unchanged.
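
Two of the comments above concern connection ownership: the selector should close a stale cached connection before replacing it, and withResultSet should leave that connection open. A minimal sketch of the caching side follows, using a hypothetical generic `CachedResource` rather than JdbcUrlSelectorImpl's actual code. For JDBC, `open` would create a connection and `isUsable` might check `!c.isClosed && c.isValid(timeoutSeconds)`.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of a selector-owned cache; not JdbcUrlSelectorImpl's real code.
final class CachedResource[R <: AutoCloseable](open: () => R, isUsable: R => Boolean)
  extends AutoCloseable {

  private var current: Option[R] = None

  /** Returns the cached resource, replacing it if it is no longer usable. */
  def get(): R = synchronized {
    current match {
      case Some(r) if isUsable(r) => r
      case stale =>
        // Close the stale handle first (best effort) so it is not leaked.
        stale.foreach(closeQuietly)
        current = None
        val fresh = open()
        current = Some(fresh)
        fresh
    }
  }

  /** The owner, not individual queries, closes the cached resource. */
  override def close(): Unit = synchronized {
    current.foreach(closeQuietly)
    current = None
  }

  // A failing close must not prevent a reconnect.
  private def closeQuietly(r: R): Unit =
    try r.close() catch { case NonFatal(_) => () }
}
```

Query code would take the connection from `get()` and close only its own Statement and ResultSet, leaving the cached connection for the owner's `close()`.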
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6e38d49c-e309-4210-82c7-f85aae4726a9

📥 Commits

Reviewing files that changed from the base of the PR and between 1383679 and ed9e9b2.

📒 Files selected for processing (32)
  • pramen/api/pom.xml
  • pramen/api/src/main/scala/za/co/absa/pramen/api/TableReader.scala
  • pramen/core/pom.xml
  • pramen/core/src/main/scala/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaPath.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/rdb/PramenDb.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/rdb/RdbJdbc.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelectorImpl.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcNative.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderSpark.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/source/JdbcSource.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/hive/QueryExecutorJdbc.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/impl/ResultSetToRowIterator.scala
  • pramen/core/src/main/scala_2.12/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaTable.scala
  • pramen/core/src/main/scala_2.13/za/co/absa/pramen/core/bookkeeper/BookkeeperDeltaTable.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/integration/IncrementalPipelineJdbcLongSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/integration/IncrementalPipelineLongFixture.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/reader/ReaderSpy.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/reader/ReaderStub.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/journal/JournalHadoopDeltaTableSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/reader/JdbcUrlSelectorImplSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/reader/TableReaderJdbcNativeSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/JdbcNativeUtilsSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/JdbcSparkUtilsSuite.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/hive/QueryExecutorJdbcSuite.scala
  • pramen/extras/pom.xml
  • pramen/extras/src/test/scala/za/co/absa/pramen/extras/mocks/ReaderSpy.scala
  • pramen/pom.xml

import java.time.LocalDate

-trait TableReader {
+trait TableReader extends AutoCloseable {

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo_root="."
target="pramen/api/src/main/scala/za/co/absa/pramen/api/TableReader.scala"

echo "== File: $target =="
sed -n '1,120p' "$target" | nl -ba

echo
echo "== Find implementations/uses of TableReader in repo =="
rg -n --hidden --no-ignore-vcs "extends\s+TableReader\b|with\s+TableReader\b|TableReader\s*(" pramen -S || true

echo
echo "== Find any existing close() methods on TableReader implementers =="
rg -n --hidden --no-ignore-vcs "\bclass\b|\btrait\b|object\s" pramen/api/src/main/scala/za/co/absa/pramen/api -S || true

Repository: AbsaOSS/pramen

Length of output: 197


🏁 Script executed:

#!/bin/bash
set -euo pipefail

target="pramen/api/src/main/scala/za/co/absa/pramen/api/TableReader.scala"

echo "== Confirm path exists =="
ls -la "$target" || true

echo
echo "== Show TableReader.scala (first 120 lines) =="
if [ -f "$target" ]; then
  awk '{printf "%6d\t%s\n", NR, $0}' "$target" | sed -n '1,120p'
else
  echo "File not found: $target"
fi

echo
echo "== Locate any other TableReader.scala files (in case path differs) =="
fd -t f -a TableReader.scala . || true

echo
echo "== Find repo usages/implementations of TableReader =="
rg -n --hidden --no-ignore-vcs "\bextends\s+TableReader\b|\bwith\s+TableReader\b|\bTableReader\s*\[" pramen -S || true

echo
echo "== Find any close() implementations in API module (TableReader related) =="
rg -n --hidden --no-ignore-vcs "\bdef\s+close\s*\(" pramen/api/src/main/scala -S || true

echo
echo "== Find close() signatures on TableReader itself =="
rg -n --hidden --no-ignore-vcs "trait\s+TableReader\b|AutoCloseable|override\s+def\s+close\s*\(" "$target" pramen/api/src/main/scala -S || true

Repository: AbsaOSS/pramen

Length of output: 3837


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/reader/ReaderSpy.scala
  pramen/core/src/test/scala/za/co/absa/pramen/core/mocks/reader/ReaderStub.scala
  pramen/extras/src/test/scala/za/co/absa/pramen/extras/mocks/ReaderSpy.scala
  pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderSpark.scala
  pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala
)

echo "== close() occurrences in TableReader implementations =="
for f in "${files[@]}"; do
  echo
  echo "--- $f ---"
  if [ -f "$f" ]; then
    rg -n "\bdef\s+close\s*\(" "$f" || true
  else
    echo "MISSING: $f"
  fi
done

echo
echo "== Show TableReaderSpark / JdbcBase definitions around class/trait header + close if present =="
for f in \
  pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderSpark.scala \
  pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala
do
  echo
  echo "--- $f (context) ---"
  sed -n '1,120p' "$f" | awk '{printf "%4d\t%s\n", NR, $0}' | sed -n '1,120p'
done

Repository: AbsaOSS/pramen

Length of output: 12719


Adding AutoCloseable to TableReader forces an abstract close() contract downstream (binary/source break)

za.co.absa.pramen.api.TableReader now extends AutoCloseable (line 24) but doesn’t define a default close(), so external implementations that don’t already implement def close(): Unit will fail to compile (and can hit AbstractMethodError at runtime when close() is invoked). Provide a default no-op override def close(): Unit = () in the trait and let resource-owning readers override it.
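
A minimal sketch of the suggested default follows. It uses a hypothetical `ReaderContractSketch.Reader` trait in place of Pramen's TableReader (the `read()` method is invented for illustration). An implementation that predates AutoCloseable keeps compiling, while a resource-owning reader overrides close():

```scala
// Hypothetical stand-ins; not Pramen's actual TableReader API.
object ReaderContractSketch {
  trait Reader extends AutoCloseable {
    def read(): Seq[String]

    /** No-op default so implementations written before AutoCloseable was added still compile. */
    override def close(): Unit = ()
  }

  /** An implementation that never defines close(). */
  class LegacyReader extends Reader {
    override def read(): Seq[String] = Seq("a", "b")
  }

  /** A reader that owns a resource and therefore overrides close(). */
  class ResourceOwningReader extends Reader {
    @volatile var closed = false

    override def read(): Seq[String] = {
      require(!closed, "reader is already closed")
      Seq("x")
    }

    override def close(): Unit = closed = true
  }
}
```

Callers can then close every reader unconditionally in a `finally` block, which matches how the summary describes JdbcSource wrapping its reader calls.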


spark.table(recordsFullTableName).as[DataChunk]
} catch {
-case ex: AnalysisException if ex.getMessage().contains("cannot resolve") =>
+case ex: AnalysisException if ex.getMessage().contains("cannot resolve") || ex.getMessage().contains("does not exist") =>

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Overly broad error pattern matching.

The pattern "does not exist" is generic and may match errors beyond schema validation failures, such as:

  • "Table does not exist" (missing table entirely)
  • "Database does not exist" (catalog issue)

While init() creates the records table on construction, external deletion or catalog issues could still trigger these non-schema errors. If that occurs, migrateModel() will fail with SaveMode.Append on a non-existent table, masking the original error with a confusing secondary error.

Consider a more specific pattern that targets column/field resolution failures:

case ex: AnalysisException if ex.getMessage().contains("cannot resolve") ||
                              ex.getMessage().toLowerCase.matches("(?s).*column.*does not exist.*") =>

Alternatively, verify the table exists before attempting migration using spark.catalog.tableExists().
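
The narrower check suggested above could be factored into a small predicate, sketched here with a hypothetical `SchemaErrorMatcher` object. Two assumptions: the message is lower-cased before matching, and `(?s)` is added because `String.matches` must match the entire message while `.` does not cross line breaks by default, so a multi-line AnalysisException message would otherwise never match. Message wording varies across Spark versions, so this remains a heuristic.

```scala
import java.util.Locale

// Hypothetical helper; the PR keeps this check inline in the catch clause.
object SchemaErrorMatcher {
  // (?s) lets '.' span newlines; the whole message must match, as with String.matches.
  private val ColumnDoesNotExist = "(?s).*column.*does not exist.*".r.pattern

  /** True when the message looks like a column or field resolution failure,
    * as opposed to a missing table, database or path. */
  def isColumnResolutionError(message: String): Boolean = {
    val msg = Option(message).getOrElse("").toLowerCase(Locale.ROOT)
    msg.contains("cannot resolve") || ColumnDoesNotExist.matcher(msg).matches()
  }
}
```

In the bookkeeper this would read `case ex: AnalysisException if SchemaErrorMatcher.isColumnResolutionError(ex.getMessage) =>`.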


Comment on lines +57 to +60
case ex: AnalysisException if ex.getMessage().contains("cannot resolve") || ex.getMessage().contains("does not exist") =>
// Spark 2 and 3
migrateModel()
spark.table(recordsFullTableName).as[DataChunk]

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Overly broad error pattern matching.

The pattern "does not exist" is generic and may match errors beyond schema validation failures, such as:

  • "Table does not exist" (missing table entirely)
  • "Database does not exist" (catalog issue)

While init() creates the records table on construction, external deletion or catalog issues could still trigger these non-schema errors. If that occurs, migrateModel() will fail with SaveMode.Append on a non-existent table, masking the original error with a confusing secondary error.

Consider a more specific pattern that targets column/field resolution failures:

case ex: AnalysisException if ex.getMessage().contains("cannot resolve") ||
                              ex.getMessage().toLowerCase.matches("(?s).*column.*does not exist.*") =>

Alternatively, verify the table exists before attempting migration using spark.catalog.tableExists().
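
The table-existence alternative can be sketched as a guard around the read. `MigrationGuard.readWithMigration` is a hypothetical helper: the existence check and the schema-error test are passed in as functions, where Spark code might supply `spark.catalog.tableExists(name)` and an AnalysisException message check. Migration runs only when both hold; otherwise the original exception propagates unchanged.

```scala
// Hypothetical helper; the names and parameters are assumptions for illustration.
object MigrationGuard {
  /** Reads once; on a schema-related failure against an existing table, migrates and reads again.
    * Any other failure, including a missing table, is rethrown as-is. */
  def readWithMigration[T](tableName: String,
                           tableExists: String => Boolean,
                           isSchemaError: Throwable => Boolean)
                          (read: () => T, migrate: () => Unit): T =
    try {
      read()
    } catch {
      case ex: Exception if isSchemaError(ex) && tableExists(tableName) =>
        migrate()
        read()
    }
}
```

Because the guard consults the catalog only after a failure, the happy path pays no extra cost, and a genuinely missing table surfaces its own error instead of a secondary SaveMode.Append failure.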


load()
} catch {
-case ex: AnalysisException if ex.getMessage().contains("cannot resolve") =>
+case ex: AnalysisException if ex.getMessage().contains("cannot resolve") || ex.getMessage().contains("does not exist") =>

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Overly broad error pattern matching.

The pattern "does not exist" is generic and may match errors beyond schema validation failures, such as:

  • "Table does not exist" (missing table entirely)
  • "Path does not exist" (filesystem issue)

While init() creates the records path on construction, external deletion or race conditions could still trigger these non-schema errors. If that occurs, migrateModel() will fail with SaveMode.Append on a non-existent path, masking the original error with a confusing secondary error.

Consider a more specific pattern that targets column/field resolution failures:

case ex: AnalysisException if ex.getMessage().contains("cannot resolve") ||
                              ex.getMessage().toLowerCase.matches("(?s).*column.*does not exist.*") =>

Alternatively, verify the path exists before attempting migration.


Comment thread pramen/core/src/main/scala/za/co/absa/pramen/core/reader/JdbcUrlSelector.scala Outdated
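
This collapsed thread on JdbcUrlSelector.scala corresponds to the inline comment about loadDriver leaking its URLClassLoader. A minimal sketch of a holder that owns both the driver and its loader follows; `LoadedDriver`, `load` and `connect` are hypothetical names in the spirit of the PR's DynamicDriver, not its actual code. Connections go through `Driver.connect` directly, since `DriverManager.getConnection` only returns drivers whose classes are visible to the caller's class loader.

```scala
import java.io.File
import java.net.URLClassLoader
import java.sql.{Connection, Driver, SQLException}
import java.util.Properties

// Hypothetical holder in the spirit of the PR's DynamicDriver; names and signatures are assumptions.
final class LoadedDriver private (val driver: Driver, loader: URLClassLoader) extends AutoCloseable {
  /** Connects through the driver instance itself instead of DriverManager. */
  def connect(url: String, props: Properties): Connection = {
    val conn = driver.connect(url, props) // JDBC drivers return null for URLs they do not handle
    if (conn == null) throw new SQLException(s"Driver ${driver.getClass.getName} does not accept URL: $url")
    conn
  }

  /** Releases the JAR held open by the class loader; close this driver's connections first. */
  override def close(): Unit = loader.close()
}

object LoadedDriver {
  /** Loads `driverClassName` from the JAR at `jarPath` into its own URLClassLoader. */
  def load(jarPath: String, driverClassName: String): LoadedDriver = {
    val loader = new URLClassLoader(Array(new File(jarPath).toURI.toURL), getClass.getClassLoader)
    try {
      val driver = Class.forName(driverClassName, true, loader)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[Driver]
      new LoadedDriver(driver, loader)
    } catch {
      case ex: Throwable =>
        loader.close() // do not leak the loader when the driver cannot be instantiated
        throw ex
    }
  }
}
```

Keeping the loader next to the driver gives the selector a single object to close when it is done, after its cached connection has been closed.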
Comment on lines +177 to +184
private[core] def getResultSet(jdbcConfig: JdbcConfig,
url: String,
query: String,
jdbcDriverJarPath: Option[String]): ResultSet = {
val driverOpt = jdbcDriverJarPath.map(path => JdbcUrlSelector.loadDriver(path, jdbcConfig.driver))
val connection = getJdbcConnection(jdbcConfig, url, driverOpt)

getResultSet(connection, jdbcConfig, query)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Keep Connection and Statement ownership with the returned result set.

The overload at Lines 177-184 opens a fresh JDBC Connection, and the core helper at Lines 187-213 creates a Statement, but both return only a ResultSet. The native iterator/count paths close the ResultSet only, so executor reads leak the connection and all callers leak the statement/open cursor. This needs a closeable wrapper or iterator that owns the full (Connection, Statement, ResultSet) stack.

Also applies to: 187-213
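
One way to follow this suggestion is a wrapper that owns everything acquired for the result set and closes it in reverse order. The sketch is hypothetical (`OwnedResources` is not a Pramen class) and generic over AutoCloseable so it can be exercised without a database; for JDBC the resources would be passed in acquisition order, for example connection, statement, result set. Close failures are collected so one failing close does not prevent the others.

```scala
import scala.util.control.NonFatal

// Hypothetical wrapper; not a Pramen class.
/** Owns resources passed in acquisition order (e.g. connection, statement, result set)
  * and closes them in reverse order. Not thread-safe. */
final class OwnedResources(acquired: Seq[AutoCloseable]) extends AutoCloseable {
  private var closed = false

  override def close(): Unit = if (!closed) {
    closed = true
    var firstError: Throwable = null
    acquired.reverse.foreach { resource =>
      try resource.close()
      catch {
        // Keep closing the rest; report the first failure with later ones suppressed.
        case NonFatal(ex) =>
          if (firstError == null) firstError = ex else firstError.addSuppressed(ex)
      }
    }
    if (firstError != null) throw firstError
  }
}
```

When the connection belongs to a JdbcUrlSelector, only the statement and result set would be handed to the wrapper, so the cached connection stays open for reuse.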


@github-actions

github-actions Bot commented May 26, 2026

Unit Test Coverage

| | Coverage |
|---|---|
| Overall Project | 76.97% -0.59% 🍏 |
| Files changed | 50.27% |

| Module | Coverage |
|---|---|
| pramen:core (Jacoco Report) | 77.97% -0.65% |

Files

| Module | File | Coverage |
|---|---|---|
| pramen:core (Jacoco Report) | JdbcSource.scala | 100% 🍏 |
| | DynamicDriver.scala | 100% 🍏 |
| | JdbcUrlSelector.scala | 100% 🍏 |
| | TableReaderJdbcBase.scala | 100% -7.75% |
| | BookkeeperDeltaPath.scala | 97.36% -0.16% 🍏 |
| | JdbcSparkUtils.scala | 91.7% -1.04% |
| | BookkeeperDeltaTable.scala | 88.95% -0.14% 🍏 |
| | QueryExecutorJdbc.scala | 86.05% -0.95% 🍏 |
| | RdbJdbc.scala | 84.65% -2.63% |
| | TableReaderJdbc.scala | 82.79% 🍏 |
| | ResultSetToRowIterator.scala | 79% -21.29% |
| | TableReaderJdbcNative.scala | 74.79% -3.44% |
| | JdbcUrlSelectorImpl.scala | 72% -11.35% |
| | JdbcNativeUtils.scala | 71.4% -22% |
| | TableReaderSpark.scala | 70.58% -0.08% |
| | PramenDb.scala | 41.38% -0.43% 🍏 |


Development

Successfully merging this pull request may close these issues.

Add support for dynamically loading JDBC drivers for the JDBC Native source
