#613 Add retries when executing count queries with the Hive JDBC driver.#614
Conversation
WalkthroughThe codebase refactors JDBC count query logic by moving native count methods from Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant TableReaderJdbcNative
participant TableReaderJdbcBase
participant JdbcNativeUtils
Caller->>TableReaderJdbcNative: getRecordCount(query)
alt Query is Table
TableReaderJdbcNative->>TableReaderJdbcBase: getCountForTableNatively(...)
TableReaderJdbcBase->>JdbcNativeUtils: withResultSet (uses executeQuery with retry)
else Query is SQL (starts with select)
TableReaderJdbcNative->>TableReaderJdbcBase: getCountForSql(...)
TableReaderJdbcBase->>JdbcNativeUtils: withResultSet (uses executeQuery with retry)
else Other SQL
TableReaderJdbcNative->>TableReaderJdbcBase: getCountForTableNatively(...)
TableReaderJdbcBase->>JdbcNativeUtils: withResultSet (uses executeQuery with retry)
end
JdbcNativeUtils-->>TableReaderJdbcBase: ResultSet with count
TableReaderJdbcBase-->>TableReaderJdbcNative: count
TableReaderJdbcNative-->>Caller: count
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes found. Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala (2)
25-25: Prefer explicit imports over wildcard imports.Wildcard imports can lead to naming conflicts and make it harder to track dependencies.
Replace the wildcard import with explicit imports:
-import java.sql._ +import java.sql.{Connection, DriverManager, ResultSet, SQLException, Statement}
134-146: Consider making the error pattern configurable.The retry logic correctly handles the intermittent Hive JDBC driver issue. However, the hardcoded error message "Index: 1, Size: 1" is brittle and could break if the driver's error message changes.
Consider making the error pattern configurable or at least defining it as a constant:
private val HIVE_JDBC_RETRY_ERROR_PATTERN = "Index: 1, Size: 1" // Then in the catch block: case ex: SQLException if retriesLeft > 0 && ex.getMessage.contains(HIVE_JDBC_RETRY_ERROR_PATTERN) =>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala(1 hunks)pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala(2 hunks)pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcNative.scala(1 hunks)pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala(3 hunks)pramen/core/src/test/scala/za/co/absa/pramen/core/tests/reader/TableReaderJdbcNativeSuite.scala(0 hunks)
💤 Files with no reviewable changes (1)
- pramen/core/src/test/scala/za/co/absa/pramen/core/tests/reader/TableReaderJdbcNativeSuite.scala
🧰 Additional context used
🧠 Learnings (3)
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala (1)
Learnt from: yruslan
PR: AbsaOSS/pramen#611
File: pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala:29-29
Timestamp: 2025-06-18T08:27:21.504Z
Learning: In pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala, the constant variable name was changed from DATE_UNTIL_EXPR_KEY to DATE_TO_EXPR_KEY, but both constants hold the same string value "date.to". This is a variable name refactoring, not a configuration key change, so it doesn't affect backward compatibility.
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala (1)
Learnt from: yruslan
PR: AbsaOSS/pramen#611
File: pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala:29-29
Timestamp: 2025-06-18T08:27:21.504Z
Learning: In pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala, the constant variable name was changed from DATE_UNTIL_EXPR_KEY to DATE_TO_EXPR_KEY, but both constants hold the same string value "date.to". This is a variable name refactoring, not a configuration key change, so it doesn't affect backward compatibility.
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala (1)
Learnt from: yruslan
PR: AbsaOSS/pramen#611
File: pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala:29-29
Timestamp: 2025-06-18T08:27:21.504Z
Learning: In pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala, the constant variable name was changed from DATE_UNTIL_EXPR_KEY to DATE_TO_EXPR_KEY, but both constants hold the same string value "date.to". This is a variable name refactoring, not a configuration key change, so it doesn't affect backward compatibility.
🧬 Code Graph Analysis (1)
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala (2)
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala (7)
core(79-79)core(81-103)core(105-119)core(121-137)core(139-141)core(143-209)core(211-217)pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala (2)
core(208-214)core(226-241)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Test Spark 3.3.4 on Scala 2.13.16
- GitHub Check: Test Spark 3.5.5 on Scala 2.13.16
- GitHub Check: Test Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Test Spark 3.5.5 on Scala 2.12.20
- GitHub Check: Test Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Test Spark 3.3.4 on Scala 2.12.20
- GitHub Check: Test Spark 3.4.4 on Scala 2.13.16
- GitHub Check: Test Coverage on Scala 2.12.18
🔇 Additional comments (4)
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcNative.scala (1)
52-62: Well-structured query type handling.The refactored pattern matching clearly separates the handling of different query types, improving code clarity and maintainability. The logging for non-SELECT SQL statements is particularly helpful for debugging.
pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala (1)
96-111: Robust count query execution with proper validation.The implementation correctly validates the count query results and provides clear error messages. The use of
JdbcNativeUtils.withResultSetensures the count queries benefit from the new retry logic.pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala (2)
25-25: Good cleanup of unused import.The removal of
JdbcConfigfrom the import statement is appropriate after moving the native count methods to the base class.
40-56: Let’s search the entire repo for those method definitions (and their visibility modifiers):#!/bin/bash # Search for getCountForTableNatively across all Scala files rg -n 'def getCountForTableNatively' -g '*.scala' # Search for getCountForSql across all Scala files rg -n 'def getCountForSql' -g '*.scala'
Unit Test Coverage
Files
|
There was a problem hiding this comment.
Pull Request Overview
This PR enhances the resilience of JDBC count queries by adding retry logic to handle intermittent Hive JDBC driver exceptions and refactors count query logic for improved clarity and maintainability.
- Introduces retry logic in JdbcNativeUtils to handle specific SQLException messages.
- Refactors count query handling in TableReaderJdbcNative and TableReaderJdbcBase.
- Removes deprecated methods and tests no longer required.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pramen/core/src/test/scala/za/co/absa/pramen/core/tests/reader/TableReaderJdbcNativeSuite.scala | Removed test for deprecated getSqlExpression. |
| pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcNativeUtils.scala | Added executeQuery retry logic and refactored exception handling. |
| pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcNative.scala | Updated count query logic to incorporate new retry paths. |
| pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbcBase.scala | Extracted count query logic and introduced helper methods for count queries. |
| pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala | Removed redundant count query methods to delegate functionality to TableReaderJdbcBase. |
Closes #613
Summary by CodeRabbit
Refactor
Tests