Skip to content

Conversation

@zaleslaw
Copy link
Collaborator

@zaleslaw zaleslaw commented Nov 13, 2025

This commit introduces progress tracking functionality for the JDBC module, including both simple and detailed modes, which are configurable via system properties.

Added new tests, configuration class (JdbcConfig), and memory estimation utilities to ensure efficient handling of large datasets.

Closes #455

This commit introduces progress tracking functionality for the JDBC module, including both simple and detailed modes configurable via system properties. Added new tests, configuration class (`JdbcConfig`), and memory estimation utilities to ensure efficient handling of large datasets.
@zaleslaw zaleslaw requested a review from Copilot November 13, 2025 15:12
Copilot finished reviewing on behalf of zaleslaw November 13, 2025 15:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces comprehensive progress tracking and memory estimation functionality for the JDBC module, enabling safer and more observable database query operations.

Key Changes:

  • Adds configurable progress tracking (simple and detailed modes) via system properties
  • Implements memory estimation utilities to predict data loading requirements before full query execution
  • Introduces JdbcSafeDataLoading API for automatic memory limit management with configurable actions (throw, warn, or apply limit)

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
JdbcSafeLoadingTest.kt Tests for safe loading functionality including memory limit handling and callbacks
JdbcProgressTest.kt Tests for progress tracking configuration via system properties
JdbcProgressOutputTest.kt Tests for verifying progress and warning log output
readJdbc.kt Integrates progress tracker into the core data loading pipeline
progressTracker.kt Implements three progress tracker variants: NoOp, Simple, and Detailed with memory estimation
MemoryEstimator.kt Provides memory estimation using sample rows and COUNT(*) queries
MemoryEstimate.kt Data class for memory estimation results with utility methods
JdbcSafeLoading.kt Public API for safe JDBC loading with automatic memory checking and limit application
JdbcConfig.kt Configuration object for JDBC module behavior via system properties and environment variables

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +203 to +208
private fun formatBytes(bytes: Long): String = when {
bytes < 1024 -> "$bytes bytes"
bytes < 1024 * 1024 -> "%.1f KB".format(bytes / 1024.0)
bytes < 1024 * 1024 * 1024 -> "%.1f MB".format(bytes / (1024.0 * 1024.0))
else -> "%.2f GB".format(bytes / (1024.0 * 1024.0 * 1024.0))
}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatBytes function is duplicated across three files: progressTracker.kt (line 296), MemoryEstimator.kt (line 203), and JdbcSafeLoading.kt (line 414). This function should be extracted to a common utility file to follow the DRY (Don't Repeat Yourself) principle and ensure consistency.

Suggested change
private fun formatBytes(bytes: Long): String = when {
bytes < 1024 -> "$bytes bytes"
bytes < 1024 * 1024 -> "%.1f KB".format(bytes / 1024.0)
bytes < 1024 * 1024 * 1024 -> "%.1f MB".format(bytes / (1024.0 * 1024.0))
else -> "%.2f GB".format(bytes / (1024.0 * 1024.0 * 1024.0))
}

Copilot uses AI. Check for mistakes.
Comment on lines +414 to +419
private fun formatBytes(bytes: Long): String = when {
bytes < 1024 -> "$bytes bytes"
bytes < 1024 * 1024 -> "%.1f KB".format(bytes / 1024.0)
bytes < 1024 * 1024 * 1024 -> "%.1f MB".format(bytes / (1024.0 * 1024.0))
else -> "%.2f GB".format(bytes / (1024.0 * 1024.0 * 1024.0))
}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatBytes function is duplicated across three files: progressTracker.kt (line 296), MemoryEstimator.kt (line 203), and JdbcSafeLoading.kt (line 414). This function should be extracted to a common utility file to follow the DRY (Don't Repeat Yourself) principle and ensure consistency.

Suggested change
private fun formatBytes(bytes: Long): String = when {
bytes < 1024 -> "$bytes bytes"
bytes < 1024 * 1024 -> "%.1f KB".format(bytes / 1024.0)
bytes < 1024 * 1024 * 1024 -> "%.1f MB".format(bytes / (1024.0 * 1024.0))
else -> "%.2f GB".format(bytes / (1024.0 * 1024.0 * 1024.0))
}

Copilot uses AI. Check for mistakes.
dbType: DbType,
): Long {
return try {
connection.prepareStatement("SELECT COUNT(*) FROM $tableName").use { stmt ->
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tableName is directly interpolated into the SQL query string without proper identifier quoting. This could lead to SQL injection vulnerabilities or query failures for table names with special characters. Consider using dbType.quoteIdentifier(tableName) or constructing the query through dbType.buildSelectTableQueryWithLimit() and then wrapping it with COUNT(*), similar to how it's done in estimateQueryRowCount.

Suggested change
connection.prepareStatement("SELECT COUNT(*) FROM $tableName").use { stmt ->
connection.prepareStatement("SELECT COUNT(*) FROM ${dbType.quoteIdentifier(tableName)}").use { stmt ->

Copilot uses AI. Check for mistakes.

private fun estimateClobSize(clob: Clob): Long {
return try {
clob.length() * 2L
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The estimateClobSize function doesn't limit the CLOB length like estimateBlobSize does (which uses coerceAtMost(10 * 1024 * 1024)). This could result in excessively large memory estimates for very large CLOBs. Consider adding a similar constraint: clob.length().coerceAtMost(10 * 1024 * 1024) * 2L

Suggested change
clob.length() * 2L
clob.length().coerceAtMost(10 * 1024 * 1024) * 2L

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +56
* val df = JdbcSafeDataLoading.load {
* maxMemoryGb = 2.0
* onExceed = ExceedAction.THROW
*
* onEstimate = { estimate ->
* println("Estimated: ${estimate.humanReadable}")
* }
*
* readSqlTable(config, "huge_table")
* }
* ```
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example code in the documentation is incorrect. The configuration properties (maxMemoryGb, onExceed, onEstimate) should be set in a configure lambda parameter, not inside the main block. The correct usage should be:

val df = JdbcSafeDataLoading.load(
    configure = {
        maxMemoryGb = 2.0
        onExceed = ExceedAction.THROW
        onEstimate = { estimate ->
            println("Estimated: ${estimate.humanReadable}")
        }
    }
) {
    readSqlTable(config, "huge_table")
}
Suggested change
* val df = JdbcSafeDataLoading.load {
* maxMemoryGb = 2.0
* onExceed = ExceedAction.THROW
*
* onEstimate = { estimate ->
* println("Estimated: ${estimate.humanReadable}")
* }
*
* readSqlTable(config, "huge_table")
* }
* ```
* val df = JdbcSafeDataLoading.load(
* configure = {
* maxMemoryGb = 2.0
* onExceed = ExceedAction.THROW
* onEstimate = { estimate ->
* println("Estimated: ${estimate.humanReadable}")
* }
* }
* ) {
* readSqlTable(config, "huge_table")
* }

Copilot uses AI. Check for mistakes.
""".trimIndent()
)

// Insert 10000 rows (~10KB each = ~100MB total)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment incorrectly states "~10KB each = ~100MB total" but the code creates 1KB per row (VARCHAR(1000) with 1000 characters). For 10000 rows at 1KB each, the total should be approximately 10MB, not 100MB.

Suggested change
// Insert 10000 rows (~10KB each = ~100MB total)
// Insert 10000 rows (~1KB each = ~10MB total)

Copilot uses AI. Check for mistakes.
Comment on lines +296 to +301
private fun formatBytes(bytes: Long): String = when {
bytes < 1024 -> "$bytes bytes"
bytes < 1024 * 1024 -> "%.1f KB".format(bytes / 1024.0)
bytes < 1024 * 1024 * 1024 -> "%.1f MB".format(bytes / (1024.0 * 1024.0))
else -> "%.2f GB".format(bytes / (1024.0 * 1024.0 * 1024.0))
}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatBytes function is duplicated across three files: progressTracker.kt (line 296), MemoryEstimator.kt (line 203), and JdbcSafeLoading.kt (line 414). This function should be extracted to a common utility file to follow the DRY (Don't Repeat Yourself) principle and ensure consistency.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add to debug logging in the jdbc integration module the information about the data loading process

2 participants