CASSANALYTICS-36: Bulk Reader should dynamically size the Spark job based on estimated table size #118
frankgh merged 7 commits into apache:trunk
Conversation
@Test
void testDynamicSizingOption()
{
    Dataset<Row> data = bulkReaderDataFrame(tableForNullStaticColumn).option("SIZING", "dynamic").load();
Using tableForNullStaticColumn here seems to imply that this test has something to do with the null-static-column table, which isn't really the case other than the row.getString(0) below. Can we switch this to just use one of the other tables?
Yeah, good point. Let me make the change
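The option under discussion is the string-valued `"SIZING"` reader option. The dispatch it drives can be sketched without a Spark cluster; everything below (the `Sizing` enum, `parseSizing`, the case-insensitive default) is hypothetical illustration only, not the patch's actual API:

```java
import java.util.Map;

public class SizingOptionDemo
{
    // Hypothetical enum mirroring the "SIZING" option values seen in the test
    enum Sizing { DEFAULT, DYNAMIC }

    // Reader options arrive as a string map; dispatch on the "SIZING" value,
    // falling back to the default strategy when the option is absent
    static Sizing parseSizing(Map<String, String> options)
    {
        String value = options.getOrDefault("SIZING", "default");
        return "dynamic".equalsIgnoreCase(value) ? Sizing.DYNAMIC : Sizing.DEFAULT;
    }

    public static void main(String[] args)
    {
        System.out.println(parseSizing(Map.of("SIZING", "dynamic"))); // prints DYNAMIC
        System.out.println(parseSizing(Map.of()));                    // prints DEFAULT
    }
}
```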
Force-pushed 2100474 to e7ece5d
this.table = table;
this.dc = datacenter;
this.maxPartitionSize = maxPartitionSize;
this.numCores = availableCores;
Nit on my own patch: we should rename the field availableCores as well.
/**
 * @deprecated replaced by {@link #ABORTED}
 */
@Deprecated
Unrelated change; please revert.
dependencies {
    shadow(group: 'org.slf4j', name: 'slf4j-api', version: "${project.slf4jApiVersion}")
    api(project(':analytics-sidecar-vertx-client'))
    implementation(project(':analytics-sidecar-vertx-client'))
Unrelated; please revert. If it is a problem, we should fix it in another patch.
 * assume a maximum partition size of 2.5 GB. Also, assume that a consistency
 * level of 2. The number of cores is calculated by the following formula:
I do not think it is a "consistency level of 2"; it should be 3. minReplicas is the quorum of 3.
Correction: I meant to say, "the consistency level is quorum/local_quorum in this example, and RF is 3". That leads to a minReplicas of 2.
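Concretely, for RF 3 a quorum is `3 / 2 + 1 = 2` replicas, which is where minReplicas of 2 comes from. The arithmetic can be sketched as plain Java; note this is a hypothetical reconstruction of the sizing formula being discussed (the javadoc's actual formula is truncated here), with illustrative numbers only:

```java
public class CoreSizingExample
{
    // Quorum of an RF: majority of replicas, e.g. RF 3 -> 2 (assumption: this
    // is what "minReplicas" means per the review comment above)
    static int minReplicas(int rf)
    {
        return rf / 2 + 1;
    }

    // Hypothetical sizing sketch: split the estimated table size into
    // max-partition-sized chunks (rounded up), one core per chunk per replica
    // that must be read
    static long numCores(long tableSizeBytes, long maxPartitionSizeBytes, int rf)
    {
        long partitions = (tableSizeBytes + maxPartitionSizeBytes - 1) / maxPartitionSizeBytes; // ceil division
        return partitions * minReplicas(rf);
    }

    public static void main(String[] args)
    {
        long gb = 1L << 30;
        // Illustrative numbers: 100 GB table, 2.5 GB max partition size, RF 3
        System.out.println(minReplicas(3));                            // prints 2
        System.out.println(numCores(100 * gb, (long) (2.5 * gb), 3));  // prints 80
    }
}
```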
// normalize table size to the number of instances
LOGGER.debug("{}/{} instances were used to determine the table size {} for table {}.{}",
             successCount, instancesSize, tableSizeSum, keyspace, table);
How about logging at info level? It is logged only once per job and the information is seemingly useful for debugging.
CASSANALYTICS-36: Bulk Reader should dynamically size the Spark job based on estimated table size

Patch by Francisco Guerrero; reviewed by TBD for CASSANALYTICS-36
Force-pushed 7088f9e to 2a98fa0