
Start to use explicit memory limits in the parquet chunked reader #9991

Merged
merged 3 commits on Jan 30, 2024

Conversation

@revans2 (Collaborator) commented Dec 7, 2023

This depends on rapidsai/cudf#14360

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Dec 7, 2023
@sameerz sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Dec 10, 2023
Comment on lines +2584 to +2585:

    val passReadLimit = if (useSubPageChunked) {
      4 * chunkSizeByteLimit
@ttnghia (Collaborator) commented Jan 26, 2024

Can we define a multiplier constant, or make it configurable, instead of hard-coding 4x like this? In the (near) future the chunked ORC reader may benefit from it as well.

@revans2 (Collaborator, Author) replied:

I'm not sure where a good place to put this would be. It is a magic number based on our estimate that we will take 4x the target batch size as the amount of memory we are allowed to use. GpuDataProducer is the only place that might be common between them, but that is not a proper place for it. I could create a static object to hold it somewhere, GpuConventionMagicNumbers or something.

@ttnghia (Collaborator) replied:

Okay, then let's see what we can do next time with the ORC chunked reader. I'm fine with leaving this as-is for now.

@revans2 revans2 marked this pull request as ready for review January 29, 2024 22:03
@revans2 (Collaborator, Author) commented Jan 29, 2024

build

@revans2 (Collaborator, Author) commented Jan 30, 2024

@ttnghia could you please take another look?

.doc("Enable a chunked reader where possible for reading data that is smaller " +
"than the typical row group/page limit. Currently this only works for parquet.")
.booleanConf
.createWithDefault(false)
A Member commented:

We should have a follow-up issue to turn this on in 24.04.
