
[BUG] from_json fails with cuDF error Invalid list size computation error #9212

Closed
andygrove opened this issue Sep 8, 2023 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@andygrove
Contributor

Describe the bug

I am testing with a custom build of spark-rapids-jni, where I am specifying RECOVER_WITH_NULL in the from_json function that gets called from extractRawMapFromJsonString.

A simple test of from_json fails with the cuDF error Invalid list size computation.

Steps/Code to reproduce bug

scala> val df = Seq("{'a': '1'}\n{'a': '2'}\n").toDF("str").repartition(2)
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [str: string]

scala> df.createOrReplaceTempView("t")

scala> spark.sql("select from_json(str, 'MAP<STRING,STRING>') from t").show()

Fails with

ai.rapids.cudf.CudfException: CUDF failure at: /home/andy/git/nvidia/spark-rapids-jni/src/main/cpp/src/map_utils.cu:609: Invalid list size computation.
	at com.nvidia.spark.rapids.jni.MapUtils.extractRawMapFromJsonString(Native Method)
	at com.nvidia.spark.rapids.jni.MapUtils.extractRawMapFromJsonString(MapUtils.java:49)
	at org.apache.spark.sql.rapids.GpuJsonToStructs.doColumnar(GpuJsonToStructs.scala:153)

Expected behavior

Spark without the plugin produces:

+--------+
| entries|
+--------+
|{a -> 1}|
+--------+

Environment details (please complete the following information)
N/A

Additional context

@andygrove andygrove added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 8, 2023
@andygrove andygrove self-assigned this Sep 8, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Sep 12, 2023
@mattahrens
Collaborator

@ttnghia has worked on the json tokenization layer in spark-rapids-jni and can provide help as needed.

@ttnghia
Collaborator

ttnghia commented Sep 12, 2023

Will look into this.

@ttnghia
Collaborator

ttnghia commented Sep 14, 2023

This is not a bug but rather a limitation of the current implementation:

  • The cuDF JSON parser doesn't support single-quote characters.
  • from_json only works with input having one (string) JSON object per row.
  • Duplicate keys are not handled.
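As a hedged illustration of the first limitation: single-quoted input could in principle be normalized to standard double-quoted JSON on the CPU before parsing. `normalize_quotes` below is a hypothetical helper, not part of spark-rapids-jni, and a real implementation would need a proper tokenizer to handle quotes escaped or embedded inside string values:

```cpp
#include <string>

// Hypothetical helper (not spark-rapids-jni code): rewrite single-quoted
// JSON like {'a': '1'} into standard double-quoted JSON {"a": "1"} so a
// parser without single-quote support can accept it. This naive character
// swap ignores quotes inside values and is for illustration only.
std::string normalize_quotes(std::string s) {
  for (char& c : s) {
    if (c == '\'') c = '"';
  }
  return s;
}
```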

@andygrove
Contributor Author

andygrove commented Oct 31, 2023

I just tested this again, using the code from #9423, and it actually failed with a segmentation fault, which is concerning.

Stack: [0x00007f358ff00000,0x00007f3590000000],  sp=0x00007f358fffae48,  free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libcuda.so.1+0x186618]
C  [libcuda.so.1+0x277b5a]
C  [libcuda.so.1+0x4fae18]
C  [libcuda.so.1+0x13b116]
C  [libcuda.so.1+0x13b529]
C  [libcuda.so.1+0x13bdc7]
C  [libcuda.so.1+0x2dbca1]
C  [cudf5683805021365819471.so+0x3075821]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.nvidia.spark.rapids.jni.MapUtils.extractRawMapFromJsonString(J)J+0
j  com.nvidia.spark.rapids.jni.MapUtils.extractRawMapFromJsonString(Lai/rapids/cudf/ColumnView;)Lai/rapids/cudf/ColumnVector;+37
j  org.apache.spark.sql.rapids.GpuJsonToStructs.doColumnar(Lcom/nvidia/spark/rapids/GpuColumnVector;)Lai/rapids/cudf/ColumnVector;+18

@ttnghia
Collaborator

ttnghia commented Oct 31, 2023

I can reproduce it with the latest cudf code:

scala> val df = Seq("{'a': '1'}").toDF("str").repartition(2)
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [str: string]

scala> df.createOrReplaceTempView("t")

scala> spark.sql("select from_json(str, 'MAP<STRING,STRING>') from t").show()
....
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5975570710, pid=468146, tid=0x00007f596c39a640
#

@ttnghia
Collaborator

ttnghia commented Oct 31, 2023

I realized that the issue is due to having repartition(2). Without it, the example works fine:
scala> val df = Seq("{'a': '1'}\n{'a': '2'}\n").toDF("str")
df: org.apache.spark.sql.DataFrame = [str: string]

scala> df.createOrReplaceTempView("t")

scala> spark.conf.set("spark.rapids.sql.expression.JsonToStructs","true")

scala> spark.sql("select from_json(str, 'MAP<STRING,STRING>') from t").show()
23/10/31 22:55:33 WARN GpuOverrides: 
! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
  @Expression <AttributeReference> entries#9 could run on GPU

+--------+
| entries|
+--------+
|{a -> 1}|
+--------+

So there should be something wrong with handling empty input somewhere.

@andygrove
Contributor Author

I realize that the issue is due to having repartition(2)

Without the repartition, the query falls back to the CPU (cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec)

@ttnghia
Collaborator

ttnghia commented Nov 1, 2023

Got it. So this is indeed a bug in from_json in spark-rapids-jni. The issue is caused by comparing signed (negative) and unsigned integers when an exception is thrown after an invalid token is detected.

I'll post a fix PR shortly.
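The signed-vs-unsigned pitfall described above can be sketched in plain C++. The names here are illustrative, not the actual map_utils.cu code: when an invalid token is flagged with a negative sentinel, subtraction of unsigned offsets or comparison against an unsigned count silently wraps around instead of reaching the intended error path:

```cpp
#include <cstdint>
#include <limits>

// Illustrative sketch of the signed-vs-unsigned bug class described above
// (hypothetical names, not the actual map_utils.cu code).

// Unsigned subtraction wraps: if an invalid token makes a list's end offset
// smaller than its begin offset, the "negative" size becomes a huge value,
// which a validity check then rejects with something like
// "Invalid list size computation" -- or, if unchecked, leads to an
// out-of-bounds access and a segfault.
std::uint32_t list_size(std::uint32_t begin, std::uint32_t end) {
  return end - begin;  // wraps around when end < begin
}

// A signed error sentinel compared against an unsigned count is implicitly
// converted, so -1 becomes UINT32_MAX and the comparison no longer behaves
// as the author intended.
bool sentinel_wraps(std::int32_t error_pos, std::uint32_t num_tokens) {
  return static_cast<std::uint32_t>(error_pos) > num_tokens;  // true for -1
}
```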

@ttnghia
Collaborator

ttnghia commented Nov 1, 2023

Alright, that crash issue should be fixed by NVIDIA/spark-rapids-jni#1536.

After the fix, the example in this issue causes a regular cuDF exception to be thrown instead of a crash.

@andygrove
Contributor Author

I just tested this on the latest branch-24.02 and it is no longer an issue.
