[BUG] from_json fails with cuDF error Invalid list size computation error
#9212
Comments
@ttnghia has worked on the JSON tokenization layer in spark-rapids-jni and can provide help as needed.
Will look into this.
This is not a bug but rather the limitation of the current implementation:
I just tested this again, using the code from #9423, and it actually failed with a segmentation fault, which is concerning.
I can reproduce it with the latest cudf code:
I realize that the issue is due to having
So there should be something wrong with handling empty input somewhere.
Without the repartition, the query is falling back to CPU (
Got it. So this is indeed a bug in I'll post a fix PR shortly.
Alright, that crash issue should be fixed by NVIDIA/spark-rapids-jni#1536. After the fix, the example in this issue will throw a regular cudf exception instead.
I just tested this on the latest branch-24.02 and it is no longer an issue.
Describe the bug
I am testing with a custom build of spark-rapids-jni, where I am specifying RECOVER_WITH_NULL in the from_json function that gets called from extractRawMapFromJsonString.
A simple test of from_json results in the cuDF error Invalid list size computation error.
Steps/Code to reproduce bug
Fails with
Expected behavior
Spark without plugin produces:
Environment details (please complete the following information)
N/A
Additional context
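For readers unfamiliar with the recovery mode named above, here is a minimal pure-Python sketch of what recover-with-null semantics mean at the row level: a row that fails to parse becomes null instead of failing the whole batch, and an empty batch should simply give an empty result. This is not the cuDF or spark-rapids-jni implementation and it is not the reproduction code from this issue. The function name parse_rows_recover_with_null and the sample rows are hypothetical, chosen only for illustration.

```python
import json
from typing import Any, List, Optional


def parse_rows_recover_with_null(rows: List[Optional[str]]) -> List[Optional[Any]]:
    """Hypothetical helper: parse each JSON string independently.

    Malformed or null rows become None, so the output always has
    exactly one entry per input row.
    """
    parsed: List[Optional[Any]] = []
    for row in rows:
        if row is None:
            # A null input row stays null.
            parsed.append(None)
            continue
        try:
            parsed.append(json.loads(row))
        except json.JSONDecodeError:
            # Recover with null instead of raising for the whole batch.
            parsed.append(None)
    return parsed


# Illustrative rows (assumed, not taken from the issue).
rows = ['{"a": 1}', '{"a": ', None, '{}']
result = parse_rows_recover_with_null(rows)

# An empty batch, such as an empty partition, yields an empty result.
empty = parse_rows_recover_with_null([])
```

The property worth noting is that the output length always matches the input length, including the zero-row case that the comments above identify as the trigger for the crash.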