Skip to content

Conversation

@panbingkun
Copy link
Contributor

What changes were proposed in this pull request?

The pr aims to add Codegen Support for json_object_keys.

Why are the changes needed?

  • improve codegen coverage.
  • simplified code.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA & Existed UT (eg: JsonFunctionsSuite#json_object_keys function)

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun panbingkun marked this pull request as ready for review October 12, 2024 16:33
jsonParser.skipChildren();
}
return new GenericArrayData(arrayBufferOfKeys.toArray());
} catch (IOException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you handle JsonProcessingException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If written as one of the following two styles:

  • A.
image
  • B.
image
  • Because:
    JsonProcessingException extends JacksonException and JacksonException extends java.io.IOException

  • So, Based on my knowledge that catch java.io.IOException should already be enough to achieve the goal.

@MaxGekk
Copy link
Member

MaxGekk commented Oct 13, 2024

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in 54fd408 Oct 13, 2024
@SubhamSinghal
Copy link

@panbingkun Do you plan to add codeGen support for StructsToJson and JsonToStructs expression as well?

@panbingkun
Copy link
Contributor Author

JsonToStructs

Yes!

@SubhamSinghal
Copy link

cool. If you are not planning for StructsToJson, then I will take your code as ref and try to implement it.

@panbingkun
Copy link
Contributor Author

cool. If you are not planning for StructsToJson, then I will take your code as ref and try to implement it.

I don't think StructsToJson(to_json) is suitable for using Invoke & RuntimeReplaceable to support codegen.
A draft implementation can be found at: #48467

@SubhamSinghal
Copy link

SubhamSinghal commented Oct 15, 2024

sorry for naive question, any reason why to_json is not suitable for using Invoke & RuntimeReplaceable? I was checking proto equivalent function toProtobuf expression which extends RuntimeReplaceable

@panbingkun
Copy link
Contributor Author

sorry for naive question, any reason why to_json is not suitable for using Invoke & RuntimeReplaceable? I was checking proto equivalent function toProtobuf expression which extends RuntimeReplaceable

If we use Invoke and RuntimeReplaceable to support Codegen for to_json,

  • Do we need to create a JacksonGenerator every time? Based on my understanding, if we do this, the performance doesn't seem to be optimal.
  • If we only create JacksonGenerator once in to_json, then we have to pass it in the Invoke call, and we must make JacksonGenerator extend Serializable, which will cause more or less changes to the old logic. Variables that may be used in Invoke, such as writer and converter, all have similar issues.

MaxGekk pushed a commit that referenced this pull request Oct 30, 2024
…emove redundant whether is null check

### What changes were proposed in this pull request?
- The pr aims to remove redundant `whether is null check` when implementing codegen using `Invoke` and `StaticInvoke` (they have a `propagateNull` tag, with a default value of `true`)
- The pr is following up:
  #48224
  #48428
  #48467
  #48509

### Why are the changes needed?
Delete unnecessary logic, improve performance, and reduce misunderstandings.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually check
```
./build/sbt "sql/testOnly org.apache.spark.sql.JsonFunctionsSuite"
./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.expressions.JsonExpressionsSuite"
```
- json_array_length
  JsonFunctionsSuite#`json_array_length function`

- json_object_keys
  JsonFunctionsSuite#`json_object_keys function`

- to_json
 JsonFunctionsSuite#`roundtrip in to_json and from_json - struct`
 JsonFunctionsSuite#`roundtrip in to_json and from_json - array`

- from_json
  JsonFunctionsSuite#`roundtrip in to_json and from_json - struct`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48699 from panbingkun/SPARK-49766_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants