Avoid re-stringifying strings in JSON_FORMAT function #13097
yashmayya wants to merge 1 commit into apache:master
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff
              master   #13097      +/-
Coverage      61.75%   62.16%   +0.41%
Complexity       207      198       -9
Files           2436     2514      +78
Lines         133233   137790    +4557
Branches       20636    21319     +683
Hits           82274    85657    +3383
Misses         44911    45739     +828
Partials        6048     6394     +346
Jackie-Jiang left a comment
I'm not sure if this is the correct behavior.
JSON_FORMAT should be able to convert any object into JSON. If the value itself is abc, then the proper JSON version should be "abc". If the user wants to generate another level of JSON over a JSON string, we should still allow that, and the new JSON is also valid.
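The point above can be illustrated with a minimal sketch. It uses Python's standard json module purely as a stand-in for the JSON serialization behind JSON_FORMAT (an assumption for illustration; Pinot itself is Java and this is not its code):

```python
import json

# A bare string is not a JSON document; its JSON form is the quoted string.
plain = "abc"
as_json = json.dumps(plain)
print(as_json)  # "abc"

# Serializing an existing JSON document one more level is still valid JSON:
# the result is a JSON string literal that wraps the original text.
doc = json.dumps({"k": 1}, separators=(",", ":"))
wrapped = json.dumps(doc)

# Parsing twice recovers the original object.
assert json.loads(json.loads(wrapped)) == {"k": 1}
```

Under this reading, each call adds exactly one level of quoting, and each parse removes one.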
I don't fully follow the example in the description. If the value is already a JSON string, the user shouldn't call …
That makes sense, I hadn't considered that.
Are there any valid use cases for that?
Yeah, I agree in principle, but this leads to the confusing behavior that I tried to document in the PR description. Let me try again below.
Scenario 1
Scenario 2
Is this working as expected and documented somewhere? Or should we solve this issue in a different way from what this PR was attempting?
In the given example, the problem is actually from the recursive call of
… {"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}. Calling the JSON_FORMAT function on this will result in the string value {"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}.
… JSON_FORMAT function again on this string value will result in the escaped string - "{\"data\":{\"alias\":\"student1\",\"age\":24,\"name\":{\"full.name\":\"Peter1\",\"nick.name\":\"Pete1\"}}}". Passing it through the function again will result in a doubly escaped string - "\"{\\\"data\\\":{\\\"alias\\\":\\\"student1\\\",\\\"age\\\":24,\\\"name\\\":{\\\"full.name\\\":\\\"Peter1\\\",\\\"nick.name\\\":\\\"Pete1\\\"}}}\"".
… data JSON column with the row - {"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}.
… alias using the ingestion transform function JSON_PATH_STRING(JSON_FORMAT(data), '$.alias') (via segment reload). Since JSON columns are stored as strings internally, calling JSON_FORMAT on its values will lead to escaped strings. This leads to an NPE here because the JSON_PATH_STRING function returns a null as it will treat the escaped string argument as a literal JSON string value. This minor patch fixes this issue by avoiding re-stringifying strings in JsonUtils::objectToString.
… JSON_FORMAT function in this case would be the raw HashMap that hasn't been converted to a string yet.
bugfix
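The escaping behavior and the proposed change can be sketched as follows. This is a rough illustration in Python using the standard json module, not Pinot's Java code: json_format, json_format_skip_strings, and path_alias are hypothetical stand-ins for JSON_FORMAT (current and patched) and for a '$.alias' lookup, and the column value is assumed to be the object under the data key of the row.

```python
import json

def json_format(value):
    # Stand-in for the current behavior: always serialize, even a string.
    return json.dumps(value, separators=(",", ":"))

def json_format_skip_strings(value):
    # Stand-in for the patched behavior: a string is returned unchanged.
    return value if isinstance(value, str) else json_format(value)

def path_alias(json_text):
    # Hypothetical, simplified '$.alias' lookup; None if the text
    # does not parse to a JSON object.
    parsed = json.loads(json_text)
    return parsed.get("alias") if isinstance(parsed, dict) else None

# Assumed value of the data column for the example row.
column_value = {"alias": "student1", "age": 24,
                "name": {"full.name": "Peter1", "nick.name": "Pete1"}}

stored = json_format(column_value)     # JSON columns are kept as strings
escaped = json_format(stored)          # JSON string literal wrapping the text
doubly_escaped = json_format(escaped)  # one more level of quoting

# The escaped text parses to a string, not an object, so the lookup fails.
print(path_alias(escaped))                           # None
# Skipping re-stringification keeps the text parseable as an object.
print(path_alias(json_format_skip_strings(stored)))  # student1
```

The trade-off raised in the review still applies to this sketch: with the string pass-through, a plain value such as abc is returned unquoted rather than as "abc".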