[SPARK-40820][PYTHON] Creating StructType from Json #38285
What changes were proposed in this pull request?
Get the optional parameters using .get instead of indexing by key.
Why are the changes needed?
To create a StructField from JSON without having to set implicit values.
Does this PR introduce any user-facing change?
Yes: the JSON no longer has to be filled with implicit values.
How was this patch tested?
Unit tests.
Updated from 9fa9e1f to 4bc89c0.
Can one of the admins verify this patch?
-        json["nullable"],
-        json["metadata"],
+        json.get("nullable", True),
+        json.get("metadata"),
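The behavioral difference in the diff can be reproduced with a plain dict. The sketch below uses two hypothetical helpers, `parse_field_strict` and `parse_field_lenient`, that mirror only the key lookups of the old and proposed `StructField.fromJson` bodies; they return a tuple instead of a real `StructField`, so the example runs without PySpark.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-ins (not PySpark code): they reproduce only the key
# lookups from the diff and return (name, type, nullable, metadata).
FieldTuple = Tuple[str, Any, bool, Optional[Dict[str, Any]]]


def parse_field_strict(json: Dict[str, Any]) -> FieldTuple:
    # Old behavior: indexing raises KeyError when "nullable" or "metadata" is absent.
    return (json["name"], json["type"], json["nullable"], json["metadata"])


def parse_field_lenient(json: Dict[str, Any]) -> FieldTuple:
    # Proposed behavior: missing optional keys fall back to defaults.
    return (json["name"], json["type"], json.get("nullable", True), json.get("metadata"))


field = {"name": "auditname", "type": "string"}

try:
    parse_field_strict(field)
except KeyError as exc:
    print("strict parsing failed, missing key:", exc)

print(parse_field_lenient(field))
```

Only `name` and `type` stay mandatory in the lenient version, which is exactly what the PR argues for.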
When is this not set? This JSON ser/de is an internal format and not meant to be used by end users directly.
Here is an example: many DataFrames are created from a schema, and that schema is created from JSON.
The schema is built with StructType.fromJson(json), which internally uses StructField.fromJson().
The problem is that StructField.fromJson requires the nullable and metadata attributes to be present in the JSON.
It is understandable that name and type are mandatory, but the others should be optional.
The current parsing does not allow this. With more than 1000 fields defined, this becomes a headache and adds unnecessary metadata.
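Until the parser accepts missing keys, one workaround for large schemas is to fill the defaults programmatically instead of writing them by hand. The sketch below assumes the schema JSON has already been loaded into Python dicts (for example with `json.loads`) and follows Spark's JSON layout for types: a struct is `{"type": "struct", "fields": [...]}`, and nested types appear under `elementType` (arrays) or `keyType`/`valueType` (maps). The names `fill_field_defaults` and `_fill_in_place` are hypothetical, not part of PySpark. The helper fills only `nullable` and `metadata` on struct fields; it does not touch the keys that array and map types themselves require (`containsNull`, `valueContainsNull`).

```python
import copy
from typing import Any, Dict


def _fill_in_place(data_type: Any) -> None:
    """Recursively add "nullable"/"metadata" defaults to struct fields."""
    if not isinstance(data_type, dict):
        return  # primitive types are plain strings such as "string"
    kind = data_type.get("type")
    if kind == "struct":
        for field in data_type.get("fields", []):
            field.setdefault("nullable", True)  # same default as StructField
            field.setdefault("metadata", {})
            _fill_in_place(field.get("type"))  # descend into nested types
    elif kind == "array":
        _fill_in_place(data_type.get("elementType"))
    elif kind == "map":
        _fill_in_place(data_type.get("keyType"))
        _fill_in_place(data_type.get("valueType"))


def fill_field_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a struct schema dict with field defaults filled in."""
    result = copy.deepcopy(schema)  # leave the caller's dict untouched
    _fill_in_place(result)
    return result


raw = {"type": "struct", "fields": [{"name": "auditname", "type": "string"}]}
filled = fill_field_defaults(raw)
# StructType.fromJson(filled) could then parse it with the unpatched code.
print(filled)
```

Copying once at the top and mutating the copy keeps the recursion simple while leaving the original schema unchanged.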
Just for extra clarification: is there any way for end users to hit this issue? How did you run into it?
The only way for users to avoid this issue is to treat the nullable and metadata attributes as mandatory and always declare them:
Example:
{
"name": "auditname",
"type": "string",
"nullable": true, -> mandatory
"metadata": {} -> mandatory
}
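With many fields, it can help to find out which ones the current parser would reject before calling `StructType.fromJson`. A minimal sketch, assuming the schema is already loaded into Python dicts in Spark's struct JSON layout (`{"type": "struct", "fields": [...]}`); `incomplete_fields` and `REQUIRED_KEYS` are hypothetical names, and the key list simply reflects the four keys the pre-patch `StructField.fromJson` indexes directly.

```python
from typing import Any, List, Tuple

# Keys the pre-patch StructField.fromJson looks up with json[...] indexing.
REQUIRED_KEYS = ("name", "type", "nullable", "metadata")


def incomplete_fields(data_type: Any, prefix: str = "") -> List[Tuple[str, List[str]]]:
    """Return (field path, missing keys) for every struct field missing a key."""
    problems: List[Tuple[str, List[str]]] = []
    if not isinstance(data_type, dict):
        return problems  # primitive type names carry no fields
    kind = data_type.get("type")
    if kind == "struct":
        for field in data_type.get("fields", []):
            path = prefix + field.get("name", "<unnamed>")
            missing = [key for key in REQUIRED_KEYS if key not in field]
            if missing:
                problems.append((path, missing))
            # Nested fields are reported with a dotted path, e.g. "outer.inner".
            problems.extend(incomplete_fields(field.get("type"), path + "."))
    elif kind == "array":
        problems.extend(incomplete_fields(data_type.get("elementType"), prefix))
    elif kind == "map":
        problems.extend(incomplete_fields(data_type.get("keyType"), prefix))
        problems.extend(incomplete_fields(data_type.get("valueType"), prefix))
    return problems


schema = {
    "type": "struct",
    "fields": [
        {"name": "auditname", "type": "string"},
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
    ],
}
print(incomplete_fields(schema))  # [('auditname', ['nullable', 'metadata'])]
```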
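I have worked around this by not using the fromJson method, i.e. by passing the field definition directly to the constructor:
from pyspark.sql.types import StringType, StructField

json = {
    "name": "auditname",
    "dataType": StringType(),  # the constructor's keyword is dataType, not type
}
StructField(**json)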
This gets tricky because it requires extra preprocessing logic to produce the arguments above. The solution would be to use .get in the fromJson method:
@classmethod
def fromJson(cls, json: Dict[str, Any]) -> "StructField":
    return StructField(
        json["name"],
        _parse_datatype_json_value(json["type"]),
        json.get("nullable", True),  # use .get: default to nullable
        json.get("metadata"),  # use .get: StructField turns None into {}
    )
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
We are facing the same issue: JSON without a metadata key does not get parsed.
@HyukjinKwon Can you please reopen it and remove the stale tag?
@kunalgoyal98 mind elaborating on the use case? This ser/de format is for internal purposes.
@HyukjinKwon
We want to parse the JSON in Python too. The only way I could find is this: …
Okay, at least we might need to fix this only for …
So the argument is to match Scala's StructType.fromJson rather than to support a user-facing format; I am fine with that. Please open a new PR if you are interested in this, @kunalgoyal98.
Did you solve this?
If you need support from me, I'm in.
Regards,
Anthony Wainer
(Sent by email in reply to Hyukjin Kwon's comment of Mon, Jul 3, 2023, 12:39.)
Nope, would you mind creating a PR?