Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40820][PYTHON] Creating StructType from Json #38285

Closed
wants to merge 1 commit into from

Conversation

anthonywainer
Copy link
Contributor

What changes were proposed in this pull request?
Get param using .get instead by key

Why are the changes needed?
In order to create a StructField from json and avoid having to set implicit values

Does this PR introduce any user-facing change?
Yes, avoiding filling json with implicit values

How was this patch tested?
Unit tests

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

What changes were proposed in this pull request?
Get param using .get instead by key

Why are the changes needed?
In order to create a StructField from json and avoid having to set implicit values

Does this PR introduce any user-facing change?
Yes, avoiding filling json with implicit values

How was this patch tested?
Unit tests
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title [SPARK-40820][BUG] Creating StructType from Json [SPARK-40820][PYTHON] Creating StructType from Json Oct 19, 2022
json["nullable"],
json["metadata"],
json.get("nullable", True),
json.get("metadata"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this not set? This JSON ser/de is an internal format and not meant to be used by end users directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here an example, many dataframes are being created from a schema, this schema is created from a Json.
The input parameters to create a schema is StructType.fromJson(json), this internally uses StructField.fromJson().

The issue is when the StructField parses the Json, which forces to define the nullable and metadata attributes inside.
image

it is understandable that name and type are mandatory, but the others should be optional.

The current parsing does not allow this. If more than 1000 fields are defined, this would be a headache and unnecessary metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for extra clarification, is there any way for end users to face the issue? How did you face this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way to avoid this issue users would be to declare the nullable and metadata attributes mandatory:
Example:
{
"name": "auditname",
"type": "string",
"nullable": true, -> mandatory
"metadata": {} -> mandatory

}

I have avoided skipping this by not using the fromJson method, I mean passing the json directly.

json = {
"name": "auditname",
"type: StringType()
}
StructField(**json)

this gets tricky because it is necessary to add some previous logics to do the above, the solution would be to use .get in the fromJson method.

def fromJson(cls, json: Dict[str, Any]) -> "StructField":
return StructField(
json["name"],
_parse_datatatype_json_value(json["type"]),
json.get("nullable", True), -> Use get
json.get("metadata"), -> Use get

)

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 28, 2023
@github-actions github-actions bot closed this Jan 29, 2023
@kunalgoyal98
Copy link

We are facing the same issue. The json which does not have metadata key is not getting parsed.

@kunalgoyal98
Copy link

@HyukjinKwon Can you please reopen it and remove the stale tag?

@HyukjinKwon
Copy link
Member

@kunalgoyal98 mind elabourating the usecase? This ser/de format is for the internal purpose.

@kunalgoyal98
Copy link

kunalgoyal98 commented Jul 3, 2023

@HyukjinKwon
For our application, we save the spark data types in its json format. We parse this json back to create the spark data types. In scala spark, we use this code to parse the json -

    import org.apache.spark.sql.types.DataType
    val schema =
      """
        |{
        |    "type": "struct",
        |    "fields": [
        |        {
        |            "name": "c1",
        |            "type": "string",
        |            "nullable": true
        |           }
        |    ]
        |}
        |""".stripMargin
    val dt = DataType.fromJson(schema)

We want to do parse the json in python too. The only way I could find is this - StructType.fromJson. However, this call fails if there is no metadata field in the json (unlike the above scala DataType.fromJson call).
This is the problem I wanted to solve and this PR solves it. I am open to any other suggestion too.

@HyukjinKwon
Copy link
Member

Okay, at least we might need to fix this only for metadata alone because that's what Scala's StructType.fromJson can understand but I don't think we should fix nullability. This JSON stuff is really an internal format.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 3, 2023

So the argument is to match with Scala's StructType.fromJson but not to support user-facing format, that I am fine with it. Please open a new PR if you are interested in this, @kunalgoyal98 or @anthonywainer

@anthonywainer
Copy link
Contributor Author

anthonywainer commented Jul 21, 2023 via email

@HyukjinKwon
Copy link
Member

Nope, would you mind creating a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants