[SPARK-40820][PYTHON][SQL] Creating StructType from Json #43474

Closed
wants to merge 1 commit into from

@anthonywainer (Contributor) commented Oct 21, 2023

What changes were proposed in this pull request?
Allow reading a schema from JSON when a field omits nullable and metadata.

Why are the changes needed?
So that a schema can be read from JSON without explicitly setting values that already have defaults.

Does this PR introduce any user-facing change?
Yes. The JSON no longer has to spell out nullable and metadata when the default values are intended.

How was this patch tested?
Unit tests

To create a StructType from a Python dictionary you use StructType.fromJson; in Scala you use DataType.fromJson.

One would expect the code below to create a field, but fromJson requires the JSON to include nullable and metadata. This is inconsistent, because both already have default values within the DataType classes.

schema = {
    "name": "name",
    "type": "string"
}

StructField.fromJson(schema)

Python Error:

from pyspark.sql.types import StructField
schema = {
    "name": "c1",
    "type": "string"
}
StructField.fromJson(schema)

>>
Traceback (most recent call last):
  File "code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pyspark/sql/types.py", line 583, in fromJson
    json["nullable"],
KeyError: 'nullable' 
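
On versions without this change, one possible caller-side workaround is to fill in the missing keys before calling fromJson. The sketch below is plain Python with no Spark dependency. The helper name with_field_defaults is hypothetical, and the values it fills in (nullable true, empty metadata) are an assumption meant to mirror the defaults of the StructField constructor.

```python
from typing import Any, Dict


def with_field_defaults(field: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a field dict with nullable/metadata filled in.

    Hypothetical helper: the defaults are assumed to match the
    StructField constructor (nullable=True, no metadata).
    """
    completed = dict(field)  # shallow copy; the caller's dict is left untouched
    completed.setdefault("nullable", True)  # keep an explicit value if present
    completed.setdefault("metadata", {})
    return completed


field = with_field_defaults({"name": "c1", "type": "string"})
print(field)
```

The completed dictionary contains both keys that the traceback above shows fromJson indexing, so it can be passed to StructField.fromJson in place of the original.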

Scala Error:

    val schema =
      """
        |{
        |    "type": "struct",
        |    "fields": [
        |        {
        |            "name": "c1",
        |            "type": "string"
        |        }
        |    ]
        |}
        |""".stripMargin
    DataType.fromJson(schema)

>>
Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
java.lang.IllegalArgumentException: Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
	at org.apache.spark.sql.types.DataType$.parseStructField(DataType.scala:268)
	at org.apache.spark.sql.types.DataType$.$anonfun$parseDataType$1(DataType.scala:225)
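
The same gap can be closed on the caller's side for a whole schema document before it reaches the parser. The following Python sketch walks a schema in the JSON layout shown above (struct with fields, plus nested array and map types) and fills in nullable and metadata wherever a field omits them. The names fill_schema_defaults and fill_field are hypothetical, and the default values are an assumption meant to match the StructField constructor defaults.

```python
import json
from typing import Any, Dict


def fill_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Complete one struct field, recursing into its (possibly nested) type."""
    completed = dict(field)
    completed["type"] = fill_schema_defaults(field["type"])
    completed.setdefault("nullable", True)  # assumed default
    completed.setdefault("metadata", {})    # assumed default
    return completed


def fill_schema_defaults(node: Any) -> Any:
    """Recursively add nullable/metadata to every struct field under node."""
    if not isinstance(node, dict):
        return node  # primitive type name such as "string"
    result = dict(node)
    if result.get("type") == "struct":
        result["fields"] = [fill_field(f) for f in result.get("fields", [])]
    # Array and map types carry nested types under these keys.
    for key in ("elementType", "keyType", "valueType"):
        if key in result:
            result[key] = fill_schema_defaults(result[key])
    return result


raw = '{"type": "struct", "fields": [{"name": "c1", "type": "string"}]}'
normalized = json.dumps(fill_schema_defaults(json.loads(raw)))
print(normalized)
```

The normalized string spells out every field in full, so it could be handed to DataType.fromJson in Scala, or parsed with json.loads and passed to StructType.fromJson in Python.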

@anthonywainer anthonywainer changed the title [SPARK-40820][PYTHON] Creating StructType from Json [SPARK-40820][PYTHON&SCALA] Creating StructType from Json Oct 22, 2023
@anthonywainer anthonywainer marked this pull request as ready for review October 22, 2023 15:42
@anthonywainer
Contributor Author

@HyukjinKwon I have re-opened the PR, could you check please?

@HyukjinKwon HyukjinKwon changed the title [SPARK-40820][PYTHON&SCALA] Creating StructType from Json [SPARK-40820][PYTHON][SQL] Creating StructType from Json Oct 23, 2023
@@ -0,0 +1,68 @@
/*
Member

Let's move the tests to DataTypeSuite.scala

Contributor Author

moved!

@@ -1579,6 +1579,12 @@ def test_row_without_field_sorting(self):
        self.assertEqual(r, expected)
        self.assertEqual(repr(r), "Row(b=1, a=2)")

    def test_struct_field_from_json(self):
Member

Suggested change
-    def test_struct_field_from_json(self):
+    def test_struct_field_from_json(self):
+        # SPARK-40820: fromJson with only name and type

Contributor Author

changed!

@anthonywainer
Contributor Author

@HyukjinKwon could you check this, please?

@HyukjinKwon
Member

Merged to master.

@anthonywainer anthonywainer deleted the SPARK-40820 branch November 8, 2023 17:35