[SPARK-3569][SQL] Add metadata field to StructField #2701
QA tests have started for PR 2701 at commit
QA tests have started for PR 2701 at commit
QA tests have finished for PR 2701 at commit
Test FAILed.
QA tests have finished for PR 2701 at commit
Test FAILed.
    nameToField.get(name).getOrElse(
      throw new IllegalArgumentException(s"Field ${name} does not exist."))
    nameToField.getOrElse(name,
      throw new IllegalArgumentException(s"Field $name does not exist."))
Nit: no need to wrap
I think using an immutable Map for metadata is enough. We can add an API like
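The suggestion above concerns the Scala side. A rough Python analogue of an immutable metadata map might look like the following sketch, which assumes the standard library's types.MappingProxyType is an acceptable stand-in for an immutable Map. freeze_metadata is a hypothetical helper name, not a PySpark API.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional


def freeze_metadata(metadata: Optional[Mapping[str, Any]] = None) -> Mapping[str, Any]:
    """Return a read-only, shallow copy of metadata (hypothetical helper)."""
    # Copy first so later edits to the caller's dict do not show through,
    # then wrap the copy in a read-only proxy. None (the default) yields a
    # fresh empty mapping, so no mutable default is shared between calls.
    return MappingProxyType(dict(metadata or {}))


meta = freeze_metadata({"comment": "user id"})
print(meta["comment"])  # user id
try:
    meta["comment"] = "changed"
except TypeError:
    print("metadata is read-only")
```

The proxy is shallow: nested dicts or lists stored as values remain mutable, so a fully immutable design would also need to freeze nested values.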
PySpark also needs to be updated.
QA tests have started for PR 2701 at commit
@liancheng The Python API is hard to update at this time because the schema SerDe is via https://github.com/apache/spark/blob/master/python/pyspark/sql.py#L1131
QA tests have finished for PR 2701 at commit
Test PASSed.
#2563 has already replaced
QA tests have started for PR 2701 at commit
QA tests have finished for PR 2701 at commit
Test FAILed.
QA tests have started for PR 2701 at commit
QA tests have finished for PR 2701 at commit
Test PASSed.
@@ -305,12 +305,15 @@ class StructField(DataType):
    """
    def __init__(self, name, dataType, nullable):
    def __init__(self, name, dataType, nullable, metadata={}):
Using {} as the default value has side effects, for example:
>>> a = StructField('a', StringType(), True)
>>> b = StructField('b', StringType(), True)
>>> a.metadata['name'] = 'a'
>>> b.metadata['name']
'a'
So if the metadata can be modified anywhere, you should use None as the default value here:
    def __init__(self, name, dataType, nullable, metadata=None):
        ...
        self.metadata = metadata or {}
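To make the reviewer's point concrete, here is a self-contained sketch using two simplified stand-in classes (hypothetical names, not the real pyspark.sql StructField). The first reproduces the shared-default bug, because Python evaluates a default argument once, when the def statement runs; the second applies the suggested metadata=None fix.

```python
class SharedDefaultField:
    """Buggy stand-in: the default {} is created once and shared by all calls."""

    def __init__(self, name, data_type, nullable, metadata={}):
        self.name = name
        self.data_type = data_type
        self.nullable = nullable
        self.metadata = metadata  # every default-constructed field aliases one dict


class FixedField:
    """Stand-in using the None-default pattern suggested in the review."""

    def __init__(self, name, data_type, nullable, metadata=None):
        self.name = name
        self.data_type = data_type
        self.nullable = nullable
        # A fresh dict is created on each call when no metadata is passed.
        self.metadata = metadata or {}


a = SharedDefaultField("a", "string", True)
b = SharedDefaultField("b", "string", True)
a.metadata["name"] = "a"
print(b.metadata)  # {'name': 'a'} (leaked from a)

c = FixedField("c", "string", True)
d = FixedField("d", "string", True)
c.metadata["name"] = "c"
print(d.metadata)  # {}
```

Note that metadata or {} also swaps an explicitly passed empty dict for a new one; if the caller's exact object must be kept, test metadata is None instead.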
QA tests have started for PR 2701 at commit
QA tests have finished for PR 2701 at commit
Test PASSed.
Test build #443 has started for PR 2701 at commit
Test build #443 has finished for PR 2701 at commit
Test build #458 has started for PR 2701 at commit
Test build #465 has started for PR 2701 at commit
Test build #458 has finished for PR 2701 at commit
@mengxr, thanks for working on this! Overall LGTM. One minor thing: I think we should expose Metadata as a type variable in the
Test build #465 has finished for PR 2701 at commit
Here's a PR to fix the package visibility. If that looks good to you, I think this is ready to merge: mengxr#1
Expose Metadata and MetadataBuilder through the public Scala and Java packages.
QA tests have started for PR 2701 at commit
QA tests have finished for PR 2701 at commit
Test FAILed.
Test build #22671 has started for PR 2701 at commit
Test build #22671 has finished for PR 2701 at commit
Test PASSed.
Thanks! Merged to master.
Add metadata: Metadata to StructField to store extra information of columns. Metadata is a simple wrapper over Map[String, Any] with value types restricted to Boolean, Long, Double, String, Metadata, and arrays of those types. SerDe is via JSON. Metadata is preserved through simple operations like SELECT.

@marmbrus @liancheng
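The description above defines Metadata as a map whose values are restricted to Boolean, Long, Double, String, nested Metadata, and arrays of those, serialized as JSON. A minimal Python sketch of that contract follows. It assumes a plain dict stands in for Metadata and that Python's bool/int/float/str correspond to Boolean/Long/Double/String; it also assumes arrays must be homogeneous and not nested. validate_metadata, metadata_to_json, and metadata_from_json are hypothetical helpers, not the Spark API (the real Metadata and MetadataBuilder are Scala classes).

```python
import json
from typing import Any, Dict

# Python stand-ins for the allowed scalar types (Boolean, Long, Double, String).
SCALAR_TYPES = (bool, int, float, str)


def _check_value(key: str, value: Any) -> None:
    """Raise TypeError unless value is an allowed scalar, nested dict, or array."""
    if isinstance(value, dict):
        validate_metadata(value)  # nested Metadata
    elif isinstance(value, list):
        # Assumption: arrays hold scalars or nested Metadata, never other arrays,
        # and all elements share one type.
        if any(isinstance(item, list) for item in value):
            raise TypeError(f"nested arrays are not allowed for key {key!r}")
        if len({type(item) for item in value}) > 1:
            raise TypeError(f"array for key {key!r} mixes element types")
        for item in value:
            _check_value(key, item)
    elif not isinstance(value, SCALAR_TYPES):
        raise TypeError(f"unsupported type {type(value).__name__} for key {key!r}")


def validate_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Check keys are strings and values follow the restricted type rules."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata keys must be strings, got {key!r}")
        _check_value(key, value)
    return metadata


def metadata_to_json(metadata: Dict[str, Any]) -> str:
    # sort_keys gives a stable encoding for the same logical metadata.
    return json.dumps(validate_metadata(metadata), sort_keys=True)


def metadata_from_json(text: str) -> Dict[str, Any]:
    return validate_metadata(json.loads(text))


meta = {"comment": "user id", "min": 0, "scale": 1.5, "ml": {"tags": ["a", "b"]}}
assert metadata_from_json(metadata_to_json(meta)) == meta
```

JSON fits this contract because every allowed type maps directly onto a JSON value (boolean, number, string, object, array), so a round trip preserves the metadata without a custom binary format.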