-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3713][SQL] Uses JSON to serialize DataType objects #2563
Conversation
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test FAILed. |
QA tests have started for PR 2563 at commit
|
Tests timed out after a configured wait of |
Test FAILed. |
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
* Parses a string representation of a DataType. | ||
* | ||
* TODO: Generate parser as pickler... | ||
* Utility functions for working with DataTypes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is in the wrong place. We should probably note that this parser is deprecated and is only here for backwards compatibility. We might even print a warning when it is used so we can get rid of it eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this comment is a mistake. Instead of print a warning, I made fromCaseClassString()
private. It's only referenced by CaseClassStringParser
, which has already been marked as deprecated.
Minor comment otherwise this LGTM. |
@@ -62,6 +63,12 @@ def __eq__(self, other): | |||
def __ne__(self, other): | |||
return not self.__eq__(other) | |||
|
|||
def jsonValue(self): | |||
return self.simpleString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can have default implementation as:
self.__class__.__name__.[:-4].lower()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, saved lots of boilerplate code! Removed all simpleString()
method in subclasses.
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test FAILed. |
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test PASSed. |
@@ -62,6 +67,17 @@ def __eq__(self, other): | |||
def __ne__(self, other): | |||
return not self.__eq__(other) | |||
|
|||
def simpleString(self): | |||
return _get_simple_string(self.__class__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just put _get_simple_string here? (it's not needed as separated functions, it will harder to understand without this context)
In order to make it available to class, it could be classmethod:
@classmethod
def simpleString(cls):
return cls.__name__[:-4].lower()
@davis Thanks for all the suggestions, really makes things a lot cleaner! |
QA tests have started for PR 2563 at commit
|
54c46ce
to
785b683
Compare
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test PASSed. |
Tests timed out after a configured wait of |
return cls.__name__[:-4].lower() | ||
|
||
def jsonValue(self): | ||
return {"type": self.typeName()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like to use single string for Primitive types, it's still doable, only use one layer dict for others.
Either one is good to me.
This looks good to me, you just forget to rollback the changes in run-tests after debugging. |
@davies Sorry for my carelessness... And thanks again for all the great advices! |
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test PASSed. |
LGTM now, thanks! |
Could you rebase this to master? |
de18dea
to
fc92eb3
Compare
Finished rebasing. |
QA tests have started for PR 2563 at commit
|
QA tests have started for PR 2563 at commit
|
QA tests have finished for PR 2563 at commit
|
Test FAILed. |
QA tests have finished for PR 2563 at commit
|
@marmbrus I think this is ready to go. |
Thanks! I've merged this. |
This PR uses JSON instead of
toString
to serializeDataType
s. The latter is not only hard to parse but also flaky in many cases.Since we already write schema information to Parquet metadata in the old style, we have to reserve the old
DataType
parser and ensure downward compatibility. The old parser is now renamed toCaseClassStringParser
and moved intoobject DataType
.@JoshRosen @davies Please help review PySpark related changes, thanks!