-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres) #31252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This satisfies PostgresDialect's requirement for scale in Numeric arrays
034b44a to
9fe9769
Compare
| // scalastyle:off | ||
| case java.sql.Types.NUMERIC => metadata.putLong("scale", fieldScale) | ||
| case java.sql.Types.DECIMAL => metadata.putLong("scale", fieldScale) | ||
| case java.sql.Types.ARRAY => metadata.putLong("scale", fieldScale) // PostgresDialect.scala wants this information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this issue is only for postgresql, could you add this fix in PostgresDialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for @maropu 's advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way fieldScale can make it into the dialect is by the field metadata.
It was always added prior to the previous commit (which I agree with on a fundamental level)
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've explained a few more rationale regarding this change proposed by @skestle in the comment below #31252 (comment)
|
ok to test |
|
Thanks for your contribution, @skestle ! Could you add tests in |
|
Also, could you file new jira for this issue and describe what's an issue there? I think we need it for better issue traceability. |
|
Test build #134242 has finished for PR 31252 at commit
|
| val metadata = new MetadataBuilder() | ||
| // SPARK-33888 | ||
| // - include scale in metadata for only DECIMAL & NUMERIC | ||
| // - include scale in metadata for only DECIMAL & NUMERIC as well as ARRAY (for Postgres) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we always include the scale metadata if it's present? cc @saikocat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that for simplification. But we need to fix the tests for the rest of the dialects. Cos it adds {"scale": 0} to all metadata, then existing tests failed (previously metadata didn't get build in getSchema(), so the JSON meta didn't generated)
That's why I choose to set it to only decimal and numeric. The problem also eluded me cos the test for array in Postgresql dialects call the toCalaystType directly instead of going through the code path of using metadata. Sorry on phone so it's hard for me to link the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let me elaborate more so you two (cc: @skestle) can decide on which approach to go for. Though I'm kind of favor the current approach of adding data type matching for adding scale metadata cos fixing the failing tests will be more difficult and it makes the JDBCSuite test "jdbc data source shouldn't have unnecessary metadata in its schema" test slightly lose its meaning.
So in order to push "logical_time_type" to metadata, I have to force metadata to be built in the field type as of here:
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R323 whereas previously, metadata can be built by the dialect or completely ignored (as default)
This will cause 3 tests in JDBCSuite to fail cause of schema mismatch (extra {"scale": 0} in the metadata always present) [1. "jdbc API support custom schema", 2. "jdbc API custom schema DDL-like strings.", 3. "jdbc data source shouldn't have unnecessary metadata in its schema"].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whereas previously, metadata can be built by the dialect or completely ignored (as default)
If it was dialect's responsibility to put things into metadata before, I think this PR should put the fix in PostgresDialect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the only way fieldScale can make it into the dialect is by the field metadata. So it is a very chicken and egg problem.
EDIT: Postgresql utilize the metadatabuilder to get the scale for array[][] of type numeric for example - cos dataType is ARRAY but the typeName is _numeric - note the underscore specific for Postgresql. Whereas MySQL dialect is putting more info into the metadata (like put("binarylong")). The use cases differ.
Might have to change the interface somehow to let the ResultSetMetadata to be passed or init-ed to the dialect.
This satisfies PostgresDialect's requirement for scale in Numeric arrays (as it was before SPARK-33888 removed the metadata)
What changes were proposed in this pull request?
Ensuring that a numeric scale component is provided to Postgres for Array type metadata processing.
Why are the changes needed?
The initial changes in SPARK-33888 restrict the "scale" column metadata to NUMERIC and DECIMAL types, but PostgresDialect was also using "scale" for Array types.
Does this PR introduce any user-facing change?
No. (This restores master to the same [Postgres] behavior as it had on 3 Jan 2021)
How was this patch tested?
Manually tested by loading a Postgres table that specified a numeric array column.