Conversation

@skestle commented Jan 20, 2021

This satisfies PostgresDialect's requirement for scale in Numeric arrays (as it was before SPARK-33888 removed the metadata)

What changes were proposed in this pull request?

Ensure that a numeric scale component is recorded in the field metadata for ARRAY types, since PostgresDialect needs it when processing array columns.

Why are the changes needed?

The initial changes in SPARK-33888 restrict the "scale" column metadata to NUMERIC and DECIMAL types, but PostgresDialect was also using "scale" for Array types.
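
For reference, a minimal sketch of the read side that breaks without this metadata (paraphrased from PostgresDialect of that era; toCatalystType below is a simplified stand-in for the dialect's private helper, not the real implementation):

import java.sql.Types
import org.apache.spark.sql.types._

// Simplified stand-in for PostgresDialect's private toCatalystType helper.
def toCatalystType(typeName: String, precision: Int, scale: Int): Option[DataType] =
  typeName match {
    case "numeric" | "decimal" => Some(DecimalType(precision, scale))
    case _ => None
  }

def getCatalystType(
    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
  sqlType match {
    case Types.ARRAY =>
      // Reads "scale" back out of the field metadata; this throws if
      // getSchema never recorded it for ARRAY columns (the regression).
      val scale = md.build().getLong("scale").toInt
      // PostgreSQL reports array types with a leading underscore
      // (e.g. "_numeric"); drop it to resolve the element type.
      toCatalystType(typeName.drop(1), size, scale).map(ArrayType(_))
    case _ => None
  }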

Does this PR introduce any user-facing change?

No. (This restores master to the same [Postgres] behavior as it had on 3 Jan 2021)

How was this patch tested?

Manually tested by loading a Postgres table that specified a numeric array column.
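
For instance, something like the following (a hypothetical reproduction; the URL, table name, and credentials are placeholders, and spark is assumed to be an active SparkSession):

// Hypothetical repro: read a PostgreSQL table containing a numeric[] column.
// Without the fix, schema resolution fails because PostgresDialect cannot
// find "scale" in the ARRAY column's field metadata.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "table_with_numeric_array")
  .option("user", "spark")
  .option("password", "spark")
  .load()

df.printSchema() // the numeric[] column should resolve to array<decimal(p,s)>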

Commit: This satisfies PostgresDialect's requirement for scale in Numeric arrays
@skestle force-pushed the SPARK-33888-postgres-fix branch from 034b44a to 9fe9769 on January 20, 2021 00:08
@github-actions bot added the SQL label Jan 20, 2021
// scalastyle:off
case java.sql.Types.NUMERIC => metadata.putLong("scale", fieldScale)
case java.sql.Types.DECIMAL => metadata.putLong("scale", fieldScale)
case java.sql.Types.ARRAY => metadata.putLong("scale", fieldScale) // PostgresDialect.scala wants this information
@maropu (Member):
If this issue is only for postgresql, could you add this fix in PostgresDialect?

Member:

+1 for @maropu 's advice.

@skestle (Author):

The only way fieldScale can make it into the dialect is via the field metadata. It was always added prior to the previous commit (which I agree with on a fundamental level):
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083
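
To make that concrete, the dialect hook has this shape (the signature matches JdbcDialects.scala at the time; the no-op body is illustrative):

import org.apache.spark.sql.types.{DataType, MetadataBuilder}

// The per-dialect hook receives only these four values. There is no
// fieldScale parameter, so scale can only reach a dialect through `md`.
def getCatalystType(
    sqlType: Int,
    typeName: String,
    size: Int,
    md: MetadataBuilder): Option[DataType] = None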

@saikocat (Contributor):

I've explained some more of the rationale for this change proposed by @skestle in the comment below: #31252 (comment)

@maropu (Member) commented Jan 20, 2021

ok to test

@maropu (Member) commented Jan 20, 2021

Thanks for your contribution, @skestle! Could you add tests in PostgresIntegrationSuite?

@maropu changed the title from "SPARK-33888 Restored scale metadata for ARRAY type (Postgres)" to "[SPARK-33888][SQL] Restored scale metadata for ARRAY type (Postgres)" Jan 20, 2021
@maropu changed the title from "[SPARK-33888][SQL] Restored scale metadata for ARRAY type (Postgres)" to "[SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres)" Jan 20, 2021
@maropu (Member) commented Jan 20, 2021

Also, could you file a new JIRA for this issue and describe the issue there? I think we need it for better issue traceability.

@SparkQA commented Jan 20, 2021

Test build #134242 has finished for PR 31252 at commit 9fe9769.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val metadata = new MetadataBuilder()
// SPARK-33888
- // - include scale in metadata for only DECIMAL & NUMERIC
+ // - include scale in metadata for only DECIMAL & NUMERIC as well as ARRAY (for Postgres)
Contributor:

can we always include the scale metadata if it's present? cc @saikocat

@saikocat (Contributor):

We can do that for simplification, but then we need to fix the tests for the rest of the dialects, because it adds {"scale": 0} to all metadata and existing tests fail (previously the metadata didn't get built in getSchema(), so the JSON metadata wasn't generated).

That's why I chose to set it only for DECIMAL and NUMERIC. The problem also eluded me because the test for arrays in the PostgreSQL dialect calls toCatalystType directly instead of going through the code path that uses metadata. Sorry, I'm on a phone, so it's hard for me to link the line.

@saikocat (Contributor):

Alright, let me elaborate more so you two (cc: @skestle) can decide which approach to go for. I'm somewhat in favor of the current approach of matching on data type before adding the scale metadata, because fixing the failing tests would be more difficult, and it would make the JDBCSuite test "jdbc data source shouldn't have unnecessary metadata in its schema" lose some of its meaning.

So in order to push "logical_time_type" into the metadata, I had to force the metadata to be built for the field type, as here:
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R323
whereas previously, metadata could be built by the dialect or completely ignored (the default).

This will cause 3 tests in JDBCSuite to fail because of a schema mismatch (an extra {"scale": 0} always present in the metadata): 1. "jdbc API support custom schema", 2. "jdbc API custom schema DDL-like strings.", 3. "jdbc data source shouldn't have unnecessary metadata in its schema".
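
A quick illustration of that extra entry, using Spark's public MetadataBuilder API:

import org.apache.spark.sql.types.MetadataBuilder

// If getSchema always records scale, even a plain INT column's field
// metadata serializes with this extra entry, breaking exact schema
// comparisons in the JDBCSuite tests listed above.
val md = new MetadataBuilder().putLong("scale", 0).build()
println(md.json) // {"scale":0}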

Contributor:

> whereas previously, metadata could be built by the dialect or completely ignored (the default)

If it was the dialect's responsibility to put things into metadata before, I think this PR should put the fix in PostgresDialect.

@saikocat (Contributor) commented Jan 20, 2021:

But the only way fieldScale can make it into the dialect is via the field metadata, so it's very much a chicken-and-egg problem.

EDIT: PostgreSQL utilizes the MetadataBuilder to get the scale for numeric[][] arrays, for example: the dataType is ARRAY but the typeName is _numeric (note the underscore, specific to PostgreSQL). The MySQL dialect, meanwhile, puts more info into the metadata (like putLong("binarylong", 1)). The use cases differ.

We might have to change the interface somehow to let the ResultSetMetaData be passed or injected into the dialect.
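
For contrast, the MySQL write-side usage looks roughly like this (paraphrased and simplified from MySQLDialect; not the exact source):

import java.sql.Types
import org.apache.spark.sql.types.{DataType, LongType, MetadataBuilder}

// MySQLDialect WRITES into the metadata builder while mapping MySQL's
// BIT(n > 1) type, for later use on the read path. PostgresDialect runs
// the other direction: it READS "scale" out of metadata that getSchema
// must already have populated.
def getCatalystType(
    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
  if (sqlType == Types.VARBINARY && typeName.equals("BIT") && size != 1) {
    md.putLong("binarylong", 1)
    Some(LongType)
  } else None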

@skestle (Author) commented Jan 29, 2021

Thank you @sarutak for following up with #31262 when I didn't have the time ;)

@skestle closed this Jan 29, 2021