Conversation

@skestle commented Jan 20, 2021

This satisfies PostgresDialect's requirement for scale in Numeric arrays (as it was before SPARK-33888 removed the metadata)

What changes were proposed in this pull request?

Ensure that a numeric scale component is recorded in the field metadata for ARRAY types, since PostgresDialect needs it when processing array columns.

Why are the changes needed?

The initial changes in SPARK-33888 restrict the "scale" column metadata to NUMERIC and DECIMAL types, but PostgresDialect was also using "scale" for Array types.
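
For reference, a minimal sketch of the read side that breaks without this metadata (paraphrased from PostgresDialect of that era; toCatalystType below is a simplified stand-in for the dialect's private helper, not the real implementation):

import java.sql.Types
import org.apache.spark.sql.types._

// Simplified stand-in for PostgresDialect's private toCatalystType helper.
def toCatalystType(typeName: String, precision: Int, scale: Int): Option[DataType] =
  typeName match {
    case "numeric" | "decimal" => Some(DecimalType(precision, scale))
    case _ => None
  }

def getCatalystType(
    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
  sqlType match {
    case Types.ARRAY =>
      // Reads "scale" back out of the field metadata; this throws if
      // getSchema never recorded it for ARRAY columns (the regression).
      val scale = md.build().getLong("scale").toInt
      // PostgreSQL reports array types with a leading underscore
      // (e.g. "_numeric"); drop it to resolve the element type.
      toCatalystType(typeName.drop(1), size, scale).map(ArrayType(_))
    case _ => None
  }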

Does this PR introduce any user-facing change?

No. (This restores master to the same [Postgres] behavior as it had on 3 Jan 2021)

How was this patch tested?

Manually tested by loading a Postgres table that specified a numeric array column.
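
For instance, something like the following (a hypothetical reproduction; the URL, table name, and credentials are placeholders, and spark is assumed to be an active SparkSession):

// Hypothetical repro: read a PostgreSQL table containing a numeric[] column.
// Without the fix, schema resolution fails because PostgresDialect cannot
// find "scale" in the ARRAY column's field metadata.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "table_with_numeric_array")
  .option("user", "spark")
  .option("password", "spark")
  .load()

df.printSchema() // the numeric[] column should resolve to array<decimal(p,s)>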

Commit: This satisfies PostgresDialect's requirement for scale in Numeric arrays
@skestle force-pushed the SPARK-33888-postgres-fix branch from 034b44a to 9fe9769 on January 20, 2021 00:08
@github-actions bot added the SQL label Jan 20, 2021
// scalastyle:off
case java.sql.Types.NUMERIC => metadata.putLong("scale", fieldScale)
case java.sql.Types.DECIMAL => metadata.putLong("scale", fieldScale)
case java.sql.Types.ARRAY => metadata.putLong("scale", fieldScale) // PostgresDialect.scala wants this information
@maropu (Member):
If this issue is only for postgresql, could you add this fix in PostgresDialect?

Member:

+1 for @maropu 's advice.

@skestle (Author):

The only way fieldScale can make it into the dialect is via the field metadata. It was always added prior to the previous commit (which I agree with on a fundamental level):
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083
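
To make that concrete, the dialect hook has this shape (the signature matches JdbcDialects.scala at the time; the no-op body is illustrative):

import org.apache.spark.sql.types.{DataType, MetadataBuilder}

// The per-dialect hook receives only these four values. There is no
// fieldScale parameter, so scale can only reach a dialect through `md`.
def getCatalystType(
    sqlType: Int,
    typeName: String,
    size: Int,
    md: MetadataBuilder): Option[DataType] = None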

@saikocat (Contributor):

I've explained some more of the rationale for this change proposed by @skestle in the comment below: #31252 (comment)

@maropu (Member) commented Jan 20, 2021

ok to test

@maropu (Member) commented Jan 20, 2021

Thanks for your contribution, @skestle! Could you add tests in PostgresIntegrationSuite?

@maropu changed the title from "SPARK-33888 Restored scale metadata for ARRAY type (Postgres)" to "[SPARK-33888][SQL] Restored scale metadata for ARRAY type (Postgres)" Jan 20, 2021
@maropu changed the title from "[SPARK-33888][SQL] Restored scale metadata for ARRAY type (Postgres)" to "[SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres)" Jan 20, 2021
@maropu (Member) commented Jan 20, 2021

Also, could you file a new JIRA for this issue and describe the issue there? I think we need it for better issue traceability.

@SparkQA commented Jan 20, 2021

Test build #134242 has finished for PR 31252 at commit 9fe9769.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val metadata = new MetadataBuilder()
// SPARK-33888
- // - include scale in metadata for only DECIMAL & NUMERIC
+ // - include scale in metadata for only DECIMAL & NUMERIC as well as ARRAY (for Postgres)
Contributor:

can we always include the scale metadata if it's present? cc @saikocat

@saikocat (Contributor):

We can do that for simplification, but then we need to fix the tests for the rest of the dialects, because it adds {"scale": 0} to all metadata and existing tests fail (previously the metadata didn't get built in getSchema(), so the JSON metadata wasn't generated).

That's why I chose to set it only for DECIMAL and NUMERIC. The problem also eluded me because the test for arrays in the PostgreSQL dialect calls toCatalystType directly instead of going through the code path that uses metadata. Sorry, I'm on a phone, so it's hard for me to link the line.

@saikocat (Contributor):

Alright, let me elaborate more so you two (cc: @skestle) can decide which approach to go for. I'm somewhat in favor of the current approach of matching on data type before adding the scale metadata, because fixing the failing tests would be more difficult, and it would make the JDBCSuite test "jdbc data source shouldn't have unnecessary metadata in its schema" lose some of its meaning.

So in order to push "logical_time_type" into the metadata, I had to force the metadata to be built for the field type, as here:
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R323
whereas previously, metadata could be built by the dialect or completely ignored (the default).

This will cause 3 tests in JDBCSuite to fail because of a schema mismatch (an extra {"scale": 0} always present in the metadata): 1. "jdbc API support custom schema", 2. "jdbc API custom schema DDL-like strings.", 3. "jdbc data source shouldn't have unnecessary metadata in its schema".
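
A quick illustration of that extra entry, using Spark's public MetadataBuilder API:

import org.apache.spark.sql.types.MetadataBuilder

// If getSchema always records scale, even a plain INT column's field
// metadata serializes with this extra entry, breaking exact schema
// comparisons in the JDBCSuite tests listed above.
val md = new MetadataBuilder().putLong("scale", 0).build()
println(md.json) // {"scale":0}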

Contributor:

> whereas previously, metadata could be built by the dialect or completely ignored (the default)

If it was the dialect's responsibility to put things into metadata before, I think this PR should put the fix in PostgresDialect.

@saikocat (Contributor) commented Jan 20, 2021:

But the only way fieldScale can make it into the dialect is via the field metadata, so it's very much a chicken-and-egg problem.

EDIT: PostgreSQL utilizes the MetadataBuilder to get the scale for numeric[][] arrays, for example: the dataType is ARRAY but the typeName is _numeric (note the underscore, specific to PostgreSQL). The MySQL dialect, meanwhile, puts more info into the metadata (like putLong("binarylong", 1)). The use cases differ.

We might have to change the interface somehow to let the ResultSetMetaData be passed or injected into the dialect.
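
For contrast, the MySQL write-side usage looks roughly like this (paraphrased and simplified from MySQLDialect; not the exact source):

import java.sql.Types
import org.apache.spark.sql.types.{DataType, LongType, MetadataBuilder}

// MySQLDialect WRITES into the metadata builder while mapping MySQL's
// BIT(n > 1) type, for later use on the read path. PostgresDialect runs
// the other direction: it READS "scale" out of metadata that getSchema
// must already have populated.
def getCatalystType(
    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
  if (sqlType == Types.VARBINARY && typeName.equals("BIT") && size != 1) {
    md.putLong("binarylong", 1)
    Some(LongType)
  } else None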

@skestle (Author) commented Jan 29, 2021

Thank you @sarutak for following up with #31262 when I didn't have the time ;)

@skestle closed this Jan 29, 2021