[SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array #40953

Hisoka-X · 2023-04-26T03:08:23Z

What changes were proposed in this pull request?

Spark SQL currently cannot create a DataFrame from a Postgres table that contains an array column whose element type is user-defined.

This PR supports such columns by reading the unknown element type as a string.
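The fallback can be pictured with a small, dependency-free sketch. This is not the code from `PostgresDialect.scala`: the `SketchType` hierarchy and the `PgArrayMappingSketch` object are hypothetical stand-ins for Spark's `DataType` classes and the dialect's mapping function, and the list of known type names is deliberately incomplete. It only illustrates the idea that an array whose element type name is not recognized is treated as an array of strings.

```scala
// Hypothetical, simplified stand-ins for Spark's Catalyst data types.
sealed trait SketchType
case object IntegerSketch extends SketchType
case object LongSketch extends SketchType
case object StringSketch extends SketchType
final case class ArraySketch(element: SketchType) extends SketchType

object PgArrayMappingSketch {
  // Map a Postgres element type name to a sketch type.
  // Only a few built-in names are listed here for illustration.
  def elementType(typeName: String): SketchType = typeName match {
    case "int4"             => IntegerSketch
    case "int8"             => LongSketch
    case "text" | "varchar" => StringSketch
    // Unknown names, such as a user-defined DOMAIN, fall back to string.
    case _                  => StringSketch
  }

  // Postgres names array types with a leading underscore, e.g. "_int4",
  // so the prefix is stripped before looking up the element type.
  def arrayType(pgArrayTypeName: String): SketchType =
    ArraySketch(elementType(pgArrayTypeName.stripPrefix("_")))
}
```

Under these assumptions, `PgArrayMappingSketch.arrayType("_not_null_text")` yields `ArraySketch(StringSketch)`, which mirrors how the `not_null_text[]` column in the test table below is expected to surface as an array of strings.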

Why are the changes needed?

To support reading user-defined array columns from Postgres in Spark SQL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

1. Added a new test.
2. Tested locally with the following table and data:

CREATE DOMAIN not_null_text
    AS TEXT
    DEFAULT '';

create table films
(
    code         char(5 char)     not null
        constraint firstkey
            primary key,
    title        varchar(40 char) not null,
    did          bigint           not null,
    date_prod    date,
    kind         varchar(10 char),
    tz           timestamp with time zone,
    int_arr      integer[],
    column_name  not_null_text[],
    column_name2 not_null_text
);

INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES (e'2
   ', 'fdas', 1, '2023-04-07 16:05:48', '2', null, null, null, null);
INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES (e'4
   ', 'fdsa', 1, '2023-04-07 16:05:48', '4', null, null, null, null);
INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES ('1    ', 'dafsdf', 1, '2023-04-04 14:43:51', '1', '2023-04-25 18:53:17.467000 +00:00', '{1,2,3}', '{1,fds,fdsa}', 'fdasfasdf');

Test Case

  import java.util.Properties

  test("jdbc array") {
    // Credentials for the local test database
    val connectionProperties = new Properties()
    connectionProperties.put("user", "system")
    connectionProperties.put("password", "system")
    // Read the table that contains the user-defined array column and print it
    spark.read.jdbc(
      url = "jdbc:postgresql://localhost:54321/test?useSSL=false&serverTimezone=UTC" +
        "&useUnicode=true&characterEncoding=utf-8",
      table = "TEST.public.films",
      connectionProperties
    ).show()
  }

Result

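Screenshot: https://user-images.githubusercontent.com/32387433/234458027-e67e410b-c417-400d-be7e-431768afc0ef.png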

Hisoka-X · 2023-05-03T11:43:46Z

@cloud-fan @MaxGekk @hvanhovell Hi, PTAL. Thanks!

cloud-fan · 2023-05-31T07:15:08Z

@yaooqinn @ulysses-you

sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala

yaooqinn · 2023-06-02T02:02:56Z

thanks, merged to master

Closes apache#40953 from Hisoka-X/SPARK-43267_pg_array.

Lead-authored-by: Jia Fan <fanjiaeminem@qq.com>
Co-authored-by: Hisoka <fanjiaeminem@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>

[SPARK-43267][JDBC] Handle postgres unknown user-defined column array as string

ea266a1

github-actions bot added the SQL label Apr 26, 2023

ulysses-you reviewed Jun 1, 2023

sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala

Hisoka-X added 2 commits June 1, 2023 09:46

Merge branch 'master' into SPARK-43267_pg_array

faa82aa

add test

07030c4

Hisoka-X requested a review from ulysses-you June 2, 2023 01:09

yaooqinn approved these changes Jun 2, 2023

yaooqinn closed this in 6f593be Jun 2, 2023

Hisoka-X deleted the SPARK-43267_pg_array branch June 3, 2023 03:47
