@Hisoka-X (Member) commented Apr 26, 2023

What changes were proposed in this pull request?

Spark SQL currently doesn't support creating a DataFrame from a Postgres table that contains a user-defined array column.

This PR adds support for such columns by reading them as strings.
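The idea of falling back to strings can be pictured as a type-name lookup. The sketch below is a minimal, self-contained illustration. It assumes the column's type name arrives in Postgres's array naming form, where an array type is the element type name prefixed with an underscore (`_int4` for `integer[]`, `_not_null_text` for `not_null_text[]`). `SqlType`, `ArrType`, `StrType` and `PgTypeMapping` are hypothetical stand-ins, not Spark's Catalyst classes or the actual `PostgresDialect` change in this PR.

```scala
// Hypothetical stand-ins for Spark's Catalyst types; not Spark's real classes.
sealed trait SqlType
case object IntType extends SqlType
case object LongType extends SqlType
case object StrType extends SqlType
final case class ArrType(element: SqlType) extends SqlType

object PgTypeMapping {
  // Recognized scalar Postgres type names (a deliberately small subset).
  def scalar(typeName: String): Option[SqlType] = typeName match {
    case "int4"                        => Some(IntType)
    case "int8"                        => Some(LongType)
    case "text" | "varchar" | "bpchar" => Some(StrType)
    case _                             => None // not handled by this sketch
  }

  // Postgres names array types with a leading underscore ("_int4" for integer[]).
  // Assumed fallback: an unrecognized element type, such as the user-defined
  // domain behind "_not_null_text", is read as a string.
  def toSqlType(typeName: String): Option[SqlType] =
    if (typeName.startsWith("_")) Some(ArrType(scalar(typeName.drop(1)).getOrElse(StrType)))
    else scalar(typeName)
}
```

With this sketch, `PgTypeMapping.toSqlType("_int4")` yields `Some(ArrType(IntType))`, while `PgTypeMapping.toSqlType("_not_null_text")` yields `Some(ArrType(StrType))` instead of failing on the unknown domain.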

Why are the changes needed?

To support user-defined array columns when reading from Postgres with Spark SQL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. Added a new test.
  2. Tested locally with the following setup.
```sql
CREATE DOMAIN not_null_text
    AS TEXT
    DEFAULT '';

create table films
(
    code         char(5 char)     not null
        constraint firstkey
            primary key,
    title        varchar(40 char) not null,
    did          bigint           not null,
    date_prod    date,
    kind         varchar(10 char),
    tz           timestamp with time zone,
    int_arr      integer[],
    column_name  not_null_text[],
    column_name2 not_null_text
);

INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES (e'2
   ', 'fdas', 1, '2023-04-07 16:05:48', '2', null, null, null, null);
INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES (e'4
   ', 'fdsa', 1, '2023-04-07 16:05:48', '4', null, null, null, null);
INSERT INTO public.films (code, title, did, date_prod, kind, tz, int_arr, column_name, column_name2) VALUES ('1    ', 'dafsdf', 1, '2023-04-04 14:43:51', '1', '2023-04-25 18:53:17.467000 +00:00', '{1,2,3}', '{1,fds,fdsa}', 'fdasfasdf');
```
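The third row stores the array literal `'{1,fds,fdsa}'` in the domain-array column `column_name`. To make "read as strings" concrete, the sketch below splits a simple one-dimensional Postgres array literal into string elements. It is purely illustrative and deliberately simplified (no quoted elements, escapes, `NULL` entries or nested dimensions); `PgArrayLiteral` is a hypothetical helper, not the code path Spark uses to read array values over JDBC.

```scala
object PgArrayLiteral {
  // Split a one-dimensional Postgres array literal such as "{1,fds,fdsa}"
  // into its elements, keeping every element as a string.
  // Simplified on purpose: no quoted elements, escapes, NULL handling,
  // or nested dimensions.
  def elements(literal: String): Seq[String] = {
    val trimmed = literal.trim
    require(trimmed.startsWith("{") && trimmed.endsWith("}"), s"not an array literal: $literal")
    val body = trimmed.substring(1, trimmed.length - 1)
    // Whitespace around unquoted elements is ignored, as Postgres does.
    if (body.trim.isEmpty) Seq.empty
    else body.split(",", -1).toSeq.map(_.trim)
  }
}
```

For the sample row, `PgArrayLiteral.elements("{1,fds,fdsa}")` returns the three strings `"1"`, `"fds"` and `"fdsa"`; note that `1` stays a string because the element type is not interpreted.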

Test case

```scala
import java.util.Properties

  test("jdbc array") {
    val connectionProperties = new Properties()
    connectionProperties.put("user", "system")
    connectionProperties.put("password", "system")
    // Read the films table (including the user-defined array column) and print its rows.
    spark.read.jdbc(
      url = "jdbc:postgresql://localhost:54321/test?useSSL=false&serverTimezone=UTC" +
        "&useUnicode=true&characterEncoding=utf-8",
      table = "TEST.public.films",
      connectionProperties
    ).show()
  }
```

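Result

<img width="1444" alt="image" src="https://user-images.githubusercontent.com/32387433/234458027-e67e410b-c417-400d-be7e-431768afc0ef.png">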

github-actions bot added the SQL label Apr 26, 2023
@Hisoka-X (Member Author) commented May 3, 2023

@cloud-fan @MaxGekk @hvanhovell Hi, PTAL. Thanks!

@cloud-fan (Contributor)

@yaooqinn @ulysses-you

@Hisoka-X Hisoka-X requested a review from ulysses-you June 2, 2023 01:09
@yaooqinn yaooqinn closed this in 6f593be Jun 2, 2023
@yaooqinn (Member) commented Jun 2, 2023

thanks, merged to master

@Hisoka-X Hisoka-X deleted the SPARK-43267_pg_array branch June 3, 2023 03:47
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…ring in array

Closes apache#40953 from Hisoka-X/SPARK-43267_pg_array.

Lead-authored-by: Jia Fan <fanjiaeminem@qq.com>
Co-authored-by: Hisoka <fanjiaeminem@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>