[Feature] Support to load array type #74

Merged: 2 commits, Aug 1, 2023

@banmoy (Collaborator) commented Aug 1, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

Support loading the ARRAY data type into StarRocks. Here is an example:

StarRocks DDL

CREATE TABLE `array_tbl` (
  `id` INT NOT NULL,
  `a0` ARRAY<STRING>,
  `a1` ARRAY<ARRAY<INT>>
) ENGINE=OLAP
PRIMARY KEY(`id`)
DISTRIBUTED BY HASH(`id`)
PROPERTIES (
  "replication_num" = "1"
);
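Both array columns nest their element type inside `ARRAY<...>`, and the connector has to map each one to a Spark type (`ArrayType(StringType)` for `a0`, `ArrayType(ArrayType(IntegerType))` for `a1`). The sketch below is a hypothetical illustration of that mapping, not the connector's code: the names `SrType` and `SrTypeParser` are invented here, and only the `INT` and `STRING` element types used in this example are handled.

```scala
// Hypothetical sketch: parse a StarRocks type string such as "ARRAY<ARRAY<INT>>"
// into a small type tree. Not the connector's actual implementation.
sealed trait SrType
case object SrInt extends SrType
case object SrString extends SrType
final case class SrArray(element: SrType) extends SrType

object SrTypeParser {
  def parse(raw: String): SrType = {
    val t = raw.trim.toUpperCase
    if (t.startsWith("ARRAY<") && t.endsWith(">"))
      // Strip the outer "ARRAY<" ... ">" and recurse into the element type
      SrArray(parse(t.substring("ARRAY<".length, t.length - 1)))
    else t match {
      case "INT"    => SrInt
      case "STRING" => SrString
      case other    => throw new IllegalArgumentException(s"unsupported type: $other")
    }
  }

  // Name of the Spark SQL DataType that each tree node corresponds to
  def sparkTypeName(t: SrType): String = t match {
    case SrInt      => "IntegerType"
    case SrString   => "StringType"
    case SrArray(e) => s"ArrayType(${sparkTypeName(e)})"
  }
}
```

For example, `SrTypeParser.sparkTypeName(SrTypeParser.parse("ARRAY<ARRAY<INT>>"))` yields `"ArrayType(ArrayType(IntegerType))"`. Recursion on the outer `ARRAY<...>` wrapper is what makes arbitrary nesting depth work.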

Spark DataFrame

// Needed for .toDF on a local Seq; assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._

// Each row: an id, an ARRAY<STRING> value and an ARRAY<ARRAY<INT>> value
val data = Seq(
  (1, Seq("hello", "starrocks"), Seq(Seq(1, 2), Seq(3, 4))),
  (2, Seq("hello", "spark"), Seq(Seq(5, 6, 7), Seq(8, 9, 10)))
)
val df = data.toDF("id", "a0", "a1")
df.write
  .format("starrocks")
  .option("starrocks.fe.http.url", "127.0.0.1:8038")
  .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9038")
  .option("starrocks.table.identifier", "test.array_tbl")
  .option("starrocks.user", "root")
  .option("starrocks.password", "")
  // Array column types must currently be declared explicitly (see the notes below)
  .option("starrocks.column.types", "a0 ARRAY<STRING>,a1 ARRAY<ARRAY<INT>>")
  .mode("append")
  .save()
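In the example, `starrocks.column.types` is a comma-separated list of `<column> <type>` pairs. A naive split on `,` happens to work for this value, but it would break on types that themselves contain commas, such as `DECIMAL(10,2)`. The hypothetical `ColumnTypes.parse` helper below only illustrates the option's format by splitting on top-level commas while tracking `<...>` and `(...)` nesting; it is an assumption for illustration, not the connector's own parser.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: split a "starrocks.column.types"-style value into
// (column name -> type string), ignoring commas nested inside <...> or (...).
object ColumnTypes {
  def parse(spec: String): Map[String, String] = {
    val entries = ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0
    for (c <- spec) c match {
      case '<' | '(' => depth += 1; current.append(c)
      case '>' | ')' => depth -= 1; current.append(c)
      case ',' if depth == 0 =>
        // A top-level comma ends one "<column> <type>" entry
        entries += current.toString; current.clear()
      case _ => current.append(c)
    }
    if (current.length > 0) entries += current.toString

    entries.map(_.trim).filter(_.nonEmpty).map { entry =>
      // The column name is everything before the first run of whitespace
      val parts = entry.split("\\s+", 2)
      require(parts.length == 2, s"expected '<column> <type>' but got '$entry'")
      parts(0) -> parts(1).trim
    }.toMap
  }
}
```

Applied to the value from the example, `ColumnTypes.parse("a0 ARRAY<STRING>,a1 ARRAY<ARRAY<INT>>")` gives `Map("a0" -> "ARRAY<STRING>", "a1" -> "ARRAY<ARRAY<INT>>")`.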

Notes:

  • Currently the connector cannot infer the Spark type of an ARRAY column in StarRocks, because information_schema.COLUMNS does not report the full column type. Users must therefore specify the Spark type of such columns via starrocks.column.types. A fix for the missing column type is in progress; once it lands, this limitation can be removed.
  • Stream Load always uses the JSON format, regardless of starrocks.write.properties.format, because CSV has no standard representation for arrays, whereas JSON has a native array type.
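To show why JSON suits array columns, the sketch below hand-encodes the example rows as JSON objects, where a nested `ARRAY<ARRAY<INT>>` value becomes a nested JSON array with no extra convention needed. It is a minimal illustration built on a hypothetical `JsonRows.toJsonRow` helper, escapes only quotes and backslashes, and is not how the connector actually serializes rows.

```scala
// Hypothetical sketch: encode one row of the example as a JSON object.
// Only '"' and '\' are escaped, which is enough for the sample strings.
object JsonRows {
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def toJsonRow(id: Int, a0: Seq[String], a1: Seq[Seq[Int]]): String = {
    val a0Json = a0.map(quote).mkString("[", ",", "]")
    // Nested Scala sequences map directly onto nested JSON arrays
    val a1Json = a1.map(_.mkString("[", ",", "]")).mkString("[", ",", "]")
    s"""{"id":$id,"a0":$a0Json,"a1":$a1Json}"""
  }
}
```

For the first example row this produces `{"id":1,"a0":["hello","starrocks"],"a1":[[1,2],[3,4]]}`. Expressing the same value in CSV would require an agreed array syntax plus escaping rules for the inner commas and brackets, which CSV does not define.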

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
  • I have added documentation for my new feature or new function

Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
@banmoy merged commit 2c45602 into StarRocks:main on Aug 1, 2023 (3 checks passed)