Search before asking
Version
doris-2.0.3-rc01dev-8cbd8fd3fd
What's Wrong?
Exporting a table that contains NULL data in Parquet format produces a wrong result.
What You Expected?
The export should produce the correct result.
How to Reproduce?
- Create the table:
> create table test_for_export_parquet (k1 int, v1 int) duplicate key(k1) distributed by hash(k1) buckets 1;
- Insert data that contains a NULL, and make sure the last row does not contain a NULL:
insert into test_for_export_parquet values(1, null),(2,2),(3,3),(4,4);
- Check the data:
> select * from test_for_export_parquet;
+------+------+
| k1 | v1 |
+------+------+
| 1 | NULL |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+------+------+
4 rows in set (0.021 sec)
- Export the table in Parquet format, for example:
EXPORT TABLE test_for_export_parquet TO "s3://xxx"
PROPERTIES (
"column_separator"="\\x07",
"line_delimiter" = "\\x07",
"format" = "parquet"
) WITH s3 (
"AWS_ENDPOINT" = "",
"AWS_ACCESS_KEY" = "",
"AWS_SECRET_KEY"="",
"AWS_REGION" = "",
"use_path_style" = "false"
)
- Check the output Parquet file; the data in it is wrong.
Anything Else?
In vparquet_writer.cpp, we use WriteBatch to write Parquet data, but we should use WriteBatchSpaced instead.
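To illustrate why the choice of call matters, here is a minimal standalone sketch. It is not Doris code and does not call the Parquet library. It only models my understanding of the two calls: WriteBatch expects a densely packed value buffer (non-null values only), while WriteBatchSpaced accepts a buffer that has one slot per row, including slots for NULL rows. The function names `write_dense` and `write_spaced` are hypothetical, and the placeholder `0` stored in the NULL slot is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Column = std::vector<std::optional<int32_t>>;

// Hypothetical model of a dense writer: for every non-null definition
// level it consumes the NEXT entry of `values`, so `values` must hold
// only the non-null values, packed together.
Column write_dense(const std::vector<int16_t>& def_levels,
                   const std::vector<int32_t>& values) {
    Column out;
    std::size_t next = 0;
    for (int16_t level : def_levels) {
        if (level == 1) {
            assert(next < values.size());
            out.push_back(values[next++]);
        } else {
            out.push_back(std::nullopt);
        }
    }
    return out;
}

// Hypothetical model of a spaced writer: `values` has one slot per row,
// and the slot of a NULL row is skipped instead of being consumed.
Column write_spaced(const std::vector<int16_t>& def_levels,
                    const std::vector<int32_t>& values) {
    assert(def_levels.size() == values.size());
    Column out;
    for (std::size_t i = 0; i < def_levels.size(); ++i) {
        if (def_levels[i] == 1) {
            out.push_back(values[i]);
        } else {
            out.push_back(std::nullopt);
        }
    }
    return out;
}
```

For the v1 column of the reproduction (definition levels `{0, 1, 1, 1}` and a per-row buffer `{0, 2, 3, 4}`, where the first `0` is the assumed placeholder), the dense model yields NULL, 0, 2, 3 while the spaced model yields NULL, 2, 3, 4. Under this model the shift is only visible when non-null rows follow the NULL row, which may be why the reproduction asks for a non-null last row.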
Are you willing to submit PR?
Code of Conduct