[Bug] export parquet format got wrong result #26207

@cambyzju

Description

Search before asking

  • I searched the issues and found no similar issues.

Version

doris-2.0.3-rc01dev-8cbd8fd3fd

What's Wrong?

Exporting a table that contains NULL data in parquet format produces a wrong result.

What You Expected?

The exported parquet file should contain the same data as the table.

How to Reproduce?

  1. create table:
    > create table test_for_export_parquet (k1 int, v1 int) duplicate key(k1) distributed by hash(k1) buckets 1;

  2. insert data containing NULL, and make sure the last row does not contain NULL:
    > insert into test_for_export_parquet values(1, null),(2,2),(3,3),(4,4);

  3. check data

> select * from test_for_export_parquet;
+------+------+
| k1   | v1   |
+------+------+
|    1 | NULL |
|    2 |    2 |
|    3 |    3 |
|    4 |    4 |
+------+------+
4 rows in set (0.021 sec)
  4. export table using parquet format, for example:
EXPORT TABLE test_for_export_parquet TO "s3://xxx"
PROPERTIES (
  "column_separator"="\\x07",
  "line_delimiter" = "\\x07",
  "format" = "parquet"
) WITH s3 (
  "AWS_ENDPOINT" = "",
  "AWS_ACCESS_KEY" = "",
  "AWS_SECRET_KEY"="",
  "AWS_REGION" = "",
  "use_path_style" = "false"
)
  5. check the output parquet file; the data is wrong:
image

Anything Else?

In vparquet_writer.cpp, we use WriteBatch to write parquet data, but we should use WriteBatchSpaced instead.
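
To illustrate why the choice matters, here is a minimal Python model of the two calling conventions. It is only a sketch, not the Arrow API and not the Doris code: the function names `write_batch` and `write_batch_spaced`, the list-based buffers, and the placeholder value 0 in the NULL slot are all assumptions made for illustration. As far as I understand the Arrow parquet writer, WriteBatch expects the values buffer to hold only the non-null values packed densely, while WriteBatchSpaced expects one slot per row plus a validity bitmap. If a buffer with one slot per row is handed to the dense variant, the values shift by one after each NULL and the trailing values are dropped.

```python
from typing import List, Optional


def write_batch(def_levels: List[int], values: List[int]) -> List[Optional[int]]:
    """Hypothetical model of dense semantics.

    `values` is assumed to contain only non-null entries; one value is
    consumed, in order, for each row whose definition level is 1.
    """
    remaining = iter(values)
    return [next(remaining) if level == 1 else None for level in def_levels]


def write_batch_spaced(valid_bits: List[bool], values: List[int]) -> List[Optional[int]]:
    """Hypothetical model of spaced semantics.

    `values` is assumed to have one slot per row; `valid_bits` says which
    slots hold real data and which are NULL placeholders.
    """
    return [value if valid else None for value, valid in zip(values, valid_bits)]


# Column v1 from the reproduction: NULL, 2, 3, 4.
# Assumption: the in-memory column keeps a slot for the NULL row (here 0).
spaced_values = [0, 2, 3, 4]
def_levels = [0, 1, 1, 1]
valid_bits = [False, True, True, True]

# Dense writer fed a spaced buffer: the placeholder leaks out and 4 is lost.
print(write_batch(def_levels, spaced_values))

# Spaced writer fed the same buffer: rows line up with their values.
print(write_batch_spaced(valid_bits, spaced_values))
```

In this model the dense call only goes wrong when a non-null row follows a NULL, which would be consistent with step 2 requiring that the last row is not NULL. The exact wrong values in the real export depend on what the column buffer holds in the NULL slot, so the output of this sketch should not be read as the content of the exported file.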

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
