
[Bug] Sometimes an error occurs: Can't decode parquet physical type BYTE_ARRAY to doris logical type Int32. #19357

@kaka11chen


Search before asking

  • I have searched the issues and found no similar issues.

Version

master

What's Wrong?

q01 in external_table_emr_p2/hive/test_external_github.groovy intermittently fails with the following error:

2023-05-06 12:02:10.943 INFO [suite-thread-1] (test_external_github.groovy:512) - catalog external_yandex created

2023-05-06 12:02:10.944 INFO [suite-thread-1] (Suite.groovy:194) - Execute sql: switch external_yandex;

2023-05-06 12:02:10.945 INFO [suite-thread-1] (test_external_github.groovy:516) - switched to catalog external_yandex

2023-05-06 12:02:10.945 INFO [suite-thread-1] (Suite.groovy:194) - Execute sql: use multi_catalog;

2023-05-06 12:02:11.059 INFO [suite-thread-1] (test_external_github.groovy:520) - use multi_catalog

2023-05-06 12:02:11.078 INFO [suite-thread-1] (test_external_github.groovy:523) - Process format _parquet

2023-05-06 12:02:11.082 INFO [suite-thread-1] (Suite.groovy:465) - Execute tag: 01, sql: SELECT /*+SET_VAR(exec_mem_limit=8589934592) */
        repo_name,
        count() AS prs,
        count(distinct actor_login) AS authors
    FROM github_events_parquet
    WHERE (event_type = 'PullRequestEvent') AND (action = 'opened') AND (actor_login IN
    (
        SELECT actor_login
        FROM github_events_parquet
        WHERE (event_type = 'PullRequestEvent') AND (action = 'opened') AND (repo_name IN ('yandex/ClickHouse', 'ClickHouse/ClickHouse'))
    )) AND (lower(repo_name) NOT LIKE '%clickhouse%')
    GROUP BY repo_name
    ORDER BY authors DESC, prs DESC, length(repo_name) DESC
    LIMIT 50

2023-05-06 12:02:14.475 ERROR [suite-thread-1] (ScriptContext.groovy:121) - Run test_external_github in /mnt/datadisk0/chenqi/doris/regression-test/suites/external_table_emr_p2/hive/test_external_github.groovy failed

java.sql.SQLException: errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Read parquet file /user/hive/github/parquet/part-00004-902bca87-e0f6-4780-af94-c840b41b1213-c000.snappy.parquet failed, reason = [INVALID_ARGUMENT]Can't decode parquet physical type BYTE_ARRAY to doris logical type Int32
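
For context, the message suggests that the reader met a column whose on-disk physical type (BYTE_ARRAY) cannot be decoded into the type the table schema expects (Int32). The sketch below only illustrates that kind of check and is not Doris's actual reader code: the `COMPATIBLE` table, the function names, and the column `some_int_col` are assumptions made for the example. Only the physical type names and the error text come from Parquet and the log above.

```python
# Hypothetical illustration only; this is NOT Doris's reader implementation.

# Assumed compatibility table: which Parquet physical types a reader might
# accept for each logical (table-side) type. The real rules may differ.
COMPATIBLE = {
    "Int32": {"INT32"},
    "Int64": {"INT32", "INT64"},
    "String": {"BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY"},
}


def format_error(physical, logical):
    """Build a message in the same shape as the one in the log."""
    return (
        f"Can't decode parquet physical type {physical} "
        f"to doris logical type {logical}"
    )


def find_mismatches(file_schema, table_schema):
    """Return (column, physical, logical) for every column that cannot be decoded.

    file_schema:  column name -> Parquet physical type found in the file
    table_schema: column name -> logical type expected by the table
    """
    mismatches = []
    for column, logical in table_schema.items():
        physical = file_schema.get(column)
        if physical is None:
            continue  # column absent from this file; out of scope for the sketch
        if physical not in COMPATIBLE.get(logical, set()):
            mismatches.append((column, physical, logical))
    return mismatches


if __name__ == "__main__":
    # `some_int_col` is a made-up column name used only for this example.
    file_schema = {"repo_name": "BYTE_ARRAY", "some_int_col": "BYTE_ARRAY"}
    table_schema = {"repo_name": "String", "some_int_col": "Int32"}
    for column, physical, logical in find_mismatches(file_schema, table_schema):
        print(column, "->", format_error(physical, logical))
```

Comparing the footer schema of the file named in the error with the table definition in this way should show which column is affected, assuming the mismatch is visible in the file metadata at all.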

What You Expected?

The query completes without an error.

How to Reproduce?

Run q01 in external_table_emr_p2/hive/test_external_github.groovy. The failure is intermittent, so more than one run may be needed.
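
Because the error only shows up sometimes, a small loop that repeats the query until it first fails can help narrow it down. The following is a minimal sketch under stated assumptions: `run_query` is a hypothetical callable that executes q01 against the catalog and raises on failure, and the default of 20 attempts is arbitrary. It is not part of the regression framework.

```python
from typing import Callable, Optional, Tuple


def first_failure(
    run_query: Callable[[], None], attempts: int = 20
) -> Optional[Tuple[int, str]]:
    """Call run_query up to `attempts` times.

    Returns (attempt_number, error_message) for the first failing attempt,
    or None if every attempt succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            run_query()  # hypothetical: runs q01 and raises on error
        except Exception as exc:
            return attempt, str(exc)
    return None
```

Recording the attempt number and the file path from each failing message would show whether the same Parquet file is involved every time.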

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
