Upcast types during union schema creation. #5212

@mustafasrepo

Describe the bug
When I run the query below on PostgreSQL:

SELECT c1, c9 FROM aggregate_test_100 
UNION ALL 
SELECT c1, c3 FROM aggregate_test_100

where c9 has type bigint and c3 has type smallint, it produces a valid result. However, when I run the same query on DataFusion, where c9 has type UInt32 and c3 has type Int8, it fails with the error ArrowError(CastError("Can't cast value 1491205016 to type Int8")).
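
To illustrate why the cast fails, here is a minimal sketch in plain Rust, independent of DataFusion and Arrow. The helper names `narrow_to_i8` and `widen_to_i64` are hypothetical and exist only for this example; the error string simply mirrors the message quoted above.

```rust
/// Hypothetical helper: checked narrowing of a UInt32 value to Int8.
/// Fails for any value above i8::MAX (127).
fn narrow_to_i8(value: u32) -> Result<i8, String> {
    i8::try_from(value).map_err(|_| format!("Can't cast value {} to type Int8", value))
}

/// Hypothetical helper: widening to Int64 is lossless for every u32 value.
fn widen_to_i64(value: u32) -> i64 {
    i64::from(value)
}
```

Calling `narrow_to_i8(1491205016)` returns an `Err`, while `widen_to_i64(1491205016)` preserves the value, which is the motivation for upcasting instead of narrowing.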
The physical plan of the query above in DataFusion is as follows:

"UnionExec",
"  ProjectionExec: expr=[c1@0 as c1, c3@1 as c3]",
"    CsvExec: files={1 group: [[Users/akurmustafa/projects/synnada/arrow-datafusion-tmp/testing/data/csv/aggregate_test_100.csv]]}, has_header=true, limit=None, projection=[c1, c3]",
"  ProjectionExec: expr=[c1@0 as c1, CAST(c9@1 AS Int8) as c3]",
"    CsvExec: files={1 group: [[Users/akurmustafa/projects/synnada/arrow-datafusion-tmp/testing/data/csv/aggregate_test_100.csv]]}, has_header=true, limit=None, projection=[c1, c9]",

DataFusion coerces the types DataType::UInt32 and DataType::Int8 to DataType::Int8, which cannot represent every UInt32 value. Instead, union schema creation should upcast to a type wide enough for both inputs; for this specific case, we could choose DataType::Int64.
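
One possible upcasting rule for integer types can be sketched as follows. This is an illustration only, not DataFusion's actual coercion logic: `IntType`, `signed_of_width`, and `common_supertype` are hypothetical names, and the enum is a stand-in for the integer variants of Arrow's `DataType`. The assumed rule is that two types of the same signedness widen to the larger one, while a signed/unsigned pair widens to a signed type with at least twice the unsigned width.

```rust
/// Hypothetical stand-in for the integer variants of Arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntType { Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 }

impl IntType {
    /// Width of the type in bits.
    fn bits(self) -> u32 {
        match self {
            IntType::Int8 | IntType::UInt8 => 8,
            IntType::Int16 | IntType::UInt16 => 16,
            IntType::Int32 | IntType::UInt32 => 32,
            IntType::Int64 | IntType::UInt64 => 64,
        }
    }

    fn is_signed(self) -> bool {
        matches!(self, IntType::Int8 | IntType::Int16 | IntType::Int32 | IntType::Int64)
    }
}

/// Signed type of exactly `bits` width, if one exists.
fn signed_of_width(bits: u32) -> Option<IntType> {
    match bits {
        8 => Some(IntType::Int8),
        16 => Some(IntType::Int16),
        32 => Some(IntType::Int32),
        64 => Some(IntType::Int64),
        _ => None,
    }
}

/// Smallest type in this sketch that can hold every value of both inputs.
/// Returns None when no such type exists (e.g. UInt64 mixed with a signed type).
fn common_supertype(a: IntType, b: IntType) -> Option<IntType> {
    match (a.is_signed(), b.is_signed()) {
        // Same signedness: the wider of the two is sufficient.
        (true, true) | (false, false) => Some(if a.bits() >= b.bits() { a } else { b }),
        // Signed `a`, unsigned `b`: need one extra bit for `b`, so double its width.
        (true, false) => signed_of_width(a.bits().max(b.bits() * 2)),
        // Unsigned `a`, signed `b`: swap and reuse the branch above.
        (false, true) => common_supertype(b, a),
    }
}
```

Under this rule, `common_supertype(IntType::UInt32, IntType::Int8)` yields `Some(IntType::Int64)`, matching the upcast suggested for this query.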

To Reproduce
Steps to reproduce the behavior:
Run the query above.

Expected behavior
I expect the above query to work.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)
