Binaries instead of Strings in dataframe and arrow exports #92

@blackrez

Describe the unexpected behaviour
When querying a Parquet file with chdb, strings come back as bytes in the DataFrame and Arrow output formats. I don't see this issue with JSON or CSV.

How to reproduce

The query

SELECT AVG(prix) as prix_moy,
        pdvid,
        name,
        ville,
        type_carburant
FROM  s3('https://********.s3.eu-west-1.amazonaws.com/instantane.parquet', 'Parquet') AS p
LEFT JOIN s3('https://************.s3.eu-west-1.amazonaws.com/station.csv', '*****', '****', 'CSVWithNames') AS stations
ON p.pdvid = stations.id
GROUP BY all
ORDER BY prix_moy DESC;

The results with clickhouse local
[Screenshot, 2023-08-19 14:41:11]

The result with chdb

In [25]: res
Out[25]:
       prix_moy     pdvid                                    name                        ville type_carburant
0         2.799  49480005  b"BP A11 AIRE DES PORTES D'ANGERS SUD"     b"Saint-Sylvain-D'Anjou"        b'SP98'
1         2.770  75014008                                    None                     b'Paris'        b'SP98'
2         2.740  75014008                                    None                     b'Paris'        b'SP95'
3         2.699  49160003                            b'SARL ROUX'    b'Longu\xc3\xa9-Jumelles'        b'SP98'
4         2.690  75016011                  b'Sarl STATION KLEBER'                     b'Paris'        b'SP98'
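
Until the export itself returns strings, one possible client-side workaround is to decode the byte values after the query. The following is only a minimal sketch, not chdb's own API: `decode_value` and `decode_columns` are hypothetical helper names, the data is assumed to be UTF-8 (which matches the `b'Longu\xc3\xa9-Jumelles'` value above), and the sample is a plain column-oriented dict built from three of the rows shown, not a real chdb result.

```python
def decode_value(value, encoding="utf-8"):
    """Return str for bytes-like input; pass anything else (None, numbers) through."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


def decode_columns(columns, encoding="utf-8"):
    """Decode every bytes value in a {column name: list of values} mapping."""
    return {
        name: [decode_value(v, encoding) for v in values]
        for name, values in columns.items()
    }


# Sample values copied from the output above (rows 0, 1 and 3).
sample = {
    "prix_moy": [2.799, 2.770, 2.699],
    "name": [b"BP A11 AIRE DES PORTES D'ANGERS SUD", None, b"SARL ROUX"],
    "ville": [b"Saint-Sylvain-D'Anjou", b"Paris", b"Longu\xc3\xa9-Jumelles"],
}

decoded = decode_columns(sample)
print(decoded["ville"])
```

On a pandas DataFrame the same helper could presumably be applied per object column, for example `res[col] = res[col].map(decode_value)`, which leaves `None` entries and numeric columns untouched.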

Labels: bug