[Python] ArrowInvalid: CSV conversion error to date32[day]: invalid value '2000-01-01 00:00:00' #37180

@randolf-scholz

Describe the bug, including details regarding any error messages, version, and platform.

pyarrow has no trouble casting timestamps like '2000-01-01 00:00:00' to date32 once the table is loaded, but the same conversion raises ArrowInvalid when it is requested while reading a CSV. Generally, if a cast between two types is safe, I'd expect read_csv to be able to convert directly to the desired type.

from io import BytesIO

import pyarrow as pa
from pyarrow import csv

data = "\n".join(
    [
        "time,value",
        "2000-01-01 00:00:00,-1",
        "2000-01-02 00:00:00,-2",
        "2000-01-03 00:00:00,-3",
    ]
)

schema = pa.schema(
    {
        "time": pa.date32(),
        "value": pa.int8(),
    }
)

with BytesIO(data.encode("utf8")) as file:
    table = csv.read_csv(file)
    table = table.cast(schema)  # ✔ casting OK

with BytesIO(data.encode("utf8")) as file:
    table = csv.read_csv(  # ✘ ArrowInvalid
        file,
        convert_options=csv.ConvertOptions(column_types=schema),
    )

Component(s)

Python
