Describe the bug, including details regarding any error messages, version, and platform.
pyarrow has no problem casting timestamps such as '2000-01-01 00:00:00' to date32, but `read_csv` raises `ArrowInvalid` when asked to convert the same values to date32 directly through `ConvertOptions(column_types=...)`. In general, if a value can be safely cast to a type, I would expect `read_csv` to be able to convert it to that type directly.
```python
from io import BytesIO

import pyarrow as pa
from pyarrow import csv

data = "\n".join(
    [
        "time,value",
        "2000-01-01 00:00:00,-1",
        "2000-01-02 00:00:00,-2",
        "2000-01-03 00:00:00,-3",
    ]
)

schema = pa.schema(
    {
        "time": pa.date32(),
        "value": pa.int8(),
    }
)

with BytesIO(data.encode("utf8")) as file:
    table = csv.read_csv(file)
    table = table.cast(schema)  # ✔ casting OK

with BytesIO(data.encode("utf8")) as file:
    table = csv.read_csv(  # ✘ ArrowInvalid
        file,
        convert_options=csv.ConvertOptions(column_types=schema),
    )
```
Component(s)
Python