
Using pandas Period in filters for parquet fails #48023

@rhshadrach

Describe the bug, including details regarding any error messages, version, and platform.

import pandas as pd
# Ensure `ArrowPeriodType` is registered
import pandas.core.arrays.arrow.extension_types

df = pd.DataFrame(
    {
        "month": pd.period_range("2024-01", "2024-12", freq="M"),
        "data": range(12),
    }
)
df.to_parquet("test.parquet", engine="pyarrow")

pd.read_parquet(
    "test.parquet",
    engine="pyarrow",
    filters=[("month", ">=", pd.Period("2024-07", freq="M"))]
)
# Raises: Could not convert Period('2024-07', 'M') with type Period: did not
# recognize Python value type when inferring an Arrow data type

A similar example using pd.Timestamp succeeds. It's not clear to me whether anything needs to change on the pandas side here.

Original pandas report: pandas-dev/pandas#62769

Component(s)

Parquet, Python
