[Python] regression: cannot cast string scalar to date32 #37411

Closed
wjones127 opened this issue Aug 27, 2023 · 1 comment · Fixed by #38038

@wjones127
Member

Describe the bug, including details regarding any error messages, version, and platform.

In PyArrow 12, you could do:

>>> import pyarrow as pa
>>> pa.scalar('2021-01-01').cast(pa.date32())
<pyarrow.Date32Scalar: datetime.date(2021, 1, 1)>

But in PyArrow 13, this no longer works:

>>> import pyarrow as pa
>>> pa.scalar('2021-01-01').cast(pa.date32())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/scalar.pxi", line 91, in pyarrow.lib.Scalar.cast
  File "/Users/willjones/Documents/delta-rs/python/venv/lib/python3.10/site-packages/pyarrow/compute.py", line 403, in cast
    return call_function("cast", [arr], options, memory_pool)
  File "pyarrow/_compute.pyx", line 572, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from string to date32 using function cast_date32

This is likely due to the changes in: #35040
There is a scalar cast implementation for string -> date32, but no array implementation.

Component(s)

Python

@jorisvandenbossche
Member

Hmm, that's unfortunate. It's indeed caused by switching to the standard cast compute kernels instead of the custom (and in other ways more limited) Scalar cast implementation. Clearly, though, that custom implementation also supported some casts that the cast kernels do not yet support.

I am not sure if there would be an easy way to fix the regression (apart from reverting), except for adding the "string->date32" cast kernel (although that's something we should do anyway).
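To illustrate why the scalar cast started failing, here is a minimal toy model of a cast dispatcher that looks up a kernel by source and target type. The registry, the type-name strings, and the function names are hypothetical and are not PyArrow's actual API; the sketch only shows how a missing "string -> date32" entry surfaces as a "not implemented" error once every cast goes through the shared kernels.

```python
from datetime import datetime, timezone


def parse_timestamp_seconds(text):
    """Toy parser: ISO 8601 string -> seconds since the UNIX epoch (treated as UTC)."""
    parsed = datetime.fromisoformat(text).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Hypothetical kernel registry keyed by (source type, target type).
# Only string -> timestamp is registered, mirroring the gap discussed above.
KERNELS = {("string", "timestamp[s]"): parse_timestamp_seconds}


def cast(value, from_type, to_type):
    """Dispatch to a registered kernel, or fail if none exists for this pair."""
    try:
        kernel = KERNELS[(from_type, to_type)]
    except KeyError:
        raise NotImplementedError(
            f"Unsupported cast from {from_type} to {to_type}"
        ) from None
    return kernel(value)


print(cast("2021-01-01", "string", "timestamp[s]"))  # seconds since epoch
try:
    cast("2021-01-01", "string", "date32")
except NotImplementedError as exc:
    print(exc)
```

In this model, registering a "string -> date32" entry is the only change needed to make the failing call succeed, which matches the fix direction suggested in this thread.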

The "string -> timestamp" kernel lives here:

// ----------------------------------------------------------------------
// String to Timestamp

struct ParseTimestamp {
  explicit ParseTimestamp(const TimestampType& type)
      : type(type), expect_timezone(!type.timezone().empty()) {}

  template <typename OutValue, typename Arg0Value>
  OutValue Call(KernelContext*, Arg0Value val, Status* st) const {
    OutValue result = 0;
    bool zone_offset_present = false;
    if (ARROW_PREDICT_FALSE(!ParseTimestampISO8601(val.data(), val.size(), type.unit(),
                                                   &result, &zone_offset_present))) {
      *st = Status::Invalid("Failed to parse string: '", val, "' as a scalar of type ",
                            type.ToString());
    }
    if (zone_offset_present != expect_timezone) {
      if (expect_timezone) {
        *st = Status::Invalid(
            "Failed to parse string: '", val, "' as a scalar of type ", type.ToString(),
            ": expected a zone offset. If these timestamps "
            "are in local time, cast to timestamp without timezone, then "
            "call assume_timezone.");
      } else {
        *st = Status::Invalid("Failed to parse string: '", val, "' as a scalar of type ",
                              type.ToString(), ": expected no zone offset.");
      }
    }
    return result;
  }

  const TimestampType& type;
  bool expect_timezone;
};

template <typename I>
struct CastFunctor<TimestampType, I, enable_if_t<is_base_binary_type<I>::value>> {
  static Status Exec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
    const auto& out_type = checked_cast<const TimestampType&>(*out->type());
    applicator::ScalarUnaryNotNullStateful<TimestampType, I, ParseTimestamp> kernel(
        ParseTimestamp{out_type});
    return kernel.Exec(ctx, batch, out);
  }
};

It might not be too hard to add a date32 version, essentially using the above but replacing ParseTimestampISO8601 with ParseYYYY_MM_DD (which is what the scalar casting / string parsing for date32 uses), and simplified so that it does not deal with timezones.
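As a rough illustration of what such a simplified kernel has to do, here is a pure-Python sketch of the per-value logic: parse a strict YYYY-MM-DD string, convert it to the number of days since 1970-01-01 (which is how date32 stores its values), and skip nulls. The function names are hypothetical, the error message is only modeled on the timestamp kernel above, and this is not the C++ code that was eventually merged.

```python
import re
from datetime import date

_EPOCH = date(1970, 1, 1)
# Strict YYYY-MM-DD with ASCII digits only; no time part and no zone offset.
_YYYY_MM_DD = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_date32(text):
    """Return days since 1970-01-01 for a strict YYYY-MM-DD string."""
    message = f"Failed to parse string: '{text}' as a scalar of type date32[day]"
    match = _YYYY_MM_DD.fullmatch(text)
    if match is None:
        raise ValueError(message)
    year, month, day = map(int, match.groups())
    try:
        parsed = date(year, month, day)
    except ValueError:
        # Well-formed pattern but not a real calendar date, e.g. month 13.
        raise ValueError(message) from None
    return (parsed - _EPOCH).days


def cast_strings_to_date32(values):
    """Apply parse_date32 element-wise, passing None (null) through unchanged."""
    return [None if value is None else parse_date32(value) for value in values]


print(cast_strings_to_date32(["2021-01-01", None, "1970-01-01"]))
# [18628, None, 0]
```

Because there is no timezone to validate, the whole `expect_timezone` branch of the timestamp version disappears and only the parse-or-fail step remains.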

danepitkin added this to the 14.0.0 milestone Oct 4, 2023
raulcd added the "Priority: Blocker" label (marks a blocker for the release) Oct 10, 2023
pitrou pushed a commit that referenced this issue Oct 10, 2023
…alar cast) (#38038)

### Rationale for this change

Adding `string -> date32/date64` cast kernels, which then also fixes the pyarrow scalar cast method (which was earlier refactored to rely on the general cast kernels)

* Closes: #37411

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024