GH-36730: [Python] Add support for Cython 3.0.0 #37097

Merged: 41 commits, Sep 21, 2023
Changes from 38 commits
Commits
93c61c7
GH-36730: [Python] Add support for Cython 3.0.0
kou Jul 18, 2023
dd9f5e5
Fix bad merge conflict resolution
danepitkin Aug 9, 2023
2da3f31
Revert unnecessary edits of compute.rst
danepitkin Aug 9, 2023
01025e3
Try class instead of classmethod in __reduce__
danepitkin Aug 10, 2023
cf1ea3f
Revert "Try class instead of classmethod in __reduce__"
danepitkin Aug 10, 2023
e2d631e
Fix test_fragments_repr
danepitkin Aug 11, 2023
d1e7d8a
Update substrait test
danepitkin Aug 11, 2023
f2937d8
Try noexcept on substrait function
danepitkin Aug 11, 2023
25ba028
Move the noexcept to the correct spot
danepitkin Aug 11, 2023
097f6f3
Fix cloudpickle test
danepitkin Aug 14, 2023
3263eaf
Also run test_fs.py with cloudpickle
danepitkin Aug 15, 2023
5241c24
Lint
danepitkin Aug 15, 2023
021e133
Delete accidental file
danepitkin Aug 15, 2023
ee9f049
Make _reconstruct staticmethods
danepitkin Aug 17, 2023
0cd3439
Add test for MapScalar.__iter__
danepitkin Aug 17, 2023
148d285
Parametrize all pickling tests to use both the pickle and cloudpickle…
danepitkin Aug 18, 2023
1d99984
Lint
danepitkin Aug 18, 2023
fe24f0b
Remove unnecessary test function
danepitkin Aug 18, 2023
ac2d11f
Add Cython<3 dev CI job
danepitkin Aug 21, 2023
32a4402
Fix docker_compose.yml
danepitkin Aug 21, 2023
3b10534
Update CI config
danepitkin Aug 21, 2023
bdec809
Fix dockerfile
danepitkin Aug 21, 2023
2df074f
Apply suggestions from code review
danepitkin Aug 28, 2023
91257db
Handle repr non-determinism in test case
danepitkin Aug 28, 2023
de35878
Ignore numpydocs warnings
danepitkin Aug 28, 2023
a9d99d4
Try fixing numpydoc warning ignores
danepitkin Aug 29, 2023
2d59f16
Try adding docstrings to cpdef enums
danepitkin Aug 29, 2023
e5e3cea
Ignore EnumType parameter docstrings checks during numpydoc validation
danepitkin Aug 30, 2023
f47f806
Remove print statement
danepitkin Aug 30, 2023
676b538
Disable debug builds for cuda and ubuntu 20 on azure
danepitkin Aug 30, 2023
d89fde1
Use backwards-compatible Enum base class
danepitkin Aug 30, 2023
59fffdc
Update x-hierarchy in docker-compose
danepitkin Aug 30, 2023
7c56dcc
Disable gdb tests for non-debug builds
danepitkin Aug 30, 2023
27d1128
Revert "Disable gdb tests for non-debug builds"
danepitkin Sep 12, 2023
23e8a3e
Revert "Update x-hierarchy in docker-compose"
danepitkin Sep 12, 2023
4ec5188
Revert "Disable debug builds for cuda and ubuntu 20 on azure"
danepitkin Sep 12, 2023
90f5321
Disable Cython 3
danepitkin Sep 15, 2023
6f32fcf
Clean up quotes in dockerfile
danepitkin Sep 15, 2023
ec5754c
Add todo, revert back to classmethod
danepitkin Sep 18, 2023
eee7573
Revert formatting change
danepitkin Sep 18, 2023
3c4f581
Revert typo
danepitkin Sep 18, 2023
24 changes: 24 additions & 0 deletions ci/docker/conda-python-cython2.dockerfile
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+ARG repo
+ARG arch
+ARG python=3.8
+FROM ${repo}:${arch}-conda-python-${python}
+
+RUN mamba install -q -y "cython<3" && \
+    mamba clean --all
9 changes: 9 additions & 0 deletions dev/archery/archery/lang/python.py
@@ -16,6 +16,7 @@
 # under the License.

 from contextlib import contextmanager
+from enum import EnumMeta
 import inspect
 import tokenize

@@ -112,6 +113,10 @@ def inspect_signature(obj):


 class NumpyDoc:
+    IGNORE_VALIDATION_ERRORS_FOR_TYPE = {
+        # Enum function signatures should never be documented
+        EnumMeta: ["PR01"]
+    }

     def __init__(self, symbols=None):
         if not have_numpydoc:
@@ -229,6 +234,10 @@ def callback(obj):
                     continue
                 if disallow_rules and errcode in disallow_rules:
                     continue
+                if any(isinstance(obj, obj_type) and errcode in errcode_list
+                       for obj_type, errcode_list
+                       in NumpyDoc.IGNORE_VALIDATION_ERRORS_FOR_TYPE.items()):
+                    continue
                 errors.append((errcode, errmsg))

             if len(errors):
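The type-based filter added to `NumpyDoc` can be tried outside archery. The following is a minimal sketch, not the archery code: the `filter_errors` helper, the `Color` enum, and the sample error tuples are hypothetical; only `EnumMeta`, the `PR01` code, and the shape of the `any(...)` check come from the diff.

```python
from enum import Enum, EnumMeta

# Same shape as the mapping in the diff: error codes to skip per type.
IGNORE_VALIDATION_ERRORS_FOR_TYPE = {
    EnumMeta: ["PR01"],
}


def filter_errors(obj, errors):
    """Drop (errcode, errmsg) pairs that are ignored for obj's type.

    Hypothetical helper for illustration; archery does this check inline.
    """
    kept = []
    for errcode, errmsg in errors:
        if any(isinstance(obj, obj_type) and errcode in errcode_list
               for obj_type, errcode_list
               in IGNORE_VALIDATION_ERRORS_FOR_TYPE.items()):
            continue
        kept.append((errcode, errmsg))
    return kept


class Color(Enum):  # hypothetical enum standing in for a cpdef enum
    RED = 1


# Sample data only; the messages are placeholders, not numpydoc output.
sample = [("PR01", "sample: parameters not documented"),
          ("GL08", "sample: missing docstring")]

# An Enum class is an instance of EnumMeta, so PR01 is dropped for it.
enum_errors = filter_errors(Color, sample)
# A plain builtin function is not, so nothing is dropped.
func_errors = filter_errors(len, sample)
```

Keying the mapping on the metaclass means the check applies to the enum class object itself, which is what a documentation validator sees, rather than to enum members.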
8 changes: 8 additions & 0 deletions dev/tasks/tasks.yml
@@ -1286,6 +1286,14 @@ tasks:
         PYTHON: "3.10"
       image: conda-python-substrait

+  test-conda-python-3.10-cython2:
+    ci: github
+    template: docker-tests/github.linux.yml
+    params:
+      env:
+        PYTHON: "3.10"
+      image: conda-python-cython2
+
   test-debian-11-python-3:
     ci: azure
     template: docker-tests/azure.linux.yml
25 changes: 25 additions & 0 deletions docker-compose.yml
@@ -119,6 +119,7 @@ x-hierarchy:
       - conda-python:
         - conda-python-pandas:
           - conda-python-docs
+        - conda-python-cython2
         - conda-python-dask
         - conda-python-hdfs
         - conda-python-java-integration
@@ -1349,6 +1350,30 @@ services:
         /arrow/ci/scripts/java_build.sh /arrow /build /tmp/dist/java &&
         /arrow/ci/scripts/java_cdata_integration.sh /arrow /tmp/dist/java" ]

+  conda-python-cython2:
+    # Usage:
+    #   docker-compose build conda
+    #   docker-compose build conda-cpp
+    #   docker-compose build conda-python
+    #   docker-compose build conda-python-cython2
+    #   docker-compose run --rm conda-python-cython2
+    image: ${REPO}:${ARCH}-conda-python-${PYTHON}-cython2
+    build:
+      context: .
+      dockerfile: ci/docker/conda-python-cython2.dockerfile
+      cache_from:
+        - ${REPO}:${ARCH}-conda-python-${PYTHON}-cython2
+      args:
+        repo: ${REPO}
+        arch: ${ARCH}
+        python: ${PYTHON}
+    shm_size: *shm-size
+    environment:
+      <<: [*common, *ccache]
+      PYTEST_ARGS: # inherit
+    volumes: *conda-volumes
+    command: *python-conda-command
+
 ################################## R ########################################

   ubuntu-r:
29 changes: 18 additions & 11 deletions python/CMakeLists.txt
@@ -168,37 +168,44 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${PYARROW_CXXFLAGS}")

 if(MSVC)
   # MSVC version of -Wno-return-type-c-linkage
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4190")
+  string(APPEND CMAKE_CXX_FLAGS " /wd4190")

   # Cython generates some bitshift expressions that MSVC does not like in
   # __Pyx_PyFloat_DivideObjC
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4293")
+  string(APPEND CMAKE_CXX_FLAGS " /wd4293")

   # Converting to/from C++ bool is pretty wonky in Cython. The C4800 warning
   # seem harmless, and probably not worth the effort of working around it
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4800")
+  string(APPEND CMAKE_CXX_FLAGS " /wd4800")

   # See https://github.com/cython/cython/issues/2731. Change introduced in
   # Cython 0.29.1 causes "unsafe use of type 'bool' in operation"
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4804")
+  string(APPEND CMAKE_CXX_FLAGS " /wd4804")
+
+  # See https://github.com/cython/cython/issues/4445.
+  #
+  # Cython 3 emits "(void)__Pyx_PyObject_CallMethod0;" to suppress a
+  # "unused function" warning but the code emits another "function
+  # call missing argument list" warning.
+  string(APPEND CMAKE_CXX_FLAGS " /wd4551")
 else()
   # Enable perf and other tools to work properly
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-omit-frame-pointer")
+  string(APPEND CMAKE_CXX_FLAGS " -fno-omit-frame-pointer")

   # Suppress Cython warnings
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-variable -Wno-maybe-uninitialized")
+  string(APPEND CMAKE_CXX_FLAGS " -Wno-unused-variable -Wno-maybe-uninitialized")

   if(CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang" OR CMAKE_CXX_COMPILER_ID STREQUAL
                                                     "Clang")
     # Cython warnings in clang
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-parentheses-equality")
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constant-logical-operand")
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-declarations")
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-sometimes-uninitialized")
+    string(APPEND CMAKE_CXX_FLAGS " -Wno-parentheses-equality")
+    string(APPEND CMAKE_CXX_FLAGS " -Wno-constant-logical-operand")
+    string(APPEND CMAKE_CXX_FLAGS " -Wno-missing-declarations")
+    string(APPEND CMAKE_CXX_FLAGS " -Wno-sometimes-uninitialized")

     # We have public Cython APIs which return C++ types, which are in an extern
     # "C" blog (no symbol mangling) and clang doesn't like this
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-return-type-c-linkage")
+    string(APPEND CMAKE_CXX_FLAGS " -Wno-return-type-c-linkage")
   endif()
 endif()

9 changes: 5 additions & 4 deletions python/pyarrow/_dataset.pyx
@@ -1075,10 +1075,11 @@ cdef class FileSystemDataset(Dataset):
             self.partition_expression
         )

-    @classmethod
-    def from_paths(cls, paths, schema=None, format=None,
-                   filesystem=None, partitions=None, root_partition=None):
-        """A Dataset created from a list of paths on a particular filesystem.
+    @staticmethod
+    def from_paths(paths, schema=None, format=None, filesystem=None,
+                   partitions=None, root_partition=None):
Member

Hmm... classmethod doesn't work correctly on Cython 3 anymore? Is it a known issue? Or is this change actually not needed?

Contributor Author

classmethod does work, I can change it back. I tried switching to staticmethod when numpydocs were failing, but it turns out it was the comment that was causing numpydoc parsing errors. I thought staticmethod was a slight improvement since cls wasn't actually used in the classmethod.

+        """
+        A Dataset created from a list of paths on a particular filesystem.

         Parameters
         ----------
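The review thread on `from_paths` turns on whether `cls` is used. As a rough illustration in plain Python (the `Dataset` and `SubDataset` classes and both method names below are hypothetical stand-ins, not pyarrow code), call sites look the same for either decorator, and the only observable difference appears when the method body relies on `cls`:

```python
class Dataset:
    """Hypothetical stand-in; not the pyarrow class."""

    def __init__(self, paths):
        self.paths = list(paths)

    @classmethod
    def from_paths_cm(cls, paths):
        # cls is whichever class the method was called on, so a
        # subclass gets an instance of itself.
        return cls(paths)

    @staticmethod
    def from_paths_sm(paths):
        # No implicit first argument; the class is named explicitly.
        return Dataset(paths)


class SubDataset(Dataset):
    pass


via_classmethod = SubDataset.from_paths_cm(["a", "b"])
via_staticmethod = SubDataset.from_paths_sm(["a", "b"])
```

When the body never touches `cls`, as the author notes for `from_paths`, the two forms behave the same for callers. The later commit titled "Add todo, revert back to classmethod" suggests the merged code kept `classmethod`.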
10 changes: 7 additions & 3 deletions python/pyarrow/_flight.pyx
@@ -988,8 +988,10 @@ cdef class _MetadataRecordBatchReader(_Weakrefable, _ReadPandasMixin):
     cdef shared_ptr[CMetadataRecordBatchReader] reader

     def __iter__(self):
-        while True:
-            yield self.read_chunk()
+        return self
+
+    def __next__(self):
+        return self.read_chunk()

     @property
     def schema(self):
@@ -1699,7 +1701,9 @@ cdef class FlightClient(_Weakrefable):

     def close(self):
         """Close the client and disconnect."""
-        check_flight_status(self.client.get().Close())
+        client = self.client.get()
+        if client != NULL:
+            check_flight_status(client.Close())

     def __del__(self):
         # Not ideal, but close() wasn't originally present so
2 changes: 1 addition & 1 deletion python/pyarrow/_substrait.pyx
@@ -29,7 +29,7 @@ from pyarrow.includes.libarrow_substrait cimport *

 cdef CDeclaration _create_named_table_provider(
     dict named_args, const std_vector[c_string]& names, const CSchema& schema
-):
+) noexcept:
Member

Are we sure this function cannot raise a Python exception? Its code is definitely non-trivial, so the "solution" here looks more like a bandaid.

In the interest of moving this forward, can you open a bug for this and we'll revisit later?

Contributor Author

This function executes user-defined Python code, which absolutely can raise an exception. In fact, the test cases specifically do raise an exception. The problem is that the C++ functionality explicitly chooses to ignore it and return its own error. The pyarrow binding is designed with this in mind, so adding noexcept mimics the expected behavior of previous Cython versions. I'll file an issue to request refactoring of this feature.

Contributor Author

Filed #37235

Contributor Author

I'll see if I can fix this in a separate PR first.

Member

> I'll see if I can fix this in a separate PR first.

It would certainly be nice to fix that, but FWIW I don't think that should hold up this PR even longer

Contributor Author

FYI I haven't attempted #37235. I hope it's okay to merge as-is given how long this PR has taken.

Member

Can you just add a comment pointing to GH-37235?

Contributor Author

Will do!

     cdef:
         c_string c_name
         shared_ptr[CTable] c_in_table
15 changes: 10 additions & 5 deletions python/pyarrow/includes/libarrow_flight.pxd
@@ -118,16 +118,16 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
         c_bool Equals(const CLocation& other)

         @staticmethod
-        CResult[CLocation] Parse(c_string& uri_string)
+        CResult[CLocation] Parse(const c_string& uri_string)

         @staticmethod
-        CResult[CLocation] ForGrpcTcp(c_string& host, int port)
+        CResult[CLocation] ForGrpcTcp(const c_string& host, int port)

         @staticmethod
-        CResult[CLocation] ForGrpcTls(c_string& host, int port)
+        CResult[CLocation] ForGrpcTls(const c_string& host, int port)

         @staticmethod
-        CResult[CLocation] ForGrpcUnix(c_string& path)
+        CResult[CLocation] ForGrpcUnix(const c_string& path)

     cdef cppclass CFlightEndpoint" arrow::flight::FlightEndpoint":
         CFlightEndpoint()
@@ -172,7 +172,9 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
         CResult[unique_ptr[CFlightInfo]] Next()

     cdef cppclass CSimpleFlightListing" arrow::flight::SimpleFlightListing":
-        CSimpleFlightListing(vector[CFlightInfo]&& info)
+        # This doesn't work with Cython >= 3
+        # CSimpleFlightListing(vector[CFlightInfo]&& info)
+        CSimpleFlightListing(const vector[CFlightInfo]& info)

     cdef cppclass CFlightPayload" arrow::flight::FlightPayload":
         shared_ptr[CBuffer] descriptor
@@ -310,7 +312,10 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
     cdef cppclass CCallHeaders" arrow::flight::CallHeaders":
         cppclass const_iterator:
             pair[c_string, c_string] operator*()
+            # For Cython < 3
             const_iterator operator++()
+            # For Cython >= 3
+            const_iterator operator++(int)
             bint operator==(const_iterator)
             bint operator!=(const_iterator)
         const_iterator cbegin()
15 changes: 8 additions & 7 deletions python/pyarrow/ipc.pxi
@@ -436,8 +436,10 @@ cdef class MessageReader(_Weakrefable):
         return result

     def __iter__(self):
-        while True:
-            yield self.read_next_message()
+        return self
+
+    def __next__(self):
+        return self.read_next_message()

     def read_next_message(self):
         """
@@ -656,11 +658,10 @@ cdef class RecordBatchReader(_Weakrefable):
     # cdef block is in lib.pxd

     def __iter__(self):
-        while True:
-            try:
-                yield self.read_next_batch()
-            except StopIteration:
-                return
+        return self
+
+    def __next__(self):
+        return self.read_next_batch()

     @property
     def schema(self):
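The reader classes in this diff move from generator-based `__iter__` methods to the plain iterator protocol: `__iter__` returns the object itself and `__next__` delegates to the read method. A minimal pure-Python sketch of that shape follows. The `BatchReader` class and its in-memory list are hypothetical; the assumption that the read method signals exhaustion by raising `StopIteration` is inferred from the `except StopIteration` in the removed code.

```python
class BatchReader:
    """Hypothetical reader; stands in for a class with a read_next_* method."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._pos = 0

    def read_next_batch(self):
        # Assumed contract: raise StopIteration once the stream is exhausted.
        if self._pos >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._pos]
        self._pos += 1
        return batch

    def __iter__(self):
        # The reader is its own iterator.
        return self

    def __next__(self):
        # StopIteration from the read method ends a for loop or list() call.
        return self.read_next_batch()


reader = BatchReader(["b0", "b1"])
collected = list(reader)
```

One consequence of this design is that iteration state lives in the reader itself, so calling `iter()` again does not restart the stream.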
4 changes: 2 additions & 2 deletions python/pyarrow/scalar.pxi
@@ -819,8 +819,8 @@ cdef class MapScalar(ListScalar):
         Iterate over this element's values.
         """
         arr = self.values
-        if array is None:
-            raise StopIteration
+        if arr is None:
+            return
         for k, v in zip(arr.field(self.type.key_field.name), arr.field(self.type.item_field.name)):
             yield (k.as_py(), v.as_py())

danepitkin marked this conversation as resolved.
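The `MapScalar.__iter__` fix replaces `raise StopIteration` with a bare `return` (and tests `arr` rather than `array`). The sketch below shows why that matters in plain Python 3.7 or later, where PEP 479 turns a `StopIteration` raised inside a generator body into a `RuntimeError`. The function names `items_old` and `items_new` are hypothetical, and the sketch makes no claim about how Cython-compiled generators behaved before this change.

```python
def items_old(arr):
    # Old shape: raising StopIteration inside a generator body.
    if arr is None:
        raise StopIteration
    for item in arr:
        yield item


def items_new(arr):
    # New shape: a bare return ends the generator cleanly.
    if arr is None:
        return
    for item in arr:
        yield item


try:
    list(items_old(None))
    old_outcome = "ok"
except RuntimeError:
    # PEP 479 converts the StopIteration into a RuntimeError.
    old_outcome = "RuntimeError"

new_outcome = list(items_new(None))
```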
6 changes: 5 additions & 1 deletion python/pyarrow/tests/test_dataset.py
@@ -1615,9 +1615,13 @@ def test_fragments_repr(tempdir, dataset):
     # partitioned parquet dataset
     fragment = list(dataset.get_fragments())[0]
     assert (
+        # Ordering of partition items is non-deterministic
         repr(fragment) ==
         "<pyarrow.dataset.ParquetFileFragment path=subdir/1/xxx/file0.parquet "
-        "partition=[key=xxx, group=1]>"
+        "partition=[key=xxx, group=1]>" or
+        repr(fragment) ==
+        "<pyarrow.dataset.ParquetFileFragment path=subdir/1/xxx/file0.parquet "
+        "partition=[group=1, key=xxx]>"
     )

     # single-file parquet dataset (no partition information in repr)
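The updated assertion accepts either ordering of the partition items. One alternative way to express the same check, offered only as a sketch and not as what the PR does, is to build the set of acceptable strings and test membership. The `observed` value below is a hard-coded placeholder; in the real test it would be `repr(fragment)`.

```python
from itertools import permutations

prefix = ("<pyarrow.dataset.ParquetFileFragment "
          "path=subdir/1/xxx/file0.parquet ")
parts = ["key=xxx", "group=1"]

# Every ordering of the partition items is an acceptable repr.
accepted = {
    f"{prefix}partition=[{', '.join(order)}]>"
    for order in permutations(parts)
}

# Placeholder for repr(fragment); hard-coded so the sketch runs standalone.
observed = prefix + "partition=[group=1, key=xxx]>"
```

With two items this is equivalent to the explicit `or` in the diff; the set form mainly helps if more partition fields are added later.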
4 changes: 4 additions & 0 deletions python/pyarrow/tests/test_scalars.py
@@ -700,6 +700,10 @@ def test_map(pickle_module):
     for i, j in zip(s, v):
         assert i == j

+    # test iteration with missing values
+    for _ in pa.scalar(None, type=ty):
+        pass
+
     assert s.as_py() == v
     assert s[1] == (
         pa.scalar('b', type=pa.string()),