[BUG] unicode.split does not allow to pass None for sep #4737

@navytux

Describe the bug
I'm hitting a difference in behaviour between CPython and Cython for unicode.split: under Cython, passing sep=None explicitly raises TypeError. Details are below.

To Reproduce
Code to reproduce the behaviour:

---- 8< ---- usplit.pyx

# cython: language_level=3

def mysplit(q):
    return unicode.split(q, None)

print(mysplit("hello world"))

Expected behavior

I expect it to behave the same as in Python - i.e. print ['hello', 'world']:

---- 8< ---- usplit_py.py

def mysplit(q):
    return str.split(q, None)

print(mysplit("hello world"))

$ python usplit_py.py
['hello', 'world']
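
For reference, CPython treats an explicit sep=None the same as omitting sep: runs of whitespace act as a single separator, and leading and trailing whitespace is dropped. A minimal pure-Python check of that equivalence (the sample strings are only illustrative):

```python
# Pure-Python reference behaviour; the sample inputs are illustrative only.
samples = ["hello world", "  hello   world  ", "a\tb\nc"]

for s in samples:
    # An explicit None and an omitted sep give the same result in CPython.
    assert str.split(s, None) == s.split()

print(str.split("  hello   world  ", None))
```

This is the behaviour the compiled module would be expected to match.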

However, what I get instead is an exception saying that None cannot be used for sep:

$ cythonize -i usplit.pyx 
Compiling /home/kirr/usplit.pyx because it changed.
[1/1] Cythonizing /home/kirr/usplit.pyx
running build_ext
building 'usplit' extension
creating /home/kirr/tmp3kckc5wa/home
creating /home/kirr/tmp3kckc5wa/home/kirr
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/kirr/src/wendelin/venv/py3.venv/include -I/usr/include/python3.9 -c /home/kirr/usplit.c -o /home/kirr/tmp3kckc5wa/home/kirr/usplit.o
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-z,relro -g -fwrapv -O2 -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 /home/kirr/tmp3kckc5wa/home/kirr/usplit.o -o /home/kirr/usplit.cpython-39-x86_64-linux-gnu.so
$ python -c 'import usplit'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "usplit.pyx", line 6, in init usplit
    print(mysplit("hello world"))
  File "usplit.pyx", line 4, in usplit.mysplit
    return unicode.split(q, None)
TypeError: must be str, not NoneType
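
Until this is fixed, one possible workaround is to avoid passing None through unicode.split at all and to call the bound method instead. The sketch below is plain Python and has not been verified against Cython 0.29.27; mysplit_compat is a hypothetical name, and it assumes that a method call on an untyped object goes through the regular Python call path when compiled:

```python
# Hypothetical workaround sketch; mysplit_compat is an illustrative name.
# Assumption: q is an untyped object, so q.split(...) is an ordinary
# Python method call rather than the unicode.split(q, None) form above.
def mysplit_compat(q, sep=None):
    if sep is None:
        # Omit sep entirely instead of passing None explicitly.
        return q.split()
    return q.split(sep)


print(mysplit_compat("hello world"))
```

In pure Python this returns the same list as str.split(q, None).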

Environment (please complete the following information):

  • OS: [Debian GNU/Linux 11]
  • Python version [e.g. 3.9.2]
  • Cython version [e.g. 0.29.27]

Thanks beforehand,
Kirill
