[BUG] Package include paths containing literal glob patterns cause exception #5942

Closed
eewanco opened this issue Jan 12, 2024 · 1 comment

eewanco commented Jan 12, 2024

Describe the bug

If a package include directory path contains literal glob metacharacters, for example a directory name in brackets that happens to be invalid glob syntax, Cython fails during compilation with a regular expression exception (re.error).

The function find_versioned_file in Cython/Utils.py looks for versioned files by appending a glob pattern to a file path and calling Python's glob.glob. However, glob.glob also interprets any glob-like patterns within the path itself: it raises an exception if they are invalid, and at best fails to match the intended directory. This is readily apparent by inspection (line 317, tag 3.0.8):

    path_prefix = os.path.join(directory, filename)
    matching_files = glob.glob(path_prefix + ".cython-*" + suffix)

If directory is /tmp/src/[Hello-World], filename is pair, and suffix is .pxd, then the attempt to glob /tmp/src/[Hello-World]/pair.cython-*.pxd fails:

>>> path_prefix='/tmp/src/[Hello-World]/pair.cython-*.pxd'
>>> glob.glob(path_prefix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/glob.py", line 21, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.8/glob.py", line 73, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.8/glob.py", line 74, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.8/glob.py", line 85, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.8/fnmatch.py", line 52, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.8/fnmatch.py", line 46, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.8/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range o-W at position 9
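
For anyone who wants to check this without renaming a real source tree, here is a minimal, self-contained sketch of the failure. It assumes only the two lines quoted from Utils.py above; the helper name `lookup_unescaped` and the file name `pair.cython-30.pxd` are made up for illustration. On Python 3.8 the lookup raises re.error as in the traceback. Some newer Python releases tolerate the reversed range, in which case the lookup should return an empty list and silently miss the file.

```python
import glob
import os
import re
import tempfile


def lookup_unescaped(directory, filename, suffix):
    # Hypothetical stand-in for the two lines quoted from Utils.py:
    # the directory part is passed to glob.glob without any escaping.
    path_prefix = os.path.join(directory, filename)
    return glob.glob(path_prefix + ".cython-*" + suffix)


with tempfile.TemporaryDirectory() as root:
    # Directory whose name looks like a (bad) glob character class.
    include_dir = os.path.join(root, "[Hello-World]")
    os.mkdir(include_dir)
    # Illustrative versioned file that the lookup is supposed to find.
    with open(os.path.join(include_dir, "pair.cython-30.pxd"), "w"):
        pass

    try:
        outcome = lookup_unescaped(include_dir, "pair", ".pxd")
    except re.error as exc:
        # Raised where fnmatch compiles "[Hello-World]" into a regex.
        outcome = exc

print(repr(outcome))
```

Either way the existing file is not found, which is the bug described here.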

Code to reproduce the behaviour:

The easiest way to reproduce this is to rename your Cython development directory to [Hello-World] and run python3 runtests.py; it will immediately fail on the self-same error.

Expected behaviour

The include directory should be searched successfully for versioned import files, and no exception should be raised.

OS

Ubuntu Linux 20.04 LTS

Python version

3.8.10

Cython version

3.0.8

Additional context

For a fix, there are a few considerations here.

One way is to escape the troublesome characters; in principle, you can do so by enclosing each one in brackets, and Python 3.4 even introduced a function for this, glob.escape:

Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> glob.escape('/tmp/src/[Hello-World]/')
'/tmp/src/[[]Hello-World]/'
>>> d=glob.escape('/tmp/src/[Hello-World]/')
>>> glob.glob(d+"*")
['/tmp/src/[Hello-World]/pair.pxd']
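
Applied to the lookup quoted earlier, the idea would be to escape only the path prefix and leave the appended `.cython-*` pattern active. The following is a rough sketch under that assumption, not the actual patch; `find_versioned_candidates` and the file name `pair.cython-30.pxd` are hypothetical names used only for this example.

```python
import glob
import os
import tempfile


def find_versioned_candidates(directory, filename, suffix):
    # Hypothetical helper, not Cython's actual code.
    # Escape the literal part of the path so that '[', '*' and '?' in
    # directory or file names are matched as plain characters ...
    path_prefix = glob.escape(os.path.join(directory, filename))
    # ... and only the wildcard appended here is treated as a pattern.
    return glob.glob(path_prefix + ".cython-*" + suffix)


with tempfile.TemporaryDirectory() as root:
    include_dir = os.path.join(root, "[Hello-World]")
    os.mkdir(include_dir)
    expected = os.path.join(include_dir, "pair.cython-30.pxd")
    with open(expected, "w"):
        pass

    found = find_versioned_candidates(include_dir, "pair", ".pxd")

print(found)
```

Escaping before concatenation matters: calling glob.escape on the full pattern would also neutralise the `*` that is meant to match the version tag.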

But Cython has to be backwards-compatible with Python versions before 3.4, so for now Cython would have to do the escaping itself.
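
Doing the escaping by hand could look roughly like the sketch below. It is an assumption about one possible fallback, not the code that was eventually merged: `escape_glob` is a hypothetical name, and unlike glob.escape it does not treat Windows drive or UNC prefixes specially and only handles text strings.

```python
import re

# The three characters that glob treats as special: '*', '?' and '['.
_GLOB_MAGIC = re.compile(r"([*?\[])")


def escape_glob(pathname):
    """Wrap each glob metacharacter in brackets so it matches literally."""
    # '[' becomes '[[]', '*' becomes '[*]', '?' becomes '[?]'.
    # A closing ']' needs no escaping once its opening '[' is escaped.
    return _GLOB_MAGIC.sub(r"[\1]", pathname)


print(escape_glob('/tmp/src/[Hello-World]/'))
# prints /tmp/src/[[]Hello-World]/
```

For the example path this gives the same result as the glob.escape call shown above.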

This seems obscure, but it's got much wider implications than are immediately apparent.

eewanco added a commit to eewanco/cython that referenced this issue Jan 19, 2024
If your build environment uses literal globbing characters, Cython 3.0
might fail, because `find_versioned_file` in `Utils.py` was slapping a
`*` in the path and expecting it to properly glob. This fails in
certain large build environments that use, for example, brackets in
system paths. Escape paths from sys.path and other sources before
looking for different versions of collateral files.

Closes GitHub issue [cython#5942](cython#5942)

eewanco commented Jan 19, 2024

The reproduction instructions are wrong; they trigger the same error but from a different root cause. To trigger this root cause, from the development directory do:

PYTHONPATH="[Hello-World]" python3 runtests.py

eewanco added a commit to eewanco/cython that referenced this issue Jan 25, 2024
da-woods pushed a commit that referenced this issue Feb 18, 2024
Closes GitHub issue [#5942](#5942)
scoder added this to the 3.0.9 milestone Feb 19, 2024
scoder closed this as completed in f2954a0 Feb 27, 2024