Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use re2 instead of re for match expressions #67

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ddn0
Copy link
Collaborator

@ddn0 ddn0 commented May 24, 2024

CEL specifies that match should use re2 pattern matching and not re semantics.

Closes #58

Copy link

linux-foundation-easycla bot commented May 24, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ddn0 / name: Donald Nguyen (100ea21)

Copy link
Collaborator

@slott56 slott56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require a comment to bypass mypy type hint checking.

@@ -39,6 +39,7 @@
import logging
import operator
import re
import re2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing mypy errors. I think this needs a # type: ignore [import-untyped] to ignore the lack of type hints.

This module may be a little too immature if it lacks type hints. Is there another choice?

py38: commands[6]> poetry run mypy --show-error-codes src
src/celpy/evaluation.py:42: error: Skipping analyzing "re2": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src/celpy/evaluation.py:42: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 9 source files)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on re2 libraries, but in the interest of showing my work, here's what I found about the top results when searching for "re2 python":

I don't know how much to weigh the Snyk scores but they do give another data point in addition to the most recent commit date.

My thinking is that google-re2 is the best of the bunch.

Regarding mypy, the various re2 packages are intended to be drop-in replacements for re so they should have compatible interfaces. I don't know how to tell mypy to use the same typestubs as re for re2. If you happen to know if there is an easy way to do that, please let me know.

Otherwise, I can add support in this repo for typestubs just for the portion of google-re2 that we use, using https://mypy.readthedocs.io/en/stable/stubgen.html.

Thoughts?

Copy link
Collaborator

@slott56 slott56 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid research. One that has type hints is the bare minimum for me. After that, you've ranked them nicely.

The standard library re package relies on the typeshed stub definitions. (https://github.com/python/typeshed/blob/main/stdlib/re.pyi) With a name change, this file could be repurposed to attempt to cover an RE2 implementation without hints.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've drafted some typestubs and now I'm checking to see if google-re2 will take them: google/re2#496

If not, I'm inclined to just publish them separately (e.g., google-re2-stubs) and they can be pulled into this project.

I can also just vendor in the typestubs just into this project, but that doesn't seem as community-oriented as just publishing typestubs for every one.

@junyer
Copy link

junyer commented Jun 1, 2024

CEL specifies that match should use re2 pattern matching and not re semantics.

Although it's obvious that the language definition specifies RE2 syntax for regular expressions, it's possibly less obvious that all of the aforementioned Python wrappers are precisely that: Python wrappers... around RE2, the C++ library. Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise.

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 3, 2024

Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise.

I don't have much context about the north star of "pure Python and minimal dependencies", but if we want to maintain that, I can add the re2 capability as a package extra.

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 6, 2024

@slott56 ready for re-review.

Some questions: when I ran black or ruff I get a bunch of unrelated formatting changes which makes me think my configuration is wrong somehow. black isn't mentioned in pyproject.toml so maybe there is a difference between my local version and the one that has been used on the repo so far?

Anyway, I manually applied a style which I hope is compatible with the repo style, but let me know if you have any guidance on checking the formatting style automatically.

@slott56
Copy link
Collaborator

slott56 commented Jun 7, 2024

Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it.

Copy link
Collaborator

@slott56 slott56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow is this good. I have a few small requests.

@@ -44,6 +44,10 @@ python-dateutil = "^2.9.0.post0"
pyyaml = "^6.0.1"
types-pyyaml = "^6.0.12.20240311"
types-python-dateutil = "^2.9.0.20240316"
google-re2 = { version = "^1.0", optional = true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Excellent. As is the presence of the stubs.

import re2
except ImportError:
pass
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit convoluted. Can we put the _USE_RE2 = True after the import and lose this else: block?

return celpy.celtypes.BoolType(m is not None)


def function_matches(text: str, pattern: str) -> Result:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we hoist the if outside this function?

if _USE_RE2:
    def function_matches(...
        ... using _function_matches_re2()
else:
    def function_matches(...
        ... using _function_matches_re()

The idea is to avoid evaluating the if statement more than once at import time.

@@ -158,6 +158,13 @@ def test_operator_in():
assert isinstance(operator_in(celtypes.IntType(-1), container_2), CELEvalError)


def test_function_matches():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to assume that RE2 is installed as part of the testing. I'm not sure that's best. I may be wrong about this, but I can imagine someone declining the RE2 add-on and then being unable to test CEL.

Maybe there should be a @pytest.skipif() option to test the two low-level functions depending on the _USE_RE2 global, as well as a separate generic test of function_matches()?

(I love the test of an RE2-specific feature, that's fabulous!)

Copy link
Collaborator Author

@ddn0 ddn0 Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone testing cel-python will have to install google-re2-stubs if they want to type-check the code and ...

google-re2-stubs depends on google-re2 because that is the way [1] for it to indicate which versions of google-re2 it is compatible with and also that it is intended to be a stub for the google-re2 distribution of re2 and not one of the other distributions of re2 which all claim to be "drop-in" replacements for re but in practice have slightly different APIs.

So, if we need to type-check we need the stubs and installing the stubs installs the implementation, and we cannot conditionally install the stubs because the code will not type-check unless we install the stubs.

I happen to be the maintainer of google-re2-stubs so I can obviously drop the dependency on google-re2 so that cel-python can install the stubs without installing google-re2 but given the weirdness in the re2 ecosystem noted above, I think that is not optimal for other reasons.

I can't think of a good way to conditionally type-check that isn't just the same as not type-checking a file at all. If you can think of a way, I'm happy to add it to the PR.

References

[1]

In addition, stub-only distributions MAY indicate which version(s) of the runtime package are targeted by indicating the runtime distribution’s version(s) through normal dependency data. For example, the stub package flyingcircus-stubs can indicate the versions of the runtime flyingcircus distribution it supports through dependencies field in pyproject.toml.

https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages

@ddn0
Copy link
Collaborator Author

ddn0 commented Jun 7, 2024

Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it.

Cool. If it helps, in my other projects, we transitioned from black to ruff format because we were already using ruff for linting and the standard ruff format configuration is supposed to be black compatible, so one less tool in the toolchain. ruff format optionally formats code examples in docstrings, which was relevant for my other projects.

CEL specifies that match should use re2 pattern matching and not re
semantics.
@ddn0
Copy link
Collaborator Author

ddn0 commented Jul 16, 2024

@slott56 ping. Any changes you want to request?

@kapilt
Copy link
Collaborator

kapilt commented Jul 24, 2024

@ddn0 thanks for the updates and making a re2 stub package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for RE2
4 participants