
[CFP] Implement the _sre module in Rust #2258

Closed
coolreader18 opened this issue Oct 1, 2020 · 6 comments
Labels
C-compat A discrepancy between RustPython and CPython E-help-wanted Extra attention is needed

Comments

@coolreader18
Member

coolreader18 commented Oct 1, 2020

At the moment, _sre is a modified version of _sre.py from an old version of PyPy: https://github.com/nikhaldi/_sre.py.

This is s l o w. If you Ctrl-C RustPython while it's running a decently complex script, there's a good chance the traceback from the KeyboardInterrupt will originate somewhere in _sre.py. It's definitely possible to implement _sre in Rust (I was able to modify the constant-generation script to output Rust instead of C code), but neither the C nor the Python version of _sre lends itself to a simple, direct translation into Rust. The C version (_sre.c/sre_lib.h) uses lots of pointer arithmetic and a kind of "as long as we own an active reference to the str object, we can send pointers anywhere we want" approach that doesn't jibe well with Rust's memory-safety rules. The Python version (Lib/_sre.py in the RustPython source tree) uses enough nested references (context.x_stack[n].context is context), generator functions, and dynamic typing that it's also tricky to translate to Rust.
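One possible way around the pointer-arithmetic problem, sketched here as an assumption rather than taken from the _sre-wip scaffold: keep the subject string as a borrowed slice and track the current position and group marks as `usize` indices, so bounds-checked indexing replaces raw pointer moves and the borrow checker enforces the "we own a reference to the str" invariant. The `State` type, its fields, and the `eat`/`set_mark`/`group`/`matched` methods are hypothetical names, only loosely modeled on sre's matching state.

```rust
/// Hypothetical matching state that refers to the subject by index, not by pointer.
struct State<'a> {
    text: &'a [char],          // borrowed subject; the borrow keeps it alive
    start: usize,              // where this match attempt began
    pos: usize,                // current position (replaces `ptr` arithmetic)
    marks: Vec<Option<usize>>, // group boundaries stored as indices
}

impl<'a> State<'a> {
    fn new(text: &'a [char], start: usize, groups: usize) -> Self {
        // Two mark slots (begin, end) per capture group.
        State { text, start, pos: start, marks: vec![None; groups * 2] }
    }

    /// Consume `c` if it is the next character; `get` does the bounds check.
    fn eat(&mut self, c: char) -> bool {
        if self.text.get(self.pos) == Some(&c) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Record the current position in mark slot `i`.
    fn set_mark(&mut self, i: usize) {
        self.marks[i] = Some(self.pos);
    }

    /// Text captured by group `g`, if both of its marks are set.
    fn group(&self, g: usize) -> Option<String> {
        match (self.marks[2 * g], self.marks[2 * g + 1]) {
            (Some(s), Some(e)) if s <= e => Some(self.text[s..e].iter().collect()),
            _ => None,
        }
    }

    /// Everything consumed since this attempt started.
    fn matched(&self) -> &[char] {
        &self.text[self.start..self.pos]
    }
}

fn main() {
    let text: Vec<char> = "xab".chars().collect();
    // Start matching at index 1 with one capture group (two mark slots).
    let mut st = State::new(&text, 1, 1);
    st.set_mark(0);
    let ok = st.eat('a') && st.eat('b');
    st.set_mark(1);
    println!("{} {:?} {:?}", ok, st.group(0), st.matched()); // true Some("ab") ['a', 'b']
}
```

Indices are cheap to copy and save, which matters for backtracking: restoring an earlier position is just assigning a `usize`, with no lifetime juggling.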

If anyone wants to take a stab at this, I've left a scaffold of the implementation in the _sre-wip branch, which has the SRE constants available to Rust and some of my sketches for how it might work, but no real functionality. Be warned: whichever implementation you choose to emulate, it's tricky, so feel free to try whichever approach you think would be easier.

@coolreader18 coolreader18 added the C-compat A discrepancy between RustPython and CPython label Oct 1, 2020
@darleybarreto

pyre2 and Jython's _sre [1,2] might help with this.

@coolreader18 coolreader18 added the E-help-wanted Extra attention is needed label Oct 22, 2020
@qingshi163
Contributor

Should we follow CPython and compile to bytecode? I haven't looked deeply into the code yet, but could we compile to a Vec? Is there anything that might be tricky?

@darleybarreto
Copy link

I think doing something like pyre2 would be better, but with the regex crate as the backend instead of re2.

@coolreader18
Copy link
Member Author

@darleybarreto Unfortunately, we can't use the regex crate, as it doesn't support lookbehinds and all the other fancy features that sre has.
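For context on the lookbehind point: Python's re only accepts fixed-width lookbehind patterns, and sre roughly checks such an assertion by stepping back by that fixed width and then matching the subpattern forward from there. Below is a minimal sketch of that idea, restricted to a literal subpattern; `lookbehind_literal` is a hypothetical helper, not part of any existing crate or of the scaffold.

```rust
/// Hypothetical helper: does the literal `lit` end exactly at `pos`?
/// Mirrors the "step back by the fixed width, then match forward" idea.
fn lookbehind_literal(text: &[char], pos: usize, lit: &[char]) -> bool {
    if pos > text.len() {
        return false; // position outside the subject
    }
    // Step back by the assertion's width; fail if that goes before the start.
    match pos.checked_sub(lit.len()) {
        Some(back) => &text[back..pos] == lit,
        None => false,
    }
}

fn main() {
    let text: Vec<char> = "cost $5".chars().collect();
    // Positions matched by the equivalent of `(?<=\$)\d`.
    let hits: Vec<usize> = (0..text.len())
        .filter(|&i| text[i].is_ascii_digit() && lookbehind_literal(&text, i, &['$']))
        .collect();
    println!("{:?}", hits); // [6]
}
```

A negative lookbehind is the same check negated, and the fixed-width restriction is exactly what makes the single step back well defined.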

@qingshi163, I'm not sure exactly what you're asking, but the Python code in the sre_compile module compiles the regex to bytecode; the _sre module "just" has to interpret it. I think the scaffold already collects the bytecode into a Vec<u32>.
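To illustrate what "just interpreting" a `Vec<u32>` could look like, here is a toy backtracking interpreter. The opcode names and numbers, and the `SPLIT`/`JUMP` encoding with absolute targets, are made up for this sketch and do not match SRE's real instruction set, which has its own constants and encoding. Backtracking points live on an explicit stack of `(pc, pos)` pairs instead of in recursion or generators.

```rust
// Hypothetical opcodes for this sketch only; NOT the real SRE constants.
const SUCCESS: u32 = 1;
const ANY: u32 = 2;     // ANY: consume any one character
const LITERAL: u32 = 3; // LITERAL <codepoint>
const SPLIT: u32 = 4;   // SPLIT <first> <second>: try `first`, backtrack to `second`
const JUMP: u32 = 5;    // JUMP <target>

/// Match `code` anchored at the start of `text`; return the end index on success.
fn run(code: &[u32], text: &[char]) -> Option<usize> {
    // Explicit backtracking stack of (pc, pos) pairs.
    let mut stack: Vec<(usize, usize)> = vec![(0, 0)];
    while let Some((mut pc, mut pos)) = stack.pop() {
        loop {
            match code[pc] {
                SUCCESS => return Some(pos),
                ANY => {
                    if pos < text.len() {
                        pos += 1;
                        pc += 1;
                    } else {
                        break;
                    }
                }
                LITERAL => {
                    if text.get(pos).map(|&c| c as u32) == Some(code[pc + 1]) {
                        pos += 1;
                        pc += 2;
                    } else {
                        break;
                    }
                }
                SPLIT => {
                    // Remember the alternative, then try the preferred branch.
                    stack.push((code[pc + 2] as usize, pos));
                    pc = code[pc + 1] as usize;
                }
                JUMP => pc = code[pc + 1] as usize,
                _ => break, // failure or unknown opcode: abandon this thread
            }
        }
    }
    None
}

fn main() {
    // Greedy `a*b`: 0 SPLIT 3 7 | 3 LITERAL 'a' | 5 JUMP 0 | 7 LITERAL 'b' | 9 SUCCESS
    let code = [SPLIT, 3, 7, LITERAL, 'a' as u32, JUMP, 0, LITERAL, 'b' as u32, SUCCESS];
    let text: Vec<char> = "aaab".chars().collect();
    println!("{:?}", run(&code, &text)); // Some(4)
}
```

A real interpreter would also need marks for groups, repeat counters, and a guard against loops whose body can match the empty string (e.g. `(a*)*`), which this sketch could spin on forever.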

@darleybarreto

> @darleybarreto Unfortunately, we can't use the regex crate, as it doesn't support lookbehinds and all the other fancy features that sre has.

Hmm, do you know which of sre's fancy features the regex crate can't handle? Depending on what those are, we could simply "avoid" them for as long as we can.

@darleybarreto

Or perhaps fancy-regex.
