improvements #33

Open
wants to merge 130 commits into
base: master

Conversation

andreasvc
Collaborator

Summary of changes:

  • Python 3 compatibility
  • improve unicode handling
  • implement finditer as generator
  • support named groups in replacement strings
  • support substitutions with count > 1
  • re-organize code in several files
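A minimal sketch of three of the behaviours summarised above (named groups in replacement strings, a substitution count greater than 1, and lazy iteration with finditer). It uses the standard-library `re` module as a stand-in, on the assumption that re2 is meant to accept the same calls; the example strings are made up for illustration.

```python
import re

# Stand-in: stdlib `re`. That re2 gives identical results is an assumption.
pattern = re.compile(r"(?P<first>\w+) (?P<second>\w+)")

# Named groups referenced from the replacement string via \g<name>.
swapped = pattern.sub(r"\g<second> \g<first>", "hello world")

# count limits the number of substitutions (here: only the first two matches).
limited = re.sub(r"a", "-", "aaaa", count=2)

# finditer yields Match objects one at a time instead of building a list.
spans = [m.span() for m in re.finditer(r"\d+", "a1bb22ccc333")]

print(swapped, limited, spans)
```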

- Python 2/3 compatibility
- support searching in buffer objects (e.g., mmap)
- add module docstring
- some refactoring
- remove outdated Cython-generated file
- modify setup.py to cythonize as needed.
- properly translate pos, endpos indices with unicode
- keep original unicode string in Match objects
- separate compile.pxi file
- use new buffer API
  NB: even though the old buffer interface is deprecated since Python 2.6,
  the new buffer interface is only supported on mmap starting from
  Python 3.
- avoid creating Match objects in findall()
- precompute groups and spans of Match objects, so that possibly encoded
  version of search string (bytestr / cstring) does not need to be kept.
- in _make_spans(), keep state for converting utf8 to unicode indices;
  so that there is no quadratic behavior on repeated invocations for
  different Match objects.
- release GIL in pattern_Replace / pattern_GlobalReplace
- prepare_pattern: loop over pattern as char *
- advertise Python 3 support in setup.py, remove python 2.5
- support pickling of Pattern objects
- support buffers from objects that do not support char buffer (e.g.,
  integer arrays); does not make a lot of sense, but this is what re does.
- enable benchmarks shown in readme by default; fix typo.
- fix typo in test_re.py
- handle named groups in replacement string
- store index of named groups in Pattern object instead of Match object.
- use bytearray for result in _subn_callback
- when running under Python 3+, reject unicode patterns on
  bytes data, and vice versa, in accordance with general Python 3 behavior.
- improve Match.expand() implementation.
- The substitutions by RE2 behave differently from Python (character escapes,
  named groups, etc.), so use Match.expand() for anything but simple literal
  replacement strings.
- make groupindex of pattern objects public.
- add Pattern.fullmatch() method.
- use #define PY2 from setup.py instead of #ifdef hack.
- debug option for compilation.
- use data() instead of c_str() on C++ strings, and always supply length,
  so that strings with null characters are supported.
- bump minimum cython version due to use of bytearray typing
- adapt tests to Python 3; add b and u string prefixes where needed, &c.
- update README
- add count method, equivalent to len(findall(...))
- use arrays in utf8indices
- tweak docstrings
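Several items in the list above describe behaviour that is modelled on the standard `re` module. The sketch below shows that reference behaviour with stdlib `re` only; the count() method itself is specific to this PR, so it appears here only through the stated equivalence len(findall(...)). The sample data is hypothetical.

```python
import mmap
import re
import tempfile

pattern = re.compile(rb"ab+")
data = b"ab abb abbb"

# The commit message describes count() as equivalent to len(findall(...)).
n_matches = len(pattern.findall(data))

# fullmatch() succeeds only when the entire string matches the pattern.
full_ok = re.fullmatch(r"ab+", "abbb") is not None
full_bad = re.fullmatch(r"ab+", "abbbc") is None

# Searching a buffer object (an mmap over a temporary file).
with tempfile.TemporaryFile() as fh:
    fh.write(data)
    fh.flush()
    with mmap.mmap(fh.fileno(), 0) as mm:
        first_span = pattern.search(mm).span()

# Under Python 3, a str pattern applied to bytes data raises TypeError.
try:
    re.search(r"ab+", data)
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```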
@axiak
Owner

axiak commented Nov 20, 2015

this looks good, I'm going to play around with this over the weekend

- add reference of supported syntax to main docstring
- add __all__ attribute defining public members
- add re's purge() function
- add tests for count method
- switch order of prepare_pattern() and _compile()
- rename prepare_pattern() to _prepare_pattern() to signal that it is
  semi-private
andreasvc and others added 10 commits April 27, 2016 01:05
- Fix bug causing zero-length matches to be returned multiple times
- Use Latin 1 encoding with RE2 when unicode not requested
- Ensure memory is released:
  - put del calls in finally blocks
  - add missing del call for 'matches' array
- Remove Cython hacks for C++ that are no longer needed;
  use const keyword that has been supported for some time.
  Fixes Cython 0.24 compilation issue.
- Turn _re2.pxd into includes.pxi.
- remove some tests that are specific to internal Python modules _sre and sre
From Python 3.5 onwards, sub() and subn() replace unmatched groups with
empty strings. See:

https://docs.python.org/3/whatsnew/3.5.html#re

This change removes the 'unmatched group' error which occurs when using
re2.
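For reference, the Python 3.5+ behaviour that this change follows can be seen with the standard `re` module. In the hypothetical pattern below, each match sets only one of the two groups, and the unmatched group is replaced by an empty string instead of raising an error.

```python
import re

# Each match sets either group 1 or group 2, never both.
# On Python 3.5+ the unmatched group expands to an empty string.
result = re.sub(r"(a)|(b)", r"[\1\2]", "ab")
print(result)  # [a][b]
```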
Ignore non-matched groups when replacing with sub
Fix groupdict decode bug
@andreasvc
Collaborator Author

Hi @axiak
It seems you're no longer maintaining pyre2; perhaps I could take over? I think this mostly involves handing over the pypi package. Let me know if you want to do this.
Cheers

andreasvc and others added 30 commits December 20, 2022 18:55
…istent across versions/platforms, maybe the test should be disabled altogether. #27
this prevents a cached regular expression being used that was created
with a different notification level.

For example, the following now generates the expected warning:

    In [1]: import re2
    In [2]: re2.compile('a*+')
    Out[2]: re.compile('a*+')
    In [3]: re2.set_fallback_notification(re2.FALLBACK_WARNING)
    In [4]: re2.compile('a*+')
    <ipython-input-5-041122e221c7>:1: UserWarning: WARNING: Using re module. Reason: bad repetition operator: *+
      re2.compile('a*+')
    Out[4]: re.compile('a*+')
* update pybind11 usage and set cmake python vars to Title_CASE
* refactor cmake extension build to use pybind11 module bits
* move emptygroups test from "differences"

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
* cleanup asserts and add groups() test

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
* refactor setup.py after pybind11 upstream changes

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
* cleanup ci workflow, remove crufty makefile with deprecated
  setup.py commands
* remove the package_dir bit from setup.py

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
* check if find_package py3 works across all CI runners

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
* no epel pkgs for linux aarch64, enable PYBIND11_FINDPYTHON
* set macos deployment target to 10.9

Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
Signed-off-by: Steve Arnold <sarnold@vctlabs.com>
…2024.07.02

Signed-off-by: Stephen L Arnold <nerdboy@gentoo.org>
* revert to macos-13 with the same version as target
* in theory this should get us full C++17

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
… cfg

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
* split all runners into separate arch via matrix
* macos does need macos-14 to get a proper arm64 build

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
* this is essentially a workaround for non-pypi pkg cruft

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
* also cleanup the wheel artifact check, download to artifacts/

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
CI, python, and cmake cleanup
* update .gitignore and .gitchangelog.rc and (re)generate new changelog
* add sphinx docs build using apidoc extension and readme/changelog
  symlinks
* rst apidoc modules are auto-generated and are in .gitignore
  along with the generated html dir
* add dependencies to packaging and add docs/changes cmds to tox file.
  Includes a tox extension for shared tox environments; the new tox
  commands are an example of this => 4 cmds using one tox env

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
* cleanup docs config, remove dicey sphinx_git extension
* switch readme badge, download wheel artifacts to single directory

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
* also cleanup sphinx workflow

Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
Signed-off-by: Stephen Arnold <nerdboy@gentoo.org>
add basic sphinx docs build using apidoc

8 participants