strings with special regex characters are causing a DeprecationWarning #143

Open
zormit opened this issue Apr 21, 2022 · 3 comments

zormit commented Apr 21, 2022

General information

  • SDK/Library version: 4.9.0 (I also checked the latest code and it's still an issue)
  • Environment: doesn't matter, I think
  • Language, language version, and OS: Python 3.8

Issue description

According to https://docs.python.org/3/library/re.html, some of the regex strings in the SDK might lead to syntax errors in future Python versions:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

For example:

virtualenv/lib/python3.8/site-packages/braintree/util/parser.py:13: DeprecationWarning: invalid escape sequence \s
self.doc = minidom.parseString("><".join(re.split(">\s+<", xml)).strip())
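The warning comes from Python's compiler, not from the re module, so it can be reproduced without the SDK. Below is a minimal sketch that compiles a one-line source string containing the same non-raw pattern and records what the interpreter reports. The names `source`, `outcome`, and `categories` are made up for this illustration. The exact category depends on the interpreter: 3.8 reports a DeprecationWarning as in the output above, newer versions report a SyntaxWarning, and a future version may raise SyntaxError.

```python
import warnings

# Source text containing the non-raw pattern from the warning above.
# The doubled backslash yields a single backslash in the compiled source,
# so the compiler sees the literal ">\s+<" with its invalid escape.
source = 'pattern = ">\\s+<"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    try:
        compile(source, "<demo>", "exec")
        outcome = "compiled"
    except SyntaxError:
        # Possible on future interpreters that reject invalid escapes outright.
        outcome = "syntax error"

categories = [w.category.__name__ for w in caught]
print(outcome, categories)
```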

I think the solution is just to convert all regex strings to the raw r"..." format, but I'm not sure about backwards compatibility.
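As a rough sketch of what that change would look like for the parser.py line quoted above: the raw literal and the double-backslash literal produce the same string, so the split behaves the same either way. The sample `xml` value is invented for this example, and the snippet is not the SDK's actual code. As far as I know, raw string literals are accepted by every Python 3 release, including 3.5.

```python
import re
from xml.dom import minidom

# Hypothetical sample input; the real SDK receives XML from API responses.
xml = "<a>\n  <b>text</b>\n</a>"

escaped = ">\\s+<"  # valid today, but easy to get wrong
raw = r">\s+<"      # same characters, no invalid escape sequence

# Both literals produce the identical three-character-class pattern string.
assert escaped == raw

# Same expression as in parser.py, with the raw literal swapped in.
collapsed = "><".join(re.split(raw, xml)).strip()
doc = minidom.parseString(collapsed)

print(collapsed)  # <a><b>text</b></a>
```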

@hollabaq86

👋 @zormit thanks for reaching out. I'm marking this for the next major version of the SDK so that it stays on our radar. If it's a fix that doesn't break compatibility with Python 3.5, we'll do our best to get it into the current major version of the SDK (v4).


Montana commented May 17, 2022

The solution you proposed would mean that both of the following strings are stored identically in memory, with no record of whether they were written as raw literals. For example:

r'a regex digit: \d'  # a regex digit: \d
'a regex digit: \\d'  # a regex digit: \d

Both of these strings contain \d, and nothing records that one of them came from a raw string literal. When you pass either string to the re module, it sees \d and treats it as a digit class, because the re module cannot know how the literal was written.
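A quick way to check this claim, as a sketch: compare the two literals directly and run one of them through re. The names `raw`, `escaped`, and `match` are only for illustration.

```python
import re

raw = r'a regex digit: \d'
escaped = 'a regex digit: \\d'

# The two literals evaluate to the same str value.
assert raw == escaped

# The compiled patterns carry the same pattern text, with no trace of the literal style.
assert re.compile(raw).pattern == re.compile(escaped).pattern

# Either one matches text where \d stands for a single digit.
match = re.search(escaped, "a regex digit: 7")
print(match is not None)
```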

There is a case for proceeding with caution, and I agree with @hollabaq86.

@hollabaq86

For internal tracking: ticket 2054.
