Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gmpy2 can't parse valid Python number literals #210

Closed
skirpichev opened this issue Feb 21, 2019 · 11 comments
Closed

gmpy2 can't parse valid Python number literals #210

skirpichev opened this issue Feb 21, 2019 · 11 comments

Comments

@skirpichev
Copy link
Contributor

I expect, that any correct number, coming in str form must be accepted by gmpy2's mpz/mpq/mpfr.

Unfortunately, this is not the case. For example, underscores can't be used to group digits (spaces sometimes are accepted instead):

Python 3.7.2+ (default, Feb  2 2019, 14:31:48) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmpy2
>>> gmpy2.version()
'2.1.0a4'
>>> gmpy2.mpz('1 2 3')
mpz(123)
>>> int('1 2 3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1 2 3'
>>> int('1_2_3')
123
>>> gmpy2.mpz('1_2_3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid digits
>>> float('1.2_3_4')
1.234
>>> gmpy2.mpfr('1.2_3_4')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid digits
>>> gmpy2.mpfr('1.2 3 4')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid digits
@casevh
Copy link
Collaborator

casevh commented Feb 22, 2019

I'm testing a fix for mpz(). I'll fix mpfr() next. The use of spaces as separators is a GMP behavior, no MPFR. Underscores were introduces in Python 3.6.

@casevh
Copy link
Collaborator

casevh commented Mar 10, 2019

I've done some testing and it becomes complicated to exactly match match Python's handling of underscores in all cases. I tried the simple approach of just removing the underscore characters but then strings that are invalid for Python become valid in gmpy2. (int('1_2_') raises an exception in Python but returns mpz(12) with gmpy2.)

If the expectation is that valid Python strings are converted to the same value, then just deleting underscores should work, If all the expectation is that all invalid Python strings raise an exception, then I don't think that will ever be possible.

@skirpichev
Copy link
Contributor Author

skirpichev commented Mar 10, 2019 via email

@casevh
Copy link
Collaborator

casevh commented Mar 12, 2019

I will get back to this when I have more time (i.e. after the next release).

@slel
Copy link

slel commented Sep 16, 2020

This issue also came up in the SageMath project. See

for how this was dealt with there.

@skirpichev
Copy link
Contributor Author

@casevh, maybe calling int() on the string first - can be a solution?

@casevh
Copy link
Collaborator

casevh commented May 27, 2021

Refreshing my memory...

Calling int() would cause performance delays that I don't want to introduce.

Do we want to change the behavior for strings that contain embedded spaces (i.e. mpz("1 2 3") currently returns mpz(123))?

The simplest solution would be to just strip space and underscore characters. It wouldn't change the current embedded space behavior which is a side-effect of GMP and would accept valid strings that contain underscores. Rewriting the string pre-processor in C to follow Python's behavior exactly is more than I want to do right now.

@skirpichev
Copy link
Contributor Author

Calling int() would cause performance delays that I don't want to introduce.

This is true. But how common is such a path of initialization? Second, we can only try int()'s string parsing for a failure...

Do we want to change the behavior for strings that contain embedded spaces (i.e. mpz("1 2 3") currently returns mpz(123))?

Probably, not. The reason for this issue is to simplify replacement int <-> mpz. If the mpz() constructor will support all stuff the int()'s constructor does - it will be fine.

@skirpichev
Copy link
Contributor Author

Maybe we should patch gmplib/mpfr to support underscores?

@skirpichev
Copy link
Contributor Author

@skirpichev
Copy link
Contributor Author

Ok, @casevh, it seems you are happy with the current solution (which accept invalid numeric literals), then I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants