Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

% string formatting with multiple inputs via of tuples doesn't match python3.5 behavior #3092

Closed
dannyjeck opened this issue Aug 22, 2019 · 8 comments · Fixed by #3589
Closed

Comments

@dannyjeck
Copy link

dannyjeck commented Aug 22, 2019

I'm new to cython but this seems like incorrect behavior.

Python code put in cytest/test.py:

def test_func_no_tuples(a, b):
  print((a.__class__, b.__class__))
  print((a, b))
  s = 'string being formatted=%d' % a
  print(s)


def test_func_with_tuple(a,b):
  print((a.__class__, b.__class__))
  print((a, b))
  s = 'string being formatted=%d-%d' % (a,b)
  print(s)

test_func_no_tuples(50.,50.)

test_func_with_tuple(50.,50.)

Before compilation the output of python3 -c "from cytest import test" is:

python3 -c "from cytest import test"
(<class 'float'>, <class 'float'>)
(50.0, 50.0)
string being formatted=50
(<class 'float'>, <class 'float'>)
(50.0, 50.0)
string being formatted=50-50

After a pip install of cython==0.29.13 I compile with:

PYTHONLIB=/usr/include/python3.5
CFLAGS="-shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I${PYTHONLIB}"
cython --no-docstrings -3 cytest/test.py -o cytest/test.c
gcc $CFLAGS -o cytest/test.so cytest/test.c

Now the output is:

(<class 'float'>, <class 'float'>)
(50.0, 50.0)
string being formatted=50
(<class 'float'>, <class 'float'>)
(50.0, 50.0)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cytest/test.py", line 16, in init cytest.test
    test_func_with_tuple(50.,50.)
  File "cytest/test.py", line 11, in cytest.test.test_func_with_tuple
    s = 'string being formatted=%d-%d' % (a,b)
ValueError: Unknown format code 'd' for object of type 'float'

This is the error I would get if did 1.2.__format__('d'), but it appears that in the other case cython isn't using that function.

The relevant c code output by cython is:

  /* "cytest/test.py":4
 *   print((a.__class__, b.__class__))
 *   print((a, b))
 *   s = 'string being formatted=%d' % a             # <<<<<<<<<<<<<<
 *   print(s)
 * 
 */
  __pyx_t_3 = __Pyx_PyUnicode_FormatSafe(__pyx_kp_u_string_being_formatted_d, __pyx_v_a); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_v_s = ((PyObject*)__pyx_t_3);
  __pyx_t_3 = 0;

  /* "cytest/test.py":11
 *   print((a.__class__, b.__class__))
 *   print((a, b))
 *   s = 'string being formatted=%d-%d' % (a,b)             # <<<<<<<<<<<<<<
 *   print(s)
 * 
 */
  __pyx_t_3 = PyTuple_New(4); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 11, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = 0;
  __pyx_t_5 = 127;
  __Pyx_INCREF(__pyx_kp_u_string_being_formatted);
  __pyx_t_4 += 23;
  __Pyx_GIVEREF(__pyx_kp_u_string_being_formatted);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_kp_u_string_being_formatted);
  __pyx_t_2 = __Pyx_PyObject_Format(__pyx_v_a, __pyx_n_u_d); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_5 = (__Pyx_PyUnicode_MAX_CHAR_VALUE(__pyx_t_2) > __pyx_t_5) ? __Pyx_PyUnicode_MAX_CHAR_VALUE(__pyx_t_2) : __pyx_t_5;
  __pyx_t_4 += __Pyx_PyUnicode_GET_LENGTH(__pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_2);
  __pyx_t_2 = 0;
  __Pyx_INCREF(__pyx_kp_u_);
  __pyx_t_4 += 1;
  __Pyx_GIVEREF(__pyx_kp_u_);
  PyTuple_SET_ITEM(__pyx_t_3, 2, __pyx_kp_u_);
  __pyx_t_2 = __Pyx_PyObject_Format(__pyx_v_b, __pyx_n_u_d); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_5 = (__Pyx_PyUnicode_MAX_CHAR_VALUE(__pyx_t_2) > __pyx_t_5) ? __Pyx_PyUnicode_MAX_CHAR_VALUE(__pyx_t_2) : __pyx_t_5;
  __pyx_t_4 += __Pyx_PyUnicode_GET_LENGTH(__pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 3, __pyx_t_2);
  __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyUnicode_Join(__pyx_t_3, 4, __pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_v_s = ((PyObject*)__pyx_t_2);
  __pyx_t_2 = 0;

It looks like the __Pyx_PyUnicode_FormatSafe function is being used only in the first case and not in the second, which may or may not be relevant. I am using Python 3.5.2.

If there is any other information needed let me know

@dannyjeck
Copy link
Author

I should also mention I'm compiling on ubuntu 16.04

@scoder
Copy link
Contributor

scoder commented Aug 23, 2019

Looks like this is a difference between %-formatting and format()/f-strings. Cython optimises some %-formatting cases into f-strings, that's why this occurs.

I think the right fix would be to generate a bit of additional code that checks if the format value is an integer, and if not, passes it through PyNumber_Int(). Tests will show if the behaviour is the same in Py2 and Py3, and if both need this adaptation.

@nanlliu
Copy link

nanlliu commented Apr 3, 2020

and can I work on this as a beginner?

@scoder
Copy link
Contributor

scoder commented Apr 3, 2020

@nanlliu depends. It probably needs a little bit of C knowledge, and a good understanding of how formatting works in Python (or the dedication to learn that).

I'd suggest to start by extending the '%' formatting tests in tests/run/fstring.pyx, then look at the generated C code and try to make sense of it. Look at FormattedValueNode in ExprNodes.py to see how the formatting is done, and the _build_fstring() method in Optimize.py to see how '%' formatting is mapped to f-strings. There's probably more than one case in which the mapping is incorrect. Find those, and then the next step is to find a good way how to resolve those, either with something in the Python code or with some additional C code.

@nanlliu
Copy link

nanlliu commented Apr 3, 2020

ok. I will dedicate myself to learn! I have experience of both python and c++, hopefully learning C is not too bad.

@scoder
Copy link
Contributor

scoder commented Apr 18, 2020

@nanlliu any progress? Anything we can help with?

@nanlliu
Copy link

nanlliu commented Apr 18, 2020

oh shoot. my bad! I have been working on another issue for another project! actually i haven't started it yet since i had so many things going on. If someone else would like to take it, that's totally fine. Im really sorry. If no, then I would come back and work on this

scoder added a commit that referenced this issue May 8, 2020
… which does not support '{x:d}' formatting.

Closes #3092
@scoder
Copy link
Contributor

scoder commented May 8, 2020

I think I found a nice way of doing this in #3589.

scoder added a commit that referenced this issue May 9, 2020
…GH-3589)

Floats do not support '{x:d}' formatting and need conversion to 'int' first.
Closes #3092
@scoder scoder added this to the 0.29.18 milestone May 9, 2020
scoder added a commit that referenced this issue May 9, 2020
…GH-3589)

Floats do not support '{x:d}' formatting and need conversion to 'int' first.
Closes #3092
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants