Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and improve generation of C lines in traceback #6034

Closed
1 of 5 tasks
matusvalo opened this issue Feb 25, 2024 · 9 comments · Fixed by #6036
Closed
1 of 5 tasks

Document and improve generation of C lines in traceback #6034

matusvalo opened this issue Feb 25, 2024 · 9 comments · Fixed by #6036

Comments

@matusvalo
Copy link
Contributor

matusvalo commented Feb 25, 2024

Describe your issue

TL;DR

By default Cython is extracting C line number via __LINE__ in error handling function.

When turned off:

  • compilation times are reduced
  • resulting binary has reduced size by up to 5%

Extracting C line numbers can be disabled by:

  • undocumented --no-c-in-traceback command line argument of cython command
  • undocumented c_line_in_traceback=False compilation option of Cython.Build.cythonize()

Details

Cython is able to add C line number of error in __PYX_ERR macro which is responsible for handing exception after function call:

#define __PYX_MARK_ERR_POS(f_index, lineno) \
    { __pyx_filename = __pyx_f[f_index]; (void)__pyx_filename; __pyx_lineno = lineno; (void)__pyx_lineno; __pyx_clineno = __LINE__; (void)__pyx_clineno; }
#define __PYX_ERR(f_index, lineno, Ln_error) \
    { __PYX_MARK_ERR_POS(f_index, lineno) goto Ln_error; }

Moreover, to actually report C lines in traceback, undocumented C macro CYTHON_CLINE_IN_TRACEBACK must be set to 1. Hence, by default, we have overhead of collecting C line numbers but are not reported unless mentioned macro must be enabled.

Example

Following example shows how to setup C lines reporting in traceback:

  • setup.py
    from setuptools import setup, Extension
    from Cython.Build import cythonize
    import numpy
    
    from Cython.Compiler import Options
    
    extensions = [
        Extension("*", ["delete.pyx"],
        extra_compile_args=['-DCYTHON_CLINE_IN_TRACEBACK=1'], # This macro enables reporting C lines
        ),
    ]
    
    setup(
        name="My hello app",
        # By default extracting of C lines is enabled. When parameter 
        # c_line_in_traceback=False is added to cythonize() function
        # the C lines are not extracted when error occurs.
        ext_modules=cythonize(extensions)
        
    )
  • delete.pyx
    cdef bleh():
        raise ValueError()
    
    bleh()
  • execution
    $ python -c 'import delete'
    Traceback (most recent call last):
     File "<string>", line 1, in <module>
     File "delete.pyx", line 4, in init delete (delete.c:2260)
       bleh()
     File "delete.pyx", line 2, in delete.bleh (delete.c:1876)
       raise ValueError()
    ValueError
    

Proposal

Related discusion

See #4425 (comment) + discussion below

Adding a maintainers of popular scientific libraries because they care about compilation time and resulting binary size. Until, this issue is closed, they can use --no-c-in-traceback argument of cython.
CC @rgommers @WillAyd @glemaitre

@da-woods
Copy link
Contributor

da-woods commented Feb 25, 2024

Moreover, to actually report C lines in traceback, undocumented C macro CYTHON_CLINE_IN_TRACEBACK must be set to 1. Hence, by default, we have overhead of collecting C line numbers but are not reported unless mentioned macro must be enabled.

This isn't quite right I think. What I think is right (provided Options.c_line_in_traceback==True):

  • -DCYTHON_CLINE_IN_TRACEBACK=1 - clines are always reported
  • -DCYTHON_CLINE_IN_TRACEBACK=0 - clines are never reported
  • CYTHON_CLINE_IN_TRACEBACK - runtime lookup of cython_runtime.cline_in_traceback (where cython_runtime is a module that cython adds to the global list of modules. As far as I can tell cline_in_traceback is the only thing controlled from it).

@matusvalo
Copy link
Contributor Author

matusvalo commented Feb 25, 2024

This isn't quite right I think

Yes it is possible. Feel free to correct me. I am not 100% sure if I understand correctly everything. My understanding is following:

CYTHON_CLINE_IN_TRACEBACK Options.c_line_in_traceback c lines are in traceback
1 True __LINE__ macro is present in __PYX_MARK_ERR_POS, cline reporting code is compiled yes
1 False __LINE__ macro is not present in __PYX_MARK_ERR_POS, cline reporting code is compiled no
0 True __LINE__ macro is present in __PYX_MARK_ERR_POS, cline reporting code is not compiled no
0 False __LINE__ macro is not present in __PYX_MARK_ERR_POS, cline reporting code is not compiled no

But maybe I am wrong.

@rgommers
Copy link
Contributor

rgommers commented Feb 25, 2024

Thanks for working on this! This is indeed quite important for SciPy.

I did a quick test on all scipy.linalg extensions using Cython. Default build(with line numbers):

$ python dev.py build -C-Dbuildtype=release
$ ls -l build/scipy/linalg/*.so
-rwxr-xr-x  1 rgommers  staff   469946 Feb 25 09:26 build/scipy/linalg/_cythonized_array_utils.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   210932 Feb 25 09:26 build/scipy/linalg/_decomp_lu_cython.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   276945 Feb 25 09:26 build/scipy/linalg/_decomp_update.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   359233 Feb 25 09:26 build/scipy/linalg/_matfuncs_expm.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   212999 Feb 25 09:26 build/scipy/linalg/_matfuncs_sqrtm_triu.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   234594 Feb 25 09:26 build/scipy/linalg/_solve_toeplitz.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   263166 Feb 25 09:26 build/scipy/linalg/cython_blas.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   697856 Feb 25 09:26 build/scipy/linalg/cython_lapack.cpython-310-darwin.so

With --no-c-in-traceback:

-rwxr-xr-x  1 rgommers  staff   436506 Feb 25 09:30 build/scipy/linalg/_cythonized_array_utils.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   210468 Feb 25 09:30 build/scipy/linalg/_decomp_lu_cython.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   276497 Feb 25 09:30 build/scipy/linalg/_decomp_update.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   342289 Feb 25 09:30 build/scipy/linalg/_matfuncs_expm.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   212567 Feb 25 09:30 build/scipy/linalg/_matfuncs_sqrtm_triu.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   217666 Feb 25 09:30 build/scipy/linalg/_solve_toeplitz.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   262734 Feb 25 09:30 build/scipy/linalg/cython_blas.cpython-310-darwin.so
-rwxr-xr-x  1 rgommers  staff   697264 Feb 25 09:30 build/scipy/linalg/cython_lapack.cpython-310-darwin.so

Differences:

>>> with_linenums = np.array([469946, 210932, 276945, 359233, 212999, 234594, 263166, 697856])
>>> no_linenums = np.array([436506, 210468, 276497, 342289, 212567, 217666, 262734, 697264])
>>> np.round(with_linenums / no_linenums, decimals=3)
array([1.077, 1.002, 1.002, 1.05 , 1.002, 1.078, 1.002, 1.001])

For some reason there's a weird bimodal thing going on here: for 5 extensions the difference is 0.1%-0.2%, and for 3 other ones it's way larger, 5%-8%.

EDIT: for SciPy as a whole, the gain is ~1.7%: a built wheel changes from 22.56 MB to 22.19 MB.

There are a lot of unused variable warnings for __pyx_clineno, those will need addressing before we can use --no-c-in-traceback unconditionally:

[154/256] Compiling C object scipy/io/matlab/_streams.cpython-310-darwin.so.p/meson-generated__streams.c.o
scipy/io/matlab/_streams.cpython-310-darwin.so.p/_streams.c:10359:7: warning: unused variable '__pyx_clineno' [-Wunused-variable]
  int __pyx_clineno = 0;
      ^
scipy/io/matlab/_streams.cpython-310-darwin.so.p/_streams.c:10375:7: warning: unused variable '__pyx_clineno' [-Wunused-variable]
  int __pyx_clineno = 0;
      ^
scipy/io/matlab/_streams.cpython-310-darwin.so.p/_streams.c:10464:7: warning: unused variable '__pyx_clineno' [-Wunused-variable]
  int __pyx_clineno = 0;
      ^
scipy/io/matlab/_streams.cpython-310-darwin.so.p/_streams.c:1466:12: warning: unused variable '__pyx_clineno' [-Wunused-variable]
static int __pyx_clineno = 0;
```

</details>

@matusvalo
Copy link
Contributor Author

matusvalo commented Feb 25, 2024

For some reason there's a weird bimodal thing going on here: for 5 extensions the difference is 0.1%-0.2%, and for 3 other ones it's way larger, 5%-8%.

I suppose the difference depends on number function of calls propagating exception. If module uses heavily noexcept clause, the number of __PYX_ERR macros is limited.

@da-woods
Copy link
Contributor

#6035 should probably fix most of the warnings.

@scoder
Copy link
Contributor

scoder commented Feb 25, 2024

I propose to deprecate all command line options for this and let users control the feature entirely with the C macro.

If users have set the option before, we can use that as the default for the C macro, but if it's unchanged, the C macro is the way to go.

See #6036 (comment)

@matusvalo matusvalo changed the title Document generation of C lines in traceback Document and improve generation of C lines in traceback Feb 25, 2024
@matusvalo
Copy link
Contributor Author

I propose to deprecate all command line options for this and let users control the feature entirely with the C macro.

Updated description to reflect it

@matusvalo matusvalo added this to the 3.1 milestone Feb 25, 2024
@da-woods
Copy link
Contributor

I ended up adding some documentation to #6036 since I'd changed enough that it seemed worth documenting it at that point

@WillAyd
Copy link
Contributor

WillAyd commented Feb 25, 2024

Thanks for the ping. For pandas the difference between on/off is a final wheel of 58 MB versus 57 MB, so not a huge deal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants