
Cython numpy users #2162

Merged (8 commits into cython:master, Mar 23, 2018)

Conversation

gabrieldemarmiesse (Contributor):

Hi!

I'll need some help on two points. I'm trying to explain the speed gains in the docs.

  • Using infer_types=True makes the code faster than manually declaring types. See line 377.

  • Using fused types makes the code faster than supporting only one type. See line 420.

Thank you!

value += g[smid - s, tmid - t] * f[v, w]
h[x, y] = value
return h
.. literalinclude:: ../../examples/userguide/convolve_py.py
Contributor:

I think a more concrete directory name would be nice for these files. Why not name it "memoryviews" rather than "userguide"? That's what they showcase, after all.

* ``[:enhancements/buffer:Spec for the efficient indexing]``
* Kurt Smith's `video tutorial of Cython at SciPy 2015
<https://www.youtube.com/watch?v=gMvkiQ-gOW8&t=4730s&ab_channel=Enthought>`_.
It's longuer but some readers like watching talks more than reading.
Contributor:

"longuer" -> "longer". But actually, I'd remove this sentence altogether. I don't consider it helpful in any way.

have them compiled automatically (See :ref:`pyximport`).
3. Cython supports distutils so that you can very easily create build scripts
4. Cython supports distutils so that you can very easily create build scripts
which automate the process, this is the preferred method for full programs.
Contributor:

actually not for "full programs", but rather "Cython implemented libraries and packages".


.. Note::
If using another interactive command line environment than SAGE, like
IPython or Python itself, it is important that you restart the process
IPython, Jupyter Notebook or Python itself, it is important that you restart the process
Contributor:

This doesn't really apply directly to Jupyter, where users are not expected to "compile" anything but things are happening behind their back when running a Cython cell. When you change something in a Jupyter Cython cell, it will automatically be recompiled. If you don't change anything, it won't be.

@@ -125,10 +124,6 @@ like::

$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o yourmod.so yourmod.c

``gcc`` should have access to the NumPy C header files so if they are not
installed at :file:`/usr/include/numpy` or similar you may need to pass another
option for those.
Contributor:

Removing this paragraph removes information that users would need for manual compilation: "call numpy.get_include() and add the result to your C include paths."
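The advice above can be sketched as a minimal build script. This is an illustrative sketch, not code from the PR: the module name `yourmod` follows the compile example earlier in the thread, and the include path is only needed when the Cython source does `cimport numpy` (a plain Python `import numpy` does not require the C headers).

```python
import numpy
from setuptools import Extension

# numpy.get_include() returns the directory containing numpy/arrayobject.h;
# add it to the C include paths so the generated C file compiles.
ext = Extension(
    "yourmod",                             # hypothetical module name
    sources=["yourmod.pyx"],
    include_dirs=[numpy.get_include()],    # needed only for "cimport numpy"
)
```

For a manual `gcc` invocation, the same directory would be passed with `-I`.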

Contributor Author:

Does the user have to do this for memoryviews too? I just know that it's the case when using the old numpy buffer syntax.

@scoder (Contributor), Mar 21, 2018:

Ah, true. The include path is only required when you do cimport numpy. Meaning, it's also independent of the buffer syntax and only depends on the numpy cimport (plain Python imports are also fine).

Contributor Author:

I added the note about the numpy headers, plus another line right after it saying that this is needed only when doing cimport numpy.


Especially have a look at the for loops: In :file:`convolve1.c`, these are ~20 lines
of C code to set up while in :file:`convolve2.c` a normal C for loop is used.
Especially have a look at the for loops: In :file:`convolve_cy.c`, these are ~20 lines
Contributor:

for-loops

cdef int ymax = wmax + 2*tmid
h_np = np.zeros([xmax, ymax], dtype=DTYPE)
cdef int [:,:] h = h_np
cdef int x, y, s, t, v, w
Contributor:

I actually think that all the index/size variables should use Py_ssize_t rather than int as type, just to avoid providing bad examples.
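The suggested change, sketched against the variable names from the quoted snippet (a hedged illustration, not the final PR code): `Py_ssize_t` is the signed index type CPython itself uses, so it matches any array dimension, keeps wrap-around semantics for negative indices, and avoids the signed/unsigned pitfalls mentioned elsewhere in this thread.

```cython
# Index and size variables as Py_ssize_t instead of int:
cdef Py_ssize_t x, y, s, t, v, w
cdef Py_ssize_t ymax = wmax + 2 * tmid
```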

So what made this line so much slower than in the pure Python version?

``g`` and ``f`` are still NumPy arrays, so Python objects, and expect
Python integers as indexes. Here we give C integers. So every time
Contributor:

"we pass C int values" (or size_t or whatever you choose).

``g`` and ``f`` are still NumPy arrays, so Python objects, and expect
Python integers as indexes. Here we give C integers. So every time
Cython reaches this line, it has to convert all the C integers to Python
integers. Since this line is called very often, it outweight the speed
Contributor:

I would explicitly say "to Python int objects" here, to emphasise the difference to native C values.
And "it outweighs" the speed benefits.

benefits of the pure C loops that were created from the ``range()`` earlier.

Furthermore, ``g[smid - s, tmid - t] * f[v, w]`` returns a Python integer
and ``value`` is a C integers, Cython has to do a type conversion again.
Contributor:

"and since 'value' is a C integer"


Furthermore, ``g[smid - s, tmid - t] * f[v, w]`` returns a Python integer
and ``value`` is a C integers, Cython has to do a type conversion again.
In the end those types conversions add up. And made our convolution really
Contributor:

"type conversions"


There's still a bottleneck killing performance, and that is the array lookups
and assignments. The ``[]``-operator still uses full Python operations --
There are still two bottleneck killing performance, and that is the array lookups
Contributor:

"two bottlenecks that degrade the performance"


More information on this syntax [:enhancements/buffer:can be found here].
In short, memoryviews are C structures that can hold a pointer to the data
of a NumPy array. They also support slices, so they work even if
@scoder (Contributor), Mar 20, 2018:

Well, a pointer, yes, but also all the necessary buffer metadata to provide efficient and safe access: dimensions, strides, item size, item type information, etc.

Showing the changes needed to produce :file:`convolve3.pyx` only
No data is copied from the NumPy array to the memoryview in our example.
As the name implies, it is only a "view" of the memory. So we can use
``h`` for efficient indexing and return then ``h_np``
Contributor:

I would add a quick reminder here what "h" and "h_np" were, e.g. "we can use the view 'h' for efficient indexing and at the end return the real NumPy array 'h_np' that holds the data that we operated on."
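The no-copy behaviour being discussed can be demonstrated from plain Python, since NumPy arrays expose the same buffer protocol that Cython memoryviews consume. A small sketch (the names h and h_np mirror the tutorial snippet; this is not the tutorial code itself):

```python
import numpy as np

h_np = np.zeros((3, 3), dtype=np.intc)  # the real array that owns the data
h = memoryview(h_np)                    # a view: no element is copied

# Writing through the view is visible in the original array,
# because both refer to the same block of memory.
np.asarray(h)[1, 2] = 7
print(h_np[1, 2])  # -> 7
```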


cdef int [:,:,::1] a

if they are F-contiguous, you can declare the memoryview like this::
If you want to give Cython the information that the data is C-contiguous
Contributor:

You meant "Fortran-contiguous" here.
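To make the C- vs Fortran-contiguous distinction concrete, the two layouts can be inspected from NumPy; the Cython declarations in the comments are a sketch of how the memoryview syntax expresses each layout:

```python
import numpy as np

c_arr = np.ones((4, 5))            # NumPy defaults to C order (last axis packed)
f_arr = np.asfortranarray(c_arr)   # same values, Fortran order (first axis packed)

print(c_arr.flags['C_CONTIGUOUS'])  # True
print(f_arr.flags['F_CONTIGUOUS'])  # True

# Matching Cython memoryview declarations:
#   cdef double[:, ::1] mv_c = c_arr   # C-contiguous
#   cdef double[::1, :] mv_f = f_arr   # Fortran-contiguous
```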

@scoder (Contributor) commented Mar 20, 2018:

infer_types=True

Good question, can't see a general reason why that would be the case. cython -a might give a hint, but if the C code isn't different, then the performance shouldn't be different either. Unsigned size types, maybe? They have the side effect of disabling wrap-around, but can also lead to subtle bugs in the code when negative indexing is used (by accident).

And regarding the fused types performance, that seems to take more or less the same time as the auto-typed code, so no difference there.


More generic code
==================

# Explain here templated
All those speed gains are nice, but adding types constrains our code.
At the moment, it would mean that our function only work with
Contributor:

"can only work" or "only works".

It is similar to C++ 's templates. It generates mutiple function declarations
at compile time, and then chooses the right one at run-time based on the
types of the arguments provided. It is also possible to check with
``if-else`` statements what is the value of the fused type.
Contributor:

"By comparing types in if-conditions, it is also possible to execute entirely different code paths depending on the specific data type."
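A minimal fused-type sketch of the mechanism described here (the names `my_type` and `double_it` are hypothetical, not from the tutorial): Cython generates one C specialization per member of the fused type, and the `my_type is int` test is resolved at compile time, so each specialization contains only its own branch.

```cython
ctypedef fused my_type:
    int
    double

def double_it(my_type x):
    # Resolved statically per specialization -- no runtime type check
    # remains in the generated C code.
    if my_type is int:
        return 2 * x      # integer code path
    else:
        return 2.0 * x    # floating-point code path
```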

1. Support for efficient access to structs/records stored in arrays; currently
only primitive types are allowed.
2. Support for efficient access to complex floating point types in arrays. The
1. Support for efficient access to complex floating point types in arrays. The
main obstacle here is getting support for efficient complex datatypes in
Cython.
Contributor:

Cython has support for different complex types, so I'm not sure if this doesn't work out of the box.

main obstacle here is getting support for efficient complex datatypes in
Cython.
3. Calling NumPy/SciPy functions currently has a Python call overhead; it
2. Calling NumPy/SciPy functions currently has a Python call overhead; it
would be possible to take a short-cut from Cython directly to C. (This does
however require some isolated and incremental changes to those libraries;
mail the Cython mailing list for details).
Contributor:

Cython maps some of the operations to C calls in numpy/__init__.pxd (a.k.a. cimport numpy). That's not what this comment refers to, but it's not like there's nothing there on this front...
I'd remove that comment, also because it's fairly unclear and unhelpful overall.

@@ -462,7 +355,9 @@ compile-time if the type is set to :obj:`np.ndarray`, specifically it is
assumed that the data is stored in pure strided mode and not in indirect
mode).

[:enhancements/buffer:More information]
Where to go from here?
======================
Contributor:

One more thing to hint at here is the Pythran support. The drawback is that it really only currently works with the buffer syntax and not yet with memory views. So now, after changing all of this tutorial to memoryviews (which is great!), it doesn't really line up anymore.

This can probably be done today by calling the NumPy C multi-dimensional
iterator API directly; however it would be nice to have for-loops over
:func:`enumerate` and :func:`ndenumerate` on NumPy arrays create efficient
code.
5. A high-level construct for writing type-generic code, so that one can write
functions that work simultaneously with many datatypes. Note however that a
macro preprocessor language can help with doing this for now.

Contributor:

Since there's almost nothing left in this "future" section, I'd vote for deleting it.

@gabrieldemarmiesse (Contributor Author):

Thank you for all this advice! Would you have any idea why using fused types makes the function run faster?

@scoder (Contributor) commented Mar 20, 2018:

I extended the comment that I made in that regard.

@gabrieldemarmiesse (Contributor Author):

Ok, my page wasn't refreshed. Thanks! I looked at the annotation file and nothing seems out of the ordinary. The only thing that changes is the number of "temporary" variables, the ones called __pyx_t_.... When using infer_types, there seem to be fewer of them. It seems I would have to go through the C file to know if there is a difference in the types.


.. Note::
If using another interactive command line environment than SAGE, like
IPython or Python itself, it is important that you restart the process
IPython, Jupyter Notebook or Python itself, it is important that you restart the process
Contributor Author:

I was thinking more about importing in a notebook a cython module created by a setup.py. I see now my wording can be confusing.

@gabrieldemarmiesse (Contributor Author):

Done! Also, since I changed the datatype to Py_ssize_t and used a bigger N to remove the overhead of the function call and array creation, I don't see this strange speed difference in my benchmarks anymore.

Please tell me if there are still things that need to be changed.

@scoder (Contributor) commented Mar 23, 2018:

Very nice! Thanks for working on this.

@scoder scoder merged commit 3c2dd5a into cython:master Mar 23, 2018
@gabrieldemarmiesse gabrieldemarmiesse deleted the cython_numpy_users branch May 10, 2018 17:06