Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement read-only memoryviews #1869

Merged
merged 11 commits into from Feb 16, 2018
Merged

implement read-only memoryviews #1869

merged 11 commits into from Feb 16, 2018

Conversation

scoder
Copy link
Contributor

@scoder scoder commented Sep 12, 2017

See issue #1605.
Needs more testing and probably also more compile time checks.

@lesteve
Copy link

lesteve commented Sep 13, 2017

Nice to see some progress on this since this would be very nice to have for scikit-learn (and I guess pandas has similar problems too)! I quickly check out this PR and it does seem to fix the problem indeed.

I am not a cython expert unfortunately. I would be keen to help out on this one but I am not sure how useful I can be ...

Copy link

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is looking great from the perspective of what's currently tested. What more do you feel needs to be tested?

@jnothman
Copy link

By compile time checks I assume you mean that the compilation should error if a const memoryview is mutated...?

@lesteve
Copy link

lesteve commented Oct 17, 2017

From scikit-learn perspective, I don't think we care that much about compile-time checks, would you agree @jnothman? The crucial thing we need is to be able to create a typed memoryview from a read-only buffer.

@jnothman
Copy link

jnothman commented Oct 17, 2017 via email

@scoder
Copy link
Contributor Author

scoder commented Feb 11, 2018

What more do you feel needs to be tested?

There aren't really any dedicated tests for the whole feature. I'm sure there are various corner cases and likely also cases where it misdetects usage patterns etc., e.g. when passing memory views around across functions. If someone who wants this feature and thus knows how to make use of it could write more targeted tests for this, it would become much clearer in what state this feature is and what is left to do to get it released. If the answer is "nothing", then that's perfect and I'll click the merge button, but I doubt it.

By compile time checks I assume you mean that the compilation should error if a const memoryview is mutated...?

Yes. That also needs test code.

@jnothman
Copy link

I suppose one option might be trying to make use of this in a project (e.g. scikit-learn) in a pull request and seeing what can't be expressed and compiled... Sounds like lots of work though :\

@scoder
Copy link
Contributor Author

scoder commented Feb 13, 2018 via email

@jnothman
Copy link

Can I please confirm that this does not introduce any syntax, merely allows a read-only memoryview until writing is required?

@lesteve
Copy link

lesteve commented Feb 13, 2018

For the record I tried to compile scikit-learn using this branch and I get a similar error as the ones you can currently see on Travis (all the Travis builds seem to be failing currently).

======================================================================
ERROR: runTest (__main__.CythonRunTestCase)
compiling (c) and running memslice
----------------------------------------------------------------------
Traceback (most recent call last):
  File "runtests.py", line 1103, in run
    ext_so_path = self.runCompileTest()
  File "runtests.py", line 792, in runCompileTest
    self.test_directory, self.expect_errors, self.expect_warnings, self.annotate)
  File "runtests.py", line 1027, in compile
    self._match_output(expected_errors, errors, tostderr)
  File "runtests.py", line 1073, in _match_output
    self.assertEqual(None, unexpected)
AssertionError: None != '1630:4: C struct/union member cannot be a memory view'
scikit-learn compilation error
Error compiling Cython file:
------------------------------------------------------------
...
def __reduce_cython__(self):
    cdef bint use_setstate
    state = (self.next_label, self.parent, self.size)
            ^
------------------------------------------------------------

(tree fragment):3:13: C struct/union member cannot be a memory view

Error compiling Cython file:
------------------------------------------------------------
...
def __reduce_cython__(self):
    cdef bint use_setstate
    state = (self.next_label, self.parent, self.size)
            ^
------------------------------------------------------------

(tree fragment):3:13: C struct/union member cannot be a memory view
Traceback (most recent call last):
  File "setup.py", line 241, in <module>
    setup_package()
  File "setup.py", line 237, in setup_package
    setup(**metadata)
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/numpy/distutils/core.py", line 135, in setup
    config = configuration()
  File "setup.py", line 136, in configuration
    config.add_subpackage('sklearn')
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/numpy/distutils/misc_util.py", line 1024, in add_subpackage
    caller_level = 2)
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/numpy/distutils/misc_util.py", line 993, in get_subpackage
    caller_level = caller_level + 1)
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/numpy/distutils/misc_util.py", line 930, in _get_configuration_from_setup_py
    config = setup_module.configuration(*args)
  File "sklearn/setup.py", line 82, in configuration
    maybe_cythonize_extensions(top_path, config)
  File "/home/lesteve/dev/alt-scikit-learn/sklearn/_build_utils/__init__.py", line 84, in maybe_cythonize_extensions
    config.ext_modules = cythonize(config.ext_modules)
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1028, in cythonize
    cythonize_one(*args)
  File "/home/lesteve/miniconda3/envs/cython-read-only~memory-views/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1148, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: sklearn/cluster/_hierarchical.pyx

@rth
Copy link

rth commented Feb 13, 2018

Thanks for working on this @scoder , it is very much appreciated.

I have tried to make some self contained tests in rth/cython-mmview-ro. So far I'm testing that,

I'm not sure if there are other corner cases which would be worth testing?

BTW, I'm also getting a few warnings at compilation time,

$ python setup.py build_ext --inplace
Compiling mmview.pyx because it changed.
[1/1] Cythonizing mmview.pyx
warning: TestClass:20:10: 'cpdef_method' redeclared 
warning: TestClass:31:10: 'cpdef_cname_method' redeclared 
warning: mmview.pyx:1:6: 'getmax' redeclared 
warning: mmview.pyx:11:6: 'update_array' redeclared 
warning: mmview.pyx:16:6: 'getconst' redeclared 

similarly to what was reported in #1985 (comment). That's not related to this PR though, I also get those on master...

@jnothman
Copy link

jnothman commented Feb 13, 2018 via email

@rth
Copy link

rth commented Feb 14, 2018

@scoder mentioned passing between functions...

Right, I tried to add a simple tests of passing memoryviews between functions here but I'm not certain what exactly should be tested there.

I imagine tests failures mentioned by @lesteve is a more immediate concern...

@scoder
Copy link
Contributor Author

scoder commented Feb 14, 2018

The "obvious" issue with passing memoryviews through functions is that one function might acquire a read-only view, pass it into another function, and that function might try to write to it. That should a) not fail and b) probably acquire a writable view for now.

@scoder
Copy link
Contributor Author

scoder commented Feb 14, 2018

I merged the latest master into the branch to fix a recent (unrelated) regression.

[test] that const memoryview only accept ro objects - not true

That's actually not how it should work. It's ok to pass a writable view into a function that requires a read-only view, but not the other way round.

I stole some of your tests and created a dedicated test suite file from them. Seems to work for me. Let's see what other users find when they bump into this feature.

@rth
Copy link

rth commented Feb 14, 2018

Thanks for the explanations!

issue with passing memoryviews through functions is that one function might acquire a read-only view, pass it into another function, and that function might try to write to it. That should a) not fail and b) probably acquire a writable view for now

So I tried that here i.e. an outer function that passes the memory view to the inner function that then tries to modify it. When the memoryview is ro,

  • the inner function is able to write to it (i.e. that does not fail)
  • the memoryview is still flagged as read-only (that is if x.base.flags.writeable is the correct flag to look at)

That's actually not how it should work. It's ok to pass a writable view into a function that requires a read-only view, but not the other way round.

Right, makes sense.

@scoder scoder changed the title [WIP] implement read-only memoryviews implement read-only memoryviews Feb 14, 2018
@scoder
Copy link
Contributor Author

scoder commented Feb 14, 2018

I'll let travis give it a run and it it's happy (enough), I'll merge it in its current state. It seems good enough to hand it to our users.

@scoder scoder added this to the 0.28 milestone Feb 14, 2018
@rth
Copy link

rth commented Feb 15, 2018

It looks like there is just one docstring failing in Travis.

Another good news is that it looks like at least one build of pandas built sucessfully with Cython from this PR (the other are still running at the moment of writing)

The not so good one is that scikit-learn gets a compilation error with Cython from this PR (it does pass with Cython from the master branch). I should be able to investigate and isolate the issue tomorrow (to add it to the test suite), but it looks like it might have something to do with memoryviews with fused types passed between functions...

I'm not sure how you would prefer to handle this @scoder and whether it would have an impact on this PR or if I should open a separate issue? Thanks!

@scoder
Copy link
Contributor Author

scoder commented Feb 15, 2018

Yeah ... turns out that travis is not happy. In fact, this is enough to confuse the auto-readonly detection:

def writable(int[:] mslice):
    new_var = mslice
    new_var[2] = 23

This fails because it requests a read-only view, which is then not assignable through Python's item setting. Thus, I cannot seriously consider this part of the feature ready. It needs more work.

But at least the const views should be fine, so people can start using read-only buffers by explicitly preparing their code for it. Much better than nothing.

I'll disable the auto-readonly detection, then you can give it another try in scikit-learn to see if that fixes it.

@rth
Copy link

rth commented Feb 15, 2018

So to recapitulate, to check that I understood correctly, the example from #1605 (comment) would fail with a ro input array, unless the function was defined with cpdef getmax(const double[:] x), right? That would be a significant improvement...

@lesteve @jnothman For joblib related parallelism in scikit-learn, specifying ro memoryviews manually with const would work, wouldn't it?

@scoder
Copy link
Contributor Author

scoder commented Feb 15, 2018

to check that I understood correctly ...

Yes, that is correct. And yes, that is a major improvement.

@scoder
Copy link
Contributor Author

scoder commented Feb 15, 2018

I disabled the auto-detection and fixed a couple of further issues. The "usual suspect" tests pass, but travis has some serious problems today, so I'll take a look at it tomorrow.

@rth: would be nice if you could already retest on your side, now that the feature is in a safer state.

Note that simply changing the memory view declaration to const might not always be enough since it's actually the dtype of the view itself that becomes const, i.e. the memoryview declaration parses as (const double)[:] and not const (double[:]). In some cases, this might require further changes to the processing code if structured types or pointers are involved. The most common use case of reading numeric values from the view should not be impacted as the const modifier has no effect when copying by value.

@scoder
Copy link
Contributor Author

scoder commented Feb 16, 2018

Ok, travis likes it and the latest pandas build also seems to have succeeded. Let's get it in.

@scoder scoder merged commit 9ca6532 into cython:master Feb 16, 2018
@jnothman
Copy link

jnothman commented Feb 16, 2018 via email

@scoder
Copy link
Contributor Author

scoder commented Feb 16, 2018

I also think I found a fix for the scikit-learn compilation failure. Please try with the master branch from now on.

@rth
Copy link

rth commented Feb 16, 2018

scikit-learn also builds sucessfully with the current master branch, so everything looks great!

Thanks for making this happen!

@lesteve
Copy link

lesteve commented Feb 16, 2018

Great stuff, thanks a lot @scoder! I played with this a bit, and I get the impression that const double[:] is a valid syntax in cython 0.27 but that it is the same as double[:], i.e. the const does not do anything. Could you comment on this @scoder? If that's the case it would allow to use the same code for cython > 0.28 and cython <= 0.28 and thus simplify maintenance for scikit-learn.

Another thing I noticed: I was not able to find a way to create a writable memoryview from a readable one (using cython master):

import numpy as np

cdef const double[:] ro = np.array([1., 2.])
cdef double[:] rw = np.empty_like(ro)
rw[:] = ro

The error I get is:
Memoryview 'const double[:]' not conformable to memoryview 'double[:]'

@scoder
Copy link
Contributor Author

scoder commented Feb 16, 2018

Thanks for the reproducer, fixed here: 6704d23

@scoder
Copy link
Contributor Author

scoder commented Feb 16, 2018

No idea what const double[:] did before. I'm actually surprised that you said it passes - I thought that I had seen some errors on that front...

In any case, whatever 0.27 did is not going to change any more in that release series.

@lesteve
Copy link

lesteve commented Feb 19, 2018

This seems to work fine on cython 0.27:

cpdef func(const double[:] arr):
    # of course the const does not do anything for cython < 0.28 so you can change the array
    arr[0] = -999

import numpy as np
arr = np.ones(3)
func(arr)

Just trying to clarify, scikit-learn requires cython >= 0.23 at the moment. I am trying to investigate whether it is possible for scikit-learn to have a code compatible for cython < 0.28 and cython >= 0.28.

@jnothman
Copy link

jnothman commented Feb 19, 2018 via email

@lesteve
Copy link

lesteve commented Feb 19, 2018

We could do that but we will have to wait until cython 0.28 is released first. If there is an easy way to support both cython > 0.27 and cython <= 0.27, this may make it easier to use the "const typed memoryview" feature incrementally in scikit-learn, experiment with this feature and possibly iron out a few quirks before cython 0.28 is released.

@rth
Copy link

rth commented Feb 19, 2018

This seems to work fine on cython 0.27:

The generated C code on 0.27 with and without const or the above snippet is different: func_const.c vs func.c see diff below (excluding comments),

1542c1542
< static PyObject *__pyx_f_6mmview_func(__Pyx_memviewslice const , int __pyx_skip_dispatch); /*proto*/
---
> static PyObject *__pyx_f_6mmview_func(__Pyx_memviewslice, int __pyx_skip_dispatch); /*proto*/
1842c1842
< static PyObject *__pyx_f_6mmview_func(__Pyx_memviewslice const __pyx_v_arr, CYTHON_UNUSED int __pyx_skip_dispatch) {
---
> static PyObject *__pyx_f_6mmview_func(__Pyx_memviewslice __pyx_v_arr, CYTHON_UNUSED int __pyx_skip_dispatch) {

but I imagine as long as it works and produces the expected result, it might be OK..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants