New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError in _parse_message() when using PyPy #509

Closed
vstinner opened this Issue Mar 15, 2017 · 18 comments

Comments

Projects
None yet
3 participants
@vstinner
Contributor

vstinner commented Mar 15, 2017

Hi,

I'm working on the http://github.com/python/performance benchmark suite which has a "dulwich_log" benchmark. When I tried to upgrade Dulwich from 0.16.2 to 0.17.1, I got the following error using PyPy 5.4.0 on Fedora 25:

INFO:root:Running `/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/bin/python /home/haypo/prog/python/performance/performance/benchmarks/bm_dulwich_log.py --debug-single-value --output /tmp/tmpZjCAtn`
Traceback (most recent call last):
  File "/home/haypo/prog/python/performance/performance/benchmarks/bm_dulwich_log.py", line 27, in <module>
    runner.bench_func('dulwich_log', iter_all_commits, repo)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 636, in bench_func
    return self._main(name, time_func, inner_loops, metadata)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 547, in _main
    bench = self._worker(name, time_func, inner_loops, metadata)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 510, in _worker
    inner_loops)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 468, in _worker_run_bench_mem
    inner_loops)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 448, in _worker_run_bench
    loops=loops, nvalue=args.values)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 372, in _run_bench
    raw_value = time_func(loops)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/perf/_runner.py", line 616, in time_func
    local_func(*local_args)
  File "/home/haypo/prog/python/performance/performance/benchmarks/bm_dulwich_log.py", line 14, in iter_all_commits
    for entry in repo.get_walker(head):
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/walk.py", line 354, in _next
    entry = next(self._queue)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/walk.py", line 185, in next
    self._push(parent_id)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/walk.py", line 145, in _push
    obj = self._store[object_id]
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/object_store.py", line 119, in __getitem__
    return ShaFile.from_raw_string(type_num, uncomp, sha=sha)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 391, in from_raw_string
    obj.set_raw_string(string, sha)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 303, in set_raw_string
    self.set_raw_chunks([text], sha)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 308, in set_raw_chunks
    self._deserialize(chunks)
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 1152, in _deserialize
    parse_commit(chunks))
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 1092, in parse_commit
    for field, value in _parse_message(chunks):
  File "/home/haypo/prog/python/performance/venv/pypy2.7-b5a8c1089518/site-packages/dulwich/objects.py", line 633, in _parse_message
    (k, v) = l.split(b' ', 1)
ValueError: expected length 2, got 1

PyPy:

haypo@selma$ pypy --version
Python 2.7.10 (77392ad263504df011ccfcabf6a62e21d04086d0, Sep 12 2016, 15:21:10)
[PyPy 5.4.0 with GCC 6.2.1 20160901 (Red Hat 6.2.1-1)]

The workaround is to keep the old 0.16.4 version.

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 15, 2017

I tested more dulwich version. I get an error with dulwich >= 0.11.0, but it works with dulwich 0.10.0.

Code:

import dulwich.repo
repo_path = 'performance/benchmarks/data/asyncio.git'
repo = dulwich.repo.Repo(repo_path)
head = 'eba6a462551421fd0be31899436dcfdbead086be'
for entry in repo.get_walker(head):
    print(entry.commit)
    break

where asyncio.git/ is https://github.com/python/performance/tree/master/performance/benchmarks/data/asyncio.git

The bug is in the C version of apply_delta(). If I force the Python apply_delta(), the code works fine.

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

It seems like PyPy 5.4.0 miscompile the following C function:

static PyObject *py_apply_delta(PyObject *self, PyObject *args)
{
	uint8_t *out;
	PyObject *ret;

	ret = PyString_FromStringAndSize(NULL, 2);
	if (ret == NULL) {
		PyErr_NoMemory();
		return NULL;
	}
	out = (uint8_t *)PyString_AsString(ret);
        out[0] = 1;
        out[1] = 2;
        return ret;
}

Calling this C function from PyPy returns '\x00\x00' (byte string of 2 null bytes), whereas CPython 3.5 returns '\x01\x02' as expected.

It don't know if the C version of apply_delta() ever worked on PyPy.

The workaround is to use to skip C code on PyPy and use the pure Python version. PyPy should even be able to optimize hot Python code like apply_delta().

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

setup.py patch to disable C extensions on PyPy:

diff --git a/setup.py b/setup.py
index 3e5b0fa..9be46dc 100755
--- a/setup.py
+++ b/setup.py
@@ -8,6 +8,7 @@ try:
 except ImportError:
     from distutils.core import setup, Extension
 from distutils.core import Distribution
+import platform
 
 dulwich_version_string = '0.17.2'
 
@@ -52,8 +53,10 @@ if not '__pypy__' in sys.modules and not sys.platform == 'win32':
     tests_require.extend([
         'gevent', 'geventhttpclient', 'mock', 'setuptools>=17.1'])
 
-if sys.version_info[0] > 2 and sys.platform == 'win32':
-    # C Modules don't build for python3 windows, and prevent tests from running
+if ((sys.version_info[0] > 2 and sys.platform == 'win32')
+   or platform.python_implementation().lower() == 'pypy'):
+    # C Modules don't build for python3 windows, and prevent tests from running.
+    # On PyPy, don't use C extensions.
     ext_modules = []
 else:
     ext_modules = [
@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

My test case to reproduce the bug:

try:
    from dulwich._pack import apply_delta
    print("USE C IMPL")
except ImportError:
    from dulwich.pack import apply_delta
    print("USE PYTHON IMPL")

chunks = [
    b'tree 03207ccf58880a748188836155ceed72f03d65d6\n'
    b'parent 408fbab530fd4abe49249a636a10f10f44d07a21\n'
    b'author Victor Stinner <victor.stinner@gmail.com> 1421355207 +0100\n'
    b'committer Victor Stinner <victor.stinner@gmail.com> 1421355207 +0100\n'
    b'\n'
    b'Backout changeset 3a06020af8cf\n'
    b'\nStreamWriter: close() now clears the reference to the transport\n'
    b'\nStreamWriter now raises an exception if it is closed: write(), writelines(),\n'
    b'write_eof(), can_write_eof(), get_extra_info(), drain().\n']
delta = [b'\xcd\x03\xad\x03]tree ff3c181a393d5a7270cddc01ea863818a8621ca8\n'
         b'parent 20a103cc90135494162e819f98d0edfc1f1fba6b\x91]7\x0510738'
         b'\x91\x99@\x0b10738 +0100\x93\x04\x01\xc9']
res = apply_delta(chunks, delta)
expected = [b'tree ff3c181a393d5a7270cddc01ea863818a8621ca8\nparent 20a103cc90135494162e819f98d0edfc1f1fba6b',
            b'\nauthor Victor Stinner <victor.stinner@gmail.com> 14213',
            b'10738',
            b' +0100\ncommitter Victor Stinner <victor.stinner@gmail.com> 14213',
            b'10738 +0100',
            b'\n\nStreamWriter: close() now clears the reference to the transport\n\nStreamWriter now raises an exception if it is closed: write(), writelines(),\nwrite_eof(), can_write_eof(), get_extra_info(), drain().\n']

if 1:
    res = b''.join(res)
    expected = b''.join(expected)

if res != expected:
    print("FAIL!")
    print("got (%s) %r" % (len(res), res))
    print("expected (%s) %r" % (len(expected), expected))
else:
    print("ok!")
@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

It's a bug in PyPy:
https://bitbucket.org/pypy/pypy/issues/2499/cpyext-pystring_asstring-doesnt-work

Another fix for _pack.c is to replace PyString_AsString() with PyString_AS_STRING().

@jelmer

This comment has been minimized.

Member

jelmer commented Mar 16, 2017

Nice find! Changing to PyString_AS_STRING() seems reasonable to me. Could you send a pull request to that effect, ideally with your test case added to dulwich/tests/test_pack.py?

@Yardanico

This comment has been minimized.

Yardanico commented Mar 16, 2017

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

Nice find! Changing to PyString_AS_STRING() seems reasonable to me. Could you send a pull request to that effect, ideally with your test case added to dulwich/tests/test_pack.py?

Done: PR #510.

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 16, 2017

@TiberiumPY: "https://bitbucket.org/pypy/pypy/issues/2499/cpyext-pystring_asstring-doesnt-work
It was already fixed before in PyPy 5.6"

Cool! Good to know! But I expect that many Linux distro still only provide PyPy < 5.6. For example, I'm using Fedora 25 which is kind of "bleeding edge" but only provide PyPy 5.4.

@Yardanico

This comment has been minimized.

Yardanico commented Mar 16, 2017

@Haypo probably because there's no interest in PyPy from fedora maintainers

jelmer added a commit that referenced this issue Mar 16, 2017

jelmer added a commit that referenced this issue Mar 16, 2017

Replace PyString_AsString with PyString_AS_STRING
Fix issue #509: Replace PyString_AsString with PyString_AS_STRING to
workaround a PyPy bug in cpyext (PyPy older than 5.6).
@jelmer

This comment has been minimized.

Member

jelmer commented Mar 16, 2017

Merged #510, thanks!

@jelmer jelmer closed this Mar 16, 2017

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 17, 2017

Would it be possible to get a dulwich 0.17.2 release with this fix please?

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 17, 2017

@TiberiumPY: "@Haypo probably because there's no interest in PyPy from fedora maintainers"

It seems like the latest PyPy version on Travis-CI is PyPy 5.3.1 :-( I'm using Travis CI to test PyPy support in http://github.com/python/performance project.

@Yardanico

This comment has been minimized.

Yardanico commented Mar 17, 2017

@Haypo you can use newer pypy's on travis
example: biopython/biopython@4b4902f
so it downloads @squeaky-pl's portable pypy binaries, installs pip on them, and then does stuff

@jelmer

This comment has been minimized.

Member

jelmer commented Mar 18, 2017

I'll try to do a release some time this week.

https://github.com/python/performance mentions that dulwich installation fails on Windows for you. How does it fail?

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 18, 2017

@jelmer

This comment has been minimized.

Member

jelmer commented Mar 19, 2017

I've released a 0.17.2.

@vstinner

This comment has been minimized.

Contributor

vstinner commented Mar 20, 2017

Thanks! I just updated performance requirements to get Dulwich 0.17.2 and the benchmark now works on PyPy 5.4.0 on my Fedora 25!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment