-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python 3: ~100MB/sec vs. 800MB/sec FileService throughput #485
Comments
Main thread (Py3): Log
. Main thread (Py2) Log
. Broker thread (Py3): Log
. Broker thread (Py2): Log
|
Broker thread WTFs:
Main thread WTFs:
|
For the read() mess, this measures no obvious difference in perf output or runtime.
Could /dev/zero really be slow enough to hide the problem? I think not. I'm suspecting this may be due to the GIL |
Python 3 slows down a little bit when an extra thread is present, but only by ~10%ish. Python 3's base overhead is significantly higher, which is obvious by changing the read size to 64/128, but still nowhere nearly enough to explain the profile output |
* python24: WIP first run of py24 CI issue #477: initial Python 2.4.6 build for CI. issue #477: enable git-lfs for tests/data/*.tar.bz2. issue #477: import build script for Python 2.4.6. issue #477: add mitogen_py24 CI test type. issue #477: disable Django parts of module_finder_test on 2.4. issue #477: clean up globals after unix_test. issue #477: remove unused pytest bits from importer_test. issue #477: remove fork use from unix_test. parent: don't kill child when profiling=True issue #485: import new throuhgput bench issue #477: more fork removal issue #477: Py2.4 startswith() did not support tuples. issue #477: util/fakessh/two_three_compat fixes. issue #477: call_function_test fixes for 2.4. issue #477: promote setup_gil() to mitogen.utils issue #477: fix lxc_test any polyfill import. issue #477: stop using fork in responder_test. issue #477: stop using fork in service_test. issue #477: Python<2.5 ioctl() request parameter was signed. issue #477: stop using fork() in parent_test, compatible enumerate(). issue #477: Popen.terminate() polyfill for Py2.4. issue #477: stop using .fork() in router_test, one small 2.4 fix. issue #477: document master.Router.max_message_size. issue #477: old Py zlib did not include extended exception text. issue #477: stop using router.fork() in receiver_test issue #477: any() polyfill for lxc_test. issue #477: replace type(e) -> __class__ for an exception issue #477: old Mock does not throw side_effect exceptions from a list issue #477: 2.4 stat() returned int timestamps not float. issue #477: set().union(a, b, ..) unsupported on Py2.4. issue #477: Logger.log(extra=) unsupported on Py2.4. issue #477: fix another Threading.getName() call. issue #477: %f date format requires Py2.6 or newer. issue #477: make mitogen.fork unsupported on Py<2.6. issue #477: Py2.4 dep scanner bytecode difference Drop 'alpha' trove classifier issue #477: fix another str/bytes mixup. issue #477: blacklist 'thread' module to avoid roundtrip on 2.x->3.x issue #477: fix 3.x failure in new target.set_file_mode() function. issue #477: fix 3.x failure in new target.set_file_mode() function. issue #477: fix 2 runner tests on Ansible 2.7. issue #477: fix 3.x test regressions. issue #477: fix new KwargsTest on Python 3.x. issue #477: ModuleFinder now returns Unicode module names. issue #477: Python3 does not have Pickler.dispatch. issue #477: ModuleFinder test fixes. issue #477: Ansible 2.3 compatible regression/all.yml. issue #477: Ansible 2.3 requires placeholder module for assert_equals issue #477: build a CentOS 5/Py2.4 container + playbook compat fixes. issue #477: use PY24 constant rather than explicit test. issue #477: backport mitogen.master to Python 2.4. issue #477: parent: make iter_read() log disconnect reason. issue #477: backport ansible_mitogen.runner to 2.4. issue #477: backport various test modules to Python 2.4. issue #477: backport ansible_mitogen/target.py to Python2.4 issue #477: add all() polyfill to custom_python_detect_environmnet issue #477: polyfill partition() use in mitogen.parent. issue #477: polyfill partition() use in mitogen.service. issue #477: polyfill partition() use in mitogen.ssh. issue #477: vendorize the last 2.4-compatible simplejson issue #477: _update_linecache() must append newlines. issue #415, #477: Poller must handle POLLHUP too. issue #477: Python 2.5 needs next() polyfill too. issue #477: explicitly populate Py2.4 linecache from Importer. issue #477: rename and add tests for polyfill functions. issue #477: various core.py docstring cleanups. issue #477: Ansible 2.3 module output format difference. issue #477: Ansible 2.3 cannot use when: on an include. issue #477: tests: use Ansible 2.3-compatible include rather than import issue #477: serve up junk ansible/__init__.py just like Ansible. issue #477: testlib: Py2.4 did not have BaseException. issue #477: master: ability to override ModuleResponder output. issue #477: yet another bug in core._partition(). issue #477: 2.4.x compat fixes for mitogen.service. issue #477: Py2.4 lacks all(). issue #477: Ansible 2.3 had stricter arg spec format. issue #477: make CallError serializable on 2.4. issue #477: log full module name when SyntaxError occurs. issue #477: more Py2.4 (str|unicode).partition(). issue #477: Py2.4 cannot tolerate unicode kwargs. issue #477: Py2.4 lacks BaseException. issue #477: Py2.4: enumerate() may return stopped threads. issue #477: Py2.4: more unicode.rpartition() usage. issue #477: Python 2.4 type(exc) returns old-style instance. issue #477: Python 2.4 lacked str.partition. issue #477: Python 2.4 lacked Thread.name. issue #477: Python 2.4 lacked context managers. issue #477: Python <2.5 did not have combined try/finally/except. issue #477: older Ansibles had no vars plugin base class. issue #477: Python <2.5 lacked any(). issue #477: Python <2.6 lacked rpartition(). issue #477: make CallError inherit from object for 2.4/2.5. issue #477: 2.4/2.5 had no better poller than poll().
* origin/dmw: (135 commits) tests: just disable the test. tests: hopefully fix this dumb test for the final time docs: update Changelog; closes #477. issue #477: use MITOGEN_INVENTORY_FILE everywhere. issue #477: hacksmash weird 2.3 inventory_file var issue. issue #477: travis.yml typo. issue #477: fix sudo_args selection. issue #477: one more conditional test. issue #477: enable Ansible 2.3.3 CI. issue #477: some more conditional tests. docs: update Changelog. issue #477 / ansible: avoid a race in async job startup. issue #477: use assert_equal for nicer debug. issue #477: fix source of become_flags on 2.3. issue #477: add Connection.homedir test. core: docstring tidyups. core: ensure early debug messages are logged correctly. core: log disconnection reason. issue #477: target.file_exists() wrapper. issue #477: introduce subprocess isolation. ansible: docstring fixes. issue #477: paper over Ansible 2.3 flag handling difference issue #477: update forking_correct_parent for subprocess isolation issue #477: shlex.split() in 2.4 required bytes input. issue #477: get rid of perl JSON module requirement. issue #477: Ansible 2.3 did not support gather_facts min subset. issue #477: CentOS 5 image requires perl installed too. issue #477: missing stub-su.py from 137f5fa issue #477: 2.4-compatible syntax. issue #477: clearing glibc caches is not possible on Py2.4. parent: --with-pydebug bootstrap could fail due to corrupted stream issue #477: install simplejson for vanilla tests. docs: update Changelog. ansible: synchronize module needs '.docker_cmd' attr for Docker plugin. issue #477: add basic su_test and Py2.4 polyfill. issue #477: import updated Python build scripts ci: don't use the TTY->pipe hack except on Travis where it's needed. WIP first run of py24 CI issue #477: initial Python 2.4.6 build for CI. issue #477: enable git-lfs for tests/data/*.tar.bz2. issue #477: import build script for Python 2.4.6. issue #477: add mitogen_py24 CI test type. issue #477: disable Django parts of module_finder_test on 2.4. issue #477: clean up globals after unix_test. issue #477: remove unused pytest bits from importer_test. issue #477: remove fork use from unix_test. parent: don't kill child when profiling=True issue #485: import new throuhgput bench issue #477: more fork removal issue #477: Py2.4 startswith() did not support tuples. issue #477: util/fakessh/two_three_compat fixes. issue #477: call_function_test fixes for 2.4. issue #477: promote setup_gil() to mitogen.utils issue #477: fix lxc_test any polyfill import. issue #477: stop using fork in responder_test. issue #477: stop using fork in service_test. issue #477: Python<2.5 ioctl() request parameter was signed. issue #477: stop using fork() in parent_test, compatible enumerate(). issue #477: Popen.terminate() polyfill for Py2.4. issue #477: stop using .fork() in router_test, one small 2.4 fix. issue #477: document master.Router.max_message_size. issue #477: old Py zlib did not include extended exception text. issue #477: stop using router.fork() in receiver_test issue #477: any() polyfill for lxc_test. issue #477: replace type(e) -> __class__ for an exception issue #477: old Mock does not throw side_effect exceptions from a list issue #477: 2.4 stat() returned int timestamps not float. issue #477: set().union(a, b, ..) unsupported on Py2.4. issue #477: Logger.log(extra=) unsupported on Py2.4. issue #477: fix another Threading.getName() call. issue #477: %f date format requires Py2.6 or newer. issue #477: make mitogen.fork unsupported on Py<2.6. issue #477: Py2.4 dep scanner bytecode difference Drop 'alpha' trove classifier issue #477: fix another str/bytes mixup. issue #477: blacklist 'thread' module to avoid roundtrip on 2.x->3.x issue #477: fix 3.x failure in new target.set_file_mode() function. issue #477: fix 3.x failure in new target.set_file_mode() function. issue #477: fix 2 runner tests on Ansible 2.7. issue #477: fix 3.x test regressions. issue #477: fix new KwargsTest on Python 3.x. issue #477: ModuleFinder now returns Unicode module names. issue #477: Python3 does not have Pickler.dispatch. issue #477: ModuleFinder test fixes. issue #477: Ansible 2.3 compatible regression/all.yml. issue #477: Ansible 2.3 requires placeholder module for assert_equals issue #477: build a CentOS 5/Py2.4 container + playbook compat fixes. issue #477: use PY24 constant rather than explicit test. issue #477: backport mitogen.master to Python 2.4. issue #477: parent: make iter_read() log disconnect reason. issue #477: backport ansible_mitogen.runner to 2.4. issue #477: backport various test modules to Python 2.4. issue #477: backport ansible_mitogen/target.py to Python2.4 issue #477: add all() polyfill to custom_python_detect_environmnet issue #477: polyfill partition() use in mitogen.parent. issue #477: polyfill partition() use in mitogen.service. issue #477: polyfill partition() use in mitogen.ssh. issue #477: vendorize the last 2.4-compatible simplejson issue #477: _update_linecache() must append newlines. issue #415, #477: Poller must handle POLLHUP too. ...
I have a feeling that this could be related: paramiko/paramiko#1141 |
I updated the timings in that issue for the latest paramiko but there is little/no difference. I'd love to know if you guys get to the bottom of what this massive difference in performance is. |
Great, thank you! I would like to find out if we have a non-linear performance curve here with mitogen just like Paramiko. I think that would be very telling. If this is only a linear decrease, it seems different (though maybe still partly related). |
The Paramiko problem was separate, Mitogen doesn't actually use Paramiko at all. The problem here is fundamental to a permanent compatibility hack in Python 3.x, that basically renders cPickle completely unsuitable for transporting binary data any more. The problem is that when communicating using Pickle protocol v2 or older (i.e. with Python 2), Python 3 represents all 8-bit bytestrings in the pickle by transliterating them to Unicode using the LATIN-1 codec, and undoing the result in the remote side. Unicode in turn is represented in pickle as UTF-8, so every chunk of raw data is doubly transcoded (and copied) during serialization and deserialization. Basically this can't be fixed -- the compatibility hack is needed to support yet more compatibility hacks elsewhere in pickle to make it appear like 2.x and 3.x pickles are compatible. That means every 8-bit bytestring experiences on average 1.5x size expansion on the wire, and that size is variable depending on the actual data bytes being transmitted (thus making it unpredictable, and impossible to correctly align to frame boundaries), and every bytestring must roundtrip first through the UTF-8 decoder then the LATIN-1 decoder every time it is deserialiezd. It looks like this ticket killed Mitogen's use of pickle. Now we need to consider replacing it with a custom serializer with much higher priority than before. |
i recall that this was the reason why exec-net eventually made a own serializer, pickle just turned out to be a liability personally i would like to see some kind of layering, so each channel/transport endpoint can have a own serializer setup, allowing to configure for raw data, json, pickle, and others instead of enforcing a specific serializer |
I'm generally anti-modularity unless a use case makes absolute sense. One of the problems discovered here, broken frame alignment, is alone likely responsible for 25%+ of the slowdown. Optimizations like that require precise knowledge of lower layers, which can be hard to maintain through an abstraction. In other words, particularly in this case, I believe it is much better to have a single option that works very well for many cases, than many options catering poorly to every case |
Wow, thank you for digging into this so much. I appreciate it! |
I stumbled across this too. As a shot in the dark, since Mitogen controls both the pickling and unpickling, the extension registry could be used to implement something without throwing out all of pickle. I need to think about that.
AFAICT the options aren't great. Whatever is chosen needs to be implemented in C, for any hope of speed, and that means binary dependencies. That will be hard enough across 2.7, 3.5, 3.6. Adding 2.6 makes it harder, adding 2.4 or 2.5 verges on impossible. I'll try an integration of pikl, but currently it will suffer the same problem as discussed here. If I can get the time and perseverance I think a Rust based implementation of pikl, that implemented protocol 3 (or higher) on all supported Python versions would be a win. I already tried backporting the Finally, I've been meaning to look at the format CBOR more closely (and the package cbor2) as possible basis to work on. However IIRC, it can't round trip tuples in it's present form. |
In case anyone needs some for comparison, I made pickles with various CPython versions https://github.com/moreati/pickle-fuzz/tree/master/pickles |
I don't think this buys us anything. However, zodbpickle may be good enough, at least as a stopgap. The latest release supports CPython 2.7 & 3.4-3.7, a release from 2015 supports CPython 2.6. Critically it supports Pickle protocol 3 on Python 2.x, and it has a mechanism for pickling Python 2.x
Edit 1: pikl is currently forked from zodbpickle |
I'm having a play with your pickle-fuzz repo at the moment :) Feeling slightly more optimistic after a few hours to think about it, at least in the case of file transfer, it is possible to simply remove the use of pickle -- it is only there to allow the file sender to abort the transfer, which could simply be signalled some other way. That would fix the most egregious inefficiency, but only as a stopgap measure Regarding a replacement for pickle, losing e.g. tuples might not be the end of the world, we could simply bump the major version number. It's more important at least for me that whatever serialization used is subjectively "convenient" and does not regularly feature prominently in profile output. Really the bar isn't too high at this point -- we could potentially implement some serialization in bash and it'd still beat what we have on Python 3 right now 😃 |
I haven't profiled yet, but this transcoding madness is possibly a large part of why Mitogen for Ansible has always been slower on 3.x -- we rely on efficient binary serialization for transfer of all kinds of data (like Ansible modules) in the extension |
To be clear, pickle-fuzz is a dumping ground of experiments and half finished prototypes. The output of all that is pikl. |
Hey @moreati, I pushed a branch with a first pass at this, if you fancy taking a look. The semantics change likely needs a version number bump, so fixing this could be the first patch to 0.3.x. Random thoughts:
|
Although the draft is fairly efficient on CPU, on the wire it's pretty wasteful: >>> len(mitogen.core.encode((1, 2, 3, 4)))
25
>>> len(pickle.dumps((1, 2, 3, 4), protocol=2))
15 It's because all the container types (and string) use fixed 32bit integers for lengths, and there is nothing smaller than a 32bit integer. The more specializations that are added the slower everything will get, but it is tempting (either now or later) to add e.g.
Thatt'll likely blow out the bootstrap a little more. I guess the question is use cases. For the Ansible extension since SSH compression was always used, the fixed integer types will have basically no effect on the wire, but what about other cases? |
Looking |
FWIW I think this draft is ugly as sin :) It's motivated as usual by file size, but I don't even think that's a pressing concern.. getting serialization right is important enough to spend bytes on. I think the size bar for core.py is probably something like 40kb (hopefully it never happens), then it's time to start really worrying, so there is plenty of room to play with if it produces a nicer solution. |
Initial thoughts
Bike shed
|
+1 on dumps/loads. I forgot about pencode! It preserves serialization of cyclical structures, handles long ints and is already the right size. Could we overcome this LATIN1 transcoding problem with pikl? There would need to be some way to ship any solution to 3.x targets since they are the root of the problem, so I think it either needs to be part of core.py, or there must be a better way to ship additional modules alongside core.py during bootstrap. |
pencode supports Python 2.7 (I mistakenly thought not), adding Python 2.4 support looks doable.
I need to investigate further. Until this moment I thought that would require upgrading pikl to pickle protocol 4. Looking again, it might be fine with the existing protocol 3 support. If protocol 4 were necessary it would require a pure Python implementation on Python 2.x, with the associated CPU hit. Research progress:
marshal is very intriguing.
Cons
|
I love the name mrna :) I'd like to do some archeology and see what kept execnet away from marshal. It's definitely had historical security issues like stack/integer overflows, the question is whether they're relevant here. If we support talking to Python 2.4, and 2.4 has a marshal module with some kind of RCE in it, it's fine so long as any trusted parent is running a patched marshal, since parent->child RCE is the whole point of the library. But e.g. going host A -> B -> C where bastion B is running an ancient Python, this cannot work where any response from C is interpreted by B (e.g. something like REQUEST_MODULE). It'd be great to use this opportunity to eliminate any fear surrounding the serializer entirely. I think the strongest contender so far is pencode -- but its support for cyclical structures opens trusted code to algorithmic attacks (infinite recursion or loops), so the icing on the cake (I think) would be pencode minus memoization. |
An ominous message from the execnet sources (earliest commit in the current Git): # XXX marshal.dumps doesn't work for exchanging data across Python
# version :-((( There is no sane solution, short of a custom
# pure Python marshaller This may be referring to pre-2.4 Pythons, though |
Very alpha pickle protocol 4 backport https://github.com/moreati/pickle4 |
Fixed up the pickle4 backport enough to benchmark on Python 2.7 and 3.7. It's slow. There's a lot of branching, in pure python. Learned some ancient Python lore Next, I'm going to look at pencode some more. |
pencode_read5 is a variant of pencode. It decodes between 2 and 3 times faster than plain pencode. This brings it within 10x of cpickle.
To do this pencode_read5 removes the dedicated the op codes for |
This commit is not intended to be merged. It exists to help characterise/diagnose performance regression(s) of Mitogen × Python 3.x. It will not work with Python 2.x. See mitogen-hq#485
Short to medium term ideas
|
…oplefonesk-2-new-number 15558-slovakia-peoplefonesk-2-new-number
Recorded using
tests/bench/throughput.py
, either running withpython_path=["perf", "record", "-g", "python"]
/ perf report, or withenable_profiling=True
to capture the cProfile output. All methods except.local()
are commented outSome insane string encoding somewhere. Py2:
Py3:
Traces are from receiver only
The text was updated successfully, but these errors were encountered: