Segmentation fault when CPU load is high #311

Open
ivan opened this Issue Feb 22, 2016 · 1 comment

Contributor

ivan commented Feb 22, 2016

I'm seeing wpull 1.2.3 on Python 3.4.3 crash repeatably when it runs on a machine under high CPU load. I've observed this on both Ubuntu 14.04 and 15.10, on different machines: the 14.04 box has a Core i3 from 2011 and the 15.10 box a 4-core 4790K.

You can probably reproduce this by running this web server: https://github.com/ludios/crawl-destroyer

and then starting a lot of crawls with

(mkdir j-1 && cd j-1 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-2 && cd j-2 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-3 && cd j-3 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-4 && cd j-4 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-5 && cd j-5 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-6 && cd j-6 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-7 && cd j-7 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-8 && cd j-8 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-9 && cd j-9 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-10 && cd j-10 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-11 && cd j-11 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-12 && cd j-12 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-13 && cd j-13 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-14 && cd j-14 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-15 && cd j-15 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-16 && cd j-16 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-17 && cd j-17 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-18 && cd j-18 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-19 && cd j-19 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-20 && cd j-20 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-21 && cd j-21 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-22 && cd j-22 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-23 && cd j-23 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-24 && cd j-24 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-25 && cd j-25 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-26 && cd j-26 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-27 && cd j-27 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-28 && cd j-28 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-29 && cd j-29 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-30 && cd j-30 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-31 && cd j-31 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
(mkdir j-32 && cd j-32 && ~/.local/bin/wpull --quiet --output-file wpull.log --delete-after --concurrent 5 --warc-file warc --recursive http://127.0.0.1:3000/ > log 2> log) &
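The 32 commands above differ only in the job number, so they could also be started from a small script. This is a rough sketch, not part of wpull: `build_command` and `start_crawls` are made-up names, and it assumes wpull is installed at `~/.local/bin/wpull` as in the commands above. Unlike the shell version, it sends stderr to the same open `log` file as stdout instead of opening `log` twice.

```python
import subprocess
from pathlib import Path

# Assumed install location and target, copied from the shell commands above.
WPULL = Path.home() / ".local" / "bin" / "wpull"
URL = "http://127.0.0.1:3000/"


def build_command(wpull=WPULL, url=URL):
    """Return the wpull argument list used in the shell commands above."""
    return [
        str(wpull), "--quiet", "--output-file", "wpull.log",
        "--delete-after", "--concurrent", "5",
        "--warc-file", "warc", "--recursive", url,
    ]


def start_crawls(count=32, base_dir=".", command=None):
    """Start `count` crawls, each in its own j-N directory, and return the processes."""
    if command is None:
        command = build_command()
    processes = []
    for i in range(1, count + 1):
        job_dir = Path(base_dir) / "j-{}".format(i)
        job_dir.mkdir()  # like `mkdir`, this fails if the directory already exists
        with open(str(job_dir / "log"), "w") as log:
            # stdout and stderr share one file descriptor, so neither
            # stream overwrites the other.
            processes.append(subprocess.Popen(
                command, cwd=str(job_dir), stdout=log,
                stderr=subprocess.STDOUT))
    return processes
```

Calling `start_crawls()` and then `wait()` on each returned process gives the exit status of every crawl. On Linux, a `returncode` of -11 means the child was killed by SIGSEGV.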

After about 10-30 minutes on a 4-core 4790K, I see at least one process crash with

[11]    segmentation fault  ( mkdir j-11 && cd j-11 && ~/.local/bin/wpull --quiet --output-file wpull.log)

It may take longer to crash on other processors. A lower or higher number of wpull processes may be optimal, but I think the load average needs to be > 25 for a good chance of a crash.
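To check whether a machine is actually above that load before waiting for a crash, something like the following could be used. It is only a sketch: `load_is_high` is a made-up helper, and the threshold of 25 is the rough figure guessed at above, not a measured cutoff.

```python
import os


def load_is_high(threshold=25.0, loadavg=None):
    """Return True if the 1-minute load average exceeds `threshold`."""
    if loadavg is None:
        # os.getloadavg() returns the 1, 5 and 15 minute averages (Unix only).
        loadavg = os.getloadavg()
    return loadavg[0] > threshold
```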

This happens both with and without cchardet installed.

This one might be tricky to track down because the heap corruption is probably happening some time before the crash. A Mozilla person suggested I use http://rr-project.org/ to try to track it down, but warned that the overhead might not be acceptable for Python. I will continue investigating.
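Separately from rr, a cheaper first step could be Python's standard `faulthandler` module (available since Python 3.3), which prints the Python traceback of each thread when the process receives SIGSEGV. It only shows where the crash happened, not where the heap was corrupted, so it would narrow things down rather than settle them. A minimal sketch, where `enable_crash_tracebacks` is a made-up name and the call site inside wpull is left open:

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump every thread's Python traceback to `stream` on a fatal signal."""
    if stream is None:
        stream = sys.stderr
    # `stream` must be a real file with a file descriptor and must stay
    # open for as long as the handler is enabled.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

The same handler can be switched on without touching any code by setting the `PYTHONFAULTHANDLER=1` environment variable or running the interpreter with `-X faulthandler`. With the commands above, the traceback would end up in each job's `log` file, since that is where stderr is redirected.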

Contributor

ivan commented Feb 22, 2016

My next step might be to check if sqlalchemy is to blame, by verifying the heap with gc.collect() before and after calls to sqlalchemy.
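One way to set that up is a decorator that forces a collection on both sides of a call. This is a sketch with a made-up name, `collect_around`; it would be applied to whichever functions call into sqlalchemy. Note that `gc.collect()` is not a real heap verifier: it only walks the container objects the collector tracks, so the hope is that a damaged object makes the crash surface closer to the call that caused it.

```python
import functools
import gc


def collect_around(func):
    """Wrap `func` so a full garbage collection runs before and after each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # A crash here suggests the heap was already damaged before the call.
        gc.collect()
        try:
            return func(*args, **kwargs)
        finally:
            # A crash here points at something that ran inside `func`.
            gc.collect()
    return wrapper
```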

chfoo added the bug label Feb 22, 2016
