
Parser multiprocessing pool is not repopulated #9713

Closed
1 task
andresriancho opened this issue May 5, 2015 · 13 comments

Comments


andresriancho commented May 5, 2015

TODO

  • Create a unittest to verify that the pool is re-populated after we kill one process from the pool abruptly
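A minimal sketch of such a unittest (hypothetical names, assuming the stock `multiprocessing.Pool`, whose maintenance thread is expected to replace dead workers):

```python
import os
import signal
import time
import unittest
import multiprocessing


def get_pid(_):
    return os.getpid()


class TestPoolRepopulation(unittest.TestCase):
    def test_pool_survives_killed_worker(self):
        pool = multiprocessing.Pool(processes=2)
        try:
            # Learn the original worker PIDs by running trivial tasks
            original_pids = set(pool.map(get_pid, range(10)))

            # Abruptly kill one idle worker, as the parser timeout handler does
            victim = original_pids.pop()
            os.kill(victim, signal.SIGKILL)

            # Give the pool's maintenance thread a moment to notice the death
            time.sleep(1)

            # The pool should have spawned a replacement and still work
            new_pids = set(pool.map(get_pid, range(10)))
            self.assertTrue(new_pids)
            self.assertNotIn(victim, new_pids)
        finally:
            pool.terminate()
            pool.join()
```

Note this only covers killing an *idle* worker; killing one mid-task is the harder case discussed further down in this thread.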

Issue

After one 60 second timeout, all I see is a repeating series of:

[timeout] The parser took more than 60 seconds to complete parsing of "http://www.clarin.com/medios/tapas-diarios-lunes-mayo_0_1351064987.html", killed it!
Failed to find a suitable document parser. Exception "There is no parser for "http://www.clarin.com/medios/tapas-diarios-lunes-mayo_0_1351064987.html"."
...

The same pair of messages repeats with different URLs.

@andresriancho

The pool seems to be re-populated without any code changes:

_repopulate_pool 8088
_repopulate_pool 8089
killin 8088
_repopulate_pool 8105
killin 8089

But parsing still fails.
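The `_repopulate_pool 8088` lines above suggest the pool's maintenance hook was instrumented. One way such debug output can be produced (a hypothetical monkeypatch for debugging, not w3af's actual code):

```python
import multiprocessing.pool

# Keep a reference to the original method so we can delegate to it
_original_repopulate = multiprocessing.pool.Pool._repopulate_pool


def _logged_repopulate_pool(self):
    # Let the pool replace any dead workers, then print the PID of
    # every live worker so repopulation is visible in the scan log
    result = _original_repopulate(self)
    for worker in self._pool:
        print('_repopulate_pool %s' % worker.pid)
    return result


multiprocessing.pool.Pool._repopulate_pool = _logged_repopulate_pool
```

With this patch in place, every pool creation and worker replacement prints a `_repopulate_pool <pid>` line like the ones quoted above.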

@andresriancho

Working on this in this branch

@andresriancho

The process pool seems to be maintained without any major issues, but:

  • Running ./collector examples/w3af/config.yml 69d8c07 with a short config.yml timeout doesn't reproduce the issue. The timeouts start to appear several minutes into the scan
  • To confirm the above I ran two different tests, both with the same timeout but with different profiling "plugins".

First run

#export W3AF_CPU_PROFILING=1
export W3AF_MEMORY_PROFILING=1
export W3AF_CORE_PROFILING=1
export W3AF_THREAD_ACTIVITY=1
export W3AF_PROCESSES=1
export W3AF_PSUTILS=1
export W3AF_PYTRACEMALLOC=1

Instances which run this test:

  • i-f1c2c0de
  • i-b3c2c09c

Second run (memory profiling disabled)

#export W3AF_CPU_PROFILING=1
#export W3AF_MEMORY_PROFILING=1
export W3AF_CORE_PROFILING=1
export W3AF_THREAD_ACTIVITY=1
export W3AF_PROCESSES=1
export W3AF_PSUTILS=1
#export W3AF_PYTRACEMALLOC=1

Instances which run this test:

  • i-64c4c64b
  • i-71c4c65e

Expected results

  • If the runs without memory profiling have no timeouts, then we can be sure that profiling is the reason and nothing else needs to be done. Well... maybe add some kind of warning to the result? Document this strange behaviour?
  • If all runs (with and without memory profiling) have the timeout problem, then I'm 🖕 and need to understand and fix the issue

@andresriancho

Analysis

Memory profiling enabled

  • i-f1c2c0de: 188 parser timeouts
  • i-b3c2c09c: 321 parser timeouts

Memory profiling disabled

  • i-64c4c64b: 0 parser timeouts
  • i-71c4c65e: 0 parser timeouts

Result

Memory profiling breaks multiprocessing+parser_cache

@andresriancho

This is still an issue since I can't run scans to profile very important information such as:

#export W3AF_CPU_PROFILING=1
#export W3AF_MEMORY_PROFILING=1
#export W3AF_PYTRACEMALLOC=1

This is how we count the number of parser cache timeouts (10 seconds):

grep -i 'seconds to compl' w3af-log-output.txt | wc -l

I believe I should run tests with the following configurations:

  • Only enable W3AF_CPU_PROFILING, disable the other two and count the number of cache timeouts
  • Only enable W3AF_MEMORY_PROFILING, disable the other two and count the number of cache timeouts
  • Only enable W3AF_PYTRACEMALLOC, disable the other two and count the number of cache timeouts
  • Consider changing the PARSER_TIMEOUT in MultiProcessingDocumentParser when the scanner is running in profiling mode; maybe it is not a critical issue, just the parser taking more time to run due to the profiling tools
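The last bullet can be sketched as follows (hypothetical code, not w3af's actual implementation; the `get_parser_timeout` helper and the doubling factor are assumptions):

```python
import os

# Base timeout in seconds; 60 * 3 is the value mentioned below in this thread
PARSER_TIMEOUT = 60 * 3

# The profiling switches discussed in the previous comments
PROFILING_ENV_VARS = ('W3AF_CPU_PROFILING',
                      'W3AF_MEMORY_PROFILING',
                      'W3AF_PYTRACEMALLOC')


def get_parser_timeout():
    # When any profiler is enabled the parser is not actually hanging,
    # it is just slower, so give it a larger budget before killing it
    if any(os.environ.get(var) for var in PROFILING_ENV_VARS):
        return PARSER_TIMEOUT * 2
    return PARSER_TIMEOUT
```

This would separate "the parser is stuck" from "the parser is slow because of instrumentation", which are two very different failure modes.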

@andresriancho

Increasing PARSER_TIMEOUT reduced the errors but didn't make them disappear. I'm setting the timeout to 60 * 3.

@andresriancho

andresriancho commented Jun 29, 2017

Potential reason documented in https://docs.python.org/2/library/multiprocessing.html:

Warning If a process is killed using Process.terminate() or os.kill() while it is trying to use a Queue, then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on.

And another one:

Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.


@andresriancho

andresriancho commented Jun 29, 2017

https://github.com/noxdafox/pebble is a potential replacement for the mp_document_parser implementation; its authors might be handling some of the errors explained above correctly with a lock:

https://github.com/noxdafox/pebble/blob/master/pebble/pool/process.py#L360-L366

@andresriancho

Note on 772f769: The test does break the multiprocessing parser once every 5 runs, on a rainy day, on my computer, and while I'm playing hard-core music 🗡️ . I found no way to reproduce this in 100% of the cases, since it depends on the parent process killing the worker at the exact moment the worker is writing to the queue.

@andresriancho andresriancho changed the title Review: Parser pool is not repopulated Parser multiprocessing pool is not repopulated Jun 30, 2017
@andresriancho

re: performance, the new implementation from 27c6e25 seems to run the unittests faster: 35sec vs. 45sec.

@andresriancho

Running a lot of tests to make sure this change works, but it looks promising! Fixing the build https://circleci.com/gh/andresriancho/w3af/tree/feature%2Fpebble-mp and making sure there are no performance issues.

@andresriancho

This was fixed with pebble.
