Hibernation #329

Closed
dogasantos opened this Issue Apr 15, 2013 · 18 comments


A "Save State" (or pause with steroids) to preserve current scan state, that make possible to shutdown all operations (poweroff the machine) and resume the task later.
All from webgui.
That's possible?
Thanks for your work, arachni is a great too, my favorite one!

Owner

Zapotek commented Apr 15, 2013

I keep considering this one every now and then, but I'm reluctant to implement it, or even attempt to. I'm assigning this to the v0.5 milestone since there's going to be some sizeable rewrite work which could make this easier to add.

But I'd like to make it clear that this won't be a high priority and could end up dropping off my list.

Thanks for the kind words though, glad you like it.

dogasantos commented Apr 15, 2013

Thanks for your attention, Zapotek!
This feature came to mind because I'm a pentester, and it's quite common to handle very large websites (especially with mod_rewrite enabled), like 50k pages mapped in Arachni.
Using just one RPCD instance, it takes hours to crawl everything.
So if we could pause and resume, we could get this done over 2 days using a simple notebook.

Of course, speeding things up would be a better solution for this particular issue, or maybe an exclude-dirs/pages feature to select portions of large sites to work on. Either way, resume would be a nice feature and could help people in many scenarios.

I'm not a regular developer, so I cannot offer my coding skills, but I can contribute in other ways: some Photoshop work, WebGUI (visual) improvements, test scenarios, feedback, and anything else I can do.

Thanks again!

Owner

Zapotek commented Apr 15, 2013

The closest thing to this right now is that future scans can use the sitemaps of previous scans to avoid having to do a redundant crawl.
This is available via the new WebUI as an option when repeating a scan and via the rescan plugin when using the CLI.

Contributor

user021 commented Apr 15, 2013

Why not use the suspend feature in VMware? It works for me; not so sure about VirtualBox though.

dogasantos commented Apr 15, 2013

Indeed, I use something similar on macOS (sleep), but it's still a nice feature worth thinking about.

Maybe a SKIP feature could help too: just skip the crawl step and start the processing phase.
Thanks

Owner

Zapotek commented Apr 15, 2013

Yeah that's what my last comment was about.

Contributor

user021 commented Apr 29, 2013

Quick reminder: you forgot to implement the Pause option in the CLI in release 0.4.2.
The only reason I care about that is that each time I suspend the VM while an audit is taking place, I get a few timed-out requests when I resume.
PS: I wonder if that can be done with the Grid CLI as well.

Owner

Zapotek commented Apr 29, 2013

Yeah, I skipped it because I didn't want to risk introducing a regression at the last minute.

Owner

Zapotek commented Jul 13, 2013

This is what I'm thinking:

Instead of having caches of checked pages/elements/etc. spread across lots of modules, create a structure to hold that type of data which can be serialized to disk and then restored, bringing the system back to a previously saved state.

This will also enable the Grid to scale down (as opposed to only up, as is the case currently) since it will essentially allow for scans (represented by their saved states) to be migrated between machines.
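As a rough illustration of that idea, here's a minimal Ruby sketch; the `ScanState` class and its attributes are hypothetical, not Arachni's actual internals:

```ruby
require 'set'

# Hypothetical container for the scan state that individual subsystems
# currently keep in their own caches.
class ScanState
  attr_reader :checked_pages, :checked_elements, :pending_urls

  def initialize
    @checked_pages    = Set.new  # URLs whose pages have already been audited
    @checked_elements = Set.new  # IDs of elements already checked
    @pending_urls     = []       # crawl frontier yet to be processed
  end

  # Serialize the whole state to disk.
  def save(path)
    File.binwrite(path, Marshal.dump(self))
  end

  # Restore a previously saved state.
  def self.load(path)
    Marshal.load(File.binread(path))
  end
end

state = ScanState.new
state.checked_pages << 'http://example.com/'
state.save('scan.state')

restored = ScanState.load('scan.state')
puts restored.checked_pages.to_a
```

Marshal is just the simplest stdlib way to show the round trip; any serialization format would do.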

@ghost ghost assigned Zapotek Jul 13, 2013

dogasantos commented Jul 13, 2013

That sounds great, Zapotek!
I was thinking about that a few days ago...
We get stuck in the crawling phase when the target has lots of pages.
So maybe using an SQLite database to store the crawled data could solve this, though that could introduce two new problems: performance, and a way to quickly restore your session.
The guys from the sqlmap project do exactly this trick, but I think it would be slightly different in Arachni's case.

https://github.com/sqlmapproject/sqlmap/wiki/Usage#load-session-from-a-stored-sqlite-file
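For the sake of discussion, a sqlmap-style crawl session in SQLite could look something like this sketch (using the sqlite3 gem; the table and helper names are made up for the example):

```ruby
require 'sqlite3'

db = SQLite3::Database.new('session.db')

# One row per crawled URL, so a later run can skip what's already done.
db.execute <<-SQL
  CREATE TABLE IF NOT EXISTS crawled (
    url        TEXT PRIMARY KEY,
    crawled_at INTEGER NOT NULL
  )
SQL

# Record a URL as crawled; INSERT OR IGNORE keeps re-runs idempotent.
def mark_crawled(db, url)
  db.execute('INSERT OR IGNORE INTO crawled (url, crawled_at) VALUES (?, ?)',
             [url, Time.now.to_i])
end

# Check whether a URL can be skipped on resume.
def crawled?(db, url)
  !db.execute('SELECT 1 FROM crawled WHERE url = ? LIMIT 1', [url]).empty?
end

mark_crawled(db, 'http://example.com/login')
puts crawled?(db, 'http://example.com/login') # => true
```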

Owner

Zapotek commented Jul 13, 2013

I was actually thinking about SQLite too, but I've had bad experiences working with it for the WebUI. On the other hand, crawl and audit operations are single-threaded, so it could work. It's certainly worth a try.

Owner

Zapotek commented Mar 26, 2014

FYI, hibernation is next on my TODO list and I'll start work on it soon. Got any feedback for me?

Great news!
For now, nothing comes to mind.
I'm working on my hardening tool, and I have some concepts that could apply to Arachni, but I have to finish my article first to be sure of things.
Once I have all the data, I'll be in touch!
Thanks!

Owner

Zapotek commented Apr 8, 2014

Closing this as I've implemented the feature, although it doesn't use a DB.
Instead, it dumps the state and data of every Arachni subsystem to disk, as a compressed (zip) archive, on demand.

There's a lot of stuff going on behind the scenes, and just keeping a session in a DB which you could then resume from wouldn't work.

The way it works is:

  • Scan is running.
  • You suspend it.
  • A snapshot is dumped to disk as soon as possible; this may take a while depending on the workload.
  • The scan exits.

You can then use the new arachni_restore executable to restore a snapshot and the scan resumes from where it left off.
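Conceptually, the suspend/restore round trip is something like this sketch (a simplification: it assumes Marshal-serializable subsystem state and uses a gzip stream instead of the actual zip archive; the function and method names are illustrative only):

```ruby
require 'zlib'

# Hypothetical suspend: collect each subsystem's state into one snapshot
# and write it to disk as a compressed blob.
def suspend(subsystems, snapshot_path)
  snapshot = subsystems.map { |s| [s.class.name, s.state] }.to_h
  Zlib::GzipWriter.open(snapshot_path) { |gz| gz.write(Marshal.dump(snapshot)) }
end

# Hypothetical restore: read the snapshot back so each subsystem can be
# re-initialized from its saved state and the scan can resume.
def restore(snapshot_path)
  data = nil
  Zlib::GzipReader.open(snapshot_path) { |gz| data = gz.read }
  Marshal.load(data) # { 'SubsystemClass' => state, ... }
end
```

Since the snapshot is just a file on disk, it also naturally supports the machine-migration use case mentioned earlier.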

Sound good?

@Zapotek Zapotek closed this Apr 8, 2014

dogasantos commented Apr 8, 2014

Sounds amazing!
Thanks, Zapotek!

Owner

Zapotek commented Apr 8, 2014

You may be able to do some really cool stuff with it too, like migrating running scans between machines, for whatever reason.

In any case, it's a very nice feature to have, thank you for suggesting it.

champ1 commented Apr 8, 2014

Cool, mates :) Looking forward to some rsync Grid scans :)
