Scan stalls with 1 job left #752

Closed
dnyrop opened this issue Aug 1, 2016 · 34 comments

@dnyrop

dnyrop commented Aug 1, 2016

I'm getting this error consistently:

[2016-08-01 12:09:38 +0000] [framework/parts/data#pop_page_from_url_queue:147] Giving up trying to audit: https://[domain]/
[2016-08-01 12:09:38 +0000] [framework/parts/data#pop_page_from_url_queue:148] Couldn't get a response after 5 tries: Couldn't connect to server.

Probably because my site doesn't support HTTPS. The odd thing is that the scan stalls with just one job left. While I can stop the scan with kill -2 and get a report, it would be much better if the scan finished properly.

I'm using the latest experimental build btw.

@Zapotek
Member

Zapotek commented Aug 1, 2016

That's excellent, I've been hunting this bug for a while.
Could I rerun the scan to see what's going on please?

You can contact me in private with the details.

@Zapotek Zapotek added the Bug label Aug 1, 2016
@Zapotek Zapotek added this to the v1.5 milestone Aug 1, 2016
@Zapotek Zapotek self-assigned this Aug 1, 2016
@dnyrop
Author

dnyrop commented Aug 1, 2016

Sure, I'll send the details.

@Zapotek
Member

Zapotek commented Aug 1, 2016

Scanning now, thanks a lot.

@Zapotek
Member

Zapotek commented Aug 2, 2016

You can now enable the browser_cluster_job_monitor plugin to monitor active jobs and their active HTTP connections.

Enable it like so:

--plugin=browser_cluster_job_monitor:logfile=/tmp/browser_cluster_job_monitor.log

Monitor it like so:

watch -n1 cat /tmp/browser_cluster_job_monitor.log

@Zapotek
Member

Zapotek commented Aug 6, 2016

I forgot to reference the issue from the commit message: 3533463

I believe the issue isn't related to a browser job itself but to the BrowserCluster's book-keeping of jobs.
A miscalculation made the system think there were still jobs to be processed even when there were none, so it waited for the BrowserCluster forever.
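
To illustrate the kind of miscount described above, here is a minimal sketch. It is not Arachni's BrowserCluster code, and the thread does not say what the actual miscalculation was; `JobTracker`, `run_jobs` and the `buggy:` switch are hypothetical names, and the assumed cause (a failed job never being marked as finished) is just one plausible way a pending-job counter can drift.

```ruby
# Hypothetical stand-in for a pending-job counter; NOT Arachni's BrowserCluster.
class JobTracker
  def initialize
    @pending = 0
    @mutex   = Mutex.new
  end

  # Record that a job was queued.
  def queued
    @mutex.synchronize { @pending += 1 }
  end

  # Record that a job is over, whether it succeeded or failed.
  def finished
    @mutex.synchronize { @pending -= 1 if @pending > 0 }
  end

  def pending
    @mutex.synchronize { @pending }
  end

  # Anything waiting on this returning true waits forever if the count drifts.
  def done?
    pending.zero?
  end
end

# Runs every job. With buggy: true a job is only counted as finished when it
# succeeds (assumed bug); with buggy: false it is always counted.
def run_jobs(jobs, buggy:)
  tracker = JobTracker.new
  jobs.each { tracker.queued }

  jobs.each do |job|
    begin
      job.call
      tracker.finished if buggy       # happy path only
    rescue StandardError
      # The job failed, e.g. the server could not be reached.
    ensure
      tracker.finished unless buggy   # success or failure
    end
  end

  tracker
end

JOBS = [
  -> { :ok },
  -> { raise IOError, "Couldn't connect to server" },
  -> { :ok }
].freeze

buggy_tracker = run_jobs(JOBS, buggy: true)
fixed_tracker = run_jobs(JOBS, buggy: false)

puts "buggy: pending=#{buggy_tracker.pending} done=#{buggy_tracker.done?}"
# prints "buggy: pending=1 done=false"
puts "fixed: pending=#{fixed_tracker.pending} done=#{fixed_tracker.done?}"
# prints "fixed: pending=0 done=true"
```

In the buggy variant one job stays "pending" forever even though nothing is running, which is the same symptom as a scan stalling with one job left.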

Pushing nightlies now and I'll update this issue when they are up.

Zapotek added a commit that referenced this issue Aug 6, 2016
@Zapotek
Member

Zapotek commented Aug 6, 2016

Nightlies are up and should include the fix.

@dnyrop
Author

dnyrop commented Aug 6, 2016

Cool, I'll give it a test run right away and get back to you.

@dnyrop
Author

dnyrop commented Aug 6, 2016

I'm afraid this didn't fix it. The scan is still stalling. I'll send you the debug output directly.

@dnyrop
Author

dnyrop commented Aug 7, 2016

The build from 20160806 is completing scans properly.
I'll do some more tests, to verify the results.

@Zapotek
Member

Zapotek commented Aug 7, 2016

I updated the browser cluster workers to use a new proxy instance when they respawn the PhantomJS process, instead of reusing the old one. If anything fixed it, that was it.
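
As a rough sketch of that change (not the actual commit; `SketchProxy` and `SketchWorker` are hypothetical stand-ins, not Arachni classes), a worker that respawns its browser can throw away the old proxy and create a fresh one, so no stale state carries over:

```ruby
# Hypothetical stand-ins for illustration; these are NOT Arachni's classes.
class SketchProxy
  def initialize
    @open = true
  end

  def shutdown
    @open = false
  end

  def open?
    @open
  end
end

class SketchWorker
  attr_reader :proxy, :respawns

  def initialize
    @respawns = 0
    @proxy    = SketchProxy.new
  end

  # Called when the browser process has died or has been killed.
  def respawn
    @proxy.shutdown             # discard the old proxy and any state it holds
    @proxy     = SketchProxy.new
    @respawns += 1
  end
end

worker    = SketchWorker.new
old_proxy = worker.proxy
worker.respawn

puts "old proxy open? #{old_proxy.open?}"
puts "new proxy open? #{worker.proxy.open?}"
puts "same instance?  #{worker.proxy.equal?(old_proxy)}"
```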

Please do run a few more scans to make sure it works.

Cheers

PS. I'm pushing fresh nightlies now with more debugging messages in case the issue wasn't fixed after all.

@Zapotek
Member

Zapotek commented Aug 7, 2016

Nightlies are up, please let me know how they do.

@Zapotek
Member

Zapotek commented Aug 8, 2016

Fix was verified in private, closing. :)

@Zapotek Zapotek closed this as completed Aug 8, 2016
@dnyrop
Author

dnyrop commented Aug 22, 2016

Hi Tasos

Although I scanned my site successfully with the 20160806 build, I'm getting stalled scans again with the 20160808 build.

I'll be more than happy to assist you in tracking this down on my current Debian setup, but I was wondering what OS and Ruby version you are using, since you don't experience the problems?

@Zapotek Zapotek reopened this Aug 22, 2016
@Zapotek
Member

Zapotek commented Aug 22, 2016

Is this reproducible?
Does it consistently happen with one build and not the other?

I'm using Ubuntu 16.04 with Ruby 2.3.1 (also tried with the same Ruby version as the packages).

@dnyrop
Author

dnyrop commented Aug 22, 2016

The bug is persistent on the 20160808 build. I only ran one scan on the 20160806 build, but that worked.

I'm running with 6 scan workers now instead of 10, but if anything that should work in favor of completing the scan.

Thanks for the system info, I'll make a similar setup and see if I can make it work.

@Zapotek
Member

Zapotek commented Aug 22, 2016

If it only happens with the latest build yet not the older one, I have an idea of what it may be, otherwise we're back to square one.

@dnyrop
Author

dnyrop commented Aug 22, 2016

It's happened on all 7-8 scans I've done with the 20160808 build, and not for the one scan I did with 20160806. So I would say your idea is worth a shot :)

@Zapotek
Member

Zapotek commented Aug 22, 2016

Hm, can you give the older one a few more tries?
It takes a while to prepare new nightlies and this would help speed things along.

@dnyrop
Author

dnyrop commented Aug 22, 2016

Sure, I'll do that and get back to you.

@dnyrop
Author

dnyrop commented Aug 22, 2016

It turns out that I didn't have the 20160806 build, but the 20160805 build just completed a scan with no problems.

@Zapotek
Member

Zapotek commented Aug 23, 2016

The commit that fixed the issue originally was made on the 6th of the month, not the 5th.
Looking into it.

@dnyrop
Author

dnyrop commented Aug 23, 2016

That's odd, let me know what you find.

@Zapotek
Member

Zapotek commented Aug 25, 2016

I found a way to debug this that can help:

  1. Install the debugging tool:

    ./arachni/bin/arachni_shell -c 'gem install newrelic_rpm'
    
  2. Run the scan and wait for it to get stuck.

  3. Gather the log:

    sudo ./arachni/bin/arachni_shell -c 'nrdebug [SCAN PID]'
    

@Zapotek
Member

Zapotek commented Aug 25, 2016

I found a way to reproduce the issue, and it turns out there's still something wrong with the book-keeping of pending jobs; I hadn't fixed that after all.
Even though all jobs are done, the system thinks there are more.

@Zapotek
Member

Zapotek commented Aug 26, 2016

Nightlies are up; please give them a try and let me know how they do.

@dnyrop
Author

dnyrop commented Aug 26, 2016

Cool, I will give those a shot.

Btw, I noticed that the 20160808 build has a tendency to leave browser cluster processes running, even after new ones have been started, so over a long scan the number of processes builds up quite a lot. This is not the case with the 20160805 build.
This could be what causes the stalling.
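
One way to track that build-up over a long scan is to count matching processes in `ps` output at intervals. The helper below is only a sketch: `count_matching` is a hypothetical name, the sample listing is made up, and matching on "phantomjs" assumes the browser processes show up under that name.

```ruby
# Counts the processes in a `ps`-style listing whose line matches `pattern`.
# The first line of the listing is assumed to be the column header.
def count_matching(ps_output, pattern)
  ps_output.each_line.drop(1).count { |line| line.match?(pattern) }
end

# Made-up sample listing, for illustration only.
sample = <<~PS
  PID COMMAND
  1201 ruby scanner
  1340 phantomjs
  1377 phantomjs
  1402 bash
PS

puts count_matching(sample, /phantomjs/)
# prints "2"

# Against the live system (assumes a `ps` that supports -e and -o, as on Debian):
#   count_matching(`ps -eo pid,args`, /phantomjs/)
```

A count that keeps growing between samples while the number of workers stays fixed would support the observation that old processes are left behind.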

@Zapotek
Member

Zapotek commented Aug 26, 2016

The current nightlies have some big changes in the way browsers are handled so it's possible that they fix the issue; I just checked and I couldn't reproduce it.

@dnyrop
Author

dnyrop commented Sep 13, 2016

Sorry for the late reply.
I've run some tests with the 20160828 build; so far no problems.
I will test with the latest build as well to make sure, and get back to you.

@Zapotek
Member

Zapotek commented Sep 15, 2016

Please do, thank you.

@Zapotek
Member

Zapotek commented Oct 5, 2016

Hello,

Any update on this? Did the issue go away?

@dnyrop
Author

dnyrop commented Oct 5, 2016

I've been trying to reproduce the issue for the last few days on the current build, but the scans are running for days and creating a lot more browser jobs than usual.
I will investigate further and get back within this week.

@Zapotek
Member

Zapotek commented Oct 5, 2016

Checking it out now, thanks for the feedback.

@dnyrop
Author

dnyrop commented Oct 6, 2016

I can confirm this issue as resolved.

I did find another inconsistency unrelated to this, so I'm closing this issue, but can you hold the stable release for a few days?

@dnyrop dnyrop closed this as completed Oct 6, 2016
@Zapotek
Member

Zapotek commented Oct 6, 2016

Sure.
