
Recommended ceiling on the number of the monitored domains? #11

Closed
wrinkl3 opened this issue Sep 20, 2016 · 1 comment

Comments


wrinkl3 commented Sep 20, 2016

My current project might involve monitoring around 1200 small-to-medium sized domains. Other than the database size, are there any bottlenecks I should consider?

jasheppa5 (Contributor) commented

Hi Alex,

Consider two things:

  • Baselining/tuning false positives. Extraneous but legitimate hidden
    elements will show up in your results. Malspider uses an Alexa list to tune
    most of these out; other false positives should be tuned out in the Admin
    panel under "Custom Whitelist" - substrings are accepted (see the sketch
    after this list). I personally prefer to load domains in smaller chunks,
    vet the alerts, and then whitelist what I need to. I quickly ran a scan
    against 300 domains and only needed to create 13 whitelist entries, so it
    didn't take much time.

  • Number of pages to scan beyond the homepage. By default, Malspider scans 20
    pages beyond the homepage; this feature was added last month in response to
    popular demand. The PAGES_PER_DOMAIN variable (also sketched after this
    list) can be set to whatever you feel is best - you can even scan an entire
    domain - but I think having a limit like 20 prevents bottlenecks. It also
    protects you against cases where PhantomJS may hang, which seems to be a
    common problem among people using PhantomJS to do a lot of crawling. In my
    research, crawling more than 20 pages beyond the homepage had no benefit,
    and the limit keeps your footprint small. The only time I would crawl a
    full domain is if I were scanning my org's web presence or intentionally
    monitoring client domains - basically non-research purposes.
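For illustration only, here is a minimal sketch of both knobs in Python. The is_whitelisted helper and the whitelist entries are hypothetical (in practice the substring matching is handled for you once entries are added in the Admin panel), and exactly where PAGES_PER_DOMAIN lives in your checkout may differ from a plain settings-style module, so treat this as an assumption rather than Malspider's real code:

```python
# Illustrative sketch only - not Malspider's actual code.

# Substring whitelisting, as configured under "Custom Whitelist" in the Admin
# panel: an alert is suppressed if any whitelist entry appears as a substring
# of the flagged artifact. Both entries below are hypothetical examples.
CUSTOM_WHITELIST = [
    "legit-analytics.example.com",   # hypothetical legitimate hidden element
    "/known-good-widget.js",         # hypothetical FP found while baselining
]

def is_whitelisted(artifact: str) -> bool:
    """Return True if any whitelist substring occurs in the flagged artifact."""
    return any(entry in artifact for entry in CUSTOM_WHITELIST)

# Crawl-depth cap. PAGES_PER_DOMAIN is the variable mentioned above; exactly
# where it is set depends on your install (assumed here: a settings module).
PAGES_PER_DOMAIN = 20   # default: homepage + 20 additional pages per domain
```

Because matching is on substrings, one entry can cover the same benign element across many pages and domains, which is why only 13 entries were enough for 300 domains in my quick test.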

I test with about 1100 domains and use a proxy service to hide the origin
of my traffic. On my home internet connection I was able to scan all 1100
domains (20 pages beyond the homepage for each domain) in about 90 minutes,
and roughly 6 GB of data was stored in the database. Scanning significantly
more domains (or pages per domain) in a 24-hour period is certainly possible.
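For rough capacity planning against the ~1200 domains you mentioned, here is a back-of-envelope extrapolation from those figures (assuming time and storage scale roughly linearly, which your connection, proxy service, and the target sites may not guarantee):

```python
# Back-of-envelope extrapolation from the run described above:
# 1100 domains x 21 pages each (homepage + 20) in ~90 minutes, ~6 GB stored.
domains_tested = 1100
pages_per_domain = 21      # homepage + 20 additional pages
minutes_taken = 90
gb_stored = 6

pages_per_minute = domains_tested * pages_per_domain / minutes_taken   # ~257
gb_per_domain = gb_stored / domains_tested                             # ~0.0055

# Rough estimate for the ~1200 domains in the question:
target_domains = 1200
est_minutes = target_domains * pages_per_domain / pages_per_minute     # ~98
est_gb = target_domains * gb_per_domain                                # ~6.5

print(f"~{est_minutes:.0f} minutes and ~{est_gb:.1f} GB for {target_domains} domains")
```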

PS - A new version will be coming out very soon. The new version will
support YARA signatures and immediate page analysis (instead of
post-processing the data).

Thanks,
James

wrinkl3 closed this as completed Oct 6, 2016