
Manually specify paths #93

Open
alexnederlof opened this issue Jan 2, 2013 · 5 comments

@alexnederlof
Contributor

Right now you can only specify elements to click by id or XPath. It would be nice to also allow the crawler to visit certain URLs. This has the advantage that it's easier to configure than XPath and it might be more stable. It also allows the crawler to find places that are not reachable via click paths.

@ghost assigned alexnederlof Jan 2, 2013
@wesleytsai

Hi Alex, my ECE310 group would like to work on this issue as our project. Can you expand on what is meant by certain URLs?

For example, does this imply that you'd like it to scan for direct URLs, say in a blog post or a forum post, and for Crawljax to be able to click them?

Can we implement this by scanning the entire HTML for URLs, say by looking for the token "http:", and adding each one as a new clickable URL object (or simply a string) in Crawljax?

@alexnederlof
Contributor Author

Crawljax starts from one seed URL: the URL you give it in the configuration. You can specify that Crawljax clicks certain elements to get to every state you want. However, it may not reach a certain state that is only reachable via a specific URL. For example, you observe that it doesn't crawl http://mysite.com/someplace/.

It would be nice if you could configure the crawler to also crawl that URL, and not only the seed URL. It would require an extra builder method, something like config.alsoCrawl(theUrl).

Crawljax already extracts all the URLs from the HTML; that is not the issue here. The problem is that the URL we're looking for may only be reachable via a certain state that Crawljax can't access.
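
A minimal sketch of how that could look, assuming the CrawljaxConfiguration builder API; alsoCrawl is the proposed method and does not exist in Crawljax today:

```java
import com.crawljax.core.CrawljaxRunner;
import com.crawljax.core.configuration.CrawljaxConfiguration;
import com.crawljax.core.configuration.CrawljaxConfiguration.CrawljaxConfigurationBuilder;

public class ExtraUrlExample {
    public static void main(String[] args) {
        // Regular seed URL, as today.
        CrawljaxConfigurationBuilder builder =
                CrawljaxConfiguration.builderFor("http://mysite.com/");

        // Proposed addition (hypothetical method): also visit this URL,
        // even if no click path from the seed URL ever leads to it.
        builder.alsoCrawl("http://mysite.com/someplace/");

        new CrawljaxRunner(builder.build()).call();
    }
}
```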

@wesleytsai

We're trying to understand what would be the most beneficial implementation of this issue.

Should the new URL simply act as another seed, where we crawl the same number of states and the same depth as for the initial URL? Or should it be one of the states under the initial URL, e.g. after we've finished crawling 4 states, we end up at the newly given URL?

@thc202
Contributor

thc202 commented Aug 24, 2016

As users of Crawljax, we might have multiple paths/URLs, but they might not be exactly known beforehand (e.g. a regex is provided instead of a well-formed URL). [1]

Would it be possible, as part of this issue, to allow control over which URLs are considered to be in the crawl scope?
For example, config.crawlWithScope(new MyScope());, with MyScope implementing an interface that is used to decide whether a URL/site is valid to keep crawling.

If not as part of this issue, is this something that you think is worth adding? At the moment, we are making the changes directly in the Crawler class to allow any URL to be crawled (which we then pass through or reject based on custom criteria).

[1] zaproxy/zap-extensions#468
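
A rough sketch of what such a scope hook could look like; the CrawlScope interface, isInScope method, and crawlWithScope builder call are all hypothetical names, not part of the current Crawljax API:

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical callback that decides whether a discovered URL stays in the crawl scope. */
interface CrawlScope {
    boolean isInScope(URI url);
}

/** Example implementation: only keep URLs whose host matches a regex. */
class MyScope implements CrawlScope {
    private final Pattern allowedHosts = Pattern.compile("(.*\\.)?example\\.com");

    @Override
    public boolean isInScope(URI url) {
        return url.getHost() != null && allowedHosts.matcher(url.getHost()).matches();
    }
}

// Usage with the proposed (hypothetical) builder method:
// config.crawlWithScope(new MyScope());
```

The crawler would call isInScope(url) for every candidate URL before following it, so the pass-through/reject logic currently patched into the Crawler class could live in a pluggable implementation instead.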

@amesbah
Member

amesbah commented Aug 24, 2016

This sounds like a nice addition. Feel free to submit a pull request, which we will include in the next release (if accepted).
