Implemented issue #93 - "Manually Specify Paths" #197
Added simpleExample. Added ArrayList<URL> urls to CrawljaxConfiguration.
Is it because of the newly implemented feature? Or maybe I forgot to port some code...
Crawljax can crawl additional URLs by calling alsoCrawl(). Needs refactoring.
Previously, only strings were accepted for crawling additional sites; now URLs can be entered too.
Removing print statements that were used for debugging purposes. Removing unused functions.
Checks that it is possible to build the CrawljaxController after adding a second URL, both as a URL and as a String.
Merging diana-new branch with master
There's one conceptual problem with this solution: you won't see the links between two sites. For example: I specify to crawl |
EECE310 L2A2 Group, implementing Issue #93
Crawljax is now able to crawl an additional, manually specified path via the alsoCrawl() function in CrawljaxConfigurationBuilder.
Short description regarding the implementation of this feature:
Instead of storing the seed URL in a single URL member variable, we replace that variable with an ArrayList of URLs. Each call to alsoCrawl() appends the URL specified by the user to this ArrayList.
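To illustrate the idea, here is a minimal sketch of a builder that keeps a list of URLs instead of a single seed. It is not the code from this PR: `SeedConfigBuilder` and `build()` are hypothetical names, and the sketch uses `java.net.URI` rather than `URL` purely to keep the example free of checked exceptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a configuration builder; not the real Crawljax class.
final class SeedConfigBuilder {
    // The seed URL is simply the first element; extra URLs are appended after it.
    private final List<URI> urls = new ArrayList<>();

    SeedConfigBuilder(String seedUrl) {
        urls.add(URI.create(seedUrl));
    }

    // String overload: parse the text, then delegate to the URI overload.
    SeedConfigBuilder alsoCrawl(String url) {
        return alsoCrawl(URI.create(url));
    }

    // URI overload: append the additional URL and return this for chaining.
    SeedConfigBuilder alsoCrawl(URI url) {
        urls.add(url);
        return this;
    }

    // Return a read-only snapshot so callers cannot modify the builder's list.
    List<URI> build() {
        return Collections.unmodifiableList(new ArrayList<>(urls));
    }
}
```

Under these assumptions, a caller could write `new SeedConfigBuilder("http://example.com/").alsoCrawl("http://example.com/a").build()` and receive the seed followed by the additional URL, in insertion order.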
The WorkQueue field is no longer final, since it has to be reassigned to crawl the next URL once crawling of the seed URL is done.
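A rough sketch of why the field cannot be final, assuming the queue is replaced once per URL. `SequentialSeedRunner` and its `crawl` method are hypothetical, and a plain `ExecutorService` stands in for the work queue; the actual controller in this PR may be structured differently.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical runner; an ExecutorService stands in for the crawler's work queue.
final class SequentialSeedRunner {
    // Not final: a fresh queue is assigned for every URL in the list.
    private ExecutorService workQueue;
    private final List<URI> crawled = Collections.synchronizedList(new ArrayList<>());

    List<URI> run(List<URI> urls) {
        for (URI url : urls) {
            workQueue = Executors.newSingleThreadExecutor();
            workQueue.execute(() -> crawl(url));
            workQueue.shutdown(); // accept no new tasks for this URL
            try {
                // Wait for this URL to finish before starting the next one.
                if (!workQueue.awaitTermination(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("crawl timed out: " + url);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while crawling " + url, e);
            }
        }
        return new ArrayList<>(crawled);
    }

    // Placeholder for the real crawl; it only records which URL was visited.
    private void crawl(URI url) {
        crawled.add(url);
    }
}
```

Because each URL gets its own queue and is crawled in isolation, nothing in this sketch connects the state reached from one URL to the state reached from another.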
Should there be any questions / concerns, please let us know.