Wivet patches #72

Merged: 3 commits merged into master on Dec 16, 2012

4 participants
amesbah (Member) commented Dec 15, 2012

  • Cleaned up the WIVET patch submitted by @guifre
  • Also fixes issue #58
WIVET enhancements:
- Added support for HTML frame tags.
- Included support for inspecting code within frame tags.
- Added support for crawling meta refresh tags.
- Added a new optional specification to carry out deeper analyses.
- Fixed a bug that caused Crawljax not to update candidates after running the PreStateCrawlingPlugin.
- Created a new test class that targets the WIVET benchmark.
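Crawling meta refresh tags comes down to reading the redirect target out of a tag such as `<meta http-equiv="refresh" content="5; url=next.html">`. The following is a minimal sketch under stated assumptions: `MetaRefresh` is a hypothetical helper, not Crawljax code and not the code in the patch, and it implements only a simplified form of the HTML refresh-parsing rules before resolving relative targets against the current page URL.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical helper, not Crawljax code: extracts the redirect target from the
 * content attribute of a <meta http-equiv="refresh"> tag (simplified parsing).
 */
public final class MetaRefresh {

    private MetaRefresh() {
    }

    /** Returns the URL part of e.g. "5; url=next.html", or empty if there is none. */
    public static Optional<String> targetUrl(String content) {
        if (content == null) {
            return Optional.empty();
        }
        // The delay comes first; a URL, if present, follows the first ';' or ','.
        int semi = content.indexOf(';');
        int comma = content.indexOf(',');
        int sep = semi < 0 ? comma : (comma < 0 ? semi : Math.min(semi, comma));
        if (sep < 0) {
            return Optional.empty(); // e.g. "5": refresh the same page, no new target
        }
        String rest = content.substring(sep + 1).trim();
        // Drop an optional case-insensitive "url =" prefix.
        if (rest.toLowerCase(Locale.ROOT).startsWith("url")) {
            String afterKey = rest.substring(3).trim();
            if (afterKey.startsWith("=")) {
                rest = afterKey.substring(1).trim();
            }
        }
        // Drop optional matching quotes around the URL.
        if (rest.length() >= 2
                && (rest.charAt(0) == '\'' || rest.charAt(0) == '"')
                && rest.charAt(rest.length() - 1) == rest.charAt(0)) {
            rest = rest.substring(1, rest.length() - 1);
        }
        return rest.isEmpty() ? Optional.empty() : Optional.of(rest);
    }

    /** Resolves the refresh target against the URL of the page that contains the tag. */
    public static Optional<URI> resolve(String pageUrl, String content) {
        return targetUrl(content).map(target -> URI.create(pageUrl).resolve(target));
    }
}
```

For a hypothetical page at http://example.com/wivet/pages/1.php whose tag has content="0; url=2.php", `resolve` yields http://example.com/wivet/pages/2.php, which a crawler could then treat as a new candidate to visit.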

For detailed information, refer to the patch files.

We used a set of frameworks to validate the results of our
improvements. WIVET is the most notable one; it is widely used to
test crawlers. If you run the WivetTest class, you will see that it crawls
up to 74% of the site. The trunk version of Crawljax is only
able to crawl between 0% and 10%, depending on the targeted node (due to
the issues fixed in our patches).
alexnederlof (Member) commented on 256727f, Dec 9, 2012

I imported these patches from @guifre. However, the main method he provided doesn't seem to work:

Caused by: org.xml.sax.SAXNotRecognizedException: Feature 'http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe' is not recognized.

amesbah (Member) replied Dec 9, 2012

I confirm Arie's finding. Changing the browser to Firefox resolves the issue.

There is a newer version of WIVET (v3) available: http://code.google.com/p/wivet/

Would it be a good idea to include the WIVET page in the embedded Jetty server and write a test case with proper assertions (on the number of states and the coverage statistics that WIVET provides)?
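Assertions on WIVET's coverage statistics could take roughly the following shape. This is a minimal sketch, assuming the test can collect the identifiers of the WIVET test targets that the crawl reached; `CoverageCheck` and its methods are hypothetical and are not part of Crawljax, JUnit, or WIVET.

```java
import java.util.Set;

/** Hypothetical coverage check for a WIVET-style integration test (not Crawljax or WIVET code). */
public final class CoverageCheck {

    private CoverageCheck() {
    }

    /** Percentage of expected targets that the crawl reached, in [0, 100]. */
    public static double coveragePercent(Set<String> expected, Set<String> reached) {
        if (expected.isEmpty()) {
            throw new IllegalArgumentException("no expected targets");
        }
        // Only targets the benchmark actually defines count towards coverage.
        long hit = expected.stream().filter(reached::contains).count();
        return 100.0 * hit / expected.size();
    }

    /** Fails with an AssertionError when coverage drops below the given threshold. */
    public static void assertCoverageAtLeast(Set<String> expected, Set<String> reached,
                                             double minPercent) {
        double actual = coveragePercent(expected, reached);
        if (actual < minPercent) {
            throw new AssertionError(
                    String.format("coverage %.1f%% is below %.1f%%", actual, minPercent));
        }
    }
}
```

With expected targets {a, b, c, d} and reached {a, b, c, x}, coverage is 75.0, so a 74.0 threshold (the figure reported in the PR description) passes and an 80.0 threshold fails. Reached identifiers outside the expected set do not inflate the score, which keeps the assertion meaningful if the crawler also visits unrelated pages.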

avandeursen (Member) replied Dec 15, 2012

@alexnederlof and I further discussed WIVET yesterday at TU Delft.

  1. To me it seems that the present, very nice contribution by @guifre can be merged (after changing to Firefox).
  2. We may want to report the HtmlUnit issue.
  3. The site hosting an instance of WIVET v3 seems to be unavailable quite often; embedding it in the Jetty server sounds useful.
  4. I really like the idea of enriching the WIVET v3 test cases with proper assertions.
  5. The WIVET v3 test cases should be part of the 'largetests': Crawljax would benefit from a good separation between an instantaneous unit test suite and a (slower) integration test suite.

If we agree, we can turn the above list into separate issues and go for it.

amesbah (Member) replied Dec 15, 2012

Great. Agreed on all items.

Item 5 is really needed to reduce the regular unit testing time. As a developer, you don't want to wait 5 minutes (give or take) every time the test suite is run.

avandeursen (Member) commented Dec 9, 2012

Could the SAXNotRecognizedException be an HtmlUnit issue?
If I change the browser type to Firefox, things start working for me.

avandeursen (Member) commented Dec 9, 2012

Perhaps these dontClicks are a bit outdated. The current WIVET site has URLs like

http://caos.uab.es/~gruiz/test/wivet/offscanpages/statistics.php

that should not be scanned (anything containing offscanpages).
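The exclusion rule in this comment amounts to a substring check on each candidate URL. A minimal sketch, using a hypothetical `OffscanFilter` class rather than Crawljax's actual dontClick configuration (whose API is not shown in this thread):

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical link filter, not the Crawljax dontClick API: skips any link whose
 * URL contains one of the excluded markers, such as WIVET's "offscanpages".
 */
public final class OffscanFilter {

    // Marker taken from the review comment; extend the list as the WIVET site changes.
    private static final List<String> EXCLUDED_MARKERS = List.of("offscanpages");

    private OffscanFilter() {
    }

    /** True if the crawler may follow this href. */
    public static boolean shouldCrawl(String href) {
        for (String marker : EXCLUDED_MARKERS) {
            if (href.contains(marker)) {
                return false;
            }
        }
        return true;
    }

    /** Keeps only the hrefs that the crawler may follow, preserving their order. */
    public static List<String> crawlable(List<String> hrefs) {
        return hrefs.stream().filter(OffscanFilter::shouldCrawl).collect(Collectors.toList());
    }
}
```

A plain substring match mirrors "anything containing offscanpages" literally; matching whole path segments instead would be stricter if a legitimate page name ever happened to contain the marker.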

avandeursen (Member) commented Dec 9, 2012

Indentation

avandeursen (Member) commented Dec 14, 2012

This is WIVET v2; v3 is available too.

ghost assigned avandeursen Dec 15, 2012

alexnederlof added a commit that referenced this pull request Dec 16, 2012

alexnederlof merged commit 6010409 into master Dec 16, 2012

1 check passed (default: the Travis build passed)

amesbah referenced this pull request in Frames and Iframes #3 (Closed), Dec 19, 2012
