| Description: | a big hairy fuzzy spider that crawls your site, wreaking havoc |
-
8 comments. Created 6 months ago by relevance. Labels: feature, fix.
Tarantula deletes records and then complains about 404s when trying to access those records
Tarantula is complaining about 404 errors in a vanilla application built using Rails scaffolding for a single ActiveRecord model. It appears that Tarantula is performing actions in an incorrect order: it first deletes a record and then attempts to access the ’show’ page for that record. The application correctly responds with a 404 error (since the record no longer exists), but Tarantula reports the 404 error as a test failure.
Compare this behavior to the way a user would interact with the app via a browser: Once a user deletes a record, the app redisplays the index page and does not show a link to the deleted record. Therefore, there’s no link that the user could follow to produce a 404 error.
How to Repeat
- Create a simple scaffolded Rails app (like the one attached to this ticket).
- Generate the default Tarantula test (using rake tarantula:setup)
- Run the Tarantula test
- Observe test failure
Test Output
    [store] rake tarantula:test VERBOSE=true
    Loaded suite /Users/jason/.gem/ruby/1.8/gems/rake-0.8.3/lib/rake/rake_test_loader
    Started
    Response 200 for <Relevance::Tarantula::Link href=/, method=get>
    ...
    Response 302 for <Relevance::Tarantula::Link href=/products/996332877, method=delete>
    Response 404 for <Relevance::Tarantula::Link href=/products/996332877/edit, method=get>
    Response 404 for <Relevance::Tarantula::Link href=/products/996332877, method=get>
    Response 302 for <Relevance::Tarantula::Link href=/products/953125641, method=delete>
    Response 404 for <Relevance::Tarantula::Link href=/products/953125641/edit, method=get>
    Response 404 for <Relevance::Tarantula::Link href=/products/953125641, method=get>
    Response 302 for /products post {"commit"=>-2303, "product[price]"=>-3795, "product[name]"=>-4472}
    ...
    ****** FAILURES
    404: /products/996332877/edit
    404: /products/996332877
    404: /products/953125641/edit
    404: /products/953125641
    E
    Finished in 1.266037 seconds.
This ticket has 1 attachment(s).
Comments
-
3 comments. Created 6 months ago by relevance. Labels: suggestion.
Request to add support and documentation to run with Rspec.
It would be nice to have Tarantula work with Rspec, along with documentation on how to implement an example specification.
This ticket has 0 attachment(s).
Comments
Request to add support and documentation to run with Rspec.
For now, you can still run it with the rake tarantula:test command even if you are using rspec.
What needs to be done to make this run with all the specs? I need to get Cucumber integration hacked in. I am going to get together with Chad and see if we can get this working.
by Aaron Bedra
-
Tarantula should keep track of all views that are rendered then produce a report that shows what views were hit and which ones weren’t
This ticket has 0 attachment(s).
Comments
-
The way tests configure the Tarantula crawler is way too Java-like. It should be more declarative and Rubyish. This will make Tarantula easier to use and document, and also easier to extend and enhance. (The fact that the refactoring described in #8 causes an incompatible change to the Tarantula API is due to the low level of abstraction of the current API.)
Comments
Here are some examples of what a revised API might look like.
This is not a promise to implement all of these features. I think all of these feature ideas would be useful, but I already know some of them will be very costly and difficult to implement. I'm including them all here because I want to come up with an API syntax and style that will accommodate a lot of different things.
I don't yet know whether I'll maintain the current association with test cases; it seems that it might be better to have a standalone tarantula_config.rb file or something like that, with a custom Tarantula runner that doesn't depend on test/unit or RSpec.
Basic crawl, starting from '/'
    Tarantula.crawl
Basic crawl, starting from '/' and '/admin'
    Tarantula.crawl('both') do |t|
      t.root_page '/'
      t.root_page '/admin'
    end
Crawl with the Tidy handler
    # the operand to the crawl method, if supplied, will be used
    # as the tab label in the report.
    Tarantula.crawl("tidy") do |t|
      t.add_handler :tidy
    end
Reorder requests on the queue
This is necessary to fix this bug
    Tarantula.crawl do |t|
      # Treat the following controllers as "resourceful",
      # reordering appropriately (see my comment on
      # <http://github.com/relevance/tarantula/issues#issue/3>)
      t.resources 'post', 'comment'

      # For the 'news' controller, order the actions this way:
      t.reorder_for 'news', :actions => %w{show read unread mark_read}

      # For the 'history' controller, supply a comparison function:
      t.reorder_for 'history', :compare => lambda{|x, y| ... }
    end
(Unlike most of the declarations in this example document, these will need to be reusable across multiple crawl blocks somehow.)
Selectively allowing errors
    Tarantula.crawl("ignoring not-found users") do |t|
      t.allow_errors :not_found, %r{/users/\d+/}
      # or
      t.allow_errors :not_found, :controller => 'users', :action => 'show'
    end
Attacks
    Tarantula.crawl("attacks") do |t|
      t.attack :xss, :input => "<script>gotcha!</script>", :output => :input
      t.attack :sql_injection, :input => "a'; DROP TABLE posts;"
      t.times_to_crawl 2
    end
We should have prepackaged attack suites that understand various techniques.
    Tarantula.crawl("xss suite") do |t|
      t.attack :xss, :suite => 'standard'
    end

    Tarantula.crawl("sql injection suite") do |t|
      t.attack :sql_injection, :suite => 'standard'
    end
Timeout
    Tarantula.crawl do |t|
      t.times_to_crawl 2
      t.stop_after 2.minutes
    end
Fuzzing
    Tarantula.crawl do |t|
      # :valid input uses SQL types and knowledge of model validations
      # to attempt to generate valid input. You can override the defaults.
      t.fuzz_with :valid_input do |f|
        f.fuzz(Post, :title) { random_string(1..40) }
        f.fuzz(Person, :phone) { random_string("%03d-%03d-%04d") }
      end

      # The point of fuzzing is to keep trying a lot of things to
      # see if you can find breakage.
      t.crawl_for 45.minutes
    end

    Tarantula.crawl do |t|
      # :typed_input uses SQL types to generate "reasonable" but probably
      # invalid input (e.g., numeric fields will get strings of digits,
      # but they'll be too large or negative; date fields will get dates,
      # but very far in the past or future; string fields will get very
      # large strings.)
      t.fuzz_with :typed_input
      t.crawl_for 30.minutes
    end

    Tarantula.crawl do |t|
      # :random_input just plugs in random strings everywhere.
      t.fuzz_with :random_input
      t.crawl_for 2.hours
    end
As of commit a99e859, a very basic version of this is in place on the 'dslify' branch.
There is a new command, tarantula, that works like this:
    $ tarantula <filenames>
It processes all the files, runs the crawls as directed, and exits with a nonzero status if there were any problems. (A rake task wrapper is not in place yet.)
The configuration language is very simple. Here's an example:
    fixtures :all

    crawl('anonymous')

    crawl('admin') do |t|
      t.post '/session', :login => 'admin', :password => 'admin'
      t.follow_redirect!
      t.root_page '/admin'
    end
Under the covers, things are still based on test/unit, to piggy-back on the Rails integration test support. The configuration object passed into the crawl block is a testcase, so that things like post and follow_redirect! can still work. The particulars of that will probably change, but I expect the basic idea to remain; part of the strength of Tarantula is that it works within the app using the integration testing interface, and there's no sense rewriting all of that to be independent of test/unit. The goal of this new interface is to get all of that out of the developers' faces, rather than to eliminate it altogether.
At the moment, some things are hacked together, it won't work under Ruby 1.9, and test coverage is weak. I plan to fix those problems first, and then start fleshing out the API.
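As a rough illustration of how a configuration file like the one above could be processed, here is a minimal sketch. It is not the code on the 'dslify' branch: CrawlConfig, ConfigFile, load_string, and effective_root_pages are hypothetical names, and the sketch only records the declarations instead of running a crawl or touching test/unit.

```ruby
# Hypothetical sketch: records what a config file declares; it does not crawl.
class CrawlConfig
  attr_reader :label, :root_pages, :requests

  def initialize(label)
    @label = label
    @root_pages = []
    @requests = []
  end

  # Mirrors t.root_page from the example config.
  def root_page(path)
    @root_pages << path
  end

  # Stand-in for the integration-test `post`; here it only records the call.
  def post(path, params = {})
    @requests << [:post, path, params]
  end

  # Stand-in for the integration-test `follow_redirect!`.
  def follow_redirect!
    @requests << [:follow_redirect]
  end

  # Assumption: a crawl with no declared root page starts from '/'.
  def effective_root_pages
    @root_pages.empty? ? ['/'] : @root_pages
  end
end

class ConfigFile
  attr_reader :fixture_names, :crawls

  def initialize
    @fixture_names = []
    @crawls = []
  end

  def fixtures(*names)
    @fixture_names.concat(names)
  end

  def crawl(label = nil)
    config = CrawlConfig.new(label)
    yield config if block_given?
    @crawls << config
    config
  end

  # Evaluate the source so bare `fixtures` and `crawl` calls resolve to this object.
  def self.load_string(source)
    file = new
    file.instance_eval(source)
    file
  end
end

source = <<~'CONFIG'
  fixtures :all
  crawl('anonymous')
  crawl('admin') do |t|
    t.post '/session', :login => 'admin', :password => 'admin'
    t.follow_redirect!
    t.root_page '/admin'
  end
CONFIG

config = ConfigFile.load_string(source)
config.crawls.each do |c|
  puts "#{c.label}: #{c.effective_root_pages.join(', ')}"
end
```

Evaluating the file with instance_eval is one way to keep the config free of boilerplate; a real runner would still need to hand each recorded crawl to something that performs the requests.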
-
Retrieving /favicon.ico always yields a 404, even though the file exists in the public dir. The file (and all files with the .ico extension) should be supported in the same way as other image and text formats.
(I sent a pull request w/ a patch for this a few minutes ago).
Comments
-
2 comments. Created 2 months ago by glv. Labels: feature.
Add basic support for detecting coverage regressions
We just encountered a case where a change to the app's code caused a lot of the app to be closed off to Tarantula; given the privileges with which it was logging in for the crawl, it suddenly could only see about 20% of the app. I happened to notice, but it would've been easy to miss. Tarantula needs to help detect that case.
The "right way" to do this is by integrating rcov and reporting the percentage of code covered by the crawl (and optionally failing the build when coverage is too low). But that would also slow down the crawl a lot, so a simpler alternative should also be provided. We should be able to tell Tarantula "expect to crawl at least n pages" and have Tarantula fail the build if fewer pages are found (and also prompt users to raise the threshold when the actual number is too far above that number).
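The simpler "expect to crawl at least n pages" check could look something like the sketch below. check_page_count and raise_hint_factor are hypothetical names, not part of Tarantula, and the factor used to decide that the actual count is "too far above" the threshold is an arbitrary assumption.

```ruby
# Hypothetical helper: compares the number of pages a crawl reached
# against an expected minimum and returns [status, message].
def check_page_count(pages_crawled, expected_minimum, raise_hint_factor: 2.0)
  if pages_crawled < expected_minimum
    # Fewer pages than expected: this is the case that should fail the build.
    [:fail, "crawled #{pages_crawled} pages, expected at least #{expected_minimum}"]
  elsif pages_crawled > expected_minimum * raise_hint_factor
    # Far more pages than expected: the build passes, but the threshold is stale.
    [:raise_threshold, "crawled #{pages_crawled} pages; consider raising the minimum from #{expected_minimum}"]
  else
    [:ok, "crawled #{pages_crawled} pages"]
  end
end

status, message = check_page_count(40, 200)
puts message
exit_code = status == :fail ? 1 : 0
```

A runner could pass exit_code to exit so that a shrunken crawl fails the build while a stale threshold only produces a warning.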
Comments
What about doing a simple file-system scan to list all view files under the app/views directory, and then we could report on how many of these were rendered during the run? We'd have to alias_method_chain render (which is non-trivial), but unit-controller is a good example of that.
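The file-system half of that idea might be sketched as follows. unrendered_views is a hypothetical helper, and the list of rendered views is assumed to be collected elsewhere (for example by the render hook mentioned above, which is not shown here).

```ruby
# Hypothetical helper: lists view files under views_root that were never rendered.
# `rendered` is an array of paths relative to views_root.
def unrendered_views(views_root, rendered)
  root = views_root.chomp('/')
  all_views = Dir.glob(File.join(root, '**', '*'))
                 .select { |path| File.file?(path) }
                 .map { |path| path.delete_prefix("#{root}/") }
  (all_views - rendered).sort
end

# Assumed input: the views the crawl is known to have rendered.
rendered = ['products/index.html.erb', 'products/show.html.erb']
unrendered_views('app/views', rendered).each do |view|
  puts "never rendered: #{view}"
end
```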
-
I keep getting invalid html errors on pages that are rendering successfully, in fact on EVERY path. Here's what the data result is on the report:
uninitialized constant HTML::Document
Sure enough, if I head into invalid_html_handler.rb and comment out lines 8-13, it runs without issue. (And presumably without checking any HTML syntax.)
I couldn't find the reason why HTML::Document wasn't being loaded. Any ideas?
- Mark McSpadden
Comments

Tarantula deletes records and then complains about 404s when trying to access those records
If you modify the index.html.erb file and change the delete link_to to a button_to, everything works fine.
Theoretically, your links shouldn’t be pointing to deletes, but rather, they should be behind forms, that way web-crawlers don’t hit things and delete them for you, so button_to is probably more appropriate, but the scaffold does generate the link_to as a default.
Probably something that should be fixed, because it won’t occur to most people to not use link_to for deletes, but if you’re following good practices, it shouldn’t be an issue.
by Kevin Gisi
Now that I've got forms and links unified onto a single crawl queue, we should be able to reorder things while on the queue. I need to replace the current queue (a simple Array) with a priority queue, but once that's done, here's the plan.
In FormSubmission, I can use the following line of code to learn the controller, action, and other parameters for a form action:
For a given controller, I need to sort actions this way: new, create, index, edit, update, show, delete.
Of course, in most apps index will probably have to be crawled first, before any of the others are even seen, and then after doing a create I may not even see the show, edit, or delete actions for the newly created object unless I visit index again. So I may need to add some smarts to add index to the queue again. This will require some experimentation.
Any words on this? I'm experiencing the same using 0.3.3. Thanks.
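For readers following the thread, the per-controller ordering proposed above (new, create, index, edit, update, show, delete) can be sketched roughly like this. It is only an illustration under stated assumptions: QueuedRequest and reorder are hypothetical names, and it sorts a plain Array once instead of using the priority queue the plan calls for.

```ruby
# Action order taken from the proposal above; unknown actions sort last.
ACTION_ORDER = %w[new create index edit update show delete].freeze

# Hypothetical stand-in for an entry on the crawl queue.
QueuedRequest = Struct.new(:controller, :action, :path)

def reorder(queue)
  # Keep controllers in the order they were first seen on the queue.
  first_seen = {}
  queue.each_with_index { |request, i| first_seen[request.controller] ||= i }

  indexed = queue.each_with_index.to_a
  sorted = indexed.sort_by do |request, position|
    rank = ACTION_ORDER.index(request.action) || ACTION_ORDER.size
    # The original position is the final tie-breaker, so the sort is stable.
    [first_seen[request.controller], rank, position]
  end
  sorted.map(&:first)
end

queue = [
  QueuedRequest.new('products', 'delete', '/products/996332877'),
  QueuedRequest.new('products', 'edit',   '/products/996332877/edit'),
  QueuedRequest.new('products', 'show',   '/products/996332877')
]

reorder(queue).each { |request| puts "#{request.action} #{request.path}" }
```

With the requests from the failing test output, the delete is moved behind edit and show, which is the behaviour the ticket asks for. Re-queuing index after a create is not covered by this sketch.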
No rock-solid solution just yet. There's a workaround you can consider using, but it's admittedly less than ideal.
I hope this helps in the meantime.
Cheers,
Jason
I'll try the suggestion, but it defeats the purpose of Tarantula a bit: allowing 404s can hide the real problem in a medium-complexity application.
Anyway, I changed all my link_to deletes to button_to, but still no dice; Tarantula is acting this way again. It should be noted that the delete button is only on the /edit object form view and doesn't exist on index pages.
Sorry for spamming; however, I have to reconsider this as a semi-feature.
In some forms I have cascading selects that rely on data from the controller. This data is tied to an object that the controller assumes to exist, so it is scoped like obj.obj1.collection; if obj1 or obj is deleted before this view is shown, an exception is raised, probably because the object wasn't meant to be destroyed in the first place or because a validation elsewhere isn't doing its job.
So when Tarantula catches it, it's a good idea to go back, revise the model's destroy policy, and fix your code accordingly.
If Tarantula hadn't destroyed the object prior to showing this controller's view, the problem might not have been discovered unless some other tests were in place.
All in all, I can live with tons of 404s in the report.
I just want to file an update on this.
It was apparent early on that fixing this problem would require some major changes in Tarantula---a big internal refactoring and a change to the configuration interface, for starters. So while I can't say a fix for this is imminent, I can say that the groundwork has mostly been laid. I'm preparing a 0.4 release that includes a new configuration interface and an overhaul of how Tarantula keeps track of the crawl in progress. (If you want a taste of what the new config interface looks like, check #9.)
I hope that release will go out this week, and then the next order of business is to tackle this bug. (And don't worry, it'll still be possible to run crawls the same way they work now, so you can still find the problems you just mentioned.)
I kind of took things in a different direction over on the garlandgroup fork. I've added read_only and non_destructive attributes to the Crawler object. These tell the crawler to 1) skip all non-GET methods when read_only is true OR 2) skip delete methods when non_destructive is true.
t = tarantula_crawler(self)
t.read_only = true # or maybe you prefer: t.non_destructive = true
t.crawl "/"
It's definitely a flawed approach, and I still think the reordering is the right long-term solution, but it's getting me closer to where I want to be with Tarantula running as part of our test suite.
(To me it's much more digestible while setting up this suite to start with just the read-only requests, then move on to non-destructive, then finally the whole enchilada.)
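The skip rule described above could be sketched like this. It is not the code from the garlandgroup fork: Link, CrawlPolicy, http_method, and skip? are hypothetical names used only to show the two flags side by side.

```ruby
# Hypothetical stand-in for a link or form submission on the crawl queue.
Link = Struct.new(:href, :http_method)

class CrawlPolicy
  attr_accessor :read_only, :non_destructive

  def initialize
    @read_only = false
    @non_destructive = false
  end

  # read_only skips everything except GET; non_destructive skips only DELETE.
  def skip?(link)
    return true if read_only && link.http_method != :get
    return true if non_destructive && link.http_method == :delete
    false
  end
end

links = [
  Link.new('/products', :get),
  Link.new('/products', :post),
  Link.new('/products/996332877', :delete)
]

policy = CrawlPolicy.new
policy.non_destructive = true
links.reject { |link| policy.skip?(link) }.each do |link|
  puts "#{link.http_method} #{link.href}"
end
```

Starting with read_only and then loosening to non_destructive matches the staged rollout described above: each step widens the set of requests the crawler is allowed to make.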