RSpec2 upgrade, faster :truncation strategy, spec/support/ design to test real db interactions (for AR), already working travis setup. (mainly addresses #126) #127

Closed
wants to merge 6 commits into from

3 participants

@stanislaw

1) "Shared example group 'a generic strategy' already exists".

See https://www.relishapp.com/rspec/rspec-core/docs/example-groups/shared-examples

If shared examples are in a separate file, that file should be named *.rb, not
*_spec.rb. This was the cause of the double inclusion.
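Here is a small plain-Ruby sketch of why the file name matters. It assumes rspec-core's default pattern `spec/**/*_spec.rb` (RSpec loads every matching file) and a spec_helper that also requires everything under spec/support/. The file names are hypothetical. A shared-examples file ending in `_spec.rb` matches both rules, so it gets evaluated twice, which is what triggers the "already exists" error.

```ruby
# RSpec's default pattern for spec files it loads automatically.
SPEC_PATTERN = "spec/**/*_spec.rb"

# Hypothetical project layout.
FILES = [
  "spec/support/generic_strategy_spec.rb", # shared examples, badly named
  "spec/support/active_record_helper.rb",  # support file, correctly named
  "spec/database_cleaner/base_spec.rb"     # a regular spec
].freeze

# Files picked up by the spec pattern (FNM_PATHNAME makes "**/" recursive).
def auto_loaded(files)
  files.select { |f| File.fnmatch(SPEC_PATTERN, f, File::FNM_PATHNAME) }
end

# Files a spec_helper would require explicitly, e.g. with
# Dir["spec/support/**/*.rb"].each { |f| require f }.
def required_by_helper(files)
  files.select { |f| f.start_with?("spec/support/") && f.end_with?(".rb") }
end

# Counts how many times each file would be evaluated.
def load_counts(files)
  (auto_loaded(files) + required_by_helper(files)).tally
end

double_loaded = load_counts(FILES).select { |_, n| n > 1 }.keys
puts double_loaded.inspect
```

Renaming the shared-examples file to something like `generic_strategy.rb` takes it out of the spec pattern, so it is only evaluated once, through the helper.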

2) I upgraded the specs to RSpec 2.

Three of the four failing specs fail because I don't have MongoDB, and right now
it is not easy to install because of the messy dependencies my
Gentoo version has.

As for the fourth failing spec, configuration_spec#210, its flow seems broken to me. Please check carefully what you are testing there.

3) lib/database_cleaner/active_record/truncation.rb.

Look at the code for the MySQL, MySQL2 and PostgreSQL adapters: the pieces look very
similar at first glance, but notice how the 'mysql' and
'mysql2' gems parse results; each gem does it its own way. The PostgreSQL fast truncation methods, in turn, operate on groups of tables, while the MySQL and MySQL2 ones each operate on one table at a time. By pointing this out I just mean that it is hard to write common code they can all reuse. I think it is up to you to decide whether the fast truncation methods deserve their own module, e.g.
lib/database_cleaner/active_record/fast_truncation.rb, like:

module DatabaseCleaner
  module ActiveRecord
    module FastTruncation
      module MySQL
        # ...
      end
      module MySQL2
        # ...
      end
      module PostgreSQL
        # ...
      end
    end
  end
end
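The difference in granularity between the adapters can be illustrated with a small sketch. It is a hypothetical illustration, not DatabaseCleaner's code: the module and method names are made up, identifier quoting is simplified (real code would use the adapter's `quote_table_name`), and it only builds SQL strings instead of executing them. It relies on MySQL's `TRUNCATE TABLE` taking one table per statement and PostgreSQL's `TRUNCATE` accepting a comma-separated list.

```ruby
# Hypothetical sketch: the SQL each adapter family would run to truncate
# a list of tables. Quoting is simplified for illustration.
module FastTruncationSketch
  module MySQL
    # MySQL's TRUNCATE TABLE takes a single table: one statement per table.
    def self.statements(tables)
      tables.map { |t| "TRUNCATE TABLE `#{t}`;" }
    end
  end

  module PostgreSQL
    # PostgreSQL's TRUNCATE takes a list of tables in a single statement.
    # RESTART IDENTITY resets sequences; CONTINUE IDENTITY leaves them alone.
    def self.statements(tables, reset_ids: true)
      return [] if tables.empty?

      list = tables.map { |t| %("#{t}") }.join(", ")
      identity = reset_ids ? "RESTART IDENTITY" : "CONTINUE IDENTITY"
      ["TRUNCATE TABLE #{list} #{identity};"]
    end
  end
end

p FastTruncationSketch::MySQL.statements(%w[users posts])
p FastTruncationSketch::PostgreSQL.statements(%w[users posts], reset_ids: false)
```

Keeping one module per adapter lets each one own its statement shape and result parsing, at the cost of some near-duplicate code.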

4) I added an extra option, :fast. Let's take some time to let users who would like
to try fast truncation test it. It is also good to have
it as an option so that people can compare performance, i.e. the timing advantage, when
running large test suites! Once you have gathered feedback from developers
who have used it successfully, you can easily remove the :fast option and make the
fast strategy the default.

5) Travis

You need:

Go to http://travis-ci.org/ and sign in with your GitHub account,
then visit your 'Profile' page and toggle on your database_cleaner
repository.

Then go to the 'home' page, open the 'My repositories' tab there and check
whether database_cleaner has appeared and builds have started.

Also, at https://github.com/bmabey/database_cleaner/hooks you will
see that the Travis hook is activated: click on it, then press "Test hook"
to start the first build.

Remember that I've already added a status image that points to database_cleaner
on travis-ci.org. It will not be displayed until you enable database_cleaner on
Travis.

.travis.yml has the "1.9.3" version commented out for now. I'm sure the specs should all pass there too; it is the 'linecache' gem that refuses to compile on Travis's Ubuntu workers. We need to check it, and then all will be fine with 1.9.3.

@bmabey

Thanks @stanislaw! I've merged in the RSpec2 upgrade already and have been going over the fast truncation stuff now.

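In DatabaseCleaner's README I mention that the :deletion strategy has been reported to be faster than truncation in certain circumstances. I decided to update your benchmark to use DELETE to see how it compares. The numbers below are Ruby Benchmark output: user, system and total CPU time, then real (wall-clock) time in parentheses, all in seconds. These are the results I get with no records: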

Truncate non-empty tables (AUTO_INCREMENT ensured)
  0.010000   0.010000   0.020000 (  0.028967)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.010000   0.000000   0.010000 (  0.010334)
Truncate all tables one by one:
  0.000000   0.000000   0.000000 (  0.035987)
Truncate all tables with DatabaseCleaner:
  0.010000   0.000000   0.010000 (  0.036416)
Delete all tables one by one:
  0.000000   0.000000   0.000000 (  0.007189)
Delete non-empty tables one by one:
  0.000000   0.000000   0.000000 (  0.009258)

These are the results I get with 100 records:

Truncate non-empty tables (AUTO_INCREMENT ensured) 
  0.010000   0.000000   0.010000 (  0.042821)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.010000   0.010000   0.020000 (  0.043030)
Truncate all tables one by one:
  0.000000   0.000000   0.000000 (  0.034902)
Truncate all tables with DatabaseCleaner:
  0.010000   0.000000   0.010000 (  0.085237)
Delete all tables one by one:
  0.000000   0.000000   0.000000 (  0.058040)
Delete non-empty tables one by one:
  0.020000   0.010000   0.030000 (  0.078900)

Here are the results of the same benchmark with each method run 10 times:

Truncate non-empty tables (AUTO_INCREMENT ensured)
  0.260000   0.040000   0.300000 (  1.164267)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.300000   0.040000   0.340000 (  2.160942)
Truncate all tables one by one:
  0.180000   0.030000   0.210000 (  1.510205)
Truncate all tables with DatabaseCleaner:
  0.190000   0.020000   0.210000 (  1.137671)
Delete all tables one by one:
  0.200000   0.030000   0.230000 (  1.612719)
Delete non-empty tables one by one:
  0.250000   0.040000   0.290000 (  1.585001)

(I should really be taking the average and standard deviation of each run and plotting them, but I don't have the time to do that now. I am skeptical of this benchmark in general since I do see a good amount of variance. If I get more time I'll improve on it.)
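The averaging mentioned above could look like the following minimal sketch, which uses Ruby's standard `benchmark` library. It times a block several times with `Benchmark.realtime` (wall-clock seconds, the figure in parentheses in the output) and reports the mean and sample standard deviation. The workload is a placeholder; in practice it would be one cleaning pass, e.g. a hypothetical `clean_all_tables` method.

```ruby
require "benchmark"

# Mean and sample standard deviation of a list of timings (seconds).
def mean_and_stddev(samples)
  raise ArgumentError, "need at least two samples" if samples.size < 2

  mean = samples.sum / samples.size.to_f
  variance = samples.sum { |s| (s - mean)**2 } / (samples.size - 1)
  [mean, Math.sqrt(variance)]
end

# Times the given block `runs` times using wall-clock (real) time.
def timing_stats(runs)
  samples = Array.new(runs) { Benchmark.realtime { yield } }
  mean_and_stddev(samples)
end

# Placeholder workload; replace with one cleaning pass.
mean, sd = timing_stats(10) { 10_000.times.sum }
puts format("mean %.6fs, stddev %.6fs", mean, sd)
```

Reporting the standard deviation next to the mean makes it easier to tell whether the gap between two strategies is larger than the run-to-run variance.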

The benchmark isn't perfect, but the general trends I've seen are:

  • When you have a lot of tables that haven't been modified under a test then the fastest approach is to use DELETE (or the :deletion strategy in DatabaseCleaner).
  • When all your tables are populated, truncating all the tables (without any checks) is the fastest approach.

In real-life situations you will end up with a mix of the above. That is why DELETE is faster for some people while TRUNCATE is faster for others: it all depends on the test suite. That said, I would bet that on average DELETE is the better option, and I should probably update the README to reflect this finding.

Have you tried using the :deletion strategy on your test suite? I suspect that it would outperform your fast truncate. Give it a try and let me know how it goes.

@stanislaw

I've pushed an improved procedure to https://github.com/stanislaw/truncate-vs-count; please run it to see the results yourself. Next I'm going to write a longer answer to your comment.

@stanislaw

MySQL: DELETE wins only when it operates on empty tables. This is a very specific situation, so when building a strategy that covers all possible test cases you cannot rely on plain DELETE without a SELECT EXISTS check if you want a win for every combination of N and NUM_RECORDS. SELECT EXISTS checks are very fast (less than 1 ms), so you can improve the :deletion strategy for both MySQL and PostgreSQL with a fast option, using exactly the code you added to the MySQL code in the truncate_vs_count repository (I've also added a lightly modified version using DELETE FROM for PG).
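The "SELECT EXISTS, then DELETE" idea can be sketched as follows. This is a minimal sketch, not code from either repository: `run_query` is a hypothetical callable that executes SQL and returns the single integer the EXISTS query produces. Here it is faked with an in-memory hash of row counts so the example runs without a database, and the table names are illustrative.

```ruby
# The existence check: EXISTS only needs to find one row.
def exists_sql(table)
  "SELECT EXISTS (SELECT 1 FROM #{table} LIMIT 1)"
end

# Returns DELETE statements only for tables the EXISTS check reports as non-empty.
# `run_query` is a hypothetical stand-in for a DB connection: it takes SQL and
# returns the query's single integer result (1 = rows exist, 0 = empty).
def fast_deletion_statements(tables, run_query)
  tables.select { |t| run_query.call(exists_sql(t)) == 1 }
        .map { |t| "DELETE FROM #{t};" }
end

# Fake connection backed by row counts, so the sketch runs without MySQL/PG.
ROW_COUNTS = { "users" => 3, "posts" => 0, "comments" => 12 }.freeze
fake_query = lambda do |sql|
  table = sql[/FROM (\w+) LIMIT/, 1]
  ROW_COUNTS.fetch(table).positive? ? 1 : 0
end

p fast_deletion_statements(ROW_COUNTS.keys, fake_query)
```

The trade-off is one extra cheap query per table in exchange for skipping DELETE on tables a test never touched.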

PostgreSQL: DELETE is AMAZINGLY fast, faster than TRUNCATE. It even outperforms fast truncation, by a very big margin. But! This is only in PG. In MySQL, DELETE is NOT nearly as fast!

But the most important point is that your comment is actually outside the scope of what my pull request addresses. I was working within the scope of DatabaseCleaner's :truncation strategy, so I think my PR code could be merged into the :truncation strategy without any changes.

But if you want the best possible performance, we could think about some kind of :mixed strategy for DatabaseCleaner that uses TRUNCATE and DELETE where each is appropriate. That is not an easy task, though: it is hard to find the option that works best for all versions of PG and MySQL and to write tests covering all possible cases.

@stanislaw

My general conclusion is:
The truncation strategy works best with :fast enabled (the code from my PR), for both MySQL and PG.
The deletion strategy works best as it is for PostgreSQL, and with an empty-table check before DELETE FROM for MySQL.
I think this is not about creating a :mixed strategy, but simply about using :fast truncation for MySQL and the :deletion strategy for PG. (Actually, deleting only non-empty tables does not seriously improve the deletion strategy for PG, so we could just drop the idea of adding a :fast option to the deletion strategy for any adapter at all.)

What do you think?

@stanislaw

I've run more setups with different N and NUM_RECORDS, so, generalizing even further:

The best strategy for MySQL is fast :truncation;
for PG, it is :deletion with a check for empty tables.

I am waiting for your results.

@stanislaw

One last thing: for MySQL, fast :deletion is in general faster than plain :deletion.

@stanislaw

I tested the :fast truncation strategy for real on my rather large project, which uses MySQL and has a large number of tables.

I set up my env.rb file as follows:

# Null strategy that keeps Cucumber from interacting with DatabaseCleaner
# in any way.
# For example, Cucumber's default :truncation JS strategy
# is hardcoded to use DatabaseCleaner the old, non-fast way.
class NullStrategy < Cucumber::Rails::Database::Strategy
  def before_js
  end

  def before_non_js
  end
end

Cucumber::Rails::Database.javascript_strategy = NullStrategy

AfterConfiguration do
  DatabaseCleaner.clean_with :truncation, { :fast => true, :reset_ids => false }
end

After do
  DatabaseCleaner.clean_with :truncation, { :fast => true, :reset_ids => false }
end

It works AMAZINGLY FAST without any problems. I'm going to use my fork until you merge :fast truncation, which I very much hope you will.

I can also make an analogous commit for a fast :deletion strategy with count checks, if you like.

@bmabey

Would you mind sharing more information about your particular app? For example, how many tables exist and how many tables a single scenario populates on average. It would also be interesting to see the time difference between :truncation, :truncation, { :fast => true, :reset_ids => false } and :deletion, so I have an idea of the kind of gains other people may expect to see.

@bmabey

BTW, here are the results from my computer (MySQL)...

With no records:

Truncate non-empty tables (AUTO_INCREMENT ensured)
  0.010000   0.000000   0.010000 (  0.030524)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.000000   0.000000   0.000000 (  0.009543)
Truncate all tables one by one:
  0.000000   0.000000   0.000000 (  0.031045)
Truncate all tables with DatabaseCleaner:
  0.010000   0.000000   0.010000 (  0.031126)
Delete all tables one by one:
  0.010000   0.000000   0.010000 (  0.007151)
Delete non-empty tables one by one:
  0.000000   0.000000   0.000000 (  0.008464)

With 100 records in each of the 30 tables:

Truncate non-empty tables (AUTO_INCREMENT ensured)
  0.010000   0.010000   0.020000 (  0.062481)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.010000   0.010000   0.020000 (  0.041582)
Truncate all tables one by one:
  0.010000   0.000000   0.010000 (  0.032512)
Truncate all tables with DatabaseCleaner:
  0.010000   0.000000   0.010000 (  0.051319)
Delete all tables one by one:
  0.010000   0.010000   0.020000 (  0.058828)
Delete non-empty tables one by one:
  0.010000   0.000000   0.010000 (  0.099430)

And because I was curious, here are the results when only 5 of the 30 tables are populated with 100 records (since tests often don't touch the entire system):

Truncate non-empty tables (AUTO_INCREMENT ensured)
  0.010000   0.000000   0.010000 (  0.048525)
Truncate non-empty tables (AUTO_INCREMENT is not ensured)
  0.010000   0.000000   0.010000 (  0.014333)
Truncate all tables one by one:
  0.000000   0.000000   0.000000 (  0.030471)
Truncate all tables with DatabaseCleaner:
  0.000000   0.000000   0.000000 (  0.031658)
Delete all tables one by one:
  0.000000   0.000000   0.000000 (  0.014889)
Delete non-empty tables one by one:
  0.000000   0.000000   0.000000 (  0.018202)

@stanislaw

I wonder how DELETE can be so fast on your machine. I've NEVER seen it run that fast on either of my two machines. I use MySQL 5.1.62-r1 on Gentoo Linux.

@bmabey

I'm using 5.1.49 on OS X Lion, on a MacBook Pro with a 2.53 GHz Intel Core i5.

@stanislaw

Would you mind sharing more information about your particular app?

Sure!

I have a Rails 3.2.6 app with 46 tables in total, using MySQL and the 'mysql2' gem.

My test suite has 74 scenarios and 588 steps.

Each test fills 3-10 tables.

DatabaseCleaner.strategy = :truncation
# 9m56.995s
DatabaseCleaner.strategy = :deletion
# 5m32.286s
DatabaseCleaner.strategy = :truncation, { :fast => true, :reset_ids => false }
# 5m0.997s
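To put these three timings in proportion, here is a small worked sketch. The `parse_duration` helper is hypothetical and only understands the `XmY.Zs` form used in the timings (it accepts either a dot or a comma as the decimal separator).

```ruby
# Parses durations such as "9m56.995s" into seconds.
# Accepts "," or "." as the decimal separator.
def parse_duration(text)
  match = text.match(/\A(\d+)m(\d+(?:[.,]\d+)?)s\z/) or
    raise ArgumentError, "unrecognized duration: #{text}"
  match[1].to_i * 60 + match[2].tr(",", ".").to_f
end

timings = {
  truncation: parse_duration("9m56.995s"),
  deletion: parse_duration("5m32.286s"),
  fast_truncation: parse_duration("5m0.997s")
}

baseline = timings[:truncation]
timings.each do |name, secs|
  puts format("%-16s %8.3fs  speedup vs :truncation: %.2fx", name, secs, baseline / secs)
end
```

With these numbers, :fast truncation comes out roughly 2x faster than plain :truncation (596.995 s vs 300.997 s) and about 10% faster than :deletion on this suite.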

UPD: As I said earlier, I can't rely on "shared connection + :transaction strategy" right now because of the :webkit Capybara driver, so these are the only options I can use. I will definitely be using :truncation, { :fast => true, :reset_ids => false } in all my Rails projects that use webkit or poltergeist.

@stanislaw

Are you going to merge these commits? I'm asking because I'm still following this and am ready to make any additions or improvements you would like.

Actually, using my own fork is already enough for my needs, but staying consistent with the main repository is very important to me.

Please let me know if you are hesitating to do it for some reason.

@bmabey

I plan on merging in the functionality. I have it merged in locally, but before I push I want to do some cleanup first (e.g. creating an easy way for people to set up the needed DBs for MySQL and Postgres). I think I will also change the :fast option to :pre_count or :empty_check, since I think that communicates better what it is actually doing.

You can continue working off of your fork and I'll let you know when it has been merged into master along with the option name change.

@stanislaw

Good! Thanks!

@stanislaw

By the way, if you rewrite your README in Markdown, GitHub will syntax-highlight the Ruby in your examples, because GitHub favors Markdown (GitHub Flavored Markdown) over Textile.

I've already done this painless migration from Textile to Markdown for a bunch of my gems.

@bmabey

Textile's table formatting is helpful for this README, but I may still convert to Markdown.

@stanislaw

I am +1 for :empty_check as a name for the option.

@bmabey

Okay, I'll go with :empty_check. Thanks for the feedback!

@vitobotta

Just wanted to say thanks to bmabey for this very useful gem, and to stanislaw for his changes. I was struggling to get capybara+poltergeist to work properly with the transaction strategy and the shared connection method, due to the "This connection is still waiting for a result" issue and the tons of hacks I had tried to mitigate it.

I tried your fork with the fast truncation method, and everything works great. To my surprise, the speed is the same as with the transaction method, except that I don't have to fight with the connection issues.

I do have a very fast, latest-generation SSD in my 2011 MBP, though. Do you think I would see different results with a normal hard drive?

@bmabey

Thanks for the field report @vitobotta!

WRT SSD vs hard drive... The hard drive would certainly be slower, but if the whole database were small enough to fit into RAM you probably wouldn't notice much of a difference.

Out of curiosity, could you share a similar report to what @stanislaw posted about his app? (Number of tables, average number of tables used per test, and then the speed for the various cleaning strategies.)

@vitobotta

The app I am working on at the moment has 30 tables, and I'd say that perhaps 3-4 is the average number of tables used per test (just guessing really, I would need to properly investigate to give an accurate figure...).

The full Cucumber suite runs in around 5'30" (5'22" right now) with the poltergeist driver for JavaScript scenarios and the fast truncation strategy. With the transaction strategy and the shared connection method I save ~10 seconds, but more often than not some scenarios fail with the "This connection is still waiting for a result" exception. I have found a few hacks that mitigate this issue by retrying the failing code, forcing a reconnection and the like, but there isn't much difference after all, so for now I won't struggle any more trying to get the transaction strategy fully working with poltergeist (or webkit).

I've also tried again right now with Akephalos2 + transaction/shared connection method, and the full Cucumber suite ran in 5'39".

@bmabey

Would you mind running the suite with :truncation (no fast option) and also with :deletion?

@stanislaw

@vitobotta, I saw almost the same speedup and similar issues with :webkit and :poltergeist (i.e. 'This connection is still waiting for a result' and the need for reconnection hacks). I am glad you found it very useful.

@stanislaw

Any updates on this?

@bmabey bmabey added a commit that referenced this pull request Jul 25, 2012
@bmabey bmabey WIP - adds DB config and rake tasks so test DBs can be created easily
I had to bump ActiveRecord to get the "standalone_migrations" rake tasks
to work.  The task "rake db:create:all" works but the AR upgrade is
causing errors in some of the other parts of the spec suite.

I'll need to get these errors resolved before moving forward. #127
9a8b60a
@bmabey bmabey added a commit that referenced this pull request Aug 5, 2012
@bmabey bmabey WIP - adds DB config and rake tasks so test DBs can be created easily
I had to bump ActiveRecord to get the "standalone_migrations" rake tasks
to work.  The task "rake db:create:all" works but the AR upgrade is
causing errors in some of the other parts of the spec suite.

I'll need to get these errors resolved before moving forward. #127
9fa407b
@stanislaw

So the last thing to be done is to rename the :fast option to :pre_count, right? Who will do this, you or me? This time I suggest you do it, since I'm sure you will need to do some post-work on syntax cleanup after my changes anyway.

@bmabey

I'm doing this now. I rebased onto master and am doing some post-work. Quick question: in the MysqlAdapter you have:

def truncate_table_no_id_reset(table_name)
  rows_exist = execute("SELECT EXISTS (SELECT 1 FROM #{quote_table_name(table_name)} LIMIT 1)").fetch_row.first.to_i
  truncate_table(table_name) if rows_exist > 0
end

(https://github.com/stanislaw/database_cleaner/blob/master/lib/database_cleaner/active_record/truncation.rb#L83)

But in the Mysql2Adapter you have:

def truncate_table_no_id_reset(table_name)
  rows_exist = execute("SELECT EXISTS(SELECT 1 FROM #{quote_table_name(table_name)} LIMIT 1)").first.first
  truncate_table(table_name) if rows_exist == 1
end

(https://github.com/stanislaw/database_cleaner/blob/master/lib/database_cleaner/active_record/truncation.rb#L126)

Why is the conditional different? In one you have if rows_exist > 0 and the other you have if rows_exist == 1. Seems like they could be the same, correct?

@bmabey

Never mind... I was able to use the same code for MysqlAdapter and Mysql2Adapter and the tests passed, so I don't think the difference mattered.
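Since `SELECT EXISTS(...)` only ever returns 0 or 1, `> 0` and `== 1` are equivalent. The remaining difference between the gems is the value's type: with `fetch_row` the 'mysql' gem hands back strings (hence the `.to_i`), while 'mysql2' returns already-cast integers. Here is a minimal sketch of a single check that accepts both, assuming those return types. The `rows_exist?` helper is hypothetical, not the merged code.

```ruby
# EXISTS yields 0 or 1: as a String ("1") from the 'mysql' gem and as an
# Integer (1) from 'mysql2'. Integer() normalizes both before comparing.
def rows_exist?(exists_value)
  Integer(exists_value) == 1
end

# The two original conditionals agree on every value EXISTS can return.
[0, 1].each do |v|
  raise "conditionals disagree for #{v}" unless (v > 0) == (v == 1)
end

p [rows_exist?("1"), rows_exist?(1), rows_exist?("0"), rows_exist?(0)]
```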

(commit: ca00f7d)

@stanislaw

Yeah, you're right. They are the same.

@stanislaw

Also, you could uncomment the 1.9.3 entry in .travis.yml and see what happens with the dependencies when Travis does a build for 1.9.3.

If resolving it takes too much time, please let me know. I could try to help you with it after you finish the work on fast_merge, if you would like.

I think it should be easy to set up database_cleaner on Travis with 1.9.3.

@bmabey

Sorry it took me so long to sit down and actually merge this in, but it is done! There is some more refactoring I'd like to do (related to the pre-existing monkeypatching), but I think it is best to get this into master for now. Thanks again for the patch and help!

BTW, any reason why I need the special travis rake task? Why not the default rake task?

@bmabey bmabey closed this Aug 6, 2012
@stanislaw

rake travis just runs the subset of the spec suite that worked for me on Travis.

For example, I didn't touch the Mongo specs at all. If you manage to get a successful build with the whole suite running, that would be great.

@stanislaw

Ah, I see that you've done it already! Now the 1.9.3 build fails because of the linecache dependency; mine failed at the same point.

By the way, I really like the way you've merged this ;)
