sml / pseudo_cursors

Git-hosted version of pseudocursors plugin for Rails

type       name          age                             message
file       MIT-LICENSE   Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
file       README        Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
file       Rakefile      Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
file       init.rb       Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
file       install.rb    Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
directory  lib/          Fri Aug 29 13:33:16 -0700 2008  support :include option in find_each [sml]
directory  spec/         Fri Aug 29 13:33:16 -0700 2008  support :include option in find_each [sml]
directory  tasks/        Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
file       uninstall.rb  Fri Aug 29 11:58:13 -0700 2008  initial import of 1.0.1 from rubyforge [graysky]
README
This plugin is designed to bring some of the functionality of SQL cursors to ActiveRecord. One of the most useful
reasons for using cursors is to iterate over a large data set without exhausting memory.
ActiveRecord makes iterating over your data so easy that you might not think about what happens with a large amount
of data.

For example, suppose for a migration you want to scan through all the rows in a table for a model that has a belongs_to 
association called parent to update some data:

  Model.find(:all, :conditions => "name IS NOT NULL").each do |record|
    record.name = record.parent.name
    record.save!
  end

Now if Model has fewer than a few hundred rows you'll be fine. However, if Model has 50,000 rows, you may run into
problems. Each row in the table will be instantiated as a Model object. On top of that, each record's parent object
will be loaded into memory as well. While the iteration is being performed, these objects all remain referenced
and cannot be reclaimed by the garbage collector. After a while your process can use up a lot of memory, causing heavy
swapping and slowing down the whole box. Since this behavior only appears with large data sets, you will
probably not notice the problem until you get to production.

== Pseudo Cursors

The way pseudo_cursors works is to add the method :find_each to ActiveRecord. This method takes all the same arguments
as :find and will iterate over the results. However, it first runs a query that only gets the row ids. This list of ids
stays in memory, but since it's only an array of integers, the memory consumption should be reasonable. Then it will
iterate over the rows it found in batches (100 rows by default, or the size given in a :batch_size argument to the
method). If a :transaction argument is provided to the method, each batch will be wrapped in a transaction. This can be
useful if your database is clustered, to cut down on the number of writes propagating across the cluster. If the :order
argument is provided to the method, it will be honored.

The above block would then be written as:

  Model.find_each(:conditions => "name IS NOT NULL") do |record|
    record.name = record.parent.name
    record.save!
  end
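
To make the batching idea concrete, here is a minimal sketch in plain Ruby. It is not the plugin's actual source: FakeTable, fetch_batch and pseudo_cursor_each are hypothetical names invented for illustration, and an in-memory hash stands in for the database so the example runs without Rails.

```ruby
# Minimal sketch of the pseudo-cursor idea; NOT the plugin's actual source.
# FakeTable is a hypothetical in-memory stand-in for a database table.
class FakeTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows     # { id => attributes hash }
    @queries = 0     # counts batch loads, for illustration only
  end

  # Stands in for the first, cheap query that selects only the row ids.
  def ids
    @rows.keys.sort
  end

  # Stands in for loading the full rows of one batch by id.
  def fetch_batch(batch_ids)
    @queries += 1
    batch_ids.map { |id| @rows.fetch(id) }
  end
end

# Hypothetical helper: walk the table in batches so that only one batch
# of full records is referenced at any time.
def pseudo_cursor_each(table, batch_size = 100)
  table.ids.each_slice(batch_size) do |batch_ids|
    table.fetch_batch(batch_ids).each { |record| yield record }
  end
end

rows = {}
(1..250).each { |id| rows[id] = { :id => id, :name => "row #{id}" } }
table = FakeTable.new(rows)

seen = []
pseudo_cursor_each(table, 100) { |record| seen << record[:id] }
```

In this sketch only the array of ids lives for the whole iteration. Each batch of full records becomes eligible for garbage collection once its slice has been processed, provided the block does not keep its own references to them.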

Requires Rails 1.2 or higher.