Identity map #76

Merged
99 commits merged into from Feb 18, 2011

Conversation

miloops
Contributor

@miloops miloops commented Oct 8, 2010

This is the implementation of Identity Map for ActiveRecord, Marcin Raczkowski's project for Ruby Summer Of Code (http://rubysoc.org/projects):

Project #12: ActiveRecord Identity Map

Our goal is to provide a pluggable identity map implementation for ActiveRecord. An identity map is a design pattern used to improve performance by providing an in-memory cache that prevents duplicate retrieval of the same object data from the database, in our case in the context of the same request or thread.

If the requested data has already been loaded from the database, the identity map returns the same instance of the already instantiated object, but if it has not been loaded yet, it loads it and stores the new object in the map. The main gains of this project will be performance improvement and memory consumption reduction.
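The lookup-or-load behaviour described above can be sketched in plain Ruby. This is a minimal illustration only: `ToyIdentityMap`, `Author`, and the `load_author` lambda are hypothetical names that do not come from the branch, and the lambda stands in for a database query.

```ruby
# Toy identity map: at most one in-memory instance per (class name, id) pair.
class ToyIdentityMap
  def initialize
    @records = {}
  end

  # Returns the cached instance, or runs the block (standing in for a
  # database query) once and remembers its result.
  def fetch(klass, id)
    key = [klass.name, id]
    @records.fetch(key) { @records[key] = yield }
  end

  def clear
    @records.clear
  end

  def size
    @records.size
  end
end

Author = Struct.new(:id, :name)

queries = 0
load_author = lambda do
  queries += 1            # count how often the "database" is hit
  Author.new(1, "Ada")
end

map = ToyIdentityMap.new
first  = map.fetch(Author, 1, &load_author)
second = map.fetch(Author, 1, &load_author)

puts first.equal?(second) # same object, not just an equal copy
puts queries              # number of times the loader ran
```

The second `fetch` never calls the loader, which is where both the saved query and the saved allocation come from.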

@josevalim
Contributor

Just one note, for those interested in trying it out, you need to add to your Gemfile:

gem "rails", :git => "git://github.com/miloops/rails.git", :branch => "identity_map"
gem "weakling", :git => "git://github.com/swistak/weakling.git"
gem "rack", :git => "git://github.com/rack/rack.git"
gem "arel", :git => "git://github.com/rails/arel.git"

@Soleone

Soleone commented Oct 8, 2010

Great stuff! This is very useful in large projects where each request has to load e.g. a "user" model or another context every time. We rolled our own custom Identity Map implementation in a very large app and definitely observed increased performance, so I'm glad an official solution is in the works. Thanks!

@bensie
Contributor

bensie commented Oct 8, 2010

Awesome work!

@loe
Contributor

loe commented Oct 8, 2010

This is amazing! I think the best feature is being able to validate on both sides of an association without having to manually stitch them together in the controller.

@author.books.build

book validates_presence_of :author and author validates_presence_of :book

doing b = author.books.build; b.author = @author was frustrating at best!

@iain
Contributor

iain commented Oct 8, 2010

Nice! I would love to write controller specs with mocking #save without having to mock .find:

site = Factory :site
site.should_receive(:update_attributes).with('foo' => 'bar').and_return(true)
put :update, :id => site.to_param, :site => { :foo => "bar" }
response.should redirect_to(site_url(site))

This actually works now!
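Why the spec above can pass without mocking the finder can be sketched in plain Ruby: if a lookup by id hands back the very instance the spec already holds, a stub placed on that instance is what the controller ends up calling. Everything here (`Site`, `LOADED`, `find_site`) is a hypothetical stand-in and not ActiveRecord or RSpec.

```ruby
# Hypothetical stand-in for a model; the unstubbed update always "fails".
class Site
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def update_attributes(_attrs)
    false
  end
end

LOADED = {} # toy identity map: id => instance

def find_site(id)
  LOADED[id] ||= Site.new(id) # a real finder would query the database here
end

site = find_site(1)

# Stub on this one instance, roughly what
# should_receive(:update_attributes).and_return(true) arranges.
def site.update_attributes(_attrs)
  true
end

# "Controller" code looks the record up again by id and gets the stubbed object.
result = find_site(1).update_attributes("foo" => "bar")
puts result
```

Without the shared map, `find_site` would build a fresh `Site` and the stub would never be reached.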

Anyway, I've tried it on two real live Rails 3.0 apps, running each of them on Rails master and on miloops' identity_map branch as well.

The first app showed no difference in performance. It has 332 examples; all 3 versions took around 20 seconds to run and used around 118MB of RAM. In the identity_map branch there was one failing spec.

A second project (544 specs) did show some differences between Rails 3.0 and the master branch, but no difference in performance between master and the identity_map branch. There were a lot of failing specs, though.

3.0 stable: 35 seconds, 272MB, 0 failing specs
master: 29 seconds, 260MB, 2 failing specs
identity_map: 29 seconds, 260MB, 13 failing specs

These weren't real benchmarks or anything, I just ran rake spec and observed memory usage myself.
The failing specs were all different errors, but all related to updating and finding records.

Oh, I couldn't get cucumber to run on master or identity_map, which is a shame, because that would've been more representative of real usage.

@josevalim
Contributor

Awesome feedback Iain! If you have some extra time, do you think you can give us more information about these extra errors you got?

@iain
Contributor

iain commented Oct 8, 2010

It's past midnight here, so I'll be brief:

From the first app, a test that failed only in the identity_map branch:

site = Factory :site
other_site = Factory :site
recruiter = Factory :recruiter, :first_name => "before-update", :site_id => site.id
attrs = Factory.attributes_for(:recruiter, :first_name => "after-update", :site_id => other_site.id)
put :update, :id => recruiter.to_param, :recruiter => attrs
recruiter.reload.first_name.should == "after-update" # succeeds
recruiter.site.should == other_site # fails, still pointing to site, not other_site.

I can't really see what's wrong here and why the first_name field does update, but the site_id doesn't. Especially since it works in 3.0 and rails-master. It might be authentication/authorization that doesn't go quite well, because certain signed in users are not allowed to change the site_id. (I'm using devise, cancan and inherited_resources in this controller).

The other project has been around for a lot longer (was started with rails 3.0.0.beta, if I remember correctly) and has a lot more gem dependencies.

There were some errors I can understand that come from the identity map. I have these classes:

class User < ActiveRecord::Base
end
module Authentication
  class User < ::User
  end
end

And it sometimes picks the wrong one. This pattern sounded really cool when I first heard about it, but it has caused me nothing but headaches. That's beside the point, though.
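One way such a mix-up could arise, purely as an assumption about the cause: if cached records are keyed by the shared top-level model class, whichever class loaded a row first decides what later lookups get back. The `CACHE`, `base_class`, and `find` names below are hypothetical and do not describe how the branch actually keys its map.

```ruby
class User
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

module Authentication
  class User < ::User; end
end

# Toy cache keyed by the top-most model class and id (an assumption made
# for illustration only).
CACHE = {}

# Walks up the ancestry until the class directly below Object.
def base_class(klass)
  klass = klass.superclass until klass.superclass == Object
  klass
end

def find(klass, id)
  CACHE[[base_class(klass), id]] ||= klass.new(id)
end

first  = find(::User, 1)
second = find(Authentication::User, 1)

puts second.class                       # class of the object the subclass lookup received
puts second.is_a?(Authentication::User) # whether it is really an Authentication::User
```

In this toy, the lookup through `Authentication::User` receives the plain `User` instance cached by the earlier call.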

I got this one a couple of times:

put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }, :format => :js
JSON.parse(response.body).should have_key('errors') # fails

And some that look like this:

user = Factory :user
comment1 = Factory :comment, :user => user
comment2 = Factory :comment, :user => user
subject.comments << comment1 << comment2
subject.save!
subject.reload.comments.should == [ comment1, comment2 ]

Where I get just one comment instead of both. But when I removed the call to reload it worked again.
It works without the reload in 3.0 too, and I'm not entirely sure why I put it there in the first place. I guess putting reload in is one of the first things I try to do when debugging something.

I guess the majority of failing specs fail because they happened to be accidentally passing before. I found a couple instances where I was testing the wrong object. So I think this update will be a huge improvement and help you find bugs faster than before.

Edit: well, that didn't turn out to be very brief at all! :)

@miloops
Contributor Author

miloops commented Oct 14, 2010

Hey Iain, you should try it now. In the latest commits I added a middleware that flushes the identity map on each request, the IM is now flushed in tests too, and there are many other changes you can check out in today's commits.

Feel free to add me on IM (miloops at gmail) if you want to discuss any problems you are having.
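The per-request flush mentioned above can be sketched as a Rack-style middleware in plain Ruby. This is only a sketch of the idea: `ToyIdentityMap` and `FlushIdentityMap` are hypothetical names, and the middleware actually added in the commits may differ in name and behaviour.

```ruby
# Toy process-wide map standing in for the real identity map.
module ToyIdentityMap
  STORE = {}

  def self.add(key, record)
    STORE[key] = record
  end

  def self.size
    STORE.size
  end

  def self.clear
    STORE.clear
  end
end

# Rack-style middleware: wraps the next app and clears the map once the
# request has been handled, even if the app raised.
class FlushIdentityMap
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    ToyIdentityMap.clear
  end
end

# Minimal app that "loads" one record while handling the request.
app = lambda do |_env|
  ToyIdentityMap.add([:site, 1], Object.new)
  [200, { "Content-Type" => "text/plain" }, ["records cached: #{ToyIdentityMap.size}"]]
end

status, _headers, body = FlushIdentityMap.new(app).call({})
puts status
puts body.first
puts ToyIdentityMap.size # entries left over for the next request
```

Clearing in `ensure` keeps a failed request from leaking cached instances into the next one.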

@iain
Contributor

iain commented Oct 14, 2010

It seems to work fine when running the server, but I still have some issues running my specs. Like this one:

  subject { Factory.build :profile, :first_name => "Jan" }
  it "accepts utf8" do
    subject.first_name = "☃"
    subject.save!
    subject.reload.first_name.should == "☃"
  end

When I run rake spec:models, or rspec spec/models/profile_spec.rb, it passes.
When I run rake spec it fails. Weirdly enough, it returns the default value from the factory, even though I never mention that in my specs:

  6) Profile accepts utf8
     Failure/Error: subject.reload.first_name.should == "☃"
     expected: "\342\230\203",
          got: "Kees" (using ==)
     # ./spec/models/profile_spec.rb:34

I don't have any time anymore tonight, but I'll be happy to discuss it with you soon.

@josevalim
Contributor

Iain, you are using rspec, so a callback that we added to ActiveSupport::TestCase is not being executed. Please try adding the code below (it should run before each test in the whole suite):

before(:each) do
  ActiveRecord::IdentityMap.clear
end

It will likely solve the issue. :)
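A plausible reading of the failure above, offered as an assumption rather than something confirmed in this thread: the test database is reset between examples, primary keys get reused, and an uncleared map hands back an instance cached by an earlier example. The toy below uses hypothetical `DB`, `MAP`, `save`, and `find` helpers to show that effect and what clearing the map changes.

```ruby
Profile = Struct.new(:id, :first_name)

DB  = {} # toy "table": id => first_name, reset between examples
MAP = {} # toy identity map: id => instance, survives unless cleared

def save(profile)
  profile.id ||= DB.size + 1 # ids start again at 1 after a reset
  DB[profile.id] = profile.first_name
  profile
end

def find(id)
  MAP[id] ||= Profile.new(id, DB.fetch(id)) # cache hit wins over the table
end

def run_example(first_name, clear_map:)
  DB.clear               # rollback or truncation between examples
  MAP.clear if clear_map # what the before(:each) hook adds
  saved = save(Profile.new(nil, first_name))
  find(saved.id).first_name
end

run_example("Kees", clear_map: false)
stale = run_example("☃", clear_map: false)
fresh = run_example("☃", clear_map: true)

puts stale # value read back when the map was not cleared
puts fresh # value read back when the map was cleared first
```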

@iain
Contributor

iain commented Oct 15, 2010

I found one bug in rails master (not specific to identity map):

>> Project.select(:id).map(&:id)
  Project Load (2.6ms)  SELECT 'id' FROM `projects`
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, etc...
>> Project.select('id').map(&:id)
  Project Load (0.7ms)  SELECT id FROM `projects`
=> [7, 16, 76, 92, 98, 101, 102, 116, etc....

On a more related note, the only other issue I could find was that ActiveRecord::IdentityMap.clear doesn't clear the aggregation cache. I'm not sure whether it should, but it is something that broke my controller spec:

it "shows errors for invalid comment when html" do
  comment.clear_aggregation_cache
  put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }
  assigns(:comment).should_not be_valid # fails without the clear_aggregation_cache 2 lines up
  response.should render_template(:edit)
end

I couldn't find anything else.

swistak and others added 19 commits November 19, 2010 19:03
- Added IdentityMap to be included into AR::Base
- Fixed bug with Mysql namespace missing when running tests only for sqlite
- Added sqlite as default connection
…ain the same record will have invalid attributes.
…gain the same record will have invalid attributes.
@alexbartlow

Jose,

How does this mesh with Rack::FiberPool and EventMachine-based DB adapters that run every request in its own fiber?

Thanks for your work on this,

-Alex
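One Ruby detail is relevant to the fiber question, whatever the branch does internally: storage accessed through `Thread.current[...]` is fiber-local. The sketch below assumes, hypothetically, that a per-request map is kept under a `Thread.current` key (`:toy_identity_map` is an invented name); this thread does not say where the real map is stored.

```ruby
# Assumption: the map lives in Thread.current[...], as many per-request
# caches do. Thread#[] and Thread#[]= are fiber-local in Ruby.
Thread.current[:toy_identity_map] = { 1 => "cached outside the fiber" }

seen_inside = Fiber.new do
  before = Thread.current[:toy_identity_map] # what the new fiber sees at first
  Thread.current[:toy_identity_map] = { 2 => "cached inside the fiber" }
  before
end.resume

puts seen_inside.inspect                        # value the fiber saw on entry
puts Thread.current[:toy_identity_map].inspect  # root fiber's map afterwards
```

Under that assumption each fiber starts with its own empty slot, so requests running in separate fibers would not share cached instances; a map stored some other way (a global, or a true thread variable) would behave differently.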

Conflicts:
	activerecord/examples/performance.rb
	activerecord/lib/active_record/association_preload.rb
	activerecord/lib/active_record/associations.rb
	activerecord/lib/active_record/associations/association_proxy.rb
	activerecord/lib/active_record/autosave_association.rb
	activerecord/lib/active_record/base.rb
	activerecord/lib/active_record/nested_attributes.rb
	activerecord/test/cases/relations_test.rb
Conflicts:
	activerecord/lib/active_record/associations/association.rb
	activerecord/lib/active_record/fixtures.rb
@hamin

hamin commented Apr 14, 2011

AWESOME!

@pechorin

Very cool! I am waiting for a release in stable.

@jweiss

jweiss commented Apr 26, 2011

Why is this tied to ActiveRecord and not an ActiveModel functionality?
I wanted to add support for SimplyStored (CouchDB wrapper) but it seems wrong to require ActiveRecord...

@josevalim
Contributor

I believe the part of IdentityMap that is agnostic is actually quite small. Most of the concerns are actually in cleaning up the identity map and identifying all the situations that require doing so. If you think there is a significant part of the identity map that could be moved to ActiveModel, please do provide a patch!

@jweiss

jweiss commented Apr 26, 2011

I'm talking about https://github.com/rails/rails/blob/master/activerecord/lib/active_record/identity_map.rb

This looks totally generic to me and could be copied for my SimplyStored IdentityMap. I'll see whether I can extract it.

j-manu pushed a commit to j-manu/rails that referenced this pull request Jan 18, 2012
UrlEncodedPairParser is deprecated, but still used as an example
This pull request was closed.