Skip to content
This repository has been archived by the owner on Apr 17, 2018. It is now read-only.

Inconsistent views when creating and accessing associations #82

Open
solnic opened this issue May 17, 2011 · 14 comments
Open

Inconsistent views when creating and accessing associations #82

solnic opened this issue May 17, 2011 · 14 comments

Comments

@solnic
Copy link
Contributor

solnic commented May 17, 2011

Just getting started with DM and I apologise if I have misunderstood the scope of the identity map semantics and intended behaviour of associations.

With a basic join model, accessing the association from the "has side" before the join model is saved leaves the association/collection empty even after the save occurs.

See pastie for details http://www.pastie.org/222592

Not sure if this is intended, but it seems to break the identity map / object equivalence illusion as you can obtain another instance of the object from the model after saving the join table and the association collection on that instance now works. Even though obj1 == obj2 => true, the instances behave quite differently. Is this intended ?

Also, from the join table (ie the belongs_to side) the association does indeed exist (even before the save). It seems inconsistent/unexpected that the association is only visible from one side, and only visible from the other side after saving ... and then only if you haven't previously attempted to access the association.

The identity map seems to only actually apply to the key field(s) so while the object identities are the same, the actual object instances are different and their contents are thus independent and can actually be quite different, eg:

obj1 = Model.first
obj2 = Model.first
obj1 == obj2
=> true
ob1.name = "Ben"
obj2.name = "Joe"
obj1 == obj2
=> true
obj1.name == obj2.name
=> false

Is this intended ? Documentation may need to cover this unless I missed it somewhere.

Q: In light of transactions, would/could the approach being used/planned only create one actual object instance for each database identity referenced within a transaction context ? Changes to an object/object-graph would remain isolated to that transaction context until the transaction commits. I realise this is probably a can of worms ... but :) extending the database notion of transaction isolation to the in-memory object model would be really cool especially for maintaining object graph consistency in a multi-threaded server.


Created by Hax - 2010-02-04 07:29:53 UTC

Original Lighthouse ticket: http://datamapper.lighthouseapp.com/projects/20609/tickets/418

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Hax, yes this is intended. In fact its one of the primary things that separates DataMapper from other ORMs. Read the first section "Identity Map" on the following page:

http://datamapper.org/why.html

We use the Identity Map pattern to do this described in PoEAA:

http://martinfowler.com/eaaCatalog/identityMap.html

I’m not sure what should happen within Transactions. It would be relatively simple to have an isolated IdentityMap within a transaction so that objects within the transaction block are different from objects outside; but I haven’t thought of all the pluses and minuses for this technique. Reconciling the inner-objects with the outer-objects after the transaction is complete would be a little more difficult I think.

Sam, do you have any opinion on this?

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Hi Dan,

Thanks for the link on the Identity Map pattern. After reading this, I am now even more surprised as it seems to imply that you should only have a single object instance per row in the table which is not what you get with DM.

The Identity Map pattern states:
"If you aren’t careful you can load the data from the same database record into two different objects. Then, when you update them both you’ll have an interesting time writing the changes out to the database correctly."

What I am seeing definitely seems to contradict what the identity map pattern seeks to prevent. I do get more than one object instance of the same database row, and each instance can have different property values and relationships as demonstrated in my ticket.

Aside from the identity map issue, the inconsistency in the relationship visibility is still a concern. It just seems ’off’ to me that (as per my pastie) the Enrolment object has the association to the Student and Subject objects, but the Student and Subject objects don’t have the inverse relation.

by Hax

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Sorry I must’ve been partly asleep when I read your example.

IdentityMap is only active inside a repository scope, so try wrapping the code in a repository block.

I do understand what you’re saying about the two way inconsistency. Its definitely a problem.

I want to update the relationship code so that when you define a relationship in either direction only a single object is created. So whether the has() or belongs_to() side is defined with exactly corresponding arguments, they’d share the same relationship object and know what the association on the other end is called. (as of right now there’s no way to know what the corresponding association on the other side is called without guessing each time)

That way when adding a resource to many to one association, the underlying relationship can also update any in-memory objects for the corresponding one to many association on the other side.

Would you mind providing a patch with a failing spec that shows the behavior you expect to happen?

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Regarding bi-directional associations. Completely against it. It’ll throw a real wrench into Strategic-Eager-Loading for one (right now we just have to overwrite Relationship#get_parent/child).

You can use the Query::Path to find complementary relationships anyways.

Someone wants to write a plugin, that’s fine. But automatically adding methods I didn’t specifically define or include (most likely on purpose)? No thanks. That’s too close for comfort to AR territory.

@hax:

There’s no way around scoping an IdentityMap. WeakRef doesn’t GC. Objects don’t finalize. So a global IM is out of the question. It’s a Ruby limitation. If someone can find a reliable way around it, I’m all ears though.

by Sam Smoot

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Sam, would you mind clarifying what you mean by bi-directional associations?

Each time I’ve heard them mentioned, they were in reference to when you use belongs_to, having it automatically create the has n at the other end, and vice versa.That’s not what I was talking about.

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

@dan:

Thanks for the info, I will try it out when I get back to working with DM again. It will be great to have the relationship stuff eventually sorted as you describe. I hope to send you some specs as requested by the end of the week.

@sam:

I believe you are referring to something else, what I want for bi-directional associations is that if I have specified ’has n, :enrolments’ on say the Student model class, and on the Enrolment class I have specified ’belongs_to :Student’ then when I create a new Enrolment object giving it a student, that particular student object should immediately reflect the association in its enrolments collection. Currently enrolment.student does return the student, but student.enrolments does not return the enrolment.

by Hax

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Ok, sorry, I completely confused the issue. My apologies guys. :-)

@dan:

Relationship#name should return the name of the association. So it can’t be the same Relationship for both sides. But Associations can have Proxy#complement that returns the Relationship with the complementary Query::Path. Which should be really easy I think. If you think it’d be useful.

@hax:

OK, makes sense. Yes, zoo.exhibits.new(:name => ’Bob’) should add the complementing association if one is defined. We’ll work on that. I think it should probably happen through an #"#{association_name}_keys=(key_arrays)" method. Shouldn’t be hard to add, but that’s unsafe until we finish off :protect => true for mass-assignment.

So... we don’t actually need to do anything for the case when you assign a Exhibit to zoo.exhibits. When the Exhibit#zoo_id is set, then Relationship#get_parent should be able to hand you back the right Zoo without actually making a database call. So it’s perhaps a little more expensive this way if you actually touch all your association attributes after such an assignment, but it’s much cheaper if you don’t.

To ensure that scenario works then, we just have to make sure to fix association repository bindings (which are broken right now I believe), and ensure that Relationship#get_parent checks the IdentityMap before doing the Strategic-Eager-Loading thing.

BTW: Relationship should take a :repository option. If it’s present, it’s bound to that repository. If it’s not, it defaults to the repository of the instance passed in to #get_parent/get_children. (The latter being the desired behaviour 9 times out of 10.) So that’s the fix for that. Also, I’ve noticed the repository_name argument takes precendence over the repository stack in the resource/model #repository methods I think. But since most methods will probably pass a repository_name, this is a potential problem.

There’s a fine-line to walk there anyways. Because the stack as well is "explicit" from the user’s POV. I’m actually concerned that DM3 had it "right", and DM9 might have a few issues with the use of the stack (or rather lack thereof). But I’ll have to verify that one way or another. I could be off-base.

Either way, these are the issues generally related to this request. Nothing too big. But there’s a few things to work through until we can call it done.

by Sam Smoot

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

This ticket still applies in 0.10.0 (see attached test, adapted from the pastie)

The issue only occurs if student.enrollments is actually inspected. Just calling it does nothing.

by Jonathan Stott (namelessjon)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

[project:id#20609 not-tagged:"0.10.0" not-tagged:"0.10.1" milestone:id#51895 bulk edit command]

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

[bulk edit]

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

[bulk edit]

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

[bulk edit]

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

[bulk edit]

by Dan Kubb (dkubb)

@solnic
Copy link
Contributor Author

solnic commented May 17, 2011

Ok, this is a rather old ticket, and I would still like to resolve it, but I should provide some updated information based on our current understanding (which substantially more advanced than when this ticket was created). Some of this might repeat the information above, but I want to summarize what we’ve done, and what the ultimate solution will be.

Current State

Since this ticket was created relationships were specced and rewritten, and resolved alot of problems. One concept that was introduced was that every relationship has a way to find it’s "inverse". So if you have a 1:m relationship, you can call it’s inverse method to find the m:1 relationship. If the inverse relationship wasn’t explicitly defined, we’d still get a Relationship object back, but it would be known as an anonymous relationship.

The inverse method is reliable in most cases, except when the relationship has a query scope. The query scope applies to the target side only, and not the source side. These relationships cannot have an inverse because the target scope of an 1:m can’t match the source scope of the m:1 (since there is no way to specify the source scope right now).

The idea with inverse was that when you added a resource to a 1:m association, we could automatically set the m:1 association in the resource. We’d simply set the ivar in the resource so that if the inverse relationship was anonymous it would just be ignored.

One problem with this approach is that if you go the other direction, adding a resource to a m:1 association, we can’t just automatically add the child to the 1:m association. I mean, we could add it, but the problem is that if the children aren’t loaded, we add the child, and then lazy load the children, the child might be in the wrong location in the list.

Possible Solution

There are two things that need to be fixed before this ticket can be closed:

  1. Inverse identification needs to happen even when a relationship has a query scope
  2. We need to be able to add a Resource to a Collection, but without specifying it’s location in the list

Inverse Identification

The relatively recent relationship refactoring was a good first step, but we can probably clean things up even further. If we extract the description of the "Join", namely which properties link to which models, and other conditions (the scope) to apply to each model in the join, we can push this into an object that the relationship has. The scope can be applied to the target, source or any intermediaries in the case of a m:m relationship. Then when we attempt to match a relationship up with it’s inverse, we can just compare the join objects.

The next step would be that when we define the relationship, we check the "other side" and see if there’s a relationship with an equivalent join object, and if it is equivalent then we set both relationships to use the same join object. Note that this can only be done when the models on both sides of the relationship have been defined.

One other problem to deal with is that a relationship may sometimes point to the PK in a model. The PK can be setup when the model (class) is first defined, or monkey patched later on. It’s only really when the relationship is used that we say that the PK is "frozen". We also want to be able to dynamically create the FK that points to the PK, but we can’t set the FK until the PK is in a frozen. So basically we need a way to tell the Join object that the PK is not-yet-known, but that when it is first used it should initialize the FK and setup the equi-join conditions.

Since relationships would allow a source scope to be defined now, we would need to update the relationship accessor/mutator so that they only work when the relationship matches the source scope. For example, if we had a relationship that was only available when the :active attribute is true, then if a resource had a false :active attribute we’d probably raise an exception.

One other gotcha is that the inverse relationship may be a a 1:1 relationship, or there might be both a 1:1 and 1:m defined on the other side. I am honestly not sure how to deal with that. We may need to raise an exception about the inverse being ambiguous, and force the user to supply an :inverse option when declaring the relationship.

Adding to Collection

In comparison, this should be relatively easy. We’d need a method in Collection that allows us to add a resource to the Collection, but not specify it’s order. When the Collection is lazy loaded, it would use that object rather than loading a new one.

Result

The result of these two fixes is that we’d be able to define a relationship on one side, and have an inverse be created that is an absolute match. We’d be able to set a m:1 association with a parent, and have the child added to the inverse 1:m association in the parent. We’d be able to do the opposite too. It could also be made to work where the inverse was a 1:1 or m:m relationship.

When retrieving a list of child resources, they would have references to their parent that are set at load time. If the parent was set to nil, then the child should be removed from the parent’s children collection. Likewise if a child is removed from the parent’s children, it should also set the parent to nil.

Conclusion

This is a rather large change, but still should be attempted when there is a need to refactor the relationship organization, but I don’t think it should be done when there are other more pressing bugs dependent on this fix. This behaviour has been in place since the beginning, and I can’t see it resolving alot of common problems.

by Dan Kubb (dkubb)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant