Data corruption due to concurrency issues with set_default_left_and_right #70

Closed
tardate opened this Issue Mar 30, 2011 · 5 comments

Comments

Projects
None yet
5 participants

tardate commented Mar 30, 2011

when running in high volume, multi-process environment (currently I'm testing with nginx+4 passenger workers), I think I'm seeing the lack of ACID control over lft/rgt setting expose itself in the form of massively corrupted nested sets under load.

Specifically, with multiple workers in play, before_create :set_default_left_and_right can calculate duplicate values for different records - at best a nested set corruption, at worst leaking sensitive data from one user to another.

For example, these are two records created within 80ms of each other, and they get the same lft and rgt values, and as a result they leak into each other's tree (i.e. lft and rgt were independently calculated by separate worker processes before any of the processes committed their new lft/rgt values, hence the duplication)

# select id,lft,rgt,parent_id,created_at from ticket_notes where id in (18893,18894);
id    | lft    | rgt   | parent_id |  created_at 
------+--------+-------+-----------+-------------------------------
18893 |  27945 | 27946 | 13297     | 2011-03-29 14:44:42.71804
18894 |  27945 | 27946 | 13300     | 2011-03-29 14:44:42.774143

This is my initial assessment of the situation at least. Does it sound plausible?

I don't see anything similar covered in prior issues, and a quick glance at the code makes me think that addressing this problem would require pretty drastic refactoring. Any thoughts or fixes in the works?

kerrizor commented May 9, 2011

Similar to the issues that prompted this fork?

https://github.com/vamosa/awesome_nested_set

Contributor

MarkusQ commented Jul 8, 2011

No, this is not the same as the issue that fork addresses as far as I can see. The problem is that

  • ANS assumes that the lft/rgt tags form a continuous range of the integers
  • Therefore new nodes have to be assigned the "next" available tags
  • The transaction guarding set_default_left_and_right only ensures read coherency
  • It is possible for two workers to simultaneously find that the maximum(rgt) is x
  • They will then (both) assign their new nodes {lft:x+1,rgt:x+2}
  • There are now two nodes with the same tags, as shown above.

Since nothing in this conflicts so far as the transactions can see, there's no coherency problem, and both transactions commit fine.

Contributor

MarkusQ commented Jul 8, 2011

One possible solution (which should work for all cases other than insertion of the initial row, which is still a race condition) is:

     # on creation, set automatically lft and rgt to the end of the tree
     def set_default_left_and_right
-      maxright = nested_set_scope.maximum(right_column_name) || 0
+      highest_right_row = nested_set_scope.find(:first, :order => "#{quoted_right_column_name} desc", :limit => 1,:lock => true )
+      maxright = highest_right_row ? highest_right_row[right_column_name] : 0
       # adds the new node to the right of all existing nodes
       self[left_column_name] = maxright + 1
       self[right_column_name] = maxright + 2

Basically, use a transaction bounded row lock on the item we are using to compute the new tags, so that it can never be used to produce more than one set of tags. A concurrent attempt will block until the new row has been added, at which time that row (and not the one we locked) will be used, ensuring sequentiality.

@MarkusQ MarkusQ added a commit to MarkusQ/awesome_nested_set that referenced this issue Jul 18, 2011

@MarkusQ MarkusQ Cherry pick of row locking logic from vamosa's fork plus extention t…
…o set_default_left_and_right case (ANS issue #70)
0868405
Contributor

parndt commented Aug 1, 2011

Unfortunately the patch makes ~12 specs fail (see discussion in pull request)

parndt closed this Aug 14, 2011

This fix (commit: 0868405) brakes on Oracle.

The query issued is shown below and is NOT a valid statement (we are using Rails 3.1.2). The lock can not be in a subselect - which is added because of the :limit option (.first method would do the same):

SELECT *
FROM
(SELECT "NODES_SALES_AGENT_ORGANIZA".*
FROM "NODES_SALES_AGENT_ORGANIZA"
ORDER BY "RIGHT" DESC
For UPDATE
)
WHERE ROWNUM <= 1;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment