DboSource::_matchRecords() race conditions and usage re affectedrows #1746

ghost · 2013-10-11T20:40:38Z

Created by Dieter Plaetinck, 16th Mar 2010. (originally Lighthouse ticket #468):

DboSource::_matchRecords() is used in 2 places:
DboSource::update() and DboSource::delete()

In both places it's used to check if records can be found matching the specified conditions before doing the actual update/delete.

However:

this is vulnerable to race conditions. Matching rows can appear or vanish between the find('all') call in _matchRecords() and the actual query execution in DboSource::update() or DboSource::delete().
(imho) it's bad practice anyway to return false in DboSource::update() and DboSource::delete() when there will be/was no affected row. "false" is a return value which should only be used in case of errors. This also how it's documented in the api:
http://api.cakephp.org/class/dbo-source#method-DboSourceupdate
http://api.cakephp.org/class/dbo-source#method-DboSourcedelete

I propose to get rid of the redundant find('all') calls, just execute the update/delete queries. And (only) in case of failure, return false.
If users want a reliable way to check if rows were affected, they can always use Model::getAffectedRows()

If you guys agree with this logic, I will write a patch for 1.3

ghost · 2013-10-11T20:40:39Z

15th Mar 2010, Mark Story said:

How do race conditions get created? _matchConditions is used to fix database servers that don't allow deletion or updating using joins. Without _matchConditions if you were to attempt a delete query that requires joins, and were using a database server that does not support joins on delete/update, the query will error out, which is not acceptable.

I also don't see the issue with returning false from update() and delete(). The documentation says it returns the boolean of the queries success. If the conditions would fail to find anything, the query was unsuccessful. Running queries that could potentially generate errors doesn't seem like a good idea to me.

ghost · 2013-10-11T20:40:41Z

15th Mar 2010, Dieter Plaetinck said:

when you use DboSource::update() or DboSource::delete() to update/delete records matching certain conditions, the only way the query can be unsucessfull is when it failed to do what you ask. I.e. when it failed to update/delete the records matching your conditions.

When there are no matching records, that's something completely different then a failure.
Let's make the analogy with a find() call. If you supply conditions that match nothing, you just get an empty array. the query executed properly; there are just no rows in the resultset because the conditions did not match.

This is good, because if you get false back, you know the query/database failed, an empty array means no results, and so on. unambiguos return value. The return code of DboSource::update() and DboSource::delete() is right now very dubious.

race conditions: in _matchRecords you first do a find('all') with the given conditions and return false in no rows are found.
This return value is checked in DboSource::update() and DboSource::delete(), and if _matchRecords() returned false, then these functions also return false. but records that match the conditions can be added or removed by other php processes, so that _matchRecords() return value may not be accurate anymore.

ghost · 2013-10-11T20:40:41Z

16th Mar 2010, Mark Story said:

Fair enough, in a multi process environment there could be race conditions. I'd be more than willing to accept a patch that fixes this short coming. As long as that patch didn't involve shenanigans like locking tables.

ghost · 2013-10-11T20:40:42Z

16th Mar 2010, Dieter Plaetinck said:

in some transactionless storage engines (i.e. myisam) table locks are afaik the only way to do this in a reliable manner. but that's just how those engines are designed, and table locks are not that expensive in myisam.

But anyway my primary concern here is the return value. Do you agree with what i said about the return value?

ghost · 2013-10-11T20:40:44Z

28th Mar 2010, Mark Story said:

Your original suggestion of removing all the 'extra' find() calls isn't going to work, as those extra find calls are used to qualify conditions that the dbo cannot normally handle.

I also disagree that the return of update() and delete() should change to an array. I don't see how returning an empty array is a better alternative to false. It creates an additional type you need to check for and handle separately.

Moving to RFC, as I don't see the return of important methods changing in stable releases, but its something that can be discussed for future versions. Personally, I don't like the idea of additional return values, and would rather see exceptions used instead.

ghost · 2013-10-11T20:40:44Z

29th Mar 2010, cake_joe said:

I disagree with the suggestion that delete() and update() should return true if no data was changed due to no rows being affected. I consider that bad practice - the ORM should not force you to rely on something like getAffectedRows(), else one could directly start doing queries manually.

If at all there could be an additional option/flag that specifies if delete() and update() should still return true if they got executed without errors but did affect 0 rows (working somehow like deleteIfExists() and updateIfExists()).

I agree that on actual "errors" (like DBO/SQL errors, errors resulting out of race conditions) it should throw an error.
p.s.: Wouldn't optional transactions, if available, help with the race condition?

ghost · 2013-10-11T20:40:46Z

29th Mar 2010, Dieter Plaetinck said:

There must be a way to distinguish "no rows affected" from "an error occurred". Not being able to distinguish those is very bad. I suggested returning an empty array because that's how it's implemented for find(). Any solution is better then the ambiguous return value we have now. If you prefer the usage of exceptions over an extra possible return value, that's fine by me. I'm also fine with ionas' "deleteIfExists() and updateIfExists()" proposal.
I'm willing to patch this, I just want clarity on how to deal with this.
As for the race conditions. I'm glad I managed to convince Mark, I leave the implementation to someone else, as I'm not familiar enough with the context.

ghost · 2013-10-11T20:40:47Z

29th Mar 2010, Mark Story said:

ionas: The Dbo's currently return false, when no rows were deleted, as well as when an error is generated.

Using transactions, is something that Cake often leaves up to the end developer to manage. Outside of saveAll there are no transactions used by cake.

There are other return value problems that are not addressed by changing the return for when _matchRecords() is falsey. If you grab a dbo object and run a delete command that doesn't hit anything it will return true, which is also technically incorrect from the arguments presented here. You would still need to check the affected rows in this situation as well.

lorenzo · 2014-01-23T10:29:31Z

Closing as now update and delete happen inside a transaction in 3.0

lorenzo closed this as completed Jan 23, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DboSource::_matchRecords() race conditions and usage re affectedrows #1746

DboSource::_matchRecords() race conditions and usage re affectedrows #1746

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

lorenzo commented Jan 23, 2014

DboSource::_matchRecords() race conditions and usage re affectedrows #1746

DboSource::_matchRecords() race conditions and usage re affectedrows #1746

Comments

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

ghost commented Oct 11, 2013

lorenzo commented Jan 23, 2014