Non-dirty serialized columns are being saved anyway #8328
Comments
+1 for that |
This is the intended behavior. It was added with this commit 144e869 by @jonleighton. There is even an explanation of why it behaves the way it does:
# Serialized attributes should always be written in case they've been
# changed in place.
def keys_for_partial_write
  changed | (attributes.keys & self.class.serialized_attributes.keys)
end
The problem is that serialized data structures can be changed without using the ActiveRecord setters, in which case there is no way to detect that a change actually happened. To work around this problem, serialized attributes are written on every save. |
Thanks. What about keeping the original serialized value after fetching and compare it to the current serialized value before saving? |
@elado I think this is an unnecessary overhead, much more than writing the serialized attribute every time. |
@rafaelfranca think large scale, 1M+ writes per minute to the RDB. on the ruby side, that wouldn't take too much time, but it can bring a really strong SQL machine down on its knees. there are less expensive ways of doing this than comparing, for instance, extend the serialized object to have a changed attribute accessor and when a setter method is called on it, it will also set that value to true. |
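To make the "changed flag" idea above concrete, here is a minimal sketch in plain Ruby. `TrackedHash` and `changed?` are hypothetical names invented for this illustration, not Rails API. Note the limitation: only `[]=` is intercepted, so other mutators (`store`, `merge!`, `delete`) and changes to nested values would still go unnoticed unless they were wrapped too.

```ruby
# Hypothetical wrapper: a Hash that remembers whether []= was called on it.
class TrackedHash < Hash
  # True once any key has been assigned through []=.
  def changed?
    !!@changed
  end

  def []=(key, value)
    @changed = true # flip the flag before delegating to Hash#[]=
    super
  end
end

prefs = TrackedHash.new
prefs.changed?          # false: nothing has been assigned yet
prefs[:theme] = "dark"
prefs.changed?          # true: the setter flipped the flag
```

A persistence layer could then skip the column in the UPDATE whenever `changed?` is false, at the cost of having to cover every mutating method.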
@rafaelfranca, as @unorthodoxgeek said, it's definitely a performance issue. Instead of an expensive DB access for no reason, Rails could either serialize the initial value after fetching and compare it to the current serialized value for the dirty check, or have I believe usually the serialized attribute is a Hash or an Array, so Let's remember this code overhead (not performance overhead) only happens when there are serialized attributes, and the current state is that a query is executed every time, which is a far longer operation. Thanks |
@elado If you can come up with a good PR I think there is nothing that will prevent it from getting merged. As the current behavior is known and expected we don't treat this as a bug though. Feel free to use the Rails Core GoogleGroup to discuss the idea and the change you want to make. Of course you can also just hack up the PR and start a discussion with it. |
@elado @unorthodoxgeek I see. It makes sense. Mind opening a pull request? I'm reopening this issue. |
@tenderlove has talked to me about using |
The challenge here is how to "save the original" for comparison purposes - just stashing it someplace isn't sufficient. For instance, if you've got a Hash with Array values:
irb: h = { a: [1,2,3], b: [4,5,6] }
===> {:a=>[1, 2, 3], :b=>[4, 5, 6]}
irb: h2 = h.deep_dup
===> {:a=>[1, 2, 3], :b=>[4, 5, 6]}
irb: h2[:a][0] = -1
===> -1
irb: h
===> {:a=>[-1, 2, 3], :b=>[4, 5, 6]}
Neither
irb: h = { a: [1,2,3], b: [4,5,6] }
===> {:a=>[1, 2, 3], :b=>[4, 5, 6]}
irb: h_copy = Marshal.load(Marshal.dump(h))
===> {:a=>[1, 2, 3], :b=>[4, 5, 6]}
irb: h_copy[:a][0] = -1
===> -1
irb: h
===> {:a=>[1, 2, 3], :b=>[4, 5, 6]}
There's also the side-issue that such copying would have to be done pessimistically (every time an object is loaded from the DB) or users would likely encounter bugs if they mutate the serialized attribute in-place (since calling |
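The copying problem from the irb session can be reproduced in plain Ruby without Rails. The sketch below contrasts a shallow `dup` with a `Marshal` round trip; `marshal_copy` is a helper name made up for this example. It assumes the value contains only marshalable objects (no procs, IO handles, and so on).

```ruby
# Deep copy via a Marshal round trip (helper name is illustrative only).
def marshal_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = { a: [1, 2, 3], b: [4, 5, 6] }

# A shallow copy duplicates the outer Hash but shares the inner Arrays.
shallow = original.dup
shallow[:a][0] = -1
original[:a][0] # -1: the "saved original" was changed through the copy

original = { a: [1, 2, 3], b: [4, 5, 6] }

# The Marshal round trip rebuilds every nested object.
deep = marshal_copy(original)
deep[:a][0] = -1
original[:a][0] # 1: the original is untouched
```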
what about marshalling it and saving an md5 of the output string? should be |
This isn't the ideal solution, but why not note when the serialized attribute is read or written, and only save it in that case? If the attribute hasn't been read, it couldn't have been modified. For applications where the serialized attribute is rarely accessed, this would cut down on extraneous SQL calls while avoiding some of the complexities in the proposals that try to compare versions of the serialized attribute. |
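A rough sketch of that access-tracking idea, outside of ActiveRecord: the raw column string is only deserialized on first read, and the column is treated as "possibly dirty" once anyone has obtained a reference to the value. `LazySerializedColumn` and `needs_write?` are hypothetical names, and YAML is assumed as the serialization format.

```ruby
require "yaml"

# Hypothetical stand-in for one serialized column on a model.
class LazySerializedColumn
  def initialize(raw)
    @raw = raw          # serialized string as loaded from the database
    @accessed = false
  end

  # Deserialize on first read and remember that a reference was handed out.
  def value
    @accessed = true
    @value ||= YAML.load(@raw)
  end

  def value=(new_value)
    @accessed = true
    @value = new_value
  end

  # Untouched columns cannot have been mutated in place, so they can be skipped.
  def needs_write?
    @accessed
  end
end

column = LazySerializedColumn.new({ "tags" => ["a", "b"] }.to_yaml)
column.needs_write? # false: never read, so it cannot have changed
column.value["tags"] << "c"
column.needs_write? # true: the value was read and may have been mutated
```

This is conservative: a read without any mutation still triggers a write, but records whose serialized column is never touched are skipped.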
I agree with @elado's initial suggestion of storing the original unserialized string, but maybe as a MD5 hash. If you have:
then the original string's MD5 hash could be stored somewhere like That would make it easy to detect changes properly, even if the attribute is modified in place. |
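As an illustration of the MD5 approach, the sketch below fingerprints the serialized form of a value at load time and compares it again before saving. `fingerprint` is a made-up helper, and YAML is assumed as the serializer; where such a digest would live on a record is left open, as in the comment above.

```ruby
require "digest/md5"
require "yaml"

# Digest of the serialized form (helper name is illustrative only).
def fingerprint(value)
  Digest::MD5.hexdigest(YAML.dump(value))
end

settings = { "tags" => ["a", "b"] }
loaded_fingerprint = fingerprint(settings) # taken right after loading

fingerprint(settings) == loaded_fingerprint # true: nothing changed yet

settings["tags"] << "c"                     # in-place mutation, no setter involved
fingerprint(settings) == loaded_fingerprint # false: the change is detected
```

Only the 32-character digest has to be kept per attribute, but the value is re-serialized on every check.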
+1 |
+1 @tenderlove via @jonleighton had the answer. Every object has a .hash method. Call that to see if the serialized hash is different. If so, mark as dirty. |
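A small sketch of the `.hash` idea in plain Ruby follows; `content_changed?` is a hypothetical helper. It relies on `Hash#hash` and `Array#hash` being computed from contents. Some caveats apply: objects that do not override `hash` are hashed by identity, hash values are only meaningful within one process, and a collision is theoretically possible.

```ruby
# Compare the current content hash with the one recorded at load time.
def content_changed?(value, hash_at_load)
  value.hash != hash_at_load
end

prefs = { a: [1, 2, 3], b: [4, 5, 6] }
hash_at_load = prefs.hash

content_changed?(prefs, hash_at_load) # false: contents are identical

prefs[:a][0] = -1                     # nested in-place change
content_changed?(prefs, hash_at_load) # true, barring a hash collision
```

Unlike a deep copy, this keeps a single Integer per attribute, but it cannot report what the old value was.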
@jonleighton any thoughts on how to retrieve old value so that when |
So how do we handle the following scenario?
|
Is there a way to disable this at a per-model level ? |
@senny Referring back to your code quote from @jonleighton, I'm not sure why serialized data is so special just because it can be changed in-place. A string can also be changed in place without the changes being detected.
user = User.first
user.first_name.replace('Changed')
user.first_name #=> "Changed"
user.changes #=> {} |
@njakobsen - sure, but the situation with serialized fields is different (IMO) for two reasons:
|
Supporting in-place changes feels like it is beyond the scope of this discussion, no? Not least because trying to support it (and being consistent) would require that a shadow As @jonleighton pointed out, using the So +1 for the use of And then another +1 for spelling out in the docs that in-place changes won't be picked up (particularly here) - I got bitten by this because I (wrongly?) wrapped my |
@myitcv If the .hash implementation is used, there will be no need to document gotchas with in-place changes since they would be detected. |
@kbrock storing a hash or even a clone of the serialized value on initialization is fine, if the developer has explicitly opted into the behavior and knows how to turn it off. Another performance implication of dirty tracking serialized columns is the comparison/hash calculation on each save, which you can't avoid anyway. This is likely another reason the current behavior was implemented. I'll take some time tomorrow to gist this |
What's the status on this issue? This is a terrible issue. I see that @kbrock got #13799 pulled into master, but has anything else changed since then that remedies this? I'm running into race conditions where my worker processes keep reverting user configuration because it keeps saving the CLEAN state of that configuration (serialized attribute) in the background right after a user updated the configuration through the web interface. This bug basically causes data (i.e. user input) to be lost. |
This is doable after the refactoring I've done recently. I'll be submitting a fix some time this week. |
This issue is fixed in #15458 |
thanks @sgrif |
Great work @sgrif ... Glad to see this got merged in. EDIT: Should really say thanks as well to the many others who contributed to this long-standing issue. |
Thanks ! |
Rails 4.2.0 still seeing this issue when parent -> child with If I omit the |
@steakchaser Can you please open a new issue, with a reproducible script to demonstrate the issue, using this template? Ping me on the issue, and I'll look into it. |
Which rails version should this be fixed in? Still facing in 4.1.6 and 4.2.0 |
Should be in 4.2.0, if you're still having problems open a new issue and maybe add a reference to this ticket. |
There are a few edge cases not handled in 4.2.0 which are fixed in 4.2.1. Please check on the 4-2-stable branch before opening a new issue, and provide a reproduction script using our template here: https://github.com/rails/rails/blob/master/guides/bug_report_templates/active_record_master.rb |
(In a new issue, not as a comment on this one) |
…method so that it can be overwritten also read rails#8328 (comment)
New Rails 3.2.9 app with one model that has a serialized column:
On every save, this column is updated, regardless of whether it was changed.