Tombstones are not reaped if reaping occurs before tombstones reach all replicas [JIRA: RIAK-2803] #311

Closed
dreverri opened this issue Apr 3, 2012 · 6 comments

Comments

@dreverri
Contributor

dreverri commented Apr 3, 2012

Tombstones may not be reaped if reaping occurs before tombstones are written to all replicas.

Scenario

If one of the replicas returns an older version during tombstone reaping, the riak_kv_get_core check returns read_repair rather than delete.

The riak_kv_get_fsm should be able to delete the object rather than read repairing if one of the replicas returns an object older than the tombstone.

Below is an example set of replicas that will result in a read repair rather than a delete:

[{274031556999544297163190906134303066185487351808,
  {ok,{r_object,<<"foo">>,<<"9698">>,
                [{r_content,{dict,6,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],...},
                                  {{[],[],[[<<"Links">>]],[],[],[],[],...}}},
                            <<"hello world">>}],
                [{<<"úZ»/O{=\t">>,{1,63500707125}}],
                {dict,1,16,16,8,80,48,
                      {[],[],[],[],[],[],[],[],[],[],...},
                      {{[],[],[],[],[],[],[],[],...}}},
                undefined}}},
 {296867520082839655260123481645494988367611297792,
  {ok,{r_object,<<"foo">>,<<"9698">>,
                [{r_content,{dict,4,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],...},
                                  {{[],[],[],[],[],[],...}}},
                            <<>>}],
                [{<<"úZ»/O{=\t">>,{2,63500707280}}],
                {dict,1,16,16,8,80,48,
                      {[],[],[],[],[],[],[],[],[],...},
                      {{[],[],[],[],[],[],[],...}}},
                undefined}}},
 {251195593916248939066258330623111144003363405824,
  {ok,{r_object,<<"foo">>,<<"9698">>,
                [{r_content,{dict,4,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],...},
                                  {{[],[],[],[],[],...}}},
                            <<>>}],
                [{<<"úZ»/O{=\t">>,{2,63500707280}}],
                {dict,1,16,16,8,80,48,
                      {[],[],[],[],[],[],[],[],...},
                      {{[],[],[],[],[],[],...}}},
                undefined}}}]
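
To make the suggested check concrete, here is a minimal Ruby sketch of the decision, not the actual riak_kv_get_core code. It assumes each reply can be reduced to a vector clock plus a tombstone flag; `Reply`, `descends?`, `described_action`, `suggested_action`, and the actor name `actor_a` are hypothetical names used only for illustration. The counters 1 and 2 are taken from the replica dump above.

```ruby
# Hypothetical model of one replica's reply: a vector clock
# (actor => counter) and whether the stored value is a tombstone.
Reply = Struct.new(:vclock, :tombstone)

# True when clock `a` has seen every event recorded in clock `b`.
def descends?(a, b)
  b.all? { |actor, counter| a.fetch(actor, 0) >= counter }
end

# Behavior as described in this issue (an assumption about the real code):
# any replica that is out of sync forces a read repair.
def described_action(replies)
  in_sync = replies.map(&:vclock).uniq.size == 1
  (in_sync && replies.all?(&:tombstone)) ? :delete : :read_repair
end

# Suggested behavior: delete when some tombstone descends every reply,
# i.e. every other reply is the tombstone itself or an older version.
def suggested_action(replies)
  winner = replies.find do |r|
    r.tombstone && replies.all? { |other| descends?(r.vclock, other.vclock) }
  end
  winner ? :delete : :read_repair
end

# The three replicas above; 'actor_a' stands in for the binary actor ID.
replies = [
  Reply.new({ 'actor_a' => 1 }, false), # older "hello world" version
  Reply.new({ 'actor_a' => 2 }, true),  # tombstone
  Reply.new({ 'actor_a' => 2 }, true)   # tombstone
]

puts described_action(replies)
puts suggested_action(replies)
```

For this replica set the first call reports a read repair and the second a delete, which is the difference this issue asks for.
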
@dreverri
Contributor Author

dreverri commented Apr 4, 2012

Reported in zd://1139

@evanmcc
Contributor

evanmcc commented Jun 14, 2012

Also, a possibly related issue is being discussed in the thread that starts here:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-May/008389.html

It starts to look more like this issue from this message onward:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-May/008423.html

It isn't clear to me whether they're the same thing.

@reiddraper might be able to say with more authority if we're looking at one issue or two related issues.

@evanmcc
Contributor

evanmcc commented Jun 15, 2012

https://gist.github.com/2938621 reproduces the issue on my machine on a clean cluster

@reiddraper
Contributor

Moving to 2.1 milestone. Speak up if there are any objections, please.

@reiddraper reiddraper added this to the 2.1 milestone May 12, 2014
@atlantis

In case it helps anyone, here's a quick Ruby script to crawl through and remove the tombstones.

Use at your own risk: you're not supposed to list all keys in production, but for us the alternative was to have 560K tombstones sitting around taking up space when we only needed ~3K active keys. Toss this in a monthly cron job and there you go. Also, if you need the script to use less memory, you could change the first curl request to curl 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys?keys=stream' > /tmp/tmp_file.json and then parse the resulting JSON file one chunk at a time.

Note that the ?pw=all&pr=all&w=all&r=all is the magic part (as per http://riak-users.197444.n3.nabble.com/Riak-Client-Resources-Deleting-a-Key-Doesn-t-Remove-it-from-bucket-keys-td4003576.html).

riak_tombstone_cleanup.rb

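require 'json'

# Listing every key is expensive; avoid running this against a busy cluster.
puts "Getting list of riak keys"
keys = JSON.parse(`curl 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys?keys=true'`)['keys']

puts "#{keys.count} keys loaded"

bad_keys_counter = 0
good_keys_counter = 0
keys.each do |key|
  # HEAD request that involves every replica (pw/pr/w/r=all). The script never
  # issues a DELETE; it relies on this read to get the tombstone reaped.
  result = `curl -I 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys/#{key}?pw=all&pr=all&w=all&r=all' 2>&1`

  # A 404 means the key was still listed but holds no live object (a tombstone).
  if result['404 Object Not Found']
    bad_keys_counter += 1
  else
    good_keys_counter += 1
  end
  # Report progress every 1000 keys.
  puts "Processed #{bad_keys_counter} bad key(s) and #{good_keys_counter} good key(s)" if (bad_keys_counter + good_keys_counter) % 1000 == 0
end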
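
For the lower-memory variant mentioned above, one possible way to read the saved file one chunk at a time is sketched here. It assumes the keys=stream response saved to /tmp/tmp_file.json is a sequence of back-to-back JSON objects of the form {"keys":[...]}; that shape is an assumption, so check it against your own file first. The helper name `each_streamed_key` is hypothetical.

```ruby
require 'json'
require 'stringio'

# Yields each key from back-to-back JSON objects such as
# {"keys":["a","b"]}{"keys":["c"]}, holding only one object in memory.
# ASSUMPTION: this is the shape of the saved keys=stream response.
def each_streamed_key(io)
  buffer = +''
  depth = 0
  in_string = false
  escaped = false
  io.each_char do |ch|
    buffer << ch
    if in_string
      # Inside a JSON string, braces are data; only track where it ends.
      if escaped
        escaped = false
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == '{'
      depth += 1
    elsif ch == '}'
      depth -= 1
      if depth.zero?
        # A complete top-level object has been read: parse it and move on.
        JSON.parse(buffer).fetch('keys', []).each { |key| yield key }
        buffer.clear
      end
    end
  end
end

# Small in-memory demonstration; a real run would pass an open File instead.
sample = StringIO.new('{"keys":["a","b"]}{"keys":[]}{"keys":["c"]}')
each_streamed_key(sample) { |key| puts key }
```

In the script above, the keys.each loop could then be replaced by something like File.open('/tmp/tmp_file.json') { |f| each_streamed_key(f) { |key| ... } } with the same loop body.
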

@bsparrow435
Contributor

We spent quite a bit of time today discussing this behavior and have decided to roll better reaping functionality into AAE, as opposed to relying entirely on the get after put, which might happen before the put is done propagating (depending on dw/pw settings). Additionally, when AAE is used as a view of the data for scans etc., we'll be smarter about sifting out tombstones.

Thanks everyone for their discussion and contributions. Closing this issue.

@Basho-JIRA Basho-JIRA changed the title Tombstones are not reaped if reaping occurs before tombstones reach all replicas Tombstones are not reaped if reaping occurs before tombstones reach all replicas [JIRA: RIAK-2803] Oct 5, 2016