Purge doesn't remove cached documents #1658

Closed
ArihantRk opened this Issue Mar 18, 2016 · 13 comments

10 participants

ArihantRk commented Mar 18, 2016

Sync Gateway's _purge accepts "*" as the revision, which removes the latest revision of the document. However, I can see that older documents are not deleted from Couchbase; I can still get the older document. Also, _purge does not work when I give a revision ID in place of "*".

Contributor

ajres commented Mar 18, 2016

@ArihantRk The initial implementation of _purge removes the document and its revision history from the underlying bucket. This is why you can currently only pass "*" as the target revision to be purged.

The current implementation does not remove the revisions from the revision cache or the change cache, so you may still get revisions returned for a period of time after calling _purge.

A document _purge is not propagated via sync to client devices.
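
As a rough illustration of the point above, the following sketch builds the JSON body that a _purge request would carry, mapping each document ID to ["*"], the only revision selector this thread says is accepted. The helper name `build_purge_body` and the document IDs are hypothetical; how and where the body is sent depends on your deployment and is not shown.

```python
import json


def build_purge_body(doc_ids):
    """Map every document ID to ["*"], the only target revision
    that the initial _purge implementation is said to accept."""
    return {doc_id: ["*"] for doc_id in doc_ids}


# Hypothetical document IDs, used only for illustration.
body = build_purge_body(["doc1", "doc2"])

# Serialized form that would be sent as the request body.
payload = json.dumps(body, sort_keys=True)
print(payload)
```

Note that, per the comment above, a successful purge only affects the bucket; cached revisions may still be returned for a while afterwards.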

ArihantRk commented Mar 18, 2016

@ajres When will the revision be deleted from the server permanently? Is it configurable?

@zgramana zgramana added this to the 1.3 milestone Mar 25, 2016

@adamcfraser adamcfraser modified the milestones: 2.0, 1.3 Jul 29, 2016

@adamcfraser adamcfraser changed the title from "sync gateway 1.2 purge not deleting older revisions" to "Purge doesn't remove cached documents" Oct 5, 2016

@adamcfraser adamcfraser removed this from the 2.0 milestone Oct 5, 2016

@djpongh djpongh added this to the 2.1.0 milestone Jan 31, 2018

@djpongh djpongh added the P1: high label Jan 31, 2018

Fujio-Turner commented Feb 7, 2018

I have experienced this in the wild. People often have to restart SG, which can be painful if _purge is a big part of managing the data life span.

Contributor

adamcfraser commented Feb 7, 2018

@Fujio-Turner Generally speaking _purge isn't intended to be a "big" part of managing the data life span (which is why it's just a data store purge, and not a cache purge).

Can you provide some more specifics on the scenarios you're thinking of?

Fujio-Turner commented Mar 21, 2018

@adamcfraser,
Let me restate: _purge is a "part" of managing the data life span ... TTL (_exp) is the preferred method.
With xattrs and convergence, a higher percentage of users will use the SDK delete() to remove (purge) data. But with xattrs and an SDK delete, the channel info is still preserved in the xattr, right? Can you use that to clean up the SG cache when you get the delete op on the DCP stream?

@djpongh djpongh modified the milestones: 2.1.0, 2.2.0 Mar 28, 2018

@djpongh djpongh removed the P1: high label Mar 28, 2018

Contributor

adamcfraser commented Jul 13, 2018

When running with enable_shared_bucket_access=true, SDK delete isn't equivalent to purge - it's equivalent to a normal Sync Gateway delete/tombstone.

A purge performed through Sync Gateway's REST API when enable_shared_bucket_access=true deletes both the document and the associated xattr.

The net result is the same (w/ xattrs or non-xattrs) when a non-tombstoned document is purged - when Sync Gateway receives the DCP delete mutation, there's no channel information available. Without that information, cache cleanup is expensive - SG would need to do a full scan of the cache (across all channels) looking for the key.

Given that purge isn't intended to be part of normal document life cycle management, we're currently not doing this expensive cache removal.
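
To make the cost argument above concrete, here is a toy sketch contrasting removal when the document's channels are known with removal when they are not. `ChannelCache` is a hypothetical stand-in written for this illustration; it is not Sync Gateway's actual cache structure, and the channel and document names are made up.

```python
from collections import defaultdict


class ChannelCache:
    """Toy per-channel cache: channel name -> list of cached doc IDs."""

    def __init__(self):
        self.channels = defaultdict(list)

    def add(self, doc_id, channels):
        for ch in channels:
            self.channels[ch].append(doc_id)

    def remove_with_channels(self, doc_id, channels):
        """Cheap path: only the channels named by the caller are visited."""
        visited = 0
        for ch in channels:
            visited += 1
            self.channels[ch] = [d for d in self.channels[ch] if d != doc_id]
        return visited

    def remove_without_channels(self, doc_id):
        """Expensive path: every channel is scanned looking for the key."""
        visited = 0
        for ch in list(self.channels):
            visited += 1
            self.channels[ch] = [d for d in self.channels[ch] if d != doc_id]
        return visited


def build_cache():
    """Build 1000 channels with one doc each, plus one doc in two channels."""
    cache = ChannelCache()
    for i in range(1000):
        cache.add(f"doc{i}", [f"channel{i}"])
    cache.add("purged_doc", ["channel1", "channel2"])
    return cache


# Number of channels visited in each case.
targeted = build_cache().remove_with_channels("purged_doc", ["channel1", "channel2"])
full_scan = build_cache().remove_without_channels("purged_doc")
print(targeted, full_scan)
```

In this toy model the targeted removal visits only the two channels the document belonged to, while the removal without channel information visits every channel in the cache.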

erpankaj22081982 commented Jul 14, 2018

If an event could somehow reach all the other Sync Gateways to clean the cache for that particular document, I think that would be an efficient way to deal with this. Whenever a Sync Gateway purges a document, it should first create a local (temporary) document that records the ID of the doc being purged and the channels it used to belong to.

After creating this document, the original document can be deleted. If the original document is not deleted, the local document can be deleted so that a rollback happens.

All the other Sync Gateways will then get the local document, know that this particular document in this channel is deleted, and adjust their caches accordingly. The document works as a DCP event notifying them of the data deletion.

This local document can be deleted when all the Sync Gateways ack that they are good, or after a fixed amount of time. This fixed amount of time could also be configurable.

Contributor

adamcfraser commented Jul 15, 2018

I'd considered the 'notification document' type approach, but it felt like a bit too much of a hack solely for purge notification functionality. There are also potential concurrency issues - in addition to the one you mention, there are also timing issues when the notification document and the original document are not in the same vbucket.

In general the motivation behind providing purge support is the ability to reclaim bucket storage space for long-obsolete documents. This sounds like it's not quite the use case here. However, I'm planning to review the cache handling for the more common approach to this problem (tombstone purge), as it may have a similar issue for caches with low throughput.

erpankaj22081982 commented Jul 16, 2018

For us, purge is a big step in data life cycle management, because CBL (at least 1.4.1) doesn't provide a way to purge all the soft-deleted documents (_deleted=true). If the documents expire on the server, they all come down as tombstones, and then there is no way to delete them from CBL 1.4.1. Maybe there is, but then it's not well documented. So purge is the only hope for removing the data so that it doesn't reach mobiles. Mobiles can then use the expiration time or a query to find out which documents should be purged.

erpankaj22 commented Aug 22, 2018

I think there is also a problem where deleted documents (deleted: true) that are removed as part of metadata removal from Couchbase Server are not removed from the Sync Gateway cache. So documents that were previously deleted have been removed from Couchbase Server but not from the Sync Gateway cache, and they still show up in the one-shot _changes request. As a result, hundreds of _bulk_get calls are made, which makes the one-shot replication take a long time to finish. We are using CBL 1.4.1, Sync Gateway 2.0 and CBS 5.1.

Contributor

adamcfraser commented Aug 22, 2018

@erpankaj22 In general tombstoned documents shouldn't be removed from the cache - it's required that these be replicated to clients, as they are the current revision of the document, and clients need to be up to date with the current revision.

What's the specific use case for your one shot replication where you don't need to retrieve tombstones?

erpankaj22 commented Aug 23, 2018

I think I didn't explain it correctly, so let me try another way.
Let's say the document got deleted on 22 Aug 2018 because its expiry was set to 22 Aug 2018.

The metadata purge interval is 1.25 on Couchbase Server, so Couchbase Server expired the doc on the 22nd and removed the metadata on the morning of the 24th.

On Sync Gateway we are using multiple Couchbase Server URLs, as in http://host1,host2:8091
(I have noticed in the logs that Sync Gateway is not able to get the metadata purge interval from Couchbase Server because of the multiple URLs.)

Now, even on the 28th, the _changes feed still references the deleted document:
{"seq":356361,"id":"GuestPayForDetail::35054144-226432656","deleted":true,"removed":["BLISS_emb_aci_t"],"changes":[{"rev":"2-54a9ff81bdb7014c876ff096f7c69292"}]}
But the _bulk_get call returns:
--fd1732639a9ec22d87baa30faf1d39b53150a3f6e7b42bc9b15fa6e7a2d1
Content-Type: application/json; error="true"
{"error":"not_found","id":"GuestPayForDetail::35054144-226432656","reason":"missing","rev":"2-54a9ff81bdb7014c876ff096f7c69292","status":404}

So it looks like not only purged documents but also expired documents whose metadata has been removed from CBS are not being removed from the Sync Gateway cache.

You can also refer to Request #24051 on Couchbase support; I have raised a ticket there too.
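
The mismatch described above can be checked mechanically. The sketch below parses the two JSON payloads quoted in this comment and reports revisions that the _changes feed advertises but _bulk_get answers with a 404. The function name `stale_entries` is hypothetical, and parsing of the multipart _bulk_get response into individual JSON parts is assumed to have happened already.

```python
import json

# Payloads copied from the comment above.
changes_line = (
    '{"seq":356361,"id":"GuestPayForDetail::35054144-226432656",'
    '"deleted":true,"removed":["BLISS_emb_aci_t"],'
    '"changes":[{"rev":"2-54a9ff81bdb7014c876ff096f7c69292"}]}'
)
bulk_get_part = (
    '{"error":"not_found","id":"GuestPayForDetail::35054144-226432656",'
    '"reason":"missing","rev":"2-54a9ff81bdb7014c876ff096f7c69292","status":404}'
)


def stale_entries(changes, bulk_get_results):
    """Return (id, rev) pairs listed in the changes feed whose
    corresponding _bulk_get result came back with status 404."""
    missing = {
        (r["id"], r["rev"]) for r in bulk_get_results if r.get("status") == 404
    }
    stale = []
    for entry in changes:
        for change in entry["changes"]:
            key = (entry["id"], change["rev"])
            if key in missing:
                stale.append(key)
    return stale


stale = stale_entries([json.loads(changes_line)], [json.loads(bulk_get_part)])
print(stale)
```

Each pair returned is a cached change entry that no longer has a backing document on the server.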

@djpongh djpongh added the P2: medium label Aug 30, 2018

@adamcfraser adamcfraser added the ffc label Sep 4, 2018

@adamcfraser adamcfraser self-assigned this Sep 10, 2018

@adamcfraser adamcfraser modified the milestones: Iridium, 2.1.1 Sep 17, 2018

@adamcfraser adamcfraser assigned bbrks and unassigned adamcfraser Sep 27, 2018

@pasin pasin added in progress and removed ready labels Sep 28, 2018

@adamcfraser adamcfraser added the ready label Sep 28, 2018

Member

bbrks commented Oct 1, 2018

Fixed in 2.1.1 by #3770

Calling _purge or _compact will now also go and clean up the channel caches.

@bbrks bbrks closed this Oct 1, 2018
