DOC-8558: Additional discussion of deletion filters #1958

Merged
tonyjhillman merged 11 commits into couchbase:release/7.0 from DOC-8558 on Jun 10, 2021


tonyjhillman
Contributor

No description provided.

@tonyjhillman tonyjhillman requested a review from nelio2k May 24, 2021 13:18
=== Configuring Deletion-Filters to Protect Data

Appropriate deletion-filter settings protect data.
However, in certain circumstances, inappropriate deletion-filter settings may cause _loss_ of data.

Member

@nelio2k nelio2k May 25, 2021

This example talks about the loss of data. There are also situations where an inappropriate filter setting may cause an accidental leak of stale data.

For example, suppose the user turns on the deletion filter in a replication, and the source bucket contains documents that contain sensitive information. If a source document is deleted for compliance purposes, the target document could become stale and remain, even though it was not meant to exist anymore.

It may help to create an example illustrating the opposite situation, covering both scenarios (if space or context suffices).

Contributor Author

I've added a similar subsection, which hopefully addresses this.

However, since document-deletions are replicated by default, replication _2_ has deleted documents of type _A_ from the target; and the actual outcome is therefore that only documents of type _B_ exist on the target; with documents of type _A_ existing on neither source nor target.

To avoid this outcome, replication _2_ could be created with deletion filters configured to prevent the replication of deletions: the prior deletions of documents of type _A_ from the source would thereby _not_ be replicated to the target.
Note, however, that this would also prevent the replication of future source-deletions of type _B_ documents: therefore, the creation of an entirely new source collection for type _B_ documents might be required, by means of _migration_, prior to the creation of replication _2_.

Member

@nelio2k nelio2k May 25, 2021

I find this line confusing. I'm not sure what migration has to do specifically with the concept of deletion being talked about here.
The suggestion here may raise more questions than it answers, since the deletion concept is not limited to collections migration.

Contributor Author

Perhaps I've misunderstood. Is it just the last sentence that is problematic? ("Note, however," ... etc)

Contributor Author

Okay, I've removed the line with the reference to migration.


* Keep deletion filters at their default setting, to permit the replication of deletions.
When the migration is judged to have completed, delete the migration _prior to_ the deletion of any source data.
Then, once the migration is deleted, delete source data as appropriate.

Member

Additional note as a reminder - if the replication is re-created after this point without a deletion filter, and the tombstones from the deletes have not been purged, then the deletes will be replicated to the target bucket.

Contributor Author

I've added a note to this effect.
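
The reminder in this thread (a re-created replication with no deletion filter will still replicate deletions whose tombstones have not been purged) can be illustrated with a toy model. This is only a sketch, not XDCR code: `Mutation`, `replicate`, and the `filter_deletion` flag are hypothetical names, and tombstone purging is modeled simply as the deletion record no longer being present on the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """One event a replication receives from the source (hypothetical type)."""
    key: str
    deleted: bool = False  # True stands for a deletion backed by a tombstone


def replicate(source_events, target, filter_deletion=False):
    """Return the target's contents after applying the source events.

    filter_deletion=True models a replication whose deletion filter is
    checked, so deletions are dropped instead of being applied.
    """
    result = dict(target)
    for event in source_events:
        if event.deleted:
            if not filter_deletion:
                result.pop(event.key, None)
        else:
            result[event.key] = "doc"
    return result


# Document A was migrated to the target, then deleted on the source.
target = {"A": "doc"}
tombstone_present = [Mutation("A", deleted=True)]
tombstone_purged = []  # purging removes the record of the deletion

recreated_without_filter = replicate(tombstone_present, target)
recreated_with_filter = replicate(tombstone_present, target, filter_deletion=True)
recreated_after_purge = replicate(tombstone_purged, target)
```

Under these assumptions, only the first case removes document A from the target: the tombstone is still present and nothing filters the deletion. In the other two cases the migrated copy survives.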

@@ -172,6 +172,9 @@ Conversely, if this option is not checked, the TTL _is_ made part of the replica

For information on TTL and expiration, see xref:learn:buckets-memory-and-storage/expiration.adoc[Expiration].

Note that the replication of deletions is _not_ prevented by the specifying of a filter, formed with regular and other filtering expressions: to ensure that document-deletions are _not_ replicated, the appropriate deletion-filter checkboxes must be checked.

Member

replication of deletions or expirations are not prevented...

Contributor Author

I've added. I've also added some information to the front of the subsection in xdcr-filtering.adoc, explaining the function of each of the filters in a little more detail.


Either of the following approaches can be used to ensure that no migrated data is lost:

* Configure deletion filters to prohibit the replication of deletions.

Member

deletions or expirations

Contributor Author

I've now tried to add "expirations, and/or TTLs" where appropriate, throughout.

+
image::manage-xdcr/filter-xdcr-deletion-filters.png[,320,align=left]
+
For each filter, to ensure that deletion is replicated, leave the setting at its default (with the checkbox unchecked); and to ensure that deletion is _not_ replicated, check the checkbox.

Member

@nelio2k nelio2k May 25, 2021

or expiry
(check the expiry checkbox)

Member

@nelio2k nelio2k May 25, 2021

(Don't see deletion filter in XDCR adv filtering page)

If a source document contains an expiry, and the user does not want the expiry to apply to the document when it reaches the target bucket, check the "TTL" checkbox...
This will cause the document's expiry to be set to 0 prior to transmission to the target.
However, this also means that the target bucket's expiry setting, if one exists, will take effect.

Contributor Author

I've included mention of expirations. As mentioned above, I've added an introduction that provides info on each deletion filter, including TTL.
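
The TTL behaviour described in this thread can be sketched as a small function. This is a simplified model, not XDCR code: `expiry_on_target` and its parameters are hypothetical names, expiries are absolute timestamps in seconds with 0 meaning "no expiry", and the target bucket's expiry setting is assumed to apply only to documents that arrive without an expiry of their own.

```python
def expiry_on_target(source_expiry, ttl_filter_checked, target_bucket_ttl, now):
    """Return the expiry a replicated document ends up with on the target.

    source_expiry      absolute expiry on the source, 0 if none
    ttl_filter_checked True models the "TTL" checkbox being checked
    target_bucket_ttl  target bucket's expiry setting in seconds, 0 if none
    now                current time as an absolute timestamp
    """
    # With the checkbox checked, the expiry is set to 0 before transmission.
    sent_expiry = 0 if ttl_filter_checked else source_expiry

    # Assumption: a document arriving with no expiry picks up the target
    # bucket's own expiry setting, when one exists.
    if sent_expiry == 0 and target_bucket_ttl > 0:
        return now + target_bucket_ttl
    return sent_expiry


now = 1_000
kept = expiry_on_target(5_000, ttl_filter_checked=False, target_bucket_ttl=0, now=now)
stripped = expiry_on_target(5_000, ttl_filter_checked=True, target_bucket_ttl=0, now=now)
bucket_applied = expiry_on_target(5_000, ttl_filter_checked=True, target_bucket_ttl=600, now=now)
```

In this model, `kept` carries the source expiry through unchanged, `stripped` arrives with no expiry at all, and `bucket_applied` shows the side effect noted above: once the source expiry is removed, the target bucket's setting decides when the document expires.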

Member

@nelio2k nelio2k left a comment

Added a few in-line comments

@tonyjhillman
Contributor Author

I've added as per the inline comments. I've also added, on the main advanced filtering page, an intro to deletion filters that describes each activity; and also passages on Tombstones and Replication, Expiration, TTL and Replication, and Stale-Data Replication.


When a document is deleted or is expired, a tombstone is created.
Tombstones and their management are described in xref:learn:buckets-memory-and-storage/storage.adoc#tombstones[Tombstones].
In order to replicate a deletion or an expiration, XDCR must be able to find, on the source, a tombstone that corresponds to the deleted or expired document.

Member

XDCR doesn't really "find" it...
XDCR, as a recipient of the source KV DCP, must be able to receive a deletion from the DCP stream

Not sure how to best explain this concept without confusing the user with technicalities...

Contributor Author

I've modified this to include the notion of DCP.

When a document is deleted or is expired, a tombstone is created.
Tombstones and their management are described in xref:learn:buckets-memory-and-storage/storage.adoc#tombstones[Tombstones].
In order to replicate a deletion or an expiration, XDCR must be able to find, on the source, a tombstone that corresponds to the deleted or expired document.
When the tombstone is located, XDCR generates a corresponding deletion or expiration event, and replicates this to the target.

Member

located -> received (from source DCP)

Contributor Author

Here also, I've added the notion of DCP.

If a document has been deleted or expired, and the resulting tombstone has been purged prior to XDCR being able to locate it, no deletion or replication event is replicated.
This situation might occur if:

* A document is deleted and then immediately recreated, such that the time during which the tombstone existed has been too brief for location of the tombstone by XDCR to occur.

Member

The tombstone technically still exists... however, the KV engine could decide that the tombstone is superseded by the recreated document and, for the sake of efficiency (deduplication), skip sending the deletion and just send the recreated document instead.

The phrase "tombstone existed has been too brief" would be incorrect and runs contrary to the concept of tombstone purging.

Contributor Author

I've rewritten this accordingly.
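
The deduplication point raised in this thread can be illustrated with a toy stream. This is a sketch under stated assumptions, not the KV engine's actual logic: `deduplicate` is a hypothetical helper that keeps only the most recent event per key, which is enough to show why a receiver such as XDCR may never see a deletion that was followed by a re-creation.

```python
def deduplicate(events):
    """Keep only the latest event for each key, in stream order.

    events is a list of (key, operation) tuples; the list position stands
    in for a sequence number.
    """
    latest = {}
    for seqno, (key, operation) in enumerate(events):
        latest[key] = (seqno, operation)  # later events supersede earlier ones
    ordered = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, operation) for key, (_, operation) in ordered]


source_history = [
    ("doc1", "set"),
    ("doc1", "delete"),  # superseded by the re-creation that follows
    ("doc1", "set"),
    ("doc2", "delete"),  # nothing supersedes this, so it is still sent
]

streamed = deduplicate(source_history)
```

Here the deletion of `doc1` never reaches the receiver, even though no tombstone purge is involved, while the deletion of `doc2` is still streamed. The conditions under which the real engine deduplicates are more involved than this model.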

@tonyjhillman tonyjhillman merged commit 95e9091 into couchbase:release/7.0 Jun 10, 2021
@tonyjhillman tonyjhillman deleted the DOC-8558 branch June 10, 2021 15:14