Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC-8558: Additional discussion of deletion filters #1958

Merged
merged 11 commits into from Jun 10, 2021
159 changes: 155 additions & 4 deletions modules/learn/pages/clusters-and-availability/xdcr-filtering.adoc
Expand Up @@ -33,13 +33,20 @@ Note that fields and values on which matches are to be made should typically be
If the fields or values of a given document are changed after a replication has started (see xref:learn:clusters-and-availability/xdcr-filtering.adoc#filter-expression-editing[Filter-Expression Editing], below), such a document may no longer meet the criterion for replication, and so go unreplicated in its new form — and yet may already reside in its _previous_ form on the target cluster; since it formerly met the criterion, and was duly replicated.
In consequence, a single document would be maintained with a different value on each cluster.

One filter can be applied per replication: this filter therefore affects all mappings between scopes and collections, whether implicit or explicit.
(For more information, see xref:learn:clusters-and-availability/xdcr-with-scopes-and-collections.adoc[XDCR with Scopes and Collections].)

This page explains XDCR Advanced Filtering at a conceptual level.
For the practical steps involved, see xref:manage:manage-xdcr/filter-xdcr-replication.adoc[Filter a Replication].
See also the information provided in the xref:xdcr-reference:xdcr-filtering-reference-intro.adoc[XDCR Advanced Filtering Reference].

[#xdcr-advanced-filtering-with-scopes-and-collections]
=== XDCR Advanced Filtering with Scopes and Collections

One filter can be applied per replication: this filter therefore affects all mappings between scopes and collections, whether implicit or explicit.
The examples on this page show replications as extending from source bucket to target bucket: all mappings between source and target collections within these buckets should be understood to have the single filter applied to them.
For more information, see xref:learn:clusters-and-availability/xdcr-with-scopes-and-collections.adoc[XDCR with Scopes and Collections].

Note that in the special case of _migration_, multiple filters _can_ be applied, each to one of multiple mappings: however, migration is intended only for the special case where data is initially established in newly created scopes and collections.
See xref:learn:clusters-and-availability/xdcr-with-scopes-and-collections.adoc#migration[Migration], for more information.

== No Filter Applied

When no filter is applied, all documents in the specified source bucket are replicated to the specified target bucket.
Expand Down Expand Up @@ -72,7 +79,7 @@ The document `airport_8835` does have a `type` field, but its value does not con

== Multiple Filters Applied

_Multiple Filters_ can be applied in either of two ways:
To support replication, _multiple filters_ can be applied in either of two ways:

* By means of ORing, within a single replication.
This allows a document to be replicated if any one of the specified filters makes a successful match.
Expand Down Expand Up @@ -163,3 +170,147 @@ image::xdcr/filter-replication-diagram-7.png[,720,align=left]

The new document `airline_8838` is added the source bucket, and is examined by _R1a_.
A successful match is made, and `airline_8838` is duly replicated to the target bucket.

[#using-deletion-filters]
== Using Deletion Filters

_Deletion filters_ control whether the deletion of a document at source causes deletion of a replica document that exists on the replication-target.
For the xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[UI] each filter is selected by means of a checkbox.
For the xref:cli:cbcli/couchbase-cli-xdcr-replicate.adoc[CLI] and xref:rest-api:rest-xdcr-create-replication.adoc[REST API], parameter-values must be specified.

Examples of filtering are provided in xref:manage:manage-xdcr/filter-xdcr-replication.adoc[Filter a Replication].

=== Deletion-Filter Types

Deletion filters are of three types, and control the following.

==== Replication of Expirations

Configured through the xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[UI] with the *Do not replicate document expirations* checkbox; through the xref:cli:cbcli/couchbase-cli-xdcr-replicate.adoc[CLI] with the `filter-expiration` flag; and through the xref:rest-api:rest-xdcr-create-replication.adoc[REST API] with the `filterExpiration` flag.
Selecting this option means that if, having been replicated, the document at source expires and is deleted, the replicated copy of the document will _not_ be deleted.
Conversely, if this option is not selected or left `false` (which are the defaults), expirations at source are replicated; meaning that the replicated copy of the document _will_ be deleted.

==== Replication of Deletions

Configured through the xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[UI] with the *Do not replicate DELETE operations* checkbox; through the xref:cli:cbcli/couchbase-cli-xdcr-replicate.adoc[CLI] with the `filter-deletion` flag; and through the xref:rest-api:rest-xdcr-create-replication.adoc[REST API] with the `filterDeletion` flag.
Selecting this option determines that if, having been replicated, the document at source is deleted, the replicated copy of the document will _not_ be deleted.
Conversely, leaving this option unselected or `false` (which are the defaults) replicates deletions that occur at source, meaning that the replicated copy of the document _will_ be deleted.

==== Replication of TTL

Configured through the xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[UI] with the *Remove TTL from replicated items* checkbox; through the xref:cli:cbcli/couchbase-cli-xdcr-replicate.adoc[CLI] with the `reset-expiry` flag; and with the xref:rest-api:rest-xdcr-create-replication.adoc[REST API] with the `filterBypassExpiry` flag.
Selecting this option determines that the TTL that a document bears at source is _not_ made part of the replicated copy of the document: instead, the TTL of the replicated copy is set to 0.
Conversely, if this option is not selected or left `false` (which are the defaults), the TTL is made part of the replicated copy of the document, and may thereby determine when the replicated copy of the document expires.
Note, however, that the TTL applied to the replicated document at the target may be that of either the collection or the bucket in which it resides: for information, see xref:learn:buckets-memory-and-storage/expiration.adoc[Expiration].

=== Deletion Filters versus Filter Expressions

By default, any source-document deletion (or expiration) _is_ replicated to the target; resulting in a corresponding target-document deletion.
Note that such replication is _not_ prevented by the specifying of a filter that is formed with regular and other filtering expressions: such expressions only determine which non-deleted documents are to be replicated.
Therefore, to ensure that document-deletions (and expirations) are _not_ replicated, _deletion filters_ must specifically be configured.

=== Tombstones, DCP Events, and Replication

When a document is deleted or is expired, a tombstone is created.
Tombstones and their management are described in xref:learn:buckets-memory-and-storage/storage.adoc#tombstones[Tombstones].
In order to replicate a deletion or an expiration, XDCR must be able to receive, on the source, a DCP event that corresponds to the creation of a tombstone for the deleted or expired document.
On receipt of the DCP event, XDCR generates its own, corresponding deletion or expiration event; and replicates this to the target.

However, in some instances, even if a tombstone has been created, XDCR may not receive the DCP event.
For example:

* A document is deleted and then immediately recreated, such that DCP interprets the tombstone to have been at once superseded by the recreated document; and so does not send an event.

* A replication is deleted; then, source documents are deleted or expired.
Tombstones are created; but no DCP event is sent to XDCR, since no replication exists.
Subsequently, the replication is recreated: the replication will from this point only receive DCP events that correspond to future deletions and expirations.
+
Note, however, that conversely, creation of a new replication in this way, if performed with greater immediacy, may indeed result in DCP sending events; and allow XDCR, in turn, to replicate deletion and expiration events to the target.

=== Expiration, TTL, and Replication

TTL can be established on individual documents, on collections, and on buckets.
The relationship between these settings, and the way the setting on an individual document is resolved when replicated to the target, is fully described in xref:learn:buckets-memory-and-storage/expiration.adoc[Expiration].

When a deletion or expiration event is replicated to the target, the replica-document at the target is deleted or expired irrespective of its current TTL.
Thus, the replica-document's TTL may have been modified on the target, such that it specifies expiration at a later point in time than that specified by the TTL of the source document: nevertheless, when the source document expires, an expiration event is replicated, and the replica-document on the target is immediately expired.

For more information, see xref:learn:clusters-and-availability/xdcr-filtering.adoc#configuring-deletion-filters-to-prevent-data-loss[Configuring Deletion Filters to Prevent Data-Loss], immediately below.

[#configuring-deletion-filters-to-prevent-data-loss]
=== Configuring Deletion Filters to Prevent Data-Loss

Appropriate deletion-filter settings protect data.
However, in certain circumstances, inappropriate deletion-filter settings may cause _loss_ of data.
Copy link
Member

@nelio2k nelio2k May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example talks about the loss of data. There is also situations where an inappropriate filter setting may cause accidental leak of stale data.

For example, if the user turns on the deletion filter in a replication, and the situation is where source bucket contains documents that contains sensitive information. If a source document was deleted for compliance purposes, then the target document could become stale and remain, even if were not meant to exist anymore.

It may help to create an example illustrating the opposite situation, covering both scenarios (if space or context suffices)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a similar subsection, which hopefully addresses this.

For example:

. By means of replication _1_, documents of type _A_ and type _B_ are replicated to the target.

. Replication _1_ is deleted.

. Documents of type _A_ are deleted on the source; with the expectation that they will continue to exist on the target.

. Replication _2_ is created, with the default deletion-filter settings, so as to replicate to the target all future changes on the source to documents of type _B_.

Here, the (incorrectly) expected outcome has been that documents of both type _A_ and type _B_ continue to exist on the target.
However, since document-deletions are replicated by default, replication _2_ has deleted documents of type _A_ from the target; and the actual outcome is therefore that only documents of type _B_ exist on the target; with documents of type _A_ existing on neither source nor target.

To avoid this outcome, replication _2_ could be created with deletion filters configured to prevent the replication of deletions: the prior deletions of documents of type _A_ from the source would thereby _not_ be replicated to the target.
Note, however, that this would also prevent the replication of future source-deletions of type _B_ documents.

=== Configuring Deletion-Filters to Prevent Replication of Stale Data

In certain circumstances, inappropriate deletion-filter settings may allow _stale_ data to be inadvertently replicated to the target.
For example:

. A replication is established to replicate documents of type _A_ and type _B_ to the target.
Deletion-filter settings are configured to _prevent_ replication of deletions that occur on the source.

. After the replication has commenced, for reasons of security, documents of type _A_ are deleted from the source.

Here, the (incorrectly) expected outcome has been that security requirements have been complied with, since documents of type _A_ have been deleted from the source.
However, since deletion-filters have been configured _not_ to replicate deletions, documents of type _A_, subsequent to their replication, continue to exist on the target as _stale_ data; and do so in contravention of security requirements.

To avoid this outcome, the replication should be created with the default deletion-settings, so as to _permit_ the replication of deletions.
This ensures that deletions made on the source also occur on the target.

=== Deletion Filters and Migration

The appropriate configuring of deletion filters is critically important in cases of _migration_ where documents are being assigned to newly created collections, with their source collection (often the default collection of a legacy bucket) intended subsequently to be dropped.
For example, the following sequence results in loss of data:

. The default collection of a legacy bucket is determined to contain only documents that are either of type _A_ or of type _B_.

. Migration is configured to replicate documents of type _A_ from the source, default collection to a new, target collection, named _A_; and documents of type _B_ to another new, target collection, named _B_.
Deletion filters are left at their default settings.

. Migration proceeds.
Eventually, all type _A_ documents exist both in the source collection, and in the new, target collection _A_; and all type _B_ documents exist both in the source collection, and in the new, target collection _B_.

. The source, default collection is dropped; and all its data thereby deleted.

Here, the (incorrectly) expected result has been that all documents will continue to exist; in the new, target collections, _A_ and _B_.
However, since the migration was not deleted prior to the deletion of the source data, and since the default settings of the deletion filters specified that document-deletions should be replicated; the actual result is that all documents from the target collections _A_ and _B_ have been deleted, along with the source, default collection, and all its data.

==== Guarding Against Accidental Data-Loss during Migration

Either of the following approaches can be used to ensure that no migrated data is lost:

* Configure deletion filters to prohibit the replication of deletions and/or expirations.
This ensures that only documents and their mutations are replicated to their new collection.
+
Note, however, that if read-write application-access continues to be granted to the source collection during the life of the migration, application-deletions and/or expirations that occur on the source are not replicated to the target collection; eventually rendering source and target collections inconsistent.

* Keep deletion filters at their default setting, to permit the replication of deletions and/or expirations.
When the migration is judged to have completed, delete the migration _prior to_ the deletion of any source data.
Then, once the migration is deleted, delete source data as appropriate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional note as a reminder - if the replication is re-created after this point without a deletion filter, and the tombstones from the deletes have not been purged, then the deletes will be replicated to the target bucket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a note to this effect.

+
Note that if a new replication is subsequently created between the same source and target collections, with deletion filters configured to permit the replication of deletions and/or expirations, the deletions and/or expirations will be replicated to the target if the tombstones produced by the source-data deletions and/or expirations have not yet been purged.

Note that before and during migration, both the xref:learn:services-and-indexes/services/backup-service.adoc[Backup Service] and xref:backup-restore:enterprise-backup-restore.adoc[cbbackupmgr] can be used to protect data.

=== Configuring Deletion Filters

For information on configuring deletion filters with the UI, see xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[Deletion Filters];
with the CLI, see xref:cli:cbcli/couchbase-cli-xdcr-replicate.adoc[xdcr-replicate];
with the REST API, see xref:rest-api:rest-xdcr-create-replication.adoc[Creating XDCR Replications].
Expand Up @@ -87,7 +87,8 @@ For information, See xref:clusters-and-availability/xdcr-filtering.adoc[XDCR Adv

Optionally, _deletion filters_ can be applied to a replication: these control whether the deletion of a document at source causes deletion of a replica that has been created.
Each filter covers a specific deletion-context.
For information, see xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[Deletion Filters].
For a description of the individual deletion filters, see xref:manage:manage-xdcr/filter-xdcr-replication.adoc#deletion-filters[Deletion Filters].
For an explanation of the relationship between deletion filters and filters formed with regular and other filtering expressions, see xref:learn:clusters-and-availability/xdcr-filtering.adoc#using-deletion-filters[Using Deletion Filters].

[#xdcr-payloads]
== XDCR Payloads
Expand Down
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
7 changes: 5 additions & 2 deletions modules/manage/pages/manage-xdcr/filter-xdcr-replication.adoc
Expand Up @@ -167,11 +167,14 @@ If checked, this means that if, having been replicated, the document at source i
Conversely, if this option is not checked, deletions at source _are_ replicated; meaning that the replicated copy of the document _will_ be deleted.

* *Remove TTL from replicated items*.
If checked, this means that the TTL that a document bears at source is _not_ made part of the replicated copy of the document.
Conversely, if this option is not checked, the TTL _is_ made part of the replicated copy of the document, and thereby determines when the replicated copy of the document expires.
If checked, this means that the TTL that a document bears at source is _not_ made part of the replicated copy of the document: instead, the TTL of the replicated copy is set to 0.
Conversely, if this option is not checked, the TTL _is_ made part of the replicated copy of the document, and may thereby determine when the replicated copy of the document expires.

For more information on deletion filters, see xref:learn:clusters-and-availability/xdcr-filtering.adoc#using-deletion-filters[Using Deletion Filters].
For information on TTL and expiration, see xref:learn:buckets-memory-and-storage/expiration.adoc[Expiration].

Note that the replication of deletions, expirations, and/or TTLs is _not_ prevented by the specifying of a filter that is formed with regular and other filtering expressions: to ensure that document-deletions, expirations, and/or TTLs are _not_ replicated, the appropriate deletion-filter checkboxes must be checked.

[#editing-filters]
=== Editing

Expand Down