diff --git a/docs/understanding-airbyte/connections/incremental-append.md b/docs/understanding-airbyte/connections/incremental-append.md index f4c55837061e9..c5fa7c80fb2a8 100644 --- a/docs/understanding-airbyte/connections/incremental-append.md +++ b/docs/understanding-airbyte/connections/incremental-append.md @@ -22,51 +22,43 @@ As mentioned above, the delta from a sync will be _appended_ to the existing dat Assume that `updated_at` is our `cursor_field`. Let's say the following data already exists into our data warehouse. -```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754 }, - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | false | 1754 | +| Marie Antoinette | false | 1755 | In the next sync, the delta contains the following record: -```javascript - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 } -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVII | false | 1785 | At the end of this incremental sync, the data warehouse would now contain: -```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754 }, - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }, - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | false | 1754 | +| Marie Antoinette | false | 1755 | +| Louis XVII | false | 1785 | ### Updating a Record -Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now unfortunately the king and queen lose their heads. Let's see that delta: +Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now, unfortunately the king and queen lose their heads. Let's see that delta: -```javascript -[ - { "name": "Louis XVI", "deceased": true, "updated_at": 1793 }, - { "name": "Marie Antoinette", "deceased": true, "updated_at": 1793 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | true | 1793 | +| Marie Antoinette | true | 1793 | The output we expect to see in the warehouse is as follows: -```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754 }, - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }, - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 }, - { "name": "Louis XVI", "deceased": true, "updated_at": 1793 }, - { "name": "Marie Antoinette", "deceased": true, "updated_at": 1793 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | false | 1754 | +| Marie Antoinette | false | 1755 | +| Louis XVII | false | 1785 | +| Louis XVI | true | 1793 | +| Marie Antoinette | true | 1793 | ## Source-Defined Cursor @@ -108,33 +100,27 @@ select * from table where cursor_field > 'last_sync_max_cursor_field_value' Let's say the following data already exists into our data warehouse. 
-```javascript
-[
-    { "name": "Louis XVI", "deceased": false, "updated_at": 1754 },
-    { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | false | 1754 |
+| Marie Antoinette | false | 1755 |

At the start of the next sync, the source data contains the following new record:

-```javascript
-[
-    { "name": "Louis XVI", "deceased": true, "updated_at": 1754 },
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | true | 1754 |

At the end of the second incremental sync, the data warehouse would still contain data from the first sync because the delta record did not provide a valid value for the cursor field \(the cursor field is not greater than last sync's max value, `1754 < 1755`\), so it is not emitted by the source as a new or modified record.

-```javascript
-[
-    { "name": "Louis XVI", "deceased": false, "updated_at": 1754 },
-    { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | false | 1754 |
+| Marie Antoinette | false | 1755 |

Similarly, if multiple modifications are made during the same day to the same records. If the frequency of the sync is not granular enough \(for example, set for every 24h\), then intermediate modifications to the data are not going to be detected and emitted. Only the state of data at the time the sync runs will be reflected in the destination.

-Those concerns could be solved by using a different sync mode based on binary logs, Write-Ahead-Logs \(WAL\), or also called **Incremental - Change Data Capture**. \(coming to Airbyte in the near future\).
+Those concerns could be solved by using a different incremental approach based on binary logs or Write-Ahead Logs \(WAL\), also known as [Change Data Capture (CDC)](../cdc.md).

The current behavior of **Incremental** is not able to handle source schema changes yet, for example, when a column is added, renamed or deleted from an existing table etc. It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema changes.

diff --git a/docs/understanding-airbyte/connections/incremental-deduped-history.md b/docs/understanding-airbyte/connections/incremental-deduped-history.md
index 6dc9b2e1e1e41..37845122546d6 100644
--- a/docs/understanding-airbyte/connections/incremental-deduped-history.md
+++ b/docs/understanding-airbyte/connections/incremental-deduped-history.md
@@ -30,65 +30,53 @@ As mentioned above, the delta from a sync will be _appended_ to the existing his
Assume that `updated_at` is our `cursor_field` and `name` is the `primary_key`. Let's say the following data already exists into our data warehouse.
-```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754 }, - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | false | 1754 | +| Marie Antoinette | false | 1755 | In the next sync, the delta contains the following record: -```javascript - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 } -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVII | false | 1785 | At the end of this incremental sync, the data warehouse would now contain: -```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754 }, - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }, - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | false | 1754 | +| Marie Antoinette | false | 1755 | +| Louis XVII | false | 1785 | ### Updating a Record -Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now unfortunately the king and queen lose their heads. Let's see that delta: +Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now, unfortunately the king and queen lose their heads. Let's see that delta: -```javascript -[ - { "name": "Louis XVI", "deceased": true, "updated_at": 1793 }, - { "name": "Marie Antoinette", "deceased": true, "updated_at": 1793 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | true | 1793 | +| Marie Antoinette | true | 1793 | The output we expect to see in the warehouse is as follows: In the history table: -```javascript -[ - { "name": "Louis XVI", "deceased": false, "updated_at": 1754, "start_at": 1754, "end_at": 1793 }, - { "name": "Louis XVI", "deceased": true, "updated_at": 1793, "start_at": 1793, "end_at": NULL }, - - { "name": "Louis XVII", "deceased": false, "updated_at": 1785, "start_at": 1785, "end_at": NULL } - - { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755, "start_at": 1755, "end_at": 1793 }, - { "name": "Marie Antoinette", "deceased": true, "updated_at": 1793, "start_at: 1793, "end_at": NULL } -] -``` +| name | deceased | updated_at | start_at | end_at | +| :--- | :--- | :--- | :--- | :--- | +| Louis XVI | false | 1754 | 1754 | 1793 | +| Louis XVI | true | 1793 | 1793 | NULL | +| Louis XVII | false | 1785 | 1785 | NULL | +| Marie Antoinette | false | 1755 | 1755 | 1793 | +| Marie Antoinette | true | 1793 | 1793 | NULL | In the final de-duplicated table: -```javascript -[ - { "name": "Louis XVI", "deceased": true, "updated_at": 1793 }, - { "name": "Louis XVII", "deceased": false, "updated_at": 1785 }, - { "name": "Marie Antoinette", "deceased": true, "updated_at": 1793 } -] -``` +| name | deceased | updated_at | +| :--- | :--- | :--- | +| Louis XVI | true | 1793 | +| Louis XVII | false | 1785 | +| Marie Antoinette | true | 1793 | ## Source-Defined Cursor @@ -134,33 +122,27 @@ select * from table where cursor_field > 'last_sync_max_cursor_field_value' Let's say the following data already exists into our data warehouse. 
-```javascript
-[
-    { "name": "Louis XVI", "deceased": false, "updated_at": 1754 },
-    { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | false | 1754 |
+| Marie Antoinette | false | 1755 |

At the start of the next sync, the source data contains the following new record:

-```javascript
-[
-    { "name": "Louis XVI", "deceased": true, "updated_at": 1754 },
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | true | 1754 |

At the end of the second incremental sync, the data warehouse would still contain data from the first sync because the delta record did not provide a valid value for the cursor field \(the cursor field is not greater than last sync's max value, `1754 < 1755`\), so it is not emitted by the source as a new or modified record.

-```javascript
-[
-    { "name": "Louis XVI", "deceased": false, "updated_at": 1754 },
-    { "name": "Marie Antoinette", "deceased": false, "updated_at": 1755 }
-]
-```
+| name | deceased | updated_at |
+| :--- | :--- | :--- |
+| Louis XVI | false | 1754 |
+| Marie Antoinette | false | 1755 |

Similarly, if multiple modifications are made during the same day to the same records. If the frequency of the sync is not granular enough \(for example, set for every 24h\), then intermediate modifications to the data are not going to be detected and emitted. Only the state of data at the time the sync runs will be reflected in the destination.

-Those concerns could be solved by using a different sync mode based on binary logs, Write-Ahead-Logs \(WAL\), or also called **Incremental - Change Data Capture**. \(coming to Airbyte in the near future\).
+Those concerns could be solved by using a different incremental approach based on binary logs or Write-Ahead Logs \(WAL\), also known as [Change Data Capture (CDC)](../cdc.md).

The current behavior of **Incremental** is not able to handle source schema changes yet, for example, when a column is added, renamed or deleted from an existing table etc. It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema changes.

diff --git a/docs/understanding-airbyte/namespaces.md b/docs/understanding-airbyte/namespaces.md
index 35edeb00f2986..24310207d5294 100644
--- a/docs/understanding-airbyte/namespaces.md
+++ b/docs/understanding-airbyte/namespaces.md
@@ -16,7 +16,13 @@ If the Destination does not support namespaces, the [namespace field](https://gi

## Destination namespace configuration

-As part of the [connections sync settings](connections/README.md), it is possible to configure the namespace used by destination connectors. Available options are:
+As part of the [connections sync settings](connections/README.md), it is possible to configure the namespace used by:
+1. destination connectors: to store the `_airbyte_raw_*` tables.
+2. basic normalization: to store the final normalized tables.
+
+Note that custom transformation outputs are not affected by the namespace settings from Airbyte: handling [custom schemas](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) is up to how the custom dbt project is configured and written. In this case, the default target schema for dbt will always be the destination namespace.
+
+Available options for namespace configuration are:

### - Mirror source structure