Doc explains normalization full-refresh implications #6097

Merged
merged 3 commits into from Sep 16, 2021

11 changes: 11 additions & 0 deletions docs/faq/data-loading.md
@@ -6,6 +6,17 @@ It can take a while for Airbyte to load data into your destination. Some sources
data we can sync in a given time. Large amounts of data in your source can also make the initial sync take longer. You can check your
sync status in your connection detail page that you can access through the destination detail page or the source one.

## **Why my final tables are being recreated everytime?**

Airbyte ingests data into raw tables and, if you selected normalization on the connection page, then normalizes it into the final tables.
Normalization runs as a full refresh on every sync, and on some destinations such as Snowflake, Redshift, and BigQuery this can increase
resource consumption and cost. Pay attention to how often you sync your data to avoid these issues.
For example, a connection that syncs every 5 minutes with incremental sync enabled only loads new records into the raw tables, but normalization
still runs over *all* the data in every sync! If you have a large amount of data, this may not be the right sync frequency for you.
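
To get a rough feel for the effect of sync frequency, the arithmetic can be sketched as below. This is only an illustration: the function `rows_processed_per_day` and all the numbers are hypothetical, not part of Airbyte, and row counts are just a proxy for cost, since each warehouse bills differently. The table size is also held constant over the day for simplicity.

```python
def rows_processed_per_day(total_rows: int, new_rows_per_day: int,
                           sync_interval_minutes: int) -> dict:
    """Estimate rows touched per day by incremental loading vs. full-refresh normalization.

    Hypothetical helper for illustration only; not an Airbyte API.
    """
    syncs_per_day = (24 * 60) // sync_interval_minutes
    # Incremental sync only appends new records to the raw tables,
    # so the daily raw load does not depend on the sync frequency.
    raw_rows_loaded = new_rows_per_day
    # Normalization rebuilds the final tables from all the data on every sync,
    # so its work grows with the number of syncs (table size assumed constant).
    rows_normalized = syncs_per_day * total_rows
    return {
        "syncs_per_day": syncs_per_day,
        "raw_rows_loaded": raw_rows_loaded,
        "rows_normalized": rows_normalized,
    }


if __name__ == "__main__":
    # Made-up workload: 10 million existing rows, 288,000 new rows per day.
    for interval in (5, 60, 24 * 60):
        estimate = rows_processed_per_day(10_000_000, 288_000, interval)
        print(f"every {interval} min: {estimate}")
```

With these made-up numbers, a 5-minute schedule normalizes 2.88 billion rows per day to pick up 288,000 new ones, while an hourly schedule normalizes 240 million.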

There is a [GitHub issue](https://github.com/airbytehq/airbyte/issues/4286) to implement incremental normalization, which will reduce
costs and resource usage in your destination.

## **What happens if a sync fails?**

You won't lose data when a sync fails; however, no data will be added or updated in your destination.
7 changes: 7 additions & 0 deletions docs/quickstart/set-up-a-connection.md
@@ -42,3 +42,10 @@ This is just the beginning of using Airbyte. We support a large collection of so
If you have any questions at all, please reach out to us on [Slack](https://slack.airbyte.io/). We’re still in alpha, so if you see any rough edges or want to request a connector you need, please create an issue on our [GitHub](https://github.com/airbytehq/airbyte) or leave a thumbs up on an existing issue.

Thank you and we hope you enjoy using Airbyte.


{% hint style="warning" %}
At the moment, Airbyte runs a full refresh to recreate the final tables on every sync. This can increase costs on some destinations like Snowflake, Redshift, and BigQuery.
To better understand which sync mode and frequency you should select, read [this doc](../understanding-airbyte/connections/README.md).
The FAQ explains the cost issue in more detail [here](../faq/data-loading.md#why-my-final-tables-are-being-recreated-everytime).
{% endhint %}