Commit

Add issue about loading a lot of tables (#3975)
* add issue about loading a lot of tables

* change feedback
marcosmarxm committed Jun 9, 2021
1 parent b4793b2 commit 4881167
Showing 4 changed files with 59 additions and 24 deletions.
49 changes: 37 additions & 12 deletions docs/faq/data-loading.md
@@ -2,20 +2,31 @@

## **Why don’t I see any data in my destination yet?**

It can take a while for Airbyte to load data into your destination. Some sources have restrictive API limits which constrain how much
data we can sync in a given time. Large amounts of data in your source can also make the initial sync take longer. You can check your
sync status on the connection detail page, which you can access through either the source or the destination detail page.

## **What happens if a sync fails?**

You won't lose data when a sync fails; however, no data will be added or updated in your destination.

Airbyte will automatically attempt to replicate data 3 times. You can see and export the logs for those attempts on the connection
detail page, which you can access through the source or destination detail page.
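To make the retry behavior concrete, here is a minimal Python sketch of a sync that is attempted up to 3 times, keeping one log line per attempt. It is an illustration, not Airbyte's scheduler code: `run_with_attempts` and its log format are hypothetical, and the optional fixed `delay_s` pause between attempts is an assumption (this FAQ does not say how Airbyte spaces its retries).

```python
import time
from typing import Callable, List

MAX_ATTEMPTS = 3  # the FAQ states that Airbyte attempts a failed sync 3 times


def run_with_attempts(
    sync: Callable[[], None], max_attempts: int = MAX_ATTEMPTS, delay_s: float = 0.0
) -> List[str]:
    """Hypothetical helper: run `sync` up to `max_attempts` times.

    Returns one log line per attempt and stops at the first success.
    """
    logs: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            sync()
        except Exception as exc:  # any exception counts as a failed attempt
            logs.append(f"attempt {attempt}: failed ({exc})")
            if attempt < max_attempts:
                time.sleep(delay_s)  # assumed fixed pause between attempts
            continue
        logs.append(f"attempt {attempt}: succeeded")
        break
    return logs
```

If every attempt fails, the function returns three `failed` lines, analogous to the three attempt logs you can export from the connection detail page.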

You can configure a Slack webhook to warn you when a sync fails.
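Before pasting a webhook URL into Airbyte, you may want to check that it accepts messages. The Python sketch below uses only the standard library to build the JSON `{"text": ...}` POST that Slack incoming webhooks expect. It is not how Airbyte sends its own notifications, and `WEBHOOK_URL` is a placeholder, not a real webhook.

```python
import json
import urllib.request

# Placeholder: replace with the incoming-webhook URL generated in your Slack workspace.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack incoming-webhook message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 10.0) -> int:
    """Send the request and return the HTTP status (Slack answers 200 on success)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

With your real URL, `send(build_slack_request(url, "test"))` should return `200` and post the message to the channel the webhook is bound to.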

In the future, you will be able to configure other notification methods (email, Sentry) and an option to create a
GitHub issue with the logs. We’re still working on it; the purpose is to help the community and the Airbyte team fix the
issue as soon as possible, especially if it is a connector issue.

Until Airbyte has this system in place, here is what you can do:

* File a GitHub issue: go [here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=)
and file an issue with the detailed logs copied into the issue’s description. The team will be notified about your issue and will update
it with any progress or comments.
* Fix the issue yourself: Airbyte is open source, so you don’t need to wait for anybody to fix your issue if it is important to you.
To do so, just fork the [GitHub project](http://github.com/airbytehq/airbyte) and fix the piece of code that needs fixing. If you’re okay
with contributing your fix to the community, you can submit a pull request. We will review it ASAP.
* Ask on Slack: don’t hesitate to ping the team on [Slack](https://slack.airbyte.io).

Once all this is done, Airbyte resumes your sync from where it left off.
@@ -24,14 +35,17 @@ We truly appreciate any contribution you make to help the community. Airbyte wil

## **Can Airbyte support 2-way sync i.e. changes from A go to B and changes from B go to A?**

Airbyte does not currently support this. Some details of how we handle schema and table names prevent it from working
in the current iteration.
If you set up a circular dependency between source and destination, you'll end up with the following:
`A.public.table_foo` writes to `B.public.public_table_foo`, which in turn writes to `A.public.public_public_table_foo`. You won't be
writing into your original table, which is presumably what you intended.
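The prefixing can be modeled in a few lines. This Python sketch assumes a simplified rule in which each sync writes `<namespace>_<table>` into the destination's `public` schema. `destination_table` is a hypothetical stand-in, not Airbyte's normalization code, but it reproduces the names listed above.

```python
from typing import List


def destination_table(source_namespace: str, table: str) -> str:
    """Hypothetical naming rule: prefix the table with its source namespace."""
    return f"{source_namespace}_{table}"


def circular_sync_names(table: str, namespace: str = "public", hops: int = 2) -> List[str]:
    """Follow one table through `hops` syncs of a circular A -> B -> A setup."""
    names = [table]
    for _ in range(hops):
        # every hop reads from the `public` schema and writes a prefixed copy
        names.append(destination_table(namespace, names[-1]))
    return names


print(" -> ".join(circular_sync_names("table_foo")))
# table_foo -> public_table_foo -> public_public_table_foo
```

Each extra hop adds another prefix, so the chain never lands back on the original `table_foo`.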


## **What happens to data in the pipeline if the destination gets disconnected? Could I lose data, or wind up with duplicate data when the pipeline is reconnected?**

Airbyte is architected to prevent data loss or duplication. Airbyte will display a failure for the sync and re-attempt it at the next scheduled sync,
according to the frequency you set.

## **How frequently can Airbyte sync data?**

@@ -43,7 +57,8 @@ While frequent data loads will give you more up-to-date data, there are a few re

* Higher API usage may cause you to hit a limit that could impact other systems that rely on that API.
* Higher cost of loading data into your warehouse.
* More frequent delays, resulting in increased delay notification emails. For instance, if the data source generally takes several hours to
update but you choose five-minute increments, you may receive a delay notification every sync.

We generally recommend setting incremental loads to run every hour to help limit API calls.
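A quick back-of-the-envelope calculation shows why the interval matters. The Python sketch assumes every sync costs the same hypothetical number of API calls (50 here), which is a simplification: incremental syncs usually need fewer calls than the first full sync.

```python
MINUTES_PER_DAY = 24 * 60


def syncs_per_day(interval_minutes: int) -> int:
    """Number of complete sync intervals in one day."""
    return MINUTES_PER_DAY // interval_minutes


def api_calls_per_day(interval_minutes: int, calls_per_sync: int) -> int:
    """Daily API usage, assuming a fixed (hypothetical) cost per sync."""
    return syncs_per_day(interval_minutes) * calls_per_sync


CALLS_PER_SYNC = 50  # hypothetical cost of one sync

for interval in (5, 60):
    print(f"every {interval} min: {syncs_per_day(interval)} syncs, "
          f"{api_calls_per_day(interval, CALLS_PER_SYNC)} API calls per day")
# every 5 min: 288 syncs, 14400 API calls per day
# every 60 min: 24 syncs, 1200 API calls per day
```

Under these assumptions, hourly syncs use 12 times fewer API calls than five-minute syncs.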

@@ -53,24 +68,34 @@ Unfortunately not yet.

## **Do you support change data capture \(CDC\) or logical replication for databases?**

Airbyte currently supports [CDC for Postgres and MySQL](../understanding-airbyte/cdc.md). Airbyte is adding support for a few other
databases, which you can check in the roadmap.

## Using incremental sync, is it possible to add more fields when some new columns are added to a source table, or when a new table is added?

For the moment, incremental sync doesn't support schema changes, so you would need to perform a full refresh whenever that happens.
Here’s a related [GitHub issue](https://github.com/airbytehq/airbyte/issues/1601).

## Is there a limit to how many tables one connection can handle?

Yes. With more than 6000 tables, the UI may have trouble loading the information.

There are two GitHub issues about this limitation: [Issue #3942](https://github.com/airbytehq/airbyte/issues/3942)
and [Issue #3943](https://github.com/airbytehq/airbyte/issues/3943).

## **I see you support a lot of connectors – what about connectors Airbyte doesn’t support yet?**

You can either:

* Submit a [connector request](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=) on our GitHub project, and be notified once we or the community build a connector for it.
* Build a connector yourself by forking our [GitHub project](https://github.com/airbytehq/airbyte) and submitting a pull request. Here
are the [instructions on how to build a connector](../contributing-to-airbyte/building-new-connector/).
* Ask on Slack: don’t hesitate to ping the team on [Slack](https://slack.airbyte.io).

## **What kind of notifications do I get?**

For the moment, the UI will only display one kind of notification: when a sync fails, Airbyte will display the failure at the source/destination
level in the list of sources/destinations, and in the connection detail page along with the logs.

However, there are other types of notifications:

21 changes: 16 additions & 5 deletions docs/troubleshooting/new-connection.md
@@ -4,9 +4,7 @@ description: Common issues when trying to set up a new connection (source/destin

# Setting up a new connection

## Airbyte is stuck while loading required configuration parameters for my connector

Example of the issue:

@@ -24,11 +22,24 @@ One workaround is to manually pull the latest version of every connector you'll

If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io).

## Connection refused errors when connecting to a local db

Depending on your Docker network configuration, you may not be able to connect to `localhost` or `127.0.0.1` directly.

If you are running into connection refused errors when running Airbyte via Docker Compose on Mac, try using `host.docker.internal` as the host. On Linux, you may have to modify `docker-compose.yml` and add a host that maps to your local machine using [`extra_hosts`](https://docs.docker.com/compose/compose-file/compose-file-v3/#extra_hosts).
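To tell which host name works from inside Docker, you can try a plain TCP connection before filling in the connector form. This Python sketch is a hypothetical helper, not part of Airbyte: run it from inside a container (for example `docker run --rm -it python:3 python`) and pass your database port; 5432, Postgres's default, is used only as an example.

```python
import socket
from typing import List

# Host names commonly tried for a database running on the Docker host.
CANDIDATE_HOSTS = ["localhost", "127.0.0.1", "host.docker.internal"]


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, timeout, or unknown host name
        return False


def reachable_hosts(port: int) -> List[str]:
    """List the candidate hosts that accept TCP connections on `port`."""
    return [host for host in CANDIDATE_HOSTS if can_connect(host, port)]
```

For example, `reachable_hosts(5432)` returning only `["host.docker.internal"]` suggests using that name as the host in the connector form. If the list is empty, the `extra_hosts` approach may be needed.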

## I don’t see a form when selecting a connector

We’ve seen this issue once (no spinner and a 500 HTTP error), and we don’t know the cause. Resolution: try stopping Airbyte (`docker-compose down`) and restarting it (`docker-compose up`).

## Connection hangs when trying to run the discovery step

You may receive the error below when you try to sync a database with a lot of tables (6000 or more).

```bash
airbyte-scheduler | io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: grpc: received message larger than max (<NUMBER> vs. 4194304)
```
There are two GitHub issues tracking this problem: [Issue #3942](https://github.com/airbytehq/airbyte/issues/3942) and [Issue #3943](https://github.com/airbytehq/airbyte/issues/3943).

The workaround is to move the tables you actually need into another namespace.
If you need all the tables, split them across separate namespaces and use a separate connection for each.
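The `4194304` in the error is 4 MiB, gRPC's default maximum size for a received message, which suggests the discovery result for thousands of tables is too large to fit in one message. The Python sketch below estimates a catalog's size and greedily groups tables so each group stays under a byte budget, which can help plan how to split tables across namespaces and connections. The stream shape produced by `fake_table` is a hypothetical, simplified stand-in for Airbyte's real catalog, so treat the byte counts as rough.

```python
import json
from typing import Dict, List

GRPC_MAX_MESSAGE_BYTES = 4_194_304  # 4 MiB, the limit quoted in the error above


def fake_table(i: int, columns: int = 20) -> Dict:
    """Hypothetical, simplified stream entry: a name and a JSON schema."""
    props = {f"col_{c}": {"type": "string"} for c in range(columns)}
    return {"name": f"table_{i}", "json_schema": {"type": "object", "properties": props}}


def json_size(obj: object) -> int:
    """Size in bytes of the compact UTF-8 JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def split_tables(tables: List[Dict], max_bytes: int = GRPC_MAX_MESSAGE_BYTES) -> List[List[Dict]]:
    """Greedily group tables so each group's estimated size stays under max_bytes.

    A single table larger than max_bytes still gets a group of its own.
    """
    groups: List[List[Dict]] = []
    current: List[Dict] = []
    current_bytes = 0
    for table in tables:
        size = json_size(table) + 1  # +1 for the comma between array items
        if current and current_bytes + size > max_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(table)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


tables = [fake_table(i) for i in range(6000)]
print(json_size({"streams": tables}), "bytes for", len(tables), "tables")
print(len(split_tables(tables, max_bytes=1_000_000)), "groups under 1 MB each")
```

Each resulting group could then be moved to its own namespace and synced by its own connection. Leaving headroom below the real limit is sensible, since the actual catalog carries more metadata than this sketch.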
6 changes: 3 additions & 3 deletions docs/troubleshooting/on-deploy.md
@@ -14,7 +14,7 @@ that should handle you getting reset to the beginning.
We would like to see the logs associated with the failure you are seeing. If you run into it again after a reset, we can debug it.


## I have run `docker-compose up` and cannot access the interface

- If you see a blank screen and not a loading icon:

@@ -49,7 +49,7 @@ This command will create a file in the current directory. We advise you to send
If no error is printed in either case, we recommend running: `docker restart airbyte-server airbyte-scheduler` <br>
Wait a few moments and try to access the interface again.

## `docker.errors.DockerException`: Error while fetching server API version

If you see the following error:

@@ -64,6 +64,6 @@ It usually means that Docker isn't running on your machine \(and a running Docke
This sometimes happens on Windows when you first install Docker. You need to restart your machine.


## Getting a weird error related to setting up the Airbyte server when running Docker Compose -- wondering if this is because I played around with Airbyte in a past version?

If you are okay with losing your previous Airbyte configurations, you can run `docker-compose down -v` and then `docker-compose up`, which should fix things.
7 changes: 3 additions & 4 deletions docs/troubleshooting/running-sync.md
@@ -1,7 +1,6 @@
# Syncing a connection


## One of your sync jobs is failing

Several things to check:

@@ -10,7 +9,7 @@ Several things to check:

If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io).

## Your incremental connection is not working

Our current version of incremental sync is [append](../understanding-airbyte/connections/incremental-append.md). It works from a cursor field, so you need to check which cursor field you're using and whether it is populated in every record in your table.
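The role of the cursor can be sketched in Python. This is a simplified model, not Airbyte's implementation: `incremental_append` is hypothetical, it compares with a strict `>`, and it treats a missing cursor value as un-syncable. In this model, records whose cursor field is empty are never selected, which is one way an incremental sync can appear to miss data.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def incremental_append(
    records: List[Record], cursor_field: str, state: Optional[Any]
) -> Tuple[List[Record], Optional[Any], List[Record]]:
    """Select records whose cursor is newer than the saved state.

    Returns (records_to_append, new_state, records_skipped_for_missing_cursor).
    """
    selected: List[Record] = []
    skipped: List[Record] = []
    new_state = state
    for record in records:
        value = record.get(cursor_field)
        if value is None:
            skipped.append(record)  # no cursor value: this model never syncs it
            continue
        if state is None or value > state:
            selected.append(record)
            if new_state is None or value > new_state:
                new_state = value  # remember the largest cursor value seen
    return selected, new_state, skipped
```

On the next run you would pass the returned `new_state` back in, so only records with a larger cursor value are appended.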

@@ -21,7 +20,7 @@ If this is true, then, there are still several things to check:

If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io).

## Airbyte says successful sync, but some records are missing

Several things to check:
