
🐛 Airbyte Core: Large schema fetching failure #4564

Open
po3na4skld opened this issue Jul 6, 2021 · 32 comments

Comments

@po3na4skld
Contributor

po3na4skld commented Jul 6, 2021

Environment

  • Airbyte version: 0.26.4-alpha
  • OS Version / Instance: macOS 11.2.3
  • Deployment: Docker
  • Source Connector and version: Salesforce 0.2.3
  • Destination Connector and version: Local CSV 0.2.7, BigQuery 0.3.7
  • Severity: High
  • Step where error happened: Setup new connection

Current Behavior

When fetching a large schema from the source, it fails with the following error, found in the docker-compose up logs output. It has nothing to do with the source itself; I only used Salesforce to surface this issue.

org.glassfish.jersey.server.ServerRuntime$Responder writeResponse
SEVERE: An I/O error has occurred while writing a response message entity to the container output stream.

It keeps running until the daily request quota is exhausted, at which point schema fetching fails due to API limits.

Expected Behavior

The size of the schema shouldn't affect whether the UI works.

Logs

error_log.txt

Steps to Reproduce

  1. Start the Airbyte project with docker compose up
  2. Create a Salesforce source
  3. Create a Local CSV or BigQuery destination
  4. Set up a connection
  5. Observe the error in the docker compose up command output

I also tried this with most of the streams filtered out, and it worked well.

Are you willing to submit a PR?

No

@keu
Contributor

keu commented Jul 6, 2021

@sherifnada @po3na4skld my initial guess seems right that this is the same issue we have in https://github.com/airbytehq/airbyte/pull/4175/files; I think the fix can be the same.

@po3na4skld po3na4skld self-assigned this Jul 6, 2021
@po3na4skld po3na4skld changed the title Source Salesforce: Schema fetching failure Airbyte Core: Large schema fetching failure Jul 6, 2021
@po3na4skld
Contributor Author

@sherifnada updated the title and the description

@sherifnada sherifnada added the area/platform issues related to the platform label Jul 6, 2021
@po3na4skld po3na4skld removed their assignment Jul 8, 2021
@grishick grishick changed the title Airbyte Core: Large schema fetching failure 🐛 Airbyte Core: Large schema fetching failure Jul 1, 2022
@grishick
Contributor

grishick commented Jul 1, 2022

@evantahler
Contributor

Is this still an issue?

@grishick
Contributor

grishick commented Aug 1, 2022

> Is this still an issue?

Yes, unless someone explicitly fixed it recently. I haven't seen any related changes or tests.

@evantahler
Contributor

evantahler commented Aug 1, 2022

Possible Solutions:

There's some good additional information (Temporal message size, for one) in #3943, which has been closed as a duplicate of this issue.

@keu
Contributor

keu commented Aug 2, 2022

@evantahler you could also add support for $ref and optimize the payload
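
For illustration, a minimal sketch of what keu's $ref suggestion could look like: instead of inlining an identical sub-schema into every stream of a large discovered catalog, define it once and reference it. The stream and field names below are invented for the example.

address = {  # shared sub-schema, defined once under "definitions"...
    "type": "object",
    "properties": {"street": {"type": "string"}, "city": {"type": "string"}},
}

stream_schema = {
    "definitions": {"address": address},
    "type": "object",
    "properties": {
        # ...and referenced wherever it recurs, instead of being inlined
        # into every one of thousands of stream schemas.
        "billing_address": {"$ref": "#/definitions/address"},
        "shipping_address": {"$ref": "#/definitions/address"},
    },
}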

@evantahler
Contributor

Linking #15888, which is part of the way we'll solve this problem.

@cody-scott

Any update on this, or is the process still to create a separate user with limited scope?

@evantahler
Contributor

cc @malikdiarra, as the Compose team is looking into this

@arthurbarros

> Any update on this, or is the process still to create a separate user with limited scope?

+1 on this. Also having the same issue trying to fetch 1200+ tables on Oracle DB.

@surajmaurya14

+1. I am having this issue for a MySQL DB as well: discover_schema fails with a 502 Bad Gateway. The DB has 1000+ tables.
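
For reference, discovery can be driven directly against the server API to reproduce this outside the UI. A minimal sketch in Python, assuming the OSS config API path /api/v1/sources/discover_schema behind the default proxy port; the sourceId is a placeholder. On large databases this is the request that comes back as a 502 when the proxy gives up before discovery finishes.

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/sources/discover_schema",
    json={"sourceId": "<your-source-uuid>"},
    timeout=1800,  # generous client-side timeout; the 502 comes from the proxy, not this client
)
print(resp.status_code, len(resp.content))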

@surajmaurya14

surajmaurya14 commented Nov 17, 2023

> +1. I am having this issue for a MySQL DB as well: discover_schema fails with a 502 Bad Gateway. The DB has 1000+ tables.

@malikdiarra any update on this?

@philippeboyd
Contributor

philippeboyd commented Nov 20, 2023

Same boat as @arthurbarros

Any news on this? I've tried everything, and no environment variable changes the fact that Discovery fails after 5 minutes.

[screenshot]

I also tried setting the worker's missing environment variables in .env and linking them in docker-compose.yaml for the worker:

# Worker
ACTIVITY_CHECK_TIMEOUT=15
ACTIVITY_DISCOVERY_TIMEOUT=30

[screenshot]

There's also the BASIC_AUTH_PROXY_TIMEOUT environment variable (seen in linked issue #19201) for nginx timeouts, but that's already set to 900 seconds, i.e. 15 minutes.

[screenshot]

I also tried setting WORKLOAD_API_READ_TIMEOUT_SECONDS=1200, with no success.

Using Airbyte v0.50.34

@surajmaurya14

@malikdiarra are you all looking into this case, or is it on hold?

@arthurbarros

arthurbarros commented Dec 12, 2023

> Any news on this? I've tried everything, and no environment variable changes the fact that Discovery fails after 5 minutes. […]

The only workaround I found that works is to create multiple users on the Oracle DB, each with permission to list just a subset of the tables, and then set up a separate Oracle DB connection for each of those users.

It's ugly, but it works.
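
For anyone replicating this workaround, a minimal sketch using the python-oracledb driver; the user names, passwords, DSN, and table subsets are hypothetical, and the statements would be run with DBA privileges.

import oracledb

# Hypothetical table subsets; each user becomes its own Airbyte source,
# so discovery only ever sees a fraction of the tables.
SUBSETS = {
    "AIRBYTE_SALES": ["SALES.ORDERS", "SALES.INVOICES"],
    "AIRBYTE_HR": ["HR.EMPLOYEES"],
}

with oracledb.connect(user="admin", password="<dba-password>", dsn="db-host:1521/ORCLPDB1") as conn:
    cur = conn.cursor()
    for user, tables in SUBSETS.items():
        cur.execute(f'CREATE USER {user} IDENTIFIED BY "<strong-password>"')
        cur.execute(f"GRANT CREATE SESSION TO {user}")
        for table in tables:
            # SELECT-only grants keep each user's visible schema small.
            cur.execute(f"GRANT SELECT ON {table} TO {user}")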

@evantahler
Contributor

cc @pmossman - we've made some discovery/Temporal improvements lately

@philippeboyd
Contributor

@evantahler Such as?
I'm looking at the release changelogs; what kind of improvements should we be looking for?

@evantahler
Contributor

Part of the problem here was that until recently, the Airbyte platform could not handle discovered catalogs over ~4 MB, due to a limitation in Temporal. We've recently changed how we pass information between jobs, which might help alleviate this.
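
For context, the general pattern for working around a message-size cap like this is to persist the large result out of band and pass only a small reference between jobs. A minimal sketch of that idea in Python; the storage callback is a stand-in for an object store, not Airbyte's actual implementation.

import json
import uuid

def store_catalog(catalog: dict, put_object) -> str:
    """Persist the full catalog externally; return only a short reference key."""
    key = f"catalogs/{uuid.uuid4()}.json"
    put_object(key, json.dumps(catalog).encode())  # e.g. an S3/GCS client call
    return key

# Only this short key crosses the job boundary instead of a multi-megabyte
# catalog; the next job fetches the full document by key.
blob_store: dict[str, bytes] = {}
key = store_catalog({"streams": []}, blob_store.__setitem__)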

@surajmaurya14

Currently, for me on Cloud, the discover_schema API gave the below response (final line):
Discover primary keys for tables: [.....]

We had 2500+ tables on MySQL.

But after that, the screen freezes:
[screenshot]

For now, the only solution that works is to create multiple users on the DB with limited access, as @arthurbarros said.

@evantahler, any dates for when these changes will be released to stable?

@pmossman
Contributor

pmossman commented Jan 2, 2024

@surajmaurya14 the change to how we pass Temporal data is live on Cloud, but there may be another bottleneck somewhere in our system when handling such a large catalog.

Could you share your Cloud workspace ID and source name where you see the frozen screen so we can investigate where the bottleneck is? (Feel free to email it to me at parker@airbyte.io or message it to me on the Airbyte Slack, I'm @Parker Mossman there)

@surajmaurya14

surajmaurya14 commented Jan 5, 2024

> Could you share your Cloud workspace ID and source name where you see the frozen screen so we can investigate where the bottleneck is? […]

Wrote an email to you @pmossman

@pmossman
Contributor

pmossman commented Jan 5, 2024

Thanks @surajmaurya14, I was able to reproduce the issue and investigate our Temporal cluster while the discovery job was running to see where the failure originated from.

In this case, we set a hard cap of 9 minutes' execution time for Discover jobs in Airbyte Cloud. I think this catalog is so large that it takes longer than 9 minutes to generate, so Temporal terminates the job at the 9-minute mark before it finishes. I can follow up with a few folks internally to see if we can either:
(a) increase the 9-minute threshold to give cases like this more time, or
(b) investigate this particular source to see if there's an optimization we can make for large table counts (since 9+ minutes is obviously a poor user experience!).
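
For readers unfamiliar with Temporal, the cap described above corresponds to an activity timeout. A sketch in Temporal's Python SDK terms, illustrative only (Airbyte's platform code is Java, and the workflow/activity names here are invented):

from datetime import timedelta
from temporalio import workflow

@workflow.defn
class DiscoverWorkflow:
    @workflow.run
    async def run(self, source_config: dict) -> str:
        return await workflow.execute_activity(
            "discover_catalog",  # activity registered on the worker
            source_config,
            # The hard cap: Temporal terminates the activity if it runs longer.
            start_to_close_timeout=timedelta(minutes=30),
        )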

@pmossman
Contributor

pmossman commented Jan 9, 2024

@surajmaurya14 I passed along this feedback to our database sources team, and they recommended we try increasing the 9 minute timeout to 30 minutes to see if your use case eventually succeeds.

I made this change today, so our Temporal workers in Airbyte Cloud will now keep Discover jobs running up to 30 minutes before terminating.

Obviously this is just a stopgap, and it'd be ideal to optimize this source for cases where we have thousands of tables, but I'm hoping this unblocks you and gives us some more insight into where the bottleneck may be.

Can you give things another try and let me know how it goes? If the job still freezes/times out after 30 minutes, we'll likely need to do more investigation into the particular source connector to see where things are getting stuck.

@surajmaurya14

@pmossman I'm now getting a "Server temporarily unavailable" error.
I wrote a reply on the same email thread.

@pmossman
Contributor

pmossman commented Jan 9, 2024

Thanks @surajmaurya14, I did some more digging and here's what I found:

  • Your discover catalog jobs are now able to finish in Temporal, so the increase to 30 minutes on the Temporal side helped.
  • However, our load balancer is configured with a maximum request time of 10 minutes, which means that even if the Discover job eventually succeeds, the server issues a 502 before it can finish processing the network request.

I also observed 502 errors before the 10 minute mark is reached, which seems to correspond with new code deploys that cause server pods to restart. So even if we could raise the maximum request time from 10 minutes to 30 minutes, we deploy code so often that there's a high likelihood the server would drop the request before it could complete.

We have an ongoing project to convert the Discover API to an async model, so that our server no longer needs to keep an active thread open for the entire duration of the Discover job. This project should address the fundamental issue at hand here; I'll make a note to tag you when it lands so we can make sure your use case is finally unblocked.
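
For illustration, an async model like the one described lets the client submit discovery and then poll, so no single HTTP request has to outlive the load balancer's limit. A hypothetical sketch in Python; these endpoints and fields are invented, not Airbyte's actual API.

import time
import requests

API = "http://localhost:8000/api/v1"

job = requests.post(f"{API}/sources/discover_schema_async", json={"sourceId": "<uuid>"}).json()
while True:
    status = requests.get(f"{API}/jobs/{job['jobId']}").json()
    if status["status"] in ("succeeded", "failed"):
        break
    time.sleep(10)  # each poll is a short request, well under any proxy timeout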

Thanks for the back and forth here. I know it must be frustrating that the app isn't working for you right now, but this iteration is extremely helpful for improving the platform, and I really appreciate your involvement!

@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024
@davinchia davinchia removed the frozen Not being actively worked on label Mar 25, 2024
@pedrohsroque

Same issue here, trying to sync MySQL to Snowflake.

@shmf
Contributor

shmf commented Apr 16, 2024

Same here with an Oracle Database

@plenti-jacob-roe
Contributor

Same here; we have tried increasing the Temporal message limit and HTTP timeouts, to no effect.

Trying to sync MSSQL to Databricks.

However, we have an old version of Airbyte (0.44.2) running and don't have this issue on that version with the same MSSQL database and Databricks destination. Both are running on K8s.

@jonodutch

Also experiencing this issue with a large Oracle DB.

@shmf
Contributor

shmf commented Apr 18, 2024

Hey @evantahler is there any ETA? :) thanks

@evantahler
Contributor

cc @davinchia
There's no ETA as of yet, but dealing with large catalogs is on our near-term roadmap.
