🐛 Airbyte Core: Large schema fetching failure #4564
Comments
@sherifnada @po3na4skld my initial guess seems right that this is the same issue we have in https://github.com/airbytehq/airbyte/pull/4175/files
@sherifnada updated the title and the description
Related oncall issues:
Is this still an issue?
Yes, unless someone explicitly fixed it recently. I haven't seen any related changes or tests.
Possible Solutions:
There's some good additional information (Temporal message size, for one) in #3943, which has been closed as a duplicate of this issue.
@evantahler you could also add support for
Linking #15888, which is part of the way we solve this problem
Any update on this, or is the process still to create a separate user with limited scope?
cc @malikdiarra, as the Compose team is looking into this
+1 on this. Also having the same issue trying to fetch 1200+ tables on an Oracle DB.
+1. I am also having this issue with a MySQL DB when it tries discover_schema: 502 Bad Gateway. I have 1000+ tables in the DB.
@malikdiarra Any update on this?
Same boat as @arthurbarros. Any news on this? I tried everything, and no environment variable changes the fact that Discovery fails after 5 minutes. I also tried setting the worker's missing environment variables in
There's also the … I also tried setting … Using Airbyte …
@malikdiarra are you all looking into this case, or is it on hold?
The only workaround I found that works is to create multiple users on the Oracle DB and grant each permission to list only a subset of tables, then create a separate Oracle DB connection for each of those users. It's ugly, but it works.
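The workaround above can be sketched roughly as follows. This is illustrative only: the table and user names are made up, and it simply generates the Oracle-style GRANT statements you would run so each read-only Airbyte user sees a manageable subset of tables.

```python
# Hypothetical sketch of the multiple-users workaround: spread a large
# table list across several read-only users so each Airbyte connection
# discovers only a subset. All names here are illustrative.

def grant_statements(tables, users):
    """Return Oracle-style GRANT statements, assigning tables
    round-robin across the given users."""
    statements = []
    for i, table in enumerate(tables):
        user = users[i % len(users)]
        statements.append(f"GRANT SELECT ON {table} TO {user}")
    return statements

tables = [f"APP.TABLE_{n}" for n in range(1, 7)]   # stand-in for 1000+ tables
users = ["AIRBYTE_RO_1", "AIRBYTE_RO_2", "AIRBYTE_RO_3"]
for stmt in grant_statements(tables, users):
    print(stmt)
```

Each user then backs its own Airbyte source, so no single discover call has to enumerate the full schema.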
cc @pmossman - we've made some discovery/temporal improvements lately
@evantahler Such as?
Part of the problem here was that until recently, the Airbyte platform could not handle discovered catalogs over ~4 MB, due to a limitation in Temporal. We recently changed how we pass information between jobs, which might help alleviate this.
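To make the ~4 MB limitation concrete, here is a minimal sketch (not Airbyte's actual code) of checking whether a serialized catalog would fit under such a payload cap before handing it to a message-passing layer:

```python
# Illustrative only: assumes a hard payload cap of roughly 4 MB, as
# described in the comment above. A catalog with thousands of streams
# can easily exceed this once serialized.
import json

PAYLOAD_CAP = 4 * 1024 * 1024  # ~4 MB, approximate assumed limit

def fits_in_payload(catalog: dict, cap: int = PAYLOAD_CAP) -> bool:
    """Return True if the JSON-serialized catalog is under the cap."""
    return len(json.dumps(catalog).encode("utf-8")) < cap

# A toy catalog with many streams grows toward the limit quickly.
catalog = {"streams": [{"name": f"table_{n}", "json_schema": {"type": "object"}}
                       for n in range(1000)]}
print(fits_in_payload(catalog))
```

The platform-side fix referenced above amounts to no longer shipping the whole catalog through the size-limited channel, rather than raising the cap.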
Currently, for me on Cloud, the discover_schema API gave the response below (final line). We have 2500+ tables on MySQL, but after that the screen is frozen. For now, the only solution that works is to create multiple users on the DB with limited access, as @arthurbarros said. Any dates on when these changes will be released to stable, @evantahler?
@surajmaurya14 the change to how we pass Temporal data is live on Cloud, but there may be another bottleneck somewhere in our system when handling such a large catalog. Could you share your Cloud workspace ID and the source name where you see the frozen screen so we can investigate where the bottleneck is? (Feel free to email it to me at
Wrote an email to you @pmossman
Thanks @surajmaurya14, I was able to reproduce the issue and investigate our Temporal cluster while the discovery job was running to see where the failure originated. In this case, we set a hard cap of 9 minutes of execution time for Discover jobs in Airbyte Cloud. I think this catalog is so large that it takes longer than 9 minutes to generate, so Temporal terminates the job at the 9-minute mark before it finishes. I can follow up with a few folks internally to see if we can either:
@surajmaurya14 I passed this feedback along to our database sources team, and they recommended increasing the 9-minute timeout to 30 minutes to see if your use case eventually succeeds. I made this change today, so our Temporal workers in Airbyte Cloud will now keep Discover jobs running for up to 30 minutes before terminating them. Obviously this is just a stopgap, and it'd be ideal to optimize this source for cases with thousands of tables, but I'm hoping it unblocks you and gives us more insight into where the bottleneck may be. Can you give things another try and let me know how it goes? If the job still freezes or times out after 30 minutes, we'll likely need to investigate the particular source connector to see where things are getting stuck.
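The hard execution cap being discussed can be illustrated with stdlib tools. This is a minimal sketch of the behavior, not Airbyte's Temporal configuration: a discover-style job runs in a worker and is abandoned once the deadline passes.

```python
# Sketch of a hard job-execution cap: run a job in a worker thread and
# give up past a deadline, as the 9/30-minute Temporal cap does for
# Discover jobs. Timeouts here are seconds for demonstration.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def run_with_deadline(job, timeout_s):
    """Run `job` and return its result, or 'timed out' past the deadline."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(job)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            return "timed out"

fast_discover = lambda: "catalog ready"
slow_discover = lambda: (time.sleep(2), "catalog ready")[1]

print(run_with_deadline(fast_discover, timeout_s=1))    # completes in time
print(run_with_deadline(slow_discover, timeout_s=0.1))  # hits the cap
```

Raising the cap (9 → 30 minutes here) only changes the deadline; a sufficiently large catalog can still exceed it, which is why the cap is called a stopgap.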
@pmossman Server temporarily unavailable error.
Thanks @surajmaurya14, I did some more digging and here's what I found:

Your discover catalog jobs are now able to finish in Temporal, so the increase to 30 minutes on the Temporal side helped.

I also observed 502 errors before the 10-minute mark is reached, which seems to correspond with new code deploys that cause server pods to restart. So even if we could raise the maximum request time from 10 minutes to 30 minutes, we deploy code so often that there's a high likelihood the server would drop the request before it could complete.

We have an ongoing project to convert the Discover API to an async model, so that our server no longer needs to keep an active thread open for the entire duration of the Discover job. This project should address the fundamental issue at hand; I'll make a note to tag you when it lands so we can make sure your use case is finally unblocked.

Thanks for the back and forth here. I know it must be frustrating that the app isn't working for you right now, but this iteration is extremely helpful for improving the platform and I really appreciate your involvement!
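The async model described in that comment boils down to a submit-and-poll pattern. This is a hedged, self-contained sketch with made-up names (`submit_discover`, `get_discover_status` are not Airbyte's actual API): the server returns a job id immediately and the client polls, so no HTTP thread stays open for the whole run.

```python
# Sketch of an async submit/poll Discover API: the server kicks off the
# job in the background and returns an id; the client polls for status
# instead of holding a request open for the job's full duration.
import threading
import time
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...}

def submit_discover(run_discover):
    """Start the discover job in the background and return a job id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}

    def worker():
        jobs[job_id]["result"] = run_discover()
        jobs[job_id]["status"] = "succeeded"

    threading.Thread(target=worker).start()
    return job_id

def get_discover_status(job_id):
    return jobs[job_id]["status"]

job_id = submit_discover(lambda: {"streams": ["users", "orders"]})
while get_discover_status(job_id) == "running":
    time.sleep(0.01)  # client polls; no server thread is pinned
print(jobs[job_id]["result"])
```

Under this shape, a server restart mid-job drops the worker but not the client's request path, which is why it addresses the 502-on-deploy failure mode described above.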
Same issue here, |
Same here with an Oracle Database |
Same here. We have tried increasing the Temporal message limit and HTTP timeouts to no effect, trying to sync MSSQL to Databricks. However, an old version of Airbyte (0.44.2) running against the same MSSQL database and Databricks does not have this issue. Both are running on K8s.
Also experiencing this issue with a large Oracle DB
Hey @evantahler is there any ETA? :) thanks |
cc @davinchia |
Environment
Current Behavior
When fetching a large schema from the source, it fails with the following error, which I found in the `docker-compose up` logs output:
It has nothing to do with the source itself; I used this source only to reproduce the issue.
It runs until the daily request quota is exhausted, and then schema fetching fails due to rate limits.
Expected Behavior
The size of the schema shouldn't affect how the UI works.
Logs
error_log.txt
Steps to Reproduce
I also tried this with the streams filtered out, and it worked well.
Are you willing to submit a PR?
No