[CT-346] [Bug] Snapshot never succeeds (but does all the work successfully) #4853
Comments
I think I am having the same issue here.
Hey @MaxKrog and @actonmarketing, sorry to hear this is happening. Here are some questions that might help us figure out what is going on.
This issue is only happening in snapshots for me, I have yet to have a snapshot run correctly. Here are the details from a recently failed snapshot run:
Hi @ChenyuLInx, this only affects snapshots; no other models are affected. Absolutely. These are the queries run in BigQuery: All 4 completed, with runtimes:
Sorry that it took a while for us to get back to you on this.
This is when you manually cancel the run, right? And are you also running on BigQuery? @MaxKrog, from your comment I'm almost certain that, for some reason, dbt missed the query finishing and kept thinking the query was still running. I am going to try to reproduce this on my side and troubleshoot it.
I was able to reproduce this locally; what fixed it in my test was removing the description from the config. I don't think
RCA: when a snapshot finishes, it fires an event, then in
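The RCA above is cut off, so the exact dbt code path isn't recoverable here. The general failure class it points to can still be illustrated. A worker thread finishes the real work but raises while reporting completion, and the caller then waits forever. The sketch below is a hypothetical Python illustration of that pattern, not dbt's internals. format_finished_event, run_node and run_snapshot are invented names, and a dict-valued description stands in for whatever made the event step fail in the reproduction.

```python
import threading


def format_finished_event(node_name, description):
    # HYPOTHETICAL stand-in for the event-firing step in the truncated RCA.
    # String concatenation raises TypeError if description is not a str.
    return "Finished snapshot " + node_name + ": " + description


def run_node(node_name, description, done, results, guarded):
    """Worker thread: do the work, then report completion."""
    results["work_done"] = True  # the snapshot table has been written by now
    if guarded:
        try:
            results["message"] = format_finished_event(node_name, description)
        except Exception as exc:
            # Record the failure instead of letting the thread die silently.
            results["error"] = repr(exc)
        finally:
            done.set()  # always wake the caller, success or failure
    else:
        results["message"] = format_finished_event(node_name, description)
        done.set()  # skipped if the line above raises, so the caller never wakes


def run_snapshot(description, guarded, wait_seconds=1.0):
    """Caller: start the worker and wait for its completion signal."""
    done = threading.Event()
    results = {}
    worker = threading.Thread(
        target=run_node,
        args=("my_snapshot", description, done, results, guarded),
    )
    worker.start()
    # A real caller might wait with no timeout at all; the timeout keeps this demo finite.
    finished = done.wait(timeout=wait_seconds)
    worker.join()
    return finished, results
```

With guarded=False and a dict description, the work is recorded as done but the completion signal never arrives. That resembles the symptom in this issue, where the table is written but the CLI never moves on. With guarded=True, the caller is always woken and the error is reported instead of swallowed.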
@ChenyuLInx I don't manually cancel the run; it times out after ~24 hours and cancels itself. :( I am not running on BigQuery; it's PostgreSQL on an Amazon RDS database. Here's a sample of my snapshot file. I don't have a
One of my snapshots ran successfully, but two other snapshots failed with an "EOF Detected" error. Working snapshot:
Failed snapshot:
@actonmarketing Sorry, I didn't realize your issue is actually different.
Are these two describing the same failure? The timeout one sounds a lot like something broke in a thread and we didn't handle it properly. The proper way to solve this kind of issue is to fix #4357. @jtcohen6, should we try to prioritize it? In the meantime, @actonmarketing, your case looks tricky to reproduce on my side; can you try to set this value to
@ChenyuLInx yes, those two errors are the same failure. I don't see a
@actonmarketing, sorry that it took me a while to get back to this! I was thinking maybe you could set up
Thanks again for reporting this!
@ChenyuLInx I am still struggling to get dbt running locally, so here are the logs, some sample data, and the steps I take. The logs are from a recently failed dbt snapshot job. Note that the sample data is only 500 lines, whereas my opp_source_xf table has
@actonmarketing I am sorry to hear that.
Is there any additional information in that popup? And do you know which software the popup came from? Another thing I can see from the log is that the snapshot query didn't seem to finish. Have you tried running that query directly against your data warehouse to see what happens?
@ChenyuLInx the popup is in the dbt Web GUI. I can get a screenshot if that is helpful. How would I go about running it directly against my data warehouse?
@actonmarketing I took a look at the log again and noticed something new. The 4 snapshots that didn't succeed got cancelled after roughly 24 hours.
I'm also guessing you are running this in the Cloud IDE? I remember there may be a hard limit of 24 hours for a command to run, so it might be getting cancelled because of that. I forwarded the issue to the customer support team, but you might also want to contact them directly.
@ChenyuLInx I have been using the Cloud IDE; however, I FINALLY got dbt running locally and am running the snapshot command there to see what happens. Thank you for forwarding this to support. I will definitely reach out as well.
@ChenyuLInx here are the results from running the snapshot command above locally: dbt snapshot -s sfdc_opportunity_snapshot
Hey @actonmarketing, sorry for the late reply, and thanks for making the effort to run this locally. From what I gathered from @jeremyyeo (thanks!!)
@ChenyuLInx do I need to contact support directly to get that limit increased? My thought is that once the backfill job runs (one time), the incremental snapshots will be easy. There's just a LOT of data for the job to process at once. I can also temporarily increase the data limits on my database so that both sides of the coin are working together.
@actonmarketing yes, that would be great! Another thing you can try is to run those snapshot models locally once, since you already have the local dbt setup. Running one snapshot at a time might also help. I am not super familiar with the Postgres architecture, but I would imagine that running all of the big snapshot models together could put a lot of stress on the instance and increase the chance of them failing.
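Running the snapshots one at a time can be scripted. Below is a minimal Python sketch. It assumes dbt is installed and on PATH, and it calls dbt snapshot --select <name> once per snapshot; --select is the long form of the -s selection used earlier in this thread. The helper name run_snapshots_sequentially is hypothetical. Any snapshot names other than sfdc_opportunity_snapshot are placeholders.

```python
import subprocess
from typing import Callable, List, Tuple


def run_snapshots_sequentially(
    names: List[str],
    run: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[Tuple[str, int]]:
    """Invoke `dbt snapshot --select <name>` once per snapshot, one after another."""
    results = []
    for name in names:
        # Each call blocks until dbt exits, so only one snapshot runs at a time.
        completed = run(["dbt", "snapshot", "--select", name])
        # Keep going after a failure and record each exit code for review.
        results.append((name, completed.returncode))
    return results
```

The run parameter exists only so that the loop can be exercised without a database. In real use, call run_snapshots_sequentially(["sfdc_opportunity_snapshot", ...]) with the default. Within a single invocation, dbt's --threads 1 option should also keep snapshots from running concurrently.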
@ChenyuLInx if I run them locally, how do I get them to sync back to the cloud? (Forgive me, I'm sort of new at all of this.)
@actonmarketing No problem!! I don't think you will need to sync anything to the cloud again. And some tips from @dbeatty10
Nice tip @dbeatty10. For the record:
This tells me that the Postgres/RDS instance has got a … What could be happening on the dbt Cloud side is that this termination is not handled properly. From the dbt Cloud job's perspective, it keeps waiting for the job to complete, runs into the dbt Cloud job maximum of 24 hours, and then gets cancelled.
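The failure mode described here is an unbounded wait: the connection is terminated, but the job keeps waiting until an outer 24-hour limit cancels it. Below is a minimal Python sketch of bounding such a wait on the client side. It is an illustration only, not how dbt or dbt Cloud is implemented. run_query and run_with_deadline are hypothetical names, and time.sleep stands in for a long-running query.

```python
import concurrent.futures
import time


def run_query(seconds):
    # HYPOTHETICAL stand-in for a long-running snapshot query.
    time.sleep(seconds)
    return "done"


def run_with_deadline(seconds, deadline):
    """Return run_query's result, or None if it misses the deadline (in seconds)."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(run_query, seconds)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Stop waiting without blocking on the straggler. Python cannot kill the
        # worker thread, so a real client would also have to cancel the query
        # on the database side.
        executor.shutdown(wait=False)
```

The design point is that the waiting side gets its own deadline. It then fails fast with a clear result, instead of depending on a much larger outer limit to clean up.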
@jeremyyeo thank you for all your help. @dbeatty10, great idea, testing now! I checked in RDS and my … I am testing smaller runs with LIMIT clauses coded into the snapshots, and testing some of the other snapshots. One of them ran successfully, so I know the snapshots as a concept are working. :) Thank you all for your help, and I will report back with the results.
Status Update:
It still won't run... I validated in a Postgres console that dbt support raised my query timeout to 48 hours for the next week, so I am running the snapshots locally in a Git console to see whether I can get some logs or a successful run. Thanks again for all of your patience and help, everyone!
So, there was something faulty with the original snapshot files. I created direct copies of them (without limits), and they ran just fine in a few seconds. I guess this case is closed for me. Thanks for all of the help.
@actonmarketing glad it works out!!
@ChenyuLInx
Thanks @actonmarketing, I am going to close this since we have everything resolved.
Is there an existing issue for this?
Current Behavior
When I run
dbt snapshot
my snapshot model runs, but the CLI never moves on to the following task. As far as I can tell, all the snapshot work is done successfully (the snapshot table is appended to, etc.). This happens both locally and on dbt Cloud, with exactly the same behavior.
Expected Behavior
I expect the CLI to either return me to an empty prompt or continue with the next command.
Steps To Reproduce
Wait an eternity (until a timeout of some sort; in dbt Cloud I think it's 13 hours) until the job is cancelled.
Relevant log output
Environment
What database are you using dbt with?
bigquery
Additional Context
Snapshot model code: