Retry dags folder upload, if upload fails (except failures due to client side issues). Total wait time should be around 2 minutes #1510
Conversation
Codecov Report

@@ Coverage Diff @@
##             main    #1510      +/-   ##
==========================================
+ Coverage   85.92%   85.98%   +0.05%
==========================================
  Files         112      112
  Lines       14838    14885      +47
==========================================
+ Hits        12750    12799      +49
+ Misses       1252     1251       -1
+ Partials      836      835       -1

View full report in Codecov by Sentry.
LGTM
minor comment, looks good otherwise
pkg/fileutil/files.go
}
for i := 1; i <= args.MaxTries; i++ {
	// exponential backoff
	time.Sleep(time.Duration(retryDelayInMS) * time.Millisecond)
why do we need to sleep on the first try itself 😕?
Also, might be good to log that we would be sleeping for x seconds before retrying
I'll address the 1st point.
2nd point: we discussed it but decided that we don't want to expose any extra logs to the user. We are already showing "please wait, attempting to upload the dags".
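One possible shape for that reordering, sketched against the identifiers in the diff above; attemptUpload is a hypothetical stand-in for the real upload call, not a function from this PR:

// Sketch: try first, back off only between retries.
for i := 1; i <= args.MaxTries; i++ {
	if err := attemptUpload(); err == nil {
		break // succeeded, no sleep needed at all
	}
	if i < args.MaxTries {
		// sleep only before a retry, never before the first attempt
		time.Sleep(time.Duration(retryDelayInMS) * time.Millisecond)
	}
}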
IMHO showing a "Retrying..." type of message to the user would be useful.
@danielhoherd We are already showing "please wait, attempting to upload the dags"
Just curious... why 7 times?
Also wondering: if for whatever reason the API returns 500, does it make sense to try 7 times per deploy, per deployment? Would this generate a huge volume of alerts for us?
@kushalmalani This count was selected because, as per @pgvishnuram, the dag-server should come online in about 2 minutes. Also, we don't want to keep the user waiting for longer. I've changed the description for better clarity.
We don't have any alerting on the dag-server component.
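For intuition on how 7 tries adds up to roughly 2 minutes, here is a rough back-of-the-envelope sketch; the 1-second base delay and the doubling factor are assumptions for illustration, not necessarily the constants used in this PR:

package main

import "fmt"

func main() {
	// Assumed schedule: 1s base delay, doubling before each of 7 tries.
	delayMS, totalMS := 1000, 0
	for i := 1; i <= 7; i++ {
		totalMS += delayMS
		delayMS *= 2
	}
	// 1+2+4+8+16+32+64 = 127s, i.e. roughly 2 minutes.
	fmt.Printf("total wait: %ds\n", totalMS/1000)
}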
If you performed functional testing locally, can you attach some screenshots for confidence?
Done, added
…er (#1518)
* astro deploy should upload the current directory dags to the dag server
* Fixed failing test
* Fixed failing test
* Added a couple of tests
* Added houstonMock.AssertExpectations(t)
Description
Retry the dags folder upload if it fails (except for failures due to client-side issues). Total wait time should be around 2 minutes.
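A minimal sketch of the behavior described here, not the PR's actual code: the name uploadWithRetry, the 4xx status check, and the backoff constants are all assumptions for illustration.

package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// uploadWithRetry retries an upload with exponential backoff, but gives up
// immediately on client-side (4xx) failures, which would fail identically
// on every attempt.
func uploadWithRetry(upload func() (*http.Response, error), maxTries int) error {
	delay := time.Second
	for i := 1; i <= maxTries; i++ {
		resp, err := upload()
		if err == nil && resp.StatusCode < 400 {
			return nil
		}
		if err == nil && resp.StatusCode >= 400 && resp.StatusCode < 500 {
			return fmt.Errorf("upload rejected, not retrying: %s", resp.Status)
		}
		if i < maxTries {
			time.Sleep(delay)
			delay *= 2 // exponential backoff
		}
	}
	return errors.New("dags upload failed after all retries")
}

func main() {
	// Stub that always fails, to exercise the retry path.
	err := uploadWithRetry(func() (*http.Response, error) {
		return nil, errors.New("dag-server not up yet")
	}, 3)
	fmt.Println(err)
}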
🎟 Issue(s)
Related issue
https://github.com/astronomer/issues/issues/6110
🧪 Functional Testing
Locally
📸 Screenshots
📋 Checklist
* make test before taking out of draft
* make lint before taking out of draft