Retry dags folder upload, if upload fails (except failures due to client side issues). Total wait time should be around 2 minutes #1510

Merged
kushalmalani merged 8 commits into main from the 6110 branch on Feb 2, 2024

Conversation

rujhan-arora-astronomer (Contributor) commented Jan 31, 2024

Description

Retry the dags folder upload if the upload fails (except for failures due to client-side issues). The total wait time should be around 2 minutes.
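
As a minimal sketch of what "client-side issues" could mean in practice, assuming the failure surfaces an HTTP status code (the helper name and signature below are illustrative, not the CLI's actual API): 4xx responses point at the request itself, so retrying cannot help, while 5xx responses and transient failures are worth retrying.

package example

// isClientSideError is a hypothetical helper, not part of the actual PR:
// it treats 4xx responses as client-side failures that should not be retried,
// while 5xx responses (and transient network errors, handled elsewhere) are retried.
func isClientSideError(statusCode int) bool {
	return statusCode >= 400 && statusCode < 500
}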

🎟 Issue(s)

Related issue
https://github.com/astronomer/issues/issues/6110

🧪 Functional Testing

Locally

📸 Screenshots

Screenshot 2024-02-02 at 1 40 33 PM

📋 Checklist

  • Rebased from the main (or release if patching) branch (before testing)
  • Ran make test before taking out of draft
  • Ran make lint before taking out of draft
  • Added/updated applicable tests
  • Tested against Astro-API (if necessary)
  • Tested against Houston-API and Astronomer (if necessary)
  • Communicated to/tagged owners of respective clients potentially impacted by these changes
  • Updated any related documentation

rujhan-arora-astronomer marked this pull request as ready for review on January 31, 2024 at 13:09
rujhan-arora-astronomer changed the title from "Retry dags folder upload 7 times, if upload fails" to "Retry dags folder upload 7 times, if upload fails (except failures due to client side issues)" on Jan 31, 2024

codecov bot commented Jan 31, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison: base (5667296) 85.92% vs. head (5f225e2) 85.98%.

Files                      Patch %   Lines
pkg/fileutil/files.go      96.00%    1 Missing and 1 partial ⚠️
software/deploy/deploy.go  95.83%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1510      +/-   ##
==========================================
+ Coverage   85.92%   85.98%   +0.05%     
==========================================
  Files         112      112              
  Lines       14838    14885      +47     
==========================================
+ Hits        12750    12799      +49     
+ Misses       1252     1251       -1     
+ Partials      836      835       -1     


danielhoherd (Member) left a comment:

LGTM

neel-astro (Contributor) left a comment:

minor comment, looks good otherwise

Review thread on the retry loop:

}
for i := 1; i <= args.MaxTries; i++ {
	// exponential backoff
	time.Sleep(time.Duration(retryDelayInMS) * time.Millisecond)
neel-astro (Contributor):
Why do we need to sleep on the first try itself 😕?
Also, it might be good to log that we will be sleeping for x seconds before retrying.

rujhan-arora-astronomer (Contributor, Author):
I'll address the first point.
On the second point: we discussed it, but decided that we don't want to expose any extra logs to the user. We are already showing "please wait, attempting to upload the dags".
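
To illustrate the first point, a minimal sketch of a loop that sleeps only between attempts rather than before the first one. It borrows the retryDelayInMS naming and the exponential-backoff comment from the snippet above, but the function, the attemptUpload callback, and the in-loop doubling of the delay are assumptions rather than the PR's actual code:

package example

import "time"

// Sketch only: retryWithBackoff and attemptUpload are hypothetical stand-ins,
// and doubling retryDelayInMS inside the loop is an assumption, not the PR's change.
func retryWithBackoff(maxTries, retryDelayInMS int, attemptUpload func() error) error {
	var err error
	for i := 1; i <= maxTries; i++ {
		if err = attemptUpload(); err == nil {
			return nil // success, no further attempts needed
		}
		if i < maxTries {
			// exponential backoff: sleep only after a failed attempt,
			// so the first try starts immediately
			time.Sleep(time.Duration(retryDelayInMS) * time.Millisecond)
			retryDelayInMS *= 2
		}
	}
	return err
}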

danielhoherd (Member):

IMHO, showing a "Retrying..." type of message to the user would be useful.

rujhan-arora-astronomer (Contributor, Author):

@danielhoherd We are already showing "please wait, attempting to upload the dags".

kushalmalani (Contributor):

Just curious... why 7 times?

Also wondering: if for whatever reason the API returns 500, does it make sense to try 7 times per deploy, per deployment? Would this be generating a huge volume of alerts for us?

rujhan-arora-astronomer (Contributor, Author):

@kushalmalani This count was selected because, per @pgvishnuram, the dag-server should come online in about 2 minutes, and we don't want to keep the user waiting much longer than that. I've updated the description for clarity.
We don't have any alerting on the dag-server component.
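
For a rough sense of how the numbers could add up (illustrative only; the actual delay constants are set in the PR and are not visible in this thread): with a hypothetical initial delay of 1 second that doubles on each retry, the 7 retries that follow the first attempt would wait 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds in total, which lines up with the roughly 2-minute window the dag-server needs to come online.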

pkg/fileutil/files.go: outdated review thread, marked resolved
kushalmalani (Contributor):
If you performed functional testing locally, can you attach some screenshots for confidence?

rujhan-arora-astronomer (Contributor, Author), replying to the question above:

Done, added.

rujhan-arora-astronomer changed the title from "Retry dags folder upload 7 times, if upload fails (except failures due to client side issues)" to "Retry dags folder upload 8 times, if upload fails (except failures due to client side issues)" on Feb 2, 2024
rujhan-arora-astronomer changed the title from "Retry dags folder upload 8 times, if upload fails (except failures due to client side issues)" to "Retry dags folder upload, if upload fails (except failures due to client side issues). Total wait time should be around 2 minutes" on Feb 2, 2024
…er (#1518)

* astro deploy should upload the current directory dags to the dag server

* Fixed failing test

* Fixed failing test

* Added a couple of tests

* Added houstonMock.AssertExpectations(t)
kushalmalani merged commit ddc6cf1 into main on Feb 2, 2024 (4 of 5 checks passed)
kushalmalani deleted the 6110 branch on February 2, 2024 at 08:51
Labels: none yet
Projects: none yet
Linked issues this PR may close: none yet

4 participants