Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot retry when using :as :stream option #38

Closed
bhb opened this issue Sep 28, 2021 · 13 comments
Closed

Cannot retry when using :as :stream option #38

bhb opened this issue Sep 28, 2021 · 13 comments

Comments

@bhb
Copy link

bhb commented Sep 28, 2021

(require '[babashka.curl :as curl])

;; For context, this takes around 8s and throws error with 500 status, as expected. 
;; This is just an example of how it works normally.
(curl/get "https://httpstat.us/500" {:raw-args ["--retry" "3"] :debug true})

;; #### Repro of bug ######
;; Actual: Takes around 300ms and throws error with 500 status
;; Expected: Should retry 3 times, which would take about 8s total
(curl/get "https://httpstat.us/500" {:as :stream :raw-args ["--retry" "3"] :debug true})

;; For comparison
;; curl -N --retry 3 https://httpstat.us/500
;; retries as expected
;; (-N is the option added here
;; https://github.com/babashka/babashka.curl/blob/e1f33ffa22728553bf2242db5e2a4ca349fab04b/src/babashka/curl.clj#L126 )    
@borkdude
Copy link
Collaborator

Thanks, I'll have a look soon. In the newest bb we now also support the JDK 11 Http Client. And otherwise you could shell out to curl directly as a workaround.

@borkdude
Copy link
Collaborator

borkdude commented Oct 5, 2021

@bhb I can't repro this. I added the options :silent false and :err :inherit for debugging. With those I get:

user=> (require '[babashka.curl :as curl])
nil
user=> (curl/get "https://httpstat.us/500" {:err :inherit :silent false :raw-args ["--retry" "3"]})
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    25  100    25    0     0     53      0 --:--:-- --:--:-- --:--:--    53
Warning: Transient problem: HTTP error Will retry in 1 seconds. 3 retries
Warning: left.
100    25  100    25    0     0    110      0 --:--:-- --:--:-- --:--:--   110
Warning: Transient problem: HTTP error Will retry in 2 seconds. 2 retries
Warning: left.
100    25  100    25    0     0     66      0 --:--:-- --:--:-- --:--:--    66
Warning: Transient problem: HTTP error Will retry in 4 seconds. 1 retries
Warning: left.
100    25  100    25    0     0    150      0 --:--:-- --:--:-- --:--:--   150
Execution error (ExceptionInfo) at babashka.curl/request (curl.clj:237).
babashka.curl: status 500

@borkdude
Copy link
Collaborator

borkdude commented Oct 5, 2021

@bhb Could you try the above too to see if the same happens for you, with the recent commit on master?

@bhb
Copy link
Author

bhb commented Oct 6, 2021

@borkdude What do you see if you add :as :stream to the args above?

Could you try the above too to see if the same happens for you, with the recent commit on master?

If I adapt your code above to add :as :stream, I get the following (apologies if I'm not doing the right thing to use the latest version, I think the following should work)

First, I tried the version packaged in Babashka:

;; bb --version
;; babashka v0.6.1

(require '[babashka.curl :as curl])
(curl/get "https://httpstat.us/500" {:as :stream :err :inherit :silent false :raw-args ["--retry" "3"]})

Then I tried to include the latest babashka.curl

(require '[babashka.deps :as deps])
(deps/add-deps '{:deps {io.github.babashka/babashka.curl {:git/sha "75389c5ae0a084b33529190d654b71e021239fa4"}}})
(require '[babashka.curl :as curl])
(curl/get "https://httpstat.us/500" {:as :stream :raw-args ["--retry" "3"] :debug true})

In both cases, I get a 500 immediately and I don't see the output you posted above.

@borkdude
Copy link
Collaborator

borkdude commented Oct 6, 2021

@bhb I see. I think the problem was that bb.curl throws as soon as it discovers that the status code is 500-ish. Can you try your request with :throw false? I suspect this will fix your issue.

@bhb
Copy link
Author

bhb commented Oct 7, 2021

@borkdude In my tests, using :throw false will return the map with the status code, but it still happens immediately i.e. without retrying.

To summarize, here are my results

(require '[babashka.deps :as deps])
(deps/add-deps '{:deps {io.github.babashka/babashka.curl {:git/sha "75389c5ae0a084b33529190d654b71e021239fa4"}}})
(require '[babashka.curl :as curl])

;; With no ':as :stream', this will retry 3 times, then throw (as expected)
(curl/get "https://httpstat.us/500" {:err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

;; With no ':as :stream' and ':throw false' this will retry 3 times, then return map that contains status code (as expected)
(curl/get "https://httpstat.us/500" {:throw false
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})


;;; Unexpected behavior below

;; With ':as :stream', this immediately throws a 500 without retry
(curl/get "https://httpstat.us/500" {:as :stream
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

;; With ':as :stream' and ':throw false', this immmediately returns a map with a status code (without retrying)
(curl/get "https://httpstat.us/500" {:as :stream
                                     :throw false
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

@borkdude
Copy link
Collaborator

borkdude commented Oct 7, 2021

@bhb

I would say the but-last expression is unexpected but an edge case: the :status key is filled in from the first http status code it finds in the headers and when this is 500-ish, it will throw by default. You can suppress this with :throw false. Let's assume that you'll have to use this for your use case.

Then with the last expression. It returns immediately, because this is what :stream is for: it won't wait until the process is finished, e.g. for consuming a stream of events.

If you def your last expression like (def response ...) and then do (slurp (:body resp)) you'll see this:
"500 Internal Server Error500 Internal Server Error500 Internal Server Error500 Internal Server Error"

which I think indicates that there were three retries.

But perhaps this is not your use case and you just want a blocking request and want the bytes as a result? Could you clarify more context around your use case?

Perhaps an easier way for you to work with this is call curl manually using babashka.process/process:

 (def stream (:out (babashka.process/process "curl https://avatars.githubusercontent.com/u/64927540?s=400&u=644e8997a671220cb729260cfbf678d9cfddaa1b&v=4 --retry 3" {:err :inherit})))
(io/copy stream (io/file "icon.png"))

But let me know if you want a blocking bytes call because this is pretty easy to support with :as :bytes.

@bhb
Copy link
Author

bhb commented Oct 14, 2021

But perhaps this is not your use case and you just want a blocking request and want the bytes as a result? Could you clarify more context around your use case?

Great question - I probably should have started with this 😄 .

I am using babashka.curl to download a file that has been stored in the cloud. If possible, it would be ideal if I could just read this file directly into memory and process it without needing to write to a separate file and then re-read it.

Unfortunately sometimes I get transient errors like 500s or 503s when I try to download the file.

The two possible outcomes for my script are:

  1. The file is downloaded and I can proceed with processing. I don't care about the errors.
  2. After N retries, the file still has not been downloaded and I have to abort the entire script. In this case, I care about the error details (getting the last error is sufficient).

So yes, ideally there would be a blocking call that gets the entire file but will also allow me to pass --retry to curl so that the call doesn't fail if we get back e.g. a single 503.

I've worked around this by writing my own retry logic on top of babashka.curl, but it seems like it would be cleaner to just let curl take care of this.

Perhaps an easier way for you to work with this is call curl manually

That's a good idea, I can definitely do that!

@bhb
Copy link
Author

bhb commented Oct 14, 2021

But let me know if you want a blocking bytes call because this is pretty easy to support with :as :bytes.

Hm, maybe that's exactly what I need. The file I'm downloading is gzipped json, so I can't do anything meaningful with it until it's completely downloaded. I must have missed this option previously. My mistake!

@borkdude
Copy link
Collaborator

borkdude commented Oct 14, 2021

@bhb That option isn't yet there, it was a proposal. Will respond on the rest soon.

@borkdude
Copy link
Collaborator

@bhb OK, after reading your feedback, I think we can just support :bytes which is a blocking call to get the bytes from your downloaded file with support for retries.

@bhb
Copy link
Author

bhb commented Oct 15, 2021

@borkdude That sounds great. Thanks so much!

borkdude added a commit that referenced this issue Nov 24, 2021
@borkdude
Copy link
Collaborator

I implemented the :as :bytes option now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants