Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback? #1

Open
alphapapa opened this issue Oct 22, 2019 · 5 comments
Open

Feedback? #1

alphapapa opened this issue Oct 22, 2019 · 5 comments
Assignees

Comments

@alphapapa
Copy link
Owner

@skeeto Hi Chris,

I finally decided to put together a "simple" HTTP library inspired by elfeed-curl.el. This doesn't implement much yet, but I wonder if you have any feedback on the basic design.

Probably the most notable issue at the moment is that the async version calls the callback with a parsed response struct,with the body as a string, rather than allowing the callback to work directly in the response buffer. I'll see about making that more flexible.

Also, unlike your library, it doesn't allow fetching of multiple URLs with a single curl invocation. Do you think that's important enough that I should implement that, or would the complexity not be generally worth it?

Thanks for any feedback.

@skeeto
Copy link

skeeto commented Oct 23, 2019 via email

@alphapapa
Copy link
Owner Author

Very thorough review. Thanks very much, Chris. Your guidance is invaluable.

@alphapapa alphapapa self-assigned this Oct 23, 2019
@alphapapa
Copy link
Owner Author

Since you're using make-process — something I should have done — you can skip coding-system-for-read and process-connection-type. You're already using :connection-type, so also use :coding.

Thanks, done. There are a lot of process-related details like that in Emacs that I haven't worked with before, so I appreciate your pointing them out.

I'm sure you realize this already, but I want to highlight it anyway (especially for anyone else following along): Only the "binary" coding system (or equivalent) makes sense since the input potentially contains multiple different encodings that need to be decoded individually.

I guessed that was the case; thanks for confirming it.

Oh, nevermind, I just noticed you're using call-process in the sync version, so I suppose you still need to dynamically bind these variables anyway!

See below...

The type of value returned by plz-get (very clever name by the way!) varies depending on the arguments. You can get away with this sort of thing in a dynamically typed language, but it would make no sense at all in a statically typed language. This is a guideline I know I've broken myself, and, when I have, I always regret it later: Dynamically typed programs should still be consistent about its types. I mean this in the deep sense. A function doesn't just consistently return a list, it returns a list of values of a specific type, and so on — the sort of consistently required by static typing. It's good practice.

Yeah, I was a little uncomfortable with that too, so I've followed your advice and split the synchronous calls into separate functions. There's a little bit of code reuse, but not too much, and maybe that can be improved later.

So plz-get always returns the curl process now, and plz-get-sync returns an object as selected with its :as argument.

Make it a hard rule for async that either plz-success or plz-error is called exactly once per request.

Thanks. That should be the case. If it ever isn't, I'll fix it! :)

It looks like you never call plz-error?...

Errors are now handled: by default, if no error callback is given, one of two errors are signaled:

  • When curl exits non-zero, plz-curl-error, which includes the specific curl error in a plz-error struct.
  • When curl exits zero but the HTTP code is not 200, plz-http-error, which includes a plz-error struct containing the parsed HTTP response as a plz-response struct. (I'm guessing this may need to handle certain other status codes differently, but it's a start.)

If an error handler is given with the ELSE argument, it's called instead of signaling an error, and the same plz-error struct is its only argument. So errors always signal with or call the handler with a plz-error struct. So, e.g.:

(plz-get-sync "https://httpbinnnnnn.org/get/status/404")
;; Signals an error with:
;; (plz-curl-error . #s(plz-error :curl-error (6 . "Couldn't resolve host. The given remote host was not resolved.") :response nil))

(plz-get-sync "https://httpbin.org/get/status/404")
;; Signals an error with:
;; (plz-http-error .
;;  #s(plz-error :curl-error nil
;;               :response #s(plz-response :version 1.1 :status 404 :headers
;;                                         (("Access-Control-Allow-Credentials" . "true")
;;                                          ("Access-Control-Allow-Origin" . "*")
;;                                          ("Content-Encoding" . "gzip")
;;                                          ("Content-Type" . "text/html")
;;                                          ("Date" . "Wed, 23 Oct 2019 10:39:09 GMT")
;;                                          ("Referrer-Policy" . "no-referrer-when-downgrade")
;;                                          ("Server" . "nginx")
;;                                          ("X-Content-Type-Options" . "nosniff")
;;                                          ("X-Frame-Options" . "DENY")
;;                                          ("X-XSS-Protection" . "1; mode=block")
;;                                          ("Content-Length" . "198")
;;                                          ("Connection" . "keep-alive"))
;;                                         :body "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server.  If you entered the URL manually please check your spelling and try again.</p>\n")))

;; Async error handler:
(plz-get "https://httpbin.org/get/status/404"
  :as 'string
  :then (lambda (s)
          (message s))
  :else (lambda (err)
          (message "OOPS: %S" (plz-response-status (plz-error-response err)))))

One of the things I enjoy about not making elfeed-curl a general URL fetching library is I get to take some shortcuts. For example, I don't quite fully parse the response headers. Elfeed is the only consumer of those headers, and it doesn't need them to be carefully parsed, so it doesn't matter. But for a general library, you perhaps want to follow through. Technically I believe you're supposed to decode header text values as ISO-8859-1, and perhaps also handle RFC 2047 MIME decoding. (How much does curl handle here already? I don't know, but you can probably rely on the headers being at least syntactically valid.)

Thanks, I'll look into those issues.

The machinery to make this work is a bit complicated as you've seen (random delimiter "tokens"), so you may not think it's worth the trouble for your library, but it is important for some applications, like Elfeed.

I think I'll save that for the future, but it sounds like a feature that ought to be supported. Ideally this library would be efficient and support cases like that.

Similar arguments can be made for last-modified and etag headers, which reduce the load on servers and is just good netiquette. Elfeed handles these headers at a higher level, and it's probably also out of scope for plz for the same reason. But it's something to consider.

Hm, yeah, I feel like that's inching toward being a browser[0], haha. But if it's needed, it should be supported, so I'll see what happens. I'll think about ways to better expose headers than only through the plz-response struct.

Returning a string of content is certainly simpler, and it's not wrong per se. I prefer the buffer-passing method of elfeed-curl purely for the sake of efficiency. Unlike strings, buffers can be "sliced" using narrowing, so there's less garbage, fewer allocations, and less copying when presenting a narrowed process output buffer to callbacks. In Elfeed's case, the result needs to be stored in a buffer anyway for the XML parser, so strings are an even less attractive option.

I've changed the signatures of the get functions to let the user select the type of argument he wants returned or passed to the callback. I hope this API is clear and easy to understand, but please let me know what you think. Here are some examples from the tests:

;; Call callback with response body as a decoded string
;; (decoding is automatic unless ":decode nil" is used).
(plz-get "https://httpbin.org/get"
  :as 'string
  :then (lambda (string)
          (setf test-string string)))

;; Call callback with response body as a parsed JSON object.
;; json-read is called in the response buffer with it narrowed to the
;; response body, after decoding it.
(let* ((test-json)
       (process (plz-get "https://httpbin.org/get"
                  :as #'json-read
                  :then (lambda (json)
                          (setf test-json json)))))
  test-json)

;; Return the response body as a non-decoded, binary string (in this
;; case, a JPEG image).
(image-type-from-data
  (plz-get-sync "https://httpbin.org/image/jpeg" :decode nil))
;; => jpeg

Next on the to-do list is probably POST requests and file uploads. At that point I can probably start trying to use it in my own packages, like matrix-client (which currently uses request for uploads and url for everything else, due to issues in each).

I know you're busy, so I appreciate your feedback!

0: This is mildly amusing, funny how easy it is in Emacs:

(defun plz-browse (url)
  (interactive "sURL: ")
  (let ((dom (plz-get-sync url
               :as (lambda ()
                     (libxml-parse-html-region (point-min) (point-max) url)))))
    (with-current-buffer (generate-new-buffer "*plz-browse*")
      (shr-insert-document dom)
      (pop-to-buffer (current-buffer)))))

@skeeto
Copy link

skeeto commented Oct 23, 2019 via email

@alphapapa
Copy link
Owner Author

@skeeto I've finally made an aio wrapper for plz. It seems to work well. Please see:

Since plz is on GNU ELPA and aio is on MELPA, I don't know if I could reasonably put an aio-plz package on GNU ELPA. Would you be willing to consider moving aio to GNU ELPA, or to non-GNU ELPA?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants