Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support streaming response #43

Closed
wants to merge 2 commits into from
Closed

Conversation

r0man
Copy link
Contributor

@r0man r0man commented Dec 3, 2023

Hi @alphapapa and @ahyatt,

This changed adds support for streaming responses by adding a process filter and a callback function. The process filter is based on the default default process filter as described in the manual. It inserts the response into the process buffer. Additionally, if a callback function is provided by the during keyword, it is called every time a part of the response body is received.

I'm interested in writing a curl based provider for the llm library and saw the following issue here:
#42

Would you like to collaborate in resolving this issue?

Thanks, Roman.

plz.el Outdated
@@ -320,6 +320,10 @@ NOTE: In v0.8 of `plz', only one error will be signaled:
to update their code while using v0.7 (i.e. any `condition-case'
forms should now handle only `plz-error', not the other two).

DURING is an optional callback function called each time a part
of the response body is arriving. It is called with one
argument, the partial raw response.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and in the similar documentation below, can you make clear if the partial raw response is the cumulative response or the new response since the last callback? Also, you may want to make it clear whether it is expected to get all of the content eventually or not.

@ahyatt
Copy link

ahyatt commented Dec 4, 2023

This is great, thank you for adding this!

@alphapapa
Copy link
Owner

Hi @r0man,

This looks like a very good start. Two issues that I can think of quickly are:

  1. Have you done the FSF copyright assignment? plz is in GNU ELPA, so it's required for major contributions like this.
  2. There's a use case that I'm not sure this code would easily support, that of wanting to discard incoming data once a complete object of some kind (like a JSON entity) is received; i.e. it wouldn't be desirable for the object to remain in the process buffer after it's been processed by another function. IMO this is the hard part of this feature's design: we want to make it easy to follow that pattern without requiring the user to write a full process filter.

Thanks.

@alphapapa alphapapa self-assigned this Dec 4, 2023
@alphapapa alphapapa added the enhancement New feature or request label Dec 4, 2023
@alphapapa alphapapa added this to the Future milestone Dec 4, 2023
@r0man
Copy link
Contributor Author

r0man commented Dec 4, 2023

Hi @alphapapa and @ahyatt,

  • I will extend the doc string more once we settled with a strategy.

  • I just started the FSF copyright assignment process.

  • Regarding discarding of already consumed data, yes this is not supported with the current code. It just leaves everything in the buffer, and there is nothing done to make sure a full JOSN object is passed to the callback, which I agree isn't great :)

What do you think of the following:

Streaming of any data format

To support streaming of any data format, I think one strategy could be to call the during callback with the partial data currently received and delete everything that has previously been in the buffer. Thus moving the job to construct the response to the client. Since we don't have the full response anymore, we could call the then callback with nil as an argument at the end of the stream, indicating the connection has been closed.

Streaming of SSE (server sent events)

There's a standard called SSE, which the OpenAI API uses for example. It is basically text separated by 2 newlines. We could write a process filter that works with this format. We pass complete lines to the during callback, clear the buffer when we receive 2 newlines, and at the end of the stream call the then callback indicating the connection has been closed. Server sent events don't have to be in JSON format, but we could support them by decoding the lines, maybe leveraging the existing :as option with json-read or something similar.

[1] https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format

Line based

The /stream endpoint of httpbin.org seems to be line based, instead of following the SSE standard. So maybe we should support this one as well?

Other data format

Do you have any other data format in mind that I did not mention?

Wdyt?

@ahyatt
Copy link

ahyatt commented Dec 5, 2023

Perhaps it might be possible to two new arguments: a callback (you already did this) plus a predicate that says when the callback is to be used (a message called with the latest information, returning non-nil if the callback should called).

@alphapapa
Copy link
Owner

alphapapa commented Dec 5, 2023

This is timely in a few ways: the hyperdrive.el project (presented at EmacsConf last Sunday) uses plz and has sponsored some of the work on it. One of the features we are working toward is to use a JSONRPC-style protocol, similar to the LSP one used in eglot. There are a number of different such protocols, like EventSource/SSE, JSONRPC, LSP-style JSONRPC, etc.

Notably, LSP uses Content-Length headers so the receiver can count the number of bytes received and know when a full object has been received and is ready to parse--this is different than standard JSONRPC, which required us to override some of the code in the jsonrpc library when testing. The JSONRPC server being used by hyperdrive does not (yet?) support such a header, and we want to avoid trying to parse incomplete JSON objects for performance reasons.

I've been discussing some of this with João Távora on emacs-devel. Over at hyperdrive, we're not yet sure what solution we're going to pursue. And as hyperdrive moves toward that, it will likely move away from using HTTP altogether.

But it would be good to keep all of these things in mind as we try to design a solution for plz to support streaming responses.

Given how many variables there are, I'd suggest that we not merge anything into master and make a release with such a feature until we've prototyped and tested it extensively. I'd like to avoid having to change the API later to fix design mistakes made in the beginning. Also, we should probably review similar network libraries in other FOSS to see what we can learn (e.g. Python, JS, etc). Then we can decide on the best approach for Elisp and how to code it for Emacs's internals.

Ideally, I think we would have test cases to support each of the three formats we've mentioned: EventSource/SSE, JSONRPC, and LSP-style JSONRPC (perhaps also considering the JSON Lines format, which has some overlap). Then we could more easily see what the code would look like and try to make it easy to use.

@r0man Since your interest lies with llm, maybe you could write a test in llm that uses this in-development feature of plz that could help us learn how to best design it. Then after you get the FSF CA done, maybe a similar test could be added to plz's own tests.

Thanks to all for your interest in helping with this feature.

@ahyatt
Copy link

ahyatt commented Dec 7, 2023

If this could be a new branch, that would help. I could also experiment with using plz in llm, as presumably @r0man could also do their separate experimentation. This is something I probably will have time to play around with later this month.

@r0man
Copy link
Contributor Author

r0man commented Dec 7, 2023

@alphapapa and @ahyatt Sounds good to me. I can take a look at the llm library and how I could use plz with it. In fact I already did some experiments on using plz in llm. I basically added a llm-request-plz.el file, that has the same interface as the Emacs native llm-request.el. I just stopped when I realized that I need streaming support for it.

@alphapapa What's not clear to me from your message is, should I only focus on the llm stuff and you are looking into the streaming support in plz. Or do you want me to continue on plz.el with what I described above?

@alphapapa
Copy link
Owner

@r0man As long as the FSF copyright assignment is or eventually gets signed by you, I'm happy for you to experiment with code and APIs to add streaming support in your branches of plz (otherwise I couldn't use the code, and it could be awkward to develop alternative code that does the same thing). I think what is needed here is a somewhat broad experimental phase so we better understand the problems and the tradeoffs of various solutions.

@r0man
Copy link
Contributor Author

r0man commented Feb 22, 2024

Hi @alphapapa and @ahyatt,

priorities changed a bit and I might (nothing promised, priorities change quickly) look into this again. I'm also nearing the completion of the FSF copyright assignment process.

I was still thinking howto extend plz.el to support streaming. One suspicion I have is that it involves a process filter and the handling of the response needs to be tweaked. While thinking about this, looking around the code, and my previous experience working on clj-http and cljs-http I would suggest the following:

Right now the user specifies a function in plz.el which parses the response. This is up-front. If you specify you want JSON via passing in the json-parse function, the response is parsed as JSON. Regardless if it actually is JSON. In an ideal world if you send an Accept header ask for application/json you should also get it back. But I also have seen many cases where this breaks along the way. In the OpenAI API which I would like to use, the response format is triggered by a parameter in the request. If you set stream to true in the body of the request, it sends a response in the text/event-stream format, otherwise in application/json. So you need to coordinated this in 2 places (if you even send an accept header).

An Accpet header can also contain multiple mime times with different priorities and the server might do continent negotiation, and you might not know what you get in advance.

So, my proposal would be to change the response handling code to be more flexible, and actually look at the Content-Type header when parsing the response. The idea is we have a process filter that reads the status code, the headers, and then decides which function to call (maybe multiple times in the case of streaming responses) depending on the Content-Type. If the user passes in a function we use that one, otherwise we look up an handler in an alist from content type to handler function. plz.el could provide a default content-type to handler alist, which could be overridden by the user. All this with keeping the current interface intact.

Another thought I had was about middleware. I'm not sure if you are familiar with how clj-http works, but they have this concept of middleware. There is a request map and there is a response map, and both are getting passed through mutiple middleware functions that have all the same signature. Each middleware can modify the request before actually sending it, and also modify the response before it actually gets passed back to the user. There is a middleware to encode input, another to decode output, yet another that handles compression and so on. At the end all the middleware gets composed together and you get a "batteries included" HTTP client. You can see an example of it here:

https://github.com/dakrone/clj-http/blob/3.x/src/clj_http/client.clj#L1125

The nice thing about this design is that you can mix and match those middleware functions. If the default client does not fit your needs you can build your own one with the middleware you need and maybe add additional.

Maybe that ship has already sailed, but I wanted to mention this anyway.

Wdyt? (mostly about the first part)

@ahyatt
Copy link

ahyatt commented Feb 22, 2024

I'm super happy that you can hopefully devote more time on this. I like the ideas in the clj-http client - such things could be useful to implement lower-level logging.

I agree with @alphapapa's proposed approach above: implement something you think is reasonable - and I'll try it out as a backend for the llm package and let you know what I think.

@alphapapa
Copy link
Owner

Hi @r0man,

priorities changed a bit and I might (nothing promised, priorities change quickly) look into this again. I'm also nearing the completion of the FSF copyright assignment process.

No problem. Thanks for the update.

So, my proposal would be to change the response handling code to be more flexible, and actually look at the Content-Type header when parsing the response. The idea is we have a process filter that reads the status code, the headers, and then decides which function to call (maybe multiple times in the case of streaming responses) depending on the Content-Type. If the user passes in a function we use that one, otherwise we look up an handler in an alist from content type to handler function. plz.el could provide a default content-type to handler alist, which could be overridden by the user. All this with keeping the current interface intact.

For non-streaming responses, this should already be doable by using :as 'response, and then the user code can do whatever it likes with regard to examining the response and handling its contents.

For streaming ones, yes, it will be necessary to use a bespoke process filter.

I'd like to keep the API unchanged as much as possible, so my suggestion would be to start by copying the plz function to one named plz-stream, and then experiment with the code and APIs to make it work usefully. Once that's done and seems to work reliably, we could consider whether its features could be integrated into the main plz function (e.g. using an :as 'stream argument, maybe with :through FILTER-FN, or :as '(stream :through FILTER-FN), or something like that).

Thanks for the information about clj-http and its middleware API. That sounds like a flexible design. I'm not sure that I'd want to change plz to work that way, as it seems likely to be overkill for our needs. As well, user applications should be able to build such a layer on top of plz if necessary.

To that end, if an application did have a need like that, I'd suggest that we consider implementing a layer on top of plz that would work with a plz-request struct that could be passed through a chain of modifying functions and finally to plz itself. Or it might not even need a new struct; maybe just a list of the args applied to plz would work, ala plz-queue.

@ahyatt As you may have noticed, I've so far tried not to commit much logging code to plz; whenever I've used some during debugging, I've removed it before committing. That's mainly to try to keep the code as clean as possible. But if you encounter some situations where you need better logging support, please let me know and maybe we can come up with a good general solution.

Let me know what you think. Thanks.

@r0man
Copy link
Contributor Author

r0man commented Feb 23, 2024

@alphapapa Alright, thanks for your answer. I will give your suggestion a try.

@r0man r0man force-pushed the streaming branch 13 times, most recently from 5381ede to a8f3309 Compare February 24, 2024 12:29
@r0man
Copy link
Contributor Author

r0man commented Feb 24, 2024

Hi @alphapapa and @ahyatt,

I updated this PR. This is what changed:

  • I kept the API mostly the same, I removed the DURING callback which I initially introduced, I think we don't need it.
  • I added the STREAM keyword parameter. When non-nil, a process filter is added. This filter adds a chunk of the response to the process buffer, calls the existing :plz-then handler, and deletes the chunk from the buffer again. This is done repeatedly until the request is complete.
  • The THEN callback is called multiple times with the current chunk of the response (neither complete, nor cumulative, and no guarantee that it's a complete line, or even a valid JSON object).
  • The type of object passed to THEN depends on the AS parameter, it might be a chunk of the body of the request, the chunk in the body slot of a plz-response struct, etc.
  • As usual, the ELSE and FINALLY callbacks are called for errors, and when the request is complete.
  • I haven't done much about when AS is set to file, since the current behavior seems to be that it is an error if the file already exists. We could append the chunks to the file if STREAM is non-nil?

I think this is the least invasive change I could come up with.

@ahyatt is this something that would work in the llm library? I have a version of the llm-request.el file that uses plz.el for my custom provider I use at work and I believe it would be sufficient. We still would need to interpret the server sent event protocol by piecing together the chunks there though.

Yesterday I was also experimenting with some approaches that would add support for server sent events somehow into plz.el. Basically what @alphapapa suggested in his first message as the second point/issue. I could not come up with something satisfying yet. There are multiple places (the process filter, the :plz-then callback) that would need to be kept in sync, customized and play well together. The protocols are also quite different as I understand now. Server sent events for example are not "just" line based JSON blobs, but lines with prefix, then a JSON blob. JSON RPC is different as well. I' starting to think the code parsing all those different protocols would be too complex to bake it into plz.el, but instead should be built on top of plz.el, as separate libraries.

Wdyt?

@r0man
Copy link
Contributor Author

r0man commented Mar 22, 2024

Hi @alphapapa , (and /cc @ahyatt),

I pushed an update to this branch. I believe it contains now most of what we initially discussed. Handling different streaming data formats without users having to write a complete process filter. I can also be extended to formats not yet supported.

It is not intended to be merged (unless you change your mind). I plan to create 1 or 2 new repositories for this. Probably plz-media-type or plz-media, and plz-event-source and submit this to ELPA so the llm library can use it. Would you be fine with those names, or should I come up with something that does not mention plz at all?

To get this working, we would need 1 or 2 small changes to plz.el:

1.) Allow setting the process filter via an option to the plz function. I can do this already for asynchronous requests by setting the process filter on the process object returned by the plz function. For synchronous requests however, I'm out of luck. So what do you think of making this change here:

https://github.com/alphapapa/plz.el/pull/43/files#diff-86628899659777b5eeb393a8c5c3471f4bdbe10ed16a653a12705cee1225d07dR256-R408

2.) I followed your advice and now the :as 'buffer option in the plz-media-type function. Previously I had changed plz to add the process object to the plz-response struct. I got rid of that change to keep changes to plz itself minimal. However, now I'm running into the following issue. On synchrounous requests that raise an error the I don't know how to get a handle at the buffer used for the requests. Ideally I would like to kill the buffer and not leave it around. The point where I would like to get at that buffer is here:

https://github.com/alphapapa/plz.el/pull/43/files#diff-a1cfaecaaf8e3e9c2fcf318e5b4090858f0a16af40c2cf2ba1dae33e6924d8efR605-R606

Can you please support us with 1.) and do you have any suggestions how to handle 2.) ?

Thanks, Roman.

@r0man
Copy link
Contributor Author

r0man commented Mar 23, 2024

I added another small change in the meantime, calling widen in plz--respond before parsing the response.

https://github.com/alphapapa/plz.el/pull/43/files#diff-86628899659777b5eeb393a8c5c3471f4bdbe10ed16a653a12705cee1225d07dR757-R759

I sometimes saw the following error raised:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))

I haven't found a way to reproduce this yet. But when it happens the status code and the headers are in the buffer, but it is narrowed. I think this is happening when (plz--sentinel process "finished\n") is called in the "HACK" section of the plz function.

The call to widen seems to fix the issue. It might also be an issue with the way I use call plz in the plz-media-type function, or like the other day, an issue with my Emacs config.

@ahyatt
Copy link

ahyatt commented Mar 24, 2024

This all looks great to me, and works in the llm library. The only thing missing is integrating the ability to output debug information about the requests into plz, but that should be a different PR than this one.

@alphapapa it's worth making sure the interfaces in the plz-media-type-request looks similar enough and works similar enough to how plz works not to be confusing. I suspect the :as should be changed, perhaps removed, in favor of adding new media type handlers. It shouldn't be possible to call plz-media-type-request for things you should really be using plz for.

But these are syntactic matters, which are easy to adapt the llm code to, since it not only works well, it's a significant improvement to how we're making requests and parsing things in the llm library.

This allow setting the process filter via an option to the plz
function. This could already be done for asynchronous requests by
setting the process filter on the process object returned by the plz
function.

This however does not work for synchronous requests, since the plz
function only returns (or raises an error) when the process has
already run.
When using a process filter to stream responses sometimes the
following error is raised.

```
Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))
```

This only happens sometimes and I haven't found a reliable way to
reproduce it. Whenever it happens, the code was called from the
`(plz--sentinel process "finished\n")` form. Which seems to work
around a race condition that might happen. The code then runs in the
context of a narrowed process buffer.

The call to widen seem to fix the issue. I haven't seen it since.
@r0man
Copy link
Contributor Author

r0man commented Mar 24, 2024

Hi @alphapapa and @ahyatt,

I now updated this PR with 2 small changes to plz.el. It would be great if we could add this to plz.el? Can you support us with this @alphapapa?

The other code I now extracted into 2 seperate repositories. I chose 2 repos, because the plz-event-source contains more of the server sent events specification than what was necessary for plz-media-type.

The repositories are here:

https://github.com/r0man/plz-media-type
https://github.com/r0man/plz-event-source

I based my README on the one from plz.el to get an info manual, so thanks for this @alphapapa. I was thinking if those repos also need a mascott. Maybe a different picture of a pidgin, or some other bird? Where did you got your picture from @alphapapa? Do you have more pidgin pictures?

I plan to submit those repos to ELPA. I think I can do that only after my paperwork is ready. I sent it last week and I'm waiting for a response from the FSF.

Wdyt?

@alphapapa
Copy link
Owner

Hi @r0man,

I got your email that you sent outside of GitHub. I will have to ask for your patience as I have a lot of things going on in real life right now. I view this project to support streaming requests with plz as a somewhat long-term one. Bigger ideas require more time to review.

I will try to provide some specific feedback today, if I can find time, and some more later this week. Feel free to ping me if you need to, but note that I get a lot of email, so mentioning me on GitHub is generally a better way to get my attention on these issues. (I already saw these updates here, but I hadn't taken the time to deal with them yet.)

Generally I would recommend keeping the number of additional packages to a minimum. What you've done with plz-media-types looks interesting, but I'm not sure it warrants an additional package. I'd guess that similar functionality could be provided by a simple function that returned a THEN function to be used with (plz ... :as 'response :then ...); that would operate "within" plz's API rather than wrapping it in another layer, which would generally seem preferable to me.

plz-event-source seems to make more sense as a separate package, to me. But I'll have to look more closely at the details when I get a chance.

Thanks.

@alphapapa
Copy link
Owner

BTW, I have to give you credit for the funniest question I've ever been asked on GitHub:

Do you have more pidgin pictures?

:D

No, I don't have more pigeon pictures. Digging up the various animal mascot images I've used on some of my packages is generally tedious, requiring extensive searching of public domain-type images and then editing the most suitable one I can find to look decent in a readme. The one on plz's repo still needs work, too.

@r0man
Copy link
Contributor Author

r0man commented Mar 26, 2024

Hi @alphapapa, alright, thanks for answering. Please take your time.

@josephmturner
Copy link
Contributor

Thank you @r0man for working on this!!! hyperdrive.el could definitely make use of EventSource streaming built into plz.el

Generally I would recommend keeping the number of additional packages to a minimum. What you've done with plz-media-types looks interesting, but I'm not sure it warrants an additional package. I'd guess that similar functionality could be provided by a simple function that returned a THEN function to be used with (plz ... :as 'response :then ...); that would operate "within" plz's API rather than wrapping it in another layer, which would generally seem preferable to me.

We did something similar in hyperdrive.el, which has a top-level alist that maps types to functions (similar to plz-media-types). But then, instead of extending/overloading plz's :as argument, we invoke whichever handler matches the response's mime type inside :then.

For the purpose of this example, h/fill in the above link is just a wrapper around plz. In other words, would it be possible to do something like this pseudocode?

(plz 'get URL :as 'response
  :then #'plz-media-types--dispatcher)

(defun plz-media-types--dispatcher (response)
  "Takes a plz RESPONSE and dispatches based on its mime type."
  (let* ((type (plz-media-types--response-type response))
         (handler (alist-get type plz-media-types)))
    (funcall handler response)))

@r0man
Copy link
Contributor Author

r0man commented Mar 31, 2024

Hi @josephmturner,

thanks for you interest, the links and the snippet of code. Your snippet is roughly where I started. I had a few more requirements though:

  • Decode the body of successful asynchronous response (what your snippet does)
  • Decode the body of an unsuccessful asynchronous response (handle both HTTP and curl errors)
  • Decode the body of successful synchronous response
  • Decode the body of an unsuccessful synchronous response (catch the signaled error, handle both HTTP and curl errors)
  • Handle streaming responses for some content types and invoke a callback on some parts of the response (handle the async and sync invocation cases, and also errors signaled in the callback)

To get all of this working you need to align the stars in the right way:

  • Turn off buffering in curl with the --no-buffer option (at least for streaming when you want to receive bytes as soon as possible)
  • To process a streaming response you need to install a process filter
  • Set the :as option to 'buffer or 'response and make sure your process filter sets up the requirements for internal the :plz-then callback
  • The process filter for each content type has different requirements how to interpret the stream
  • Handle decoding of the HTTP body in 4 different places

I went through all of this and that is how I ended up with plz-media-type. I helped @ahyatt to convert the LLM library to use curl. We are using plz-media-type in the following file and I believe it is working fine so far (at least I noticed an improvement in reliability compared to the url-retrieve based implementation, and it works with my employers proxy):

Personally, I would have gone even a step further in the llm-request-plz:

  • Only have a single llm-request-plz function that handles the sync and async cases
  • Have an option for each media type that needs customization to install the callbacks

We went back and forth about the design on this and came to the following conclusion:

  • We don't add anything that handles a response based on content type to the plz library itself (too much code)
  • Experiment with solutions somewhere else (initially that was the LLM library, now we extracted this from there and we have plz-media-type and plz-event-source)

Coming back to your example and what I believe @alphapapa is looking for. You can use plz-media-type and operate "within" plz API. That would look something like this:

(plz 'get "https://httpbin.org/json"
  :as 'response
  :then (lambda (response)
          (let ((media-type (plz-media-type-of-response plz-media-types response)))
            (message "%s" (plz-media-type-then media-type response)))))

This however only gets you HTTP bodies in the 2xx range decoded. For other HTTP responses you can do the same in the :else option, just slightly different. But this is only the asynchronous case.

For the synchronous case you can do something similar with the return value of the plz function for the 2xx range. For other HTTP responses you need to catch the signaled error and do again something slightly different than in the :else case.

You haven't talked about how streaming works in your snippet. With plz-media-type you can also do it "within" the plz API (at least in the asynchronous case). You would install a process filter on the process object returned by the plz function and call plz-media-type-process-filter with the right media type and the chunk of the response. Since the process filter might signal an error, you might also want to catch it and let the user handle it somehow.

That's a lot of plumbing code. I don't want to write all of this for each HTTP request I do. The plz-media-type-request function is a helper function that does all of this when you use it with the :as (media-type MEDIA-TYPE) option. Otherwise it does the same thing that plz does. The use/overriding of :as was something suggested by @alphapapa. I'm open to other suggestions, but I would suggest to at least give it a try first.

I would like to complete the migration to plz in the LLM library, so I and my colleagues at work can use a LLM that sits behind a corporate proxy. My time on this is ticking and I would like to finish this project soon with (at least) @ahyatt and my colleagues being satisfied.

So, what I would like to get out of this PR here is 2 small changes:

  • The ability to install a process filter (because I can't do this in the sync case myself)
  • Widen the buffer in plz--respond (because I believe there is a race condition when using this as is with a process filter)

For improvements on plz-media-type and/or plz-event-source I'm happy to take suggestions and/or PRs. I personally will probably not work on any significant API changes there if no severe flaws come up.

@alphapapa
Copy link
Owner

  • The ability to install a process filter (because I can't do this in the sync case myself)

That should be no problem.

  • Widen the buffer in plz--respond (because I believe there is a race condition when using this as is with a process filter)

I'd need to know more details about that issue. The buffer is intentionally narrowed to the body so that functions that process the body content won't see the headers.

Other than that, AFAICT the main issue with your list of "aligned stars" is that you want to unify handling of responses rather than having it split between successful/2xx ones and errored/non-2xx ones. That seems reasonable to me, so we could talk about ways to make that easier within plz's API. For example, just off the top of my head, I can imagine an API like :as 'response-with-error that would pass a plz-response struct with an error slot that would be a plz-error struct in case of an error; then the THEN function would be called for both successful and errored requests, ignoring the ELSE function, which would consolidate the handling into a single function.

What do you think? Thanks.

@r0man
Copy link
Contributor Author

r0man commented Mar 31, 2024

Hi @alphapapa,

the option for installing the process filter would be great. Thanks for supporting this.

About the narrow/widen. I understand that the callbacks should be called with the narrowed buffer, and I don't want to change that. What I would like to change is that when the plz--respond function is called, the (goto-char (point-min)) form really goes to position 0 in that buffer, also in case the buffer is narrowed (for whatever reason).

I tried to describe the issue in the comment above:
#43 (comment)

Unfortunately I don't have a reliable way to reproduce this. It happens in rare cases which I don't understand fully. The only place where I narrow the buffer for a brief moment in plz-media-type is here through the use of calling plz--response:

https://github.com/r0man/plz-media-type/blob/main/plz-media-type.el#L194

But I immediately widen it again, so when the process finishes and plz--respond is called, it should in theory not be narrowed. But for some reason it crashes there with:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))

I'm not sure what is causing this, but I have the suspicion it might be related to this:
https://github.com/alphapapa/plz.el/blob/master/plz.el#L515

About your suggestion to have another :as response-with-error. I'm actually quite happy with how plz-media-type-request works right now and I would not really need it. And for people that would like to build on top of plz, like I did, this is yet another different way how responses are handled. I think it would be more of a burden than helpful.

@alphapapa
Copy link
Owner

Hi @alphapapa,

the option for installing the process filter would be great. Thanks for supporting this.

Sure. Would you be willing to send a PR that implements just that?

About the narrow/widen. I understand that the callbacks should be called with the narrowed buffer, and I don't want to change that. What I would like to change is that when the plz--respond function is called, the (goto-char (point-min)) form really goes to position 0 in that buffer, also in case the buffer is narrowed (for whatever reason).

I tried to describe the issue in the comment above: #43 (comment)

I think we need to understand the problem before we try to fix it. AFAIK this error has never happened in plz itself, so it must be caused by the way you're doing something in, I guess, plz-media-type. It might be indicative of a convention that needs to be carefully followed in a sentinel or filter (which might even just be a way to reliably work around an idiosyncrasy of Emacs's process code).

Unfortunately I don't have a reliable way to reproduce this. It happens in rare cases which I don't understand fully. The only place where I narrow the buffer for a brief moment in plz-media-type is here through the use of calling plz--response:

https://github.com/r0man/plz-media-type/blob/main/plz-media-type.el#L194

But I immediately widen it again, so when the process finishes and plz--respond is called, it should in theory not be narrowed. But for some reason it crashes there with:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))

I'm not sure what is causing this, but I have the suspicion it might be related to this: https://github.com/alphapapa/plz.el/blob/master/plz.el#L515

The problem seems to be that something is trying to parse the HTTP headers, but the buffer is narrowed to the body, so the headers aren't visible, so the status code is returned as nil, which is unexpected in the condition.

In any case, the code around line 515 that you cite mustn't be changed. That code was hard-won over many hours of hair-pulling spread over a few years of development. Now that it finally works reliably, changing anything in that area would have to be a last resort to fix a very serious problem.

About your suggestion to have another :as response-with-error. I'm actually quite happy with how plz-media-type-request works right now and I would not really need it. And for people that would like to build on top of plz, like I did, this is yet another different way how responses are handled. I think it would be more of a burden than helpful.

What I'm trying to avoid is a situation in which people decide to use plz-media-type as a wrapper by default. AFAICT, the issue that it solves--handling responses by content-type, and possibly unifying response handling regardless of errors--needn't be solved by a wrapper around plz. This is a design issue, and one that I've never been completely happy with, so this might be an opportunity to improve it for all users. If, instead, it were commonly handled by using plz-media-type, it might constrain future development of plz, making more general improvements more difficult.

I understand that you've already done that work and it works for you, but my goal here is to improve plz in general, for its long-term future. So I'd like to improve plz in a way that both enables you to meet your needs and serves similar, general purposes. This necessarily takes some time and "negotiation".

@r0man
Copy link
Contributor Author

r0man commented Mar 31, 2024

I opened a PR over here: #50

@josephmturner
Copy link
Contributor

Thanks for your patience, @r0man. Does the below pseudocode address all of your concerns?

  • Decode the body of successful asynchronous response (what your snippet does)

  • Decode the body of an unsuccessful asynchronous response (handle both HTTP and curl errors)

  • Decode the body of successful synchronous response

  • Decode the body of an unsuccessful synchronous response (catch the signaled error, handle both HTTP and curl errors)

  • Handle streaming responses for some content types and invoke a callback on some parts of the response (handle the async and sync invocation cases, and also errors signaled in the callback)

(defvar my/buf-local-type nil
  "Set locally to one of the keys of `my/type-handlers'.")

(defun my/response-type (buffer)
  "Return type for BUFFER.
Returned type should be one of the keys of `my/type-handlers'."
  ;; For the sake of example, this function just checks for end of
  ;; headers and sets `my/buf-local-type' to the media type.
  (when (re-search-forward plz-http-end-of-headers-regexp nil t)
    (my/get-media-type-for-buffer)))

(defvar my/type-handlers
  ;; type-a and type-b could be media types.  They could also be based
  ;; on some other header like Content-Size or...
  `((type-a
     .  ( :then ,#'my/type-a-then
          :else ,#'my/type-a-else
          :filter ,#'my/type-a-filter))
    (type-b
     .  ( :then ,#'my/type-b-then
          :else ,#'my/type-b-else
          :filter ,#'my/type-b-filter))))

(defun my/filter (process chunk)
  "Sets `my/buf-local-type' locally to PROCESS buffer."
  (when (buffer-live-p (process-buffer process))
    (with-current-buffer (process-buffer process)
      ;; Set local variable.
      (unless my/buf-local-type
        (when-let ((type (my/response-type)))
          (setq-local my/buf-local-type type)))
      ;; Continue to process stream according to media type.
      (let ((filter (map-elt
                     (map-elt my/type-handlers my/buf-local-type)
                     :filter)))
        (funcall filter process chunk)))))

(defun my/then (response)
  "Call appropriate handler with RESPONSE as its sole argument."
  (let ((handler
         (map-elt
          (map-elt my/type-handlers my/buf-local-type)
          :then)))
    (funcall handler response)))

(defun my/else (err)
  "Call appropriate error handler with ERR as its sole argument.
ERR is a `plz-error'.
Handles curl errors specially."
  (let ((http-error-handler
         (map-elt
          (map-elt my/type-handlers my/buf-local-type)
          :else)))
    (if (plz-error-curl-error err)
        ;; Handle curl errors separately if you want.
        (my/handle-curl-errors)
      (funcall http-error-handler err))))

;; Asynchronous
(let ((plz-curl-default-args (cons "--no-buffer" plz-curl-default-args)))
  (plz 'get url :as 'response
    :filter #'my/filter :then #'my/then :else #'my/else))

;; Synchronous
(let ((plz-curl-default-args (cons "--no-buffer" plz-curl-default-args)))
  (condition-case err
      (my/then
       (plz 'get url :as 'response :filter #'my/filter))
    (plz-error (my/else (caddr err)))))

If this approach does not address all of your concerns, I'll be happy to try again :)

@r0man
Copy link
Contributor Author

r0man commented Apr 1, 2024

Hi @josephmturner,

thanks for the pseudocode. Have you looked at the plz-media-type code before writing this? I think that's more or less how plz-media-type works internally. At least the overall pattern is very familiar to me. With some small differences about global state and how :else, :then and :filter are packaged up (you use a list, I use a CLOS class).

So what is your idea how we/I/you should proceed on this?

@josephmturner
Copy link
Contributor

@r0man You're right that the difference between the list and CLOS approach is an implementation detail.

IIUC, with :as MEDIA-TYPES, plz-media-type-request does these main things:

  1. Unify sync and async :then callbacks.
  2. Unify sync and async error handling (plz is wrapped in condition-case).
  3. Handle process filter.
  4. Catch errors in the process filter and make them available to the sync and async error handlers.
  5. Turn off curl buffering.

I need more time to understand how the plz API could be improved while remaining generally useful to different kinds of projects which may want to do only a subset of these things or which may want to combine the pieces in a different way.

@alphapapa
Copy link
Owner

So what is your idea how we/I/you should proceed on this?

Looking again at plz-media-type's source code, it looks like it has a lot of code that's specifically for the sake of your needs in the LLM packages. I'm a bit skeptical that a framework like that is generally needed outside of your LLM packages. But if it is, it seems to me like it's a bit early to factor it out and publish it separately; you'd likely be committing to certain design decisions that might need to be changed later, after gaining more experience with it. (Similarly, plz itself is based on some of the design of Chris Wellons's elfeed-curl library that was used within Elfeed for many years. And plz itself spent a few years maturing before being submitted to GNU ELPA.)

In a similar vein, the decision to use EIEIO is one that you may want to revisit in the future. I used EIEIO in some earlier work of mine and realized that it was just overkill for what was needed. A few structs, accessor functions, and other functions gets the job done more simply, and generally with better performance. With regard to downstream applications, using EIEIO in your library means that the applications must use it as well, and that might be a barrier to adoption. It's often just an unnecessary, extra layer of abstraction and complexity. (And since EIEIO isn't a full implementation of CLOS, it lacks some of the features that would make it as useful as CLOS.)

So, I would recommend that we make the minimal changes to plz itself that are necessary; likely just passing through the process filter argument. Then I'd recommend that you develop these libraries within your LLM applications for a time, and after they've matured and stabilized, then we could look at them again and consider what parts may be useful to factor out into other libraries, and possibly publish to ELPA. It would be good to avoid publishing packages to ELPA prematurely.

What do you think?

Thanks.

@r0man
Copy link
Contributor Author

r0man commented Apr 4, 2024

Hi,

none of plz-media-type is actually specific to LLMs. The response formats are generally applicable. I also don't agree that there is no need for such a thing. Look at all those LLM packages, all of them do their own ad-hoc parsing.

About EIEIO, yeah a matter of taste I guess. So far, I'm happy with it. I also read Chris article about structs and EIEIO. I think documentation for the slots and validation is useful enough. At least to me.

I would say let's close the issue. I think once the process filter option is merged into the main branch an released there is nothing we would need to be added to plz.

About the issue with the process filter. In the meantime I studied jsonrpc.el and saw it uses timers to move the message handling onto the Emacs main loop. This is done here:

https://github.com/emacs-mirror/emacs/blob/master/lisp/jsonrpc.el#L785

I'm probably going to do something similar, and started this over here:

r0man/plz-media-type@349c8ef

@r0man
Copy link
Contributor Author

r0man commented Apr 4, 2024

@ahyatt what do you think how we should proceed on this? I see the following options:

  • We could move the plz-media-type and plz-event-source code into the llm library and continue development there. If we do that I would also like to add the testing code there, since this helps me a lot to make sure things don't break.
  • We keep the the plz-media-type and plz-event-source repositories and develop things there. From time to time we copy over the implementation files as we did now. The llm library would then ship those 2 files (or a subset) with it.
  • Publish the plz-media-type and plz-event-source packages to ELPA and let the llm library depend on it.
  • Something else?

@ahyatt
Copy link

ahyatt commented Apr 6, 2024

I think either your first or third option seems good to me. If we do the first one, we should call it the llm-media-type, llm-event-source, classes. It would be weird having plz features in the llm library.

If you think the llm library will be the only client, the first option is probably best. But the @josephmturner makes me think maybe there is at least one other client, in which case putting the package on ELPA is most appealing.

@josephmturner
Copy link
Contributor

@ahyatt I don't know of any other client. My suggestion is to let the libraries mature within llm for now so you can adjust the APIs to suit the needs of llm. You can always package and release them later.

@alphapapa
Copy link
Owner

FWIW, I would suggest developing the libraries within llm's repo for a while, and then considering whether they should be factored out for wider use.

Also, I would tend to agree that plz shouldn't be in the names, at least not yet. The more that the libraries wrap plz in their own APIs, the more the use of plz becomes an internal implementation detail.

@r0man

About the issue with the process filter. In the meantime I studied jsonrpc.el and saw it uses timers to move the message handling onto the Emacs main loop.

plz also does that:

plz.el/plz.el

Line 739 in 12f747c

(run-at-time 0 nil #'plz--respond process buffer status))))))

@r0man
Copy link
Contributor Author

r0man commented Apr 9, 2024

I think there is nothing more to be done here, except for the "filter option" PR to be merged and released once Craig responded.
Even though we had an intense discussion, everyone learned a bit about their user base. We will soon have more reliable HTTP connectivity with support for mutual TLS in llm and Ellama, powered by plz. Thanks for the help and support from everyone involved. Until next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants