Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING]: Change internal layers stack to be value-based instead of type-based #789

Merged
merged 19 commits into from
Mar 22, 2022

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Dec 15, 2021

This is very proof-of-concept currently, but wanted to put the code up
to get initial feedback/ideas. Here's the gist:

  • I define 3 abstract subtypes of Layer in the Layers module:
    InitialLayer, RequestLayer, and ConnectionLayer; these
    correspond to the types of arguments the layer would receive when
    overloading request; InitialLayer => request(method, url, headers, body), RequestLayer => request(url, req, body), and
    ConnectionLayer => request(io, req, body). I feel like these are
    useful abstract types to help distinguish the different kinds of
    layers based on the arguments they receive in different parts of the
    overall stack. I feel like it hopefully also solves the "how do I
    insert my layer in the appropriate level of the stack" because you
    just pick which abstract layer to subtype
  • Custom layers are then required to:
    • be concrete types
    • must subtype one of the 3 abstract layer types mentioned
    • must have a constructor of the form: `Layer(next::Layer; kw...)
    • must have a field to store the next::Layer argument
    • must overload: request(x::MyLayer, args...; kw...) where args
      will depend on which abstract layer MyLayer subtypes
    • in the overloaded request method, it must, at some point, call
      request(layer.next, args...; kw...) to move to the next layer in
      the stack
    • the final requirement is the custom layer must overload
      Layers.keywordforlayer(::Val{:kw}) = MyLayer where :kw is a
      keyword argument "hook" or trigger that will result in MyLayer
      being inserted into the stack

What I like about this approach so far:

  • It feels more straightforward to define my own custom layer: pick
    which level of the stack I want it in, subtype that abstract layer,
    register a keyword argument that will include my layer, then define my
    own request overload; we can define all the current HTTP layers in
    terms of this machinery, custom packages could define their own layers
    and they all are treated just like any HTTP.jl-baked layer
  • It also feels like it would be easier to create my own custom "http
    client"; what I mean by this is that I feel like it's common for API
    packages to basically have their own AWS.get/AWS.post/AWS.put
    wrapper methods that eventually call HTTP.get/HTTP.post/etc., but
    in between, they're adding custom headers, validating targets, and
    whatever. A lot of that is like a custom layer, but by having your
    own AWS.get/AWS.post wrapper methods, you can ensure certain
    layers are included; and with this approach that's straightforward
    because they can indeed make their own custom layer as described
    above, and then just ensure every HTTP.request call includes the new
    keyword argument to "hook in" their custom layer

Potential downsides:

  • Relying on a global keyword registry; i.e. there's potential for
    clashes if two different packages try to overload
    Layers.keywordforlayer(::Val{:readtimeout}) = .... Whichever package
    is loaded last will "pirate" the keyword arg and overwrite the
    previous method. We could potentially add some extra safety around
    this by having another clientenv=:default keyword and then devs
    could overload Layers.keywordforlayer(::Val{:default}, ::Val{:readtimeout}) = ... but that feels a tad icky. I also don't
    expect there to be even dozens of total keywords/layers to process, so
    with that in mind, it also lowers the risk of clash.
  • The previous layer functionality allowed specifying the exact
    layer you wanted your custom layer to be inserted before; do we think
    that level of specificity/granularity is really necessary? I'm kind of
    going on a gut feel here that what you really want control over is
    what kind of args your overloaded request method will receive and
    that's what my 3 new abstract layer subtypes aim to solve. So you can
    pick 1 of 3 different general layers of where your custom layer gets
    inserted, but don't have control over anything more specific in terms
    of what layer comes before/after
  • One thing I noticed is that the current stack function actually
    has some hard-coded logic that checks keyword arguments and
    conditionally will include layers based on the value of the
    keyword arg. We could potentially handle that in the proposed scheme
    by saying that your custom Layer(next; kw...) constructor can
    return the nothing value, in which case, no layer will be inserted.
    That would give, for example, the TimeoutLayer the chance to check
    if the readtimeout > 0 before returning a TimeoutLayer or
    nothing

src/HTTP.jl Outdated

function stacklayertypes(::Type{T}, layers; kw...) where {T}
for (k, _) in pairs(kw)
layer = Layers.keywordforlayer(Val(k))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understood the design here, but in order to insert a new layer you need to 1) overload this method, and 2) pass the matching kwarg?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the idea. The only "baked in" layers then are those where the keyword args included in this stack function call. I've also updated the mechanism (see below) so that if the layer constructor returns nothing it isn't included in the stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly strange that you need to do both of these things. If you need to modify the call to HTTP.request anyway (to add the kwarg) why not pass the layer directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be pretty inconvenient I think; it would be the difference between:

HTTP.get(...; retry=true) # includes the RetryLayer based on retry keyword "hook"

vs.

HTTP.get(...; layer=HTTP.RetryLayer(; retry=true))

There's awkwardness there because how do we "pass" the layer? We can't do retry=HTTP.RetryLayer() unless we loop through all keyword arguments and check if the value is a Layer or not. And then there's awkwardness because what if the layer author wants to use a keyword argument as an argument to their layer constructor? Then users have to be burdened with the responsibility of knowing whether to pass args/keyword args to the layer constructor or as a keyword arg to HTTP.request.

In my proposal, everything is keyword args and users don't have to be aware of layers at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but it becomes a bit of "keyword argument piracy". Right now AWS calls HTTP.request with their own stack I think? So in that case, even if I have loaded the aws package I won't accidentally use that layer. BrokenRecords.jl already does "layer piracy" I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cases like AWS though shouldn't need to provide an entire custom stack. They should be able to just define their own custom layer or two and then it all composes with the "base" stack in HTTP.jl. The problem with everyone providing their own stacks is there isn't any composability. There isn't a nice way to use AWS custom layer along with my own custom layer in my company's auth system.

I'm going to take some more time to study the AWS and BrokenRecord setups now that this PR is "done", because I want to make sure I understand those use-cases thoroughly and how they fit in this potential new paradigm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the current solution isn't ideal either, just not sure the kwargs based one is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back over the previous layer-manipulation code vs. the proposed here and there is a key important difference: the previous code led to the scenario where you were modifying a single, global, and I'll add, non-threadsafe, client request stack by inserting/removing a custom layer for a request. Ok, you weren't modifying the whole stack, but it was still a single global "extra custom layers" stack. That doesn't lend itself to concurrent code needs at all!

In my proposal there is no single global stack; but rather on a per request basis, and with the user-supplied keyword arguments, the stack is dynamically generated and used for the duration of that request, in isolation from any other callers.

I partly discovered this by coming back to this PR and wondering if we shouldn't just use a single global Dict to "register" keyword arguments instead of overloading the Layers.keywordforlayer with a Val, but then again, we could get into thread-safety issues if that keyword registry is being modified concurrently.

So while I still agree the Layers.keywordforlayer mechanism is somewhat ugly, it does have several advantages now that make it simpler in terms of implmentation.

@fredrikekre
Copy link
Member

The previous layer functionality allowed specifying the exact layer you wanted your custom layer to be inserted before; do we think that level of specificity/granularity is really necessary? [...]

I think it is nice that the order is deterministic at least.

@quinnj
Copy link
Member Author

quinnj commented Dec 15, 2021

@fredrikekre, the other consideration is I haven't looked into how exactly this interacts with #469; I admit to not fully remembering the motivation for that issue or if that can be folded into the changes here somehow.

@fredrikekre
Copy link
Member

The motivation was that #463 where calls to HTTP.request could unexpectedly dispatch directly to internal functions. Regardless of that, I think it makes sense to have a single entry point (HTTP.request) that is separated from the internal machiner (e.g. Layers.request or even HTTP._request or something like that).

@quinnj
Copy link
Member Author

quinnj commented Dec 15, 2021

Ok, sounds good. Do you mind if I hijack those changes and fold them into my PR here? It's pretty deeply tangled in the stuff I'm doing, so might as well?

@fredrikekre
Copy link
Member

Go ahead.

@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2021

Codecov Report

Merging #789 (3ea2c8b) into master (8743671) will decrease coverage by 0.01%.
The diff coverage is 87.13%.

@@            Coverage Diff             @@
##           master     #789      +/-   ##
==========================================
- Coverage   78.12%   78.10%   -0.02%     
==========================================
  Files          39       36       -3     
  Lines        2482     2430      -52     
==========================================
- Hits         1939     1898      -41     
+ Misses        543      532      -11     
Impacted Files Coverage Δ
src/sniff.jl 85.61% <ø> (ø)
src/ContentTypeRequest.jl 57.14% <57.14%> (+57.14%) ⬆️
src/HTTP.jl 83.05% <66.66%> (-14.83%) ⬇️
src/CanonicalizeRequest.jl 66.66% <75.00%> (+66.66%) ⬆️
src/DebugRequest.jl 80.00% <80.00%> (+80.00%) ⬆️
src/ConnectionRequest.jl 79.41% <81.81%> (+0.95%) ⬆️
src/CookieRequest.jl 91.30% <82.60%> (+5.59%) ⬆️
src/RedirectRequest.jl 83.33% <84.61%> (+5.20%) ⬆️
src/Messages.jl 86.52% <85.71%> (-2.45%) ⬇️
src/RetryRequest.jl 60.86% <92.85%> (+0.86%) ⬆️
... and 19 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@quinnj
Copy link
Member Author

quinnj commented Dec 16, 2021

Ok, quick update. This PR now includes the main changes from #469 since the 2 PRs basically touch all the same files/lines. I also went ahead and removed the AWSLayer (#486, cc: @mattBrzezinski ) since it was also intertwined with all the layer code updates.

I like how the code has shaped up; all the test suite passes locally for me, so let's see how CI does. My plan from here is to request feedback/review from folks (@christopher-dG, @mattBrzezinski, and @c42f), I'm going to take a deeper look at BrokenRecord.jl, which seems to be the biggest public package that heavily uses the current-master stack machinery, and then I also want to do a thorough update of the examples code and README where the custom layer/stack functionality are mentioned, so update all the docs around this proposed new stack architecture.

@quinnj
Copy link
Member Author

quinnj commented Mar 10, 2022

Ok, I've been mulling the entire client request mechanisms again and came up w/ a modified proposal from the one initially posted in this PR. I spent some time detailing everything in the new Layers module docs in this PR if you want to dive deep, but roughly, it works as follows:

  • Still have the new concept of layer "kinds": InitialLayer, RequestLayer, ConnectionLayer, and ResponseLayer
  • "Builtin" and custom layers choose one of those to subtype; must have constructor like CustomLayer(next; kw...) where next is the next layer in the stack, layer MUST have a field to store this next layer, MUST overload like Layers.request(layer::CustomLayer, args...), and must call Layers.request(layer.next, args...) at some point in their overload (you can look at each XRequest.jl file for how these requirements look for existing internal HTTP.jl layers)
  • Once a custom layer is properly defined, we really just need a way to "hook it" into the stack (this applies to custom layers; HTTP.jl builtin layers are hard-coded into the stack for performance); you could do it manually by calling something like
resp = HTTP.request(HTTP.stack(CustomLayer), "GET", "https://google.com")

But it obviously would get annoying having to include the HTTP.stack(CustomLayer) piece in every call.

  • A HTTP.@client CustomLayer macro is provided that allows quickly/easily making your own "custom" client that will include your CustomLayer in every call; so you would use HTTP.@client like:
module CustomClient

using HTTP
include("customlayer_definitions.jl")
HTTP.@client CustomLayer

end

and users could then easily tap into the new CustomLayer functionality by doing:

using CustomClient

resp = CustomClient.get("https://google.com")

Basically, HTTP.@client makes it really easy to "make your own client" by providing those common, user-facing methods that will just call back into the generic HTTP.request method and will include any custom layer automatically.

So notably this removes the Layers.keywordforlayer "registry" function I had proposed above and in place either requires passing a custom layer manually (to HTTP.stack and HTTP.request), or by creating your own little "client" that will wrap all the client calls w/ your custom layer.

So for AWS.jl, for example, they would, in theory, be able to define their own AWSLayer that did all the aws header computation stuff, and then they could defne HTTP.@client AWSLayer in their package and users could use the AWSLayer functionality by doing AWS.get, AWS.post, AWS.put, etc.

Similarly for BrokenRecord.jl, we can define the record/playback functionality in new custom Layers, and then make our own little client users could call to get the record/playback functionality, like BrokenRecord.get(...) (called 1st time, it actually makes request and stores response), BrokenRecord.get(...) called 2nd time w/ same params it finds a stored response and just deserializes response to return instead of making actual request.

The main downside I see of this approach is the single direction in which custom layers can compose w/ each other; i.e. you can't define CustomLayerA in PackageA.jl and CustomLayerB in PackageB.jl, load them both, and just "get" both custom layers automatically (like which was possible with the Layers.keywordforlayer approach). Instead, you'd have to make a "glue client" like HTTP.@client CustomLayerA CustomLayerB that included both custom layers in a "client" explicitly. The justification in my mind for giving up a little "composability" here is that frankly I just don't think you need to compose custom layers together that often if at all, and especially not in any kind of "automatic" or "magical" way. If I really do need the functionality from several disparate custom layers, then I've demonstrated how easy it is to wrap them all together in a little custom client that should serve your needs just fine.

@christopher-dG
Copy link
Member

So if I'm reading this right, anyone wanting to use BrokenRecord would need to update their source code to use BrokenRecord.get instead of HTTP.get? My ideal case is that the test dependency doesn't leak into the package the way Mocking.jl does.

@quinnj
Copy link
Member Author

quinnj commented Mar 10, 2022

So if I'm reading this right, anyone wanting to use BrokenRecord would need to update their source code to use BrokenRecord.get instead of HTTP.get? My ideal case is that the test dependency doesn't leak into the package the way Mocking.jl does.

Can you expound on what you mean by "doesn't leak into the package"? Perhaps share a small example of code and where it would live (either /src/ or /test)?

Part of the motivation in this latest proposal is to make it very explicit when custom layers are being integrated into the request stack. You can imagine this being a security issue in the future if someone were able to "inject" a malicious layer. In the latest proposal, layers must be explicitly included either via manual call to HTTP.stack or by using a custom client. I personally like the transparency that will come w/ this new mechanism.

@iamed2
Copy link
Contributor

iamed2 commented Mar 10, 2022

You can imagine this being a security issue in the future if someone were able to "inject" a malicious layer.

I believe this kind of injection would be what @christopher-dG wants to do with BrokenRecord.jl, where it would be possible to monitor a package using HTTP.get without that package having to depend on BrokenRecord.jl.

@quinnj
Copy link
Member Author

quinnj commented Mar 11, 2022

I believe this kind of injection would be what @christopher-dG wants to do with BrokenRecord.jl, where it would be possible to monitor a package using HTTP.get without that package having to depend on BrokenRecord.jl.

I just don't follow what this is supposed to mean. Monitoring a package without that package taking a dependency sounds......malicious? Undesirable? Wrong? What am I missing here.

@iamed2
Copy link
Contributor

iamed2 commented Mar 13, 2022

I just don't follow what this is supposed to mean. Monitoring a package without that package taking a dependency sounds......malicious? Undesirable? Wrong? What am I missing here.

Difference of philosophy? This is very common in dynamic languages, it's just monkey patching. But this BrokenRecord package would likely not be part of a typical dependency tree, it would sit as a test dependency. Compare https://github.com/vcr/vcr and (coincidentally?) Cassette.jl. AWS also provides a Python tool that patches requests and httplib to add X-Ray support, which is pretty similar to this.

An actually malicious Julia package could pretty easily replace all code in another package anyway.

@clarkevans
Copy link
Contributor

In this proposal would "body" be an IOStream? In this way, a client or server could process data incrementally.

@clarkevans
Copy link
Contributor

I'd shy away from naming filters in each layer by keyword and just have the programmer manually specify the stack. If one goes with kw based naming of filters within a given layer, one probably has to specify a way for one filter to mark that it comes before/after another. For example, server side, some filters in the response layer may require that the user is already authenticated, and hence, those layers may want to have a "after=auth" mark, so that they are invoked after authorization.

This is very proof-of-concept currently, but wanted to put the code up
to get initial feedback/ideas. Here's the gist:

  * I define 3 abstract subtypes of `Layer` in the Layers module:
  `InitialLayer`, `RequestLayer`, and `ConnectionLayer`; these
  correspond to the types of arguments the layer would receive when
  overloading `request`; `InitialLayer` => `request(method, url,
  headers, body), `RequestLayer` => `request(url, req, body)`, and
  `ConnectionLayer` => `request(io, req, body)`. I feel like these are
  useful abstract types to help distinguish the different _kinds_ of
  layers based on the arguments they receive in different parts of the
  overall stack. I feel like it hopefully also solves the "how do I
  insert my layer in the appropriate level of the stack" because you
  just pick which abstract layer to subtype
  * Custom layers are then required to:
    * be concrete types
    * must subtype one of the 3 abstract layer types mentioned
    * must have a constructor of the form: `Layer(next::Layer; kw...)
    * must have a field to store the `next::Layer` argument
    * must overload: `request(x::MyLayer, args...; kw...)` where `args`
    will depend on which abstract layer `MyLayer` subtypes
    * in the overloaded `request` method, it must, at some point, call
    `request(layer.next, args...; kw...)` to move to the next layer in
    the stack
    * the final requirement is the custom layer must overload
    `Layers.keywordforlayer(::Val{:kw}) = MyLayer` where `:kw` is a
    keyword argument "hook" or trigger that will result in `MyLayer`
    being inserted into the stack

What I like about this approach so far:
  * It feels more straightforward to define my own custom layer: pick
  which level of the stack I want it in, subtype that abstract layer,
  register a keyword argument that will include my layer, then define my
  own `request` overload; we can define all the current HTTP layers in
  terms of this machinery, custom packages could define their own layers
  and they all are treated just like any HTTP.jl-baked layer
  * It also feels like it would be easier to create my own custom "http
  client"; what I mean by this is that I feel like it's common for API
  packages to basically have their own `AWS.get`/`AWS.post`/`AWS.put`
  wrapper methods that eventually call `HTTP.get`/`HTTP.post`/etc., but
  in between, they're adding custom headers, validating targets, and
  whatever. A lot of that is _like_ a custom layer, but by having your
  own `AWS.get`/`AWS.post` wrapper methods, you can _ensure_ certain
  layers are included; and with this approach that's straightforward
  because they can indeed make their own custom layer as described
  above, and then just ensure every `HTTP.request` call includes the new
  keyword argument to "hook in" their custom layer

Potential downsides:
  * Relying on a global keyword registry; i.e. there's potential for
  clashes if two different packages try to overload
  `Layers.keywordforlayer(::Val{:readtimeout}) = ...`. Whichever package
  is loaded last will "pirate" the keyword arg and overwrite the
  previous method. We could potentially add some extra safety around
  this by having another `clientenv=:default` keyword and then devs
  could overload `Layers.keywordforlayer(::Val{:default},
  ::Val{:readtimeout}) = ...` but that feels a tad icky. I also don't
  expect there to be even dozens of total keywords/layers to process, so
  with that in mind, it also lowers the risk of clash.
  * The previous layer functionality allowed specifying the _exact_
  layer you wanted your custom layer to be inserted before; do we think
  that level of specificity/granularity is really necessary? I'm kind of
  going on a gut feel here that what you really want control over is
  _what kind of args_ your overloaded `request` method will receive and
  that's what my 3 new abstract layer subtypes aim to solve. So you can
  pick 1 of 3 different _general_ layers of where your custom layer gets
  inserted, but don't have control over anything more specific in terms
  of what layer comes before/after
  * One thing I noticed is that the current `stack` function actually
  has some hard-coded logic that checks keyword arguments and
  _conditionally_ will include layers based on the _value_ of the
  keyword arg. We could potentially handle that in the proposed scheme
  by saying that your custom `Layer(next; kw...)` constructor can
  return the `nothing` value, in which case, no layer will be inserted.
  That would give, for example, the `TimeoutLayer` the chance to check
  if the `readtimeout > 0` before returning a `TimeoutLayer` or
  `nothing`
drastically reduce the interface requirements
@c42f
Copy link
Contributor

c42f commented Mar 18, 2022

I just don't follow what this is supposed to mean. Monitoring a package without that package taking a dependency sounds......malicious? Undesirable? Wrong? What am I missing here.

It's about composability.

Suppose you're using BrokenRecord to generate test cases for SomeApplication. Within SomeApplication you make some requests directly using HTTP, but also use several third-party modules OtherModule1 OtherModule2 etc which also make HTTP requests.

The proposal with HTTP.@client yields a customized HTTP client which is lexically scoped within the module of SomeApplication. So it's fairly easy to add BrokenRecord as a dependency within SomeApplication itself. Though I think undesirable if it can't be made optional — BrokenRecord is a package for testing and you'd rather not load the code in production.

The bigger problem is capturing HTTP requests from OtherModule1 and OtherModule2. If these other modules use a separate HTTP.@client which is module-scoped it's impossible to inject the BrokenRecord dependency.

If we removed the default global HTTP client entirely and encouraged all libraries to accept a HTTP client context as a parameter this would "fix" the problem. But obviously that's intrusive on APIs using HTTP anywhere.

An analogy for the dependency injection which is desired here is logging: it's nice to have a logging frontend which is independent of the logging backends. Authors of various modules can independently use @info without caring about the fact that some users want to send their logs to the console, while other users want to send them to a file and others want to store them on AWS.

end
end
if !isempty(cookiestosend)
existingcookie = header(req.headers, "Cookie")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is net-new code; I realized our test here was wrong because by default cookies=false and so that request wasn't actually deleting any cookies, but just didn't include them at all. This is apparent because it should only be deleting the single hey cookie, yet the test was testing that there were no cookies.

The solution here is that we need to track when we automatically include cookies on a request, because in the case of a redirect, the initial response may return a set-cookie that expires or otherwise modifies a cookie we've already added to the request. When redirecting, we re-use the request headers that were set, but for cookies, we want to filter out any cookies that were automatically included on the 1st request and just let the "auto" logic do its thing by including them again if they're applicable. This lets cookies that have expired by removed and not sent in subsequent redirect requests.

It's unfortunate it's kind of lost in such a big refactor PR, but oh well.

@@ -0,0 +1,62 @@
module DefaultHeadersRequest
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new "layer" that splits some of the logic out from MessageRequest.jl layer to a separate layer since the MessageRequest.jl layer is now purely for constructing the Request object at the start of the stack

const STREAM_LAYERS = [timeoutlayer, exceptionlayer, debuglayer]
const REQUEST_LAYERS = [messagelayer, redirectlayer, defaultheaderslayer, basicauthlayer, contenttypedetectionlayer, cookielayer, retrylayer, canonicalizelayer]

pushlayer!(layer; request::Bool=true) = push!(request ? REQUEST_LAYERS : STREAM_LAYERS, layer)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopher-dG, @c42f, here's the code for how I'm imagining we can do global stack modifications in BrokenRecord and elsewhere. Basically we have our 2 arrays of layer types above and we can document these pushlayer!/poplayer! functions as the way to add/remove layers globally.

end

request(::Type{Union{}}, resp::Response) = resp
macro client(requestlayers, streamlayers=[])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New lexically-scoped client macro generator that provides the most common/convenient user-facing functions that can be defined in your own module.

@@ -10,25 +10,37 @@ module IOExtras
using ..Sockets
using MbedTLS: MbedException

export bytes, ByteView, nobytes, CodeUnits, IOError, isioerror,
export bytes, isbytes, nbytes, ByteView, nobytes, IOError, isioerror,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here are a consolidation of the previous bodybytes/bytes and bodylength functions into a single place. There was some unfortunate mixups of when to use bodybytes vs bytes and we don't really need both.


"""
request(MessageLayer, method, ::URI, headers, body) -> HTTP.Response
messagelayer(method, ::URI, headers, body) -> HTTP.Response
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now require messagelayer to be the 1st/outermost layer in our stack as it translates the method, url, headers, and body into the Request object that all other layers consume.

@@ -103,26 +105,25 @@ Represents a HTTP Response Message.
You can get each data with [`HTTP.status`](@ref), [`HTTP.headers`](@ref), and [`HTTP.body`](@ref).

"""
mutable struct Response <: Message
mutable struct Response{T} <: Message
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Response and Request objects are now both parameterized on their body fields. We previously had some slight awkwardness of having body be hard-coded as Vector{UInt8}, but then needing to pass a separate body argument to all layer functions in case the body was an IO or some other supported format. We now just parameterize body on whatever the user passes and can remove the body_was_streamed business and extra argument passing.

@@ -140,19 +141,19 @@ HTTP.Response(200, headers; body = "Hello")
Response() = Request().response

Response(s::Int, body::AbstractVector{UInt8}) = Response(s; body=body)
Response(s::Int, body::AbstractString) = Response(s, bytes(body))
Response(s::Int, body::AbstractString) = Response(s; body=bytes(body))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this and the method just below were just wrong by not passing body as a keyword argument

response::Response
txcount::Int
parent
url::URI
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add a new url field because there are many times throughout the client layers where we want access to the original url provided by the user and not just the resource(url) that we store as the target field.

parent
url::URI
parent::Union{Response, Nothing}
context::Context
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add a new context field that is a Dict{Symbol, Any} that allows some state sharing between layers. It's not used a ton, but it makes a more streamlined/formal way layers can share state if needed instead of relying on some of the other hacks we were doing before (passing state via keyword arguments, or having the txcount field on the Request object, requiring extra positional args to layers, etc.).

verbose::Int=0,
kw...)::Response where Next

function streamlayer(stream::Stream; iofunction=nothing, verbose=0, redirect_limit::Int=3, kw...)::Response
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is our only client-side "handler" function that is not a middleware; i.e. it actually takes the stream, sends the request and reads the response. So all other client layers can be seen as "wrapping" this layer that does the actual send/receive operation.

@c42f
Copy link
Contributor

c42f commented Mar 18, 2022

Cool I haven't yet dived into the details here. But the layers you're suggesting remind me of Mux.jl's middleware. I think this can be powerful (IIUC express.js uses a similar design too?).

So this is interesting and I'll try to look a bit closer at it soon.

I do think closures can be a nice way to define these things, but I'd say that in practice I've found Mux pretty annoying to use! Some issues I've observed:

  • Mux tries to do everything by stacking middleware. There's some nice minimalism to this, but in practice it leads to problems.
  • For example, Mux's routing wants you to add a middleware layer for each route... this leads to really awful stack traces!
  • Having a big stack of closures as data rather than generic functions confuses Revise when the stack is persistent (common in the server case)

I'm not saying these things apply to the design here which I think is a little different from Mux (?) But they're aspects to be aware of: do we want to optimize so that it's easiest to define new middleware vs reuse existing middleware?

@quinnj
Copy link
Member Author

quinnj commented Mar 21, 2022

Ok, I'd like to merge this soon, so I'm going to throw out a: merge-in-24-hours call to any potential reviewers. I think I've addressed everything from previous reviews, and provided a necessary "escape hatch" for the global stack manipulation if needed. I'm punting on the docs for now, because I'm already sketching out a bigger docs overhaul to do before the next release.

Copy link
Member

@mattBrzezinski mattBrzezinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some formatting comments, nothing too in-depth.

import ..@debug, ..DEBUG_LEVEL

abstract type ContentTypeDetectionLayer{Next <: Layer} <: Layer{Next} end
export ContentTypeDetectionLayer
export contenttypedetectionlayer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these layer names are getting a bit lengthy to read without some way of breaking apart words such as,

content_type_detection_layer

src/DefaultHeadersRequest.jl Outdated Show resolved Hide resolved
Comment on lines +33 to +34
l = nbytes(req.body)
if l !== nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
l = nbytes(req.body)
if l !== nothing
if nbytes(req.body) !== nothing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the l variable in the next line; but this is probably due to being a really poor variable name more than anything.

end
end

const USER_AGENT = Ref{Union{String, Nothing}}("HTTP.jl/$VERSION")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const definitions under using/import?

@@ -74,6 +75,7 @@ import ..bytes

include("ascii.jl")

const nobody = UInt8[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercase to match style in other files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is how its been defined for ages; but yeah, we should probably go through and update it.

test/chunking.jl Outdated Show resolved Hide resolved
test/client.jl Outdated Show resolved Hide resolved
test/cookies.jl Outdated Show resolved Hide resolved
test/loopback.jl Outdated Show resolved Hide resolved
@debug 1 "Adding Authorization: Basic header."
setkv(headers, "Authorization", "Basic $(base64encode(userinfo))")
function basicauthlayer(handler)
return function(req; basicauth::Bool=true, kw...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the stack traces look like now when everything are anonymous functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also would it improve stacktraces if these were named functions?

e.g.

Suggested change
return function(req; basicauth::Bool=true, kw...)
return function basicauthhandler(req; basicauth::Bool=true, kw...)

so they have names like (::var"#basicauthhandler#5") (generic function with 1 method)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backtraces are as legible as they used to be: the use of closures means we still see a nested list of the layers, and the use of a module per layer means its pretty clear where the layer is coming from.

image

quinnj and others added 5 commits March 22, 2022 09:31
Co-authored-by: mattBrzezinski <matt.brzezinski@invenia.ca>
Co-authored-by: mattBrzezinski <matt.brzezinski@invenia.ca>
Co-authored-by: mattBrzezinski <matt.brzezinski@invenia.ca>
Co-authored-by: mattBrzezinski <matt.brzezinski@invenia.ca>
Co-authored-by: mattBrzezinski <matt.brzezinski@invenia.ca>
@quinnj
Copy link
Member Author

quinnj commented Mar 22, 2022

Hmmm, this docs deploy error seems to be some kind of unrelated errro?

Failed to add the RSA host key for IP address '140.82.112.3' to the list of known hosts (/home/runner/.ssh/known_hosts).

@quinnj quinnj merged commit 4b3c633 into master Mar 22, 2022
@quinnj quinnj deleted the jq/value_layers branch March 22, 2022 16:26
sjkelly added a commit to sjkelly/HTTP.jl that referenced this pull request Feb 14, 2023
Seems to have been removed in: JuliaWeb#789
and is no longer needed.
quinnj pushed a commit that referenced this pull request Feb 15, 2023
Seems to have been removed in: #789
and is no longer needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants