
[WIP] General batch endpoint #1645

Open · wants to merge 3 commits into base: master

6 participants
@Toflar
Contributor

commented Jan 12, 2018

| Q | A |
| --- | --- |
| Bug fix? | no |
| New feature? | yes |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | TODO |
| Fixed tickets | |
| License | MIT |
| Doc PR | TODO |

One of the major disadvantages of REST is that under certain circumstances you need a lot of requests to reach a desired state; whether that is for data retrieval or data creation does not really matter. For data retrieval, I think you should consider using GraphQL, which we already support (awesome!).
This PR is about improving batch creation of new data via REST.
Imagine you have a quiz and you need to store 50 answers to it. In my case that would be 50 requests, because an answer is a resource.

So I started thinking about what my batch endpoint should look like. Then, as I was thinking about the concept, I figured I could maybe build a general solution, so I quickly put together this WIP to outline the idea and allow for discussion 😄

It's far from complete, but here's the idea:
There's a general endpoint /batch that accepts POST requests. The payload is an array representing single requests, as if they had been executed one at a time. Example:

POST /batch HTTP/1.1
Accept: application/json
Content-Type: application/json
Content-Length: 201

[
{
  "path": "/quizanswer",
  "method": "POST",
  "body": "..."
},
{
  "path": "/quizanswer",
  "method": "POST",
  "body": "..."
},
{
  "path": "/quizanswer",
  "method": "POST",
  "body": "..."
}
]

If you want, you can also add headers, which by default are the same as the ones from the master request (otherwise you would have to provide Content-Type etc. for every single entry).
The special batch endpoint then takes every single entry and executes a sub-request for it. So internally they are handled exactly as if you had executed 50 isolated requests one after the other, but from the outside it's just one request. Neat, no? 😄
As a response you get an array with the most important data from each of the requests.
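For instance, the response for the three sub-requests above might look like this (a sketch only; the exact fields are not settled in this PR, and status/headers/body as "the most important data" is an assumption):

```json
[
  {"status": 201, "headers": {"Location": "/quizanswer/1"}, "body": "..."},
  {"status": 201, "headers": {"Location": "/quizanswer/2"}, "body": "..."},
  {"status": 201, "headers": {"Location": "/quizanswer/3"}, "body": "..."}
]
```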

This allows you to batch-create any data. You can even combine different endpoints in the same request! It's a general-purpose API endpoint bringing maximum flexibility to API Platform. I would not recommend mixing multiple resources in one batch request, but I think we should leave that decision to the developers.

Here are a few things to consider, todos, or just remarks:

  • Agree on the concept. Do you like it? Does it make sense to you?
  • Unit tests, once agreed on concept.
  • Make /batch path configurable, do you even like the name?
  • Is an array with path, method etc. too cumbersome? Better ideas? I thought about just providing the requests in raw HTTP format (so POST /quizanswer....) but that would be a lot of repetition for e.g. Host: .. which is useless in that case anyway.
  • Should we use 207 as the status code for that even though 207 is WebDAV? But Multi-Status is somehow correct. Stick to http://www.odata.org/documentation/odata-version-3-0/batch-processing, so we use 202 Accepted.
  • Swagger integration
  • more?
@dunglas

Member

commented Jan 13, 2018

Hi,

This is a very good idea and something we definitely need. Actually, this has been on my todo list for a while, and it's awesome that you started working on this topic! Thank you.

I had a slightly different implementation in mind. There is a specification for batch endpoints in the OData protocol that is well designed: http://www.odata.org/documentation/odata-version-3-0/batch-processing/

The main idea is to send a multipart HTTP request to the batch endpoint that contains inner HTTP requests. It fits very well with the architecture of API Platform and Symfony, and allows us to use 100% of the existing features of the framework (including content negotiation).

It can be implemented this way:

  • A /$batch endpoint will be called by the client and will include all inner requests in multipart; the boundaries will be used as identifiers
  • The related Symfony controller will parse the outer multipart request to get the raw content of every embedded HTTP request
  • It will convert the raw content of the request to a Symfony Request instance
  • A Symfony subrequest will be executed for every inner HTTP request (as in your current implementation)
  • The batch controller will generate a multipart response containing all sub-responses (associated with the original ID sent in the boundary by the client, so the client will be able to match each response with its original request)
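As a sketch of the OData style (simplified; the full spec additionally wraps data-modifying requests in a changeset, and boundary names, paths, and bodies here are illustrative):

```http
POST /$batch HTTP/1.1
Content-Type: multipart/mixed; boundary=batch_36522ad7

--batch_36522ad7
Content-Type: application/http
Content-ID: 1

POST /quizanswer HTTP/1.1
Content-Type: application/json

{"answer": "..."}
--batch_36522ad7
Content-Type: application/http
Content-ID: 2

POST /quizanswer HTTP/1.1
Content-Type: application/json

{"answer": "..."}
--batch_36522ad7--
```

The Content-ID values play the identifier role: the server echoes them in the multipart response so the client can match each sub-response to its sub-request.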

OData also defines a way to use newly created entities in other requests of the same batch; it would be nice to support this as a second step.

There is almost everything we need in Symfony and API Platform to implement this; the only big missing part is... an HTTP parser. AFAIK, there is no PHP function nor popular library to parse a raw HTTP request. Symfony's HttpFoundation directly uses the PHP superglobals (Request::fromGlobals).
We will need to create a new class to convert a raw HTTP request to superglobal-like arrays, then pass these arrays to the Request constructor.
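To make that conversion step concrete, here is a minimal sketch in Python (for illustration only; the actual implementation would be PHP targeting Symfony's Request, and the function and field names here are made up):

```python
def parse_raw_http_request(raw: str) -> dict:
    """Split a raw HTTP request into the pieces a Request object needs."""
    # Head and body are separated by an empty line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: METHOD PATH VERSION
    method, path, version = lines[0].split(" ", 2)
    # Header lines: "Name: value" (folded/continued headers not handled here)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}
```

In PHP, the resulting pieces would then be mapped to the superglobal-like arrays that the Request constructor expects.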

WDYT @Toflar?

@dunglas

Member

commented Jan 13, 2018

Regarding your specific use case, you can already use the "embedded relations" feature of API Platform like this:

  • Add an explicit quiz -> answer relation
  • Use serialization groups to allow embedding answer objects when updating a quiz resource
  • Send all the answers this way

API Platform will already take care of updating existing answers, and creating the new ones.

A batch endpoint will only be useful when reading or updating resources with no relations at all between them.

@Toflar

Contributor Author

commented Jan 14, 2018

Hey, thanks for the feedback!

Basically, what OData does is what I meant with:

Is an array with path, method etc. too cumbersome? Better ideas? I thought about just providing the requests in HTTP format (so POST /quizanswer....) but that would be a lot of repetition for e.g. Host: .. which is useless in that case anyway.

I thought about it over the weekend, and I have to admit that I don't fancy their solution. It is maybe the cleaner solution from a protocol perspective, yes. However, to me it's way too complicated to use. It would also require the client side to be able to build and parse HTTP multipart requests/responses, not just API Platform on the server side. If you're, for example, using the API from within some JS with JSON, everything you need is natively supported by JS and thus in any browser. But for batch requests you suddenly need an HTTP parser, and you need to understand HTTP multipart requests with boundaries, etc. I think acceptance of such an endpoint would be very low compared to a simple array of requests provided in the same data format you already work with for the rest of the API.

So in short: what OData does is exactly the same as my proposal, it just makes use of what the HTTP spec allows. Normally I'd go for the spec, but in this case I would go for the more straightforward solution because I'm pretty much convinced that acceptance would be way higher. And after all, that's imho what an API is all about: making it easy for other developers to use and get the job done 😄

Wdyt?

BTW: Thank you for the hint regarding the relation, I might use that too, awesome! 👍

@dunglas

Member

commented Jan 15, 2018

There is an existing JS implementation for OData: https://stackoverflow.com/a/23805287/1352334
It's just a few lines of code, and we can easily provide our own JS library if we want to.

It looks cleaner to me to stick with HTTP (and reuse Microsoft's R&D work) than to recreate a custom protocol.

@Toflar

Contributor Author

commented Jan 15, 2018

Okay, fine with me! What about https://github.com/Riverline/multipart-parser?

@Koc


commented Jan 18, 2018

IMHO it is better to solve this problem using custom DTOs.

@Toflar

Contributor Author

commented Jun 14, 2018

I've started to work on a library to parse HTTP requests. It's WIP but should already get us pretty far. See https://github.com/Toflar/http-request-parser 😄

@soyuka referenced this pull request Jul 5, 2018

Open

POST collection #747

@asimonf


commented Sep 12, 2018

Is this already decided? By the way, how would one go about making the batch operation atomic the OData way?

@dunglas

Member

commented Sep 12, 2018

@asimonf can you elaborate a bit?

@bendavies

Contributor

commented Jan 28, 2019

any interest in progressing this?

@Toflar

Contributor Author

commented Jan 28, 2019

Not from my side atm; happy to help finish and release a stable version of the parser in case it's needed, though.

@asimonf


commented Jan 28, 2019

@dunglas sorry it took so long to answer. I actually missed the notification. What I meant was: suppose a batch using something like OData were to be implemented, would there be any way to make such an operation atomic? As in, if one of the requests fails, all of them must fail?

@Toflar

Contributor Author

commented Apr 6, 2019

As Symfony now has a MIME component, I wanted to see how we can take advantage of core development here, or contribute to make that possible. I discussed this briefly with @fabpot at the EUFOSSA Hackathon. Just writing this down here for future reference; I hope I got everything right 😄 :

So currently the MIME component supports generating MIME messages but not the other way around (no parser, yet). Moreover, multipart handling is the same for both HTTP and MIME, so the logic can be shared there. Parsing HTTP is in fact pretty straightforward, but for e-mails it's a lot harder due to encoding, maximum line lengths, and a lot more.
But having a parser in the MIME component itself might be very interesting because it would also allow serializing and unserializing MIME messages (e.g. for the Messenger component).
So any effort in creating a multipart-capable HTTP parser should be directed at the MIME component.
