[epic] [design] Custom transfer mode for multipart uploads #11

Closed

shevron opened this issue Apr 30, 2020 · 8 comments
Labels: design, enhancement

shevron commented Apr 30, 2020

This is a container ticket to discuss the design of a custom transfer adapter supporting multipart upload. This is not part of the official git-lfs spec, but it would be extremely valuable to us; if it works, it could be used by custom git-lfs clients and eventually proposed as an addition to the LFS protocol.

Goal

Spec a transfer protocol that will allow uploading files in parts to a storage backend, focusing on cloud storage services such as S3 and Azure Blobs.

Design goals:

Must:

  • Abstract vendor-specific APIs and flows into a generic protocol
  • Remain as close as possible to the basic transfer API
  • Work at least with the multipart APIs of S3 and Azure Blobs, and with local storage

Nice / Should:

  • Define how uploads can be resumed by re-uploading failed parts while skipping parts that were already uploaded successfully (this may be vendor-specific and not always supported)

Initial Protocol design

  • The name of the transfer is multipart-basic
  • {"operation": "download"} requests work exactly like basic download request with no change
  • {"operation": "upload"} requests will break the upload into several actions:
    • init (optional), a request to initialize the upload
    • parts (optional), zero or more part upload requests
    • commit (optional), a request to finalize the upload
    • verify (optional), a request to verify the file is in storage, similar to basic upload verify actions
  • Just like basic transfers, if the file fully exists and is committed to storage, no actions will be provided and the upload can simply be skipped
  • Requests are the same as basic requests except that {"transfers": ["multipart-basic", "basic"]} is the expected transfers value.
  • Authentication and authorization behave just like with the basic protocol

Request Objects

The init, commit and each one of the parts actions contain a "request spec". These are similar to basic transfer adapter actions, but in addition to href and header they also include optional method and body attributes indicating the HTTP request method and body. This allows the protocol to be vendor agnostic, especially as the format of init and commit requests tends to vary greatly between storage backends.

The default values for these fields depend on the action:

  • init defaults to no body and POST method
  • commit defaults to no body and POST method
  • parts requests default to PUT method and should include the file part as body, just like with basic transfer adapters.

In addition, each parts request will include a pos attribute to indicate the byte position within the file at which the part begins, and a size attribute to indicate the part size in bytes. If pos is omitted, it defaults to 0. If size is omitted, the client reads until the end of the file.
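
For illustration, a minimal client-side sketch (Python) of reading the bytes of a single part given these defaults (the read_part name is mine, not part of the protocol):

import os

def read_part(path, pos=0, size=None):
    # pos defaults to 0; size defaults to "read until the end of the file"
    if size is None:
        size = os.path.getsize(path) - pos
    with open(path, "rb") as f:
        f.seek(pos)
        return f.read(size)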

Examples

Sample Upload Request

The following is an upload request for a ~10 MB file:

{ 
  "transfers": ["multipart-basic", "basic"],
  "operation": "upload",
  "ref": "some-ref",
  "objects": [
    {
      "oid": "20492a4d0d84f8beb1767f6616229f85d44c2827b64bdbfb260ee12fa1109e0e",
      "size": 10000000
    }
  ]
}

Sample Upload Response:

The following is a response for the same request, given an imagined storage backend:

{
  "transfer": "multipart-basic",
  "objects": [
    {
      "oid": "20492a4d0d84f8beb1767f6616229f85d44c2827b64bdbfb260ee12fa1109e0e",
      "size": 10000000,
      "actions": {
        "parts": [
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=0",
            "header": {
              "Authorization": "Bearer someauthorizationtokenwillbesethere"
            },
            "pos": 0,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=1",
            "header": {
              "Authorization": "Bearer someauthorizationtokenwillbesethere"
            },
            "pos": 2500001,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=2",
            "header": {
              "Authorization": "Bearer someauthorizationtokenwillbesethere"
            },
            "pos": 5000001,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
            "header": {
              "Authorization": "Bearer someauthorizationtokenwillbesethere"
            },
            "pos": 7500001
          }
        ],
        "commit": {
          "href": "https://lfs.mycompany.com/myorg/myrepo/multipart/commit",
          "authenticated": true,
          "header": {
            "Authorization": "Basic 123abc123abc123abc123abc123=",
            "Content-type": "application/vnd.git-lfs+json"
          },
          "body": "{\"oid\": \"20492a4d0d84\", \"size\": 10000000, \"parts\": 4, \"transferId\": \"foobarbazbaz\"}"
        },
        "verify": {
          "href": "https://lfs.mycompany.com/myorg/myrepo/multipart/verify",
          "authenticated": true,
          "header": {
            "Authorization": "Basic 123abc123abc123abc123abc123="
          }
        }
      }
    }
  ]
}

As you can see, the init action is omitted, as will be the case with many backend implementations (we assume that initialization, if needed, will most likely be done by the LFS server at the time of the batch request).
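
For illustration, a rough client-side sketch of the upload flow implied by this response (Python with the requests library; it reuses the read_part helper sketched earlier, and the upload_object name and error handling are mine, not part of the protocol):

import requests

def upload_object(path, oid, size, actions):
    # Upload each part directly to the storage backend
    for part in actions.get("parts", []):
        data = read_part(path, part.get("pos", 0), part.get("size"))
        resp = requests.put(part["href"], data=data, headers=part.get("header", {}))
        resp.raise_for_status()

    # Ask the server-provided commit endpoint to finalize the upload
    if "commit" in actions:
        commit = actions["commit"]
        resp = requests.request(
            commit.get("method", "POST"),
            commit["href"],
            data=commit.get("body"),
            headers=commit.get("header", {}),
        )
        resp.raise_for_status()

    # Verify the object is now in storage, as in basic transfers
    if "verify" in actions:
        verify = actions["verify"]
        resp = requests.post(
            verify["href"],
            json={"oid": oid, "size": size},
            headers=verify.get("header", {}),
        )
        resp.raise_for_status()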

Chunk sizes

It is up to the LFS server to decide the size of each file chunk.

TBD: Should we allow clients to request a chunk size? Is there reason for that?
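
Either way, here is a minimal sketch of how a server could split an object into (pos, size) pairs for a chunk size it has chosen (the make_parts helper and the 2,500,000-byte default are illustrative, chosen to match the example above):

def make_parts(size, chunk_size=2_500_000):
    # Compute the pos/size attributes for each part of a `size`-byte object
    parts = []
    pos = 0
    while pos < size:
        parts.append({"pos": pos, "size": min(chunk_size, size - pos)})
        pos += chunk_size
    return parts

# make_parts(10_000_000) yields parts at pos 0, 2500000, 5000000 and 7500000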

shevron added the enhancement and design labels on Apr 30, 2020

shevron commented Apr 30, 2020

Problem: abstracting vendor-specific multipart APIs that require state and capturing response data

Cloud vendors have complex, hard-to-abstract requirements for multipart uploads. For example, S3 expects the client to retain the ETag header returned for each part and then send them all, encapsulated in a specific XML envelope, to S3 upon completion.

Needless to say, the Azure BlockBlob chunked upload API is quite different.

Q: Does this mean there can't be a generic multi-part upload transfer adapter that can handle multiple vendors?

A: It means that most of the init / commit logic should be handled on the LFS server side, not by the client. As long as the client is the one directly pushing the actual data blocks (all the parts requests), I still believe there is great benefit in defining this as a transfer protocol. The server will just have to handle vendor-specific logic in init / commit.

Q: Should we still allow custom init and commit actions?

A: commit - for sure, it has to be sent somehow by the client, even if the URL is an LFS server URL. init - not sure, but most likely yes; defining it as part of the protocol can help support additional storage backends in the future.

Q: Should we still support allowing custom HTTP method and body in actions?

A: This may be a YAGNI thing. On the other hand, I find this to be lacking from the basic transfer protocol, and it is required to support a really flexible array of vendors. I am on the fence on this one.

Wait, so how do we solve the S3 ETag capturing problem?

We can let the client upload all the parts without capturing the ETag values; then, when the client calls the server to commit, the server issues a ListParts call to get all the uploaded parts and their ETags (aside: this could also be used in init to identify missing parts of a pre-existing object?), and finally sends a CompleteMultipartUpload request with the ETag values to finalize.
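
For illustration, a rough sketch of what the server-side commit could look like with boto3 (bucket, key and upload-ID arguments are placeholders; pagination of ListParts is ignored):

import boto3

s3 = boto3.client("s3")

def commit_s3_multipart(bucket, key, upload_id):
    # List the parts S3 has received, together with their ETags,
    # so the client never has to capture or retain them
    listed = s3.list_parts(Bucket=bucket, Key=key, UploadId=upload_id)
    parts = [
        {"PartNumber": p["PartNumber"], "ETag": p["ETag"]}
        for p in listed.get("Parts", [])
    ]
    # Finalize the upload with the collected ETags
    s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )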

We should further investigate this approach by:

  1. Testing it
  2. Seeing what other vendors (namely Azure and GCP) expect and how their APIs compare
  3. Thinking about what this means in terms of security and data consistency

Thought: we should define a response to the commit action in the protocol that says something like "you still have parts to upload" or "the data is not consistent yet". Most likely, this would be a 422 response with a specific message telling the client to call the batch API again; if the problem is that some parts are missing, the batch reply should be a new upload reply with only the missing parts listed under parts.
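
A sketch of how the commit handler could detect this situation for S3, assuming it knows how many parts were expected (the find_missing_parts helper is hypothetical):

def find_missing_parts(s3_client, bucket, key, upload_id, expected_parts):
    # Part numbers S3 has actually received for this upload
    listed = s3_client.list_parts(Bucket=bucket, Key=key, UploadId=upload_id)
    uploaded = {p["PartNumber"] for p in listed.get("Parts", [])}
    # Anything missing means: respond with 422 and let the client
    # re-run the batch request to get actions for just these parts
    return sorted(set(range(1, expected_parts + 1)) - uploaded)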


shevron commented May 1, 2020

Note that if we remove the custom body from commit / init, they will still need to allow some custom JSON attributes beyond the oid and size of the object. For example, S3 requires an uploadId token. Since we want to avoid or at least minimize state in the server, the client should retain this value and send it when calling commit.

The same goes for the init response - it will provide the client with such values, and the client will need to store them and send them again when calling commit.

For this reason, I think it may make sense to just allow specifying a custom body as in the example above, which means we can keep clients dumb.


shevron commented May 1, 2020

Both S3 and Azure support content integrity verification using the Content-MD5 header, which is a standard HTTP feature. It would be great if we could somehow incorporate this into the protocol. I'm wondering whether we need flexibility (to specify the digest algorithm and the header / other parameter in use) or can just hard-code support for Content-MD5.

A couple of suggestions:

MD5 specific

"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "set_content_md5": true
    }
  ]
}

Setting the set_content_md5 flag to true will tell the client to calculate an MD5 digest of the part and send it as the Content-MD5 header.
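
A minimal client-side sketch of what this implies (standard library only; the content_md5_header helper is just for illustration):

import base64
import hashlib

def content_md5_header(part_data):
    # MD5 digest of the part payload, base64-encoded, as Content-MD5 expects
    digest = hashlib.md5(part_data).digest()
    return {"Content-MD5": base64.b64encode(digest).decode("ascii")}

# e.g. headers = {**part["header"], **content_md5_header(data)}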

More generic approach to content digest

This is inspired by RFC-3230 and RFC-5843, which define a more flexible approach to content digests in HTTP. The following approach specifies which digest header(s) to send with the content, in an RFC-3230-like manner:

"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "want_digest": "contentMD5"
    }
  ]
}

The want_digest attribute value is to follow the spec in RFC-3230 section 4.3.1, with possible algorithms as specified by RFC-5843.

RFC-3230 defines contentMD5 as a special value which tells the client to send the Content-MD5 header with an MD5 digest of the payload in base64 encoding.

Other possible values include a comma-separated list of q-factor flagged algorithms, from among MD5, SHA, SHA-256 and SHA-512. If one or more of these are specified, the digest of the payload is to be sent by the client as part of the Digest header, using the format specified by RFC-3230 section 4.3.2. For example:

"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "want_digest": "sha-256;q=1.0, md5;q=0.5"
    }
  ]
}

This will cause the client to send a request along the lines of:

PUT /storage/upload/20492a4d0d84?part=3 HTTP/1.1
Authorization: Bearer someauthorizationtokenwillbesethere
Digest: SHA-256=thvDyvhfIqlvFe+A9MYgxAfm1q5=,MD5=qweqweqweqweqweqweqwe=

...
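
A client could translate a want_digest value into the corresponding header roughly like this (a sketch that only handles the algorithms listed above and uses q-factors merely to select algorithms, not to rank them):

import base64
import hashlib

# RFC-3230 / RFC-5843 algorithm tokens mapped to hashlib constructors
ALGORITHMS = {"md5": hashlib.md5, "sha": hashlib.sha1,
              "sha-256": hashlib.sha256, "sha-512": hashlib.sha512}

def digest_headers(part_data, want_digest):
    if want_digest == "contentMD5":
        md5 = hashlib.md5(part_data).digest()
        return {"Content-MD5": base64.b64encode(md5).decode("ascii")}
    values = []
    for item in want_digest.split(","):
        algo = item.split(";")[0].strip().lower()  # drop any q-factor
        if algo in ALGORITHMS:
            digest = base64.b64encode(ALGORITHMS[algo](part_data).digest()).decode("ascii")
            values.append(f"{algo.upper()}={digest}")
    return {"Digest": ", ".join(values)}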

NOTE: Azure allows for a CRC32-based check, but also supports Content-MD5, so I am not sure CRC32 has any benefit over MD5.


shevron commented May 1, 2020

Point to consider: add a cancel or abort URL that can be used to optionally allow clients to cancel a started upload. This could allow cleaning up uploaded parts if supported / required by the storage vendor.

How vendors handle uncommitted partial uploads:

  • S3: You pay for them until you clean them up, but you can set up automatic cleanup after a period of time using object lifecycle management
  • Azure: They are cleaned up automatically after 7 days
  • Google: Resumable uploads - couldn't find any reference to how an unfinished resumable upload is managed, so I assume it is just a regular object with some missing data, which means it should be cleaned up manually.

Of course, there is no guarantee that the client will be able to successfully call the abort action even if supported - so maybe this is moot. Still could be nice.


shevron commented May 1, 2020

Supporting cleanup of unfinished uploads in GCP could be implemented by tagging objects as "draft" when we init, and removing that tag when we commit. Then, users can write an external script to delete objects tagged as draft older than a certain age.
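
A sketch of what that could look like with the google-cloud-storage client, assuming the draft metadata is attached when the upload is initiated (the giftless-draft key name and the 7-day threshold are illustrative):

from datetime import datetime, timedelta, timezone
from google.cloud import storage

DRAFT_KEY = "giftless-draft"  # hypothetical metadata key set at init time

def clear_draft(bucket, blob_name):
    # At commit time: drop the draft tag so the object is never cleaned up
    blob = bucket.get_blob(blob_name)
    blob.metadata = {DRAFT_KEY: None}  # setting a key to None removes it
    blob.patch()

def cleanup_stale_drafts(bucket, max_age=timedelta(days=7)):
    # External cleanup script: delete drafts older than max_age
    cutoff = datetime.now(timezone.utc) - max_age
    for blob in bucket.list_blobs():
        if (blob.metadata or {}).get(DRAFT_KEY) and blob.updated < cutoff:
            blob.delete()

# usage: cleanup_stale_drafts(storage.Client().bucket("my-lfs-bucket"))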


shevron commented May 1, 2020

Q: Does it make sense to have some kind of "common to all parts" attribute for the operation, specifying headers and other attributes that are common to all parts objects?

These could be:

  • header values
  • want_digest values
  • Base URL

Pros: More compact and clean messages, less repetition
Cons: More divergence from the basic protocol, less encapsulation of objects, will require more complex code at least on the client side (client needs to have more state shared between action requests).

I am leaning against it, but may be convinced otherwise ;)

pdelboca self-assigned this on Jul 20, 2020
pdelboca changed the title from "[design] Custom transfer mode for multipart uploads" to "[epic] [design] Custom transfer mode for multipart uploads" on Aug 3, 2020
shevron self-assigned this on Aug 11, 2020

shevron commented Aug 11, 2020

Undecided:

  • Should we name it multipart or multipart-basic?
  • Define a generic way to reply to commit if not all parts have been uploaded successfully
  • Define a generic way to reply to init if some parts have already been uploaded
  • Should we allow clients to request a chunk size? Is there reason for that?
  • Should we have a "common to all parts" attribute to specify common headers, want_digest, etc.?
  • Should we define an "abort" action to support clean cancelling?
  • Specify handling of content digests (probably accept the RFC-based suggestion above)

Tasks:

  • Get answers for undecided questions
  • Write up a "0.9" protocol draft (~1 hr)
  • Implement a local storage / giftless internal views based implementation to test the protocol (~1 day)
  • Design and estimate implementation for Azure, S3 and GCP (~2 hours each)
  • Implement specific backend support (~1-2 days each)


shevron commented Oct 4, 2020

Discussion has been summarized into a spec doc here: https://github.com/datopian/giftless/blob/feature/11-multipart-protocol/multipart-spec.md
