New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address server errors received during layer upload #121
Address server errors received during layer upload #121
Conversation
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changeset addresses intermittent internal server errors encountered during pushes. The root cause has been isolated to layers that result in identical, empty filesystems but may have some path declarations (imaginge "./"), resulting in different tarsums. The main error message reported during these upload problems was a 500 error, which was not correct. Further investigation showed the errors to be rooted in digest verification when finishing uploads. Inspection of the surrounding code also identified a few issues. PutLayerChunk was slightly refactered into PutLayerUploadComplete. Helper methods were avoided to make handler less confusing. This simplification leveraged an earlier change in the spec that moved non-complete chunk uploads to the PATCH method. Simple logging was also added in the unknown error case that should help to avoid mysterious 500 errors in the future. At the same time, the glaring omission of a proper layer upload cancel method was rectified. This has been added in this change so it is not missed in the future. In the future, we may want to refactor the handler code to be more straightforward, hopefully letting us avoid these problems in the future. Added test cases that reproduce these errors and drove these changes include the following: 1. Push a layer with an empty body results in invalid blob upload. 2. Push a layer with a different tarsum (in this case, empty tar) 3. Deleting a layer upload works. 4. Getting status on a deleted layer upload returns 404. Common functionality was grouped into shared functions to remove repitition. The API tests will still require future love. Signed-off-by: Stephen J Day <stephen.day@docker.com>
|
||
// ----------------------------------------- | ||
// Do layer push with an invalid body | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are probably missing two cases here:
- Empty tar file (zero-length) with the correct digest should be accepted.
- Tar file with block of 1024 zero-valued bytes (a valid tar) with correct digest should be accepted.
Both of these cases result in a 400 error with this changeset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, both of these cases have the digest tarsum.v1+sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
.
@stevvooe I confirm:
|
@tiborvass Thank you! I have a few more changes to make in this PR then we'll get it out. |
Several API tests were added to ensure correct acceptance of zero-size and empty tar files. This led to several changes in the storage backend around the guarantees of remote file reading, which backs the layer and layer upload type. In support of these changes, zero-length and empty checks have been added to the digest package. These provide a sanity check against upstream tarsum changes. The fileReader has been modified to be more robust when reading and seeking on zero-length or non-existent files. The file no longer needs to exist for the reader to be created. Seeks can now move beyond the end of the file, causing reads to issue an io.EOF. This eliminates errors during certain race conditions for reading files which should be detected by stat calls. As a part of this, a few error types were factored out and the read buffer size was increased to something more reasonable. Signed-off-by: Stephen J Day <stephen.day@docker.com>
LGTM |
// a zero-length blob into a nonzero-length blob location. To | ||
// prevent this horrid thing, we employ the hack of only allowing | ||
// to this happen for the zero tarsum. | ||
if dgst == digest.DigestTarSumV1EmptyTar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the case where the underlying file is deleted? Would the upload have to sent empty content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If the upload was empty, the file won't exist. This could actually happen.
- Having the file deleted is more contrived. It could happen if two uploads to the same repository are being finished at the same time. Very contrived but we should handle it.
LGTM |
Address server errors received during layer upload
This changeset addresses intermittent internal server errors encountered during pushes. The root cause has been isolated to layers that result in identical, empty filesystems but may have some path declarations (imaginge "./"), resulting in different tarsums. The main error message reported during these upload problems was a 500 error, which was not correct. Further investigation showed the errors to be rooted in digest verification when finishing uploads.
Inspection of the surrounding code also identified a few issues. PutLayerChunk was slightly refactered into PutLayerUploadComplete. Helper methods were avoided to make handler less confusing. This simplification leveraged an earlier change in the spec that moved non-complete chunk uploads to the PATCH method. Simple logging was also added in the unknown error case that should help to avoid mysterious 500 errors in the future. This is bolstered by the addition of the ErrLayerUploadUnavailable which differentiates unknown layers from short or missing data.
At the same time, the glaring omission of a proper layer upload cancel method was rectified. This has been added in this change so it is not missed in the future.
In the future, we may want to refactor the handler code to be more straightforward, hopefully letting us avoid these problems in the future.
Added test cases that reproduce these errors and drove these changes include the following:
Common functionality was grouped into shared functions to remove repitition. The API tests will still require future love.
Also, an erroneous error code included in the delete API specification for layer uploads was removed.
Closes #112.