Provide WebDownload Checksum algorithms consistent with shasum algorithms #96

Closed
jrstarke opened this issue Jan 18, 2024 · 7 comments
@jrstarke

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
What do you want us to build?

WebDownload checksum algorithms that are consistent with those of shasum.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

I'm trying to ensure that the artifacts that I'm downloading as part of my build are consistent with those I expect and haven't been tampered with. The SHA256 algorithm consistently yields different results than sha256sum does for the same artifact.

Are you currently working around this issue?
How are you currently solving this problem?

Run the image recipe once, wait for it to fail, get the checksum that it has for the same resource and feed it back in. This doesn't inspire trust in the resources though, as I can't verify that it's actually getting the same thing I'm getting.

Additional context
Anything else we should know?

Wrote this up on Re:Post

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

@jrstarke
Author

According to David Cuthbert on my Re:Post question, it appears that WebDownload is also unzipping the gz file before checking the checksum. The value I found in the logs was consistent with the checksum of the unzipped Tar file.
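The effect described above can be reproduced locally. This is a minimal sketch, not WebDownload's actual code: the payload is a made-up stand-in for the tar archive, and `sha256_hex` is a hypothetical helper. It shows that hashing the gzip bytes (what `sha256sum` sees on the `.tar.gz`) and hashing the decompressed bytes (what a client sees after unzipping) give different digests.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the real archive contents (assumption: any bytes will do).
tar_bytes = b"example tar payload\n" * 100
gz_bytes = gzip.compress(tar_bytes)

# Digest of the file as stored on the server (the .tar.gz).
digest_of_gz = sha256_hex(gz_bytes)

# Digest a client computes if it decompresses before hashing (the .tar).
digest_of_tar = sha256_hex(gzip.decompress(gz_bytes))

print("sha256 of .tar.gz bytes:", digest_of_gz)
print("sha256 of decompressed :", digest_of_tar)
print("digests match:", digest_of_gz == digest_of_tar)
```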

@jrstarke
Author

The second thing that I downloaded with WebDownload had the correct SHA. Seems like https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz has content encoding of x-gzip, so it gets unzipped on download. The other site doesn't have a content encoding, so doesn't get unzipped.
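One way to tell the two kinds of server apart is to look at the Content-Encoding response header before trusting a checksum. The sketch below is an illustration only: `is_gzip_encoded` and `fetch_headers` are hypothetical names, and the HEAD request assumes the server answers HEAD with the same headers it would send for GET.

```python
from typing import Mapping
from urllib.request import Request, urlopen

# Content codings that mean "the body was gzip-compressed for transfer".
GZIP_CODINGS = {"gzip", "x-gzip"}


def is_gzip_encoded(headers: Mapping[str, str]) -> bool:
    """Return True if the headers declare a gzip content coding."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            codings = {c.strip().lower() for c in value.split(",")}
            return bool(codings & GZIP_CODINGS)
    return False


def fetch_headers(url: str) -> dict:
    """Fetch response headers with a HEAD request (needs network access)."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return dict(response.headers.items())


# Headers shaped like the two cases in this thread.
print(is_gzip_encoded({"Content-Encoding": "x-gzip"}))
print(is_gzip_encoded({"Content-Type": "application/gzip"}))
```

If `is_gzip_encoded(fetch_headers(url))` is true, a client that honors the header will hand back decompressed bytes, so a checksum taken from the `.tar.gz` on disk will not match.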

@austoonz

Thank you for reporting this!

Confirming this is definitely a bug. I've cut a ticket internally to the team and will keep you posted on the fix.

@dacut

dacut commented Jan 20, 2024

Ugh. @jrstarke, if they're sending a Content-Encoding: x-gzip header, then this is not a bug in EC2 image builder.

From the MDN docs on Content-Encoding:

If the original media is encoded in some way (e.g. a zip file) then this information would not be included in the Content-Encoding header.

Image builder is acting correctly here. The original site needs to drop the Content-Encoding header or the URL should end with .tar, not .tar.gz.

@austoonz

@dacut - whether Image Builder is acting correctly due to the content encoding of the source, or not, we've treated this as a bug. It's painful and customers shouldn't have to deal with this.

@jrstarke - This is now resolved. The following component YAML now runs successfully with the latest versions of the AWSTOE binary (published within Image Builder and to the S3 buckets outlined in the AWSTOE Downloads section of the service documentation).

schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: download
        action: WebDownload
        inputs:
          - source: https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz
            destination: /tmp/PDFlib-10.0.1-Linux-x64-php.tar.gz
            algorithm: SHA256
            checksum: 31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71

@dacut

dacut commented Mar 11, 2024

@austoonz - The edge case you may have to deal with is potentially accepting multiple checksums (for the .tar and .tar.gz). If the original is a .tar file and the server decides to compress it on-the-fly (the intent of the Content-Encoding header), the checksums won't match.

While raw .tar files are unusual, the typical case I've seen is a file containing multiple binaries that are already compressed (or compressed + encrypted), such as firmware or media.
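The idea of accepting multiple checksums could look roughly like the following. This is only a sketch of the concept, not an existing WebDownload or AWSTOE option: `verify_any` is a hypothetical helper, and the caller is assumed to supply the digests of both the `.tar` and the `.tar.gz` form.

```python
import gzip
import hashlib
import hmac
from typing import Iterable


def verify_any(body: bytes, accepted: Iterable[str]) -> bool:
    """Return True if the SHA-256 of body equals any accepted hex digest."""
    actual = hashlib.sha256(body).hexdigest()
    return any(
        hmac.compare_digest(actual, expected.strip().lower())
        for expected in accepted
    )


# Stand-in payload (assumption): the "original" is a raw .tar on the server.
tar_bytes = b"firmware blob\n" * 50
gz_bytes = gzip.compress(tar_bytes)

accepted_digests = [
    hashlib.sha256(tar_bytes).hexdigest(),  # digest of the .tar
    hashlib.sha256(gz_bytes).hexdigest(),   # digest of the .tar.gz
]

# Either form of the body passes; unrelated bytes do not.
print(verify_any(tar_bytes, accepted_digests))
print(verify_any(gz_bytes, accepted_digests))
print(verify_any(b"tampered", accepted_digests))
```

Note that gzip output is not guaranteed to be byte-identical across compressors or compression levels, so a digest recorded for one on-the-fly compressed response may not hold for another.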

@austoonz

@dacut - definitely good to know for sure, thank you. I'll note this to the team for awareness for now.

I'd imagine WebDownload is far more likely to be used for .tar.gz files (as they are more common), so I suspect the raw .tar case is something we'll evaluate and solve if (or more likely when) customers run into the issue you describe.

At least there is still the workaround to use curl or wget directly for any scenario where WebDownload isn't working as intended.
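For readers who would rather script the workaround than call curl or wget, here is a rough equivalent in Python. It is a sketch under stated assumptions: `download_and_verify` is a hypothetical helper, and it relies on `urllib.request` returning the response body as sent, without decoding any Content-Encoding, so the digest is computed over the bytes the server actually transmitted.

```python
import hashlib
import os
from urllib.request import urlopen


def download_and_verify(url: str, destination: str, expected_sha256: str,
                        chunk_size: int = 65536) -> str:
    """Stream url to destination and check its SHA-256.

    Returns the hex digest on success. On mismatch the partial file is
    removed and ValueError is raised.
    """
    digest = hashlib.sha256()
    with urlopen(url) as response, open(destination, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)

    actual = digest.hexdigest()
    if actual != expected_sha256.strip().lower():
        os.remove(destination)
        raise ValueError(
            f"checksum mismatch for {url}: expected {expected_sha256}, got {actual}"
        )
    return actual
```

Called with the URL, destination, and checksum from the component YAML earlier in the thread, this would either leave the verified file in place or raise before anything else in the build uses it.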
