Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: packaging of built artifacts #1789

Merged
merged 3 commits into from
Feb 18, 2020
Merged

Conversation

sriram-mv
Copy link
Contributor

@sriram-mv sriram-mv commented Feb 11, 2020

Why is this change necessary?

  • sam deploy of the same built artifacts can result in new uploads to s3
    during the package step.

How does it address the issue?

  • Hash of the directory is constructed rather than on the packaged zip
    file.

What side effects does this change have?

  • Potentially None

Checklist:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jfuss
Copy link
Contributor

jfuss commented Feb 11, 2020

Is there a way we can integ test this? It will help guard against regressions in the future (especially as this has been a pain point for customers for some time now).

@sriram-mv sriram-mv force-pushed the packaging_fix branch 3 times, most recently from d495cfa to 8dc4b44 Compare February 12, 2020 01:51

def file_checksum(file_name):
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain here it is specifically used with sam deploy and might potentially be back-compat issues if we break this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I can add a series of comments there.

md5 checksum of the directory.

"""
md5_dir = hashlib.md5()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is md5 the right one? I am not sure if md5 is stable enough or has issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for checksum purposes it should be fine (I dont think its cryptographically secure though), we can also look at sha256 as well.

"""
md5_dir = hashlib.md5()
# Walk through given directory and find all directories and files.
for dirpath, _, filenames in os.walk(directory, followlinks=followlinks):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to catch circular links?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how os.walk handles that, can look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to create a series of files that had a circular symlink reference.

 ~/w/ap/py37/s/.aws-sam/build/HelloWorldFunction ▓▒░ ls -lrth a                                                                                 ░▒▓ ✔ | py3.7 Py | 12:46:18
lrwxr-xr-x  1 srirammv  ANT\Domain Users     1B Feb 12 12:46 a -> c
 ~/w/ap/py37/s/.aws-sam/build/HelloWorldFunction ▓▒░ ls -lrth b                                                                                 ░▒▓ ✔ | py3.7 Py | 12:46:23
lrwxr-xr-x  1 srirammv  ANT\Domain Users     1B Feb 12 12:46 b -> a
 ~/w/ap/py37/s/.aws-sam/build/HelloWorldFunction ▓▒░ ls -lrth c                                                                                 ░▒▓ ✔ | py3.7 Py | 12:46:25
lrwxr-xr-x  1 srirammv  ANT\Domain Users     1B Feb 12 12:46 c -> b

Get this error:

Error: Unable to upload artifact HelloWorldFunction referenced by CodeUri parameter of HelloWorldFunction resource.
[Errno 62] Too many levels of symbolic links: '/Users/srirammv/workspace/apps/py37_app/sam-app/.aws-sam/build/HelloWorldFunction/a'

So Looks like circular references are taken care of.

# Go through every file in the directory and sub-directory.
for filepath in [os.path.join(dirpath, filename) for filename in filenames]:
# Encode file's checksum to be utf-8 and bytes.
md5_dir.update(file_checksum(filepath).encode("utf-8"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on, you are just checksumming the file path and not the content of the files. Is this intentional? If so, Why is this sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm passing it through to the file_checksum method.

Why is this change necessary?

* sam deploy of the same built artifacts can result in new uploads to s3
during the package step.

How does it address the issue?

* Hash of the directory is constructed rather than on the packaged zip
file.

What side effects does this change have?

* Potentially None
* Remove dependency on 3rd party library.
* Integration test to check if repeated builds dont trigger new
deployments.
Why is this change necessary?

* Just changing a file name should trigger a new package/deploy
operation.

How does it address the issue?

* Checksumming on both file name and contents generates a new checksum,
which will not match with a checksum where the file name was the same as
before.

What side effects does this change have?

* None
Copy link
Contributor

@jfuss jfuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants