aws s3 cp streaming to stdout fails silently #3977

@hguturu

Description

We recently synced one of our S3 buckets to another bucket and verified that the files were copied correctly by comparing md5 checksums. We noticed that the aws s3 cp stream to stdout sometimes truncates without printing an error or returning a non-zero exit code. This may only affect large files; some of our files are 100G+.

e.g.
Round 1:

aws s3 cp s3://$BUCKET1/largefile - | md5sum
d1c2e6835a929e32df189ccfddf1d3fe  -
aws s3 cp s3://$BUCKET2/largefile - | md5sum
927958e42554e40a4f3c37fcced4ba22  -

Did our sync not happen properly? Let's try again.
Round 2:

aws s3 cp s3://$BUCKET1/largefile - | md5sum
927958e42554e40a4f3c37fcced4ba22  -
aws s3 cp s3://$BUCKET2/largefile - | md5sum
927958e42554e40a4f3c37fcced4ba22  -

Looks like it did happen successfully. We realized the stream was ending early because we have another special file format that has a marker to denote end of file, and our downstream tools were not finding that marker.
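For formats without an end-of-file marker, one way to catch an early end of stream is to compare the number of bytes streamed with the object size that S3 reports. The following is a minimal sketch, not a fix for the underlying problem. stream_and_check_size is a hypothetical helper name, and the commented usage assumes the object key is largefile and that credentials are already configured.

```shell
# stream_and_check_size EXPECTED_BYTES CMD [ARGS...]
# Hypothetical helper (not part of the AWS CLI): runs CMD, counts the bytes
# it writes to stdout, and fails if the count differs from EXPECTED_BYTES.
stream_and_check_size() {
  expected=$1
  shift
  # wc -c may pad its output with spaces on some platforms; strip them.
  actual=$("$@" | wc -c | tr -d '[:space:]')
  if [ "$actual" -ne "$expected" ]; then
    echo "truncated stream: expected $expected bytes, got $actual" >&2
    return 1
  fi
}

# Assumed usage with the AWS CLI:
#   size=$(aws s3api head-object --bucket "$BUCKET1" --key largefile \
#            --query ContentLength --output text)
#   stream_and_check_size "$size" aws s3 cp "s3://$BUCKET1/largefile" -
```

This only counts bytes, so it detects truncation but not corruption; a checksum comparison is still needed for the latter.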

This issue seems somehow memory related: typically when we see these failures, aws s3 cp starts using a lot of memory. But when we repeat the same job and it succeeds, the memory usage is much lower.

We are using aws s3 --version:
aws-cli/1.16.102 Python/2.7.14 Linux/4.14.94-89.73.amzn2.x86_64 botocore/1.12.92.
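
A note on checking exit codes in pipelines like the ones above: by default the shell reports only the status of the last command (md5sum here), so a failure in the first command would be hidden regardless of what it returns. The sketch below assumes bash and uses set -o pipefail to surface the producer's status. checked_md5 is a hypothetical helper name. It would not help if the CLI really exits 0 on a truncated stream, as described in this report; it only rules out the pipeline masking a non-zero status.

```shell
#!/usr/bin/env bash
# checked_md5 CMD [ARGS...]
# Hypothetical helper: pipes CMD's stdout into md5sum and returns non-zero
# if either stage fails. The subshell keeps pipefail from leaking into the
# caller's shell options.
checked_md5() {
  (
    set -o pipefail
    "$@" | md5sum
  )
}

# Assumed usage with the AWS CLI:
#   checked_md5 aws s3 cp "s3://$BUCKET1/largefile" - || echo "download failed" >&2
```

The checksum is still printed when the producer fails, so callers should check the return status before trusting the output.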

Labels: closing-soon, needs-reproduction
