aws s3 sync repeatedly uploads the same, unchanged files #5216
Hi @BLuFeNiX,
Same happens for me when I try to run s3 sync multiple times: it keeps downloading the same identical files. It only happens with files that have a special character in their filename, such as ó æ á ð ø ú é í.
I have this same problem but I don't have special characters in my file names. The commands I'm using are:

aws s3 sync $BUILD_DIR $S3_BUCKET --include "*.html" \
    --metadata-directive REPLACE --expires 2034-01-01T00:00:00Z --acl public-read \
    --cache-control no-cache

aws s3 sync $BUILD_DIR $S3_BUCKET --exclude "*.html" \
    --metadata-directive REPLACE --expires 2034-01-01T00:00:00Z --acl public-read \
    --cache-control max-age=31536000,public

A file like
I figured out what my problem was. Adding the

Hope this helps!
This is not the case with my situation. Files are identical (immutable, in fact) and still upload every time.
This was exactly my problem. Thank you so much!
@boomshadow sure thing!
Hi @BLuFeNiX, without seeing more of the logs this is difficult to diagnose. Do you happen to be using a third-party S3-compatible API (Backblaze, DreamHost, etc.)? The symptom you're encountering sounds a lot like this: #5456 (comment). In that case, the S3-compatible API was using the
This would explain why specifying a more specific prefix path (probably with fewer files, under 1000) would change the behavior.
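The over-1000-object behavior described above can be sketched in Python (illustrative only; the function, names, and data are hypothetical, not the CLI's actual code). Conceptually, sync merges a sorted local listing against the remote listing and assumes remote keys arrive in lexicographic order across pages; if a non-compliant backend returns a page out of order, objects that do exist remotely fall out of the merge and get flagged for upload:

```python
def keys_to_upload(local_keys, remote_pages):
    """Sorted-merge comparison of local keys vs. paginated remote keys.

    Hypothetical sketch: assumes the remote pages, concatenated, are
    globally sorted. Any local key not matched at the merge point is
    treated as missing remotely and flagged for upload.
    """
    remote = [k for page in remote_pages for k in page]
    flagged, i = [], 0
    for key in sorted(local_keys):
        # Skip remote keys that sort before the current local key.
        while i < len(remote) and remote[i] < key:
            i += 1
        if i < len(remote) and remote[i] == key:
            i += 1                 # match: object exists, nothing to do
        else:
            flagged.append(key)    # "missing" remotely: would be uploaded
    return flagged

local = ["a.txt", "b.txt", "c.txt"]

# Compliant server: pages are globally sorted -> nothing re-uploaded.
print(keys_to_upload(local, [["a.txt", "b.txt"], ["c.txt"]]))  # []

# Buggy server: the second page sorts before the first. "a.txt" and
# "b.txt" exist remotely, yet the merge never lines them up, so they
# are flagged on every run.
print(keys_to_upload(local, [["c.txt"], ["a.txt", "b.txt"]]))  # ['a.txt', 'b.txt']
```

Under this reading, narrowing the prefix to fewer than 1000 objects yields a single (trivially ordered) page, which would explain why the phantom uploads disappear.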
Greetings! It looks like this issue hasn't been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.
the expected behavior of
Quick update, in case anyone else experiences similar problems. After much effort, I was able to prove that the original problem I reported here was due to a bug in the Wasabi S3 service. I reported the bug with a reproducible test case, so the ball is in their court now.
Even I'm facing the same issue.
I was experiencing this issue. Running the following command:
repeatedly uploaded some files. I confirmed the re-uploaded files hadn't changed, and also that the timestamp hadn't changed since the file was initially uploaded. Adding the

However, it became apparent that the files being repeatedly uploaded had a modified date in the year 2076. One might expect that the

In any case, I resolved the issue by setting the last-modified file dates to something sensible (the create date, which was in the past).

AWS CLI info...
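The year-2076 observation above is consistent with sync's documented default rule: upload when sizes differ, or when the local file is newer than the remote object. A minimal sketch of that rule (my paraphrase of the documented behavior, not the CLI source) shows why a far-future local mtime re-uploads forever: each upload stamps the remote object with "now", which is still earlier than the bogus local timestamp, so the next run flags the file again:

```python
import time

def needs_upload(local_size, local_mtime, remote_size, remote_mtime):
    # Default sync heuristic (paraphrased): re-upload on size change,
    # or when the local file is newer than the remote object.
    return local_size != remote_size or local_mtime > remote_mtime

FUTURE_MTIME = 3.3e9   # epoch seconds, roughly the year 2074
SIZE = 1024

# After each upload, the object's LastModified becomes "now" --
# still far earlier than the bogus local timestamp.
remote_mtime = time.time()
print(needs_upload(SIZE, FUTURE_MTIME, SIZE, remote_mtime))  # True, on every run
```

Fixing the local mtimes, as the commenter did, breaks the loop; comparing by size alone would too, at the cost of missing same-size edits.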
@agconti How will we account for the case where the file has changed but the file size has not? For example, if we change "hello" to "help", will this result in the file not being uploaded to S3? Is there a solution for this case?
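The question above describes the documented trade-off of size-only comparison (e.g. the CLI's `--size-only` flag): an edit that leaves the byte count unchanged is invisible to it. (As a side note, "hello" → "help" actually shrinks the file by one byte and would still sync; a same-length edit like "help" → "hole" is the problematic case.) A small sketch, with a content-hash check as the usual escape hatch:

```python
import hashlib

def changed_by_size(local: bytes, remote: bytes) -> bool:
    # Size-only comparison: timestamps and content are ignored.
    return len(local) != len(remote)

remote = b"help"
local = b"hole"   # same length, different content

print(changed_by_size(local, remote))  # False: this edit would be skipped

# Escape hatch: compare content hashes. For single-part, unencrypted
# uploads the S3 ETag happens to equal the object's MD5 (an assumption
# that does NOT hold for multipart or SSE-KMS objects).
print(hashlib.md5(local).hexdigest() != hashlib.md5(remote).hexdigest())  # True
```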
+1. Facing the same issue: existing files are being uploaded again for no reason.
Encountered the same issue. In my case I could mitigate it like so:

# before mitigation: incorrect detection of untouched files:
aws s3 sync _some_directory_name_ ...

vs.

# after mitigation: correct detection of untouched files:
aws s3 sync (Resolve-Path -Path '_some_directory_name_').Path ...

In other words: an absolute path made it work as expected.
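The `Resolve-Path` mitigation above points at the usual ambiguity of relative paths: they resolve against whatever the current working directory happens to be. A small Python sketch of that ambiguity (illustrative only; the directory names are made up, and this is not the CLI's path handling):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # realpath() resolves symlinks (e.g. /var -> /private/var on macOS)
    # so the comparisons below are stable.
    root = os.path.realpath(tmp)
    os.makedirs(os.path.join(root, "proj", "data"))

    os.chdir(os.path.join(root, "proj"))
    a = os.path.realpath("data")    # <root>/proj/data

    os.chdir(root)
    b = os.path.realpath("data")    # <root>/data -- same string, different tree!

    print(a == b)                   # False
    os.chdir(os.path.dirname(root)) # step out before cleanup deletes the tree
```

Passing an absolute source path removes this dependence on the working directory, which is exactly what `(Resolve-Path ...).Path` produces in PowerShell.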
Including some debug logs of the issue.

The local file was not modified between runs and shows "2023-06-28 03:53:05-04:00" on the local file system before/after each run. A dry run shows the modified time is different: the local file is newer than the remote file in S3.

Running without dry run shows the file was uploaded.

Performing the sync again shows that the uploaded file's modified timestamp was updated in S3, but incorrectly: it does not match the local modified time, and the file is selected to be re-uploaded because it is evaluated as newer.

The commands were run in zsh on a Mac M1.
Version -
Command without dryrun -
Command with dryrun -
Note that static websites are typically built with bundled JS/CSS, and the HTML tag becomes something like

I made the very stupid mistake described above. I hope this comment prevents people like me from making the same mistake before rolling to production.
Describe the bug
A certain folder in my S3-compatible storage is consistently detected as out of sync with my local source directory, i.e.:

aws s3 sync /data/foo/ s3://my.bucket.tld/foo/

will always print dozens of lines like this, every time it is run. The files are identical, and I have confirmed so by hashing them locally, and then downloading a copy directly via

aws s3 cp

and hashing that too.

There is an interesting clue as to the nature of the bug: changing the path to be more specific will get rid of the behavior. For example, rather than running

aws s3 sync /data/foo/ s3://my.bucket.tld/foo/

if you run

aws s3 sync /data/foo/bar/buzz/ s3://my.bucket.tld/foo/bar/buzz/

it will show no changed files, and not upload anything. The directories are the same in both scenarios, but it seems that there is some sort of comparison against other objects in the bucket.

SDK version number
aws-cli/2.0.14 Python/3.7.3 Linux/4.19.107 botocore/2.0.0dev18
Platform/OS/Hardware/Device
Ubuntu 20.04 in a docker container
To Reproduce (observed behavior)
Unknown. It only happens with some files.
Expected behavior
Unchanged files will not be uploaded.
Logs/output
I do not wish to post full logs, but here is a relevant message:
It is the same message for every file, even though the file does exist in the destination bucket.
Additional context
It looks like this problem has existed since at least 2014, based on a similar report here:
https://forums.aws.amazon.com/thread.jspa?threadID=146851
I have also tried s3cmd, which does not exhibit this behavior.