aws s3 sync function re-uploading all the files in the Pronet network #108

Open
kcho opened this issue Jul 13, 2022 · 6 comments

kcho commented Jul 13, 2022

aws s3 sync SOURCE s3://TARGET/TARGET_DIR has been re-uploading all the files under PHOENIX, including unchanged ones. The issue can be reproduced by running:

cd /mnt/ProNET/Lochness/PHOENIX
aws s3 sync ${PWD}/GENERAL s3://prod-ampscz-pronet/PHOENIX_ROOT_PRONET/GENERAL

Repeating aws s3 sync immediately after the previous run does not re-upload files, but running it again about a minute later re-uploads files that have not changed.

stat and sha1sum were checked between aws s3 sync runs to confirm that the re-uploaded file had not changed.

(base) kc2357@PronetProd:/mnt/ProNET/Lochness/PHOENIX/GENERAL/PronetYA$ stat PronetYA_metadata.csv
  File: PronetYA_metadata.csv
  Size: 410             Blocks: 8          IO Block: 1048576 regular file
Device: 36h/54d Inode: 14641028700104065786  Links: 1
Access: (0774/-rwxrwxr--)  Uid: ( 1001/    zt84)   Gid: ( 1004/  pronet)
Access: 2022-07-13 18:01:48.386542000 +0000
Modify: 2022-06-29 19:02:07.722772000 +0000
Change: 2022-06-29 19:02:07.722772000 +0000
 Birth: -
(base) kc2357@PronetProd:/mnt/ProNET/Lochness/PHOENIX/GENERAL/PronetYA$ sha1sum PronetYA_metadata.csv
409392e84761d6612a80d8d6d6b27e8352a1deda  PronetYA_metadata.csv
(base) kc2357@PronetProd:/mnt/ProNET/Lochness/PHOENIX/GENERAL/PronetYA$
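
The manual check above can be scripted so the same comparison is repeatable around each sync run. This is only a sketch: `file_fingerprint` is a hypothetical helper (not part of Lochness or the AWS CLI), and it assumes GNU coreutils `stat` and `sha1sum`, as on the Linux server shown above.

```shell
# Hypothetical helper: print "<mtime-epoch> <size-bytes> <sha1>" for one file.
# Assumes GNU coreutils stat (-c '%Y %s') and sha1sum.
file_fingerprint() {
    local path="$1"
    local meta sum
    # %Y = modification time in epoch seconds, %s = size in bytes
    meta=$(stat -c '%Y %s' "$path") || return 1
    # sha1sum prints "<hash>  <name>"; keep only the hash
    sum=$(sha1sum "$path" | awk '{print $1}') || return 1
    printf '%s %s\n' "$meta" "$sum"
}

# Usage sketch: fingerprint a file before and after a sync run.
#   before=$(file_fingerprint PronetYA_metadata.csv)
#   aws s3 sync ...
#   after=$(file_fingerprint PronetYA_metadata.csv)
#   [ "$before" = "$after" ] && echo "file unchanged on disk"
```

If the two fingerprints match while the file is still re-uploaded, the change is not in the file itself as seen by stat and sha1sum.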

This behaviour is also reported in aws/aws-cli#5216.

tashrifbillah commented

Was the intended behavior to bypass unchanged files during next round's upload?

kcho commented Jul 13, 2022

Yes. aws s3 sync bypasses unchanged files as intended on the Prescient and Pronet dev servers, but it shows this re-upload pattern on the Pronet production server.
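
For context, the aws s3 sync documentation states that a local file is uploaded when its size differs from the S3 object's size, when its last modified time is newer than the S3 object's, or when the object does not exist under the target prefix. The sketch below restates the first two conditions so it is clearer what "unchanged" means to the command. `needs_upload` is a hypothetical helper written for illustration, not AWS CLI code, and the numbers in the usage lines are made-up epoch values.

```shell
# Hypothetical illustration of the documented default comparison.
# Arguments: local size, remote size, local mtime, remote mtime
# (sizes in bytes, times in epoch seconds).
# Returns 0 (true) when the file would be uploaded.
# The "object does not exist yet" case is not modelled here.
needs_upload() {
    local local_size="$1" remote_size="$2" local_mtime="$3" remote_mtime="$4"
    # A different size always triggers an upload.
    [ "$local_size" -ne "$remote_size" ] && return 0
    # Same size: upload only if the local copy looks newer.
    [ "$local_mtime" -gt "$remote_mtime" ] && return 0
    return 1
}

# Same size, local mtime older than the S3 object: skipped.
needs_upload 410 410 100 200 || echo "skip"
# Same size, but the local mtime is reported as newer: uploaded again.
needs_upload 410 410 300 200 && echo "upload"
```

Under this rule, a file whose content never changes is still re-uploaded whenever its reported local modified time is newer than the time of the last upload.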

kcho commented Jul 14, 2022

Update:

This issue of aws s3 sync re-uploading unchanged files is only observed under /mnt, where file ownership is forcibly changed to the administrator's id.

Repeating the commands below does not re-upload any files.

cd ~/test
aws s3 sync ${PWD} s3://prod-ampscz-pronet/TEST_aws

Repeating the commands below re-uploads files when they are run again after a break of about 1 minute.

cd /mnt/ProNET/Lochness/test  # this 'test' directory is exact copy of ~/test
aws s3 sync ${PWD} s3://prod-ampscz-pronet/TEST_aws

kcho commented Jul 20, 2022

Notes:

aws s3 sync /mnt/ProNET/Lochness/PHOENIX/GENERAL s3://prod-ampscz-pronet/PHOENIX_ROOT_PRONET/GENERAL --dryrun --debug 2>&1 | grep "modified time" > log1.txt
aws s3 sync /mnt/ProNET/Lochness/PHOENIX/GENERAL s3://prod-ampscz-pronet/PHOENIX_ROOT_PRONET/GENERAL --dryrun --debug 2>&1 | grep "modified time" > log2.txt
diff log1.txt log2.txt

The modified time detected by aws s3 sync changes between runs.
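
To compare only the local side of each entry, the debug lines can be reduced to a path and its local modified time before diffing. This is a sketch, not a supported interface: `extract_local_mtimes` is a hypothetical helper, and the line layout it parses (`syncing: SRC -> DST, size: A -> B, modified time: LOCAL -> REMOTE`) is assumed from the debug output quoted in this thread and may differ in other AWS CLI versions.

```shell
# Hypothetical helper: read `aws s3 sync --debug` lines on stdin and print
# "<local path> <local modified time>" for each "syncing:" entry.
# Lines that do not match the assumed layout are dropped.
extract_local_mtimes() {
    sed -n 's/^.*syncing: \(.*\) -> .*, size: .*, modified time: \(.*\) -> .*$/\1 \2/p'
}

# Usage sketch with the two logs captured above:
#   extract_local_mtimes < log1.txt > local1.txt
#   extract_local_mtimes < log2.txt > local2.txt
#   diff local1.txt local2.txt
```

Any difference between the two reduced files is a local modified time that moved between runs, since the S3 side and the log timestamps have been stripped out.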

kcho commented Jul 20, 2022

Issue still present in AWS CLI version 2.7.16 and still specific to /mnt

kcho commented Aug 18, 2022

Testing whether the same mount shows the issue on the dev server (it is also mounted at /mnt on the dev server).

  1. Create test files on the prod server.
mkdir /mnt/ProNET/aws_test_kcho
touch /mnt/ProNET/aws_test_kcho/ha
touch /mnt/ProNET/aws_test_kcho/ho
  2. Register dev s3 bucket credentials on the prod server.
aws configure --profile test_dev
aws s3 ls s3://pronet-test/TEST_PHOENIX_ROOT_PRONET_PROD/ --profile test_dev
  3. Test whether the unexpected behavior (a continuously updated modified time) is observed with the dev s3 bucket on the prod server.
aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --profile test_dev
aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --profile test_dev --dryrun --debug 2>&1 | grep "modified time"

Yes, the local modified time changes on every run on the production server.

(base) kc2357@PronetProd:/mnt/ProNET$ sudo aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --profile test_dev --dryrun --debug 2>&1 | grep "modified time"
2022-08-18 18:01:49,426 - MainThread - awscli.customizations.s3.syncstrategy.base - DEBUG - syncing: /mnt/ProNET/aws_test_kcho/ha -> pronet-test/aws_test_kcho/ha, size: 0 -> 0, modified time: 2022-08-18 18:01:49.317183+00:00 -> 2022-08-18 18:01:13+00:00
2022-08-18 18:01:49,427 - MainThread - awscli.customizations.s3.syncstrategy.base - DEBUG - syncing: /mnt/ProNET/aws_test_kcho/ho -> pronet-test/aws_test_kcho/ho, size: 0 -> 0, modified time: 2022-08-18 18:01:49.325183+00:00 -> 2022-08-18 18:01:13+00:00
(base) kc2357@PronetProd:/mnt/ProNET$ sudo aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --profile test_dev --dryrun --debug 2>&1 | grep "modified time"
2022-08-18 18:02:08,783 - MainThread - awscli.customizations.s3.syncstrategy.base - DEBUG - syncing: /mnt/ProNET/aws_test_kcho/ha -> pronet-test/aws_test_kcho/ha, size: 0 -> 0, modified time: 2022-08-18 18:02:08.657319+00:00 -> 2022-08-18 18:01:13+00:00
2022-08-18 18:02:08,784 - MainThread - awscli.customizations.s3.syncstrategy.base - DEBUG - syncing: /mnt/ProNET/aws_test_kcho/ho -> pronet-test/aws_test_kcho/ho, size: 0 -> 0, modified time: 2022-08-18 18:02:08.665319+00:00 -> 2022-08-18 18:01:13+00:00
  4. Now testing the same command with the same dev s3 bucket on the dev server.
aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --dryrun --debug 2>&1 | grep "modified time"

This returns nothing on the dev server, which suggests the problem lies in how the storage is mounted on the prod server.

(base) kc2357@ip-10-5-36-53:/mnt/ProNET$ aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --dryrun --debug 2>&1 | grep "modified time"
(base) kc2357@ip-10-5-36-53:/mnt/ProNET$ aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --dryrun --debug 2>&1 | grep "modified time"
(base) kc2357@ip-10-5-36-53:/mnt/ProNET$ aws s3 sync /mnt/ProNET/aws_test_kcho s3://pronet-test/aws_test_kcho --dryrun --debug 2>&1 | grep "modified time"
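
A possible next step is to compare how /mnt/ProNET is mounted on the two servers. The sketch below only prints the filesystem type behind a path; `fs_type` is a hypothetical helper and assumes GNU coreutils `stat`. The `findmnt` line in the comments assumes util-linux is installed.

```shell
# Hypothetical helper: print the filesystem type backing a path.
# Assumes GNU coreutils stat (-f reports on the filesystem, %T is its type name).
fs_type() {
    stat -f -c '%T' "$1"
}

# Run on both the prod and dev servers and compare the results:
#   fs_type /mnt/ProNET
#   findmnt -T /mnt/ProNET -o SOURCE,FSTYPE,OPTIONS   # mount source and options
```

A difference in filesystem type or mount options between the two servers would be a candidate explanation, though this thread has not confirmed one.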
