From 108dbddd5adba1ea133ad5ab391c122bd71bf731 Mon Sep 17 00:00:00 2001 From: Narasimha Kulkarni Date: Mon, 23 Jan 2023 23:18:58 +0530 Subject: [PATCH] Release 10.17.0 (#2029) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Add mitigation for weird NtQuerySecurityObject behavior on NAS sources (#1872) * Add check for 0 length, attempt to validate the returned object. * Change to grabbing real SD length * Add comment describing issue * Prevent infinite loop upon listing failure * Fix GCP error checking * Fix GCP disable * Fix bad URL delete (#1892) * Manipulate URLs safely * Fix folder deletion test * Prevent infinite loop upon listing failure * Fix GCP error checking * Fix GCP disable * Fail when errors listing/clearing bucket * Update MacOS testing pipeline (#1896) * fixing small typo (,) in help of jobs clean (#1899) * Microsoft mandatory file * fixing small typo (,) in help of jobs clean Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> Co-authored-by: Mohit Sharma <65536214+mohsha-msft@users.noreply.github.com> * Implement MD OAuth testing (#1859) * Implement MD OAuth testing * Handle async on RevokeAccess, handle job cancel/failure better * Prevent parallel testing of managed disks * lint check * Prevent infinite loop upon listing failure * Fix GCP error checking * Fix GCP disable * Fail when errors listing/clearing bucket * Add env vars * Avoid revoking MD access, as it can be shared. * Fix intermittent failures * Disable MD OAuth testing temporarily. * Add "all" to documentation (#1902) * 10.16.1 patch notes (#1913) * Add bugfixes to change log. * Correct wording & punctuation * Correct version * Export Successfully Updated bytes (#1884) * Add info in error message for mkdir on Log/Plan (#1883) * Microsoft mandatory file * Add info in error message for mkdir on Log/Plan Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> Co-authored-by: Mohit Sharma <65536214+mohsha-msft@users.noreply.github.com> * Fix fixupTokenJson (#1890) * Microsoft mandatory file * Fix fixupTokenJson Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> Co-authored-by: Mohit Sharma <65536214+mohsha-msft@users.noreply.github.com> Co-authored-by: Adam Orosz * Do not log request/response for container creation error (#1893) * Expose AZCOPY_DOWNLOAD_TO_TEMP_PATH environment variable. 
(#1895) * Slice against the correct string (#1927) * UX improvement: avoid crash when copying S2S with user delegation SAS (#1932) * Fix bad build + Prevent bad builds in the future (#1917) * Fix bad build + Prevent bad builds in the future * Add Windows build * Make sync use last write time for Azure Files (#1930) * Make sync use last write time for Azure Files * Implement test * 10.16.2 Changelog (#1948) * Update azcopy version * Fixed a bug where preserve permissions would not work with OAuth * Added CODEOWNERS file * Fixed issue where CPK would not be injected on retries * remove OAuth from test * Updated version check string to indicate current AzCopy version (#1969) * added codeowner * Enhance job summary with details about file/folders (#1952) * Add flag to disable version check (#1950) * darwin arm64 * Update golang version to 10.19.2 (#1925) * enable cgo * added tests * Minor fixes: More in description (#1968) * Echo auto-login failure if any * Update help for sync command to use trailing slash on directories * azcopy fail to copy 12TB file to Storage containers in Dev. The logic is used to calculate proper blockSize if it’s not provided, and due to the uint32 cast, it can’t give proper blockSize if filesize is between 50000 * (8 * 1024 * 1024) * X + 1, to 50000 * (8 * 1024 * 1024) * X + 49999. It should return 16MB instead of 8MB blockSize. Accommodated the changes suggested by Narasimha Kulkarni * Added extra logging when switching endpoints * Enable support for preserving SMB info on Linux. (#1723) * Microsoft mandatory file * Enable support for preserving SMB info on Linux. Implemented the GetSDDL/PutSDDL GetSMBProperties/PutSMBProperties methods for Linux using extended attributes. Following are the xattrs we use for fetching/setting various required info. // Extended Attribute (xattr) keys for fetching various information from Linux cifs client. const ( CIFS_XATTR_CREATETIME = "user.cifs.creationtime" // File creation time. CIFS_XATTR_ATTRIB = "user.cifs.dosattrib" // FileAttributes. CIFS_XATTR_CIFS_ACL = "system.cifs_acl" // DACL only. CIFS_XATTR_CIFS_NTSD = "system.cifs_ntsd" // Owner, Group, DACL. CIFS_XATTR_CIFS_NTSD_FULL = "system.cifs_ntsd_full" // Owner, Group, DACL, SACL. ) Majority of the changes are in sddl/sddlHelper_linux.go which implement the following Win32 APIs for dealing with SIDs. ConvertSecurityDescriptorToStringSecurityDescriptorW ConvertStringSecurityDescriptorToSecurityDescriptorW ConvertSidToStringSidW ConvertStringSidToSidW Note: I have skipped Object ACE support in sddl/sddlHelper_linux.go as those should not be used for filesystem properties, only AD object properties. Can someone confirm this? TBD: Conditional SID * Audited, fixed, tested support for "No ACL"/NO_ACCESS_CONTROL and ACL w/o any ACE Tested the following cases: c:\Users\natomar\Downloads>cd testacl // This has "No ACLs" and everyone should be allowed access. c:\Users\natomar\Downloads\testacl>touch NO_ACCESS_CONTROL.txt c:\Users\natomar\Downloads\testacl>cacls NO_ACCESS_CONTROL.txt /S:D:NO_ACCESS_CONTROL Are you sure (Y/N)?y processed file: c:\Users\natomar\Downloads\testacl\NO_ACCESS_CONTROL.txt // This has "No ACLs" and everyone should be allowed access. // It additionally has the "P" (protected) flag set, but that won't have // any effect as that just prevents ACE inheritance but this ACL will // not have any ACLs due to the NO_ACCESS_CONTROL flag. 
c:\Users\natomar\Downloads\testacl>touch PNO_ACCESS_CONTROL.txt c:\Users\natomar\Downloads\testacl>cacls PNO_ACCESS_CONTROL.txt /S:D:PNO_ACCESS_CONTROL Are you sure (Y/N)?y processed file: c:\Users\natomar\Downloads\testacl\PNO_ACCESS_CONTROL.txt // This should set DACL but with no ACEs, but since "P" is not set it // inherits ACEs from the parent dir. c:\Users\natomar\Downloads\testacl>touch empty_d.txt c:\Users\natomar\Downloads\testacl>cacls empty_d.txt /S:D: Are you sure (Y/N)?y processed file: c:\Users\natomar\Downloads\testacl\empty_d.txt // This should set DACL but with no ACEs, but since "P" is set it // doesn't inherit ACEs from the parent dir and hence this will block // all users. c:\Users\natomar\Downloads\testacl>touch empty_d_with_p.txt c:\Users\natomar\Downloads\testacl>cacls empty_d_with_p.txt /S:D:P Are you sure (Y/N)?y processed file: c:\Users\natomar\Downloads\testacl\empty_d_with_p.txt * Don't fail outright for ACL revision 4. Though our supported ACL types must carry ACL revision 2 as per the doc https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/20233ed8-a6c6-4097-aafa-dd545ed24428 but I've seen some dirs have ACL revision 4 but ACL types are still supported ones. So instead of failing upfront, let it fail with unsupported ACE type. Also hexadecimal aceRights are more commonly seen than I expected, so removing a log. * Minor fix after running azcopy on a large dir. This was something which I have doubt on. Now that we got a real world issue due to this, it's all clear :-) * Some minor updates after the rebase to latest Azcopy. * Set default value of flag preserve-smb-info to true on Windows and false on other OS (cherry picked from commit ac5bedba7db2649faf1ecca444d77b6b76ecc54d) Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> Co-authored-by: Nagendra Tomar * Added log indicating a sub-directory is being enqueued (#1999) * Log sync deletions to scanning logger (#2000) * ieproxy fix * remove cgo * fix * fix * fix * more testing * more testing * more testing * more testing * mod tidy * mod tidy * more testing * Added codespell (#2008) * Added codespell * Fixed initial codespell errors * Fix format in codespell.yml * Added s3 url parts * Added CodeQL (#2009) * Added linting file * Upgrade codeql to v2 * Fix incorrect conversion between integer types * Fix GCP URL parts * Fix for rare infinite loop on mutex acquisition (#2012) * small fix * removed test * Added trivy file (#2015) * Added trivy file * renamed trivy * Improve debug-ability of e2e tests by uploading logs of failed jobs (#1898) * Upload testing logs to storage account on failed test * Handle as pipeline artifact instead * mkdirall * copy plan files too * Fix failing tests * Change overwrite to affect any "locked in"/completed state * Fail copy job if single blob does not exist (#1981) * Job fail if single file does not exist * fixed change * fail only on a single file not existing * fail on file not found * fail on file not found * fail on file not found * cleanup * added tests * cleanup * removed test * Correct odd behavior around folder overwrites (#1961) * Fix files sync by determining which LMT to use via smb properties flag (#1958) * Fix files sync by determining which LMT to use via smb properties flag * Implement testing for LMT switch * Fix testing * Limit SMB testing to SMB-compatible environment * Enforce SMB LMT for Linux/MacOS test of SMB LMT preference * Fix metadata parsing (#1953) * Fix metadata 
parsing * rework metadata parsing to be more robust; add test * Fix comment lines * Codespell :| * Fix ADLSG2 intermittent failure (#1901) * Fix ADLSG2 intermittent failure * Add test * Reduce code dupe * Fix build errors * Fix infinite loop maybe? * Store source token and pass to other threads (#1996) * Store source token * testing * failing pipe * cleanup * test logger * fix test failure * fix 2 * fix * sync fix * cleanup check * Hash based sync (#2020) * Implement hash based sync for MD5 * Implement testing * Ensure folders are handled properly in HBS & Test S2S * Add skip/process logging * Include generic xattr syncmeta application * Fix 0-size blobs * Fix core testing * Revert "Include generic xattr syncmeta application" This reverts commit fba55e41572fd6697d1633a202e51233d42c0f34. * Warn on no hash @ source, remove MHP * Comments * Comments * Copy properties from Source (#1964) * Copy properties from Source * Remove unnecessary ws changes * Preserve UNIX properties * Move entity type to Overwrite option * Add python suite * Review comments * Fix test * Release notes and version update (#2028) Co-authored-by: adreed-msft <49764384+adreed-msft@users.noreply.github.com> Co-authored-by: mstenz Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> Co-authored-by: Mohit Sharma <65536214+mohsha-msft@users.noreply.github.com> Co-authored-by: Adele Reed Co-authored-by: Karla Saur <1703543+ksaur@users.noreply.github.com> Co-authored-by: adam-orosz <106535811+adam-orosz@users.noreply.github.com> Co-authored-by: Adam Orosz Co-authored-by: Ze Qian Zhang Co-authored-by: Gauri Prasad Co-authored-by: Gauri Prasad <51212198+gapra-msft@users.noreply.github.com> Co-authored-by: Tamer Sherif Co-authored-by: Tamer Sherif <69483382+tasherif-msft@users.noreply.github.com> Co-authored-by: reshmav18 <73923840+reshmav18@users.noreply.github.com> Co-authored-by: linuxsmiths Co-authored-by: Nagendra Tomar --- .github/CODEOWNERS | 1 + .github/workflows/codeql-analysis.yml | 67 + .github/workflows/codespell.yml | 24 + .github/workflows/trivy.yml | 54 + ChangeLog.md | 31 +- azbfs/parsing_urls.go | 2 +- azbfs/zc_credential_token.go | 2 +- azure-pipelines.yml | 35 +- cmd/benchmark.go | 4 +- cmd/copy.go | 106 +- cmd/copyEnumeratorInit.go | 75 +- cmd/copyEnumeratorInit_test.go | 164 ++ cmd/credentialUtil.go | 47 +- cmd/jobsResume.go | 26 +- cmd/jobsShow.go | 24 +- cmd/list.go | 2 +- cmd/pathUtils.go | 2 +- cmd/removeEnumerator.go | 2 +- cmd/root.go | 19 +- cmd/setPropertiesEnumerator.go | 2 +- cmd/sync.go | 38 +- cmd/syncComparator.go | 112 +- cmd/syncEnumerator.go | 16 +- cmd/syncIndexer.go | 4 +- cmd/syncProcessor.go | 35 +- cmd/zc_enumerator.go | 29 +- cmd/zc_pipeline_init.go | 1 - cmd/zc_traverser_benchmark.go | 4 +- cmd/zc_traverser_blob.go | 31 +- cmd/zc_traverser_blob_account.go | 4 +- cmd/zc_traverser_blob_versions.go | 6 +- cmd/zc_traverser_blobfs.go | 4 +- cmd/zc_traverser_blobfs_account.go | 4 +- cmd/zc_traverser_file.go | 4 +- cmd/zc_traverser_file_account.go | 4 +- cmd/zc_traverser_gcp.go | 8 +- cmd/zc_traverser_gcp_service.go | 4 +- cmd/zc_traverser_list.go | 9 +- cmd/zc_traverser_local.go | 291 ++- cmd/zc_traverser_local_other.go | 1 + cmd/zc_traverser_local_windows.go | 1 + cmd/zc_traverser_s3.go | 8 +- cmd/zc_traverser_s3_service.go | 4 +- cmd/zt_copy_file_file_test.go | 4 +- cmd/zt_generic_filter_test.go | 2 +- cmd/zt_generic_service_traverser_test.go | 6 +- cmd/zt_generic_traverser_test.go | 8 +- 
cmd/zt_overwrite_posix_properties_test.go | 99 + cmd/zt_scenario_helpers_for_test.go | 5 +- cmd/zt_sync_comparator_test.go | 11 +- cmd/zt_test.go | 2 + cmd/zt_traverser_blob_test.go | 121 ++ common/ProxyLookupCache.go | 14 +- common/azError.go | 2 +- common/chunkStatusLogger.go | 4 +- common/credCache_darwin.go | 2 +- common/credCache_linux.go | 2 +- common/credentialFactory.go | 16 +- common/fe-ste-models.go | 106 +- common/folderCreationTracker_interface.go | 2 +- common/gcpURLParts.go | 18 +- common/gcpURLParts_test.go | 19 + common/hash_data.go | 13 + common/hash_data_other.go | 71 + common/hash_data_windows.go | 58 + common/iff.go | 2 +- common/lifecyleMgr.go | 2 +- common/oauthTokenManager.go | 11 +- common/proxy_forwarder.go | 14 + common/proxy_forwarder_windows.go | 14 + common/randomDataGenerator.go | 2 +- common/rpc-models.go | 6 +- common/s3URLParts.go | 4 +- common/version.go | 2 +- common/writeThoughFile.go | 17 +- common/writeThoughFile_linux.go | 133 ++ common/writeThoughFile_windows.go | 6 +- e2etest/arm.go | 6 +- e2etest/declarativeHelpers.go | 6 +- e2etest/declarativeRunner.go | 12 +- e2etest/declarativeScenario.go | 64 +- e2etest/declarativeTestFiles.go | 16 +- e2etest/factory.go | 4 +- e2etest/managedDisks.go | 8 +- e2etest/pointers.go | 6 + e2etest/runner.go | 40 +- e2etest/scenario_helpers.go | 131 +- e2etest/scenario_os_helpers_for_windows.go | 4 +- e2etest/zt_basic_copy_sync_remove_test.go | 280 ++- e2etest/zt_copy_file_smb_test.go | 6 +- e2etest/zt_preserve_properties_test.go | 4 +- e2etest/zt_preserve_smb_properties_test.go | 28 +- e2etest/zt_resume_windows_test.go | 8 +- e2etest/zz_tests_to_add.go | 2 +- go.mod | 3 +- go.sum | 12 +- jobsAdmin/JobsAdmin.go | 39 +- jobsAdmin/init.go | 108 +- main.go | 5 +- main_windows.go | 2 +- sddl/sddlHelper_linux.go | 1717 +++++++++++++++++ sddl/sidTranslation_linux.go | 28 + sddl/sidTranslation_other.go | 1 + ste/JobPartPlan.go | 5 +- ste/concurrency.go | 2 +- ste/downloader-azureFiles_linux.go | 230 +++ ste/downloader-blob.go | 5 +- ste/folderCreationTracker.go | 14 +- ste/jobStatusManager.go | 15 +- ste/mgr-JobMgr.go | 23 +- ste/mgr-JobPartMgr.go | 29 +- ste/mgr-JobPartTransferMgr.go | 8 +- ste/s2sCopier-URLToBlob.go | 2 +- ste/sender-azureFile.go | 20 +- ste/sender-blobFS.go | 17 +- ste/sender-blobFolders.go | 97 +- ste/sender-blockBlob.go | 24 +- ste/sender.go | 15 +- ste/sender_blockBlob_test.go | 2 +- ste/sourceInfoProvider-Local_linux.go | 64 + ste/xfer-anyToRemote-file.go | 6 +- ste/xfer-anyToRemote-fileProperties.go | 84 + ste/xfer-remoteToLocal-file.go | 25 +- testSuite/cmd/testblob.go | 2 +- .../scripts/test_autodetect_blob_type.py | 4 +- testSuite/scripts/test_blob_download.py | 2 +- testSuite/scripts/test_blob_piping.py | 2 +- testSuite/scripts/test_file_download.py | 2 +- testSuite/scripts/test_file_upload.py | 4 +- .../scripts/test_service_to_service_copy.py | 4 +- testSuite/scripts/test_upload_block_blob.py | 2 +- website/src/index.html | 10 +- website/src/js/scripts.js | 18 +- 133 files changed, 4766 insertions(+), 620 deletions(-) create mode 100644 .github/CODEOWNERS create mode 100644 .github/workflows/codeql-analysis.yml create mode 100644 .github/workflows/codespell.yml create mode 100644 .github/workflows/trivy.yml create mode 100644 cmd/copyEnumeratorInit_test.go mode change 100755 => 100644 cmd/syncEnumerator.go create mode 100644 cmd/zt_overwrite_posix_properties_test.go create mode 100644 cmd/zt_traverser_blob_test.go create mode 100644 common/hash_data.go create mode 100644 common/hash_data_other.go 
create mode 100644 common/hash_data_windows.go create mode 100644 common/proxy_forwarder.go create mode 100644 common/proxy_forwarder_windows.go create mode 100644 e2etest/pointers.go create mode 100644 sddl/sddlHelper_linux.go create mode 100644 sddl/sidTranslation_linux.go create mode 100644 ste/downloader-azureFiles_linux.go create mode 100644 ste/xfer-anyToRemote-fileProperties.go diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 000000000..b0c684776 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1 @@ +* @gapra-msft @adreed-msft @nakulkar-msft @siminsavani-msft @vibhansa-msft @tasherif-msft diff --git a/.github/workflows/codeql-analysis.yml b/.github/workflows/codeql-analysis.yml new file mode 100644 index 000000000..1ef181f18 --- /dev/null +++ b/.github/workflows/codeql-analysis.yml @@ -0,0 +1,67 @@ +# For most projects, this workflow file will not need changing; you simply need +# to commit it to your repository. +# +# You may wish to alter this file to override the set of languages analyzed, +# or to provide custom queries or build logic. +# +# ******** NOTE ******** +# We have attempted to detect the languages in your repository. Please check +# the `language` matrix defined below to confirm you have the correct set of +# supported CodeQL languages. +# +name: "CodeQL" + +on: + pull_request: + branches: [ main, dev ] + push: + branches: [ main, dev ] + +jobs: + analyze: + name: Analyze + runs-on: ubuntu-latest + permissions: + actions: read + contents: read + security-events: write + + strategy: + fail-fast: false + matrix: + language: [ 'go' ] + # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ] + # Learn more about CodeQL language support at https://git.io/codeql-language-support + + steps: + - name: Checkout repository + uses: actions/checkout@v2 + + # Initializes the CodeQL tools for scanning. + - name: Initialize CodeQL + uses: github/codeql-action/init@v2 + with: + languages: ${{ matrix.language }} + # If you wish to specify custom queries, you can do so here or in a config file. + # By default, queries listed here will override any specified in a config file. + # Prefix the list here with "+" to use these queries and those in the config file. + # queries: ./path/to/local/query, your-org/your-repo/queries@main + + # Autobuild attempts to build any compiled languages (C/C++, C#, or Java). + # If this step fails, then you should remove it and run the build manually (see below) + - name: Autobuild + uses: github/codeql-action/autobuild@v2 + + # ℹ️ Command-line programs to run using the OS shell. + # 📚 https://git.io/JvXDl + + # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines + # and modify them (or add more) to build your code if your project + # uses a compiled language + + #- run: | + # make bootstrap + # make release + + - name: Perform CodeQL Analysis + uses: github/codeql-action/analyze@v2 \ No newline at end of file diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml new file mode 100644 index 000000000..532588839 --- /dev/null +++ b/.github/workflows/codespell.yml @@ -0,0 +1,24 @@ +# GitHub Action to automate the identification of common misspellings in text files. 
+# https://github.com/codespell-project/actions-codespell +# https://github.com/codespell-project/codespell +name: codespell +on: + push: + branches: + - dev + - main + pull_request: + branches: + - dev + - main +jobs: + codespell: + name: Check for spelling errors + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: codespell-project/actions-codespell@master + with: + check_filenames: true + skip: ./sddl/sddlPortable_test.go,./sddl/sddlHelper_linux.go + ignore_words_list: "resue,pase,cancl,cacl,froms" \ No newline at end of file diff --git a/.github/workflows/trivy.yml b/.github/workflows/trivy.yml new file mode 100644 index 000000000..f883eb7f7 --- /dev/null +++ b/.github/workflows/trivy.yml @@ -0,0 +1,54 @@ +# This workflow uses actions that are not certified by GitHub. +# They are provided by a third-party and are governed by +# separate terms of service, privacy policy, and support +# documentation. + +name: trivy + +on: + push: + branches: [ "main", "dev" ] + pull_request: + # The branches below must be a subset of the branches above + branches: [ "main", "dev" ] + schedule: + - cron: '31 19 * * 1' + +permissions: + contents: read + +jobs: + build: + permissions: + contents: read # for actions/checkout to fetch code + security-events: write # for github/codeql-action/upload-sarif to upload SARIF results + actions: read # only required for a private repository by github/codeql-action/upload-sarif to get the Action run status + + name: Build + runs-on: "ubuntu-22.04" + + steps: + - name: Checkout code + uses: actions/checkout@v3 + + - name: Build AzCopy + run: | + go build -o azcopy + ls -l + - name: Run Trivy vulnerability scanner + uses: aquasecurity/trivy-action@master + with: + scan-type: fs + scan-ref: './azcopy' + ignore-unfixed: true + format: 'sarif' + output: 'trivy-results-binary.sarif' + severity: 'CRITICAL,HIGH,MEDIUM,LOW' + + - name: List Issues + run: | + cat trivy-results-binary.sarif + - name: Upload Trivy scan results to GitHub Security tab + uses: github/codeql-action/upload-sarif@v2 + with: + sarif_file: 'trivy-results-binary.sarif' \ No newline at end of file diff --git a/ChangeLog.md b/ChangeLog.md index 6f5b79dd1..86e0338fb 100644 --- a/ChangeLog.md +++ b/ChangeLog.md @@ -1,6 +1,19 @@ # Change Log +## Version 10.17.0 + +### New features + +1. Added support for hash-based sync. AzCopy sync can now take two new flags `--compare-hash` and `--missing-hash-policy=Generate`, which which user will be able to transfer only those files which differ in their MD5 hash. + +### Bug fixes +1. Fixed [issue 1994](https://github.com/Azure/azure-storage-azcopy/pull/1994): Error in calculation of block size +2. Fixed [issue 1957](https://github.com/Azure/azure-storage-azcopy/pull/1957): Repeated Authentication token refresh +3. Fixed [issue 1870](https://github.com/Azure/azure-storage-azcopy/pull/1870): Fixed issue where CPK would not be injected on retries +4. Fixed [issue 1946](https://github.com/Azure/azure-storage-azcopy/issues/1946): Fixed Metadata parsing +5: Fixed [issue 1931](https://github.com/Azure/azure-storage-azcopy/issues/1931) + ## Version 10.16.2 ### Bug Fixes @@ -35,7 +48,7 @@ 1. Fixed [issue 1506](https://github.com/Azure/azure-storage-azcopy/issues/1506): Added input watcher to resolve issue since job could not be resumed. 2. Fixed [issue 1794](https://github.com/Azure/azure-storage-azcopy/issues/1794): Moved log-level to root.go so log-level arguments do not get ignored. 3. 
Fixed [issue 1824](https://github.com/Azure/azure-storage-azcopy/issues/1824): Avoid creating .azcopy under HOME if plan/log location is specified elsewhere. -4. Fixed [isue 1830](https://github.com/Azure/azure-storage-azcopy/issues/1830), [issue 1412](https://github.com/Azure/azure-storage-azcopy/issues/1418), and [issue 873](https://github.com/Azure/azure-storage-azcopy/issues/873): Improved error message for when AzCopy cannot determine if source is directory. +4. Fixed [issue 1830](https://github.com/Azure/azure-storage-azcopy/issues/1830), [issue 1412](https://github.com/Azure/azure-storage-azcopy/issues/1418), and [issue 873](https://github.com/Azure/azure-storage-azcopy/issues/873): Improved error message for when AzCopy cannot determine if source is directory. 5. Fixed [issue 1777](https://github.com/Azure/azure-storage-azcopy/issues/1777): Fixed job list to handle respective output-type correctly. 6. Fixed win64 alignment issue. @@ -191,7 +204,7 @@ ### New features 1. Added option to [disable parallel blob listing](https://github.com/Azure/azure-storage-azcopy/pull/1263) -1. Added support for uploading [large files](https://github.com/Azure/azure-storage-azcopy/pull/1254/files) upto 4TiB. Please refer the [public documentation](https://docs.microsoft.com/en-us/rest/api/storageservices/create-file) for more information +1. Added support for uploading [large files](https://github.com/Azure/azure-storage-azcopy/pull/1254/files) up to 4TiB. Please refer the [public documentation](https://docs.microsoft.com/en-us/rest/api/storageservices/create-file) for more information 1. Added support for `include-before`flag. Refer [this](https://github.com/Azure/azure-storage-azcopy/issues/1075) for more information ### Bug fixes @@ -469,7 +482,7 @@ disallowed because none (other than include-path) are respected. 1. The `*` character is no longer supported as a wildcard in URLs, except for the two exceptions noted below. It remains supported in local file paths. - 1. The first execption is that `/*` is still allowed at the very end of the "path" section of a + 1. The first exception is that `/*` is still allowed at the very end of the "path" section of a URL. This is illustrated by the difference between these two source URLs: `https://account/container/virtual?SAS` and `https://account/container/virtualDir/*?SAS`. The former copies the virtual directory @@ -501,7 +514,7 @@ disallowed because none (other than include-path) are respected. 1. Percent complete is displayed as each job runs. 1. VHD files are auto-detected as page blobs. 1. A new benchmark mode allows quick and easy performance benchmarking of your network connection to - Blob Storage. Run AzCopy with the paramaters `bench --help` for details. This feature is in + Blob Storage. Run AzCopy with the parameters `bench --help` for details. This feature is in Preview status. 1. The location for AzCopy's "plan" files can be specified with the environment variable `AZCOPY_JOB_PLAN_LOCATION`. (If you move the plan files and also move the log files using the existing @@ -520,7 +533,7 @@ disallowed because none (other than include-path) are respected. 1. Memory usage can be controlled by setting the new environment variable `AZCOPY_BUFFER_GB`. Decimal values are supported. Actual usage will be the value specified, plus some overhead. 1. An extra integrity check has been added: the length of the - completed desination file is checked against that of the source. + completed destination file is checked against that of the source. 1. 
When downloading, AzCopy can automatically decompress blobs (or Azure Files) that have a `Content-Encoding` of `gzip` or `deflate`. To enable this behaviour, supply the `--decompress` parameter. @@ -685,21 +698,21 @@ information, including those needed to set the new headers. 1. For creating MD5 hashes when uploading, version 10.x now has the OPPOSITE default to version AzCopy 8.x. Specifically, as of version 10.0.9, MD5 hashes are NOT created by default. To create - Content-MD5 hashs when uploading, you must now specify `--put-md5` on the command line. + Content-MD5 hashes when uploading, you must now specify `--put-md5` on the command line. ### New features 1. Can migrate data directly from Amazon Web Services (AWS). In this high-performance data path the data is read directly from AWS by the Azure Storage service. It does not need to pass through - the machine running AzCopy. The copy happens syncronously, so you can see its exact progress. + the machine running AzCopy. The copy happens synchronously, so you can see its exact progress. 1. Can migrate data directly from Azure Files or Azure Blobs (any blob type) to Azure Blobs (any blob type). In this high-performance data path the data is read directly from the source by the Azure Storage service. It does not need to pass through the machine running AzCopy. The copy - happens syncronously, so you can see its exact progress. + happens synchronously, so you can see its exact progress. 1. Sync command prompts with 4 options about deleting unneeded files from the target: Yes, No, All or None. (Deletion only happens if the `--delete-destination` flag is specified). 1. Can download to /dev/null. This throws the data away - but is useful for testing raw network - performance unconstrained by disk; and also for validing MD5 hashes in bulk (when run in a cloud + performance unconstrained by disk; and also for validating MD5 hashes in bulk (when run in a cloud VM in the same region as the Storage account) ### Bug fixes diff --git a/azbfs/parsing_urls.go b/azbfs/parsing_urls.go index 8405176a4..f2d03cd9d 100644 --- a/azbfs/parsing_urls.go +++ b/azbfs/parsing_urls.go @@ -20,7 +20,7 @@ type BfsURLParts struct { isIPEndpointStyle bool // Ex: "https://ip/accountname/filesystem" } -// isIPEndpointStyle checkes if URL's host is IP, in this case the storage account endpoint will be composed as: +// isIPEndpointStyle checks if URL's host is IP, in this case the storage account endpoint will be composed as: // http(s)://IP(:port)/storageaccount/share(||container||etc)/... func isIPEndpointStyle(url url.URL) bool { return net.ParseIP(url.Host) != nil diff --git a/azbfs/zc_credential_token.go b/azbfs/zc_credential_token.go index 1a82fba3c..bfe3a2248 100644 --- a/azbfs/zc_credential_token.go +++ b/azbfs/zc_credential_token.go @@ -25,7 +25,7 @@ type TokenCredential interface { // indicating how long the TokenCredential object should wait before calling your tokenRefresher function again. 
func NewTokenCredential(initialToken string, tokenRefresher func(credential TokenCredential) time.Duration) TokenCredential { tc := &tokenCredential{} - tc.SetToken(initialToken) // We dont' set it above to guarantee atomicity + tc.SetToken(initialToken) // We don't set it above to guarantee atomicity if tokenRefresher == nil { return tc // If no callback specified, return the simple tokenCredential } diff --git a/azure-pipelines.yml b/azure-pipelines.yml index 0b3d9fcc3..74a2116b5 100644 --- a/azure-pipelines.yml +++ b/azure-pipelines.yml @@ -29,10 +29,10 @@ jobs: env: GO111MODULE: 'on' inputs: - version: '1.17.9' + version: '1.19.2' - script: | - curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s v1.43.0 + curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s v1.46.2 echo 'Installation complete' ./bin/golangci-lint --version ./bin/golangci-lint run e2etest @@ -83,7 +83,12 @@ jobs: - script: | go build -o "$(Build.ArtifactStagingDirectory)/azcopy_darwin_amd64" - displayName: 'Generate MacOS Build' + displayName: 'Generate MacOS Build with AMD64' + condition: eq(variables.type, 'mac-os') + + - script: | + GOARCH=arm64 CGO_ENABLED=1 go build -o "$(Build.ArtifactStagingDirectory)/azcopy_darwin_arm64" + displayName: 'Generate MacOS Build with ARM64' condition: eq(variables.type, 'mac-os') - task: PublishBuildArtifacts@1 @@ -116,7 +121,7 @@ jobs: steps: - task: GoTool@0 inputs: - version: '1.17.9' + version: '1.19.2' # Running E2E Tests on Linux - AMD64 - script: | @@ -134,6 +139,7 @@ jobs: AZCOPY_E2E_CLIENT_SECRET: $(AZCOPY_SPA_CLIENT_SECRET) AZCOPY_E2E_CLASSIC_ACCOUNT_NAME: $(AZCOPY_E2E_CLASSIC_ACCOUNT_NAME) AZCOPY_E2E_CLASSIC_ACCOUNT_KEY: $(AZCOPY_E2E_CLASSIC_ACCOUNT_KEY) + AZCOPY_E2E_LOG_OUTPUT: '$(System.DefaultWorkingDirectory)/logs' AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG) AZCOPY_E2E_STD_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_STD_MANAGED_DISK_CONFIG) CPK_ENCRYPTION_KEY: $(CPK_ENCRYPTION_KEY) @@ -157,6 +163,7 @@ jobs: AZCOPY_E2E_CLIENT_SECRET: $(AZCOPY_SPA_CLIENT_SECRET) AZCOPY_E2E_CLASSIC_ACCOUNT_NAME: $(AZCOPY_E2E_CLASSIC_ACCOUNT_NAME) AZCOPY_E2E_CLASSIC_ACCOUNT_KEY: $(AZCOPY_E2E_CLASSIC_ACCOUNT_KEY) + AZCOPY_E2E_LOG_OUTPUT: '$(System.DefaultWorkingDirectory)/logs' AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG) AZCOPY_E2E_STD_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_STD_MANAGED_DISK_CONFIG) CPK_ENCRYPTION_KEY: $(CPK_ENCRYPTION_KEY) @@ -182,13 +189,21 @@ jobs: AZCOPY_E2E_CLIENT_SECRET: $(AZCOPY_SPA_CLIENT_SECRET) AZCOPY_E2E_CLASSIC_ACCOUNT_NAME: $(AZCOPY_E2E_CLASSIC_ACCOUNT_NAME) AZCOPY_E2E_CLASSIC_ACCOUNT_KEY: $(AZCOPY_E2E_CLASSIC_ACCOUNT_KEY) + AZCOPY_E2E_LOG_OUTPUT: '$(System.DefaultWorkingDirectory)/logs' AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_OAUTH_MANAGED_DISK_CONFIG) AZCOPY_E2E_STD_MANAGED_DISK_CONFIG: $(AZCOPY_E2E_STD_MANAGED_DISK_CONFIG) CPK_ENCRYPTION_KEY: $(CPK_ENCRYPTION_KEY) CPK_ENCRYPTION_KEY_SHA256: $(CPK_ENCRYPTION_KEY_SHA256) - displayName: 'E2E Test MacOs' + displayName: 'E2E Test MacOs AMD64' condition: eq(variables.type, 'mac-os') + - task: PublishBuildArtifacts@1 + displayName: 'Publish logs' + condition: succeededOrFailed() + inputs: + pathToPublish: '$(System.DefaultWorkingDirectory)/logs' + artifactName: logs + - job: Test_On_Ubuntu variables: isMutexSet: 'false' @@ -204,7 +219,7 @@ jobs: - task: GoTool@0 name: 'Set_up_Golang' inputs: - version: '1.17.9' + version: '1.19.2' - task: DownloadSecureFile@1 name: 
ciGCSServiceAccountKey displayName: 'Download GCS Service Account Key' @@ -212,10 +227,14 @@ jobs: secureFile: 'ci-gcs-dev.json' - script: | pip install azure-storage-blob==12.12.0 + # set the variable to indicate that the mutex is being acquired + # note: we set it before acquiring the mutex to ensure we release the mutex. + # setting this after can result in an un-broken mutex if someone cancels the pipeline after we acquire the + # mutex but before we set this variable. + # setting this before will always work since it is valid to break an un-acquired mutex. + echo '##vso[task.setvariable variable=isMutexSet]true' # acquire the mutex before running live tests to avoid conflicts python ./tool_distributed_mutex.py lock "$(MUTEX_URL)" - # set the variable to indicate that the mutex was actually acquired - echo '##vso[task.setvariable variable=isMutexSet]true' name: 'Acquire_the_distributed_mutex' - script: | # run unit test and build executable diff --git a/cmd/benchmark.go b/cmd/benchmark.go index 234dac301..978c35789 100644 --- a/cmd/benchmark.go +++ b/cmd/benchmark.go @@ -273,7 +273,7 @@ func (h benchmarkSourceHelper) FromUrl(s string) (fileCount uint, bytesPerFile i pieces[0] = strings.Split(pieces[0], "=")[1] pieces[1] = strings.Split(pieces[1], "=")[1] pieces[2] = strings.Split(pieces[2], "=")[1] - fc, err := strconv.ParseUint(pieces[0], 10, 64) + fc, err := strconv.ParseUint(pieces[0], 10, 32) if err != nil { return 0, 0, 0, err } @@ -281,7 +281,7 @@ func (h benchmarkSourceHelper) FromUrl(s string) (fileCount uint, bytesPerFile i if err != nil { return 0, 0, 0, err } - nf, err := strconv.ParseUint(pieces[2], 10, 64) + nf, err := strconv.ParseUint(pieces[2], 10, 32) if err != nil { return 0, 0, 0, err } diff --git a/cmd/copy.go b/cmd/copy.go index 684a80125..31a4f9d6b 100644 --- a/cmd/copy.go +++ b/cmd/copy.go @@ -26,7 +26,6 @@ import ( "encoding/json" "errors" "fmt" - "github.com/Azure/azure-storage-azcopy/v10/jobsAdmin" "io" "math" "net/url" @@ -36,6 +35,8 @@ import ( "sync" "time" + "github.com/Azure/azure-storage-azcopy/v10/jobsAdmin" + "github.com/Azure/azure-pipeline-go/pipeline" "github.com/Azure/azure-storage-blob-go/azblob" @@ -279,7 +280,12 @@ func (raw rawCopyCmdArgs) cook() (CookedCopyCmdArgs, error) { dstDfs := dst == common.ELocation.BlobFS() && src != common.ELocation.Local() if dstDfs { raw.dst = strings.Replace(raw.dst, ".dfs", ".blob", 1) - glcm.Info("Switching to use blob endpoint on destination account.") + msg := fmt.Sprintf("Switching to use blob endpoint on destination account. There are some limitations when switching endpoints. 
" + + "Please refer to https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues#blob-storage-apis") + glcm.Info(msg) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, msg) + } } cooked.isHNStoHNS = srcDfs && dstDfs @@ -649,11 +655,7 @@ func (raw rawCopyCmdArgs) cook() (CookedCopyCmdArgs, error) { glcm.SetOutputFormat(common.EOutputFormat.None()) } - cooked.preserveSMBInfo = areBothLocationsSMBAware(cooked.FromTo) - // If user has explicitly specified not to copy SMB Information, set cooked.preserveSMBInfo to false - if !raw.preserveSMBInfo { - cooked.preserveSMBInfo = false - } + cooked.preserveSMBInfo = raw.preserveSMBInfo && areBothLocationsSMBAware(cooked.FromTo) cooked.preservePOSIXProperties = raw.preservePOSIXProperties if cooked.preservePOSIXProperties && !areBothLocationsPOSIXAware(cooked.FromTo) { @@ -928,10 +930,11 @@ func validateForceIfReadOnly(toForce bool, fromTo common.FromTo) error { func areBothLocationsSMBAware(fromTo common.FromTo) bool { // preserverSMBInfo will be true by default for SMB-aware locations unless specified false. - // 1. Upload (Windows -> Azure File) - // 2. Download (Azure File -> Windows) + // 1. Upload (Windows/Linux -> Azure File) + // 2. Download (Azure File -> Windows/Linux) // 3. S2S (Azure File -> Azure File) - if runtime.GOOS == "windows" && (fromTo == common.EFromTo.LocalFile() || fromTo == common.EFromTo.FileLocal()) { + if (runtime.GOOS == "windows" || runtime.GOOS == "linux") && + (fromTo == common.EFromTo.LocalFile() || fromTo == common.EFromTo.FileLocal()) { return true } else if fromTo == common.EFromTo.FileFile() { return true @@ -956,8 +959,9 @@ func validatePreserveSMBPropertyOption(toPreserve bool, fromTo common.FromTo, ov return fmt.Errorf("%s is set but the job is not between %s-aware resources", flagName, common.IffString(flagName == PreservePermissionsFlag, "permission", "SMB")) } - if toPreserve && (fromTo.IsUpload() || fromTo.IsDownload()) && runtime.GOOS != "windows" { - return fmt.Errorf("%s is set but persistence for up/downloads is a Windows-only feature", flagName) + if toPreserve && (fromTo.IsUpload() || fromTo.IsDownload()) && + runtime.GOOS != "windows" && runtime.GOOS != "linux" { + return fmt.Errorf("%s is set but persistence for up/downloads is supported only in Windows and Linux", flagName) } return nil @@ -1000,9 +1004,6 @@ func validatePutMd5(putMd5 bool, fromTo common.FromTo) error { if putMd5 && fromTo.IsS2S() { glcm.Info(" --put-md5 flag to check data consistency between source and destination is not applicable for S2S Transfers (i.e. When both the source and the destination are remote). 
AzCopy cannot compute MD5 hash of data stored at remote location.") } - if putMd5 && !fromTo.IsUpload() { - return fmt.Errorf("put-md5 is set but the job is not an upload") - } return nil } @@ -1110,7 +1111,9 @@ type CookedCopyCmdArgs struct { FollowSymlinks bool ForceWrite common.OverwriteOption // says whether we should try to overwrite ForceIfReadOnly bool // says whether we should _force_ any overwrites (triggered by forceWrite) to work on Azure Files objects that are set to read-only - autoDecompress bool + IsSourceDir bool + + autoDecompress bool // options from flags blockSize int64 @@ -1301,7 +1304,7 @@ func (cca *CookedCopyCmdArgs) processRedirectionDownload(blobResource common.Res return fmt.Errorf("fatal: cannot download blob due to error: %s", err.Error()) } - blobBody := blobStream.Body(azblob.RetryReaderOptions{MaxRetryRequests: ste.MaxRetryPerDownloadBody}) + blobBody := blobStream.Body(azblob.RetryReaderOptions{MaxRetryRequests: ste.MaxRetryPerDownloadBody, ClientProvidedKeyOptions: clientProvidedKey}) defer blobBody.Close() // step 4: pipe everything into Stdout @@ -1374,6 +1377,49 @@ func (cca *CookedCopyCmdArgs) processRedirectionUpload(blobResource common.Resou return err } +// get source credential - if there is a token it will be used to get passed along our pipeline +func (cca *CookedCopyCmdArgs) getSrcCredential(ctx context.Context, jpo *common.CopyJobPartOrderRequest) (common.CredentialInfo, error) { + srcCredInfo := common.CredentialInfo{} + var err error + var isPublic bool + + if srcCredInfo, isPublic, err = GetCredentialInfoForLocation(ctx, cca.FromTo.From(), cca.Source.Value, cca.Source.SAS, true, cca.CpkOptions); err != nil { + return srcCredInfo, err + // If S2S and source takes OAuthToken as its cred type (OR) source takes anonymous as its cred type, but it's not public and there's no SAS + } else if cca.FromTo.IsS2S() && + ((srcCredInfo.CredentialType == common.ECredentialType.OAuthToken() && cca.FromTo.To() != common.ELocation.Blob()) || // Blob can forward OAuth tokens + (srcCredInfo.CredentialType == common.ECredentialType.Anonymous() && !isPublic && cca.Source.SAS == "")) { + return srcCredInfo, errors.New("a SAS token (or S3 access key) is required as a part of the source in S2S transfers, unless the source is a public resource, or the destination is blob storage") + } + + if cca.Source.SAS != "" && cca.FromTo.IsS2S() && jpo.CredentialInfo.CredentialType == common.ECredentialType.OAuthToken() { + //glcm.Info("Authentication: If the source and destination accounts are in the same AAD tenant & the user/spn/msi has appropriate permissions on both, the source SAS token is not required and OAuth can be used round-trip.") + } + + if cca.FromTo.IsS2S() { + jpo.S2SSourceCredentialType = srcCredInfo.CredentialType + + if jpo.S2SSourceCredentialType.IsAzureOAuth() { + uotm := GetUserOAuthTokenManagerInstance() + // get token from env var or cache + if tokenInfo, err := uotm.GetTokenInfo(ctx); err != nil { + return srcCredInfo, err + } else { + cca.credentialInfo.OAuthTokenInfo = *tokenInfo + jpo.CredentialInfo.OAuthTokenInfo = *tokenInfo + } + // if the source is not local then store the credential token if it was OAuth to avoid constant refreshing + jpo.CredentialInfo.SourceBlobToken = common.CreateBlobCredential(ctx, srcCredInfo, common.CredentialOpOptions{ + // LogInfo: glcm.Info, //Comment out for debugging + LogError: glcm.Info, + }) + cca.credentialInfo.SourceBlobToken = jpo.CredentialInfo.SourceBlobToken + srcCredInfo.SourceBlobToken = 
jpo.CredentialInfo.SourceBlobToken + } + } + return srcCredInfo, nil +} + // handles the copy command // dispatches the job order (in parts) to the storage engine func (cca *CookedCopyCmdArgs) processCopyJobPartOrders() (err error) { @@ -1486,11 +1532,12 @@ func (cca *CookedCopyCmdArgs) processCopyJobPartOrders() (err error) { common.EFromTo.BenchmarkFile(): var e *CopyEnumerator - e, err = cca.initEnumerator(jobPartOrder, ctx) + srcCredInfo, _ := cca.getSrcCredential(ctx, &jobPartOrder) + + e, err = cca.initEnumerator(jobPartOrder, srcCredInfo, ctx) if err != nil { return fmt.Errorf("failed to initialize enumerator: %w", err) } - err = e.enumerate() case common.EFromTo.BlobTrash(), common.EFromTo.FileTrash(): e, createErr := newRemoveEnumerator(cca) @@ -1669,7 +1716,6 @@ func (cca *CookedCopyCmdArgs) ReportProgressOrExit(lcm common.LifecycleMgr) (tot // indicate whether constrained by disk or not isBenchmark := cca.FromTo.From() == common.ELocation.Benchmark() perfString, diskString := getPerfDisplayText(summary.PerfStrings, summary.PerfConstraint, duration, isBenchmark) - return fmt.Sprintf("%.1f %%, %v Done, %v Failed, %v Pending, %v Skipped, %v Total%s, %s%s%s", summary.PercentComplete, summary.TransfersCompleted, @@ -1701,9 +1747,12 @@ Elapsed Time (Minutes): %v Number of File Transfers: %v Number of Folder Property Transfers: %v Total Number of Transfers: %v -Number of Transfers Completed: %v -Number of Transfers Failed: %v -Number of Transfers Skipped: %v +Number of File Transfers Completed: %v +Number of Folder Transfers Completed: %v +Number of File Transfers Failed: %v +Number of Folder Transfers Failed: %v +Number of File Transfers Skipped: %v +Number of Folder Transfers Skipped: %v TotalBytesTransferred: %v Final Job Status: %v%s%s `, @@ -1712,9 +1761,12 @@ Final Job Status: %v%s%s summary.FileTransfers, summary.FolderPropertyTransfers, summary.TotalTransfers, - summary.TransfersCompleted, - summary.TransfersFailed, - summary.TransfersSkipped, + summary.TransfersCompleted-summary.FoldersCompleted, + summary.FoldersCompleted, + summary.TransfersFailed-summary.FoldersFailed, + summary.FoldersFailed, + summary.TransfersSkipped-summary.FoldersSkipped, + summary.FoldersSkipped, summary.TotalBytesTransferred, summary.JobStatus, screenStats, @@ -1940,7 +1992,7 @@ func init() { cpCmd.PersistentFlags().BoolVar(&raw.preserveSMBPermissions, "preserve-smb-permissions", false, "False by default. Preserves SMB ACLs between aware resources (Windows and Azure Files). For downloads, you will also need the --backup flag to restore permissions where the new Owner will not be the user running AzCopy. This flag applies to both files and folders, unless a file-only filter is specified (e.g. include-pattern).") cpCmd.PersistentFlags().BoolVar(&raw.asSubdir, "as-subdir", true, "True by default. Places folder sources as subdirectories under the destination.") cpCmd.PersistentFlags().BoolVar(&raw.preserveOwner, common.PreserveOwnerFlagName, common.PreserveOwnerDefault, "Only has an effect in downloads, and only when --preserve-smb-permissions is used. If true (the default), the file Owner and Group are preserved in downloads. If set to false, --preserve-smb-permissions will still preserve ACLs but Owner and Group will be based on the user running AzCopy") - cpCmd.PersistentFlags().BoolVar(&raw.preserveSMBInfo, "preserve-smb-info", true, "For SMB-aware locations, flag will be set to true by default. 
Preserves SMB property info (last write time, creation time, attribute bits) between SMB-aware resources (Windows and Azure Files). Only the attribute bits supported by Azure Files will be transferred; any others will be ignored. This flag applies to both files and folders, unless a file-only filter is specified (e.g. include-pattern). The info transferred for folders is the same as that for files, except for Last Write Time which is never preserved for folders.") + cpCmd.PersistentFlags().BoolVar(&raw.preserveSMBInfo, "preserve-smb-info", (runtime.GOOS == "windows"), "Preserves SMB property info (last write time, creation time, attribute bits) between SMB-aware resources (Windows and Azure Files). On windows, this flag will be set to true by default. If the source or destination is a volume mounted on Linux using SMB protocol, this flag will have to be explicitly set to true. Only the attribute bits supported by Azure Files will be transferred; any others will be ignored. This flag applies to both files and folders, unless a file-only filter is specified (e.g. include-pattern). The info transferred for folders is the same as that for files, except for Last Write Time which is never preserved for folders.") cpCmd.PersistentFlags().BoolVar(&raw.preservePOSIXProperties, "preserve-posix-properties", false, "'Preserves' property info gleaned from stat or statx into object metadata.") cpCmd.PersistentFlags().BoolVar(&raw.forceIfReadOnly, "force-if-read-only", false, "When overwriting an existing file on Windows or Azure Files, force the overwrite to work even if the existing file has its read-only attribute set") cpCmd.PersistentFlags().BoolVar(&raw.backupMode, common.BackupModeFlagName, false, "Activates Windows' SeBackupPrivilege for uploads, or SeRestorePrivilege for downloads, to allow AzCopy to see read all files, regardless of their file system permissions, and to restore all permissions. Requires that the account running AzCopy already has these permissions (e.g. has Administrator rights or is a member of the 'Backup Operators' group). 
All this flag does is activate privileges that the account already has") diff --git a/cmd/copyEnumeratorInit.go b/cmd/copyEnumeratorInit.go index d54b0e823..c3541046e 100755 --- a/cmd/copyEnumeratorInit.go +++ b/cmd/copyEnumeratorInit.go @@ -28,52 +28,38 @@ type BucketToContainerNameResolver interface { ResolveName(bucketName string) (string, error) } -func (cca *CookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrderRequest, ctx context.Context) (*CopyEnumerator, error) { - var traverser ResourceTraverser - - srcCredInfo := common.CredentialInfo{} - var isPublic bool +func (cca *CookedCopyCmdArgs) validateSourceDir(traverser ResourceTraverser) error { var err error - - if srcCredInfo, isPublic, err = GetCredentialInfoForLocation(ctx, cca.FromTo.From(), cca.Source.Value, cca.Source.SAS, true, cca.CpkOptions); err != nil { - return nil, err - // If S2S and source takes OAuthToken as its cred type (OR) source takes anonymous as its cred type, but it's not public and there's no SAS - } else if cca.FromTo.IsS2S() && - ((srcCredInfo.CredentialType == common.ECredentialType.OAuthToken() && cca.FromTo.To() != common.ELocation.Blob()) || // Blob can forward OAuth tokens - (srcCredInfo.CredentialType == common.ECredentialType.Anonymous() && !isPublic && cca.Source.SAS == "")) { - return nil, errors.New("a SAS token (or S3 access key) is required as a part of the source in S2S transfers, unless the source is a public resource, or the destination is blob storage") - } - - if cca.Source.SAS != "" && cca.FromTo.IsS2S() && jobPartOrder.CredentialInfo.CredentialType == common.ECredentialType.OAuthToken() { - glcm.Info("Authentication: If the source and destination accounts are in the same AAD tenant & the user/spn/msi has appropriate permissions on both, the source SAS token is not required and OAuth can be used round-trip.") + // Ensure we're only copying a directory under valid conditions + cca.IsSourceDir, err = traverser.IsDirectory(true) + if cca.IsSourceDir && + !cca.Recursive && // Copies the folder & everything under it + !cca.StripTopDir { // Copies only everything under it + // todo: dir only transfer, also todo: support syncing the root folder's acls on sync. 
+ return errors.New("cannot use directory as source without --recursive or a trailing wildcard (/*)") } - - if cca.FromTo.IsS2S() { - jobPartOrder.S2SSourceCredentialType = srcCredInfo.CredentialType - - if jobPartOrder.S2SSourceCredentialType.IsAzureOAuth() { - uotm := GetUserOAuthTokenManagerInstance() - // get token from env var or cache - if tokenInfo, err := uotm.GetTokenInfo(ctx); err != nil { - return nil, err - } else { - cca.credentialInfo.OAuthTokenInfo = *tokenInfo - jobPartOrder.CredentialInfo.OAuthTokenInfo = *tokenInfo - } - } + // check if error is file not found - if it is then we need to make sure it's not a wild card + if err != nil && strings.EqualFold(err.Error(), common.FILE_NOT_FOUND) && !cca.StripTopDir { + return err } + return nil +} +func (cca *CookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrderRequest, srcCredInfo common.CredentialInfo, ctx context.Context) (*CopyEnumerator, error) { + var traverser ResourceTraverser + var err error jobPartOrder.CpkOptions = cca.CpkOptions jobPartOrder.PreserveSMBPermissions = cca.preservePermissions jobPartOrder.PreserveSMBInfo = cca.preserveSMBInfo - jobPartOrder.PreservePOSIXProperties = cca.preservePOSIXProperties + // We set preservePOSIXProperties if the customer has explicitly asked for this in transfer or if it is just a Posix-property only transfer + jobPartOrder.PreservePOSIXProperties = cca.preservePOSIXProperties || (cca.ForceWrite == common.EOverwriteOption.PosixProperties()) // Infer on download so that we get LMT and MD5 on files download // On S2S transfers the following rules apply: // If preserve properties is enabled, but get properties in backend is disabled, turn it on // If source change validation is enabled on files to remote, turn it on (consider a separate flag entirely?) getRemoteProperties := cca.ForceWrite == common.EOverwriteOption.IfSourceNewer() || - (cca.FromTo.From() == common.ELocation.File() && !cca.FromTo.To().IsRemote()) || // If download, we still need LMT and MD5 from files. + (cca.FromTo.From() == common.ELocation.File() && !cca.FromTo.To().IsRemote()) || // If it's a download, we still need LMT and MD5 from files. (cca.FromTo.From() == common.ELocation.File() && cca.FromTo.To().IsRemote() && (cca.s2sSourceChangeValidation || cca.IncludeAfter != nil || cca.IncludeBefore != nil)) || // If S2S from File to *, and sourceChangeValidation is enabled, we get properties so that we have LMTs. Likewise, if we are using includeAfter or includeBefore, which require LMTs. (cca.FromTo.From().IsRemote() && cca.FromTo.To().IsRemote() && cca.s2sPreserveProperties && !cca.s2sGetPropertiesInBackend) // If S2S and preserve properties AND get properties in backend is on, turn this off, as properties will be obtained in the backend. jobPartOrder.S2SGetPropertiesInBackend = cca.s2sPreserveProperties && !getRemoteProperties && cca.s2sGetPropertiesInBackend // Infer GetProperties if GetPropertiesInBackend is enabled. 
@@ -85,24 +71,20 @@ func (cca *CookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrde traverser, err = InitResourceTraverser(cca.Source, cca.FromTo.From(), &ctx, &srcCredInfo, &cca.FollowSymlinks, cca.ListOfFilesChannel, cca.Recursive, getRemoteProperties, cca.IncludeDirectoryStubs, cca.permanentDeleteOption, func(common.EntityType) {}, cca.ListOfVersionIDs, - cca.S2sPreserveBlobTags, azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) + cca.S2sPreserveBlobTags, common.ESyncHashType.None(), azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) if err != nil { return nil, err } - // Ensure we're only copying a directory under valid conditions - isSourceDir := traverser.IsDirectory(true) - if isSourceDir && - !cca.Recursive && // Copies the folder & everything under it - !cca.StripTopDir { // Copies only everything under it - // todo: dir only transfer, also todo: support syncing the root folder's acls on sync. - return nil, errors.New("cannot use directory as source without --recursive or a trailing wildcard (/*)") + err = cca.validateSourceDir(traverser) + if err != nil { + return nil, err } - // Check if the destination is a directory so we can correctly decide where our files land + // Check if the destination is a directory to correctly decide where our files land isDestDir := cca.isDestDirectory(cca.Destination, &ctx) - if cca.ListOfVersionIDs != nil && (!(cca.FromTo == common.EFromTo.BlobLocal() || cca.FromTo == common.EFromTo.BlobTrash()) || isSourceDir || !isDestDir) { + if cca.ListOfVersionIDs != nil && (!(cca.FromTo == common.EFromTo.BlobLocal() || cca.FromTo == common.EFromTo.BlobTrash()) || cca.IsSourceDir || !isDestDir) { log.Fatalf("Either source is not a blob or destination is not a local folder") } srcLevel, err := DetermineLocationLevel(cca.Source.Value, cca.FromTo.From(), true) @@ -361,13 +343,14 @@ func (cca *CookedCopyCmdArgs) isDestDirectory(dst common.ResourceString, ctx *co rt, err := InitResourceTraverser(dst, cca.FromTo.To(), ctx, &dstCredInfo, nil, nil, false, false, false, common.EPermanentDeleteOption.None(), - func(common.EntityType) {}, cca.ListOfVersionIDs, false, pipeline.LogNone, cca.CpkOptions, nil /* errorChannel */) + func(common.EntityType) {}, cca.ListOfVersionIDs, false, common.ESyncHashType.None(), pipeline.LogNone, cca.CpkOptions, nil /* errorChannel */) if err != nil { return false } - return rt.IsDirectory(false) + isDir, _ := rt.IsDirectory(false) + return isDir } // Initialize the modular filters outside of copy to increase readability. 
@@ -459,7 +442,7 @@ func (cca *CookedCopyCmdArgs) createDstContainer(containerName string, dstWithSA if dstCredInfo, _, err = GetCredentialInfoForLocation(ctx, cca.FromTo.To(), cca.Destination.Value, cca.Destination.SAS, false, cca.CpkOptions); err != nil { return err } - + // TODO: we can pass cred here as well dstPipeline, err := InitPipeline(ctx, cca.FromTo.To(), dstCredInfo, logLevel.ToPipelineLogLevel()) if err != nil { return diff --git a/cmd/copyEnumeratorInit_test.go b/cmd/copyEnumeratorInit_test.go new file mode 100644 index 000000000..d99d2e5ca --- /dev/null +++ b/cmd/copyEnumeratorInit_test.go @@ -0,0 +1,164 @@ +// Copyright © 2017 Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. + +package cmd + +import ( + "context" + "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-azcopy/v10/ste" + "github.com/Azure/azure-storage-blob-go/azblob" + chk "gopkg.in/check.v1" +) + +type copyEnumeratorSuite struct{} + +var _ = chk.Suite(©EnumeratorSuite{}) + +// ============================================= BLOB TRAVERSER TESTS ======================================= +func (ce *copyEnumeratorSuite) TestValidateSourceDirThatExists(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + dirName := "source_dir" + createNewDirectoryStub(c, containerURL, dirName) + // set up to create blob traverser + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, dirName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + // dir but recursive flag not set - fail + cca := CookedCopyCmdArgs{StripTopDir: false, Recursive: false} + err := cca.validateSourceDir(blobTraverser) + c.Assert(err.Error(), chk.Equals, "cannot use directory as source without --recursive or a trailing wildcard (/*)") + + // dir but recursive flag set - pass + cca.Recursive = true + err = cca.validateSourceDir(blobTraverser) + c.Assert(err, chk.IsNil) + c.Assert(cca.IsSourceDir, chk.Equals, true) +} + +func (ce *copyEnumeratorSuite) TestValidateSourceDirDoesNotExist(c *chk.C) { + bsu := 
getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + dirName := "source_dir/" + // set up to create blob traverser + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, dirName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + // dir but recursive flag not set - fail + cca := CookedCopyCmdArgs{StripTopDir: false, Recursive: false} + err := cca.validateSourceDir(blobTraverser) + c.Assert(err.Error(), chk.Equals, "cannot use directory as source without --recursive or a trailing wildcard (/*)") + + // dir but recursive flag set - pass + cca.Recursive = true + err = cca.validateSourceDir(blobTraverser) + c.Assert(err, chk.IsNil) + c.Assert(cca.IsSourceDir, chk.Equals, true) +} + +func (ce *copyEnumeratorSuite) TestValidateSourceFileExists(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + fileName := "source_file" + _, fileName = createNewBlockBlob(c, containerURL, fileName) + + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, fileName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + cca := CookedCopyCmdArgs{StripTopDir: false, Recursive: false} + err := cca.validateSourceDir(blobTraverser) + c.Assert(err, chk.IsNil) + c.Assert(cca.IsSourceDir, chk.Equals, false) +} + +func (ce *copyEnumeratorSuite) TestValidateSourceFileDoesNotExist(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + fileName := "source_file" + + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, fileName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + cca := CookedCopyCmdArgs{StripTopDir: false, Recursive: false} + err := cca.validateSourceDir(blobTraverser) + c.Assert(err.Error(), chk.Equals, common.FILE_NOT_FOUND) + c.Assert(cca.IsSourceDir, chk.Equals, false) +} + +func (ce *copyEnumeratorSuite) TestValidateSourceWithWildCard(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + dirName := "source_dir_does_not_exist" + // set up to create blob traverser + ctx := context.WithValue(context.TODO(), 
ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, dirName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + // dir but recursive flag not set - fail + cca := CookedCopyCmdArgs{StripTopDir: true, Recursive: false} + err := cca.validateSourceDir(blobTraverser) + c.Assert(err, chk.IsNil) + c.Assert(cca.IsSourceDir, chk.Equals, false) +} diff --git a/cmd/credentialUtil.go b/cmd/credentialUtil.go index ea6eab324..a5692ead3 100644 --- a/cmd/credentialUtil.go +++ b/cmd/credentialUtil.go @@ -80,8 +80,14 @@ func GetOAuthTokenManagerInstance() (*common.UserOAuthTokenManager, error) { var err error autoOAuth.Do(func() { var lca loginCmdArgs - if glcm.GetEnvironmentVariable(common.EEnvironmentVariable.AutoLoginType()) == "" { - err = errors.New("no login type specified") + autoLoginType := strings.ToUpper(glcm.GetEnvironmentVariable(common.EEnvironmentVariable.AutoLoginType())) + if autoLoginType == "" { + glcm.Info("Autologin not specified.") + return + } + + if autoLoginType != "SPN" && autoLoginType != "MSI" && autoLoginType != "DEVICE" { + glcm.Error("Invalid Auto-login type specified.") return } @@ -113,7 +119,9 @@ func GetOAuthTokenManagerInstance() (*common.UserOAuthTokenManager, error) { } lca.persistToken = false - err = lca.process() + if err = lca.process(); err != nil { + glcm.Error(fmt.Sprintf("Failed to perform Auto-login: %v.", err.Error())) + } }) if err != nil { @@ -462,7 +470,7 @@ func checkAuthSafeForTarget(ct common.CredentialType, resource, extraSuffixesAAD // something like https://someApi.execute-api.someRegion.amazonaws.com is AWS but is a customer- // written code, not S3. 
ok := false - host := "" + host := "" u, err := url.Parse(resource) if err == nil { host = u.Host @@ -475,14 +483,14 @@ func checkAuthSafeForTarget(ct common.CredentialType, resource, extraSuffixesAAD if !ok { return fmt.Errorf( - "s3 authentication to %s is not currently suported in AzCopy", host) + "s3 authentication to %s is not currently supported in AzCopy", host) } case common.ECredentialType.GoogleAppCredentials(): if resourceType != common.ELocation.GCP() { return fmt.Errorf("Google Application Credentials to %s is not valid", resourceType.String()) } - host := "" + host := "" u, err := url.Parse(resource) if err == nil { host = u.Host @@ -550,13 +558,6 @@ func doGetCredentialTypeForLocation(ctx context.Context, location common.Locatio credType = common.ECredentialType.Anonymous() case common.ELocation.Blob(): credType, isPublic, err = getBlobCredentialType(ctx, resource, isSource, resourceSAS, cpkOptions) - if azErr, ok := err.(common.AzError); ok && azErr.Equals(common.EAzError.LoginCredMissing()) { - _, autoLoginErr := GetOAuthTokenManagerInstance() - if autoLoginErr == nil { - err = nil // Autologin succeeded, reset original error - credType, isPublic = common.ECredentialType.OAuthToken(), false - } - } if err != nil { return common.ECredentialType.Unknown(), false, err } @@ -566,13 +567,6 @@ func doGetCredentialTypeForLocation(ctx context.Context, location common.Locatio } case common.ELocation.BlobFS(): credType, err = getBlobFSCredentialType(ctx, resource, resourceSAS != "") - if azErr, ok := err.(common.AzError); ok && azErr.Equals(common.EAzError.LoginCredMissing()) { - _, autoLoginErr := GetOAuthTokenManagerInstance() - if autoLoginErr == nil { - err = nil // Autologin succeeded, reset original error - credType, isPublic = common.ECredentialType.OAuthToken(), false - } - } if err != nil { return common.ECredentialType.Unknown(), false, err } @@ -661,11 +655,14 @@ func getCredentialType(ctx context.Context, raw rawFromToInfo, cpkOptions common // pipeline factory methods // ============================================================================================== func createBlobPipeline(ctx context.Context, credInfo common.CredentialInfo, logLevel pipeline.LogLevel) (pipeline.Pipeline, error) { - credential := common.CreateBlobCredential(ctx, credInfo, common.CredentialOpOptions{ - // LogInfo: glcm.Info, //Comment out for debugging - LogError: glcm.Info, - }) - + // are we getting dest token? 
+ credential := credInfo.SourceBlobToken + if credential == nil { + credential = common.CreateBlobCredential(ctx, credInfo, common.CredentialOpOptions{ + // LogInfo: glcm.Info, //Comment out for debugging + LogError: glcm.Info, + }) + } logOption := pipeline.LogOptions{} if azcopyScanningLogger != nil { logOption = pipeline.LogOptions{ diff --git a/cmd/jobsResume.go b/cmd/jobsResume.go index 2ef1d5fa5..97edac966 100644 --- a/cmd/jobsResume.go +++ b/cmd/jobsResume.go @@ -152,15 +152,33 @@ func (cca *resumeJobController) ReportProgressOrExit(lcm common.LifecycleMgr) (t return string(jsonOutput) } else { return fmt.Sprintf( - "\n\nJob %s summary\nElapsed Time (Minutes): %v\nNumber of File Transfers: %v\nNumber of Folder Property Transfers: %v\nTotal Number Of Transfers: %v\nNumber of Transfers Completed: %v\nNumber of Transfers Failed: %v\nNumber of Transfers Skipped: %v\nTotalBytesTransferred: %v\nFinal Job Status: %v\n", + ` + +Job %s summary +Elapsed Time (Minutes): %v +Number of File Transfers: %v +Number of Folder Property Transfers: %v +Total Number Of Transfers: %v +Number of File Transfers Completed: %v +Number of Folder Transfers Completed: %v +Number of File Transfers Failed: %v +Number of Folder Transfers Failed: %v +Number of File Transfers Skipped: %v +Number of Folder Transfers Skipped: %v +TotalBytesTransferred: %v +Final Job Status: %v +`, summary.JobID.String(), jobsAdmin.ToFixed(duration.Minutes(), 4), summary.FileTransfers, summary.FolderPropertyTransfers, summary.TotalTransfers, - summary.TransfersCompleted, - summary.TransfersFailed, - summary.TransfersSkipped, + summary.TransfersCompleted-summary.FoldersCompleted, + summary.FoldersCompleted, + summary.TransfersFailed-summary.FoldersFailed, + summary.FoldersFailed, + summary.TransfersSkipped-summary.FoldersSkipped, + summary.FoldersSkipped, summary.TotalBytesTransferred, summary.JobStatus) } diff --git a/cmd/jobsShow.go b/cmd/jobsShow.go index 255272453..921db7942 100644 --- a/cmd/jobsShow.go +++ b/cmd/jobsShow.go @@ -147,14 +147,30 @@ func PrintJobProgressSummary(summary common.ListJobSummaryResponse) { } return fmt.Sprintf( - "\nJob %s summary\nNumber of File Transfers: %v\nNumber of Folder Property Transfers: %v\nTotal Number Of Transfers: %v\nNumber of Transfers Completed: %v\nNumber of Transfers Failed: %v\nNumber of Transfers Skipped: %v\nPercent Complete (approx): %.1f\nFinal Job Status: %v\n", + ` +Job %s summary +Number of File Transfers: %v +Number of Folder Property Transfers: %v +Total Number Of Transfers: %v +Number of File Transfers Completed: %v +Number of Folder Transfers Completed: %v +Number of File Transfers Failed: %v +Number of Folder Transfers Failed: %v +Number of File Transfers Skipped: %v +Number of Folder Transfers Skipped: %v +Percent Complete (approx): %.1f +Final Job Status: %v +`, summary.JobID.String(), summary.FileTransfers, summary.FolderPropertyTransfers, summary.TotalTransfers, - summary.TransfersCompleted, - summary.TransfersFailed, - summary.TransfersSkipped, + summary.TransfersCompleted-summary.FoldersCompleted, + summary.FoldersCompleted, + summary.TransfersFailed-summary.FoldersFailed, + summary.FoldersFailed, + summary.TransfersSkipped-summary.FoldersSkipped, + summary.FoldersSkipped, summary.PercentComplete, // noted as approx in the format string because won't include in-flight files if this Show command is run from a different process summary.JobStatus, ) diff --git a/cmd/list.go b/cmd/list.go index 571d45865..ef2f14b75 100755 --- a/cmd/list.go +++ b/cmd/list.go @@ -224,7 +224,7 
@@ func (cooked cookedListCmdArgs) HandleListContainerCommand() (err error) { traverser, err := InitResourceTraverser(source, cooked.location, &ctx, &credentialInfo, nil, nil, true, false, false, common.EPermanentDeleteOption.None(), func(common.EntityType) {}, - nil, false, pipeline.LogNone, common.CpkOptions{}, nil /* errorChannel */) + nil, false, common.ESyncHashType.None(), pipeline.LogNone, common.CpkOptions{}, nil /* errorChannel */) if err != nil { return fmt.Errorf("failed to initialize traverser: %s", err.Error()) diff --git a/cmd/pathUtils.go b/cmd/pathUtils.go index d12aa87ee..7cbaa3f57 100644 --- a/cmd/pathUtils.go +++ b/cmd/pathUtils.go @@ -297,7 +297,7 @@ func splitQueryFromSaslessResource(resource string, loc common.Location) (mainUr if u, err := url.Parse(resource); err == nil && u.Query().Get("sig") != "" { panic("this routine can only be called after the SAS has been removed") // because, for security reasons, we don't want SASs returned in queryAndFragment, since - // we wil persist that (but we don't want to persist SAS's) + // we will persist that (but we don't want to persist SAS's) } // Work directly with a string-based format, so that we get both snapshot identifiers AND any other unparsed params diff --git a/cmd/removeEnumerator.go b/cmd/removeEnumerator.go index c598a30cf..71f6e9c85 100755 --- a/cmd/removeEnumerator.go +++ b/cmd/removeEnumerator.go @@ -51,7 +51,7 @@ func newRemoveEnumerator(cca *CookedCopyCmdArgs) (enumerator *CopyEnumerator, er sourceTraverser, err = InitResourceTraverser(cca.Source, cca.FromTo.From(), &ctx, &cca.credentialInfo, nil, cca.ListOfFilesChannel, cca.Recursive, false, cca.IncludeDirectoryStubs, cca.permanentDeleteOption, func(common.EntityType) {}, cca.ListOfVersionIDs, false, - azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) + common.ESyncHashType.None(), azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) // report failure to create traverser if err != nil { diff --git a/cmd/root.go b/cmd/root.go index f09b40997..f6c1543f8 100644 --- a/cmd/root.go +++ b/cmd/root.go @@ -56,6 +56,7 @@ var azcopyAwaitContinue bool var azcopyAwaitAllowOpenFiles bool var azcopyScanningLogger common.ILoggerResetable var azcopyCurrentJobID common.JobID +var azcopySkipVersionCheck bool type jobLoggerInfo struct { jobID common.JobID @@ -151,12 +152,14 @@ var rootCmd = &cobra.Command{ common.IncludeAfterFlagName, IncludeAfterDateFilter{}.FormatAsUTC(adjustedTime)) jobsAdmin.JobsAdmin.LogToJobLog(startTimeMessage, pipeline.LogInfo) - // spawn a routine to fetch and compare the local application's version against the latest version available - // if there's a newer version that can be used, then write the suggestion to stderr - // however if this takes too long the message won't get printed - // Note: this function is necessary for non-help, non-login commands, since they don't reach the corresponding - // beginDetectNewVersion call in Execute (below) - beginDetectNewVersion() + if !azcopySkipVersionCheck { + // spawn a routine to fetch and compare the local application's version against the latest version available + // if there's a newer version that can be used, then write the suggestion to stderr + // however if this takes too long the message won't get printed + // Note: this function is necessary for non-help, non-login commands, since they don't reach the corresponding + // beginDetectNewVersion call in Execute (below) + beginDetectNewVersion() + } if debugSkipFiles != "" { for _, v := range 
strings.Split(debugSkipFiles, ";") { @@ -213,6 +216,8 @@ func init() { rootCmd.PersistentFlags().StringVar(&cmdLineExtraSuffixesAAD, trustedSuffixesNameAAD, "", "Specifies additional domain suffixes where Azure Active Directory login tokens may be sent. The default is '"+ trustedSuffixesAAD+"'. Any listed here are added to the default. For security, you should only put Microsoft Azure domains here. Separate multiple entries with semi-colons.") + rootCmd.PersistentFlags().BoolVar(&azcopySkipVersionCheck, "skip-version-check", false, "Do not perform the version check at startup. Intended for automation scenarios & airgapped use.") + // Note: this is due to Windows not supporting signals properly rootCmd.PersistentFlags().BoolVar(&cancelFromStdin, "cancel-from-stdin", false, "Used by partner teams to send in `cancel` through stdin to stop a job.") @@ -292,7 +297,7 @@ func beginDetectNewVersion() chan struct{} { executableName := executablePathSegments[len(executablePathSegments)-1] // output in info mode instead of stderr, as it was crashing CI jobs of some people - glcm.Info(executableName + ": A newer version " + remoteVersion + " is available to download\n") + glcm.Info(executableName + " " + common.AzcopyVersion + ": A newer version " + remoteVersion + " is available to download\n") } // let caller know we have finished, if they want to know diff --git a/cmd/setPropertiesEnumerator.go b/cmd/setPropertiesEnumerator.go index 6c6f730ba..d2e35afd9 100755 --- a/cmd/setPropertiesEnumerator.go +++ b/cmd/setPropertiesEnumerator.go @@ -50,7 +50,7 @@ func setPropertiesEnumerator(cca *CookedCopyCmdArgs) (enumerator *CopyEnumerator sourceTraverser, err = InitResourceTraverser(cca.Source, cca.FromTo.From(), &ctx, &cca.credentialInfo, nil, cca.ListOfFilesChannel, cca.Recursive, false, cca.IncludeDirectoryStubs, cca.permanentDeleteOption, func(common.EntityType) {}, cca.ListOfVersionIDs, false, - azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) + common.ESyncHashType.None(), azcopyLogVerbosity.ToPipelineLogLevel(), cca.CpkOptions, nil /* errorChannel */) // report failure to create traverser if err != nil { diff --git a/cmd/sync.go b/cmd/sync.go index 37e9f657c..7bfa922a2 100644 --- a/cmd/sync.go +++ b/cmd/sync.go @@ -24,11 +24,13 @@ import ( "context" "encoding/json" "fmt" - "github.com/Azure/azure-storage-azcopy/v10/jobsAdmin" + "runtime" "strings" "sync/atomic" "time" + "github.com/Azure/azure-storage-azcopy/v10/jobsAdmin" + "github.com/Azure/azure-pipeline-go/pipeline" "github.com/Azure/azure-storage-azcopy/v10/common" @@ -54,6 +56,7 @@ type rawSyncCmdArgs struct { legacyExclude string // for warning messages only includeRegex string excludeRegex string + compareHash string preservePermissions bool preserveSMBPermissions bool // deprecated and synonymous with preservePermissions @@ -143,7 +146,12 @@ func (raw *rawSyncCmdArgs) cook() (cookedSyncCmdArgs, error) { if loc := InferArgumentLocation(raw.dst); loc == common.ELocation.BlobFS() { raw.dst = strings.Replace(raw.dst, ".dfs", ".blob", 1) - glcm.Info("Sync operates only on blob endpoint. Switching to use blob endpoint on destination account.") + msg := fmt.Sprintf("Sync operates only on blob endpoint. Switching to use blob endpoint on destination account. There are some limitations when switching endpoints. 
" + + "Please refer to https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues#blob-storage-apis") + glcm.Info(msg) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, msg) + } dstHNS = true } @@ -239,11 +247,7 @@ func (raw *rawSyncCmdArgs) cook() (cookedSyncCmdArgs, error) { cooked.includeFileAttributes = raw.parsePatterns(raw.includeFileAttributes) cooked.excludeFileAttributes = raw.parsePatterns(raw.excludeFileAttributes) - cooked.preserveSMBInfo = areBothLocationsSMBAware(cooked.fromTo) - // If user has explicitly specified not to copy SMB Information, set cooked.preserveSMBInfo to false - if !raw.preserveSMBInfo { - cooked.preserveSMBInfo = false - } + cooked.preserveSMBInfo = raw.preserveSMBInfo && areBothLocationsSMBAware(cooked.fromTo) if err = validatePreserveSMBPropertyOption(cooked.preserveSMBInfo, cooked.fromTo, nil, "preserve-smb-info"); err != nil { return cooked, err @@ -271,6 +275,17 @@ func (raw *rawSyncCmdArgs) cook() (cookedSyncCmdArgs, error) { return cooked, fmt.Errorf("in order to use --preserve-posix-properties, both the source and destination must be POSIX-aware (valid pairings are Linux->Blob, Blob->Linux, Blob->Blob)") } + if err = cooked.compareHash.Parse(raw.compareHash); err != nil { + return cooked, err + } else { + switch cooked.compareHash { + case common.ESyncHashType.MD5(): + // Save any new MD5s on files we download. + raw.putMd5 = true + default: // no need to put a hash of any kind. + } + } + cooked.putMd5 = raw.putMd5 if err = validatePutMd5(cooked.putMd5, cooked.fromTo); err != nil { return cooked, err @@ -381,6 +396,7 @@ type cookedSyncCmdArgs struct { excludeRegex []string // options + compareHash common.SyncHashType preservePermissions common.PreservePermissionsOption preserveSMBInfo bool preservePOSIXProperties bool @@ -751,7 +767,11 @@ func init() { // TODO: enable for copy with IfSourceNewer // smb info/permissions can be persisted in the scenario of File -> File syncCmd.PersistentFlags().BoolVar(&raw.preserveSMBPermissions, "preserve-smb-permissions", false, "False by default. Preserves SMB ACLs between aware resources (Azure Files). This flag applies to both files and folders, unless a file-only filter is specified (e.g. include-pattern).") - syncCmd.PersistentFlags().BoolVar(&raw.preserveSMBInfo, "preserve-smb-info", true, "For SMB-aware locations, flag will be set to true by default. Preserves SMB property info (last write time, creation time, attribute bits) between SMB-aware resources (Azure Files). This flag applies to both files and folders, unless a file-only filter is specified (e.g. include-pattern). The info transferred for folders is the same as that for files, except for Last Write Time which is not preserved for folders. ") + syncCmd.PersistentFlags().BoolVar(&raw.preserveSMBInfo, "preserve-smb-info", (runtime.GOOS == "windows"), "Preserves SMB property info (last write time, creation time, attribute bits)"+ + " between SMB-aware resources (Windows and Azure Files). On windows, this flag will be set to true by default. If the source or destination is a "+ + "volume mounted on Linux using SMB protocol, this flag will have to be explicitly set to true. Only the attribute bits supported by Azure Files "+ + "will be transferred; any others will be ignored. This flag applies to both files and folders, unless a file-only filter is specified "+ + "(e.g. include-pattern). 
The info transferred for folders is the same as that for files, except for Last Write Time which is never preserved for folders.") syncCmd.PersistentFlags().BoolVar(&raw.preservePOSIXProperties, "preserve-posix-properties", false, "'Preserves' property info gleaned from stat or statx into object metadata.") // TODO: enable when we support local <-> File @@ -785,6 +805,8 @@ func init() { syncCmd.PersistentFlags().BoolVar(&raw.mirrorMode, "mirror-mode", false, "Disable last-modified-time based comparison and overwrites the conflicting files and blobs at the destination if this flag is set to true. Default is false") syncCmd.PersistentFlags().BoolVar(&raw.dryrun, "dry-run", false, "Prints the path of files that would be copied or removed by the sync command. This flag does not copy or remove the actual files.") + syncCmd.PersistentFlags().StringVar(&raw.compareHash, "compare-hash", "None", "Inform sync to rely on hashes as an alternative to LMT. Missing hashes at a remote source will throw an error. (None, MD5) Default: None") + // temp, to assist users with change in param names, by providing a clearer message when these obsolete ones are accidentally used syncCmd.PersistentFlags().StringVar(&raw.legacyInclude, "include", "", "Legacy include param. DO NOT USE") syncCmd.PersistentFlags().StringVar(&raw.legacyExclude, "exclude", "", "Legacy exclude param. DO NOT USE") diff --git a/cmd/syncComparator.go b/cmd/syncComparator.go index 4d17c6dcf..4b0e69793 100644 --- a/cmd/syncComparator.go +++ b/cmd/syncComparator.go @@ -20,7 +20,35 @@ package cmd -import "strings" +import ( + "fmt" + "github.com/Azure/azure-pipeline-go/pipeline" + "github.com/Azure/azure-storage-azcopy/v10/common" + "reflect" + "strings" +) + +const ( + syncSkipReasonTime = "the source has an older LMT than the destination" + syncSkipReasonMissingHash = "the source lacks an associated hash; please upload with --put-md5" + syncSkipReasonSameHash = "the source has the same hash" + syncOverwriteReasonNewerHash = "the source has a differing hash" + syncOverwriteResaonNewerLMT = "the source is more recent than the destination" + syncStatusSkipped = "skipped" + syncStatusOverwritten = "overwritten" +) + +func syncComparatorLog(fileName, status, skipReason string, stdout bool) { + out := fmt.Sprintf("File %s was %s because %s", fileName, status, skipReason) + + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, out) + } + + if stdout { + glcm.Info(out) + } +} // with the help of an objectIndexer containing the source objects // find out the destination objects that should be transferred @@ -35,11 +63,14 @@ type syncDestinationComparator struct { // storing the source objects sourceIndex *objectIndexer + comparisonHashType common.SyncHashType + + preferSMBTime bool disableComparison bool } -func newSyncDestinationComparator(i *objectIndexer, copyScheduler, cleaner objectProcessor, disableComparison bool) *syncDestinationComparator { - return &syncDestinationComparator{sourceIndex: i, copyTransferScheduler: copyScheduler, destinationCleaner: cleaner, disableComparison: disableComparison} +func newSyncDestinationComparator(i *objectIndexer, copyScheduler, cleaner objectProcessor, comparisonHashType common.SyncHashType, preferSMBTime, disableComparison bool) *syncDestinationComparator { + return &syncDestinationComparator{sourceIndex: i, copyTransferScheduler: copyScheduler, destinationCleaner: cleaner, preferSMBTime: preferSMBTime, disableComparison: disableComparison, comparisonHashType: comparisonHashType} } 
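With --compare-hash=MD5 the comparators decide per file on hash equality instead of LMT. A minimal illustration of that decision (simplified; the real code compares the stored MD5s with reflect.DeepEqual, logs through syncComparatorLog, and panics on unsupported hash types):

package main

import (
	"bytes"
	"fmt"
)

// md5Decision sketches the per-file outcome when --compare-hash=MD5 is in effect.
// srcMD5 == nil means the source has no stored hash (re-upload with --put-md5);
// differing hashes are treated as "source newer".
func md5Decision(srcMD5, dstMD5 []byte) string {
	switch {
	case srcMD5 == nil:
		return "skipped: the source lacks an associated hash; please upload with --put-md5"
	case !bytes.Equal(srcMD5, dstMD5):
		return "overwritten: the source has a differing hash"
	default:
		return "skipped: the source has the same hash"
	}
}

func main() {
	fmt.Println(md5Decision(nil, []byte{0x01}))
	fmt.Println(md5Decision([]byte{0x01}, []byte{0x02}))
	fmt.Println(md5Decision([]byte{0x01}, []byte{0x01}))
}

Folders and anything without a usable hash still fall back to the LMT comparison, now parameterized by preferSMBTime so that smbLastModifiedTime is only preferred when SMB info is being preserved.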
// it will only schedule transfers for destination objects that are present in the indexer but stale compared to the entry in the map @@ -57,12 +88,36 @@ func (f *syncDestinationComparator) processIfNecessary(destinationObject StoredO // if the destinationObject is present at source and stale, we transfer the up-to-date version from source if present { defer delete(f.sourceIndex.indexMap, destinationObject.relativePath) - if f.disableComparison || sourceObjectInMap.isMoreRecentThan(destinationObject) { - err := f.copyTransferScheduler(sourceObjectInMap) - if err != nil { - return err + + if f.disableComparison { + return f.copyTransferScheduler(sourceObjectInMap) + } + + if f.comparisonHashType != common.ESyncHashType.None() && sourceObjectInMap.entityType == common.EEntityType.File() { + switch f.comparisonHashType { + case common.ESyncHashType.MD5(): + if sourceObjectInMap.md5 == nil { + syncComparatorLog(sourceObjectInMap.relativePath, syncStatusSkipped, syncSkipReasonMissingHash, true) + return nil + } + + if !reflect.DeepEqual(sourceObjectInMap.md5, destinationObject.md5) { + syncComparatorLog(sourceObjectInMap.relativePath, syncStatusOverwritten, syncOverwriteReasonNewerHash, false) + + // hash inequality = source "newer" in this model. + return f.copyTransferScheduler(sourceObjectInMap) + } + default: + panic("sanity check: unsupported hash type " + f.comparisonHashType.String()) } + + syncComparatorLog(sourceObjectInMap.relativePath, syncStatusSkipped, syncSkipReasonSameHash, false) + return nil + } else if sourceObjectInMap.isMoreRecentThan(destinationObject, f.preferSMBTime) { + return f.copyTransferScheduler(sourceObjectInMap) } + + syncComparatorLog(sourceObjectInMap.relativePath, syncStatusOverwritten, syncOverwriteResaonNewerLMT, false) } else { // purposefully ignore the error from destinationCleaner // it's a tolerable error, since it just means some extra destination object might hang around a bit longer @@ -82,16 +137,20 @@ type syncSourceComparator struct { // storing the destination objects destinationIndex *objectIndexer + comparisonHashType common.SyncHashType + + preferSMBTime bool disableComparison bool } -func newSyncSourceComparator(i *objectIndexer, copyScheduler objectProcessor, disableComparison bool) *syncSourceComparator { - return &syncSourceComparator{destinationIndex: i, copyTransferScheduler: copyScheduler, disableComparison: disableComparison} +func newSyncSourceComparator(i *objectIndexer, copyScheduler objectProcessor, comparisonHashType common.SyncHashType, preferSMBTime, disableComparison bool) *syncSourceComparator { + return &syncSourceComparator{destinationIndex: i, copyTransferScheduler: copyScheduler, preferSMBTime: preferSMBTime, disableComparison: disableComparison, comparisonHashType: comparisonHashType} } // it will only transfer source items that are: -// 1. not present in the map +// 1. not present in the map // 2. 
present but is more recent than the entry in the map +// // note: we remove the StoredObject if it is present so that when we have finished // the index will contain all objects which exist at the destination but were NOT seen at the source func (f *syncSourceComparator) processIfNecessary(sourceObject StoredObject) error { @@ -106,11 +165,38 @@ func (f *syncSourceComparator) processIfNecessary(sourceObject StoredObject) err if present { defer delete(f.destinationIndex.indexMap, relPath) - // if destination is stale, schedule source for transfer - if f.disableComparison || sourceObject.isMoreRecentThan(destinationObjectInMap) { + // if destination is stale, schedule source for transfer + if f.disableComparison { return f.copyTransferScheduler(sourceObject) } - // skip if source is more recent + + if f.comparisonHashType != common.ESyncHashType.None() && sourceObject.entityType == common.EEntityType.File() { + switch f.comparisonHashType { + case common.ESyncHashType.MD5(): + if sourceObject.md5 == nil { + syncComparatorLog(sourceObject.relativePath, syncStatusSkipped, syncSkipReasonMissingHash, true) + return nil + } + + if !reflect.DeepEqual(sourceObject.md5, destinationObjectInMap.md5) { + // hash inequality = source "newer" in this model. + syncComparatorLog(sourceObject.relativePath, syncStatusOverwritten, syncOverwriteReasonNewerHash, false) + return f.copyTransferScheduler(sourceObject) + } + default: + panic("sanity check: unsupported hash type " + f.comparisonHashType.String()) + } + + syncComparatorLog(sourceObject.relativePath, syncStatusSkipped, syncSkipReasonSameHash, false) + return nil + } else if sourceObject.isMoreRecentThan(destinationObjectInMap, f.preferSMBTime) { + // if destination is stale, schedule source + syncComparatorLog(sourceObject.relativePath, syncStatusOverwritten, syncOverwriteResaonNewerLMT, false) + return f.copyTransferScheduler(sourceObject) + } + + syncComparatorLog(sourceObject.relativePath, syncStatusSkipped, syncSkipReasonTime, false) + // skip if dest is more recent return nil } diff --git a/cmd/syncEnumerator.go b/cmd/syncEnumerator.go old mode 100755 new mode 100644 index 1a2f08bcc..1a8898ad6 --- a/cmd/syncEnumerator.go +++ b/cmd/syncEnumerator.go @@ -65,7 +65,7 @@ func (cca *cookedSyncCmdArgs) initEnumerator(ctx context.Context) (enumerator *s if entityType == common.EEntityType.File() { atomic.AddUint64(&cca.atomicSourceFilesScanned, 1) } - }, nil, cca.s2sPreserveBlobTags, azcopyLogVerbosity.ToPipelineLogLevel(), cca.cpkOptions, nil /* errorChannel */) + }, nil, cca.s2sPreserveBlobTags, cca.compareHash, azcopyLogVerbosity.ToPipelineLogLevel(), cca.cpkOptions, nil /* errorChannel */) if err != nil { return nil, err @@ -86,15 +86,18 @@ func (cca *cookedSyncCmdArgs) initEnumerator(ctx context.Context) (enumerator *s if entityType == common.EEntityType.File() { atomic.AddUint64(&cca.atomicDestinationFilesScanned, 1) } - }, nil, cca.s2sPreserveBlobTags, azcopyLogVerbosity.ToPipelineLogLevel(), cca.cpkOptions, nil /* errorChannel */) + }, nil, cca.s2sPreserveBlobTags, cca.compareHash, azcopyLogVerbosity.ToPipelineLogLevel(), cca.cpkOptions, nil /* errorChannel */) if err != nil { return nil, err } // verify that the traversers are targeting the same type of resources - if sourceTraverser.IsDirectory(true) != destinationTraverser.IsDirectory(true) { + sourceIsDir, _ := sourceTraverser.IsDirectory(true) + destIsDir, _ := destinationTraverser.IsDirectory(true) + if sourceIsDir != destIsDir { return nil, errors.New("trying to sync between different 
resource types (either file <-> directory or directory <-> file) which is not allowed." + - "sync must happen between source and destination of the same type, e.g. either file <-> file or directory <-> directory") + "sync must happen between source and destination of the same type, e.g. either file <-> file or directory <-> directory." + + "To make sure target is handled as a directory, add a trailing '/' to the target.") } // set up the filters in the right order @@ -155,7 +158,8 @@ func (cca *cookedSyncCmdArgs) initEnumerator(ctx context.Context) (enumerator *s // when uploading, we can delete remote objects immediately, because as we traverse the remote location // we ALREADY have available a complete map of everything that exists locally // so as soon as we see a remote destination object we can know whether it exists in the local source - comparator = newSyncDestinationComparator(indexer, transferScheduler.scheduleCopyTransfer, destCleanerFunc, cca.mirrorMode).processIfNecessary + + comparator = newSyncDestinationComparator(indexer, transferScheduler.scheduleCopyTransfer, destCleanerFunc, cca.compareHash, cca.preserveSMBInfo, cca.mirrorMode).processIfNecessary finalize = func() error { // schedule every local file that doesn't exist at the destination err = indexer.traverse(transferScheduler.scheduleCopyTransfer, filters) @@ -179,7 +183,7 @@ func (cca *cookedSyncCmdArgs) initEnumerator(ctx context.Context) (enumerator *s indexer.isDestinationCaseInsensitive = IsDestinationCaseInsensitive(cca.fromTo) // in all other cases (download and S2S), the destination is scanned/indexed first // then the source is scanned and filtered based on what the destination contains - comparator = newSyncSourceComparator(indexer, transferScheduler.scheduleCopyTransfer, cca.mirrorMode).processIfNecessary + comparator = newSyncSourceComparator(indexer, transferScheduler.scheduleCopyTransfer, cca.compareHash, cca.preserveSMBInfo, cca.mirrorMode).processIfNecessary finalize = func() error { // remove the extra files at the destination that were not present at the source diff --git a/cmd/syncIndexer.go b/cmd/syncIndexer.go index 15d99d896..df0d25121 100644 --- a/cmd/syncIndexer.go +++ b/cmd/syncIndexer.go @@ -26,8 +26,8 @@ import ( // the objectIndexer is essential for the generic sync enumerator to work // it can serve as a: -// 1. objectProcessor: accumulate a lookup map with given StoredObjects -// 2. resourceTraverser: go through the entities in the map like a traverser +// 1. objectProcessor: accumulate a lookup map with given StoredObjects +// 2. resourceTraverser: go through the entities in the map like a traverser type objectIndexer struct { indexMap map[string]StoredObject counter int diff --git a/cmd/syncProcessor.go b/cmd/syncProcessor.go index 364cf5b83..6ad2f9213 100644 --- a/cmd/syncProcessor.go +++ b/cmd/syncProcessor.go @@ -157,7 +157,11 @@ func (d *interactiveDeleteProcessor) removeImmediately(object StoredObject) (err err = d.deleter(object) if err != nil { - glcm.Info(fmt.Sprintf("error %s deleting the object %s", err.Error(), object.relativePath)) + msg := fmt.Sprintf("error %s deleting the object %s", err.Error(), object.relativePath) + glcm.Info(msg) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, msg) + } } if d.incrementDeletionCount != nil { @@ -226,14 +230,15 @@ type localFileDeleter struct { // As at version 10.4.0, we intentionally don't delete directories in sync, // even if our folder properties option suggests we should. // Why? 
The key difficulties are as follows, and its the third one that we don't currently have a solution for. -// 1. Timing (solvable in theory with FolderDeletionManager) -// 2. Identifying which should be removed when source does not have concept of folders (e.g. BLob) -// Probably solution is to just respect the folder properties option setting (which we already do in our delete processors) -// 3. In Azure Files case (and to a lesser extent on local disks) users may have ACLS or other properties -// set on the directories, and wish to retain those even tho the directories are empty. (Perhaps less of an issue -// when syncing from folder-aware sources that DOES NOT HAVE the directory. But still an issue when syncing from -// blob. E.g. we delete a folder because there's nothing in it right now, but really user wanted it there, -// and have set up custom ACLs on it for future use. If we delete, they lose the custom ACL setup. +// 1. Timing (solvable in theory with FolderDeletionManager) +// 2. Identifying which should be removed when source does not have concept of folders (e.g. BLob) +// Probably solution is to just respect the folder properties option setting (which we already do in our delete processors) +// 3. In Azure Files case (and to a lesser extent on local disks) users may have ACLS or other properties +// set on the directories, and wish to retain those even tho the directories are empty. (Perhaps less of an issue +// when syncing from folder-aware sources that DOES NOT HAVE the directory. But still an issue when syncing from +// blob. E.g. we delete a folder because there's nothing in it right now, but really user wanted it there, +// and have set up custom ACLs on it for future use. If we delete, they lose the custom ACL setup. +// // TODO: shall we add folder deletion support at some stage? 
(In cases where folderPropertiesOption says that folders should be processed) func shouldSyncRemoveFolders() bool { return false @@ -241,7 +246,11 @@ func shouldSyncRemoveFolders() bool { func (l *localFileDeleter) deleteFile(object StoredObject) error { if object.entityType == common.EEntityType.File() { - glcm.Info("Deleting extra file: " + object.relativePath) + msg := "Deleting extra file: " + object.relativePath + glcm.Info(msg) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, msg) + } return os.Remove(common.GenerateFullPath(l.rootPath, object.relativePath)) } if shouldSyncRemoveFolders() { @@ -286,7 +295,11 @@ func newRemoteResourceDeleter(rawRootURL *url.URL, p pipeline.Pipeline, ctx cont func (b *remoteResourceDeleter) delete(object StoredObject) error { if object.entityType == common.EEntityType.File() { // TODO: use b.targetLocation.String() in the next line, instead of "object", if we can make it come out as string - glcm.Info("Deleting extra object: " + object.relativePath) + msg := "Deleting extra object: " + object.relativePath + glcm.Info(msg) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogInfo, msg) + } switch b.targetLocation { case common.ELocation.Blob(): blobURLParts := azblob.NewBlobURLParts(*b.rootURL) diff --git a/cmd/zc_enumerator.go b/cmd/zc_enumerator.go index c62bcb735..3410e6201 100755 --- a/cmd/zc_enumerator.go +++ b/cmd/zc_enumerator.go @@ -92,13 +92,13 @@ type StoredObject struct { leaseDuration azblob.LeaseDurationType } -func (s *StoredObject) isMoreRecentThan(storedObject2 StoredObject) bool { +func (s *StoredObject) isMoreRecentThan(storedObject2 StoredObject, preferSMBTime bool) bool { lmtA := s.lastModifiedTime - if !s.smbLastModifiedTime.IsZero() { + if preferSMBTime && !s.smbLastModifiedTime.IsZero() { lmtA = s.smbLastModifiedTime } lmtB := storedObject2.lastModifiedTime - if !storedObject2.smbLastModifiedTime.IsZero() { + if preferSMBTime && !storedObject2.smbLastModifiedTime.IsZero() { lmtB = storedObject2.smbLastModifiedTime } @@ -138,6 +138,18 @@ func (s *StoredObject) isCompatibleWithFpo(fpo common.FolderPropertyOption) bool } } +// ErrorNoHashPresent , ErrorHashNoLongerValid, and ErrorHashNotCompatible indicate a hash is not present, not obtainable, and/or not usable. +// For the sake of best-effort, when these errors are emitted, depending on the sync hash policy +var ErrorNoHashPresent = errors.New("no hash present on file") +var ErrorHashNoLongerValid = errors.New("attached hash no longer valid") +var ErrorHashNotCompatible = errors.New("hash types do not match") + +// ErrorHashAsyncCalculation is not a strict "the hash is unobtainable", but a "the hash is not currently present". +// In effect, when it is returned, it indicates we have placed the target onto a queue to be handled later. +// It can be treated like a promise, and the item can cease processing in the immediate term. +// This option is only used locally on sync-downloads when the user has specified that azcopy should create a new hash. 
+var ErrorHashAsyncCalculation = errors.New("hash is calculating asynchronously") + // Returns a func that only calls inner if StoredObject isCompatibleWithFpo // We use this, so that we can easily test for compatibility in the sync deletion code (which expects an objectProcessor) func newFpoAwareProcessor(fpo common.FolderPropertyOption, inner objectProcessor) objectProcessor { @@ -274,7 +286,7 @@ func newStoredObject(morpher objectMorpher, name string, relativePath string, en // pass each StoredObject to the given objectProcessor if it passes all the filters type ResourceTraverser interface { Traverse(preprocessor objectMorpher, processor objectProcessor, filters []ObjectFilter) error - IsDirectory(isSource bool) bool + IsDirectory(isSource bool) (bool, error) // isDirectory has an isSource flag for a single exception to blob. // Blob should ONLY check remote if it's a source. // On destinations, because blobs and virtual directories can share names, we should support placing in both ways. @@ -322,7 +334,7 @@ type enumerationCounterFunc func(entityType common.EntityType) func InitResourceTraverser(resource common.ResourceString, location common.Location, ctx *context.Context, credential *common.CredentialInfo, followSymlinks *bool, listOfFilesChannel chan string, recursive, getProperties, includeDirectoryStubs bool, permanentDeleteOption common.PermanentDeleteOption, incrementEnumerationCounter enumerationCounterFunc, listOfVersionIds chan string, - s2sPreserveBlobTags bool, logLevel pipeline.LogLevel, cpkOptions common.CpkOptions, errorChannel chan ErrorFileInfo) (ResourceTraverser, error) { + s2sPreserveBlobTags bool, syncHashType common.SyncHashType, logLevel pipeline.LogLevel, cpkOptions common.CpkOptions, errorChannel chan ErrorFileInfo) (ResourceTraverser, error) { var output ResourceTraverser var p *pipeline.Pipeline @@ -350,7 +362,6 @@ func InitResourceTraverser(resource common.ResourceString, location common.Locat // Initialize the pipeline if creds and ctx is provided if ctx != nil && credential != nil { tmppipe, err := InitPipeline(*ctx, location, *credential, logLevel) - if err != nil { return nil, err } @@ -407,9 +418,9 @@ func InitResourceTraverser(resource common.ResourceString, location common.Locat globChan, includeDirectoryStubs, incrementEnumerationCounter, s2sPreserveBlobTags, logLevel, cpkOptions) } else { if ctx != nil { - output = newLocalTraverser(*ctx, resource.ValueLocal(), recursive, toFollow, incrementEnumerationCounter, errorChannel) + output = newLocalTraverser(*ctx, resource.ValueLocal(), recursive, toFollow, syncHashType, incrementEnumerationCounter, errorChannel) } else { - output = newLocalTraverser(context.TODO(), resource.ValueLocal(), recursive, toFollow, incrementEnumerationCounter, errorChannel) + output = newLocalTraverser(context.TODO(), resource.ValueLocal(), recursive, toFollow, syncHashType, incrementEnumerationCounter, errorChannel) } } case common.ELocation.Benchmark(): @@ -759,7 +770,7 @@ func getProcessingError(errin error) (ignored bool, err error) { return true, nil } - return false, err + return false, errin } func processIfPassedFilters(filters []ObjectFilter, storedObject StoredObject, processor objectProcessor) (err error) { diff --git a/cmd/zc_pipeline_init.go b/cmd/zc_pipeline_init.go index 7cfb85769..e6d87a8ab 100644 --- a/cmd/zc_pipeline_init.go +++ b/cmd/zc_pipeline_init.go @@ -3,7 +3,6 @@ package cmd import ( "context" "fmt" - "github.com/Azure/azure-pipeline-go/pipeline" "github.com/Azure/azure-storage-azcopy/v10/common" diff 
--git a/cmd/zc_traverser_benchmark.go b/cmd/zc_traverser_benchmark.go index 0d7ee38a7..ca9d2a6eb 100644 --- a/cmd/zc_traverser_benchmark.go +++ b/cmd/zc_traverser_benchmark.go @@ -45,8 +45,8 @@ func newBenchmarkTraverser(source string, incrementEnumerationCounter enumeratio nil } -func (t *benchmarkTraverser) IsDirectory(bool) bool { - return true +func (t *benchmarkTraverser) IsDirectory(bool) (bool, error) { + return true, nil } func (_ *benchmarkTraverser) toReversedString(i uint) string { diff --git a/cmd/zc_traverser_blob.go b/cmd/zc_traverser_blob.go index 92332fdf7..142de3b58 100644 --- a/cmd/zc_traverser_blob.go +++ b/cmd/zc_traverser_blob.go @@ -61,28 +61,30 @@ type blobTraverser struct { includeSnapshot bool includeVersion bool + + stripTopDir bool } -func (t *blobTraverser) IsDirectory(isSource bool) bool { +func (t *blobTraverser) IsDirectory(isSource bool) (bool, error) { isDirDirect := copyHandlerUtil{}.urlIsContainerOrVirtualDirectory(t.rawURL) // Skip the single blob check if we're checking a destination. // This is an individual exception for blob because blob supports virtual directories and blobs sharing the same name. if isDirDirect || !isSource { - return isDirDirect + return isDirDirect, nil } - _, _, isDirStub, err := t.getPropertiesIfSingleBlob() + _, _, isDirStub, blobErr := t.getPropertiesIfSingleBlob() - if stgErr, ok := err.(azblob.StorageError); ok { + if stgErr, ok := blobErr.(azblob.StorageError); ok { // We know for sure this is a single blob still, let it walk on through to the traverser. if stgErr.ServiceCode() == common.CPK_ERROR_SERVICE_CODE { - return false + return false, nil } } - if err == nil { - return isDirStub + if blobErr == nil { + return isDirStub, nil } blobURLParts := azblob.NewBlobURLParts(*t.rawURL) @@ -95,15 +97,21 @@ func (t *blobTraverser) IsDirectory(isSource bool) bool { msg := fmt.Sprintf("Failed to check if the destination is a folder or a file (Azure Files). Assuming the destination is a file: %s", err) azcopyScanningLogger.Log(pipeline.LogError, msg) } - return false + return false, nil } if len(resp.Segment.BlobItems) == 0 { //Not a directory - return false + if stgErr, ok := blobErr.(azblob.StorageError); ok { + // if the blob is not found return the error to throw + if stgErr.ServiceCode() == common.BLOB_NOT_FOUND { + return false, errors.New(common.FILE_NOT_FOUND) + } + } + return false, blobErr } - return true + return true, nil } func (t *blobTraverser) getPropertiesIfSingleBlob() (props *azblob.BlobGetPropertiesResponse, isBlob bool, isDirStub bool, err error) { @@ -271,6 +279,9 @@ func (t *blobTraverser) parallelList(containerURL azblob.ContainerURL, container if t.recursive { for _, virtualDir := range lResp.Segment.BlobPrefixes { enqueueDir(virtualDir.Name) + if azcopyScanningLogger != nil { + azcopyScanningLogger.Log(pipeline.LogDebug, fmt.Sprintf("Enqueuing sub-directory %s for enumeration.", virtualDir.Name)) + } if t.includeDirectoryStubs { // try to get properties on the directory itself, since it's not listed in BlobItems diff --git a/cmd/zc_traverser_blob_account.go b/cmd/zc_traverser_blob_account.go index 10c27637a..6c946e01c 100644 --- a/cmd/zc_traverser_blob_account.go +++ b/cmd/zc_traverser_blob_account.go @@ -47,8 +47,8 @@ type blobAccountTraverser struct { cpkOptions common.CpkOptions } -func (t *blobAccountTraverser) IsDirectory(_ bool) bool { - return true // Returns true as account traversal is inherently folder-oriented and recursive. 
+func (t *blobAccountTraverser) IsDirectory(_ bool) (bool, error) { + return true, nil // Returns true as account traversal is inherently folder-oriented and recursive. } func (t *blobAccountTraverser) listContainers() ([]string, error) { diff --git a/cmd/zc_traverser_blob_versions.go b/cmd/zc_traverser_blob_versions.go index 95e90e403..c280a7964 100644 --- a/cmd/zc_traverser_blob_versions.go +++ b/cmd/zc_traverser_blob_versions.go @@ -40,17 +40,17 @@ type blobVersionsTraverser struct { cpkOptions common.CpkOptions } -func (t *blobVersionsTraverser) IsDirectory(isSource bool) bool { +func (t *blobVersionsTraverser) IsDirectory(isSource bool) (bool, error) { isDirDirect := copyHandlerUtil{}.urlIsContainerOrVirtualDirectory(t.rawURL) // Skip the single blob check if we're checking a destination. // This is an individual exception for blob because blob supports virtual directories and blobs sharing the same name. if isDirDirect || !isSource { - return isDirDirect + return isDirDirect, nil } // The base blob may not exist in some cases. - return false + return false, nil } func (t *blobVersionsTraverser) getBlobProperties(versionID string) (props *azblob.BlobGetPropertiesResponse, err error) { diff --git a/cmd/zc_traverser_blobfs.go b/cmd/zc_traverser_blobfs.go index 35e0d646d..7ed5405b6 100644 --- a/cmd/zc_traverser_blobfs.go +++ b/cmd/zc_traverser_blobfs.go @@ -54,8 +54,8 @@ func newBlobFSTraverser(rawURL *url.URL, p pipeline.Pipeline, ctx context.Contex return } -func (t *blobFSTraverser) IsDirectory(bool) bool { - return copyHandlerUtil{}.urlIsBFSFileSystemOrDirectory(t.ctx, t.rawURL, t.p) // This gets all the fanciness done for us. +func (t *blobFSTraverser) IsDirectory(bool) (bool, error) { + return copyHandlerUtil{}.urlIsBFSFileSystemOrDirectory(t.ctx, t.rawURL, t.p), nil // This gets all the fanciness done for us. } func (t *blobFSTraverser) getPropertiesIfSingleFile() (*azbfs.PathGetPropertiesResponse, bool, error) { diff --git a/cmd/zc_traverser_blobfs_account.go b/cmd/zc_traverser_blobfs_account.go index 3c5f7396c..b768b7b0d 100644 --- a/cmd/zc_traverser_blobfs_account.go +++ b/cmd/zc_traverser_blobfs_account.go @@ -44,8 +44,8 @@ type BlobFSAccountTraverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *BlobFSAccountTraverser) IsDirectory(isSource bool) bool { - return true // Returns true as account traversal is inherently folder-oriented and recursive. +func (t *BlobFSAccountTraverser) IsDirectory(isSource bool) (bool, error) { + return true, nil // Returns true as account traversal is inherently folder-oriented and recursive. } func (t *BlobFSAccountTraverser) listContainers() ([]string, error) { diff --git a/cmd/zc_traverser_file.go b/cmd/zc_traverser_file.go index 8ec1a4269..06027f6eb 100644 --- a/cmd/zc_traverser_file.go +++ b/cmd/zc_traverser_file.go @@ -46,8 +46,8 @@ type fileTraverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *fileTraverser) IsDirectory(bool) bool { - return copyHandlerUtil{}.urlIsAzureFileDirectory(t.ctx, t.rawURL, t.p) // This handles all of the fanciness for us. +func (t *fileTraverser) IsDirectory(bool) (bool, error) { + return copyHandlerUtil{}.urlIsAzureFileDirectory(t.ctx, t.rawURL, t.p), nil // This handles all of the fanciness for us. 
} func (t *fileTraverser) getPropertiesIfSingleFile() (*azfile.FileGetPropertiesResponse, bool) { diff --git a/cmd/zc_traverser_file_account.go b/cmd/zc_traverser_file_account.go index e4b50f2fa..edbcbb50a 100644 --- a/cmd/zc_traverser_file_account.go +++ b/cmd/zc_traverser_file_account.go @@ -42,8 +42,8 @@ type fileAccountTraverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *fileAccountTraverser) IsDirectory(isSource bool) bool { - return true // Returns true as account traversal is inherently folder-oriented and recursive. +func (t *fileAccountTraverser) IsDirectory(isSource bool) (bool, error) { + return true, nil // Returns true as account traversal is inherently folder-oriented and recursive. } func (t *fileAccountTraverser) listContainers() ([]string, error) { diff --git a/cmd/zc_traverser_gcp.go b/cmd/zc_traverser_gcp.go index 25010b327..278df9327 100644 --- a/cmd/zc_traverser_gcp.go +++ b/cmd/zc_traverser_gcp.go @@ -24,20 +24,20 @@ type gcpTraverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *gcpTraverser) IsDirectory(isSource bool) bool { +func (t *gcpTraverser) IsDirectory(isSource bool) (bool, error) { //Identify whether directory or not syntactically isDirDirect := !t.gcpURLParts.IsObjectSyntactically() && (t.gcpURLParts.IsDirectorySyntactically() || t.gcpURLParts.IsBucketSyntactically()) if !isSource { - return isDirDirect + return isDirDirect, nil } bkt := t.gcpClient.Bucket(t.gcpURLParts.BucketName) obj := bkt.Object(t.gcpURLParts.ObjectKey) //Directories do not have attributes and hence throw error _, err := obj.Attrs(t.ctx) if err == gcpUtils.ErrObjectNotExist { - return true + return true, nil } - return false + return false, nil } func (t *gcpTraverser) Traverse(preprocessor objectMorpher, processor objectProcessor, filters []ObjectFilter) error { diff --git a/cmd/zc_traverser_gcp_service.go b/cmd/zc_traverser_gcp_service.go index 3a076529f..e4bf2ff89 100644 --- a/cmd/zc_traverser_gcp_service.go +++ b/cmd/zc_traverser_gcp_service.go @@ -24,8 +24,8 @@ type gcpServiceTraverser struct { var projectID = "" -func (t *gcpServiceTraverser) IsDirectory(isSource bool) bool { - return true //Account traversals are inherently folder based +func (t *gcpServiceTraverser) IsDirectory(isSource bool) (bool, error) { + return true, nil //Account traversals are inherently folder based } func (t *gcpServiceTraverser) listContainers() ([]string, error) { diff --git a/cmd/zc_traverser_list.go b/cmd/zc_traverser_list.go index 791639248..ec578634e 100755 --- a/cmd/zc_traverser_list.go +++ b/cmd/zc_traverser_list.go @@ -40,8 +40,8 @@ type listTraverser struct { type childTraverserGenerator func(childPath string) (ResourceTraverser, error) // There is no impact to a list traverser returning false because a list traverser points directly to relative paths. -func (l *listTraverser) IsDirectory(bool) bool { - return false +func (l *listTraverser) IsDirectory(bool) (bool, error) { + return false, nil } // To kill the traverser, close() the channel under it. 
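The IsDirectory change above is mostly mechanical for the account/service traversers (always a directory), but the blob traverser now distinguishes "not a directory" from "does not exist". A simplified sketch of its decision ladder, with the service calls replaced by plain inputs (fileNotFound is an illustrative stand-in for common.FILE_NOT_FOUND, and the error handling is condensed relative to zc_traverser_blob.go):

package main

import (
	"errors"
	"fmt"
)

const fileNotFound = "file not found" // illustrative stand-in for common.FILE_NOT_FOUND

// blobIsDirectory sketches the blob traverser's decision ladder using plain
// inputs instead of HEAD/list calls against the service.
func blobIsDirectory(urlIsContainerOrVDir, isSource, singleBlobExists, isDirStub bool, listedChildren int) (bool, error) {
	if urlIsContainerOrVDir || !isSource {
		// Destinations are judged syntactically only, because a blob and a
		// virtual directory may legitimately share the same name.
		return urlIsContainerOrVDir, nil
	}
	if singleBlobExists {
		return isDirStub, nil
	}
	if listedChildren == 0 {
		// Nothing matches the prefix and the blob itself is missing.
		return false, errors.New(fileNotFound)
	}
	return true, nil
}

func main() {
	fmt.Println(blobIsDirectory(false, true, false, false, 3)) // true <nil>: prefix has children
	fmt.Println(blobIsDirectory(false, true, false, false, 0)) // false, not found
}

Call sites that only need the boolean simply discard the error (isDir, _ := t.IsDirectory(...), as in isDestDirectory, the sync enumerator, and the list traverser), while copy's source validation propagates it to the user.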
@@ -61,7 +61,8 @@ func (l *listTraverser) Traverse(preprocessor objectMorpher, processor objectPro } // listTraverser will only ever execute on the source - if !l.recursive && childTraverser.IsDirectory(true) { + isDir, _ := childTraverser.IsDirectory(true) + if !l.recursive && isDir { continue // skip over directories } @@ -109,7 +110,7 @@ func newListTraverser(parent common.ResourceString, parentType common.Location, // Construct a traverser that goes through the child traverser, err := InitResourceTraverser(source, parentType, ctx, credential, &followSymlinks, nil, recursive, getProperties, includeDirectoryStubs, common.EPermanentDeleteOption.None(), incrementEnumerationCounter, - nil, s2sPreserveBlobTags, logLevel, cpkOptions, nil /* errorChannel */) + nil, s2sPreserveBlobTags, common.ESyncHashType.None(), logLevel, cpkOptions, nil /* errorChannel */) if err != nil { return nil, err } diff --git a/cmd/zc_traverser_local.go b/cmd/zc_traverser_local.go index a5980a7b0..ef44ec0b7 100755 --- a/cmd/zc_traverser_local.go +++ b/cmd/zc_traverser_local.go @@ -22,18 +22,23 @@ package cmd import ( "context" + "crypto/md5" + "encoding/base64" "errors" "fmt" + "github.com/Azure/azure-pipeline-go/pipeline" + "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-azcopy/v10/common/parallel" + "hash" + "io" "io/ioutil" "os" "path" "path/filepath" "runtime" "strings" - - "github.com/Azure/azure-pipeline-go/pipeline" - "github.com/Azure/azure-storage-azcopy/v10/common" - "github.com/Azure/azure-storage-azcopy/v10/common/parallel" + "sync" + "sync/atomic" ) const MAX_SYMLINKS_TO_FOLLOW = 40 @@ -46,20 +51,24 @@ type localTraverser struct { // a generic function to notify that a new stored object has been enumerated incrementEnumerationCounter enumerationCounterFunc errorChannel chan ErrorFileInfo + + targetHashType common.SyncHashType + // receives fullPath entries and manages hashing of files lacking metadata. + hashTargetChannel chan string } -func (t *localTraverser) IsDirectory(bool) bool { +func (t *localTraverser) IsDirectory(bool) (bool, error) { if strings.HasSuffix(t.fullPath, "/") { - return true + return true, nil } props, err := common.OSStat(t.fullPath) if err != nil { - return false + return false, err } - return props.IsDir() + return props.IsDir(), nil } func (t *localTraverser) getInfoIfSingleFile() (os.FileInfo, bool, error) { @@ -358,17 +367,259 @@ func WalkWithSymlinks(appCtx context.Context, fullPath string, walkFunc filepath } }) } + return } +func (t *localTraverser) GetHashData(fullpath string) (common.SyncHashData, error) { + if t.targetHashType == common.ESyncHashType.None() { + return common.SyncHashData{}, nil // no-op + } + + fi, err := os.Stat(fullpath) // grab the stat so we can tell if the hash is valid + if err != nil { + return common.SyncHashData{}, err + } + + if fi.IsDir() { + return common.SyncHashData{}, nil // there is no hash data on directories + } + + // If a hash is considered unusable by some metric, attempt to set it up for generation, if the user allows it. + handleHashingError := func(err error) (common.SyncHashData, error) { + switch err { + case ErrorNoHashPresent, + ErrorHashNoLongerValid, + ErrorHashNotCompatible: + break + default: + return common.SyncHashData{}, err + } + + // defer hashing to the goroutine + t.hashTargetChannel <- fullpath + return common.SyncHashData{}, ErrorHashAsyncCalculation + } + + // attempt to grab existing hash data, and ensure it's validity. 
+ data, err := common.TryGetHashData(fullpath) + if err != nil { + // Treat failure to read/parse/etc like a missing hash. + return handleHashingError(ErrorNoHashPresent) + } else { + if data.Mode != t.targetHashType { + return handleHashingError(ErrorHashNotCompatible) + } + + if !data.LMT.Equal(fi.ModTime()) { + return handleHashingError(ErrorHashNoLongerValid) + } + + return data, nil + } +} + +// prepareHashingThreads creates background threads to perform hashing on local files that are missing hashes. +// It returns a finalizer and a wrapped processor-- Use the wrapped processor in place of the original processor (even if synchashtype is none) +// and wrap the error getting returned in the finalizer function to kill the background threads. +func (t *localTraverser) prepareHashingThreads(preprocessor objectMorpher, processor objectProcessor, filters []ObjectFilter) (finalizer func(existingErr error) error, hashingProcessor func(obj StoredObject) error) { + if t.targetHashType == common.ESyncHashType.None() { // if no hashing is needed, do nothing. + return func(existingErr error) error { + return existingErr // nothing to overwrite with, no-op + }, processor + } + + // set up for threaded hashing + t.hashTargetChannel = make(chan string, 1_000) // "reasonable" backlog + // Use half of the available CPU cores for hashing to prevent throttling the STE too hard if hashing is still occurring when the first job part gets sent out + hashingThreadCount := runtime.NumCPU() / 2 + hashError := make(chan error, hashingThreadCount) + wg := &sync.WaitGroup{} + immediateStopHashing := int32(0) + + // create return wrapper to handle hashing errors + finalizer = func(existingErr error) error { + if existingErr != nil { + close(t.hashTargetChannel) // stop sending hashes + atomic.StoreInt32(&immediateStopHashing, 1) // force the end of hashing + wg.Wait() // Await the finalization of all hashing + + return existingErr // discard all hashing errors + } else { + close(t.hashTargetChannel) // stop sending hashes + + wg.Wait() // Await the finalization of all hashing + close(hashError) // close out the error channel + for err := range hashError { // inspect all hashing errors + if err != nil { + return err + } + } + + return nil + } + } + + // wrap the processor, preventing a data race + commitMutex := &sync.Mutex{} + mutexProcessor := func(proc objectProcessor) objectProcessor { + return func(object StoredObject) error { + commitMutex.Lock() // prevent committing two objects at once to prevent a data race + defer commitMutex.Unlock() + err := proc(object) + + return err + } + } + + // spin up hashing threads + for i := 0; i < hashingThreadCount; i++ { + wg.Add(1) + + go func() { + defer wg.Done() // mark the hashing thread as completed + + for toHash := range t.hashTargetChannel { + if atomic.LoadInt32(&immediateStopHashing) == 1 { // should we stop hashing? + return + } + + glcm.Info("Generating hash for " + toHash + "... 
This may take some time.") + + fi, err := os.Stat(toHash) // query LMT & if it's a directory + if err != nil { + err = fmt.Errorf("failed to get properties of file result %s: %s", toHash, err.Error()) + hashError <- err + return + } + + if fi.IsDir() { // this should never happen + panic(toHash) + } + + f, err := os.OpenFile(toHash, os.O_RDONLY, 0644) // perm is not used here since it's RO + if err != nil { + err = fmt.Errorf("failed to open file for reading result %s: %s", toHash, err.Error()) + hashError <- err + return + } + + var hasher hash.Hash // set up hasher + switch t.targetHashType { + case common.ESyncHashType.MD5(): + hasher = md5.New() + } + + // hash.Hash provides a writer type, allowing us to do a (small, 32MB to be precise) buffered write into the hasher and avoid memory concerns + _, err = io.Copy(hasher, f) + if err != nil { + err = fmt.Errorf("failed to read file into hasher result %s: %s", toHash, err.Error()) + hashError <- err + return + } + + sum := hasher.Sum([]byte{}) + + hashData := common.SyncHashData{ + Mode: t.targetHashType, + Data: base64.StdEncoding.EncodeToString(sum), + LMT: fi.ModTime(), + } + + // failing to store hash data doesn't mean we can't transfer (e.g. RO directory) + _ = common.PutHashData(toHash, hashData) + + // build the internal path + relPath := strings.TrimPrefix(strings.TrimPrefix(cleanLocalPath(toHash), cleanLocalPath(t.fullPath)), common.DeterminePathSeparator(t.fullPath)) + + err = processIfPassedFilters(filters, + newStoredObject( + func(storedObject *StoredObject) { + // apply the hash data + // storedObject.hashData = hashData + switch hashData.Mode { + case common.ESyncHashType.MD5(): + storedObject.md5 = sum + default: // no-op + } + + if preprocessor != nil { + // apply the original preprocessor + preprocessor(storedObject) + } + }, + fi.Name(), + strings.ReplaceAll(relPath, common.DeterminePathSeparator(t.fullPath), common.AZCOPY_PATH_SEPARATOR_STRING), + + common.EEntityType.File(), + fi.ModTime(), + fi.Size(), + noContentProps, // Local MD5s are computed in the STE, and other props don't apply to local files + noBlobProps, + noMetdata, + "", // Local has no such thing as containers + ), + mutexProcessor(processor), + ) + _, err = getProcessingError(err) + if err != nil { + hashError <- err + return + } + } + }() + } + + // wrap the processor, try to grab hashes, or defer processing to the goroutines + hashingProcessor = func(storedObject StoredObject) error { + if storedObject.entityType != common.EEntityType.File() { + return processor(storedObject) // no process folders + } + + if strings.HasSuffix(path.Base(storedObject.relativePath), common.AzCopyHashDataStream) { + return nil // do not process hash data files. + } + + fullPath := common.GenerateFullPath(t.fullPath, storedObject.relativePath) + hashData, err := t.GetHashData(fullPath) + + if err != nil { + switch err { + case ErrorNoHashPresent, ErrorHashNoLongerValid, ErrorHashNotCompatible: + glcm.Info("No usable hash is present for " + fullPath + ". Will transfer if not present at destination.") + return processor(storedObject) // There is no hash data, so this file will be overwritten (in theory). + case ErrorHashAsyncCalculation: + return nil // File will be processed later + default: + return err // Cannot get or create hash data for some reason + } + } + + // storedObject.hashData = hashData + switch hashData.Mode { + case common.ESyncHashType.MD5(): + md5data, _ := base64.StdEncoding.DecodeString(hashData.Data) // If decode fails, treat it like no hash is present. 
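// For orientation, an illustrative round-trip of the record being consumed here
// (SyncHashData, PutHashData and TryGetHashData are added later in this diff under
// common/hash_data*.go; the variable names are just for the example):
//
//     sum := md5.Sum(contents)
//     rec := common.SyncHashData{
//         Mode: common.ESyncHashType.MD5(),
//         Data: base64.StdEncoding.EncodeToString(sum[:]),
//         LMT:  fi.ModTime(),
//     }
//     _ = common.PutHashData(path, rec)         // best effort; failure is non-fatal
//     rec2, err := common.TryGetHashData(path)  // read back on the next sync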
+ storedObject.md5 = md5data + default: // do nothing, no hash is present. + } + + // delay the mutex until after potentially long-running operations + return mutexProcessor(processor)(storedObject) + } + + return finalizer, hashingProcessor +} + func (t *localTraverser) Traverse(preprocessor objectMorpher, processor objectProcessor, filters []ObjectFilter) (err error) { singleFileInfo, isSingleFile, err := t.getInfoIfSingleFile() - + // it fails here if file does not exist if err != nil { azcopyScanningLogger.Log(pipeline.LogError, fmt.Sprintf("Failed to scan path %s: %s", t.fullPath, err.Error())) return fmt.Errorf("failed to scan path %s due to %s", t.fullPath, err.Error()) } + finalizer, hashingProcessor := t.prepareHashingThreads(preprocessor, processor, filters) + // if the path is a single file, then pass it through the filters and send to processor if isSingleFile { if t.incrementEnumerationCounter != nil { @@ -388,10 +639,11 @@ func (t *localTraverser) Traverse(preprocessor objectMorpher, processor objectPr noMetdata, "", // Local has no such thing as containers ), - processor, + hashingProcessor, // hashingProcessor handles the mutex wrapper ) _, err = getProcessingError(err) - return err + + return finalizer(err) } else { if t.recursive { processFile := func(filePath string, fileInfo os.FileInfo, fileError error) error { @@ -439,11 +691,12 @@ func (t *localTraverser) Traverse(preprocessor objectMorpher, processor objectPr noMetdata, "", // Local has no such thing as containers ), - processor) + hashingProcessor, // hashingProcessor handles the mutex wrapper + ) } // note: Walk includes root, so no need here to separately create StoredObject for root (as we do for other folder-aware sources) - return WalkWithSymlinks(t.appCtx, t.fullPath, processFile, t.followSymlinks, t.errorChannel) + return finalizer(WalkWithSymlinks(t.appCtx, t.fullPath, processFile, t.followSymlinks, t.errorChannel)) } else { // if recursive is off, we only need to scan the files immediately under the fullPath // We don't transfer any directory properties here, not even the root. 
(Because the root's @@ -509,26 +762,28 @@ func (t *localTraverser) Traverse(preprocessor objectMorpher, processor objectPr noMetdata, "", // Local has no such thing as containers ), - processor) + hashingProcessor, // hashingProcessor handles the mutex wrapper + ) _, err = getProcessingError(err) if err != nil { - return err + return finalizer(err) } } } } - return + return finalizer(err) } -func newLocalTraverser(ctx context.Context, fullPath string, recursive bool, followSymlinks bool, incrementEnumerationCounter enumerationCounterFunc, errorChannel chan ErrorFileInfo) *localTraverser { +func newLocalTraverser(ctx context.Context, fullPath string, recursive bool, followSymlinks bool, syncHashType common.SyncHashType, incrementEnumerationCounter enumerationCounterFunc, errorChannel chan ErrorFileInfo) *localTraverser { traverser := localTraverser{ fullPath: cleanLocalPath(fullPath), recursive: recursive, followSymlinks: followSymlinks, appCtx: ctx, incrementEnumerationCounter: incrementEnumerationCounter, - errorChannel: errorChannel} + errorChannel: errorChannel, + targetHashType: syncHashType} return &traverser } diff --git a/cmd/zc_traverser_local_other.go b/cmd/zc_traverser_local_other.go index 61f470be4..ebce9c858 100644 --- a/cmd/zc_traverser_local_other.go +++ b/cmd/zc_traverser_local_other.go @@ -1,3 +1,4 @@ +//go:build !windows // +build !windows package cmd diff --git a/cmd/zc_traverser_local_windows.go b/cmd/zc_traverser_local_windows.go index 6a16504c9..2c555b92d 100644 --- a/cmd/zc_traverser_local_windows.go +++ b/cmd/zc_traverser_local_windows.go @@ -1,3 +1,4 @@ +//go:build windows // +build windows package cmd diff --git a/cmd/zc_traverser_s3.go b/cmd/zc_traverser_s3.go index f30dd4bff..825ad4fcb 100644 --- a/cmd/zc_traverser_s3.go +++ b/cmd/zc_traverser_s3.go @@ -45,22 +45,22 @@ type s3Traverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *s3Traverser) IsDirectory(isSource bool) bool { +func (t *s3Traverser) IsDirectory(isSource bool) (bool, error) { // Do a basic syntax check isDirDirect := !t.s3URLParts.IsObjectSyntactically() && (t.s3URLParts.IsDirectorySyntactically() || t.s3URLParts.IsBucketSyntactically()) // S3 can convert directories and objects sharing names as well. if !isSource { - return isDirDirect + return isDirDirect, nil } _, err := t.s3Client.StatObject(t.s3URLParts.BucketName, t.s3URLParts.ObjectKey, minio.StatObjectOptions{}) if err != nil { - return true + return true, nil } - return false + return false, nil } func (t *s3Traverser) Traverse(preprocessor objectMorpher, processor objectProcessor, filters []ObjectFilter) (err error) { diff --git a/cmd/zc_traverser_s3_service.go b/cmd/zc_traverser_s3_service.go index 1362292cb..413a09baf 100644 --- a/cmd/zc_traverser_s3_service.go +++ b/cmd/zc_traverser_s3_service.go @@ -48,8 +48,8 @@ type s3ServiceTraverser struct { incrementEnumerationCounter enumerationCounterFunc } -func (t *s3ServiceTraverser) IsDirectory(isSource bool) bool { - return true // Returns true as account traversal is inherently folder-oriented and recursive. +func (t *s3ServiceTraverser) IsDirectory(isSource bool) (bool, error) { + return true, nil // Returns true as account traversal is inherently folder-oriented and recursive. 
} func (t *s3ServiceTraverser) listContainers() ([]string, error) { diff --git a/cmd/zt_copy_file_file_test.go b/cmd/zt_copy_file_file_test.go index fcd29ff74..94b2db042 100644 --- a/cmd/zt_copy_file_file_test.go +++ b/cmd/zt_copy_file_file_test.go @@ -153,7 +153,7 @@ func (s *cmdIntegrationSuite) TestFileCopyS2SWithIncludeFlag(c *chk.C) { raw.include = includeString raw.recursive = true - // verify that only the files specified by the include flag are copyed + // verify that only the files specified by the include flag are copied runCopyAndVerify(c, raw, func(err error) { c.Assert(err, chk.IsNil) validateS2STransfersAreScheduled(c, "/", "/", filesToInclude, mockedRPC) @@ -232,7 +232,7 @@ func (s *cmdIntegrationSuite) TestFileCopyS2SWithIncludeAndExcludeFlag(c *chk.C) raw.exclude = excludeString raw.recursive = true - // verify that only the files specified by the include flag are copyed + // verify that only the files specified by the include flag are copied runCopyAndVerify(c, raw, func(err error) { c.Assert(err, chk.IsNil) validateS2STransfersAreScheduled(c, "/", "/", filesToInclude, mockedRPC) diff --git a/cmd/zt_generic_filter_test.go b/cmd/zt_generic_filter_test.go index 7c56d91c3..7d8d0a36c 100644 --- a/cmd/zt_generic_filter_test.go +++ b/cmd/zt_generic_filter_test.go @@ -175,7 +175,7 @@ func (_ *genericFilterSuite) findAmbiguousTime() (string, time.Time, time.Time, localString := u.Local().Format(localTimeFormat) hourLaterLocalString := u.Add(time.Hour).Local().Format(localTimeFormat) if localString == hourLaterLocalString { - // return the string, and the two UTC times that map to that local time (with their fractional seconds trucated away) + // return the string, and the two UTC times that map to that local time (with their fractional seconds truncated away) return localString, u.Truncate(time.Second), u.Add(time.Hour).Truncate(time.Second), nil } } diff --git a/cmd/zt_generic_service_traverser_test.go b/cmd/zt_generic_service_traverser_test.go index a593db99f..33450e609 100644 --- a/cmd/zt_generic_service_traverser_test.go +++ b/cmd/zt_generic_service_traverser_test.go @@ -58,7 +58,7 @@ func (s *genericTraverserSuite) TestBlobFSServiceTraverserWithManyObjects(c *chk scenarioHelper{}.generateLocalFilesFromList(c, dstDirName, objectList) // Create a local traversal - localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // Invoke the traversal with an indexer so the results are indexed for easy validation localIndexer := newObjectIndexer() @@ -174,7 +174,7 @@ func (s *genericTraverserSuite) TestServiceTraverserWithManyObjects(c *chk.C) { scenarioHelper{}.generateLocalFilesFromList(c, dstDirName, objectList) // Create a local traversal - localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // Invoke the traversal with an indexer so the results are indexed for easy validation localIndexer := newObjectIndexer() @@ -358,7 +358,7 @@ func (s *genericTraverserSuite) TestServiceTraverserWithWildcards(c *chk.C) { scenarioHelper{}.generateLocalFilesFromList(c, dstDirName, objectList) // Create a local traversal - localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, 
func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), dstDirName, true, true, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // Invoke the traversal with an indexer so the results are indexed for easy validation localIndexer := newObjectIndexer() diff --git a/cmd/zt_generic_traverser_test.go b/cmd/zt_generic_traverser_test.go index dc7970f66..0c9602dfb 100644 --- a/cmd/zt_generic_traverser_test.go +++ b/cmd/zt_generic_traverser_test.go @@ -484,7 +484,7 @@ func (s *genericTraverserSuite) TestTraverserWithSingleObject(c *chk.C) { scenarioHelper{}.generateLocalFilesFromList(c, dstDirName, blobList) // construct a local traverser - localTraverser := newLocalTraverser(context.TODO(), filepath.Join(dstDirName, dstFileName), false, false, func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), filepath.Join(dstDirName, dstFileName), false, false, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // invoke the local traversal with a dummy processor localDummyProcessor := dummyProcessor{} @@ -644,7 +644,7 @@ func (s *genericTraverserSuite) TestTraverserContainerAndLocalDirectory(c *chk.C // test two scenarios, either recursive or not for _, isRecursiveOn := range []bool{true, false} { // construct a local traverser - localTraverser := newLocalTraverser(context.TODO(), dstDirName, isRecursiveOn, false, func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), dstDirName, isRecursiveOn, false, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // invoke the local traversal with an indexer // so that the results are indexed for easy validation @@ -805,7 +805,7 @@ func (s *genericTraverserSuite) TestTraverserWithVirtualAndLocalDirectory(c *chk // test two scenarios, either recursive or not for _, isRecursiveOn := range []bool{true, false} { // construct a local traverser - localTraverser := newLocalTraverser(context.TODO(), filepath.Join(dstDirName, virDirName), isRecursiveOn, false, func(common.EntityType) {}, nil) + localTraverser := newLocalTraverser(context.TODO(), filepath.Join(dstDirName, virDirName), isRecursiveOn, false, common.ESyncHashType.None(), func(common.EntityType) {}, nil) // invoke the local traversal with an indexer // so that the results are indexed for easy validation @@ -897,7 +897,7 @@ func (s *genericTraverserSuite) TestTraverserWithVirtualAndLocalDirectory(c *chk c.Assert(correspondingLocalFile.name, chk.Equals, storedObject.name) // Say, here's a good question, why do we have this last check? // None of the other tests have it. 
- c.Assert(correspondingLocalFile.isMoreRecentThan(storedObject), chk.Equals, true) + c.Assert(correspondingLocalFile.isMoreRecentThan(storedObject, false), chk.Equals, true) if !isRecursiveOn { c.Assert(strings.Contains(storedObject.relativePath, common.AZCOPY_PATH_SEPARATOR_STRING), chk.Equals, false) diff --git a/cmd/zt_overwrite_posix_properties_test.go b/cmd/zt_overwrite_posix_properties_test.go new file mode 100644 index 000000000..e97b5c55f --- /dev/null +++ b/cmd/zt_overwrite_posix_properties_test.go @@ -0,0 +1,99 @@ +// Copyright © 2017 Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. + +package cmd + +import ( + "context" + "os" + "path/filepath" + "runtime" + "strconv" + "time" + + "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-blob-go/azblob" + chk "gopkg.in/check.v1" +) + +func (s *cmdIntegrationSuite) TestOverwritePosixProperties(c *chk.C) { + if runtime.GOOS != "linux" { + c.Skip("This test will run only on linux") + } + + bsu := getBSU() + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + + files := []string{ + "filea", + "fileb", + "filec", + } + + dirPath := scenarioHelper{}.generateLocalDirectory(c) + defer os.RemoveAll(dirPath) + scenarioHelper{}.generateLocalFilesFromList(c, dirPath, files) + + mockedRPC := interceptor{} + Rpc = mockedRPC.intercept + mockedRPC.init() + + rawBlobURLWithSAS := scenarioHelper{}.getRawContainerURLWithSAS(c, containerName) + raw := getDefaultRawCopyInput(dirPath, rawBlobURLWithSAS.String()) + raw.recursive = true + + runCopyAndVerify(c, raw, func(err error) { + c.Assert(err, chk.IsNil) + + c.Assert(len(mockedRPC.transfers), chk.Equals, 3) + // trim / and /folder/ off + validateDownloadTransfersAreScheduled(c, "/", "/"+filepath.Base(dirPath)+"/", files[:], mockedRPC) + }) + + time.Sleep(10 * time.Second) + + newTimeStamp := time.Now() + for _, file := range files { + os.Chtimes(filepath.Join(dirPath, file), newTimeStamp, newTimeStamp) + } + + //===================================== + mockedRPC.reset() + raw.forceWrite = "posixproperties" + + runCopyAndVerify(c, raw, func(err error) { + c.Assert(err, chk.IsNil) + + c.Assert(len(mockedRPC.transfers), chk.Equals, 3) + // trim / and /folder/ off + validateDownloadTransfersAreScheduled(c, "/", "/"+filepath.Base(dirPath)+"/", files[:], mockedRPC) + }) + + listBlob, err := containerURL.ListBlobsFlatSegment(context.TODO(), azblob.Marker{}, + 
azblob.ListBlobsSegmentOptions{Details: azblob.BlobListingDetails{Metadata: true, Tags: true}, Prefix: filepath.Base(dirPath)}) + + c.Assert(err, chk.Equals, nil) + + for _, blob := range listBlob.Segment.BlobItems { + c.Assert(blob.Metadata[common.POSIXCTimeMeta], chk.Equals, strconv.FormatInt(newTimeStamp.UnixNano(), 10)) + c.Assert(blob.Metadata[common.POSIXATimeMeta], chk.Equals, strconv.FormatInt(newTimeStamp.UnixNano(), 10)) + } +} diff --git a/cmd/zt_scenario_helpers_for_test.go b/cmd/zt_scenario_helpers_for_test.go index 3e66a9691..5fe024534 100644 --- a/cmd/zt_scenario_helpers_for_test.go +++ b/cmd/zt_scenario_helpers_for_test.go @@ -466,7 +466,7 @@ func (scenarioHelper) generateCommonRemoteScenarioForS3(c *chk.C, client *minio. objectName5 := createNewObject(c, client, bucketName, prefix+specialNames[i]) // Note: common.AZCOPY_PATH_SEPARATOR_STRING is added before bucket or objectName, as in the change minimize JobPartPlan file size, - // transfer.Source & transfer.Destination(after trimed the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, + // transfer.Source & transfer.Destination(after trimming the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, // when user provided source & destination are without / suffix, which is the case for scenarioHelper generated URL. bucketPath := "" @@ -496,7 +496,7 @@ func (scenarioHelper) generateCommonRemoteScenarioForGCP(c *chk.C, client *gcpUt objectName5 := createNewGCPObject(c, client, bucketName, prefix+specialNames[i]) // Note: common.AZCOPY_PATH_SEPARATOR_STRING is added before bucket or objectName, as in the change minimize JobPartPlan file size, - // transfer.Source & transfer.Destination(after trimed the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, + // transfer.Source & transfer.Destination(after trimming the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, // when user provided source & destination are without / suffix, which is the case for scenarioHelper generated URL. 
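Stepping back to the new TestOverwritePosixProperties above: with forceWrite set to "posixproperties" (the new OverwriteOption added in this diff), the test expects atime/ctime to round-trip through blob metadata as decimal UnixNano strings. A small sketch of that encoding, assuming only the common.POSIXATimeMeta / common.POSIXCTimeMeta keys the test references; the helper itself is hypothetical:

package cmd

import (
	"strconv"
	"time"

	"github.com/Azure/azure-storage-azcopy/v10/common"
)

// posixTimeMetadata mirrors the encoding the test asserts on: both timestamps are
// stored as base-10 nanosecond strings.
func posixTimeMetadata(ts time.Time) map[string]string {
	v := strconv.FormatInt(ts.UnixNano(), 10)
	return map[string]string{
		common.POSIXATimeMeta: v,
		common.POSIXCTimeMeta: v,
	}
}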
bucketPath := "" @@ -870,6 +870,7 @@ func getDefaultSyncRawInput(src, dst string) rawSyncCmdArgs { recursive: true, deleteDestination: deleteDestination.String(), md5ValidationOption: common.DefaultHashValidationOption.String(), + compareHash: common.ESyncHashType.None().String(), } } diff --git a/cmd/zt_sync_comparator_test.go b/cmd/zt_sync_comparator_test.go index 43c1c7b90..ecb2b854a 100644 --- a/cmd/zt_sync_comparator_test.go +++ b/cmd/zt_sync_comparator_test.go @@ -21,6 +21,7 @@ package cmd import ( + "github.com/Azure/azure-storage-azcopy/v10/common" chk "gopkg.in/check.v1" "time" ) @@ -36,7 +37,7 @@ func (s *syncComparatorSuite) TestSyncSourceComparator(c *chk.C) { // set up the indexer as well as the source comparator indexer := newObjectIndexer() - sourceComparator := newSyncSourceComparator(indexer, dummyCopyScheduler.process, false) + sourceComparator := newSyncSourceComparator(indexer, dummyCopyScheduler.process, common.ESyncHashType.None(), false, false) // create a sample destination object sampleDestinationObject := StoredObject{name: "test", relativePath: "/usr/test", lastModifiedTime: time.Now(), md5: destMD5} @@ -88,7 +89,7 @@ func (s *syncComparatorSuite) TestSyncSrcCompDisableComparator(c *chk.C) { // set up the indexer as well as the source comparator indexer := newObjectIndexer() - sourceComparator := newSyncSourceComparator(indexer, dummyCopyScheduler.process, true) + sourceComparator := newSyncSourceComparator(indexer, dummyCopyScheduler.process, common.ESyncHashType.None(), false, true) // test the comparator in case a given source object is not present at the destination // meaning no entry in the index, so the comparator should pass the given object to schedule a transfer @@ -137,7 +138,7 @@ func (s *syncComparatorSuite) TestSyncDestinationComparator(c *chk.C) { // set up the indexer as well as the destination comparator indexer := newObjectIndexer() - destinationComparator := newSyncDestinationComparator(indexer, dummyCopyScheduler.process, dummyCleaner.process, false) + destinationComparator := newSyncDestinationComparator(indexer, dummyCopyScheduler.process, dummyCleaner.process, common.ESyncHashType.None(), false, false) // create a sample source object sampleSourceObject := StoredObject{name: "test", relativePath: "/usr/test", lastModifiedTime: time.Now(), md5: srcMD5} @@ -194,7 +195,7 @@ func (s *syncComparatorSuite) TestSyncDestCompDisableComparison(c *chk.C) { // set up the indexer as well as the destination comparator indexer := newObjectIndexer() - destinationComparator := newSyncDestinationComparator(indexer, dummyCopyScheduler.process, dummyCleaner.process, true) + destinationComparator := newSyncDestinationComparator(indexer, dummyCopyScheduler.process, dummyCleaner.process, common.ESyncHashType.None(), false, true) // create a sample source object currTime := time.Now() @@ -203,7 +204,7 @@ func (s *syncComparatorSuite) TestSyncDestCompDisableComparison(c *chk.C) { {name: "test2", relativePath: "/usr/test2", lastModifiedTime: currTime, md5: srcMD5}, } - //onlyAtSrc := StoredObject{name: "only_at_src", relativePath: "/usr/only_at_src", lastModifiedTime: currTime, md5: destMD5} + // onlyAtSrc := StoredObject{name: "only_at_src", relativePath: "/usr/only_at_src", lastModifiedTime: currTime, md5: destMD5} destinationStoredObjects := []StoredObject{ // file whose last modified time is greater than that of source diff --git a/cmd/zt_test.go b/cmd/zt_test.go index 1c317b49d..76eba7e0a 100644 --- a/cmd/zt_test.go +++ b/cmd/zt_test.go @@ -270,6 +270,7 @@ func 
getAccountAndKey() (string, string) { return name, key } +// get blob account service URL func getBSU() azblob.ServiceURL { accountName, accountKey := getAccountAndKey() u, _ := url.Parse(fmt.Sprintf("https://%s.blob.core.windows.net/", accountName)) @@ -357,6 +358,7 @@ func createNewBlockBlob(c *chk.C, container azblob.ContainerURL, prefix string) return } +// create metadata indicating that this is a dir func createNewDirectoryStub(c *chk.C, container azblob.ContainerURL, dirPath string) { dir := container.NewBlockBlobURL(dirPath) diff --git a/cmd/zt_traverser_blob_test.go b/cmd/zt_traverser_blob_test.go new file mode 100644 index 000000000..cd2eb4879 --- /dev/null +++ b/cmd/zt_traverser_blob_test.go @@ -0,0 +1,121 @@ +// Copyright © 2017 Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. 
+ +package cmd + +import ( + "context" + "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-azcopy/v10/ste" + "github.com/Azure/azure-storage-blob-go/azblob" + chk "gopkg.in/check.v1" +) + +type traverserBlobSuite struct{} + +var _ = chk.Suite(&traverserBlobSuite{}) + +func (s *traverserBlobSuite) TestIsSourceDirWithStub(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + dirName := "source_dir" + createNewDirectoryStub(c, containerURL, dirName) + // set up to create blob traverser + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, dirName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + isDir, err := blobTraverser.IsDirectory(true) + c.Assert(isDir, chk.Equals, true) + c.Assert(err, chk.Equals, nil) +} + +func (s *traverserBlobSuite) TestIsSourceDirWithNoStub(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + dirName := "source_dir/" + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, dirName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + isDir, err := blobTraverser.IsDirectory(true) + c.Assert(isDir, chk.Equals, true) + c.Assert(err, chk.Equals, nil) +} + +func (s *traverserBlobSuite) TestIsSourceFileExists(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + fileName := "source_file" + _, fileName = createNewBlockBlob(c, containerURL, fileName) + + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, fileName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + isDir, err := blobTraverser.IsDirectory(true) + c.Assert(isDir, chk.Equals, false) + c.Assert(err, chk.IsNil) +} + +func (s *traverserBlobSuite) TestIsSourceFileDoesNotExist(c *chk.C) { + bsu := getBSU() + + // Generate source container and blobs + containerURL, containerName := createNewContainer(c, bsu) + defer deleteContainer(c, containerURL) + c.Assert(containerURL, chk.NotNil) + + fileName := "file_does_not_exist" + ctx := context.WithValue(context.TODO(), ste.ServiceAPIVersionOverride, ste.DefaultServiceApiVersion) + p := azblob.NewPipeline(azblob.NewAnonymousCredential(), azblob.PipelineOptions{}) + + // List + 
rawBlobURLWithSAS := scenarioHelper{}.getRawBlobURLWithSAS(c, containerName, fileName) + blobTraverser := newBlobTraverser(&rawBlobURLWithSAS, p, ctx, true, true, func(common.EntityType) {}, false, common.CpkOptions{}, false, false, false) + + isDir, err := blobTraverser.IsDirectory(true) + c.Assert(isDir, chk.Equals, false) + c.Assert(err.Error(), chk.Equals, common.FILE_NOT_FOUND) +} diff --git a/common/ProxyLookupCache.go b/common/ProxyLookupCache.go index ee24504be..258685bfd 100644 --- a/common/ProxyLookupCache.go +++ b/common/ProxyLookupCache.go @@ -23,7 +23,6 @@ package common import ( "errors" "fmt" - "github.com/mattn/go-ieproxy" "net/http" "net/url" "strings" @@ -41,7 +40,7 @@ func init() { refreshInterval: time.Minute * 5, // this is plenty, given the usual retry policies in AzCopy span a much longer time period in the total retry sequence lookupTimeout: time.Minute, // equals the documented max allowable execution time for WinHttpGetProxyForUrl lookupLock: &sync.Mutex{}, - lookupMethod: ieproxy.GetProxyFunc(), + lookupMethod: GetProxyFunc(), } ev := GetLifecycleMgr().GetEnvironmentVariable(EEnvironmentVariable.CacheProxyLookup()) @@ -74,11 +73,14 @@ type proxyLookupResult struct { // permanent GR per cache entry. That assumption makes sense in AzCopy, but is not correct in general (e.g. // if used in something with usage patterns like a web browser). // TODO: should we one day find a better solution, so it can be contributed to mattn.go-ieproxy instead of done here? -// or maybe just the code in getProxyNoCache could be contributed there? +// +// or maybe just the code in getProxyNoCache could be contributed there? +// // TODO: consider that one consequence of the current lack of integration with mattn.go-ieproxy is that pipelines created by -// pipeline.NewPipeline don't use proxyLookupCache at all. However, that only affects the enumeration portion of our code, -// for Azure Files and ADLS Gen2. The issues that proxyLookupCache solves have not been reported there. The issues matter in -// the STE, where request counts are much higher (and there, we always do use this cache, because we make our own pipelines). +// +// pipeline.NewPipeline don't use proxyLookupCache at all. However, that only affects the enumeration portion of our code, +// for Azure Files and ADLS Gen2. The issues that proxyLookupCache solves have not been reported there. The issues matter in +// the STE, where request counts are much higher (and there, we always do use this cache, because we make our own pipelines). 
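A side note on the proxy change above: lookupMethod now comes from a GetProxyFunc forwarder instead of calling ieproxy directly, with per-GOOS implementations added later in this diff (common/proxy_forwarder.go and common/proxy_forwarder_windows.go). A minimal sketch of how such a resolver plugs into an HTTP transport, assuming nothing beyond the standard library; the constructor name is illustrative:

package common

import "net/http"

// newTransportWithProxy is a sketch, not the transport AzCopy actually builds: it
// simply wires the OS-selected proxy resolver (http.ProxyFromEnvironment off Windows,
// ieproxy on Windows) into a standard transport.
func newTransportWithProxy() *http.Transport {
	return &http.Transport{
		Proxy: GetProxyFunc(),
	}
}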
type proxyLookupCache struct { m *sync.Map // is optimized for caches that only grow (as is the case here) refreshInterval time.Duration diff --git a/common/azError.go b/common/azError.go index 141157f87..39f9fc01b 100644 --- a/common/azError.go +++ b/common/azError.go @@ -27,7 +27,7 @@ type AzError struct { additonalInfo string } -// NewAzError composes an AzError with given code and messgae +// NewAzError composes an AzError with given code and message func NewAzError(base AzError, additionalInfo string) AzError { base.additonalInfo = additionalInfo return base diff --git a/common/chunkStatusLogger.go b/common/chunkStatusLogger.go index 573bfebf1..7b5f43ba1 100644 --- a/common/chunkStatusLogger.go +++ b/common/chunkStatusLogger.go @@ -276,7 +276,7 @@ func NewChunkStatusLogger(jobID JobID, cpuMon CPUMonitor, logFileFolder string, } func numWaitReasons() int32 { - return EWaitReason.Cancelled().index + 1 // assume that maitainers follow the comment above to always keep Cancelled as numerically the greatest one + return EWaitReason.Cancelled().index + 1 // assume that maintainers follow the comment above to always keep Cancelled as numerically the greatest one } type chunkStatusCount struct { @@ -538,7 +538,7 @@ DateTime? ParseStart(string s) } } -// convert to real datetime (default unparseable ones to a fixed value, simply to avoid needing to deal with nulls below, and because all valid records should be parseable. Only exception would be something partially written a time of a crash) +// convert to real datetime (default unparsable ones to a fixed value, simply to avoid needing to deal with nulls below, and because all valid records should be parseable. Only exception would be something partially written a time of a crash) var parsed = data.Select(d => new { d.Name, d.Offset, d.State, StateStartTime = ParseStart(d.StateStartTime) ?? DateTime.MaxValue}).ToList(); var grouped = parsed.GroupBy(c => new {c.Name, c.Offset}); diff --git a/common/credCache_darwin.go b/common/credCache_darwin.go index 7b71c1bd6..7e3cdb0a0 100644 --- a/common/credCache_darwin.go +++ b/common/credCache_darwin.go @@ -60,7 +60,7 @@ func NewCredCache(options CredCacheOptions) *CredCache { } } -// keychain is used for intenal integration as well. +// keychain is used for internal integration as well. var NewCredCacheInternalIntegration = NewCredCache // HasCachedToken returns if there is cached token for current executing user. diff --git a/common/credCache_linux.go b/common/credCache_linux.go index c04cfbef2..6ce9bf38f 100644 --- a/common/credCache_linux.go +++ b/common/credCache_linux.go @@ -30,7 +30,7 @@ import ( ) // CredCache manages credential caches. -// Use keyring in Linux OS. Session keyring is choosed, +// Use keyring in Linux OS. Session keyring is chosen, // the session hooks key should be created since user first login (i.e. by pam). // So the session is inherited by processes created from login session. // When user logout, the session keyring is recycled. diff --git a/common/credentialFactory.go b/common/credentialFactory.go index 47a16ea66..7ce6905b7 100644 --- a/common/credentialFactory.go +++ b/common/credentialFactory.go @@ -105,18 +105,22 @@ func CreateBlobCredential(ctx context.Context, credInfo CredentialInfo, options } // Create TokenCredential with refresher. 
- return azblob.NewTokenCredential( - credInfo.OAuthTokenInfo.AccessToken, - func(credential azblob.TokenCredential) time.Duration { - return refreshBlobToken(ctx, credInfo.OAuthTokenInfo, credential, options) - }) + if credInfo.SourceBlobToken != nil { + return credInfo.SourceBlobToken + } else { + return azblob.NewTokenCredential( + credInfo.OAuthTokenInfo.AccessToken, + func(credential azblob.TokenCredential) time.Duration { + return refreshBlobToken(ctx, credInfo.OAuthTokenInfo, credential, options) + }) + } } return credential } // refreshPolicyHalfOfExpiryWithin is used for calculating next refresh time, -// it checkes how long it will be before the token get expired, and use half of the value as +// it checks how long it will be before the token get expired, and use half of the value as // duration to wait. func refreshPolicyHalfOfExpiryWithin(token *adal.Token, options CredentialOpOptions) time.Duration { if token == nil { diff --git a/common/fe-ste-models.go b/common/fe-ste-models.go index a0a32f4e0..f4555979b 100644 --- a/common/fe-ste-models.go +++ b/common/fe-ste-models.go @@ -23,6 +23,7 @@ package common import ( "bytes" "encoding/json" + "errors" "fmt" "math" "os" @@ -55,6 +56,8 @@ const ( // Since we haven't updated the Go SDKs to handle CPK just yet, we need to detect CPK related errors // and inform the user that we don't support CPK yet. CPK_ERROR_SERVICE_CODE = "BlobUsesCustomerSpecifiedEncryption" + BLOB_NOT_FOUND = "BlobNotFound" + FILE_NOT_FOUND = "The specified file was not found." ) //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// @@ -110,7 +113,7 @@ type PartNumber uint32 type Version uint32 type Status uint32 -//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// var EDeleteSnapshotsOption = DeleteSnapshotsOption(0) type DeleteSnapshotsOption uint8 @@ -145,7 +148,7 @@ func (d DeleteSnapshotsOption) ToDeleteSnapshotsOptionType() azblob.DeleteSnapsh return azblob.DeleteSnapshotsOptionType(strings.ToLower(d.String())) } -//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// var EPermanentDeleteOption = PermanentDeleteOption(3) // Default to "None" type PermanentDeleteOption uint8 @@ -250,6 +253,7 @@ func (OverwriteOption) True() OverwriteOption { return OverwriteOption( func (OverwriteOption) False() OverwriteOption { return OverwriteOption(1) } func (OverwriteOption) Prompt() OverwriteOption { return OverwriteOption(2) } func (OverwriteOption) IfSourceNewer() OverwriteOption { return OverwriteOption(3) } +func (OverwriteOption) PosixProperties() OverwriteOption {return OverwriteOption(4)} func (o *OverwriteOption) Parse(s string) error { val, err := enum.Parse(reflect.TypeOf(o), s, true) @@ -610,7 +614,7 @@ func (ft *FromTo) IsPropertyOnlyTransfer() bool { var BenchmarkLmt = time.Date(1900, 1, 1, 0, 0, 0, 0, time.UTC) -//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// // Enumerates the values for blob type. 
type BlobType uint8 @@ -670,13 +674,20 @@ var ETransferStatus = TransferStatus(0) type TransferStatus int32 // Must be 32-bit for atomic operations; negative #s represent a specific failure code +func (t TransferStatus) StatusLocked() bool { // Is an overwrite necessary to change tx status? + // Any kind of failure, or success is considered "locked in". + return t <= ETransferStatus.Failed() || t == ETransferStatus.Success() +} + // Transfer is ready to transfer and not started transferring yet func (TransferStatus) NotStarted() TransferStatus { return TransferStatus(0) } // TODO confirm whether this is actually needed -// Outdated: -// Transfer started & at least 1 chunk has successfully been transferred. -// Used to resume a transfer that started to avoid transferring all chunks thereby improving performance +// +// Outdated: +// Transfer started & at least 1 chunk has successfully been transferred. +// Used to resume a transfer that started to avoid transferring all chunks thereby improving performance +// // Update(Jul 2020): This represents the state of transfer as soon as the file is scheduled. func (TransferStatus) Started() TransferStatus { return TransferStatus(1) } @@ -972,7 +983,7 @@ func (i *InvalidMetadataHandleOption) UnmarshalJSON(b []byte) error { return i.Parse(s) } -//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// const ( DefaultBlockBlobBlockSize = 8 * 1024 * 1024 MaxBlockBlobBlockSize = 4000 * 1024 * 1024 @@ -1075,13 +1086,53 @@ func UnMarshalToCommonMetadata(metadataString string) (Metadata, error) { func StringToMetadata(metadataString string) (Metadata, error) { metadataMap := Metadata{} if len(metadataString) > 0 { - for _, keyAndValue := range strings.Split(metadataString, ";") { // key/value pairs are separated by ';' - kv := strings.Split(keyAndValue, "=") // key/value are separated by '=' - // what if '=' not present? 
- if len(kv) != 2 { - return metadataMap, fmt.Errorf("invalid metadata string passed") + cKey := "" + cVal := "" + keySet := false + ignoreRules := false + + addchar := func(c rune) { + if !keySet { + cKey += string(c) + } else { + cVal += string(c) } - metadataMap[kv[0]] = kv[1] + } + for _, c := range metadataString { + if ignoreRules { + addchar(c) + ignoreRules = false + } else { + switch c { + case '=': + if keySet { + addchar(c) + } else { + keySet = true + } + + case ';': + if !keySet { + return Metadata{}, errors.New("metadata names must conform to C# naming rules (https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#metadata-names)") + } + + metadataMap[cKey] = cVal + cKey = "" + cVal = "" + keySet = false + ignoreRules = false + + case '\\': + ignoreRules = true // ignore the rules on the next character + + default: + addchar(c) + } + } + } + + if cKey != "" { + metadataMap[cKey] = cVal } } return metadataMap, nil @@ -1561,7 +1612,7 @@ func GetClientProvidedKey(options CpkOptions) azblob.ClientProvidedKeyOptions { return ToClientProvidedKeyOptions(_cpkInfo, _cpkScopeInfo) } -//////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////// type SetPropertiesFlags uint32 // [0000000000...32 times] var ESetPropertiesFlags = SetPropertiesFlags(0) @@ -1584,7 +1635,7 @@ func (op *SetPropertiesFlags) ShouldTransferBlobTags() bool { return (*op)&ESetPropertiesFlags.SetBlobTags() == ESetPropertiesFlags.SetBlobTags() } -//////////////////////////////////////////////////////////////////////////////// +// ////////////////////////////////////////////////////////////////////////////// type RehydratePriorityType uint8 var ERehydratePriorityType = RehydratePriorityType(0) // setting default as none @@ -1614,3 +1665,28 @@ func (rpt RehydratePriorityType) ToRehydratePriorityType() azblob.RehydratePrior return azblob.RehydratePriorityStandard } } + +// ////////////////////////////////////////////////////////////////////////////// +type SyncHashType uint8 + +var ESyncHashType SyncHashType = 0 + +func (SyncHashType) None() SyncHashType { + return 0 +} + +func (SyncHashType) MD5() SyncHashType { + return 1 +} + +func (ht *SyncHashType) Parse(s string) error { + val, err := enum.ParseInt(reflect.TypeOf(ht), s, true, true) + if err == nil { + *ht = val.(SyncHashType) + } + return err +} + +func (ht SyncHashType) String() string { + return enum.StringInt(ht, reflect.TypeOf(ht)) +} diff --git a/common/folderCreationTracker_interface.go b/common/folderCreationTracker_interface.go index 9244c09a8..556e73691 100644 --- a/common/folderCreationTracker_interface.go +++ b/common/folderCreationTracker_interface.go @@ -5,7 +5,7 @@ package common // with the fact that when overwrite == false, we only set file properties on files created // by the current job) type FolderCreationTracker interface { - RecordCreation(folder string) + CreateFolder(folder string, doCreation func() error) error ShouldSetProperties(folder string, overwrite OverwriteOption, prompter Prompter) bool StopTracking(folder string) } diff --git a/common/gcpURLParts.go b/common/gcpURLParts.go index 6ea38388f..96de4cd4f 100644 --- a/common/gcpURLParts.go +++ b/common/gcpURLParts.go @@ -7,8 +7,8 @@ import ( "strings" ) -//GCPURLParts structure is used to parse and hold the different -//components of GCP Object/Service/Bucket URL +// GCPURLParts structure is used to parse and hold the 
different +// components of GCP Object/Service/Bucket URL type GCPURLParts struct { Scheme string Host string @@ -17,13 +17,13 @@ type GCPURLParts struct { UnparsedParams string } -const gcpHostPattern = "^storage.cloud.google.com" +const gcpHostPattern = "^storage\\.cloud\\.google\\.com" const invalidGCPURLErrorMessage = "Invalid GCP URL" const gcpEssentialHostPart = "google.com" var gcpHostRegex = regexp.MustCompile(gcpHostPattern) -//IsGCPURL validates whether a given URL is a valid GCP Object/Service/Bucket URL +// IsGCPURL validates whether a given URL is a valid GCP Object/Service/Bucket URL func IsGCPURL(u url.URL) bool { if _, isGCPURL := findGCPURLMatches(strings.ToLower(u.Host)); isGCPURL { return true @@ -39,8 +39,8 @@ func findGCPURLMatches(lower string) ([]string, bool) { return matches, true } -//NewGCPURLParts processes the given URL and returns a valid GCPURLParts -//structure that contains all the necessary components. +// NewGCPURLParts processes the given URL and returns a valid GCPURLParts +// structure that contains all the necessary components. func NewGCPURLParts(u url.URL) (GCPURLParts, error) { host := strings.ToLower(u.Host) _, isGCPURL := findGCPURLMatches(host) @@ -70,7 +70,7 @@ func NewGCPURLParts(u url.URL) (GCPURLParts, error) { return up, nil } -//URL returns a valid net/url.URL object initialised from the components of GCP URL +// URL returns a valid net/url.URL object initialised from the components of GCP URL func (gUrl *GCPURLParts) URL() url.URL { path := "" @@ -118,8 +118,8 @@ func (gUrl *GCPURLParts) IsObjectSyntactically() bool { return false } -//IsDirectorySyntactically returns true if the given GCPURLParts -//points to a directory or not based on the path. +// IsDirectorySyntactically returns true if the given GCPURLParts +// points to a directory or not based on the path. func (gUrl *GCPURLParts) IsDirectorySyntactically() bool { if gUrl.IsObjectSyntactically() && strings.HasSuffix(gUrl.ObjectKey, "/") { return true diff --git a/common/gcpURLParts_test.go b/common/gcpURLParts_test.go index 30d03753a..7a206656c 100644 --- a/common/gcpURLParts_test.go +++ b/common/gcpURLParts_test.go @@ -47,3 +47,22 @@ func (s *gcpURLPartsTestSuite) TestGCPURLParseNegative(c *chk.C) { c.Assert(err, chk.NotNil) c.Assert(strings.Contains(err.Error(), invalidGCPURLErrorMessage), chk.Equals, true) } + +func (s *gcpURLPartsTestSuite) TestIsGCPURL(c *chk.C) { + u, _ := url.Parse("http://storage.cloud.google.com/bucket/keyname/") + isGCP := IsGCPURL(*u) + c.Assert(isGCP, chk.Equals, true) + + // Negative Test Cases + u, _ = url.Parse("http://storage.cloudxgoogle.com/bucket/keyname/") + isGCP = IsGCPURL(*u) + c.Assert(isGCP, chk.Equals, false) + + u, _ = url.Parse("http://storage.cloud.googlexcom/bucket/keyname/") + isGCP = IsGCPURL(*u) + c.Assert(isGCP, chk.Equals, false) + + u, _ = url.Parse("http://storagexcloud.google.com/bucket/keyname/") + isGCP = IsGCPURL(*u) + c.Assert(isGCP, chk.Equals, false) +} diff --git a/common/hash_data.go b/common/hash_data.go new file mode 100644 index 000000000..ac28660c6 --- /dev/null +++ b/common/hash_data.go @@ -0,0 +1,13 @@ +package common + +import "time" + +// AzCopyHashDataStream is used as both the name of a data stream, xattr key, and the suffix of os-agnostic hash data files. +// The local traverser intentionally skips over files with this suffix. 
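// Purely descriptive, based on the implementations added below: for a file such as
// /data/photo.jpg, the non-Windows code writes the JSON-encoded SyncHashData to a
// hidden sibling file /data/.photo.jpg.azcopysyncmeta, while the Windows code keeps
// the same JSON in an NTFS alternate data stream, photo.jpg:.azcopysyncmeta. Both
// writers then call os.Chtimes so the data file's modification time continues to
// match the recorded LMT.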
+const AzCopyHashDataStream = `.azcopysyncmeta` + +type SyncHashData struct { + Mode SyncHashType + Data string // base64 encoded + LMT time.Time +} diff --git a/common/hash_data_other.go b/common/hash_data_other.go new file mode 100644 index 000000000..f9383667e --- /dev/null +++ b/common/hash_data_other.go @@ -0,0 +1,71 @@ +//go:build !windows +// +build !windows + +package common + +import ( + "encoding/json" + "io" + "os" + "path/filepath" + "time" +) + +func getHashPath(fullpath string) string { + // get meta file name + dir, fn := filepath.Split(fullpath) + var metaFile string + if dir == "." { // Hide the file on UNIX-like systems (e.g. Linux, OSX) + metaFile = "./." + fn + AzCopyHashDataStream + } else { + metaFile = dir + "/." + fn + AzCopyHashDataStream + } + + return metaFile +} + +func TryGetHashData(fullpath string) (SyncHashData, error) { + // get meta file name + metaFile := getHashPath(fullpath) + // open file for reading + f, err := os.OpenFile(metaFile, os.O_RDONLY, 0644) + if err != nil { + return SyncHashData{}, err + } + defer f.Close() + + buf, err := io.ReadAll(f) + if err != nil { + return SyncHashData{}, err + } + + var out SyncHashData + err = json.Unmarshal(buf, &out) + + return out, err +} + +func PutHashData(fullpath string, data SyncHashData) error { + // get meta file name + metaFile := getHashPath(fullpath) + // open file for writing; truncate. + f, err := os.OpenFile(metaFile, os.O_CREATE|os.O_TRUNC|os.O_RDWR, 0644) + if err != nil { + return err + } + defer f.Close() + + buf, err := json.Marshal(data) + if err != nil { + return err + } + + _, err = f.Write(buf) + if err != nil { + return err + } + + _ = f.Close() // double closing won't hurt because it's a no-op + + return os.Chtimes(fullpath, time.Now(), data.LMT) +} diff --git a/common/hash_data_windows.go b/common/hash_data_windows.go new file mode 100644 index 000000000..0bab9ba84 --- /dev/null +++ b/common/hash_data_windows.go @@ -0,0 +1,58 @@ +package common + +import ( + "encoding/json" + "io" + "os" + "time" +) + +// TryGetHashData On Windows attempts to use Alternate Data Streams +func TryGetHashData(fullpath string) (SyncHashData, error) { + // get meta file name + metaFile := fullpath + ":" + AzCopyHashDataStream + f, err := os.OpenFile(metaFile, os.O_RDONLY, 0644) + if err != nil { + return SyncHashData{}, err + } + defer f.Close() + + buf, err := io.ReadAll(f) + if err != nil { + return SyncHashData{}, err + } + + var out SyncHashData + err = json.Unmarshal(buf, &out) + + return out, err +} + +func PutHashData(fullpath string, data SyncHashData) error { + // get meta file name + metaFile := fullpath + ":" + AzCopyHashDataStream + f, err := os.OpenFile(metaFile, os.O_CREATE|os.O_TRUNC|os.O_RDWR, 0644) + if err != nil { + return err + } + defer f.Close() + + buf, err := json.Marshal(data) + if err != nil { + return err + } + + _, err = f.Write(buf) + if err != nil { + return err + } + + _ = f.Close() // double closing won't hurt because it's a no-op + + err = os.Chtimes(fullpath, time.Now(), data.LMT) + if err != nil { + return err + } + + return nil +} diff --git a/common/iff.go b/common/iff.go index a006b1f8e..9e6b29c32 100644 --- a/common/iff.go +++ b/common/iff.go @@ -20,7 +20,7 @@ package common -// GetBlocksRoundedUp returns the number of blocks given sie, rounded up +// GetBlocksRoundedUp returns the number of blocks given size, rounded up func GetBlocksRoundedUp(size uint64, blockSize uint64) uint16 { return uint16(size/blockSize) + Iffuint16((size%blockSize) == 0, 0, 1) } diff --git 
a/common/lifecyleMgr.go b/common/lifecyleMgr.go index 02d57fedd..9320ea624 100644 --- a/common/lifecyleMgr.go +++ b/common/lifecyleMgr.go @@ -619,7 +619,7 @@ func (_ *lifecycleMgr) awaitChannel(ch chan struct{}, timeout time.Duration) { } } -// E2EAwaitContinue is used in case where a developer want's to debug AzCopy by attaching to the running process, +// E2EAwaitContinue is used in case where a developer wants to debug AzCopy by attaching to the running process, // before it starts doing any actual work. func (lcm *lifecycleMgr) E2EAwaitContinue() { lcm.e2eAllowAwaitContinue = true // not technically gorountine safe (since its shared state) but its consistent with EnableInputWatcher diff --git a/common/oauthTokenManager.go b/common/oauthTokenManager.go index 1d5fae0bc..48fdd2f98 100644 --- a/common/oauthTokenManager.go +++ b/common/oauthTokenManager.go @@ -108,9 +108,10 @@ func newAzcopyHTTPClient() *http.Client { } // GetTokenInfo gets token info, it follows rule: -// 1. If there is token passed from environment variable(note this is only for testing purpose), -// use token passed from environment variable. -// 2. Otherwise, try to get token from cache. +// 1. If there is token passed from environment variable(note this is only for testing purpose), +// use token passed from environment variable. +// 2. Otherwise, try to get token from cache. +// // This method either successfully return token, or return error. func (uotm *UserOAuthTokenManager) GetTokenInfo(ctx context.Context) (*OAuthTokenInfo, error) { if uotm.stashedInfo != nil { @@ -508,7 +509,7 @@ func (uotm *UserOAuthTokenManager) UserLogin(tenantID, activeDirectoryEndpoint s // getCachedTokenInfo get a fresh token from local disk cache. // If access token is expired, it will refresh the token. // If refresh token is expired, the method will fail and return failure reason. -// Fresh token is persisted if acces token or refresh token is changed. +// Fresh token is persisted if access token or refresh token is changed. func (uotm *UserOAuthTokenManager) getCachedTokenInfo(ctx context.Context) (*OAuthTokenInfo, error) { hasToken, err := uotm.credCache.HasCachedToken() if err != nil { @@ -592,7 +593,7 @@ func (uotm *UserOAuthTokenManager) getTokenInfoFromEnvVar(ctx context.Context) ( } // Remove the env var after successfully fetching once, - // in case of env var is further spreading into child processes unexpectly. + // in case of env var is further spreading into child processes unexpectedly. 
lcm.ClearEnvironmentVariable(EEnvironmentVariable.OAuthTokenInfo()) tokenInfo, err := jsonToTokenInfo([]byte(rawToken)) diff --git a/common/proxy_forwarder.go b/common/proxy_forwarder.go new file mode 100644 index 000000000..85b581359 --- /dev/null +++ b/common/proxy_forwarder.go @@ -0,0 +1,14 @@ +//go:build !windows +// +build !windows + +package common + +import ( + "net/http" + "net/url" +) + +// GetProxyFunc is a forwarder for the OS-Exclusive proxyMiddleman_os.go files +func GetProxyFunc() func(*http.Request) (*url.URL, error) { + return http.ProxyFromEnvironment +} diff --git a/common/proxy_forwarder_windows.go b/common/proxy_forwarder_windows.go new file mode 100644 index 000000000..25e5d1339 --- /dev/null +++ b/common/proxy_forwarder_windows.go @@ -0,0 +1,14 @@ +//go:build windows +// +build windows + +package common + +import ( + "github.com/mattn/go-ieproxy" + "net/http" + "net/url" +) + +func GetProxyFunc() func(*http.Request) (*url.URL, error) { + return ieproxy.GetProxyFunc() +} diff --git a/common/randomDataGenerator.go b/common/randomDataGenerator.go index d6a5ddf6d..450cc2f37 100644 --- a/common/randomDataGenerator.go +++ b/common/randomDataGenerator.go @@ -121,7 +121,7 @@ func (r *randomDataGenerator) freshenRandomData(count int) { // ALSO flip random bits in every yth one (where y is much smaller than the x we used above) // This is not as random as what we do above, but its faster. And without it, the data is too compressible - var skipSize = 2 // with skip-size = 3 its slightly faster, and still uncompressible with zip but it is + var skipSize = 2 // with skip-size = 3 its slightly faster, and still incompressible with zip but it is // compressible (down to 30% of original size) with 7zip's compression bitFlipMask := byte(r.randGen.Int31n(128)) + 128 for i := r.readIterationCount % skipSize; i < count; i += skipSize { diff --git a/common/rpc-models.go b/common/rpc-models.go index accb27e48..24782b538 100644 --- a/common/rpc-models.go +++ b/common/rpc-models.go @@ -108,7 +108,7 @@ func ConsolidatePathSeparators(path string) string { // ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// // Transfers describes each file/folder being transferred in a given JobPartOrder, and -// other auxilliary details of this order. +// other auxiliary details of this order. type Transfers struct { List []CopyTransfer TotalSizeInBytes uint64 @@ -164,6 +164,7 @@ type CredentialInfo struct { OAuthTokenInfo OAuthTokenInfo S3CredentialInfo S3CredentialInfo GCPCredentialInfo GCPCredentialInfo + SourceBlobToken azblob.Credential } func (c CredentialInfo) WithType(credentialType CredentialType) CredentialInfo { @@ -260,8 +261,11 @@ type ListJobSummaryResponse struct { FileTransfers uint32 `json:",string"` FolderPropertyTransfers uint32 `json:",string"` + FoldersCompleted uint32 `json:",string"` // Files can be figured out by TransfersCompleted - FoldersCompleted TransfersCompleted uint32 `json:",string"` + FoldersFailed uint32 `json:",string"` TransfersFailed uint32 `json:",string"` + FoldersSkipped uint32 `json:",string"` TransfersSkipped uint32 `json:",string"` // includes bytes sent in retries (i.e. 
has double counting, if there are retries) and in failed transfers diff --git a/common/s3URLParts.go b/common/s3URLParts.go index 8ef3ec25f..9ad5e5326 100644 --- a/common/s3URLParts.go +++ b/common/s3URLParts.go @@ -64,7 +64,7 @@ const s3EssentialHostPart = "amazonaws.com" var s3HostRegex = regexp.MustCompile(s3HostPattern) -// IsS3URL verfies if a given URL points to S3 URL supported by AzCopy-v10 +// IsS3URL verifies if a given URL points to S3 URL supported by AzCopy-v10 func IsS3URL(u url.URL) bool { if _, isS3URL := findS3URLMatches(strings.ToLower(u.Host)); isS3URL { return true @@ -102,7 +102,7 @@ func NewS3URLParts(u url.URL) (S3URLParts, error) { } // Check what's the path style, and parse accordingly. - if matchSlices[1] != "" { // Go's implementatoin is a bit strange, even if the first subexp fail to be matched, "" will be returned for that sub exp + if matchSlices[1] != "" { // Go's implementation is a bit strange, even if the first subexp fail to be matched, "" will be returned for that sub exp // In this case, it would be in virtual-hosted-style URL, and has host prefix like bucket.s3[-.] up.BucketName = matchSlices[1][:len(matchSlices[1])-1] // Removing the trailing '.' at the end up.ObjectKey = path diff --git a/common/version.go b/common/version.go index 6240c121d..55222b260 100644 --- a/common/version.go +++ b/common/version.go @@ -1,6 +1,6 @@ package common -const AzcopyVersion = "10.16.2" +const AzcopyVersion = "10.17.0" const UserAgent = "AzCopy/" + AzcopyVersion const S3ImportUserAgent = "S3Import " + UserAgent const GCPImportUserAgent = "GCPImport " + UserAgent diff --git a/common/writeThoughFile.go b/common/writeThoughFile.go index 82cc57444..16bc4f100 100644 --- a/common/writeThoughFile.go +++ b/common/writeThoughFile.go @@ -62,20 +62,9 @@ func CreateDirectoryIfNotExist(directory string, tracker FolderCreationTracker) CreateParentDirectoryIfNotExist(directory, tracker) // then create the directory - mkDirErr := os.Mkdir(directory, os.ModePerm) - - // if Mkdir succeeds, no error is dropped-- it is nil. - // therefore, returning here is perfectly acceptable as it either succeeds (or it doesn't) - if mkDirErr == nil { - // To run our folder overwrite logic, we have to know if this current job created the folder. - // As per the comments above, we are technically wrong here in a write-only scenario (maybe it already - // existed and our Stat failed). But using overwrite=false on a write-only destination doesn't make - // a lot of sense anyway. Yes, we'll make the wrong decision here in a write-only scenario, but we'll - // make the _same_ wrong overwrite decision for all the files too (not just folders). So this is, at least, - // consistent. - tracker.RecordCreation(directory) - return nil - } + mkDirErr := tracker.CreateFolder(directory, func() error { + return os.Mkdir(directory, os.ModePerm) + }) // another routine might have created the directory at the same time // check whether the directory now exists diff --git a/common/writeThoughFile_linux.go b/common/writeThoughFile_linux.go index 004041922..994755ad0 100644 --- a/common/writeThoughFile_linux.go +++ b/common/writeThoughFile_linux.go @@ -21,10 +21,143 @@ package common import ( + "encoding/binary" + "fmt" "os" "syscall" + "time" + + "github.com/pkg/xattr" + "golang.org/x/sys/unix" +) + +// Extended Attribute (xattr) keys for fetching various information from Linux cifs client. +const ( + CIFS_XATTR_CREATETIME = "user.cifs.creationtime" // File creation time. 
+ CIFS_XATTR_ATTRIB = "user.cifs.dosattrib" // FileAttributes. + CIFS_XATTR_CIFS_ACL = "system.cifs_acl" // DACL only. + CIFS_XATTR_CIFS_NTSD = "system.cifs_ntsd" // Owner, Group, DACL. + CIFS_XATTR_CIFS_NTSD_FULL = "system.cifs_ntsd_full" // Owner, Group, DACL, SACL. ) +// 100-nanosecond intervals from Windows Epoch (January 1, 1601) to Unix Epoch (January 1, 1970). +const ( + TICKS_FROM_WINDOWS_EPOCH_TO_UNIX_EPOCH = 116444736000000000 +) + +// windows.Filetime. +type Filetime struct { + LowDateTime uint32 + HighDateTime uint32 +} + +// windows.ByHandleFileInformation +type ByHandleFileInformation struct { + FileAttributes uint32 + CreationTime Filetime + LastAccessTime Filetime + LastWriteTime Filetime + VolumeSerialNumber uint32 + FileSizeHigh uint32 + FileSizeLow uint32 + NumberOfLinks uint32 + FileIndexHigh uint32 + FileIndexLow uint32 +} + +// Nanoseconds converts Filetime (as ticks since Windows Epoch) to nanoseconds since Unix Epoch (January 1, 1970). +func (ft *Filetime) Nanoseconds() int64 { + // 100-nanosecond intervals (ticks) since Windows Epoch (January 1, 1601). + nsec := int64(ft.HighDateTime)<<32 + int64(ft.LowDateTime) + + // 100-nanosecond intervals since Unix Epoch (January 1, 1970). + nsec -= TICKS_FROM_WINDOWS_EPOCH_TO_UNIX_EPOCH + + // nanoseconds since Unix Epoch. + return nsec * 100 +} + +// Convert nanoseconds since Unix Epoch (January 1, 1970) to Filetime since Windows Epoch (January 1, 1601). +func NsecToFiletime(nsec int64) Filetime { + // 100-nanosecond intervals since Unix Epoch (January 1, 1970). + nsec /= 100 + + // 100-nanosecond intervals since Windows Epoch (January 1, 1601). + nsec += TICKS_FROM_WINDOWS_EPOCH_TO_UNIX_EPOCH + + return Filetime{LowDateTime: uint32(nsec & 0xFFFFFFFF), HighDateTime: uint32(nsec >> 32)} +} + +// WindowsTicksToUnixNano converts ticks (100-ns intervals) since Windows Epoch to nanoseconds since Unix Epoch. +func WindowsTicksToUnixNano(ticks int64) int64 { + // 100-nanosecond intervals since Unix Epoch (January 1, 1970). + ticks -= TICKS_FROM_WINDOWS_EPOCH_TO_UNIX_EPOCH + + // nanoseconds since Unix Epoch (January 1, 1970). + return ticks * 100 +} + +// UnixNanoToWindowsTicks converts nanoseconds since Unix Epoch to ticks since Windows Epoch. +func UnixNanoToWindowsTicks(nsec int64) int64 { + // 100-nanosecond intervals since Unix Epoch (January 1, 1970). + nsec /= 100 + + // 100-nanosecond intervals since Windows Epoch (January 1, 1601). + nsec += TICKS_FROM_WINDOWS_EPOCH_TO_UNIX_EPOCH + + return nsec +} + +// StatxTimestampToFiletime converts the unix StatxTimestamp (sec, nsec) to the Windows' Filetime. +// Note that StatxTimestamp is from Unix Epoch while Filetime holds time from Windows Epoch. +func StatxTimestampToFiletime(ts unix.StatxTimestamp) Filetime { + return NsecToFiletime(ts.Sec*int64(time.Second) + int64(ts.Nsec)) +} + +func GetFileInformation(path string) (ByHandleFileInformation, error) { + var stx unix.Statx_t + + // We want all attributes including Btime (aka creation time). + // For consistency with Windows implementation we pass flags==0 which causes it to follow symlinks. + err := unix.Statx(unix.AT_FDCWD, path, 0 /* flags */, unix.STATX_ALL, &stx) + if err == unix.ENOSYS || err == unix.EPERM { + panic(fmt.Errorf("statx syscall is not available: %v", err)) + } else if err != nil { + return ByHandleFileInformation{}, fmt.Errorf("statx(%s) failed: %v", path, err) + } + + // For getting FileAttributes we need to query the CIFS_XATTR_ATTRIB extended attribute. 
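As a quick sanity check of the epoch arithmetic above: the Unix epoch lies 116444736000000000 ticks (100-ns units) after the Windows epoch, i.e. FILETIME high=0x019DB1DE, low=0xD53E8000. A small illustrative round-trip using the helpers added here (Linux-only, since they live in writeThoughFile_linux.go):

package main

import (
	"fmt"

	"github.com/Azure/azure-storage-azcopy/v10/common"
)

func main() {
	ft := common.NsecToFiletime(0) // 0 ns since the Unix epoch
	fmt.Printf("0x%08X 0x%08X\n", ft.HighDateTime, ft.LowDateTime) // 0x019DB1DE 0xD53E8000

	fmt.Println(ft.Nanoseconds())                                  // 0: round-trips back to the Unix epoch
	fmt.Println(common.UnixNanoToWindowsTicks(0))                  // 116444736000000000
	fmt.Println(common.WindowsTicksToUnixNano(116444736000000000)) // 0
}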
+ // Note: This doesn't necessarily cause a new QUERY_PATH_INFO call to the SMB server, instead + // the value cached in the inode (likely as a result of the above Statx call) will be + // returned. + xattrbuf, err := xattr.Get(path, CIFS_XATTR_ATTRIB) + if err != nil { + return ByHandleFileInformation{}, + fmt.Errorf("xattr.Get(%s, %s) failed: %v", path, CIFS_XATTR_ATTRIB, err) + } + + var info ByHandleFileInformation + + info.FileAttributes = binary.LittleEndian.Uint32(xattrbuf) + + info.CreationTime = StatxTimestampToFiletime(stx.Btime) + info.LastAccessTime = StatxTimestampToFiletime(stx.Atime) + info.LastWriteTime = StatxTimestampToFiletime(stx.Mtime) + + // TODO: Do we need this? + info.VolumeSerialNumber = 0 + + info.FileSizeHigh = uint32(stx.Size >> 32) + info.FileSizeLow = uint32(stx.Size & 0xFFFFFFFF) + + info.NumberOfLinks = stx.Nlink + + info.FileIndexHigh = uint32(stx.Ino >> 32) + info.FileIndexLow = uint32(stx.Ino & 0xFFFFFFFF) + + return info, nil +} + func CreateFileOfSizeWithWriteThroughOption(destinationPath string, fileSize int64, writeThrough bool, t FolderCreationTracker, forceIfReadOnly bool) (*os.File, error) { // forceIfReadOnly is not used on this OS diff --git a/common/writeThoughFile_windows.go b/common/writeThoughFile_windows.go index 4b6e3a910..d218bdf5d 100644 --- a/common/writeThoughFile_windows.go +++ b/common/writeThoughFile_windows.go @@ -223,7 +223,7 @@ func OpenWithWriteThroughSetting(path string, mode int, perm uint32, writeThroug return h, e } -// SetBackupMode optionally enables special priviledges on Windows. +// SetBackupMode optionally enables special privileges on Windows. // For a description, see https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/privileges // and https://superuser.com/a/1430372 // and run this: whoami /priv @@ -233,7 +233,9 @@ func OpenWithWriteThroughSetting(path string, mode int, perm uint32, writeThroug // 1. Uploading data where normal file system ACLs would prevent AzCopy from reading it. Simply run // AzCopy as an account that has SeBackupPrivilege (typically an administrator account using // an elevated command prompt, or a member of the "Backup Operators" group) -// and set the AzCopy flag for this routine to be called. +// +// and set the AzCopy flag for this routine to be called. +// // 2. Downloading where you are preserving SMB permissions, and some of the permissions include // owners that are NOT the same account as the one running AzCopy. 
Again, run AzCopy // from a elevated admin command prompt (or as a member of the "Backup Operators" group), diff --git a/e2etest/arm.go b/e2etest/arm.go index b01679f25..1cc3f9c62 100644 --- a/e2etest/arm.go +++ b/e2etest/arm.go @@ -4,7 +4,7 @@ import ( "encoding/json" "fmt" "github.com/Azure/go-autorest/autorest/adal" - "io/ioutil" + "io" "net/http" "reflect" "strconv" @@ -53,7 +53,7 @@ func ResolveAzureAsyncOperation(OAuth *adal.ServicePrincipalToken, uri string, p } var buf []byte - buf, err = ioutil.ReadAll(resp.Body) + buf, err = io.ReadAll(resp.Body) if err != nil { return nil, fmt.Errorf("failed to read response body (resp code 200): %w", err) } @@ -68,7 +68,7 @@ func ResolveAzureAsyncOperation(OAuth *adal.ServicePrincipalToken, uri string, p } if resp.StatusCode != 200 { - rBody, err := ioutil.ReadAll(resp.Body) + rBody, err := io.ReadAll(resp.Body) if err != nil { return nil, fmt.Errorf("failed to read response body (resp code %d): %w", resp.StatusCode, err) } diff --git a/e2etest/declarativeHelpers.go b/e2etest/declarativeHelpers.go index ea01fb8af..ef09d4622 100644 --- a/e2etest/declarativeHelpers.go +++ b/e2etest/declarativeHelpers.go @@ -157,7 +157,7 @@ type params struct { cancelFromStdin bool backupMode bool preserveSMBPermissions bool - preserveSMBInfo bool + preserveSMBInfo *bool preservePOSIXProperties bool relativeSourcePath string blobTags string @@ -171,6 +171,7 @@ type params struct { s2sPreserveAccessTier bool accessTier azblob.AccessTierType checkMd5 common.HashValidationOption + compareHash common.SyncHashType destNull bool @@ -504,6 +505,9 @@ type hookHelper interface { // GetTestFiles returns (a copy of) the testFiles object that defines which files will be used in the test GetTestFiles() testFiles + // SetTestFiles allows the test to set the test files in a callback (e.g. adding new files to the test dynamically w/o creation) + SetTestFiles(fs testFiles) + // CreateFiles creates the specified files (overwriting any that are already there of the same name) CreateFiles(fs testFiles, atSource bool, setTestFiles bool, createSourceFilesAtDest bool) diff --git a/e2etest/declarativeRunner.go b/e2etest/declarativeRunner.go index dc762ceab..e0615bb34 100644 --- a/e2etest/declarativeRunner.go +++ b/e2etest/declarativeRunner.go @@ -109,18 +109,18 @@ func RunScenarios( operations Operation, testFromTo TestFromTo, validate Validate, // TODO: do we really want the test author to have to nominate which validation should happen? Pros: better perf of tests. Cons: they have to tell us, and if they tell us wrong test may not test what they think it tests -// _ interface{}, // TODO if we want it??, blockBLobsOnly or specifc/all blob types + // _ interface{}, // TODO if we want it??, blockBlobsOnly or specific/all blob types -// It would be a pain to list out every combo by hand, -// In addition to the fact that not every credential type is sensible. -// Thus, the E2E framework takes in a requested set of credential types, and applies them where sensible. -// This allows you to make tests use OAuth only, SAS only, etc. + // It would be a pain to list out every combo by hand, + // In addition to the fact that not every credential type is sensible. + // Thus, the E2E framework takes in a requested set of credential types, and applies them where sensible. + // This allows you to make tests use OAuth only, SAS only, etc. 
requestedCredentialTypesSrc []common.CredentialType, requestedCredentialTypesDst []common.CredentialType, p params, hs *hooks, fs testFiles, -// TODO: do we need something here to explicitly say that we expect success or failure? For now, we are just inferring that from the elements of sourceFiles + // TODO: do we need something here to explicitly say that we expect success or failure? For now, we are just inferring that from the elements of sourceFiles destAccountType AccountType, srcAccountType AccountType, scenarioSuffix string) { diff --git a/e2etest/declarativeScenario.go b/e2etest/declarativeScenario.go index 73aa4d66a..d79f1525f 100644 --- a/e2etest/declarativeScenario.go +++ b/e2etest/declarativeScenario.go @@ -70,6 +70,31 @@ type scenarioState struct { func (s *scenario) Run() { defer s.cleanup() + // setup runner + azcopyDir, err := os.MkdirTemp("", "") + if err != nil { + s.a.Error(err.Error()) + return + } + azcopyRan := false + defer func() { + if os.Getenv("AZCOPY_E2E_LOG_OUTPUT") == "" { + s.a.Assert(os.RemoveAll(azcopyDir), equals(), nil) + return // no need, just delete logdir + } + + err := os.MkdirAll(os.Getenv("AZCOPY_E2E_LOG_OUTPUT"), os.ModePerm|os.ModeDir) + if err != nil { + s.a.Assert(err, equals(), nil) + return + } + if azcopyRan && s.a.Failed() { + s.uploadLogs(azcopyDir) + s.a.(*testingAsserter).t.Log("uploaded logs for job " + s.state.result.jobID.String() + " as an artifact") + } + }() + + // setup scenario // First, validate the accounts make sense for the source/dests if s.srcAccountType.IsBlobOnly() { s.a.Assert(s.fromTo.From(), equals(), common.ELocation.Blob()) @@ -97,14 +122,15 @@ func (s *scenario) Run() { } // execute - s.runAzCopy() + azcopyRan = true + s.runAzCopy(azcopyDir) if s.a.Failed() { return // execution failed. No point in running validation } // resume if needed if s.needResume { - tx, err := s.state.result.GetTransferList(common.ETransferStatus.Cancelled()) + tx, err := s.state.result.GetTransferList(common.ETransferStatus.Cancelled(), azcopyDir) s.a.AssertNoErr(err, "Failed to get transfer list for Cancelled") s.a.Assert(len(tx), equals(), len(s.p.debugSkipFiles), "Job cancel didn't completely work") @@ -112,14 +138,14 @@ func (s *scenario) Run() { return } - s.resumeAzCopy() + s.resumeAzCopy(azcopyDir) } if s.a.Failed() { return // resume failed. No point in running validation } // check - s.validateTransferStates() + s.validateTransferStates(azcopyDir) if s.a.Failed() { return // no point in doing more validation } @@ -138,6 +164,13 @@ func (s *scenario) Run() { s.runHook(s.hs.afterValidation) } +func (s *scenario) uploadLogs(logDir string) { + if s.state.result == nil || os.Getenv("AZCOPY_E2E_LOG_OUTPUT") == "" { + return // nothing to upload + } + s.a.Assert(os.Rename(logDir, filepath.Join(os.Getenv("AZCOPY_E2E_LOG_OUTPUT"), s.state.result.jobID.String())), equals(), nil) +} + func (s *scenario) runHook(h hookFunc) bool { if h == nil { return true // nothing to do. 
So "successful" @@ -179,10 +212,10 @@ func (s *scenario) assignSourceAndDest() { return &resourceBlobContainer{accountType: s.destAccountType} } case common.ELocation.BlobFS(): - s.a.Error("Not implementd yet for blob FS") + s.a.Error("Not implemented yet for blob FS") return &resourceDummy{} case common.ELocation.S3(): - s.a.Error("Not implementd yet for S3") + s.a.Error("Not implemented yet for S3") return &resourceDummy{} case common.ELocation.Unknown(): return &resourceDummy{} @@ -195,7 +228,7 @@ func (s *scenario) assignSourceAndDest() { s.state.dest = createTestResource(s.fromTo.To(), false) } -func (s *scenario) runAzCopy() { +func (s *scenario) runAzCopy(logDirectory string) { s.chToStdin = make(chan string) // unubuffered seems the most predictable for our usages defer close(s.chToStdin) @@ -223,9 +256,9 @@ func (s *scenario) runAzCopy() { result, wasClean, err := r.ExecuteAzCopyCommand( s.operation, s.state.source.getParam(s.stripTopDir, needsSAS(s.credTypes[0]), tf.objectTarget), - s.state.dest.getParam(false, needsSAS(s.credTypes[1]), common.IffString(tf.destTarget != "", tf.destTarget, tf.objectTarget)), - s.credTypes[0].IsAzureOAuth() || s.credTypes[1].IsAzureOAuth(), // needsOAuth - afterStart, s.chToStdin) + s.state.dest.getParam(false, needsSAS(s.credTypes[1]), common.IffString(tf.destTarget != "", tf.destTarget, tf.objectTarget)), + s.credTypes[0] == common.ECredentialType.OAuthToken() || s.credTypes[1] == common.ECredentialType.OAuthToken(), // needsOAuth + afterStart, s.chToStdin, logDirectory) if !wasClean { s.a.AssertNoErr(err, "running AzCopy") @@ -243,7 +276,7 @@ func (s *scenario) runAzCopy() { s.state.result = &result } -func (s *scenario) resumeAzCopy() { +func (s *scenario) resumeAzCopy(logDir string) { s.chToStdin = make(chan string) // unubuffered seems the most predictable for our usages defer close(s.chToStdin) @@ -274,6 +307,7 @@ func (s *scenario) resumeAzCopy() { false, afterStart, s.chToStdin, + logDir, ) if !wasClean { @@ -295,7 +329,7 @@ func (s *scenario) validateRemove() { } } } -func (s *scenario) validateTransferStates() { +func (s *scenario) validateTransferStates(azcopyDir string) { if s.operation == eOperation.Remove() { s.validateRemove() return @@ -318,7 +352,7 @@ func (s *scenario) validateTransferStates() { // Is that OK? 
(Not sure what to do if it's not, because azcopy jobs show, apparently doesn't offer us a way to get the skipped list) } { expectedTransfers := s.fs.getForStatus(statusToTest, expectFolders, expectRootFolder) - actualTransfers, err := s.state.result.GetTransferList(statusToTest) + actualTransfers, err := s.state.result.GetTransferList(statusToTest, azcopyDir) s.a.AssertNoErr(err) Validator{}.ValidateCopyTransfersAreScheduled(s.a, isSrcEncoded, isDstEncoded, srcRoot, dstRoot, expectedTransfers, actualTransfers, statusToTest, s.FromTo(), s.srcAccountType, s.destAccountType) @@ -633,6 +667,10 @@ func (s *scenario) GetTestFiles() testFiles { return s.fs } +func (s *scenario) SetTestFiles(fs testFiles) { + s.fs = fs +} + func (s *scenario) CreateFiles(fs testFiles, atSource bool, setTestFiles bool, createSourceFilesAtDest bool) { original := s.fs s.fs = fs diff --git a/e2etest/declarativeTestFiles.go b/e2etest/declarativeTestFiles.go index 49529072f..41a607b4f 100644 --- a/e2etest/declarativeTestFiles.go +++ b/e2etest/declarativeTestFiles.go @@ -55,8 +55,10 @@ func (h *contentHeaders) DeepCopy() *contentHeaders { ret.contentEncoding = h.contentEncoding ret.contentLanguage = h.contentLanguage ret.contentType = h.contentType - ret.contentMD5 = make([]byte, len(h.contentMD5)) - copy(ret.contentMD5, h.contentMD5) + if h.contentMD5 != nil { + ret.contentMD5 = make([]byte, len(h.contentMD5)) + copy(ret.contentMD5, h.contentMD5) + } return &ret } @@ -184,6 +186,8 @@ type testObject struct { name string expectedFailureMessage string // the failure message that we expect to see in the log for this file/folder (only populated for expected failures) + body []byte + // info to be used at creation time. Usually, creationInfo and and verificationInfo will be the same // I.e. we expect the creation properties to be preserved. But, for flexibility, they can be set to something different. creationProperties objectProperties @@ -197,6 +201,11 @@ func (t *testObject) DeepCopy() *testObject { ret.expectedFailureMessage = t.expectedFailureMessage ret.creationProperties = t.creationProperties.DeepCopy() + if t.body != nil { + ret.body = make([]byte, len(t.body)) + copy(ret.body, t.body) + } + if t.verificationProperties != nil { vp := (*t.verificationProperties).DeepCopy() ret.verificationProperties = &vp @@ -366,7 +375,8 @@ func (*testFiles) copyList(src []interface{}) []interface{} { // takes a mixed list of (potentially) strings and testObjects, and returns them all as test objects // TODO: do we want to continue supporting plain strings in the expectation file lists (for convenience of test coders) -// or force them to use f() for every file? +// +// or force them to use f() for every file? 
func (*testFiles) toTestObjects(rawList []interface{}, isFail bool) []*testObject { result := make([]*testObject, 0, len(rawList)) for _, r := range rawList { diff --git a/e2etest/factory.go b/e2etest/factory.go index 382bddeb5..5217efca0 100644 --- a/e2etest/factory.go +++ b/e2etest/factory.go @@ -23,8 +23,8 @@ package e2etest import ( "context" "fmt" - "io/ioutil" "net/url" + "os" "path" "runtime" "strings" @@ -184,7 +184,7 @@ func (TestResourceFactory) CreateNewFileShareSnapshot(c asserter, fileShare azfi } func (TestResourceFactory) CreateLocalDirectory(c asserter) (dstDirName string) { - dstDirName, err := ioutil.TempDir("", "AzCopyLocalTest") + dstDirName, err := os.MkdirTemp("","AzCopyLocalTest") c.AssertNoErr(err) return } diff --git a/e2etest/managedDisks.go b/e2etest/managedDisks.go index 5a19290aa..e7166dfb6 100644 --- a/e2etest/managedDisks.go +++ b/e2etest/managedDisks.go @@ -4,7 +4,7 @@ import ( "bytes" "encoding/json" "fmt" - "io/ioutil" + "io" "net/http" "net/url" "path" @@ -77,7 +77,7 @@ func (config *ManagedDiskConfig) GetAccess() (*url.URL, error) { return nil, fmt.Errorf("failed to get access (async op): %w", err) } } else { // error - rBody, err := ioutil.ReadAll(resp.Body) + rBody, err := io.ReadAll(resp.Body) if err != nil { return nil, fmt.Errorf("failed to read response body (resp code %d): %w", resp.StatusCode, err) } @@ -85,7 +85,7 @@ func (config *ManagedDiskConfig) GetAccess() (*url.URL, error) { return nil, fmt.Errorf("failed to get access (resp code %d): %s", resp.StatusCode, string(rBody)) } } else { // immediate response - rBody, err := ioutil.ReadAll(resp.Body) + rBody, err := io.ReadAll(resp.Body) if err != nil { return nil, fmt.Errorf("failed to read response body: %w", err) } @@ -132,7 +132,7 @@ func (config *ManagedDiskConfig) RevokeAccess() error { return err } - rBody, err := ioutil.ReadAll(resp.Body) + rBody, err := io.ReadAll(resp.Body) if err != nil { return fmt.Errorf("failed to read response body (resp code %d): %w", resp.StatusCode, err) } diff --git a/e2etest/pointers.go b/e2etest/pointers.go new file mode 100644 index 000000000..15aa1fea8 --- /dev/null +++ b/e2etest/pointers.go @@ -0,0 +1,6 @@ +package e2etest + +// todo: upgrade to go 1.18 and use generics +func BoolPointer(b bool) *bool { + return &b +} diff --git a/e2etest/runner.go b/e2etest/runner.go index 895ce3c65..27b397683 100644 --- a/e2etest/runner.go +++ b/e2etest/runner.go @@ -26,6 +26,8 @@ import ( "fmt" "os" "os/exec" + "path/filepath" + "reflect" "strconv" "strings" @@ -58,6 +60,17 @@ func (t *TestRunner) SetAllFlags(p params, o Operation) { return // nothing to do. The flag is not supposed to be set } + reflectVal := reflect.ValueOf(value) // check for pointer + if reflectVal.Kind() == reflect.Pointer { + result := reflectVal.Elem() // attempt to deref + + if result != (reflect.Value{}) && result.CanInterface() { // can we grab the underlying value? 
+ value = result.Interface() + } else { + return // nothing to use + } + } + format := "%v" if len(formats) > 0 { format = formats[0] @@ -65,6 +78,7 @@ func (t *TestRunner) SetAllFlags(p params, o Operation) { t.flags[key] = fmt.Sprintf(format, value) } + set("log-level", "debug", "debug") // TODO: TODO: nakulkar-msft there will be many more to add here set("recursive", p.recursive, false) @@ -82,7 +96,7 @@ func (t *TestRunner) SetAllFlags(p params, o Operation) { set("s2s-detect-source-changed", p.s2sSourceChangeValidation, false) set("metadata", p.metadata, "") set("cancel-from-stdin", p.cancelFromStdin, false) - set("preserve-smb-info", p.preserveSMBInfo, false) + set("preserve-smb-info", p.preserveSMBInfo, nil) set("preserve-smb-permissions", p.preserveSMBPermissions, false) set("backup", p.backupMode, false) set("blob-tags", p.blobTags, "") @@ -98,6 +112,7 @@ func (t *TestRunner) SetAllFlags(p params, o Operation) { set("preserve-posix-properties", p.preservePOSIXProperties, "") } else if o == eOperation.Sync() { set("preserve-posix-properties", p.preservePOSIXProperties, false) + set("compare-hash", p.compareHash.String(), "None") } } @@ -142,7 +157,7 @@ func (t *TestRunner) execDebuggableWithOutput(name string, args []string, env [] runErr := c.Start() if runErr == nil { defer func() { - _ = c.Process.Kill() // in case we never finish c.Wait() below, and get paniced or killed + _ = c.Process.Kill() // in case we never finish c.Wait() below, and get panicked or killed }() if debug { @@ -184,7 +199,7 @@ func (t *TestRunner) execDebuggableWithOutput(name string, args []string, env [] return stdout.Bytes(), runErr } -func (t *TestRunner) ExecuteAzCopyCommand(operation Operation, src, dst string, needsOAuth bool, afterStart func() string, chToStdin <-chan string) (CopyOrSyncCommandResult, bool, error) { +func (t *TestRunner) ExecuteAzCopyCommand(operation Operation, src, dst string, needsOAuth bool, afterStart func() string, chToStdin <-chan string, logDir string) (CopyOrSyncCommandResult, bool, error) { capLen := func(b []byte) []byte { if len(b) < 1024 { return b @@ -234,6 +249,11 @@ func (t *TestRunner) ExecuteAzCopyCommand(operation Operation, src, dst string, } } + if logDir != "" { + env = append(env, "AZCOPY_LOG_LOCATION="+logDir) + env = append(env, "AZCOPY_JOB_PLAN_LOCATION="+filepath.Join(logDir, "plans")) + } + out, err := t.execDebuggableWithOutput(GlobalInputManager{}.GetExecutablePath(), args, env, afterStart, chToStdin) wasClean := true @@ -268,9 +288,15 @@ func (t *TestRunner) SetTransferStatusFlag(value string) { t.flags["with-status"] = value } -func (t *TestRunner) ExecuteJobsShowCommand(jobID common.JobID) (JobsShowCommandResult, error) { +func (t *TestRunner) ExecuteJobsShowCommand(jobID common.JobID, azcopyDir string) (JobsShowCommandResult, error) { args := append([]string{"jobs", "show", jobID.String()}, t.computeArgs()...) - out, err := exec.Command(GlobalInputManager{}.GetExecutablePath(), args...).Output() + cmd := exec.Command(GlobalInputManager{}.GetExecutablePath(), args...) 
+ + if azcopyDir != "" { + cmd.Env = append(cmd.Env, "AZCOPY_JOB_PLAN_LOCATION="+filepath.Join(azcopyDir, "plans")) + } + + out, err := cmd.Output() if err != nil { return JobsShowCommandResult{}, err } @@ -307,12 +333,12 @@ func newCopyOrSyncCommandResult(rawOutput string) (CopyOrSyncCommandResult, bool return CopyOrSyncCommandResult{jobID: jobSummary.JobID, finalStatus: jobSummary}, true } -func (c *CopyOrSyncCommandResult) GetTransferList(status common.TransferStatus) ([]common.TransferDetail, error) { +func (c *CopyOrSyncCommandResult) GetTransferList(status common.TransferStatus, azcopyDir string) ([]common.TransferDetail, error) { runner := newTestRunner() runner.SetTransferStatusFlag(status.String()) // invoke AzCopy to get the status from the plan files - result, err := runner.ExecuteJobsShowCommand(c.jobID) + result, err := runner.ExecuteJobsShowCommand(c.jobID, azcopyDir) if err != nil { return make([]common.TransferDetail, 0), err } diff --git a/e2etest/scenario_helpers.go b/e2etest/scenario_helpers.go index 10983d021..2ad0157b3 100644 --- a/e2etest/scenario_helpers.go +++ b/e2etest/scenario_helpers.go @@ -23,10 +23,13 @@ package e2etest import ( + "bytes" "context" "crypto/md5" + "encoding/base64" "fmt" - "io/ioutil" + "github.com/google/uuid" + "io" "net/url" "os" "path" @@ -67,7 +70,7 @@ var specialNames = []string{ // note: this is to emulate the list-of-files flag // nolint func (scenarioHelper) generateListOfFiles(c asserter, fileList []string) (path string) { - parentDirName, err := ioutil.TempDir("", "AzCopyLocalTest") + parentDirName, err := os.MkdirTemp("", "AzCopyLocalTest") c.AssertNoErr(err) // create the file @@ -77,22 +80,24 @@ func (scenarioHelper) generateListOfFiles(c asserter, fileList []string) (path s // pipe content into it content := strings.Join(fileList, "\n") - err = ioutil.WriteFile(path, []byte(content), common.DEFAULT_FILE_PERM) + err = os.WriteFile(path, []byte(content), common.DEFAULT_FILE_PERM) c.AssertNoErr(err) return } // nolint func (scenarioHelper) generateLocalDirectory(c asserter) (dstDirName string) { - dstDirName, err := ioutil.TempDir("", "AzCopyLocalTest") + dstDirName, err := os.MkdirTemp("", "AzCopyLocalTest") c.AssertNoErr(err) return } // create a test file -func (scenarioHelper) generateLocalFile(filePath string, fileSize int) ([]byte, error) { - // generate random data - _, bigBuff := getRandomDataAndReader(fileSize) +func (scenarioHelper) generateLocalFile(filePath string, fileSize int, body []byte) ([]byte, error) { + if body == nil { + // generate random data + _, body = getRandomDataAndReader(fileSize) + } // create all parent directories err := os.MkdirAll(filepath.Dir(filePath), os.ModePerm) @@ -101,8 +106,8 @@ func (scenarioHelper) generateLocalFile(filePath string, fileSize int) ([]byte, } // write to file and return the data - err = ioutil.WriteFile(filePath, bigBuff, common.DEFAULT_FILE_PERM) - return bigBuff, err + err = os.WriteFile(filePath, body, common.DEFAULT_FILE_PERM) + return body, err } type generateLocalFilesFromList struct { @@ -129,12 +134,15 @@ func (s scenarioHelper) generateLocalFilesFromList(c asserter, options *generate } else { sourceData, err := s.generateLocalFile( filepath.Join(options.dirPath, file.name), - file.creationProperties.sizeBytes(c, options.defaultSize)) - contentMD5 := md5.Sum(sourceData) + file.creationProperties.sizeBytes(c, options.defaultSize), file.body) if file.creationProperties.contentHeaders == nil { file.creationProperties.contentHeaders = &contentHeaders{} } - 
file.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + + if file.creationProperties.contentHeaders.contentMD5 == nil { + contentMD5 := md5.Sum(sourceData) + file.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + } c.AssertNoErr(err) // TODO: You'll need to set up things like attributes, and other relevant things from @@ -212,7 +220,7 @@ func (s scenarioHelper) generateCommonRemoteScenarioForLocal(c asserter, dirPath for j, name := range batch { fileList[5*i+j] = name - _, err := s.generateLocalFile(filepath.Join(dirPath, name), defaultFileSize) + _, err := s.generateLocalFile(filepath.Join(dirPath, name), defaultFileSize, nil) c.AssertNoErr(err) } } @@ -371,14 +379,24 @@ func (scenarioHelper) generateBlobsFromList(c asserter, options *generateBlobFro continue // no real folders in blob } ad := blobResourceAdapter{b} - reader, sourceData := getRandomDataAndReader(b.creationProperties.sizeBytes(c, options.defaultSize)) + var reader *bytes.Reader + var sourceData []byte + if b.body != nil { + reader = bytes.NewReader(b.body) + sourceData = b.body + } else { + reader, sourceData = getRandomDataAndReader(b.creationProperties.sizeBytes(c, options.defaultSize)) + b.body = sourceData // set body + } // Setting content MD5 - contentMD5 := md5.Sum(sourceData) if ad.obj.creationProperties.contentHeaders == nil { b.creationProperties.contentHeaders = &contentHeaders{} } - ad.obj.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + if ad.obj.creationProperties.contentHeaders.contentMD5 == nil { + contentMD5 := md5.Sum(sourceData) + ad.obj.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + } tags := ad.toBlobTags() @@ -387,7 +405,6 @@ func (scenarioHelper) generateBlobsFromList(c asserter, options *generateBlobFro } headers := ad.toHeaders() - headers.ContentMD5 = contentMD5[:] var err error @@ -399,19 +416,47 @@ func (scenarioHelper) generateBlobsFromList(c asserter, options *generateBlobFro options.accessTier = azblob.DefaultAccessTier } - cResp, err := bb.Upload(ctx, - reader, - headers, - ad.toMetadata(), - azblob.BlobAccessConditions{}, - options.accessTier, - tags, - common.ToClientProvidedKeyOptions(options.cpkInfo, options.cpkScopeInfo), - azblob.ImmutabilityPolicyOptions{}, - ) + if reader.Size() > 0 { + // to prevent the service from erroring out with an improper MD5, we opt to commit a block, then the list. + blockID := base64.StdEncoding.EncodeToString([]byte(uuid.NewString())) + sResp, err := bb.StageBlock(ctx, + blockID, + reader, + azblob.LeaseAccessConditions{}, + nil, + common.ToClientProvidedKeyOptions(options.cpkInfo, options.cpkScopeInfo)) - c.AssertNoErr(err) - c.Assert(cResp.StatusCode(), equals(), 201) + c.AssertNoErr(err) + c.Assert(sResp.StatusCode(), equals(), 201) + + cResp, err := bb.CommitBlockList(ctx, + []string{blockID}, + headers, + ad.toMetadata(), + azblob.BlobAccessConditions{}, + options.accessTier, + ad.toBlobTags(), + common.ToClientProvidedKeyOptions(options.cpkInfo, options.cpkScopeInfo), + azblob.ImmutabilityPolicyOptions{}, + ) + + c.AssertNoErr(err) + c.Assert(cResp.StatusCode(), equals(), 201) + } else { // todo: invalid MD5 on empty blob is impossible like this, but it's doubtful we'll need to support it. 
+ // handle empty blobs + cResp, err := bb.Upload(ctx, + reader, + headers, + ad.toMetadata(), + azblob.BlobAccessConditions{}, + options.accessTier, + ad.toBlobTags(), + common.ToClientProvidedKeyOptions(options.cpkInfo, options.cpkScopeInfo), + azblob.ImmutabilityPolicyOptions{}) + + c.AssertNoErr(err) + c.Assert(cResp.StatusCode(), equals(), 201) + } case common.EBlobType.PageBlob(): pb := options.containerURL.NewPageBlobURL(b.name) cResp, err := pb.Create(ctx, reader.Size(), 0, headers, ad.toMetadata(), azblob.BlobAccessConditions{}, azblob.DefaultPremiumBlobAccessTier, tags, common.ToClientProvidedKeyOptions(options.cpkInfo, options.cpkScopeInfo), azblob.ImmutabilityPolicyOptions{}) @@ -529,7 +574,7 @@ func (s scenarioHelper) downloadBlobContent(a asserter, options downloadContentO retryReader := downloadResp.Body(azblob.RetryReaderOptions{}) defer retryReader.Close() - destData, err := ioutil.ReadAll(retryReader) + destData, err := io.ReadAll(retryReader) a.AssertNoErr(err) return destData[:] } @@ -653,7 +698,7 @@ func (scenarioHelper) generateCommonRemoteScenarioForS3(c asserter, client *mini objectName5 := createNewObject(c, client, bucketName, prefix+specialNames[i]) // Note: common.AZCOPY_PATH_SEPARATOR_STRING is added before bucket or objectName, as in the change minimize JobPartPlan file size, - // transfer.Source & transfer.Destination(after trimed the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, + // transfer.Source & transfer.Destination(after trimming the SourceRoot and DestinationRoot) are with AZCOPY_PATH_SEPARATOR_STRING suffix, // when user provided source & destination are without / suffix, which is the case for scenarioHelper generated URL. bucketPath := "" @@ -717,7 +762,7 @@ func (scenarioHelper) generateAzureFilesFromList(c asserter, options *generateAz // set other properties // TODO: do we need a SetProperties method on dir...? 
Discuss with zezha-msft if f.creationProperties.creationTime != nil { - panic("setting these properties isn't implmented yet for folders in the test harnesss") + panic("setting these properties isn't implemented yet for folders in the test harness") // TODO: nakulkar-msft the attributes stuff will need to be implemented here before attributes can be tested on Azure Files } @@ -731,19 +776,23 @@ func (scenarioHelper) generateAzureFilesFromList(c asserter, options *generateAz // create the file itself fileSize := int64(f.creationProperties.sizeBytes(c, options.defaultSize)) - contentR, contentD := getRandomDataAndReader(int(fileSize)) - contentMD5 := md5.Sum(contentD) + var contentR *bytes.Reader + var contentD []byte + if f.body != nil { + contentR = bytes.NewReader(f.body) + contentD = f.body + } else { + contentR, contentD = getRandomDataAndReader(int(fileSize)) + } if f.creationProperties.contentHeaders == nil { f.creationProperties.contentHeaders = &contentHeaders{} } - f.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + if f.creationProperties.contentHeaders.contentMD5 == nil { + contentMD5 := md5.Sum(contentD) + f.creationProperties.contentHeaders.contentMD5 = contentMD5[:] + } - // if f.verificationProperties.contentHeaders == nil { - // f.verificationProperties.contentHeaders = &contentHeaders{} - // } - // f.verificationProperties.contentHeaders.contentMD5 = contentMD5[:] headers := ad.toHeaders(c, options.shareURL) - headers.ContentMD5 = contentMD5[:] cResp, err := file.Create(ctx, fileSize, headers, ad.toMetadata()) c.AssertNoErr(err) @@ -910,7 +959,7 @@ func (s scenarioHelper) downloadFileContent(a asserter, options downloadContentO retryReader := downloadResp.Body(azfile.RetryReaderOptions{}) defer retryReader.Close() // The client must close the response body when finished with it - destData, err := ioutil.ReadAll(retryReader) + destData, err := io.ReadAll(retryReader) a.AssertNoErr(err) downloadResp.Body(azfile.RetryReaderOptions{}) return destData diff --git a/e2etest/scenario_os_helpers_for_windows.go b/e2etest/scenario_os_helpers_for_windows.go index 7a2206e35..ad7fde9ae 100644 --- a/e2etest/scenario_os_helpers_for_windows.go +++ b/e2etest/scenario_os_helpers_for_windows.go @@ -39,7 +39,7 @@ import ( type osScenarioHelper struct{} // set file attributes to test file -func (osScenarioHelper) setAttributesForLocalFile(filePath string, attrList []string) error { +func (osScenarioHelper) setAttributesForLocalFile(filePath string, attrList []string) error { //nolint:golint,unused lpFilePath, err := syscall.UTF16PtrFromString(filePath) if err != nil { return err @@ -65,7 +65,7 @@ func (osScenarioHelper) setAttributesForLocalFile(filePath string, attrList []st return err } -func (s osScenarioHelper) setAttributesForLocalFiles(c asserter, dirPath string, fileList []string, attrList []string) { +func (s osScenarioHelper) setAttributesForLocalFiles(c asserter, dirPath string, fileList []string, attrList []string) { //nolint:golint,unused for _, fileName := range fileList { err := s.setAttributesForLocalFile(filepath.Join(dirPath, fileName), attrList) c.AssertNoErr(err) diff --git a/e2etest/zt_basic_copy_sync_remove_test.go b/e2etest/zt_basic_copy_sync_remove_test.go index 1ed82283c..0378c9130 100644 --- a/e2etest/zt_basic_copy_sync_remove_test.go +++ b/e2etest/zt_basic_copy_sync_remove_test.go @@ -21,9 +21,9 @@ package e2etest import ( - "testing" - "github.com/Azure/azure-storage-azcopy/v10/common" + "testing" + "time" ) // ================================ Copy And 
Sync: Upload, Download, and S2S ========================================= @@ -336,3 +336,279 @@ func TestBasic_CopyWithShareRoot(t *testing.T) { "", ) } + +// TestBasic_HashBasedSync_Folders validates that folders appropriately use LMT when hash based sync is enabled +func TestBasic_HashBasedSync_Folders(t *testing.T) { + RunScenarios( + t, + eOperation.Sync(), + eTestFromTo.Other(common.EFromTo.FileFile(), common.EFromTo.FileLocal()), // test both dest and source comparators + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ +recursive: true, + compareHash: common.ESyncHashType.MD5(), + }, + &hooks{ + beforeRunJob: func(h hookHelper) { // set up source to overwrite dest + newFiles := testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + folder("overwrite me"), + folder("not duplicate"), + }, + shouldSkip: []interface{}{ + folder("do not overwrite me"), + }, + } + + h.SetTestFiles(newFiles) + + target := newFiles.shouldTransfer[1].(*testObject) // overwrite me + + h.CreateFile(target, false) // create destination before source to prefer overwrite + time.Sleep(5 * time.Second) + h.CreateFile(target, true) + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + folder("not duplicate"), + }, + shouldSkip: []interface{}{ + folder("do not overwrite me"), + }, + }, + EAccountType.Standard(), + EAccountType.Standard(), + "", + ) +} + +func TestBasic_HashBasedSync_S2S(t *testing.T) { + RunScenarios( + t, + eOperation.Sync(), + eTestFromTo.Other(common.EFromTo.BlobBlob()), + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ + recursive: true, + compareHash: common.ESyncHashType.MD5(), + }, + &hooks{ + beforeRunJob: func(h hookHelper) { + h.CreateFile(f("overwriteme.txt"), false) // will have a different hash, and get overwritten. + + existingBody := []byte("foobar") + existingObject := f("skipme-exists.txt") + existingObject.body = existingBody + + h.CreateFile(existingObject, true) + h.CreateFile(existingObject, false) + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + f("asdf.txt"), + f("overwriteme.txt"), // create at destination with different hash + }, + shouldSkip: []interface{}{ + f("skipme-exists.txt"), // create at destination + }, + }, + EAccountType.Standard(), + EAccountType.Standard(), + "", + ) +} + +func TestBasic_HashBasedSync_UploadDownload(t *testing.T) { + RunScenarios( + t, + eOperation.Sync(), + eTestFromTo.Other(common.EFromTo.LocalBlob(), common.EFromTo.LocalFile(), common.EFromTo.BlobLocal(), common.EFromTo.FileLocal()), // no need to run every endpoint again + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ + recursive: true, + compareHash: common.ESyncHashType.MD5(), + }, + &hooks{ + beforeRunJob: func(h hookHelper) { + h.CreateFile(f("overwriteme.txt"), false) // will have a different hash, and get overwritten. 
+ + existingBody := []byte("foobar") + existingObject := f("skipme-exists.txt") + existingObject.body = existingBody + + h.CreateFile(existingObject, true) + h.CreateFile(existingObject, false) + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + f("asdf.txt"), + f("overwriteme.txt"), // create at destination with different hash + }, + shouldSkip: []interface{}{ + f("skipme-exists.txt"), // create at destination + }, + }, + EAccountType.Standard(), + EAccountType.Standard(), + "", + ) +} + +func TestBasic_OverwriteHNSDirWithChildren(t *testing.T) { + RunScenarios( + t, + eOperation.Copy(), + eTestFromTo.Other(common.EFromTo.LocalBlobFS()), + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ + recursive: true, + preserveSMBPermissions: true, + }, + &hooks{ + beforeRunJob: func(h hookHelper) { + h.CreateFiles( + testFiles{ + defaultSize: "1K", + shouldSkip: []interface{}{ + folder("overwrite"), //create folder to overwrite, with no perms so it can be correctly detected later. + f("overwrite/a"), // place file under folder to re-create conditions + }, + }, + false, // create dest + false, // do not set test files + false, // create only shouldSkip here + ) + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + // overwrite with an ACL to ensure overwrite worked + folder("overwrite", with{adlsPermissionsACL: "user::rwx,group::rwx,other::-w-"}), + }, + }, + EAccountType.HierarchicalNamespaceEnabled(), + EAccountType.HierarchicalNamespaceEnabled(), + "", + ) +} + +func TestBasic_SyncLMTSwitch_PreferServiceLMT(t *testing.T) { + RunScenarios( + t, + eOperation.Sync(), + eTestFromTo.Other(common.EFromTo.FileFile()), + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ + preserveSMBInfo: BoolPointer(false), + }, + &hooks{ + beforeRunJob: func(h hookHelper) { + // re-create dotransfer on the destination before the source to allow an overwrite. + // create the files endpoint with an LMT in the future. + fromTo := h.FromTo() + if fromTo.To() == common.ELocation.File() { + // if we're ignoring the SMB LMT, then the service LMT will still indicate the file is old, rather than new. + h.CreateFile(f("dotransfer", with{lastWriteTime: time.Now().Add(time.Second * 60)}), false) + } else { + h.CreateFile(f("dotransfer"), false) + } + time.Sleep(time.Second * 5) + if fromTo.From() == common.ELocation.File() { + // if we're ignoring the SMB LMT, then the service LMT will indicate the destination is older, not newer. + h.CreateFile(f("dotransfer", with{lastWriteTime: time.Now().Add(-time.Second * 60)}), true) + } else { + h.CreateFile(f("dotransfer"), true) + } + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + f("dotransfer"), + }, + shouldSkip: []interface{}{ + f("donottransfer"), // "real"/service LMT should be out of date + }, + }, + EAccountType.Standard(), + EAccountType.Standard(), + "", + ) +} + +func TestBasic_SyncLMTSwitch_PreferSMBLMT(t *testing.T) { + RunScenarios( + t, + eOperation.Sync(), + eTestFromTo.Other(common.EFromTo.FileFile()), + eValidate.Auto(), + anonymousAuthOnly, + anonymousAuthOnly, + params{ + // enforce for Linux/MacOS tests + preserveSMBInfo: BoolPointer(true), + }, + &hooks{ + beforeRunJob: func(h hookHelper) { + /* + In a typical scenario, the source is written before the destination. + This way, the destination is always skipped in the case of overwrite on Sync. 
+ + In this case, because we distinctly DO NOT want to test the service LMT, we'll create the destination before the source. + But, we'll create those files with an SMB LMT that would lead to a skipped file. + */ + + newTestFiles := testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + f("do overwrite"), + }, + shouldSkip: []interface{}{ + f("do not overwrite"), + }, + } + + // create do not overwrite in the future, so that it does not get overwritten + h.CreateFile(f("do not overwrite", with{lastWriteTime: time.Now().Add(time.Second * 60)}), false) + // create do overwrite in the past, so that it does get overwritten + h.CreateFile(f("do overwrite", with{lastWriteTime: time.Now().Add(-time.Second * 60)}), false) + time.Sleep(time.Second * 5) + h.CreateFiles(newTestFiles, true, true, false) + }, + }, + testFiles{ + defaultSize: "1K", + shouldTransfer: []interface{}{ + folder(""), + }, + }, + EAccountType.Standard(), + EAccountType.Standard(), + "", + ) +} diff --git a/e2etest/zt_copy_file_smb_test.go b/e2etest/zt_copy_file_smb_test.go index a1800a226..b6b90b724 100644 --- a/e2etest/zt_copy_file_smb_test.go +++ b/e2etest/zt_copy_file_smb_test.go @@ -10,8 +10,10 @@ import ( func TestSMB_FromShareSnapshot(t *testing.T) { RunScenarios(t, eOperation.Copy(), eTestFromTo.Other(common.EFromTo.FileFile()), eValidate.AutoPlusContent(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, preserveSMBPermissions: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, &hooks{ // create a snapshot for the source share beforeRunJob: func(h hookHelper) { @@ -39,7 +41,7 @@ func TestSMB_ToDevNull(t *testing.T) { params{ recursive: true, preserveSMBPermissions: isWindows, - preserveSMBInfo: isWindows, + preserveSMBInfo: BoolPointer(isWindows), checkMd5: common.EHashValidationOption.FailIfDifferent(), destNull: true, }, diff --git a/e2etest/zt_preserve_properties_test.go b/e2etest/zt_preserve_properties_test.go index a2d7bec09..d94a9514e 100644 --- a/e2etest/zt_preserve_properties_test.go +++ b/e2etest/zt_preserve_properties_test.go @@ -42,10 +42,10 @@ func TestProperties_NameValueMetadataIsPreservedS2S(t *testing.T) { } func TestProperties_NameValueMetadataCanBeUploaded(t *testing.T) { - expectedMap := map[string]string{"foo": "abc", "bar": "def"} + expectedMap := map[string]string{"foo": "abc", "bar": "def", "baz": "state=a;b"} RunScenarios(t, eOperation.Copy(), eTestFromTo.AllUploads(), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - metadata: "foo=abc;bar=def", + metadata: "foo=abc;bar=def;baz=state=a\\;b", }, nil, testFiles{ defaultSize: "1K", shouldTransfer: []interface{}{ diff --git a/e2etest/zt_preserve_smb_properties_test.go b/e2etest/zt_preserve_smb_properties_test.go index be70fbf25..4b3e5905d 100644 --- a/e2etest/zt_preserve_smb_properties_test.go +++ b/e2etest/zt_preserve_smb_properties_test.go @@ -68,8 +68,10 @@ func TestProperties_SMBPermissionsSDDLPreserved(t *testing.T) { common.EFromTo.FileFile(), ), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, preserveSMBPermissions: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, nil, testFiles{ defaultSize: "1K", shouldTransfer: []interface{}{ @@ -88,7 +90,9 @@ func TestProperties_SMBPermissionsSDDLPreserved(t *testing.T) { func TestProperties_SMBDates(t *testing.T) { RunScenarios(t, eOperation.CopyAndSync(), 
eTestFromTo.Other(common.EFromTo.LocalFile(), common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, &hooks{ beforeRunJob: func(h hookHelper) { // Pause then re-write all the files, so that their LastWriteTime is different from their creation time @@ -118,7 +122,9 @@ func TestProperties_SMBDates(t *testing.T) { func TestProperties_SMBFlags(t *testing.T) { RunScenarios(t, eOperation.CopyAndSync(), eTestFromTo.Other(common.EFromTo.LocalFile(), common.EFromTo.FileFile(), common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, nil, testFiles{ defaultSize: "1K", shouldTransfer: []interface{}{ @@ -143,7 +149,9 @@ func TestProperties_SMBPermsAndFlagsWithIncludeAfter(t *testing.T) { RunScenarios(t, eOperation.Copy(), eTestFromTo.Other(common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, // this wasn't compatible with time-sensitive filtering prior. + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), // includeAfter: SET LATER }, &hooks{ beforeRunJob: func(h hookHelper) { @@ -190,7 +198,9 @@ func TestProperties_SMBPermsAndFlagsWithSync(t *testing.T) { RunScenarios(t, eOperation.Sync(), eTestFromTo.Other(common.EFromTo.LocalFile(), common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, // this wasn't compatible with time-sensitive filtering prior. + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, &hooks{ beforeRunJob: func(h hookHelper) { // Pause then re-write all the files, so that their LastWriteTime is different from their creation time @@ -244,7 +254,9 @@ func TestProperties_SMBWithCopyWithShareRoot(t *testing.T) { recursive: true, invertedAsSubdir: true, preserveSMBPermissions: true, - preserveSMBInfo: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, nil, testFiles{ @@ -276,7 +288,9 @@ func TestProperties_SMBTimes(t *testing.T) { anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, nil, testFiles{ diff --git a/e2etest/zt_resume_windows_test.go b/e2etest/zt_resume_windows_test.go index 902a52288..611d75763 100644 --- a/e2etest/zt_resume_windows_test.go +++ b/e2etest/zt_resume_windows_test.go @@ -10,10 +10,12 @@ func TestResume_FolderState(t *testing.T) { // Create a child file before the folder itself, then persist the properties of the folder upon resume, knowing that we created the folder. RunScenarios(t, eOperation.CopyAndSync()|eOperation.Resume(), eTestFromTo.Other(common.EFromTo.LocalFile(), common.EFromTo.FileFile(), common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, debugSkipFiles: []string{ "a", }, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, nil, testFiles{ defaultSize: "1K", @@ -29,11 +31,13 @@ func TestResume_NoCreateFolder(t *testing.T) { // Don't create the folder "ourselves", and let AzCopy find that out on a resume. 
RunScenarios(t, eOperation.Copy()|eOperation.Resume(), eTestFromTo.Other(common.EFromTo.LocalFile(), common.EFromTo.FileFile(), common.EFromTo.FileLocal()), eValidate.Auto(), anonymousAuthOnly, anonymousAuthOnly, params{ recursive: true, - preserveSMBInfo: true, debugSkipFiles: []string{ "a", "a/b", }, + + // default, but present for clarity + //preserveSMBInfo: BoolPointer(true), }, &hooks{ beforeResumeHook: func(h hookHelper) { // Create the folder in the middle of the transfer diff --git a/e2etest/zz_tests_to_add.go b/e2etest/zz_tests_to_add.go index e3842fb04..d450b219f 100644 --- a/e2etest/zz_tests_to_add.go +++ b/e2etest/zz_tests_to_add.go @@ -5,7 +5,7 @@ package e2etest // Next framework _use_ tasks // In progress: More filter tests // Flesh out attribute support, in usages of objectProperties.smbAttributes, so that we can create, and verify, tests with these -// (right now, tests that use these will fail, because they lack the necessary code to retrieve them, from the destinatino (and set the at the source) +// (right now, tests that use these will fail, because they lack the necessary code to retrieve them, from the destination (and set the at the source) // isn't there. See commented code marked TODO: nakulkar-msft // The resource manager support for S3 and BlobFS and (who will do this one) GCP? // diff --git a/go.mod b/go.mod index e0e7067d5..4d9e974bf 100644 --- a/go.mod +++ b/go.mod @@ -14,6 +14,8 @@ require ( github.com/mattn/go-ieproxy v0.0.3 github.com/minio/minio-go v6.0.14+incompatible github.com/pkg/errors v0.9.1 + github.com/pkg/xattr v0.4.6 + github.com/rogpeppe/go-internal v1.8.1 // indirect github.com/spf13/cobra v1.4.0 github.com/wastore/keychain v0.0.0-20180920053336-f2c902a3d807 github.com/wastore/keyctl v0.3.1 @@ -44,7 +46,6 @@ require ( github.com/kr/pretty v0.3.0 // indirect github.com/kr/text v0.2.0 // indirect github.com/mitchellh/go-homedir v1.1.0 // indirect - github.com/rogpeppe/go-internal v1.8.1 // indirect github.com/russross/blackfriday/v2 v2.1.0 // indirect github.com/spf13/pflag v1.0.5 // indirect github.com/stretchr/objx v0.3.0 // indirect diff --git a/go.sum b/go.sum index 6ad20a705..2183e517e 100644 --- a/go.sum +++ b/go.sum @@ -57,18 +57,9 @@ cloud.google.com/go/storage v1.10.0/go.mod h1:FLPqc6j+Ki4BU591ie1oL6qBQGu2Bl/tZ9 cloud.google.com/go/storage v1.21.0 h1:HwnT2u2D309SFDHQII6m18HlrCi3jAXhUMTLOWXYH14= cloud.google.com/go/storage v1.21.0/go.mod h1:XmRlxkgPjlBONznT2dDUU/5XlpU2OjMnKuqnZI01LAA= dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU= -github.com/Azure/azure-pipeline-go v0.2.3 h1:7U9HBg1JFK3jHl5qmo4CTZKFTVgMwdFHMVtCdfBE21U= github.com/Azure/azure-pipeline-go v0.2.3/go.mod h1:x841ezTBIMG6O3lAcl8ATHnsOPVl2bqk7S3ta6S6u4k= -github.com/Azure/azure-pipeline-go v0.2.4-0.20220420205509-9c760f3e9499 h1:eVXzrNOutCSxn7gYn2Tb2alO/D41vX6EyDoRhByS4zc= -github.com/Azure/azure-pipeline-go v0.2.4-0.20220420205509-9c760f3e9499/go.mod h1:x841ezTBIMG6O3lAcl8ATHnsOPVl2bqk7S3ta6S6u4k= github.com/Azure/azure-pipeline-go v0.2.4-0.20220425205405-09e6f201e1e4 h1:hDJImUzpTAeIw/UasFUUDB/+UsZm5Q/6x2/jKKvEUiw= github.com/Azure/azure-pipeline-go v0.2.4-0.20220425205405-09e6f201e1e4/go.mod h1:x841ezTBIMG6O3lAcl8ATHnsOPVl2bqk7S3ta6S6u4k= -github.com/Azure/azure-storage-blob-go v0.13.1-0.20220307213743-78b465951faf h1:81jHLpY81IPdZqBzsnudRpQM1E9xk+ZzBhhJm7BEvcY= -github.com/Azure/azure-storage-blob-go v0.13.1-0.20220307213743-78b465951faf/go.mod h1:SMqIBi+SuiQH32bvyjngEewEeXoPfKMgWlBDaYf6fck= 
-github.com/Azure/azure-storage-blob-go v0.13.1-0.20220418210520-914dace75d43 h1:/yh9OPVjemL4n8CaXc+GpFTvSlotRFj2HXJIgLo2gG8= -github.com/Azure/azure-storage-blob-go v0.13.1-0.20220418210520-914dace75d43/go.mod h1:vbjsVbX0dlxnRc4FFMPsS9BsJWPcne7GB7onqlPvz58= -github.com/Azure/azure-storage-blob-go v0.13.1-0.20220418220008-28ac0a48144e h1:uGef/l7KHdWy6XTwhnEB4IhJEisPLe0TDfLVthiVL04= -github.com/Azure/azure-storage-blob-go v0.13.1-0.20220418220008-28ac0a48144e/go.mod h1:vbjsVbX0dlxnRc4FFMPsS9BsJWPcne7GB7onqlPvz58= github.com/Azure/azure-storage-blob-go v0.15.0 h1:rXtgp8tN1p29GvpGgfJetavIG0V7OgcSXPpwp3tx6qk= github.com/Azure/azure-storage-blob-go v0.15.0/go.mod h1:vbjsVbX0dlxnRc4FFMPsS9BsJWPcne7GB7onqlPvz58= github.com/Azure/azure-storage-file-go v0.6.1-0.20201111053559-3c1754dc00a5 h1:aHEvBM4oXIWSTOVdL55nCYXO0Cl7ie3Ui5xMQhLVez8= @@ -247,6 +238,8 @@ github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLA github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pkg/xattr v0.4.6 h1:0vqthLIMxQKA9VscyMcxjvAUGvyfzlk009vwLE8OZJg= +github.com/pkg/xattr v0.4.6/go.mod h1:sBD3RAqlr8Q+RC3FutZcikpT8nyDrIEEBw2J744gVWs= github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA= @@ -437,6 +430,7 @@ golang.org/x/sys v0.0.0-20200803210538-64077c9b5642/go.mod h1:h1NjWce9XRLGQEsW7w golang.org/x/sys v0.0.0-20200828194041-157a740278f4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200905004654-be1d3432aa8f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20201101102859-da207088b7d1/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201201145000-ef89a241ccb3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210104204734-6f8348627aad/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= diff --git a/jobsAdmin/JobsAdmin.go b/jobsAdmin/JobsAdmin.go index 97e3419aa..f9458640d 100755 --- a/jobsAdmin/JobsAdmin.go +++ b/jobsAdmin/JobsAdmin.go @@ -24,6 +24,7 @@ import ( "context" "encoding/json" "fmt" + "github.com/Azure/azure-storage-blob-go/azblob" "os" "path/filepath" "runtime" @@ -74,7 +75,7 @@ var JobsAdmin interface { // JobMgr returns the specified JobID's JobMgr JobMgr(jobID common.JobID) (ste.IJobMgr, bool) - JobMgrEnsureExists(jobID common.JobID, level common.LogLevel, commandString string) ste.IJobMgr + JobMgrEnsureExists(jobID common.JobID, level common.LogLevel, commandString string, sourceBlobToken azblob.Credential) ste.IJobMgr // AddJobPartMgr associates the specified JobPartMgr with the Jobs Administrator //AddJobPartMgr(appContext context.Context, planFile JobPartPlanFileName) IJobPartMgr @@ -293,12 +294,12 @@ func (ja *jobsAdmin) AppPathFolder() string { // JobMgrEnsureExists returns the specified JobID's IJobMgr if it exists or creates it if it doesn't already exit // If it does 
exist, then the appCtx argument is ignored. func (ja *jobsAdmin) JobMgrEnsureExists(jobID common.JobID, - level common.LogLevel, commandString string) ste.IJobMgr { + level common.LogLevel, commandString string, sourceBlobToken azblob.Credential) ste.IJobMgr { return ja.jobIDToJobMgr.EnsureExists(jobID, func() ste.IJobMgr { // Return existing or new IJobMgr to caller - return ste.NewJobMgr(ja.concurrency, jobID, ja.appCtx, ja.cpuMonitor, level, commandString, ja.logDir, ja.concurrencyTuner, ja.pacer, ja.slicePool, ja.cacheLimiter, ja.fileCountLimiter, ja.jobLogger, false) + return ste.NewJobMgr(ja.concurrency, jobID, ja.appCtx, ja.cpuMonitor, level, commandString, ja.logDir, ja.concurrencyTuner, ja.pacer, ja.slicePool, ja.cacheLimiter, ja.fileCountLimiter, ja.jobLogger, false, sourceBlobToken) }) } @@ -387,7 +388,7 @@ func (ja *jobsAdmin) ResurrectJob(jobId common.JobID, sourceSAS string, destinat continue } mmf := planFile.Map() - jm := ja.JobMgrEnsureExists(jobID, mmf.Plan().LogLevel, "") + jm := ja.JobMgrEnsureExists(jobID, mmf.Plan().LogLevel, "", nil) jm.AddJobPart(partNum, planFile, mmf, sourceSAS, destinationSAS, false, nil) } @@ -421,7 +422,7 @@ func (ja *jobsAdmin) ResurrectJobParts() { } mmf := planFile.Map() //todo : call the compute transfer function here for each job. - jm := ja.JobMgrEnsureExists(jobID, mmf.Plan().LogLevel, "") + jm := ja.JobMgrEnsureExists(jobID, mmf.Plan().LogLevel, "", nil) jm.AddJobPart(partNum, planFile, mmf, EMPTY_SAS_STRING, EMPTY_SAS_STRING, false, nil) } } @@ -453,7 +454,7 @@ func (ja *jobsAdmin) ListJobs(givenStatus common.JobStatus) common.ListJobsRespo if givenStatus == common.EJobStatus.All() || givenStatus == jpph.JobStatus() { ret.JobIDDetails = append(ret.JobIDDetails, common.JobIDDetails{JobId: jobID, CommandString: jpph.CommandString(), - StartTime: jpph.StartTime, JobStatus: jpph.JobStatus()}) + StartTime: jpph.StartTime, JobStatus: jpph.JobStatus()}) } mmf.Unmap() @@ -489,7 +490,7 @@ func (ja *jobsAdmin) DeleteJob(jobID common.JobID) { * Removes the entry of given JobId from JobsInfo */ -// TODO: take care fo this. +// TODO: take care of this. /*func (ja *jobsAdmin) cleanUpJob(jobID common.JobID) { jm, found := ja.JobMgr(jobID) if !found { @@ -582,7 +583,7 @@ func (ja *jobsAdmin) TryGetPerformanceAdvice(bytesInJob uint64, filesInJob uint3 a := ste.NewPerformanceAdvisor(p, ja.commandLineMbpsCap, int64(megabitsPerSec), finalReason, finalConcurrency, dir, averageBytesPerFile, isToAzureFiles) return a.GetAdvice() } - + //Structs for messageHandler /* PerfAdjustment message. */ @@ -594,7 +595,7 @@ func (ja *jobsAdmin) messageHandler(inputChan <-chan *common.LCMMsg) { toBitsPerSec := func(megaBitsPerSec int64) int64 { return megaBitsPerSec * 1000 * 1000 / 8 } - + const minIntervalBetweenPerfAdjustment = time.Minute lastPerfAdjustTime := time.Now().Add(-2 * minIntervalBetweenPerfAdjustment) var err error @@ -609,23 +610,23 @@ func (ja *jobsAdmin) messageHandler(inputChan <-chan *common.LCMMsg) { var perfAdjustmentReq common.PerfAdjustmentReq if time.Since(lastPerfAdjustTime) < minIntervalBetweenPerfAdjustment { - err = fmt.Errorf("Performance Adjustment already in progress. Please try after " + - lastPerfAdjustTime.Add(minIntervalBetweenPerfAdjustment).Format(time.RFC3339)) + err = fmt.Errorf("Performance Adjustment already in progress. 
Please try after " + + lastPerfAdjustTime.Add(minIntervalBetweenPerfAdjustment).Format(time.RFC3339)) } - + if e := json.Unmarshal([]byte(msg.Req.Value), &perfAdjustmentReq); e != nil { err = fmt.Errorf("parsing %s failed with %s", msg.Req.Value, e.Error()) } if perfAdjustmentReq.Throughput < 0 { err = fmt.Errorf("invalid value %d for cap-mbps. cap-mpbs should be greater than 0", - perfAdjustmentReq.Throughput) + perfAdjustmentReq.Throughput) } if err == nil { lastPerfAdjustTime = time.Now() ja.UpdateTargetBandwidth(toBitsPerSec(perfAdjustmentReq.Throughput)) - + resp.Status = true resp.AdjustedThroughPut = perfAdjustmentReq.Throughput resp.NextAdjustmentAfter = lastPerfAdjustTime.Add(minIntervalBetweenPerfAdjustment) @@ -637,11 +638,11 @@ func (ja *jobsAdmin) messageHandler(inputChan <-chan *common.LCMMsg) { resp.Err = err.Error() } - msg.SetResponse(&common.LCMMsgResp { + msg.SetResponse(&common.LCMMsgResp{ TimeStamp: time.Now(), - MsgType: msg.Req.MsgType, - Value: resp, - Err: err, + MsgType: msg.Req.MsgType, + Value: resp, + Err: err, }) msg.Reply() @@ -660,7 +661,7 @@ type jobIDToJobMgr struct { nocopy common.NoCopy lock sync.RWMutex m map[common.JobID]ste.IJobMgr -} +} func newJobIDToJobMgr() jobIDToJobMgr { return jobIDToJobMgr{m: make(map[common.JobID]ste.IJobMgr)} diff --git a/jobsAdmin/init.go b/jobsAdmin/init.go index 0c40bcaef..92d9e6024 100755 --- a/jobsAdmin/init.go +++ b/jobsAdmin/init.go @@ -30,7 +30,7 @@ import ( "time" "github.com/Azure/azure-pipeline-go/pipeline" - + "github.com/Azure/azure-storage-azcopy/v10/common" "github.com/Azure/azure-storage-azcopy/v10/ste" ) @@ -48,7 +48,7 @@ func round(num float64) int { return int(num + math.Copysign(0.5, num)) } -// ToFixed api returns the float number precised upto given decimal places. +// ToFixed api returns the float number precised up to given decimal places. 
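// Editor's aside (illustrative, not part of this patch), assuming the round/ToFixed
// helpers defined here; halves are rounded away from zero via math.Copysign:
//
//	ToFixed(3.14159, 2) // 3.14
//	ToFixed(2.5, 0)     // 3
//	ToFixed(-2.5, 0)    // -3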
func ToFixed(num float64, precision int) float64 { output := math.Pow(10, float64(precision)) return float64(round(num*output)) / output @@ -163,8 +163,8 @@ func MainSTE(concurrency ste.ConcurrencySettings, targetRateInMegaBitsPerSec flo func ExecuteNewCopyJobPartOrder(order common.CopyJobPartOrderRequest) common.CopyJobPartOrderResponse { // Get the file name for this Job Part's Plan jppfn := JobsAdmin.NewJobPartPlanFileName(order.JobID, order.PartNum) - jppfn.Create(order) // Convert the order to a plan file - jm := JobsAdmin.JobMgrEnsureExists(order.JobID, order.LogLevel, order.CommandString) // Get a this job part's job manager (create it if it doesn't exist) + jppfn.Create(order) // Convert the order to a plan file + jm := JobsAdmin.JobMgrEnsureExists(order.JobID, order.LogLevel, order.CommandString, order.CredentialInfo.SourceBlobToken) // Get a this job part's job manager (create it if it doesn't exist) if len(order.Transfers.List) == 0 && order.IsFinalPart { /* @@ -217,60 +217,60 @@ func CancelPauseJobOrder(jobID common.JobID, desiredJobStatus common.JobStatus) } /* - // Search for the Part 0 of the Job, since the Part 0 status concludes the actual status of the Job - jpm, found := jm.JobPartMgr(0) - if !found { - return common.CancelPauseResumeResponse{ - CancelledPauseResumed: false, - ErrorMsg: fmt.Sprintf("job with JobId %s has a missing 0th part", jobID.String()), - } - } - - jpp0 := jpm.Plan() - var jr common.CancelPauseResumeResponse - switch jpp0.JobStatus() { // Current status - case common.EJobStatus.Completed(): // You can't change state of a completed job - jr = common.CancelPauseResumeResponse{ - CancelledPauseResumed: false, - ErrorMsg: fmt.Sprintf("Can't %s JobID=%v because it has already completed", verb, jobID), - } - case common.EJobStatus.Cancelled(): - // If the status of Job is cancelled, it means that it has already been cancelled - // No need to cancel further - jr = common.CancelPauseResumeResponse{ - CancelledPauseResumed: false, - ErrorMsg: fmt.Sprintf("cannot cancel the job %s since it is already cancelled", jobID), - } - case common.EJobStatus.Cancelling(): - // If the status of Job is cancelling, it means that it has already been requested for cancellation - // No need to cancel further - jr = common.CancelPauseResumeResponse{ - CancelledPauseResumed: true, - ErrorMsg: fmt.Sprintf("cannot cancel the job %s since it has already been requested for cancellation", jobID), + // Search for the Part 0 of the Job, since the Part 0 status concludes the actual status of the Job + jpm, found := jm.JobPartMgr(0) + if !found { + return common.CancelPauseResumeResponse{ + CancelledPauseResumed: false, + ErrorMsg: fmt.Sprintf("job with JobId %s has a missing 0th part", jobID.String()), + } } - case common.EJobStatus.InProgress(): - // If the Job status is in Progress and Job is not completely ordered - // Job cannot be resumed later, hence graceful cancellation is not required - // hence sending the response immediately. Response CancelPauseResumeResponse - // returned has CancelledPauseResumed set to false, because that will let - // Job immediately stop. 
- fallthrough - case common.EJobStatus.Paused(): // Logically, It's OK to pause an already-paused job - jpp0.SetJobStatus(desiredJobStatus) - msg := fmt.Sprintf("JobID=%v %s", jobID, - common.IffString(desiredJobStatus == common.EJobStatus.Paused(), "paused", "canceled")) - if jm.ShouldLog(pipeline.LogInfo) { - jm.Log(pipeline.LogInfo, msg) - } - jm.Cancel() // Stop all inflight-chunks/transfer for this job (this includes all parts) - jr = common.CancelPauseResumeResponse{ - CancelledPauseResumed: true, - ErrorMsg: msg, + jpp0 := jpm.Plan() + var jr common.CancelPauseResumeResponse + switch jpp0.JobStatus() { // Current status + case common.EJobStatus.Completed(): // You can't change state of a completed job + jr = common.CancelPauseResumeResponse{ + CancelledPauseResumed: false, + ErrorMsg: fmt.Sprintf("Can't %s JobID=%v because it has already completed", verb, jobID), + } + case common.EJobStatus.Cancelled(): + // If the status of Job is cancelled, it means that it has already been cancelled + // No need to cancel further + jr = common.CancelPauseResumeResponse{ + CancelledPauseResumed: false, + ErrorMsg: fmt.Sprintf("cannot cancel the job %s since it is already cancelled", jobID), + } + case common.EJobStatus.Cancelling(): + // If the status of Job is cancelling, it means that it has already been requested for cancellation + // No need to cancel further + jr = common.CancelPauseResumeResponse{ + CancelledPauseResumed: true, + ErrorMsg: fmt.Sprintf("cannot cancel the job %s since it has already been requested for cancellation", jobID), + } + case common.EJobStatus.InProgress(): + // If the Job status is in Progress and Job is not completely ordered + // Job cannot be resumed later, hence graceful cancellation is not required + // hence sending the response immediately. Response CancelPauseResumeResponse + // returned has CancelledPauseResumed set to false, because that will let + // Job immediately stop. + fallthrough + case common.EJobStatus.Paused(): // Logically, It's OK to pause an already-paused job + jpp0.SetJobStatus(desiredJobStatus) + msg := fmt.Sprintf("JobID=%v %s", jobID, + common.IffString(desiredJobStatus == common.EJobStatus.Paused(), "paused", "canceled")) + + if jm.ShouldLog(pipeline.LogInfo) { + jm.Log(pipeline.LogInfo, msg) + } + jm.Cancel() // Stop all inflight-chunks/transfer for this job (this includes all parts) + jr = common.CancelPauseResumeResponse{ + CancelledPauseResumed: true, + ErrorMsg: msg, + } } + return jr } - return jr -} */ func ResumeJobOrder(req common.ResumeJobRequest) common.CancelPauseResumeResponse { // Strip '?' if present as first character of the source sas / destination sas diff --git a/main.go b/main.go index f192e95ae..1985a05a3 100644 --- a/main.go +++ b/main.go @@ -60,12 +60,13 @@ func main() { // the user can optionally put the plan files somewhere else if azcopyJobPlanFolder == "" { // make the app path folder ".azcopy" first so we can make a plans folder in it - if err := os.Mkdir(azcopyAppPathFolder, os.ModeDir); err != nil && !os.IsExist(err) { + if err := os.MkdirAll(azcopyAppPathFolder, os.ModeDir); err != nil && !os.IsExist(err) { common.PanicIfErr(err) } azcopyJobPlanFolder = path.Join(azcopyAppPathFolder, "plans") } - if err := os.Mkdir(azcopyJobPlanFolder, os.ModeDir|os.ModePerm); err != nil && !os.IsExist(err) { + + if err := os.MkdirAll(azcopyJobPlanFolder, os.ModeDir|os.ModePerm); err != nil && !os.IsExist(err) { log.Fatalf("Problem making .azcopy directory. Try setting AZCOPY_PLAN_FILE_LOCATION env variable. 
%v", err) } diff --git a/main_windows.go b/main_windows.go index 169780bb6..dd05e37ed 100644 --- a/main_windows.go +++ b/main_windows.go @@ -42,7 +42,7 @@ func osModifyProcessCommand(cmd *exec.Cmd) *exec.Cmd { return cmd } -// ProcessOSSpecificInitialization chnages the soft limit for filedescriptor for process +// ProcessOSSpecificInitialization changes the soft limit for filedescriptor for process // return the filedescriptor limit for process. If the function fails with some, it returns // the error // TODO: this api is implemented for windows as well but not required because Windows diff --git a/sddl/sddlHelper_linux.go b/sddl/sddlHelper_linux.go new file mode 100644 index 000000000..874550d7f --- /dev/null +++ b/sddl/sddlHelper_linux.go @@ -0,0 +1,1717 @@ +// +build linux + +// Copyright Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. + +package sddl + +import ( + "encoding/binary" + "fmt" + "strconv" + "unsafe" + + "github.com/Azure/azure-storage-azcopy/v10/common" + + "github.com/pkg/xattr" +) + +/* + * Following constants are used by various Windows functions that deal with SECURITY_DESCRIPTORs and SIDs. + * Most of these constants are originally defined in winnt.h + */ + +/* + * Valid/supported revision numbers for various object types. + * + * TODO: Do we need to support ACL_REVISION_DS (4) with support for Object ACEs? + * Are they used for filesystem objects? + */ +const ( + SDDL_REVISION = 1 // SDDL Revision MUST always be 1. + SID_REVISION = 1 // SID Revision MUST always be 1. + ACL_REVISION = 2 // ACL revision for support basic ACE type used for filesystem ACLs. + ACL_REVISION_DS = 4 // ACL revision for supporting stuff like Object ACE. This should ideally not be used with the ACE + // types we support, but I've seen some objects like that. +) + +type SECURITY_INFORMATION uint32 + +// Valid bitmasks contained in type SECURITY_INFORMATION. 
+const ( + OWNER_SECURITY_INFORMATION = 0x00000001 + GROUP_SECURITY_INFORMATION = 0x00000002 + DACL_SECURITY_INFORMATION = 0x00000004 + SACL_SECURITY_INFORMATION = 0x00000008 + LABEL_SECURITY_INFORMATION = 0x00000010 + ATTRIBUTE_SECURITY_INFORMATION = 0x00000020 + SCOPE_SECURITY_INFORMATION = 0x00000040 + BACKUP_SECURITY_INFORMATION = 0x00010000 + PROTECTED_DACL_SECURITY_INFORMATION = 0x80000000 + PROTECTED_SACL_SECURITY_INFORMATION = 0x40000000 + UNPROTECTED_DACL_SECURITY_INFORMATION = 0x20000000 + UNPROTECTED_SACL_SECURITY_INFORMATION = 0x10000000 +) + +// Valid bitmasks contained in type SECURITY_DESCRIPTOR_CONTROL. +const ( + SE_OWNER_DEFAULTED = 0x0001 + SE_GROUP_DEFAULTED = 0x0002 + SE_DACL_PRESENT = 0x0004 + SE_DACL_DEFAULTED = 0x0008 + SE_SACL_PRESENT = 0x0010 + SE_SACL_DEFAULTED = 0x0020 + SE_DACL_AUTO_INHERIT_REQ = 0x0100 + SE_SACL_AUTO_INHERIT_REQ = 0x0200 + SE_DACL_AUTO_INHERITED = 0x0400 + SE_SACL_AUTO_INHERITED = 0x0800 + SE_DACL_PROTECTED = 0x1000 + SE_SACL_PROTECTED = 0x2000 + SE_RM_CONTROL_VALID = 0x4000 + SE_SELF_RELATIVE = 0x8000 +) + +// Valid AceType values present in ACE_HEADER. +const ( + ACCESS_MIN_MS_ACE_TYPE = 0x0 + ACCESS_ALLOWED_ACE_TYPE = 0x0 + ACCESS_DENIED_ACE_TYPE = 0x1 + SYSTEM_AUDIT_ACE_TYPE = 0x2 + SYSTEM_ALARM_ACE_TYPE = 0x3 + ACCESS_MAX_MS_V2_ACE_TYPE = 0x3 + ACCESS_ALLOWED_COMPOUND_ACE_TYPE = 0x4 + ACCESS_MAX_MS_V3_ACE_TYPE = 0x4 + ACCESS_MIN_MS_OBJECT_ACE_TYPE = 0x5 + ACCESS_ALLOWED_OBJECT_ACE_TYPE = 0x5 + ACCESS_DENIED_OBJECT_ACE_TYPE = 0x6 + SYSTEM_AUDIT_OBJECT_ACE_TYPE = 0x7 + SYSTEM_ALARM_OBJECT_ACE_TYPE = 0x8 + ACCESS_MAX_MS_OBJECT_ACE_TYPE = 0x8 + ACCESS_MAX_MS_V4_ACE_TYPE = 0x8 + ACCESS_MAX_MS_ACE_TYPE = 0x8 + ACCESS_ALLOWED_CALLBACK_ACE_TYPE = 0x9 + ACCESS_DENIED_CALLBACK_ACE_TYPE = 0xA + ACCESS_ALLOWED_CALLBACK_OBJECT_ACE_TYPE = 0xB + ACCESS_DENIED_CALLBACK_OBJECT_ACE_TYPE = 0xC + SYSTEM_AUDIT_CALLBACK_ACE_TYPE = 0xD + SYSTEM_ALARM_CALLBACK_ACE_TYPE = 0xE + SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE = 0xF + SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE = 0x10 + SYSTEM_MANDATORY_LABEL_ACE_TYPE = 0x11 + SYSTEM_RESOURCE_ATTRIBUTE_ACE_TYPE = 0x12 + SYSTEM_SCOPED_POLICY_ID_ACE_TYPE = 0x13 + SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE = 0x14 + SYSTEM_ACCESS_FILTER_ACE_TYPE = 0x15 + ACCESS_MAX_MS_V5_ACE_TYPE = 0x15 +) + +var aceTypeStringMap = map[string]BYTE{ + "A": ACCESS_ALLOWED_ACE_TYPE, + "D": ACCESS_DENIED_ACE_TYPE, + "OA": ACCESS_ALLOWED_OBJECT_ACE_TYPE, + "OD": ACCESS_DENIED_OBJECT_ACE_TYPE, + "AU": SYSTEM_AUDIT_ACE_TYPE, + "AL": SYSTEM_ALARM_ACE_TYPE, + "OU": SYSTEM_AUDIT_OBJECT_ACE_TYPE, + "OL": SYSTEM_ALARM_OBJECT_ACE_TYPE, + "ML": SYSTEM_MANDATORY_LABEL_ACE_TYPE, + "XA": ACCESS_ALLOWED_CALLBACK_ACE_TYPE, + "XD": ACCESS_DENIED_CALLBACK_ACE_TYPE, + "RA": SYSTEM_RESOURCE_ATTRIBUTE_ACE_TYPE, + "SP": SYSTEM_SCOPED_POLICY_ID_ACE_TYPE, + "XU": SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE, + "ZA": ACCESS_ALLOWED_CALLBACK_ACE_TYPE, + "TL": SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE, + "FL": SYSTEM_ACCESS_FILTER_ACE_TYPE, +} + +// Valid bitmasks contained in AceFlags present in ACE_HEADER. +const ( + OBJECT_INHERIT_ACE = 0x01 + CONTAINER_INHERIT_ACE = 0x02 + NO_PROPAGATE_INHERIT_ACE = 0x04 + INHERIT_ONLY_ACE = 0x08 + INHERITED_ACE = 0x10 + VALID_INHERIT_FLAGS = 0x1F + CRITICAL_ACE_FLAG = 0x20 + + // AceFlags mask for what events we (should) audit. Used by SACL. + SUCCESSFUL_ACCESS_ACE_FLAG = 0x40 + FAILED_ACCESS_ACE_FLAG = 0x80 + + TRUST_PROTECTED_FILTER_ACE_FLAG = 0x40 +) + +// Valid bitmasks contained in AccessMask present in type ACCESS_ALLOWED_ACE. 
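// Editor's sketch (not part of this patch; the helper name is hypothetical): an
// AccessMask is a plain bit set, so testing an ACE for a right is a mask check, and the
// composite FILE_GENERIC_* values below are unions of fine-grained FILE_* bits plus
// standard rights.
func exampleHasRight(accessMask, right uint32) bool {
	return accessMask&right == right
}

// e.g. exampleHasRight(FILE_GENERIC_READ, FILE_READ_DATA) is true,
// while exampleHasRight(FILE_GENERIC_READ, FILE_WRITE_DATA) is false.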
+const ( + // Generic access rights. + GENERIC_READ = 0x80000000 + GENERIC_WRITE = 0x40000000 + GENERIC_EXECUTE = 0x20000000 + GENERIC_ALL = 0x10000000 + DELETE = 0x00010000 + READ_CONTROL = 0x00020000 + WRITE_DAC = 0x00040000 + WRITE_OWNER = 0x00080000 + SYNCHRONIZE = 0x00100000 + STANDARD_RIGHTS_REQUIRED = 0x000F0000 + STANDARD_RIGHTS_READ = READ_CONTROL + STANDARD_RIGHTS_WRITE = READ_CONTROL + STANDARD_RIGHTS_EXECUTE = READ_CONTROL + STANDARD_RIGHTS_ALL = 0x001F0000 + SPECIFIC_RIGHTS_ALL = 0x0000FFFF + + // Access rights for files and directories. + FILE_READ_DATA = 0x0001 /* file & pipe */ + FILE_READ_ATTRIBUTES = 0x0080 /* all */ + FILE_READ_EA = 0x0008 /* file & directory */ + FILE_WRITE_DATA = 0x0002 /* file & pipe */ + FILE_WRITE_ATTRIBUTES = 0x0100 /* all */ + FILE_WRITE_EA = 0x0010 /* file & directory */ + FILE_APPEND_DATA = 0x0004 /* file */ + FILE_EXECUTE = 0x0020 /* file */ + + FILE_ALL_ACCESS = (STANDARD_RIGHTS_REQUIRED | SYNCHRONIZE | 0x1FF) + FILE_GENERIC_READ = (STANDARD_RIGHTS_READ | FILE_READ_DATA | FILE_READ_ATTRIBUTES | FILE_READ_EA | SYNCHRONIZE) + FILE_GENERIC_WRITE = (STANDARD_RIGHTS_WRITE | FILE_WRITE_DATA | FILE_WRITE_ATTRIBUTES | FILE_WRITE_EA | FILE_APPEND_DATA | SYNCHRONIZE) + FILE_GENERIC_EXECUTE = (STANDARD_RIGHTS_EXECUTE | FILE_READ_ATTRIBUTES | FILE_EXECUTE | SYNCHRONIZE) + + // Access rights for DS objects. + ADS_RIGHT_DS_CREATE_CHILD = 0x0001 + ADS_RIGHT_DS_DELETE_CHILD = 0x0002 + ADS_RIGHT_ACTRL_DS_LIST = 0x0004 + ADS_RIGHT_DS_SELF = 0x0008 + ADS_RIGHT_DS_READ_PROP = 0x0010 + ADS_RIGHT_DS_WRITE_PROP = 0x0020 + ADS_RIGHT_DS_DELETE_TREE = 0x0040 + ADS_RIGHT_DS_LIST_OBJECT = 0x0080 + ADS_RIGHT_DS_CONTROL_ACCESS = 0x0100 + + // Registry Specific Access Rights. + KEY_QUERY_VALUE = 0x0001 + KEY_SET_VALUE = 0x0002 + KEY_CREATE_SUB_KEY = 0x0004 + KEY_ENUMERATE_SUB_KEYS = 0x0008 + KEY_NOTIFY = 0x0010 + KEY_CREATE_LINK = 0x0020 + KEY_WOW64_32KEY = 0x0200 + KEY_WOW64_64KEY = 0x0100 + KEY_WOW64_RES = 0x0300 + + KEY_READ = ((STANDARD_RIGHTS_READ | KEY_QUERY_VALUE | KEY_ENUMERATE_SUB_KEYS | KEY_NOTIFY) & (^SYNCHRONIZE)) + KEY_WRITE = ((STANDARD_RIGHTS_WRITE | KEY_SET_VALUE | KEY_CREATE_SUB_KEY) & (^SYNCHRONIZE)) + KEY_EXECUTE = ((KEY_READ) & (^SYNCHRONIZE)) + KEY_ALL_ACCESS = ((STANDARD_RIGHTS_ALL | KEY_QUERY_VALUE | KEY_SET_VALUE | KEY_CREATE_SUB_KEY | KEY_ENUMERATE_SUB_KEYS | KEY_NOTIFY | KEY_CREATE_LINK) & (^SYNCHRONIZE)) + + // SYSTEM_ACCESS_FILTER_ACE Access rights. + SYSTEM_MANDATORY_LABEL_NO_WRITE_UP = 0x1 + SYSTEM_MANDATORY_LABEL_NO_READ_UP = 0x2 + SYSTEM_MANDATORY_LABEL_NO_EXECUTE_UP = 0x4 +) + +// Access mask exactly matching the value here will be mapped to the key. 
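// Editor's aside (illustrative, not part of this patch): together with aceRightsToStringMap
// below, this table gives the SDDL shorthand round trip used by aceToString, e.g.
//
//	aceStringToRightsMap["FR"]      // FILE_GENERIC_READ, when parsing an SDDL string
//	aceRightsToString(GENERIC_READ) // "GR", an exact shorthand match
//	aceRightsToString(0x1200a9)     // "0x1200a9": SYNCHRONIZE has no shorthand, so it falls back to hex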
+var aceStringToRightsMap = map[string]uint32{ + "GA": GENERIC_ALL, + "GR": GENERIC_READ, + "GW": GENERIC_WRITE, + "GX": GENERIC_EXECUTE, + + "RC": READ_CONTROL, + "SD": DELETE, + "WD": WRITE_DAC, + "WO": WRITE_OWNER, + + "RP": ADS_RIGHT_DS_READ_PROP, + "WP": ADS_RIGHT_DS_WRITE_PROP, + "CC": ADS_RIGHT_DS_CREATE_CHILD, + "DC": ADS_RIGHT_DS_DELETE_CHILD, + "LC": ADS_RIGHT_ACTRL_DS_LIST, + "SW": ADS_RIGHT_DS_SELF, + "LO": ADS_RIGHT_DS_LIST_OBJECT, + "DT": ADS_RIGHT_DS_DELETE_TREE, + "CR": ADS_RIGHT_DS_CONTROL_ACCESS, + + "FA": FILE_ALL_ACCESS, + "FR": FILE_GENERIC_READ, + "FW": FILE_GENERIC_WRITE, + "FX": FILE_GENERIC_EXECUTE, + + "KA": KEY_ALL_ACCESS, + "KR": KEY_READ, + "KW": KEY_WRITE, + "KX": KEY_EXECUTE, + + "NR": SYSTEM_MANDATORY_LABEL_NO_READ_UP, + "NW": SYSTEM_MANDATORY_LABEL_NO_WRITE_UP, + "NX": SYSTEM_MANDATORY_LABEL_NO_EXECUTE_UP, +} + +// Access rights to their corresponding friendly names. +// Note that this intentionally has some of the fields left out from aceStringToRightsMap. +var aceRightsToStringMap = map[uint32]string{ + GENERIC_ALL: "GA", + GENERIC_READ: "GR", + GENERIC_WRITE: "GW", + GENERIC_EXECUTE: "GX", + READ_CONTROL: "RC", + DELETE: "SD", + WRITE_DAC: "WD", + WRITE_OWNER: "WO", + ADS_RIGHT_DS_READ_PROP: "RP", + ADS_RIGHT_DS_WRITE_PROP: "WP", + ADS_RIGHT_DS_CREATE_CHILD: "CC", + ADS_RIGHT_DS_DELETE_CHILD: "DC", + ADS_RIGHT_ACTRL_DS_LIST: "LC", + ADS_RIGHT_DS_SELF: "SW", + ADS_RIGHT_DS_LIST_OBJECT: "LO", + ADS_RIGHT_DS_DELETE_TREE: "DT", + ADS_RIGHT_DS_CONTROL_ACCESS: "CR", +} + +var ( + SECURITY_NULL_SID_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 0} + SECURITY_WORLD_SID_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 1} + SECURITY_LOCAL_SID_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 2} + SECURITY_CREATOR_SID_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 3} + SECURITY_NON_UNIQUE_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 4} + SECURITY_NT_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 5} + SECURITY_APP_PACKAGE_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 15} + SECURITY_MANDATORY_LABEL_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 16} + SECURITY_SCOPED_POLICY_ID_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 17} + SECURITY_AUTHENTICATION_AUTHORITY = [6]byte{0, 0, 0, 0, 0, 18} +) + +const ( + SECURITY_NULL_RID = 0 + SECURITY_WORLD_RID = 0 + SECURITY_LOCAL_RID = 0 + SECURITY_CREATOR_OWNER_RID = 0 + SECURITY_CREATOR_GROUP_RID = 1 + SECURITY_DIALUP_RID = 1 + SECURITY_NETWORK_RID = 2 + SECURITY_BATCH_RID = 3 + SECURITY_INTERACTIVE_RID = 4 + SECURITY_LOGON_IDS_RID = 5 + SECURITY_SERVICE_RID = 6 + SECURITY_LOCAL_SYSTEM_RID = 18 + SECURITY_BUILTIN_DOMAIN_RID = 32 + SECURITY_PRINCIPAL_SELF_RID = 10 + SECURITY_CREATOR_OWNER_SERVER_RID = 0x2 + SECURITY_CREATOR_GROUP_SERVER_RID = 0x3 + SECURITY_LOGON_IDS_RID_COUNT = 0x3 + SECURITY_ANONYMOUS_LOGON_RID = 0x7 + SECURITY_PROXY_RID = 0x8 + SECURITY_ENTERPRISE_CONTROLLERS_RID = 0x9 + SECURITY_SERVER_LOGON_RID = SECURITY_ENTERPRISE_CONTROLLERS_RID + SECURITY_AUTHENTICATED_USER_RID = 0xb + SECURITY_RESTRICTED_CODE_RID = 0xc + SECURITY_NT_NON_UNIQUE_RID = 0x15 + + SECURITY_CREATOR_OWNER_RIGHTS_RID = 0x00000004 + SECURITY_LOCAL_SERVICE_RID = 0x00000013 + SECURITY_NETWORK_SERVICE_RID = 0x00000014 + SECURITY_WRITE_RESTRICTED_CODE_RID = 0x00000021 + + SECURITY_MANDATORY_LOW_RID = 0x00001000 + SECURITY_MANDATORY_MEDIUM_RID = 0x00002000 + SECURITY_MANDATORY_MEDIUM_PLUS_RID = (SECURITY_MANDATORY_MEDIUM_RID + 0x100) + SECURITY_MANDATORY_HIGH_RID = 0x00003000 + SECURITY_MANDATORY_SYSTEM_RID = 0x00004000 + + SECURITY_APP_PACKAGE_BASE_RID = 0x00000002 + SECURITY_BUILTIN_PACKAGE_ANY_PACKAGE = 0x00000001 +) + +// Predefined domain-relative RIDs 
for local groups. +// See https://msdn.microsoft.com/en-us/library/windows/desktop/aa379649(v=vs.85).aspx +const ( + DOMAIN_ALIAS_RID_ADMINS = 0x220 + DOMAIN_ALIAS_RID_USERS = 0x221 + DOMAIN_ALIAS_RID_GUESTS = 0x222 + DOMAIN_ALIAS_RID_POWER_USERS = 0x223 + DOMAIN_ALIAS_RID_ACCOUNT_OPS = 0x224 + DOMAIN_ALIAS_RID_SYSTEM_OPS = 0x225 + DOMAIN_ALIAS_RID_PRINT_OPS = 0x226 + DOMAIN_ALIAS_RID_BACKUP_OPS = 0x227 + DOMAIN_ALIAS_RID_REPLICATOR = 0x228 + DOMAIN_ALIAS_RID_RAS_SERVERS = 0x229 + DOMAIN_ALIAS_RID_PREW2KCOMPACCESS = 0x22A + DOMAIN_ALIAS_RID_REMOTE_DESKTOP_USERS = 0x22B + DOMAIN_ALIAS_RID_NETWORK_CONFIGURATION_OPS = 0x22C + DOMAIN_ALIAS_RID_INCOMING_FOREST_TRUST_BUILDERS = 0x22D + DOMAIN_ALIAS_RID_MONITORING_USERS = 0x22E + DOMAIN_ALIAS_RID_LOGGING_USERS = 0x22F + DOMAIN_ALIAS_RID_AUTHORIZATIONACCESS = 0x230 + DOMAIN_ALIAS_RID_TS_LICENSE_SERVERS = 0x231 + DOMAIN_ALIAS_RID_DCOM_USERS = 0x232 + DOMAIN_ALIAS_RID_IUSERS = 0x238 + DOMAIN_ALIAS_RID_CRYPTO_OPERATORS = 0x239 + DOMAIN_ALIAS_RID_CACHEABLE_PRINCIPALS_GROUP = 0x23B + DOMAIN_ALIAS_RID_NON_CACHEABLE_PRINCIPALS_GROUP = 0x23C + DOMAIN_ALIAS_RID_EVENT_LOG_READERS_GROUP = 0x23D + DOMAIN_ALIAS_RID_CERTSVC_DCOM_ACCESS_GROUP = 0x23E + DOMAIN_ALIAS_RID_RDS_REMOTE_ACCESS_SERVERS = 0x23F + DOMAIN_ALIAS_RID_RDS_ENDPOINT_SERVERS = 0x240 + DOMAIN_ALIAS_RID_RDS_MANAGEMENT_SERVERS = 0x241 + DOMAIN_ALIAS_RID_HYPER_V_ADMINS = 0x242 + DOMAIN_ALIAS_RID_ACCESS_CONTROL_ASSISTANCE_OPS = 0x243 + DOMAIN_ALIAS_RID_REMOTE_MANAGEMENT_USERS = 0x244 + DOMAIN_ALIAS_RID_DEFAULT_ACCOUNT = 0x245 + DOMAIN_ALIAS_RID_STORAGE_REPLICA_ADMINS = 0x246 + DOMAIN_ALIAS_RID_DEVICE_OWNERS = 0x247 +) + +const ( + DOMAIN_GROUP_RID_ENTERPRISE_READONLY_DOMAIN_CONTROLLERS = 0x1F2 // 498 + DOMAIN_USER_RID_ADMIN = 0x1F4 // 500 + DOMAIN_USER_RID_GUEST = 0x1F5 + DOMAIN_GROUP_RID_ADMINS = 0x200 // 512 + DOMAIN_GROUP_RID_USERS = 0x201 + DOMAIN_GROUP_RID_GUESTS = 0x202 + DOMAIN_GROUP_RID_COMPUTERS = 0x203 + DOMAIN_GROUP_RID_CONTROLLERS = 0x204 + DOMAIN_GROUP_RID_CERT_ADMINS = 0x205 + DOMAIN_GROUP_RID_SCHEMA_ADMINS = 0x206 + DOMAIN_GROUP_RID_ENTERPRISE_ADMINS = 0x207 + DOMAIN_GROUP_RID_POLICY_ADMINS = 0x208 + DOMAIN_GROUP_RID_READONLY_CONTROLLERS = 0x209 + DOMAIN_GROUP_RID_CLONEABLE_CONTROLLERS = 0x20A + DOMAIN_GROUP_RID_CDC_RESERVED = 0x20C + DOMAIN_GROUP_RID_PROTECTED_USERS = 0x20D + DOMAIN_GROUP_RID_KEY_ADMINS = 0x20E + DOMAIN_GROUP_RID_ENTERPRISE_KEY_ADMINS = 0x20F +) + +const ( + SECURITY_AUTHENTICATION_AUTHORITY_ASSERTED_RID = 0x1 + SECURITY_AUTHENTICATION_SERVICE_ASSERTED_RID = 0x2 + SECURITY_AUTHENTICATION_FRESH_KEY_AUTH_RID = 0x3 + SECURITY_AUTHENTICATION_KEY_TRUST_RID = 0x4 + SECURITY_AUTHENTICATION_KEY_PROPERTY_MFA_RID = 0x5 + SECURITY_AUTHENTICATION_KEY_PROPERTY_ATTESTATION_RID = 0x6 +) + +/* + * Define some Windows type names for increased readability of various Windows structs we use here. + */ +type BYTE byte +type WORD uint16 +type DWORD uint32 + +/**************************************************************************** + * Various binary structures used for conveying SMB objects. + * + * ALL MULTI-BYTE VALUES ARE IN LITTLE ENDIAN FORMAT. + * + * We don't use these structures in the code but they are there to help reader + * understand the code. + ****************************************************************************/ + +/* + * This is NT Security Descriptor in "Self Relative" format. + * This is returned when common.CIFS_XATTR_CIFS_NTSD xattr is queried for a file. + * The Linux equivalent struct is "struct cifs_ntsd". 
+ */ +type SECURITY_DESCRIPTOR_CONTROL WORD +type SECURITY_DESCRIPTOR_RELATIVE struct { + // Revision number of this SECURITY_DESCRIPTOR. Must be 1. + Revision BYTE + // Zero byte. + Sbz1 BYTE + // Flag bits describing this SECURITY_DESCRIPTOR. + Control SECURITY_DESCRIPTOR_CONTROL + // Offset of owner sid. There's a SID structure at this offset. + OffsetOwner DWORD + // Offset of primary group sid. There's a SID structure at this offset. + OffsetGroup DWORD + // Offset of SACL. There's an ACL structure at this offset. + OffsetSacl DWORD + // Offset of DACL. There's an ACL structure at this offset. + OffsetDacl DWORD + // 0 or more bytes (depending on the various offsets above) follow this structure. + Data [0]BYTE +} + +// Maximum sub authority values present in a SID. +const SID_MAX_SUB_AUTHORITIES = 15 + +/* + * SID structure. + * The Linux equivalent struct is "struct cifs_sid". + */ +type SID struct { + Revision BYTE + // How many DWORD SubAuthority values? Cannot be 0, max possible value is SID_MAX_SUB_AUTHORITIES. + SubAuthorityCount BYTE + // IdentifierAuthority is in big endian format. + IdentifierAuthority [6]BYTE + // SubAuthorityCount SubAuthority DWORDs. + SubAuthority [1]DWORD +} + +/* + * Header at the beginning of every ACE. + */ +type ACE_HEADER struct { + AceType BYTE + AceFlags BYTE + AceSize WORD +} + +/* + * Single ACE (Access Check Entry). + * One or more of these are contained in ACL. + * The Linux equivalent struct is "struct cifs_ace". + */ +type ACCESS_ALLOWED_ACE struct { + Header ACE_HEADER + // What permissions is this ACE controlling? + AccessMask DWORD + // SID to which these permissions apply. + Sid SID +} + +/* + * Binary ACL format. Used for both DACL and SACL. + * The Linux equivalent struct is "struct cifs_acl". + */ +type ACL struct { + AclRevision BYTE + Sbz1 BYTE + AclSize WORD + AceCount WORD + Sbz2 WORD +} + +type AnySID struct { + Revision byte + SubAuthorityCount byte + IdentifierAuthority [6]byte + SubAuthority []uint32 +} + +// TODO: Validate completeness/correctness. 
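// Editor's aside (illustrative, not part of this patch): entries in the table below
// expand to the usual well-known SIDs; the exported CanonicalizeSid defined further
// down turns a shorthand into its numeric form, e.g.
//
//	CanonicalizeSid("BA") // "S-1-5-32-544" (Builtin Administrators)
//	CanonicalizeSid("SY") // "S-1-5-18"     (Local SYSTEM)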
+var wellKnownSidShortcuts = map[string]AnySID{ + "WD": {SID_REVISION, 1, SECURITY_WORLD_SID_AUTHORITY, []uint32{SECURITY_NULL_RID}}, + + "CO": {SID_REVISION, 1, SECURITY_CREATOR_SID_AUTHORITY, []uint32{SECURITY_CREATOR_OWNER_RID}}, + "CG": {SID_REVISION, 1, SECURITY_CREATOR_SID_AUTHORITY, []uint32{SECURITY_CREATOR_GROUP_RID}}, + "OW": {SID_REVISION, 1, SECURITY_CREATOR_SID_AUTHORITY, []uint32{SECURITY_CREATOR_OWNER_RIGHTS_RID}}, + + "NU": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_NETWORK_RID}}, + "IU": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_INTERACTIVE_RID}}, + "SU": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_SERVICE_RID}}, + "AN": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_ANONYMOUS_LOGON_RID}}, + "ED": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_ENTERPRISE_CONTROLLERS_RID}}, + "PS": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_PRINCIPAL_SELF_RID}}, + "AU": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_AUTHENTICATED_USER_RID}}, + "RC": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_RESTRICTED_CODE_RID}}, + "SY": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_LOCAL_SYSTEM_RID}}, + "LS": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_LOCAL_SERVICE_RID}}, + "NS": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_NETWORK_SERVICE_RID}}, + "WR": {SID_REVISION, 1, SECURITY_NT_AUTHORITY, []uint32{SECURITY_WRITE_RESTRICTED_CODE_RID}}, + + "BA": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ADMINS}}, + "BU": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_USERS}}, + "BG": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_GUESTS}}, + "PU": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_POWER_USERS}}, + "AO": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ACCOUNT_OPS}}, + "SO": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_SYSTEM_OPS}}, + "PO": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_PRINT_OPS}}, + "BO": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_BACKUP_OPS}}, + "RE": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_REPLICATOR}}, + "RU": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_PREW2KCOMPACCESS}}, + "RD": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_REMOTE_DESKTOP_USERS}}, + "NO": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_NETWORK_CONFIGURATION_OPS}}, + + "MU": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_MONITORING_USERS}}, + "LU": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_LOGGING_USERS}}, + "IS": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_IUSERS}}, + "CY": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_CRYPTO_OPERATORS}}, + "ER": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, 
DOMAIN_ALIAS_RID_EVENT_LOG_READERS_GROUP}}, + "CD": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_CERTSVC_DCOM_ACCESS_GROUP}}, + "RA": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_RDS_REMOTE_ACCESS_SERVERS}}, + "ES": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_RDS_ENDPOINT_SERVERS}}, + "MS": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_RDS_MANAGEMENT_SERVERS}}, + "HA": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_HYPER_V_ADMINS}}, + "AA": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ACCESS_CONTROL_ASSISTANCE_OPS}}, + "RM": {SID_REVISION, 2, SECURITY_NT_AUTHORITY, []uint32{SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_REMOTE_MANAGEMENT_USERS}}, + + "LW": {SID_REVISION, 1, SECURITY_MANDATORY_LABEL_AUTHORITY, []uint32{SECURITY_MANDATORY_LOW_RID}}, + "ME": {SID_REVISION, 1, SECURITY_MANDATORY_LABEL_AUTHORITY, []uint32{SECURITY_MANDATORY_MEDIUM_RID}}, + "MP": {SID_REVISION, 1, SECURITY_MANDATORY_LABEL_AUTHORITY, []uint32{SECURITY_MANDATORY_MEDIUM_PLUS_RID}}, + "HI": {SID_REVISION, 1, SECURITY_MANDATORY_LABEL_AUTHORITY, []uint32{SECURITY_MANDATORY_HIGH_RID}}, + "SI": {SID_REVISION, 1, SECURITY_MANDATORY_LABEL_AUTHORITY, []uint32{SECURITY_MANDATORY_SYSTEM_RID}}, + "AC": {SID_REVISION, 2, SECURITY_APP_PACKAGE_AUTHORITY, []uint32{SECURITY_APP_PACKAGE_BASE_RID, SECURITY_BUILTIN_PACKAGE_ANY_PACKAGE}}, + + "AS": {SID_REVISION, 1, SECURITY_AUTHENTICATION_AUTHORITY, []uint32{SECURITY_AUTHENTICATION_AUTHORITY_ASSERTED_RID}}, + "SS": {SID_REVISION, 1, SECURITY_AUTHENTICATION_AUTHORITY, []uint32{SECURITY_AUTHENTICATION_SERVICE_ASSERTED_RID}}, +} + +// TODO: Validate completeness/correctness. +var domainRidShortcuts = map[string]uint32{ + "RO": DOMAIN_GROUP_RID_ENTERPRISE_READONLY_DOMAIN_CONTROLLERS, + "LA": DOMAIN_USER_RID_ADMIN, + "LG": DOMAIN_USER_RID_GUEST, + "DA": DOMAIN_GROUP_RID_ADMINS, + "DU": DOMAIN_GROUP_RID_USERS, + "DG": DOMAIN_GROUP_RID_GUESTS, + "DC": DOMAIN_GROUP_RID_COMPUTERS, + "DD": DOMAIN_GROUP_RID_CONTROLLERS, + "CA": DOMAIN_GROUP_RID_CERT_ADMINS, + "SA": DOMAIN_GROUP_RID_SCHEMA_ADMINS, + "EA": DOMAIN_GROUP_RID_ENTERPRISE_ADMINS, + "PA": DOMAIN_GROUP_RID_POLICY_ADMINS, + "CN": DOMAIN_GROUP_RID_CLONEABLE_CONTROLLERS, + "AP": DOMAIN_GROUP_RID_PROTECTED_USERS, + "KA": DOMAIN_GROUP_RID_KEY_ADMINS, + "EK": DOMAIN_GROUP_RID_ENTERPRISE_KEY_ADMINS, + "RS": DOMAIN_ALIAS_RID_RAS_SERVERS, +} + +/****************************************************************************/ + +// Test whether sd refers to a valid Security Descriptor. +// We do some basic validations of the SECURITY_DESCRIPTOR_RELATIVE header. +// 'flags' is used to convey what all information does the caller want us to verify in the binary SD. +func sdRelativeIsValid(sd []byte, flags SECURITY_INFORMATION) error { + if len(sd) < int(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) { + return fmt.Errorf("sd too small (%d bytes)", len(sd)) + } + + // Fetch various fields of the Security Descriptor. + revision := sd[0] + sbz1 := sd[1] + control := binary.LittleEndian.Uint16(sd[2:4]) + offsetOwner := binary.LittleEndian.Uint32(sd[4:8]) + offsetGroup := binary.LittleEndian.Uint32(sd[8:12]) + offsetSacl := binary.LittleEndian.Uint32(sd[12:16]) + offsetDacl := binary.LittleEndian.Uint32(sd[16:20]) + + // Now validate sanity of these fields. 
+ if revision != SDDL_REVISION { + return fmt.Errorf("Invalid SD revision (%d), expected %d", revision, SDDL_REVISION) + } + + if sbz1 != 0 { + return fmt.Errorf("sbz1 must be 0, is %d", sbz1) + } + + // SE_SELF_RELATIVE must be set. + if (control & SE_SELF_RELATIVE) == 0 { + return fmt.Errorf("SE_SELF_RELATIVE control bit must be set (control=0x%x)", control) + } + + // Caller wants us to validate DACL information? + if (flags & DACL_SECURITY_INFORMATION) != 0 { + // SE_DACL_PRESENT bit MUST be *always* set. + if (control & SE_DACL_PRESENT) == 0 { + return fmt.Errorf("SE_DACL_PRESENT control bit must always be set (control=0x%x)", control) + } + + // offsetDacl may be 0 which would mean "No ACLs" aka "allow all users". + // If non-zero, OffsetDacl must point inside the relative Security Descriptor. + if offsetDacl != 0 && offsetDacl+uint32(unsafe.Sizeof(ACL{})) > uint32(len(sd)) { + return fmt.Errorf("DACL (offsetDacl=%d) must lie within sd (length=%d)", offsetDacl, len(sd)) + } + } + + // Caller wants us to validate SACL information? + if (flags & SACL_SECURITY_INFORMATION) != 0 { + // SE_SACL_PRESENT bit is optional. If not set it means there is no SACL present. + if (control&SE_SACL_PRESENT) != 0 && offsetSacl != 0 { + // OffsetSacl must point inside the relative Security Descriptor. + if offsetSacl+uint32(unsafe.Sizeof(ACL{})) > uint32(len(sd)) { + return fmt.Errorf("SACL (offsetSacl=%d) must lie within sd (length=%d)", offsetSacl, len(sd)) + } + } + } + + // Caller wants us to validate OwnerSID? + if (flags & OWNER_SECURITY_INFORMATION) != 0 { + if offsetOwner == 0 { + return fmt.Errorf("offsetOwner must not be 0") + } + + // OffsetOwner must point inside the relative Security Descriptor. + if offsetOwner+uint32(unsafe.Sizeof(SID{})) > uint32(len(sd)) { + return fmt.Errorf("OwnerSID (offsetOwner=%d) must lie within sd (length=%d)", + offsetOwner, len(sd)) + } + } + + // Caller wants us to validate GroupSID? + if (flags & GROUP_SECURITY_INFORMATION) != 0 { + if offsetGroup == 0 { + return fmt.Errorf("offsetGroup must not be 0") + } + + // OffsetGroup must point inside the relative Security Descriptor. + if offsetGroup+uint32(unsafe.Sizeof(SID{})) > uint32(len(sd)) { + return fmt.Errorf("GroupSID (offsetGroup=%d) must lie within sd (length=%d)", + offsetGroup, len(sd)) + } + } + + return nil +} + +// sidToString returns a stringified version of a binary SID object contained in sidSlice. +// The layout of the binary SID object is as per "SID struct". +func sidToString(sidSlice []byte) (string, error) { + // Ensure we have enough bytes till SID.IdentifierAuthority. + if len(sidSlice) < 8 { + return "", fmt.Errorf("Invalid binary SID [size (%d) < 8]", len(sidSlice)) + } + + // SID.Revision. + revision := sidSlice[:1][0] + + // SID.SubAuthorityCount. + subAuthorityCount := sidSlice[1:2][0] + + // Ensure we have enough bytes for subAuthorityCount authority values, where each is a 4-byte DWORD + // in little endian format. + if len(sidSlice) < int(8+(4*subAuthorityCount)) { + return "", fmt.Errorf("Invalid binary SID [subAuthorityCount=%d, size (%d) < %d]", + subAuthorityCount, len(sidSlice), (8 + (4 * subAuthorityCount))) + } + + // SID.IdentifierAuthority. + // The 48-bit authority is laid out in big endian format. 
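// Editor's note (illustrative, not part of this patch): the Local SYSTEM SID S-1-5-18,
// for example, arrives here as the 12 bytes 01 01 00 00 00 00 00 05 12 00 00 00 --
// Revision=1, SubAuthorityCount=1, the 6-byte big-endian IdentifierAuthority=5,
// then one little-endian DWORD SubAuthority 0x12 (= 18).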
+ authorityHigh := uint64(binary.BigEndian.Uint16(sidSlice[2:4])) + authorityLow := uint64(binary.BigEndian.Uint32(sidSlice[4:8])) + authority := (authorityHigh<<32 | authorityLow) + + sidString := fmt.Sprintf("S-%d-%d", revision, authority) + + // Offset to start of SID.SubAuthority array. + offset := 8 + + // Parse and include all SubAuthority values in the SID string. + for i := 0; i < int(subAuthorityCount); i++ { + sidString += fmt.Sprintf("-%d", binary.LittleEndian.Uint32(sidSlice[offset:offset+4])) + offset += 4 + } + + return sidString, nil +} + +// Return the next token (after '-' till the next '-' or end of string) from 'sidString' and the remaining +// sidString after the token. +func getNextToken(sidString string) (string /* token */, string /* remaining sidString */) { + token := "" + charsProcessed := 0 + + for _, c := range sidString { + charsProcessed++ + if c == '-' { + break + } + token += string(c) + } + + return token, sidString[charsProcessed:] +} + +// stringToSid converts the string sid into a byte slice in the form of "struct SID". +// The returned byte slice can be copied to fill the sid in a binary Security Descriptor in the form of +// struct SECURITY_DESCRIPTOR_RELATIVE. +func stringToSid(sidString string) ([]byte, error) { + // Allocate a byte slice large enough to hold the binary SID. + maxSidBytes := int(unsafe.Sizeof(SID{}) + (unsafe.Sizeof(uint32(0)) * SID_MAX_SUB_AUTHORITIES)) + sid := make([]byte, maxSidBytes) + + sidStringOriginal := sidString + offset := 0 + + if (sidString[0] == 'S' || sidString[0] == 's') && sidString[1] == '-' { /* S-R-I-S-S */ + // R-I-S-S. + sidString = sidString[2:] + var subAuthorityCount byte = 0 + + token := "" + tokenIdx := 0 + for sidString != "" { + token, sidString = getNextToken(sidString) + + if tokenIdx == 0 { + // SID.Revision. + revision, err := strconv.ParseUint(token, 10, 8) + if err != nil { + return nil, fmt.Errorf("stringToSid: Error parsing Revision: %v", err) + } + if revision != SID_REVISION { + return nil, fmt.Errorf("stringToSid: Invalid SID Revision %d", revision) + } + sid[0] = byte(revision) + // Increment offset by 2 as we will fill SubAuthorityCount later. + offset += 2 + } else if tokenIdx == 1 { + // SID.IdentifierAuthority. + authority, err := strconv.ParseUint(token, 10, 32) + if err != nil { + return nil, fmt.Errorf("stringToSid: Error parsing IdentifierAuthority: %v", err) + } + authorityHigh := uint16(authority >> 32) + authorityLow := uint32(authority & 0xFFFFFFFF) + binary.BigEndian.PutUint16(sid[2:4], authorityHigh) + binary.BigEndian.PutUint32(sid[4:8], authorityLow) + offset += 6 + } else { + // SID.SubAuthority[]. + subAuth, err := strconv.ParseUint(token, 10, 32) + if err != nil { + // If not numeric, maybe domain RID, but domain RID must be the last component. + if rid, ok := domainRidShortcuts[token]; ok { + if sidString != "" { + return nil, fmt.Errorf("Domain RID (%s) seen but is not the last SubAuthority. SID=%s", token, sidStringOriginal) + } + subAuth = uint64(rid) + } else { + return nil, err + } + } + binary.LittleEndian.PutUint32(sid[offset:offset+4], uint32(subAuth)) + offset += 4 + subAuthorityCount++ + } + + tokenIdx++ + } + + // Now we know SubAuthorityCount, fill it. + sid[1] = subAuthorityCount + + } else { + // String SID like "BA"? + if wks, ok := wellKnownSidShortcuts[sidString]; ok { + // SID.Revision. + sid[0] = wks.Revision + // SID.SubAuthorityCount. + sid[1] = wks.SubAuthorityCount + // SID.IdentifierAuthority. 
+ copy(sid[2:8], wks.IdentifierAuthority[:]) + + offset = 8 + for i := 0; i < int(wks.SubAuthorityCount); i++ { + // SID.SubAuthority[]. + binary.LittleEndian.PutUint32(sid[offset:offset+4], wks.SubAuthority[i]) + offset += 4 + } + } else if rid, ok := domainRidShortcuts[sidString]; ok { + // Domain RID like "DU"? + // TODO: Add domain RID support. We need to prefix the domain SID. + fmt.Printf("Got well known RID %d\n", rid) + + panic("Domain RIDs not yet implemented!") + } else { + return nil, fmt.Errorf("Invalid SID: %s", sidStringOriginal) + } + + } + + return sid[:offset], nil +} + +// Return a string representation of the 4-byte ACE rights. +func aceRightsToString(aceRights uint32) string { + /* + * Check if the aceRights exactly maps to a shorthand name. + */ + if v, ok := aceRightsToStringMap[aceRights]; ok { + return v + } + + /* + * Check if the rights can be expressed as a concatenation of shorthand names. + * Only if we can map all the OR'ed rights to shorthand names, we use it. + */ + aceRightsString := "" + var allRights uint32 = 0 + + for k, v := range aceRightsToStringMap { + if (aceRights & k) == k { + aceRightsString += v + allRights |= k + } + } + + // Use stringified rights only if *all* available rights can be represented with a shorthand name. + // The else part is commented as it's being hit too often. One such common aceRights value is 0x1200a9. + if allRights == aceRights { + return aceRightsString + } + /* + else if allRights != 0 { + fmt.Printf("aceRightsString: Only partial rights could be stringified (aceRights=0x%x, allRights=0x%x)", + aceRights, allRights) + } + */ + + // Fallback to integral mask value. + return fmt.Sprintf("0x%x", aceRights) +} + +// Does the aceType correspond to an object ACE? +// We don't support object ACEs. +func isObjectAce(aceType byte) bool { + switch aceType { + case ACCESS_ALLOWED_OBJECT_ACE_TYPE, + ACCESS_DENIED_OBJECT_ACE_TYPE, + SYSTEM_AUDIT_OBJECT_ACE_TYPE, + SYSTEM_ALARM_OBJECT_ACE_TYPE, + ACCESS_ALLOWED_CALLBACK_OBJECT_ACE_TYPE, + ACCESS_DENIED_CALLBACK_OBJECT_ACE_TYPE, + SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE, + SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE: + return true + + default: + return false + } +} + +// Returns true for aceTypes that we support. +// TODO: Allow SACL ACE type, conditional ACE Types. +func isUnsupportedAceType(aceType byte) bool { + switch aceType { + case ACCESS_ALLOWED_ACE_TYPE, + ACCESS_DENIED_ACE_TYPE: + return false + default: + return true + } +} + +// Convert numeric ace type to string. +func aceTypeToString(aceType BYTE) (string, error) { + for k, v := range aceTypeStringMap { + if v == aceType { + return k, nil + } + } + + return "", fmt.Errorf("Unknown aceType: %d", aceType) +} + +// aceToString returns a stringified version of a binary ACE object contained in aceSlice. +// The layout of the binary ACE object is as per "struct ACCESS_ALLOWED_ACE". +func aceToString(aceSlice []byte) (string, error) { + // We access 8 bytes in this function, ensure we have at least 8 bytes. + if len(aceSlice) < 8 { + return "", fmt.Errorf("Short aceSlice: %d bytes", len(aceSlice)) + } + + aceString := "(" + + // ACCESS_ALLOWED_ACE.Header.AceType. + aceType := aceSlice[:1][0] + + // This is our gatekeeper for blocking unsupported ace types. + // We open up ACEs as we add support for them. + if isUnsupportedAceType(aceType) { + return "", fmt.Errorf("Unsupported ACE type: 0x%x", aceType) + } + + // ACCESS_ALLOWED_ACE.Header.AceFlags. + aceFlags := aceSlice[1:2][0] + // ACCESS_ALLOWED_ACE.AccessMask. 
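// Editor's note (not part of this patch): the ACE bytes decoded here follow the
// ACE_HEADER / ACCESS_ALLOWED_ACE layout above:
//	[0] AceType   [1] AceFlags   [2:4] AceSize (LE)   [4:8] AccessMask (LE)   [8:] SID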
+ aceRights := binary.LittleEndian.Uint32(aceSlice[4:8]) + + aceTypeString, err := aceTypeToString(BYTE(aceType)) + if err != nil { + return "", fmt.Errorf("aceToString: %v", err) + } + aceString += aceTypeString + aceString += ";" + + if (aceFlags & CONTAINER_INHERIT_ACE) != 0 { + aceString += "CI" + } + if (aceFlags & OBJECT_INHERIT_ACE) != 0 { + aceString += "OI" + } + if (aceFlags & NO_PROPAGATE_INHERIT_ACE) != 0 { + aceString += "NP" + } + if (aceFlags & INHERIT_ONLY_ACE) != 0 { + aceString += "IO" + } + if (aceFlags & INHERITED_ACE) != 0 { + aceString += "ID" + } + if (aceFlags & SUCCESSFUL_ACCESS_ACE_FLAG) != 0 { + aceString += "SA" + } + if (aceFlags & FAILED_ACCESS_ACE_FLAG) != 0 { + aceString += "FA" + } + if (aceType == SYSTEM_ACCESS_FILTER_ACE_TYPE) && (aceFlags&TRUST_PROTECTED_FILTER_ACE_FLAG) != 0 { + aceString += "TP" + } + if (aceFlags & CRITICAL_ACE_FLAG) != 0 { + aceString += "CR" + } + + aceString += ";" + aceString += aceRightsToString(aceRights) + aceString += ";" + + // TODO: Empty object_guid;inherit_object_guid. + aceString += ";" + aceString += ";" + + sidoffset := 8 + sidStr, err := sidToString(aceSlice[sidoffset:]) + if err != nil { + return "", fmt.Errorf("aceToString: sidToString failed: %v", err) + } + aceString += sidStr + aceString += ")" + + return aceString, nil +} + +// Given the entrire xattr value buffer, return the SD revision. +func getRevision(sd []byte) BYTE { + if len(sd) < 1 { + return 0 + } + + // SECURITY_DESCRIPTOR_RELATIVE.Revision. + return BYTE(sd[0]) +} + +// Given the entrire xattr value buffer, return the owner sid string. +func getOwnerSidString(sd []byte) (string, error) { + // Make sure we have enough bytes to safely read the required fields. + if len(sd) < int(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) { + return "", fmt.Errorf("Short Security Descriptor: %d bytes!", len(sd)) + } + + // Only valid revision is 1, verify that. + revision := getRevision(sd) + if revision != SID_REVISION { + return "", fmt.Errorf("Invalid SID revision (%d), expected %d!", revision, SID_REVISION) + } + + // SECURITY_DESCRIPTOR_RELATIVE.OffsetOwner. + offsetOwner := binary.LittleEndian.Uint32(sd[4:8]) + if offsetOwner >= uint32(len(sd)) { + return "", fmt.Errorf("offsetOwner (%d) points outside Security Descriptor of size %d bytes!", + offsetOwner, len(sd)) + } + + sidStr, err := sidToString(sd[offsetOwner:]) + if err != nil { + return "", err + } + return "O:" + sidStr, nil +} + +// Given the entrire xattr value buffer, return the primary group sid string. +func getGroupSidString(sd []byte) (string, error) { + // Make sure we have enough bytes to safely read the required fields. + if len(sd) < int(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) { + return "", fmt.Errorf("Short Security Descriptor: %d bytes!", len(sd)) + } + + // Only valid revision is 1, verify that. + revision := getRevision(sd) + if revision != 1 { + return "", fmt.Errorf("Invalid SD revision (%d), expected 1!", revision) + } + + // SECURITY_DESCRIPTOR_RELATIVE.OffsetGroup. + offsetGroup := binary.LittleEndian.Uint32(sd[8:12]) + if offsetGroup >= uint32(len(sd)) { + return "", fmt.Errorf("offsetGroup (%d) points outside Security Descriptor of size %d bytes!", + offsetGroup, len(sd)) + } + + sidStr, err := sidToString(sd[offsetGroup:]) + if err != nil { + return "", err + } + return "G:" + sidStr, nil +} + +// Given the entrire xattr value buffer, return the DACL string. 
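// Editor's aside (illustrative, not part of this patch): for a typical auto-inherited
// file DACL the result looks like
//
//	"D:AI(A;ID;0x1200a9;;;S-1-5-32-545)(A;ID;0x1f01ff;;;S-1-5-18)"
//
// i.e. "D:" + control flags + one "(type;flags;rights;;;sid)" group per ACE; most file
// rights render as hex because aceRightsToStringMap is intentionally kept small.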
+func getDaclString(sd []byte) (string, error) { + // Make sure we have enough bytes to safely read the required fields. + if len(sd) < int(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) { + return "", fmt.Errorf("Short Security Descriptor: %d bytes!", len(sd)) + } + + // Only valid revision is 1, verify that. + revision := getRevision(sd) + if revision != SDDL_REVISION { + return "", fmt.Errorf("Invalid SD revision (%d), expected %d!", revision, SDDL_REVISION) + } + + // SECURITY_DESCRIPTOR_RELATIVE.Control. + control := binary.LittleEndian.Uint16(sd[2:4]) + + // DACL not present? + // + // Note: I have observed that Windows always sets SE_DACL_PRESENT even if we save a binary SD with + // SE_DACL_PRESENT cleared, so we don't expect the following but we still have it for resilience. + // Since user has not specified SE_DACL_PRESENT, it means he doesn't want to set any ACLs, which means + // he wants to "allow all users", hence "D:NO_ACCESS_CONTROL" is most appropriate. + // If we just return "D:" it would mean user wants access control but has not specified any ACEs, which + // would instead mean "allow nobody". + // + if (control & SE_DACL_PRESENT) == 0 { + fmt.Printf("[UNEXPECTED] SE_DACL_PRESENT bit not set, control word is 0x%x", control) + return "D:NO_ACCESS_CONTROL", nil + } + + daclString := "D:" + + dacl_flags := "" + if (control & SE_DACL_PROTECTED) != 0 { + dacl_flags += "P" + } + if (control & SE_DACL_AUTO_INHERIT_REQ) != 0 { + dacl_flags += "AR" + } + if (control & SE_DACL_AUTO_INHERITED) != 0 { + dacl_flags += "AI" + } + daclString += dacl_flags + + // SE_DACL_AUTO_INHERITED.OffsetDacl. + dacloffset := binary.LittleEndian.Uint32(sd[16:20]) + + if dacloffset == 0 { + // dacloffset==0 means that user doesn't want any explicit ACL to be set, which means "allow all users". + // This can be represented as "D:NO_ACCESS_CONTROL". + daclString += "NO_ACCESS_CONTROL" + + return daclString, nil + } + + if (dacloffset + 8) > uint32(len(sd)) { + return "", fmt.Errorf("dacloffset (%d) points outside Security Descriptor of size %d bytes!", + dacloffset+8, len(sd)) + } + + // ACL.AclRevision. + aclRevision := sd[dacloffset] + + // + // Though we support only ACCESS_ALLOWED_ACE_TYPE and ACCESS_DENIED_ACE_TYPE which as per docs should be + // present with ACL revision 2, but I've seen some objects with these ACE types but acl revision 4. + // Instead of failing here, we let it proceed. Later isUnsupportedAceType() will catch unsupported ACE types. + // + // https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/20233ed8-a6c6-4097-aafa-dd545ed24428 + // + if aclRevision != ACL_REVISION && aclRevision != ACL_REVISION_DS { + // More importantly we don't support Object ACEs (ACL_REVISION_DS). + return "", fmt.Errorf("Invalid ACL Revision (%d), valid values are 2 and 4.", aclRevision) + } + + // ACL.AceCount. + numAces := binary.LittleEndian.Uint32(sd[dacloffset+4 : dacloffset+8]) + + // Offset of the first ACE. + offset := dacloffset + 8 + + // Go over all the ACEs and stringify them. + // If numAces is 0 it'll result in daclString to have only flags and no ACEs. + // Such an ACL will mean "allow nobody". + for i := 0; i < int(numAces); i++ { + if (offset + 4) > uint32(len(sd)) { + return "", fmt.Errorf("Short ACE (offset=%d), Security Descriptor size=%d bytes!", + offset, len(sd)) + } + + // ACCESS_ALLOWED_ACE.Header.AceSize. 
+ ace_size := uint32(binary.LittleEndian.Uint16(sd[offset+2 : offset+4])) + + if (offset + ace_size) > uint32(len(sd)) { + return "", fmt.Errorf("ACE (offset=%d, ace_size=%d) lies outside Security Descriptor of size %d bytes!", offset, ace_size, len(sd)) + } + + aceStr, err := aceToString(sd[offset : offset+ace_size]) + if err != nil { + return "", err + } + daclString += aceStr + offset += ace_size + } + + return daclString, nil +} + +// getBinarySdSizeFromSDDLString returns the estimated number of bytes enough for binary representation of the +// given SDDLString in the SECURITY_DESCRIPTOR_RELATIVE form. +func getBinarySdSizeFromSDDLString(parsedSDDL SDDLString) uint32 { + // Maximum possible binary SID size. + maxSidBytes := uint32(unsafe.Sizeof(SID{}) + (unsafe.Sizeof(uint32(0)) * SID_MAX_SUB_AUTHORITIES)) + + sdSize := uint32(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) + + if parsedSDDL.OwnerSID != "" { + sdSize += maxSidBytes + } + + if parsedSDDL.GroupSID != "" { + sdSize += maxSidBytes + } + + if parsedSDDL.DACL.Flags != "" || len(parsedSDDL.DACL.ACLEntries) != 0 { + sdSize += uint32(unsafe.Sizeof(ACL{})) + sdSize += (uint32(unsafe.Sizeof(ACCESS_ALLOWED_ACE{})) + maxSidBytes) * uint32(len(parsedSDDL.DACL.ACLEntries)) + } + + if parsedSDDL.SACL.Flags != "" || len(parsedSDDL.SACL.ACLEntries) != 0 { + sdSize += uint32(unsafe.Sizeof(ACL{})) + sdSize += (uint32(unsafe.Sizeof(ACCESS_ALLOWED_ACE{})) + maxSidBytes) * uint32(len(parsedSDDL.SACL.ACLEntries)) + } + + return sdSize +} + +/**************************************************************************** + ** Exported APIs ** + ****************************************************************************/ + +// GetControl returns the security descriptor control bits. +func GetControl(sd []byte) (SECURITY_DESCRIPTOR_CONTROL, error) { + if len(sd) < 4 { + return 0, fmt.Errorf("SECURITY_DESCRIPTOR too small (%d bytes)", len(sd)) + } + return SECURITY_DESCRIPTOR_CONTROL(binary.LittleEndian.Uint16(sd[2:4])), nil +} + +// SetControl sets the requested control bits in the given security descriptor. +func SetControl(sd []byte, controlBitsOfInterest, controlBitsToSet SECURITY_DESCRIPTOR_CONTROL) error { + // GetControl() also does min length check for sd. + control, err := GetControl(sd) + if err != nil { + return err + } + + control = (control & ^controlBitsOfInterest) | controlBitsToSet + binary.LittleEndian.PutUint16(sd[2:4], uint16(control)) + + return nil +} + +// Convert a possibly non-numeric SID to numeric SID. +func CanonicalizeSid(sidString string) (string, error) { + // Convert to binary SID and back to canonicalize it. + sidSlice, err := stringToSid(sidString) + if err != nil { + return "", err + } + + canonicalSid, err := sidToString(sidSlice) + if err != nil { + return "", err + } + + return canonicalSid, nil +} + +// SecurityDescriptorToString returns an SDDL format string corresponding to the passed in binary Security Descriptor +// in SECURITY_DESCRIPTOR_RELATIVE format. +func SecurityDescriptorToString(sd []byte) (string, error) { + // We support only DACL/Owner/Group. + // TODO: Add support for SACL. + const flags SECURITY_INFORMATION = (DACL_SECURITY_INFORMATION | OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION) + + // Ensure Security Descriptor is valid so that rest of the code can safely access various fields. 
+ if err := sdRelativeIsValid(sd, flags); err != nil {
+ return "", fmt.Errorf("SecurityDescriptorToString: %v", err)
+ }
+
+ ownerSidString, err := getOwnerSidString(sd)
+ if err != nil {
+ return "", fmt.Errorf("SecurityDescriptorToString: getOwnerSidString failed: %v", err)
+ }
+
+ groupSidString, err := getGroupSidString(sd)
+ if err != nil {
+ return "", fmt.Errorf("SecurityDescriptorToString: getGroupSidString failed: %v", err)
+ }
+
+ daclString, err := getDaclString(sd)
+ if err != nil {
+ return "", fmt.Errorf("SecurityDescriptorToString: getDaclString failed: %v", err)
+ }
+
+ sddlString := ownerSidString + groupSidString + daclString
+
+ return sddlString, nil
+}
+
+// SecurityDescriptorFromString converts a SDDL formatted string into a binary Security Descriptor in
+// SECURITY_DESCRIPTOR_RELATIVE format.
+func SecurityDescriptorFromString(sddlString string) ([]byte, error) {
+
+ // Since the NO_ACCESS_CONTROL friendly flag does not have a corresponding binary flag, we return it separately
+ // as a boolean. The caller can then act appropriately.
+ aclFlagsToControlBitmap := func(aclFlags string, forSacl bool) (SECURITY_DESCRIPTOR_CONTROL, bool, error) {
+ var control SECURITY_DESCRIPTOR_CONTROL = 0
+ var no_access_control bool = false
+
+ for i := 0; i < len(aclFlags); {
+ if aclFlags[i] == 'P' {
+ if forSacl {
+ control |= SE_SACL_PROTECTED
+ } else {
+ control |= SE_DACL_PROTECTED
+ }
+ i++
+ } else if aclFlags[i] == 'A' {
+ if i+1 == len(aclFlags) {
+ return 0, false, fmt.Errorf("Incomplete ACL Flags, ends at 'A': %s", aclFlags)
+ }
+ i++
+ if aclFlags[i] == 'R' { // AR.
+ if forSacl {
+ control |= SE_SACL_AUTO_INHERIT_REQ
+ } else {
+ control |= SE_DACL_AUTO_INHERIT_REQ
+ }
+ i++
+ } else if aclFlags[i] == 'I' { // AI.
+ if forSacl {
+ control |= SE_SACL_AUTO_INHERITED
+ } else {
+ control |= SE_DACL_AUTO_INHERITED
+ }
+ i++
+ } else {
+ return 0, false, fmt.Errorf("Encountered unsupported ACL Flag '%s' after 'A'",
+ string(aclFlags[i]))
+ }
+ } else if aclFlags[i] == 'N' {
+ nacLen := len("NO_ACCESS_CONTROL")
+ if i+nacLen > len(aclFlags) {
+ return 0, false, fmt.Errorf("Incomplete NO_ACCESS_CONTROL Flag: %s", aclFlags)
+ }
+ if aclFlags[i:i+nacLen] == "NO_ACCESS_CONTROL" {
+ // NO_ACCESS_CONTROL seen.
+ no_access_control = true
+ }
+ i += nacLen
+ } else {
+ return 0, false, fmt.Errorf("Encountered unsupported ACL Flag '%s'", string(aclFlags[i]))
+ }
+ }
+
+ return control, no_access_control, nil
+ }
+
+ aceFlagsToByte := func(aceFlags string) (byte, error) {
+ var flags byte = 0
+
+ for i := 0; i < len(aceFlags); {
+ // Must have even number of characters.
+ if i+1 == len(aceFlags) {
+ return byte(0), fmt.Errorf("Invalid aceFlags: %s", aceFlags)
+ }
+
+ flag := aceFlags[i : i+2]
+
+ if flag == "CI" {
+ flags |= CONTAINER_INHERIT_ACE
+ } else if flag == "OI" {
+ flags |= OBJECT_INHERIT_ACE
+ } else if flag == "NP" {
+ flags |= NO_PROPAGATE_INHERIT_ACE
+ } else if flag == "IO" {
+ flags |= INHERIT_ONLY_ACE
+ } else if flag == "ID" {
+ flags |= INHERITED_ACE
+ } else if flag == "SA" {
+ flags |= SUCCESSFUL_ACCESS_ACE_FLAG
+ } else if flag == "FA" {
+ flags |= FAILED_ACCESS_ACE_FLAG
+ } else if flag == "TP" {
+ flags |= TRUST_PROTECTED_FILTER_ACE_FLAG
+ } else if flag == "CR" {
+ flags |= CRITICAL_ACE_FLAG
+ } else {
+ return byte(0), fmt.Errorf("Unsupported aceFlags: %s", aceFlags)
+ }
+
+ i += 2
+ }
+
+ return flags, nil
+ }
+
+ aceRightsToAccessMask := func(aceRights string) (uint32, error) {
+ var accessMask uint32 = 0
+
+ // Hex right string will start with 0x or 0X.
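+ // e.g. "0x1200a9"; otherwise the rights are expected as a run of two-letter SDDL mnemonics such as "FA"
+ // or "FR", resolved through aceStringToRightsMap below (illustrative examples, not an exhaustive list).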
+ if len(aceRights) > 2 && (aceRights[0:2] == "0x" || aceRights[0:2] == "0X") {
+ accessMask, err := strconv.ParseUint(aceRights[2:], 16, 32)
+ if err != nil {
+ return 0, fmt.Errorf("Failed to parse integral aceRights %s: %v", aceRights, err)
+ }
+ return uint32(accessMask), nil
+ }
+
+ for i := 0; i < len(aceRights); {
+ // Must have even number of characters.
+ if i+1 == len(aceRights) {
+ return 0, fmt.Errorf("Invalid aceRights: %s", aceRights)
+ }
+
+ right := aceRights[i : i+2]
+
+ if mask, ok := aceStringToRightsMap[right]; ok {
+ accessMask |= mask
+ } else {
+ return 0, fmt.Errorf("Unknown aceRight(%s): %s", right, aceRights)
+ }
+
+ i += 2
+ }
+
+ return accessMask, nil
+ }
+
+ aclEntryToSlice := func(aclEntry ACLEntry) ([]byte, error) {
+ // ace_type;ace_flags;rights;object_guid;inherit_object_guid;account_sid;(resource_attribute)
+ if len(aclEntry.Sections) != 6 {
+ return nil, fmt.Errorf("aclEntry has %d sections (expected 6)", len(aclEntry.Sections))
+ }
+ // Maximum possible binary SID size.
+ maxSidBytes := int(unsafe.Sizeof(SID{}) + (unsafe.Sizeof(uint32(0)) * SID_MAX_SUB_AUTHORITIES))
+
+ sliceSize := int(unsafe.Sizeof(ACCESS_ALLOWED_ACE{})) + maxSidBytes
+ ace := make([]byte, sliceSize)
+
+ // Base aceSize. We will add SID size to it to get complete ACE size.
+ var aceSize uint16 = 8
+
+ // ACCESS_ALLOWED_ACE.Header.AceType.
+ if aceType, ok := aceTypeStringMap[aclEntry.Sections[0]]; ok {
+ ace[0] = byte(aceType)
+ } else {
+ return nil, fmt.Errorf("Unknown aceType: %s", aclEntry.Sections[0])
+ }
+
+ // ACCESS_ALLOWED_ACE.Header.AceFlags.
+ flags, err := aceFlagsToByte(aclEntry.Sections[1])
+ if err != nil {
+ return nil, fmt.Errorf("Unknown aceFlag %s: %v", aclEntry.Sections[1], err)
+ }
+ ace[1] = flags
+
+ // ACCESS_ALLOWED_ACE.AccessMask.
+ accessMask, err := aceRightsToAccessMask(aclEntry.Sections[2])
+ if err != nil {
+ return nil, fmt.Errorf("Unknown aceRights %s: %v", aclEntry.Sections[2], err)
+ }
+ binary.LittleEndian.PutUint32(ace[4:8], accessMask)
+
+ // TODO: Support object ACEs?
+ if aclEntry.Sections[3] != "" {
+ return nil, fmt.Errorf("object_guid not supported: %s", aclEntry.Sections[3])
+ }
+
+ if aclEntry.Sections[4] != "" {
+ return nil, fmt.Errorf("inherit_object_guid not supported: %s", aclEntry.Sections[4])
+ }
+
+ if aclEntry.Sections[5] != "" {
+ sidSlice, err := stringToSid(aclEntry.Sections[5])
+ if err != nil {
+ return nil, fmt.Errorf("Bad SID (%s): %v", aclEntry.Sections[5], err)
+ }
+ copy(ace[8:8+len(sidSlice)], sidSlice)
+ aceSize += uint16(len(sidSlice))
+ }
+
+ // ACCESS_ALLOWED_ACE.Header.AceSize.
+ binary.LittleEndian.PutUint16(ace[2:4], aceSize)
+
+ return ace[:aceSize], nil
+ }
+
+ // Use sddl.ParseSDDL() instead of reinventing SDDL parsing.
+ parsedSDDL, err := ParseSDDL(sddlString)
+ if err != nil {
+ return nil, fmt.Errorf("ParseSDDL(%s) failed: %v", sddlString, err)
+ }
+
+ // Allocate a byte slice large enough to contain the binary Security Descriptor in SECURITY_DESCRIPTOR_RELATIVE
+ // format.
+ sdSize := getBinarySdSizeFromSDDLString(parsedSDDL)
+ sd := make([]byte, sdSize)
+
+ // Returned Security Descriptor is in Self Relative format.
+ //
+ // Note: We always set SE_DACL_PRESENT as we have observed that Windows always sets that.
+ // It then uses offsetDacl to control whether ACLs are checked or not.
+ // offsetDacl==0 would mean that there are no ACLs and hence the file will have the "allow all users"
+ // permission.
+ // offsetDacl!=0 would cause the ACEs to be inspected from offsetDacl and if there are no ACEs present it + // would mean "allow nobody". + control := SECURITY_DESCRIPTOR_CONTROL(SE_SELF_RELATIVE | SE_DACL_PRESENT) + offsetOwner := 0 + offsetGroup := 0 + offsetDacl := 0 + offsetSacl := 0 + + // sd.Revision. + sd[0] = SDDL_REVISION + // sd.Sbz1. + sd[1] = 0 + + // OwnerSID follows immediately after SECURITY_DESCRIPTOR_RELATIVE header. + offset := 20 + if parsedSDDL.OwnerSID != "" { + offsetOwner = offset + sidSlice, err := stringToSid(parsedSDDL.OwnerSID) + if err != nil { + return nil, err + } + copy(sd[offset:offset+len(sidSlice)], sidSlice) + offset += len(sidSlice) + } + + if parsedSDDL.GroupSID != "" { + offsetGroup = offset + sidSlice, err := stringToSid(parsedSDDL.GroupSID) + if err != nil { + return nil, err + } + copy(sd[offset:offset+len(sidSlice)], sidSlice) + offset += len(sidSlice) + } + + // TODO: Add and audit SACL support. + if parsedSDDL.SACL.Flags != "" || len(parsedSDDL.SACL.ACLEntries) != 0 { + flags, no_access_control, err := aclFlagsToControlBitmap(parsedSDDL.SACL.Flags, true /* forSacl */) + if err != nil { + return nil, fmt.Errorf("Failed to parse SACL Flags %s: %v", parsedSDDL.SACL.Flags, err) + } + control |= flags + + // If NO_ACCESS_CONTROL flag is set we will skip the following, which will result in offsetSacl to be set as 0 + // in the binary SD, which would mean "No ACLs" aka "allow all users". + if !no_access_control { + offsetSacl = offset + + // ACL.AclRevision. + sd[offsetSacl] = ACL_REVISION + // ACL.Sbz1. + sd[offsetSacl+1] = 0 + + // Base aclSize. We will add ACE sizes to it to get complete ACL size. + var aclSize uint16 = 8 + + // ACL.AceCount. + binary.LittleEndian.PutUint16(sd[offsetSacl+4:offsetSacl+6], uint16(len(parsedSDDL.SACL.ACLEntries))) + // ACL.Sbz2. + binary.LittleEndian.PutUint16(sd[offsetSacl+6:offsetSacl+8], 0) + + offset += 8 // struct ACL. + for i := 0; i < len(parsedSDDL.SACL.ACLEntries); i++ { + aceSlice, err := aclEntryToSlice(parsedSDDL.SACL.ACLEntries[i]) + if err != nil { + return nil, err + } + copy(sd[offset:offset+len(aceSlice)], aceSlice) + offset += len(aceSlice) + aclSize += uint16(len(aceSlice)) + } + + // ACL.AclSize. + binary.LittleEndian.PutUint16(sd[offsetSacl+2:offsetSacl+4], aclSize) + + // Put in the end to prevent "unreachable code" complaints from vet. + panic("SACLs not supported!") + } else { + // If NO_ACCESS_CONTROL flag is set, there shouldn't be any ACEs. + // TODO: Is it safer to skip/ignore the ACEs? + if len(parsedSDDL.SACL.ACLEntries) != 0 { + return nil, fmt.Errorf("%d ACEs present along with NO_ACCESS_CONTROL SACL flag (%s): %v", + len(parsedSDDL.SACL.ACLEntries), parsedSDDL.SACL.Flags, err) + } + } + } + + if parsedSDDL.DACL.Flags != "" || len(parsedSDDL.DACL.ACLEntries) != 0 { + flags, no_access_control, err := aclFlagsToControlBitmap(parsedSDDL.DACL.Flags, false /* forSacl */) + if err != nil { + return nil, fmt.Errorf("Failed to parse DACL Flags %s: %v", parsedSDDL.DACL.Flags, err) + } + control |= flags + + // If NO_ACCESS_CONTROL flag is set we will skip the following, which will result in offsetDacl to be set as 0 + // in the binary SD, which would mean "No ACLs" aka "allow all users". + if !no_access_control { + offsetDacl = offset + + // ACL.AclRevision. + sd[offsetDacl] = ACL_REVISION + // ACL.Sbz1. + sd[offsetDacl+1] = 0 + + // Base aclSize. We will add ACE sizes to it to get complete ACL size. + var aclSize uint16 = 8 + + // ACL.AceCount. 
+ binary.LittleEndian.PutUint16(sd[offsetDacl+4:offsetDacl+6], uint16(len(parsedSDDL.DACL.ACLEntries))) + // ACL.Sbz2. + binary.LittleEndian.PutUint16(sd[offsetDacl+6:offsetDacl+8], 0) + + offset += 8 // struct ACL. + for i := 0; i < len(parsedSDDL.DACL.ACLEntries); i++ { + aceSlice, err := aclEntryToSlice(parsedSDDL.DACL.ACLEntries[i]) + if err != nil { + return nil, err + } + copy(sd[offset:offset+len(aceSlice)], aceSlice) + offset += len(aceSlice) + aclSize += uint16(len(aceSlice)) + } + + // ACL.AclSize. + binary.LittleEndian.PutUint16(sd[offsetDacl+2:offsetDacl+4], aclSize) + } else { + // If NO_ACCESS_CONTROL flag is set, there shouldn't be any ACEs. + // TODO: Is it safer to skip/ignore the ACEs? + if len(parsedSDDL.DACL.ACLEntries) != 0 { + return nil, fmt.Errorf("%d ACEs present along with NO_ACCESS_CONTROL DACL flag (%s): %v", + len(parsedSDDL.DACL.ACLEntries), parsedSDDL.DACL.Flags, err) + } + } + } + + // sd.Control. + binary.LittleEndian.PutUint16(sd[2:4], uint16(control)) + // sd.OffsetOwner. + binary.LittleEndian.PutUint32(sd[4:8], uint32(offsetOwner)) + // sd.OffsetGroup. + binary.LittleEndian.PutUint32(sd[8:12], uint32(offsetGroup)) + // sd.OffsetSacl. + binary.LittleEndian.PutUint32(sd[12:16], uint32(offsetSacl)) + // sd.OffsetDacl. + binary.LittleEndian.PutUint32(sd[16:20], uint32(offsetDacl)) + + return sd[:offset], nil +} + +// SetSecurityObject is the equivalent of ntdll.NtSetSecurityObject method. +// It sets the given SECURITY_DESCRIPTOR for the given file. +// flags instructs what all needs to be set. +// sd should be a valid binary SECURITY_DESCRIPTOR_RELATIVE structure as a byte slice. +func SetSecurityObject(path string, flags SECURITY_INFORMATION, sd []byte) error { + var xattrKey string + + if len(sd) < int(unsafe.Sizeof(SECURITY_DESCRIPTOR_RELATIVE{})) { + panic(fmt.Errorf("SetSecurityObject: sd too small (%d bytes)", len(sd))) + } + + // Pick the right xattr key that allows us to pass the needed information to the cifs client. + if flags == DACL_SECURITY_INFORMATION { + // Only DACL. + xattrKey = common.CIFS_XATTR_CIFS_ACL + + // sd.OffsetOwner = 0. + binary.LittleEndian.PutUint32(sd[4:8], 0) + // sd.OffsetGroup = 0. + binary.LittleEndian.PutUint32(sd[8:12], 0) + // sd.OffsetSacl = 0. + binary.LittleEndian.PutUint32(sd[12:16], 0) + } else if flags == (DACL_SECURITY_INFORMATION | OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION) { + // DACL + Owner + Group. + xattrKey = common.CIFS_XATTR_CIFS_NTSD + + // sd.OffsetSacl = 0. + binary.LittleEndian.PutUint32(sd[12:16], 0) + } else if flags == (DACL_SECURITY_INFORMATION | SACL_SECURITY_INFORMATION | + OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION) { + // DACL + SACL + Owner + Group. + xattrKey = common.CIFS_XATTR_CIFS_NTSD_FULL + + // Put in the end to prevent "unreachable code" complaints from vet. + // TODO: Add support for "DACL + SACL + Owner + Group". + // Remove this panic only after rest of the code correctly supports SACL. + panic(fmt.Errorf("SetSecurityObject: Unsupported flags value 0x%x", flags)) + + } else { + panic(fmt.Errorf("SetSecurityObject: Unsupported flags value 0x%x", flags)) + } + + // Ensure Security Descriptor is valid before writing to the cifs client. 
+ if err := sdRelativeIsValid(sd, flags); err != nil { + panic(fmt.Errorf("SetSecurityObject: %v", err)) + } + + err := xattr.Set(path, xattrKey, sd) + if err != nil { + return fmt.Errorf("SetSecurityObject: xattr.Set(%s) failed for file %s: %v", xattrKey, path, err) + } + + return nil +} + +// QuerySecurityObject is the equivalent of ntdll.NtQuerySecurityObject method. +// It fetches the binary SECURITY_DESCRIPTOR for the given file. +// 'flags' instructs what parts of the Security Descriptor needs to be queried. +// Returns a valid binary SECURITY_DESCRIPTOR_RELATIVE structure as a byte slice. +func QuerySecurityObject(path string, flags SECURITY_INFORMATION) ([]byte, error) { + var xattrKey string + + // Pick the right xattr key that allows us to pass the needed information to the cifs client. + if flags == DACL_SECURITY_INFORMATION { + // Only DACL. + xattrKey = common.CIFS_XATTR_CIFS_ACL + } else if flags == (DACL_SECURITY_INFORMATION | OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION) { + // DACL + Owner + Group. + xattrKey = common.CIFS_XATTR_CIFS_NTSD + } else if flags == (DACL_SECURITY_INFORMATION | SACL_SECURITY_INFORMATION | + OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION) { + // DACL + SACL + Owner + Group. + xattrKey = common.CIFS_XATTR_CIFS_NTSD_FULL + + // Put in the end to prevent "unreachable code" complaints from vet. + // TODO: Add support for "DACL + SACL + Owner + Group". + // Remove this panic only after rest of the code correctly supports SACL. + panic(fmt.Errorf("QuerySecurityObject: Unsupported flags value 0x%x", flags)) + } else { + panic(fmt.Errorf("QuerySecurityObject: Unsupported flags value 0x%x", flags)) + } + + sd, err := xattr.Get(path, xattrKey) + if err != nil { + return nil, fmt.Errorf("QuerySecurityObject: xattr.Get(%s, %s) failed: %v", path, xattrKey, err) + } + + // Ensure Security Descriptor returned by the cifs client is fine. + if err := sdRelativeIsValid(sd, flags); err != nil { + // panic because we expect cifs client to return a valid Security Descriptor. + panic(fmt.Errorf("QuerySecurityObject: %v", err)) + } + + return sd, nil +} diff --git a/sddl/sidTranslation_linux.go b/sddl/sidTranslation_linux.go new file mode 100644 index 000000000..4c6f37f0f --- /dev/null +++ b/sddl/sidTranslation_linux.go @@ -0,0 +1,28 @@ +// +build linux + +// Copyright © Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. 
+ +package sddl + +// Note that all usages of OSTranslateSID gracefully handle the error, rather than throwing the error. +func OSTranslateSID(SID string) (string, error) { + return CanonicalizeSid(SID) +} diff --git a/sddl/sidTranslation_other.go b/sddl/sidTranslation_other.go index 07148f3ce..4a4ab9375 100644 --- a/sddl/sidTranslation_other.go +++ b/sddl/sidTranslation_other.go @@ -1,4 +1,5 @@ // +build !windows +// +build !linux // Copyright © Microsoft // diff --git a/ste/JobPartPlan.go b/ste/JobPartPlan.go index bf9bbd975..ef7cd7638 100644 --- a/ste/JobPartPlan.go +++ b/ste/JobPartPlan.go @@ -409,9 +409,8 @@ func (jppt *JobPartPlanTransfer) SetTransferStatus(status common.TransferStatus, if !overWrite { common.AtomicMorphInt32((*int32)(&jppt.atomicTransferStatus), func(startVal int32) (val int32, morphResult interface{}) { - // start value < 0 means that transfer status is already a failed value. - // If current transfer status has already failed value, then it will not be changed. - return common.Iffint32(startVal < 0, startVal, int32(status)), nil + // If current transfer status has some completed value, then it will not be changed. + return common.Iffint32(common.TransferStatus(startVal).StatusLocked(), startVal, int32(status)), nil }) } else { (&jppt.atomicTransferStatus).AtomicStore(status) diff --git a/ste/concurrency.go b/ste/concurrency.go index 6a5ef2423..6ab9e27d1 100644 --- a/ste/concurrency.go +++ b/ste/concurrency.go @@ -49,7 +49,7 @@ func (i *ConfiguredInt) GetDescription() string { func tryNewConfiguredInt(envVar common.EnvironmentVariable) *ConfiguredInt { override := common.GetLifecycleMgr().GetEnvironmentVariable(envVar) if override != "" { - val, err := strconv.ParseInt(override, 10, 64) + val, err := strconv.ParseInt(override, 10, 32) if err != nil { log.Fatalf("error parsing the env %s %q failed with error %v", envVar.Name, override, err) diff --git a/ste/downloader-azureFiles_linux.go b/ste/downloader-azureFiles_linux.go new file mode 100644 index 000000000..87e11198c --- /dev/null +++ b/ste/downloader-azureFiles_linux.go @@ -0,0 +1,230 @@ +// +build linux + +package ste + +import ( + "encoding/binary" + "fmt" + "net/url" + "path/filepath" + "strings" + "sync" + + "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-azcopy/v10/sddl" + "github.com/Azure/azure-storage-file-go/azfile" + + "github.com/pkg/xattr" + "golang.org/x/sys/unix" +) + +// This file implements the linux-triggered smbPropertyAwareDownloader interface. + +// works for both folders and files +func (*azureFilesDownloader) PutSMBProperties(sip ISMBPropertyBearingSourceInfoProvider, txInfo TransferInfo) error { + propHolder, err := sip.GetSMBProperties() + if err != nil { + return fmt.Errorf("Failed to get SMB properties for %s: %w", txInfo.Destination, err) + } + + // Set 32-bit FileAttributes for the file. + setAttributes := func() error { + // This is a safe conversion. + attribs := uint32(propHolder.FileAttributes()) + + xattrbuf := make([]byte, 4) + binary.LittleEndian.PutUint32(xattrbuf, uint32(attribs)) + + err := xattr.Set(txInfo.Destination, common.CIFS_XATTR_ATTRIB, xattrbuf) + if err != nil { + return fmt.Errorf("xattr.Set(%s, %s, 0x%x) failed: %w", + txInfo.Destination, common.CIFS_XATTR_ATTRIB, attribs, err) + } + + return nil + } + + // Set creation time and last write time for the file. + // XXX + // Note: It makes two SMB calls, one for setting the last write time and one for the create time. 
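+ // Roughly: the last write time is applied with utimensat(2) (mtime only, atime is omitted), while the
+ // creation time has no portable Linux setter and is therefore written as Windows ticks through the
+ // user.cifs.creationtime xattr (see setDates below).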
+ // XXX + setDates := func() error { + smbCreation := propHolder.FileCreationTime() + smbLastWrite := propHolder.FileLastWriteTime() + + if txInfo.ShouldTransferLastWriteTime() { + var ts [2]unix.Timespec + + // Don't set atime. + ts[0] = unix.Timespec{unix.UTIME_OMIT, unix.UTIME_OMIT} + + // Set mtime to smbLastWrite. + ts[1] = unix.NsecToTimespec(smbLastWrite.UnixNano()) + + // We follow symlink (no unix.AT_SYMLINK_NOFOLLOW) just like the Windows implementation. + err := unix.UtimesNanoAt(unix.AT_FDCWD, txInfo.Destination, ts[:], 0 /* flags */) + if err != nil { + return fmt.Errorf("unix.UtimesNanoAt failed to set mtime for file %s: %w", + txInfo.Destination, err) + } + } + + // Convert time from "nanoseconds since Unix Epoch" to "ticks since Windows Epoch". + smbCreationTicks := common.UnixNanoToWindowsTicks(smbCreation.UnixNano()) + + xattrbuf := make([]byte, 8) + // This is a safe conversion. + binary.LittleEndian.PutUint64(xattrbuf, uint64(smbCreationTicks)) + + err := xattr.Set(txInfo.Destination, common.CIFS_XATTR_CREATETIME, xattrbuf) + if err != nil { + return fmt.Errorf("xattr.Set(%s, %s, 0x%x) failed: %w", + txInfo.Destination, common.CIFS_XATTR_CREATETIME, smbCreationTicks, err) + } + + return nil + } + + // =========== set file times before we set attributes, to make sure the time-setting doesn't + // reset archive attribute. There's currently no risk of the attribute-setting messing with the times, + // because we only set the last (content) "write time", not the last (metadata) "change time" ===== + + // TODO: Cifs client may cause the ctime to be updated. Need to think in details. + + err = setDates() + if err != nil { + return err + } + return setAttributes() +} + +var globalSetAclMu = &sync.Mutex{} + +// works for both folders and files +func (a *azureFilesDownloader) PutSDDL(sip ISMBPropertyBearingSourceInfoProvider, txInfo TransferInfo) error { + // Let's start by getting our SDDL and parsing it. + sddlString, err := sip.GetSDDL() + + // TODO: be better at handling these errors. + // GetSDDL will fail on a file-level SAS token. + if err != nil { + return fmt.Errorf("Failed to get source SDDL for file %s: %w", txInfo.Destination, err) + } + if sddlString == "" { + // nothing to do (no key returned) + return errorNoSddlFound + } + + // We don't need to worry about making the SDDL string portable as this is expected for persistence into Azure Files in the first place. + sd, err := sddl.SecurityDescriptorFromString(sddlString) + if err != nil { + return fmt.Errorf("Failed to parse SDDL (%s) for file %s: %w", sddlString, txInfo.Destination, err) + } + + ctl, err := sddl.GetControl(sd) + if err != nil { + return fmt.Errorf("Error getting control bits: %w", err) + } + + var securityInfoFlags sddl.SECURITY_INFORMATION = sddl.DACL_SECURITY_INFORMATION + + // remove everything down to the if statement to return to xcopy functionality + // Obtain the destination root and figure out if we're at the top level of the transfer. + destRoot := a.jptm.GetDestinationRoot() + relPath, err := filepath.Rel(destRoot, txInfo.Destination) + if err != nil { + // This should never ever happen. + panic("couldn't find relative path from root") + } + + // Golang did not cooperate with backslashes with filepath.SplitList. 
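+ // Instead we split on the separator detected from the path itself (common.DeterminePathSeparator), so both
+ // "/" and "\" separated paths are handled.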
+ splitPath := strings.Split(relPath, common.DeterminePathSeparator(relPath)) + + // To achieve robocopy like functionality, and maintain the ability to add new permissions in the middle of the copied file tree, + // we choose to protect both already protected files at the source, and to protect the entire root folder of the transfer. + // Protected files and folders experience no inheritance from their parents (but children do experience inheritance) + // To protect the root folder of the transfer, it's not enough to just look at "isTransferRoot" because, in the + // case of downloading a complete share, with strip-top-dir = false (i.e. no trailing /* on the URL), the thing at the transfer + // root is the share, and currently (April 2019) we can't get permissions for the share itself. So we have to "lock"/protect + // the permissions one level down in that case (i.e. for its children). But in the case of downloading from a directory (not the share root) + // then we DO need the check on isAtTransferRoot. + isProtectedAtSource := (ctl & sddl.SE_DACL_PROTECTED) != 0 + isAtTransferRoot := len(splitPath) == 1 + + parsedSDDL, err := sddl.ParseSDDL(sddlString) + if err != nil { + panic(fmt.Sprintf("Sanity check; SDDL failed to parse (downloader-azureFiles_linux.go), %s", err)) // We already parsed it. This is impossible. + } + + /* + via Jason Shay: + One exception is related to the "AI" flag. + If you provide a descriptor to NtSetSecurityObject with just AI (SE_DACL_AUTO_INHERITED), it will not be stored. + If you provide it with SE_DACL_AUTO_INHERITED AND SE_DACL_AUTO_INHERIT_REQ, then SE_DACL_AUTO_INHERITED will be stored (note the _REQ flag is never stored) + + The REST API for Azure Files will see the "AI" in the SDDL, and will do the _REQ flag work in the background for you. + */ + if strings.Contains(parsedSDDL.DACL.Flags, "AI") { + // set the DACL auto-inherit flag, since Windows didn't pick it up for some reason... + err := sddl.SetControl(sd, sddl.SE_DACL_AUTO_INHERITED|sddl.SE_DACL_AUTO_INHERIT_REQ, sddl.SE_DACL_AUTO_INHERITED|sddl.SE_DACL_AUTO_INHERIT_REQ) + if err != nil { + return fmt.Errorf("Failed to persist auto-inherit bit: %w", err) + } + } + + if isProtectedAtSource || isAtTransferRoot || a.parentIsShareRoot(txInfo.Source) { + // TODO: Is setting SE_DACL_PROTECTED control bit equivalent to passing + // PROTECTED_DACL_SECURITY_INFORMATION flag to NtSetSecurityObject()? + // securityInfoFlags |= sddl.PROTECTED_DACL_SECURITY_INFORMATION + err := sddl.SetControl(sd, sddl.SE_DACL_PROTECTED, sddl.SE_DACL_PROTECTED) + if err != nil { + return fmt.Errorf("Failed to set SE_DACL_PROTECTED control bit: %w", err) + } + } + + if txInfo.PreserveSMBPermissions == common.EPreservePermissionsOption.OwnershipAndACLs() { + securityInfoFlags |= sddl.OWNER_SECURITY_INFORMATION | sddl.GROUP_SECURITY_INFORMATION + } + + // Then let's set the security info. + // We don't know or control the order in which we visit + // elements of the tree (e.g. we don't know or care whether we are doing A/B before A/B/C). + // Therefore we must use must use SetNamedSecurityInfo, NOT TreeSetNamedSecurityInfo. + // (TreeSetNamedSecurityInfo, with TREE_SEC_INFO_RESET, would definitely NOT be safe to call in a situation + // where we don't know the order in which we visit elements of the tree). + // TODO: review and potentially remove the use of the global mutex here, once we finish drilling into issues + // observed when setting ACLs concurrently on a test UNC share. 
+ // BTW, testing indicates no measurable perf difference, between using the mutex and not, in the cases tested. + // So it's safe to leave it here for now. + globalSetAclMu.Lock() + + /* + * XXX + * TODO: Why does Windows open the filehandle with InheritHandle set to 1? + * XXX + */ + + defer globalSetAclMu.Unlock() + + err = sddl.SetSecurityObject(txInfo.Destination, securityInfoFlags, sd) + if err != nil { + return fmt.Errorf("permissions could not be restored. It may help to add --%s=false to the AzCopy command line (so that ACLS will be preserved but ownership will not). "+ + " Or, if you want to preserve ownership, then run from a elevated command prompt or from an account in the Backup Operators group, and set the '%s' flag. err=%v", + common.PreserveOwnerFlagName, common.BackupModeFlagName, err) + } + + return err +} + +// TODO: this method may become obsolete if/when we are able to get permissions from the share root +func (a *azureFilesDownloader) parentIsShareRoot(source string) bool { + u, err := url.Parse(source) + if err != nil { + return false + } + f := azfile.NewFileURLParts(*u) + path := f.DirectoryOrFilePath + sep := common.DeterminePathSeparator(path) + splitPath := strings.Split(strings.Trim(path, sep), sep) + return path != "" && len(splitPath) == 1 +} diff --git a/ste/downloader-blob.go b/ste/downloader-blob.go index 2be80b73b..335442b36 100644 --- a/ste/downloader-blob.go +++ b/ste/downloader-blob.go @@ -135,8 +135,9 @@ func (bd *blobDownloader) GenerateDownloadFunc(jptm IJobPartTransferMgr, srcPipe // The retryReader encapsulates any retries that may be necessary while downloading the body jptm.LogChunkStatus(id, common.EWaitReason.Body()) retryReader := get.Body(azblob.RetryReaderOptions{ - MaxRetryRequests: destWriter.MaxRetryPerDownloadBody(), - NotifyFailedRead: common.NewReadLogFunc(jptm, u), + MaxRetryRequests: destWriter.MaxRetryPerDownloadBody(), + NotifyFailedRead: common.NewReadLogFunc(jptm, u), + ClientProvidedKeyOptions: clientProvidedKey, }) defer retryReader.Close() err = destWriter.EnqueueChunk(jptm.Context(), id, length, newPacedResponseBody(jptm.Context(), retryReader, pacer), true) diff --git a/ste/folderCreationTracker.go b/ste/folderCreationTracker.go index c7392b9df..6a7fc3162 100644 --- a/ste/folderCreationTracker.go +++ b/ste/folderCreationTracker.go @@ -37,8 +37,9 @@ func NewFolderCreationTracker(fpo common.FolderPropertyOption, plan *JobPartPlan type nullFolderTracker struct{} -func (f *nullFolderTracker) RecordCreation(folder string) { +func (f *nullFolderTracker) CreateFolder(folder string, doCreation func() error) error { // no-op (the null tracker doesn't track anything) + return doCreation() } func (f *nullFolderTracker) ShouldSetProperties(folder string, overwrite common.OverwriteOption, prompter common.Prompter) bool { @@ -76,12 +77,17 @@ func (f *jpptFolderTracker) RegisterPropertiesTransfer(folder string, transferIn } } -func (f *jpptFolderTracker) RecordCreation(folder string) { +func (f *jpptFolderTracker) CreateFolder(folder string, doCreation func() error) error { f.mu.Lock() defer f.mu.Unlock() if folder == common.Dev_Null { - return // Never persist to dev-null + return nil // Never persist to dev-null + } + + err := doCreation() + if err != nil { + return err } if idx, ok := f.contents[folder]; ok { @@ -92,6 +98,8 @@ func (f *jpptFolderTracker) RecordCreation(folder string) { // Recording it in memory is OK, because we *cannot* resume a job that hasn't finished traversal. 
f.unregisteredButCreated[folder] = struct{}{} } + + return nil } func (f *jpptFolderTracker) ShouldSetProperties(folder string, overwrite common.OverwriteOption, prompter common.Prompter) bool { diff --git a/ste/jobStatusManager.go b/ste/jobStatusManager.go index ca0ad84b2..62d0c4ac5 100755 --- a/ste/jobStatusManager.go +++ b/ste/jobStatusManager.go @@ -62,7 +62,7 @@ func (jm *jobMgr) statusMgrClosed() bool { func (jm *jobMgr) SendJobPartCreatedMsg(msg JobPartCreatedMsg) { jm.jstm.partCreated <- msg if msg.IsFinalPart { - //Inform statusManager that this is all parts we've + // Inform statusManager that this is all parts we've close(jm.jstm.partCreated) } } @@ -114,10 +114,10 @@ func (jm *jobMgr) handleStatusUpdateMessage() { js.TotalBytesExpected += msg.TotalBytesEnumerated case msg, ok := <-jstm.xferDone: - if !ok { //Channel is closed, all transfers have been attended. + if !ok { // Channel is closed, all transfers have been attended. jstm.xferDone = nil - //close drainXferDone so that other components can know no further updates happen + // close drainXferDone so that other components can know no further updates happen allXferDoneHandled = true close(jstm.xferDoneDrained) continue @@ -128,15 +128,24 @@ func (jm *jobMgr) handleStatusUpdateMessage() { switch msg.TransferStatus { case common.ETransferStatus.Success(): + if msg.IsFolderProperties { + js.FoldersCompleted++ + } js.TransfersCompleted++ js.TotalBytesTransferred += msg.TransferSize case common.ETransferStatus.Failed(), common.ETransferStatus.TierAvailabilityCheckFailure(), common.ETransferStatus.BlobTierFailure(): + if msg.IsFolderProperties { + js.FoldersFailed++ + } js.TransfersFailed++ js.FailedTransfers = append(js.FailedTransfers, msg) case common.ETransferStatus.SkippedEntityAlreadyExists(), common.ETransferStatus.SkippedBlobHasSnapshots(): + if msg.IsFolderProperties { + js.FoldersSkipped++ + } js.TransfersSkipped++ js.SkippedTransfers = append(js.SkippedTransfers, msg) } diff --git a/ste/mgr-JobMgr.go b/ste/mgr-JobMgr.go index 92aebe3db..762e23917 100755 --- a/ste/mgr-JobMgr.go +++ b/ste/mgr-JobMgr.go @@ -23,6 +23,7 @@ package ste import ( "context" "fmt" + "github.com/Azure/azure-storage-blob-go/azblob" "net/http" "runtime" "strings" @@ -111,7 +112,7 @@ type IJobMgr interface { func NewJobMgr(concurrency ConcurrencySettings, jobID common.JobID, appCtx context.Context, cpuMon common.CPUMonitor, level common.LogLevel, commandString string, logFileFolder string, tuner ConcurrencyTuner, pacer PacerAdmin, slicePool common.ByteSlicePooler, cacheLimiter common.CacheLimiter, fileCountLimiter common.CacheLimiter, - jobLogger common.ILoggerResetable, daemonMode bool) IJobMgr { + jobLogger common.ILoggerResetable, daemonMode bool, sourceBlobToken azblob.Credential) IJobMgr { const channelSize = 100000 // PartsChannelSize defines the number of JobParts which can be placed into the // parts channel. Any JobPart which comes from FE and partChannel is full, @@ -187,6 +188,7 @@ func NewJobMgr(concurrency ConcurrencySettings, jobID common.JobID, appCtx conte cpuMon: cpuMon, jstm: &jstm, isDaemon: daemonMode, + sourceBlobToken: sourceBlobToken, /*Other fields remain zero-value until this job is scheduled */} jm.Reset(appCtx, commandString) // One routine constantly monitors the partsChannel. 
It takes the JobPartManager from @@ -338,7 +340,8 @@ type jobMgr struct { fileCountLimiter common.CacheLimiter jstm *jobStatusManager - isDaemon bool /* is it running as service */ + isDaemon bool /* is it running as service */ + sourceBlobToken azblob.Credential } // ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// @@ -712,13 +715,15 @@ func (jm *jobMgr) CloseLog() { // DeferredCleanupJobMgr cleanup all the jobMgr resources. // Warning: DeferredCleanupJobMgr should be called from JobMgrCleanup(). -// As this function neither threadsafe nor idempotient. So if DeferredCleanupJobMgr called -// mulitple times, it may stuck as receiving channel already closed. Where as JobMgrCleanup() -// safe in that sense it will do the cleanup only once. +// +// As this function neither thread safe nor idempotent. So if DeferredCleanupJobMgr called +// multiple times, it may stuck as receiving channel already closed. Where as JobMgrCleanup() +// safe in that sense it will do the cleanup only once. // // TODO: Add JobsAdmin reference to each JobMgr so that in any circumstances JobsAdmin should not freed, -// while jobMgr running. Whereas JobsAdmin store number JobMgr running at any time. -// At that point DeferredCleanupJobMgr() will delete jobMgr from jobsAdmin map. +// +// while jobMgr running. Whereas JobsAdmin store number JobMgr running at any time. +// At that point DeferredCleanupJobMgr() will delete jobMgr from jobsAdmin map. func (jm *jobMgr) DeferredCleanupJobMgr() { jm.Log(pipeline.LogInfo, "DeferredCleanupJobMgr called") @@ -956,12 +961,12 @@ func (jm *jobMgr) scheduleJobParts() { case jobPart := <-jm.xferChannels.partsChannel: if !startedPoolSizer { - // spin up a GR to co-ordinate dynamic sizing of the main pool + // spin up a GR to coordinate dynamic sizing of the main pool // It will automatically spin up the right number of chunk processors go jm.poolSizer() startedPoolSizer = true } - jobPart.ScheduleTransfers(jm.Context()) + jobPart.ScheduleTransfers(jm.Context(), jm.sourceBlobToken) } } } diff --git a/ste/mgr-JobPartMgr.go b/ste/mgr-JobPartMgr.go index 6e1e1b606..d97acc7d3 100644 --- a/ste/mgr-JobPartMgr.go +++ b/ste/mgr-JobPartMgr.go @@ -28,7 +28,7 @@ var DebugSkipFiles = make(map[string]bool) type IJobPartMgr interface { Plan() *JobPartPlanHeader - ScheduleTransfers(jobCtx context.Context) + ScheduleTransfers(jobCtx context.Context, sourceBlobToken azblob.Credential) StartJobXfer(jptm IJobPartTransferMgr) ReportTransferDone(status common.TransferStatus) uint32 GetOverwriteOption() common.OverwriteOption @@ -340,7 +340,7 @@ func (jpm *jobPartMgr) Plan() *JobPartPlanHeader { } // ScheduleTransfers schedules this job part's transfers. 
It is called when a new job part is ordered & is also called to resume a paused Job -func (jpm *jobPartMgr) ScheduleTransfers(jobCtx context.Context) { +func (jpm *jobPartMgr) ScheduleTransfers(jobCtx context.Context, sourceBlobToken azblob.Credential) { jobCtx = context.WithValue(jobCtx, ServiceAPIVersionOverride, DefaultServiceApiVersion) jpm.atomicTransfersDone = 0 // Reset the # of transfers done back to 0 // partplan file is opened and mapped when job part is added @@ -377,9 +377,10 @@ func (jpm *jobPartMgr) ScheduleTransfers(jobCtx context.Context) { metadataString := string(dstData.Metadata[:dstData.MetadataLength]) jpm.metadata = common.Metadata{} if len(metadataString) > 0 { - for _, keyAndValue := range strings.Split(metadataString, ";") { // key/value pairs are separated by ';' - kv := strings.Split(keyAndValue, "=") // key/value are separated by '=' - jpm.metadata[kv[0]] = kv[1] + var err error + jpm.metadata, err = common.StringToMetadata(metadataString) + if err != nil { + panic("sanity check: metadata string should be valid at this point: " + metadataString) } } blobTagsStr := string(dstData.BlobTags[:dstData.BlobTagsLength]) @@ -409,7 +410,7 @@ func (jpm *jobPartMgr) ScheduleTransfers(jobCtx context.Context) { jpm.priority = plan.Priority - jpm.createPipelines(jobCtx) // pipeline is created per job part manager + jpm.createPipelines(jobCtx, sourceBlobToken) // pipeline is created per job part manager // *** Schedule this job part's transfers *** for t := uint32(0); t < plan.NumTransfers; t++ { @@ -512,11 +513,13 @@ func (jpm *jobPartMgr) RescheduleTransfer(jptm IJobPartTransferMgr) { jpm.jobMgr.ScheduleTransfer(jpm.priority, jptm) } -func (jpm *jobPartMgr) createPipelines(ctx context.Context) { +func (jpm *jobPartMgr) createPipelines(ctx context.Context, sourceBlobToken azblob.Credential) { if atomic.SwapUint32(&jpm.atomicPipelinesInitedIndicator, 1) != 0 { panic("init client and pipelines for same jobPartMgr twice") } - + if jpm.sourceCredential == nil { + jpm.sourceCredential = sourceBlobToken + } fromTo := jpm.planMMF.Plan().FromTo credInfo := jpm.credInfo if jpm.credInfo.CredentialType == common.ECredentialType.Unknown() { @@ -561,9 +564,10 @@ func (jpm *jobPartMgr) createPipelines(ctx context.Context) { CallerID: fmt.Sprintf("JobID=%v, Part#=%d", jpm.Plan().JobID, jpm.Plan().PartNum), Cancel: jpm.jobMgr.Cancel, } - - sourceCred = common.CreateBlobCredential(ctx, jobState.CredentialInfo.WithType(jobState.S2SSourceCredentialType), credOption) - jpm.sourceCredential = sourceCred + if jpm.sourceCredential == nil { + sourceCred = common.CreateBlobCredential(ctx, jobState.CredentialInfo.WithType(jobState.S2SSourceCredentialType), credOption) + jpm.sourceCredential = sourceCred + } } jpm.sourceProviderPipeline = NewBlobPipeline( @@ -581,8 +585,9 @@ func (jpm *jobPartMgr) createPipelines(ctx context.Context) { // Consider the ADLSG2->ADLSG2 ACLs case if fromTo == common.EFromTo.BlobBlob() && jpm.Plan().PreservePermissions.IsTruthy() { + credential := common.CreateBlobFSCredential(ctx, credInfo, credOption) jpm.secondarySourceProviderPipeline = NewBlobFSPipeline( - azbfs.NewAnonymousCredential(), + credential, azbfs.PipelineOptions{ Log: jpm.jobMgr.PipelineLogInfo(), Telemetry: azbfs.TelemetryOptions{ diff --git a/ste/mgr-JobPartTransferMgr.go b/ste/mgr-JobPartTransferMgr.go index 4fa87e607..1a7160761 100644 --- a/ste/mgr-JobPartTransferMgr.go +++ b/ste/mgr-JobPartTransferMgr.go @@ -341,7 +341,7 @@ func (jptm *jobPartTransferMgr) Info() TransferInfo { // does not exceeds 50000 
(max number of block per blob) if blockSize == 0 { blockSize = common.DefaultBlockBlobBlockSize - for ; uint32(sourceSize/blockSize) > common.MaxNumberOfBlocksPerBlob; blockSize = 2 * blockSize { + for ; sourceSize >= common.MaxNumberOfBlocksPerBlob * blockSize; blockSize = 2 * blockSize { if blockSize > common.BlockSizeThreshold { /* * For a RAM usage of 0.5G/core, we would have 4G memory on typical 8 core device, meaning at a blockSize of 256M, @@ -433,9 +433,9 @@ func (jptm *jobPartTransferMgr) FileCountLimiter() common.CacheLimiter { // As at Oct 2019, cases where we mutate destination names are // (i) when destination is Windows or Azure Files, and source contains characters unsupported at the destination // (ii) when downloading with --decompress and there are two files that differ only in an extension that will will strip -// e.g. foo.txt and foo.txt.gz (if we decompress the latter, we'll strip the extension and the names will collide) +//e.g. foo.txt and foo.txt.gz (if we decompress the latter, we'll strip the extension and the names will collide) // (iii) For completeness, there's also bucket->container name resolution when copying from S3, but that is not expected to ever -// create collisions, since it already takes steps to prevent them. +//create collisions, since it already takes steps to prevent them. func (jptm *jobPartTransferMgr) WaitUntilLockDestination(ctx context.Context) error { if strings.EqualFold(jptm.Info().Destination, common.Dev_Null) { return nil // nothing to lock @@ -977,4 +977,4 @@ func (jptm *jobPartTransferMgr) ShouldInferContentType() bool { func (jptm *jobPartTransferMgr) SuccessfulBytesTransferred() int64 { return atomic.LoadInt64(&jptm.atomicSuccessfulBytes) -} \ No newline at end of file +} diff --git a/ste/s2sCopier-URLToBlob.go b/ste/s2sCopier-URLToBlob.go index 750717ce8..82618cfd2 100644 --- a/ste/s2sCopier-URLToBlob.go +++ b/ste/s2sCopier-URLToBlob.go @@ -46,7 +46,7 @@ func newURLToBlobCopier(jptm IJobPartTransferMgr, destination string, p pipeline pipeline.LogInfo, srcInfoProvider.RawSource(), destination, - fmt.Sprintf("BlobType has been explictly set to %q for destination blob.", blobTypeOverride)) + fmt.Sprintf("BlobType has been explicitly set to %q for destination blob.", blobTypeOverride)) } } else { if blobSrcInfoProvider, ok := srcInfoProvider.(IBlobSourceInfoProvider); ok { // If source is a blob, detect the source blob type. diff --git a/ste/sender-azureFile.go b/ste/sender-azureFile.go index fea562a39..70a956a46 100644 --- a/ste/sender-azureFile.go +++ b/ste/sender-azureFile.go @@ -336,7 +336,7 @@ func (u *azureFileSenderBase) Epilogue() { // 2. The service started updating the last-write-time in March 2021 when the file is modified. // So when we uploaded the ranges, we've unintentionally changed the last-write-time. if u.jptm.IsLive() && u.jptm.Info().PreserveSMBInfo { - //This is an extra round trip, but we can live with that for these relatively rare cases + // This is an extra round trip, but we can live with that for these relatively rare cases _, err := u.fileURL().SetHTTPHeaders(u.ctx, u.headersToApply) if err != nil { u.jptm.FailActiveSend("Applying final attribute settings", err) @@ -435,7 +435,7 @@ func (AzureFileParentDirCreator) verifyAndHandleCreateErrors(err error) error { return nil } -// splitWithoutToken splits string with a given token, and returns splitted results without token. +// splitWithoutToken splits string with a given token, and returns split results without token. 
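+// For example, splitWithoutToken("a/b/c/", '/') returns ["a", "b", "c"]; empty segments are dropped by
+// strings.FieldsFunc.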
func (AzureFileParentDirCreator) splitWithoutToken(str string, token rune) []string { return strings.FieldsFunc(str, func(c rune) bool { return c == token @@ -464,16 +464,12 @@ func (d AzureFileParentDirCreator) CreateDirToRoot(ctx context.Context, dirURL a // Try to create the directories for i := 0; i < len(segments); i++ { curDirURL = curDirURL.NewDirectoryURL(segments[i]) - // TODO: Persist permissions on folders. - _, err := curDirURL.Create(ctx, azfile.Metadata{}, azfile.SMBProperties{}) - if err == nil { - // We did create it, so record that fact. I.e. THIS job created the folder. - // Must do it here, in the routine that is shared by both the folder and the file code, - // because due to the parallelism of AzCopy, we don't know which will get here first, file code, or folder code. - dirUrl := curDirURL.URL() - dirUrl.RawQuery = "" - t.RecordCreation(dirUrl.String()) - } + recorderURL := curDirURL.URL() + recorderURL.RawQuery = "" + err = t.CreateFolder(recorderURL.String(), func() error { + _, err := curDirURL.Create(ctx, azfile.Metadata{}, azfile.SMBProperties{}) + return err + }) if verifiedErr := d.verifyAndHandleCreateErrors(err); verifiedErr != nil { return verifiedErr } diff --git a/ste/sender-blobFS.go b/ste/sender-blobFS.go index a4ae83162..82ac5a514 100644 --- a/ste/sender-blobFS.go +++ b/ste/sender-blobFS.go @@ -198,15 +198,14 @@ func (u *blobFSSenderBase) doEnsureDirExists(d azbfs.DirectoryURL) error { if d.IsFileSystemRoot() { return nil // nothing to do, there's no directory component to create } - - _, err := d.Create(u.jptm.Context(), false) - if err == nil { - // must always do this, regardless of whether we are called in a file-centric code path - // or a folder-centric one, since with the parallelism we use, we don't actually - // know which will happen first - dirUrl := d.URL() - u.jptm.GetFolderCreationTracker().RecordCreation(dirUrl.String()) - } + // must always do this, regardless of whether we are called in a file-centric code path + // or a folder-centric one, since with the parallelism we use, we don't actually + // know which will happen first + dirUrl := d.URL() + err := u.jptm.GetFolderCreationTracker().CreateFolder(dirUrl.String(), func() error { + _, err := d.Create(u.jptm.Context(), false) + return err + }) if stgErr, ok := err.(azbfs.StorageError); ok && stgErr.ServiceCode() == azbfs.ServiceCodePathAlreadyExists { return nil // not a error as far as we are concerned. It just already exists } diff --git a/ste/sender-blobFolders.go b/ste/sender-blobFolders.go index 151c9efda..4b4904055 100644 --- a/ste/sender-blobFolders.go +++ b/ste/sender-blobFolders.go @@ -3,6 +3,7 @@ package ste import ( "fmt" "github.com/Azure/azure-pipeline-go/pipeline" + "github.com/Azure/azure-storage-azcopy/v10/azbfs" "github.com/Azure/azure-storage-azcopy/v10/common" "github.com/Azure/azure-storage-blob-go/azblob" "net/url" @@ -53,6 +54,57 @@ func newBlobFolderSender(jptm IJobPartTransferMgr, destination string, p pipelin return out, nil } +func (b *blobFolderSender) setDatalakeACLs() { + bURLParts := azblob.NewBlobURLParts(b.destination.URL()) + bURLParts.BlobName = strings.TrimSuffix(bURLParts.BlobName, "/") // BlobFS does not like when we target a folder with the / + bURLParts.Host = strings.ReplaceAll(bURLParts.Host, ".blob", ".dfs") + // todo: jank, and violates the principle of interfaces + fileURL := azbfs.NewFileURL(bURLParts.URL(), b.jptm.(*jobPartTransferMgr).jobPartMgr.(*jobPartMgr).secondaryPipeline) + + // We know for a fact our source is a "blob". 
+ acl, err := b.sip.(*blobSourceInfoProvider).AccessControl() + if err != nil { + b.jptm.FailActiveSend("Grabbing source ACLs", err) + } + acl.Permissions = "" // Since we're sending the full ACL, Permissions is irrelevant. + _, err = fileURL.SetAccessControl(b.jptm.Context(), acl) + if err != nil { + b.jptm.FailActiveSend("Putting ACLs", err) + } +} + +func (b *blobFolderSender) overwriteDFSProperties() (string, error) { + b.jptm.Log(pipeline.LogWarning, "It is impossible to completely overwrite a folder with existing content under it on a hierarchical namespace storage account. A best-effort attempt will be made, but if CPK does not match the transfer will fail.") + + b.metadataToApply["hdi_isfolder"] = "true" // Set folder metadata flag + err := b.getExtraProperties() + if err != nil { + return "Get Extra Properties", fmt.Errorf("when getting additional folder properties: %w", err) + } + + // SetMetadata can set CPK if it wasn't specified prior. This is not a "full" overwrite, but a best-effort overwrite. + _, err = b.destination.SetMetadata(b.jptm.Context(), b.metadataToApply, azblob.BlobAccessConditions{}, b.cpkToApply) + if err != nil { + return "Set Metadata", fmt.Errorf("A best-effort overwrite was attempted; CPK errors cannot be handled when the blob cannot be deleted.\n%w", err) + } + + _, err = b.destination.SetTags(b.jptm.Context(), nil, nil, nil, b.blobTagsToApply) + if err != nil { + return "Set Blob Tags", err + } + _, err = b.destination.SetHTTPHeaders(b.jptm.Context(), b.headersToAppply, azblob.BlobAccessConditions{}) + if err != nil { + return "Set HTTP Headers", err + } + + // Upload ADLS Gen 2 ACLs + if b.jptm.FromTo() == common.EFromTo.BlobBlob() && b.jptm.Info().PreserveSMBPermissions.IsTruthy() { + b.setDatalakeACLs() + } + + return "", nil +} + func (b *blobFolderSender) EnsureFolderExists() error { t := b.jptm.GetFolderCreationTracker() @@ -72,6 +124,17 @@ func (b *blobFolderSender) EnsureFolderExists() error { if t.ShouldSetProperties(b.DirUrlToString(), b.jptm.GetOverwriteOption(), b.jptm.GetOverwritePrompter()) { _, err := b.destination.Delete(b.jptm.Context(), azblob.DeleteSnapshotsOptionNone, azblob.BlobAccessConditions{}) if err != nil { + if stgErr, ok := err.(azblob.StorageError); ok { + if stgErr.ServiceCode() == "DirectoryIsNotEmpty" { // this is DFS, and we cannot do a standard replacement on it. Opt to simply overwrite the properties. + where, err := b.overwriteDFSProperties() + if err != nil { + return fmt.Errorf("%w. When %s", err, where) + } + + return nil + } + } + return fmt.Errorf("when deleting existing blob: %w", err) } } else { @@ -90,20 +153,34 @@ func (b *blobFolderSender) EnsureFolderExists() error { return fmt.Errorf("when getting additional folder properties: %w", err) } - _, err = b.destination.Upload(b.jptm.Context(), - strings.NewReader(""), - b.headersToAppply, - b.metadataToApply, - azblob.BlobAccessConditions{}, - azblob.DefaultAccessTier, // It doesn't make sense to use a special access tier, the blob will be 0 bytes. - b.blobTagsToApply, - b.cpkToApply, - azblob.ImmutabilityPolicyOptions{}) + err = t.CreateFolder(b.DirUrlToString(), func() error { + _, err := b.destination.Upload(b.jptm.Context(), + strings.NewReader(""), + b.headersToAppply, + b.metadataToApply, + azblob.BlobAccessConditions{}, + azblob.DefaultAccessTier, // It doesn't make sense to use a special access tier, the blob will be 0 bytes. 
+ b.blobTagsToApply, + b.cpkToApply, + azblob.ImmutabilityPolicyOptions{}) + + return err + }) + if err != nil { return fmt.Errorf("when creating folder: %w", err) } - t.RecordCreation(b.DirUrlToString()) + // Upload ADLS Gen 2 ACLs + if b.jptm.FromTo() == common.EFromTo.BlobBlob() && b.jptm.Info().PreserveSMBPermissions.IsTruthy() { + b.setDatalakeACLs() + } + + return nil + + if err != nil { + return err + } return folderPropertiesSetInCreation{} } diff --git a/ste/sender-blockBlob.go b/ste/sender-blockBlob.go index 3a0085b5f..383d52fc2 100644 --- a/ste/sender-blockBlob.go +++ b/ste/sender-blockBlob.go @@ -91,7 +91,7 @@ func getVerifiedChunkParams(transferInfo TransferInfo, memLimit int64) (chunkSiz if chunkSize > common.MaxBlockBlobBlockSize { // mercy, please - err = fmt.Errorf("block size of %.2fGiB for file %s of size %.2fGiB exceeds maxmimum allowed block size for a BlockBlob", + err = fmt.Errorf("block size of %.2fGiB for file %s of size %.2fGiB exceeds maximum allowed block size for a BlockBlob", toGiB(chunkSize), transferInfo.Source, toGiB(transferInfo.SourceSize)) return } @@ -273,6 +273,28 @@ func (s *blockBlobSenderBase) Cleanup() { } } +//Currently we've common Metadata Copier across all senders for block blob. +func (s *blockBlobSenderBase) GenerateCopyMetadata(id common.ChunkID) chunkFunc { + return createChunkFunc(true, s.jptm, id, func() { + if unixSIP, ok := s.sip.(IUNIXPropertyBearingSourceInfoProvider); ok { + // Clone the metadata before we write to it, we shouldn't be writing to the same metadata as every other blob. + s.metadataToApply = common.Metadata(s.metadataToApply).Clone().ToAzBlobMetadata() + + statAdapter, err := unixSIP.GetUNIXProperties() + if err != nil { + s.jptm.FailActiveSend("GetUNIXProperties", err) + } + + common.AddStatToBlobMetadata(statAdapter, s.metadataToApply) + } + _, err := s.destBlockBlobURL.SetMetadata(s.jptm.Context(), s.metadataToApply, azblob.BlobAccessConditions{}, s.cpkToApply) + if err != nil { + s.jptm.FailActiveSend("Setting Metadata", err) + return + } + }) +} + func (s *blockBlobSenderBase) setBlockID(index int32, value string) { s.muBlockIDs.Lock() defer s.muBlockIDs.Unlock() diff --git a/ste/sender.go b/ste/sender.go index 446ecdc1d..4e59733c5 100644 --- a/ste/sender.go +++ b/ste/sender.go @@ -66,6 +66,17 @@ type sender interface { GetDestinationLength() (int64, error) } +////////////////////////////////////////////////////////////////////////////////////////////////// +// propertiesSender is a sender that can copy properties like metadata/tags/tier alone to +// to destination instead of full copy +// + +type propertiesSender interface { + sender + + GenerateCopyMetadata(id common.ChunkID) chunkFunc +} + ///////////////////////////////////////////////////////////////////////////////////////////////// // folderSender is a sender that also knows how to send folder property information ///////////////////////////////////////////////////////////////////////////////////////////////// @@ -107,9 +118,9 @@ type s2sCopier interface { type s2sCopierFactory func(jptm IJobPartTransferMgr, srcInfoProvider IRemoteSourceInfoProvider, destination string, p pipeline.Pipeline, pacer pacer) (s2sCopier, error) -///////////////////////////////////////////////////////////////////////////////////////////////// +// /////////////////////////////////////////////////////////////////////////////////////////////// // Abstraction of the methods needed to upload one file to a remote location 
-///////////////////////////////////////////////////////////////////////////////////////////////// +// /////////////////////////////////////////////////////////////////////////////////////////////// type uploader interface { sender diff --git a/ste/sender_blockBlob_test.go b/ste/sender_blockBlob_test.go index ec5fae8b5..3262550f0 100644 --- a/ste/sender_blockBlob_test.go +++ b/ste/sender_blockBlob_test.go @@ -46,7 +46,7 @@ func (s *blockBlobSuite) TestGetVerifiedChunkParams(c *chk.C) { // Verify large block Size memLimit = int64(8388608000) // 8000MiB - expectedErr = fmt.Sprintf("block size of 3.91GiB for file tmpSrc of size 7.81GiB exceeds maxmimum allowed block size for a BlockBlob") + expectedErr = fmt.Sprintf("block size of 3.91GiB for file tmpSrc of size 7.81GiB exceeds maximum allowed block size for a BlockBlob") _, _, err = getVerifiedChunkParams(transferInfo, memLimit) c.Assert(err.Error(), chk.Equals, expectedErr) diff --git a/ste/sourceInfoProvider-Local_linux.go b/ste/sourceInfoProvider-Local_linux.go index 16f47615c..73a03f429 100644 --- a/ste/sourceInfoProvider-Local_linux.go +++ b/ste/sourceInfoProvider-Local_linux.go @@ -4,8 +4,12 @@ package ste import ( + "fmt" "github.com/Azure/azure-storage-azcopy/v10/common" + "github.com/Azure/azure-storage-azcopy/v10/sddl" + "github.com/Azure/azure-storage-file-go/azfile" "golang.org/x/sys/unix" + "strings" "time" ) @@ -162,3 +166,63 @@ func (s statTAdapter) MTime() time.Time { func (s statTAdapter) CTime() time.Time { return time.Unix(s.Ctim.Unix()) } + +// This file os-triggers the ISMBPropertyBearingSourceInfoProvider interface on a local SIP. +// Note: Linux SIP doesn't implement the ICustomLocalOpener since it doesn't need to do anything special, unlike +// Windows where we need to pass FILE_FLAG_BACKUP_SEMANTICS flag for opening file. + +func (f localFileSourceInfoProvider) GetSDDL() (string, error) { + // We only need Owner, Group, and DACLs for azure files, CIFS_XATTR_CIFS_NTSD gets us that. + const securityInfoFlags sddl.SECURITY_INFORMATION = sddl.DACL_SECURITY_INFORMATION | sddl.OWNER_SECURITY_INFORMATION | sddl.GROUP_SECURITY_INFORMATION + + // Query the Security Descriptor object for the given file. + sd, err := sddl.QuerySecurityObject(f.jptm.Info().Source, securityInfoFlags) + if err != nil { + return "", fmt.Errorf("sddl.QuerySecurityObject(%s, 0x%x) failed: %w", + f.jptm.Info().Source, securityInfoFlags, err) + } + + // Convert the binary Security Descriptor to string in SDDL format. + // This is the Windows equivalent of ConvertSecurityDescriptorToStringSecurityDescriptorW(). + sdStr, err := sddl.SecurityDescriptorToString(sd) + if err != nil { + // Panic, as it's unexpected and we would want to know. 
+ panic(fmt.Errorf("Cannot parse binary Security Descriptor returned by QuerySecurityObject(%s, 0x%x): %v", f.jptm.Info().Source, securityInfoFlags, err)) + } + + fSDDL, err := sddl.ParseSDDL(sdStr) + if err != nil { + return "", fmt.Errorf("sddl.ParseSDDL(%s) failed: %w", sdStr, err) + } + + if strings.TrimSpace(fSDDL.String()) != strings.TrimSpace(sdStr) { + panic("SDDL sanity check failed (parsed string output != original string)") + } + + return fSDDL.PortableString(), nil +} + +func (f localFileSourceInfoProvider) GetSMBProperties() (TypedSMBPropertyHolder, error) { + info, err := common.GetFileInformation(f.jptm.Info().Source) + + return HandleInfo{info}, err +} + +type HandleInfo struct { + common.ByHandleFileInformation +} + +func (hi HandleInfo) FileCreationTime() time.Time { + // This returns nanoseconds since Unix Epoch. + return time.Unix(0, hi.CreationTime.Nanoseconds()) +} + +func (hi HandleInfo) FileLastWriteTime() time.Time { + // This returns nanoseconds since Unix Epoch. + return time.Unix(0, hi.LastWriteTime.Nanoseconds()) +} + +func (hi HandleInfo) FileAttributes() azfile.FileAttributeFlags { + // Can't shorthand it because the function name overrides. + return azfile.FileAttributeFlags(hi.ByHandleFileInformation.FileAttributes) +} diff --git a/ste/xfer-anyToRemote-file.go b/ste/xfer-anyToRemote-file.go index 81952a6a6..6433be093 100644 --- a/ste/xfer-anyToRemote-file.go +++ b/ste/xfer-anyToRemote-file.go @@ -81,7 +81,7 @@ func prepareDestAccountInfo(bURL azblob.BlobURL, jptm IJobPartTransferMgr, ctx c } } -//// TODO: Infer availability based upon blob size as well, for premium page blobs. +// // TODO: Infer availability based upon blob size as well, for premium page blobs. func BlobTierAllowed(destTier azblob.AccessTierType) bool { // If we failed to get the account info, just return true. // This is because we can't infer whether it's possible or not, and the setTier operation could possibly succeed (or fail) @@ -181,6 +181,8 @@ func anyToRemote(jptm IJobPartTransferMgr, p pipeline.Pipeline, pacer pacer, sen if info.IsFolderPropertiesTransfer() { anyToRemote_folder(jptm, info, p, pacer, senderFactory, sipf) + } else if (jptm.GetOverwriteOption() == common.EOverwriteOption.PosixProperties() && info.EntityType == common.EEntityType.File()) { + anyToRemote_fileProperties(jptm, info, p, pacer, senderFactory, sipf) } else { anyToRemote_file(jptm, info, p, pacer, senderFactory, sipf) } @@ -499,7 +501,7 @@ func epilogueWithCleanupSendToRemote(jptm IJobPartTransferMgr, s sender, sip ISo defer jptm.LogChunkStatus(pseudoId, common.EWaitReason.ChunkDone()) // normal setting to done doesn't apply to these pseudo ids if jptm.WasCanceled() { - // This is where we detect that transfer has been cancelled. Further statments do not act on + // This is where we detect that transfer has been cancelled. Further statements do not act on // dead jptm. We set the status here. 
jptm.SetStatus(common.ETransferStatus.Cancelled()) } diff --git a/ste/xfer-anyToRemote-fileProperties.go b/ste/xfer-anyToRemote-fileProperties.go new file mode 100644 index 000000000..c06c4b5ca --- /dev/null +++ b/ste/xfer-anyToRemote-fileProperties.go @@ -0,0 +1,84 @@ +// Copyright © 2017 Microsoft +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. + +package ste + +import ( + "github.com/Azure/azure-pipeline-go/pipeline" + "github.com/Azure/azure-storage-azcopy/v10/common" +) + +// anyToRemote_fileProperties handles property-only sender operations for FILES - both uploads from local files, and S2S copies +func anyToRemote_fileProperties(jptm IJobPartTransferMgr, info TransferInfo, p pipeline.Pipeline, pacer pacer, senderFactory senderFactory, sipf sourceInfoProviderFactory) { + // schedule the work as a chunk, so it will run on the main goroutine pool, instead of the + // smaller "transfer initiation pool", where this code runs. + id := common.NewChunkID(jptm.Info().Source, 0, 0) + jptm.LogChunkStatus(id, common.EWaitReason.XferStart()) + + // step 1. perform initial checks + if jptm.WasCanceled() { + /* This is the earliest we detect that jptm has been cancelled before we reach the destination */ + jptm.SetStatus(common.ETransferStatus.Cancelled()) + jptm.ReportTransferDone() + return + } + + // step 2a. Create sender + srcInfoProvider, err := sipf(jptm) + if err != nil { + jptm.LogSendError(info.Source, info.Destination, err.Error(), 0) + jptm.SetStatus(common.ETransferStatus.Failed()) + jptm.ReportTransferDone() + return + } + + if (jptm.GetOverwriteOption() != common.EOverwriteOption.PosixProperties() || + srcInfoProvider.EntityType() != common.EEntityType.File()) { + panic("configuration error.
Source Info Provider does not have FileProperties entity type") + } + + baseSender, err := senderFactory(jptm, info.Destination, p, pacer, srcInfoProvider) + if err != nil { + jptm.LogSendError(info.Source, info.Destination, err.Error(), 0) + jptm.SetStatus(common.ETransferStatus.Failed()) + jptm.ReportTransferDone() + return + } + s, ok := baseSender.(propertiesSender) + if !ok { + jptm.LogSendError(info.Source, info.Destination, "sender implementation does not support copying properties alone", 0) + jptm.SetStatus(common.ETransferStatus.Failed()) + jptm.ReportTransferDone() + return + } + + jptm.LogChunkStatus(id, common.EWaitReason.LockDestination()) + err = jptm.WaitUntilLockDestination(jptm.Context()) + if err != nil { + jptm.LogSendError(info.Source, info.Destination, err.Error(), 0) + jptm.SetStatus(common.ETransferStatus.Failed()) + jptm.ReportTransferDone() + return + } + + jptm.SetNumberOfChunks(1) + jptm.SetActionAfterLastChunk(func() { commonSenderCompletion(jptm, baseSender, info) }) // for consistency run standard Epilogue + jptm.ScheduleChunks(s.GenerateCopyMetadata(id)) // Just one chunk to schedule +} diff --git a/ste/xfer-remoteToLocal-file.go b/ste/xfer-remoteToLocal-file.go index 84283d1bd..611a120af 100644 --- a/ste/xfer-remoteToLocal-file.go +++ b/ste/xfer-remoteToLocal-file.go @@ -21,6 +21,7 @@ package ste import ( + "encoding/base64" "errors" "fmt" "io" @@ -353,7 +354,7 @@ func epilogueWithCleanupDownload(jptm IJobPartTransferMgr, dl downloader, active } // check if we need to rename back to original name. At this point, we're sure the file is completely - // downloaded and not corrupt. Infact, post this point we should only log errors and + // downloaded and not corrupt. In fact, post this point we should only log errors and // not fail the transfer. renameNecessary := !strings.EqualFold(info.getDownloadPath(), info.Destination) && !strings.EqualFold(info.Destination, common.Dev_Null) @@ -394,6 +395,7 @@ func epilogueWithCleanupDownload(jptm IJobPartTransferMgr, dl downloader, active } func commonDownloaderCompletion(jptm IJobPartTransferMgr, info TransferInfo, entityType common.EntityType) { +redoCompletion: // note that we do not really know whether the context was canceled because of an error, or because the user asked for it // if was an intentional cancel, the status is still "in progress", so we are still counting it as pending // we leave these transfer status alone @@ -417,6 +419,25 @@ func commonDownloaderCompletion(jptm IJobPartTransferMgr, info TransferInfo, ent panic("reached branch where jptm is assumed to be live, but it isn't") } + // Attempt to put MD5 data if necessary, compliant with the sync hash scheme + if jptm.ShouldPutMd5() { + fi, err := os.Stat(info.Destination) + if err != nil { + jptm.FailActiveDownload("saving MD5 data (stat to pull LMT)", err) + goto redoCompletion // let fail as expected + } + + err = common.PutHashData(info.Destination, common.SyncHashData{ + Mode: common.ESyncHashType.MD5(), + LMT: fi.ModTime(), + Data: base64.StdEncoding.EncodeToString(info.SrcHTTPHeaders.ContentMD5), + }) + if err != nil { + jptm.FailActiveDownload("saving MD5 data (writing alternate data stream)", err) + goto redoCompletion // let fail as expected + } + } + // We know all chunks are done (because this routine was called) // and we know the transfer didn't fail (because just checked its status above), // so it must have succeeded. 
So make sure its not left "in progress" state @@ -472,7 +493,7 @@ func tryDeleteFile(info TransferInfo, jptm IJobPartTransferMgr) { } // Returns the path of file to be downloaded. If we want to -// download to a temp path we return a temp paht in format +// download to a temp path we return a temp path in format // /actual/parent/path/.azDownload-- func (info *TransferInfo) getDownloadPath() string { if common.GetLifecycleMgr().DownloadToTempPath() { diff --git a/testSuite/cmd/testblob.go b/testSuite/cmd/testblob.go index 7cb99829c..78741ac28 100644 --- a/testSuite/cmd/testblob.go +++ b/testSuite/cmd/testblob.go @@ -206,7 +206,7 @@ func verifyBlockBlobDirUpload(testBlobCmd TestBlobCommand) { // opening the file locally and memory mapping it. sFileInfo, err := os.Stat(objectLocalPath) if err != nil { - fmt.Println("error geting the subject blob file info on local disk ") + fmt.Println("error getting the subject blob file info on local disk ") os.Exit(1) } diff --git a/testSuite/scripts/test_autodetect_blob_type.py b/testSuite/scripts/test_autodetect_blob_type.py index 36adbe467..4dfae50da 100644 --- a/testSuite/scripts/test_autodetect_blob_type.py +++ b/testSuite/scripts/test_autodetect_blob_type.py @@ -36,7 +36,7 @@ def test_copy_infer_blob_type_from_files_to_page_blob(self): file_name = "testS2SVHD.vhd" containerName = util.get_resource_name("s2sbtautodetect") - # These run on seperate accounts in CI, so even without "dst", it's OK. + # These run on separate accounts in CI, so even without "dst", it's OK. # Needed this to run on a single account, though. dstbase = util.get_object_sas(util.test_s2s_dst_blob_account_url, containerName + "dst") srcbase = util.get_object_sas(util.test_s2s_src_file_account_url, containerName) @@ -77,7 +77,7 @@ def test_copy_detect_blob_type_from_blob_to_blob(self): file_name = "testS2SVHD.vhd" containerName = util.get_resource_name("s2sbtautodetect") - # These run on seperate accounts in CI, so even without "dst", it's OK. + # These run on separate accounts in CI, so even without "dst", it's OK. # Needed this to run on a single account, though. dstbase = util.get_object_sas(util.test_s2s_dst_blob_account_url, containerName + "dst") srcbase = util.get_object_sas(util.test_s2s_src_blob_account_url, containerName) diff --git a/testSuite/scripts/test_blob_download.py b/testSuite/scripts/test_blob_download.py index 07e94e375..0e0fa9f3c 100644 --- a/testSuite/scripts/test_blob_download.py +++ b/testSuite/scripts/test_blob_download.py @@ -119,7 +119,7 @@ def test_blob_download_63mb_in_4mb(self): result = util.Command("testBlob").add_arguments(file_path).add_arguments(destination_sas).execute_azcopy_verify() self.assertTrue(result) - # downloading the created parallely in blocks of 4mb file through azcopy. + # downloading the created parallelly in blocks of 4mb file through azcopy. 
download_file = util.test_directory_path + "/test_63mb_in4mb_download.txt" result = util.Command("copy").add_arguments(destination_sas).add_arguments(download_file)\ .add_flags("log-level","info").add_flags("block-size-mb", "4").execute_azcopy_copy_command() diff --git a/testSuite/scripts/test_blob_piping.py b/testSuite/scripts/test_blob_piping.py index 1e4642a5f..4522d1975 100644 --- a/testSuite/scripts/test_blob_piping.py +++ b/testSuite/scripts/test_blob_piping.py @@ -105,7 +105,7 @@ def execute_command_with_pipe(command, source_file_to_pipe=None, destination_fil # if piping azcopy's output to a file if destination_file_to_pipe is not None: with open(destination_file_to_pipe, "wb") as output, open('fake_input.txt', 'wb') as fake_input: - # an emtpy file is used as stdin because if None was specified, then the subprocess would + # an empty file is used as stdin because if None was specified, then the subprocess would # inherit the parent's stdin pipe, this is a limitation of the subprocess package try: subprocess.check_call(shlex.split(command), stdin=fake_input, stdout=output, timeout=360) diff --git a/testSuite/scripts/test_file_download.py b/testSuite/scripts/test_file_download.py index d86d427cb..d345581c6 100644 --- a/testSuite/scripts/test_file_download.py +++ b/testSuite/scripts/test_file_download.py @@ -280,7 +280,7 @@ def test_file_download_63mb_in_4mb(self): result = util.Command("testFile").add_arguments(file_path).add_arguments(destination_sas).execute_azcopy_verify() self.assertTrue(result) - # downloading the created parallely in blocks of 4mb file through azcopy. + # downloading the created parallelly in blocks of 4mb file through azcopy. download_file = util.test_directory_path + "/test_63mb_in4mb_download.txt" result = util.Command("copy").add_arguments(destination_sas).add_arguments(download_file).add_flags("log-level", "info").add_flags( diff --git a/testSuite/scripts/test_file_upload.py b/testSuite/scripts/test_file_upload.py index 054259ba5..428c3bd5b 100644 --- a/testSuite/scripts/test_file_upload.py +++ b/testSuite/scripts/test_file_upload.py @@ -7,7 +7,7 @@ class FileShare_Upload_User_Scenario(unittest.TestCase): def test_file_upload_empty(self): - self.util_test_file_upload_size_n_fullname(0) #emtpy file + self.util_test_file_upload_size_n_fullname(0) #empty file def test_file_upload_1b_fullname(self): self.util_test_file_upload_size_n_fullname(1) #1B @@ -209,7 +209,7 @@ def test_guess_mime_type(self): self.assertTrue(result) # test_1G_file_upload verifies the azcopy upload of 1Gb file upload in blocks of 100 Mb - @unittest.skip("coverd by stress") + @unittest.skip("covered by stress") def test_1GB_file_upload(self): # create 1Gb file filename = "test_1G_file.txt" diff --git a/testSuite/scripts/test_service_to_service_copy.py b/testSuite/scripts/test_service_to_service_copy.py index b5b285bdd..a8994c4f5 100644 --- a/testSuite/scripts/test_service_to_service_copy.py +++ b/testSuite/scripts/test_service_to_service_copy.py @@ -304,7 +304,7 @@ def test_non_overwrite_copy_single_file_from_file_to_blob(self): # Test oauth support for service to service copy, where source is authenticated with SAS # and destination is authenticated with OAuth token. 
- @unittest.skip("coverd by blob to blob") + @unittest.skip("covered by blob to blob") def test_copy_single_17mb_file_from_file_to_blob_oauth(self): src_share_url = util.get_object_sas(util.test_s2s_src_file_account_url, self.bucket_name_file_blob) dst_container_url = util.get_object_without_sas(util.test_s2s_dst_blob_account_url, self.bucket_name_file_blob) @@ -554,7 +554,7 @@ def util_test_copy_single_file_from_s3_to_blob_handleinvalidmetadata( # Test oauth support for service to service copy, where source is authenticated with access key for S3 # and destination is authenticated with OAuth token. - @unittest.skip("coverd by blob to blob") + @unittest.skip("covered by blob to blob") def test_copy_single_17mb_file_from_s3_to_blob_oauth(self): src_bucket_url = util.get_object_without_sas(util.test_s2s_src_s3_service_url, self.bucket_name_s3_blob) dst_container_url = util.get_object_without_sas(util.test_s2s_dst_blob_account_url, self.bucket_name_s3_blob) diff --git a/testSuite/scripts/test_upload_block_blob.py b/testSuite/scripts/test_upload_block_blob.py index ff96183aa..372786bf1 100644 --- a/testSuite/scripts/test_upload_block_blob.py +++ b/testSuite/scripts/test_upload_block_blob.py @@ -834,4 +834,4 @@ def test_follow_symlinks_upload(self): self.fail('error parsing the output in JSON format') self.assertEquals(x.TransfersCompleted, "10") - self.assertEquals(x.TransfersFailed, "0") + self.assertEquals(x.TransfersFailed, "0") \ No newline at end of file diff --git a/website/src/index.html b/website/src/index.html index 81132ad71..20a0acac0 100644 --- a/website/src/index.html +++ b/website/src/index.html @@ -24,7 +24,7 @@

AzCopy Command Guide

-

AzCopy is a command-line tool that helps you manage your Azure Storage account. Bellow is a guide to help you familiarize with the commands that can be used. If you want to read more about AzCopy click +

AzCopy is a command-line tool that helps you manage your Azure Storage account. Below is a guide to help you familiarize yourself with the commands that can be used. If you want to read more about AzCopy, click

@@ -55,7 +55,7 @@

STEP 1: Choose your command

Please complete step 1

- +