
Added SPN authentication & relevant in-depth documentation. (#383)

* go fmt & prepared login parameter validation for SPN auth

* SPN authentication via certificates is implemented.

* Authentication via client secret is implemented.

* Added documentation for SPN authentication

* go fmt project

* Added required parameters.

* Merged --secret and --certificate into --service-principal

* Updated readme to reflect changes in command structure

* Improved readme, corrected missing entries in go.mod/go.sum

* Made PEM cert authentication work, updated readme more.

* Moved all secret ingestion to environment variables

(There is likely a better way to display these in the help dialog)

* Corrected listing of client secret & cert password environment variables

* Changed env variable name, updated documentation

* Updated changelog

* Follow pattern for env vars, log warning message

* Hid sensitive environment variables, exposed with flag

* Correct strings + add default OAuth path to login parameter checking

* Replicate environment variable notice to additional relevant outputs

* AzCopy Readme restructure (#426)

* updates

* addressing some tech review feedback

* Fixing a line

* Changing from folder to directory

* Adding clarification for the sync command

* Adding a bit about powershell prompts

* Correctly handle multiple potential chains

I don't think this will ever be necessary, but better safe than sorry.

* Support encrypted & rsa private keys

* Barricade against incorrect key type

* Remove multi-chain support

* Handle potentially encrypted blocks better

* Snip off extra debug code
adreed-msft committed Jun 28, 2019
1 parent 934b43e commit 58785685c55461b9c30a1cfdd9899795be938317
@@ -1,6 +1,7 @@

# Change Log

## Version XX.XX.XX
## Version XX.XX.XX

### Bug fix

@@ -14,9 +15,7 @@

1. Enabled copying from page/block/append blob to another blob of a different type
1. AzCopy now grabs proxy details (sans authentication) from the Windows Registry using `mattn/go-ieproxy`.

### New features

1. Service Principal Authentication is now available under `azcopy login`; check `azcopy env` for details on client secrets/cert passwords.
1. SAS tokens are supported on HNS (Hierarchical Namespace/Azure Data Lake Generation 2) Storage Accounts

## Version 10.1.2
226 README.md
@@ -1,229 +1,73 @@
# AzCopy v10

## About

AzCopy (v10) is the next-generation command-line utility designed for copying data to/from Microsoft Azure Blob and File, using simple commands designed for optimal performance. You can copy data between a file system and a storage account, or between storage accounts.

## Features

* Copy data from Azure Blob containers/File shares to File system, and vice versa
* Copy block blobs between two Azure Storage accounts
* Sync a directory in local file system to Azure Blob, or vice versa
* List/Remove files and blobs in a given path
* Supports glob patterns in path, and --exclude flags
* Resilient: retries automatically after a failure, and supports resuming after a failed job

## What's new in v10?

* Synchronize a file system up to Azure Blob or vice versa. Use `azcopy sync <source> <destination>`
* Supports Azure Data Lake Storage Gen2. Use `myaccount.dfs.core.windows.net` for the URI to use ADLS Gen2 APIs.
* Supports copying an entire account (Blob service only) to another account. Use `azcopy cp https://myaccount.blob.core.windows.net https://myotheraccount.blob.core.windows.net` which will enumerate all Blob containers and copy to the destination account
* Supports [copying data from AWS S3](https://github.com/Azure/azure-storage-azcopy/wiki/Copy-from-AWS-S3)
* Account-to-account copy now uses the new Put from URL APIs, which copy the data directly from one storage account to another. No data needs to be transferred down to the client where AzCopy runs, so it is significantly faster!
* List/Remove files and blobs in a given path
* Supports glob patterns in path, and --exclude flags
* Every AzCopy run will create a job order, and a related log file. You can view and restart previous jobs using `azcopy jobs` command.
* Improved performance all around!

## Installation

1. Download the AzCopy executable using one of the following links:
* [Windows x64](https://aka.ms/downloadazcopy-v10-windows) (zip)
* [Linux x64](https://aka.ms/downloadazcopy-v10-linux) (tar.gz)
* [MacOS x64](https://aka.ms/downloadazcopy-v10-mac) (zip)

2. Unzip (or untar on Linux) and get started

On Linux:
```
wget -O azcopyv10.tar.gz https://aka.ms/downloadazcopy-v10-linux
tar -xzf azcopyv10.tar.gz
cd azcopy_linux_amd64_10.*
./azcopy
```

On Windows:
```
Invoke-WebRequest -Uri https://aka.ms/downloadazcopy-v10-windows -OutFile .\azcopyv10.zip
Expand-Archive azcopyv10.zip -DestinationPath .
cd .\azcopy_windows_amd64_10.*
.\azcopy.exe
```

## Manual

### Authenticating with Azure Storage

AzCopy supports two types of authentication. See the table below for the type you need to use.
* **Pre-signed URLs** (URLs with Shared Access Signature aka. **SAS tokens**): Simply generate a SAS token from the Azure Portal, Storage Explorer, or one of the other Azure tools and append to the Blob path (container/virtual directory/blob path).
* **Azure Active Directory Authentication** : Add your user to the **'Blob Data Contributor'** role in the Azure Portal, and log on to AzCopy using `azcopy login`. To authenticate with MSI, use `azcopy login --identity`. Once logged in, you can simply use AzCopy commands without any SAS token appended to the path. e.g. `azcopy cp https://myaccount.blob.core.windows.net/container/data /mydata --recursive`

| Azure Storage service | Supported authentication methods |
| ------------- | ------------- |
| Blob storage | SAS tokens OR Azure Active Directory Authentication |
| File storage | SAS tokens |
| ADLS Gen2 | SAS tokens OR Azure Active Directory Authentication |

> :exclamation::exclamation::exclamation:Note: a [SAS token](https://docs.microsoft.com/en-us/azure/storage/common/storage-dotnet-shared-access-signature-part-1) is *NOT* an account key. SAS tokens are limited in scope and validity, and start with a question mark so they can be appended to a Blob URL. Here is an example: `?sv=2017-11-09&ss=bf&srt=co&sp=rwac&se=2018-11-16T03:59:09Z&st=2018-11-15T19:59:09Z&sip=10.102.166.17&spr=https,http&sig=k8xSm2K3crBbtNpfoxyvh9n%2BMjDTqRk2XpY8JYIAMaA%3D`.

### Getting started

AzCopy is self-documenting. To list the available commands, run:
```
./azcopy -h
```

To view the help page and examples, run:
```
./azcopy <cmd> -h
```

### Simple command-line syntax
```
# The general syntax
./azcopy <cmd> <arguments> --<flag-name>=<flag-value>
# Example:
./azcopy cp <source path> <destination path> --<flag-name>=<flag-value>
./azcopy cp "/path/to/local" "https://account.blob.core.windows.net/container?sastoken" --recursive=true
./azcopy cp "/mnt/myfile.txt" "https://myaccount.blob.core.windows.net/mycontainer/myfile.txt?sv=2017-11-09&ss=bf&srt=co&sp=rwac&se=2018-11-16T03:59:09Z&st=2018-11-15T19:59:09Z&sip=10.102.166.17&spr=https,http&sig=k8xSm2K3crBbtNpfoxyvh9n%2BMjDTqRk2XpY8JYIAMaA%3D"
```

To see more examples:
```
./azcopy cp -h
```
AzCopy v10 is a command-line utility that you can use to copy data to and from containers and file shares in Azure Storage accounts. It presents easy-to-use commands that are optimized for performance.

Each transfer operation will create a `Job` for AzCopy to act on. You can view the history of jobs using the following command:
```
./azcopy jobs list
```
## Features and capabilities

The job logs and data are located under the `$HOME/.azcopy` directory on Linux, and `%USERPROFILE%\.azcopy` on Windows. You can clear the job data/logs after AzCopy completes the transfers.
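As a quick sketch of inspecting the job data at the default locations named above (the paths come from this README; no AzCopy command is invoked here):

```shell
# AzCopy job logs/plan files live here by default on Linux/macOS
# (%USERPROFILE%\.azcopy on Windows, per the README).
AZCOPY_JOB_DIR="$HOME/.azcopy"
ls "$AZCOPY_JOB_DIR" 2>/dev/null    # list past job data, if any
echo "job data dir: $AZCOPY_JOB_DIR"
```

Clearing this directory removes the history that `azcopy jobs list` reports, so only do so once your transfers are complete.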
:white_check_mark: Use with storage accounts that have a hierarchical namespace (Azure Data Lake Storage Gen2).

### Copy data to Azure storage
The copy command can be used to transfer data from the source to the destination. The location can be a:
1. local path
2. URL to Azure Blob/Virtual Directory/Container
3. URL to Azure File/Directory/File Share
```
./azcopy <command> <source path> <destination path>
```

The following command will upload `1file.txt` to the Block Blob at `https://myaccount.blob.core.windows.net/mycontainer/1file.txt`.
```
./azcopy cp /data/1file.txt "https://myaccount.blob.core.windows.net/mycontainer/1file.txt?sastokenhere"
```
:white_check_mark: Create containers and file shares.

The following command will upload all files under `directory1` recursively to the path at `https://myaccount.blob.core.windows.net/mycontainer/directory1`.
```
./azcopy cp /data/directory1 "https://myaccount.blob.core.windows.net/mycontainer/directory1?sastokenhere" --recursive=true
```
:white_check_mark: Upload files and directories.

The following command will upload all files directly under `directory1` without recursing into sub-directories, to the path at `https://myaccount.blob.core.windows.net/mycontainer/directory1`.
```
./azcopy cp /data/directory1/* "https://myaccount.blob.core.windows.net/mycontainer/directory1?sastokenhere"
```
:white_check_mark: Download files and directories.

To upload into File storage, simply change the URI to an Azure File URI with the corresponding SAS token.
:white_check_mark: Copy containers, directories and blobs between storage accounts (Blobs only).

### Copy VHD image to Azure Storage
:white_check_mark: Synchronize containers with local file systems and vice versa (Blobs only).

AzCopy by default uploads data into Block Blobs. However, if a source file has a `.vhd` extension, AzCopy will default to uploading it as a Page Blob.
:white_check_mark: Copy objects, directories, and buckets from Amazon Web Services (AWS) (Blobs only).

### Copy data from Azure to local file systems
:white_check_mark: List files in a container (Blobs only).

The following will download all Blob container contents into the local file system, creating the directory `mycontainer` in the destination.
```
./azcopy cp "https://myaccount.blob.core.windows.net/mycontainer?sastokenhere" /data/ --recursive=true
```
:white_check_mark: Remove files from a container (Blobs only).

The following will download all Blob container contents into the local file system. The `mycontainer` directory will not be created in the destination, because the globbing pattern matches all paths inside `mycontainer` in the source rather than the `mycontainer` container itself.
```
./azcopy cp "https://myaccount.blob.core.windows.net/mycontainer/*?sastokenhere" /data/ --recursive=true
```
:white_check_mark: Recover from failures by restarting previous jobs.

The following command will download all txt files in the source to the `directory1` path. Note that AzCopy will scan the entire source and filter for `.txt` files. This may take a while when you have thousands/millions of files in the source.
```
./azcopy cp "https://myaccount.blob.core.windows.net/mycontainer/directory1/*.txt?sastokenhere" /data/directory1
```
## Find help

### Copy data between Azure Storage accounts (currently supports Block Blobs only)
For complete guidance, visit any of these articles on the docs.microsoft.com website.

Copying data between two Azure Storage accounts makes use of the PutBlockFromURL API, and does not use the client machine's network bandwidth. Data is copied directly between two Azure Storage servers; AzCopy simply orchestrates the copy operation.
```
./azcopy cp "https://myaccount.blob.core.windows.net/?sastokenhere" "https://myotheraccount.blob.core.windows.net/?sastokenhere" --recursive=true
```
:eight_spoked_asterisk: [Get started with AzCopy](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10)

### Advanced Use Cases
:eight_spoked_asterisk: [Transfer data with AzCopy and blob storage](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-blobs)

#### Configure Concurrency
:eight_spoked_asterisk: [Transfer data with AzCopy and file storage](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-files)

Set the environment variable `AZCOPY_CONCURRENCY_VALUE` to configure the number of concurrent requests. This is set to 300 by default. Note that this does not equal 300 parallel connections. Reducing this value will limit the bandwidth and CPU used by AzCopy.
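A minimal sketch of tuning this setting from a shell before launching AzCopy; 64 below is purely an illustrative value, not a recommendation from the docs:

```shell
# Cap AzCopy's request concurrency to limit bandwidth/CPU usage.
# 64 is an illustrative value; the README states the default is 300.
export AZCOPY_CONCURRENCY_VALUE=64
echo "AZCOPY_CONCURRENCY_VALUE=$AZCOPY_CONCURRENCY_VALUE"
```

On Windows Command Prompt, the equivalent is `set AZCOPY_CONCURRENCY_VALUE=64`.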
:eight_spoked_asterisk: [Transfer data with AzCopy and Amazon S3 buckets](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-s3)

#### Configure proxy settings
To configure the proxy settings for AzCopy v10, set the `https_proxy` environment variable using the following command:
:eight_spoked_asterisk: [Configure, optimize, and troubleshoot AzCopy](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-configure)

```
# For Windows:
set https_proxy=<proxy IP>:<proxy port>
# For Linux:
export https_proxy=<proxy IP>:<proxy port>
# For MacOS
export https_proxy=<proxy IP>:<proxy port>
```
### Find help from your command prompt

For proxy authentication, use the following format:
For convenience, consider adding the AzCopy directory location to your system path for ease of use. That way you can type `azcopy` from any directory on your system.

```
export https_proxy=<user>:<pass>@<proxy IP>:<proxy port>
# or with a domain:
export https_proxy=<domain>%5C<user>:<pass>@<proxy IP>:<proxy port>
```
To see a list of commands, type `azcopy -h` and then press the ENTER key.

### Configure log location
To learn about a specific command, just include the name of the command (For example: `azcopy list -h`).

Set the environment variable `AZCOPY_LOG_LOCATION` to a directory of your choice with plenty of disk space, as logs for large data transfers may use up gigabytes of space depending on the chosen log level.
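For example (a sketch; `/tmp/azcopy-logs` is an illustrative path, pick any directory with ample free space):

```shell
# Redirect AzCopy logs to a directory with plenty of free space.
# /tmp/azcopy-logs is an illustrative path, not a recommendation.
mkdir -p /tmp/azcopy-logs
export AZCOPY_LOG_LOCATION=/tmp/azcopy-logs
echo "logs -> $AZCOPY_LOG_LOCATION"
```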
![AzCopy command help example](readme-command-prompt.png)

## Troubleshooting and Reporting Issues
If you choose not to add AzCopy to your path, you'll have to change directories to the location of your AzCopy executable and type `azcopy` or `.\azcopy` in Windows PowerShell command prompts.

### Check Logs for errors
## Frequently asked questions

AzCopy creates a log file for every job. Look for clues in the logs to understand the problem. AzCopy prints UPLOADFAILED, COPYFAILED, and DOWNLOADFAILED strings for failures, along with the paths and the error reason.
### What is the difference between `sync` and `copy`?

```
cat 04dc9ca9-158f-7945-5933-564021086c79.log | grep -i UPLOADFAILED
```
The `copy` command is a simple transfer operation: it scans the source and attempts to transfer every single file/blob. The supported source/destination pairs are listed in the help message of the tool. On the other hand, `sync` makes sure that whatever is present in the source will be replicated to the destination. If your goal is simply to move some files, then `copy` is the right command, since it offers much better performance.

### View and resume jobs
For `sync`, last modified times are used to determine whether to transfer the same file present at both the source and the destination. If the use case is to incrementally transfer data, then `sync` is the better choice, since only the modified/missing files are transferred.

To view the job stats, run:
```
./azcopy jobs show <job-id>
```
### Will `copy` overwrite my files?

To see the transfers of a specific status (Success or Failed), run:
```
./azcopy jobs show <job-id> --with-status=Failed
```
By default, AzCopy will overwrite the files at the destination if they already exist. To avoid this behavior, please use the flag `--overwrite=false`.

You can resume a failed/cancelled job using its identifier along with the SAS token(s), which are not persisted for security reasons.
```
./azcopy jobs resume <jobid> --source-sas ?sastokenhere --destination-sas ?sastokenhere
```
### Will `sync` delete files in the destination if they no longer exist in the source location?

### Raise an Issue
By default, the `sync` command doesn't delete files in the destination unless you use an optional flag with the command. To learn more, see [Synchronize files](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-blobs#synchronize-files).

Raise an issue on this repository for any feedback or issue encountered.

### FAQ

- What is the difference between `sync` and `copy`?
- The `copy` command is a simple transfer operation: it scans the source and attempts to transfer every single file/blob. The supported source/destination pairs are listed in the help message of the tool. On the other hand, `sync` makes sure that whatever is present in the source will be replicated to the destination, and whatever is not at the source will be deleted from the destination. If your goal is simply to move some files, then `copy` is definitely the right command, since it offers much better performance.
- For `sync`, last modified times are used to determine whether to transfer the same file present at both the source and the destination.
- If the use case is to incrementally transfer data, then `sync` is the better choice, since only the modified/missing files are transferred.
- Will `copy` overwrite my files?
- By default, AzCopy will overwrite the files at the destination if they already exist. To avoid this behavior, please use the flag `--overwrite=false`.

## Contributing
## How to contribute to AzCopy v10

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
@@ -124,7 +124,7 @@ func (o RetryOptions) calcDelay(try int32) time.Duration { // try is >=1; never

// Introduce some jitter: [0.0, 1.0) / 2 = [0.0, 0.5) + 0.8 = [0.8, 1.3)
// For casts and rounding - be careful, as per https://github.com/golang/go/issues/20757
delay = time.Duration(float32(delay) * (rand.Float32()/2 + 0.8) ) // NOTE: We want math/rand; not crypto/rand
delay = time.Duration(float32(delay) * (rand.Float32()/2 + 0.8)) // NOTE: We want math/rand; not crypto/rand
if delay > o.MaxRetryDelay {
delay = o.MaxRetryDelay
}
@@ -32,12 +32,12 @@ var _ = chk.Suite(&copyEnumeratorHelperTestSuite{})
func (s *copyEnumeratorHelperTestSuite) TestAddTransferPathRootsTrimmed(c *chk.C) {
// setup
request := common.CopyJobPartOrderRequest{
SourceRoot: "a/b/",
SourceRoot: "a/b/",
DestinationRoot: "y/z/",
}

transfer := common.CopyTransfer{
Source: "a/b/c.txt",
Source: "a/b/c.txt",
Destination: "y/z/c.txt",
}

@@ -20,21 +20,29 @@ import (
"github.com/spf13/cobra"
)

var showSensitive = false

// envCmd represents the env command
var envCmd = &cobra.Command{
	Use:   "env",
	Short: envCmdShortDescription,
	Long:  envCmdLongDescription,
	Run: func(cmd *cobra.Command, args []string) {
		for _, env := range common.VisibleEnvironmentVariables {
			val := glcm.GetEnvironmentVariable(env)
			if env.Hidden && !showSensitive {
				val = "REDACTED"
			}

			glcm.Info(fmt.Sprintf("Name: %s\nCurrent Value: %s\nDescription: %s\n",
				env.Name, glcm.GetEnvironmentVariable(env), env.Description))
				env.Name, val, env.Description))
		}

		glcm.Exit(nil, common.EExitCode.Success())
	},
}

func init() {
	envCmd.PersistentFlags().BoolVar(&showSensitive, "show-sensitive", false, "Show sensitive/secret environment variables")
	rootCmd.AddCommand(envCmd)
}
