
Tag for 1.0.0a5 release
- Fix tests
- Update docs
alfpark committed Jun 9, 2017
1 parent 66753b5 commit 45ef468
Showing 9 changed files with 146 additions and 34 deletions.
13 changes: 11 additions & 2 deletions CHANGELOG.md
@@ -2,6 +2,15 @@

## [Unreleased]

## [1.0.0a5] - 2017-06-09
### Added
- Synchronous copy support with the `synccopy` command. This command supports
multi-destination replication.

### Fixed
- Various YAML config file and CLI interaction issues
- Upload resume support with replication

## [1.0.0a4] - 2017-06-02
### Changed
- From scratch rewrite providing a consistent CLI experience and a vast
@@ -201,11 +210,11 @@ usage documentation carefully when upgrading from 0.12.1.
`--no-skiponmatch`.
- 0.8.2: performance regression fixes

-[Unreleased]: https://github.com/Azure/blobxfer/compare/1.0.0a4...HEAD
[Unreleased]: https://github.com/Azure/blobxfer/compare/1.0.0a5...HEAD
[1.0.0a5]: https://github.com/Azure/blobxfer/compare/1.0.0a4...1.0.0a5
[1.0.0a4]: https://github.com/Azure/blobxfer/compare/0.12.1...1.0.0a4
[0.12.1]: https://github.com/Azure/blobxfer/compare/0.12.0...0.12.1
[0.12.0]: https://github.com/Azure/blobxfer/compare/0.11.5...0.12.0
[0.11.5]: https://github.com/Azure/blobxfer/compare/0.11.4...0.11.5
[0.11.4]: https://github.com/Azure/blobxfer/compare/v0.11.2...0.11.4
[0.11.2]: https://github.com/Azure/blobxfer/compare/e5e435a...v0.11.2

6 changes: 4 additions & 2 deletions README.md
@@ -24,12 +24,14 @@ from Azure Blob and File Storage
throughput limits
* `replica` mode allows replication of a file across multiple destinations
including to multiple storage accounts
* Synchronous copy with replication support (including block-level copies
for Block blobs)
* Client-side encryption support
* Support all Azure Blob types and Azure Files for both upload and download
* Advanced skip options for rsync-like operations
* Store/restore POSIX filemode and uid/gid
-* Support for reading/pipe from `stdin`
-* Support for reading from blob snapshots
* Support reading/pipe from `stdin`
* Support reading from blob snapshots for downloading and synchronous copy
* Configurable one-shot block upload support
* Configurable chunk size for both upload and download
* Automatic block size selection for block blob uploading
2 changes: 1 addition & 1 deletion blobxfer/version.py
@@ -22,4 +22,4 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

-__version__ = '1.0.0a4'
__version__ = '1.0.0a5'
75 changes: 55 additions & 20 deletions docs/10-cli-usage.md
@@ -10,22 +10,24 @@ command will be detailed along with all options available.

## <a name="commands"></a>Commands
### `download`
-Downloads a remote Azure path, which may contain many resources, to the
-local machine. This command requires at the minimum, the following options:
-* `--storage-account`
-* `--remote-path`
Downloads remote Azure paths, which may contain many resources, to the
local machine. This command requires, at minimum, the following options
if invoked without a YAML configuration file:
* `--storage-account` for the source remote Azure path
* `--remote-path` for the source remote Azure path
* `--local-path`

Additionally, an authentication option for the storage account is required.
Please see the Authentication sub-section below under Options.
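
For example, a minimal sketch of an invocation, using hypothetical
placeholder values for the account, SAS token, and container:
```shell
# hypothetical account, token, container, and local path values
blobxfer download --storage-account mystorageaccount --sas "mysastoken" \
    --remote-path mycontainer --local-path /tmp/mylocalpath
```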

### `upload`
-Uploads a local path to a remote Azure path. The local path may contain
-many resources on the local machine. This command requires at the minimum,
-the following options:
Uploads local paths to a remote Azure path or set of remote Azure paths.
The local path may contain many resources on the local machine. This command
requires, at minimum, the following options if invoked without a YAML
configuration file:
* `--local-path`
-* `--storage-account`
-* `--remote-path`
* `--storage-account` for the destination remote Azure path
* `--remote-path` for the destination remote Azure path

Additionally, an authentication option for the storage account is required.
Please see the Authentication sub-section below under Options.
@@ -34,7 +36,17 @@ If piping from `stdin`, `--local-path` should be set to `-` as per
convention.
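
For example, a sketch of piping a tar stream from `stdin` with hypothetical
placeholder values (the resulting remote entity name may additionally depend
on `--rename`):
```shell
# hypothetical directory, account, token, and container values
tar cf - mydir | blobxfer upload --storage-account mystorageaccount \
    --sas "mysastoken" --remote-path mycontainer --local-path -
```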

### `synccopy`
-TODO: not yet implemented.
Synchronously copies remote Azure paths to other remote Azure paths. This
command requires, at minimum, the following options if invoked without
a YAML configuration file:
* `--storage-account` for the source remote Azure path
* `--remote-path` for the source remote Azure path
* `--sync-copy-dest-storage-account` for the destination remote Azure path
* `--sync-copy-dest-remote-path` for the destination remote Azure path

Additionally, an authentication option for both storage accounts is required.
Please see the Authentication and Connection sub-sections below under
Options.

## <a name="options"></a>Options
### General
@@ -70,7 +82,8 @@ to be output.
recursively uploaded or downloaded.
* `--remote-path` is the remote Azure path. This path must contain the
Blob container or File share at the beginning, e.g., `mycontainer/vdir`
-* `--resume-file` specifies the resume file to write to or read from.
* `--resume-file` specifies the resume database to write to or read from.
Resume files should be specific to a session.
* `--timeout` is the integral timeout value in seconds to use.
* `-h` or `--help` can be passed at every command level to receive context
sensitive help.
@@ -80,11 +93,19 @@
`blobxfer` supports both Storage Account access keys and Shared Access
Signature (SAS) tokens. One type must be supplied with all commands in
order to successfully authenticate against Azure Storage. These options are:
-* `--sas` is a shared access signature (SAS) token. This can can be
-optionally provided through an environment variable `BLOBXFER_SAS` instead.
* `--storage-account-key` is the storage account access key. This can be
optionally provided through an environment variable
`BLOBXFER_STORAGE_ACCOUNT_KEY` instead.
* `--sas` is a shared access signature (SAS) token. This can be
optionally provided through an environment variable `BLOBXFER_SAS` instead.
* `--sync-copy-dest-sas` is a shared access signature (SAS) token for the
destination Azure Storage account for the `synccopy` command. This can be
optionally provided through an environment variable
`BLOBXFER_SYNC_COPY_DEST_SAS` instead.
* `--sync-copy-dest-storage-account-key` specifies the destination Azure
Storage account key for the `synccopy` command. This can be optionally
provided through an environment variable
`BLOBXFER_SYNC_COPY_DEST_STORAGE_ACCOUNT_KEY` instead.
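
As an illustrative sketch with hypothetical values, credentials for both
sides of a synchronous copy can be supplied through the environment
variables above instead of on the command line:
```shell
# hypothetical key, token, account, and container values
export BLOBXFER_STORAGE_ACCOUNT_KEY="mysourcekey"
export BLOBXFER_SYNC_COPY_DEST_SAS="mydestsas"
blobxfer synccopy --storage-account mysourceaccount \
    --remote-path mysourcecontainer \
    --sync-copy-dest-storage-account mydestaccount \
    --sync-copy-dest-remote-path mydestcontainer
```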

### Concurrency
Please see the [performance considerations](98-performance-considerations.md)
@@ -103,6 +124,12 @@ Azure Public regions, or `core.windows.net`.
* `--storage-account` specifies the storage account to use. This can be
optionally provided through an environment variable `BLOBXFER_STORAGE_ACCOUNT`
instead.
* `--sync-copy-dest-storage-account` specifies the destination remote
Azure storage account for the `synccopy` command. This can be optionally
provided through an environment variable
`BLOBXFER_SYNC_COPY_DEST_STORAGE_ACCOUNT` instead.
* `--sync-copy-dest-remote-path` specifies the destination remote Azure path
under the synchronous copy destination storage account.

### Encryption
* `--rsa-private-key` is the RSA private key in PEM format to use. This can
@@ -151,17 +178,19 @@ regarding Vectored IO operations in `blobxfer`.
Vectored IO operations

### Other
-* `--delete` deletes extraneous files at the remote destination path on
-uploads and at the local resource on downloads. This actions occur after the
-transfer has taken place.
* `--delete` deletes extraneous files (including blob snapshots if the parent
is deleted) at the remote destination path on uploads and at the local
resource on downloads. These actions occur after the transfer has taken
place, similar to rsync's `--delete-after` option. Note that this interacts
with other filters such as `--include` and `--exclude`.
* `--one-shot-bytes` controls the size threshold in bytes under which a
block blob is uploaded in a single ("one shot") request. The maximum value
that can be specified is 256MiB. This may be useful when using account-level
SAS keys and enforcing non-overwrite behavior.
* `--rename` renames a single file upload or download to the target
destination or source path, respectively.
* `--strip-components N` will strip the leading `N` components from the
-file path. The default is `1`.
local file path. The default is `1`.
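
For example, a sketch with hypothetical paths: with `--strip-components 2`,
a local file path such as `data/subdir/file.bin` would have its two leading
components stripped before the remote name is constructed:
```shell
# hypothetical account, token, container, and path values
blobxfer upload --storage-account mystorageaccount --sas "mysastoken" \
    --remote-path mycontainer --local-path ./data --strip-components 2
```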

## <a name="examples"></a>Example Invocations
### `download` Examples
@@ -227,7 +256,15 @@ blobxfer upload --config myconfig.yaml
```

### `synccopy` Examples
-TODO: not implemented yet.
#### Synchronously Copy an Entire Path Recursively to Another Storage Account
```shell
blobxfer synccopy --storage-account mystorageaccount --sas "mysastoken" \
    --remote-path mysourcecontainer \
    --sync-copy-dest-storage-account mydestaccount \
    --sync-copy-dest-storage-account-key "mydestkey" \
    --sync-copy-dest-remote-path mydestcontainer
```

#### Synchronously Copy using a YAML Configuration File
```shell
blobxfer synccopy --config myconfig.yaml
```

## <a name="general-notes"></a>General Notes
* `blobxfer` does not take any leases on blobs or containers. It is up to the
@@ -252,5 +289,3 @@ appropriate `skip_on` option, respectively.
* Globbing of wildcards must be disabled by your shell (or properly quoted)
during invoking `blobxfer` such that include and exclude patterns can be
read verbatim without the shell expanding the wildcards.
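
For example, quoting the pattern keeps the shell from expanding it
(hypothetical values shown):
```shell
# hypothetical account, token, and container values
blobxfer download --storage-account mystorageaccount --sas "mysastoken" \
    --remote-path mycontainer --local-path . --include "*.bin"
```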
-* The `--delete` operates similarly to `--delete-after` in rsync. Please
-note that this option interacts with `--include` and `--exclude` filters.
50 changes: 49 additions & 1 deletion docs/20-yaml-configuration.md
@@ -182,6 +182,9 @@ upload:
This corresponds to the block size for block and append blobs, page size
for page blobs, and the file chunk for files. Only block blobs can have
a block size of up to 100MiB, all others have a maximum of 4MiB.
* `delete_extraneous_destination` will clean up any remote files that are
not found locally. Note that this interacts with include and
exclude filters.
* `one_shot_bytes` is the size limit to upload block blobs in a single
request.
* `overwrite` specifies clobber behavior
@@ -210,4 +213,49 @@ upload:
round-robin order amongst the destinations listed.

### <a name="synccopy"></a>`synccopy`
-TODO: not yet implemented.
The `synccopy` section specifies synchronous copy sources and destinations.
Note that `synccopy` refers to a list of objects; you may specify as many
of these sub-configuration blocks under the `synccopy` property as you need.
When the `synccopy` command is invoked with a YAML config, the list is
iterated and all specified sources are synchronously copied.

```yaml
synccopy:
- source:
  - mystorageaccount0: mycontainer
  destination:
  - mystorageaccount0: othercontainer
  - mystorageaccount1: mycontainer
  include:
  - "*.bin"
  exclude:
  - "*.tmp"
  options:
    mode: auto
    delete_extraneous_destination: true
    overwrite: true
    recursive: true
    skip_on:
      filesize_match: false
      lmt_ge: false
      md5_match: true
```

* `source` is a list of storage account to remote path mappings. All sources
are copied to each destination specified.
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are synccopy-specific options
  * `mode` is the operating mode
  * `delete_extraneous_destination` will clean up any files in remote
    destinations that are not found in the remote sources. Note that this
    interacts with include and exclude filters.
  * `overwrite` specifies clobber behavior
  * `recursive` specifies if source remote paths should be recursively
    searched for files to copy
  * `skip_on` are skip on options to use
    * `filesize_match` skip if the file sizes match
    * `lmt_ge` skip if the source file has a last modified time greater than
      or equal to that of the destination file
    * `md5_match` skip if the MD5 hashes match
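
A configuration block such as the example above could then be driven
entirely from the CLI, e.g.:
```shell
blobxfer synccopy --config myconfig.yaml
```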
3 changes: 2 additions & 1 deletion docs/40-client-side-encryption.md
@@ -17,7 +17,8 @@
* MD5 for both the pre-encrypted and encrypted version of the file is stored
in the entity metadata, if enabled. `skip_on` options will still work
transparently with encrypted blobs/files.
-* MAC integrity checks are preferred over MD5 to validate encrypted data.
* If both are present, HMAC-SHA256 checks over the encrypted data are
performed instead of MD5 checks over the unencrypted data to validate
integrity.
* Attempting to upload the same file that exists in Azure Storage, but the
file in Azure Storage is not encrypted will not occur if any `skip_on` match
condition succeeds. This behavior can be overridden by deleting the target
12 changes: 8 additions & 4 deletions docs/98-performance-considerations.md
@@ -43,7 +43,7 @@ the maximum is 4MiB.
For block blobs, setting the chunk size to something greater than 4MiB will
not only allow you larger file sizes (recall that the maximum number of
blocks for a block blob is 50000, thus at 100MiB blocks, you can create a
-5TiB block blob object) but will allow you to amortize larger portions of
4.768TiB block blob object) but will allow you to amortize larger portions of
data transfer over each request/response overhead. `blobxfer` can
automatically select the proper block size given your file, but will not
automatically tune the chunk size as that depends upon your system and
@@ -76,14 +76,18 @@ instead.
MD5 hashing will impose some performance penalties to check if the file
should be uploaded or downloaded. For instance, if uploading and the local
file is determined to be different from its remote counterpart, then the
-time spent performing the MD5 comparison is lost.
time spent performing the MD5 comparison is effectively "lost."

## Client-side Encryption
Client-side encryption will naturally impose a performance penalty on
`blobxfer` both for uploads (encrypting) and downloads (decrypting) depending
upon the processor speed and number of cores available. Additionally, for
-uploads, encryption is not parallelizable and is in-lined with the main
-process.
uploads, encryption is not parallelizable within an object and is in-lined
with the main process.

## Resume Files (Databases)
Enabling resume support may slightly impact performance as a key-value shelve
for bookkeeping is kept on disk and is updated frequently.
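
For example, a sketch enabling resume support with a session-specific
database file (hypothetical name):
```shell
# hypothetical account, token, container, and resume database values
blobxfer download --storage-account mystorageaccount --sas "mysastoken" \
    --remote-path mycontainer --local-path . --resume-file myresume.db
```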

## pyOpenSSL
As of requests 2.6.0 and Python versions < 2.7.9 (i.e., interpreter found on
2 changes: 1 addition & 1 deletion docs/README.md
@@ -7,7 +7,7 @@ name. The `blobxfer` data movement library is built on the
Please refer to the following documents detailing the usage of `blobxfer`.

1. [Installation](01-installation.md)
-2. [Command-Line Usage](10-cli-usage.md)
2. [CLI Usage](10-cli-usage.md)
3. [YAML Configuration](20-yaml-configuration.md)
4. [Vectored I/O](30-vectored-io.md)
5. [Client-side Encryption](40-client-side-encryption.md)
17 changes: 15 additions & 2 deletions tests/test_blobxfer_operations_download.py
@@ -888,6 +888,20 @@ def test_start(
assert d._pre_md5_skip_on_check.call_count == 0


def test_start_exception():
    d = ops.Downloader(mock.MagicMock(), mock.MagicMock(), mock.MagicMock())
    d._general_options.resume_file = None
    # simulate a failure during the transfer run
    d._run = mock.MagicMock(side_effect=RuntimeError('oops'))
    d._wait_for_transfer_threads = mock.MagicMock()
    d._cleanup_temporary_files = mock.MagicMock()
    d._md5_offload = mock.MagicMock()

    with pytest.raises(RuntimeError):
        d.start()
    # transfer threads and temporary files must still be cleaned up on error
    assert d._wait_for_transfer_threads.call_count == 1
    assert d._cleanup_temporary_files.call_count == 1


def test_start_keyboard_interrupt():
    d = ops.Downloader(mock.MagicMock(), mock.MagicMock(), mock.MagicMock())
    d._general_options.resume_file = None
@@ -896,7 +910,6 @@ def test_start_keyboard_interrupt():
    d._cleanup_temporary_files = mock.MagicMock()
    d._md5_offload = mock.MagicMock()

-    with pytest.raises(KeyboardInterrupt):
-        d.start()
    d.start()
    assert d._wait_for_transfer_threads.call_count == 1
    assert d._cleanup_temporary_files.call_count == 1
