Merge pull request #42 from alfpark/master
Decouple Azure python dependencies for blobxfer
alfpark committed Sep 3, 2015
2 parents 7832b58 + bdb68b9 commit 4fd32a0
Showing 3 changed files with 211 additions and 59 deletions.
53 changes: 34 additions & 19 deletions Python/Storage/README.md
@@ -1,5 +1,6 @@
 ##blobxfer.py
-Please refer to the Microsoft HPC and Azure Batch Team [blog post](http://blogs.technet.com/b/windowshpc/archive/2015/04/16/linux-blob-transfer-python-code-sample.aspx)
+Please refer to the Microsoft HPC and Azure Batch Team
+[blog post](http://blogs.technet.com/b/windowshpc/archive/2015/04/16/linux-blob-transfer-python-code-sample.aspx)
 for code sample explanations.
 
 ###Introduction
###Introduction
@@ -12,15 +13,23 @@ with various transfer optimizations, built-in retries, and user-specified
 timeouts.
 
 The blobxfer script is a python script that can be used on any platform where
-Python 2.7, 3.3 or 3.4 can be installed. The script requires two
-prerequisite packages to be installed: (1) azure and (2) requests. The azure
-package is required for the script to utilize the
-[Azure Python SDK](http://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/)
-to interact with Azure using a management certificate or a shared key. The
-requests package is required for SAS support. If SAS is not needed, one can
-remove all of the requests references from the script to reduce the
-prerequisite footprint. You can install these packages using pip, easy_install
-or through standard setup.py procedures.
+Python 2.7, 3.3 or 3.4 can be installed. Depending upon the desired mode of
+operation as listed above, the script will require the following packages,
+some of which will automatically pull required dependent packages:
+* Management Certificate
+  * `azure-servicemanagement-legacy` >= 0.20.0
+  * `azure-storage` >= 0.20.0
+* Shared Account Key
+  * `azure-storage` >= 0.20.0
+* SAS Key
+  * `requests` >= 2.7.0
+
+If you want to utilize any/all of the connection methods to Azure Storage,
+then install all three of `azure-servicemanagement-legacy`, `azure-storage`,
+and `requests`. You can install these packages using pip, easy_install
+or through standard setup.py procedures. As of this script version 0.9.9.0,
+it no longer supports the legacy Azure Python SDK, i.e., `azure` package with
+version < 1.0.0 due to breaking changes in the azure packages.
 
 Program parameters and command-line options can be listed via the -h switch. At
 the minimum, three positional arguments are required: storage account name,
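The README now ties each connection mode to a different set of packages. A quick way to see which modes an environment can already use is to probe for the module names that `blobxfer.py` imports for each mode (`azure.servicemanagement`, `azure.storage.blob`, `requests`). The sketch below is hypothetical: the mode labels are illustrative rather than blobxfer options, it does not check the minimum versions listed above, and it assumes Python 3.4+ for `importlib.util.find_spec`.

```python
import importlib.util

# Hypothetical mode labels mapped to the module names blobxfer.py imports
# for each connection method in the README (versions are not checked).
REQUIRED_MODULES = {
    "management_certificate": ("azure.servicemanagement", "azure.storage.blob"),
    "shared_account_key": ("azure.storage.blob",),
    "sas_key": ("requests",),
}


def module_available(name):
    """Return True if `name` can be located on the import path.

    The module itself is not executed, but parent packages of a dotted
    name (e.g. `azure` for `azure.storage.blob`) are imported.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        # ValueError: module already in sys.modules without a __spec__.
        return False


def missing_modules(mode, is_available=module_available):
    """List the modules that still need to be installed for `mode`."""
    return [m for m in REQUIRED_MODULES[mode] if not is_available(m)]


if __name__ == "__main__":
    for mode in sorted(REQUIRED_MODULES):
        missing = missing_modules(mode)
        print(mode, "ready" if not missing else "missing: " + ", ".join(missing))
```

The `is_available` parameter lets the mapping logic be exercised without any Azure package installed.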
@@ -75,20 +84,26 @@ indicate that with `--download`. When downloading an entire container, the
 script will attempt to pre-allocate file space and recreate the sub-directory
 structure as needed.
 
-###Notes
-A note on performance with Python versions < 2.7.9 (i.e., interpreter found
-on default Ubuntu 14.04 installations) -- as of requests 2.6.0, if certain
-packages are installed, as those found in `requests[security]` then the
-underlying urllib3 package will utilize the ndg-httpsclient package which
-will use [pyOpenSSL](https://urllib3.readthedocs.org/en/latest/security.html#pyopenssl).
-This will ensure the peers are [fully validated](https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning).
+###Performance Notes
+* Most likely, you will need to tweak the `--numworkers` argument that best
+suits your environment. The default of 64 may not work properly if you are
+attempting to run multiple blobxfer sessions in parallel from one machine or
+IP address.
+* As of requests 2.6.0 and Python versions < 2.7.9 (i.e., interpreter found
+on default Ubuntu 14.04 installations), if certain packages are installed,
+as those found in `requests[security]` then the underlying `urllib3`
+package will utilize the `ndg-httpsclient` package which will use
+[pyOpenSSL](https://urllib3.readthedocs.org/en/latest/security.html#pyopenssl).
+This will ensure the peers are
+[fully validated](https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning).
 However, this incurs a rather larger performance penalty. If you understand
 the potential security risks for disabling this behavior due to high
-performance requirements, you can either remove ndg-httpsclient or use the
-script in a virtualenv environment without the ndg-httpsclient package.
+performance requirements, you can either remove `ndg-httpsclient` or use the
+script in a `virtualenv` environment without the `ndg-httpsclient` package.
 Python versions >= 2.7.9 are not affected by this issue.
 
 ###Change Log
+* 0.9.9.0: update script for compatibility with new Azure Python packages
 * 0.9.8: fix blob endpoint for non-SAS input, add retry on ServerBusy
 * 0.9.7: normalize SAS keys (accept keys with or without ? char prefix)
 * 0.9.6: revert local resource path expansion, PEP8 fixes
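The 0.9.7 change-log entry says SAS keys are accepted with or without a leading `?`. One way to do that normalization is sketched below; `normalize_saskey` and `blob_url` are hypothetical helpers, the token is made up, and the `endpoint/container/blob?token` layout is the standard Azure blob URL form rather than code taken from blobxfer.

```python
def normalize_saskey(saskey):
    """Strip one leading '?' so '?sv=...' and 'sv=...' are equivalent."""
    return saskey[1:] if saskey.startswith('?') else saskey


def blob_url(endpoint, container, blob, saskey):
    """Hypothetical helper: build a blob URL with the SAS token appended.

    Blob names are not URL-quoted here; names with special characters
    would need urllib quoting first.
    """
    return '{}/{}/{}?{}'.format(
        endpoint.rstrip('/'), container, blob, normalize_saskey(saskey))


if __name__ == '__main__':
    # Both spellings of the (made-up) token yield the same URL.
    for key in ('?sv=2014-02-14&sig=abc', 'sv=2014-02-14&sig=abc'):
        print(blob_url('https://myaccount.blob.core.windows.net',
                       'mycontainer', 'file.bin', key))
```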
129 changes: 106 additions & 23 deletions Python/Storage/blobxfer.py
@@ -54,6 +54,7 @@
respectively, if the skiponmatch parameter is enabled.
TODO list:
- remove dependency on azure python packages
- convert from synchronous multithreading to asyncio/trollius
"""

@@ -77,11 +78,25 @@
 import sys
 import threading
 import time
+import xml.etree.ElementTree as ET
 # non-stdlib imports
-import azure
-import azure.servicemanagement
-import azure.storage
-import requests
+try:
+    import azure
+    import azure.common
+except ImportError:  # pragma: no cover
+    pass
+try:
+    import azure.servicemanagement
+except ImportError:  # pragma: no cover
+    pass
+try:
+    import azure.storage.blob
+except ImportError:  # pragma: no cover
+    pass
+try:
+    import requests
+except ImportError:  # pragma: no cover
+    pass
 
 # remap keywords for Python3
 # pylint: disable=W0622,C0103
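With the guarded imports above, a missing package is silently skipped, so the failure only shows up later, when that code path runs, as a `NameError` (for example, `requests` undefined) or an `AttributeError` (for example, `azure` importable but `azure.storage.blob` missing). A variant that keeps each import optional but fails with a clearer message is sketched below; `optional_import`, `require`, and `http_get` are hypothetical helpers, not part of blobxfer.

```python
import importlib


def optional_import(name):
    """Import module `name` if installed; return None instead of failing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(module, package):
    """Return `module`, or raise a descriptive error if it was not found."""
    if module is None:
        raise RuntimeError(
            "this operation requires the '{}' package; install it with "
            "pip and retry".format(package))
    return module


# Resolved once at import time; checked only where the package is needed.
requests = optional_import("requests")


def http_get(url):
    """Example call site: only the SAS code path would need requests."""
    return require(requests, "requests").get(url)
```

The trade-off is one extra check per call site in exchange for an error message that names the package to install.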
@@ -96,7 +111,7 @@
 # pylint: enable=W0622,C0103
 
 # global defines
-_SCRIPT_VERSION = '0.9.8'
+_SCRIPT_VERSION = '0.9.9.0'
 _DEFAULT_MAX_STORAGEACCOUNT_WORKERS = 64
 _MAX_BLOB_CHUNK_SIZE_BYTES = 4194304
 _MAX_LISTBLOBS_RESULTS = 1000
@@ -106,6 +121,60 @@
 _PY2 = sys.version_info.major == 2
 
 
+class SasBlobList(object):
+    """Sas Blob listing object"""
+    def __init__(self):
+        """Ctor for SasBlobList"""
+        self.blobs = []
+        self.next_marker = None
+
+    def __iter__(self):
+        """Iterator"""
+        return iter(self.blobs)
+
+    def __len__(self):
+        """Length"""
+        return len(self.blobs)
+
+    def __getitem__(self, index):
+        """Accessor"""
+        return self.blobs[index]
+
+    def add_blob(self, name, content_length, content_md5, blobtype):
+        """Adds a blob to the list
+        Parameters:
+            name - blob name
+            content_length - content length
+            content_md5 - content md5
+            blobtype - blob type
+        Returns:
+            Nothing
+        Raises:
+            Nothing
+        """
+        obj = type('bloblistobject', (object,), {})
+        obj.name = name
+        obj.properties = type('properties', (object,), {})
+        obj.properties.content_length = content_length
+        if content_md5 is not None and len(content_md5) > 0:
+            obj.properties.content_md5 = content_md5
+        else:
+            obj.properties.content_md5 = None
+        obj.properties.blobtype = blobtype
+        self.blobs.append(obj)
+
+    def set_next_marker(self, marker):
+        """Set the continuation token
+        Parameters:
+            marker - next marker
+        Returns:
+            Nothing
+        Raises:
+            Nothing
+        """
+        self.next_marker = marker
+
+
 class SasBlobService(object):
     """BlobService supporting SAS for functions used in the Python SDK.
     create_container method does not exist because it is not a supported
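`SasBlobList.add_blob` builds each entry from classes created on the fly with `type(...)`, setting attributes on the class objects rather than on instances. That works because callers only read `blob.name` and `blob.properties.*`. On Python 3.3+ the same attribute-based shape can be expressed with `types.SimpleNamespace`; the hypothetical `make_blob_entry` below keeps the rule that a missing or empty MD5 becomes `None`. Since blobxfer also targets Python 2.7, which has no `SimpleNamespace`, a small plain class would be the portable equivalent.

```python
from types import SimpleNamespace


def make_blob_entry(name, content_length, content_md5, blobtype):
    """Return an object with .name and .properties.* like SasBlobList entries."""
    properties = SimpleNamespace(
        content_length=content_length,
        # None or '' both mean "no MD5 reported", mirroring add_blob
        content_md5=content_md5 if content_md5 else None,
        blobtype=blobtype)
    return SimpleNamespace(name=name, properties=properties)


if __name__ == "__main__":
    entry = make_blob_entry("disk.vhd", 512, "", "PageBlob")
    print(entry.name, entry.properties.content_length,
          entry.properties.content_md5, entry.properties.blobtype)
```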
@@ -129,6 +198,31 @@ def __init__(self, blobep, saskey, timeout):
         self.saskey = saskey
         self.timeout = timeout
 
+    def _parse_blob_list_xml(self, content):
+        """Parse blob list in xml format to an attribute-based object
+        Parameters:
+            content - http response content in xml
+        Returns:
+            attribute-based object
+        Raises:
+            No special exception handling
+        """
+        result = SasBlobList()
+        root = ET.fromstring(content)
+        blobs = root.find('Blobs')
+        for blob in blobs.iter('Blob'):
+            name = blob.find('Name').text
+            props = blob.find('Properties')
+            cl = int(props.find('Content-Length').text)
+            md5 = props.find('Content-MD5').text
+            bt = props.find('BlobType').text
+            result.add_blob(name, cl, md5, bt)
+        try:
+            result.set_next_marker(root.find('NextMarker').text)
+        except Exception:
+            pass
+        return result
+
     def list_blobs(self, container_name, marker=None,
                    maxresults=_MAX_LISTBLOBS_RESULTS):
         """List blobs in container
@@ -157,10 +251,7 @@ def list_blobs(self, container_name, marker=None,
             raise IOError(
                 'incorrect status code returned for list_blobs: {}'.format(
                     response.status_code))
-        response.body = response.content
-        # pylint: disable=W0212
-        return azure.storage._parse_blob_enum_results_list(response)
-        # pylint: enable=W0212
+        return self._parse_blob_list_xml(response.content)
 
     def get_blob(self, container_name, blob_name, x_ms_range):
         """Get blob
@@ -567,7 +658,7 @@ def create_dir_ifnotexists(dirname):
         print('created local directory: {}'.format(dirname))
     except OSError as exc:
         if exc.errno != errno.EEXIST:
-            raise
+            raise  # pragma: no cover
 
 
 def compute_md5_for_file_asbase64(filename, pagealign=False, blocksize=65536):
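`create_dir_ifnotexists` tolerates an already existing path by ignoring only `errno.EEXIST`, which keeps it working on Python 2.7. On Python 3.2+ `os.makedirs(..., exist_ok=True)` covers the existing-directory case directly, with one behavioral difference: it still raises `FileExistsError` when the path is an existing regular file, whereas the `errno.EEXIST` check swallows that case too. `ensure_dir` below is a hypothetical Python 3 sketch of that alternative.

```python
import os
import tempfile


def ensure_dir(dirname):
    """Create dirname and any parents; return True if it did not exist before.

    Python 3.2+: exist_ok=True ignores an existing directory but still
    raises FileExistsError when dirname is an existing regular file.
    """
    existed = os.path.isdir(dirname)
    os.makedirs(dirname, exist_ok=True)
    # Best effort only: another process may create it between the two calls.
    return not existed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "a", "b")
        print(ensure_dir(target), ensure_dir(target))  # True False
```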
@@ -664,11 +755,8 @@ def get_blob_listing(blob_service, args):
             container_name=args.container, marker=marker,
             maxresults=_MAX_LISTBLOBS_RESULTS)
         for blob in result:
-            blobdict[blob.name] = [blob.properties.content_length]
-            try:
-                blobdict[blob.name].append(blob.properties.content_md5)
-            except AttributeError:
-                blobdict[blob.name].append(None)
+            blobdict[blob.name] = [
+                blob.properties.content_length, blob.properties.content_md5]
         marker = result.next_marker
         if marker is None or len(marker) < 1:
             break
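Taken together, `_parse_blob_list_xml` and the loop in `get_blob_listing` implement marker-based pagination: parse one page of List Blobs XML, record each blob's length and MD5, and request the next page until the continuation marker is empty. The sketch below reproduces that flow without Azure. `parse_blob_list`, `collect_listing`, and the two XML pages are hypothetical; the pages are made-up samples trimmed to the elements the diff reads. It uses `findtext` so that a missing or empty `NextMarker` or `Content-MD5` maps to `None` without a broad `except Exception`.

```python
import xml.etree.ElementTree as ET


def parse_blob_list(content):
    """Parse one List Blobs XML page.

    Returns (entries, next_marker), where entries is a list of
    (name, content_length, content_md5, blobtype) tuples.
    """
    root = ET.fromstring(content)
    entries = []
    blobs = root.find('Blobs')
    if blobs is not None:
        for blob in blobs.iter('Blob'):
            props = blob.find('Properties')
            entries.append((
                blob.findtext('Name'),
                int(props.findtext('Content-Length')),
                # findtext gives '' for an empty element; store None instead
                props.findtext('Content-MD5') or None,
                props.findtext('BlobType')))
    # findtext gives None if NextMarker is absent and '' if it is empty
    return entries, root.findtext('NextMarker') or None


def collect_listing(fetch_page, page_size=1000):
    """Request pages until the continuation marker is empty.

    fetch_page(marker, maxresults) must return the raw XML of one page.
    Returns {name: [content_length, content_md5]}, the same shape as
    blobdict in get_blob_listing. page_size mirrors _MAX_LISTBLOBS_RESULTS.
    """
    blobdict = {}
    marker = None
    while True:
        entries, marker = parse_blob_list(fetch_page(marker, page_size))
        for name, length, md5, _blobtype in entries:
            blobdict[name] = [length, md5]
        if marker is None:
            break
    return blobdict


# Made-up sample pages for illustration only.
PAGE1 = b"""<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults>
  <Blobs>
    <Blob>
      <Name>data/file1.bin</Name>
      <Properties>
        <Content-Length>1024</Content-Length>
        <Content-MD5>Q2hlY2sgSW50ZWdyaXR5IQ==</Content-MD5>
        <BlobType>BlockBlob</BlobType>
      </Properties>
    </Blob>
  </Blobs>
  <NextMarker>page2</NextMarker>
</EnumerationResults>"""

PAGE2 = b"""<EnumerationResults>
  <Blobs>
    <Blob>
      <Name>disk.vhd</Name>
      <Properties>
        <Content-Length>512</Content-Length>
        <Content-MD5 />
        <BlobType>PageBlob</BlobType>
      </Properties>
    </Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>"""

if __name__ == "__main__":
    pages = {None: PAGE1, "page2": PAGE2}
    print(collect_listing(lambda marker, maxresults: pages[marker]))
```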
@@ -880,11 +968,6 @@ def main():
             len(args.storageaccountkey) < 1:
         raise ValueError('storage account key is invalid')
 
-    if args.storageaccountkey is None and \
-            args.saskey is None:
-        raise ValueError(
-            'could not get reference to storage account key or sas key')
-
     # set valid num workers
     if args.numworkers < 1:
         args.numworkers = 1
@@ -920,12 +1003,12 @@ def main():
         else:
             host_base = '.' + args.blobep
         if args.timeout is None:
-            blob_service = azure.storage.BlobService(
+            blob_service = azure.storage.blob.BlobService(
                 account_name=args.storageaccount,
                 account_key=args.storageaccountkey,
                 host_base=host_base)
         else:
-            blob_service = azure.storage.BlobService(
+            blob_service = azure.storage.blob.BlobService(
                 account_name=args.storageaccount,
                 account_key=args.storageaccountkey,
                 host_base=host_base, timeout=args.timeout)
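The two `BlobService` constructor calls differ only in whether `timeout` is passed, so the module-path change (`azure.storage` to `azure.storage.blob`) appears twice in the diff. A hypothetical refactoring sketch builds the keyword arguments once. `make_host_base`, `build_service`, and the `startswith('.')` test (inferred from the visible `else: host_base = '.' + args.blobep` branch) are assumptions, and `FakeService` stands in for the real class.

```python
def make_host_base(blobep):
    """Ensure the endpoint suffix starts with '.', e.g. '.core.windows.net'."""
    return blobep if blobep.startswith('.') else '.' + blobep


def build_service(service_cls, account_name, account_key, blobep,
                  timeout=None):
    """Construct service_cls once, adding timeout only when one is given."""
    kwargs = {
        'account_name': account_name,
        'account_key': account_key,
        'host_base': make_host_base(blobep),
    }
    if timeout is not None:
        kwargs['timeout'] = timeout
    return service_cls(**kwargs)


class FakeService(object):
    """Stand-in that records its keyword arguments (illustration only)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


if __name__ == '__main__':
    svc = build_service(FakeService, 'myaccount', 'bXlrZXk=',
                        'core.windows.net', timeout=30)
    print(sorted(svc.kwargs))
```

In the script, `service_cls` would presumably be `azure.storage.blob.BlobService` with the same keyword arguments shown in the diff; that substitution is not exercised here.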
@@ -1051,7 +1134,7 @@ def main():
                 azure_request(
                     blob_service.create_container, timeout=args.timeout,
                     container_name=args.container, fail_on_exist=False)
-            except azure.WindowsAzureConflictError:
+            except azure.common.AzureConflictHttpError:
                 pass
         # initialize page blobs
         if args.pageblob or args.autovhd:
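The rename from `azure.WindowsAzureConflictError` to `azure.common.AzureConflictHttpError` appears to be one of the breaking changes behind the README note about dropping pre-1.0 `azure` packages. The surrounding pattern, treating a conflict on create as "already exists", can be written as a small helper. `create_if_missing`, the `Conflict` exception, and `FakeBlobService` are hypothetical stand-ins; in the script the real arguments would be `blob_service.create_container` and `azure.common.AzureConflictHttpError`. Since the script also passes `fail_on_exist=False`, the `except` clause there acts as a second guard.

```python
class Conflict(Exception):
    """Stand-in for azure.common.AzureConflictHttpError (illustration only)."""


class FakeBlobService(object):
    """Hypothetical service that raises Conflict for an existing container."""

    def __init__(self):
        self.containers = set()

    def create_container(self, container_name):
        if container_name in self.containers:
            raise Conflict(container_name)
        self.containers.add(container_name)


def create_if_missing(create, conflict_error, **kwargs):
    """Call create(**kwargs); return False if it reports a conflict.

    Any other exception propagates to the caller unchanged.
    """
    try:
        create(**kwargs)
    except conflict_error:
        return False
    return True


if __name__ == "__main__":
    svc = FakeBlobService()
    print(create_if_missing(svc.create_container, Conflict,
                            container_name="mycontainer"))
    print(create_if_missing(svc.create_container, Conflict,
                            container_name="mycontainer"))
```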