diff --git a/README.md b/README.md
index f3fcf6f..8681938 100644
--- a/README.md
+++ b/README.md
@@ -1,29 +1,20 @@
-Spectra S3 Python3 SDK
---------------
-
+# Spectra S3 Python3 SDK
 [![Apache V2 License](http://img.shields.io/badge/license-Apache%20V2-blue.svg)](https://github.com/SpectraLogic/ds3_python3_sdk/blob/master/LICENSE.md)
 
 An SDK conforming to the Spectra S3 [specification](https://developer.spectralogic.com/doc/ds3api/1.2/wwhelp/wwhimpl/js/html/wwhelp.htm) for Python 3.6
 
-Contact Us
-----------
-
+## Contact Us
 Join us at our [Google Groups](https://groups.google.com/d/forum/spectralogicds3-sdks) forum to ask questions, or see frequently asked questions.
 
-Installing
-----------
-
+## Installing
 To install the ds3_python3_sdk, either clone the latest code or download a release bundle from [Releases](http://github.com/SpectraLogic/ds3_python3_sdk/releases). Once the code has been downloaded, cd into the bundle and install it with `sudo python3 setup.py install`.
 
 Once `setup.py` completes, the ds3_python3_sdk should be installed and available to be imported into Python scripts.
 
-Documentation
--------------
+## Documentation
 The documentation for the SDK can be found at [http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/](http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/)
 
-SDK
----
-
+## SDK
 The SDK provides an interface for a user to add Spectra S3 functionality to their existing or new Python application. To take advantage of the SDK, you need to import the `ds3` Python package and module. The following example creates a Spectra S3 client from environment variables, creates a bucket, and lists all the buckets that are visible to the user.
 
 ```python
@@ -40,8 +31,7 @@ for bucket in getServiceResponse.result['BucketList']:
     print(bucket['Name'])
 ```
 
-Client
----------
+## Client
 In the ds3_python3_sdk there are two ways that you can create a `Client` instance: from environment variables, or manually. `ds3.createClientFromEnv` will create a `Client` using the following environment variables:
 
 * `DS3_ENDPOINT` - The URL to the DS3 Endpoint
@@ -61,10 +51,27 @@ client = ds3.Client("endpoint", ds3.Credentials("access_key", "secret_key"))
 ```
 
 The proxy URL can be passed in as the named parameter `proxy` to `Client()`.
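+
+For example, a minimal sketch (the endpoint, credentials, and proxy URL here are placeholders):
+
+```python
+client = ds3.Client("endpoint", ds3.Credentials("access_key", "secret_key"), proxy="http://proxy.example.com:8080")
+```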
 
-Putting Data
------------
+## Examples: Communicating with the BP
+
+[An example of using getService and getBucket to list all accessible buckets and objects](samples/listAll.py)
 
-To put data to a Spectra S3 appliance you have to do it inside of the context of what is called a Bulk Job. Bulk Jobs allow the Spectra S3 appliance to plan how data should land to cache, and subsequently get written/read to/from tape. The basic flow of every job is:
+### HELPERS: A simple way of moving data to/from a file system
+There are helper utilities for putting data to and getting data from a BP. These are designed to simplify the user workflow so
+that you don't have to worry about BP job management. The helpers will create BP jobs as necessary, and transfer data
+in parallel to improve performance.
+
+#### How to move everything:
+- [An example of putting ALL files in a directory to a BP bucket](samples/putting_all_files_in_directory.py)
+- [An example of getting ALL objects in a bucket and landing them in a directory](samples/getting_all_objects_in_bucket.py)
+
+#### How to move some things:
+If you only want to move some items in a directory/bucket, you can specify them individually. These examples show how
+to put and get a specific file, but the principle can be extended to transfer multiple items at once.
+- [An example of putting ONE file to a BP bucket](samples/putting_one_file_in_directory.py)
+- [An example of getting ONE object in a bucket](samples/getting_one_file_in_directory.py)
+
+### Moving data the old way
+To put data to a Spectra S3 appliance you have to do it inside the context of what is called a Bulk Job. Bulk Jobs allow the Spectra S3 appliance to plan how data should land in cache, and subsequently get written to or read from tape. The basic flow of every job is (a condensed sketch follows the example links below):
 
 * Generate the list of objects that will either be sent to or retrieved from Spectra S3
 * Send a bulk put/get to Spectra S3 to plan the job
@@ -76,6 +83,4 @@ To put data to a Spectra S3 appliance you have to do it inside the context of
 
 [An example of getting data with the Python SDK](samples/gettingData.py)
 
-[An example of using getService and getBucket to list all accessible buckets and objects](samples/listAll.py)
-
 [An example of how to give objects on the server a different name than what is on the filesystem, and how to delete objects by folder](samples/renaming.py)
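+
+Below is a condensed sketch of the bulk put flow. The names follow [samples/puttingData.py](samples/puttingData.py); treat it as an illustrative outline (a production job should also handle chunks that are not yet ready for allocation) and see that sample for the complete version.
+
+```python
+import os
+
+from ds3 import ds3
+
+client = ds3.createClientFromEnv()
+bucket_name = "books"
+client.put_bucket(ds3.PutBucketRequest(bucket_name))
+
+# Generate the list of objects that will be sent to Spectra S3.
+file_name = "beowulf.txt"
+file_objects = ds3.FileObjectList([ds3.FileObject(file_name, os.stat(file_name).st_size)])
+
+# Send a bulk put to Spectra S3 to plan the job.
+bulk_result = client.put_bulk_job_spectra_s3(ds3.PutBulkJobSpectraS3Request(bucket_name, file_objects))
+job_id = bulk_result.result['JobId']
+
+# Transfer the data chunk by chunk as the appliance allocates chunks.
+for chunk in bulk_result.result['ObjectsList']:
+    allocated = client.allocate_job_chunk_spectra_s3(ds3.AllocateJobChunkSpectraS3Request(chunk['ChunkId']))
+    for obj in allocated.result['ObjectList']:
+        with open(obj['Name'], "rb") as stream:
+            stream.seek(int(obj['Offset']))
+            client.put_object(ds3.PutObjectRequest(bucket_name, obj['Name'], obj['Length'], stream,
+                                                   offset=int(obj['Offset']), job=job_id))
+```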
diff --git a/samples/getting_all_objects_in_bucket.py b/samples/getting_all_objects_in_bucket.py
new file mode 100644
index 0000000..8c21eea
--- /dev/null
+++ b/samples/getting_all_objects_in_bucket.py
@@ -0,0 +1,50 @@
+# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
+# this file except in compliance with the License. A copy of the License is located at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# or in the "license" file accompanying this file.
+# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+
+import tempfile
+
+from os import path, walk
+from ds3 import ds3, ds3Helpers
+
+# This example gets ALL objects within the bucket books and lands them in a temp folder.
+# This uses the new helper functions, which create and manage the BP jobs behind the scenes.
+#
+# This assumes that there exists a bucket called books on the BP and that it contains objects.
+# Running the putting_all_files_in_directory.py example will create this setup.
+
+# The bucket that contains the objects.
+bucket_name = "books"
+
+# The directory on the file system where the objects will be landed.
+# In this example, we are using a temporary directory for easy cleanup.
+destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")
+
+# Create a client which will be used to communicate with the BP.
+client = ds3.createClientFromEnv()
+
+# Create the helper to gain access to the new data movement utilities.
+helper = ds3Helpers.Helper(client=client)
+
+# Retrieve all the objects in the desired bucket and land them in the specified directory.
+#
+# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
+get_job_ids = helper.get_all_files_in_bucket(destination_dir=destination_directory.name, bucket=bucket_name)
+print("BP get job IDs: " + str(get_job_ids))
+
+# Verify that all the files have been landed in the folder.
+for root, dirs, files in walk(top=destination_directory.name):
+    for name in files:
+        print("File: " + path.join(root, name))
+    for name in dirs:
+        print("Dir: " + path.join(root, name))
+
+# Clean up the temp directory where we landed the files.
+destination_directory.cleanup()
diff --git a/samples/getting_one_file_in_directory.py b/samples/getting_one_file_in_directory.py
new file mode 100644
index 0000000..56422a1
--- /dev/null
+++ b/samples/getting_one_file_in_directory.py
@@ -0,0 +1,57 @@
+# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
+# this file except in compliance with the License. A copy of the License is located at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# or in the "license" file accompanying this file.
+# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+
+import tempfile
+
+from os import path, walk
+from ds3 import ds3, ds3Helpers
+
+# This example gets ONE object within the bucket books and lands it in a temp folder.
+# This uses the new helper functions, which create and manage the BP job behind the scenes.
+#
+# This assumes that there exists a bucket called books on the BP and that it contains the object beowulf.txt.
+# Running the putting_one_file_in_directory.py example will create this setup.
+
+# The bucket that contains the object.
+bucket_name = "books"
+
+# The directory on the file system where the object will be landed.
+# In this example, we are using a temporary directory for easy cleanup.
+destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")
+
+# Create a client which will be used to communicate with the BP.
+client = ds3.createClientFromEnv()
+
+# Create the helper to gain access to the new data movement utilities.
+helper = ds3Helpers.Helper(client=client)
+
+# Create a HelperGetObject for each item you want to retrieve from the BP bucket.
+# This example only gets one object, but you can transfer more than one at a time.
+# For each object you must specify the name of the object on the BP, and the file path where you want to land the file.
+# Optionally, if versioning is enabled on your bucket, you can specify the specific version to retrieve.
+# If you don't specify a version, the most recent will be retrieved.
+file_path = path.join(destination_directory.name, "beowulf.txt")
+get_objects = [ds3Helpers.HelperGetObject(object_name="beowulf.txt", destination_path=file_path)]
+
+# Retrieve the objects from the desired bucket.
+# You can optionally specify max_threads to tune performance.
+get_job_id = helper.get_objects(get_objects=get_objects, bucket=bucket_name)
+print("BP get job ID: " + get_job_id)
+
+# Verify that all the files have been landed in the folder.
+for root, dirs, files in walk(top=destination_directory.name):
+    for name in files:
+        print("File: " + path.join(root, name))
+    for name in dirs:
+        print("Dir: " + path.join(root, name))
+
+# Clean up the temp directory where we landed the files.
+destination_directory.cleanup()
diff --git a/samples/putting_all_files_in_directory.py b/samples/putting_all_files_in_directory.py
new file mode 100644
index 0000000..90fc008
--- /dev/null
+++ b/samples/putting_all_files_in_directory.py
@@ -0,0 +1,47 @@
+# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
+# this file except in compliance with the License. A copy of the License is located at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# or in the "license" file accompanying this file.
+# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+
+import os
+
+from ds3 import ds3, ds3Helpers
+
+# This example puts ALL files within the sub-folder /samples/resources to the bucket called books.
+# This uses the new helper functions, which create and manage the BP jobs behind the scenes.
+
+# The bucket where the files will be landed.
+bucket_name = "books"
+
+# The directory that contains the files to be archived to the BP.
+# In this example, we are moving all files in the ds3_python3_sdk/samples/resources folder.
+directory_with_files = os.path.join(os.path.dirname(str(__file__)), "resources")
+
+# Create a client which will be used to communicate with the BP.
+client = ds3.createClientFromEnv()
+
+# Make sure the bucket that we will be sending objects to exists.
+client.put_bucket(ds3.PutBucketRequest(bucket_name))
+
+# Create the helper to gain access to the new data movement utilities.
+helper = ds3Helpers.Helper(client=client)
+
+# Archive all the files in the desired directory to the specified bucket.
+# Note that the files' object names will be relative to the root directory you specified.
+# For example: resources/beowulf.txt will be named just beowulf.txt in the BP bucket.
+#
+# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
+put_job_ids = helper.put_all_objects_in_directory(source_dir=directory_with_files, bucket=bucket_name)
+print("BP put job IDs: " + str(put_job_ids))
+
+# We now verify that all our objects have been sent to DS3.
+bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))
+
+for obj in bucketResponse.result['ContentsList']:
+    print(obj['Key'])
diff --git a/samples/putting_one_file_in_directory.py b/samples/putting_one_file_in_directory.py
new file mode 100644
index 0000000..40b8f30
--- /dev/null
+++ b/samples/putting_one_file_in_directory.py
@@ -0,0 +1,49 @@
+# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
+# this file except in compliance with the License. A copy of the License is located at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# or in the "license" file accompanying this file.
+# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+
+import os
+
+from ds3 import ds3, ds3Helpers
+
+# This example puts ONE file, /samples/resources/beowulf.txt, to the bucket called books.
+# This uses the new helper functions, which create and manage a single BP job.
+
+# The bucket where the file will be landed.
+bucket_name = "books"
+
+# The file path being put to the BP.
+file_path = os.path.join(os.path.dirname(str(__file__)), "resources", "beowulf.txt")
+
+# Create a client which will be used to communicate with the BP.
+client = ds3.createClientFromEnv()
+
+# Make sure the bucket that we will be sending objects to exists.
+client.put_bucket(ds3.PutBucketRequest(bucket_name))
+
+# Create the helper to gain access to the new data movement utilities.
+helper = ds3Helpers.Helper(client=client)
+
+# Create a HelperPutObject for each item you want to send to the BP.
+# This example only puts one file, but you can send more than one at a time.
+# For each object you must specify the name it will be called on the BP, the file path, and the size of the file.
+file_size = os.path.getsize(file_path)
+put_objects = [ds3Helpers.HelperPutObject(object_name="beowulf.txt", file_path=file_path, size=file_size)]
+
+# Archive the file to the specified bucket.
+# You can optionally specify max_threads to tune performance.
+put_job_id = helper.put_objects(put_objects=put_objects, bucket=bucket_name)
+print("BP put job ID: " + put_job_id)
+
+# We now verify that all our objects have been sent to DS3.
+bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))
+
+for obj in bucketResponse.result['ContentsList']:
+    print(obj['Key'])