README.md:

# Spectra S3 Python3 SDK
[![Apache V2 License](http://img.shields.io/badge/license-Apache%20V2-blue.svg)](https://github.com/SpectraLogic/ds3_python3_sdk/blob/master/LICENSE.md)

An SDK conforming to the Spectra S3 [specification](https://developer.spectralogic.com/doc/ds3api/1.2/wwhelp/wwhimpl/js/html/wwhelp.htm) for Python 3.6

## Contact Us
Join us at our [Google Groups](https://groups.google.com/d/forum/spectralogicds3-sdks) forum to ask questions or browse frequently asked questions.

## Installing
To install the ds3_python3_sdk, either clone the latest code or download a release bundle from [Releases](http://github.com/SpectraLogic/ds3_python3_sdk/releases). Once the code has been downloaded, cd into the bundle and install it with `sudo python3 setup.py install`.

Once `setup.py` completes, the ds3_python3_sdk should be installed and available to import into Python scripts.

## Documentation
The documentation for the SDK can be found at [http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/](http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/)

## SDK
The SDK provides an interface for adding Spectra S3 functionality to an existing or new Python application. To take advantage of the SDK, import the `ds3` package and module. The following example creates a Spectra S3 client from environment variables, creates a bucket, and lists all the buckets that are visible to the user.

```python
from ds3 import ds3

client = ds3.createClientFromEnv()
client.put_bucket(ds3.PutBucketRequest("books"))

getServiceResponse = client.get_service(ds3.GetServiceRequest())
for bucket in getServiceResponse.result['BucketList']:
    print(bucket['Name'])
```

## Client
In the ds3_python3_sdk there are two ways to create a `Client` instance: from environment variables or manually. `ds3.createClientFromEnv` will create a `Client` using the following environment variables:

* `DS3_ENDPOINT` - The URL to the DS3 Endpoint
* `DS3_ACCESS_KEY` - The DS3 access key
* `DS3_SECRET_KEY` - The DS3 secret key

A `Client` can also be created manually:

```python
from ds3 import ds3

client = ds3.Client("endpoint", ds3.Credentials("access_key", "secret_key"))
```

The proxy URL can be passed in as the named parameter `proxy` to `Client()`.
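For example, a client behind an HTTP proxy might be created like this (a sketch; the endpoint, credentials, and proxy URL are placeholders):

```python
from ds3 import ds3

# The proxy URL is supplied via the named parameter `proxy`.
client = ds3.Client("endpoint",
                    ds3.Credentials("access_key", "secret_key"),
                    proxy="http://proxy.example.com:8080")
```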

## Examples Communicating with the BP

[An example of using getService and getBucket to list all accessible buckets and objects](samples/listAll.py)

### Helpers: a simple way of moving data to/from a file system
There are helper utilities for moving data to and from a BP. They are designed to simplify the user workflow so that you don't have to worry about BP job management: the helpers create BP jobs as necessary and transfer data in parallel to improve performance.
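As a condensed sketch of the helper workflow (the bucket name and directories here are placeholders; the samples linked below are complete programs):

```python
from ds3 import ds3, ds3Helpers

client = ds3.createClientFromEnv()
helper = ds3Helpers.Helper(client=client)

# Archive a directory to a bucket, then restore it to another directory.
put_job_ids = helper.put_all_objects_in_directory(source_dir="/data", bucket="my-bucket")
get_job_ids = helper.get_all_files_in_bucket(destination_dir="/data-restore", bucket="my-bucket")
```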

#### How to move everything:
- [An example of putting ALL files in a directory to a BP bucket](samples/putting_all_files_in_directory.py)
- [An example of getting ALL objects in a bucket and landing them in a directory](samples/getting_all_objects_in_bucket.py)

#### How to move some things:
If you only want to move some items in a directory/bucket, you can specify them individually. These examples show how to put and get a specific file, but the same approach extends to transferring multiple items at once.
- [An example of putting ONE file to a BP bucket](samples/putting_one_file_in_directory.py)
- [An example of getting ONE object in a bucket](samples/getting_one_file_in_directory.py)

### Moving data the old way
To put data to a Spectra S3 appliance, you must do so within the context of what is called a Bulk Job. Bulk Jobs allow the Spectra S3 appliance to plan how data should land in cache and subsequently be written to or read from tape. The basic flow of every job is:

* Generate the list of objects that will either be sent to or retrieved from Spectra S3
* Send a bulk put/get to Spectra S3 to plan the job
* Transfer the data for each chunk as Spectra S3 makes the chunks available (see the sketch below)
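The linked samples show these steps end to end. As a rough sketch of a bulk put (the request and field names here follow this SDK's low-level API as used in its samples; treat them as an approximation, not a definitive reference):

```python
import os
from ds3 import ds3

client = ds3.createClientFromEnv()

# Describe the objects to send, then ask Spectra S3 to plan the job.
file_path = "beowulf.txt"
objects = [ds3.Ds3PutObject("beowulf.txt", os.path.getsize(file_path))]
bulk_result = client.put_bulk_job_spectra_s3(ds3.PutBulkJobSpectraS3Request("books", objects))
job_id = bulk_result.result['JobId']

# Transfer the data for each chunk as the appliance makes it available.
for chunk in bulk_result.result['ObjectsList']:
    allocated = client.allocate_job_chunk_spectra_s3(
        ds3.AllocateJobChunkSpectraS3Request(chunk['ChunkId']))
    for obj in allocated.result['ObjectList']:
        with open(file_path, "rb") as stream:
            stream.seek(int(obj['Offset']))
            client.put_object(ds3.PutObjectRequest("books", obj['Name'], obj['Length'],
                                                   stream, offset=int(obj['Offset']),
                                                   job=job_id))
```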

[An example of getting data with the Python SDK](samples/gettingData.py)

[An example of how to give objects on the server a different name than what is on the filesystem, and how to delete objects by folder](samples/renaming.py)
samples/getting_all_objects_in_bucket.py:

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import tempfile

from os import path, walk
from ds3 import ds3, ds3Helpers

# This example gets ALL objects within the bucket books and lands them in a temp folder.
# This uses the new helper functions, which create and manage the BP jobs behind the scenes.
#
# This assumes that there exists a bucket called books on the BP and it contains objects.
# Running the putting_all_files_in_directory.py example will create this setup.

# The bucket that contains the objects.
bucket_name = "books"

# The directory on the file system where the objects will be landed.
# In this example, we are using a temporary directory for easy cleanup.
destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Retrieve all the objects in the desired bucket and land them in the specified directory.
#
# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
get_job_ids = helper.get_all_files_in_bucket(destination_dir=destination_directory.name, bucket=bucket_name)
print("BP get job IDS: " + get_job_ids.__str__())

# Verify that all the files have been landed in the folder.
for root, dirs, files in walk(top=destination_directory.name):
    for name in files:
        print("File: " + path.join(root, name))
    for name in dirs:
        print("Dir: " + path.join(root, name))

# Clean up the temp directory where we landed the files.
destination_directory.cleanup()
```

samples/getting_one_file_in_directory.py:

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import tempfile

from os import path, walk
from ds3 import ds3, ds3Helpers

# This example gets ONE object within the bucket books and lands it in a temp folder.
# This uses the new helper functions, which create and manage the BP job behind the scenes.
#
# This assumes that there exists a bucket called books on the BP and it contains the object beowulf.txt.
# Running the putting_one_file_in_directory.py example will create this setup.

# The bucket that contains the objects.
bucket_name = "books"

# The directory on the file system where the object will be landed.
# In this example, we are using a temporary directory for easy cleanup.
destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Create a HelperGetObject for each item you want to retrieve from the BP bucket.
# This example only gets one object, but you can transfer more than one at a time.
# For each object you must specify the name of the object on the BP and the file path where you want to land the file.
# Optionally, if versioning is enabled on your bucket, you can specify the specific version to retrieve.
# If you don't specify a version, the most recent will be retrieved.
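# For example, retrieving a specific version might look like this (hypothetical
# parameter name; check ds3Helpers.HelperGetObject for the exact signature):
#   ds3Helpers.HelperGetObject(object_name="beowulf.txt", destination_path=file_path,
#                              version_id="<version-id>")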
file_path = path.join(destination_directory.name, "beowulf.txt")
get_objects = [ds3Helpers.HelperGetObject(object_name="beowulf.txt", destination_path=file_path)]

# Retrieve the objects in the desired bucket.
# You can optionally specify max_threads to tune performance.
get_job_id = helper.get_objects(get_objects=get_objects, bucket=bucket_name)
print("BP get job ID: " + get_job_id)

# Verify that all the files have been landed in the folder.
for root, dirs, files in walk(top=destination_directory.name):
    for name in files:
        print("File: " + path.join(root, name))
    for name in dirs:
        print("Dir: " + path.join(root, name))

# Clean up the temp directory where we landed the files.
destination_directory.cleanup()
```

samples/putting_all_files_in_directory.py:

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import os

from ds3 import ds3, ds3Helpers

# This example puts ALL files within the sub-folder /samples/resources to the bucket called books.
# This uses the new helper functions, which create and manage the BP jobs behind the scenes.

# The bucket where the files will be landed.
bucket_name = "books"

# The directory that contains files to be archived to BP.
# In this example, we are moving all files in the ds3_python3_sdk/samples/resources folder.
directory_with_files = os.path.join(os.path.dirname(str(__file__)), "resources")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Make sure the bucket that we will be sending objects to exists
client.put_bucket(ds3.PutBucketRequest(bucket_name))

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Archive all the files in the desired directory to the specified bucket.
# Note that the files' object names will be relative to the root directory you specified.
# For example: resources/beowulf.txt will be named just beowulf.txt in the BP bucket.
#
# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
put_job_ids = helper.put_all_objects_in_directory(source_dir=directory_with_files, bucket=bucket_name)
print("BP put job IDs: " + put_job_ids.__str__())

# We now verify that all our objects have been sent to DS3.
bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))

for obj in bucketResponse.result['ContentsList']:
    print(obj['Key'])
```

samples/putting_one_file_in_directory.py:

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import os

from ds3 import ds3, ds3Helpers

# This example puts ONE file, /samples/resources/beowulf.txt, to the bucket called books.
# This uses the new helper functions, which create and manage a single BP job.

# The bucket where the file will be landed.
bucket_name = "books"

# The file path being put to the BP.
file_path = os.path.join(os.path.dirname(str(__file__)), "resources", "beowulf.txt")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Make sure the bucket that we will be sending objects to exists
client.put_bucket(ds3.PutBucketRequest(bucket_name))

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Create a HelperPutObject for each item you want to send to the BP.
# This example only puts one file, but you can send more than one at a time.
# For each object you must specify the name it will be called on the BP, the file path, and the size of the file.
file_size = os.path.getsize(file_path)
put_objects = [ds3Helpers.HelperPutObject(object_name="beowulf.txt", file_path=file_path, size=file_size)]

# Archive the file to the specified bucket.
# You can optionally specify max_threads to tune performance.
put_job_id = helper.put_objects(put_objects=put_objects, bucket=bucket_name)
print("BP put job ID: " + put_job_id)

# We now verify that all our objects have been sent to DS3.
bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))

for obj in bucketResponse.result['ContentsList']:
    print(obj['Key'])
```