# Python lab



## Other reading

You can find more information on using the boto3 Python library here:

* [Using Boto3 - AWS's SDK for Python](https://igneoussystemshelp.zendesk.com/knowledge/articles/222814587)

* [Boto3 - Retry operations](https://igneoussystemshelp.zendesk.com/knowledge/articles/223204708)


## Import modules
To begin with, execute the following code to import the modules needed for this exercise:

In [1]:
import os
import getpass
import os.path
import boto3
import botocore
from boto3.s3.transfer import S3Transfer
import tempfile

import botocore.utils as boto_utils

print("imported modules")


imported modules


## Instantiate variables
You'll need to supply a few values to get started.

**'bucket'**: This is your target bucket, similar to a filesystem directory but without the hierarchy. In the Igneous user interface, buckets are referred to as containers. You will set this later, when you list objects.

**'access_key'**: The access key which identifies who you are to the server.  You can obtain this from your IT administrator.  This is similar to a ‘username’.

**'secret_key'**: A secret key which authenticates you to the server.  You would also obtain this from your IT administrator.  This is similar to a ‘password’.  As such, you should refrain from storing the secret_key directly in your scripts, especially if they are to be placed in a shared location.  

**'endpoint_url'**: This is the URL which hosts your Igneous Data Service.  If you are unsure what to put here, please consult with your IT administrator.  Typically the endpoint URL would look something like this: http://igneous.company.com:80 or https://igneous.company.com:443.

**Note: if you specify a URL which contains ‘https’, you will need to change the ‘use_ssl’ parameter to ‘True’.**
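Since the secret key should not live in a shared script, one option is to read the settings from the environment and prompt for anything missing. The following is a minimal sketch, not part of the original lab: the function name `load_settings` and the `IGNEOUS_*` environment variable names are hypothetical, and the default endpoint is simply the example URL from the text above.

```python
import getpass
import os


def load_settings(env=None):
    """Return (access_key, secret_key, endpoint_url, use_ssl).

    Values come from environment variables when present; missing keys are
    requested with a hidden prompt. The IGNEOUS_* names are hypothetical.
    """
    env = os.environ if env is None else env
    access_key = env.get('IGNEOUS_ACCESS_KEY') or getpass.getpass('Access key: ')
    secret_key = env.get('IGNEOUS_SECRET_KEY') or getpass.getpass('Secret key: ')
    endpoint_url = env.get('IGNEOUS_ENDPOINT_URL', 'http://igneous.company.com:80')
    # Derive use_ssl from the URL scheme so the two settings cannot disagree.
    use_ssl = endpoint_url.lower().startswith('https://')
    return access_key, secret_key, endpoint_url, use_ssl
```

You could then unpack the result into the same four variable names used in this lab, for example `access_key, secret_key, endpoint_url, use_ssl = load_settings()`.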

In [2]:
access_key='BES30GB4I2KNZN0TS2UY'
secret_key='RxB7LXSbbsf5qSowl+ExDzWyHujFWrlDJ/Z3h3/5'
endpoint_url='http://demo.iggy.bz:7070'
use_ssl = False


print("variables set")


variables set


## Setup connection parameters

You don't need to change anything here, but take note of what we're doing: we take the access_key and secret_key and create a session object, which we will later use to establish a connection.

In [3]:

def boto3_session(access_key, secret_key):

    return boto3.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key)

print("session function created")


session function created


Once we have the session function, here's how to create a connection constructor.  Note that we call the `boto3_session` function from within the `boto3_s3_client` function.

In [4]:
def boto3_s3_client(aKey, sKey, endpoint, use_ssl=False):

    # Create a session and then a client from it.
    session = boto3_session(aKey, sKey)
    client = session.client(
        's3',
        region_name='iggy-1',
        use_ssl=use_ssl,  # taken from the variable set earlier, not hard-coded
        verify=False,
        endpoint_url=endpoint,
        config=boto3.session.Config(
            signature_version='s3',
            s3={
                'addressing_style': 'path'
            }
        ))

    # All finished here. We can start using the client as expected now.
    return client

print("client function created")

client function created


Now you can actually call it. Note that it won't 'do' anything yet, but we'll print out the object so you can verify that it was returned.

In [5]:

client = boto3_s3_client(access_key, secret_key, endpoint_url, use_ssl)
print(client)


<botocore.client.S3 object at 0x105d01190>


## Test connection & List buckets
Note that merely creating a client isn't enough to know whether it's working; you actually have to *use* it.  To test it, let's have it list buckets:

In [6]:
def list_buckets(client):

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_buckets
    lsb_resp = client.list_buckets()

    for thisBucket in lsb_resp['Buckets']:
        yield thisBucket['Name']

print("defined list_buckets function")


defined list_buckets function


Now call it, and see what prints out:

In [7]:

bucket_list = list_buckets(client)

for bucketEntry in bucket_list:
    print(bucketEntry)


indexed
nasa-picture-perday
apfio
apcontainer
s3container
noaa-public-data
apbstor
multimedia-commons
nasa-curiosity-rover
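
Because a client only proves itself when used, it can help to wrap that first call in a small check that reports failure instead of raising. This is a sketch only; `check_connection` is a hypothetical helper, and it catches `Exception` by default purely to stay self-contained. With botocore imported you would more likely pass `errors=(botocore.exceptions.BotoCoreError, botocore.exceptions.ClientError)`.

```python
def check_connection(client, errors=(Exception,)):
    """Try list_buckets once.

    Returns (True, number_of_buckets) on success, or (False, message) if one
    of the given exception types was raised.
    """
    try:
        resp = client.list_buckets()
    except errors as exc:
        return False, str(exc)
    return True, len(resp.get('Buckets', []))
```

For example, `ok, detail = check_connection(client)` leaves `detail` holding either the bucket count or the error text.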


## List objects

Now that you know how to get a list of buckets, you can write a function to list the objects within one (or more) of them, provided your access/secret key has access.

First, let's define the function:

In [8]:


def list_objects(client, bucket):
    """
    Return a generator that iterates through the keys contained within the
    specified bucket.
    """

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects
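    lsb_resp = client.list_objects(
        Bucket=bucket)

    # 'Contents' is absent from the response when the bucket is empty,
    # so fall back to an empty list instead of raising a KeyError.
    for obj in lsb_resp.get('Contents', []):
        yield obj['Key']

print("list_objects function created")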

list_objects function created


Now, call it.  You will first have to set `myBucket` in the block below to the bucket you want to list.  Keep in mind that you probably don't want to choose a bucket that has a lot of objects in it, since the printout will take time and screen real estate.

In [9]:

myBucket='apfio'
objList = list_objects(client, myBucket)

for listKey in objList:
    print(listKey)


/Users/andypern/Desktop/photo.jpg
1M100k/0
1M100k/1
1M100k/10
1M100k/100
1M100k/1000
1M100k/10000
1M100k/10001
1M100k/10002
1M100k/10003
1M100k/10004
1M100k/10005
1M100k/10006
1M100k/10007
1M100k/10008
1M100k/10009
1M100k/1001
1M100k/10010
1M100k/10011
1M100k/10012
1M100k/10013
1M100k/10014
1M100k/10015
1M100k/10016
1M100k/10017
1M100k/10018
1M100k/10019
1M100k/1002
1M100k/10020
1M100k/10021
1M100k/10022
1M100k/10023
1M100k/10024
1M100k/10025
1M100k/10026
1M100k/10027
1M100k/10028
1M100k/10029
1M100k/1003
1M100k/10030
1M100k/10031
1M100k/10032
1M100k/10033
1M100k/10034
1M100k/10035
1M100k/10036
1M100k/10037
1M100k/10038
1M100k/10039
1M100k/1004
1M100k/10040
1M100k/10041
1M100k/10042
1M100k/10043
1M100k/10044
1M100k/10045
1M100k/10046
1M100k/10047
1M100k/10048
1M100k/10049
1M100k/1005
1M100k/10050
1M100k/10051
1M100k/10052
1M100k/10053
1M100k/10054
1M100k/10055
1M100k/10056
1M100k/10057
1M100k/10058
1M100k/10059
1M100k/1006
1M100k/10060
1M100k/10061
1M100k/10062
1M100k/10063
1M100k/1006
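
A single `list_objects` call returns only one page of keys, so a large bucket needs pagination, and a cap on the number of keys keeps the printout manageable. The sketch below uses boto3's `get_paginator('list_objects')`; the helper name `iter_keys` and its `limit` parameter are hypothetical, and it assumes your endpoint supports paginated listing the way S3 does.

```python
import itertools


def iter_keys(client, bucket, prefix='', limit=None):
    """Yield keys from every page of a listing, stopping after `limit` keys."""
    paginator = client.get_paginator('list_objects')
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    # Pages with no matching objects carry no 'Contents' entry.
    keys = (obj['Key'] for page in pages for obj in page.get('Contents', []))
    # islice with limit=None passes every key through unchanged.
    return itertools.islice(keys, limit)
```

For example, `for key in iter_keys(client, myBucket, limit=20): print(key)` would show just the first twenty keys.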

## Put your own key

Object stores don't have a concept of a 'directory', but you can form the names of your object keys to mimic one: prefix all of your object keys with something like "username/".

Let's define a prefix now:

In [10]:

objPrefix = getpass.getuser() + '/'

print("Your prefix is : %s" % objPrefix)

Your prefix is : andypern/


Now that you have a prefix, any keys that you put will start with it.

## Put an object

In order to put an object, you probably want some content.  This can either be a file (more specifically, the contents of a file), or it can simply be some data.  Later on you can upload files from your laptop, but to get started quickly you can just upload a dummy file with some text data.

First you'll set a variable to hold some text:

In [11]:

TEST_TEXT = ('Nintendo is an awesome place!  Switch is the best thing since and before sliced bread!')

print(TEST_TEXT)

Nintendo is an awesome place!  Switch is the best thing since and before sliced bread!


Next, let's define a function which will do the actual work of uploading:

In [12]:
def put_object(client, bucket, objKey, body):

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
    put_resp = client.put_object(
        Body=body,
        Bucket=bucket,
        Key=objKey
    )

    # Return the full response, which includes the object's ETag and version
    return put_resp

print("put_object function defined")

put_object function defined


To run it, we'll call it and pass it some of the variables and objects that we created earlier.  Note that you'll want to set `keyName` to the object key (aka: filename) that you want to upload as.  See how we put the prefix in front of it.

If it runs successfully, you should see a printout of the response dictionary, containing some metadata associated with the object and the request.

In [13]:
keyName = "tempfile"
objKey = objPrefix + keyName
put_object(client,myBucket,objKey,TEST_TEXT)


{u'ETag': '"54ce1c61b4dde1b3b03c5c865588d573"',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',
   'content-type': 'text/plain; charset=utf-8',
   'date': 'Thu, 09 Mar 2017 16:20:01 GMT',
   'etag': '"54ce1c61b4dde1b3b03c5c865588d573"',
   'x-amz-request-id': '0-11-bb134932c9fc6b56857fc2d2666ec10b',
   'x-amz-version-id': '17-93b1c2b7ff71d8def3e9e9d5185b41c9'},
  'HTTPStatusCode': 200,
  'HostId': '',
  'RequestId': '0-11-bb134932c9fc6b56857fc2d2666ec10b',
  'RetryAttempts': 0},
 u'VersionId': '17-93b1c2b7ff71d8def3e9e9d5185b41c9'}

## List objects with your prefix

OK, great: you put your very own object.  Remember how, when you listed objects before, you saw everything that was in the bucket?  If you want to see only *your* objects, you can filter for keys that start with your prefix.

Let's define a function to do that:

In [14]:


def list_my_objects(client, bucket, objPrefix):
    """
    Return a generator that iterates through the keys contained within the
    specified bucket, which start with a specific prefix.
    """

    lsb_resp = client.list_objects(
        Bucket=bucket,
        Prefix=objPrefix)  # notice this additional parameter

    # 'Contents' is absent when nothing matches the prefix.
    for obj in lsb_resp.get('Contents', []):
        yield obj['Key']

print("list_my_objects function created")

list_my_objects function created


Now let's give it a whirl. You should see only the objects that you put:

In [15]:

myObjList = list_my_objects(client, myBucket,objPrefix)

for myObjKey in myObjList:
    print(myObjKey)


andypern/photo.jpg
andypern/tempfile
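
To see which 'directories' exist rather than every key, the S3 listing call also accepts a `Delimiter`, in which case keys sharing a prefix up to the delimiter are rolled up under `CommonPrefixes`. The following is a sketch under the assumption that your endpoint honours `Delimiter` the way S3 does; `list_prefixes` is a hypothetical helper name.

```python
def list_prefixes(client, bucket, prefix='', delimiter='/'):
    """Return the 'directory-like' prefixes directly beneath `prefix`."""
    resp = client.list_objects(
        Bucket=bucket,
        Prefix=prefix,
        Delimiter=delimiter)
    # 'CommonPrefixes' is absent when no key contains the delimiter.
    return [entry['Prefix'] for entry in resp.get('CommonPrefixes', [])]
```

Calling `list_prefixes(client, myBucket)` would then return one entry per top-level prefix instead of one per object.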


## Get object

So you put an object and were able to list just your objects. Congrats!  Next, let's define a function that will perform a 'get' on that object and print its contents to the screen.

As always, define the function first:

In [16]:
def get_object(client, bucket, objKey):

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
    get_resp = client.get_object(
        Bucket=bucket,
        Key=objKey)

    return get_resp['Body']

print("defined get_object function")

defined get_object function


Now call it, specifying the key name (that we defined when we did the put).

In [17]:

objContents = get_object(client, myBucket, objKey)
print(objContents.read())



Nintendo is an awesome place!  Switch is the best thing since and before sliced bread!
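
Calling `read()` with no argument loads the whole object into memory, which is fine for a line of text but not for large objects. The body returned by `get_object` is a file-like stream, so it can be copied to disk in chunks. This is a minimal sketch; `save_body` and its `chunk_size` default are hypothetical choices, and it relies only on the stream having a `read(amount)` method.

```python
def save_body(body, path, chunk_size=64 * 1024):
    """Copy a file-like stream to `path` in chunks; return bytes written."""
    written = 0
    with open(path, 'wb') as out:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                # An empty read means the stream is exhausted.
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

For example, `save_body(get_object(client, myBucket, objKey), 'tempfile.txt')` would write the object to a local file named `tempfile.txt`.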


## Head object

The `head_object` method lets you retrieve the metadata for an individual object, including any extended (custom) metadata, without downloading its contents.

First, define the function:

In [18]:

def head_object(client, bucket, objKey):
    try:
        response = client.head_object(
            Bucket=bucket,
            Key=objKey)
        return response

    except botocore.exceptions.ClientError as e:
        # The error code is a string such as '404' and is not always
        # numeric, so don't convert it to an int.
        error_code = e.response['Error']['Code']
        return "failed: %s" % error_code

print("defined head_object function")


defined head_object function


Next, let's actually run it:

In [19]:

objMeta = head_object(client, myBucket, objKey)
print(objMeta)


{u'ContentType': 'text/plain; charset=utf-8', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '0-12-c9a9b0f089cb713a32d46b993d433df7', 'HTTPHeaders': {'content-length': '86', 'last-modified': 'Thu, 09 Mar 2017 16:20:01 GMT', 'etag': '"54ce1c61b4dde1b3b03c5c865588d573"', 'x-amz-request-id': '0-12-c9a9b0f089cb713a32d46b993d433df7', 'date': 'Thu, 09 Mar 2017 16:20:01 GMT', 'x-amz-version-id': '17-93b1c2b7ff71d8def3e9e9d5185b41c9', 'content-type': 'text/plain; charset=utf-8'}}, u'LastModified': datetime.datetime(2017, 3, 9, 16, 20, 1, tzinfo=tzutc()), u'ContentLength': 86, u'VersionId': '17-93b1c2b7ff71d8def3e9e9d5185b41c9', u'ETag': '"54ce1c61b4dde1b3b03c5c865588d573"', u'Metadata': {}}
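
The raw response is hard to read at a glance, so it can be useful to pull out just the fields you care about. The sketch below picks a few keys that appear in the output above; `summarize_head` is a hypothetical helper, and it returns `None` when handed the `"failed: ..."` string that `head_object` returns on error.

```python
def summarize_head(head_resp):
    """Reduce a head_object response to a few commonly used fields."""
    if not isinstance(head_resp, dict):
        # head_object above returns a "failed: ..." string on error.
        return None
    return {
        'size': head_resp['ContentLength'],
        'etag': head_resp['ETag'].strip('"'),  # drop the surrounding quotes
        'version': head_resp.get('VersionId'),
        'metadata': head_resp.get('Metadata', {}),
    }
```

For example, `print(summarize_head(objMeta))` shows the size, ETag, version and custom metadata on one short line.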


## Add metadata to an existing object

One of the useful things about an object store is that you can add custom metadata to your objects, which lets you tag them for all sorts of use cases.  Adding metadata to an existing object is nothing more than performing another PUT on the same key with an additional `Metadata` parameter.  The PUT replaces the object, so you send the body again, and as the outputs below show, the object gets a new version ID.  The call is the same whether you are putting a brand new object or adding metadata to an existing one.

Define a customized function:

In [20]:
def put_object_with_metadata(client, bucket, objKey, metaDict):

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
    # Note: the body is re-sent from the TEST_TEXT variable defined earlier.
    put_resp = client.put_object(
        Body=TEST_TEXT,
        Bucket=bucket,
        Key=objKey,
        Metadata=metaDict
    )

    # Return the full response, which includes the object's new version
    return put_resp

print("put_object_with_metadata function defined")

put_object_with_metadata function defined


Perform the put, first defining some metadata.  Feel free to add or change as many fun tags as you like.

In [21]:

metaDict = {
    "username": getpass.getuser(),
    "job": "superhero",
    "location": "marioworld"
}

put_object_with_metadata(client, myBucket, objKey, metaDict)


{u'ETag': '"54ce1c61b4dde1b3b03c5c865588d573"',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',
   'content-type': 'text/plain; charset=utf-8',
   'date': 'Thu, 09 Mar 2017 16:20:01 GMT',
   'etag': '"54ce1c61b4dde1b3b03c5c865588d573"',
   'x-amz-request-id': '0-11-2884abefa4dbdea2d43fa93d50f8475d',
   'x-amz-version-id': '18-a976815a8cfd5d2d09aff00eef6d7816'},
  'HTTPStatusCode': 200,
  'HostId': '',
  'RequestId': '0-11-2884abefa4dbdea2d43fa93d50f8475d',
  'RetryAttempts': 0},
 u'VersionId': '18-a976815a8cfd5d2d09aff00eef6d7816'}

If you got no errors, do another HEAD on the object and see what we get:

In [22]:
objMeta = head_object(client, myBucket, objKey)
print(objMeta)


{u'ContentType': 'text/plain; charset=utf-8', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '0-12-7d800e25c96e1a9eab85b1b435e2a932', 'HTTPHeaders': {'content-length': '86', 'x-amz-meta-username': 'andypern', 'x-amz-meta-location': 'marioworld', 'x-amz-meta-job': 'superhero', 'last-modified': 'Thu, 09 Mar 2017 16:20:01 GMT', 'etag': '"54ce1c61b4dde1b3b03c5c865588d573"', 'x-amz-request-id': '0-12-7d800e25c96e1a9eab85b1b435e2a932', 'date': 'Thu, 09 Mar 2017 16:20:01 GMT', 'x-amz-version-id': '18-a976815a8cfd5d2d09aff00eef6d7816', 'content-type': 'text/plain; charset=utf-8'}}, u'LastModified': datetime.datetime(2017, 3, 9, 16, 20, 1, tzinfo=tzutc()), u'ContentLength': 86, u'VersionId': '18-a976815a8cfd5d2d09aff00eef6d7816', u'ETag': '"54ce1c61b4dde1b3b03c5c865588d573"', u'Metadata': {'username': 'andypern', 'job': 'superhero', 'location': 'marioworld'}}
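
Because the PUT replaces the whole object, tagging an object whose content you don't already hold means fetching it first. The sketch below reads the current body and metadata, merges in the new tags, and writes the object back. `update_metadata` is a hypothetical helper, not part of the lab; it loads the whole body into memory and does not carry over other attributes such as the content type, so treat it as an illustration for small objects.

```python
def update_metadata(client, bucket, key, updates):
    """Re-put an object with its existing metadata plus `updates`."""
    current = client.get_object(Bucket=bucket, Key=key)
    # Start from the metadata already stored, then overlay the new tags.
    merged = dict(current.get('Metadata', {}))
    merged.update(updates)
    return client.put_object(
        Body=current['Body'].read(),  # whole object held in memory
        Bucket=bucket,
        Key=key,
        Metadata=merged)
```

For example, `update_metadata(client, myBucket, objKey, {"reviewed": "yes"})` would keep the existing tags and add one more.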


## Upload a file

Putting dummy text is fun, but real files are better.  Try uploading some files from your Desktop (photos work well).

First, create a function to upload a file. Note that it's slightly different from putting an object.

In [23]:
def upload_file(client, bucket, objKey, fileName, metaDict):
    transfer = S3Transfer(client)
    transfer.upload_file(fileName, bucket, objKey,
                         extra_args={'Metadata': metaDict})

print("defined upload_file function")

defined upload_file function


Now, let's identify a file to upload:

In [24]:
# For simplicity, whatever file you upload must live on your Desktop.

homedir = os.path.expanduser('~')
desktop = os.path.join(homedir, 'Desktop')

# Uncomment the next line and set it to the name of a file on your Desktop.
#myFile = "photo.jpg"
fileName = os.path.join(desktop, myFile)
fileTest = os.path.isfile(fileName)
print(fileTest)


NameError: name 'myFile' is not defined

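If that returned `True`, you are good to proceed.  If it returned `False`, make sure `myFile` is actually a filename which lives on your Desktop, and be sure to include the file extension.  If you instead see `NameError: name 'myFile' is not defined`, as in the output above, the `myFile` line is still commented out: uncomment it and set it to your file's name.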

Now, upload the file:

In [None]:

objKey = objPrefix + myFile
upload_file(client,myBucket,objKey,fileName,metaDict)
uploadMeta = head_object(client,myBucket,objKey)
print(uploadMeta)
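
Uploads of larger files give no feedback while they run. `S3Transfer.upload_file` accepts a `callback` argument that is called with the number of bytes sent in each chunk, which is enough to print progress. The class below is a sketch along the lines of the progress example in the boto3 documentation; the name `ProgressPrinter` is hypothetical, and the lock is there because the transfer may invoke the callback from several threads.

```python
import os
import sys
import threading


class ProgressPrinter(object):
    """Callable that prints cumulative upload progress for one file."""

    def __init__(self, filename):
        self.filename = filename
        self.size = os.path.getsize(filename)
        self.seen = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # The callback receives the size of each chunk, not a running total.
        with self._lock:
            self.seen += bytes_amount
            pct = 100.0 * self.seen / self.size if self.size else 100.0
            sys.stdout.write('\r%s  %d / %d bytes (%.1f%%)' % (
                self.filename, self.seen, self.size, pct))
            sys.stdout.flush()
```

To use it, you would pass `callback=ProgressPrinter(fileName)` alongside `extra_args` in the `transfer.upload_file(...)` call.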

## Delete object

Now it's time to clean up after yourself by deleting the object you just created.

First, define the function:

In [None]:
def delete_object(client, bucket, objKey):

    # See the following URL for more details about this call:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.delete_object
    del_resp = client.delete_object(
        Bucket=bucket,
        Key=objKey)

    return del_resp

print("defined delete_object function")

Now, actually call it:

In [None]:

delete_response = delete_object(client, myBucket, objKey)
print(delete_response)
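
If you put several objects during the lab, deleting them one key at a time gets tedious. The sketch below removes everything under a prefix by combining the listing and delete calls already used above. `delete_prefix` is a hypothetical helper; it handles only the first page of the listing, and it refuses an empty prefix so that it cannot empty a whole bucket by accident.

```python
def delete_prefix(client, bucket, prefix):
    """Delete every listed object whose key starts with `prefix`.

    Returns the list of keys that were deleted.
    """
    if not prefix:
        raise ValueError('refusing to delete with an empty prefix')
    resp = client.list_objects(Bucket=bucket, Prefix=prefix)
    deleted = []
    for obj in resp.get('Contents', []):
        client.delete_object(Bucket=bucket, Key=obj['Key'])
        deleted.append(obj['Key'])
    return deleted
```

For example, `delete_prefix(client, myBucket, objPrefix)` would remove the objects you created under your own prefix and leave everyone else's untouched.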

## String it all together

What if you want to run all of this at once?

If you were able to run each step above without any errors, you should now be able to go to the top of your browser and:

1.  Click the `Kernel` menu
2.  Click `Restart and run all`
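
Outside the notebook, the same sequence can be chained in a single function. The following is a minimal sketch of a put, get, compare and delete round trip using the client calls from this lab; `round_trip` is a hypothetical name, and the function assumes the client was created as shown earlier.

```python
def round_trip(client, bucket, key, text):
    """Put `text`, read it back, and delete it; return True if it matched."""
    body = text.encode('utf-8')
    client.put_object(Body=body, Bucket=bucket, Key=key)
    try:
        fetched = client.get_object(Bucket=bucket, Key=key)['Body'].read()
        return fetched == body
    finally:
        # Clean up even if the get or the comparison fails.
        client.delete_object(Bucket=bucket, Key=key)
```

For example, `round_trip(client, myBucket, objPrefix + "roundtrip", TEST_TEXT)` should return `True` against a working endpoint.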