# AWS S3 using boto3
This tutorial demonstrates how to interact with an S3 bucket from Python using the boto3 library.
For ad hoc, one-time operations you may find the AWS [command line interface](https://aws.amazon.com/cli/) more useful.

For programmatic access to AWS, boto3 by default expects an `AWS Access Key ID` and an `AWS Secret Access Key`, even if the resource is "public".
 * The easiest way to authenticate yourself is to install the AWS CLI mentioned above and run `aws configure` from the command line.
 * Running `aws configure` should create a file at `~/.aws/credentials` which contains the login credentials. Boto3 automatically recognizes credentials stored in this location.
 * If you are trying to access a bucket owned by another team or person, you will need them to provide you with credentials. `aws configure` lets you set up each new credential under its own profile. You can tell boto3 which credential to use by passing the `profile_name` argument to `boto3.Session`; otherwise it will pick up the `default` profile (if it exists).
 * Alternatively, you can type your credentials directly into the script, but this is not recommended.
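
To make the profile idea concrete, here is a minimal sketch of what a credentials file with two profiles looks like and how the profile names map to section headers. The key values are placeholders invented for illustration, and `profiles_in` is a hypothetical helper, not part of boto3. It only uses the standard library to read the INI layout; it is not how boto3 itself loads credentials.

```python
import configparser

# Placeholder contents in the INI layout used by ~/.aws/credentials.
# The key values are made up and will not authenticate against anything.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLE_DEFAULT_KEY_ID
aws_secret_access_key = EXAMPLE_DEFAULT_SECRET

[d4d_tutorial]
aws_access_key_id = EXAMPLE_TUTORIAL_KEY_ID
aws_secret_access_key = EXAMPLE_TUTORIAL_SECRET
"""


def profiles_in(credentials_text):
    """Return the profile names (section headers) found in the text."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


print(profiles_in(SAMPLE_CREDENTIALS))  # ['default', 'd4d_tutorial']
```

A named profile such as `d4d_tutorial` can be created with `aws configure --profile d4d_tutorial`.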

**Please read through [this](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html) guide to set up your credentials. For this tutorial I am using my own personal test bucket. If you'd like to follow the steps yourself, you will need your own credentials and a test bucket. If you'd like to use my bucket for testing, contact @bstarling on Slack and I can provide you with credentials to access `public-test-bucket-d4d`, the bucket used in this tutorial.**

In [1]:
import boto3

In [2]:
# Create a session for a named profile.
# available_profiles lists every profile boto3 can find on this machine.
session = boto3.Session(profile_name='d4d_tutorial')
session.available_profiles

['d4d', 'd4d_tutorial', 'd4d_s3', 'dynro', 'default']

In [3]:
# Tell boto3 which resource you will use
s3 = session.resource('s3')

In [4]:
# Specify AWS bucket
bucket = s3.Bucket('public-test-bucket-d4d')

In [5]:
# print all objects in bucket
for obj in bucket.objects.all():
    print(obj)

s3.ObjectSummary(bucket_name='public-test-bucket-d4d', key='tutorial/')
s3.ObjectSummary(bucket_name='public-test-bucket-d4d', key='tutorial/data.csv')
s3.ObjectSummary(bucket_name='public-test-bucket-d4d', key='tutorial/file_one.txt')
s3.ObjectSummary(bucket_name='public-test-bucket-d4d', key='tutorial/file_three.txt')
s3.ObjectSummary(bucket_name='public-test-bucket-d4d', key='tutorial/file_two.txt')


In [6]:
# get object by name
file = bucket.Object(key='tutorial/data.csv')
print(file)

s3.Object(bucket_name='public-test-bucket-d4d', key='tutorial/data.csv')


In [7]:
# Request the object from S3; returns a dict of metadata plus a streaming 'Body'
file.get()

{'AcceptRanges': 'bytes',
 'Body': <botocore.response.StreamingBody at 0x108fa0e48>,
 'ContentLength': 8,
 'ContentType': 'text/csv',
 'ETag': '"8015171fe51e613df5dcdf8e89e94b1c"',
 'LastModified': datetime.datetime(2017, 1, 24, 14, 14, 45, tzinfo=tzutc()),
 'Metadata': {},
 'ResponseMetadata': {'HTTPHeaders': {'accept-ranges': 'bytes',
   'content-length': '8',
   'content-type': 'text/csv',
   'date': 'Wed, 25 Jan 2017 12:26:49 GMT',
   'etag': '"8015171fe51e613df5dcdf8e89e94b1c"',
   'last-modified': 'Tue, 24 Jan 2017 14:14:45 GMT',
   'server': 'AmazonS3',
   'x-amz-id-2': 'MfNDWsYCnrUpZPsw6ccST05Tb3CK2l4GR3MP8HenBLEkaSlEmkW4ScgKQgRaBIexSBFkz5dPmWo=',
   'x-amz-request-id': '4DD3D2C04D199104'},
  'HTTPStatusCode': 200,
  'HostId': 'MfNDWsYCnrUpZPsw6ccST05Tb3CK2l4GR3MP8HenBLEkaSlEmkW4ScgKQgRaBIexSBFkz5dPmWo=',
  'RequestId': '4DD3D2C04D199104',
  'RetryAttempts': 0}}

In [8]:
# read file body (careful doing this with large files)
file.get()['Body'].read()

b'my data\n'
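
Since `read()` with no arguments pulls the whole object into memory, one way to handle larger files is to read the body in fixed-size pieces. The sketch below assumes only that the body has a `read(n)` method that returns `b''` at the end of the stream, which is how the `StreamingBody` shown above behaves. `save_in_chunks` and the 1 MiB default chunk size are illustrative choices, not something taken from boto3.

```python
import io


def save_in_chunks(body, dest_path, chunk_size=1024 * 1024):
    """Copy a file-like `body` to dest_path without holding it all in memory.

    `body` only needs a read(n) method that returns b'' at end of stream.
    Returns the number of bytes written.
    """
    total = 0
    with open(dest_path, "wb") as dest:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break  # end of stream
            dest.write(chunk)
            total += len(chunk)
    return total
```

With the objects from this tutorial, a call would look like `save_in_chunks(file.get()['Body'], 'local_data_chunked.csv')`. Note that the body is a stream, so once it has been read you need to call `file.get()` again to read it a second time.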

In [9]:
#Download file
s3.meta.client.download_file('public-test-bucket-d4d', 'tutorial/data.csv', 'local_data_file.csv')

In [10]:
# or using attributes of variables assigned earlier

In [11]:
s3.meta.client.download_file(bucket.name, file.key, 'local_data_new.csv')

In [12]:
# download all files in an S3 "folder" that share a specific prefix:
for item in bucket.objects.filter(Prefix='tutorial/file'):
    s3.meta.client.download_file(bucket.name, item.key, 'local_{}'.format(item.key.split('/')[-1]))

In [None]:
# Check dir contents after download
%ls
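
The prefix download loop can be wrapped in a small reusable function. This is a sketch under a few assumptions: `download_prefix` is a hypothetical helper name, `bucket` is a boto3 `Bucket` resource like the one created earlier, and keys ending in `/` (such as `tutorial/` in the listing above) are treated as "folder" placeholders and skipped, since a broader prefix like `tutorial/` would otherwise match them.

```python
import os


def download_prefix(bucket, prefix, dest_dir="."):
    """Download every object under `prefix` into dest_dir; return local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for item in bucket.objects.filter(Prefix=prefix):
        if item.key.endswith("/"):
            # "Folder" placeholder key such as 'tutorial/': nothing to save.
            continue
        # Keep only the last path component, as in the loop above.
        local_path = os.path.join(dest_dir, "local_" + item.key.split("/")[-1])
        bucket.download_file(item.key, local_path)
        saved.append(local_path)
    return saved
```

For example, `download_prefix(bucket, 'tutorial/', 'downloads')` would fetch all four tutorial files into a `downloads` directory. Because only the file name is kept, objects with the same name under different sub-prefixes would overwrite each other locally.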

**NOTE: unless you need to do your work inside Python, you can do a lot with the [AWS CLI](http://docs.aws.amazon.com/cli/latest/reference/s3/index.html), which uses botocore under the hood.**
* **`aws s3 sync <local_folder> s3://bstarling-d4d/data/test`**  sync a folder (need read/write access)
* **`aws s3 cp s3://public-test-bucket-d4d/tutorial/data.csv local_data.csv`** copy a file from s3 to local file
* **`aws s3 cp local_data.csv s3://public-test-bucket-d4d/tutorial/data.csv`** push local file to s3
* **`aws s3 ls s3://public-test-bucket-d4d/tutorial/ --profile d4d_tutorial`** list contents (using d4d_tutorial credentials)
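
For comparison, pushing a local file can also be done from Python. The sketch below assumes `bucket` is a boto3 `Bucket` resource with write access and uses its `upload_file(Filename, Key)` method; `upload_to_prefix` and the default `tutorial/` prefix are hypothetical choices made for illustration.

```python
import os
import posixpath


def upload_to_prefix(bucket, local_path, prefix="tutorial/"):
    """Upload local_path under `prefix`, keeping the file name; return the key."""
    # S3 keys always use forward slashes, so join with posixpath.
    key = posixpath.join(prefix, os.path.basename(local_path))
    bucket.upload_file(local_path, key)
    return key
```

Calling `upload_to_prefix(bucket, 'local_data.csv')` would store the file under the key `tutorial/local_data.csv`, overwriting any existing object with that key.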