# Amazon S3


Amazon S3 (Simple Storage Service) is the web service through which AWS offers object storage. According to Amazon, S3 uses the same scalable storage infrastructure that Amazon.com uses to run its own global e-commerce network. It is a scalable, high-speed, low-cost, web-based cloud storage service designed for online backup and archiving of data and application programs. It was designed with a minimal feature set to make web-scale computing easier for developers.

It is an object storage service, which differs from block and file cloud storage. Each object is stored together with its metadata and is identified by a unique key within its bucket. Applications use this key to access the object. Unlike file and block cloud storage, a developer accesses an object through a REST API. S3 lets users upload, store, and download practically any file or object up to five terabytes (5 TB) in size.

Amazon S3 offers several storage classes, including S3 Standard and S3 Standard-Infrequent Access (S3 Standard-IA). S3 Standard is suitable for frequently accessed data that needs to be delivered with low latency and high throughput; it targets applications, dynamic websites, content distribution, and big data workloads. S3 Standard-IA offers a lower storage price for backups and long-term data storage that is read less often.

### Bucket Restrictions and Limitations

From AWS documentation:<br>
A bucket is owned by the AWS account that created it. By default, you can create up to 100 buckets in each of your AWS accounts. If you need additional buckets, you can increase your bucket limit by submitting a service limit increase. For information about how to increase your bucket limit, see AWS Service Limits in the AWS General Reference.

Bucket ownership is not transferable; however, if a bucket is empty, you can delete it. After a bucket is deleted, the name becomes available to reuse, but the name might not be available for you to reuse for various reasons. For example, some other account could create a bucket with that name. Note, too, that it might take some time before the name can be reused. So if you want to use the same bucket name, don't delete the bucket.

There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets. You cannot create a bucket within another bucket.

The high-availability engineering of Amazon S3 is focused on get, put, list, and delete operations. Because bucket operations work against a centralized, global resource space, it is not appropriate to create or delete buckets on the high-availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often.

#### Note
All bucket names should comply with DNS naming conventions. These conventions are enforced in all Regions except for the US East (N. Virginia) Region. The rules for DNS-compliant bucket names are:

* Bucket names must be at least 3 and no more than 63 characters long.
* Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.). Bucket names can contain lowercase letters, numbers, and hyphens. Each label must start and end with a lowercase letter or a number.
* Bucket names must not be formatted as an IP address (e.g., 192.168.5.4).
* When using virtual hosted–style buckets with SSL, the SSL wildcard certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic. We recommend that you do not use periods (".") in bucket names.
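
The naming rules above can be checked locally before a create request is sent. The following is a minimal sketch, not an official validator: `is_dns_compliant_bucket_name` is a hypothetical helper name, and it covers only the length, label, and IP-address rules listed here.

```python
import ipaddress
import re

# One label: lowercase letters, digits, and hyphens, starting and ending
# with a lowercase letter or a digit.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_dns_compliant_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the rules listed above (hypothetical helper)."""
    if not 3 <= len(name) <= 63:
        return False
    # Names must not be formatted as an IPv4 address such as 192.168.5.4.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        pass  # not an IP address, keep checking
    else:
        return False
    # Splitting on "." yields an empty label for a leading, trailing, or
    # doubled period, and the label pattern rejects empty strings.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A name such as `my-bucket` passes, while `My-Bucket`, `bucket..name`, and `192.168.5.4` are rejected. The recommendation to avoid periods is advice rather than a rule, so this sketch does not enforce it.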



Read more about AWS S3 [here](https://aws.amazon.com/s3/)

[Deep Dive](http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html)


### Client vs Resource

You can access S3 either through a client object or through a resource. The documentation says little about the difference between the two. Each service module (such as S3, EC2, or SQS) has a Client class that provides a 1-to-1 mapping of the service API. Each service module also has a Resource class that provides an object-oriented interface to work with. The snippets in this section use the AWS SDK for Ruby; Boto 3 follows the same client/resource split.

Each resource object wraps a service client. 

    s3 = Aws::S3::Resource.new
    s3.client
    #=> #<Aws::S3::Client>


Given a service resource object you can start exploring related resources without making API calls. If you know the name of a bucket, you can construct a bucket resource without making an API request. 

    bucket = s3.bucket('aws-sdk')
    
In the example above, an instance of `Aws::S3::Bucket` is returned. This is a lightweight reference to an actual bucket that might exist in Amazon S3. When you reference a resource, no API calls are made until you operate on the resource.

The following code uses the bucket reference to delete the bucket.

    bucket.delete
    

You can use a resource to reference other resources. In the next example, I use the bucket object to reference an object in the bucket by its key.

Again, no API calls are made until I invoke an operation such as #put or #delete.

    obj = bucket.object('hello.txt')
    obj.put(body:'Hello World!')
    obj.delete
    
The **`resource`** interface is newer than the client interface and less complete: it does not cover every service or operation that a client object provides.
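
The idea that a resource is a lightweight reference wrapping a client can be illustrated with a toy model in Python. This is only a sketch of the pattern, not SDK code: `RecordingClient` and `ObjectRef` are hypothetical classes, and the stand-in client records calls instead of contacting S3.

```python
class RecordingClient:
    """Stand-in for a service client; records calls instead of contacting S3."""

    def __init__(self):
        self.calls = []

    def put_object(self, Bucket, Key, Body):
        self.calls.append(("put_object", Bucket, Key))

    def delete_object(self, Bucket, Key):
        self.calls.append(("delete_object", Bucket, Key))


class ObjectRef:
    """Lightweight reference: stores identifiers and delegates to the client."""

    def __init__(self, client, bucket_name, key):
        self.client = client
        self.bucket_name = bucket_name
        self.key = key

    def put(self, body):
        self.client.put_object(Bucket=self.bucket_name, Key=self.key, Body=body)

    def delete(self):
        self.client.delete_object(Bucket=self.bucket_name, Key=self.key)


client = RecordingClient()
obj = ObjectRef(client, "aws-sdk", "hello.txt")
print(client.calls)       # prints: []  (constructing the reference made no call)
obj.put("Hello World!")
obj.delete()
print(len(client.calls))  # prints: 2
```

Constructing the reference only stores the bucket name and key; the client is touched when an operation such as `put` or `delete` is invoked.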

### Create an S3 resource object

Boto 3 has both low-level clients and higher-level resources. This notebook uses the resource interface to work with S3.


In [35]:
import boto3
import os
import time
import getpass

system_user_name=getpass.getuser()

s3 = boto3.resource('s3')

### Creating a Bucket


**Request Syntax**

    bucket = s3.create_bucket(
        ACL='private'|'public-read'|'public-read-write'|'authenticated-read',
        Bucket='string',
        CreateBucketConfiguration={
            'LocationConstraint': 'EU'|'eu-west-1'|'us-west-1'|'us-west-2'|'ap-south-1'|'ap-southeast-1'|'ap-southeast-2'|'ap-northeast-1'|'sa-east-1'|'cn-north-1'|'eu-central-1'
        },
        GrantFullControl='string',
        GrantRead='string',
        GrantReadACP='string',
        GrantWrite='string',
        GrantWriteACP='string'
    )

In [36]:
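# Format the timestamp first so the user name is never parsed as a strftime pattern.
bucket_name = "s3.{}.{}".format(time.strftime("%d%m%Y%H%M%S"), system_user_name)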

In [37]:
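# Boto 3
# Omitting CreateBucketConfiguration targets the us-east-1 Region; for any other
# Region, pass CreateBucketConfiguration={'LocationConstraint': '<region>'}.
s3.create_bucket(Bucket=bucket_name)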

s3.Bucket(name='s3.29102017163354.skaf48')
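
The bucket name generated above contains periods, which the naming note earlier recommends avoiding. A period-free variant could be built as in the sketch below; `make_bucket_name` is a hypothetical helper, and the hyphen separator and lowercasing are assumptions rather than part of the original notebook.

```python
import re
from datetime import datetime


def make_bucket_name(user: str, now: datetime, prefix: str = "s3") -> str:
    """Build a period-free, lowercase bucket name (hypothetical helper)."""
    # Replace every run of characters outside a-z and 0-9 with a single hyphen.
    safe_user = re.sub(r"[^a-z0-9]+", "-", user.lower()).strip("-")
    stamp = now.strftime("%d%m%Y%H%M%S")
    name = "-".join(part for part in (prefix, stamp, safe_user) if part)
    # Bucket names may be at most 63 characters; trim and drop a trailing hyphen.
    return name[:63].rstrip("-")


print(make_bucket_name("skaf48", datetime(2017, 10, 29, 16, 33, 54)))
# prints: s3-29102017163354-skaf48
```

Passing the timestamp in as an argument keeps the helper deterministic; in the notebook it would be called with `datetime.now()` and `getpass.getuser()`.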

### Storing Data

You can store data from a file, stream, or string.

In [38]:
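# Boto 3
# Open the file in a context manager so the handle is closed after the upload.
with open('hello.txt', 'rb') as data:
    response = s3.Object(bucket_name, 'hello.txt').put(Body=data)
response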

{'ETag': '"cc27e90f51eab831d523f545dcd438a9"',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',
   'date': 'Sun, 29 Oct 2017 21:33:57 GMT',
   'etag': '"cc27e90f51eab831d523f545dcd438a9"',
   'server': 'AmazonS3',
   'x-amz-id-2': 'Bdfm8SLhl2xDHWddi6x8/a7inbIp218tqXNCV/N61+WmD/2W/APkhdwZrDnaJAZgNXESMDRdnB4=',
   'x-amz-request-id': '35BFD0FA2F5A5780'},
  'HTTPStatusCode': 200,
  'HostId': 'Bdfm8SLhl2xDHWddi6x8/a7inbIp218tqXNCV/N61+WmD/2W/APkhdwZrDnaJAZgNXESMDRdnB4=',
  'RequestId': '35BFD0FA2F5A5780',
  'RetryAttempts': 0}}

### Accessing a Bucket

You can access a bucket through a Boto 3 resource, but constructing the resource does not check whether the bucket exists. A `head_bucket` call through the underlying client performs that check.

In [39]:
import botocore
bucket = s3.Bucket(bucket_name)
exists = True
try:
    s3.meta.client.head_bucket(Bucket=bucket_name)
except botocore.exceptions.ClientError as e:
    # A 404 error code means the bucket does not exist. Compare the code as a
    # string, because not every error code is numeric. Other errors, such as
    # 403 (access denied), leave exists set to True.
    if e.response['Error']['Code'] == '404':
        exists = False
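
The existence check above depends on the error code carried in the exception's `response` dictionary. The sketch below separates that decision into a small function that can be tried without calling AWS; `classify_head_bucket_error` is a hypothetical helper, and the three status labels are assumptions chosen for illustration.

```python
def classify_head_bucket_error(error_response: dict) -> str:
    """Map a ClientError response dict to a coarse status (hypothetical helper)."""
    # Compare as a string: not every error code is numeric.
    code = str(error_response.get("Error", {}).get("Code", ""))
    if code == "404":
        return "missing"
    if code == "403":
        return "forbidden"  # the bucket likely exists but the caller lacks access
    return "unknown"


print(classify_head_bucket_error({"Error": {"Code": "404"}}))
# prints: missing
```

Inside the `except` block, this could be called as `classify_head_bucket_error(e.response)`.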

### Uploading files to S3 Bucket

It is easy to upload a file to S3. This is similar to the earlier step where we stored hello.txt in the bucket, except that we can choose the object's name (key) in S3 independently of the local file name. Here the file is uploaded under its original name.

In [40]:
filename = 'expression-attributes.json'

# Uploads the given file using a managed uploader, which will split up large
# files automatically and upload parts in parallel.
obj = s3.Bucket(bucket_name).Object(filename)
obj.upload_file(filename)
obj

s3.Object(bucket_name='s3.29102017163354.skaf48', key='expression-attributes.json')

### List all the buckets in an AWS account

In [41]:
# Create an S3 client (named s3_client so it does not shadow the s3 resource)
s3_client = boto3.client('s3')

# Call S3 to list current buckets
response = s3_client.list_buckets()

# Get a list of all bucket names from the response
buckets = [bucket['Name'] for bucket in response['Buckets']]

# Print out the bucket list
print("Bucket List: %s" % buckets)

Bucket List: ['aws-logs-714861692883-us-east-1', 'aws-logs-714861692883-us-west-2', 'dsa-mizzou', 'dsabucket1', 'dsabucket2', 'dsabucket3', 'dsatwilio', 'hardata', 's3.29102017163354.skaf48', 'skaf48bucket00', 'skaf48bucket50']
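
Each entry in the `Buckets` list carries a `CreationDate` alongside its `Name`, so the response can be filtered and ordered without further API calls. The sketch below works on a hand-written sample shaped like a `list_buckets` response; `bucket_names` is a hypothetical helper and the dates in the sample are made up.

```python
from datetime import datetime


def bucket_names(response: dict, prefix: str = "") -> list:
    """Return bucket names from a list_buckets-style response, oldest first."""
    buckets = sorted(response.get("Buckets", []), key=lambda b: b["CreationDate"])
    return [b["Name"] for b in buckets if b["Name"].startswith(prefix)]


# Hand-written sample shaped like a list_buckets response; the dates are made up.
sample = {
    "Buckets": [
        {"Name": "dsabucket2", "CreationDate": datetime(2017, 9, 2)},
        {"Name": "hardata", "CreationDate": datetime(2017, 8, 1)},
        {"Name": "dsabucket1", "CreationDate": datetime(2017, 9, 1)},
    ]
}
print(bucket_names(sample, prefix="dsa"))
# prints: ['dsabucket1', 'dsabucket2']
```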


### Access Controls

Getting and setting canned access control values in Boto 3 operates on an ACL resource object. 

Amazon S3 Access Control Lists (ACLs) enable you to manage access to buckets and objects. Each bucket and object has an ACL attached to it as a subresource. It defines which AWS accounts or groups are granted access and the type of access. When a request is received against a resource, Amazon S3 checks the corresponding ACL to verify the requester has the necessary access permissions.

#### Options
`ACL` (string): the canned ACL to apply to the bucket.

Possible values:

* private
* public-read
* public-read-write
* authenticated-read


In [42]:
# Boto 3
bucket.Acl().put(ACL='public-read')

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',
   'date': 'Sun, 29 Oct 2017 21:33:58 GMT',
   'server': 'AmazonS3',
   'x-amz-id-2': 'd4QFGQ+/5eb8TfWEfHv1NIIaQnwM6H7cWYTlkiJqrt4+3lBsCKdhoZiUV946Ia+ao10Vn/Yv6Mw=',
   'x-amz-request-id': 'ECFFDABEBB283184'},
  'HTTPStatusCode': 200,
  'HostId': 'd4QFGQ+/5eb8TfWEfHv1NIIaQnwM6H7cWYTlkiJqrt4+3lBsCKdhoZiUV946Ia+ao10Vn/Yv6Mw=',
  'RequestId': 'ECFFDABEBB283184',
  'RetryAttempts': 0}}
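
An ACL is a list of grants, each pairing a grantee with a permission. After applying `public-read`, one way to confirm the effect is to look for a grant to the global `AllUsers` group. The sketch below inspects a grants list of the shape S3 returns for a bucket ACL; `is_publicly_readable` is a hypothetical helper, and the sample grants are hand-written with a placeholder owner ID.

```python
# URI that S3 uses for the predefined "everyone" group.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_publicly_readable(grants: list) -> bool:
    """Return True if any grant gives the AllUsers group read access."""
    for grant in grants:
        grantee = grant.get("Grantee", {})
        if (
            grantee.get("Type") == "Group"
            and grantee.get("URI") == ALL_USERS_URI
            and grant.get("Permission") in ("READ", "FULL_CONTROL")
        ):
            return True
    return False


# Hand-written sample; the owner ID is a placeholder.
sample_grants = [
    {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
    {"Grantee": {"Type": "Group", "URI": ALL_USERS_URI}, "Permission": "READ"},
]
print(is_publicly_readable(sample_grants))
# prints: True
```

In the notebook, the list to inspect would presumably come from `bucket.Acl().grants`.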

### Downloading files from S3 Bucket

Just like uploading, it is easy to download a file from S3.

In [43]:
import boto3
import botocore

KEY = 'hello.txt' # replace with your object key

s3 = boto3.resource('s3')

try:
    # KEY is the object to download; 'download.txt' is the local file name.
    s3.Bucket(bucket_name).download_file(KEY, 'download.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

### Deleting a Bucket

All of the keys in a bucket must be deleted before the bucket itself can be deleted.

In [44]:
# Executing this cell empties and then deletes the bucket named in bucket_name.

for obj in bucket.objects.all():
    obj.delete()
bucket.delete()

{'ResponseMetadata': {'HTTPHeaders': {'date': 'Sun, 29 Oct 2017 21:33:58 GMT',
   'server': 'AmazonS3',
   'x-amz-id-2': 'KWzIbvdvlYjaWyPJ4oEc+r9tSfCeFf5kPSdPyW5fyLb08xNzXe4HGEVE/8Gzm/ImEuJP3JiEa6E=',
   'x-amz-request-id': '142C340486B5CA9B'},
  'HTTPStatusCode': 204,
  'HostId': 'KWzIbvdvlYjaWyPJ4oEc+r9tSfCeFf5kPSdPyW5fyLb08xNzXe4HGEVE/8Gzm/ImEuJP3JiEa6E=',
  'RequestId': '142C340486B5CA9B',
  'RetryAttempts': 0}}
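
Deleting keys one at a time issues one request per object. The S3 multi-object delete operation accepts up to 1,000 keys per request, so a large bucket can be emptied with far fewer calls. The sketch below only builds the request payloads and makes no AWS calls; `delete_batches` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List


def delete_batches(keys: Iterable[str], batch_size: int = 1000) -> Iterator[dict]:
    """Yield Delete payloads of at most `batch_size` keys each (hypothetical helper)."""
    batch: List[dict] = []
    for key in keys:
        batch.append({"Key": key})
        if len(batch) == batch_size:
            yield {"Objects": batch}
            batch = []
    # Emit whatever is left over after the last full batch.
    if batch:
        yield {"Objects": batch}


payloads = list(delete_batches(["file-%d" % i for i in range(2500)]))
print([len(p["Objects"]) for p in payloads])
# prints: [1000, 1000, 500]
```

Each payload could then be passed to a client as `delete_objects(Bucket=bucket_name, Delete=payload)`.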