__Amazon S3__ (Simple Storage Service) is Amazon's service for storing files. Here we explain how to access it from the command line and from Python.

## Amazon S3

### What is S3

The name S3 comes from Simple Storage Service, and it really is simple in the sense that data is stored using three concepts:
* __bucket__: the place where objects are stored. Its name is unique across all S3 users, which means that two buckets cannot have the same name even if they are private and belong to different users.
* __key__: a name, unique within a bucket, that identifies the stored object. It is common to use path-like syntax to group objects.
* __object__: any file (text or binary). It can be partitioned.
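
To make the bucket/key split concrete, here is a minimal sketch that breaks an `s3://` URI into its bucket and key and groups keys by their path-like prefix. The helper names `split_s3_uri` and `group_by_prefix` are made up for this illustration; they are not part of boto or the AWS CLI.

```python
from collections import defaultdict


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, key


def group_by_prefix(keys):
    """Group keys by everything before the last '/', similar to folders."""
    groups = defaultdict(list)
    for key in keys:
        prefix, _, name = key.rpartition("/")
        groups[prefix].append(name)
    return dict(groups)


bucket, key = split_s3_uri("s3://barteks-toxic-comments/sample/train_sample100.csv")
print(bucket)  # barteks-toxic-comments
print(key)     # sample/train_sample100.csv
print(group_by_prefix(["train.csv", "sample/train_sample100.csv"]))
```

The "folders" are only a naming convention: S3 itself stores a flat mapping from keys to objects inside each bucket.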

### Sign up
First go to 
https://s3.console.aws.amazon.com/s3

and sign up for S3. You can also try to create a bucket, upload files, etc. in the console. Here we will explain how to do it programmatically.

## Data 

But first let's get the data we are going to use here. We take the dataset `train.csv` from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge and store it locally in the `data` directory.

### Sampling data

We also draw smaller samples from this dataset so that we have a few more files to work with (and faster execution).

In [106]:
import numpy as np
import pandas as pd
comments = pd.read_csv("data/train.csv")
nrows = comments.shape[0]
comments.iloc[np.random.choice(range(nrows), 10000, replace=False)].to_csv("data/train_sample10000.csv", index=False)
comments.iloc[np.random.choice(range(nrows), 1000, replace=False)].to_csv("data/train_sample1000.csv", index=False)
comments.iloc[np.random.choice(range(nrows), 100, replace=False)].to_csv("data/train_sample100.csv", index=False)
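
The cell above draws a different sample on every run. If reproducible samples are preferable, `DataFrame.sample` with a fixed `random_state` is one option. The sketch below uses a small made-up frame instead of `train.csv`; the helper name `sample_rows` and the seed value are assumptions for illustration.

```python
import pandas as pd


def sample_rows(df, n, seed=0):
    """Return n distinct rows of df; the same seed gives the same rows."""
    return df.sample(n=n, replace=False, random_state=seed)


# Tiny stand-in for the real comments table.
toy = pd.DataFrame({"id": range(20), "toxic": [i % 2 for i in range(20)]})
first = sample_rows(toy, 5, seed=42)
second = sample_rows(toy, 5, seed=42)
print(first.equals(second))  # True: same seed, same sample
```

For the real data, something like `sample_rows(comments, 10000).to_csv("data/train_sample10000.csv", index=False)` would mirror the cell above.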

## Installing AWS Command Line Interface and boto

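To install boto (the Python interface to Amazon Web Services; the Boto section below uses the legacy `boto` package, while the last cell uses `boto3`) and the AWS Command Line Interface (__CLI__), type:
```
pip install boto
pip install boto3
pip install awscli
```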

Then create the file `~/.aws/credentials` in your home directory with the following content:

```
[myaws]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```

If you add this configuration as `[default]`, you won't need to add `--profile myaws` to the CLI commands in the section CLI Basic Commands.
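
Since the credentials file uses plain INI syntax, a quick way to check which profiles it defines is Python's standard `configparser`. This is a minimal sketch: the helper names `list_profiles` and `has_keys` are made up here, and the demo writes a temporary file with placeholder values rather than touching your real `~/.aws/credentials`.

```python
import configparser
import os
import tempfile

REQUIRED = {"aws_access_key_id", "aws_secret_access_key"}


def list_profiles(path):
    """Return the profile names (section headers) found in a credentials file."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser.sections()


def has_keys(path, profile):
    """Check that the profile defines both keys shown above."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser.has_section(profile) and REQUIRED <= set(parser[profile])


# Demo on a throwaway file containing only placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "credentials")
    with open(demo_path, "w") as fh:
        fh.write("[myaws]\n"
                 "aws_access_key_id = YOUR_ACCESS_KEY\n"
                 "aws_secret_access_key = YOUR_SECRET_KEY\n")
    profiles = list_profiles(demo_path)
    complete = has_keys(demo_path, "myaws")

print(profiles)  # ['myaws']
print(complete)  # True
```

On a real machine you would call `list_profiles(os.path.expanduser("~/.aws/credentials"))` instead.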

### Where to get credentials from

1. Go to https://console.aws.amazon.com/console/home and log in
2. Click on your user name (top right) and select `My Security Credentials`.
3. Click on `+ Access keys (access key ID and secret access key)` and then on `Create New Access Key`.
4. Choose `Show access key`.

## CLI Basic Commands 

### List buckets
```
aws --profile myaws s3 ls
```

### Create a bucket
```
aws --profile myaws s3 mb s3://barteks-toxic-comments
```
__Warning__ The bucket namespace is shared by all users of the system, so you need to choose a different name.

### Upload and download files

Upload:
```
aws --profile myaws s3 cp data/train.csv s3://barteks-toxic-comments
aws --profile myaws s3 cp data/train_sample10000.csv s3://barteks-toxic-comments/sample/
aws --profile myaws s3 cp data/train_sample1000.csv s3://barteks-toxic-comments/sample/
aws --profile myaws s3 cp data/train_sample100.csv s3://barteks-toxic-comments/sample/
```
Download:
```
aws --profile myaws s3 cp s3://barteks-toxic-comments/sample/train_sample100.csv data/train_sample100_copy.csv
```
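
The same upload and download can be done from Python. The sketch below uses `boto3` (not the legacy `boto` package used later in this article) and assumes the `myaws` profile from above; the helper names `make_s3`, `key_for`, `upload` and `download` are made up for illustration. The helpers take the S3 resource as an argument, so nothing contacts AWS until you call them.

```python
import os


def make_s3(profile_name="myaws"):
    """Create a boto3 S3 resource for the given profile (needs boto3 installed)."""
    import boto3  # imported lazily so the other helpers work without boto3
    return boto3.Session(profile_name=profile_name).resource("s3")


def key_for(local_path, prefix=""):
    """Build the key the CLI would use for `cp local_path s3://bucket/prefix/`."""
    name = os.path.basename(local_path)
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload(s3, bucket_name, local_path, key):
    """Copy a local file to s3://bucket_name/key."""
    s3.Bucket(bucket_name).upload_file(local_path, key)


def download(s3, bucket_name, key, local_path):
    """Copy s3://bucket_name/key to a local file."""
    s3.Bucket(bucket_name).download_file(key, local_path)


print(key_for("data/train_sample100.csv", "sample/"))  # sample/train_sample100.csv

# Usage against a real bucket (requires credentials and network access):
# s3 = make_s3()
# path = "data/train_sample100.csv"
# upload(s3, "barteks-toxic-comments", path, key_for(path, "sample/"))
# download(s3, "barteks-toxic-comments", key_for(path, "sample/"), "data/train_sample100_copy.csv")
```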

### List files in path
 
```
aws --profile myaws s3 ls s3://barteks-toxic-comments/
aws --profile myaws s3 ls s3://barteks-toxic-comments/sample/
```
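
A rough Python counterpart of listing a path, sketched with `boto3`: the helper name `list_keys` is an assumption, and `s3` is expected to be a boto3 S3 resource such as `boto3.Session(profile_name='myaws').resource('s3')`. Unlike `aws s3 ls`, which shows one level at a time, filtering by `Prefix` returns every key that starts with the prefix.

```python
def list_keys(s3, bucket_name, prefix=""):
    """Return (key, size in bytes) pairs for objects whose key starts with prefix."""
    bucket = s3.Bucket(bucket_name)
    return [(obj.key, obj.size) for obj in bucket.objects.filter(Prefix=prefix)]


# Usage against a real bucket (requires credentials and network access):
# import boto3
# s3 = boto3.Session(profile_name="myaws").resource("s3")
# for key, size in list_keys(s3, "barteks-toxic-comments", "sample/"):
#     print(size, key)
```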

### Remove file(s)

```
aws --profile myaws s3 rm s3://barteks-toxic-comments/sample/train_sample2.csv
aws --profile myaws s3 rm s3://barteks-toxic-comments/sample/ --recursive
```

### Delete bucket

To delete a bucket use
```
aws --profile myaws s3 rb  s3://barteks-toxic-comments
```
To delete a non-empty bucket, add the `--force` option.

To empty a bucket use
```
aws --profile myaws s3 rm s3://barteks-toxic-comments/ --recursive
```
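
For completeness, emptying and deleting a bucket can be sketched with `boto3` as well. The helper names `empty_bucket` and `remove_bucket` are made up, `s3` is assumed to be a boto3 S3 resource, and the sketch does not handle versioned buckets, where old object versions would have to be removed too.

```python
def empty_bucket(s3, bucket_name, prefix=""):
    """Delete every object under prefix (all objects when prefix is empty)."""
    s3.Bucket(bucket_name).objects.filter(Prefix=prefix).delete()


def remove_bucket(s3, bucket_name, force=False):
    """Delete a bucket; with force=True, empty it first, like `rb --force`."""
    if force:
        empty_bucket(s3, bucket_name)
    s3.Bucket(bucket_name).delete()


# Usage against a real bucket (irreversible, requires credentials):
# import boto3
# s3 = boto3.Session(profile_name="myaws").resource("s3")
# remove_bucket(s3, "barteks-toxic-comments", force=True)
```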

## Boto

In [99]:
from boto.s3.connection import S3Connection
conn = S3Connection(profile_name='myaws')

#### List buckets

In [107]:
conn.get_all_buckets()

[<Bucket: barteks>,
 <Bucket: barteks-toxic-comments>,
 <Bucket: barteks-toxic-comments-stats>]

In [101]:
[bucket.name for bucket in conn.get_all_buckets()]

['barteks', 'barteks-toxic-comments', 'barteks-toxic-comments-stats']

#### Create a bucket for public read

__Warning__ As before, the bucket namespace is shared, so the following command will produce an error unless you change the bucket name.

You have the following Access Control List (ACL) options when creating it: `'private', 'public-read', 'public-read-write', 'authenticated-read'`.

In [48]:
# bucket = conn.create_bucket("barteks-toxic-comments-stats")
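
With `boto3`, the ACL is passed as a keyword argument when the bucket is created. The sketch below only builds and validates those arguments, so it runs without AWS access; the helper name `create_bucket_kwargs` and the example region are assumptions. Depending on the account's public access settings, AWS may reject the public ACLs.

```python
ALLOWED_ACLS = ("private", "public-read", "public-read-write", "authenticated-read")


def create_bucket_kwargs(name, acl="private", region="us-east-1"):
    """Build keyword arguments for a boto3 `create_bucket` call."""
    if acl not in ALLOWED_ACLS:
        raise ValueError(f"unknown ACL {acl!r}; expected one of {ALLOWED_ACLS}")
    kwargs = {"Bucket": name, "ACL": acl}
    # us-east-1 is the default region and is not passed as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("your-unique-bucket-name", acl="public-read"))

# Usage against AWS (requires credentials and a globally unique name):
# import boto3
# s3 = boto3.Session(profile_name="myaws").resource("s3")
# s3.create_bucket(**create_bucket_kwargs("your-unique-bucket-name", "public-read", "eu-west-1"))
```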

#### Deleting

In [44]:
# conn.delete_bucket('barteks-toxic-comments-stats')

#### Store data

In [84]:
from boto.s3.key import Key
bucket = conn.get_bucket('barteks-toxic-comments')
k = Key(bucket)
k.key = 'string_example'
k.set_contents_from_string('Short text')

10

In [57]:
k.delete()

<Key: barteks-toxic-comments-stats,foobar>

In [85]:
!aws --profile myaws s3 ls s3://barteks-toxic-comments/

2018-09-09 00:26:56         10 string_example
2018-09-04 08:38:55   68802655 train.csv


#### Read and list keys

In [104]:
bucket = conn.get_bucket('barteks-toxic-comments')
keys = bucket.get_all_keys()
keys

[<Key: barteks-toxic-comments,sample/train_sample1.csv>,
 <Key: barteks-toxic-comments,sample/train_sample2.csv>,
 <Key: barteks-toxic-comments,train.csv>]

In [105]:
keys[0].bucket, keys[0].key

(<Bucket: barteks-toxic-comments>, 'sample/train_sample1.csv')

In [89]:
from boto.s3.key import Key
key = Key(bucket)
key

<Key: barteks-toxic-comments,None>

In [93]:
key.key = 'string_example'
key.get_contents_as_string()

b'Short text'
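
The `Key`-based calls above belong to the legacy `boto` package. A possible `boto3` counterpart for storing and reading a short string is sketched below; the helper names `put_text` and `get_text` are made up, and `s3` is assumed to be a boto3 S3 resource created from the `myaws` profile.

```python
def put_text(s3, bucket_name, key, text):
    """Store a string as s3://bucket_name/key and return its size in bytes."""
    data = text.encode("utf-8")
    s3.Object(bucket_name, key).put(Body=data)
    return len(data)


def get_text(s3, bucket_name, key):
    """Read s3://bucket_name/key back and decode it as UTF-8 text."""
    body = s3.Object(bucket_name, key).get()["Body"]
    return body.read().decode("utf-8")


# Usage against a real bucket (requires credentials and network access):
# import boto3
# s3 = boto3.Session(profile_name="myaws").resource("s3")
# put_text(s3, "barteks-toxic-comments", "string_example", "Short text")
# get_text(s3, "barteks-toxic-comments", "string_example")
```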

In [34]:
conn.lookup('no-existsing')

In [43]:
keys[0]

<Key: barteks-toxic-comments,train.csv>

In [40]:
conn.lookup(keys[0])

In [13]:
s3.get_bucket('barteks-toxic-comments')

AttributeError: 's3.ServiceResource' object has no attribute 'get_bucket'

#### Write

In [10]:
?s3.create_bucket

In [37]:
sample_comments = pd.read_csv("data/train_sample.csv")
sample_comments

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,2b092ac096e1d9e7,- I also would like to know your reason for li...,0,0,0,0,0,0
1,e9c53ba11ad12535,Indo-Aryan Caucasians (Proto Nordics).,0,0,0,0,0,0
2,34c116438bb64bff,if \n\nIF YOU HAD ANY GUTS YOU'D BAN ME RIGHT ...,1,0,1,0,1,1
3,378452f47e52dc7f,"Logicus to David Wilson: David, thanks for thi...",0,0,0,0,0,0
4,2b4d774d68fe88d4,started this article not realizing how late it...,0,0,0,0,0,0
5,d115039cf2333750,No \n\nYOU will be block! Now go fuck yourself...,1,0,1,0,1,0
6,61a4947dde48be07,This isn't a discussion page to discuss the sy...,0,0,0,0,0,0
7,4146bf30c7fdcbe1,No. Controversy shouldn't be a criterion for c...,0,0,0,0,0,0
8,49f7e174ec963927,"FUCK YOU ALL, LOSERS \n\nFUCK OFF MY PAGE, FAGS",1,1,1,0,1,0
9,d91bfd05f58bea8b,"""I read on some blogs someone noticing an ad f...",0,0,0,0,0,0




* Empty a bucket (or only the keys under a prefix such as `doc`): `aws s3 rm s3://bucket-name/doc --recursive`


## Links:

* https://github.com/boto/boto
* https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
* http://boto.cloudhackers.com/en/latest/s3_tut.html

In [19]:
import boto3
session = boto3.Session(profile_name='myaws')
s3 = session.resource('s3')
[bucket.name for bucket in s3.buckets.all()]