
<div><img src="https://www.ibm.com/blogs/bluemix/wp-content/uploads/2016/11/new-object-storage-card.png" width="370" height="370" align="right">

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/640px-IBM_logo.svg.png" width="90" height="90" align="right" style="margin:0px 25px"></div>

# Work with IBM Cloud Object Storage in R

This notebook walks you through working with objects and buckets in IBM Cloud Object Storage (COS) using the R programming language.

This notebook runs on R with Spark 2.1.
________

## Table of contents
1. [IBM Cloud Object Storage and S3](#cos)
2. [Getting requests right](#requests)
3. [Bucket operations](#bucket)
4. [Conclusion and next steps](#conclusion)
_______

## 1. IBM Cloud Object Storage (COS) and S3<a id="cos"></a>

##### What is COS exactly?
If you're using IBM Watson or Watson Studio, chances are you are using <a href="https://console.bluemix.net/docs/services/cloud-object-storage/about-cos.html#about-ibm-cloud-object-storage" target="_blank" rel="noopener noreferrer">IBM Cloud Object Storage (COS)</a> as well. COS is a flexible, durable, secure and affordable storage solution that supports a subset of the popular <a href="https://en.wikipedia.org/wiki/Amazon_S3" target="_blank" rel="noopener noreferrer">S3 API</a>.  
______

## 2. Getting requests right<a id="requests"></a>

1. [Authenticate with HMAC keys](#hmac)
2. [Create new service credentials with HMAC keys](#hmaccredentials)
3. [Load libraries and define credentials](#loadlibraries)
4. [Formulate a canonical request](#canonical)
5. [Create a string to sign](#createstring)
6. [Create the signature](#createsignature)

Making requests using the AWS Signature V4 template is no trivial task. We're going to use the `aws.s3` and `aws.signature` libraries from the <a href="https://cloudyr.github.io/" target="_blank" rel="noopener noreferrer">cloudyr</a> project to do the heavy lifting for us. The creators of these packages specifically designed them to allow access to S3-based services.

Making a signed request is a four-step process; the details are in the <a href="https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html" target="_blank" rel="noopener noreferrer">AWS Signature Version 4 documentation</a>. Dive in as deep as you like, then come back to this notebook when you are ready to get started.

### 2.1 Authenticate with HMAC keys<a id="hmac"></a>
Before making any requests via REST we need to procure a set of secret and access keys for authentication. From the COS documentation:

> In addition to <a href="https://console.bluemix.net/docs/services/cloud-object-storage/iam/overview.html#getting-started-with-iam" target="_blank" rel="noopener noreferrer">IAM token-based authentication</a>, it is also possible to authenticate using a signature created from a pair of access and secret keys. This is functionally identical to the AWS Signature Version 4, and the <a href="https://en.wikipedia.org/wiki/Hash-based_message_authentication_code" target="_blank" rel="noopener noreferrer">HMAC Keys</a> provided by IBM COS should work with the majority of S3-compatible libraries and tools.

These keys aren't included in your credentials by default, so you'll need to configure a new set that does possess them.

### 2.2 Create new service credentials with HMAC keys<a id="hmaccredentials"></a>

Use the following steps to create a service credential that includes HMAC keys <a href="https://console.bluemix.net/docs/services/cloud-object-storage/iam/service-credentials.html#service-credentials" target="_blank" rel="noopener noreferrer">(source)</a>:

1. Log in to IBM Cloud and open your dashboard.
2. Under Resource Group, select **All Resources**, then navigate to your Cloud Object Storage service instance.
3. In the side navigation, click **Service Credentials**.
4. Click **New credential +** and provide the necessary information.
5. To generate HMAC credentials, specify the following in the **Add Inline Configuration Parameters (Optional)** field: `{"HMAC":true}`
6. Click **Add** to generate the service credential.

Before you click **Add** it should look something like this:

<div><img src="https://raw.githubusercontent.com/kurlare/airbnbFinder/master/addHmacCos.png" width="450" height="450"></div>

The HMAC keys are located in the `cos_hmac_keys` field of your credentials:
- `access_key_id`
- `secret_access_key`

____________

### 2.3 Load libraries and define credentials<a id="loadlibraries"></a>

Start by loading the libraries and defining your credentials in a hidden cell.

**Note:** It is best practice not to leave your HMAC credentials in notebooks or scripts. You can source them from a `credentials.R` file stored in a private repo on GitHub or on the file system in your Watson Studio account, or set them as OS-level environment variables. If you do add them directly to the notebook, put the `#@hidden_cell` tag on the first line of the code cell that contains them. This hides the cell when you share the notebook from Watson Studio. It should look like this:

<div><img src="https://raw.githubusercontent.com/kurlare/airbnbFinder/master/hidden_cell.png" width="400" height="400"></div>
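
As a sketch of the environment-variable option mentioned in the note, the helper below reads both keys with base R's `Sys.getenv()`. The function name `read_cos_credentials()` and the variable names `COS_ACCESS_KEY_ID` and `COS_SECRET_ACCESS_KEY` are assumptions made for this illustration; use whatever names you set in your own environment.

```r
## Hypothetical environment variable names; any names you control will do.
read_cos_credentials <- function(id_var = "COS_ACCESS_KEY_ID",
                                 secret_var = "COS_SECRET_ACCESS_KEY") {
  id <- Sys.getenv(id_var, unset = "")
  secret <- Sys.getenv(secret_var, unset = "")

  ## Fail early with a clear message if either key is missing
  if (!nzchar(id) || !nzchar(secret)) {
    stop("Set ", id_var, " and ", secret_var, " before running this notebook.")
  }

  list(access_key_id = id, secret_access_key = secret)
}

## Usage, once the variables are set outside the notebook:
## creds <- read_cos_credentials()
## access_key_id <- creds$access_key_id
## secret_access_key <- creds$secret_access_key
```

This keeps the keys out of the notebook file entirely, so there is nothing to hide when the notebook is shared.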

In [1]:

## Install required libraries
suppressWarnings(suppressMessages(install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))))
suppressWarnings(suppressMessages(install.packages('readr')))
library(aws.s3)
library(aws.signature)
library(readr)

## Put your credentials here (with caution, of course), and add the hidden cell tag to the first line of this code cell.
#access_key_id <- "<yourAccessKeyIdHere>"
#secret_access_key <-"<yourSecretAccessKeyHere>"

### 2.4 Formulate a canonical request<a id="canonical"></a>
AWS V4 signatures require a canonical or 'standardized' request that gets incorporated into the signature.

In [2]:
## Define the headers necessary to create a standardized (canonical) request
hdrs <- list(`Content-Type` = "application/x-www-form-urlencoded; charset=utf-8", 
             Host = "s3-api.us-geo.objectstorage.softlayer.net",
             `x-amz-date` = format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC"))

## Formulate the canonical request 
cr <- canonical_request(verb = "GET",
                        canonical_uri = "",
                        query_args = list(),
                        canonical_headers = hdrs,
                        request_body = "")
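
To make the term concrete, here is a minimal base R sketch of the text a canonical request contains for a body-less `GET`, following the layout described in the AWS Signature Version 4 documentation. `build_canonical_request()` is a hypothetical helper written for this illustration, not part of `aws.signature`, and the host and timestamp are made-up values. It is simplified: it does no URI or query-string encoding, and the payload hash is hard-coded to the well-known SHA-256 digest of an empty string.

```r
## SHA-256 digest of an empty request body (a fixed, well-known value)
empty_payload_hash <- "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

## Hypothetical helper, for illustration only
build_canonical_request <- function(verb, uri, headers, payload_hash) {
  ## Header names are lowercased and sorted; values are trimmed
  names(headers) <- tolower(names(headers))
  headers <- headers[order(names(headers), method = "radix")]
  header_lines <- paste0(names(headers), ":", trimws(unlist(headers)), "\n",
                         collapse = "")
  signed_headers <- paste(names(headers), collapse = ";")

  ## The empty third field is the (absent) query string
  paste(verb, uri, "", header_lines, signed_headers, payload_hash, sep = "\n")
}

example_request <- build_canonical_request(
  verb = "GET",
  uri = "/",
  headers = list(`x-amz-date` = "20180122T123830Z", Host = "example.com"),
  payload_hash = empty_payload_hash)

cat(example_request)
```

The hash of this text is what feeds into the string to sign in the next step.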

### 2.5 Create a string to sign<a id="createstring"></a>
Take the hash of the canonical request and combine it with metadata about your request.  

In [3]:
## Formulate a unique string to sign using the hash of your canonical request.
## Note: for a real request, datetime should be the same timestamp as the x-amz-date header.
sts <- string_to_sign(algorithm = "AWS4-HMAC-SHA256", 
                      datetime = format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC"),
                      region = 'us-standard',
                      service = 's3',
                      request_hash = cr$hash)
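
For reference, the string to sign is four newline-separated fields: the algorithm, the request timestamp, the credential scope, and the hash of the canonical request. The sketch below assembles it in base R. `build_string_to_sign()` is a hypothetical helper for illustration, not the `aws.signature` function, and the timestamp and hash placeholder are made-up values.

```r
## Hypothetical helper, for illustration only
build_string_to_sign <- function(datetime, region, service, request_hash) {
  ## The credential scope uses only the YYYYMMDD part of the timestamp
  date_stamp <- substr(datetime, 1, 8)
  credential_scope <- paste(date_stamp, region, service, "aws4_request", sep = "/")

  paste("AWS4-HMAC-SHA256", datetime, credential_scope, request_hash, sep = "\n")
}

example_sts <- build_string_to_sign(
  datetime = "20180122T123830Z",
  region = "us-standard",
  service = "s3",
  request_hash = "<hex SHA-256 of the canonical request>")

cat(example_sts)
```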

### 2.6 Create the signature<a id="createsignature"></a>
Create your signature with the secret access key and the string to sign.

In [4]:
signature <- signature_v4(secret = secret_access_key,
                          region = "us-standard",
                          service = "s3",
                          string_to_sign = sts,
                          verbose = T)

Checking for credentials in user-supplied values
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
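
As a worked formula, AWS Signature Version 4 derives a signing key through a chain of HMAC-SHA256 operations and then uses it to sign the string to sign. The summary below follows the AWS documentation rather than the package's exact code. Here $\mathrm{HMAC}(k, m)$ is HMAC-SHA256 of message $m$ under key $k$, $\Vert$ is concatenation, and the date is the `YYYYMMDD` part of the request timestamp:

$$
\begin{aligned}
k_{\text{date}} &= \mathrm{HMAC}(\texttt{"AWS4"} \,\Vert\, \text{secret},\ \text{date}) \\
k_{\text{region}} &= \mathrm{HMAC}(k_{\text{date}},\ \text{region}) \\
k_{\text{service}} &= \mathrm{HMAC}(k_{\text{region}},\ \text{service}) \\
k_{\text{signing}} &= \mathrm{HMAC}(k_{\text{service}},\ \texttt{"aws4\_request"}) \\
\text{signature} &= \mathrm{hex}\big(\mathrm{HMAC}(k_{\text{signing}},\ \text{string to sign})\big)
\end{aligned}
$$

Because the region and service are part of the key derivation, the values passed here must match those used in the string to sign.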


____________

## 3. Bucket operations<a id="bucket"></a>

1. [List buckets](#listbuckets)
2. [Create a new bucket](#createbucket)
3. [Putting objects](#putobjects)
4. [Getting objects](#getobjects)
5. [Delete an object](#deleteobject)
6. [Delete bucket](#deletebucket)

Now that authentication is out of the way, we can work with our COS instance. We'll start by listing the buckets in the instance. Then we'll create a new bucket, put an object in it, get that object back, delete the object, and finally delete the bucket. To do all of this we'll use the `s3HTTP()` function from the `aws.s3` package, which signs each request with the `key` and `secret` you pass in.

Check out the <a href="https://github.com/cloudyr/aws.s3/blob/master/R/s3HTTP.R" target="_blank" rel="noopener noreferrer">source code on GitHub</a> to get a better idea of what's going on behind the scenes. 

Another important resource to keep at your fingertips is the <a href="https://ibm-public-cos.github.io/crs-docs/api-reference" target="_blank" rel="noopener noreferrer">API Reference</a> for IBM COS.

With that being said, let's get started.

### 3.1 List buckets<a id="listbuckets"></a>
Getting a list of buckets is pretty easy: we just point `base_url` at the SoftLayer endpoint instead of AWS and use the `GET` verb.

In [5]:
## Make the request, pointing at the IBM COS url instead of Amazon.
## Simple Bucket list
listOfBuckets <- s3HTTP(verb = "GET",
                       url_style = "path",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key,
                       verbose = T)

## To see the complete list, uncomment the next line and execute the cell again.
## listOfBuckets

## Show the list of buckets as a dataframe
cosBucketsDF <- do.call(rbind.data.frame, listOfBuckets$Buckets)
head(cosBucketsDF)

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using default value for AWS Region ('us-east-1')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


| | Name | CreationDate |
|---|---|---|
| Bucket | anewbucketincos327021909a809fasdf | 2018-01-07T01:52:23.171Z |
| Bucket1 | environments7b356c53a936495c832a4fefc034d644 | 2017-12-08T16:15:16.902Z |
| Bucket2 | forecastingknowledgebase-sy0xtlfrz-catalog-77673589 | 2017-11-02T19:59:42.510Z |
| Bucket3 | forecastingretailsalesandrevenub1e0d23e2fe34b9fb6142ef19530baec | 2017-11-02T19:43:37.545Z |
| Bucket4 | heylookanewbucketabc123doremeezas1231 | 2018-01-19T13:12:48.985Z |
| Bucket5 | industrial-wdp | 2017-10-23T20:03:48.927Z |
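
To see what the `do.call(rbind.data.frame, ...)` conversion does without making a request, the sketch below applies it to a mock of the parsed response. The structure (one named list per bucket) is an assumption inferred from the output above, and the bucket names are made up.

```r
## Mock of the parsed response: one named list per bucket (made-up values)
mockBuckets <- list(
  Bucket = list(Name = "first-example-bucket",
                CreationDate = "2018-01-07T01:52:23.171Z"),
  Bucket = list(Name = "second-example-bucket",
                CreationDate = "2018-01-19T13:12:48.985Z"))

## Each list becomes one row; the field names become the columns
mockBucketsDF <- do.call(rbind.data.frame, mockBuckets)
mockBucketsDF
```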


### 3.2 Create a new bucket <a id="createbucket"></a>
This is also pretty straightforward.  Specify the new bucket name in the `bucket` parameter, making sure to override any region defaults by setting `check_region` to `FALSE` and `region` to _'us-standard'_. 

In [7]:
## Specify the bucket name in the 'bucket' parameter. 
createBucket <- s3HTTP(verb = "PUT",  ## Changed to PUT
                       bucket = "heylookanewbucketabc123doremeezas123m",  ## Name of bucket we want to create
                       path = "", 
                       url_style = "path",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key,
                       verbose = T, 
                       check_region = F,  ## Don't want to get mixed up with Amazon 
                       region = 'us-standard')  ## ^^

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


A response status of 200 means we successfully created the bucket!  Let's check our list of buckets to make sure it's there:

In [8]:
## Same as before
listOfBuckets <- s3HTTP(verb = "GET",
                       bucket = "", 
                       path = "", 
                       url_style = "path",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key,
                       verbose = T)

## Convert to dataframe and subset by the index of the new bucket
cosBucketsDF <- do.call(rbind.data.frame, listOfBuckets$Buckets)
cosBucketsDF[grep("heylookanewbucket", cosBucketsDF$Name), ]

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using default value for AWS Region ('us-east-1')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


| | Name | CreationDate |
|---|---|---|
| Bucket4 | heylookanewbucketabc123doremeezas1231 | 2018-01-19T13:12:48.985Z |
| Bucket5 | heylookanewbucketabc123doremeezas123m | 2018-01-22T12:38:30.731Z |
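
Note that the `grep()` pattern matches every bucket whose name contains `heylookanewbucket`, which is why two rows come back. If you only want to know whether one specific bucket exists, an exact comparison is a simple alternative. The sketch below uses a mock data frame with the two names from the output above; `bucket_exists()` is a hypothetical helper, not part of `aws.s3`.

```r
## Mock bucket listing with the two similarly named buckets shown above
mockBucketsDF <- data.frame(
  Name = c("heylookanewbucketabc123doremeezas1231",
           "heylookanewbucketabc123doremeezas123m"),
  CreationDate = c("2018-01-19T13:12:48.985Z", "2018-01-22T12:38:30.731Z"),
  stringsAsFactors = FALSE)

## Hypothetical helper: TRUE only when the exact name is present
bucket_exists <- function(bucketsDF, bucketName) {
  bucketName %in% bucketsDF$Name
}

bucket_exists(mockBucketsDF, "heylookanewbucketabc123doremeezas123m")
bucket_exists(mockBucketsDF, "heylookanewbucket")
```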


Excellent, there it is!  An empty bucket is just waiting for an object to be put in it.  Let's fulfill its existential purpose. 

### 3.3 Putting objects<a id="putobjects"></a>
First we need an object. You like cars, don't you? Good, `mtcars` it is. To upload the file to COS we'll need to put the file path in the request body and the file size in the headers. From the docs:

> _A PUT given a path to an object uploads the request body as an object. A SHA256 hash of the object is a required header._

Let's get those pieces first.

In [9]:
## Write data from base R to CSV file 
write.csv(datasets::mtcars, file = "mtcars.csv")

## Path to mtcars.csv
mtcarsPath <- paste0(getwd(), "/mtcars.csv")

## Length of file in bytes
contentLength <- file.size(mtcarsPath)

paste("Path to mtcars CSV on DSX file system:", mtcarsPath)
paste("Content length of mtcars CSV in bytes: ", contentLength)

To make the PUT request, specify the bucket you want to use, add _server side encryption_ and _content length_ to the `headers` parameter, and the path to the file in `request_body`:

In [10]:
## Put objects in COS
putObjectInCOS <- s3HTTP(verb = "PUT", 
                         bucket = "heylookanewbucketabc123doremeezas123m", ## Specify the bucket to use
                         path = paste0("/", 'mtcars.csv'), ## Give object a name in COS
                         headers = list(`x-amz-server-side-encryption` = 'AES256',  ## Server side encryption header
                                        `Content-Length` = contentLength),  ## Content Length header
                         request_body = mtcarsPath,  ## Point to the file we want to upload
                         url_style = "path",
                         base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                         key = access_key_id, 
                         secret = secret_access_key, 
                         verbose = T, 
                         check_region = F, 
                         region = 'us-standard')

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/mtcars.csv
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


How about we check the contents of that bucket to verify that the mtcars CSV is in it?  We'll display it as a dataframe since that is easier to see.

In [11]:
## List the contents of a specific bucket in COS
listOfBucketContents <- s3HTTP(verb = "GET", 
                               bucket = "heylookanewbucketabc123doremeezas123m",
                               url_style = "path",
                               base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                               key = access_key_id, 
                               secret = secret_access_key, 
                               verbose = T, 
                               check_region = F, 
                               region = 'us-standard')

cosBucketContentsDF <- do.call(rbind.data.frame, listOfBucketContents$Contents)
cosBucketContentsDF

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


| | ID | DisplayName |
|---|---|---|
| Key | mtcars.csv | mtcars.csv |
| LastModified | 2018-01-22T12:38:52.133Z | 2018-01-22T12:38:52.133Z |
| ETag | "6463474bfe6973a81dc7cbc4a71e8dd1" | "6463474bfe6973a81dc7cbc4a71e8dd1" |
| Size | 1783 | 1783 |
| Owner | 77673589-cd36-433e-94af-7a2609d3f74d | 77673589-cd36-433e-94af-7a2609d3f74d |
| StorageClass | STANDARD | STANDARD |
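
The layout above has one row per field, with `ID` and `DisplayName` as column names. This appears to happen because `rbind.data.frame` treats each field of the single `Contents` entry as a row and takes its column names from the nested `Owner` element. As a sketch of a tidier alternative, the hypothetical helper below pulls the scalar fields into a single row. The mock entry assumes the parsed response has the fields shown in the output above.

```r
## Mock of one parsed `Contents` entry, using the fields shown in the output above
mockEntry <- list(
  Key = "mtcars.csv",
  LastModified = "2018-01-22T12:38:52.133Z",
  ETag = "\"6463474bfe6973a81dc7cbc4a71e8dd1\"",
  Size = "1783",
  Owner = list(ID = "77673589-cd36-433e-94af-7a2609d3f74d",
               DisplayName = "77673589-cd36-433e-94af-7a2609d3f74d"),
  StorageClass = "STANDARD")

## Hypothetical helper: one row per object, scalar fields only
summarise_object <- function(entry) {
  data.frame(Key = entry$Key,
             LastModified = entry$LastModified,
             Size = as.numeric(entry$Size),
             StorageClass = entry$StorageClass,
             stringsAsFactors = FALSE)
}

summarise_object(mockEntry)
```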


Success!
________

### 3.4 Getting objects<a id="getobjects"></a>
Now grab that mtcars CSV from COS and put it back in a dataframe. Use the `path` parameter to specify the object you want from the bucket.

In [12]:
## Specify the bucket name and the path to the object in the 'path' parameter.
mtCarsObject <- s3HTTP(verb = "GET",
                       bucket = "heylookanewbucketabc123doremeezas123m",
                       path = "/mtcars.csv",
                       url_style = "path",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key, 
                       verbose = T, 
                       check_region = F, 
                       region = 'us-standard')

mtCarsObject

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/mtcars.csv
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


Response [https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/mtcars.csv]
  Date: 2018-01-22 12:39
  Status: 200
  Content-Type: text/csv
  Size: 1.78 kB
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
...

Convert that response back to a dataframe.

In [13]:
## Convert to DF
mtcarsDF <- suppressMessages(httr::content(mtCarsObject))

## First column contains row names
mtcarsDF <- data.frame(tibble::column_to_rownames(mtcarsDF, var = 'X1'))

mtcarsDF

“Setting row names on a tibble is deprecated.”

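| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |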
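
An alternative that avoids the tibble row-name warning is to parse the CSV text with base R and let `read.csv()` use the first column as row names. The sketch below is offline: it builds a stand-in for the response body from the first rows of `mtcars`. In the notebook, the text could instead come from `httr::content(mtCarsObject, as = "text")`; that substitution is an assumption and has not been run here.

```r
## Stand-in for the CSV text in the response body: first rows of mtcars
csvText <- paste(capture.output(write.csv(head(datasets::mtcars, 3))),
                 collapse = "\n")

## row.names = 1 uses the first (unnamed) column as row names directly
mtcarsSample <- read.csv(text = csvText, row.names = 1)
mtcarsSample
```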


__________

### 3.5 Delete an object<a id="deleteobject"></a>
A bucket must be emptied before it can be deleted. 
To delete an object, change the `verb` parameter to `DELETE`, then specify the bucket name and the object name in `path` (don't forget the leading slash). The remaining parameters are the same as for creating a bucket or getting an object.

**Note:** If you request to delete a non-empty bucket the server responds with `409 Conflict`.  

In [14]:
## Delete mtcars.csv from COS
deleteObject <- s3HTTP(verb = "DELETE",
                       bucket = "heylookanewbucketabc123doremeezas123m",
                       path = "/mtcars.csv",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key, 
                       check_region = F,
                       verbose = T,
                       region = 'us-standard')

deleteObject

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/mtcars.csv
Executing request with AWS credentials


### 3.6 Delete bucket<a id="deletebucket"></a>
If that worked, we should be able to delete the bucket now. 
The only change here is to leave the `path` parameter empty.

In [15]:
## Delete bucket from COS
deleteBucket <- s3HTTP(verb = "DELETE",
                       bucket = "heylookanewbucketabc123doremeezas123m",
                       path = "",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key, 
                       check_region = F,
                       verbose = T,
                       region = 'us-standard')

deleteBucket

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-standard')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/heylookanewbucketabc123doremeezas123m/
Executing request with AWS credentials


Looks like it worked, but just to confirm, let's list the buckets one more time and check whether the one we created is still there.

In [16]:
## As before, list buckets
listOfBuckets <- s3HTTP(verb = "GET",
                       bucket = "", 
                       path = "", 
                       url_style = "path",
                       base_url = "s3-api.us-geo.objectstorage.softlayer.net", 
                       key = access_key_id, 
                       secret = secret_access_key,
                       verbose = T)

## Convert to dataframe and subset by the index of the new bucket
cosBucketsDF <- do.call(rbind.data.frame, listOfBuckets$Buckets)
cosBucketsDF[grep("heylookanewbucket", cosBucketsDF$Name), ]

Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using default value for AWS Region ('us-east-1')
S3 Request URL: https://s3-api.us-geo.objectstorage.softlayer.net/
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK


| | Name | CreationDate |
|---|---|---|
| Bucket4 | heylookanewbucketabc123doremeezas1231 | 2018-01-19T13:12:48.985Z |


It's gone!  

_________

## 4. Conclusion and next steps<a id="conclusion"></a>
In this notebook you've learned how to access IBM Cloud Object Storage using the S3 API in R. We've covered the basic operations, but there's plenty more you can do. See the <a href="https://ibm-public-cos.github.io/crs-docs/api-reference" target="_blank" rel="noopener noreferrer">API Reference</a> for details.

In a future notebook I hope to show you how you can save any **ar**bitrary R object to COS - models, time series, the works.  For now it will suffice that the door has been opened for R users to enjoy their COS instance. Thanks for reading.

Questions?  Email me at <rafi.kurlansik@ibm.com>.
_______

### Author
**Rafi Kurlansik** is an Open Source Solutions Engineer specializing in big data technologies, such as Hadoop and Spark. He's responsible for developing and delivering demonstrations of IBM tech to both enterprise clients and the larger analytics community. Kurlansik has hands-on experience with machine learning, natural language processing, data visualization, and dashboard development. If you're wondering where he comes down on the biggest data science debate of our day, Rafi is, in his own words, "an avid R fan, especially RStudio!"

Copyright © IBM Corp. 2018. This notebook and its source code are released under the terms of the MIT License.

<div><br><img src="https://softwarereviews.s3.amazonaws.com/production/logos/offerings/2300/original/a47d3df4-fdec-4ad5-bb6e-86f9a0aacebcIBm_logo.png?1490823906" width = 400 height = 400>
</div><br>