## Introduction


In this notebook, we use the Bluemix CLI tools to create a new IBM Analytics Engine instance that is configured to use IBM Cloud Object Storage (IBM COS).

*Prerequisites:* 
- You have worked through the notebook `examples/CLI/CLI_Setup.ipynb`

*Recommended:*
- You have worked through the notebook `examples/CLI/Provision_IAE.ipynb`

---

## Load utility library and set notebook width

To keep this notebook uncluttered, we use some Python utilities, which we load below.

In [None]:
import sys
sys.path.append("../../modules")
import iae_examples

Let's use the utilities to set this notebook to the full width of the browser.

In [None]:
iae_examples.set_notebook_full_width()

---

## Read Cloud Foundry endpoint properties

We can read the variables saved when we ran the notebook `examples/CLI/CLI_Setup.ipynb` to configure our chosen API, org, and space.

In [None]:
(CF_API, CF_ORG, CF_SPACE) = iae_examples.read_cf_target_endpoint_details('../../secrets/cf_target_endpoint.json')

---

## Save IBM Cloud Object Storage endpoint properties

Create a file `../../secrets/cos_s3_endpoint.json` with your COS credentials.  The file format should be:

```
{
   "S3_ACCESS_KEY":       "<AccessKey-changeme>",
   "S3_PRIVATE_ENDPOINT": "<Private-EndPoint-changeme>",
   "S3_PUBLIC_ENDPOINT":  "<Public-EndPoint-changeme>",
   "S3_SECRET_KEY":       "<SecretKey-changeme>"
}
```

Now let's load the COS file into some variables that we will use later.

In [None]:
(S3_ACCESS_KEY, S3_PRIVATE_ENDPOINT, S3_PUBLIC_ENDPOINT, S3_SECRET_KEY) = \
    iae_examples.read_cos_endpoint_details('../../secrets/cos_s3_endpoint.json')
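
If the load above fails or later steps reject the credentials, it can help to check the file against the format described earlier. The sketch below is only an illustration: `check_cos_config` and `check_cos_file` are hypothetical helpers, not part of `iae_examples`, and the only thing they assume is the four keys and the `changeme` placeholders shown in the file format.

```python
import json

# Keys taken from the file format shown above.
REQUIRED_COS_KEYS = (
    "S3_ACCESS_KEY",
    "S3_PRIVATE_ENDPOINT",
    "S3_PUBLIC_ENDPOINT",
    "S3_SECRET_KEY",
)


def check_cos_config(config):
    """Return a list of problems found in a parsed COS config dict (hypothetical helper)."""
    problems = []
    for key in REQUIRED_COS_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append("missing value for " + key)
        elif "changeme" in value:
            # The template placeholders all contain the word "changeme".
            problems.append("placeholder not replaced for " + key)
    return problems


def check_cos_file(path):
    """Load the JSON file at path and report any problems with it."""
    with open(path) as f:
        return check_cos_config(json.load(f))
```

Calling `check_cos_file('../../secrets/cos_s3_endpoint.json')` should return an empty list when every key is present and no placeholder remains.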

---

## Upload IAE bootstrap file to COS

In [None]:
url = 'https://raw.githubusercontent.com/snowch/IBM_Analytics_Engine_Examples/master/scripts/COS_S3.sh'
filename = 'COS_S3_bootstrap.sh'
bucket_name = 'temp-bucket'

iae_examples.save_url_to_cos(url, bucket_name, filename, S3_ACCESS_KEY, S3_SECRET_KEY, S3_PUBLIC_ENDPOINT)

---

## Provision IAE instance

Before we can provision IAE, we need to log in to Bluemix using the Bluemix CLI.

In [None]:
! bx login --apikey @../../secrets/apiKey.json -a {CF_API} -o {CF_ORG} -s {CF_SPACE}

There are a few ways to configure IAE to use IBM COS. Here we automate the process with a custom bootstrap script.

**NOTE:** These examples prefer automation to manual approaches for configuration.  One key benefit of automation is that it supports creating environments in a repeatable and testable way.

In [None]:
import json

custom_script = { 
    "num_compute_nodes": 1, 
    "hardware_config": "Standard", 
    "software_package": "ae-1.0-spark",
    "customization": [{
        "name": "action1",
        "type": "bootstrap",
        "script": {
            "source_type": "CosS3",
            "source_props": {
                "auth_endpoint": S3_PRIVATE_ENDPOINT,
                "access_key_id": S3_ACCESS_KEY,
                "secret_access_key": S3_SECRET_KEY
             },
         "script_path": bucket_name + "/COS_S3_bootstrap.sh"
        },
        "script_params": [S3_ACCESS_KEY, S3_PRIVATE_ENDPOINT, S3_SECRET_KEY]
    }]
}

# Write the configuration to a local file so that the Bluemix CLI can read it in the next step

with open('../../secrets/custom_script.json', 'w') as f:
    f.write(json.dumps(custom_script))
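
The file written above contains the COS access key and secret key in plain text. As a minimal sketch, assuming a POSIX-style filesystem, the write could also restrict the file to its owner. `write_private_json` is a hypothetical helper, not part of `iae_examples`.

```python
import json
import os
import stat


def write_private_json(obj, path):
    """Write obj as JSON to path, then limit the file to owner read/write."""
    with open(path, "w") as f:
        json.dump(obj, f, indent=2)
    # Equivalent to chmod 600; on Windows this only affects the read-only flag.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

For example, `write_private_json(custom_script, '../../secrets/custom_script.json')` would produce a file with the same content as the cell above, formatted with indentation.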

We can then attempt to create an IBM Analytics Engine Instance.

In [None]:
! bx cf create-service IBMAnalyticsEngine Standard 'myiaeinstance' -c ../../secrets/custom_script.json

---

Note the output from above. If all went well, the CLI should suggest running `cf service myiaeinstance` to check the provisioning status. Let's do that now.

**NOTE:** If the above step outputs an error, jump to the Debugging section below.



In [None]:
! bx cf service myiaeinstance

When the status is: `create succeeded`, move on to the next step.
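
Rather than re-running the cell by hand, the check can be repeated in a loop. The following is a sketch under stated assumptions: it assumes the `bx` CLI is on the path and already logged in, and that the command output contains one of the phrases `create succeeded`, `create failed`, or `create in progress`. The function names, polling interval, and check limit are illustrative choices.

```python
import subprocess
import time


def provisioning_state(cli_output):
    """Classify `cf service` output by the status phrase it contains."""
    text = cli_output.lower()
    for phrase in ("create succeeded", "create failed", "create in progress"):
        if phrase in text:
            return phrase
    return "unknown"


def wait_for_instance(name, interval_seconds=60, max_checks=60):
    """Poll `bx cf service <name>` until provisioning finishes or checks run out."""
    for _ in range(max_checks):
        result = subprocess.run(
            ["bx", "cf", "service", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        state = provisioning_state(result.stdout)
        if state in ("create succeeded", "create failed"):
            return state
        time.sleep(interval_seconds)
    return "timed out"
```

Calling `wait_for_instance('myiaeinstance')` blocks until the instance reports a final status or the check limit is reached.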

---

## Create service key

Here we create a service key, which contains the cluster credentials.
We export the service key information to a file.
We can then read the service key details into Python variables so that we can use them later in this notebook.

In [None]:
! bx cf create-service-key myiaeinstance myiaeinstance_servicekey

In [None]:
! bx cf service-keys myiaeinstance

In [None]:
! bx cf service-key myiaeinstance myiaeinstance_servicekey > ../../secrets/iae_service_key.json

# Unfortunately, the output of the above command contains some lines of text before the JSON.
# Let's remove the first four lines of output and save the raw JSON.

iae_examples.strip_premable_from_service_key('../../secrets/iae_service_key.json')
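
To illustrate the idea of stripping the preamble, here is a minimal sketch that does not depend on the number of preamble lines. It is not the `iae_examples` implementation. `extract_json_object` is a hypothetical helper that assumes the preamble itself contains no `{` character.

```python
import json


def extract_json_object(text):
    """Return the parsed JSON object that starts at the first '{' in text."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    # raw_decode parses one JSON value and ignores anything that follows it.
    obj, _end = json.JSONDecoder().raw_decode(text[start:])
    return obj
```

Locating the first `{` avoids hard-coding a line count, which would break if the CLI changed the length of its preamble.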

In [None]:
IAE_USER       = iae_examples.iae_service_user('../../secrets/iae_service_key.json')
IAE_PASSWORD   = iae_examples.iae_service_password('../../secrets/iae_service_key.json')
IAE_AMBARI_URL = iae_examples.iae_service_endpoint_ambari('../../secrets/iae_service_key.json')
IAE_LIVY_URL   = iae_examples.iae_service_endpoint_livy('../../secrets/iae_service_key.json')

---

## Verify COS was successfully configured

In [None]:
# This is broken
# iae_examples.is_s3_access_key_set(IAE_AMBARI_URL, IAE_USER, IAE_PASSWORD, S3_ACCESS_KEY)

---

## Upload Spark script to COS

In [None]:
file_contents = """
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()

    partitions = 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    spark.stop()
"""

bucket_name = 'temp-bucket'
filename = 'PiEx.py'

iae_examples.save_string_to_cos(
    file_contents, bucket_name, filename, S3_ACCESS_KEY, S3_SECRET_KEY, S3_PUBLIC_ENDPOINT
    )

---

## Analyse data with Spark 

Execute the Spark job.

In [None]:
import requests, json

headers = { 
    'Content-Type': 'application/json',
    'X-Requested-By': 'livy'
}
data = { "file":"s3a://{0}/PiEx.py".format(bucket_name) }

res = requests.post(IAE_LIVY_URL, auth=(IAE_USER, IAE_PASSWORD), headers=headers, data=json.dumps(data))
print(res.text)

id = res.json()['id']

Get the job state (re-run this cell until the state is `success` or `failed`).

In [None]:
headers = { 
    'Content-Type': 'application/json',
    'X-Requested-By': 'livy'
}
url = '{0}/{1}'.format(IAE_LIVY_URL, id)
response = requests.get(url, auth=(IAE_USER, IAE_PASSWORD), headers=headers)
print(response.json()['state'])
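
The manual re-running can be replaced by a small polling loop. This is a sketch rather than a definitive implementation: `wait_for_terminal_state` is a hypothetical helper, and the default set of terminal states is an assumption that combines the two states named above with others a Livy server may report, such as `dead` or `killed`.

```python
import time

# Assumed terminal states; adjust to match what your Livy server reports.
DEFAULT_TERMINAL_STATES = ("success", "failed", "dead", "killed", "error")


def wait_for_terminal_state(get_state, terminal_states=DEFAULT_TERMINAL_STATES,
                            interval_seconds=10, max_checks=60, sleep=time.sleep):
    """Call get_state() repeatedly until it returns a terminal state.

    Returns the last state seen, which is not terminal if max_checks ran out.
    """
    state = None
    for _ in range(max_checks):
        state = get_state()
        if state in terminal_states:
            return state
        sleep(interval_seconds)
    return state


# Usage with the variables defined in this notebook (not executed here):
# final_state = wait_for_terminal_state(
#     lambda: requests.get(url, auth=(IAE_USER, IAE_PASSWORD), headers=headers).json()['state']
# )
```

Passing the state lookup in as a callable keeps the loop independent of `requests`, so it can be tried with a stub before it is pointed at the cluster.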

Let's take a look at the Spark job log.

In [None]:
headers = { 
    'Content-Type': 'application/json',
    'X-Requested-By': 'livy'
}
url = '{0}/{1}/log'.format(IAE_LIVY_URL, id)
response = requests.get(url, auth=(IAE_USER, IAE_PASSWORD), headers=headers)
print('\n'.join(response.json()['log']))

---
### Discussion

In the log output above, I can find the YARN application ID, e.g.

```
17/09/23 06:21:51 INFO Client: Application report for application_1506108548102_0002 (state: ACCEPTED)
```
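
Instead of reading the log by eye, the ID could be pulled out with a regular expression. This is a minimal sketch, assuming the IDs always follow the `application_<digits>_<digits>` shape seen in the line above. `find_yarn_application_ids` is a hypothetical helper.

```python
import re

# Matches IDs shaped like application_1506108548102_0002.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_yarn_application_ids(log_lines):
    """Return the distinct YARN application IDs in log_lines, in first-seen order."""
    seen = []
    for line in log_lines:
        for app_id in APP_ID_PATTERN.findall(line):
            if app_id not in seen:
                seen.append(app_id)
    return seen
```

In this notebook, the input would be the list printed earlier, i.e. `find_yarn_application_ids(response.json()['log'])`.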

If I ssh onto the cluster, I can run the following command:

```
$ yarn logs -applicationId application_1506108548102_0002 | less
```

Buried in the YARN output, I see:

```
LogType:stdout
...skipping...
Pi is roughly 3.135060

End of LogType:stdout
```

---

## Debugging 

TODO 

In [None]:
! bx cf space dev --guid

In [None]:
! bx cf services

In [None]:
! bx cf service-keys myiaeinstance

In [None]:
! bx cf delete-service-key myiaeinstance myiaeinstance_servicekey -f

In [None]:
! bx cf delete-service myiaeinstance -f 