In [483]:
from collections import OrderedDict  # used later to order the host-group table
## Pandas
import pandas as pd
from IPython.display import display
from IPython.display import HTML
from pandas.io.json import json_normalize
pd.set_option('display.max_colwidth', 255)
pd.set_option('display.max_columns', 10)

In [537]:
#### Prep for the presentation

### Authenticate to Ambari

#### Python requirements
import difflib
import getpass
import json
import requests
import sys
import time

#### Change these to fit your Ambari configuration
ambari_protocol = 'http'
ambari_server = 'sroberts-bp02.cloud.hortonworks.com'
#ambari_server = 'pregion-shared01.cloud.hortonworks.com'
ambari_port = 8080
ambari_user = 'admin'
#cluster = 'Sandbox'

#### Above input gives us the base URL http://hostname:port ('/api/v1/...' is appended per request; credentials live on the session)
api_url = ambari_protocol + '://' + ambari_server + ':' + str(ambari_port)

#### Prompt for password & build the HTTP session
ambari_pass = getpass.getpass()
s = requests.Session()
s.auth = (ambari_user, ambari_pass)
s.headers.update({'X-Requested-By':'seanorama'})

#### Authenticate & verify authentication
r = s.get(api_url + '/api/v1/clusters')
assert r.status_code == 200
print("You are authenticated to Ambari!")

········
You are authenticated to Ambari!


# Field Notes: Ambari Blueprints

![Take This](http://nola.liberty.me/wp-content/uploads/sites/1472/2014/11/dangerousgif.gif)

## Nerd Alert: Presenting from ipython

* https://github.com/damianavila/RISE

#### Not Zeppelin

## whoami

Sean Roberts
Partner Engineering, EMEA
![Me](http://i.imgur.com/8uBbEpH.jpg)


## Today

* Requirements for Blueprints
* Refresher on Ambari Stacks
* Blueprint & Cluster Template
* Deploying the Blueprint & Cluster
* Field Notes *(sort of)*
* Questions

## Not Today

### Deploying with Ambari:

- Infrastructure & Node Prep
- Deploying Ambari Server & Agents
- Ambari considerations (java, database, …)
- Lessons learned from large scale deployments

### Ongoing operations:

- General overview of the API
- Configuration Groups
- Adding nodes to a config group

## What You'll Need

* Ambari Server & Agents Installed
* Ambari Agents Registered to Ambari Server (/api/v1/hosts)
* HDP prereqs (networking, OS repos, …)
* If using separate or non-default SQL databases, configure them first.
* Access & credentials to Ambari Server
* Blueprint (JSON)
* Cluster Description (JSON)

  * https://wiki.hortonworks.com/display/SE/SE+Cloud
  * http://github.com/HortonworksUniversity/Ops_Labs/1.1.0/build/security/ambari-bootstrap
  * CloudBreak
    * Note: HA Blueprints not supported *(simple validation bug should be fixed soon)*

In [538]:
r = s.get(api_url + '/api/v1/hosts')
print(json.dumps(r.json(), indent=2))

{
  "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/hosts",
  "items": [
    {
      "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/hosts/sroberts-bp01.cloud.hortonworks.com",
      "Hosts": {
        "host_name": "sroberts-bp01.cloud.hortonworks.com"
      }
    },
    {
      "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/hosts/sroberts-bp02.cloud.hortonworks.com",
      "Hosts": {
        "host_name": "sroberts-bp02.cloud.hortonworks.com"
      }
    },
    {
      "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/hosts/sroberts-bp03.cloud.hortonworks.com",
      "Hosts": {
        "host_name": "sroberts-bp03.cloud.hortonworks.com"
      }
    },
    {
      "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/hosts/sroberts-bp04.cloud.hortonworks.com",
      "Hosts": {
        "host_name": "sroberts-bp04.cloud.hortonworks.com"
      }
    },
    {
      "href": "http://sroberts-bp02.cloud.hortonworks.com:808
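A quick way to confirm that every node you plan to use has registered is to compare the `host_name` values in that response against your own list. This is a minimal sketch, assuming the response shape shown above; the function names and the `example.com` hosts are made up for illustration.

```python
def registered_host_names(hosts_response):
    """Return the host names found in a parsed /api/v1/hosts response."""
    return [item['Hosts']['host_name'] for item in hosts_response.get('items', [])]


def missing_hosts(expected, hosts_response):
    """Return the expected hosts that have not registered, sorted by name."""
    return sorted(set(expected) - set(registered_host_names(hosts_response)))


# Hypothetical sample shaped like the response above
sample = {
    "items": [
        {"Hosts": {"host_name": "node1.example.com"}},
        {"Hosts": {"host_name": "node2.example.com"}},
    ]
}
print(missing_hosts(["node1.example.com", "node3.example.com"], sample))
```

With the live session this would be called as `missing_hosts(my_hosts, r.json())`.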

## Reminder: Ambari Stacks

* Stack: HDP, PHD, ...
    * Versions: 2.2
      * Services: HDFS, SPARK
        * Components: NODEMANAGER
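
The hierarchy maps onto the REST paths. A small sketch of that mapping follows; `stack_path` is a hypothetical helper, and the `/components/<name>` level is assumed to follow the same pattern as the `/services` path used in this notebook.

```python
def stack_path(stack, version=None, service=None, component=None):
    """Build the API path for one level of the Stack hierarchy."""
    path = '/api/v1/stacks/' + stack
    if version is not None:
        path += '/versions/' + version
        if service is not None:
            path += '/services/' + service
            if component is not None:
                path += '/components/' + component
    return path


print(stack_path('HDP', '2.2'))
print(stack_path('HDP', '2.2', 'YARN', 'NODEMANAGER'))
```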

## Service & Component List

In [486]:
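r = s.get(api_url + '/api/v1/stacks/HDP/versions/2.2/services')

# Map each service name to a space-separated list of its components
stackservicecomponents = {}
for svc in r.json()['items']:
    resp = s.get(svc['href'] + '/components')
    for item in resp.json()['items']:
        info = item['StackServiceComponents']
        # Key by the component's own service_name so a service without
        # components is never filed under the previous service's name
        stackservicecomponents.setdefault(info['service_name'], []).append(info['component_name'])
stackservicecomponents = {k: ' '.join(v) for k, v in stackservicecomponents.items()}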

In [487]:
pd.DataFrame.from_dict(stackservicecomponents, orient='index').sort_index()

| Service | Components |
|---|---|
| AMBARI_METRICS | METRICS_COLLECTOR METRICS_MONITOR |
| FALCON | FALCON_CLIENT FALCON_SERVER |
| FLUME | FLUME_HANDLER |
| GANGLIA | GANGLIA_MONITOR GANGLIA_SERVER |
| HBASE | HBASE_CLIENT HBASE_MASTER HBASE_REGIONSERVER |
| HDFS | DATANODE HDFS_CLIENT JOURNALNODE NAMENODE SECONDARY_NAMENODE ZKFC |
| HIVE | HCAT HIVE_CLIENT HIVE_METASTORE HIVE_SERVER MYSQL_SERVER WEBHCAT_SERVER |
| KAFKA | KAFKA_BROKER |
| KERBEROS | KERBEROS_CLIENT |
| KNOX | KNOX_GATEWAY |


## A Blueprint is

JSON document with 3 sections:

* `Blueprints`: Ambari Stack to use

* `host_groups`: Grouping of hosts and the Ambari Stack's components to deploy

* `configurations`: (optional) Specific configurations to pass through for the Ambari Stack

```json
{
  "Blueprints": {
    "stack_name": "HDP", "stack_version": "2.2"
  },
  
  "host_groups": [
    { "name": "master_1", "components": [ { "name": "NAMENODE" }, ... ] },
    { "name": "slave_1", "components": [ { "name": "DATANODE" }, ... ] }
  ],
  
  "configurations": [
    { "hive-site": { "hive.execution.engine": "tez" } }
  ]
}
```
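
Before uploading, it can help to check that a Blueprint has this shape. The following is a rough sketch of such a check, not Ambari's own validation; `check_blueprint_shape` and its messages are made up for illustration.

```python
def check_blueprint_shape(bp):
    """Return a list of problems found in a Blueprint dict (empty if none)."""
    problems = []
    stack = bp.get('Blueprints', {})
    for key in ('stack_name', 'stack_version'):
        if key not in stack:
            problems.append('Blueprints.%s is missing' % key)
    groups = bp.get('host_groups', [])
    if not groups:
        problems.append('host_groups is missing or empty')
    for group in groups:
        if 'name' not in group:
            problems.append('a host group has no name')
        if not group.get('components'):
            problems.append('host group %r has no components' % group.get('name'))
    # "configurations" is optional, but should be a list when present
    if not isinstance(bp.get('configurations', []), list):
        problems.append('configurations should be a list')
    return problems


minimal = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.2"},
    "host_groups": [
        {"name": "master_1", "components": [{"name": "NAMENODE"}]},
        {"name": "slave_1", "components": [{"name": "DATANODE"}]},
    ],
    "configurations": [{"hive-site": {"hive.execution.engine": "tez"}}],
}
print(check_blueprint_shape(minimal))
```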

## Blueprint example

* With special configurations:
  - HA HDFS
  - HA YARN Resource Manager
  - Different HDFS dirs depending on host_group
  - Oozie using PostgreSQL instead of Derby

In [569]:
# Pick one of the example Blueprints
blueprint_file = 'blueprints/blueprint-config-example.json'
#blueprint_file = 'blueprints/blueprint-hdfs-ha.json'
#blueprint_file = 'blueprints/blueprint-yarn-ha.json'
#blueprint_file = 'blueprints/hdp-all.json'
with open(blueprint_file) as f:
    blueprint = json.load(f)
print(json.dumps(blueprint, indent=2))

{
  "Blueprints": {
    "stack_name": "HDP",
    "stack_version": "2.2"
  },
  "host_groups": [
    {
      "components": [
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "METRICS_COLLECTOR"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "OOZIE_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        }
      ],
      "name": "gateway",
      "cardinality": "1"
    },
    {
      "components": [
        {
          "name": "HISTORYSERVER"
        },
        {
          "name": "JOURNALNODE"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "NAMENODE"
        },
        {
          "name": "OOZIE_SERVER"
        },
        {
          "name": "ZKFC"
        },
        {
          "na

In [570]:
host_groups = {}
for group in blueprint['host_groups']:
    host_group = group['name']
    components = []
    for component in group['components']:
        components.append(component['name'])
    host_groups[host_group] = components

## Host Groups & Components

In [571]:
display(pd.DataFrame.from_dict(OrderedDict(sorted(host_groups.items())), orient='index').T.sort_index())

|   | gateway | master_1 | master_2 | master_3 | slave_archive | slave_standard |
|---|---|---|---|---|---|---|
| 0 | HDFS_CLIENT | HISTORYSERVER | APP_TIMELINE_SERVER | JOURNALNODE | DATANODE | DATANODE |
| 1 | MAPREDUCE2_CLIENT | JOURNALNODE | JOURNALNODE | METRICS_MONITOR | METRICS_MONITOR | METRICS_MONITOR |
| 2 | METRICS_COLLECTOR | METRICS_MONITOR | METRICS_MONITOR | NAMENODE | NODEMANAGER | NODEMANAGER |
| 3 | METRICS_MONITOR | NAMENODE | RESOURCEMANAGER | RESOURCEMANAGER | | |
| 4 | OOZIE_CLIENT | OOZIE_SERVER | ZOOKEEPER_SERVER | ZKFC | | |
| 5 | TEZ_CLIENT | ZKFC | | ZOOKEEPER_SERVER | | |
| 6 | YARN_CLIENT | ZOOKEEPER_SERVER | | | | |
| 7 | ZOOKEEPER_CLIENT | | | | | |
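
For HA layouts it is often easier to read this table the other way round: which host groups carry a given component? A minimal sketch, using a `{host_group: [components]}` dict like `host_groups` above; the sample data is a cut-down copy of the table and `groups_by_component` is a name made up here.

```python
from collections import defaultdict


def groups_by_component(host_groups):
    """Invert {host_group: [components]} into {component: [host_groups]}."""
    index = defaultdict(list)
    for group, components in sorted(host_groups.items()):
        for component in components:
            index[component].append(group)
    return dict(index)


sample = {
    "master_1": ["NAMENODE", "ZKFC", "JOURNALNODE"],
    "master_3": ["NAMENODE", "ZKFC", "JOURNALNODE"],
    "master_2": ["JOURNALNODE"],
}
index = groups_by_component(sample)
print(index["NAMENODE"])     # the two NameNode host groups
print(index["JOURNALNODE"])  # the three JournalNode host groups
```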


## Upload Blueprint

POST /api/v1/blueprints/blueprintname

In [549]:
## Upload the Blueprint

body = blueprint
r = s.post(api_url + '/api/v1/blueprints/testblueprint', data=json.dumps(body))
print(r.status_code) # should return 201
#print(json.dumps(r.json(), indent=2))

201
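
Outside a demo you probably want the upload to fail loudly rather than print a status code. A hedged sketch: `upload_blueprint` is a hypothetical wrapper around the same POST, and the `Fake*` classes only stand in for `requests` so the example runs without an Ambari server.

```python
import json


def upload_blueprint(session, api_url, name, blueprint):
    """POST a Blueprint under `name`; raise unless the reply is 201 Created."""
    r = session.post(api_url + '/api/v1/blueprints/' + name,
                     data=json.dumps(blueprint))
    if r.status_code != 201:
        raise RuntimeError('Blueprint upload failed: HTTP %s %s'
                           % (r.status_code, r.text))
    return r.status_code


# Stand-ins for requests.Session / Response so the sketch runs offline
class FakeResponse:
    def __init__(self, status_code, text=''):
        self.status_code = status_code
        self.text = text


class FakeSession:
    def __init__(self, status_code):
        self.status_code = status_code

    def post(self, url, data=None):
        return FakeResponse(self.status_code, 'stub reply')


print(upload_blueprint(FakeSession(201), 'http://ambari.example.com:8080',
                       'testblueprint', {'Blueprints': {}}))
```

With the real session: `upload_blueprint(s, api_url, 'testblueprint', blueprint)`.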


In [None]:
r = s.get(api_url + '/api/v1/blueprints/testblueprint')
print(json.dumps(r.json(), indent=2))

## Cluster Description

### Set up the Oozie database
On Ambari Server:

```bash
echo "host all all 172.24.0.0/16 trust" >> /var/lib/pgsql/data/pg_hba.conf
/etc/init.d/postgresql restart
# Open a psql prompt as the postgres user, then run the SQL statements that follow
sudo -u postgres psql

CREATE DATABASE oozie;
CREATE USER oozie WITH PASSWORD 'changethis';
GRANT ALL PRIVILEGES ON DATABASE oozie TO oozie;
```

In [561]:
cluster = {
  "blueprint": "testblueprint",
  "configurations": [
    { "oozie-site": {
          "oozie.db.schema.name" : "oozie",
          "oozie.service.JPAService.create.db.schema" : "true",
          "oozie.service.JPAService.jdbc.driver" : "org.postgresql.Driver",
          "oozie.service.JPAService.jdbc.username" : "oozie",
          "oozie.service.JPAService.jdbc.password" : "changethis",
          "oozie.service.JPAService.jdbc.url" : "jdbc:postgresql://sroberts-bp02.cloud.hortonworks.com:5432/oozie?createDatabaseIfNotExist=true"
    }},
    { "yarn-site" : {
          "yarn.resourcemanager.hostname.rm1": "sroberts-bp04.cloud.hortonworks.com",
          "yarn.resourcemanager.hostname.rm2": "sroberts-bp05.cloud.hortonworks.com"
    }}          
  ],
  "default_password": "changethis",
  "host_groups": [
    { "hosts": [
        { "fqdn": "sroberts-bp02.cloud.hortonworks.com" }
      ], "name": "gateway"
    },
    { "hosts": [
        { "fqdn": "sroberts-bp03.cloud.hortonworks.com" }
      ], "name": "master_1"
    },
    { "hosts": [
        { "fqdn": "sroberts-bp04.cloud.hortonworks.com" }
      ], "name": "master_2"
    },
    { "hosts": [
        { "fqdn": "sroberts-bp05.cloud.hortonworks.com" }
      ], "name": "master_3"
    },
    { "configurations": [
       { "yarn-site": {
         "yarn.nodemanager.local-dirs": "/mnt/hdfs0/yarn/local,/mnt/hdfs1/yarn/local,/mnt/hdfs2/yarn/local",
         "yarn.nodemanager.log-dirs": "/mnt/hdfs0/yarn/log,/mnt/hdfs1/yarn/log,/mnt/hdfs2/yarn/log"
        }
       },{ "hdfs-site": { "dfs.datanode.data.dir": "/mnt/hdfs0/data,/mnt/hdfs1/data,/mnt/hdfs2/data"}}
      ],
      "hosts": [
        { "fqdn": "sroberts-bp06.cloud.hortonworks.com" }
      ],
      "name": "slave_standard"     
    },
    { "hosts": [
        { "fqdn": "sroberts-bp07.cloud.hortonworks.com" },
        { "fqdn": "sroberts-bp08.cloud.hortonworks.com" }
      ], "name": "slave_archive"
    }
  ]
}
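
Every host group named here has to exist in the Blueprint, and every Blueprint host group needs hosts. A small cross-check, sketched under the assumption that both documents are already loaded as dicts; `compare_host_groups` and the sample data are illustrative only.

```python
def compare_host_groups(blueprint, cluster_template):
    """Return (Blueprint groups with no mapping, template groups unknown to the Blueprint)."""
    in_blueprint = {g['name'] for g in blueprint['host_groups']}
    in_template = {g['name'] for g in cluster_template['host_groups']}
    return sorted(in_blueprint - in_template), sorted(in_template - in_blueprint)


# Hypothetical, deliberately mismatched pair
sample_blueprint = {"host_groups": [{"name": "gateway"}, {"name": "master_1"}]}
sample_template = {"host_groups": [
    {"name": "gateway", "hosts": [{"fqdn": "node1.example.com"}]},
    {"name": "master_9", "hosts": [{"fqdn": "node2.example.com"}]},
]}
unmapped, unknown = compare_host_groups(sample_blueprint, sample_template)
print(unmapped)  # Blueprint groups that received no hosts
print(unknown)   # template groups the Blueprint does not define
```

For the objects in this notebook the call would be `compare_host_groups(blueprint, cluster)`.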

In [551]:
body = cluster
r = s.post(api_url + '/api/v1/clusters/mycluster', data=json.dumps(body))
print(r.status_code) ## Should return 202
print(json.dumps(r.json(), indent=2))

202
{
  "href": "http://sroberts-bp02.cloud.hortonworks.com:8080/api/v1/clusters/mycluster/requests/1",
  "Requests": {
    "id": 1,
    "status": "InProgress"
  }
}


In [500]:
r = s.get(api_url + '/api/v1/clusters/mycluster/requests/1')
print(json.dumps(r.json()['Requests'], indent=2))

{
  "type": "INTERNAL_REQUEST",
  "request_status": "IN_PROGRESS",
  "request_context": "Install and start all services",
  "create_time": 1430343177885,
  "cluster_name": "mycluster",
  "completed_task_count": 0,
  "queued_task_count": 34,
  "inputs": null,
  "timed_out_task_count": 0,
  "end_time": -1,
  "operation_level": null,
  "aborted_task_count": 0,
  "failed_task_count": 0,
  "request_schedule": null,
  "resource_filters": [],
  "task_count": 74,
  "progress_percent": 5.5540540540540535,
  "id": 1,
  "exclusive": false,
  "start_time": 1430343178728
}
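
Rather than re-running that cell by hand, the request can be polled until it finishes. This is only a sketch: `IN_PROGRESS` comes from the output above, while `PENDING`, `QUEUED` and the final `COMPLETED` status are assumptions about Ambari's status names, and the `Fake*` classes replay canned states so it runs offline.

```python
import time

# IN_PROGRESS appears in the output above; PENDING and QUEUED are assumed
IN_FLIGHT = {'PENDING', 'QUEUED', 'IN_PROGRESS'}


def wait_for_request(session, request_url, poll_seconds=10, max_polls=360):
    """Poll a request resource until it leaves the in-flight states."""
    for _ in range(max_polls):
        info = session.get(request_url).json()['Requests']
        if info['request_status'] not in IN_FLIGHT:
            return info['request_status']
        print('%.1f%% complete' % info['progress_percent'])
        time.sleep(poll_seconds)
    raise RuntimeError('Gave up waiting for ' + request_url)


class FakeResponse:
    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeSession:
    """Replays canned (status, percent) pairs instead of calling Ambari."""
    def __init__(self, states):
        self._states = iter(states)

    def get(self, url):
        status, percent = next(self._states)
        return FakeResponse({'Requests': {'request_status': status,
                                          'progress_percent': percent}})


fake = FakeSession([('IN_PROGRESS', 5.0), ('IN_PROGRESS', 60.0), ('COMPLETED', 100.0)])
final_status = wait_for_request(
    fake, 'http://ambari.example.com:8080/api/v1/clusters/mycluster/requests/1',
    poll_seconds=0)
print(final_status)
```

Against the live server: `wait_for_request(s, api_url + '/api/v1/clusters/mycluster/requests/1')`.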


## Export Blueprint

/api/v1/blueprints

/api/v1/clusters/clustername?format=blueprint
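
The second path returns a running cluster in Blueprint form. A hedged sketch of calling it: `export_blueprint` is a hypothetical helper, and the `Fake*` classes stand in for `requests` (whose `Session.get` accepts `params`) so the example runs offline.

```python
def export_blueprint(session, api_url, cluster_name):
    """GET a running cluster's layout and configuration in Blueprint form."""
    r = session.get(api_url + '/api/v1/clusters/' + cluster_name,
                    params={'format': 'blueprint'})
    if r.status_code != 200:
        raise RuntimeError('Export failed: HTTP %s' % r.status_code)
    return r.json()


class FakeResponse:
    status_code = 200

    def json(self):
        return {"Blueprints": {"stack_name": "HDP", "stack_version": "2.2"}}


class FakeSession:
    """Records the last request instead of calling Ambari."""
    def get(self, url, params=None):
        self.last_url, self.last_params = url, params
        return FakeResponse()


fake = FakeSession()
exported = export_blueprint(fake, 'http://ambari.example.com:8080', 'mycluster')
print(fake.last_url, fake.last_params)
print(exported["Blueprints"]["stack_name"])
```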

## Field Notes: How to ...

* Separate log locations
* HA


## Field Notes: Separate Databases
    
Example: PostgreSQL for Oozie

1. Prepare the database (see the Oozie database setup under Cluster Description)
2. Add appropriate configuration to Blueprint or Cluster template



## Field Notes: Blueprint Schema Changes from 1.7 to 2.0

Ambari Metrics replaces Ganglia & Nagios

| 1.7                          | 2.0               |
|------------------------------|-------------------|
| NAGIOS_SERVER, GANGLIA_SERVER | METRICS_COLLECTOR |
| GANGLIA_MONITOR              | METRICS_MONITOR   |
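
Applied to a host group's component list, the table amounts to a rename plus de-duplication. A minimal sketch, reading the first row as the two components `NAGIOS_SERVER` and `GANGLIA_SERVER`; `migrate_components` is a name made up here, and any Nagios or Ganglia specific configuration still has to be handled by hand.

```python
RENAMES_1_7_TO_2_0 = {
    'NAGIOS_SERVER': 'METRICS_COLLECTOR',
    'GANGLIA_SERVER': 'METRICS_COLLECTOR',
    'GANGLIA_MONITOR': 'METRICS_MONITOR',
}


def migrate_components(components):
    """Rename 1.7 monitoring components to their 2.0 names, dropping duplicates."""
    migrated = []
    for name in components:
        new_name = RENAMES_1_7_TO_2_0.get(name, name)
        if new_name not in migrated:
            migrated.append(new_name)
    return migrated


print(migrate_components(['NAMENODE', 'NAGIOS_SERVER', 'GANGLIA_SERVER', 'GANGLIA_MONITOR']))
```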



## Field Notes: HDFS dirs

- Blueprints will not detect your mount points.
- The default paths (/hadoop/...) are used unless you set them.
- Add the directories to your Blueprint or Cluster template.
- They can be set globally or differently for each host group.

```json
{ "hdfs-site": { "dfs.datanode.data.dir": "/mnt/hdfs0/data,/mnt/hdfs1/data,/mnt/hdfs2/data"}}
```
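
With many disks, typing these comma-separated lists gets error-prone. A small sketch that builds them from a list of mount points; `per_mount_dirs` is a hypothetical helper and the mounts are the ones used in this deck.

```python
import json


def per_mount_dirs(mounts, suffix):
    """Join '<mount>/<suffix>' for every mount into one comma-separated value."""
    return ','.join(m.rstrip('/') + '/' + suffix for m in mounts)


mounts = ['/mnt/hdfs0', '/mnt/hdfs1', '/mnt/hdfs2']
config = {"hdfs-site": {"dfs.datanode.data.dir": per_mount_dirs(mounts, 'data')}}
print(json.dumps(config))
```

The same helper covers the YARN directories, e.g. `per_mount_dirs(mounts, 'yarn/local')`.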

## Field Notes: Timeouts

Raise the limits if running with limited networking or on slow hardware.

```
# grep agent.*timeout /etc/ambari-server/conf/ambari.properties
agent.package.install.task.timeout=1800
agent.task.timeout=900
```

## Consideration: Local Repositories

/api/v1/stacks/HDP/versions/2.2/operating_systems/

```
for repo in HDP-2.2 HDP-UTILS-1.1.0.20; do
    curl -sSu admin http://${ambari_server}:8080/api/v1/stacks/HDP/versions/2.2/operating_systems/redhat6/repositories/${repo} -o /tmp/update-repo.txt
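    # -i edits in place; the pattern is basic regex, so -r is not wanted (and "-ir" would leave a backup file ending in "r")
    sed -i -e 's/\(public\|private\)-repo-1\.hortonworks\.com/repo.cloud.hortonworks.com/g' -e '/^  "href"/d' /tmp/update-repo.txt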
    curl -sSu admin -H x-requested-by:sean http://${ambari_server}:8080/api/v1/stacks/HDP/versions/2.2/operating_systems/redhat6/repositories/${repo} -T /tmp/update-repo.txt
done
```
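
The same rewrite can be done on the parsed JSON instead of with `sed`. A sketch only: the shape of `sample_repo` (a `Repositories` object with a `base_url`) and its URL are assumptions for illustration, and `point_at_local_repo` is a made-up name. It drops the top-level `href` and swaps the host in every string value.

```python
import re

MIRROR_PATTERN = re.compile(r'(public|private)-repo-1\.hortonworks\.com')


def point_at_local_repo(repo, local_host):
    """Drop the top-level "href" and replace the public/private repo host everywhere."""
    def rewrite(value):
        if isinstance(value, dict):
            return {k: rewrite(v) for k, v in value.items()}
        if isinstance(value, list):
            return [rewrite(v) for v in value]
        if isinstance(value, str):
            # A function replacement keeps local_host literal (no backslash escapes)
            return MIRROR_PATTERN.sub(lambda m: local_host, value)
        return value
    return rewrite({k: v for k, v in repo.items() if k != 'href'})


# Hypothetical repository document; the real one comes from the GET in the loop above
sample_repo = {
    "href": "http://ambari.example.com:8080/api/v1/stacks/HDP/versions/2.2/operating_systems/redhat6/repositories/HDP-2.2",
    "Repositories": {
        "repo_id": "HDP-2.2",
        "base_url": "http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.2",
    },
}
updated = point_at_local_repo(sample_repo, 'repo.cloud.hortonworks.com')
print(updated["Repositories"]["base_url"])
```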

## Field Notes: Stack Advisor does not run on Blueprints!

But this undocumented API helps: `/api/v1/stacks/HDP/versions/2.2/recommendations`

In [565]:
body = {
      "recommend" : "host_groups",
      "services" : [ "AMBARI_METRICS","FALCON","FLUME","HBASE","HDFS","HIVE","KAFKA","KNOX","MAPREDUCE2","OOZIE","PIG","SLIDER","SPARK","SQOOP","STORM","TEZ","YARN","ZOOKEEPER" ],
      "hosts" : [ "sroberts-bp02.cloud.hortonworks.com","sroberts-bp03.cloud.hortonworks.com","sroberts-bp04.cloud.hortonworks.com","sroberts-bp05.cloud.hortonworks.com","sroberts-bp06.cloud.hortonworks.com","sroberts-bp07.cloud.hortonworks.com","sroberts-bp08.cloud.hortonworks.com" ]
}

r = s.post(api_url + '/api/v1/stacks/HDP/versions/2.2/recommendations', data=json.dumps(body))
print(json.dumps(r.json(), indent=2))

{
  "resources": [
    {
      "services": [
        "KAFKA",
        "PIG",
        "SPARK",
        "MAPREDUCE2",
        "YARN",
        "FALCON",
        "SLIDER",
        "HIVE",
        "TEZ",
        "ZOOKEEPER",
        "STORM",
        "SQOOP",
        "HBASE",
        "OOZIE",
        "FLUME",
        "KNOX",
        "HDFS",
        "AMBARI_METRICS"
      ],
      "recommendations": {
        "blueprint": {
          "configurations": null,
          "host_groups": [
            {
              "components": [
                {
                  "name": "NODEMANAGER"
                },
                {
                  "name": "APP_TIMELINE_SERVER"
                },
                {
                  "name": "DRPC_SERVER"
                },
                {
                  "name": "NIMBUS"
                },
                {
                  "name": "RESOURCEMANAGER"
                },
                {
                  "name": "DATANODE"
                },
        

In [566]:
body = {
      "recommend" : "configurations",
      "services" : [ "AMBARI_METRICS","FALCON","FLUME","HBASE","HDFS","HIVE","KAFKA","KNOX","MAPREDUCE2","OOZIE","PIG","SLIDER","SPARK","SQOOP","STORM","TEZ","YARN","ZOOKEEPER" ],
      "hosts" : [ "sroberts-bp02.cloud.hortonworks.com","sroberts-bp03.cloud.hortonworks.com","sroberts-bp04.cloud.hortonworks.com","sroberts-bp05.cloud.hortonworks.com","sroberts-bp06.cloud.hortonworks.com","sroberts-bp07.cloud.hortonworks.com","sroberts-bp08.cloud.hortonworks.com" ]
}

r = s.post(api_url + '/api/v1/stacks/HDP/versions/2.2/recommendations', data=json.dumps(body))
print(json.dumps(r.json(), indent=2))

{
  "resources": [
    {
      "services": [
        "KAFKA",
        "PIG",
        "SPARK",
        "MAPREDUCE2",
        "YARN",
        "FALCON",
        "SLIDER",
        "HIVE",
        "TEZ",
        "ZOOKEEPER",
        "STORM",
        "SQOOP",
        "HBASE",
        "OOZIE",
        "FLUME",
        "KNOX",
        "HDFS",
        "AMBARI_METRICS"
      ],
      "recommendations": {
        "blueprint": {
          "host_groups": [],
          "configurations": {
            "hadoop-env": {
              "properties": {
                "namenode_opt_maxnewsize": "512",
                "namenode_opt_newsize": "512",
                "namenode_heapsize": "2048"
              }
            },
            "hbase-site": {
              "properties": {
                "hbase.regionserver.global.memstore.upperLimit": "0.4"
              }
            },
            "hbase-env": {
              "properties": {
                "hbase_master_heapsize": "1024",
                "hbase_r
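
The recommended configurations come back as `{"type": {"properties": {...}}}`, while a Blueprint's `configurations` section is a list of `{"type": {...}}` entries. A minimal conversion sketch, based only on the two shapes shown in this deck; `to_blueprint_configurations` is a hypothetical helper and the sample is a cut-down copy of the output above.

```python
def to_blueprint_configurations(recommended):
    """Convert {"type": {"properties": {...}}} into [{"type": {...}}, ...]."""
    return [{config_type: dict(body.get('properties', {}))}
            for config_type, body in sorted(recommended.items())]


recommended = {
    "hbase-site": {"properties": {"hbase.regionserver.global.memstore.upperLimit": "0.4"}},
    "hadoop-env": {"properties": {"namenode_heapsize": "2048"}},
}
for entry in to_blueprint_configurations(recommended):
    print(entry)
```

With the live response the input would be `r.json()['resources'][0]['recommendations']['blueprint']['configurations']`.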

## Blueprint Generator

https://github.com/HortonworksUniversity/Ops_Labs/1.1.0/build/security/ambari-bootstrap/tree/master/deploy

# The End

![Field Notes](https://41.media.tumblr.com/tumblr_mc5ujlmOPE1r1dqs8o1_500.jpg)

### Field Notes from Aaron Wiebe

* We pre-deployed and configured Ambari agents on each node. This allowed us to perform the installation without giving Ambari automated SSH/root access.
* Registration was done in batches of at most 400 nodes. Doing more than this crushed the internal yum repository during installation and caused timeouts and installation failures.
* Local repos were used.
* Client threads on the Ambari server needed to be turned up to handle the high number of agents.
* AMS is slick, but requires its own server. It was HBase-backed, in distributed mode against HDFS.
* The Ambari server database needed its own server; the key cache of the MySQL database was turned up to 24G and threads were increased.
* Ambari server heap was increased to 12G.
* We’re looking at SSDs for Ambari servers and databases in the future.
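
The batching in the second note is easy to script. A minimal sketch that splits a host list into groups of at most 400; the host names are invented, and how each batch is then registered is left out because the notes do not say.

```python
def batches(hosts, size=400):
    """Yield consecutive slices of `hosts`, each at most `size` long."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


hosts = ['node%04d.example.com' % i for i in range(1, 1001)]
batch_sizes = [len(batch) for batch in batches(hosts)]
print(batch_sizes)
```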