Platform Deployment Manager

Design

The Deployment Manager is a service that manages package deployment and application creation for a single PNDA cluster.

  • It implements the Packages, Applications, Repository and Environment Endpoints APIs used by operators.
  • It parses and validates basic package structure.
  • It interacts with a Repository to determine which packages are available, and with a Registrar to record which packages and applications are currently deployed.
  • It includes a number of component specific Creator implementations that carry out the concrete steps necessary to set up different parts of the Core Platform.
  • It is easily extensible to support additional component types and repository types.

The design consists of a main class that implements the APIs and coordinates between a Repository, a Registrar and an Application Creator that dynamically loads a number of component specific Creator classes as required by a particular package.

HTTP and Python bindings are provided for these APIs.

Connecting

By default, the Deployment Manager is installed on the edge node. To access the API use: http://[cluster-name]-cdh-edge:5000

Repository

Packages are made available via a repository. The Deployment Manager is configured with a client of this repository at instantiation time. The reference repository is implemented as a thin wrapper over an OpenStack Swift container.

Registrar

The details of package deployments for a given service instance are recorded by a registrar. The registrar stores information in HBase in the platform_packages and platform_applications tables.

Application Creator

The Application Creator handles the creation and control of applications on behalf of the Deployment Manager. It implements business logic that is common to all components and delegates to a component specific Creator as required by a particular package. Creator subclasses are dynamically loaded as needed by the Application Creator.
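The delegation described above can be sketched roughly as follows. This is a minimal illustration, not the Deployment Manager's actual code: the class names `OozieCreator` and `SparkStreamingCreator`, the `create_component` method and the component type keys are all hypothetical, and a dictionary of classes stands in for whatever dynamic loading mechanism the real implementation uses.

```python
class Creator:
    """Hypothetical base class for component-specific creators."""

    def create_component(self, name, properties):
        raise NotImplementedError


class OozieCreator(Creator):
    def create_component(self, name, properties):
        # A real creator would stage files and submit a job here.
        return {"type": "oozie", "component": name}


class SparkStreamingCreator(Creator):
    def create_component(self, name, properties):
        return {"type": "sparkStreaming", "component": name}


class ApplicationCreator:
    """Common logic lives here; type-specific work is delegated."""

    def __init__(self, creator_classes):
        self._creator_classes = creator_classes  # component type -> Creator subclass
        self._loaded = {}                        # creators instantiated so far

    def _creator_for(self, component_type):
        # Instantiate a creator only when a package first needs it.
        if component_type not in self._loaded:
            if component_type not in self._creator_classes:
                raise ValueError("unsupported component type: %s" % component_type)
            self._loaded[component_type] = self._creator_classes[component_type]()
        return self._loaded[component_type]

    def create_application(self, package_components):
        # package_components: {component_type: {component_name: properties}}
        application_data = []
        for component_type, components in package_components.items():
            creator = self._creator_for(component_type)
            for name, properties in components.items():
                application_data.append(creator.create_component(name, properties))
        return application_data
```

Keeping the type-to-class mapping outside `ApplicationCreator` is what makes a new component type a matter of registering one more subclass.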

Creator

Each component type is associated with a subclass of Creator. Each Creator implements the specific steps necessary to perform the following functions:

Validation

Each component type has a specific structure. Each Creator implements a validation function that checks that structure. All components are validated before the package is deployed. If any validation function fails, the package is deemed “bad” and package deployment fails. This provides an opportunity to catch simple package construction problems early in the deployment process.
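The "validate everything first, fail the whole package on any error" behaviour might look like the sketch below. The function names, the shape of the `components` argument and the `workflow.xml` rule are assumptions made for illustration; the real validation rules live in each Creator.

```python
def validate_package(components, validators):
    """Run every component's validator before anything is deployed.

    components: list of (component_type, component_name, component) tuples.
    validators: {component_type: callable returning a list of error strings}.
    Returns errors keyed by component name; an empty dict means the package is good.
    """
    errors = {}
    for component_type, name, component in components:
        validator = validators.get(component_type)
        if validator is None:
            errors[name] = ["no validator for component type '%s'" % component_type]
            continue
        problems = validator(component)
        if problems:
            errors[name] = problems
    return errors


def validate_oozie(component):
    # Hypothetical rule: an Oozie component must ship a workflow.xml file.
    if "workflow.xml" not in component.get("files", []):
        return ["missing workflow.xml"]
    return []
```

Collecting all errors rather than stopping at the first one lets the operator fix every packaging problem in a single pass.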

Application creation

Each component type has specific creation requirements and resource dependencies. Each Creator implements the process required to create components of a given type and returns “application_data”. The Deployment Manager aggregates the application data generated by the process of creating each of the components in the package, then persists an association between this and the package deployment using a Registrar.

Application control

Applications may be paused and restarted. This leaves all the installed components in-place and temporarily stops the running processes associated with those components.

Undeployment

Each Creator implements a specific set of steps to uninstall components of its associated type. The Creator is passed the application data associated with the package and component and uses this to execute those steps.

Requirements

Building

To build the Deployment Manager, change to the api directory, which contains the pom.xml file. Type mvn clean package on the command line. Once the build is successful, the built package will be placed in the target folder.

API Documentation

Base URL

All API paths below are relative to a base URL that is defined by the scheme, host, port and base path at the root level of this API specification.

<scheme>://<host>:<port>/<base path>

By default, the API uses the 'https' scheme as the transfer protocol. Host is the domain name or hostname that serves the API. To access the API from outside the PNDA security perimeter, requests must go through the Knox service, using the domain name or FQDN specified when the PNDA cluster was created. The domain name or FQDN must be resolvable via a public or private DNS service. To access the deployment management API, the base path /gateway/pnda/deployment must be used as the prefix for all API paths.

e.g. https://knox.example.com:8443/gateway/pnda/deployment
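As a small illustration of how the pieces of the base URL combine with an API path and the user.name query parameter, here is a sketch of a URL builder. The function name `api_url` is hypothetical, and the default host and port are simply the example values shown above.

```python
from urllib.parse import urlencode


def api_url(path, user_name, host="knox.example.com", port=8443,
            scheme="https", base_path="/gateway/pnda/deployment", **params):
    """Build a full URL for an API path such as "/packages"."""
    query = {"user.name": user_name}
    query.update(params)  # extra query parameters, e.g. recency=2
    return "%s://%s:%d%s/%s?%s" % (
        scheme, host, port, base_path.rstrip("/"), path.lstrip("/"),
        urlencode(query))
```

For example, `api_url("/repository/packages", "alice", recency=2)` yields a URL under /gateway/pnda/deployment with both query parameters appended.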

Repository API

List packages from the repository

?recency=n may be used to control how many versions of each package are listed. The default is recency=1.

GET /repository/packages?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
[
    {
	"latest_versions": [{
		"version": "1.0.23",
		"file": "spark-batch-example-app-1.0.23.tar.gz"
	}],
	"name": "spark-batch-example-app"
    }
]
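A client might flatten this response into something easier to work with. The sketch below assumes the response has exactly the shape shown above; the function names are hypothetical, and the `<name>-<version>` naming in `package_ids` is inferred from the package identifiers used in the Packages API examples rather than stated anywhere as a rule.

```python
import json


def available_packages(response_text):
    """Map each package name to the list of versions the repository offers."""
    return {
        entry["name"]: [v["version"] for v in entry["latest_versions"]]
        for entry in json.loads(response_text)
    }


def package_ids(response_text):
    """Assumed naming: <name>-<version>, as in the deployed-package examples."""
    return [
        "%s-%s" % (name, version)
        for name, versions in available_packages(response_text).items()
        for version in versions
    ]
```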

Packages API

List packages currently deployed to the cluster

GET /packages?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
["spark-batch-example-app-1.0.23"]

Get the status for package

GET /packages/<package>/status?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{"status": "DEPLOYED", "information": "human readable error message or other information about this status"}

Possible values for status:
NOTDEPLOYED
DEPLOYING
DEPLOYED
UNDEPLOYING

Get full information for package

GET /packages/<package>?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{
	"status": "DEPLOYED",
	"version": "1.0.23",
	"name": "spark-batch-example-app",
	"user": "who-deployed-this",
	"defaults": {
		"oozie": {
			"example": {
				"end": "${deployment_end}",
				"start": "${deployment_start}",
				"driver_mem": "256M",
				"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
				"executors_num": "2",
				"executors_mem": "256M",
				"freq_in_mins": "180",
				"job_name": "batch_example"
			}
		}
	}
}

Deploy package to the cluster

PUT /packages/<package>?user.name=<username>

Response Codes:
202 - Accepted, poll /packages/<package>/status for status
403 - Unauthorised user
404 - Package not found in repository
409 - Package already deployed
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 
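Because deployment is asynchronous (202 Accepted), a client typically polls the status endpoint until the package leaves a transitional state. A minimal sketch follows, assuming that DEPLOYING and UNDEPLOYING are the only transitional states. `wait_for_package` and its `get_status` callback are hypothetical names; the callback is expected to return the parsed JSON from GET /packages/<package>/status.

```python
import time

# Assumption: these are the states a package passes through, not ones it rests in.
TRANSIENT_PACKAGE_STATES = {"DEPLOYING", "UNDEPLOYING"}


def wait_for_package(get_status, package, interval=2.0, timeout=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the package reaches a stable state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status(package)
        if status["status"] not in TRANSIENT_PACKAGE_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("package %s still %s" % (package, status["status"]))
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; the same pattern applies to polling /applications/<application>/status.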

Undeploy package from the cluster

DELETE /packages/<package>?user.name=<username>

Response Codes:
202 - Accepted, poll /packages/<package>/status for status
403 - Unauthorised user
404 - Package not deployed
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Applications API

List all applications

GET /applications?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
["spark-batch-example-app-instance"]

List applications that have been created from package

GET /packages/<package>/applications?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
["spark-batch-example-app-instance"]

Get the status for application

GET /applications/<application>/status?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{"status": "STARTED", "information": "human readable error message or other information about this status"}

Possible values for status:
NOTCREATED
CREATING
CREATED
STARTING
STARTED
STOPPING
DESTROYING

Get run-time details for application

GET /applications/<application>/detail?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{
	"yarn_applications": {
		"oozie-example": {
			"type": "oozie",
			"yarn-id": "application_1479988623709_0015",
			"component": "example",
			"yarn-start-time": 1479992520527,
			"yarn-state": "FINISHED"
		}
	},
	"status": "STARTED",
	"name": "spark-batch-example-app-instance"
}

Get the summary status for application

GET /applications/<application>/summary?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Summary status in case of oozie component

{
  "spark-batch-py": {
    "aggregate_status": "COMPLETED",
    "oozie-1": {
      "status": "OK",
      "name": "spark-batch-py-workflow",
      "actions": {
        "job-1": {
          "status": "OK",
          "information": "",
          "yarnId": "application_1531380960927_0152",
          "applicationType": "spark",
          "name": "process"
        }
      },
      "componentType": "Oozie",
      "aggregate_status": "COMPLETED",
      "oozieId": "0000013-180712073712712-oozie-oozi-W"
    }
  }
}

Summary status in case of spark-streaming component

{
  "spark-stream": {
    "aggregate_status": "RUNNING",
    "sparkStreaming-1": {
      "information": {
        "stageSummary": {
          "active": 0,
          "number_of_stages": 1404,
          "complete": 1000,
          "pending": 0,
          "failed": 0
        },
        "jobSummary": {
          "unknown": 0,
          "number_of_jobs": 351,
          "running": 0,
          "succeeded": 351,
          "failed": 0
        }
      },
      "name": "spark-stream-example-job",
      "yarnId": "application_1531380960927_0153",
      "componentType": "SparkStreaming",
      "aggregate_status": "RUNNING",
      "tracking_url": "http://st-2-std-hadoop-mgr-2.node.dc1.pnda.local:8088/proxy/application_1531380960927_0153/"
    }
  }
}

Summary status in case of flink component

{
  "test1": {
    "aggregate_status": "RUNNING",
    "flink-1": {
      "information": {
        "state": "OK",
        "vertices": [
          {
            "status": "RUNNING",
            "name": "Source"
          }
        ],
        "flinkJid": "e7a7163fef86ad81017a0239839207cb"
      },
      "name": "test1-example-job",
      "yarnId": "application_1524556418619_0205",
      "trackingUrl": "http://rhel-hadoop-mgr-1.node.dc1.pnda.local:8088/proxy/application_1524556418619_0205/#/jobs/e7a7163fef86ad81017a0239839207cb",
      "componentType": "Flink",
      "aggregate_status": "RUNNING"
    }
  }
}
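The three summary examples share a common outline: a top-level key per application, an application-level aggregate_status, and one nested object per component carrying its own componentType and aggregate_status. Assuming that outline holds in general, a client could pull out the per-component states as sketched here; `component_statuses` is a hypothetical helper name.

```python
def component_statuses(summary):
    """Return {(application, component_key): (componentType, aggregate_status)}."""
    result = {}
    for app_name, app in summary.items():
        for key, value in app.items():
            # The application-level "aggregate_status" is a string; components are objects.
            if isinstance(value, dict):
                result[(app_name, key)] = (
                    value.get("componentType"), value.get("aggregate_status"))
    return result
```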

Start application

POST /applications/<application>/start?user.name=<username>

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Stop application

POST /applications/<application>/stop?user.name=<username>

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Get full information for application

GET /applications/<application>?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{
	"status": "CREATED",
	"overrides": {
        "user": "somebody",
		"package_name": "spark-batch-example-app-1.0.23",
		"oozie": {
			"example": {
				"executors_num": "5"
			}
		}
	},
	"package_name": "spark-batch-example-app-1.0.23",
	"name": "spark-batch-example-app-instance",
	"defaults": {
		"oozie": {
			"example": {
				"end": "${deployment_end}",
				"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
				"driver_mem": "256M",
				"start": "${deployment_start}",
				"executors_num": "2",
				"freq_in_mins": "180",
				"executors_mem": "256M",
				"job_name": "batch_example"
			}
		}
	}
}
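In this example the application overrides executors_num to "5" while the package default is "2". One plausible reading is that each component's effective settings are its defaults with the matching overrides applied on top. The sketch below illustrates that reading only; it is an assumption about the merge semantics, and `effective_properties` is a hypothetical name.

```python
def effective_properties(defaults, overrides):
    """Apply per-component overrides on top of package defaults (assumed semantics)."""
    merged = {}
    for component_type, components in defaults.items():
        merged[component_type] = {}
        for name, props in components.items():
            combined = dict(props)  # copy so the defaults are left untouched
            combined.update(overrides.get(component_type, {}).get(name, {}))
            merged[component_type][name] = combined
    return merged
```

Top-level override entries that are not component types, such as user and package_name, are ignored by this helper.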

Create application from package

PUT /applications/<application>?user.name=<username>
{
	"package": "<package>",
	"<componentType>": {
		"<componentName>": {
			"<property>": "<value>"
		}
	}
}

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
400 - Request body failed validation
403 - Unauthorised user
404 - Package not found
409 - Application already exists
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example body:
{
	"package": "<package>",
	"oozie": {
		"example": {
			"executors_num": "5"
		}
	}
}

The package field is mandatory; property settings are optional.
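A request of this shape could be prepared with the Python standard library as sketched below. The helper name `create_application_request` is hypothetical, and the Content-Type header is an assumption rather than a documented requirement. The function only builds the request object; nothing is sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def create_application_request(base_url, application, package, user_name, overrides=None):
    """Prepare (but do not send) the PUT that creates an application."""
    if not package:
        raise ValueError("package is mandatory")
    body = {"package": package}
    body.update(overrides or {})  # optional {componentType: {componentName: {...}}}
    url = "%s/applications/%s?user.name=%s" % (
        base_url.rstrip("/"), quote(application, safe=""), quote(user_name, safe=""))
    return Request(url, data=json.dumps(body).encode("utf-8"), method="PUT",
                   headers={"Content-Type": "application/json"})
```

Passing the returned object to `urllib.request.urlopen` would perform the call; a 202 response then means the application is being created and its status should be polled.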

Destroy application

DELETE /applications/<application>?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Environment Endpoints API

List environment variables known to the deployment manager

GET /environment/endpoints?user.name=<username>

Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error

Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml. 

Example response:
{"zookeeper_port": "2181", "cluster_root_user": "cloud-user", ... }

Deployment Manager Variables

The following variables are made available for use in the configuration files for every component and injected as previously described.

Application Variables

application_user        The user ID that this application's components will run as

Component Variables

component_application   unique application ID
component_name          name of component folder in package
component_job_name      application_id-component_name-job
component_xxx           setting xxx from properties.json
hdfspath_path_name      generated from entries in hdfs.json
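To make the naming scheme concrete, the sketch below derives the component variables for one component from its application ID, folder name and the contents of properties.json. The function name is hypothetical, and the hdfspath_ variables are left out because the format of hdfs.json is not described here.

```python
def component_variables(application_id, component_name, properties):
    """Derive component_* variables following the naming scheme in the table above."""
    variables = {
        "component_application": application_id,
        "component_name": component_name,
        "component_job_name": "%s-%s-job" % (application_id, component_name),
    }
    # Each setting xxx in properties.json becomes component_xxx.
    for key, value in properties.items():
        variables["component_" + key] = value
    return variables
```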

Environment Variables

These can be obtained with the environment endpoints API

environment_app_packages_hdfs_path      /pnda/deployment/app_packages
environment_hadoop_manager_host         192.168.1.2
environment_hadoop_manager_password     admin
environment_hadoop_manager_username     admin
environment_cluster_private_key         ./dm.pem
environment_cluster_root_user           cloud-user
environment_hbase_rest_port             20550
environment_hbase_rest_server           cluster-cdh-mgr1
environment_hive_port                   10000
environment_hive_server                 cluster-cdh-mgr1
environment_impala_host                 cluster-cdh-dn0
environment_impala_port                 21050
environment_kafka_brokers               192.168.1.3:9092, ...
environment_kafka_manager               https://192.168.1.4:443
environment_kafka_zookeeper             192.168.1.5:2181, ...
environment_metric_logger_url           http://192.169.1.7:3001/metrics
environment_name_node                   hdfs://cluster-cdh-mgr1:8020
environment_namespace                   platform_app
environment_oozie_uri                   http://cluster-cdh-mgr1:11000/oozie
environment_opentsdb                    192.168.1.6:4242
environment_queue_policy                /opt/pnda/rm-wrapper/yarn-policy.sh
environment_webhdfs_host                cluster-cdh-mgr1
environment_webhdfs_port                50070
environment_yarn_node_managers          cluster-cdh-dn0
environment_yarn_resource_manager_host  cluster-cdh-mgr1
environment_yarn_resource_manager_mr_port 8032
environment_yarn_resource_manager_port  8088
environment_zookeeper_port              2181
environment_zookeeper_quorum            cluster-cdh-mgr1
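Comparing the Environment Endpoints example response (zookeeper_port, cluster_root_user) with the names in this table suggests that each endpoint key is exposed as a variable with an environment_ prefix. Under that assumption, which is inferred from the examples rather than stated, the mapping is a one-liner; `environment_variables` is a hypothetical name.

```python
def environment_variables(endpoints):
    """Prefix each endpoint key with "environment_" (assumed naming rule)."""
    return {"environment_" + name: value for name, value in endpoints.items()}
```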

Spark Version Selection for Oozie and Spark Streaming

Both Spark streaming and Oozie components can be configured to use either Spark1 or Spark2. This may be set by including spark_version in properties.json and setting it to 1 or 2. It defaults to Spark1 if spark_version is not included.

component_spark_version            major version of spark to use. Set to '1' or '2'. Only applicable to HDP clusters
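The defaulting rule can be expressed as a small helper. This is a sketch of the documented behaviour (default to 1, accept 1 or 2); the function name and the decision to reject other values with an error are assumptions.

```python
def select_spark_version(properties):
    """Return "1" or "2" from properties.json contents, defaulting to "1"."""
    version = str(properties.get("spark_version", "1"))
    if version not in ("1", "2"):
        raise ValueError("spark_version must be 1 or 2, got %r" % version)
    return version
```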

Spark Streaming Specific Variables

The following variables are only injected for Spark Streaming components. They may be overridden in properties.json; for example, to override component_spark_version, include spark_version in properties.json.

component_spark_submit_args        additional arguments to spark-submit
component_respawn_type             whether to restart the process when it exits. Valid values are always, no, on-success, on-failure, on-abnormal, on-watchdog or on-abort. Refer to the systemd documentation for more information about each of these.
component_respawn_timeout_sec      used with component_respawn_type to set how long to wait (in seconds) before restarting the process when it exits.
(java only) component_main_jar     the jar containing the job code
(python only) component_main_py    the python file containing the job code
(python only) component_py_files   additional python files to pass to spark-submit

Oozie Specific Variables

The following variables are only injected for Oozie components.

component_end                  2016-03-31T17:07Z
component_start                2016-03-24T17:07Z
mapreduce.job.user.name        hdfs
mapreduce.job.queuename        root.applications.prod
oozie.coord.application.path   hdfs://cluster-cdh-mgr1:8020/user/application_id/component_name/coordinator.xml
oozie.libpath                  /pnda/deployment/platform
oozie.use.system.libpath       true
user.name                      prod1
