Configuration

Achal-Aggarwal edited this page May 10, 2016 · 4 revisions

Arbiter is designed to be highly configurable. The configuration is specified in YAML, just like the workflow definitions.

Example Configuration

Here is an example configuration file for Arbiter:

---
killName: kill
killMessage: "Workflow $$name$$ has failed with msg: [${wf:errorMessage(wf:lastErrorNode())}]"
global:
  defaultArgs: {
    job-tracker: ["${jobTracker}"],
    name-node: ["${nameNode}"],
  }
  properties: {
    oozie.launcher.mapred.job.queue.name: "${launchedQueueName}",
    mapred.job.queue.name: "${queueName}"
  }
credentials:
  - name: "hive2-cred"
    type: "hive2"
    properties: {
      hive2.server.principal: "${hive2Principal}",
      hive2.jdbc.url: "${hiveServer2Url}",
    }
actionTypes:
  - tag: java
    name: rollup
    configurationPosition: 2
    properties: {"mapreduce.job.queuename": "rollups"}
    defaultArgs: {
      job-tracker: ["${jobTracker}"],
      name-node: ["${nameNode}"],
      main-class: ["com.etsy.db.VerticaRollupRunner"],
      arg: ["--file", "$$rollup_file$$", "--frequency", "$$frequency$$", "--category", "$$category$$", "--env", "${cluster_env}"],
      defaultInterpolations: {
        frequency: daily
      }
    }
  - tag: sub-workflow
    name: sub-workflow
    defaultArgs: {
      app-path: ["$$workflowPath$$"],
      propagate-configuration: []
    }
  - tag: java
    name: screamapillar
    configurationPosition: 2
    properties: {"mapreduce.job.queuename": "${queueName}", "mapreduce.map.output.compress": "true"}
    defaultArgs: {
      job-tracker: ["${jobTracker}"],
      name-node: ["${nameNode}"],
      main-class: ["com.etsy.oozie.Screamapillar"],
      arg: ["--workflow-id", "${wf:id()}", "--recipient", "$$recipients$$", "--sender", "$$sender$$", "--env", "${cluster_env}"]
    }
  - tag: hive2
    name: hive2
    cred: "hive2-cred"
    retryMax: 3
    retryInterval: 1
    xmlns: uri:oozie:hive2-action:0.1
    defaultArgs: {
      job-xml: ["hive-config.xml"],
      jdbc-url: ["${hiveServer2Url}"],
      script: ["$$script$$"],
      param: [
        "user=${user}"
      ]
    }
    properties: {
      oozie.hive.defaults: "hive-config.xml"
    }
    configurationPosition: 1

Action Type Configuration

Action types are the primary component of a configuration file. These are defined under the actionTypes node. An action type defines the mapping between an action in an Arbiter workflow definition and the Oozie workflow XML. Multiple Arbiter action types can map to the same Oozie action type. This allows for defining many custom actions without needing to actually create and register those custom actions with Oozie.
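
To make the many-to-one mapping concrete, here is a minimal Python sketch. It is not Arbiter's implementation: the `ACTION_TYPES` table and the `oozie_tag_for` helper are hypothetical names chosen for illustration, and only the `tag`/`name` pairs are taken from the example configuration above.

```python
# Hypothetical lookup table mirroring the actionTypes entries from the
# example configuration: several Arbiter names share one Oozie tag.
ACTION_TYPES = [
    {"tag": "java", "name": "rollup"},
    {"tag": "java", "name": "screamapillar"},
    {"tag": "sub-workflow", "name": "sub-workflow"},
    {"tag": "hive2", "name": "hive2"},
]


def oozie_tag_for(action_type_name, action_types=ACTION_TYPES):
    """Return the Oozie XML tag for an Arbiter action type name."""
    for action_type in action_types:
        if action_type["name"] == action_type_name:
            return action_type["tag"]
    raise KeyError("unknown action type: " + action_type_name)
```

Both `rollup` and `screamapillar` resolve to the `java` tag here, so neither needs to be registered with Oozie as a custom action.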

Now let's look at an example action type:

  - tag: java
    name: rollup
    configurationPosition: 2
    properties: {"mapreduce.job.queuename": "rollups"}
    defaultArgs: {
      job-tracker: ["${jobTracker}"],
      name-node: ["${nameNode}"],
      main-class: ["com.etsy.db.VerticaRollupRunner"],
      arg: ["--file", "$$rollup_file$$", "--frequency", "$$frequency$$", "--category", "$$category$$", "--env", "${cluster_env}"],
      defaultInterpolations: {
        frequency: daily
      }
    }

  1. tag is the XML tag name of this action type in the Oozie workflow XML. This element is required.
  2. name is the name of this custom action type. It is used in the Arbiter workflow definitions, but does not appear in the generated XML. This element is required.
  3. configurationPosition specifies where the configuration element should be placed in the generated XML for this action type. It is optional, and if unspecified the configuration will be the first element.
  4. properties specifies configuration settings that should be applied to every action of this type. It is optional.
  5. defaultArgs defines elements that should appear in the generated XML for every action of this type. It is required. It can also contain placeholders, such as $$rollup_file$$ in this example, whose values are set per action in the Arbiter workflow definition. These are replaced with the values from the workflow definition when the XML is generated. You can set default values for these keys with defaultInterpolations, as with $$frequency$$ in the example above.
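
The placeholder substitution described in item 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not Arbiter's code: the `interpolate` function is a hypothetical name, the value `example.sql` is made up, and leaving unknown placeholders untouched is a choice of this sketch that Arbiter may not share.

```python
import re

# Matches $$key$$ placeholders; ${...} Oozie EL expressions are left untouched.
PLACEHOLDER = re.compile(r"\$\$(\w+)\$\$")


def interpolate(default_args, action_values, default_interpolations=None):
    """Replace $$key$$ placeholders in every argument string.

    Values from the workflow definition win over defaultInterpolations.
    """
    values = dict(default_interpolations or {})
    values.update(action_values)

    def substitute(text):
        # Assumption: unknown keys are left as-is in this sketch.
        return PLACEHOLDER.sub(
            lambda m: str(values.get(m.group(1), m.group(0))), text
        )

    return {
        element: [substitute(item) for item in items]
        for element, items in default_args.items()
    }


default_args = {
    "main-class": ["com.etsy.db.VerticaRollupRunner"],
    "arg": ["--file", "$$rollup_file$$", "--frequency", "$$frequency$$",
            "--env", "${cluster_env}"],
}
resolved = interpolate(
    default_args,
    {"rollup_file": "example.sql"},  # hypothetical value from a workflow definition
    {"frequency": "daily"},          # defaultInterpolations from the action type
)
```

In this sketch `$$frequency$$` falls back to `daily` because the action does not set it, while `${cluster_env}` passes through unchanged for Oozie to resolve at run time.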

Kill Node Configuration

You can also define how the kill node will look in the generated Oozie workflow XML. There are two properties to do so:

  1. killName defines the name of the kill node.
  2. killMessage defines the error message logged if the kill node executes.

If these are omitted, no kill node will be added to the generated XML.
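
As a rough illustration of how these two properties could translate into an Oozie kill node, consider the Python sketch below. The `build_kill_node` function and the workflow name `example` are hypothetical, and two behaviors are assumptions rather than documented facts: that `$$name$$` stands for the workflow name, and that the node is skipped when either property is missing.

```python
import xml.etree.ElementTree as ET


def build_kill_node(config, workflow_name):
    """Return a <kill> element, or None when killName/killMessage are absent."""
    kill_name = config.get("killName")
    kill_message = config.get("killMessage")
    if kill_name is None or kill_message is None:
        return None
    kill = ET.Element("kill", {"name": kill_name})
    message = ET.SubElement(kill, "message")
    # Assumption: $$name$$ stands for the workflow name, as the example suggests.
    message.text = kill_message.replace("$$name$$", workflow_name)
    return kill


config = {
    "killName": "kill",
    "killMessage": "Workflow $$name$$ has failed with msg: "
                   "[${wf:errorMessage(wf:lastErrorNode())}]",
}
node = build_kill_node(config, "example")  # "example" is a made-up workflow name
xml_text = ET.tostring(node, encoding="unicode")
```

The Oozie EL expression inside the message is kept verbatim, since Oozie evaluates it when the kill node executes.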

Multiple Configuration Files

You can use multiple configuration files if desired. If settings overlap, the rightmost file specified on the command line takes precedence. When running Arbiter, a configuration file can also be specified as "low-priority", in which case every standard configuration file overrides its settings where they overlap.
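
The precedence rules can be sketched in Python as below. This is only a model of the ordering described here: `merge_configs` is a hypothetical function, the `extra` key is invented for the example, and the sketch merges top-level keys only. How Arbiter combines nested nodes such as actionTypes is not covered by this page.

```python
def merge_configs(standard_configs, low_priority_configs=()):
    """Merge top-level settings; later entries override earlier ones.

    Low-priority configs are applied first, so every standard config
    overrides them. Standard configs are applied left to right, so the
    rightmost one wins.
    """
    merged = {}
    for config in list(low_priority_configs) + list(standard_configs):
        merged.update(config)
    return merged


base = {"killName": "kill", "killMessage": "base message"}
override = {"killMessage": "override message"}
# "extra" is a made-up key used only to show a low-priority value surviving.
fallback = {"killName": "fallback-kill", "extra": "only-in-fallback"}

merged = merge_configs([base, override], low_priority_configs=[fallback])
```

Here `killMessage` comes from the rightmost standard file, `killName` from the standard file that overrides the low-priority one, and `extra` survives because no standard file sets it.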
