Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
742 lines (580 sloc) 31.2 KB

Open Cognitive Skills Specification

Version 1.0.0 DRAFT

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 RFC2119 RFC8174 when, and only when, they appear in all capitals, as shown here.

This document is licensed under The Apache License, Version 2.0.

Introduction

The Open Cognitive Skills (OCS) specification defines a standard, language-agnostic, and platform-agnostic language that describes how to use containerized, serverless applications to composes and executes code, machine learning models, and data. Skills enables data scientists and developers to easily compose, pack, ship, run and scale any machine learning model and code across any design and deployment environment.

Releationship to Cognitive Agent Modeling and Execution Language (CAMEL)

The Open Cognitive Skills specification is the first of a family of specifications grouped under the CAMEL initiative. The CAMEL initiative defines a standard, language-agnostic, and platform-agnostic language that describes the composition and orchestration of skills, data, and machine learning models to define and exeucte a Cognitive Agent that is used to augment human intelligence. CAMEL is an acronym that stands for "Cognitive Agent Modeling and Execution Language" and can be thought of as a language for programming Cognitive Agents. The OCS specification focuses on the Cognitive Skill construct which is the key building block of Cognitive Agents.

Table of Contents

Definitions

Open Cognitive Skill Document

A document that defines or describes a OCS resource. Skills are the primary type of OCS resource and conform to the common properties and conventions of all CAMEL resource types. The relevant properties and conventions are repeated here in this specification for simplicity.

Resources

A Resource is a first class domain object in CAMEL.

  • Resources have unique names (Fully Qualified Names or FQN’s)
  • Resources are versioned; each modification to a resource creates a new version
  • A reference to a Resource ID, is a reference to a specific version of that resource for a specific owner
  • A reference to a Resource name must be resolved to a scope (version and ownership)
  • Resources have an owner (a user) and privileges (Read, Write, Admin)
  • Privileges can be assigned to users or groups
Resource Names

Resources are named using Fully Qualified Names (FQN’s). A FQN consists of:

  • Namespace: String
  • Name: String
  • Version: Integer
  • Format: <ns>/<name>:<version>

The Namespace is required but defaults to default. Example namespace: acme

Version is optional and defaults to the latest version.

Example FQN’s:

  • acme/sentiment_analysis
  • acme/compliance_checker:47
  • acme/is_it_green
Skill

A containerized, serverless application that composes and executes code, machine learning models, and data. It enables data scientists and developers to easily compose, pack, ship, run and scale any machine learning model and code across any design and deployment environment.

Dataset

Provide a means to store and retrieve data. Two-dimensional, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Can also have metadata describing schema, organization w.r.t. an industry taxonomy, licensing terms, access rights, etc.

Model

The term Model refers to the model artifact that is created by a training or configuration process. A trained ML model refers to a Model that has been created by training a learning algorithm with training data. Other Model types include linguistic models which may be learning algorithm based, rules based, or some combination. Business rules or heuristics codified in business logic can also be used to create a Model.

Message

Declaration of a type used as input and/or output to a Skills and Agents. Messages contain a list of ordered fields that each have a specific data type (string, boolean, integer, float, etc.).

Schema

Allows the definition of input and output data types in a reusuable object that can be referenced from multiple resources. For example, multiple Skills can share a single Schema for a common input or output parameter list.

Specification

Versions

The OCS Specification is versioned using Semantic Versioning 2.0.0 (semver) and follows the semver specification.

The major.minor portion of the semver (for example 1.0) SHALL designate the OCS feature set. Typically, .patch versions address errors in this document, not the feature set. Tooling which supports OCS 1.0 SHOULD be compatible with all OCS 1.0.* versions. The patch version SHOULD NOT be considered by tooling, making no distinction between 1.0.0 and 1.0.1 for example.

Subsequent minor version releases of the OCS Specification (incrementing the minor version number) SHOULD NOT interfere with tooling developed to a lower minor version and same major version. Thus a hypothetical 1.1.0 specification SHOULD be usable with tooling designed for 1.0.0.

An OCS document compatible with OCS 1.*.* contains a required camel field which designates the semantic version of the OCS specification that it uses.

Format

A OCS document that conforms to the OCS Specification is itself a JSON object, which may be represented either in JSON or YAML format.

For example, if a field has an array value, the JSON array representation will be used:

{
   "field": [ 1, 2, 3 ]
}

All field names in the specification are case sensitive.

In order to preserve the ability to round-trip between YAML and JSON formats, YAML version 1.2 is RECOMMENDED along with some additional constraints:

Data Types

Primitive data types in OCS are based on the types supported by the JSON Schema Specification Wright Draft 00. Note that integer as a type is also supported and is defined as a JSON number without a fraction or exponent part. null is not supported as a type (see nullable for an alternative solution). Data types are defined using the Parameters, which is an extended subset of JSON Schema Specification Wright Draft 00.

Primitives have an optional modifier property: format. OCS uses several known formats to define in fine detail the data type being used. However, to support documentation needs, the format property is an open string-valued property, and can have any value.

Formats such as "email", "uuid", and so on, MAY be used even though undefined by this specification.

Types that are not accompanied by a format property follow the type definition in the JSON Schema. Tools that do not recognize a specific format MAY default back to the type alone, as if the format is not specified.

The formats defined by the OCS specification are:

Common Name type format Comments
integer integer int32 signed 32 bits
long integer int64 signed 64 bits
float number float
double number double
string string
byte string byte base64 encoded characters
binary string binary any sequence of octets
boolean boolean
date string date As defined by full-date - RFC3339
dateTime string date-time As defined by date-time - RFC3339
object object embedded/nested object
array array A valid type embedded/nested array

Examples

An integer

{
  "type": "integer"
}

A double

{
  "type": "number",
  "format": "double"
}

A base64 encoded binary

{
  "type": "string",
  "format": "byte"
}

A date-time string

{
  "type": "string",
  "format": "date-time"
}

Embedded object

{
  "type": "object"
}

Embedded array of numbers

{
  "type": "array",
  "format": "number"
}

Rich Text Formatting

Throughout the specification description fields are noted as supporting CommonMark markdown formatting. Where OCS tooling renders rich text it MUST support, at a minimum, markdown syntax as described by CommonMark 0.27. Tooling MAY choose to ignore some CommonMark features to address security concerns.

Schema

In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL.

Common Fields

The following fields are common across many OCS resources and will be referenced throughout the specification.

Field Name Type Description
camel string REQUIRED. This string MUST be the semantic version number of the OCS Specification version that the OCS document uses. The camel field SHOULD be used by tooling specifications and clients to interpret the OCS document. This is not related to the resource version string.
name string REQUIRED. The fully qualified name of the resource.
title string OPTIONAL. The human friendly name given to a resource.
description string OPTIONAL. A free-text account of a resource. MAY include rich text. MAY be a reference to external documentation using a URL Reference Object.
tags [Tag Object] OPTIONAL. An array of tags used to annotate the resource.

System Fields

A OCS document MAY include system fields. Any field that starts with a _ is considered a system field. Some common system fields are described below:

Field Name Type Description
_version integer The resource version number.
_createdAt dateTime Captures the date and time of resource creation.
_updatedAt dateTime Captures the date and time of last resource update.
_id string A unique identifier assigned by the system to identify a resource.

Common Objects

The following objects are common across many OCS resources and will be referenced throughout the specification.

Property Definition Object

The property defintion object is used to declare a configurable property of a resource.

Fixed Fields
Field Name Type Description
name string REQUIRED. The unique name of this property within the resource. Tools and libraries MUST use the name to uniquely identify the property, therefore, it is RECOMMENDED to follow common programming naming conventions.
title string REQUIRED.
description string OPTIONAL.
required boolean OPTIONAL. Is this property required to be set? Default value is false.
type string REQUIRED. Must be one of Enum, String, Boolean, Number.
defaultValue any OPTIONAL. The default value for this property.
validValues [string] REQUIRED. An array of valid values for use with the Enum property type. Ignored for other property types.
qualifiedBy string OPTIONAL. Scopes this property to another property value by name. For example, a property qualifiedBy=fileType with name json/style will only be active when the fileType property has value json.
secure boolean OPTIONAL. Should this property be encrypted? Default value is false.
Property Value Object

The property value object is used to set a property.

Fixed Fields
Field Name Type Description
name string The property name.
value any The property value.
Tag Object

The tag object is used to annotate resources with descriptive labels or categories to enable discovery.

Fixed Fields
Field Name Type Description
label string The tag label.
value string The tag value.
Parameter Object

The parameter object is used to define a list of ordered parameters used in a Message.

Examples

Primitive types example: User, Item, Rating

parameters:
  - name: uid
    title: User ID
    type: integer
    format: int64
    required: true
  - name: iid
    title: Item ID
    type: integer
    format: int64
    required: true
  - name: rating
    title: Rating
    type: number
    format: double
    required: true

Embedded objects and arrays example: News Article

parameters:
  - name: articleId
    title: Article ID
    type: string
    required: true
  - name: headline
    title: Headline
    type: string
    required: true
  - name: text
    title: Article Text
    type: string
    required: false
  - name: imageLinks
    title: Image Links
    type: array
    format: url
    required: false
  - name: feedInfo
    title: Feed Information
    type: object
    required: false
Fixed Fields
Field Name Type Description
name string REQUIRED. The unique name of the parameter. The name SHOULD be a code friendly identifier.
title string OPTIONAL. See Resource Title.
description string OPTIONAL. See Resource Description.
type string REQUIRED. One of the six valid types allowed by OCS: integer, number, boolean, string, object, array.
format string OPTIONAL. A descriptive format string like date, email, or double. See Data Types for a discussion of the built-in formats.
required boolean OPTIONAL. Default false.
Reference Object

The reference object is used to declare a pointer to a resource such as a Dataset or Schema.

Examples

Dataset reference, used in "pass-by-value" scenarios:

{
    "payload": {
        "$ref": "examples/movies_dataset"
    }
}

Schema reference, used in parameter declarations:

parameters:
  $ref: examples/MovieInformation
Fixed Fields
Field Name Type Description
$ref string REQUIRED. The Resource Name, including version if desired, of the resource this object is pointing to.
URL Reference Object

The URL reference object is used to declare a hyperlink to an external resource, such as external documentation, and can be used in description (see Resource Description) fields or other fields that support resource linking.

Examples

External documentation reference

{
    "description": {
        "$url": "http://example.com/external_markdown_doc.md"
    }
}
Fixed Fields
Field Name Type Description
$url string REQUIRED. A valid RFC 1738 URL pointing to an external resource that should be used as the value for the containing field.

Skill Object

This is the root document object of a OCS document that contains a Skill definition.

Skill Object Example

An exmaple "Hello World" Skill is shown below:

camel: 1.0.0
name: default/hello_world
title: Hello World
description: The classic Hello World example.
properties:
  -
    name: lang
    title: Language
    description: The language to say hello in.
    required: true
    type: Enum
    defaultValue: en
    validValues:
      - en
      - es
      - it
      - de
inputs:
  -
    name: yourName
    title: Your Name
    parameters:
      - name: name
        type: string
        description: The name to send
        required: true
    routing:
      all:
        action: default/hello_world
        runtime: cortex/functions
        output: greeting
outputs:
  -
    name: greeting
    title: Greeting
    parameters:
      - name: message
        type: string
        description: The greeting message
Fixed Fields
Field Name Type Description
camel string REQUIRED. See OCS Specification Version.
name string REQUIRED. See Resource Name.
title string REQUIRED. See Resource Title.
description string OPTIONAL. See Resource Description.
tags [Tag Object] OPTIONAL. See Resource Tags.
properties [Property Object] OPTIONAL.
inputs [Skill Input Object] REQUIRED. An array of Input Objects. At least one Input is required.
outputs [Skill Output Object] OPTIONAL. An array of Output Objects.
models [Model Defintion Object] OPTIONAL. An array of Model Definition Objects.
datasets [DataSet Reference Object] OPTIONAL. An array of Dataset Reference Objects. Each reference declares a dependency on a Dataset that must be mapped at runtime.
Skill Input Object

This object defines an input message used by the Skill.

Fixed fields
Field Name Type Description
name string REQUIRED.
title string REQUIRED.
parameters [Parameter Object | Reference Object] REQUIRED.
routing Routing Object REQUIRED.
Skill Output Object

This object defines an output message used by the Skill.

Fixed fields
Field Name Type Description
name string REQUIRED.
title string REQUIRED.
parameters [Parameter Object | Reference Object] REQUIRED.
Routing Object

This object defines the routing rules for a Skill input. Skills route Messages received on an Input to an Action for processing and then to an Output. A Skill MUST define at least one routing rule for each Input. Messages can be routed based on properties or Message field values. The simpliest form of routing is the all routing which routes all Messages received on a given Input to a single Action.

Examples

The ALL routing. Routes all messages to a single action.

routing:
  all:
    action: example/hello_world
    output: greeting

Property based routing

routing:
  property: model
  default:
    action: example/sentiment_python_pattern
    output: sentiment
  rules:
    - match: Stanford Sentiment
      action: example/sentiment_stanford
      output: sentiment
    - match: Microsoft Cognitive Services
      action: example/sentiment_microsoft
      output: sentiment
    - match: IBM Watson
      action: example/sentiment_watson
      output: sentiment

Field based routing

routing:
  field: language
  default:
    action: example/sentiment_english
    output: sentiment
  rules:
    - match: es
      action: example/sentiment_spanish
      output: sentiment
    - match: de
      action: example/sentiment_german
      output: sentiment
    - match: it
      action: example/sentiment_italian
      output: sentiment
All Routing Fixed fields
Field Name Type Description
action string REQUIRED. The Resource Name of the action to route to.
runtime string OPTIONAL. The Resource Name of the action runtime to use. The default runtime is assumed if not provided.
output string REQUIRED. The name of the Output to route to.
Property Routing Fixed fields
Field Name Type Description
property string REQUIRED. The name of the property to apply routing rules to.
default All Routing OPTIONAL. The default routing rule used if no property matches are made.
rules [Routing Rule Object] REQUIRED. List of routing rules to apply to the specified property value.
Field Routing Fixed fields
Field Name Type Description
field string REQUIRED. The name of the Message field to apply routing rules to.
default All Routing OPTIONAL. The default routing rule used if no property matches are made.
rules [Routing Rule Object] REQUIRED. List of routing rules to apply to the specified field value.
Routing Rule Object

This object defines a routing rule to apply to a value that comes from either a Skill property or Message field value.

Fixed fields
Field Name Type Description
match string The value to match.
action string REQUIRED. The Resource Name of the action to route to.
runtime string OPTIONAL. The Resource Name of the action runtime to use. The default runtime is assumed if not provided.
output string REQUIRED. The name of the Output to route to.
Model Definition Object

This object declares a model that is created by or used by a Skill. A Model is considered to be any machine learning model, statistical model, or other model that is trained, versioned, and deployed by a Skill. A Model has the following declarations:

  1. Metadata describing the model such as functional goal and algorithm used.
  2. Model verification records to determine if a model is generating results that are consistant with its design (see PMML Model Verification)
  3. Model quality metrics based on the type of algorithm, function, and modeling assumptions.
Fixed fields
Field Name Type Description
name string REQUIRED. The resource name of the model. MUST be unique within a Skill definition.
title string REQUIRED. The human friendly display title of the model.
description string OPTIONAL. The rich text description of the model.
tags [Tag] OPTIONAL.
functionName string OPTIONAL. The functional description of the model use. For example, classification or regression. MUST be one of the defined Model Functions.
algorithmName string OPTIONAL. The name of the underlying algorithm. For example, Linear Regression or LSTM.
Dataset Reference Object

This object defines how a Skill declares a dependency on a dataset that must be mapped to it at runtime.

Fixed fields
Field Name Type Description
name string REQUIRED. The name of the reference.
title string OPTIONAL. A human friendly display name for the reference.
description string OPTIONAL. A description or documentation of the reference.
parameters [Parameter Object | Reference Object] OPTIONAL. The fields expected in the dataset.
requiresWrite boolean OPTIONAL. Set to true to indicate that the mapped dataset must support writes.

Dataset

This is the root document object of a OCS document that contains a Dataset definition.

Dataset Object Example

An exmaple Dataset is shown below:

camel: 1.0.0
name: default/movie_info
title: Movie Information
description: Contains basic information about thousands of movies.
fields:
  - name: movieId
    type: integer
  - name: movieTitle
    type: string
  - name: releaseDate
    type: string
  - name: imdbUrl
    type: string
    format: url
  - name: category
    type: string
connections:
  default:
    name: cortex/content
    type: managedContent
    query:
      - name: key
        value: movielens/ML100K-Movies.csv
      - name: contentType
        value: CSV
      - name: csv/delimiter
        value: '|'
  environments: 
    - environment: PROD
      name: example/moviesdb
      type: postgresql
      query:
        - name: query
          value: "SELECT * FROM movies"
Fixed Fields
Field Name Type Description
camel string REQUIRED. See OCS Specification Version.
name string REQUIRED. See Resource Name.
title string REQUIRED. See Resource Title.
description string OPTIONAL. See Resource Description.
tags [Tag Object] OPTIONAL. See Resource Tags.
fields [Parameters Object] REQUIRED. Defines the fields included in this dataset. The dataset MAY include additional fields not defined here.
connections Connection Configuration Object REQUIRED. Data store connection configuration for this dataset.
Connection Configuration Object

This object defines the data source connection configuration for a Dataset. A Dataset MUST have at least one connection per deployed environment or MUST provide a default connection configuration that is used in environments where no environment specific connection is defined.

Fixed fields
Field Name Type Description
default Connection Reference Object OPTIONAL. The default connection to use in environments where no environment specific connection configuration exists.
environments [Connnection Reference Object] OPTIONAL. A list of environment specific connections to use for this dataset. Each object in this list MUST use the environment property to qualify which environment it applies to. If more than one connection for an environment is defined, only the first connection in the list will be used.
Connection Reference Object

This object defines the reference to a data source connection for a specific environment. It includes information about the type of connection (e.g. mongo, s3, etc.) as well as the query parameter configuration.

Fixed fields
Field Name Type Description
name string REQUIRED. The fully qualified name of the connection to use.
type string REQUIRED. The name of the connection type.
query [Property Value]] OPTIONAL. Defines the default connection query parameters to use when reading from this connection. The definitions of these properties is provided by connection type object.
environment string OPTIONAL.

Message

This object defines the message format used for skill-to-skill and agent-to-skill communication.

Fixed fields
Field Name Type Description
payload Payload Object REQUIRED.
parameters [Parameter Object] OPTIONAL.
Payload Object

This object is used in Messages to carry a payload sent for processing. Payloads have four different styles demonstrated in examples below:

  1. Inline Objects
  2. Inline Records
  3. Inline DataFrame
  4. DataSet Reference (e.g. pass-by-reference)
Examples
Payload with Inline Object
{
  "payload": {"text": "Hello World"}
}
Payload with Inline Records
{
  "payload": 
    {
      "records": [
        {},
        {},
        {}
      ]
    }
}
Payload with Inline DataFrame
{
  "payload": 
    {
      "columns": []
      "values": []
    }
}
Payload with DataSet Reference
{
  "payload": {"$ref": "default/MyDataSet"}
}
Fixed fields
Field Name Type Description
records [any] OPTIONAL. An array of JSON objects representing dataset records. The fields of the these records MUST match the fields defined in the dataset.
columns [string] OPTIONAL. The column names that match the records containing in the values array. The column names MUST at least match the field names defined by this dataset.
values [any] OPTIONAL. A two dimensional array representing payload records and their values.
$ref Reference Object OPTIONAL. if present, MUST refer to a Dataset using its fully qualified name.