[tl;dr: Jump to the example](#STIX-C2-Indicator-Example)
> # Preamble: Prototyping Environment
This document is a Jupyter Notebook. First, we load some modules and create some utility classes...

In [322]:
%reload_ext yamlmagic
from IPython.display import display, Markdown
from pyld import jsonld

from rdflib import Graph
from RDFClosure import convert_graph, Options, DeductiveClosure
from RDFClosure.CombinedClosure import RDFS_Semantics
import json
from datetime import datetime

import jsonschema

class ISODateEncoder(json.JSONEncoder):
    # JSON encoder that serializes datetimes to ISO 8601 strings
    def default(self, obj, *args, **kwargs):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj, *args, **kwargs)

def graph_metrics(graph, label):
    display(Markdown("{}, the graph contains {} triples about {} subjects and {} objects using {} predicates.".format(
        label, len(graph), len(set(graph.subjects())), len(set(graph.objects())), len(set(graph.predicates()))
    )))
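
The encoder matters because the YAML cells below contain unquoted timestamps, which a YAML loader may hand back as `datetime` objects that the stock `json` module refuses to serialize. Here is a minimal, self-contained check of that behavior; the `record` dictionary is made up for illustration:

```python
import json
from datetime import datetime


class ISODateEncoder(json.JSONEncoder):
    # Same idea as the notebook's encoder: datetimes become ISO 8601 strings.
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


# Hypothetical record, shaped like a parsed `timestamp` entry.
record = {"timestamp": datetime(2014, 5, 8, 9, 0, 0)}

try:
    json.dumps(record)
    plain_failed = False
except TypeError:
    # The default encoder does not know how to serialize datetime objects.
    plain_failed = True

encoded = json.dumps(record, cls=ISODateEncoder)
print(plain_failed, encoded)
```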

> ...and customize our environment for displaying large text...

In [289]:
%%html
<style>#notebook-container { width: 97vw; }</style>

# STIX _C2 Indicator_ Example
The STIX [C2 Indicator](http://stixproject.github.io/documentation/idioms/c2-indicator/) describes the data of a single event of interest in the reporting of a cyber incident:
![](http://stixproject.github.io/documentation/idioms/c2-indicator/diagram.png)

# Goal

Instead of a rigid, _document_-based approach to modeling the incident, we propose a graph-based approach using JSON-LD, RDF and RDF Schema.

First, let's look at the canonical XML.

In [290]:
with open("indicator-for-c2-ip-address.xml") as f:
    canonical = f.read()
    display(Markdown("```xml\n{}\n```".format(canonical)))

```xml
<stix:STIX_Package
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:stixCommon="http://stix.mitre.org/common-1"
    xmlns:indicator="http://stix.mitre.org/Indicator-2"
    xmlns:ttp="http://stix.mitre.org/TTP-1"
    xmlns:cybox="http://cybox.mitre.org/cybox-2"
    xmlns:AddressObject="http://cybox.mitre.org/objects#AddressObject-2"
    xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1"
    xmlns:example="http://example.com/"
    xsi:schemaLocation="
    http://stix.mitre.org/stix-1 http://stix.mitre.org/XMLSchema/core/1.2/stix_core.xsd
    http://stix.mitre.org/Indicator-2 http://stix.mitre.org/XMLSchema/indicator/2.2/indicator.xsd
    http://stix.mitre.org/TTP-2 http://stix.mitre.org/XMLSchema/ttp/1.2/ttp.xsd
    http://stix.mitre.org/default_vocabularies-1 http://stix.mitre.org/XMLSchema/default_vocabularies/1.2.0/stix_default_vocabularies.xsd
    http://cybox.mitre.org/objects#AddressObject-2 http://cybox.mitre.org/XMLSchema/objects/Address/2.1/Address_Object.xsd"
    id="example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d"

    version="1.2"
    >
    <stix:Indicators>
        <stix:Indicator xsi:type="indicator:IndicatorType" id="example:Indicator-33fe3b22-0201-47cf-85d0-97c02164528d" timestamp="2014-05-08T09:00:00.000000Z">
            <indicator:Title>IP Address for known C2 channel</indicator:Title>
            <indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">IP Watchlist</indicator:Type>
            <indicator:Observable  id="example:Observable-1c798262-a4cd-434d-a958-884d6980c459">
                <cybox:Object id="example:Object-1980ce43-8e03-490b-863a-ea404d12242e">
                    <cybox:Properties xsi:type="AddressObject:AddressObjectType" category="ipv4-addr">
                        <AddressObject:Address_Value condition="Equals">10.0.0.0</AddressObject:Address_Value>
                    </cybox:Properties>
                </cybox:Object>
            </indicator:Observable>
            <indicator:Indicated_TTP>
                <stixCommon:TTP idref="example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9" />
            </indicator:Indicated_TTP>
        </stix:Indicator>
    </stix:Indicators>
    <stix:TTPs>
        <stix:TTP xsi:type="ttp:TTPType" id="example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9" timestamp="2014-05-08T09:00:00.000000Z">
            <ttp:Title>C2 Behavior</ttp:Title>
        </stix:TTP>
    </stix:TTPs>
</stix:STIX_Package>

```
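
Note how the Indicator points at its TTP through an `idref` attribute: the XML document already encodes a small graph, just implicitly. As a rough sketch of that, the snippet below parses a trimmed copy of the package (most elements and attributes removed by hand for this illustration) and checks that every `idref` resolves to an `id`:

```python
import xml.etree.ElementTree as ET

# Hand-trimmed copy of the canonical package; only the linking parts are kept.
SNIPPET = """
<stix:STIX_Package xmlns:stix="http://stix.mitre.org/stix-1"
                   xmlns:stixCommon="http://stix.mitre.org/common-1"
                   xmlns:indicator="http://stix.mitre.org/Indicator-2">
  <stix:Indicators>
    <stix:Indicator id="example:Indicator-33fe3b22-0201-47cf-85d0-97c02164528d">
      <indicator:Indicated_TTP>
        <stixCommon:TTP idref="example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9"/>
      </indicator:Indicated_TTP>
    </stix:Indicator>
  </stix:Indicators>
  <stix:TTPs>
    <stix:TTP id="example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9"/>
  </stix:TTPs>
</stix:STIX_Package>
""".strip()

root = ET.fromstring(SNIPPET)

# Collect declared identifiers and the references that point at them.
ids = {el.get("id") for el in root.iter() if el.get("id")}
idrefs = [el.get("idref") for el in root.iter() if el.get("idref")]

# A reference is dangling if no element in the document declares that id.
dangling = [ref for ref in idrefs if ref not in ids]
print(len(ids), len(idrefs), dangling)
```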

# First approach: Modeling the XML
As an initial approach, let's just try to make a document that is as close as possible to the XML.

The JSON-LD context (here expressed as YAML for readability) fulfills much of the same role as the `xmlns:` parts of the XML declaration.

- Where an un-prefixed key is used, it maps to a term from a suitable W3C or comparable vocabulary.
- All type-like things are conflated into `@type`, a shorthand for `rdf:type`.
- All identity-like things are conflated into `@id`, or the URI of a node.
- Since RDF has no general concept of "containment", each such relationship is captured as a "pun," a lowercase version of the contained element: if 
  > an element `X` contains an element `Y`
  
  we say 
  
  > a node of type `X` `y`'s a node of type `Y`
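
The pun rule is mechanical enough to write down. The helper names below (`pun_for`, `containment`) are invented for this sketch and are not part of any library; they only restate the convention in code:

```python
def pun_for(element):
    """Return the lowercase 'pun' property for a prefixed element name."""
    prefix, _, local = element.partition(":")
    return "{}:{}".format(prefix, local.lower())


def containment(parent_id, child_id, child_element):
    """Express 'parent contains child' as a (subject, predicate, object) triple."""
    return (parent_id, pun_for(child_element), child_id)


# Element names taken from the canonical XML above.
print(pun_for("indicator:Indicated_TTP"))
print(containment("example:pkg", "example:ttps", "stix:TTPs"))
```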

In [291]:
%%yaml context
"@context":
    xsi: http://www.w3.org/2001/XMLSchema-instance#
    stx: http://stix.mitre.org/
    stix: stx:stix-1#
    stixCommon: stx:common-1#
    indicator: stx:Indicator-2#
    ttp: stx:TTP-1#
    cbx: http://cybox.mitre.org/
    cybox: cbx:cybox-2#
    obj: cbx:objects#
    AddressObject: obj:AddressObject-2#
    stixVocabs: stx:default_vocabularies-1#
    example: http://example.com/
    id: "@id"
    prov: http://www.w3.org/ns/prov#
    owl: http://www.w3.org/2002/07/owl#
    rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs: http://www.w3.org/2000/01/rdf-schema#
    xsd: http://www.w3.org/2001/XMLSchema#
    title: rdfs:label
    skos: http://www.w3.org/2004/02/skos/core#
    prov:wasRevisionOf:
        "@type": "@id"
    category:
        "@id": skos:related
        "@type": "@id"
    type:
        "@id": "@type"
        "@type": "@id"
    timestamp:
        "@id": prov:generatedAtTime
        "@type": xsd:dateTime

With that out of the way, we can now use the namespace prefixes to speak precisely about the types of nodes.

In [292]:
%%yaml doc
id: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.2
prov:wasRevisionOf: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.1
type: stix:STIX_Package
stix:indicators:
    type: stix:Indicators
    stix:indicator:
        - type:
            - stix:Indicator
            - stixVocabs:IndicatorTypeVocab-1.1#IPWatchlist
          id: example:Indicator-33fe3b22-0201-47cf-85d0-97c02164528d
          timestamp: 2014-05-08T09:00:00.000000Z
          title: IP Address for known C2 channel
          indicator:observable:
              type: indicator:Observable
              id: example:Observable-1c798262-a4cd-434d-a958-884d6980c459
              cybox:object:
                  type: cybox:Object
                  id: example:Object-1980ce43-8e03-490b-863a-ea404d12242e
                  cybox:properties:
                     type:
                         - cybox:Properties
                         - AddressObject:AddressObjectType
                     category: ipv4-addr
                     AddressObject:address_value:
                         type: AddressObject:Address_Value
                         condition:equals: 10.0.0.0
          indicator:indicated_ttp:
              type: indicator:Indicated_TTP
              stixCommon:ttp: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
stix:ttps:
    type: stix:TTPs
    stix:ttp:
        type: stix:TTP
        id: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
        timestamp: 2014-05-08T09:00:00.000000Z
        title: C2 Behavior

The `expand` algorithm, a core JSON-LD feature that every implementation _must_ provide, replaces all of the prefixed names with absolute IRIs.

In [293]:
jsonld.expand(doc, dict(expandContext=context))

[{'@id': 'http://example.com/STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.2',
  '@type': ['http://stix.mitre.org/stix-1#STIX_Package'],
  'http://stix.mitre.org/stix-1#indicators': [{'@type': ['http://stix.mitre.org/stix-1#Indicators'],
    'http://stix.mitre.org/stix-1#indicator': [{'@id': 'http://example.com/Indicator-33fe3b22-0201-47cf-85d0-97c02164528d',
      '@type': ['http://stix.mitre.org/stix-1#Indicator',
       'http://stix.mitre.org/default_vocabularies-1#IndicatorTypeVocab-1.1#IPWatchlist'],
      'http://stix.mitre.org/Indicator-2#indicated_ttp': [{'@type': ['http://stix.mitre.org/Indicator-2#Indicated_TTP'],
        'http://stix.mitre.org/common-1#ttp': [{'@value': 'example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9'}]}],
      'http://stix.mitre.org/Indicator-2#observable': [{'@id': 'http://example.com/Observable-1c798262-a4cd-434d-a958-884d6980c459',
        '@type': ['http://stix.mitre.org/Indicator-2#Observable'],
        'http://cybox.mitre.org/cybox-2#object': 
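
To see where an IRI such as `http://stix.mitre.org/stix-1#Indicator` comes from, here is a toy version of prefix resolution. It is not the JSON-LD expansion algorithm (which also handles term definitions, type coercion, and much more); it only shows how chained prefixes like `stix: stx:stix-1#` unwind. The function name `expand_term` is an assumption of this sketch:

```python
# A few prefix definitions copied from the context above.
PREFIXES = {
    "stx": "http://stix.mitre.org/",
    "stix": "stx:stix-1#",
    "example": "http://example.com/",
}


def expand_term(term, prefixes, seen=()):
    """Resolve a prefixed name such as 'stix:Indicator' to an absolute IRI."""
    prefix, sep, suffix = term.partition(":")
    # Leave untouched: terms with no colon, absolute IRIs ("http://..."),
    # and prefixes this table does not define.
    if not sep or suffix.startswith("//") or prefix not in prefixes:
        return term
    if prefix in seen:
        raise ValueError("cyclic prefix definition: {}".format(prefix))
    # A prefix may itself be defined via another prefix, so resolve it first.
    return expand_term(prefixes[prefix], prefixes, seen + (prefix,)) + suffix


print(expand_term("stix:Indicator", PREFIXES))
```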

The `compact` algorithm goes the other way: given a document and a context, it rewrites the document using the terms and prefixes that the context defines.

In [294]:
jsonld.compact(doc, context, dict(expandContext=context))

{'@context': {'AddressObject': 'obj:AddressObject-2#',
  'category': {'@id': 'skos:related', '@type': '@id'},
  'cbx': 'http://cybox.mitre.org/',
  'cybox': 'cbx:cybox-2#',
  'example': 'http://example.com/',
  'id': '@id',
  'indicator': 'stx:Indicator-2#',
  'obj': 'cbx:objects#',
  'owl': 'http://www.w3.org/2002/07/owl#',
  'prov': 'http://www.w3.org/ns/prov#',
  'prov:wasRevisionOf': {'@type': '@id'},
  'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
  'skos': 'http://www.w3.org/2004/02/skos/core#',
  'stix': 'stx:stix-1#',
  'stixCommon': 'stx:common-1#',
  'stixVocabs': 'stx:default_vocabularies-1#',
  'stx': 'http://stix.mitre.org/',
  'timestamp': {'@id': 'prov:generatedAtTime', '@type': 'xsd:dateTime'},
  'title': 'rdfs:label',
  'ttp': 'stx:TTP-1#',
  'type': {'@id': '@type', '@type': '@id'},
  'xsd': 'http://www.w3.org/2001/XMLSchema#',
  'xsi': 'http://www.w3.org/2001/XMLSchema-instance#'},
 'id': 'example:STIXPackage
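
The reverse direction can be sketched in the same toy style. Real JSON-LD compaction has considerably more elaborate term-selection rules; the hypothetical `compact_iri` below simply picks the longest matching namespace from a table of already-resolved prefixes:

```python
# Already-resolved namespaces (a subset of the context above).
NAMESPACES = {
    "stx": "http://stix.mitre.org/",
    "stix": "http://stix.mitre.org/stix-1#",
    "example": "http://example.com/",
}


def compact_iri(iri, namespaces):
    """Shorten an IRI using the longest matching namespace, if there is one."""
    matches = [(ns, prefix) for prefix, ns in namespaces.items() if iri.startswith(ns)]
    if not matches:
        return iri
    # The longest namespace leaves the shortest local part.
    ns, prefix = max(matches, key=lambda match: len(match[0]))
    return "{}:{}".format(prefix, iri[len(ns):])


print(compact_iri("http://stix.mitre.org/stix-1#Indicator", NAMESPACES))
```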

# Second Approach: RDFS Inference

Assuming the `rdfs:range` and `rdfs:domain` inference capabilities, and a definition of the lowercase "puns" from above (such that, _if `X` `y`'s `Z`, then `Z` is a `Y`_), many `type` definitions can be omitted without any loss of information. Here's what the document looks like with the inferable types left out:

In [295]:
%%yaml doc_with_rdfs_inference
id: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.2
prov:wasRevisionOf: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.1
type: stix:STIX_Package
stix:indicators:
    stix:indicator:
        type: stixVocabs:IndicatorTypeVocab-1.1#IPWatchlist
        id: example:Indicator-33fe3b22-0201-47cf-85d0-97c02164528d
        timestamp: 2014-05-08T09:00:00.000000Z
        title: IP Address for known C2 channel
        indicator:observable:
            id: example:Observable-1c798262-a4cd-434d-a958-884d6980c459
            cybox:object:
                id: example:Object-1980ce43-8e03-490b-863a-ea404d12242e
                cybox:properties:
                    category: ipv4-addr
                    AddressObject:address_value:
                        condition:equals: 10.0.0.0
        indicator:indicated_ttp:
            id: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
stix:ttps:
    stix:ttp:
        id: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
        timestamp: 2014-05-08T09:00:00.000000Z
        title: C2 Behavior

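This removes a good deal of boilerplate: indeed, we are approaching the spareness of the original diagram. Recovering the omitted types from the structure of the document, however, takes some extra setup. First, we extend the context with shorthand for `rdfs:domain`, `rdfs:range` and `rdfs:isDefinedBy`.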

In [296]:
%%yaml rdfs_puns_context
"@context":
    domain:
        "@id": rdfs:domain
        "@type": "@id"
    range:
        "@id": rdfs:range
        "@type": "@id"
    defines:
        "@reverse": rdfs:isDefinedBy

We'll reuse the original context, so that documents can be defined in terms of both.

In [297]:
combined_context = {"@context": [
    context["@context"],
    rdfs_puns_context["@context"]
]}

Here are the actual puns, stored in an OWL Ontology.

In [298]:
%%yaml rdfs_puns
id: stix:stix-rdfs-puns
type: owl:Ontology
defines:
    - id: stix:indicators
      range: stix:Indicators
    - id: stix:indicator
      range: stix:Indicator
    - id: indicator:observable
      range: indicator:Observable
    - id: cybox:object
      range: cybox:Object 
    - id: cybox:properties
      range: cybox:Properties
    - id: indicator:indicated_ttp
      range: indicator:Indicated_TTP
    - id: stix:ttps
      range: stix:TTPs
    - id: stix:ttp
      range: stix:TTP
    - id: AddressObject:address_value
      range: AddressObject:Address_Value

We now have to move away from the JSON-LD linked data regime and into the metamodel layer of RDF Schema. First, we create a graph and populate it with our JSON-LD about the Indicator, as well as our rules for determining meaning from our puns. Note that this is a significant departure from XML, where schema and content can seldom be addressed in the same query.

In [308]:
graph = Graph()
for key, to_parse in {"puns": rdfs_puns, "indicator": doc_with_rdfs_inference}.items():
    expanded = jsonld.expand(to_parse, dict(expandContext=combined_context))
    cleaned = json.dumps(expanded, cls=ISODateEncoder)
    graph.parse(data=cleaned, format="json-ld")
    graph_metrics(graph, "After adding {}".format(key))

After adding puns, the graph contains 19 triples about 10 subjects and 11 objects using 3 predicates.

After adding indicator, the graph contains 37 triples about 19 subjects and 27 objects using 17 predicates.
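
It can help to see why a nested document turns into a flat set of triples at all. The sketch below is a toy flattening, not the JSON-LD to RDF algorithm that `rdflib` runs: nodes without an `id` get a generated blank-node label, and every key becomes a predicate. The function name and the tiny `doc` are assumptions for illustration:

```python
import itertools


def to_triples(node, counter=None):
    """Flatten a nested dict into (subject, [(s, p, o), ...])."""
    counter = counter if counter is not None else itertools.count()
    # Nodes without an explicit id become blank nodes: _:b0, _:b1, ...
    subject = node.get("id") or "_:b{}".format(next(counter))
    triples = []
    for key, value in node.items():
        if key == "id":
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                # Nested object: link to it, then emit its own triples.
                child, child_triples = to_triples(item, counter)
                triples.append((subject, key, child))
                triples.extend(child_triples)
            else:
                triples.append((subject, key, item))
    return subject, triples


# Hypothetical, heavily shortened version of the package document.
doc = {
    "id": "example:pkg",
    "type": "stix:STIX_Package",
    "stix:ttps": {"stix:ttp": {"id": "example:ttp", "title": "C2 Behavior"}},
}

root_subject, triples = to_triples(doc)
for triple in triples:
    print(triple)
```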

Now the real magic happens. By applying deductive inference, we expand the graph to include every triple that can be inferred under RDFS semantics.

In [309]:
DeductiveClosure(RDFS_Semantics).expand(graph)
graph_metrics(graph, "After adding inference")

After adding inference, the graph contains 116 triples about 43 subjects and 47 objects using 18 predicates.
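
Most of the new triples come from rules unrelated to our puns, but the one we care about is the `rdfs:range` rule: if a property `p` has range `C` and some triple uses `p`, then the object of that triple is a `C`. A minimal sketch of just that rule over plain tuples follows; it is a stand-in for illustration, not what `DeductiveClosure` does internally, and the blank-node label `_:ttps` is made up:

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range_rule(triples):
    """Return the input triples plus every rdf:type triple entailed by rdfs:range."""
    # Map each property to its declared range class.
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    # For every use of such a property, type the object with the range class.
    inferred = {(o, RDF_TYPE, ranges[p]) for s, p, o in triples if p in ranges}
    return set(triples) | inferred


triples = {
    ("stix:ttp", RDFS_RANGE, "stix:TTP"),                      # the pun definition
    ("_:ttps", "stix:ttp", "example:TTP-bc66360d"),            # the document
    ("example:TTP-bc66360d", "rdfs:label", "C2 Behavior"),
}

closure = apply_range_rule(triples)
print(("example:TTP-bc66360d", RDF_TYPE, "stix:TTP") in closure)
```

A single pass is enough here because the only triples added use `rdf:type`, which has no declared range in this toy data; a general reasoner repeats its rules until nothing new appears.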

Now the graph should contain all of the inferred types. We use the simple slicing notation of rdflib to verify our inferred triples about the `Indicator`.

In [318]:
from rdflib.term import URIRef as uri
sorted(list(graph[uri("http://example.com/Indicator-33fe3b22-0201-47cf-85d0-97c02164528d")::]))

[(rdflib.term.URIRef('http://stix.mitre.org/Indicator-2#indicated_ttp'),
  rdflib.term.URIRef('http://example.com/TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9')),
 (rdflib.term.URIRef('http://stix.mitre.org/Indicator-2#observable'),
  rdflib.term.URIRef('http://example.com/Observable-1c798262-a4cd-434d-a958-884d6980c459')),
 (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://stix.mitre.org/default_vocabularies-1#IndicatorTypeVocab-1.1#IPWatchlist')),
 (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://stix.mitre.org/stix-1#Indicator')),
 (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#Resource')),
 (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('IP Address for known C2 channel')),
 (rdflib.term.URIRef('http://www.w3.org/ns/prov#generatedAtTime'),
  rdflib.term.Literal(

# A Concrete Syntax
Combining JSON-LD with [JSON Schema](http://json-schema.org/), one can describe a new canonical format that can serve as the basis for API representations, storage and publishing. This comes without sacrificing expressive power or terseness, and without _requiring_ a user of the data to use or understand either the schema or the context. In this setting, we'd like to remove all idiosyncratic references to namespaces and be left with something that a user-focused RESTful API developer might create. This might require re-mapping some earlier constructs: `indicator` might be a nice key, but it can't also be a namespace prefix.

In [338]:
%%yaml canonical_doc
id: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.2
prov:wasRevisionOf: example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d#1.1
type: stix:STIX_Package
stix:indicators:
    stix:indicator:
        type: stixVocabs:IndicatorTypeVocab-1.1#IPWatchlist
        id: example:Indicator-33fe3b22-0201-47cf-85d0-97c02164528d
        timestamp: 2014-05-08T09:00:00.000000Z
        title: IP Address for known C2 channel
        indicator:observable:
            id: example:Observable-1c798262-a4cd-434d-a958-884d6980c459
            cybox:object:
                id: example:Object-1980ce43-8e03-490b-863a-ea404d12242e
                cybox:properties:
                    category: ipv4-addr
                    AddressObject:address_value:
                        condition:equals: 10.0.0.0
        indicator:indicated_ttp:
            id: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
stix:ttps:
    stix:ttp:
        id: example:TTP-bc66360d-a7d1-4d8c-ad1a-ea3a13d62da9
        timestamp: 2014-05-08T09:00:00.000000Z
        title: C2 Behavior
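
The document above still uses prefixed keys. One possible route to the namespace-free shape is a table of term aliases, which in JSON-LD would live in the context. The sketch below fakes that with plain dictionary renaming. The alias names are hypothetical, and `indicator` is deliberately left out because it is already taken as a prefix in our context:

```python
# Hypothetical friendly names mapped to the prefixed keys used so far.
ALIASES = {
    "indicators": "stix:indicators",
    "ttps": "stix:ttps",
    "ttp": "stix:ttp",
    "observable": "indicator:observable",
}


def apply_aliases(node, aliases):
    """Recursively rename friendly keys to their prefixed equivalents."""
    if isinstance(node, dict):
        return {aliases.get(key, key): apply_aliases(value, aliases)
                for key, value in node.items()}
    if isinstance(node, list):
        return [apply_aliases(item, aliases) for item in node]
    return node


# What an API consumer might send or receive.
friendly = {"id": "example:pkg", "ttps": {"ttp": {"id": "example:ttp"}}}
prefixed = apply_aliases(friendly, ALIASES)
print(prefixed)
```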

Here is an incomplete JSON Schema for a STIX package.

In [366]:
%%yaml package_schema
$schema: http://json-schema.org/schema#

title: A STIX Package
type: object
required:
    - type
    - id
properties:
    stix:indicators:
        $ref: "#/definitions/indicators"
    stix:ttps:
        $ref: "#/definitions/ttps"
    id:
        $ref: "#/definitions/uri"
    type:
        enum:
            - stix:STIX_Package
definitions:
    uri:
        type: string # regex for URIs is outside of scope!
    indicators:
        type: object
        required:
            - stix:indicator
        properties:
            stix:indicator:
                $ref: "#/definitions/indicator"
    indicator:
        type: object
        required:
            - id
        properties:
            id:
                $ref: "#/definitions/uri"
            indicator:observable:
                $ref: "#/definitions/observable"
    observable:
        type: object
        required:
            - id
        properties:
            id:
                $ref: "#/definitions/uri"
    ttps:
        type: object
        required:
            - stix:ttp
        properties:
            stix:ttp: 
                $ref: "#/definitions/ttp"
    ttp:
        type: object
        required:
            - id
        properties:
            id:
                $ref: "#/definitions/uri"

The schema can already validate the well-formed document above...

In [367]:
# validate() returns None on success and raises ValidationError otherwise,
# so no try/except is needed for the passing case.
jsonschema.validate(doc_with_rdfs_inference, package_schema)
print("OK!")

OK!


...but will fail against an arbitrary object:

In [368]:
try:
    jsonschema.validate({"foo": "bar"}, package_schema)
    print("OK!")
except jsonschema.ValidationError as err:
    # ValidationError carries a human-readable summary in .message
    print(err.message)

'type' is a required property
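
To make the role of `required`, `properties` and `$ref` in the schema above more concrete, here is a deliberately tiny checker that understands only those keywords plus `enum` and the `object`/`string` types. It is a teaching sketch under those assumptions and no substitute for `jsonschema`; the function names, the cut-down schema and the error format are all invented here:

```python
def resolve(schema, root):
    """Follow a local '#/definitions/...' reference, if the schema has one."""
    ref = schema.get("$ref")
    if ref is None:
        return schema
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


def check(instance, schema, root, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    schema = resolve(schema, root)
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append("{}: {!r} is not one of {}".format(path, instance, schema["enum"]))
    if schema.get("type") == "object":
        if not isinstance(instance, dict):
            return errors + ["{}: expected an object".format(path)]
        for key in schema.get("required", []):
            if key not in instance:
                errors.append("{}: '{}' is a required property".format(path, key))
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], sub, root, path + "." + key))
    elif schema.get("type") == "string" and not isinstance(instance, str):
        errors.append("{}: expected a string".format(path))
    return errors


# Cut-down version of the package schema above.
schema = {
    "type": "object",
    "required": ["type", "id"],
    "properties": {
        "id": {"$ref": "#/definitions/uri"},
        "type": {"enum": ["stix:STIX_Package"]},
        "stix:ttps": {"$ref": "#/definitions/ttps"},
    },
    "definitions": {
        "uri": {"type": "string"},
        "ttps": {"type": "object", "required": ["stix:ttp"],
                 "properties": {"stix:ttp": {"$ref": "#/definitions/ttp"}}},
        "ttp": {"type": "object", "required": ["id"],
                "properties": {"id": {"$ref": "#/definitions/uri"}}},
    },
}

good = {"id": "example:pkg", "type": "stix:STIX_Package",
        "stix:ttps": {"stix:ttp": {"id": "example:ttp"}}}
nested_bad = {"id": "example:pkg", "type": "stix:STIX_Package",
              "stix:ttps": {"stix:ttp": {}}}

print(check(good, schema, schema))
print(check({"foo": "bar"}, schema, schema))
print(check(nested_bad, schema, schema))
```

Unlike `jsonschema.validate`, which raises on the first problem it reports, this sketch collects every failure it finds, which can be handy when explaining a rejected document to an API client.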
