
Data model

Václav Bartoš edited this page Oct 27, 2019 · 7 revisions

Organization of the database

All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.

There are currently these collections/entity types:

| collection | description | data type of _id keys |
|---|---|---|
| ip | IPv4 address | int (automatically converted to dotted decimal in Python code) |
| bgppref | BGP prefix | string (format a.b.c.d/x) |
| asn | Autonomous system | int |
| ipblock | Allocated/assigned IP block (from whois database) | string (format: a.b.c.d - e.f.g.h) |
| org | Organization (from whois database) | string (format: rir:netname) |
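
The conversion between the integer _id of the ip collection and dotted-decimal notation can be illustrated with Python's standard ipaddress module. This is only a sketch: the helper names ip_to_id and id_to_ip are hypothetical and are not taken from the NERD code base.

```python
import ipaddress


def ip_to_id(ip: str) -> int:
    """Convert a dotted-decimal IPv4 address to the integer form used as _id."""
    return int(ipaddress.IPv4Address(ip))


def id_to_ip(ip_id: int) -> str:
    """Convert an integer _id back to dotted-decimal notation."""
    return str(ipaddress.IPv4Address(ip_id))
```

For example, ip_to_id("192.0.2.1") gives 3221225985, and id_to_ip(3221225985) gives "192.0.2.1" again.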

Records and attributes

Each record is stored as a dictionary (object); its keys are called attributes.

The identifier of the entity (e.g. IP address or AS number) is stored under a special key _id, which MongoDB uses as the primary key.

Each entity record contains at least the following attributes:

| key name | data type | description |
|---|---|---|
| _id | string/number | ID of the entity, i.e. IP address, AS number, etc. |
| ts_added | ISODate (datetime in Python) | Time of record creation |
| ts_last_update | ISODate (datetime in Python) | Time of last update of the record |

Note: All times are always stored in UTC.
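
As an illustration, a new record carrying only the mandatory attributes could be built as follows. The function name new_record is hypothetical. The page does not say whether NERD uses timezone-aware or naive datetime objects, so the use of aware UTC datetimes here is an assumption.

```python
from datetime import datetime, timezone


def new_record(entity_id):
    """Build a minimal record with the three mandatory attributes."""
    # Assumption: timezone-aware UTC timestamps; NERD itself may use naive ones.
    now = datetime.now(timezone.utc)
    return {"_id": entity_id, "ts_added": now, "ts_last_update": now}
```

On creation both timestamps are equal; only ts_last_update would change on later updates.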

Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.

In the documentation, attribute names are sometimes written prefixed with the entity type they are used for and a colon (:), e.g. ip:geo.ctry or asn:descr.
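
A prefixed name of this kind can be split into its entity type and attribute name with a few lines of Python. The helper split_qualified is a hypothetical name used only for this sketch.

```python
def split_qualified(name: str):
    """Split 'ip:geo.ctry' into ('ip', 'geo.ctry').

    Returns (None, name) when no entity-type prefix is present.
    """
    etype, sep, attr = name.partition(":")
    if not sep:
        return None, name
    return etype, attr
```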

Generic formats of attribute values

Note: Formats of particular attributes are described in Attributes.

Many attributes have hierarchical or otherwise complex values. The following sections specify the common data storage formats used in NERD:

Simple values (plain)

Single value (string/number/bool/null) directly under the main key or a fixed hierarchy of keys and subkeys (the hierarchy is used only to group related keys together).

<key>: <value>
<key>: {
    <subkey>: <value>,
    <subkey>: {
        <subkey>: <value>,
        <subkey>: <value>,
    }
}

The attribute name is then composed by joining the key and subkeys with a dot, e.g. <key>.<subkey>.<subkey>.

Example:

"hostname": null,
"geo": {
    "ctry": "CZ",
    "city": "Prague"
}
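
Resolving a dot-notation attribute name against such a nested record can be sketched as below. The function get_attr is a hypothetical helper written for this page, not part of NERD.

```python
def get_attr(record: dict, name: str, default=None):
    """Look up a dot-notation attribute name in a nested record."""
    value = record
    for key in name.split("."):
        # Stop as soon as the path leaves the nested dictionaries.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


record = {"hostname": None, "geo": {"ctry": "CZ", "city": "Prague"}}
ctry = get_attr(record, "geo.ctry")  # "CZ"
```

MongoDB queries accept the same dotted form for fields of embedded documents, e.g. a filter such as {"geo.ctry": "CZ"}.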

Values with confidence (conf, list+conf)

If a value needs a confidence to be assigned, it's stored as follows:

<key>: {"v": <value>, "c": <confidence>}

Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.

If an attribute can have several values, each with a different confidence, an array may be used:

<key>: [
    {"v": <value1>, "c": <confidence_of_value1>},
    {"v": <value2>, "c": <confidence_of_value2>},
    ...
]

Each particular attribute should always use the same variant (i.e. with or without the array, labeled as list+conf or conf) for all entities.

The .v and .c fields are not considered part of the attribute name, which is composed only of the <key> (and possible subkeys).
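
Code reading such attributes may want to treat both variants uniformly and apply the default confidence of 1.0. A minimal sketch follows; the function name normalize_conf is hypothetical, and the sample values in the usage note are made up purely for illustration.

```python
def normalize_conf(value):
    """Return a list of (value, confidence) pairs for a conf or list+conf attribute."""
    # A single object (conf) is handled as a one-element list (list+conf).
    items = value if isinstance(value, list) else [value]
    # A missing "c" field means full confidence.
    return [(item["v"], item.get("c", 1.0)) for item in items]
```

With made-up data, normalize_conf({"v": "CZ"}) yields [("CZ", 1.0)], and a list such as [{"v": "a", "c": 0.7}, {"v": "b", "c": 0.3}] yields [("a", 0.7), ("b", 0.3)].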

Example: TBD

Mapping / list of objects (list-of-objects)

If a mapping of some dynamic keys to values is needed, it's stored as an array of objects, where one or more attributes of each object act as the key and the remaining attributes hold the value(s). The names of both key and non-key attributes in subobjects are fixed and must be the same in all elements of the array. The array must contain at most one element with any given value of the key attribute (or combined value of multiple key attributes).

<key>: [
  { <key-attr1>: <value>, <attr1>: <value>, <attr2>: <value> },
  { <key-attr1>: <value>, <attr1>: <value>, <attr2>: <value> },
  ...
]

Example:

"bl": [
  { "n": "spamhaus-pbl", "t": ISODate("2017-06-01T12:00"), "v": 1 },
  { "n": "blocklist_de-ssh", "t": ISODate("2017-06-01T12:00"), "v": 0 }
]

In the documentation, the key's data type is marked as list-of-objects. The name(s) of the key attribute(s) must be specified in the key's documentation. Attributes of subobjects are documented separately, labeled as <key>[].<attr>. See the documentation of events in Attributes for an example.
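
The uniqueness rule for key attributes means that writing to a list-of-objects attribute is effectively an "insert or replace" operation. The sketch below shows one way to do this on an in-memory record. The function upsert is hypothetical, and treating "n" as the key attribute of bl is an assumption made for the example; the authoritative key attributes are those given in Attributes.

```python
def upsert(items: list, new: dict, key_attrs=("n",)) -> None:
    """Insert `new` into `items`, replacing any element with the same key attribute(s)."""
    new_key = tuple(new[k] for k in key_attrs)
    for i, item in enumerate(items):
        if tuple(item[k] for k in key_attrs) == new_key:
            items[i] = new  # same key: replace in place, keeping the list unique
            return
    items.append(new)  # key not present yet: add a new element


bl = [{"n": "spamhaus-pbl", "v": 1}]
upsert(bl, {"n": "spamhaus-pbl", "v": 0})      # replaces the existing element
upsert(bl, {"n": "blocklist_de-ssh", "v": 0})  # appends a new element
```

A linear scan is used because the list itself is the storage format; for large lists a temporary dictionary keyed by the key attributes would be the more efficient choice.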

Tags

TBD

Special formats

Some attributes, like ip:tags, use their own special format. Formats of particular attributes are described in Attributes.