Data model
All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.
There are currently these collections/entity types:
collection | description | data type of `_id` keys |
---|---|---|
`ip` | IPv4 address | int (automatically converted to dotted decimal in Python code) |
`bgppref` | BGP prefix | string (format: `a.b.c.d/x`) |
`asn` | Autonomous system | int |
`ipblock` | Allocated/assigned IP block (from whois database) | string (format: `a.b.c.d - e.f.g.h`) |
`org` | Organization (from whois database) | string (format: `rir:netname`) |
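
The integer form of `ip` keys can be illustrated with Python's standard `ipaddress` module. This is only a sketch of the conversion itself; the helper names `ip_int_to_str` and `ip_str_to_int` are hypothetical and are not taken from the NERD code base, which may perform the conversion differently.

```python
import ipaddress


def ip_int_to_str(ip_int: int) -> str:
    """Convert an integer _id of an `ip` record to dotted decimal."""
    return str(ipaddress.IPv4Address(ip_int))


def ip_str_to_int(ip_str: str) -> int:
    """Convert a dotted-decimal IPv4 address to its integer form."""
    return int(ipaddress.IPv4Address(ip_str))


# Round trip for a sample address (illustrative value only).
sample_int = ip_str_to_int("192.168.1.1")
sample_str = ip_int_to_str(sample_int)
```

Storing the address as an integer keeps range queries on `_id` simple, since numeric order matches address order.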
Each record is stored as a dictionary (object) whose keys are called attributes.
The identification of the entity (e.g. IP address or AS number) is stored in the special key `_id` (used by MongoDB as the primary key).
Each entity record contains at least the following attributes:
key name | data type | description |
---|---|---|
`_id` | string/number | ID of the entity, i.e. IP address, AS number, etc. |
`ts_added` | ISODate (datetime in Python) | Time of record creation |
`ts_last_update` | ISODate (datetime in Python) | Time of last update of the record |
Note: All times are always stored in UTC.
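
As a rough illustration of the minimal record described above, the following sketch builds a dictionary with the three mandatory attributes and UTC timestamps. The function name `new_record` and the sample AS number are hypothetical; the actual record creation in NERD is not shown in this document.

```python
from datetime import datetime, timezone


def new_record(entity_id):
    """Build a minimal entity record with the mandatory attributes."""
    # Timestamps are taken in UTC, as required by the data model.
    now = datetime.now(timezone.utc)
    return {
        "_id": entity_id,
        "ts_added": now,
        "ts_last_update": now,
    }


# Example: a record for an AS number (64496 is a documentation-only ASN).
asn_record = new_record(64496)
```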
Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.
In the documentation, attribute names are sometimes written prefixed with the entity type they are used for and a colon (`:`), e.g. `ip:geo.ctry` or `asn:descr`.
Note: Formats of particular attributes are described in Attributes.
Many attributes have hierarchical or otherwise complex values. The following specifies the common data storage formats used in NERD:
Single value (string/number/bool/null) directly under the main key or a fixed hierarchy of keys and subkeys (the hierarchy is used only to group related keys together).
<key>: <value>
<key>: {
  <subkey>: <value>,
  <subkey>: {
    <subkey>: <value>,
    <subkey>: <value>,
  }
}
The attribute name is then composed by joining the key and subkeys with a dot, e.g. `<key>.<subkey>.<subkey>`.
Example:
"hostname": null,
"geo": {
  "ctry": "CZ",
  "city": "Prague"
}
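
The dot-notation naming above can be sketched as a small lookup helper that walks nested dictionaries. The name `get_attr` is hypothetical and only illustrates how an attribute name maps onto the stored structure; it is not a documented NERD function.

```python
def get_attr(record, name, default=None):
    """Resolve a dotted attribute name (e.g. "geo.ctry") in a record."""
    current = record
    for part in name.split("."):
        # Stop as soon as the path leaves the nested dictionaries.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# The example record from above.
record = {
    "hostname": None,
    "geo": {"ctry": "CZ", "city": "Prague"},
}

country = get_attr(record, "geo.ctry")
missing = get_attr(record, "geo.zip", default="n/a")
```

Note that a key that is present with a `null` value (such as `hostname` here) returns `None`, not the default.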
If a value needs a confidence to be assigned, it's stored as follows:
<key>: {"v": <value>, "c": <confidence>}
Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.
If several values of the attribute are possible, each with a different confidence, an array may be used:
<key>: [
  {"v": <value1>, "c": <confidence_of_value1>},
  {"v": <value2>, "c": <confidence_of_value2>},
  ...
]
Each particular attribute should always use the same variant (i.e. with or without the array, labeled as `list+conf` or `conf`) for all entities.
The `.v` and `.c` fields are not considered part of the attribute name; the name is composed only of the `<key>` (and possible subkeys).
Example: TBD
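
As a sketch of how a reader of the data might handle both confidence variants uniformly, the helper below turns either form into a list of (value, confidence) pairs, applying the default confidence of 1.0. The name `normalize_conf` and the sample values are hypothetical and chosen only for illustration.

```python
def normalize_conf(value):
    """Return (value, confidence) pairs for a `conf` or `list+conf` attribute."""
    # A single object is treated as a one-element list.
    items = value if isinstance(value, list) else [value]
    # A missing "c" field means full confidence (1.0).
    return [(item["v"], item.get("c", 1.0)) for item in items]


# `conf` variant without an explicit confidence (hypothetical values).
single = normalize_conf({"v": "CZ"})

# `list+conf` variant with two candidate values.
multiple = normalize_conf([
    {"v": "CZ", "c": 0.75},
    {"v": "SK", "c": 0.25},
])

# Pick the candidate with the highest confidence.
best_value, best_conf = max(multiple, key=lambda pair: pair[1])
```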
If a mapping of some dynamic keys to values is needed, it's stored as an array of objects, where one or more attributes of the object act as the key and the rest is/are the value(s). The names of both key and non-key attributes in subobjects are fixed and must be the same in all elements of the array. The array must contain at most one element with any given value of key attribute (or combined value of multiple key attributes).
<key>: [
  { <key-attr1>: <value>, <attr1>: <value>, <attr2>: <value> },
  { <key-attr1>: <value>, <attr1>: <value>, <attr2>: <value> },
  ...
]
Example:
"bl": [
  { "n": "spamhaus-pbl", "t": ISODate("2017-06-01T12:00"), "v": 1 },
  { "n": "blocklist_de-ssh", "t": ISODate("2017-06-01T12:00"), "v": 0 }
]
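
The uniqueness rule for key attributes can be sketched as a helper that indexes such an array by its key attribute and rejects duplicates. The name `index_by_key` is hypothetical, and treating `n` as the key attribute of `bl` is an assumption inferred from the example above; the authoritative key attributes are those listed in Attributes.

```python
from datetime import datetime, timezone


def index_by_key(entries, key_attr):
    """Map each key-attribute value to its object, enforcing uniqueness."""
    index = {}
    for entry in entries:
        key = entry[key_attr]
        if key in index:
            # The data model allows at most one element per key value.
            raise ValueError(f"duplicate key value: {key!r}")
        index[key] = entry
    return index


# Python equivalent of the "bl" example (ISODate becomes a UTC datetime).
ts = datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc)
bl = [
    {"n": "spamhaus-pbl", "t": ts, "v": 1},
    {"n": "blocklist_de-ssh", "t": ts, "v": 0},
]

# Assumption: "n" is the key attribute of "bl".
bl_index = index_by_key(bl, "n")
pbl_value = bl_index["spamhaus-pbl"]["v"]
```

For a mapping with several key attributes, the same idea applies with a tuple of their values as the dictionary key.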
In documentation, the `key`'s data type is marked as `list-of-objects`. Name(s) of key attribute(s) must be specified in the `key`'s documentation. Attributes of subobjects are documented separately, labeled as `<key>[].<attr>`. See documentation of `events` at Attributes for an example.
TBD
Some attributes, like `ip:tags`, use their own special format. Formats of particular attributes are described in Attributes.