NOTE: The app is designed to be containerised and run through a flask server using the REST API.
You can run the parsers against a defined set of rules and messages in
standalone mode with the command ./parse.py rules logs. To get formatted JSON
output run ./parse.py rules logs | jq. Please note that this requires jq to
be correctly installed on your system.
This will currently generate the output
{
"result-0": {
"rule": "bf1d64ad-9694-4317-b7a6-55e9a4915437",
"pattern": [
"46975537-6a3c-444a-9784-ea3c1d7e25d3",
"451a6827-da96-466a-aa97-d73b5605a13f"
],
"tokens": {
"name": "Barry Robinson",
"job": "Lead Cyber Engineer",
"chalange": "ok"
}
},
"result-1": {
"rule": "bf1d64ad-9694-4317-b7a6-55e9a4915437",
"pattern": [
"0b1db7a5-0308-4bfb-87e3-b2a48cee6b88",
"f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537",
"ed395291-65d2-492c-afb8-d1b64599263c"
],
"tokens": {
"latitude": 52.4862,
"longetude": 1.8904,
"name": "Barry Robinson",
"ocupation": "Lead Cyber Engineer",
"expectation": "chalanging",
"tag": "v1.0"
}
}
}
Currently, MongoDB is used as a backend to drive the event engine through three
datastores. This is not yet fully realised, but does have working features.
- messages: Received messages in their raw state
- updates: Messages that have had data normalisation rules applied to them
- status: Active status determined by event rules
Note: A basic MongoDB query language test can now be accessed through the REST API.
Unit tests can be run with the command ./test_parsers.py
The file server.py creates a very basic REST API that can be accessed on
127.0.0.1:5000/parse using POST. With the server running, the command
curl -X POST -d 'message=name=Barry Robinson,job=Lead Cyber Engineer,expectation=Chalanging work,freeform=latitude 52.4862 longetude 1.8904' http://127.0.0.1:5000/parse | jq
will produce the output
{
"rule": "bf1d64ad-9694-4317-b7a6-55e9a4915437",
"pattern": [
"0b1db7a5-0308-4bfb-87e3-b2a48cee6b88",
"f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537",
"ed395291-65d2-492c-afb8-d1b64599263c"
],
"tokens": {
"latitude": 52.4862,
"longetude": 1.8904,
"name": "Barry Robinson",
"ocupation": "Lead Cyber Engineer",
"expectation": "Chalanging work"
}
}
The query
curl -X POST -d 'query=using messages query1 {"client_id": 3} aggregate ;' http://127.0.0.1:5000/query | jq
will result in the output
{
"query1": {
"metadata": {
".rule.b489a151-6e84-43ce-86d2-40e21791b26b": 3,
".pattern.19187f7a-e575-4729-a307-f7e050205bc6": 3,
".tokens.date.Aug 8 11:26:11": 1,
".tokens.machine.DESKTOP-TJR7EI0": 3,
".tokens.component.kernel driver": 1,
".tokens.action.ileagal access request": 1,
".tokens.file./opt/dev/device": 1,
".client_id.3": 3,
".tokens.date.Aug 8 12:31:25": 1,
".tokens.notification.ilegal loggin attempt": 1,
".tokens.port.223": 1,
".tokens.src_ip_addr.172.16.0.12": 1,
".tokens.target_ip_addr.192.168.20.31": 1,
".tokens.user.vn_21": 1,
".tokens.status.login attempt failed": 1,
".tokens.date.Aug 8 12:25:11": 1,
".tokens.component.user management": 1,
".tokens.access.access granted": 1,
".tokens.src_ip_addr.172.17.0.12": 1
},
".rule": "b489a151-6e84-43ce-86d2-40e21791b26b",
".pattern": "19187f7a-e575-4729-a307-f7e050205bc6",
".tokens.date": [
"Aug 8 11:26:11",
"Aug 8 12:31:25",
"Aug 8 12:25:11"
],
".tokens.machine": "DESKTOP-TJR7EI0",
".tokens.component": [
"kernel driver",
"user management"
],
".tokens.action": "ileagal access request",
".tokens.file": "/opt/dev/device",
".client_id": 3,
".tokens.notification": "ilegal loggin attempt",
".tokens.port": 223,
".tokens.src_ip_addr": [
"172.16.0.12",
"172.17.0.12"
],
".tokens.target_ip_addr": "192.168.20.31",
".tokens.user": "vn_21",
".tokens.status": "login attempt failed",
".tokens.access": "access granted"
}
}
Running the query
curl -X POST -d 'query=using messages query1 {"client_id":3,"tokens.action":{"$regex": "ileagal.*"}} ;' http://127.0.0.1:5000/query | jq
will return
{
"query1": {
"rule": "b489a151-6e84-43ce-86d2-40e21791b26b",
"pattern": "19187f7a-e575-4729-a307-f7e050205bc6",
"tokens": {
"date": "Aug 8 11:26:11",
"machine": "DESKTOP-TJR7EI0",
"component": "kernel driver",
"action": "ileagal access request",
"file": "/opt/dev/device"
},
"client_id": 3
}
}
The format of a query looks like this
using <collection> <query name> <query expression> constrain <constraint expression> aggregate ;
using <collection> <query name> <query expression> constrain <constraint expression> ;
using <collection> <query name> <query expression> aggregate ;
using <collection> <query name> <query expression> ;
- A query happens on a collection in the MongoDB database. The available
  collections are
  - messages
  - updates
  - status
- A query has a name, such as query_1, which (when this is done) can be used in separate queries.
- The format of the query expression is a JSON statement to search the collection for.
  - {"client_id": 3} will find all entries in the collection with a value for client_id of 3.
  - {"tokens.action":{"$regex": "ileagal.*"}} will find all entries where the path tokens.action satisfies the regex.
- A constraint statement is a JSON statement that tells the query which fields
  should be returned. The default is all.
  - {"client_id":1, "rule":1} would display only client_id and rule.
  - {"client_id":0} would not display client_id, but will display all other fields.
  - Adding any field with 0 excludes it from the results.
  - Adding any field with 1 includes it in the results and excludes all other fields.
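The four query forms listed above can be assembled mechanically. The following is a minimal sketch, not part of the project: the helper name build_query and the COLLECTIONS tuple are assumptions made for illustration, and the only thing taken from this document is the shape of the query string.

```python
import json

# Collections named in this README; the tuple itself is a hypothetical helper.
COLLECTIONS = ("messages", "updates", "status")


def build_query(collection, name, expression, constraint=None, aggregate=False):
    """Assemble a query string in the 'using ... ;' form described above.

    expression and constraint are plain dicts that get serialised to JSON.
    """
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    parts = ["using", collection, name, json.dumps(expression)]
    if constraint is not None:
        # Optional field selection, e.g. {"client_id": 1, "rule": 1}
        parts += ["constrain", json.dumps(constraint)]
    if aggregate:
        parts.append("aggregate")
    parts.append(";")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("messages", "query1", {"client_id": 3}, aggregate=True))
```

The resulting string would then be sent as the query form field to the /query endpoint, in the same way the curl examples do.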
The Docker files server.dockerfile and docker-compose.yml are supplied for
containerisation of the Flask server. To build and run the server
$ docker-compose build
$ docker-compose up
You should see
$ docker-compose up
Recreating server ... done
Attaching to server
server | * Serving Flask app 'server'
server | * Debug mode: off
server | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
server | * Running on all addresses (0.0.0.0)
server | * Running on http://127.0.0.1:5000
server | * Running on http://172.20.0.2:5000
server | Press CTRL+C to quit
You should now be able to access the server with
curl -X POST -d 'message=name=Barry Robinson,job=Lead Cyber Engineer,expectation=Chalanging work,freeform=latitude 52.4862 longetude 1.8904' http://127.0.0.1:5000/parse | jq
If you wish to add new rules to the containerised server you will need to
update or add to the existing YAML rules in rules (which is where the server
defaults to), then rebuild the container with docker-compose build.
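The same request can be made from Python instead of curl. This is a minimal sketch using only the standard library; the function names build_parse_request and parse_message are hypothetical, and it assumes the server reads a form field called message on /parse and replies with JSON, as the curl example suggests.

```python
import json
import urllib.parse
import urllib.request


def build_parse_request(message, base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a form-encoded POST for the /parse endpoint."""
    body = urllib.parse.urlencode({"message": message}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/parse", data=body, method="POST")


def parse_message(message, base_url="http://127.0.0.1:5000"):
    """Send the message to a running server and decode the JSON reply."""
    with urllib.request.urlopen(build_parse_request(message, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server (or container) to be running locally.
    print(parse_message(
        "name=Barry Robinson,job=Lead Cyber Engineer,"
        "expectation=Chalanging work,freeform=latitude 52.4862 longetude 1.8904"
    ))
```

Unlike the curl command, this sketch percent-encodes the message body, which a form-parsing server should decode back to the original text.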
The parser breaks down a parsing problem into recognition and extraction phases,
allowing one pattern to delegate extraction of tokens to another. Put simply, a
string that contains something like
name=Barry Robinson,status={"company":"Northrup Grumman","job":"Lead Cyber Engineer","opinion":"favourable"}
might be broken into 2 separate fragments based on format.
- name=Barry Robinson which is kv
- {"company":"Northrup Grumman","job":"Lead Cyber Engineer","opinion":"favourable"} which is json
This would happen by having a root pattern recognise the initial kv string
format, and route it to a kv parser. If the status={...} is the unique part,
a simple regex can be created to identify this message and route it to a
specific kv pattern that will extract the fragments and either map them to
tokens or route them to a new pattern for reparsing.
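The recognise-then-delegate idea can be illustrated in a few lines of Python. This is only a sketch of the concept, not the project's code: the ROOT regex and the helpers parse_kv, parse_json and parse are all hypothetical names invented for the example.

```python
import json
import re

# Hypothetical root pattern: recognises the message shape and captures
# the kv fragment and the json fragment separately.
ROOT = re.compile(r'^(?P<kv>name=[^,]+),status=(?P<json>\{.*\})$')


def parse_kv(fragment):
    """Extract a single key=value pair."""
    key, _, value = fragment.partition("=")
    return {key: value}


def parse_json(fragment):
    """Extract every top-level key of a JSON object."""
    return json.loads(fragment)


def parse(message):
    """Recognise the message, then delegate each fragment to its own parser."""
    match = ROOT.match(message)
    if match is None:
        return None  # not recognised by the root pattern
    tokens = {}
    tokens.update(parse_kv(match.group("kv")))
    tokens.update(parse_json(match.group("json")))
    return tokens


if __name__ == "__main__":
    print(parse('name=Barry Robinson,status={"company":"Northrup Grumman",'
                '"job":"Lead Cyber Engineer","opinion":"favourable"}'))
```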
The framework does this by ingesting rules that are a set of declarations and
patterns that it can then execute against a message string.
The framework can be executed by running ./framework.py <rules dir> <message>
where
rules dir: A directory containing YAML rule files
message: A message that the rules can decode
An example rule file is presented below
- id: bf1d64ad-9694-4317-b7a6-55e9a4915437
  name: test rule
  patterns:
    - id: f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537
      name: aws json
      type: regex
      partition: root
      pattern: '^aws: (?P<json>{.*})'
      triggers:
        - name: json
          format: regex
          partition: aws regex
    - id: eb4963b9-3fa5-4338-8a40-01a35fecc782
      name: aws regex
      type: regex
      partition: aws regex
      pattern: '^{"name":"(?P<name>[\w ]+)","satisfaction":"(?P<satisfaction>[\w ]+)"}'
      map:
        name: name
        satisfaction: value
With this rule the string 'aws: {"name":"Barry Robinson","satisfaction":"high"}' is
first matched by the pattern f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537 whose
trigger is set up to forward the text extracted by the json capture group to
a regex partition called aws regex.
Pattern eb4963b9-3fa5-4338-8a40-01a35fecc782 receives the text and parses
name and satisfaction, which, due to the map statement, are mapped to the tokens
name and value.
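To make the trigger and map flow concrete, here is a minimal sketch that hard-codes the two patterns from the example rule and chains them. The PARTITIONS table and the run function are hypothetical and are not the framework's real data structures; the braces in the regexes are escaped here, which matches the same text as the patterns in the rule file.

```python
import re

# Hypothetical in-memory form of the example rule: partition name -> patterns.
PARTITIONS = {
    "root": [{
        "id": "f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537",
        "regex": re.compile(r'^aws: (?P<json>\{.*\})'),
        "triggers": [{"name": "json", "partition": "aws regex"}],
        "map": {},
    }],
    "aws regex": [{
        "id": "eb4963b9-3fa5-4338-8a40-01a35fecc782",
        "regex": re.compile(
            r'^\{"name":"(?P<name>[\w ]+)","satisfaction":"(?P<satisfaction>[\w ]+)"\}'),
        "triggers": [],
        "map": {"name": "name", "satisfaction": "value"},
    }],
}


def run(text, partition="root"):
    """Return (pattern ids, tokens) for the first chain that matches, else None."""
    for pattern in PARTITIONS[partition]:
        match = pattern["regex"].match(text)
        if match is None:
            continue
        ids = [pattern["id"]]
        # map: capture group name -> token label
        tokens = {label: match.group(group) for group, label in pattern["map"].items()}
        chain_ok = True
        for trigger in pattern["triggers"]:
            # Forward the captured fragment to the partition named by the trigger.
            result = run(match.group(trigger["name"]), trigger["partition"])
            if result is None:
                chain_ok = False  # a failed sub pattern fails the whole chain
                break
            ids.extend(result[0])
            tokens.update(result[1])
        if chain_ok:
            return ids, tokens
    return None


if __name__ == "__main__":
    print(run('aws: {"name":"Barry Robinson","satisfaction":"high"}'))
```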
The above rule, for the message 'aws: {"name":"Barry Robinson","satisfaction":"high"}', yields the output
{"rule": "bf1d64ad-9694-4317-b7a6-55e9a4915437", "pattern": ["f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537", "eb4963b9-3fa5-4338-8a40-01a35fecc782"], "tokens": {"name": "Barry Robinson", "value": "high"}}
With jq formatting, the command
./framework.py resources/framework_two/ 'aws: {"name":"Barry Robinson","satisfaction":"high"}' | jq
will format to
{
"rule": "bf1d64ad-9694-4317-b7a6-55e9a4915437",
"pattern": [
"f28a4fcc-32dd-4c4b-afd6-4aca3e4f5537",
"eb4963b9-3fa5-4338-8a40-01a35fecc782"
],
"tokens": {
"name": "Barry Robinson",
"value": "high"
}
}
- Where a path is expressed in a structured pattern (JSON or KV) as part of pattern, the rule is telling the parser to ONLY match if those paths are present with the values supplied.
  - A value of .name: means match if the path is present.
  - A value of .name: Value means match only if the path exists and has the supplied value.
- Where a map directive is supplied, i.e. .path: label, the value at .path will be mapped to the label supplied if the path exists in the parsed message. Otherwise it will be ignored. Its absence will not prevent the message from matching a rule if a suitable rule exists.
- If any sub pattern fails, the entire chain of patterns fails.
  - The parser will try to find another match.
  - If no new match can be found, the parsing action will fail.
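The matching rules for structured patterns can be sketched as follows. This is an illustration under stated assumptions, not the parser's implementation: paths are treated as simple dot-separated keys into a nested dict, a pattern value of None stands for the bare .name: form, and the names resolve and match_structured are hypothetical.

```python
_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


def resolve(document, path):
    """Follow a dotted path such as '.status.job' through nested dicts."""
    node = document
    for key in path.strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def match_structured(document, pattern, mapping):
    """Return mapped tokens if every pattern path matches, else None.

    pattern: path -> expected value, or None meaning "path must be present".
    mapping: path -> token label; paths missing from the document are skipped.
    """
    for path, expected in pattern.items():
        actual = resolve(document, path)
        if actual is _MISSING:
            return None  # required path is absent
        if expected is not None and actual != expected:
            return None  # path present but value differs
    tokens = {}
    for path, label in mapping.items():
        value = resolve(document, path)
        if value is not _MISSING:
            tokens[label] = value
    return tokens


if __name__ == "__main__":
    doc = {"name": "Barry Robinson", "status": {"job": "Lead Cyber Engineer"}}
    print(match_structured(doc, {".name": None}, {".status.job": "occupation"}))
```

Note that an unmapped or missing path in the map section leaves the match intact, while a missing path in the pattern section rejects the message.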
For more details about the parser please check out the parser design document.
The general idea is that a partition should not contain a large number of
patterns. The basic architecture for recognising a message means that a top
level root pattern should do the main work of recognising the message type,
then forward the relevant fragments to partitions with some patterns in them to
extract values. The more patterns in a partition, the less efficient the parser
becomes because it has to test more patterns. It's that simple.
In general a rule should be considered a top level grouping of similar patterns
that deals with a single message type. In the world of SIEM and log messages,
that might be something like AWS Redshift log messages.
- The current implementation is undoubtedly not as efficient as it could be. This was built as a learning exercise to understand python3.
- The ability to express a SIEM event by using conditional logic. This would need to refer to token values to create new values. See Token based extensible message parser - CheckPoint.yaml
- Predefined tokens with types instead of auto formatting.
- Implement contains logic in json patterns, i.e. .development.languages[] contains c++ for the json {"development":{"languages":["java","c","python","c++"]}}
- The server should be able to handle batch requests for parsing to be useful.
- There should be better diagnostic tools through exceptions.
- There should be proper logging.
- The parser could be made significantly more efficient by using Hyperscan for Python to do regex matching and RE2 for extraction.
- Structured patterns are grossly inefficient.
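As a rough idea of what the proposed contains logic might look like, here is a small sketch. The feature does not exist yet, so everything here is an assumption: the function name contains, the treatment of a trailing [] as "this path holds a list", and the dot-separated path handling.

```python
def contains(document, path, wanted):
    """Return True if the list found at path (written with a trailing []) holds wanted."""
    if path.endswith("[]"):
        path = path[:-2]  # drop the list marker, keep the dotted path
    node = document
    for key in path.strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return False  # path does not exist in this document
        node = node[key]
    return isinstance(node, list) and wanted in node


if __name__ == "__main__":
    doc = {"development": {"languages": ["java", "c", "python", "c++"]}}
    print(contains(doc, ".development.languages[]", "c++"))
```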
- Message: Some structured or unstructured text needing tokenisation.
- Structured pattern: A pattern for a structured format such as JSON or KV expressed using JQ paths.
- Partition: A named segmentation of the parsing search space. Within the
  current architecture partitions are attached to specific formats such that a partition resolves to name:format, i.e. basic kv:kv.
- Rules file: A rule, written in YAML, that groups together patterns, along with their respective map and trigger directives to program the framework of the parser.
- Framework: The code responsible for marshalling parsing engines to accomplish the classification and extraction of token values from some string.
- Token: A named tag for a value expressed in a pattern's map criteria.
- Trigger: A directive that programs the parser to resubmit a fragment for
  further parsing by the specified engine, using the specified partition.
- Engine: A grouping of patterns attached to a specific format and partition, along with the functionality to extract fragments from that format and direct the parser to perform the correct actions, i.e. to map to tokens or trigger further parsing.
- SIEM: Security Information and Event Management