Install requried modules with `pip`.

```
python3 -m pip install --user --upgrade elasticsearch
```

In [1]:
from elasticsearch import Elasticsearch
import json

Connect to local instance

In [2]:
conn = Elasticsearch(hosts=["localhost:9200"])

Object creation does not verify that server is up. Validate it!

In [3]:
conn.ping()

True

Elastic does not have schema, but each field still needs to be mapped to concrete type. If no mapping exist, then it will autodetect. While it mostly does a good job, you sometimes still want to make sure that mappings are correct. Furthermore, elastic achieves fast full text search by splitting large text segments into *tokens*. However, this can cause all kinds of problems when building a dashboard on top of tokenized strings. Solution is to dual map, so there's also a non-tokenized version.

Not going too much into detail, many frontend tools rely on having a duplicate field for each string with `.keyword` suffix. If missing, many things may break. This is handled by defining this in a template.

Normally, that template would be managed by Logstash or Filebeat. This is fine and good, but understanding this concept is very important down the line. **At one point you will be searching for something, only to discover that your field does not have right mapping to achieve whatever you want to do.** Then you need to modify the template and re-index if needed.

In [4]:
DEFAULT_SETTINGS = {
    "index": {
        "number_of_shards": 3,
        "number_of_replicas": 0,
        "refresh_interval": "5s"
    }
}

DEFAULT_PROPERTIES = {
    "@timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis||date_time"
    },
    "@version": {
        "type": "keyword"
    },
    "ip": {
        "type": "ip"
    }
}

DEFAULT_MAPPINGS = {
    "dynamic_templates": [
        {
            "message_field": {
                "path_match": "message",
                "mapping": {
                    "norms": False,
                    "type": "text"
                },
                "match_mapping_type": "string"
            }
        },
        {
            "string_fields": {
                "mapping": {
                    "norms": False,
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                },
                "match_mapping_type": "string",
                "match": "*"
            }
        }
    ],
    "properties": DEFAULT_PROPERTIES
}

DEFAULT_PATTERNS = [
    "logstash-*",
    "events-*",
    "suricata-*"
]

In [5]:
resp = conn.indices.put_template("default", body={
    "order": 0,
    "version": 0,
    "index_patterns": DEFAULT_PATTERNS,
    "settings": DEFAULT_SETTINGS,
    "mappings": DEFAULT_MAPPINGS,
    "aliases": {}
})

Load EVE JSON file, parse it and build the bulk. **Modify file path to reflect your EVE location.**

In [6]:
PATH_MODIFY_ME="/tmp/suricata/eve.json"

In [7]:
# init empty bulk
BULK = []
# how many items to send at once
BULKSIZE = 10
# counter to ensure that we flush when bulk is full
COUNT = 0
# Open EVE
with open(PATH_MODIFY_ME, "r") as eve:
    for line in eve:
        # Parse EVE JSON
        data = json.loads(line)
        # add logstash-formatted timestamp key, needed by some apps
        data["@timestamp"] = data["timestamp"]
        # add more metadata to each event
        data["path"] = PATH_MODIFY_ME
        # append metadata, this tells elastic which index to send the message to
        BULK.append({
            "index": {
                # modify if you want to use different, note hardcoded date for simple example
                "_index": "suricata-2021.01.21"
            }
        })
        # add EVE message to bulk
        BULK.append(data)
        # check if bulk should be flushed
        if COUNT%BULKSIZE==0:
            # ship it!
            resp = conn.bulk(BULK)
            # give feedback, many tools are really bad at this step
            print("bulk flush done errors:", resp["errors"])
            # empty the bulk
            BULK = []
        # increment counter to decide if we should flush
        COUNT += 1
# flush the tail
resp = conn.bulk(BULK)
print("final bulk flush done errors:", resp["errors"])

bulk flush done errors: False
bulk flush done errors: False
bulk flush done errors: False
bulk flush done errors: False
bulk flush done errors: False
final bulk flush done errors: False


Then do `curl $IP:9200/suricata-*/_search` to verify that you have logs in elastic.

If you have many EVE folders, then `glob` module is your friend in finding them.

In [8]:
import glob

In [9]:
files = glob.glob("/tmp/testcases/*/eve.json")

In [10]:
files

['/tmp/testcases/00/eve.json', '/tmp/testcases/01/eve.json']