# Query for inconsistencies in MISP events

## Introduction

- UUID: **83e49ad8-6a8e-4317-b689-4154084dfe82**
- Started from [issue 22](https://github.com/MISP/misp-playbooks/issues/22)
- State: **Published** : demo version with **output**
- Purpose: This playbook checks for **inconsistencies** in the event **distribution**, the TLP designation and the PAP marking. The playbook also verifies if events contain sufficient **attributes**, objects, **tags** or galaxies. 
    - There are also checks for inconsistencies with the **workflow** tags, a taxonomy that is often used during *threat intelligence curation*.
    - The results are listed in the playbook and sent to Mattermost.
    - Note that MISP also has built-in checks, encoded in [https://github.com/MISP/MISP/blob/2.4/app/Lib/EventWarning/DefaultWarning.php](https://github.com/MISP/MISP/blob/2.4/app/Lib/EventWarning/DefaultWarning.php)
- Tags: [ "distribution", "data protection", "curation", "inconsistencies", "qa", "quality", "audit"]
- External resources: **Mattermost**
- Target audience: **CTI**

# Playbook

- **Query for inconsistencies in MISP events**
    - Introduction
- **Preparation**
    - PR:1 Initialise environment
    - PR:2 Load helper functions
    - PR:3 Set helper variables
- **Event quality check**
    - RE:1 Review events for inconsistencies
    - RE:2 Summary of findings
    - RE:3 Details of our findings
- **Closure**
    - EN:1 Create the summary of the playbook 
    - EN:2 Send a summary to Mattermost
    - EN:3 End of the playbook 
- External references
- Technical details

# Preparation

## PR:1 Initialise environment

This section **initialises the playbook environment** and loads the required Python libraries. 

The credentials for MISP (**API key**) and other services are loaded from the file `keys.py` in the directory **vault**. A [PyMISP](https://github.com/MISP/PyMISP) object is created to interact with MISP and the active MISP server is displayed. By printing out the server name you know that it's possible to connect to MISP. In case of a problem PyMISP will indicate the error with `PyMISPError: Unable to connect to MISP`.

The `keys.py` file should contain at least:

```
misp_url="<MISP URL>"                  # The URL to our MISP server
misp_key="<MISP API KEY>"              # The MISP API key
misp_verifycert=<True or False>        # Verify the TLS certificate (False ignores certificate errors)
mattermost_playbook_user="<MATTERMOST USER>"
mattermost_hook="<MATTERMOST WEBHOOK>"
```
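Because the playbook imports `keys.py` with a wildcard import, a missing variable only surfaces later as a `NameError`. The sketch below checks the keys module up front. The helper `missing_keys` and the stand-in module are hypothetical; only the variable names come from the listing above.

```python
import types

# Variable names the playbook reads from keys.py (taken from the listing above)
REQUIRED_KEYS = ["misp_url", "misp_key", "misp_verifycert",
                 "mattermost_playbook_user", "mattermost_hook"]


def missing_keys(module):
    """Return the required variables that the keys module does not define."""
    return [name for name in REQUIRED_KEYS if not hasattr(module, name)]


# Stand-in for `import keys`, so the sketch runs without a vault directory
keys = types.ModuleType("keys")
keys.misp_url = "https://misp.example.org/"
keys.misp_key = "<MISP API KEY>"
keys.misp_verifycert = True
print(missing_keys(keys))  # ['mattermost_playbook_user', 'mattermost_hook']
```

In the playbook itself you would pass the real module, for example `import keys; print(missing_keys(keys))`, after the `sys.path.insert` line.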

```python
# Initialise Python environment
import sys
import json
from datetime import datetime

import requests
import urllib3
from prettytable import PrettyTable, MARKDOWN
from pymisp import PyMISP

# Load the credentials (misp_url, misp_key, misp_verifycert, mattermost_*) from the vault
sys.path.insert(0, "../vault/")
from keys import *

# Only silence the certificate warning when verification is explicitly disabled
if misp_verifycert is False:
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
print("The \033[92mPython libraries\033[90m are loaded and the \033[92mcredentials\033[90m are read from the keys file.")

# Create the PyMISP object
misp = PyMISP(misp_url, misp_key, misp_verifycert)
print("I will use the MISP server \033[92m{}\033[90m for this playbook.\n\n".format(misp_url))
```

```
The version of PyMISP recommended by the MISP instance (2.4.176) is newer than the one you're using now (2.4.173). Please upgrade PyMISP.

The Python libraries are loaded and the credentials are read from the keys file.
I will use the MISP server https://misp.demo.cudeso.be/ for this playbook.
```

## PR:2 Load helper functions

The next cell contains **helper functions** that are used in this playbook. 

Instead of distributing helper functions as separate Python files, this playbook includes all the required code in one code cell. This makes playbooks easier to port between instances. The downside is that functions defined in this playbook need to be defined again in other playbooks, which is not optimal for code re-use. For this iteration of playbooks we chose to include the code in the playbook (more portability), but you can easily create one "helper" file that contains all the helper code and import that file in each playbook (for example by adding `from helpers import *` to the previous cell). Note that the graphical workflow image is included as an external image. A missing image does not affect the rest of the playbook.

```python
# These helpers update the global variables playbook_checks, playbook_results and
# insufficient_count, which are defined in the next cell (PR:3).

def add_event_alert(playbook_results, event_id, event_info, problem):
    '''
    Add an alert to the event alert list, ignoring duplicate alerts for the same event
    '''
    if event_id in playbook_results:
        if problem not in playbook_results[event_id]["alerts"]:
            playbook_results[event_id]["alerts"].append(problem)
    else:
        playbook_results[event_id] = {}
        playbook_results[event_id]["info"] = event_info
        playbook_results[event_id]["alerts"] = [problem]
    return playbook_results


def check_sufficient_in_event(event, event_key, playbook_key, do_check):
    '''
    Check if there are sufficient attributes, objects, galaxies, ...
    An alert is raised when the event has no more than playbook_checks[playbook_key]["qt"] elements
    '''
    if do_check:
        if len(event.get(event_key, [])) <= playbook_checks[playbook_key]["qt"]:
            insufficient_count[playbook_key] += 1
            add_event_alert(playbook_results, event["id"], event["info"], playbook_checks[playbook_key]["alert"])


def check_workflow_tag(event, tag, do_check, do_check_local, tag_name, playbook_key, published_state):
    '''
    Checks for the workflow tag, primarily useful for threat intelligence curation.
    published_state is the publish state that contradicts the workflow tag and triggers the alert.
    '''
    if do_check and tag["name"] == tag_name:
        # Workflow tags should be local (not synchronised)
        if do_check_local and tag["local"] == 0:
            insufficient_count["workflow_tag"] += 1
            add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["workflow_tag_local"]["alert"])
        if event["published"] == published_state:
            insufficient_count["workflow_tag"] += 1
            add_event_alert(playbook_results, event["id"], event["info"], playbook_checks[playbook_key]["alert"])


def check_marking_distribution(event, tag, tlp_tag, playbook_key):
    '''
    Check if the TLP or PAP markings correspond with the event distribution settings
    '''
    distribution = int(event["distribution"])
    if tag["name"] == tlp_tag and distribution not in playbook_checks[playbook_key]["distribution"]:
        insufficient_count["distribution"] += 1
        add_event_alert(playbook_results, event["id"], event["info"], playbook_checks[playbook_key]["alert"].format(distribution_labels[distribution]))
```

## PR:3 Set helper variables

This cell contains **helper variables** that are used in this playbook. Their usage is explained in the next steps of the playbook.

- `playbook_results` : the results of the playbook
- `insufficient_count` : numeric results of the playbook
- `playbook_checks` : the checks that are executed, with their alert message
- `distribution_labels` : the list of distribution labels
- `result_limit` : maximum number of results to include in one result page when querying MISP
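`playbook_results` is keyed by event ID, and each entry holds the event title (`info`) and a list of unique alerts. The sketch below is a compact rewrite of `add_event_alert` using `dict.setdefault`, assumed to behave the same as the helper in PR:2; the event data comes from the demo output further down.

```python
def add_event_alert(playbook_results, event_id, event_info, problem):
    """Same merging rule as the playbook helper: one entry per event, no duplicate alerts."""
    entry = playbook_results.setdefault(event_id, {"info": event_info, "alerts": []})
    if problem not in entry["alerts"]:
        entry["alerts"].append(problem)
    return playbook_results


playbook_results = {}
add_event_alert(playbook_results, "1873", "Stantinko investigation", "Insufficient galaxies")
# The same alert twice is stored only once
add_event_alert(playbook_results, "1873", "Stantinko investigation", "Insufficient galaxies")
add_event_alert(playbook_results, "1873", "Stantinko investigation",
                "The event does not have a valid TLP designation.")
print(playbook_results)
```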

```python
playbook_checks = {"attributes": {"do_check": True, "qt": 5, "alert": "Insufficient attributes"},
                   "objects": {"do_check": False, "qt": 0, "alert": "Insufficient objects"},
                   "tags": {"do_check": True, "qt": 0, "alert": "Insufficient tags"},
                   "galaxies": {"do_check": True, "qt": 0, "alert": "Insufficient galaxies"},
                   "tlp_white": {"distribution": [3], "alert": "The event is tagged as tlp:white, yet the distribution is not set to all."},
                   "tlp_green": {"distribution": [1, 2, 3], "alert": "The event is tagged as tlp:green, yet the distribution is not set to community, connected communities or all."},
                   "tlp_amber": {"distribution": [0, 1, 2, 4], "alert": "The event is tagged as tlp:amber, yet the distribution is set to {}, be aware of potential information leakage."},
                   "tlp_red": {"distribution": [0, 1, 2, 4], "alert": "The event is tagged as tlp:red, yet the distribution is set to {}, be aware of potential information leakage."},
                   "valid_tlps": {"tlps": ["tlp:white", "tlp:green", "tlp:amber", "tlp:red", "tlp:ex:chr", "tlp:clear", "tlp:amber+strict"], "alert": "Unknown TLP tag, please refer to the TLP taxonomy as to what is valid, otherwise filtering rules created by your partners may miss your intent."},
                   "valid_tlps_required": {"alert": "The event does not have a valid TLP designation."},
                   "valid_tlps_global": {"alert": "The TLP tag is a 'local' tag. It needs to be a 'global' tag to be effective and to synchronise to your partners."},
                   "pap_white": {"distribution": [3], "alert": "The event is tagged with PAP:WHITE, yet the distribution is not set to all."},
                   "pap_red": {"distribution": [0, 1, 2, 4], "alert": "The event is tagged with PAP:RED, yet the distribution is set to {}, be aware that the information can be used unintentionally."},
                   "workflow_tag_complete": {"do_check": True, "alert": "The workflow state is set to complete, yet the event is not published."},
                   "workflow_tag_incomplete": {"do_check": True, "alert": "The workflow state is set to incomplete, yet the event is published."},
                   "workflow_tag_rejected": {"do_check": True, "alert": "The workflow state is set to rejected, yet the event is published."},
                   "workflow_tag_local": {"do_check": True, "alert": "The workflow tag is not a 'local' tag. In general workflow tags should not be shared outside your organisation."},
                   "workflow_tag_todo": {"do_check": True, "alert": "There are remaining workflow todo tasks, yet the event is already published."},
                   "required_tags": {"do_check": True, "tags": ["workflow:state"], "alert": "One or more of the required tags is missing."},
                   }
distribution_labels = {0: "Your organisation only", 1: "This community only", 2: "Connected communities", 3: "All communities", 4: "Sharing groups"}
result_limit = 100

playbook_results = {}
insufficient_count = {"attributes": 0, "objects": 0, "tags": 0, "galaxies": 0, "distribution": 0, "valid_tlps": 0,
                      "valid_tlps_required": 0, "valid_tlps_global": 0, "workflow_tag": 0, "required_tags": 0}
```

# Event quality check

The next cell reviews the MISP events for various inconsistencies. To avoid memory issues, the playbook does not use `pythonify=True`. The checks include:

- Verify there are sufficient **attributes**, **objects**, **tags** and **galaxies**;
- Review whether the Traffic Light Protocol (**TLP**) and Permissible Actions Protocol (**PAP**) markings contradict the event **distribution** setting;
- Check that the **workflow** tags make sense with the event publish state.
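The marking check reduces to a lookup: each TLP or PAP tag allows a set of distribution levels, and any other level is a conflict. This is a minimal standalone sketch of that rule; the table `ALLOWED` mirrors the `distribution` lists in `playbook_checks`, while the function name `marking_conflicts` is hypothetical.

```python
# Distribution levels of a MISP event (same labels as distribution_labels in the playbook)
DISTRIBUTION_LABELS = {0: "Your organisation only", 1: "This community only",
                       2: "Connected communities", 3: "All communities", 4: "Sharing groups"}

# Allowed distribution levels per marking, mirroring the playbook_checks table
ALLOWED = {
    "tlp:white": {3},
    "tlp:green": {1, 2, 3},
    "tlp:amber": {0, 1, 2, 4},
    "tlp:red": {0, 1, 2, 4},
    "PAP:WHITE": {3},
    "PAP:RED": {0, 1, 2, 4},
}


def marking_conflicts(tag_names, distribution):
    """Return the markings that contradict the event distribution level."""
    distribution = int(distribution)  # the JSON value may be a string, as in the playbook
    return [name for name in tag_names
            if name in ALLOWED and distribution not in ALLOWED[name]]


# An event shared with all communities but marked amber and PAP:RED
for name in marking_conflicts(["tlp:amber", "PAP:RED", "misp:tool=x"], "3"):
    print("{} conflicts with distribution '{}'".format(name, DISTRIBUTION_LABELS[3]))
```

Tags outside the table (here `misp:tool=x`) are ignored, which matches the playbook where only the listed markings are compared against the distribution.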

You can disable checks by setting their corresponding `do_check` value to False. 
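The workflow checks pair each `workflow:state` tag with the publish state that contradicts it, and a check only runs when its `do_check` flag is set. The sketch below restates that rule as one function; `WORKFLOW_RULES` and `workflow_alerts` are hypothetical names, and the `checks` dictionary uses shortened alert texts for illustration.

```python
# Hypothetical, simplified version of the playbook's workflow-state check.
# Maps a workflow:state tag to (publish state that triggers the alert, check key).
WORKFLOW_RULES = {
    'workflow:state="complete"': (False, "workflow_tag_complete"),
    'workflow:state="incomplete"': (True, "workflow_tag_incomplete"),
    'workflow:state="rejected"': (True, "workflow_tag_rejected"),
}


def workflow_alerts(tag_names, published, checks):
    """Return the alerts for workflow tags that contradict the publish state."""
    alerts = []
    for name in tag_names:
        rule = WORKFLOW_RULES.get(name)
        if rule is None:
            continue  # not a workflow:state tag
        alerting_state, key = rule
        # Checks whose do_check is False are skipped entirely
        if checks[key]["do_check"] and published == alerting_state:
            alerts.append(checks[key]["alert"])
    return alerts


checks = {
    "workflow_tag_complete": {"do_check": True, "alert": "complete but not published"},
    "workflow_tag_incomplete": {"do_check": False, "alert": "incomplete but published"},
    "workflow_tag_rejected": {"do_check": True, "alert": "rejected but published"},
}
print(workflow_alerts(['workflow:state="incomplete"'], True, checks))  # [] because the check is disabled
print(workflow_alerts(['workflow:state="rejected"'], True, checks))    # ['rejected but published']
```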

You can also add filters to the MISP event search by changing the line starting with `event_list = misp.search("events" ...`. One of the filters is `org_list`, which limits the search to one or more **organisations**. Set it to False to include all organisations. Other filters include:
- `published` : published or not-published events
- `date_from` and `date_to` : filter on event dates
- `tags` : specify a list of tags
- and many more, basically all those available to the MISP [REST API search](https://www.misp-project.org/openapi/#tag/Events/operation/restSearchEvents)
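The paging loop in the next cell can be factored into a generator that takes the search function and any extra filters. This is a sketch under the assumption that the function behaves like `misp.search("events", ...)` without `pythonify`, returning a list of `{"Event": {...}}` dictionaries; `iter_events` and `fake_search` are hypothetical names, the latter an offline stand-in so the example runs without a MISP server.

```python
def iter_events(search, page_size=100, **filters):
    """Yield event dicts page by page until the search returns an empty or short page."""
    page = 1
    while True:
        results = search("events", limit=page_size, page=page, **filters)
        if not results:
            break
        for entry in results:
            if "Event" in entry:
                yield entry["Event"]
        if len(results) < page_size:
            break  # assumes a short page is the last one; saves one request
        page += 1


# With PyMISP the call could look like this (the filter values are examples):
# for event in iter_events(misp.search, published=True, date_from="2023-01-01", tags=["tlp:amber"]):
#     ...

def fake_search(controller, limit, page, **filters):
    """Offline stand-in for misp.search that pages over five fake events."""
    events = [{"Event": {"id": str(i)}} for i in range(1, 6)]
    return events[(page - 1) * limit: page * limit]


print([event["id"] for event in iter_events(fake_search, page_size=2)])  # ['1', '2', '3', '4', '5']
```

A generator keeps only one page in memory at a time, which fits the playbook's choice to avoid `pythonify=True` for memory reasons.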

## RE:1 Review events for inconsistencies

```python
# Only consider events created by the below organisations. Set to False to include events from all orgs
org_list = [1, 2, 14, 16]
#org_list = False

print("Searching for events ...")
current_page = 1
processed_events = 0

while True:
    # Don't use pythonify=True to limit memory usage. Passing None disables the organisation filter
    event_list = misp.search("events", org=org_list if org_list else None, limit=result_limit, page=current_page)
    len_event_list = len(event_list)
    if len_event_list == 0:
        break

    print(" Page {} with {} results.".format(current_page, len_event_list))
    for el in event_list:
        # Skip entries without an event instead of abandoning the rest of the page
        if not el.get("Event", False):
            continue
        event = el["Event"]

        # Sufficient elements in the events?
        check_sufficient_in_event(event, "Attribute", "attributes", playbook_checks["attributes"]["do_check"])
        check_sufficient_in_event(event, "Object", "objects", playbook_checks["objects"]["do_check"])
        check_sufficient_in_event(event, "Galaxy", "galaxies", playbook_checks["galaxies"]["do_check"])

        tlp_present = False
        qt_tags = 0
        tag_list = []
        for tag in event.get("Tag", []):
            # Skip tag count if it refers to a galaxy
            if tag.get("is_galaxy") is not True:
                tag_list.append(tag["name"].strip())
                qt_tags += 1

            # Workflow tags for threat intelligence curation
            check_workflow_tag(event, tag, playbook_checks["workflow_tag_complete"]["do_check"], playbook_checks["workflow_tag_local"]["do_check"], "workflow:state=\"complete\"", "workflow_tag_complete", False)
            check_workflow_tag(event, tag, playbook_checks["workflow_tag_incomplete"]["do_check"], playbook_checks["workflow_tag_local"]["do_check"], "workflow:state=\"incomplete\"", "workflow_tag_incomplete", True)
            check_workflow_tag(event, tag, playbook_checks["workflow_tag_rejected"]["do_check"], playbook_checks["workflow_tag_local"]["do_check"], "workflow:state=\"rejected\"", "workflow_tag_rejected", True)

            # Check for published events with remaining todo tags
            if playbook_checks["workflow_tag_todo"]["do_check"] and "workflow:todo=" in tag["name"]:
                if event["published"] is True:
                    insufficient_count["workflow_tag"] += 1
                    add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["workflow_tag_todo"]["alert"])

            # Review the distribution settings while we process the tags
            check_marking_distribution(event, tag, "PAP:WHITE", "pap_white")
            check_marking_distribution(event, tag, "PAP:RED", "pap_red")
            check_marking_distribution(event, tag, "tlp:white", "tlp_white")
            check_marking_distribution(event, tag, "tlp:green", "tlp_green")
            check_marking_distribution(event, tag, "tlp:amber", "tlp_amber")
            check_marking_distribution(event, tag, "tlp:red", "tlp_red")

            # Check if the tag is from the TLP taxonomy ("tlp:" also matches "threat tlp:")
            tag_name = tag["name"].lower().strip()
            if "tlp:" in tag_name or "tlp=" in tag_name:
                if tag["name"] not in playbook_checks["valid_tlps"]["tlps"]:
                    insufficient_count["valid_tlps"] += 1
                    add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["valid_tlps"]["alert"])
                else:
                    tlp_present = True
                    if tag["local"] == 1:
                        insufficient_count["valid_tlps_global"] += 1
                        add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["valid_tlps_global"]["alert"])

        # Finished processing the tags. Summarise results for tags
        if not tlp_present:
            insufficient_count["valid_tlps_required"] += 1
            add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["valid_tlps_required"]["alert"])

        if playbook_checks["required_tags"]["do_check"]:
            # A required tag matches when it is a substring of one of the event tags
            required_tags = all(any(tag in element for element in tag_list) for tag in playbook_checks["required_tags"]["tags"])
            if not required_tags:
                insufficient_count["required_tags"] += 1
                add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["required_tags"]["alert"])

        if playbook_checks["tags"]["do_check"] and qt_tags <= playbook_checks["tags"]["qt"]:
            insufficient_count["tags"] += 1
            add_event_alert(playbook_results, event["id"], event["info"], playbook_checks["tags"]["alert"])

        processed_events += 1
    current_page += 1

print("Finished searching. Processed \033[92m{}\033[90m events.".format(processed_events))
```

```
Searching for events ...
 Page 1 with 18 results.
Finished searching. Processed 18 events.
```


## RE:2 Summary of findings

Print a short summary of the findings. The next step (RE:3) lists the events that require improvement in detail.

```python
print("\nSummary of the playbook findings:")
print(" \033[91m{}\033[90m alerts for attributes".format(insufficient_count["attributes"]))
print(" \033[91m{}\033[90m alerts for objects".format(insufficient_count["objects"]))
print(" \033[91m{}\033[90m alerts for tags".format(insufficient_count["tags"]))
print(" \033[91m{}\033[90m alerts for galaxies".format(insufficient_count["galaxies"]))
print(" \033[91m{}\033[90m alerts for distribution settings".format(insufficient_count["distribution"]))
print(" \033[91m{}\033[90m alerts for valid_tlps".format(insufficient_count["valid_tlps"]))
print(" \033[91m{}\033[90m alerts for valid_tlps_required".format(insufficient_count["valid_tlps_required"]))
print(" \033[91m{}\033[90m alerts for workflow_tag".format(insufficient_count["workflow_tag"]))
print(" \033[91m{}\033[90m alerts for required_tags".format(insufficient_count["required_tags"]))

# A JSON dump of the results can be printed with the below command
#print(json.dumps(playbook_results, indent=4))
```

```
Summary of the playbook findings:
 1 alerts for attributes
 0 alerts for objects
 0 alerts for tags
 3 alerts for galaxies
 1 alerts for distribution settings
 0 alerts for valid_tlps
 1 alerts for valid_tlps_required
 1 alerts for workflow_tag
 5 alerts for required_tags
```


## RE:3 Details of our findings

### Summarised by alert

Print the findings summarised per **alert** type.
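Summarising per alert means inverting `playbook_results`, which maps event IDs to alerts, into a mapping from each alert to the events that raised it. The sketch below shows that inversion on its own, without the table formatting; `alerts_to_events` is a hypothetical name and the sample data comes from the demo output.

```python
def alerts_to_events(playbook_results):
    """Invert {event_id: {"info": ..., "alerts": [...]}} into {alert: ["<id> - <info>", ...]}."""
    alert_list = {}
    for event_id, event in playbook_results.items():
        for alert in event["alerts"]:
            alert_list.setdefault(alert, []).append("{} - {}".format(event_id, event["info"]))
    return alert_list


results = {
    "1873": {"info": "Stantinko investigation", "alerts": ["Insufficient galaxies"]},
    "3138": {"info": "Malware triage for incident in L2/L3", "alerts": ["Insufficient attributes"]},
}
for alert, events in sorted(alerts_to_events(results).items()):
    print(alert, events)
```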

```python
table = PrettyTable()
table.field_names = ["Alert", "Events"]
table.align["Events"] = "l"
table.align["Alert"] = "l"
table._max_width = {"Alert": 80, "Events": 70}
alert_list = {}
# Group the events per alert
for key, event in playbook_results.items():
    for alert in event["alerts"]:
        if alert in alert_list:
            alert_list[alert].append("{} - {}".format(key, event["info"]))
        else:
            alert_list[alert] = ["{} - {}".format(key, event["info"])]
for alert in alert_list:
    event_string = ""
    for event in alert_list[alert]:
        event_string = "{}{}\n".format(event_string, event)
    table.add_row([alert, event_string])
print(table.get_string(sortby="Alert"))
misp_alerts = table
```

```
+---------------------------------------------------------------------------+------------------------------------------------------------------------+
| Alert                                                                     | Events                                                                 |
+---------------------------------------------------------------------------+------------------------------------------------------------------------+
| Insufficient attributes                                                   | 3138 - Malware triage for incident in L2/L3                            |
|                                                                           |                                                                        |
| Insufficient galaxies                                                     | 1873 - Stantinko investigation                                         |
|                                                                           | 3037 - MAR-10382
```

### Summarised by event

Print the findings summarised by MISP **event**.

```python
table = PrettyTable()
table.field_names = ["ID", "Event title", "Alerts"]
table.align["Event title"] = "l"
table.align["ID"] = "l"
table.align["Alerts"] = "l"
table._max_width = {"Alerts": 80, "Event title": 70}
for key, event in playbook_results.items():
    alert_string = ""
    for alert in event["alerts"]:
        alert_string = "{}{}\n".format(alert_string, alert)
    table.add_row([key, event["info"], alert_string])
print(table.get_string(sortby="ID"))
misp_events = table
```

```
+------+------------------------------------------------------------------------+---------------------------------------------------------------------------+
| ID   | Event title                                                            | Alerts                                                                    |
+------+------------------------------------------------------------------------+---------------------------------------------------------------------------+
| 1873 | Stantinko investigation                                                | Insufficient galaxies                                                     |
|      |                                                                        | The event does not have a valid TLP designation.                          |
|      |                                                                        |                                                                           |
| 2178 | Turla Outlook White Paper                  
```

# Closure

In this **closure** or end step we create a **summary** of the actions that were performed by the playbook. The summary can be printed and also sent to a chat channel.

## EN:1 Create the summary of the playbook 

The next section creates a summary and stores the output in the variable `summary` in Markdown format. It also stores an intro text in the variable `intro`. These variables can later be used when sending information to Mattermost or TheHive.
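The overview part of the summary is a list of counters rendered as Markdown bullets. As a sketch, the repeated `summary +=` lines can be driven by a label table; `LABELS` covers only a subset of the `insufficient_count` keys and, like the `overview` function, is hypothetical. The counter values are from the demo output in RE:2, and the date is only an example.

```python
from datetime import datetime

# Hypothetical label table for a subset of the insufficient_count keys;
# the labels follow the wording of the playbook's own summary.
LABELS = {
    "attributes": "Insufficient **attributes**",
    "tags": "Insufficient **tags**",
    "valid_tlps_required": "**Missing TLP**",
}


def overview(counts, processed_events, day=None):
    """Build the Markdown overview list from the counters."""
    day = day or datetime.now().strftime("%Y-%m-%d")
    lines = ["## Overview", "",
             "- Date: **{}**".format(day),
             "- Events reviewed: **{}**".format(processed_events)]
    # One bullet per counter, in the order of LABELS; missing counters count as 0
    for key, label in LABELS.items():
        lines.append("- {}: **{}**".format(label, counts.get(key, 0)))
    return "\n".join(lines) + "\n"


text = overview({"attributes": 1, "tags": 0, "valid_tlps_required": 1}, 18, day="2023-01-01")
print(text)
```

Adding a new check to the summary then only requires a new entry in the label table.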

```python
summary = "# MISP Playbook summary\nQuery MISP events for inconsistencies \n\n"
# Short intro, used as the chat text when the summary is sent as a Mattermost card (EN:2)
intro = summary

current_date = datetime.now()
formatted_date = current_date.strftime("%Y-%m-%d")
summary += "## Overview\n\n"
summary += "- Date: **{}**\n".format(formatted_date)
summary += "- Events reviewed: **{}**\n".format(processed_events)
summary += "- Insufficient **attributes**: **{}**\n".format(insufficient_count["attributes"])
summary += "- Insufficient **objects**: **{}**\n".format(insufficient_count["objects"])
summary += "- Insufficient **tags**: **{}**\n".format(insufficient_count["tags"])
summary += "- Insufficient **galaxies**: **{}**\n".format(insufficient_count["galaxies"])
summary += "- Inconsistent **distribution** settings: **{}**\n".format(insufficient_count["distribution"])
summary += "- Invalid Traffic Light Protocol (**TLP**) designations: **{}**\n".format(insufficient_count["valid_tlps"])
summary += "- **Missing TLP**: **{}**\n".format(insufficient_count["valid_tlps_required"])
summary += "- **Local** TLP tags (not synchronised): **{}**\n".format(insufficient_count["valid_tlps_global"])
summary += "- Inconsistent **workflow** tags: **{}**\n".format(insufficient_count["workflow_tag"])
summary += "- **Required tags** not present: **{}**\n".format(insufficient_count["required_tags"])

summary += "\n\n"
summary += "## Alerts\n\n"
misp_alerts.set_style(MARKDOWN)
summary += misp_alerts.get_string(sortby="Alert")

summary += "\n\n"
summary += "## Events\n\n"
misp_events.set_style(MARKDOWN)
summary += misp_events.get_string(sortby="ID")
summary += "\n\n"

print("The \033[92msummary\033[90m of the playbook is available.\n")
```

```
The summary of the playbook is available.
```



## EN:2 Send a summary to Mattermost

Now you can send the summary to Mattermost. You can send the summary in two ways by selecting one of the options for the variable `send_to_mattermost_option` in the next cell.

- The default option where the entire summary is in the **chat**, or
- a short intro and the summary in a **card**

For this playbook we rely on a webhook in Mattermost. You can add a webhook by choosing the gear icon in Mattermost, then choose Integrations and then **Incoming Webhooks**. Set a channel for the webhook and lock the webhook to this channel with *"Lock to this channel"*.
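The two sending options only differ in the JSON payload posted to the webhook. The sketch below builds that payload separately so it can be inspected before sending; it assumes the webhook accepts the `username`, `text` and `props.card` fields exactly as the playbook uses them, and the function name and sample values are hypothetical.

```python
import json


def build_mattermost_message(option, username, summary, intro=""):
    """Return the webhook payload for the chosen option, or None if the option is unknown."""
    if option == "via a chat message":
        return {"username": username, "text": summary}
    if option == "via a chat message with card":
        # The short intro goes into the chat text, the full summary into props.card
        return {"username": username, "text": intro, "props": {"card": summary}}
    return None


payload = build_mattermost_message("via a chat message with card", "playbook-bot",
                                   "# MISP Playbook summary ...",
                                   intro="Query MISP events for inconsistencies")
print(json.dumps(payload))
# Sending it would then be: requests.post(mattermost_hook, json=payload).raise_for_status()
```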

```python
send_to_mattermost_option = "via a chat message"
#send_to_mattermost_option = "via a chat message with card"
```

```python
message = False
if send_to_mattermost_option == "via a chat message":
    message = {"username": mattermost_playbook_user, "text": summary}
elif send_to_mattermost_option == "via a chat message with card":
    message = {"username": mattermost_playbook_user, "text": intro, "props": {"card": summary}}

if message:
    # Post the payload as JSON to the incoming webhook; raise_for_status() stops on HTTP errors
    r = requests.post(mattermost_hook, json=message)
    r.raise_for_status()
    print("Summary is \033[92msent\033[90m to Mattermost.\n")
else:
    print("\033[91mFailed to send summary\033[90m to Mattermost.\n")
```

```
Summary is sent to Mattermost.
```



## EN:3 End of the playbook 

```python
print("\033[92m End of the playbook")
```

```
 End of the playbook
```


## External references <a name="extreferences"></a>

- [The MISP Project](https://www.misp-project.org/)
- [Mattermost](https://mattermost.com/)

## Technical details 

### Documentation

This playbook requires these Python **libraries** to exist in the environment where the playbook is executed. You can install them with `pip install <library>`.

```
pyfaup
chardet
PrettyTable
ipywidgets
```

### Colour codes

The output from Python displays some text in different colours. These are the colour codes:

```
Red = '\033[91m'
Green = '\033[92m'
Blue = '\033[94m'
Cyan = '\033[96m'
White = '\033[97m'
Yellow = '\033[93m'
Magenta = '\033[95m'
Grey = '\033[90m'
Black = '\033[30m'
Default = '\033[39m'
```
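The playbook wraps highlighted values in a colour code and switches back to grey afterwards. When notebook output is copied as plain text, these codes show up as debris such as `[92m`. The sketch below shows both directions; the helper names `colour` and `strip_ansi` are hypothetical, and the regular expression is assumed to cover only SGR (colour) sequences.

```python
import re

GREEN = "\033[92m"
GREY = "\033[90m"
RED = "\033[91m"

# Matches SGR escape sequences such as \033[92m or \033[0m
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def colour(text, code, after=GREY):
    """Wrap text in a colour code and switch back to grey afterwards, as the playbook does."""
    return "{}{}{}".format(code, text, after)


def strip_ansi(text):
    """Remove colour codes, for example before storing notebook output as plain text."""
    return ANSI_SGR.sub("", text)


line = "Processed {} events.".format(colour(18, GREEN))
print(strip_ansi(line))  # Processed 18 events.
```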