# Analysis of MITRE OT data

This notebook analyses the MITRE OT data and provides visual representations that can help identify remediation priorities from different views. Please note that it is mostly intended to validate some working theories and is by no means complete.

## Definition
* DUSTTUNNEL
  * ICECORE
* LAZYCARGO
* MOUSEHOLE
  * TAGRUN
* BADOMEN
  * OMSHELL
* EVILSCHOLAR
  * CODECALL

## Scope
For this analysis we focus on the Pipedream / Incontroller malware by the Chernovite group.

## Goals
* Attempt to understand which mitigations are best suited to defend against the mentioned malware
* Attempt to understand which data sources are best suited to detect the mentioned malware

## References
* https://github.com/mitre-attack/attack-stix-data/blob/master/USAGE.md#the-attck-data-model
* https://oasis-open.github.io/cti-documentation/examples/visualized-sdo-relationships

## Dependencies

In [2]:
%pip install pandas
%pip install taxii2-client
%pip install stix2
%pip install networkx
%pip install pyvis
# required if you want to do text extraction
#%pip install PyPDF2

Note: you may need to restart the kernel to use updated packages.


# Imports & data access setup

In [3]:
from stix2 import Filter
import pandas as pd
import networkx as nx
from pyvis.network import Network

## Select & setup LIVE collections

In [4]:
## Disabled by default, since we have local files
runme = False
if runme:
    from taxii2client.v20 import Server
    from stix2 import TAXIICollectionSource, Filter
    from taxii2client.v20 import Collection

    #setup taxii logging
    import logging
    logging.getLogger('taxii2client').setLevel(logging.CRITICAL)
    server = Server("https://cti-taxii.mitre.org/taxii/")

### Online data

In [5]:
runme = False
if runme:
    api_root = server.api_roots[0]
    for idx, collection in enumerate(api_root.collections):
        print(f"[{idx}] {collection.title} -> {collection.description}")

## Offline data: download collections
To be used when the live connection fails.

In [6]:
## Use this if you don't have local files or want to update them
runme = False
if runme:
    import os
    import requests
    import json

    def get_data_from_branch(domain, branch="master"):
        # Get the ATT&CK STIX data from the mitre-attack/attack-stix-data repository.
        # Domain should be 'enterprise-attack', 'mobile-attack' or 'ics-attack'. Branch should typically be master.
        #stix_json = requests.get(f"https://raw.githubusercontent.com/mitre/cti/{branch}/{domain}/{domain}.json").json()
        response = requests.get(f"https://github.com/mitre-attack/attack-stix-data/raw/{branch}/{domain}/{domain}.json")
        response.raise_for_status()  # fail loudly instead of saving an error page
        os.makedirs("stixdata", exist_ok=True)  # make sure the target folder exists
        with open(f"stixdata/{domain}.json","w") as stixfile:
            stixfile.write(json.dumps(response.json()))

    for domain in ['enterprise-attack','mobile-attack','ics-attack']:
        get_data_from_branch(domain)
    print("Downloaded")

### Offline data loading
To be used when the online query returns errors.

In [7]:
from stix2 import MemoryStore

stix_ics = MemoryStore()
stix_ics.load_from_file("stixdata/ics-attack.json")

stix_enterprise = MemoryStore()
stix_enterprise.load_from_file("stixdata/enterprise-attack.json")

print(stix_ics)
print(stix_enterprise)

<stix2.datastore.memory.MemoryStore object at 0x78b6dc8032b0>
<stix2.datastore.memory.MemoryStore object at 0x78b6dc8037f0>
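Printing a `MemoryStore` only shows its memory address, which says little about what was loaded. A quick sanity check is to count the objects per STIX type in the bundle file. The following is a minimal sketch using only the standard library; `summarize_bundle` is a hypothetical helper, and the tiny inline bundle is a stand-in for `stixdata/ics-attack.json`.

```python
import json
from collections import Counter

def summarize_bundle(bundle):
    """Count STIX objects per type in a bundle dict (hypothetical helper)."""
    return Counter(obj.get("type", "unknown") for obj in bundle.get("objects", []))

# Tiny stand-in bundle; in the notebook you would load the real file instead,
# e.g. with json.load() on "stixdata/ics-attack.json".
sample_bundle = json.loads("""
{"type": "bundle", "objects": [
  {"type": "attack-pattern", "name": "Program Upload"},
  {"type": "attack-pattern", "name": "Program Download"},
  {"type": "course-of-action", "name": "Network Segmentation"}
]}
""")

type_counts = summarize_bundle(sample_bundle)
print(type_counts.most_common())
```

If the count of `attack-pattern` objects is zero, the download most likely failed or the wrong file was loaded.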


# Available techniques
We compiled these lists manually, since automated extraction is challenging.
You can automate it if you want, but reports often mention techniques by name rather than by ID, which makes extraction hard.

In [8]:
"""
# probably useless for most PDF files, but can still be useful
from PyPDF2 import PdfReader
import re

technique_pattern = r' ?[tT]\d{1,4}'

reader = PdfReader("PDF/Dragos_ChernoviteWP_v2b.pdf")
for page in reader.pages:
    techniques_found = re.findall(technique_pattern, page.extract_text())
    if len(techniques_found) > 0:
        print(techniques_found)
"""

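The pattern in the disabled cell (`' ?[tT]\d{1,4}'`) also matches fragments such as `t1` inside ordinary words. A stricter variant, sketched below under the assumption that IDs are written as `T` plus four digits with an optional three-digit sub-technique suffix, reduces false positives. The sample text is made up for illustration, and `extract_technique_ids` is a hypothetical helper.

```python
import re

# Assumed ID shape: "T" + 4 digits, optionally ".NNN" for a sub-technique.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")

def extract_technique_ids(text):
    """Return unique ATT&CK-style IDs in order of first appearance."""
    return list(dict.fromkeys(TECHNIQUE_ID.findall(text)))

# Made-up sample text standing in for page.extract_text()
sample_text = (
    "The module relies on T0845 (Program Upload) and T1059.001; "
    "see table T12 and again T0845 for details."
)
found = extract_technique_ids(sample_text)
print(found)
```

This still cannot recover techniques that a report mentions only by name, which is why the lists in this notebook were compiled by hand.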

In [9]:
# Techniques without ID, needs resolving
PIPEDREAM_UNRESOLVED = ['Command-Line Interface',
                        'Connection Proxy',
                        'Commonly Used Port',
                        'Denial of Control',
                        'Default Credentials',
                        'Denial of Service',
                        'Denial of View',
                        'Detect Operating Mode',
                        'Device Restart/Shutdown',
                        'Execution through API',
                        'Exploitation for Privilege Escalation',
                        'Lateral Tool Transfer',
                        'Loss of Availability',
                        'Loss of Control',
                        'Loss of Productivity and Revenue',
                        'Loss of Safety',
                        'Loss of View',
                        'Manipulate I/O Image',
                        'Manipulation of Control',
                        'Modify Parameter',
                        'Network Sniffing',                        
                        'Point & Tag Identification',
                        'Program Download',
                        'Program Upload',
                        'Remote Services',                        
                        'Remote System Discovery',
                        'Remote System Information Discovery',
                        'Rootkit',
                        'Scripting',
                        'Standard Application Layer Protocol',
                        'System Firmware',
                        'Theft of Operational Information',
                        'User Execution',
                        'Unauthorized Command Message',
                        'Valid Accounts',
                       ]

PIPEDREAM_TECHNIQUES = {'T1047':'Windows Management Instrumentation',
                        'T1059':'Command and Scripting Interpreter',                     
                        'T1105':'Ingress Tool Transfer',
                        'T1544':'Remote File Copy',
                        }

BADOMEN_TECHNIQUES = {'T0801':'Monitor Process State',
                      'T0807':'Command-Line Interface',
                      'T0812':'Default Credentials',
                      'T0821':'Modify Controller Tasking',
                      'T0831':'Manipulation of Control',
                      'T0834':'Native API',
                      'T0836':'Modify Parameter',
                      'T0837':'Loss of Safety',
                      'T0842':'Network Sniffing',
                      'T0843':'Program Download',
                      'T0845':'Program Upload',
                      'T0846':'Remote System Discovery',
                      'T0853':'Scripting',
                      'T0855':'Unauthorized Command Message',
                      'T0858':'Change Operating Mode',
                      'T0859':'Valid Accounts',
                      'T0867':'Lateral Tool Transfer',
                      'T0868':'Detect Operating Mode',
                      'T0869':'Standard Application Layer Protocol',
                      'T0879':'Damage to Property',
                      'T0881':'Service Stop',
                      'T0882':'Theft of Operational Information',
                      'T0885':'Commonly Used Port',
                      'T0886':'Remote Services',
                      'T0888':'Remote System Information Discovery',
                      'T0889':'Modify Program',
                      'T1021':'Remote Services',
                      'T1552':'Unsecured Credentials',
                      'T1544':'Remote File Copy',
                      'T1573':'Encrypted Channel',
                     }

EVILSCHOLAR_TECHNIQUES = {'T0801':'Monitor Process State',
                          'T0803':'Block Command Message',
                          'T0804':'Block Reporting Message',
                          'T0807':'Command-Line Interface',
                          'T0809':'Data Destruction',
                          'T0812':'Default Credentials',
                          'T0813':'Denial of Control',
                          'T0814':'Denial of Service',
                          'T0815':'Denial of View',
                          'T0816':'Device Restart/Shutdown',
                          'T0826':'Loss of Availability',
                          'T0827':'Loss of Control',
                          'T0828':'Loss of Productivity and Revenue',
                          'T0831':'Manipulation of Control',
                          'T0836':'Modify Parameter',
                          'T0843':'Program Download',
                          'T0845':'Program Upload',
                          'T0846':'Remote System Discovery',
                          'T0853':'Scripting',
                          'T0855':'Unauthorized Command Message',
                          'T0857':'System Firmware',
                          'T0859':'Valid Accounts',
                          'T0869':'Standard Application Layer Protocol',
                          'T0882':'Theft of Operational Information',
                          'T0885':'Commonly Used Port',
                          'T0888':'Remote System Information Discovery',
                          'T0889':'Modify Program',
                          'T1078':'Valid Accounts',
                          'T1110':'Brute Force',
                         }

LAZYCARGO_TECHNIQUES = {'T1544':'Remote File Copy'
                       }

MOUSEHOLE_TECHNIQUES = {'T0801':'Monitor Process State',
                        'T0807':'Command-Line Interface',
                        'T0832':'Manipulation of View',
                        'T0846':'Remote System Discovery',
                        'T0853':'Scripting',
                        'T0859':'Valid Accounts',
                        'T0861':'Point & Tag Identification',
                        'T0869':'Standard Application Layer Protocol',
                        'T0885':'Commonly Used Port',
                        'T0882':'Theft of Operational Information',
                        'T0888':'Remote System Information Discovery',
                        'T1046':'Network Service Scanning'
                       }

## Resolve unknown techniques

In [10]:
print(f"before {len(PIPEDREAM_TECHNIQUES)}")
for unresolved in PIPEDREAM_UNRESOLVED:
    filters = [
        Filter("type", "=", "attack-pattern"),
        Filter("external_references.source_name", "=", 'mitre-attack'),
        Filter("name", "=", unresolved),
    ]

    techniques = stix_ics.query(filters)
    if len(techniques) > 0:
        for external_reference in techniques[0]['external_references']:
            if external_reference['source_name'] == 'mitre-attack':
                PIPEDREAM_TECHNIQUES[external_reference['external_id']] = unresolved
    else:
        print(f"unresolved {unresolved}")

print(f"after {len(PIPEDREAM_TECHNIQUES)}")
print(PIPEDREAM_TECHNIQUES)

before 4
after 39
{'T1047': 'Windows Management Instrumentation', 'T1059': 'Command and Scripting Interpreter', 'T1105': 'Ingress Tool Transfer', 'T1544': 'Remote File Copy', 'T0807': 'Command-Line Interface', 'T0884': 'Connection Proxy', 'T0885': 'Commonly Used Port', 'T0813': 'Denial of Control', 'T0812': 'Default Credentials', 'T0814': 'Denial of Service', 'T0815': 'Denial of View', 'T0868': 'Detect Operating Mode', 'T0816': 'Device Restart/Shutdown', 'T0871': 'Execution through API', 'T0890': 'Exploitation for Privilege Escalation', 'T0867': 'Lateral Tool Transfer', 'T0826': 'Loss of Availability', 'T0827': 'Loss of Control', 'T0828': 'Loss of Productivity and Revenue', 'T0880': 'Loss of Safety', 'T0829': 'Loss of View', 'T0835': 'Manipulate I/O Image', 'T0831': 'Manipulation of Control', 'T0836': 'Modify Parameter', 'T0842': 'Network Sniffing', 'T0861': 'Point & Tag Identification', 'T0843': 'Program Download', 'T0845': 'Program Upload', 'T0886': 'Remote Services', 'T0846': 'Remot
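The loop above runs one store query per unresolved name. An alternative, sketched here on plain dictionaries rather than the real store, is to build a name-to-ID index once and look names up in it. `build_name_index` and the sample objects are hypothetical; in the notebook the objects would come from a query for all `attack-pattern` objects.

```python
def build_name_index(stix_objects):
    """Map technique name -> ATT&CK ID for all attack-pattern objects."""
    index = {}
    for obj in stix_objects:
        if obj.get("type") != "attack-pattern":
            continue
        for ref in obj.get("external_references", []):
            if ref.get("source_name") == "mitre-attack":
                index[obj["name"]] = ref["external_id"]
                break
    return index

# Hypothetical stand-in objects shaped like STIX dictionaries
sample_objects = [
    {"type": "attack-pattern", "name": "Program Upload",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T0845"}]},
    {"type": "attack-pattern", "name": "Program Download",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T0843"}]},
    {"type": "course-of-action", "name": "Network Segmentation"},
]

index = build_name_index(sample_objects)
names = ["Program Upload", "Program Download", "Not A Technique"]
resolved = {index[n]: n for n in names if n in index}
unresolved = [n for n in names if n not in index]
print(resolved, unresolved)
```

One caveat of this sketch: if two objects share a name (for example a revoked and a current technique), the last one seen wins, so revoked and deprecated objects should be filtered out first.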

## Combine all techniques
We also want to have a complete set of techniques for this family.

In [11]:
print(f"before {len(PIPEDREAM_TECHNIQUES)}")

PIPEDREAM_TECHNIQUES.update(BADOMEN_TECHNIQUES)
PIPEDREAM_TECHNIQUES.update(EVILSCHOLAR_TECHNIQUES)
PIPEDREAM_TECHNIQUES.update(LAZYCARGO_TECHNIQUES)
PIPEDREAM_TECHNIQUES.update(MOUSEHOLE_TECHNIQUES)

print(f"after {len(PIPEDREAM_TECHNIQUES)}")

before 39
after 57
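Because `dict.update` silently collapses duplicate keys, the before/after counts do not show how much the modules overlap. A small sketch for making the overlap visible, using a few entries copied from the dictionaries above (the reduced `modules` mapping is only for illustration):

```python
from collections import Counter

# Reduced excerpts of the module dictionaries defined earlier
modules = {
    "badomen": {"T0801": "Monitor Process State", "T0843": "Program Download"},
    "evilscholar": {"T0801": "Monitor Process State", "T0809": "Data Destruction"},
    "mousehole": {"T0801": "Monitor Process State", "T0832": "Manipulation of View"},
}

combined = {}
usage = Counter()  # how many modules list each technique ID
for techniques in modules.values():
    combined.update(techniques)
    usage.update(techniques.keys())

# Techniques that every module has in common
shared = sorted(tid for tid, count in usage.items() if count == len(modules))
print(len(combined), shared)
```

Techniques shared by all modules are natural candidates when looking for mitigations or data sources with the broadest effect.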


# Helper functions

In [12]:
class MitreTechniqueExplorer:
    def __init__(self, stix_data_store, mitre_technique_id):
        self.stix_store = stix_data_store
        self._technique_id = mitre_technique_id.upper()
        self._stix_id = self._stixid_from_id()
        self._technique_name = self._name_from_id()
        self._technique_tactics = self._tactics_from_id()
        self._datasources = self._datasources_from_id()
        self._mitigations = self._mitigations_from_id()
        self._targets = self._targets_from_id()
        
    @property
    def id_mitre(self):
        return self._technique_id
    
    @property
    def id_stix(self):
        return self._stix_id
    
    @property
    def name(self):
        return self._technique_name
    
    @property
    def tactics(self):
        return self._technique_tactics
    
    @property
    def datasources(self):
        return self._datasources
    
    @property
    def mitigations(self):
        return self._mitigations
    
    @property
    def targets(self):
        return self._targets
            
    def _filtered_query(self, original_filter):
        clean_results = []
        filters = list(original_filter)
        filters.append(Filter("revoked","=",False))
        search_results = self.stix_store.query(filters)
        for result in search_results:
            if result.get('x_mitre_deprecated',False) is False:
                clean_results.append(result)
        return clean_results
    
    def _stixid_from_id(self):   
        filters = [
            Filter("type", "=", "attack-pattern"),
            Filter("external_references.source_name", "=", 'mitre-attack'),
            Filter("external_references.external_id","=",self._technique_id)
        ]

        search_results = self._filtered_query(filters)
        for result in search_results:
            return result.get('id','')
        
    def _name_from_id(self):   
        filters = [
            Filter("type", "=", "attack-pattern"),
            Filter("external_references.source_name", "=", 'mitre-attack'),
            Filter("external_references.external_id","=",self._technique_id)
        ]

        search_results = self._filtered_query(filters)
        for result in search_results:
            return result.get('name','')

    def _tactics_from_id(self):
        icstactics = []
        filters = [
            Filter("type", "=", "attack-pattern"),
            Filter("kill_chain_phases.kill_chain_name", "=", 'mitre-ics-attack'),
            Filter("external_references.source_name", "=", 'mitre-attack'),
            Filter("external_references.external_id", "=", self._technique_id)
        ]

        search_results = self._filtered_query(filters)
        for result in search_results:
            for kcphase in result['kill_chain_phases']:
                if kcphase['kill_chain_name'] == 'mitre-ics-attack':
                    icstactics.append(kcphase['phase_name'])

        return icstactics if icstactics else None

    def _datasources_from_id(self):
        datasources = []
        filters = [
            Filter("type", "=", "attack-pattern"),
            Filter("kill_chain_phases.kill_chain_name", "=", 'mitre-ics-attack'),
            Filter("external_references.source_name", "=", 'mitre-attack'),
            Filter("external_references.external_id", "=", self._technique_id)
        ]

        search_results = self._filtered_query(filters)
        for result in search_results:
            if 'x_mitre_data_sources' in result:
                _datasources = result['x_mitre_data_sources']
                for _datasource in _datasources:
                    datasources.append({k: v for k, v in [_datasource.split(':') if ':' in _datasource else (_datasource,'')]})
        return datasources if datasources else None

    def _mitigations_from_id(self):
        technique_mitigations = []
        if not self._stix_id:
            return None
        
        courseofaction_relation_filters = [
            Filter("type","=","relationship"),
            Filter("relationship_type","=","mitigates"),
            Filter("target_ref","=",self._stix_id)
        ]


        mitigations = self._filtered_query(courseofaction_relation_filters)
        for mitigation in mitigations:
            courseofaction_id = mitigation['source_ref']
            courseofaction_filters = [
                Filter("type", "=", "course-of-action"),
                Filter("id", "=", courseofaction_id)
            ]
            courseofactions = self._filtered_query(courseofaction_filters)
            for courseofaction in courseofactions:
                technique_mitigations.append(courseofaction['name'])
                
        return technique_mitigations if technique_mitigations else None   
    
    def _targets_from_id(self):
        target_assets = []
        if not self._stix_id:
            return None

        assets_relation_filters = [
            Filter("type","=","relationship"),
            Filter("relationship_type","=","targets"),
            Filter("source_ref","=",self._stix_id)
        ]
        
        targetedassets = self._filtered_query(assets_relation_filters)
        for targetedasset in targetedassets:
            assetid = targetedasset['target_ref']
            asset_filters = [
                Filter("type", "=", "x-mitre-asset"),
                Filter("id", "=", assetid)
            ]
            
            asset_results = self._filtered_query(asset_filters)
            for asset in asset_results:
                target_assets.append(asset['name'])
                
        return target_assets if target_assets else None
        
if True:
    my_test = MitreTechniqueExplorer(stix_ics,'T0845')
    print(my_test.id_mitre)
    print(my_test.id_stix)
    print(my_test.name)
    print(my_test.tactics)
    print(my_test.datasources)
    print(my_test.mitigations)
    print(my_test.targets)
    
    
        

T0845
attack-pattern--3067b85e-271e-4bc5-81ad-ab1a81d411e3
Program Upload
['collection']
[{'Network Traffic': ' Network Traffic Content'}, {'Network Traffic': ' Network Traffic Flow'}, {'Application Log': ' Application Log Content'}]
['Software Process and Device Authentication', 'Communication Authenticity', 'Authorization Enforcement', 'Access Management', 'Network Segmentation', 'Filter Network Traffic', 'Network Allowlists', 'Human User Authentication']
['Programmable Logic Controller (PLC)', 'Safety Controller']
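The data source output above carries a leading space in each component (for example `' Network Traffic Content'`), because the raw strings are split on `:` without trimming. A minimal sketch of a cleaner parser, assuming entries have the form `Data Source: Data Component`; `parse_data_source` is a hypothetical helper and not part of the class above.

```python
def parse_data_source(entry):
    """Split 'Data Source: Data Component' into a trimmed (source, component) tuple."""
    # partition() splits on the first colon only and tolerates entries without one
    source, _, component = entry.partition(":")
    return source.strip(), component.strip()

raw_entries = [
    "Network Traffic: Network Traffic Content",
    "Network Traffic: Network Traffic Flow",
    "Application Log: Application Log Content",
    "Process",  # entry without a component
]
parsed = [parse_data_source(entry) for entry in raw_entries]
print(parsed)
```

Returning tuples instead of one-entry dictionaries also makes it explicit which half is the data source and which is the data component.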


# Create datasets
Let's create some datasets that we can query and play with.

In [13]:
all_reporttechniques = {'pipedream':PIPEDREAM_TECHNIQUES,'badomen':BADOMEN_TECHNIQUES,'evilscholar':EVILSCHOLAR_TECHNIQUES,
                        'lazycargo':LAZYCARGO_TECHNIQUES,'mousehole':MOUSEHOLE_TECHNIQUES}

techniques_report = {'mitre_id':[],'mitre_tactic':[],'mitre_name':[],'malware_name':[],'targets':[],
                     'datasource':[],'datacomponent':[],'mitigations':[]}

for dname,dvalue in all_reporttechniques.items():
    for k,v in dvalue.items():
        technique = MitreTechniqueExplorer(stix_ics, k)
        techniques_report['mitre_id'].append(technique.id_mitre)
        techniques_report['mitre_tactic'].append(technique.tactics)
        techniques_report['mitre_name'].append(technique.name)
        techniques_report['malware_name'].append(dname)
        techniques_report['targets'].append(technique.targets)
        if technique.datasources is not None:
            # each dict maps data source -> data component (e.g. 'Command' -> ' Command Execution')
            techniques_report['datasource'].append([value for d in technique.datasources for value in d.keys()])
            techniques_report['datacomponent'].append([value for d in technique.datasources for value in d.values()])
        else:
            techniques_report['datasource'].append(None)
            techniques_report['datacomponent'].append(None)
        techniques_report['mitigations'].append(technique.mitigations)

pipedreamraw_df = pd.DataFrame(techniques_report)
pipedreamraw_df.head()

| | mitre_id | mitre_tactic | mitre_name | malware_name | targets | datasource | datacomponent | mitigations |
|---|---|---|---|---|---|---|---|---|
| 0 | T1047 | | | pipedream | | | | |
| 1 | T1059 | | | pipedream | | | | |
| 2 | T1105 | | | pipedream | | | | |
| 3 | T1544 | | | pipedream | | | | |
| 4 | T0807 | [execution] | Command-Line Interface | pipedream | [Human-Machine Interface (HMI), Data Gateway, ... | [Command, Application Log, Process] | [ Command Execution, Application Log Content,... | [Execution Prevention, Disable or Remove Featu... |


In [14]:
# We explode to have rows with just one tactic per technique.
# Beware: this increases the number of rows and produces duplicate values.
pipedreamall_df = pipedreamraw_df.explode('mitre_tactic')
pipedreamall_df = pipedreamall_df.explode('targets')
pipedreamall_df = pipedreamall_df.explode('datasource')
pipedreamall_df = pipedreamall_df.explode('datacomponent')
pipedreamall_df = pipedreamall_df.explode('mitigations')
#pipedreamall_df = pipedreamall_df.astype(str)

# Different df that only contains the modules.
# A boolean mask is used because explode() leaves duplicate index labels, which makes drop() by index fragile.
pipedream_modules_df = pipedreamall_df[pipedreamall_df['malware_name'] != 'pipedream']
#pipedream_modules_df = pipedream_modules_df.astype(str)

# Different df that only contains pipedream
pipedream_df = pipedreamall_df[pipedreamall_df['malware_name'] == 'pipedream']
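Since exploding several list columns multiplies rows, counting values directly on the exploded frame inflates the numbers. A minimal sketch of the effect, using hypothetical exploded rows as plain tuples: a mitigation is counted once per distinct technique instead of once per row.

```python
from collections import Counter

# Hypothetical exploded rows: (malware_name, mitre_id, target, mitigation)
rows = [
    ("badomen", "T0807", "Workstation", "Execution Prevention"),
    ("badomen", "T0807", "Jump Host", "Execution Prevention"),
    ("badomen", "T0801", "Control Server", "Mitigation Limited or Not Effective"),
    ("mousehole", "T0807", "Workstation", "Execution Prevention"),
]

# Naive count: every exploded row counts, so targets and modules inflate the result
naive = Counter(mitigation for _, _, _, mitigation in rows)

# Deduplicated count: each (technique, mitigation) pair counts once
unique_pairs = {(mitre_id, mitigation) for _, mitre_id, _, mitigation in rows}
coverage = Counter(mitigation for _, mitigation in unique_pairs)

print(naive["Execution Prevention"], coverage["Execution Prevention"])
```

With the DataFrame, the same idea would be to select only the two columns of interest and call `drop_duplicates()` before counting, which is what the graph cells below do before building edges.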

# Graphs
So what would we want to do with graphs? Most of our questions can be answered with tables and statistics. Still, I like graphs, so why not try them out. We have a couple of node and edge types to think about, and some of that work has already been done:

* https://www.mbsecure.nl/blog/2019/5/dettact-mapping-your-blue-team-to-mitre-attack

With that picture and our dataset in mind, the first nodes and edges we need are:

* mitre_id --accomplishes--> mitre_tactic
* mitre_id --logged_in--> datasource
* mitre_id --logged_in--> datacomponent
* datasource --consists_of--> datacomponent
* malware_name --implements--> mitre_id
* mitre_id --targets--> assets
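The edge types listed above can be sketched as labelled triples generated from one technique record. This is only an illustration on a plain dictionary: `edges_from_record` and the `datasources` key holding `(source, component)` tuples are assumptions, not names used elsewhere in this notebook.

```python
def edges_from_record(record):
    """Yield (source, relation, target) triples for one technique record."""
    tid = record["mitre_id"]
    yield (record["malware_name"], "implements", tid)
    for tactic in record.get("mitre_tactic") or []:
        yield (tid, "accomplishes", tactic)
    for target in record.get("targets") or []:
        yield (tid, "targets", target)
    for source, component in record.get("datasources") or []:
        yield (tid, "logged_in", source)
        yield (tid, "logged_in", component)
        yield (source, "consists_of", component)

# Hypothetical record, with values taken from the T0845 test output above
record = {
    "malware_name": "badomen",
    "mitre_id": "T0845",
    "mitre_tactic": ["collection"],
    "targets": ["Safety Controller"],
    "datasources": [("Network Traffic", "Network Traffic Content")],
}
edges = list(edges_from_record(record))
for edge in edges:
    print(edge)
```

Keeping the relation name in each triple makes it possible to colour or filter edges by type later on.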



## Pipedream module implements mitre techniques

In [15]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','mitre_id','mitre_name']].drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst = df_edges['mitre_id'].unique()
edges_malwarename_mitreid = list(zip(df_edges['malware_name'],df_edges['mitre_id']))

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst:
    node_title = df_edges.loc[df_edges['mitre_id'] == node, 'mitre_name'].values[0]
    G.add_node(node,title=node_title,color='skyblue')
    
G.add_edges_from(edges_malwarename_mitreid,color='skyblue')
graph_node_degrees = dict(G.degree())
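for nodename,nodedegree in graph_node_degrees.items():
    # low-degree nodes get a larger multiplier; computed per node so it does not leak to later nodes
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale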

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%", cdn_resources="in_line")
# populates the nodes and edges data structures
nt.from_nx(G)
#nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')


nx.html
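The node sizing used in these graphs can be checked without networkx. The sketch below computes each node's degree from a plain edge list and applies a per-node multiplier; `node_sizes` and its parameter names are hypothetical, and the thresholds mirror the values used in the cell above. It assumes the edge list contains no duplicate edges.

```python
from collections import Counter

def node_sizes(edges, base_scale=2, small_scale=3, small_degree=5):
    """Return {node: size}, where size = degree * multiplier chosen per node."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # outgoing edge
        degree[dst] += 1  # incoming edge
    return {
        node: deg * (small_scale if deg < small_degree else base_scale)
        for node, deg in degree.items()
    }

# Small excerpt of module -> technique edges taken from the dictionaries above
edges = [
    ("badomen", "T0801"), ("badomen", "T0843"), ("badomen", "T0845"),
    ("badomen", "T0853"), ("badomen", "T0855"), ("evilscholar", "T0801"),
]
sizes = node_sizes(edges)
print(sizes)
```

Note that with these multipliers a node of degree 4 (size 12) is drawn larger than a node of degree 5 (size 10), so the thresholds may need tuning if size is meant to reflect importance.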


## Pipedream module targets assets

In [16]:
df_edges = pipedream_modules_df[['malware_name','targets','mitre_id','mitre_name']].dropna(subset=['targets','mitre_id']).drop_duplicates()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['mitre_id']))
edges_malwarename_targets2 = list(zip(df_edges['mitre_id'],df_edges['targets']))
totaledges = []
totaledges.extend(edges_malwarename_targets)
totaledges.extend(edges_malwarename_targets2)
import csv

# write the edge list as CSV; csv.writer quotes values that contain commas
with open('data_edges.csv','w',newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Source","Target"])
    writer.writerows(totaledges)

In [17]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','targets','mitre_id','mitre_name']].dropna(subset=['targets','mitre_id']).drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst = df_edges['mitre_id'].unique()
unique_nodes_dst2 = df_edges['targets'].unique()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['mitre_id']))
edges_malwarename_targets2 = list(zip(df_edges['mitre_id'],df_edges['targets']))
print(edges_malwarename_targets2)

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst:
    node_title = df_edges.loc[df_edges['mitre_id'] == node, 'mitre_name'].values[0]
    G.add_node(node,title=node_title,color='skyblue')

for node in unique_nodes_dst2:
    G.add_node(node,title=node,color='black')
    
G.add_edges_from(edges_malwarename_targets,color='skyblue')
G.add_edges_from(edges_malwarename_targets2,color='skyblue')

graph_node_degrees = dict(G.degree())
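for nodename,nodedegree in graph_node_degrees.items():
    # low-degree nodes get a larger multiplier; computed per node so it does not leak to later nodes
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale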

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')

[('T0801', 'Safety Controller'), ('T0801', 'Intelligent Electronic Device (IED)'), ('T0801', 'Field I/O'), ('T0801', 'Human-Machine Interface (HMI)'), ('T0801', 'Remote Terminal Unit (RTU)'), ('T0801', 'Data Historian'), ('T0801', 'Control Server'), ('T0801', 'Programmable Logic Controller (PLC)'), ('T0801', 'Data Gateway'), ('T0807', 'Human-Machine Interface (HMI)'), ('T0807', 'Data Gateway'), ('T0807', 'Application Server'), ('T0807', 'Control Server'), ('T0807', 'Jump Host'), ('T0807', 'Workstation'), ('T0807', 'Data Historian'), ('T0812', 'Programmable Logic Controller (PLC)'), ('T0812', 'Remote Terminal Unit (RTU)'), ('T0812', 'Jump Host'), ('T0812', 'Intelligent Electronic Device (IED)'), ('T0812', 'Workstation'), ('T0812', 'Safety Controller'), ('T0812', 'Routers'), ('T0812', 'Application Server'), ('T0812', 'Human-Machine Interface (HMI)'), ('T0812', 'Control Server'), ('T0812', 'Data Gateway'), ('T0812', 'Data Historian'), ('T0812', 'Field I/O'), ('T0812', 'Virtual Private Net

In [18]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','mitigations','mitre_id','mitre_name']].dropna(subset=['mitigations','mitre_id']).drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst = df_edges['mitre_id'].unique()
unique_nodes_dst2 = df_edges['mitigations'].unique()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['mitre_id']))
edges_malwarename_targets2 = list(zip(df_edges['mitre_id'],df_edges['mitigations']))

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst:
    node_title = df_edges.loc[df_edges['mitre_id'] == node, 'mitre_name'].values[0]
    print(df_edges.loc[df_edges['mitre_id'] == node])
    G.add_node(node,title=node_title,color='skyblue')

for node in unique_nodes_dst2:
    G.add_node(node,title=node,color='black')
    
G.add_edges_from(edges_malwarename_targets,color='skyblue')
G.add_edges_from(edges_malwarename_targets2,color='skyblue')

graph_node_degrees = dict(G.degree())
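for nodename,nodedegree in graph_node_degrees.items():
    # low-degree nodes get a larger multiplier; computed per node so it does not leak to later nodes
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale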

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')

    malware_name                          mitigations mitre_id  \
57       badomen  Mitigation Limited or Not Effective    T0801   
87   evilscholar  Mitigation Limited or Not Effective    T0801   
117    mousehole  Mitigation Limited or Not Effective    T0801   

                mitre_name  
57   Monitor Process State  
87   Monitor Process State  
117  Monitor Process State  
    malware_name                           mitigations mitre_id  \
58       badomen                  Execution Prevention    T0807   
58       badomen  Disable or Remove Feature or Program    T0807   
90   evilscholar                  Execution Prevention    T0807   
90   evilscholar  Disable or Remove Feature or Program    T0807   
118    mousehole                  Execution Prevention    T0807   
118    mousehole  Disable or Remove Feature or Program    T0807   

                 mitre_name  
58   Command-Line Interface  
58   Command-Line Interface  
90   Command-Line Interface  
90   Command-Line Interface  

In [19]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','mitigations']].dropna(subset=['mitigations']).drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst2 = df_edges['mitigations'].unique()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['mitigations']))

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst2:
    G.add_node(node,title=node,color='black')
    
G.add_edges_from(edges_malwarename_targets,color='skyblue')

graph_node_degrees = dict(G.degree())
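for nodename,nodedegree in graph_node_degrees.items():
    # low-degree nodes get a larger multiplier; computed per node so it does not leak to later nodes
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale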

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(True)
nt.show('nx.html')

nx.html


In [20]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','datasource','mitre_id','mitre_name']].dropna(subset=['datasource','mitre_id']).drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst = df_edges['mitre_id'].unique()
unique_nodes_dst2 = df_edges['datasource'].unique()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['mitre_id']))
edges_malwarename_targets2 = list(zip(df_edges['mitre_id'],df_edges['datasource']))

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst:
    node_title = df_edges.loc[df_edges['mitre_id'] == node, 'mitre_name'].values[0]
    G.add_node(node,title=node_title,color='skyblue')

for node in unique_nodes_dst2:
    G.add_node(node,title=node,color='black')
    
G.add_edges_from(edges_malwarename_targets,color='skyblue')
G.add_edges_from(edges_malwarename_targets2,color='skyblue')

graph_node_degrees = dict(G.degree())
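for nodename,nodedegree in graph_node_degrees.items():
    # low-degree nodes get a larger multiplier; computed per node so it does not leak to later nodes
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale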

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(True)
nt.show('nx.html')

nx.html


In [21]:
# prepare data nodes & edges
df_edges = pipedream_modules_df[['malware_name','datasource']].dropna(subset=['datasource']).drop_duplicates()
unique_nodes_src = df_edges['malware_name'].unique()
unique_nodes_dst2 = df_edges['datasource'].unique()
edges_malwarename_targets = list(zip(df_edges['malware_name'],df_edges['datasource']))

#prepare graph data nodes and edges
G = nx.DiGraph()
graph_scale = 2
for node in unique_nodes_src:
    G.add_node(node,color='red')

for node in unique_nodes_dst2:
    G.add_node(node,title=node,color='black')
    
G.add_edges_from(edges_malwarename_targets,color='skyblue')

graph_node_degrees = dict(G.degree())
for nodename,nodedegree in graph_node_degrees.items():
    # pick the multiplier per node: low-degree nodes get a larger one so they stay visible
    node_scale = 3 if nodedegree < 5 else graph_scale
    G.nodes[nodename]['weight'] = nodedegree * node_scale
    G.nodes[nodename]['size'] = nodedegree * node_scale

# draw the graph
nt = Network(notebook=True, directed=True, filter_menu=True, height="800px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(True)
nt.show('nx.html')

nx.html


In [22]:
# Build the edge lists for every relationship we may want to graph.
# Rows with a missing value on either side are dropped so no NaN nodes end up in the graphs.
df_edges = pipedream_modules_df[['mitre_id','mitre_tactic']].dropna().drop_duplicates()
edges_mid_mta = list(zip(df_edges['mitre_id'],df_edges['mitre_tactic']))

df_edges = pipedream_modules_df[['mitre_id','datasource']].dropna().drop_duplicates()
edges_mid_ds = list(zip(df_edges['mitre_id'],df_edges['datasource']))

df_edges = pipedream_modules_df[['mitre_id','datacomponent']].dropna().drop_duplicates()
edges_mid_dsdc = list(zip(df_edges['mitre_id'],df_edges['datacomponent']))

df_edges = pipedream_modules_df[['malware_name','mitre_id']].dropna().drop_duplicates()
edges_maln_mid = list(zip(df_edges['malware_name'],df_edges['mitre_id']))

df_edges = pipedream_modules_df[['malware_name','datasource']].dropna().drop_duplicates()
edges_maln_ds = list(zip(df_edges['malware_name'],df_edges['datasource']))

df_edges = pipedream_modules_df[['datasource','datacomponent']].dropna().drop_duplicates()
edges_ds_dc = list(zip(df_edges['datasource'],df_edges['datacomponent']))

df_edges = pipedream_modules_df[['targets','mitre_tactic']].dropna().drop_duplicates()
edges_targets_mta = list(zip(df_edges['targets'],df_edges['mitre_tactic']))

df_edges = pipedream_modules_df[['malware_name','targets']].dropna().drop_duplicates()
edges_maln_targets = list(zip(df_edges['malware_name'],df_edges['targets']))

df_edges = pipedream_modules_df[['targets','mitre_id']].dropna().drop_duplicates()
edges_mid_targets = list(zip(df_edges['mitre_id'],df_edges['targets']))

df_edges = pipedream_modules_df[['mitre_id','mitigations']].dropna().drop_duplicates()
edges_mid_mitigation = list(zip(df_edges['mitre_id'],df_edges['mitigations']))

df_edges = pipedream_modules_df[['malware_name','mitigations']].dropna().drop_duplicates()
edges_maln_mitigation = list(zip(df_edges['malware_name'],df_edges['mitigations']))

In [23]:
# Graph the malware->mitre_id
G = nx.DiGraph()

#unique_nodes = []
#for edge in edges_maln_mid:
    

#unique_nodes.extend(unique_mitre_id)
#unique_nodes.extend(unique_mitre_tactic)
#unique_nodes.extend(unique_datasource)
#unique_nodes.extend(unique_datacomponent)
#unique_nodes.extend(unique_malware)
#unique_nodes.extend(unique_platform)

# Add the malware modules as red nodes
unique_malware = pipedream_modules_df['malware_name'].dropna().unique()
for node in unique_malware:
    G.add_node(node,color='red')
#G.add_nodes_from(unique_nodes)

all_edges = []
#all_edges.extend(edges_mid_mta)
#all_edges.extend(edges_mid_ds)
#all_edges.extend(edges_mid_dsdc)
all_edges.extend(edges_maln_mid)
#all_edges.extend(edges_maln_ds)
#all_edges.extend(edges_ds_dc)
#all_edges.extend(edges_platform_mta)
#all_edges.extend(edges_maln_platform)
#all_edges.extend(edges_mid_platform)

G.add_edges_from(all_edges,color='black')
nt = Network(notebook=True, directed=True, filter_menu=True, height="1024px", width="100%")
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')

In [None]:
# Graph the mitre_id -> targets
G = nx.DiGraph()

all_edges = []
#all_edges.extend(edges_mid_mta)
#all_edges.extend(edges_mid_ds)
#all_edges.extend(edges_mid_dsdc)
#all_edges.extend(edges_maln_mid)
#all_edges.extend(edges_maln_ds)
#all_edges.extend(edges_ds_dc)
#all_edges.extend(edges_platform_mta)
#all_edges.extend(edges_maln_platform)
all_edges.extend(edges_mid_targets)

G.add_edges_from(all_edges)

nt = Network(notebook=True, directed=True, filter_menu=True)
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')

In [None]:
# Graph the malware -> targets
G = nx.DiGraph()

all_edges = []
#all_edges.extend(edges_mid_mta)
#all_edges.extend(edges_mid_ds)
#all_edges.extend(edges_mid_dsdc)
#all_edges.extend(edges_maln_mid)
#all_edges.extend(edges_maln_ds)
#all_edges.extend(edges_ds_dc)
#all_edges.extend(edges_platform_mta)
#all_edges.extend(edges_maln_platform)
all_edges.extend(edges_maln_targets)

G.add_edges_from(all_edges)

nt = Network(notebook=True, directed=True, filter_menu=True)
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.toggle_physics(False)
nt.show('nx.html')

In [None]:
# Graph the malware -> mitigations
G = nx.DiGraph()

all_edges = []
#all_edges.extend(edges_mid_mta)
#all_edges.extend(edges_mid_ds)
#all_edges.extend(edges_mid_dsdc)
#all_edges.extend(edges_maln_mid)
#all_edges.extend(edges_maln_ds)
#all_edges.extend(edges_ds_dc)
#all_edges.extend(edges_platform_mta)
#all_edges.extend(edges_maln_platform)
all_edges.extend(edges_maln_mitigation)

G.add_edges_from(all_edges)

nt = Network(notebook=True, directed=True, filter_menu=True)
# populates the nodes and edges data structures
nt.from_nx(G)
nt.show_buttons()
nt.show('nx.html')

In [None]:
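# Collect all simple paths between one module and one target into a separate graph G2;
# both node names must be present in G
G2 = nx.DiGraph()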
paths = nx.all_simple_paths(G, source='badomen', target='Input/Output Server')
for path in map(nx.utils.pairwise, paths):
    for p in path:
        nx.add_path(G2, p)
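
The path extraction can be tried on a small toy graph first. This is a minimal sketch: all node names (`MODULE_A`, `T0001`, `Asset X`, ...) are placeholders rather than values from the dataset, and `G_demo` / `G_paths` are hypothetical variable names. It passes each path to `nx.add_path` directly, which adds the same edges as the pairwise loop.

```python
import networkx as nx

# toy graph with placeholder node names (not taken from the dataset)
G_demo = nx.DiGraph()
G_demo.add_edges_from([
    ("MODULE_A", "T0001"),
    ("MODULE_A", "T0002"),
    ("T0001", "Asset X"),
    ("T0002", "Asset X"),
    ("T0002", "Asset Y"),
])

# keep only the edges that lie on a simple path from MODULE_A to Asset X
G_paths = nx.DiGraph()
for path in nx.all_simple_paths(G_demo, source="MODULE_A", target="Asset X"):
    nx.add_path(G_paths, path)

print(sorted(G_paths.edges()))
```

`Asset Y` is not part of the resulting graph, because it does not lie on any path between the chosen source and target.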

In [None]:
nt = Network(notebook=True, directed=True, filter_menu=True)
# populates the nodes and edges data structures
nt.from_nx(G2)
nt.show_buttons()
nt.show('nx.html')

# Statistical overview
Let's start with a preliminary statistical overview of the data. For example, we ask the following questions:

* Data sources
  * Which datasource occurs the most?
  * Which datacomponent occurs the most?
  * Which platform occurs the most?
* Which MITRE tactic occurs the most?
* Which MITRE tactic occurs per platform?
* Which malware occurs per MITRE tactic?
* Which malware occurs per platform?
* Which datasource occurs per platform?
* Which malware occurs per datasource?
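
Most of these questions reduce to counting distinct techniques per group. The sketch below shows the pattern on a toy frame; `demo_df` is a hypothetical stand-in for `pipedream_modules_df`, and all of its values are made up for illustration.

```python
import pandas as pd

# toy stand-in for pipedream_modules_df; all values are made up
demo_df = pd.DataFrame({
    "malware_name": ["MODULE_A", "MODULE_A", "MODULE_A", "MODULE_B", "MODULE_B"],
    "mitre_id":     ["T0001", "T0001", "T0002", "T0002", "T0003"],
    "datasource":   ["Network Traffic", "Process", "Network Traffic", "Network Traffic", None],
})

# number of distinct techniques associated with each data source
techniques_per_source = (
    demo_df.dropna(subset=["datasource"])
           .groupby("datasource")["mitre_id"]
           .nunique()
           .sort_values(ascending=False)
)
print(techniques_per_source)
```

Using `nunique` instead of a plain row count means a technique that shows up for several modules is only counted once per data source.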

In [None]:
# A reminder of what our data looks like
pipedream_modules_df.head()

## Data sources

### Most frequent data source

In [None]:
t = pipedream_modules_df.drop(pipedream_modules_df[pipedream_modules_df['datasource'] == 'nan'].index)
res = t.groupby('datasource')['mitre_id'].agg('nunique')
res.sort_values(ascending=False)

### Most frequent data component

In [None]:
t = pipedream_modules_df.drop(pipedream_modules_df[pipedream_modules_df['datacomponent'] == 'nan'].index)
res = t.groupby('datacomponent')['mitre_id'].agg('nunique')
res.sort_values(ascending=False)

### Data sources and components grouped

In [None]:
t = pipedream_modules_df.drop(pipedream_modules_df[pipedream_modules_df['datasource'] == 'nan'].index)
t = t.drop(t[t['datacomponent'] == 'nan'].index)
res = t.groupby(['datasource','datacomponent'])['mitre_id'].agg('nunique')
print(res)

## Platform

### Techniques per module

In [None]:
pd.set_option('display.max_rows', None)
res = pipedream_modules_df.groupby(['mitre_id','malware_name'])['mitre_id'].agg('nunique')
print(res)

### Techniques per platform

In [None]:
res = pipedream_modules_df.groupby(['platform','mitre_id'])['mitre_id'].agg('nunique')
res.sort_values(ascending=False)

### Tactics per platform

In [None]:
pipedream_modules_df.groupby(['platform','mitre_tactic'])['mitre_id'].agg('nunique')

### Techniques per tactic

In [None]:
res = pipedream_modules_df.groupby('mitre_tactic')['mitre_id'].agg('nunique')
res.sort_values(ascending=False)

In [None]:
pipedream_modules_df.groupby(['mitre_tactic','malware_name'])['mitre_id'].agg('nunique')

In [None]:
pipedream_modules_df.groupby(['platform','malware_name'])['mitre_id'].agg('nunique')

In [None]:
pipedream_modules_df.groupby(['platform','datasource'])['mitre_id'].agg('nunique')

In [None]:
t = pipedream_modules_df.drop(pipedream_modules_df[pipedream_modules_df['datasource'] == 'nan'].index)
t.groupby(['datasource','malware_name'])['mitre_id'].agg('nunique')