# ATT&CK® Data Sources Definition

---------------------------------

* **Author**: Jose Luis Rodriguez - [@Cyb3rPandaH](https://twitter.com/Cyb3rPandaH)
* **Organization**: [MITRE ATT&CK](https://attack.mitre.org/)
* **Blog References**:
  - [Defining ATT&CK Data Sources, Part I: Enhancing the Current State](https://medium.com/mitre-attack/defining-attack-data-sources-part-i-4c39e581454f)
  - [Defining ATT&CK Data Sources, Part II: Operationalizing the Methodology](https://medium.com/mitre-attack/defining-attack-data-sources-part-ii-1fc98738ba5b)

## Goal & Scope

This notebook provides basic examples of how you can merge **current data source information** from **ATT&CK** with the **new metadata** proposed for **data source objects**. The examples consider (sub)techniques for the Windows platform within the Enterprise matrix.

## Requirements

* [Python 3](https://www.python.org)

* [attackcti](https://pypi.org/project/attackcti/)

* [pandas](https://pandas.pydata.org/)

* [PyYAML](https://pyyaml.org/wiki/PyYAML) (imported as `yaml`)

## First: Gathering current ATT&CK data sources metadata

* Importing Python libraries and modules

```python
# Importing library to interact with up-to-date ATT&CK content available in STIX format via the public TAXII server
from attackcti import attack_client

# Importing libraries to manipulate data
import pandas as pd
from pandas import json_normalize

# Do not truncate pandas output
pd.set_option('display.max_colwidth', None)
# The short pattern 'max_rows' is ambiguous in recent pandas releases, so use the full option name
pd.set_option('display.max_rows', None)
```

* Getting (sub)techniques

```python
# Instantiating the attack_client class
lift = attack_client()

# Collecting all techniques (revoked and not revoked) for the Windows platform within the Enterprise matrix
attck = lift.get_techniques_by_platform(name = 'Windows', stix_format = False)

# Removing revoked techniques
attck = lift.remove_revoked(attck)

# Generating a dataframe with the information collected
attck = json_normalize(attck)

# Selecting columns
attck = attck[['tactic','technique_id','technique','data_sources']]

# Showing the information collected
attck.head(3)
```

|   | tactic | technique_id | technique | data_sources |
|---|---|---|---|---|
| 0 | [defense-evasion, persistence, command-and-control] | T1205.001 | Port Knocking | [Netflow/Enclave netflow, Packet capture] |
| 1 | [defense-evasion] | T1564.006 | Run Virtual Instance | [Packet capture, Host network interface, Windows Registry, File monitoring, Process monitoring, Process command-line parameters] |
| 2 | [defense-evasion] | T1564.005 | Hidden File System | [File monitoring, Windows Registry] |


* Splitting the `data_sources` field

```python
# Spread each data_sources list across columns, reattach the other fields,
# then melt back to one row per (technique, data source) pair
attck = (attck['data_sources'].apply(pd.Series)
         .merge(attck, left_index = True, right_index = True)
         .drop(["data_sources"], axis = 1)
         .melt(id_vars = ['tactic','technique_id','technique'], value_name = "data_sources")
         .drop("variable", axis = 1)
         .dropna(subset=['data_sources']))
attck.head()
```

|   | tactic | technique_id | technique | data_sources |
|---|---|---|---|---|
| 0 | [defense-evasion, persistence, command-and-control] | T1205.001 | Port Knocking | Netflow/Enclave netflow |
| 1 | [defense-evasion] | T1564.006 | Run Virtual Instance | Packet capture |
| 2 | [defense-evasion] | T1564.005 | Hidden File System | File monitoring |
| 3 | [persistence, privilege-escalation, defense-evasion] | T1574.012 | COR_PROFILER | Windows Registry |
| 4 | [defense-evasion] | T1480.001 | Environmental Keying | Process monitoring |
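
The `apply(pd.Series)` + `melt` chain above turns each list into one row per element. On pandas 0.25 or later, `DataFrame.explode` does the same in a single call. A minimal sketch on a toy frame built from two of the rows shown earlier (not the notebook's full `attck` frame):

```python
import pandas as pd

# Toy frame: two techniques with a list of data sources each (rows taken from the output above)
attck = pd.DataFrame({
    'technique_id': ['T1205.001', 'T1564.005'],
    'technique': ['Port Knocking', 'Hidden File System'],
    'data_sources': [['Netflow/Enclave netflow', 'Packet capture'],
                     ['File monitoring', 'Windows Registry']],
})

# explode() emits one row per list element and repeats the other columns;
# dropna() removes techniques that had no data sources listed
long = (attck.explode('data_sources')
             .dropna(subset=['data_sources'])
             .reset_index(drop=True))
print(long)
```

Unlike the `melt` version, `explode` keeps each technique's data sources together (rows are ordered by technique rather than by list position), so `head()` would show different rows than the output above.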


## Second: Gathering new metadata of data source objects

* Importing the YAML library

```python
import yaml
```

* Getting metadata of data source objects

```python
# Loading the new data source metadata (a YAML list with one entry per data source)
filePath = 'attack_data_sources.yaml'
with open(filePath, 'r') as yamlFile:
    metadata = yaml.safe_load(yamlFile)
metadata = json_normalize(metadata)
metadata.head(3)
```

|   | name | definition | collection_layers | platforms | contributors | data_components | references |
|---|---|---|---|---|---|---|---|
| 0 | Service | Information about software programs that run in the background and typically start with the operating system. | [host] | [Windows] | [Jose Rodriguez @Cyb3rPandaH] | [{'name': 'service creation', 'type': 'activity', 'relationships': [{'source_data_element': 'user', 'relationship': 'created', 'target_data_element': 'service'}]}] | [https://docs.microsoft.com/en-us/dotnet/framework/windows-services/introduction-to-windows-service-applications, https://www.linux.com/news/introduction-services-runlevels-and-rcd-scripts/] |
| 1 | Module | Information about portable executable files, such as a dll or an executable, consisting of one or more classes and interfaces. | [host] | [Windows] | [Jose Rodriguez @Cyb3rPandaH] | [{'name': 'module load', 'type': 'activity', 'relationships': [{'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'dll'}, {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'executable'}]}] | [https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibrarya, https://docs.microsoft.com/en-us/dotnet/api/system.reflection.module?view=netcore-3.1] |
| 2 | WMI object | Information about objects from the system classes, such as filters and consumers, that support Windows Management Instrumentation activitites. | [host] | [Windows] | [Jose Rodriguez @Cyb3rPandaH] | [{'name': 'wmi object context', 'type': 'information', 'relationships': [{'source_data_element': 'wmi subscription', 'relationship': 'created', 'target_data_element': None}]}, {'name': 'wmi object creation', 'type': 'activity', 'relationships': [{'source_data_element': 'user', 'relationship': 'created', 'target_data_element': 'wmi filter'}, {'source_data_element': 'user', 'relationship': 'created', 'target_data_element': 'wmi consumer'}, {'source_data_element': 'user', 'relationship': 'created', 'target_data_element': 'wmi subscription'}]}, {'name': 'wmi object deletion', 'type': 'activity', 'relationships': [{'source_data_element': 'user', 'relationship': 'deleted', 'target_data_element': 'wmi filter'}, {'source_data_element': 'user', 'relationship': 'deleted', 'target_data_element': 'wmi consumer'}, {'source_data_element': 'user', 'relationship': 'deleted', 'target_data_element': 'wmi subscription'}]}] | [https://docs.microsoft.com/en-us/windows/win32/wmisdk/wmi-system-classes] |
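
The contents of `attack_data_sources.yaml` are not shown here. The sketch below assumes a top-level YAML list whose entries carry the fields visible in the table above; it parses one such entry (the Service row, abridged to a single reference) from an inline string and flattens it with `pd.json_normalize`:

```python
import yaml
import pandas as pd

# Assumed shape of one entry in attack_data_sources.yaml, reconstructed from the columns above
sample = """
- name: Service
  definition: Information about software programs that run in the background and typically start with the operating system.
  collection_layers: [host]
  platforms: [Windows]
  contributors: ['Jose Rodriguez @Cyb3rPandaH']
  data_components:
    - name: service creation
      type: activity
      relationships:
        - source_data_element: user
          relationship: created
          target_data_element: service
  references:
    - https://docs.microsoft.com/en-us/dotnet/framework/windows-services/introduction-to-windows-service-applications
"""

records = yaml.safe_load(sample)      # a list of dicts, one per data source
metadata = pd.json_normalize(records)  # top-level keys become columns
print(metadata[['name', 'platforms']])
```

Only top-level keys become columns; `data_components` stays a list of dictionaries inside a single cell, which is why the following steps iterate over it explicitly.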


* Mapping data sources to data components

```python
# Building a {data source name: [data component names]} dictionary
dataComponentsNamesDict = {}
for i in range(len(metadata.index)):
    key = metadata['name'][i]
    if key not in dataComponentsNamesDict:
        dataComponentsNamesDict[key] = []
    for dc in metadata['data_components'][i]:
        dataComponentsNamesDict[key].append(dc['name'])

dataSources = pd.DataFrame(list(dataComponentsNamesDict.items()), columns=['data_sources', 'data_components'])
dataSources.head(3)
```

|   | data_sources | data_components |
|---|---|---|
| 0 | Service | [service creation] |
| 1 | Module | [module load] |
| 2 | WMI object | [wmi object context, wmi object creation, wmi object deletion] |
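
The loop above can also be written column-wise. This sketch, on a toy two-row `metadata` frame (relationships omitted for brevity), pulls the component names out of each nested list and selects fields by column name rather than by position:

```python
import pandas as pd

# Toy metadata with the same nested shape as the notebook's `metadata` frame
metadata = pd.DataFrame({
    'name': ['Service', 'Module'],
    'data_components': [
        [{'name': 'service creation', 'type': 'activity'}],
        [{'name': 'module load', 'type': 'activity'}],
    ],
})

# One row per data source, holding the list of its component names
dataSources = pd.DataFrame({
    'data_sources': metadata['name'],
    'data_components': metadata['data_components'].apply(
        lambda comps: [dc['name'] for dc in comps]),
})
print(dataSources)
```

One difference from the dictionary-based loop: if the same data source name appeared twice in the YAML file, the loop would merge both entries' components under one key, while this version keeps two separate rows.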


* Mapping data components to relationships

a) Dictionary format: each relationship is kept as a dictionary

```python
# Building a {data component name: [relationship dicts]} dictionary
dataComponentsDict = {}
for ds in metadata['data_components']:
    for dc in ds:
        key = dc['name']
        for rel in dc['relationships']:
            if key not in dataComponentsDict:
                dataComponentsDict[key] = []
            dataComponentsDict[key].append(rel)

dataComponents = pd.DataFrame(list(dataComponentsDict.items()), columns=['data_components', 'relationships'])
dataComponents.head(3)
```

|   | data_components | relationships |
|---|---|---|
| 0 | service creation | [{'source_data_element': 'user', 'relationship': 'created', 'target_data_element': 'service'}] |
| 1 | module load | [{'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'dll'}, {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'executable'}] |
| 2 | wmi object context | [{'source_data_element': 'wmi subscription', 'relationship': 'created', 'target_data_element': None}] |
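
Both mappings in this step can also start from a single flat table. `pd.json_normalize` accepts a nested `record_path` plus `meta` fields; the sketch below, on a toy record for the Module data source (assumed to mirror one YAML entry), emits one row per relationship and copies the data source and data component names onto each row:

```python
import pandas as pd

# Toy record shaped like one attack_data_sources.yaml entry (assumed)
records = [
    {'name': 'Module',
     'data_components': [
         {'name': 'module load', 'type': 'activity',
          'relationships': [
              {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'dll'},
              {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'executable'},
          ]},
     ]},
]

# record_path walks data_components -> relationships; meta attaches the
# data source name ('name') and the parent component name ('data_components.name')
flat = pd.json_normalize(
    records,
    record_path=['data_components', 'relationships'],
    meta=['name', ['data_components', 'name']],
)
print(flat)
```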


b) Friendly format: each relationship is rendered as a `source-relationship-target` string

```python
# Building a {data component name: ['source-relationship-target' strings]} dictionary
dataComponentsDict = {}
for ds in metadata['data_components']:
    for dc in ds:
        key = dc['name']
        for rel in dc['relationships']:
            # A missing target element is rendered as a blank
            target = ' ' if rel['target_data_element'] is None else rel['target_data_element']
            x = str(rel['source_data_element']) + '-' + str(rel['relationship']) + '-' + target
            if key not in dataComponentsDict:
                dataComponentsDict[key] = []
            dataComponentsDict[key].append(x)

dataComponents = pd.DataFrame(list(dataComponentsDict.items()), columns=['data_components', 'relationships'])
dataComponents.head(3)
```

|   | data_components | relationships |
|---|---|---|
| 0 | service creation | [user-created-service] |
| 1 | module load | [process-loaded-dll, process-loaded-executable] |
| 2 | wmi object context | [wmi subscription-created- ] |
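
The string formatting above can be factored into a small helper. `to_friendly` is a hypothetical name, not part of the notebook; grouping with `agg(list)` then rebuilds the per-component lists. The relationships come from the outputs above:

```python
import pandas as pd

def to_friendly(rel: dict) -> str:
    """Render a relationship dict as 'source-relationship-target' (hypothetical helper).

    A missing target is rendered as ' ' to match the notebook's output.
    """
    target = ' ' if rel['target_data_element'] is None else rel['target_data_element']
    return f"{rel['source_data_element']}-{rel['relationship']}-{target}"

pairs = [
    ('module load', {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'dll'}),
    ('module load', {'source_data_element': 'process', 'relationship': 'loaded', 'target_data_element': 'executable'}),
    ('wmi object context', {'source_data_element': 'wmi subscription', 'relationship': 'created', 'target_data_element': None}),
]
df = pd.DataFrame({'data_components': [c for c, _ in pairs],
                   'relationships': [to_friendly(r) for _, r in pairs]})

# sort=False keeps components in order of first appearance
dataComponents = (df.groupby('data_components', sort=False)['relationships']
                    .agg(list)
                    .reset_index())
print(dataComponents)
```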


## Third: Mapping (sub)techniques to data components and relationships

* Getting the new data source names

According to the proposed methodology, data source names should refer to the main data element they relate to. Because of this, some of the current names might change. Therefore, to relate **(sub)techniques** to **data components**, we first need to update the data source names. We provide a YAML file listing the data sources whose names have changed, and we use it to update the content of our data frame.

```python
# Loading the old -> new data source names (the first list entry in the YAML file is a dictionary)
namesPath = 'new_data_sources_names.yaml'
with open(namesPath, 'r') as yamlNamesFile:
    names = yaml.safe_load(yamlNamesFile)
names = pd.DataFrame(list(names[0].items()), columns=['data_sources', 'new_names'])
names.head(3)
```

|   | data_sources | new_names |
|---|---|---|
| 0 | Sensor health and status | Sensor log |
| 1 | Access tokens | Access token |
| 2 | PowerShell logs | Powershell log |


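* Updating data source names

```python
# Attaching the new name (if any) to each data source, keeping the old name where none exists
attck = pd.merge(attck, names, on = 'data_sources', how = 'left')
namesUpdate = attck['new_names'].fillna(attck['data_sources'])
attck = attck.drop(columns = ['new_names']).assign(data_sources = namesUpdate)

# Renaming can produce duplicate (technique, data source) pairs when several old names map to one new name
attck = (attck.drop_duplicates(subset = ['technique_id','data_sources'])
         .sort_values(by = ['technique'])
         .reset_index(drop = True))
attck.head()
```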

|   | tactic | technique_id | technique | data_sources |
|---|---|---|---|---|
| 0 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File |
| 1 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | API |
| 2 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | Windows registry |
| 3 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | Process |
| 4 | [defense-evasion, privilege-escalation] | T1134 | Access Token Manipulation | Access token |
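
Because the left merge followed by `fillna` only swaps names that have a replacement, `Series.replace` with a dictionary has the same effect. A sketch using the three renames shown above on a toy frame; the duplicate `Access token` row is contrived to show why `drop_duplicates` is still needed:

```python
import pandas as pd

# Old -> new names, taken from the first rows of new_data_sources_names.yaml shown above
rename = {
    'Sensor health and status': 'Sensor log',
    'Access tokens': 'Access token',
    'PowerShell logs': 'Powershell log',
}

# Toy technique/data source pairs (the second row is contrived to collide after renaming)
attck = pd.DataFrame({
    'technique_id': ['T1134', 'T1134', 'T1112'],
    'data_sources': ['Access tokens', 'Access token', 'File'],
})

# replace() swaps exact matches found in the dict and leaves every other value unchanged
attck = (attck.assign(data_sources=attck['data_sources'].replace(rename))
              .drop_duplicates(subset=['technique_id', 'data_sources'])
              .reset_index(drop=True))
print(attck)
```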


* Mapping (sub)techniques to data components

```python
# Attaching the list of data components to each (technique, data source) pair
mappingDataComponents = pd.merge(attck, dataSources, on = 'data_sources', how = 'left')

# One row per (technique, data source, data component); data sources without
# metadata get NaN components and are removed by dropna()
mappingDataComponents = (mappingDataComponents['data_components'].apply(pd.Series)
    .merge(mappingDataComponents, left_index = True, right_index = True)
    .drop(["data_components"], axis = 1)
    .melt(id_vars = ['tactic','technique_id','technique','data_sources'], value_name = "data_components")
    .drop("variable", axis = 1)
    .drop_duplicates(subset = ['technique_id','data_sources','data_components'])
    .dropna(subset=['data_components'])
    .sort_values(by = ['technique'])
    .reset_index(drop = True))

mappingDataComponents.head()
```

|   | tactic | technique_id | technique | data_sources | data_components |
|---|---|---|---|---|---|
| 0 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file creation |
| 1 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | Process | process modification |
| 2 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | Process | process network connection |
| 3 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | Windows registry | Windows registry key access |
| 4 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file access |
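
With `how = 'left'`, data sources that have no entry in the new metadata receive `NaN` in `data_components`, and the `dropna` call above then removes them without notice. Passing `indicator=True` to `pd.merge` makes those rows visible. A toy sketch; which data sources actually lack metadata depends on the YAML file, so the unmatched name here is only illustrative:

```python
import pandas as pd

# Toy inputs: only 'Windows registry' has metadata in this example
attck = pd.DataFrame({
    'technique_id': ['T1112', 'T1112'],
    'data_sources': ['Windows registry', 'Windows event logs'],
})
dataSources = pd.DataFrame({
    'data_sources': ['Windows registry'],
    'data_components': [['Windows registry key access']],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = pd.merge(attck, dataSources, on='data_sources', how='left', indicator=True)

# Data source names referenced by techniques but missing from the metadata
unmatched = merged.loc[merged['_merge'] == 'left_only', 'data_sources'].unique().tolist()
print(unmatched)
```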


* Mapping techniques to relationships

```python
# Attaching the list of relationships to each data component
mappingRelationships = pd.merge(mappingDataComponents, dataComponents, on = 'data_components', how = 'left')

# One row per (technique, data source, data component, relationship)
mappingRelationships = (mappingRelationships['relationships'].apply(pd.Series)
    .merge(mappingRelationships, left_index = True, right_index = True)
    .drop(["relationships"], axis = 1)
    .melt(id_vars = ['tactic','technique_id','technique','data_sources','data_components'], value_name = "relationships")
    .drop("variable", axis = 1)
    .drop_duplicates(subset = ['technique_id','data_sources','data_components','relationships'])
    .dropna(subset=['relationships'])
    .sort_values(by = ['technique','data_sources','data_components'])
    .reset_index(drop = True))

mappingRelationships.head()
```

|   | tactic | technique_id | technique | data_sources | data_components | relationships |
|---|---|---|---|---|---|---|
| 0 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file access | user-accessed-file |
| 1 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file access | user-requested access-file |
| 2 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file access | process-requested access-file |
| 3 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file creation | process-created-file |
| 4 | [privilege-escalation, defense-evasion] | T1548 | Abuse Elevation Control Mechanism | File | file deletion | process-deleted-file |


## Use case: T1112 Modify Registry

* What are the recommended data sources?

```python
attck[attck['technique_id'] == 'T1112']
```

|   | tactic | technique_id | technique | data_sources |
|---|---|---|---|---|
| 633 | [defense-evasion] | T1112 | Modify Registry | File |
| 634 | [defense-evasion] | T1112 | Modify Registry | Windows event logs |
| 635 | [defense-evasion] | T1112 | Modify Registry | Windows registry |
| 636 | [defense-evasion] | T1112 | Modify Registry | Process |


* What are the available data components and relationships?

```python
mappingRelationships[mappingRelationships['technique_id'] == 'T1112']
```

|   | tactic | technique_id | technique | data_sources | data_components | relationships |
|---|---|---|---|---|---|---|
| 3443 | [defense-evasion] | T1112 | Modify Registry | File | file access | user-accessed-file |
| 3444 | [defense-evasion] | T1112 | Modify Registry | File | file access | user-requested access-file |
| 3445 | [defense-evasion] | T1112 | Modify Registry | File | file access | process-requested access-file |
| 3446 | [defense-evasion] | T1112 | Modify Registry | File | file creation | process-created-file |
| 3447 | [defense-evasion] | T1112 | Modify Registry | File | file deletion | process-deleted-file |
| 3448 | [defense-evasion] | T1112 | Modify Registry | Process | process access | process-accessed-process |
| 3449 | [defense-evasion] | T1112 | Modify Registry | Process | process access | process-requested access-process |
| 3450 | [defense-evasion] | T1112 | Modify Registry | Process | process creation | user-created-process |
| 3451 | [defense-evasion] | T1112 | Modify Registry | Process | process creation | process-created-process |
| 3452 | [defense-evasion] | T1112 | Modify Registry | Process | process modification | process-wrote to-process |
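
To read the use case per data component instead of row by row, the relationships can be grouped. A sketch over a few of the T1112 rows shown above; `.query` filters the technique and `agg(list)` collects the relationships of each (data source, data component) pair:

```python
import pandas as pd

# A subset of the T1112 rows shown above
rows = [
    ('T1112', 'File', 'file access', 'user-accessed-file'),
    ('T1112', 'File', 'file access', 'process-requested access-file'),
    ('T1112', 'File', 'file creation', 'process-created-file'),
    ('T1112', 'Process', 'process creation', 'process-created-process'),
]
mappingRelationships = pd.DataFrame(
    rows, columns=['technique_id', 'data_sources', 'data_components', 'relationships'])

# One entry per (data source, data component), holding its list of relationships
summary = (mappingRelationships
           .query("technique_id == 'T1112'")
           .groupby(['data_sources', 'data_components'])['relationships']
           .agg(list))
print(summary)
```

Replacing the `query` string with `technique_id in ['T1112', 'T1548']` would summarize several techniques at once.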


# Thank you :)