# ATT&CK® Data Sources Definition

---------------------------------

* **Author**: Jose Luis Rodriguez - [@Cyb3rPandaH](https://twitter.com/Cyb3rPandaH)
* **Organization**: [MITRE ATT&CK](https://attack.mitre.org/)
* **Blog References**:
 - [Defining ATT&CK Data Sources, Part I: Enhancing the Current State](https://medium.com/mitre-attack/defining-attack-data-sources-part-i-4c39e581454f)
 - [Defining ATT&CK Data Sources, Part II: Operationalizing the Methodology](https://medium.com/mitre-attack/defining-attack-data-sources-part-ii-1fc98738ba5b)

## Goal & Scope

This notebook provides basic examples of how to merge the **current data sources information** from **ATT&CK** with the **new metadata** provided for **data source objects**. The examples cover (sub)techniques for the Windows platform within the Enterprise matrix.

## Requirements

* [Python 3](https://www.python.org)
* [attackcti](https://pypi.org/project/attackcti/)
* [pandas](https://pandas.pydata.org/)
* [PyYAML](https://pyyaml.org/wiki/PyYAML)

## First: Gathering current ATT&CK data sources metadata

* Importing python libraries and modules

In [1]:
# Importing library to interact with up to date ATT&CK content available in STIX format via public TAXII server
from attackcti import attack_client

# Importing library to manipulate data
import pandas as pd
from pandas import json_normalize

* Getting (sub)techniques

In [2]:
# Instantiating attack_client class
lift = attack_client()

# Collecting all techniques (Revoked and not revoked) for windows platform within the enterprise matrix
attck = lift.get_techniques_by_platform(name = 'Windows', stix_format = False)

# Removing revoked techniques
attck = lift.remove_revoked(attck)

# Generating a dataframe with information collected
attck = json_normalize(attck)

# Selecting columns
attck = attck[['tactic','technique_id','technique','data_sources']]

# Showing information collected
attck.head()

Unnamed: 0,tactic,technique_id,technique,data_sources
0,"[defense-evasion, persistence, command-and-con...",T1205.001,Port Knocking,"[Netflow/Enclave netflow, Packet capture]"
1,[defense-evasion],T1564.006,Run Virtual Instance,"[Packet capture, Host network interface, Windo..."
2,[defense-evasion],T1564.005,Hidden File System,"[File monitoring, Windows Registry]"
3,"[persistence, privilege-escalation, defense-ev...",T1574.012,COR_PROFILER,"[Windows Registry, File monitoring, Process mo..."
4,[defense-evasion],T1480.001,Environmental Keying,[Process monitoring]


* Splitting data_sources field

In [3]:
attck = attck.explode('data_sources').reset_index(drop=True)
attck.head()

Unnamed: 0,tactic,technique_id,technique,data_sources
0,"[defense-evasion, persistence, command-and-con...",T1205.001,Port Knocking,Netflow/Enclave netflow
1,"[defense-evasion, persistence, command-and-con...",T1205.001,Port Knocking,Packet capture
2,[defense-evasion],T1564.006,Run Virtual Instance,Packet capture
3,[defense-evasion],T1564.006,Run Virtual Instance,Host network interface
4,[defense-evasion],T1564.006,Run Virtual Instance,Windows Registry
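
One detail of the splitting step is worth checking on a small example: `explode` emits a `NaN` row for any technique whose `data_sources` list is empty, so such techniques remain in the data frame without a data source. The sketch below uses toy rows; `T9999` is a hypothetical placeholder ID, not a real technique.

```python
import pandas as pd

# Toy rows shaped like the data frame above; T9999 is a hypothetical placeholder
sample = pd.DataFrame({
    'technique_id': ['T1205.001', 'T1480.001', 'T9999'],
    'data_sources': [
        ['Netflow/Enclave netflow', 'Packet capture'],
        ['Process monitoring'],
        [],  # a technique with no data sources listed
    ],
})

# One row per (technique, data source) pair
exploded = sample.explode('data_sources').reset_index(drop=True)

# An empty list becomes a single NaN row rather than disappearing
missing = exploded[exploded['data_sources'].isna()]

print(exploded)
print(missing['technique_id'].tolist())
```

If those rows are not wanted downstream, `exploded.dropna(subset=['data_sources'])` would remove them.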


## Second: Gathering new metadata of data source objects

* Importing python libraries

In [4]:
import yaml

* Getting metadata of data source objects

In [5]:
filePath = 'attack_data_sources.yaml'
# Reading the YAML file; the context manager closes it even if parsing fails
with open(filePath, 'r') as yamlFile:
    metadata = yaml.safe_load(yamlFile)
metadata = json_normalize(metadata)
metadata.head()

Unnamed: 0,name,definition,collection_layers,platforms,contributors,data_components,references
0,Service,Information about software programs that run i...,[host],[Windows],[ATT&CK],"[{'name': 'service creation', 'type': 'activit...",[https://docs.microsoft.com/en-us/dotnet/frame...
1,Module,"Information about portable executable files, s...",[host],[Windows],[ATT&CK],"[{'name': 'module load', 'type': 'activity', '...",[https://docs.microsoft.com/en-us/windows/win3...
2,WMI object,Information about objects from the system clas...,[host],[Windows],[ATT&CK],"[{'name': 'wmi object context', 'type': 'infor...",[https://docs.microsoft.com/en-us/windows/win3...
3,File,Information about file objects that represent ...,[host],[Windows],[ATT&CK],"[{'name': 'file creation', 'type': 'activity',...",[https://docs.microsoft.com/en-us/windows/win3...
4,Named pipe,Information about mechanisms that allow inter-...,[host],[Windows],[ATT&CK],"[{'name': 'named pipe creation', 'relationship...",[https://docs.microsoft.com/en-us/windows/win3...


* Splitting data components content

In [6]:
# Splitting rows for data_components list
metadata = metadata.explode('data_components').reset_index(drop=True).rename(columns={'name':'data_sources'})

# Splitting columns for data_components dict
metadata = metadata['data_components'].apply(pd.Series).merge(metadata,left_index=True,right_index=True)\
.reset_index(drop=True).drop(['data_components'], axis = 1).rename(columns={'name':'data_components'})

# Splitting rows for relationships list
metadata = metadata.explode('relationships').reset_index(drop=True)

# Splitting columns for relationships dict
metadata = metadata['relationships'].apply(pd.Series).merge(metadata,left_index=True,right_index=True)\
.reset_index(drop=True).drop(['relationships'], axis = 1)

# Replacing Null values in target_data_element
metadata['target_data_element'] = metadata['target_data_element'].fillna(' ')

# Concatenating: Source + Relationship + Target
metadata['relationships'] = metadata['source_data_element']+'-'+metadata['relationship']+'-'+\
                            metadata['target_data_element']

# Deleting columns
dataSources = metadata.drop(['source_data_element','relationship','target_data_element'], axis = 1)

dataSources.head()

Unnamed: 0,data_components,type,data_sources,definition,collection_layers,platforms,contributors,references,relationships
0,service creation,activity,Service,Information about software programs that run i...,[host],[Windows],[ATT&CK],[https://docs.microsoft.com/en-us/dotnet/frame...,user-created-service
1,module load,activity,Module,"Information about portable executable files, s...",[host],[Windows],[ATT&CK],[https://docs.microsoft.com/en-us/windows/win3...,process-loaded-dll
2,module load,activity,Module,"Information about portable executable files, s...",[host],[Windows],[ATT&CK],[https://docs.microsoft.com/en-us/windows/win3...,process-loaded-executable
3,wmi object context,information,WMI object,Information about objects from the system clas...,[host],[Windows],[ATT&CK],[https://docs.microsoft.com/en-us/windows/win3...,wmi subscription-created-
4,wmi object creation,activity,WMI object,Information about objects from the system clas...,[host],[Windows],[ATT&CK],[https://docs.microsoft.com/en-us/windows/win3...,user-created-wmi filter
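
To make the nesting easier to follow, the same flattening can be sketched with plain Python loops on a single toy record. The record below is hypothetical: its field names are inferred from the columns shown above, and its values are only illustrative of what `attack_data_sources.yaml` might contain.

```python
import pandas as pd

# Hypothetical record shaped like one entry of the YAML file
# (field names inferred from the output above; values are illustrative)
records = [{
    'name': 'Module',
    'data_components': [{
        'name': 'module load',
        'type': 'activity',
        'relationships': [
            {'source_data_element': 'process', 'relationship': 'loaded',
             'target_data_element': 'dll'},
            {'source_data_element': 'process', 'relationship': 'loaded',
             'target_data_element': 'executable'},
        ],
    }],
}]

# One output row per data source / data component / relationship
rows = [
    {
        'data_sources': source['name'],
        'data_components': component['name'],
        'type': component.get('type'),
        # Source + Relationship + Target; a missing target becomes ' '
        'relationships': '-'.join([
            rel['source_data_element'],
            rel['relationship'],
            rel.get('target_data_element') or ' ',
        ]),
    }
    for source in records
    for component in source['data_components']
    for rel in component['relationships']
]

flat = pd.DataFrame(rows)
print(flat)
```

This is not a replacement for the pandas version above; it only shows which level of the structure each `explode` call unpacks.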


## Third: Mapping (sub)techniques to data components & relationships

* Getting new data sources names

According to the proposed methodology, data source names should refer to the main data element they relate to. As a result, some of the current names might change. Therefore, to relate **(sub)techniques** to **data components**, we need to update the data source names. We provide a YAML file listing the data sources whose names have changed, and we use it to update the content of our data frame.

In [7]:
namesPath = 'new_data_sources_names.yaml'
# Reading the YAML file; the context manager closes it even if parsing fails
with open(namesPath, 'r') as yamlNamesFile:
    names = yaml.safe_load(yamlNamesFile)
names = pd.DataFrame(list(names[0].items()), columns=['data_sources', 'new_names'])
names.head()

Unnamed: 0,data_sources,new_names
0,Sensor health and status,Sensor log
1,Access tokens,Access token
2,PowerShell logs,Powershell log
3,API monitoring,API
4,Application logs,Application log


* Updating data sources names

In [8]:
attck = pd.merge(attck, names, on = 'data_sources', how = 'left')
namesUpdate = attck['new_names'].fillna(attck['data_sources'])
attck = attck.drop(columns = ['new_names']).assign(data_sources = namesUpdate)
attck = attck.drop_duplicates(subset = ['technique_id','data_sources'])\
        .sort_values(by = ['technique'])\
        .reset_index(drop = True)
attck.head()

Unnamed: 0,tactic,technique_id,technique,data_sources
0,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,API
1,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Process
2,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,File
3,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Windows registry
4,"[defense-evasion, privilege-escalation]",T1134,Access Token Manipulation,Process
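
The merge-and-fill pattern above can also be written with `Series.map`, which may be easier to read when the mapping is already a dictionary. This is a minimal sketch on toy data: the old and new names come from the mapping table shown earlier, while the pairing of techniques with data sources is only illustrative.

```python
import pandas as pd

# Old -> new names, taken from the first rows of the mapping shown earlier
new_names = {
    'API monitoring': 'API',
    'PowerShell logs': 'Powershell log',
    'Access tokens': 'Access token',
}

# Toy rows; the technique/data source pairing is illustrative only
sample = pd.DataFrame({
    'technique_id': ['T1548', 'T1548', 'T1134'],
    'data_sources': ['API monitoring', 'File', 'Access tokens'],
})

# map() yields NaN for names that are not in the dictionary,
# so fillna() keeps the original name in those rows
sample['data_sources'] = (
    sample['data_sources'].map(new_names).fillna(sample['data_sources'])
)

print(sample)
```

Names without a new equivalent (here `File`) pass through unchanged, which matches the behavior of the left merge followed by `fillna`.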


* Mapping techniques to data components and relationships

In [9]:
techniques = pd.merge(attck, dataSources, on = 'data_sources', how = 'left')\
             [['tactic','technique_id','technique','data_sources','data_components','relationships']]

techniques.head()

Unnamed: 0,tactic,technique_id,technique,data_sources,data_components,relationships
0,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,API,,
1,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Process,process creation,user-created-process
2,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Process,process creation,process-created-process
3,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Process,process modification,process-wrote to-process
4,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Process,process access,process-accessed-process
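
The first row above has empty `data_components` and `relationships` values, which indicates that the `API` data source found no match in the metadata. One way to list such data sources is the `indicator` parameter of `merge`. The sketch below uses toy frames with the same column names; the variable names `attck_sample` and `sources_sample` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the attck and dataSources data frames
attck_sample = pd.DataFrame({
    'technique_id': ['T1548', 'T1548'],
    'data_sources': ['API', 'Process'],
})
sources_sample = pd.DataFrame({
    'data_sources': ['Process', 'Process'],
    'data_components': ['process creation', 'process access'],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge
merged = attck_sample.merge(
    sources_sample, on='data_sources', how='left', indicator=True
)

# Data sources referenced by techniques but absent from the metadata
unmapped = merged.loc[merged['_merge'] == 'left_only', 'data_sources'].unique().tolist()

print(unmapped)
```

Running the same check on the full data frames would show which data sources still lack data component definitions.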


## Use case: T1112 Modify Registry

* What are the recommended data sources?

In [10]:
attck[attck['technique_id'] == 'T1112']

Unnamed: 0,tactic,technique_id,technique,data_sources
635,[defense-evasion],T1112,Modify Registry,Process
636,[defense-evasion],T1112,Modify Registry,Windows registry
637,[defense-evasion],T1112,Modify Registry,File
638,[defense-evasion],T1112,Modify Registry,Windows event logs


* What are the available data components and relationships?

In [11]:
techniques[techniques['technique_id'] == 'T1112']

Unnamed: 0,tactic,technique_id,technique,data_sources,data_components,relationships
3901,[defense-evasion],T1112,Modify Registry,Process,process creation,user-created-process
3902,[defense-evasion],T1112,Modify Registry,Process,process creation,process-created-process
3903,[defense-evasion],T1112,Modify Registry,Process,process modification,process-wrote to-process
3904,[defense-evasion],T1112,Modify Registry,Process,process access,process-accessed-process
3905,[defense-evasion],T1112,Modify Registry,Process,process access,process-requested access-process
3906,[defense-evasion],T1112,Modify Registry,Process,process network connection,process-connected to-port
3907,[defense-evasion],T1112,Modify Registry,Process,process network connection,process-connected to-ip
3908,[defense-evasion],T1112,Modify Registry,Process,process network connection,process-connected to-host
3909,[defense-evasion],T1112,Modify Registry,Process,process network connection,process-connected from-port
3910,[defense-evasion],T1112,Modify Registry,Process,process network connection,process-connected from-ip
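
Because each data component can carry several relationships, a grouped view is often easier to scan than the long table. The following sketch groups relationships per data component; the rows are copied from the first lines of the output above, and `sample` is a hypothetical stand-in for the `techniques` data frame.

```python
import pandas as pd

# First rows of the T1112 output above
sample = pd.DataFrame({
    'technique_id': ['T1112'] * 4,
    'data_components': ['process creation', 'process creation',
                        'process modification', 'process access'],
    'relationships': ['user-created-process', 'process-created-process',
                      'process-wrote to-process', 'process-accessed-process'],
})

# One entry per data component, holding the list of its relationships;
# sort=False keeps the order in which components first appear
summary = (
    sample[sample['technique_id'] == 'T1112']
    .groupby('data_components', sort=False)['relationships']
    .agg(list)
)

print(summary)
```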


# Thank you :)