# ATT&CK® Data Sources Definition

---------------------------------

* **Author**: Jose Luis Rodriguez - [@Cyb3rPandaH](https://twitter.com/Cyb3rPandaH)
* **Organization**: [MITRE ATT&CK](https://attack.mitre.org/)
* **Blog References**:
  - [Defining ATT&CK Data Sources, Part I: Enhancing the Current State](https://medium.com/mitre-attack/defining-attack-data-sources-part-i-4c39e581454f)
  - [Defining ATT&CK Data Sources, Part II: Operationalizing the Methodology](https://medium.com/mitre-attack/defining-attack-data-sources-part-ii-1fc98738ba5b)

## Goal & Scope

This notebook provides basic examples of how you can merge **current data source information** from **ATT&CK** with the **new metadata** provided for **data source objects**. The examples cover (sub)techniques for the Windows platform within the Enterprise matrix.

## Requirements

* [Python 3](https://www.python.org)
* [attackcti](https://pypi.org/project/attackcti/)
* [pandas](https://pandas.pydata.org/)
* [PyYAML](https://pyyaml.org/wiki/PyYAML) (imported as `yaml`)

## First: Gathering current ATT&CK data sources metadata

* Importing Python libraries and modules

In [1]:
# Importing library to interact with up-to-date ATT&CK content available in STIX format via the public TAXII server
from attackcti import attack_client

# Importing library to manipulate data
import pandas as pd
from pandas import json_normalize

* Getting (sub)techniques

In [2]:
# Instantiating attack_client class
lift = attack_client()

# Collecting all techniques (revoked and not revoked) for the Windows platform within the Enterprise matrix
attck = lift.get_techniques_by_platform(name = 'Windows', stix_format = False)

# Removing revoked techniques
attck = lift.remove_revoked(attck)

# Generating a dataframe with information collected
attck = json_normalize(attck)

# Selecting columns
attck = attck[['tactic','technique_id','technique','data_sources']]

# Showing information collected
attck.head()

,tactic,technique_id,technique,data_sources
0,"[defense-evasion, privilege-escalation]",T1484.002,Domain Trust Modification,"[Windows event logs, PowerShell logs, Azure ac..."
1,"[defense-evasion, privilege-escalation]",T1484.001,Group Policy Modification,[Windows event logs]
2,[credential-access],T1606.002,SAML Tokens,"[Windows event logs, Authentication logs]"
3,[credential-access],T1606.001,Web Cookies,"[Web logs, Authentication logs]"
4,[credential-access],T1606,Forge Web Credentials,"[Web logs, Authentication logs]"


* Splitting data_sources field

In [3]:
attck = attck.explode('data_sources').reset_index(drop=True)
attck.head()

,tactic,technique_id,technique,data_sources
0,"[defense-evasion, privilege-escalation]",T1484.002,Domain Trust Modification,Windows event logs
1,"[defense-evasion, privilege-escalation]",T1484.002,Domain Trust Modification,PowerShell logs
2,"[defense-evasion, privilege-escalation]",T1484.002,Domain Trust Modification,Azure activity logs
3,"[defense-evasion, privilege-escalation]",T1484.001,Group Policy Modification,Windows event logs
4,[credential-access],T1606.002,SAML Tokens,Windows event logs
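
To see what `explode` does in isolation, the following minimal sketch rebuilds two of the rows shown above by hand. The `toy` frame is a stand-in created for illustration, not the notebook's `attck` frame.

```python
import pandas as pd

# Two rows copied from the output above; each holds a list of data sources
toy = pd.DataFrame({
    'technique_id': ['T1484.002', 'T1484.001'],
    'data_sources': [
        ['Windows event logs', 'PowerShell logs', 'Azure activity logs'],
        ['Windows event logs'],
    ],
})

# explode() emits one row per list element and repeats the other columns;
# reset_index(drop=True) replaces the repeated index labels with 0..n-1
exploded = toy.explode('data_sources').reset_index(drop=True)
print(exploded)
```

After this step every row pairs one technique with exactly one data source, which is the shape needed to join on `data_sources` later.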


## Second: Gathering new metadata of data source objects

* Importing Python libraries

In [4]:
import yaml

* Getting metadata of data source objects

In [5]:
filePath = 'attack_data_sources.yaml'
# Reading the YAML file; the context manager closes the file automatically
with open(filePath, 'r') as yamlFile:
    metadata = yaml.safe_load(yamlFile)
metadata = json_normalize(metadata)
metadata.head()

,name,definition,collection_layers,platforms,contributors,data_components,references
0,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]","[{'name': 'active directory object creation', ...",[https://docs.microsoft.com/en-us/windows-serv...
1,Application Log,Logs from events in third-party applications (...,"[host, cloud]","[Windows, Linux, MacOS, IaaS, SaaS, Office 365]",[ATT&CK],"[{'name': 'application log content', 'type': '...",[https://confluence.atlassian.com/doc/working-...
2,Cloud Service,Information about a service available within a...,[cloud],"[IaaS, SaaS, Office 365, Azure AD]","[ATT&CK, CTID]","[{'name': 'cloud service metadata', 'type': 'i...","[https://aws.amazon.com, https://azure.microso..."
3,Cloud Storage,Information associated with data object storag...,[cloud],[IaaS],"[ATT&CK, CTID]","[{'name': 'cloud storage creation', 'type': 'a...","[https://aws.amazon.com/s3/, https://azure.mic..."
4,Command,Information about commands that can be used th...,[host],"[Windows, Linux, macOS, Network]","[Austin Clark, ATT&CK]","[{'name': 'command execution', 'type': 'activi...",[https://tools.ietf.org/id/draft-ietf-opsawg-t...


* Splitting data components content

In [6]:
# Splitting rows for data_components list
metadata = metadata.explode('data_components').reset_index(drop=True).rename(columns={'name':'data_sources'})

# Splitting columns for data_components dict
metadata = metadata['data_components'].apply(pd.Series).merge(metadata,left_index=True,right_index=True)\
.reset_index(drop=True).drop(['data_components'], axis = 1).rename(columns={'name':'data_components'})

# Splitting rows for relationships list
metadata = metadata.explode('relationships').reset_index(drop=True)

# Splitting columns for relationships dict
metadata = metadata['relationships'].apply(pd.Series).merge(metadata,left_index=True,right_index=True)\
.reset_index(drop=True).drop(['relationships'], axis = 1)

# Replacing Null values in target_data_element
metadata['target_data_element'] = metadata['target_data_element'].fillna(' ')

# Concatenating: Source + Relationship + Target
metadata['relationships'] = metadata['source_data_element']+'-'+metadata['relationship']+'-'+\
                            metadata['target_data_element']

# Deleting columns
dataSources = metadata.drop(['source_data_element','relationship','target_data_element'], axis = 1)

dataSources.head()

,data_components,description,type,data_sources,definition,collection_layers,platforms,contributors,references,relationships
0,active directory object creation,An active directory object was created.,activity,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]",[https://docs.microsoft.com/en-us/windows-serv...,user-created-ad object
1,active directory object deletion,An active directory object was deleted,activity,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]",[https://docs.microsoft.com/en-us/windows-serv...,user-deleted-ad object
2,active directory object modification,An active directory service or object was modi...,activity,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]",[https://docs.microsoft.com/en-us/windows-serv...,user-modified-ad object
3,active directory credential request,"A user requested active directory credentials,...",activity,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]",[https://docs.microsoft.com/en-us/windows-serv...,user-requested-ad credential
4,active directory object access,An active directory object was accessed.,activity,Active Directory,Information associated with the Active Directo...,"[host, cloud]","[Windows, Azure AD]","[ATT&CK, CTID]",[https://docs.microsoft.com/en-us/windows-serv...,user-accessed-ad object
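
The flattening above can also be read as two nested loops: one over data components, one over each component's relationships. The sketch below expresses that idea in plain Python on a hand-written record. The record layout is an assumption inferred from the column names in the output above, and the `flatten` helper is hypothetical, not part of the notebook.

```python
# Hand-written record; layout inferred from the notebook output, values abridged
source = {
    'name': 'Active Directory',
    'data_components': [
        {
            'name': 'active directory object creation',
            'type': 'activity',
            'relationships': [
                {'source_data_element': 'user',
                 'relationship': 'created',
                 'target_data_element': 'ad object'},
            ],
        },
    ],
}


def flatten(data_source):
    """Yield one flat dict per (data component, relationship) pair."""
    for component in data_source['data_components']:
        for rel in component.get('relationships', []):
            # A missing target becomes a single space, as in the notebook
            target = rel.get('target_data_element') or ' '
            yield {
                'data_sources': data_source['name'],
                'data_components': component['name'],
                'relationships': '-'.join(
                    [rel['source_data_element'], rel['relationship'], target]),
            }


rows = list(flatten(source))
print(rows[0]['relationships'])
```

Unlike the `explode`-based cell, this sketch skips components that have no relationships at all, so it is an illustration of the reshaping rather than a drop-in replacement.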


## Third: Mapping (sub)techniques to data components & relationships

* Getting new data source names

According to the proposed methodology, data source names should refer to the main data element they relate to. Because of this, some of the current names might change. Therefore, in order to relate **(sub)techniques** to **data components**, we need to update the data source names. We are providing a YAML file listing the data sources whose names have changed. We will use this file to update the content of our data frame.

In [7]:
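namesPath = 'new_data_sources_names.yaml'
# Reading the YAML file; the context manager closes the file automatically
with open(namesPath, 'r') as yamlNamesFile:
    names = yaml.safe_load(yamlNamesFile)
# The first item of the loaded list maps current names to new names
names = pd.DataFrame(list(names[0].items()), columns=['data_sources', 'new_names'])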
names.head()

,data_sources,new_names
0,Sensor health and status,Sensor Health
1,Access tokens,Web Credential
2,PowerShell logs,Script
3,API monitoring,Process
4,Application logs,Application Log


* Updating data source names

In [8]:
attck = pd.merge(attck, names, on = 'data_sources', how = 'left')
namesUpdate = attck['new_names'].fillna(attck['data_sources'])
attck = attck.drop(columns = ['new_names']).assign(data_sources = namesUpdate)
attck = attck.drop_duplicates(subset = ['technique_id','data_sources'])\
        .sort_values(by = ['technique'])\
        .reset_index(drop = True)
attck.head()

,tactic,technique_id,technique,data_sources
0,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic
1,[credential-access],T1558.004,AS-REP Roasting,Logon Session
2,[credential-access],T1558.004,AS-REP Roasting,Windows event logs
3,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,Windows Registry
4,"[privilege-escalation, defense-evasion]",T1548,Abuse Elevation Control Mechanism,File
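
The left merge followed by `fillna` in the cell above boils down to a lookup with a fallback: use the new name when one exists, otherwise keep the current one. A minimal plain-Python sketch of that rule follows, using two entries from the mapping table shown earlier. The `rename` helper is hypothetical and exists only for illustration.

```python
# Two old-to-new pairs copied from the mapping table shown earlier
new_names = {
    'PowerShell logs': 'Script',
    'API monitoring': 'Process',
}


def rename(data_source, mapping=new_names):
    """Return the new name if one is defined, else the unchanged name."""
    return mapping.get(data_source, data_source)


current = ['PowerShell logs', 'Windows event logs', 'API monitoring']
print([rename(name) for name in current])
# ['Script', 'Windows event logs', 'Process']
```

Because several current names can collapse into the same new name, the notebook drops duplicate `(technique_id, data_sources)` pairs after renaming.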


* Mapping techniques to data components and relationships

In [9]:
techniques = pd.merge(attck, dataSources, on = 'data_sources', how = 'left')\
             [['tactic','technique_id','technique','data_sources','data_components','relationships']]

techniques.head()

,tactic,technique_id,technique,data_sources,data_components,relationships
0,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic,network traffic flow,network traffic flow-originated from-ip
1,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic,network traffic flow,network traffic flow-responded from-ip
2,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic,network traffic flow,network traffic flow-originated from-port
3,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic,network traffic flow,network traffic flow-responded from-port
4,"[credential-access, collection]",T1557.002,ARP Cache Poisoning,Network Traffic,network traffic flow,network traffic flow-identified-transport laye...
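
Because the merge is a left join, a data source that has no entry in the new metadata keeps its row but ends up with empty `data_components` and `relationships`. One way to list such data sources is the `indicator` option of `merge`. The sketch below is an illustration on toy frames (`attck_toy` and `data_sources_toy` are hypothetical stand-ins for the notebook's `attck` and `dataSources`).

```python
import pandas as pd

# Toy stand-ins for the notebook's `attck` and `dataSources` frames
attck_toy = pd.DataFrame({
    'technique_id': ['T1112', 'T1112'],
    'data_sources': ['Windows event logs', 'Windows Registry'],
})
data_sources_toy = pd.DataFrame({
    'data_sources': ['Windows Registry'],
    'data_components': ['windows registry key creation'],
})

# indicator=True adds a `_merge` column: 'both' or 'left_only' for a left join
merged = attck_toy.merge(data_sources_toy, on='data_sources',
                         how='left', indicator=True)

# Data sources that found no match in the new metadata
unmapped = merged.loc[merged['_merge'] == 'left_only', 'data_sources'].unique()
print(list(unmapped))
```

Running the same check on the real frames would show which current data source names still lack a counterpart in the new metadata.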


## Use case: T1112 Modify Registry

* What are the recommended data sources?

In [10]:
attck[attck['technique_id'] == 'T1112']

,tactic,technique_id,technique,data_sources
778,[defense-evasion],T1112,Modify Registry,Windows event logs
779,[defense-evasion],T1112,Modify Registry,Windows Registry
780,[defense-evasion],T1112,Modify Registry,Command
781,[defense-evasion],T1112,Modify Registry,Process
782,[defense-evasion],T1112,Modify Registry,File


* What are the available data components and relationships?

In [11]:
techniques[techniques['technique_id'] == 'T1112']

,tactic,technique_id,technique,data_sources,data_components,relationships
7441,[defense-evasion],T1112,Modify Registry,Windows event logs,,
7442,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key creation,process-created-windows registry key
7443,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key creation,process-created-windows registry key value
7444,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key deletion,user-deleted-windows registry key
7445,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key deletion,process-deleted-windows registry key
7446,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key deletion,process-deleted-windows registry key value
7447,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key modification,process-modified-windows registry key
7448,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key modification,process-modified-windows registry key value
7449,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key modification,user-modified-windows registry key
7450,[defense-evasion],T1112,Modify Registry,Windows Registry,windows registry key modification,user-modified-windows registry key value
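
To get a quick overview of this output, the relationships can be tallied per data component. The sketch below does so in plain Python using only the Windows Registry rows shown above, copied by hand. It is an illustration, not a cell from the notebook.

```python
from collections import Counter

# (data component, relationship) pairs copied from the T1112 rows shown above
pairs = [
    ('windows registry key creation', 'process-created-windows registry key'),
    ('windows registry key creation', 'process-created-windows registry key value'),
    ('windows registry key deletion', 'user-deleted-windows registry key'),
    ('windows registry key deletion', 'process-deleted-windows registry key'),
    ('windows registry key deletion', 'process-deleted-windows registry key value'),
    ('windows registry key modification', 'process-modified-windows registry key'),
    ('windows registry key modification', 'process-modified-windows registry key value'),
    ('windows registry key modification', 'user-modified-windows registry key'),
    ('windows registry key modification', 'user-modified-windows registry key value'),
]

# Tally how many relationships each data component offers
counts = Counter(component for component, _ in pairs)
for component, total in counts.items():
    print(f'{component}: {total}')
```

On the notebook's own frame, a comparable tally could be obtained with `techniques[techniques['technique_id'] == 'T1112'].groupby('data_components')['relationships'].count()`.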


# Thank you :)