# Agents: platform-specific processes

The ACME4 dataset is captured by running the Wintap agent on the hosts that make up a lab network in an AWS instance.
This is a specific, nonstandard implementation of a Windows-based IT network.
While all networks have their own _computational noise_ profile,
it is useful to identify the platform-specific components of this noise.
Doing so lets one examine what the noise would look like without these components,
and perhaps compare it against noise from other networks whose biases are less well known.

This notebook labels the processes that participate in the Wintap data collection,
as well as those that generate the telemetry collected by AWS to operate its systems.

In [1]:
%load_ext autoreload
%load_ext dotenv
#%load_ext quak
%load_ext sql

In [2]:
%autoreload 1
%aimport acme4_explore

In [3]:
%dotenv

In [4]:
import acme4_explore
import logging as lg
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from pathlib import Path
import re
from tqdm.auto import tqdm, trange

In [5]:
lg.basicConfig(**acme4_explore.logging_config())
LOG = lg.getLogger("notebook")

In [6]:
db = acme4_explore.connect_db()
%sql db --alias duckdb
%config SqlMagic.displaycon=False
%config SqlMagic.autopandas=True

The Wintap agent is named `wintap.exe`. Let's look for it in the process tree (lineage) of each process.

In [7]:
%%sql wintap_collect <<
select pid_hash, process_name, ptree, ptree_list
from process_path
where lower(ptree) like '%wintap%'

In [8]:
wintap_collect

Unnamed: 0,pid_hash,process_name,ptree,ptree_list
0,3D7BFC8EF077DA8AF0FE0324540B7CDD,wintapsvcmgr.exe,=wintapsvcmgr.exe->svchost.exe->services.exe->...,"[3D7BFC8EF077DA8AF0FE0324540B7CDD, 975D050838F..."
1,44790E1F4FF242FC55503C081C9D41BB,wintapsvcmgr.exe,=wintapsvcmgr.exe->svchost.exe->services.exe->...,"[44790E1F4FF242FC55503C081C9D41BB, 975D050838F..."
2,4EA4AFDC1B59F6AC3BEBC56EAF8E6451,wintapsvcmgr.exe,=wintapsvcmgr.exe->svchost.exe->services.exe->...,"[4EA4AFDC1B59F6AC3BEBC56EAF8E6451, 975D050838F..."
3,5159625391EC89C697E6B642AD97FA4D,wintapsvcmgr.exe,=wintapsvcmgr.exe->svchost.exe->services.exe->...,"[5159625391EC89C697E6B642AD97FA4D, 975D050838F..."
4,5EBD5D7C84597A400BE4C19C3ACA9939,wintapsvcmgr.exe,=wintapsvcmgr.exe->svchost.exe->services.exe->...,"[5EBD5D7C84597A400BE4C19C3ACA9939, 975D050838F..."
...,...,...,...,...
247207,FED017C622988FE9FB11EB77BFF0A68B,wintapsvcmgr.exe,=wintapsvcmgr.exe->unknown->ntoskrnl.exe,"[FED017C622988FE9FB11EB77BFF0A68B, 6DA116CF301..."
247208,78B405D7FCFA713E3A57058D88EF92F2,wintapsvcmgr.exe,=wintapsvcmgr.exe->unknown->ntoskrnl.exe,"[78B405D7FCFA713E3A57058D88EF92F2, 6DA116CF301..."
247209,952B780A4873FB450266ECBC5526E661,wintapsvcmgr.exe,=wintapsvcmgr.exe->unknown->ntoskrnl.exe,"[952B780A4873FB450266ECBC5526E661, 6DA116CF301..."
247210,A464747A4581978C18C062AAD7EA550E,wintapsvcmgr.exe,=wintapsvcmgr.exe->unknown->ntoskrnl.exe,"[A464747A4581978C18C062AAD7EA550E, 6DA116CF301..."


This is a significant subset of the whole process set (composed of 1.7 million instances),
and, notably, more than a third of the [set of lineaged processes](process-metadata-irregularities.ipynb#proper-lineage).

AWS-related processes are harder to tease out. Let's check whether the `amazon` substring appears anywhere in the process trees.

In [9]:
%%sql
select pid_hash, process_name, ptree, ptree_list
from process_path
where lower(ptree) like '%amazon%'

Unnamed: 0,pid_hash,process_name,ptree,ptree_list
0,0017CF8579D1B0A203230C4445486C6A,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0017CF8579D1B0A203230C4445486C6A, 509D73E59EA..."
1,0024EDDD1E7D68B26787944479F7B056,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0024EDDD1E7D68B26787944479F7B056, 509D73E59EA..."
2,00298EB1C6BC14848306E4670ED2B9E4,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[00298EB1C6BC14848306E4670ED2B9E4, 509D73E59EA..."
3,0048AAA3DB863E49FDFF040806EFFE4D,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0048AAA3DB863E49FDFF040806EFFE4D, 509D73E59EA..."
4,0076E393DC3412AF8DCE8394B54687EB,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0076E393DC3412AF8DCE8394B54687EB, 509D73E59EA..."
...,...,...,...,...
140267,FE04A5B16EEA76C6473BA6BF381FFA33,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[FE04A5B16EEA76C6473BA6BF381FFA33, 3DAA7D3B6E0..."
140268,FE4940116DD5B3581DF834014939900F,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[FE4940116DD5B3581DF834014939900F, 3DAA7D3B6E0..."
140269,FEE1B13CC7F37821DBE318A202559018,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[FEE1B13CC7F37821DBE318A202559018, 3DAA7D3B6E0..."
140270,FF5D32B16D3A43667F608C855FB799DC,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[FF5D32B16D3A43667F608C855FB799DC, 3DAA7D3B6E0..."


Poring over this table suggests that AWS telemetry gathering is performed under the auspices of a process named
`amazon-ssm-agent.exe`.
It spawns other processes whose names carry the `ssm` substring.
Let's look for that substring more widely.

In [10]:
%%sql aws_ssm_collect <<
select pid_hash, process_name, ptree, ptree_list
from process_path
where lower(ptree) like '%ssm-%'

In [11]:
aws_ssm_collect

Unnamed: 0,pid_hash,process_name,ptree,ptree_list
0,0017CF8579D1B0A203230C4445486C6A,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0017CF8579D1B0A203230C4445486C6A, 509D73E59EA..."
1,0024EDDD1E7D68B26787944479F7B056,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0024EDDD1E7D68B26787944479F7B056, 509D73E59EA..."
2,00298EB1C6BC14848306E4670ED2B9E4,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[00298EB1C6BC14848306E4670ED2B9E4, 509D73E59EA..."
3,0048AAA3DB863E49FDFF040806EFFE4D,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0048AAA3DB863E49FDFF040806EFFE4D, 509D73E59EA..."
4,0076E393DC3412AF8DCE8394B54687EB,wmic.exe,=wmic.exe->ssm-agent-worker.exe->amazon-ssm-ag...,"[0076E393DC3412AF8DCE8394B54687EB, 509D73E59EA..."
...,...,...,...,...
142077,D95AD57389810648D2C8787ACB195BEE,wmic.exe,=wmic.exe->ssm-document-worker.exe->unknown->n...,"[D95AD57389810648D2C8787ACB195BEE, CB58BEB1B6D..."
142078,DB70C7436C99744CFD01B5CD32E2FBD0,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[DB70C7436C99744CFD01B5CD32E2FBD0, 4455B49608C..."
142079,DDF7F2DD171D2A416296CC3046200BBF,wmic.exe,=wmic.exe->ssm-document-worker.exe->unknown->n...,"[DDF7F2DD171D2A416296CC3046200BBF, 35267721427..."
142080,E6DB7F07934675EA84C08C732589D254,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[E6DB7F07934675EA84C08C732589D254, 3A86716D030..."


This dredges up a few more process instances.
What do these look like?

In [12]:
%%sql
select *
from aws_ssm_collect
where ptree not like '%amazon%'

Unnamed: 0,pid_hash,process_name,ptree,ptree_list
0,A2D75A76959A7CDC9D327FC1F0EEBC9C,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[A2D75A76959A7CDC9D327FC1F0EEBC9C, 78B17E54BD8..."
1,B6E3F986ECE69E793A337FFEB7155662,wmic.exe,=wmic.exe->ssm-document-worker.exe->unknown->n...,"[B6E3F986ECE69E793A337FFEB7155662, 78B17E54BD8..."
2,79C830208C507BC785E181D755C45C58,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[79C830208C507BC785E181D755C45C58, 78B17E54BD8..."
3,95C9FCEEA366D28501DB01680DF76096,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[95C9FCEEA366D28501DB01680DF76096, 78B17E54BD8..."
4,6C6F718C03AE70E4F8F1F405E3EC41AF,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[6C6F718C03AE70E4F8F1F405E3EC41AF, 78B17E54BD8..."
...,...,...,...,...
1805,D95AD57389810648D2C8787ACB195BEE,wmic.exe,=wmic.exe->ssm-document-worker.exe->unknown->n...,"[D95AD57389810648D2C8787ACB195BEE, CB58BEB1B6D..."
1806,DB70C7436C99744CFD01B5CD32E2FBD0,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[DB70C7436C99744CFD01B5CD32E2FBD0, 4455B49608C..."
1807,DDF7F2DD171D2A416296CC3046200BBF,wmic.exe,=wmic.exe->ssm-document-worker.exe->unknown->n...,"[DDF7F2DD171D2A416296CC3046200BBF, 35267721427..."
1808,E6DB7F07934675EA84C08C732589D254,powershell.exe,=powershell.exe->ssm-document-worker.exe->unkn...,"[E6DB7F07934675EA84C08C732589D254, 3A86716D030..."


So the instances where `amazon-ssm-agent.exe` disappears from the lineage seem to occur mostly when
Wintap partially loses lineage information,
and ancestor processes get replaced with `unknown`.
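
To make the lineage-loss pattern concrete, here is a minimal sketch that splits a `ptree` string into process names and flags an `unknown` placeholder. It assumes, from the outputs shown so far, that a leading `=` marks the leaf process and that `->` points from a child to its parent; the helper names `parse_ptree` and `lost_lineage` are hypothetical and not part of `acme4_explore`.

```python
def parse_ptree(ptree: str) -> list[str]:
    """Split a lineage string into process names, leaf first."""
    # Assumption: '=' prefixes the leaf and '->' separates child from parent.
    return ptree.removeprefix("=").split("->")


def lost_lineage(ptree: str) -> bool:
    """True when some ancestor was replaced by the 'unknown' placeholder."""
    return "unknown" in parse_ptree(ptree)


# Lineage copied from the Wintap table earlier in this notebook.
example = "=wintapsvcmgr.exe->unknown->ntoskrnl.exe"
names = parse_ptree(example)
truncated = lost_lineage(example)

# Made-up lineage for contrast (not an actual row of the dataset).
intact = lost_lineage("=wmic.exe->ssm-agent-worker.exe->amazon-ssm-agent.exe")
```

Splitting on `->` rather than searching the raw string avoids flagging a process whose name merely contains the word `unknown`.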

Let's preserve this labeling of agent processes.

In [13]:
%%sql
copy (
    select *, 'wintap'
    from wintap_collect
    union
    select *, 'aws'
    from aws_ssm_collect
)
to '{{acme4_explore.dir_work()}}/agents_platform.parquet'
(format parquet, compression 'zstd')

Unnamed: 0,Count
0,389294
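

For reference, the labeling rule behind the two collections can be mirrored in plain Python. This is only a sketch of the substring tests used in the SQL cells (`'%wintap%'` and `'%ssm-%'` on the lowercased tree); the function name `agent_labels` is hypothetical, and the last two lineage strings are made up for illustration.

```python
def agent_labels(ptree: str) -> list[str]:
    """Return the platform labels whose substring appears in the lineage."""
    tree = ptree.lower()  # same case folding as lower(ptree) in the queries
    labels = []
    if "wintap" in tree:
        labels.append("wintap")
    if "ssm-" in tree:
        labels.append("aws")
    return labels


wintap_case = agent_labels("=wintapsvcmgr.exe->unknown->ntoskrnl.exe")
aws_case = agent_labels("=wmic.exe->ssm-agent-worker.exe->amazon-ssm-agent.exe")
other_case = agent_labels("=notepad.exe->explorer.exe")
```

Returning a list rather than a single label matches the `union` of the two selections: a lineage that matched both filters would appear once under each label.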


With that in mind, what proportion of processes is generated by platform-specific data collection rather than actual IT activity?

In [14]:
num_total, = db.sql("select count(*) from process_nondud").fetchone()
num_wintap, = db.sql("select count(*) from process_path inner join process_nondud using (pid_hash) where ptree like '%wintap%'").fetchone()
num_aws, = db.sql("select count(*) from process_path inner join process_nondud using (pid_hash) where ptree like '%ssm-%'").fetchone()
(num_wintap + num_aws) / num_total

0.2210257432241002

The volume of agent chatter is thus about 22% of the process volume.
In other words, only about 4 out of 5 processes run on this network correspond to actual IT or malicious activity.
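
As a quick check of that arithmetic, the complementary share can be worked out from the ratio printed above. The variable names are illustrative only.

```python
agent_share = 0.2210257432241002  # ratio printed by the previous cell
other_share = 1 - agent_share     # processes not attributed to Wintap or AWS SSM
per_five = 5 * other_share        # same share expressed "out of 5"

print(f"{other_share:.1%} of processes, or about {per_five:.1f} out of 5")
# prints "77.9% of processes, or about 3.9 out of 5"
```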