# Sysmon and Netflow-like Coordinated Data Demonstration   

This notebook demonstrates how Sysmon entries can be matched along parent-child process calls, and how they can be matched against Netflow-like network activity.

This uses <a href="https://github.com/mitre/brawl-public-game-001">the MITRE BRAWL dataset</a>, which demonstrates both red and blue activity over a cyber-range.  

Most cybersecurity datasets arrive in only one format (e.g., Sysmon or Netflow), whereas in actual practice these sources would be examined jointly.  For example, suspicious network activity in Netflow directs finer-grained examination of the corresponding events in Sysmon.

Like other datasets, this data is exclusively Sysmon.  We use its Event ID 3 (network connection) events to create synthetic Netflow records that illustrate correlations between Sysmon and Netflow activity.  This "Netflow-like" representation includes source and destination IPs and ports but, unlike regular Netflow, does not include the flow duration or byte counts.  It should nonetheless serve to illustrate the relationship between Netflow IPs and Sysmon entries.
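As a rough illustration of the idea, the following sketch derives flow-style rows from network connection events. It is not the script used to build `brawl_netflow_like.csv`; the toy values are made up, and the column names are assumed from the fields displayed later in this notebook.

```python
import pandas as pd

# Toy Sysmon-style rows. Values are invented for illustration only.
toy_sysmon = pd.DataFrame(
    {
        "@timestamp": ["2017-05-01T19:00:00.000Z", "2017-05-01T19:00:01.000Z"],
        "event_code": [1, 3],
        "pid": [100, 100],
        "transport": [None, "tcp"],
        "src_ip": [None, "10.0.0.1"],
        "src_port": [None, 50000],
        "dest_ip": [None, "10.0.0.2"],
        "dest_port": [None, 445],
    }
)

# Columns that a flow record can carry without duration or byte counts.
FLOW_COLUMNS = ["@timestamp", "pid", "transport", "src_ip", "src_port", "dest_ip", "dest_port"]

# Keep only network connection events (event code 3) and project onto the flow columns.
toy_flows = toy_sysmon.loc[toy_sysmon["event_code"] == 3, FLOW_COLUMNS].reset_index(drop=True)
print(toy_flows)
```

Keeping `pid` in the flow rows is a convenience that real Netflow would not offer; it is retained here only because the source events already carry it.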

Note that while BRAWL includes Red and Blue activity, it does not incorporate "regular" user activity such as Microsoft Word.  It is also based on Windows 8 and does not include the full array of Sysmon event types.  It also uses Python-style snake_case for the field names, whereas Sysmon entries are generally stored in mixed case.  However, these limitations should not detract from illustrating linkages between events.

The column 'ptimestamp' is a float representing the POSIX timestamp that mirrors @timestamp, and is offered as a convenience.
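For readers who want to build a similar column on their own data, here is a minimal sketch of turning an ISO 8601 string into a POSIX float. It assumes the timestamp is interpreted as UTC; the provided `ptimestamp` column may have been produced under a different time zone convention, so compare a few rows against it before mixing the two.

```python
import pandas as pd

# One made-up timestamp in the same ISO 8601 style as the @timestamp column.
iso_times = pd.Series(["1970-01-02T00:00:00.500Z"])

# Parse as timezone-aware UTC datetimes, then measure seconds since the Unix epoch.
parsed = pd.to_datetime(iso_times, utc=True)
epoch = pd.Timestamp("1970-01-01", tz="UTC")
posix_seconds = (parsed - epoch).dt.total_seconds()

print(posix_seconds.iloc[0])  # one day plus half a second: 86400.5
```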

In [27]:
import pandas as pd
from pathlib import Path
from tqdm import tqdm
from IPython.display import display, HTML

CSV_ROOT = Path("csvs")
sysmon_df = pd.read_csv(Path(CSV_ROOT, "brawl_sysmon.csv"))
netflow_df = pd.read_csv(Path(CSV_ROOT, "brawl_netflow_like.csv"))
fqdn2ip_df = pd.read_csv(Path(CSV_ROOT, "fqdn2ip.csv"))



<p><b>NOTE</b>: Other users have noticed that the record number order conflicts with timestamps in Sysmon, due to multiple possible causes such as delays in log processing or arrival order of events to the log processor.  To account for this, we sort by the event POSIX timestamps.</p>

<p>This occurs in the BRAWL dataset, and also in other Sysmon datasets such as Attack Data.</p>

<p>If you are working "right-of-the-bang" and are examining logs that detail a potential incident or activity after the fact, then sorting by timestamp should be fine.  If you need to work with the record entries "as-is" due to a more real-time need, then you will have to disable sorting and work with the temporally disordered entries directly.</p>
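Before deciding whether to sort, it can help to measure how much disorder is present. The sketch below flags records whose timestamp is earlier than one already seen in record order. The `record_number` column and the toy values are hypothetical and only stand in for the log's arrival order.

```python
import pandas as pd

# Toy log in arrival order; record 3 carries an earlier timestamp than record 2.
toy_log = pd.DataFrame(
    {"record_number": [1, 2, 3, 4], "ptimestamp": [10.0, 12.0, 11.5, 13.0]}
)

# A record is out of order if its timestamp is below the running maximum so far.
running_max = toy_log["ptimestamp"].cummax()
out_of_order = toy_log[toy_log["ptimestamp"] < running_max]
print(out_of_order)

# A stable sort keeps the arrival order for records that share a timestamp.
sorted_log = toy_log.sort_values(by="ptimestamp", kind="mergesort")
print(sorted_log)
```

Choosing `kind="mergesort"` is a design choice rather than a requirement: it makes ties deterministic, which matters when several events share the same millisecond.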

In [28]:
sysmon_df.sort_values(by=['ptimestamp'], inplace=True)
netflow_df.sort_values(by=['ptimestamp'], inplace=True)

# Matches between Parent-Child Sysmon Events

We demonstrate how host, process, and parent-process identifiers align between two events in Sysmon.  Here we have a call to taskeng.exe (event 1), which spawns a child process that runs GoogleUpdate.exe (event 2).

The correspondences between events 1 and 2 are given in the following table:

|Entry 1 Field|Entry 2 Field|
|--|--|
|host|host|
|pid|parent_pid|
|process_guid|parent_process_guid|
|image_path|parent_image_path|

Note that process GUIDs are not random: Sysmon builds them from machine, process creation time, and process identifiers, so they stay unique across hosts and across PID reuse.
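The correspondences in the table can also be applied to every event at once with a self-join. The following is a sketch on invented rows, using the `fqdn`, `pid`, `ppid`, `process_guid`, and `parent_process_guid` column names that the code cells in this notebook use; `parent_exe` is a name introduced here for illustration.

```python
import pandas as pd

# Toy process-creation rows. Hostnames and GUIDs are invented.
toy_events = pd.DataFrame(
    {
        "fqdn": ["a.example", "a.example", "b.example"],
        "pid": [996, 1804, 996],
        "ppid": [500, 996, 400],
        "process_guid": ["{G-1}", "{G-2}", "{G-3}"],
        "parent_process_guid": ["{G-0}", "{G-1}", "{G-9}"],
        "exe": ["taskeng.exe", "GoogleUpdate.exe", "cmd.exe"],
    }
)

# View each row as a potential parent by renaming its own identifiers
# to the names a child would use to refer to it.
parents = toy_events[["fqdn", "pid", "process_guid", "exe"]].rename(
    columns={"pid": "ppid", "process_guid": "parent_process_guid", "exe": "parent_exe"}
)

# Inner join: keep only children whose host, parent PID, and parent GUID all match a parent row.
pairs = toy_events.merge(parents, on=["fqdn", "ppid", "parent_process_guid"], how="inner")
print(pairs[["fqdn", "parent_exe", "exe"]])
```

Note that the same PID (996) appears on two hosts in the toy data; including `fqdn` in the join keys is what keeps `b.example` from being paired with the wrong parent.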

In [23]:
display(HTML("""<p>We first sample a call to taskeng.exe and use this as our event 1.</p>"""))
evt1_df = sysmon_df[sysmon_df.event_code == 1]
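# na=False treats rows with a missing command line as non-matches.
taskeng_df = evt1_df[evt1_df.command_line.str.contains("taskeng", na=False)]

# Take the first taskeng.exe call as event 1 and keep its identifiers for matching.
row = taskeng_df.iloc[0]
fqdn, pid, ppid, process_guid = row[['fqdn', 'pid', 'ppid', 'process_guid']]
display(taskeng_df[['@timestamp', 'fqdn', 'pid', 'process_guid', 'command_line']][0:1])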

Unnamed: 0,@timestamp,fqdn,pid,process_guid,command_line
11460,2017-05-01T19:09:29.854Z,minahan-pc.brawlco.com,996,{6C70CE0A-87E9-5907-0000-00108C901100},taskeng.exe {8AA7A318-060A-4A07-8CF6-D4C11333F...


In [24]:
display(HTML("""<p>We now search for Sysmon events that are children of event 1.</p>
             <p>Per the above table, a child can be identified by matching its parent PID and host against the originating process's PID and host, or by matching its parent process GUID against the one from the originating process.  Here we require all three conditions to hold.</p>"""))
m1 = sysmon_df.ppid == pid 
m2 = sysmon_df.fqdn == fqdn
m3 = sysmon_df.parent_process_guid == process_guid
matched_df = sysmon_df[m1 & m2 & m3]
display(matched_df[['@timestamp', 'fqdn', 'pid', 'ppid', 'process_guid', 'parent_process_guid', 'command_line', 'exe']])
display(HTML("<p>We find taskeng.exe spawned a call to run GoogleUpdate.exe, likely as a scheduled task.</p>"))

Unnamed: 0,@timestamp,fqdn,pid,ppid,process_guid,parent_process_guid,command_line,exe
11461,2017-05-01T19:09:29.916Z,minahan-pc.brawlco.com,1804,996.0,{6C70CE0A-87E9-5907-0000-00101D921100},{6C70CE0A-87E9-5907-0000-00108C901100},"""C:\Program Files (x86)\Google\Update\GoogleUp...",GoogleUpdate.exe


# Sysmon to Netflow Matches

We now demonstrate activity matching between our Netflow-like representation and Sysmon.  Standard Netflow does not include hostnames, nor does it name ports, so matching must be done by IP address.
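Because matching goes through the IP address, a lookup from IP to hostname is the bridge between the two sources. Here is a small sketch of annotating flow rows with hostnames, assuming a lookup table with `fqdn` and `ipv4` columns like the one loaded above; the rows and the `src_host` and `dest_host` names are made up for illustration.

```python
import pandas as pd

# Toy lookup table and toy flow rows. All values are invented.
toy_lookup = pd.DataFrame(
    {"fqdn": ["alpha.example", "beta.example"], "ipv4": ["10.0.0.1", "10.0.0.2"]}
)
toy_netflow = pd.DataFrame(
    {"src_ip": ["10.0.0.1", "10.0.0.9"], "dest_ip": ["10.0.0.2", "10.0.0.1"]}
)

# Index the hostnames by IP so that Series.map can translate each address.
# This assumes each IP appears once in the lookup table.
ip_to_fqdn = toy_lookup.set_index("ipv4")["fqdn"]

toy_netflow["src_host"] = toy_netflow["src_ip"].map(ip_to_fqdn)
toy_netflow["dest_host"] = toy_netflow["dest_ip"].map(ip_to_fqdn)
print(toy_netflow)
```

Addresses that are missing from the lookup table come back as NaN rather than raising an error, which makes unknown or external hosts easy to spot.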

In [25]:
display(HTML("""<p>WMIC is an example of a process that has events 1 and 5 associated with its event 3, likely because it is not a service that was already running at startup.
We start from this Netflow-like entry, wanting to identify what was occurring on the machine that originated this traffic.</p>"""))
example_ptimestamp = 1493690508.25
netflow_entry = netflow_df[netflow_df.ptimestamp == example_ptimestamp].iloc[0]
print("Netflow entry of interest")
print(netflow_entry)

src_ip = netflow_entry.src_ip
dest_ip = netflow_entry.dest_ip

print(f"src_ip={src_ip}, dest_ip={dest_ip}")

display(HTML("<p>Use our lookup to match which hosts the source and destination IPs are associated with.</p>"))
ipv4_addr = netflow_entry.src_ip
tgt_hostname = fqdn2ip_df[fqdn2ip_df.ipv4 == src_ip].iloc[0].fqdn
tgt_ptimestamp = netflow_entry.ptimestamp
print(f"Associated source hostname={tgt_hostname}, ip={ipv4_addr}, POSIX timestamp={tgt_ptimestamp}")

dest_hostname = fqdn2ip_df[fqdn2ip_df.ipv4 == dest_ip].iloc[0].fqdn
print(f"Destination hostname={dest_hostname}")

Netflow entry of interest
@timestamp    2017-05-01T19:01:48.250Z
utc_time       2017-05-01 19:01:48.250
pid                               2720
transport                          tcp
src_ipv6                         False
src_ip                     10.3.15.216
src_port                         50153
dest_ipv6                        False
dest_ip                    10.3.15.212
dest_port                        49155
ppid                               NaN
ptimestamp               1493690508.25
Name: 902, dtype: object
src_ip=10.3.15.216, dest_ip=10.3.15.212


Associated source hostname=sounder-pc.brawlco.com, ip=10.3.15.216, POSIX timestamp=1493690508.25
Destination hostname=escue-pc.brawlco.com


In [26]:

window = 0.1
m1 = (sysmon_df.ptimestamp >= (tgt_ptimestamp - window)) & (sysmon_df.ptimestamp <= (tgt_ptimestamp + window)) 
m2 = sysmon_df.host == tgt_hostname
matched_df = sysmon_df[ m1 & m2][['@timestamp', 'event_code', 'host', 'command_line', 'exe', 'process_guid']]
display(HTML("""
<p>We now look for Sysmon entries within 0.1 seconds of this event's timestamp.  Note this may include
events logged slightly before the entry of interest, as variance can be introduced into logged timestamps by issues such
as the order of arrival to the logger.  To account for this, we search for events within a window around the timestamp
of interest, matched against the targeted hostname.</p>
        """))
display(matched_df)

display(HTML("""<p>We observe the following executables making outbound calls that meet our criteria.  We note that
lsass.exe is the Windows authentication and security service, while svchost.exe is the generic host process for Windows services.
WMIC.exe is a command-line utility for management tasks.</p>"""))
 
wmic_df = matched_df[matched_df.exe == "WMIC.exe"]
tgt_process_guid = wmic_df.iloc[0].process_guid

display(wmic_df)

display(HTML("""<p>Following WMIC.exe's process GUID, we find a call by wmic to escue-pc.brawlco.com, which matches the dest_ip
identified in our Netflow-like entry.</p>
 
<p><b>Note</b> that the command line appears on an event code 1 (process creation) entry rather than on the event code 3
(network connection) entry, and that, read literally, its timestamp places process creation after the network connection.
Due to logging setup and issues such as time of arrival to the logger, there may be some variance in the timestamps.</p>"""))
display(sysmon_df[(sysmon_df.process_guid == tgt_process_guid)][['@timestamp', 'event_code', 'host', 'command_line', 'exe', 'process_guid']])

Unnamed: 0,@timestamp,event_code,host,command_line,exe,process_guid
1563,2017-05-01T19:01:48.245Z,3,sounder-pc.brawlco.com,,svchost.exe,{6C70CE0A-7DA7-5907-0000-0010A0970000}
1564,2017-05-01T19:01:48.250Z,3,sounder-pc.brawlco.com,,WMIC.exe,{6C70CE0A-861C-5907-0000-0010B7671300}
1565,2017-05-01T19:01:48.252Z,3,sounder-pc.brawlco.com,,lsass.exe,{6C70CE0A-7DA2-5907-0000-0010364D0000}
1566,2017-05-01T19:01:48.262Z,3,sounder-pc.brawlco.com,,lsass.exe,{6C70CE0A-7DA2-5907-0000-0010364D0000}
1567,2017-05-01T19:01:48.264Z,3,sounder-pc.brawlco.com,,lsass.exe,{6C70CE0A-7DA2-5907-0000-0010364D0000}
1568,2017-05-01T19:01:48.283Z,3,sounder-pc.brawlco.com,,lsass.exe,{6C70CE0A-7DA2-5907-0000-0010364D0000}


Unnamed: 0,@timestamp,event_code,host,command_line,exe,process_guid
1564,2017-05-01T19:01:48.250Z,3,sounder-pc.brawlco.com,,WMIC.exe,{6C70CE0A-861C-5907-0000-0010B7671300}


Unnamed: 0,@timestamp,event_code,host,command_line,exe,process_guid
1564,2017-05-01T19:01:48.250Z,3,sounder-pc.brawlco.com,,WMIC.exe,{6C70CE0A-861C-5907-0000-0010B7671300}
1549,2017-05-01T19:01:48.611Z,1,sounder-pc.brawlco.com,"""wmic"" /node:""escue-pc.brawlco.com"" /user:""bra...",WMIC.exe,{6C70CE0A-861C-5907-0000-0010B7671300}
1551,2017-05-01T19:01:49.017Z,5,sounder-pc.brawlco.com,,WMIC.exe,{6C70CE0A-861C-5907-0000-0010B7671300}
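The window search used above can be wrapped in a small helper so that it can be reused for other flow entries. This is a sketch on toy data; the function name `events_near` is hypothetical, and the `host` and `ptimestamp` column names are taken from the cells above.

```python
import pandas as pd

def events_near(df, host, ts, window=0.1):
    """Return rows for `host` whose ptimestamp lies within +/- `window` seconds of `ts`."""
    # Series.between is inclusive on both ends by default.
    in_window = df["ptimestamp"].between(ts - window, ts + window)
    return df[in_window & (df["host"] == host)]

# Toy events. Only the first two are on the right host and inside the window.
toy_events = pd.DataFrame(
    {
        "host": ["a.example", "a.example", "b.example", "a.example"],
        "ptimestamp": [100.00, 100.05, 100.02, 100.50],
        "exe": ["svchost.exe", "WMIC.exe", "lsass.exe", "cmd.exe"],
    }
)

hits = events_near(toy_events, "a.example", 100.0)
print(hits)
```

A wider window catches more timestamp variance but also returns more unrelated events, so the right value depends on how noisy the logging pipeline is.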


# Notes

- Not every network connection event (event code 3) has a corresponding process create (event code 1) or terminate (event code 5).  This may occur because the process making the call was a system service that was started before Sysmon.
- Time of arrival to the logging machine can impact the order in which events are recorded, so record order may not match timestamp order.
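To gauge how often the first note applies, one can list the network connection events whose process GUID never appears on a process creation event. The following is a sketch on invented rows, using the `event_code` and `process_guid` column names from the cells above.

```python
import pandas as pd

# Toy events: {G-1} has create, connect, and terminate; {G-2} only connects.
toy_events = pd.DataFrame(
    {
        "event_code": [1, 3, 5, 3],
        "process_guid": ["{G-1}", "{G-1}", "{G-1}", "{G-2}"],
        "exe": ["WMIC.exe", "WMIC.exe", "WMIC.exe", "lsass.exe"],
    }
)

# GUIDs for which a process creation (event code 1) was logged.
created_guids = set(toy_events.loc[toy_events["event_code"] == 1, "process_guid"])

# Network connections (event code 3) whose process has no logged creation event.
connections = toy_events[toy_events["event_code"] == 3]
unmatched = connections[~connections["process_guid"].isin(created_guids)]
print(unmatched)
```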