# **HELK: Basic Sysmon ProcessCreate Graph Query**
## Goals:
* Confirm Jupyter can talk to Spark & GraphFrames
* Confirm Spark & GraphFrames can pull data from Elasticsearch (ES)
* Create a GraphFrame from the Sysmon index
  * Create the vertices and edges DataFrames
* Run a basic query using GraphFrames motifs

## Check the Spark Session via the variable spark

In [1]:
spark

## Import GraphFrames & SQL Functions

In [2]:
from graphframes import *

In [11]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

## Set a Custom Spark Session

In [12]:
spark = SparkSession \
    .builder \
    .appName("HELK") \
    .config("es.read.field.as.array.include", "tags") \
    .config("es.nodes","172.18.0.2:9200") \
    .getOrCreate()
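
The node address above is hard-coded to one deployment's Docker network. As a minimal sketch, assuming the Elasticsearch host and port could instead come from environment variables, the same two settings can be assembled in one place. The variable names `HELK_ES_HOST` and `HELK_ES_PORT` and the helper `es_spark_options` are hypothetical and not part of HELK; the defaults simply mirror the values in the cell above.

```python
import os


def es_spark_options(env=None):
    """Return the two es-hadoop settings used in this notebook as a dict.

    HELK_ES_HOST / HELK_ES_PORT are hypothetical variable names; when they
    are absent, the values hard-coded in the notebook cell are used.
    """
    env = os.environ if env is None else env
    host = env.get("HELK_ES_HOST", "172.18.0.2")
    port = env.get("HELK_ES_PORT", "9200")
    return {
        "es.read.field.as.array.include": "tags",
        "es.nodes": f"{host}:{port}",
    }


# An empty mapping stands in for "no overrides set".
options = es_spark_options({})
print(options["es.nodes"])
```

Each key/value pair could then be passed to `.config(key, value)` on the session builder before `.getOrCreate()`.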

## Read data from the HELK (Elasticsearch-Sysmon Index)

In [13]:
df = spark.read.format("org.elasticsearch.spark.sql").load("logs-endpoint-winevent-sysmon-*/doc")

## Print DataFrame Schema

In [14]:
df.printSchema()

root
 |-- @meta: struct (nullable = true)
 |    |-- sysmon: struct (nullable = true)
 |    |    |-- timestamp: timestamp (nullable = true)
 |-- @timestamp: timestamp (nullable = true)
 |-- @version: string (nullable = true)
 |-- action: string (nullable = true)
 |-- beat: struct (nullable = true)
 |    |-- hostname: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- version: string (nullable = true)
 |-- command_line: string (nullable = true)
 |-- dst_host: string (nullable = true)
 |-- dst_ip: string (nullable = true)
 |-- dst_isipv6: string (nullable = true)
 |-- dst_port_name: string (nullable = true)
 |-- dst_port_number: integer (nullable = true)
 |-- event_id: integer (nullable = true)
 |-- file_creation_time: string (nullable = true)
 |-- file_name: string (nullable = true)
 |-- geoip: struct (nullable = true)
 |    |-- city_name: string (nullable = true)
 |    |-- continent_code: string (nullable = true)
 |    |-- country_code2: string (nullable = true)

## Create Vertices & Edges DataFrames

In [28]:
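# Vertices: one row per created process, keyed by its process GUID (GraphFrames expects an "id" column)
v = df.withColumn("id", df.process_guid).select("id","user_name","host_name","process_parent_name","process_name","action")
v = v.filter(v.action == "processcreate")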

In [30]:
v.show(3,truncate=False)

+------------------------------------+---------+---------------+-------------------+----------------------+-------------+
|id                                  |user_name|host_name      |process_parent_name|process_name          |action       |
+------------------------------------+---------+---------------+-------------------+----------------------+-------------+
|A98268C1-7717-5A99-0000-001044AED200|wardog   |DESKTOP-29DJI4T|svchost.exe        |backgroundTaskHost.exe|processcreate|
|A98268C1-7725-5A99-0000-0010042CD400|wardog   |DESKTOP-29DJI4T|svchost.exe        |RuntimeBroker.exe     |processcreate|
|A98268C1-772B-5A99-0000-001054EDD400|SYSTEM   |DESKTOP-29DJI4T|services.exe       |TrustedInstaller.exe  |processcreate|
+------------------------------------+---------+---------------+-------------------+----------------------+-------------+
only showing top 3 rows



In [31]:
# Edges: parent process GUID (src) -> child process GUID (dst), labeled "spawned"
e = df.filter(df.action == "processcreate").selectExpr("process_parent_guid as src","process_guid as dst").withColumn("relationship", lit("spawned"))

In [33]:
e.show(3,truncate=False)

+------------------------------------+------------------------------------+------------+
|src                                 |dst                                 |relationship|
+------------------------------------+------------------------------------+------------+
|A98268C1-7715-5A99-0000-00109295D200|A98268C1-772A-5A99-0000-0010A58CD400|spawned     |
|A98268C1-A584-5A97-0000-0010B9240100|A98268C1-7715-5A99-0000-00101E8DD200|spawned     |
|A98268C1-770F-5A99-0000-0010A712D200|A98268C1-7715-5A99-0000-0010EE99D200|spawned     |
+------------------------------------+------------------------------------+------------+
only showing top 3 rows
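
Because `v` only holds processes that have a `processcreate` event in the index, an edge's `src` (the parent GUID) may refer to a process that started before logging began and therefore has no vertex row. The plain-Python sketch below illustrates that check on made-up records. The IDs `p0` to `p2`, the `"other"` action value, and the helper names `build_graph` and `dangling_sources` are hypothetical; only the field names come from the notebook.

```python
def build_graph(events):
    """Mirror the notebook's v/e construction on a list of dicts."""
    created = [ev for ev in events if ev["action"] == "processcreate"]
    # Vertices: process GUID -> process name
    vertices = {ev["process_guid"]: ev["process_name"] for ev in created}
    # Edges: (parent GUID, child GUID, relationship)
    edges = [
        (ev["process_parent_guid"], ev["process_guid"], "spawned")
        for ev in created
    ]
    return vertices, edges


def dangling_sources(vertices, edges):
    """Return parent IDs that appear as an edge source but have no vertex."""
    return sorted({src for src, _dst, _rel in edges if src not in vertices})


# Made-up events: p0 is a parent whose own creation was never logged.
events = [
    {"action": "processcreate", "process_guid": "p1",
     "process_parent_guid": "p0", "process_name": "powershell.exe"},
    {"action": "processcreate", "process_guid": "p2",
     "process_parent_guid": "p1", "process_name": "cmd.exe"},
    {"action": "other", "process_guid": "p2",
     "process_parent_guid": None, "process_name": "cmd.exe"},
]

vertices, edges = build_graph(events)
print(dangling_sources(vertices, edges))
```

On the Spark side, a left anti join such as `e.join(v, e.src == v.id, "left_anti")` should express the same check against the real DataFrames.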



## Create a Graph (Vertices & Edges DataFrames)

In [34]:
g = GraphFrame(v, e)

## Look for Process A Spawning Process B AND Process B Spawning Process C

In [35]:
motifs = g.find("(a)-[]->(b);(b)-[]->(c)")

In [43]:
motifs.select("a.process_parent_name","a.process_name","b.process_parent_name","b.process_name","c.process_parent_name","c.process_name").show(10,truncate=False)

+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
|process_parent_name|process_name       |process_parent_name|process_name       |process_parent_name|process_name       |
+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
|svchost.exe        |DeviceCensus.exe   |DeviceCensus.exe   |DeviceCensus.exe   |DeviceCensus.exe   |conhost.exe        |
|explorer.exe       |powershell.exe     |powershell.exe     |cmd.exe            |cmd.exe            |powershell.exe     |
|svchost.exe        |CompatTelRunner.exe|CompatTelRunner.exe|CompatTelRunner.exe|CompatTelRunner.exe|CompatTelRunner.exe|
+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
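
The motif `(a)-[]->(b);(b)-[]->(c)` asks for every pair of edges where the destination of the first edge is the source of the second. As a rough plain-Python illustration of that semantics (not a description of how GraphFrames evaluates it internally), the sketch below walks a tiny made-up graph. The IDs `p0` to `p3` and the helper `two_hop_chains` are hypothetical; the process names echo the second row of the output above.

```python
from collections import defaultdict


def two_hop_chains(vertices, edges):
    """Return (a, b, c) ID triples where a->b and b->c are both edges
    and all three IDs are known vertices."""
    children = defaultdict(list)
    for src, dst in edges:
        if src in vertices and dst in vertices:
            children[src].append(dst)
    return [
        (a, b, c)
        for a in list(children)
        for b in children[a]
        for c in children.get(b, [])
    ]


# Made-up graph: p0 has no vertex row, so the edge p0->p1 cannot start a chain.
vertices = {"p1": "powershell.exe", "p2": "cmd.exe", "p3": "powershell.exe"}
edges = [("p0", "p1"), ("p1", "p2"), ("p2", "p3")]

for a, b, c in two_hop_chains(vertices, edges):
    print(vertices[a], "->", vertices[b], "->", vertices[c])
```

Note that the `select` in the cell above yields three columns named `process_parent_name` and three named `process_name`; aliasing them, for example `col("a.process_name").alias("a_process_name")`, would make the result easier to read and to reuse.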

