 # Cybersecurity Graph Analytics with xGT on HPE Superdome Flex
  
 ## Lateral Movement
 --- 
*Lateral movement* is a cyberattack pattern that describes how an adversary leverages a single foothold to compromise other systems within a network. Identifying and stopping lateral movement is an important step in controlling the damage from a breach, and also plays a role in forensic analysis of a cyberatt ack, helping to identify its source and reconstruct what happened.  

## Lateral Movement (Kerberos - Pass The Ticket)
---
*Pass the ticket (PTT)* is a method of authenticating to a system using Kerberos tickets without having access to an account's password. Kerberos authentication can be used as the first step to lateral movement to a remote system.

| Malware | Vulnerabilities |  Windows Event Id |  Windows Port | Process/Executables | DLL/SharedLibrary | Other Parameters |
|---------|-----------------|-------------------|-------------- |---------------------|-------------------|------------------|
| Empire  |                 |    4768 , 4769    | 135,445       |  lsass.exe          |                   | Auth Pkg = ‘Kerberos’


### Attack Steps
---
*Step1*:  Forge 4768(TGT REQUEST) using tools like mimikatz. This prevents generation of 4768 for TGT REQUEST.<br>
*Step2*: Make a legitimate Service ticket request. This generates 4769.<br>
*Step3*: Look for 4769 logged for which 4768 has not been logged for a given time duration.<br>
*Step4*: Check for LM in shortlisted Devices in Step 3. <br>

In this notebook, we show how Cybersecurity Graph Analytics can be implemented with xGT on HPE Superdome Flex for  large data.

Mitre Attack Catalog https://attack.mitre.org/

Dataset : https://datasets.trovares.com/cyber/LANL/ind	


In [101]:
import xgt
import os
import pandas

from platform import python_version
print (python_version())

3.7.4


In [102]:
if os.environ.get('https_proxy'):
 del os.environ['https_proxy']
if os.environ.get('http_proxy'):
 del os.environ['http_proxy']

In [103]:
conn=xgt.Connection()
conn.server_version

'1.3.0'

In [104]:
try:
  devices = conn.get_vertex_frame('Devices')
except xgt.XgtNameError:
  devices = conn.create_vertex_frame(
      name='Devices',
      schema=[['device', xgt.TEXT]],
      key='device')
devices

<xgt.graph.VertexFrame at 0x7f7cffb208d0>

In [105]:
try:
  netflow = conn.get_edge_frame('Netflow')
except xgt.XgtNameError:
  netflow = conn.create_edge_frame(
      name='Netflow',
      schema=[['epoch_time', xgt.INT],
              ['duration', xgt.INT],
              ['src_device', xgt.TEXT],
              ['dst_device', xgt.TEXT],
              ['protocol', xgt.INT],
              ['src_port', xgt.INT],
              ['dst_port', xgt.INT],
              ['src_packets', xgt.INT],
              ['dst_packets', xgt.INT],
              ['src_bytes', xgt.INT],
              ['dst_bytes', xgt.INT]],
      source=devices,
      target=devices,
      source_key='src_device',
      target_key='dst_device')
netflow

<xgt.graph.EdgeFrame at 0x7f7d07eb3490>

In [106]:
try:
  auth_events = conn.get_edge_frame('AuthEvents')
except xgt.XgtNameError:
  auth_events = conn.create_edge_frame(
           name='AuthEvents',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['log_host',xgt.TEXT],
                     ['logon_type',xgt.INT],
                     ['logon_type_description',xgt.TEXT],
                     ['user_name',xgt.TEXT],
                     ['domain_name',xgt.TEXT],
                     ['logon_id',xgt.INT],
                     ['subject_user_name',xgt.TEXT],
                     ['subject_domain_name',xgt.TEXT],
                     ['subject_logon_id',xgt.TEXT],
                     ['status',xgt.TEXT],
                     ['src',xgt.TEXT],
                     ['service_name',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['authentication_package',xgt.TEXT],
                     ['failure_reason',xgt.TEXT],
                     ['process_name',xgt.TEXT],
                     ['process_id',xgt.INT],
                     ['parent_process_name',xgt.TEXT],
                     ['parent_process_id',xgt.INT]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
auth_events

<xgt.graph.EdgeFrame at 0x7f7d89490a90>

In [107]:
# Utility to print the sizes of data currently in xGT
def print_data_summary():
  print('Devices (vertices): {:,}'.format(devices.num_vertices))
  print('Netflow (edges): {:,}'.format(netflow.num_edges))
  print('Authentication events (edges): {:,}'.format(auth_events.num_edges))
  print('Total (edges): {:,}'.format(
      netflow.num_edges + auth_events.num_edges))
    
print_data_summary()

Devices (vertices): 0
Netflow (edges): 0
Authentication events (edges): 0
Total (edges): 0


In [108]:
%%time

# Load the AuthEvents event data:
if auth_events.num_edges == 0:
    urls = ["xgtd://nvme_data3/data_2v/wls_day-85_2v.csv"]
    auth_events.load(urls)
    print_data_summary()

Devices (vertices): 12,288
Netflow (edges): 0
Authentication events (edges): 47,790,045
Total (edges): 47,790,045
CPU times: user 242 ms, sys: 130 ms, total: 373 ms
Wall time: 38.3 s


In [109]:
%%time

# Load the netflow data:
if netflow.num_edges == 0:
    urls = ["xgtd://nvme_data5/data_nf/nf_day-85.csv"]
    netflow.load(urls)
    print_data_summary()

Devices (vertices): 137,812
Netflow (edges): 235,661,328
Authentication events (edges): 47,790,045
Total (edges): 283,451,373
CPU times: user 228 ms, sys: 199 ms, total: 427 ms
Wall time: 1min 32s


In [110]:
# Generate a new edge frame for holding only the "Kerberos - Pass The Ticket" Flow edges
import time
query_start_time = time.time()

conn.drop_frame('PTTFlow')
ptt_flow = conn.create_edge_frame(
            name='PTTFlow',
            schema=netflow.schema,
            source=devices,
            target=devices,
            source_key='src_device',
            target_key='dst_device')
ptt_flow

<xgt.graph.EdgeFrame at 0x7f7d88061550>

In [111]:
# Utility function to launch queries and show job number:
#   The job number may be useful if a long-running job needs
#   to be canceled.

def run_query(query, table_name = "answers", drop_answer_table=True, show_query=False):
    if drop_answer_table:
        conn.drop_frame(table_name)
    if query[-1] != '\n':
        query += '\n'
    query += 'INTO {}'.format(table_name)
    if show_query:
        print("Query:\n" + query)
    job = conn.schedule_job(query)
    print("Launched job {}".format(job.id))
    conn.wait_for_job(job)
    table = conn.get_table_frame(table_name)
    return table

In [112]:
%%time

#filtering with port no. 135 & 445

PTT_LMFlow_Query = """
MATCH (v0:Devices)-[edge:Netflow]->(v1:Devices) 
WHERE edge.dst_port=135 OR edge.dst_port=445 
CREATE (v0)-[e:PTTFlow 
             {epoch_time : edge.epoch_time, 
              duration : edge.duration, 
              protocol : edge.protocol, 
              src_port : edge.src_port, 
              dst_port : edge.dst_port}]->(v1) 
RETURN count(*)
"""
data = run_query(PTT_LMFlow_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1189
Number of answers: 5,210,584
CPU times: user 102 ms, sys: 16.1 ms, total: 118 ms
Wall time: 21.1 s


In [113]:
# retrieve the answer rows to the client in a pandas frame
data1 = data.get_data_pandas()
data1[0:10]

Unnamed: 0,count(*)
0,5210584


In [114]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print PTT_Flow Data")

Query_PTTFlow_Data = """
MATCH (v0:Devices)-[edge:PTTFlow]->(v1:Devices) 
RETURN edge.epoch_time,
       edge.duration,
       edge.src_device,
       edge.dst_device,
       edge.protocol,
       edge.src_port,
       edge.dst_port
"""
data = run_query(Query_PTTFlow_Data)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Print PTT_Flow Data
Launched job 1319
Number of answers: 7,293,972
CPU times: user 23.5 s, sys: 1.33 s, total: 24.8 s
Wall time: 45.8 s


In [115]:
data1 = data.get_data_pandas()
data1[0:10]

Unnamed: 0,edge_epoch_time,edge_duration,edge_src_device,edge_dst_device,edge_protocol,edge_src_port,edge_dst_port
0,7293972,0,Comp623258,Comp193491,6,74883,445
1,7290109,10,Comp905013,ActiveDirectory,6,25884,135
2,7315129,10,Comp843020,ActiveDirectory,6,91590,445
3,7263537,0,Comp040151,ActiveDirectory,6,45525,445
4,7311726,1,Comp073202,Comp217504,6,38883,445
5,7337522,11,Comp683943,ActiveDirectory,6,73889,445
6,7296756,1,Comp623258,Comp299069,6,15280,445
7,7287745,1,Comp623258,Comp942350,6,68234,445
8,7304611,11,EnterpriseAppServer,Comp704126,6,71237,445
9,7328275,1,Comp844043,Comp858248,6,8828,445


In [116]:
#Count of PTTFlow Edges Created
data=None
if ptt_flow.num_edges == 0:
    print("PTTFlow is empty")
elif ptt_flow.num_edges <= 1000:
    data = ptt_flow.get_data_pandas()
else:
    data = 'PTTflow (edges): {:,}'.format(ptt_flow.num_edges)
data

'PTTflow (edges): 5,210,584'

In [117]:
#Create TGT_REQ_Events Edge Frames 
try:
  tgt_req_events = conn.get_edge_frame('TGT_REQ_Events')
except xgt.XgtNameError:
  tgt_req_events = conn.create_edge_frame(
           name='TGT_REQ_Events',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['src',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['is_attack',xgt.BOOLEAN]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
tgt_req_events

<xgt.graph.EdgeFrame at 0x7f7cff176490>

In [118]:
%%time

#Polulate TGT_REQ_Events EdgeFrames

START_TIME = 7257600
END_TIME   = 7258600

TGT_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices)
WHERE r.event_id = 4768
AND r.epoch_time >= 7257600
AND r.epoch_time <= 7258600
CREATE (n1)-[r1:TGT_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
 RETURN count(*)
"""

print(TGT_REQ_Query)

data = run_query(TGT_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))


MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices)
WHERE r.event_id = 4768
AND r.epoch_time >= 7257600
AND r.epoch_time <= 7258600
CREATE (n1)-[r1:TGT_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
 RETURN count(*)

Launched job 1384
Number of answers: 11,354
CPU times: user 45 ms, sys: 13.3 ms, total: 58.2 ms
Wall time: 662 ms


In [119]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print TGT_REQ Data")

Query_TGT_REQ_Data = """
MATCH (n1:Devices)-[edge:TGT_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_TGT_REQ_Data)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

data1 = data.get_data_pandas()
data1[0:10]

Print TGT_REQ Data
Launched job 1412
Number of answers: 7,258,434
CPU times: user 135 ms, sys: 9.13 ms, total: 144 ms
Wall time: 366 ms


Unnamed: 0,edge_epoch_time,edge_event_id,edge_src,edge_destination,edge_is_attack
0,7258434,4768,Comp916004,ActiveDirectory,True
1,7258540,4768,Comp520997,ActiveDirectory,True
2,7258485,4768,Comp755918,ActiveDirectory,True
3,7257676,4768,Comp916004,ActiveDirectory,True
4,7258072,4768,Comp520997,ActiveDirectory,True
5,7257684,4768,Comp755918,ActiveDirectory,True
6,7258427,4768,Comp738736,ActiveDirectory,True
7,7257821,4768,Comp755918,ActiveDirectory,True
8,7258549,4768,Comp221976,ActiveDirectory,True
9,7258091,4768,Comp520997,ActiveDirectory,True


In [120]:
#Create SERVICE_REQ_Events Edge Frames 

try:
  service_req_events = conn.get_edge_frame('SERVICE_REQ_Events')
except xgt.XgtNameError:
  service_req_events = conn.create_edge_frame(
           name='SERVICE_REQ_Events',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['src',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['is_attack',xgt.BOOLEAN]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
service_req_events

<xgt.graph.EdgeFrame at 0x7f7cf43a87d0>

In [121]:
%%time

#Polulate SERVICE_REQ_Events EdgeFrames
START_TIME = 7257600
END_TIME   = 7258600

SERVICE_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices) 
WHERE r.event_id = 4769
AND r.epoch_time >= 7257600
AND r.epoch_time <= 7258600
CREATE (n1)-[r1:SERVICE_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
RETURN count(*)
"""
data = run_query(SERVICE_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1420
Number of answers: 21,813
CPU times: user 30 ms, sys: 5.24 ms, total: 35.3 ms
Wall time: 654 ms


In [122]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print SERVICE_REQ Data")

Query_SERVICE_REQ_Data = """
MATCH (n1:Devices)-[edge:SERVICE_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_SERVICE_REQ_Data)
#print('Number of answers: {:,}'.format(data.get_data()[0][0]))
data1 = data.get_data_pandas()
data1[0:10]

Print SERVICE_REQ Data
Launched job 1452
CPU times: user 156 ms, sys: 6 ms, total: 162 ms
Wall time: 352 ms


Unnamed: 0,edge_epoch_time,edge_event_id,edge_src,edge_destination,edge_is_attack
0,7258224,4769,Comp916004,ActiveDirectory,True
1,7257601,4769,Comp369682,ActiveDirectory,True
2,7257601,4769,ActiveDirectory,ActiveDirectory,True
3,7257894,4769,Comp916004,ActiveDirectory,True
4,7257914,4769,Comp505747,ActiveDirectory,True
5,7258201,4769,Comp916004,ActiveDirectory,True
6,7258073,4769,Comp755918,ActiveDirectory,True
7,7257978,4769,Comp493156,ActiveDirectory,True
8,7257601,4769,ActiveDirectory,ActiveDirectory,True
9,7258195,4769,Comp415540,ActiveDirectory,True


In [123]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1457
Number of answers: 21,813


In [124]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
WHERE r1.is_attack=FALSE
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1461
Number of answers: 0


In [125]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
WHERE r1.is_attack=TRUE
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1465
Number of answers: 21,813


In [126]:
%%time
EPOCH_TIME_DIFF_THRESHOLD = 500

Valid_tgt = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:TGT_REQ_Events]->(n2) 
WHERE r2.epoch_time <= r1.epoch_time 
AND r1.epoch_time - r2.epoch_time < {0}
SET r1.is_attack=FALSE
RETURN r1.src,r1.destination,r2.destination,r1.epoch_time,r2.epoch_time
""".format(EPOCH_TIME_DIFF_THRESHOLD)

print(Valid_tgt)
answer_table = run_query(Valid_tgt)
print('Number of answers: {:,}'.format(answer_table.num_rows))


MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:TGT_REQ_Events]->(n2) 
WHERE r2.epoch_time <= r1.epoch_time 
AND r1.epoch_time - r2.epoch_time < 500
SET r1.is_attack=FALSE
RETURN r1.src,r1.destination,r2.destination,r1.epoch_time,r2.epoch_time

Launched job 1469
Number of answers: 15,238,712
CPU times: user 102 ms, sys: 18.4 ms, total: 121 ms
Wall time: 12.1 s


In [127]:
%%time
LATERAL_MOVEMENT_HOP_THRESHOLD = 300 

Final_PTT_ATTACK_Query = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:PTTFlow]->(n3:Devices) 
WHERE r1.is_attack=TRUE 
AND n2 <> n3 
AND r1.epoch_time <= r2.epoch_time 
AND r2.epoch_time - r1.epoch_time < {0} 
RETURN r1.src,r1.destination,r2.dst_device,r1.epoch_time,r2.epoch_time
""".format(LATERAL_MOVEMENT_HOP_THRESHOLD)

print(Final_PTT_ATTACK_Query)
answer_table = run_query(Final_PTT_ATTACK_Query)
print('Number of answers: {:,}'.format(answer_table.num_rows))


MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:PTTFlow]->(n3:Devices) 
WHERE r1.is_attack=TRUE 
AND n2 <> n3 
AND r1.epoch_time <= r2.epoch_time 
AND r2.epoch_time - r1.epoch_time < 300 
RETURN r1.src,r1.destination,r2.dst_device,r1.epoch_time,r2.epoch_time

Launched job 1584
Number of answers: 4,154
CPU times: user 8.92 ms, sys: 5.07 ms, total: 14 ms
Wall time: 317 ms


In [128]:
data = answer_table.get_data_pandas()
data1 = data.drop_duplicates()
data1[0:10]

Unnamed: 0,r1_src,r1_destination,r2_dst_device,r1_epoch_time,r2_epoch_time
0,Comp348553,ActiveDirectory,Comp354767,7257773,7257789
2,Comp348553,ActiveDirectory,Comp354767,7257779,7257789
6,Comp308413,ActiveDirectory,Comp354767,7258077,7258090
8,Comp814303,ActiveDirectory,Comp805594,7257624,7257912
9,Comp814303,ActiveDirectory,Comp805594,7257695,7257912
18,Comp855562,ActiveDirectory,Comp354767,7258273,7258287
20,Comp855562,ActiveDirectory,Comp354767,7258272,7258287
26,Comp814303,ActiveDirectory,Comp805594,7257624,7257852
27,Comp814303,ActiveDirectory,Comp805594,7257695,7257852
37,Comp814303,ActiveDirectory,Comp805594,7257695,7257972
