 # Cybersecurity Graph Analytics with xGT on HPE Superdome Flex
  
 ## Lateral Movement
 --- 
*Lateral movement* is a cyberattack pattern that describes how an adversary leverages a single foothold to compromise other systems within a network. Identifying and stopping lateral movement is an important step in controlling the damage from a breach. It also plays a role in forensic analysis of a cyberattack, helping to identify its source and reconstruct what happened.
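Viewed as a graph, lateral movement is a reachability question: starting from one compromised host, which other systems can an adversary get to? The minimal sketch below answers that with a breadth-first search over a directed host-to-host edge list. It is plain Python rather than xGT. The function name and host names are hypothetical. The search also ignores timestamps, so it over-approximates the paths an attacker could actually have taken.

```python
from collections import deque


def reachable_hosts(edges, foothold):
    """Return the hosts reachable from `foothold` via directed (src, dst) edges.

    Plain breadth-first search; timestamps are ignored, so the result is an
    upper bound on what an attacker could really have reached.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {foothold}
    queue = deque([foothold])
    while queue:
        host = queue.popleft()
        for nxt in adjacency.get(host, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(foothold)  # report only the *other* systems
    return seen


# Hypothetical connections between hosts
edges = [("CompA", "CompB"), ("CompB", "CompC"), ("CompD", "CompA")]
print(sorted(reachable_hosts(edges, "CompA")))  # ['CompB', 'CompC']
```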

## Lateral Movement (Kerberos - Pass The Ticket)
---
*Pass the ticket (PTT)* is a method of authenticating to a system with Kerberos tickets without having access to an account's password. Authenticating this way can be the first step in moving laterally to a remote system.

| Malware | Vulnerabilities | Windows Event ID | Windows Port | Process/Executable | DLL/Shared Library | Other Parameters |
|---------|-----------------|------------------|--------------|--------------------|--------------------|------------------|
| Empire  |                 | 4768, 4769       | 135, 445     | lsass.exe          |                    | Auth Pkg = 'Kerberos' |
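The indicators in the table can be read as a filter over authentication records. The following is a hypothetical plain-Python predicate, not part of the xGT API. The field names follow the `AuthEvents` schema defined below. The assumption that the package is stored as the string `Kerberos` is a guess about the data. Note that the xGT queries in this notebook filter on `event_id` alone.

```python
# Windows event IDs from the table: 4768 = Kerberos TGT request,
# 4769 = Kerberos service ticket request.
KERBEROS_EVENT_IDS = {4768, 4769}


def is_kerberos_ticket_event(event):
    """Hypothetical filter: keep Kerberos TGT / service ticket events only.

    `event` is a dict keyed like the AuthEvents schema. The 'Kerberos'
    spelling of the authentication package is an assumption.
    """
    package = (event.get("authentication_package") or "").strip().lower()
    return event.get("event_id") in KERBEROS_EVENT_IDS and package == "kerberos"


events = [
    {"event_id": 4769, "authentication_package": "Kerberos"},
    {"event_id": 4624, "authentication_package": "Kerberos"},
    {"event_id": 4768, "authentication_package": "NTLM"},
]
print([is_kerberos_ticket_event(e) for e in events])  # [True, False, False]
```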


### Attack Steps
---
*Step 1*: Forge a TGT with a tool such as Mimikatz instead of requesting one from the domain controller. Because no TGT request is made, no 4768 (TGT request) event is logged.<br>
*Step 2*: Use the forged ticket to make an otherwise legitimate service ticket request. This generates a 4769 event.<br>
*Step 3*: To detect this, look for 4769 events for which no 4768 event has been logged within a given time window.<br>
*Step 4*: Check the devices shortlisted in Step 3 for lateral movement (LM).<br>
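Step 3 amounts to a time-windowed anti-join between service ticket requests and TGT requests. The sketch below expresses it in plain Python. It assumes each event is an `(epoch_time, src, destination)` tuple. A 4769 event counts as explained when the same source logged a 4768 event to the same destination less than `window` seconds earlier or at the same time. This matches the strict `<` comparison the `EPOCH_TIME_DIFF_THRESHOLD` query in this notebook uses. The function name and default window are illustrative assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def unexplained_service_requests(tgt_events, service_events, window=3600):
    """Return 4769 events with no 4768 from the same (src, destination)
    in the half-open interval (t - window, t].

    Both inputs are lists of (epoch_time, src, destination) tuples.
    """
    tgt_times = defaultdict(list)
    for t, src, dst in tgt_events:
        tgt_times[(src, dst)].append(t)
    for times in tgt_times.values():
        times.sort()

    flagged = []
    for t, src, dst in service_events:
        times = tgt_times.get((src, dst), [])
        # Index of the most recent TGT request at or before t (-1 if none)
        i = bisect_right(times, t) - 1
        if i < 0 or t - times[i] >= window:
            flagged.append((t, src, dst))
    return flagged


tgt = [(100, "CompA", "ActiveDirectory")]
svc = [(200, "CompA", "ActiveDirectory"),
       (5000, "CompA", "ActiveDirectory"),
       (300, "CompB", "ActiveDirectory")]
print(unexplained_service_requests(tgt, svc))
# [(5000, 'CompA', 'ActiveDirectory'), (300, 'CompB', 'ActiveDirectory')]
```

Checking only the most recent earlier TGT request is enough, because it is the one closest in time to the service request.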

In this notebook, we show how Cybersecurity Graph Analytics can be implemented with xGT on HPE Superdome Flex for large datasets.

MITRE ATT&CK catalog: https://attack.mitre.org/

Dataset: https://datasets.trovares.com/cyber/LANL/ind


In [2]:
import xgt
import os
import pandas

from platform import python_version
print(python_version())

3.7.4


In [3]:
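# Clear any proxy settings so they do not interfere with the client's
# connection to the xGT server
if os.environ.get('https_proxy'):
    del os.environ['https_proxy']
if os.environ.get('http_proxy'):
    del os.environ['http_proxy']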

In [4]:
conn=xgt.Connection()
conn.server_version

'1.3.0'

In [5]:
try:
  devices = conn.get_vertex_frame('Devices')
except xgt.XgtNameError:
  devices = conn.create_vertex_frame(
      name='Devices',
      schema=[['device', xgt.TEXT]],
      key='device')
devices

<xgt.graph.VertexFrame at 0x7f7d911a1410>

In [6]:
try:
  netflow = conn.get_edge_frame('Netflow')
except xgt.XgtNameError:
  netflow = conn.create_edge_frame(
      name='Netflow',
      schema=[['epoch_time', xgt.INT],
              ['duration', xgt.INT],
              ['src_device', xgt.TEXT],
              ['dst_device', xgt.TEXT],
              ['protocol', xgt.INT],
              ['src_port', xgt.INT],
              ['dst_port', xgt.INT],
              ['src_packets', xgt.INT],
              ['dst_packets', xgt.INT],
              ['src_bytes', xgt.INT],
              ['dst_bytes', xgt.INT]],
      source=devices,
      target=devices,
      source_key='src_device',
      target_key='dst_device')
netflow

<xgt.graph.EdgeFrame at 0x7f7d905ff210>

In [7]:
try:
  auth_events = conn.get_edge_frame('AuthEvents')
except xgt.XgtNameError:
  auth_events = conn.create_edge_frame(
           name='AuthEvents',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['log_host',xgt.TEXT],
                     ['logon_type',xgt.INT],
                     ['logon_type_description',xgt.TEXT],
                     ['user_name',xgt.TEXT],
                     ['domain_name',xgt.TEXT],
                     ['logon_id',xgt.INT],
                     ['subject_user_name',xgt.TEXT],
                     ['subject_domain_name',xgt.TEXT],
                     ['subject_logon_id',xgt.TEXT],
                     ['status',xgt.TEXT],
                     ['src',xgt.TEXT],
                     ['service_name',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['authentication_package',xgt.TEXT],
                     ['failure_reason',xgt.TEXT],
                     ['process_name',xgt.TEXT],
                     ['process_id',xgt.INT],
                     ['parent_process_name',xgt.TEXT],
                     ['parent_process_id',xgt.INT]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
auth_events

<xgt.graph.EdgeFrame at 0x7f7d9060b610>

In [8]:
# Utility to print the sizes of data currently in xGT
def print_data_summary():
  print('Devices (vertices): {:,}'.format(devices.num_vertices))
  print('Netflow (edges): {:,}'.format(netflow.num_edges))
  print('Authentication events (edges): {:,}'.format(auth_events.num_edges))
  print('Total (edges): {:,}'.format(
      netflow.num_edges + auth_events.num_edges))
    
print_data_summary()

Devices (vertices): 0
Netflow (edges): 0
Authentication events (edges): 0
Total (edges): 0


In [9]:
%%time

# Load the AuthEvents event data:
if auth_events.num_edges == 0:
    urls = ["xgtd://nvme_data3/data_2v/wls_day-85_2v.csv"]
    auth_events.load(urls)
    print_data_summary()

Devices (vertices): 12,288
Netflow (edges): 0
Authentication events (edges): 47,790,045
Total (edges): 47,790,045
CPU times: user 129 ms, sys: 75.3 ms, total: 204 ms
Wall time: 34 s


In [10]:
%%time

# Load the netflow data:
if netflow.num_edges == 0:
    urls = ["xgtd://nvme_data5/data_nf/nf_day-85.csv"]
    netflow.load(urls)
    print_data_summary()

Devices (vertices): 137,812
Netflow (edges): 235,661,328
Authentication events (edges): 47,790,045
Total (edges): 283,451,373
CPU times: user 305 ms, sys: 224 ms, total: 529 ms
Wall time: 1min 52s


In [11]:
# Generate a new edge frame for holding only the "Kerberos - Pass The Ticket" Flow edges
import time
query_start_time = time.time()

conn.drop_frame('PTTFlow')
ptt_flow = conn.create_edge_frame(
            name='PTTFlow',
            schema=netflow.schema,
            source=devices,
            target=devices,
            source_key='src_device',
            target_key='dst_device')
ptt_flow

<xgt.graph.EdgeFrame at 0x7f7d906141d0>

In [12]:
# Utility function to launch queries and show job number:
#   The job number may be useful if a long-running job needs
#   to be canceled.

def run_query(query, table_name = "answers", drop_answer_table=True, show_query=False):
    if drop_answer_table:
        conn.drop_frame(table_name)
    if query[-1] != '\n':
        query += '\n'
    query += 'INTO {}'.format(table_name)
    if show_query:
        print("Query:\n" + query)
    job = conn.schedule_job(query)
    print("Launched job {}".format(job.id))
    conn.wait_for_job(job)
    table = conn.get_table_frame(table_name)
    return table

In [13]:
%%time

# Keep only flows to destination port 135 (RPC endpoint mapper) or 445 (SMB)

PTT_LMFlow_Query = """
MATCH (v0:Devices)-[edge:Netflow]->(v1:Devices) 
WHERE edge.dst_port=135 OR edge.dst_port=445 
CREATE (v0)-[e:PTTFlow 
             {epoch_time : edge.epoch_time, 
              duration : edge.duration, 
              protocol : edge.protocol, 
              src_port : edge.src_port, 
              dst_port : edge.dst_port}]->(v1) 
RETURN count(*)
"""
data = run_query(PTT_LMFlow_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1295
Number of answers: 5,210,584
CPU times: user 53.9 ms, sys: 40.9 ms, total: 94.8 ms
Wall time: 21.1 s
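The `MATCH ... WHERE ... CREATE` query above behaves much like a filter-and-project over the netflow records. The plain-Python sketch below shows the same idea. It assumes each record is a dict keyed by the `Netflow` schema. It keeps flows whose destination port is 135 (Microsoft RPC endpoint mapper) or 445 (SMB), and it copies only the fields the query populates, including the endpoint keys. The query does not copy packet and byte counts, so this sketch simply omits them. The function name is hypothetical.

```python
LM_PORTS = {135, 445}  # Microsoft RPC endpoint mapper, SMB

# Fields the CREATE clause sets, plus the src/dst keys taken from v0/v1
PTT_FIELDS = ("epoch_time", "duration", "src_device", "dst_device",
              "protocol", "src_port", "dst_port")


def ptt_flow_edges(netflow_records):
    """Keep flows to port 135 or 445, projected onto PTT_FIELDS."""
    return [
        {field: rec[field] for field in PTT_FIELDS}
        for rec in netflow_records
        if rec["dst_port"] in LM_PORTS
    ]


records = [
    {"epoch_time": 1, "duration": 5, "src_device": "CompA", "dst_device": "CompB",
     "protocol": 6, "src_port": 50000, "dst_port": 445, "src_packets": 3,
     "dst_packets": 2, "src_bytes": 100, "dst_bytes": 80},
    {"epoch_time": 2, "duration": 1, "src_device": "CompA", "dst_device": "CompC",
     "protocol": 6, "src_port": 50001, "dst_port": 80, "src_packets": 1,
     "dst_packets": 1, "src_bytes": 40, "dst_bytes": 40},
]
print(len(ptt_flow_edges(records)))  # 1
```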


In [14]:
# retrieve the answer rows to the client in a pandas frame
data1 = data.get_data_pandas()
data1[0:10]

|   | count(*) |
|---|----------|
| 0 | 5210584  |


In [15]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print PTT_Flow Data")

Query_PTTFlow_Data = """
MATCH (v0:Devices)-[edge:PTTFlow]->(v1:Devices) 
RETURN edge.epoch_time,
       edge.duration,
       edge.src_device,
       edge.dst_device,
       edge.protocol,
       edge.src_port,
       edge.dst_port
"""
data = run_query(Query_PTTFlow_Data)
print('Number of answers: {:,}'.format(data.num_rows))

Print PTT_Flow Data
Launched job 1404
Number of answers: 5,210,584
CPU times: user 17.2 s, sys: 2.29 s, total: 19.5 s
Wall time: 41.7 s


In [16]:
data1 = data.get_data_pandas()
data1[0:10]

|   | edge_epoch_time | edge_duration | edge_src_device | edge_dst_device | edge_protocol | edge_src_port | edge_dst_port |
|---|-----------------|---------------|-----------------|-----------------|---------------|---------------|---------------|
| 0 | 7298261 | 12  | Comp754845 | ActiveDirectory | 6 | 80924 | 445 |
| 1 | 7292861 | 12  | Comp754845 | ActiveDirectory | 6 | 54098 | 445 |
| 2 | 7301861 | 12  | Comp754845 | ActiveDirectory | 6 | 83398 | 445 |
| 3 | 7306581 | 5   | Comp317953 | Comp015815      | 6 | 54613 | 135 |
| 4 | 7269906 | 14  | Comp834987 | ActiveDirectory | 6 | 32900 | 135 |
| 5 | 7269906 | 48  | Comp834987 | ActiveDirectory | 6 | 65006 | 135 |
| 6 | 7267144 | 15  | Comp834987 | ActiveDirectory | 6 | 27058 | 135 |
| 7 | 7285034 | 15  | Comp535172 | ActiveDirectory | 6 | 95033 | 445 |
| 8 | 7269076 | 14  | Comp834987 | ActiveDirectory | 6 | 75299 | 445 |
| 9 | 7284077 | 612 | Comp535172 | Comp908480      | 6 | 30195 | 445 |


In [17]:
#Count of PTTFlow Edges Created
data=None
if ptt_flow.num_edges == 0:
    print("PTTFlow is empty")
elif ptt_flow.num_edges <= 1000:
    data = ptt_flow.get_data_pandas()
else:
    data = 'PTTflow (edges): {:,}'.format(ptt_flow.num_edges)
data

'PTTflow (edges): 5,210,584'

In [18]:
# Create the TGT_REQ_Events edge frame, dropping any previous copy so that
# re-running the notebook does not duplicate the populated edges
conn.drop_frame('TGT_REQ_Events')
tgt_req_events = conn.create_edge_frame(
    name='TGT_REQ_Events',
    schema=[['epoch_time', xgt.INT],
            ['event_id', xgt.INT],
            ['src', xgt.TEXT],
            ['destination', xgt.TEXT],
            ['is_attack', xgt.BOOLEAN]],
    source='Devices',
    target='Devices',
    source_key='src',
    target_key='destination')
tgt_req_events

<xgt.graph.EdgeFrame at 0x7f7d905cd490>

In [19]:
%%time

# Populate TGT_REQ_Events with Kerberos TGT requests (event 4768)

TGT_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices) 
WHERE r.event_id = 4768 
CREATE (n1)-[r1:TGT_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
RETURN count(*)
"""

data = run_query(TGT_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1480
Number of answers: 875,992
CPU times: user 20.2 ms, sys: 8.09 ms, total: 28.3 ms
Wall time: 1.05 s


In [20]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print TGT_REQ Data")

Query_TGT_REQ_Data = """
MATCH (n1:Devices)-[edge:TGT_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_TGT_REQ_Data)
print('Number of answers: {:,}'.format(data.num_rows))

data1 = data.get_data_pandas()
data1[0:10]

Print TGT_REQ Data
Launched job 1492
Number of answers: 875,992
CPU times: user 8.69 s, sys: 469 ms, total: 9.16 s
Wall time: 24.9 s


|   | edge_epoch_time | edge_event_id | edge_src | edge_destination | edge_is_attack |
|---|-----------------|---------------|----------|------------------|----------------|
| 0 | 7286748 | 4768 | EnterpriseAppServer | ActiveDirectory | True |
| 1 | 7336218 | 4768 | Comp916004 | ActiveDirectory | True |
| 2 | 7329519 | 4768 | Comp916004 | ActiveDirectory | True |
| 3 | 7315548 | 4768 | Comp916004 | ActiveDirectory | True |
| 4 | 7270986 | 4768 | Comp916004 | ActiveDirectory | True |
| 5 | 7297055 | 4768 | Comp823551 | ActiveDirectory | True |
| 6 | 7298492 | 4768 | Comp916004 | ActiveDirectory | True |
| 7 | 7297814 | 4768 | Comp520997 | ActiveDirectory | True |
| 8 | 7265498 | 4768 | Comp520997 | ActiveDirectory | True |
| 9 | 7285147 | 4768 | Comp380546 | ActiveDirectory | True |


In [21]:
# Create the SERVICE_REQ_Events edge frame, dropping any previous copy so that
# re-running the notebook does not duplicate the populated edges
conn.drop_frame('SERVICE_REQ_Events')
service_req_events = conn.create_edge_frame(
    name='SERVICE_REQ_Events',
    schema=[['epoch_time', xgt.INT],
            ['event_id', xgt.INT],
            ['src', xgt.TEXT],
            ['destination', xgt.TEXT],
            ['is_attack', xgt.BOOLEAN]],
    source='Devices',
    target='Devices',
    source_key='src',
    target_key='destination')
service_req_events

<xgt.graph.EdgeFrame at 0x7f7d905ed610>

In [22]:
%%time

# Populate SERVICE_REQ_Events with Kerberos service ticket requests (event 4769);
# every request starts out flagged is_attack=TRUE

SERVICE_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices) 
WHERE r.event_id = 4769 
CREATE (n1)-[r1:SERVICE_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
RETURN count(*)
"""
data = run_query(SERVICE_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1555
Number of answers: 2,271,788
CPU times: user 33.4 ms, sys: 20.7 ms, total: 54.1 ms
Wall time: 6.67 s


In [23]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print SERVICE_REQ Data")

Query_SERVICE_REQ_Data = """
MATCH (n1:Devices)-[edge:SERVICE_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_SERVICE_REQ_Data)
#print('Number of answers: {:,}'.format(data.get_data()[0][0]))
data1 = data.get_data_pandas()
data1[0:10]

Print SERVICE_REQ Data
Launched job 1600
CPU times: user 14.7 s, sys: 705 ms, total: 15.4 s
Wall time: 22 s


|   | edge_epoch_time | edge_event_id | edge_src | edge_destination | edge_is_attack |
|---|-----------------|---------------|----------|------------------|----------------|
| 0 | 7274472 | 4769 | Comp916004 | ActiveDirectory | True |
| 1 | 7278945 | 4769 | Comp244393 | ActiveDirectory | True |
| 2 | 7331911 | 4769 | Comp297849 | ActiveDirectory | True |
| 3 | 7339386 | 4769 | Comp244393 | ActiveDirectory | True |
| 4 | 7315373 | 4769 | Comp326739 | ActiveDirectory | True |
| 5 | 7262396 | 4769 | Comp755918 | ActiveDirectory | True |
| 6 | 7281678 | 4769 | Comp916004 | ActiveDirectory | True |
| 7 | 7343102 | 4769 | ActiveDirectory | ActiveDirectory | True |
| 8 | 7295012 | 4769 | Comp373920 | ActiveDirectory | True |
| 9 | 7297452 | 4769 | Comp916004 | ActiveDirectory | True |


In [25]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1652
Number of answers: 2,271,788


In [26]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
WHERE r1.is_attack=FALSE
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1655
Number of answers: 0


In [27]:
q="""
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices)
WHERE r1.is_attack=TRUE
RETURN COUNT(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1669
Number of answers: 2,271,788


In [None]:
%%time
EPOCH_TIME_DIFF_THRESHOLD = 3600  # 1 hour, assuming epoch_time is in seconds

Valid_tgt = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:TGT_REQ_Events]->(n2) 
WHERE r2.epoch_time <= r1.epoch_time 
AND r1.epoch_time - r2.epoch_time < {0}
SET r1.is_attack=FALSE
RETURN r1.src,r1.destination,r2.destination,r1.epoch_time,r2.epoch_time
LIMIT 1000
""".format(EPOCH_TIME_DIFF_THRESHOLD)

print(Valid_tgt)
answer_table = run_query(Valid_tgt)
print('Number of answers: {:,}'.format(answer_table.num_rows))

In [None]:
%%time
LATERAL_MOVEMENT_HOP_THRESHOLD = 300  # 5 minutes, assuming epoch_time is in seconds

Final_PTT_ATTACK_Query = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:PTTFlow]->(n3:Devices) 
WHERE r1.is_attack=TRUE 
AND n2 <> n3 
AND r1.epoch_time <= r2.epoch_time 
AND r2.epoch_time - r1.epoch_time < {0} 
RETURN r1.src,r1.destination,r2.dst_device,r1.epoch_time,r2.epoch_time
LIMIT 1000
""".format(LATERAL_MOVEMENT_HOP_THRESHOLD)

print(Final_PTT_ATTACK_Query)
answer_table = run_query(Final_PTT_ATTACK_Query)
print('Number of answers: {:,}'.format(answer_table.num_rows))
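The final query pairs each still-flagged service ticket request with a `PTTFlow` edge. That edge must leave the same host toward a different device within `LATERAL_MOVEMENT_HOP_THRESHOLD` seconds. A plain-Python sketch of that join condition follows. The tuple layouts and the function name are assumptions for illustration. The sketch says nothing about how xGT actually evaluates the query.

```python
from bisect import bisect_left
from collections import defaultdict


def lateral_movement_hops(suspicious_requests, flows, hop_window=300):
    """Pair each suspicious request (t1, src, req_dst) with flows
    (t2, src, flow_dst) from the same src where t1 <= t2 < t1 + hop_window
    and flow_dst != req_dst.

    Returns (src, req_dst, flow_dst, t1, t2) tuples, mirroring the
    RETURN clause of the final query.
    """
    flows_by_src = defaultdict(list)
    for t2, src, flow_dst in flows:
        flows_by_src[src].append((t2, flow_dst))
    for host_flows in flows_by_src.values():
        host_flows.sort()

    hops = []
    for t1, src, req_dst in suspicious_requests:
        host_flows = flows_by_src.get(src, [])
        # First flow at or after t1: (t1,) sorts before any (t1, dst) tuple
        i = bisect_left(host_flows, (t1,))
        while i < len(host_flows) and host_flows[i][0] < t1 + hop_window:
            t2, flow_dst = host_flows[i]
            if flow_dst != req_dst:
                hops.append((src, req_dst, flow_dst, t1, t2))
            i += 1
    return hops


requests = [(1000, "CompA", "ActiveDirectory")]
flows = [(1010, "CompA", "CompB"), (1020, "CompA", "ActiveDirectory"),
         (1400, "CompA", "CompC"), (990, "CompA", "CompD")]
print(lateral_movement_hops(requests, flows))
# [('CompA', 'ActiveDirectory', 'CompB', 1000, 1010)]
```

Sorting each host's flows and starting from a `bisect` position means each lookup only walks the flows inside the window. It does not rescan every flow from that host.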