 # Cybersecurity Graph Analytics with xGT on HPE Superdome Flex
  
 ## Lateral Movement
 --- 
*Lateral movement* is a cyberattack pattern that describes how an adversary leverages a single foothold to compromise other systems within a network. Identifying and stopping lateral movement is an important step in controlling the damage from a breach. It also plays a role in forensic analysis of a cyberattack, helping to identify its source and reconstruct what happened.

## Lateral Movement (Kerberos - Pass The Ticket)
---
*Pass the ticket (PTT)* is a method of authenticating to a system using Kerberos tickets without having access to an account's password. Kerberos authentication can be used as the first step to lateral movement to a remote system.

| Malware | Vulnerabilities | Windows Event ID | Windows Port | Process/Executables | DLL/Shared Library | Other Parameters |
|---------|-----------------|------------------|--------------|---------------------|--------------------|------------------|
| Empire  |                 | 4768, 4769       | 135, 445     | lsass.exe           |                    | Auth Pkg = ‘Kerberos’ |


### Attack Steps
---
*Step 1*: Forge the TGT request (event 4768) using a tool such as mimikatz. This prevents event 4768 from being logged for the TGT request.<br>
*Step 2*: Make a legitimate service ticket request. This generates event 4769.<br>
*Step 3*: Look for 4769 events for which no corresponding 4768 event was logged within a given time window.<br>
*Step 4*: Check for lateral movement (LM) from the devices shortlisted in Step 3.<br>
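
Step 3 can be illustrated without a graph engine. The following is a minimal sketch in plain Python, not the xGT implementation used later in this notebook. The function name `unmatched_service_requests`, the tuple layout, and the sample events are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

TGT_REQUEST = 4768      # Windows event ID for a Kerberos TGT request
SERVICE_REQUEST = 4769  # Windows event ID for a Kerberos service ticket request


def unmatched_service_requests(events, window):
    """Return 4769 events that have no 4768 event for the same
    (src, dst) pair in the `window` seconds before them.

    `events` is an iterable of (epoch_time, event_id, src, dst) tuples.
    """
    events = list(events)  # we iterate twice, so materialize generators

    # Collect and sort the TGT request times per (src, dst) pair.
    tgt_times = defaultdict(list)
    for t, event_id, src, dst in events:
        if event_id == TGT_REQUEST:
            tgt_times[(src, dst)].append(t)
    for times in tgt_times.values():
        times.sort()

    suspicious = []
    for t, event_id, src, dst in events:
        if event_id != SERVICE_REQUEST:
            continue
        times = tgt_times.get((src, dst), [])
        # Number of TGT requests logged at or before time t.
        i = bisect_right(times, t)
        # The most recent of those must be less than `window` seconds old.
        covered = i > 0 and t - times[i - 1] < window
        if not covered:
            suspicious.append((t, event_id, src, dst))
    return suspicious


# Hypothetical sample events, not taken from the LANL dataset.
sample = [
    (100, 4768, "CompA", "ActiveDirectory"),
    (150, 4769, "CompA", "ActiveDirectory"),   # covered by the TGT at t=100
    (5000, 4769, "CompA", "ActiveDirectory"),  # the only TGT is too old
    (200, 4769, "CompB", "ActiveDirectory"),   # no TGT at all
]
flagged = unmatched_service_requests(sample, window=3600)
print(flagged)
```

With a one-hour window, the sketch flags the second `CompA` request and the `CompB` request, since neither has a recent enough 4768 event.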

In this notebook, we show how cybersecurity graph analytics can be implemented with xGT on HPE Superdome Flex for large data sets.

MITRE ATT&CK catalog: https://attack.mitre.org/

Dataset: https://datasets.trovares.com/cyber/LANL/ind


In [239]:
import xgt
import os
import pandas

from platform import python_version
print(python_version())

3.7.4


In [240]:
# Clear any proxy settings from the environment before connecting
if os.environ.get('https_proxy'):
    del os.environ['https_proxy']
if os.environ.get('http_proxy'):
    del os.environ['http_proxy']

In [241]:
conn=xgt.Connection()
conn.server_version

'1.3.0'

In [242]:
try:
  devices = conn.get_vertex_frame('Devices')
except xgt.XgtNameError:
  devices = conn.create_vertex_frame(
      name='Devices',
      schema=[['device', xgt.TEXT]],
      key='device')
devices

<xgt.graph.VertexFrame at 0x7fc9b478cad0>

In [243]:
try:
  netflow = conn.get_edge_frame('Netflow')
except xgt.XgtNameError:
  netflow = conn.create_edge_frame(
      name='Netflow',
      schema=[['epoch_time', xgt.INT],
              ['duration', xgt.INT],
              ['src_device', xgt.TEXT],
              ['dst_device', xgt.TEXT],
              ['protocol', xgt.INT],
              ['src_port', xgt.INT],
              ['dst_port', xgt.INT],
              ['src_packets', xgt.INT],
              ['dst_packets', xgt.INT],
              ['src_bytes', xgt.INT],
              ['dst_bytes', xgt.INT]],
      source=devices,
      target=devices,
      source_key='src_device',
      target_key='dst_device')
netflow

<xgt.graph.EdgeFrame at 0x7fc98efb8a10>

In [244]:
try:
  auth_events = conn.get_edge_frame('AuthEvents')
except xgt.XgtNameError:
  auth_events = conn.create_edge_frame(
           name='AuthEvents',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['log_host',xgt.TEXT],
                     ['logon_type',xgt.INT],
                     ['logon_type_description',xgt.TEXT],
                     ['user_name',xgt.TEXT],
                     ['domain_name',xgt.TEXT],
                     ['logon_id',xgt.INT],
                     ['subject_user_name',xgt.TEXT],
                     ['subject_domain_name',xgt.TEXT],
                     ['subject_logon_id',xgt.TEXT],
                     ['status',xgt.TEXT],
                     ['src',xgt.TEXT],
                     ['service_name',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['authentication_package',xgt.TEXT],
                     ['failure_reason',xgt.TEXT],
                     ['process_name',xgt.TEXT],
                     ['process_id',xgt.INT],
                     ['parent_process_name',xgt.TEXT],
                     ['parent_process_id',xgt.INT]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
auth_events

<xgt.graph.EdgeFrame at 0x7fcb2aef5d10>

In [245]:
# Utility to print the sizes of data currently in xGT
def print_data_summary():
  print('Devices (vertices): {:,}'.format(devices.num_vertices))
  print('Netflow (edges): {:,}'.format(netflow.num_edges))
  print('Authentication events (edges): {:,}'.format(auth_events.num_edges))
  print('Total (edges): {:,}'.format(
      netflow.num_edges + auth_events.num_edges))
    
print_data_summary()

Devices (vertices): 0
Netflow (edges): 0
Authentication events (edges): 0
Total (edges): 0


In [246]:
%%time

# Load the AuthEvents event data:
if auth_events.num_edges == 0:
    urls = ["xgtd://nvme_data3/data_2v/wls_day-85_2v.csv"]
    auth_events.load(urls)
    print_data_summary()

Devices (vertices): 12,288
Netflow (edges): 0
Authentication events (edges): 47,790,045
Total (edges): 47,790,045
CPU times: user 155 ms, sys: 52.1 ms, total: 207 ms
Wall time: 41.1 s


In [247]:
%%time

# Load the netflow data:
if netflow.num_edges == 0:
    urls = ["xgtd://nvme_data5/data_nf/nf_day-85.csv"]
    netflow.load(urls)
    print_data_summary()

Devices (vertices): 137,812
Netflow (edges): 235,661,328
Authentication events (edges): 47,790,045
Total (edges): 283,451,373
CPU times: user 588 ms, sys: 209 ms, total: 797 ms
Wall time: 1min 51s


In [248]:
# Generate a new edge frame for holding only the "Kerberos - Pass The Ticket" Flow edges
import time
query_start_time = time.time()

conn.drop_frame('PTTFlow')
ptt_flow = conn.create_edge_frame(
            name='PTTFlow',
            schema=netflow.schema,
            source=devices,
            target=devices,
            source_key='src_device',
            target_key='dst_device')
ptt_flow

<xgt.graph.EdgeFrame at 0x7fca48315550>

In [249]:
# Utility function to launch queries and show job number:
#   The job number may be useful if a long-running job needs
#   to be canceled.

def run_query(query, table_name="answers", drop_answer_table=True, show_query=False):
    if drop_answer_table:
        conn.drop_frame(table_name)
    if query[-1] != '\n':
        query += '\n'
    query += 'INTO {}'.format(table_name)
    if show_query:
        print("Query:\n" + query)
    job = conn.schedule_job(query)
    print("Launched job {}".format(job.id))
    conn.wait_for_job(job)
    table = conn.get_table_frame(table_name)
    return table
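
The query-assembly step inside `run_query` can be looked at in isolation. The sketch below pulls it out into a pure function so it can be checked without a running xGT server. The name `with_into_clause` is hypothetical and is not part of the xgt package. Using `str.endswith` rather than `query[-1]` is a small design choice: it also tolerates an empty query string instead of raising `IndexError`.

```python
def with_into_clause(query, table_name="answers"):
    """Append an INTO clause on its own line, mirroring what run_query does."""
    if not query.endswith("\n"):
        query += "\n"
    return query + "INTO {}".format(table_name)


# Hypothetical one-line query, used only to show the resulting text.
example = with_into_clause("MATCH (a:Devices) RETURN count(*)")
print(example)
```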

In [250]:
%%time

# Keep only the netflow edges whose destination port is 135 or 445

PTT_LMFlow_Query = """
MATCH (v0:Devices)-[edge:Netflow]->(v1:Devices) 
WHERE edge.dst_port=135 OR edge.dst_port=445 
CREATE (v0)-[e:PTTFlow 
             {epoch_time : edge.epoch_time, 
              duration : edge.duration, 
              protocol : edge.protocol, 
              src_port : edge.src_port, 
              dst_port : edge.dst_port}]->(v1) 
RETURN count(*)
"""
data = run_query(PTT_LMFlow_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1352
Number of answers: 5,210,584
CPU times: user 114 ms, sys: 12.1 ms, total: 126 ms
Wall time: 21.2 s
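
If the list of ports of interest changes, the `WHERE` clause of the query above could be generated rather than typed by hand. This is a minimal sketch, assuming a hypothetical helper named `port_predicate`; it only builds a string and does not call xGT. Converting each port with `int()` ensures that only integers end up in the query text.

```python
def port_predicate(ports, edge_var="edge", prop="dst_port"):
    """Build an OR-chain such as 'edge.dst_port=135 OR edge.dst_port=445'."""
    ports = [int(p) for p in ports]  # reject anything that is not an integer
    if not ports:
        raise ValueError("at least one port is required")
    return " OR ".join(
        "{}.{}={}".format(edge_var, prop, p) for p in ports)


where_clause = port_predicate([135, 445])
print("WHERE " + where_clause)
```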


In [251]:
# retrieve the answer rows to the client in a pandas frame
data1 = data.get_data_pandas()
data1[0:10]

Unnamed: 0,count(*)
0,5210584


In [252]:
%%time

# Query the properties of every PTTFlow edge
print("Print PTT_Flow Data")

Query_PTTFlow_Data = """
MATCH (v0:Devices)-[edge:PTTFlow]->(v1:Devices) 
RETURN edge.epoch_time,
       edge.duration,
       edge.src_device,
       edge.dst_device,
       edge.protocol,
       edge.src_port,
       edge.dst_port
"""
data = run_query(Query_PTTFlow_Data)
# num_rows is the number of returned rows; get_data()[0][0] would
# instead be the first epoch_time value
print('Number of answers: {:,}'.format(data.num_rows))

Print PTT_Flow Data
Launched job 1466
Number of answers: 5,210,584
CPU times: user 34.7 s, sys: 1.97 s, total: 36.7 s
Wall time: 56.8 s


In [253]:
data1 = data.get_data_pandas()
data1[0:10]

Unnamed: 0,edge_epoch_time,edge_duration,edge_src_device,edge_dst_device,edge_protocol,edge_src_port,edge_dst_port
0,7302914,0,Comp044772,Comp811836,6,56710,445
1,7304884,2,Comp296454,Comp244393,6,10924,445
2,7276124,318,Comp472185,ActiveDirectory,6,48511,445
3,7314370,47,Comp399174,Comp704126,6,72838,445
4,7333359,15,Comp015598,ActiveDirectory,6,37611,445
5,7258900,1,Comp288359,Comp479002,6,24620,445
6,7281823,47226,Comp515096,Comp274690,6,25631,445
7,7309326,10,Comp422721,ActiveDirectory,6,55453,135
8,7280063,16,Comp784926,ActiveDirectory,6,24116,445
9,7276825,11,Comp989545,Comp712936,6,43860,445


In [254]:
# Show the PTTFlow edges if there are few of them, otherwise only their count
data=None
if ptt_flow.num_edges == 0:
    print("PTTFlow is empty")
elif ptt_flow.num_edges <= 1000:
    data = ptt_flow.get_data_pandas()
else:
    data = 'PTTflow (edges): {:,}'.format(ptt_flow.num_edges)
data

'PTTflow (edges): 5,210,584'

In [255]:
# Create the TGT_REQ_Events edge frame
try:
  tgt_req_events = conn.get_edge_frame('TGT_REQ_Events')
except xgt.XgtNameError:
  tgt_req_events = conn.create_edge_frame(
           name='TGT_REQ_Events',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['src',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['is_attack',xgt.BOOLEAN]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
tgt_req_events

<xgt.graph.EdgeFrame at 0x7fc9bae06950>

In [256]:
%%time

# Populate the TGT_REQ_Events edge frame

TGT_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices) 
WHERE r.event_id = 4768 
CREATE (n1)-[r1:TGT_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
RETURN count(*)
"""

data = run_query(TGT_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1529
Number of answers: 875,992
CPU times: user 25.9 ms, sys: 6.38 ms, total: 32.3 ms
Wall time: 1.14 s


In [257]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print TGT_REQ Data")

Query_TGT_REQ_Data = """
MATCH (n1:Devices)-[edge:TGT_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_TGT_REQ_Data)
# num_rows is the number of returned rows; get_data()[0][0] would
# instead be the first epoch_time value
print('Number of answers: {:,}'.format(data.num_rows))

data1 = data.get_data_pandas()
data1[0:10]

Print TGT_REQ Data
Launched job 1559
Number of answers: 875,992
CPU times: user 14.8 s, sys: 233 ms, total: 15.1 s
Wall time: 19.3 s


Unnamed: 0,edge_epoch_time,edge_event_id,edge_src,edge_destination,edge_is_attack
0,7276387,4768,Comp377544,ActiveDirectory,True
1,7275528,4768,Comp377544,ActiveDirectory,True
2,7289111,4768,Comp377544,ActiveDirectory,True
3,7258327,4768,Comp511228,ActiveDirectory,True
4,7302059,4768,Comp916004,ActiveDirectory,True
5,7257645,4768,Comp755918,ActiveDirectory,True
6,7291179,4768,Comp791826,ActiveDirectory,True
7,7276433,4768,Comp377544,ActiveDirectory,True
8,7301462,4768,Comp234343,ActiveDirectory,True
9,7296895,4768,Comp916004,ActiveDirectory,True


In [258]:
#Create SERVICE_REQ_Events Edge Frames 

try:
  service_req_events = conn.get_edge_frame('SERVICE_REQ_Events')
except xgt.XgtNameError:
  service_req_events = conn.create_edge_frame(
           name='SERVICE_REQ_Events',
           schema = [['epoch_time',xgt.INT],
                     ['event_id',xgt.INT],
                     ['src',xgt.TEXT],
                     ['destination',xgt.TEXT],
                     ['is_attack',xgt.BOOLEAN]],
            source = 'Devices',
            target = 'Devices',
            source_key = 'src',
            target_key = 'destination')
service_req_events

<xgt.graph.EdgeFrame at 0x7fc96fe851d0>

In [259]:
%%time

# Populate the SERVICE_REQ_Events edge frame

SERVICE_REQ_Query = """
MATCH (n1:Devices)-[r:AuthEvents]->(n2:Devices) 
WHERE r.event_id = 4769 
CREATE (n1)-[r1:SERVICE_REQ_Events 
             {epoch_time:r.epoch_time, 
              event_id:r.event_id,
              is_attack:TRUE}]->(n2) 
RETURN count(*)
"""
data = run_query(SERVICE_REQ_Query)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 1587
Number of answers: 2,271,788
CPU times: user 29 ms, sys: 4.26 ms, total: 33.2 ms
Wall time: 2.83 s


In [260]:
%%time

# retrieve the answer rows to the client in a pandas frame
print("Print SERVICE_REQ Data")

Query_SERVICE_REQ_Data = """
MATCH (n1:Devices)-[edge:SERVICE_REQ_Events]->(n2:Devices) 
RETURN edge.epoch_time,
       edge.event_id,
       edge.src,
       edge.destination,
       edge.is_attack
"""
data = run_query(Query_SERVICE_REQ_Data)
#print('Number of answers: {:,}'.format(data.get_data()[0][0]))
data1 = data.get_data_pandas()
data1[0:10]

Print SERVICE_REQ Data
Launched job 1590
CPU times: user 19 s, sys: 449 ms, total: 19.4 s
Wall time: 24.6 s


Unnamed: 0,edge_epoch_time,edge_event_id,edge_src,edge_destination,edge_is_attack
0,7314279,4769,Comp987304,ActiveDirectory,True
1,7265685,4769,Comp916004,ActiveDirectory,True
2,7291876,4769,Comp548524,ActiveDirectory,True
3,7284663,4769,Comp307913,ActiveDirectory,True
4,7291202,4769,Comp415540,ActiveDirectory,True
5,7274697,4769,Comp916004,ActiveDirectory,True
6,7297042,4769,Comp195375,ActiveDirectory,True
7,7308754,4769,Comp372126,ActiveDirectory,True
8,7266534,4769,Comp415540,ActiveDirectory,True
9,7297067,4769,Comp780583,ActiveDirectory,True


In [261]:
%%time
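# Maximum age, in seconds, of a TGT request (4768) that still covers
# a later service ticket request (4769) between the same two devices
EPOCH_TIME_DIFF_THRESHOLD = 3600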

Valid_tgt = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:TGT_REQ_Events]->(n2) 
WHERE r2.epoch_time <= r1.epoch_time 
AND r1.epoch_time - r2.epoch_time < {0}
SET r1.is_attack=FALSE
RETURN r1.src,r1.destination,r2.destination,r1.epoch_time,r2.epoch_time
LIMIT 1000
""".format(EPOCH_TIME_DIFF_THRESHOLD)

print(Valid_tgt)
answer_table = run_query(Valid_tgt)
print('Number of answers: {:,}'.format(answer_table.num_rows))


MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:TGT_REQ_Events]->(n2) 
WHERE r2.epoch_time <= r1.epoch_time 
AND r1.epoch_time - r2.epoch_time < 3600
SET r1.is_attack=FALSE
RETURN r1.src,r1.destination,r2.destination,r1.epoch_time,r2.epoch_time
LIMIT 1000

Launched job 1645
Number of answers: 1,000
CPU times: user 48.6 s, sys: 4.69 s, total: 53.3 s
Wall time: 2h 26min 59s


In [262]:
%%time
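# Maximum delay, in seconds, between a suspicious service ticket request
# and a port 135/445 flow from the same source device
LATERAL_MOVEMENT_HOP_THRESHOLD = 300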

Final_PTT_ATTACK_Query = """
MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:PTTFlow]->(n3:Devices) 
WHERE r1.is_attack=TRUE 
AND n2 <> n3 
AND r1.epoch_time <= r2.epoch_time 
AND r2.epoch_time - r1.epoch_time < {0} 
RETURN r1.src,r1.destination,r2.dst_device,r1.epoch_time,r2.epoch_time
LIMIT 1000
""".format(LATERAL_MOVEMENT_HOP_THRESHOLD)

print(Final_PTT_ATTACK_Query)
answer_table = run_query(Final_PTT_ATTACK_Query)
print('Number of answers: {:,}'.format(answer_table.num_rows))


MATCH (n1:Devices)-[r1:SERVICE_REQ_Events]->(n2:Devices), (n1)-[r2:PTTFlow]->(n3:Devices) 
WHERE r1.is_attack=TRUE 
AND n2 <> n3 
AND r1.epoch_time <= r2.epoch_time 
AND r2.epoch_time - r1.epoch_time < 300 
RETURN r1.src,r1.destination,r2.dst_device,r1.epoch_time,r2.epoch_time
LIMIT 1000

Launched job 89005
Number of answers: 1,000
CPU times: user 40.6 ms, sys: 20 ms, total: 60.7 ms
Wall time: 19.1 s
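
The conditions in the final query can be restated in plain Python to make the matching rule explicit. This is a sketch under stated assumptions, not the xGT execution path. The function name `lateral_movement_candidates`, the tuple layouts, and the sample records are hypothetical.

```python
from collections import defaultdict


def lateral_movement_candidates(suspicious_requests, flows, hop_threshold):
    """Pair each suspicious service ticket request with later flows
    from the same source device to a different destination.

    suspicious_requests: (epoch_time, src, dst) tuples (is_attack = TRUE)
    flows: (epoch_time, src_device, dst_device) tuples (port 135/445 flows)
    """
    # Index the flows by source device so each request scans only its own flows.
    flows_by_src = defaultdict(list)
    for flow_time, src, dst in flows:
        flows_by_src[src].append((flow_time, dst))

    hits = []
    for req_time, src, dst in suspicious_requests:
        for flow_time, flow_dst in flows_by_src.get(src, []):
            # Same conditions as the query: a different target (n2 <> n3),
            # flow not earlier than the request, and within the threshold.
            if flow_dst != dst and 0 <= flow_time - req_time < hop_threshold:
                hits.append((src, dst, flow_dst, req_time, flow_time))
    return hits


# Hypothetical sample records, not taken from the LANL dataset.
requests = [(1000, "CompA", "ActiveDirectory")]
sample_flows = [
    (1100, "CompA", "CompC"),            # match: 100 s later, new target
    (1100, "CompA", "ActiveDirectory"),  # same target as the request
    (2000, "CompA", "CompD"),            # too late
    (900, "CompA", "CompE"),             # before the request
    (1050, "CompB", "CompC"),            # different source device
]
candidates = lateral_movement_candidates(requests, sample_flows, hop_threshold=300)
print(candidates)
```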
