# Cybersecurity Graph Analytics with xGT on HPE Superdome Flex
  
## Lateral Movement
---
*Lateral movement* is a cyberattack pattern that describes how an adversary leverages a single foothold to compromise other systems within a network. Identifying and stopping lateral movement is an important step in controlling the damage from a breach. It also plays a role in forensic analysis of a cyberattack, helping to identify its source and reconstruct what happened.

## Lateral Movement (Pass The Hash)
---
Pass the hash (PtH) is a method of authenticating as a user without having access to the user's cleartext password. This method bypasses standard authentication steps that require a cleartext password, moving directly into the portion of the authentication that uses the password hash.

### Event ID Meanings
1. 4688 - Process execution in Windows
    
### Details associated with Pass the Hash attack
| Malware | Technique | OS | Windows Event ID | Windows Port | Process | DLL | Other Parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SoftCell/Empire | T1075 | Windows | 4688 | 135/445 |  |  | Authentication Package = ‘NTLM’ |

### Pass the Hash Attack Flow
1. Windows uses the `lsass` process to keep users' password hashes in memory.
2. Tools like mimikatz can dump these hashes from memory.
3. These hashes can be used to authenticate as legitimate users.
4. SoftCell uses these hashes along with PsExec (port 445) or WMI (port 135) for lateral movement.
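
The port numbers in step 4 are what make this attack visible in netflow data. As a minimal sketch (the helper name, the dictionary, and the toy flow records below are illustrative assumptions, not part of xGT or the dataset), flows can be tagged by the channel their destination port suggests:

```python
# Hypothetical helper for illustration only; not part of xGT.
# The two ports come from step 4 of the attack flow.
LATERAL_MOVEMENT_PORTS = {
    445: "SMB (used by PsExec)",
    135: "RPC endpoint mapper (used by WMI)",
}


def lateral_movement_channel(dst_port):
    """Return a channel label for a destination port, or None if it is not of interest."""
    return LATERAL_MOVEMENT_PORTS.get(dst_port)


# Toy flow records (made up for this example).
flows = [
    {"src_device": "A", "dst_device": "B", "dst_port": 445},
    {"src_device": "A", "dst_device": "C", "dst_port": 80},
    {"src_device": "B", "dst_device": "D", "dst_port": 135},
]

# Keep only the flows whose destination port matches one of the two channels.
candidates = [f for f in flows if lateral_movement_channel(f["dst_port"]) is not None]
for f in candidates:
    print(f["src_device"], "->", f["dst_device"], lateral_movement_channel(f["dst_port"]))
```

A port match alone is weak evidence, since ports 135 and 445 also carry plenty of legitimate traffic. That is why the notebook later combines the port filter with host events.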

In this notebook, we show how cybersecurity graph analytics can be implemented with xGT on HPE Superdome Flex for large datasets.

MITRE ATT&CK catalog: https://attack.mitre.org/

Dataset: https://datasets.trovares.com/cyber/LANL/ind

In [16]:
import xgt
import os
import pandas

from platform import python_version
print(python_version())

3.7.4


In [17]:
if os.environ.get('https_proxy'):
 del os.environ['https_proxy']
if os.environ.get('http_proxy'):
 del os.environ['http_proxy']
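
The cell above removes proxy settings so that the client connects to the xGT server directly. A slightly more compact variant is sketched below. The function name is a hypothetical helper, and it is shown on a plain dictionary rather than on `os.environ`, so running it has no side effects:

```python
def clear_proxy_vars(env, names=("https_proxy", "http_proxy")):
    """Remove the named proxy variables from a mapping, ignoring any that are absent."""
    for name in names:
        # pop() with a default does not raise if the key is missing.
        env.pop(name, None)
    return env


# Demonstrated on a plain dict; pass os.environ to change the real environment.
demo_env = {"https_proxy": "http://proxy.example:8080", "PATH": "/usr/bin"}
clear_proxy_vars(demo_env)
print(demo_env)
```

One behavioral difference: the original cell leaves a variable in place when it is set to an empty string, while `pop` removes it regardless of its value.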

In [18]:
conn=xgt.Connection()
conn.server_version

'1.3.0'

In [19]:
try:
  devices = conn.get_vertex_frame('Devices')
except xgt.XgtNameError:
  devices = conn.create_vertex_frame(
      name='Devices',
      schema=[['device', xgt.TEXT]],
      key='device')
devices

<xgt.graph.VertexFrame at 0x7f1161811ad0>

In [20]:
try:
  netflow = conn.get_edge_frame('Netflow')
except xgt.XgtNameError:
  netflow = conn.create_edge_frame(
      name='Netflow',
      schema=[['epoch_time', xgt.INT],
              ['duration', xgt.INT],
              ['src_device', xgt.TEXT],
              ['dst_device', xgt.TEXT],
              ['protocol', xgt.INT],
              ['src_port', xgt.INT],
              ['dst_port', xgt.INT],
              ['src_packets', xgt.INT],
              ['dst_packets', xgt.INT],
              ['src_bytes', xgt.INT],
              ['dst_bytes', xgt.INT]],
      source=devices,
      target=devices,
      source_key='src_device',
      target_key='dst_device')
netflow

<xgt.graph.EdgeFrame at 0x7f116181a610>

In [21]:
try:
  host_events = conn.get_edge_frame('HostEvents')
except xgt.XgtNameError:
  host_events = conn.create_edge_frame(
      name='HostEvents',
      schema=[['epoch_time', xgt.INT],
              ['event_id', xgt.INT],
              ['log_host', xgt.TEXT],
              ['user_name', xgt.TEXT],
              ['domain_name', xgt.TEXT],
              ['logon_id', xgt.INT],
              ['process_name', xgt.TEXT],
              ['process_id', xgt.INT],
              ['parent_process_name', xgt.TEXT],
              ['parent_process_id', xgt.INT]],
           source=devices,
           target=devices,
           source_key='log_host',
           target_key='log_host')
host_events

<xgt.graph.EdgeFrame at 0x7f11617f88d0>

In [22]:
# Utility to print the sizes of data currently in xGT
def print_data_summary():
  print('Devices (vertices): {:,}'.format(devices.num_vertices))
  print('Netflow (edges): {:,}'.format(netflow.num_edges))
  print('Host events (edges): {:,}'.format(host_events.num_edges))
  print('Total (edges): {:,}'.format(netflow.num_edges + host_events.num_edges))
    
print_data_summary()

Devices (vertices): 137,416
Netflow (edges): 235,661,328
Host events (edges): 0
Total (edges): 235,661,328


In [23]:
%%time

# Load the HostEvents event data:
if host_events.num_edges == 0:
    urls = ["xgtd://nvme_data1/data_1v/wls_day-85_1v.csv"]
    host_events.load(urls)
    print_data_summary()

Devices (vertices): 137,479
Netflow (edges): 235,661,328
Host events (edges): 18,637,483
Total (edges): 254,298,811
CPU times: user 36.3 ms, sys: 22.4 ms, total: 58.6 ms
Wall time: 11.3 s


In [24]:
%%time

# Load the netflow data:
if netflow.num_edges == 0:
    urls = ["xgtd://nvme_data1/data_nf/nf_day-85.csv"]
    netflow.load(urls)
    print_data_summary()

CPU times: user 0 ns, sys: 972 µs, total: 972 µs
Wall time: 790 µs


In [25]:
# Utility function to launch queries and show job number:
#   The job number may be useful if a long-running job needs
#   to be canceled.

def run_query(query, table_name = "answers", drop_answer_table=True, show_query=False):
    if drop_answer_table:
        conn.drop_frame(table_name)
    if query[-1] != '\n':
        query += '\n'
    query += 'INTO {}'.format(table_name)
    if show_query:
        print("Query:\n" + query)
    job = conn.schedule_job(query)
    print("Launched job {}".format(job.id))
    conn.wait_for_job(job)
    table = conn.get_table_frame(table_name)
    return table
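
To make the string handling in `run_query` easier to see, the sketch below isolates the step that appends the `INTO` clause. The function name `append_into_clause` is an assumption introduced for illustration. It needs no xGT connection, so the result can be inspected before anything is scheduled:

```python
def append_into_clause(query, table_name="answers"):
    """Mirror run_query's string handling: ensure a trailing newline, then add INTO."""
    # endswith() also copes with an empty string, where query[-1] would raise IndexError.
    if not query.endswith("\n"):
        query += "\n"
    return query + "INTO {}".format(table_name)


final_query = append_into_clause("MATCH (a)-[e:Netflow]->(b)\nRETURN count(*)")
print(final_query)
```

The printed text is the query with one extra final line naming the results table. It is the same text `run_query` prints when `show_query=True`.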

In [26]:
# Generate a new edge frame for holding only the pass-the-hash candidate edges (ports 135 and 445)
import time
query_start_time = time.time()

conn.drop_frame('PTHFlow')
pth_flow = conn.create_edge_frame(
            name='PTHFlow',
            schema=netflow.schema,
            source=devices,
            target=devices,
            source_key='src_device',
            target_key='dst_device')
pth_flow

<xgt.graph.EdgeFrame at 0x7f1161806b50>

In [27]:
%%time

# Copy Netflow edges whose destination port is 135 or 445 into PTHFlow

q = """
MATCH (v0)-[edge:Netflow]->(v1)
WHERE edge.dst_port=135 OR edge.dst_port=445
CREATE (v0)-[e:PTHFlow {epoch_time : edge.epoch_time,
  duration : edge.duration, protocol : edge.protocol,
  src_port : edge.src_port, dst_port : edge.dst_port}]->(v1)
RETURN count(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 998
Number of answers: 5,210,584
CPU times: user 25 ms, sys: 19.9 ms, total: 45 ms
Wall time: 7.9 s


In [28]:
data=None
if pth_flow.num_edges == 0:
    print("PTH Flow is empty")
elif pth_flow.num_edges <= 1000:
    data = pth_flow.get_data_pandas()
else:
    data = 'PTH Flow (edges): {:,}'.format(pth_flow.num_edges)
data

'PTH Flow (edges): 5,210,584'

In [29]:
# Utility to print the data sizes currently in xGT
def print_netflow_data_summary():
  print_data_summary()
  print('PTH Flow (edges): {:,}'.format(pth_flow.num_edges))

print_netflow_data_summary()

Devices (vertices): 137,479
Netflow (edges): 235,661,328
Host events (edges): 18,637,483
Total (edges): 254,298,811
PTH Flow (edges): 5,210,584


In [30]:
%%time

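# Lateral movement query: find PTHFlow edges that leave device A shortly after
# a process-creation event (4688) on A whose parent process is lsass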
time_threshold_hijack = 180          # three minutes
q = """
MATCH (A:Devices)-[r1:PTHFlow]->(B:Devices), (A)-[hijack1:HostEvents]->(A)
WHERE A <> B
  AND hijack1.event_id = 4688
  AND hijack1.parent_process_name = "lsass"
  AND hijack1.epoch_time <= r1.epoch_time
  AND r1.epoch_time - hijack1.epoch_time < {0}
RETURN r1.src_device, r1.dst_device, r1.epoch_time, hijack1.epoch_time
""".format(time_threshold_hijack)
answer_table = run_query(q)
print('Number of answers: {:,}'.format(answer_table.num_rows))

Launched job 1028
Number of answers: 119
CPU times: user 41.3 ms, sys: 48.1 ms, total: 89.4 ms
Wall time: 1min 6s
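
The matching logic of the query above can be illustrated without xGT. The sketch below is a plain-Python approximation run on made-up toy records. The function name and the data are assumptions for illustration, and it is not how xGT evaluates the pattern. It applies the same conditions: event ID 4688, parent process `lsass`, distinct source and destination, and a flow that starts no earlier than the host event and less than 180 seconds after it.

```python
from collections import defaultdict

TIME_THRESHOLD_HIJACK = 180  # seconds, same value as in the query


def find_lateral_movement(pth_flows, host_events, threshold=TIME_THRESHOLD_HIJACK):
    """Return (src, dst, flow_time, event_time) tuples that satisfy the query's conditions."""
    # Index the qualifying host events by the device that logged them.
    events_by_host = defaultdict(list)
    for ev in host_events:
        if ev["event_id"] == 4688 and ev["parent_process_name"] == "lsass":
            events_by_host[ev["log_host"]].append(ev["epoch_time"])

    matches = []
    for flow in pth_flows:
        if flow["src_device"] == flow["dst_device"]:
            continue  # corresponds to A <> B
        for ev_time in events_by_host.get(flow["src_device"], []):
            delta = flow["epoch_time"] - ev_time
            # The event must not come after the flow, and the gap must be under the threshold.
            if 0 <= delta < threshold:
                matches.append(
                    (flow["src_device"], flow["dst_device"], flow["epoch_time"], ev_time)
                )
    return matches


# Toy records (made up for this example).
host_events = [
    {"epoch_time": 1000, "event_id": 4688, "log_host": "A", "parent_process_name": "lsass"},
    {"epoch_time": 1000, "event_id": 4624, "log_host": "C", "parent_process_name": "lsass"},
]
pth_flows = [
    {"epoch_time": 1100, "src_device": "A", "dst_device": "B"},  # 100 s after the event
    {"epoch_time": 1300, "src_device": "A", "dst_device": "B"},  # 300 s after: too late
    {"epoch_time": 900, "src_device": "A", "dst_device": "B"},   # before the event
    {"epoch_time": 1050, "src_device": "C", "dst_device": "D"},  # host event has the wrong ID
]

matches = find_lateral_movement(pth_flows, host_events)
print(matches)
```

Only the first toy flow satisfies every condition. Each of the other three fails for the reason given in its comment.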
