# Cybersecurity Graph Analytics with xGT on HPE Superdome Flex
  
## Lateral Movement
---
*Lateral movement* is a cyberattack pattern in which an adversary leverages a single foothold to compromise other systems within a network. Identifying and stopping lateral movement is an important step in limiting the damage from a breach. It also plays a role in forensic analysis of a cyberattack, helping to identify its source and reconstruct what happened.

## Lateral Movement (Third Party Software)
---
Third-party applications and software deployment systems may be in use in the network environment for administration purposes (e.g., SCCM, VNC, HBSS, Altiris). If an adversary gains access to these systems, they may be able to execute code.
### Event ID Meanings
1. 4609/1100 - Windows shutdown
    
### Details associated with Third Party Software attack for wiper malware
| Malware | Technique | OS | Windows Event ID | Port | Process | DLL | Other Parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wiper | T1072 | Windows | 4609/1100 | 22 |  |  |  |

### Third Party Software Attack Flow <font color="red">(Wiper only; cannot be generalized to the attack technique)</font>
1. The malware uses conime.exe (an SCP client) to transfer pr1.tmp (a shell script).
2. It then uses alg.exe (an SSH client) to execute the script.
3. Next, it attempts to kill pasvc.exe and clisvc.exe if they are running. These are antivirus processes.
4. On Windows, this sequence reboots the system before erasing the disks.
5. Detection therefore looks for two consecutive port 22 connections in the netflow data, followed by a 4609/1100 (shutdown/reboot) event on the destination within a given time window.
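
The detection rule in step 5 can be tried on a handful of in-memory records before running it at scale. The following is a minimal pure-Python sketch, not the xGT implementation: the tuple layouts, the function name `find_wiper_candidates`, and the sample records are all hypothetical, and the default windows of 60 and 180 simply mirror the thresholds used in the lateral movement query later in this notebook.

```python
from itertools import combinations

SSH_PORT = 22
REBOOT_EVENT_IDS = {4609, 1100}  # shutdown event IDs from the table above


def find_wiper_candidates(flows, events, hop_window=60, reboot_window=180):
    """Return (src, dst, t1, t2, t_reboot) tuples matching the Wiper pattern.

    flows:  iterable of (epoch_time, src, dst, dst_port)  (hypothetical layout)
    events: iterable of (epoch_time, host, event_id)      (hypothetical layout)
    """
    # Group port-22 flow times by (src, dst) pair, skipping self-connections.
    ssh_by_pair = {}
    for t, src, dst, port in flows:
        if port == SSH_PORT and src != dst:
            ssh_by_pair.setdefault((src, dst), []).append(t)

    # Group shutdown/reboot event times by host.
    reboots_by_host = {}
    for t, host, event_id in events:
        if event_id in REBOOT_EVENT_IDS:
            reboots_by_host.setdefault(host, []).append(t)

    matches = []
    for (src, dst), times in ssh_by_pair.items():
        times.sort()
        # Each pair of distinct connections, earlier one first.
        for t1, t2 in combinations(times, 2):
            if t2 - t1 >= hop_window:
                continue
            # A reboot on the destination at or after the second connection.
            for tb in reboots_by_host.get(dst, []):
                if 0 <= tb - t2 < reboot_window:
                    matches.append((src, dst, t1, t2, tb))
    return matches


# Made-up sample data for illustration only.
flows = [
    (100, "A", "B", 22),   # first connection (e.g. script transfer)
    (130, "A", "B", 22),   # second connection, 30 time units later
    (100, "A", "C", 22),   # only one connection to C: no match
    (500, "D", "B", 443),  # not port 22
]
events = [
    (200, "B", 1100),      # shutdown on B, 70 time units after the second hop
    (200, "C", 1100),
]
print(find_wiper_candidates(flows, events))
```

Only the A to B pair satisfies all three conditions here. A nested loop like this is fine for a toy example but scales poorly, which is the motivation for expressing the same pattern as a graph query.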

In this notebook, we show how cybersecurity graph analytics can be implemented with xGT on HPE Superdome Flex for large datasets.

MITRE ATT&CK catalog: https://attack.mitre.org/

Dataset: https://datasets.trovares.com/cyber/LANL/ind

In [1]:
import xgt
import os
import pandas

from platform import python_version
print(python_version())

3.7.4


In [2]:
# Clear any proxy settings from the environment before connecting
if os.environ.get('https_proxy'):
    del os.environ['https_proxy']
if os.environ.get('http_proxy'):
    del os.environ['http_proxy']

In [3]:
conn=xgt.Connection()
conn.server_version

'1.3.0'

In [4]:
try:
  devices = conn.get_vertex_frame('Devices')
except xgt.XgtNameError:
  devices = conn.create_vertex_frame(
      name='Devices',
      schema=[['device', xgt.TEXT]],
      key='device')
devices

<xgt.graph.VertexFrame at 0x7f7957029ad0>

In [5]:
try:
  netflow = conn.get_edge_frame('Netflow')
except xgt.XgtNameError:
  netflow = conn.create_edge_frame(
      name='Netflow',
      schema=[['epoch_time', xgt.INT],
              ['duration', xgt.INT],
              ['src_device', xgt.TEXT],
              ['dst_device', xgt.TEXT],
              ['protocol', xgt.INT],
              ['src_port', xgt.INT],
              ['dst_port', xgt.INT],
              ['src_packets', xgt.INT],
              ['dst_packets', xgt.INT],
              ['src_bytes', xgt.INT],
              ['dst_bytes', xgt.INT]],
      source=devices,
      target=devices,
      source_key='src_device',
      target_key='dst_device')
netflow

<xgt.graph.EdgeFrame at 0x7f79554868d0>

In [6]:
try:
  host_events = conn.get_edge_frame('HostEvents')
except xgt.XgtNameError:
  host_events = conn.create_edge_frame(
      name='HostEvents',
      schema=[['epoch_time', xgt.INT],
              ['event_id', xgt.INT],
              ['log_host', xgt.TEXT],
              ['user_name', xgt.TEXT],
              ['domain_name', xgt.TEXT],
              ['logon_id', xgt.INT],
              ['process_name', xgt.TEXT],
              ['process_id', xgt.INT],
              ['parent_process_name', xgt.TEXT],
              ['parent_process_id', xgt.INT]],
           source=devices,
           target=devices,
           source_key='log_host',
           target_key='log_host')
host_events

<xgt.graph.EdgeFrame at 0x7f7955482e90>

In [7]:
# Utility to print the sizes of data currently in xGT
def print_data_summary():
  print('Devices (vertices): {:,}'.format(devices.num_vertices))
  print('Netflow (edges): {:,}'.format(netflow.num_edges))
  print('Host events (edges): {:,}'.format(host_events.num_edges))
  print('Total (edges): {:,}'.format(
      netflow.num_edges + host_events.num_edges))
    
print_data_summary()

Devices (vertices): 0
Netflow (edges): 0
Host events (edges): 0
Total (edges): 0


In [8]:
%%time

# Load the HostEvents event data:
if host_events.num_edges == 0:
    urls = ["xgtd://nvme_data1/data_1v/wls_day-85_1v.csv"]
    host_events.load(urls)
    print_data_summary()

Devices (vertices): 10,324
Netflow (edges): 0
Host events (edges): 18,637,483
Total (edges): 18,637,483
CPU times: user 72.6 ms, sys: 41.9 ms, total: 115 ms
Wall time: 13.5 s


In [9]:
%%time

# Load the netflow data:
if netflow.num_edges == 0:
    urls = ["xgtd://nvme_data1/data_nf/nf_day-85.csv"]
    netflow.load(urls)
    print_data_summary()

Devices (vertices): 137,479
Netflow (edges): 235,661,328
Host events (edges): 18,637,483
Total (edges): 254,298,811
CPU times: user 433 ms, sys: 190 ms, total: 623 ms
Wall time: 1min 37s


In [10]:
# Utility function to launch queries and show job number:
#   The job number may be useful if a long-running job needs
#   to be canceled.

def run_query(query, table_name = "answers", drop_answer_table=True, show_query=False):
    if drop_answer_table:
        conn.drop_frame(table_name)
    if query[-1] != '\n':
        query += '\n'
    query += 'INTO {}'.format(table_name)
    if show_query:
        print("Query:\n" + query)
    job = conn.schedule_job(query)
    print("Launched job {}".format(job.id))
    conn.wait_for_job(job)
    table = conn.get_table_frame(table_name)
    return table
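
To see exactly what text `run_query` submits, the string handling can be separated from the connection. The helper below is a small sketch with a hypothetical name, `append_into_clause`. It reproduces only the step that appends the `INTO` clause and does not talk to xGT at all.

```python
def append_into_clause(query, table_name="answers"):
    """Return the query text with an 'INTO <table_name>' line appended."""
    # endswith() also handles an empty string, where query[-1] would raise.
    if not query.endswith('\n'):
        query += '\n'
    return query + 'INTO {}'.format(table_name)


# Example with a trivial pattern; the frame name is taken from this notebook.
print(append_into_clause("MATCH (a)-[e:Netflow]->(b)\nRETURN count(*)"))
```

Keeping this step in a pure function makes it easy to inspect or unit-test the final query text without a running server.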

In [11]:
# Generate a new edge frame for holding only the SSH (port 22) edges
import time
query_start_time = time.time()

conn.drop_frame('SSHFlow')
ssh = conn.create_edge_frame(
            name='SSHFlow',
            schema=netflow.schema,
            source=devices,
            target=devices,
            source_key='src_device',
            target_key='dst_device')
ssh

<xgt.graph.EdgeFrame at 0x7f795548c150>

In [12]:
%%time

# Extract SSH edges (destination port 22) from Netflow into SSHFlow
q = """
MATCH (v0)-[edge:Netflow]->(v1)
WHERE edge.dst_port=22
CREATE (v0)-[e:SSHFlow {epoch_time : edge.epoch_time,
  duration : edge.duration, protocol : edge.protocol,
  src_port : edge.src_port, dst_port : edge.dst_port}]->(v1)
RETURN count(*)
"""
data = run_query(q)
print('Number of answers: {:,}'.format(data.get_data()[0][0]))

Launched job 990
Number of answers: 435,663
CPU times: user 30.8 ms, sys: 12.9 ms, total: 43.6 ms
Wall time: 2.84 s


In [13]:
data=None
if ssh.num_edges == 0:
    print("SSHFlow is empty")
elif ssh.num_edges <= 1000:
    data = ssh.get_data_pandas()
else:
    data = 'SSHFlow (edges): {:,}'.format(ssh.num_edges)
data

'SSHFlow (edges): 435,663'

In [14]:
# Utility to print the data sizes currently in xGT
def print_netflow_data_summary():
  print_data_summary()
  print('SSHFlow (edges): {:,}'.format(ssh.num_edges))

print_netflow_data_summary()

Devices (vertices): 137,479
Netflow (edges): 235,661,328
Host events (edges): 18,637,483
Total (edges): 254,298,811
SSHFlow (edges): 435,663


In [15]:
%%time

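# Lateral movement query: two distinct SSH flows from n1 to n2 less than
# SSH_HOP_TIME apart, followed by a shutdown event (1100) on n2 less than
# REBOOT_SIG_TIME after the second flow.

SSH_HOP_TIME = 60      # max gap between the two SSH flows (epoch_time units)
REBOOT_SIG_TIME = 180  # max gap from second SSH flow to shutdown event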

q = """
MATCH (n1:Devices)-[r1:SSHFlow]->(n2:Devices), (n1)-[r2:SSHFlow]->(n2), (n2)-[rb:HostEvents]->(n2)
WHERE rb.event_id=1100
  AND n1 <> n2
  AND r1 <> r2
  AND r1.epoch_time <= r2.epoch_time
  AND r2.epoch_time - r1.epoch_time < {0}
  AND r2.epoch_time <= rb.epoch_time
  AND rb.epoch_time - r2.epoch_time < {1}
RETURN r1.src_device,r1.dst_device,r1.epoch_time,r2.epoch_time,rb.epoch_time
LIMIT 1000
""".format(SSH_HOP_TIME,REBOOT_SIG_TIME)
answer_table = run_query(q)
print('Number of answers: {:,}'.format(answer_table.num_rows))

Launched job 1018
Number of answers: 1,000
CPU times: user 58.4 ms, sys: 69.6 ms, total: 128 ms
Wall time: 1min 14s
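
Because the query returns one row for every matching combination of `r1`, `r2`, and `rb`, a single device pair can appear in the answer table many times. The sketch below shows one way to collapse such rows into a per-pair summary. It assumes the rows arrive as lists in the same column order as the `RETURN` clause (for example, from `answer_table.get_data()`); the function name `summarize_pairs` and the sample rows are made up for illustration.

```python
from collections import Counter


def summarize_pairs(rows):
    """Collapse answer rows into (pair, match_count, earliest_reboot) tuples.

    Each row is assumed to be [src_device, dst_device, t1, t2, t_reboot].
    """
    counts = Counter()
    first_reboot = {}
    for src, dst, _t1, _t2, t_reboot in rows:
        pair = (src, dst)
        counts[pair] += 1
        # Track the earliest reboot time seen for this pair.
        first_reboot[pair] = min(t_reboot, first_reboot.get(pair, t_reboot))
    # Most frequently matched pairs first.
    return [(pair, n, first_reboot[pair]) for pair, n in counts.most_common()]


# Hypothetical rows, not taken from the dataset.
sample_rows = [
    ["hostA", "hostB", 100, 130, 200],
    ["hostA", "hostB", 100, 140, 200],
    ["hostC", "hostD", 300, 310, 350],
]
for pair, n, t in summarize_pairs(sample_rows):
    print(pair, n, t)
```

Note that the query above stops at `LIMIT 1000`, so a summary of its answer table covers only the first 1,000 matching rows rather than every match in the data.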
