Exploring Event History - Dashboards and Run Books for Analysis, Threat Hunting and Investigations

Objectives

After this lab you will be able to:

  1. Describe how to access the events stored in the index and HDFS to create dashboards, analytics for threat hunting, and run books for investigations.

Background

After an event is fully enriched and triaged, Metron automatically stores the event in an index (Solr or Elasticsearch) and in HDFS. HDFS stores the entire history of events for the specified retention period. The index typically stores several months of the most recent events. The retention period for each storage mechanism is configurable, and there are several cost-effective options for storing and analyzing years of events.

Apache Metron and the Hadoop ecosystem offer many options for analyzing stored cybersecurity events. For example:

  1. Metron Alerts UI
  2. Apache Zeppelin Notebooks using Apache Spark
  3. Banana and Kibana Dashboards
  4. Third-party business intelligence tools

Metron Alerts UI

The Metron Alerts UI reads events from the index, which provides a flexible search API that quickly filters the haystack down to a subset of events. See the Getting Started lab for instructions on opening the Metron Alerts UI. Once the UI is open, enter a simple or complex query and the filtered events display in the UI. Click on the column headers to sort ascending or descending by an event field. Complete the earlier labs to see the Alerts UI in action.
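
For example, a hypothetical Lucene-style query such as the one below narrows the view to a single source address for one sensor. The field names follow Metron's standard naming, though your deployment's schema may differ:

ip_src_addr:192.168.138.158 AND source.type:mysquid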

Apache Zeppelin and Spark

Apache Zeppelin is an open source web-based notebook for recording and sharing analytics and visualizations of event data stored in both HDFS and the index. Zeppelin offers a number of interpreters, including shell commands, Apache Spark, and SQL queries. Different interpreters can be combined in the same notebook, as can Python, Scala, and R code.

Apache Spark is a distributed analytics engine capable of analyzing the entire haystack and building machine learning models. Spark supports the Scala, Python, Java, and R programming languages.

Spark code included in a notebook captures repeatable commands for common cybersecurity analytics tasks such as dashboard visualizations, investigation run books, and threat hunting queries.
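
For instance, a short Spark paragraph like the sketch below, assuming Metron's standard ip_src_addr field and the HDFS path used later in this lab, scans the full event history for the busiest source addresses:

%spark2
// Hypothetical hunting sketch: top source addresses across the full history
import org.apache.spark.sql.functions.desc

val events = spark.read.json("/apps/metron/indexing/indexed/auth")
events.groupBy("ip_src_addr").count().orderBy(desc("count")).show(10)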

Exploring a Zeppelin Run Book

  1. Open Ambari in the browser:
    http://mobius.local:8080/
  2. Select Zeppelin Notebook on the left hand side of the browser window.
  3. Pull down the Quick Links menu to the right of the Summary and Configs tab and select Zeppelin UI.

  4. The Zeppelin page appears. Click the Login button on the upper right side of the page.

  5. Enter the Zeppelin credentials (admin/admin). Click Login.

  6. The Welcome to Zeppelin page opens. Click on the Auth Investigation notebook link.

  7. The Auth Investigation notebook is an example of a run book that assists a SOC analyst investigating an authentication alert. A notebook consists of a set of paragraphs. Each paragraph starts with a % followed by an interpreter name. For example, the first paragraph uses the shell interpreter, the second paragraph uses the spark2 interpreter, and the third paragraph uses the spark2.sql interpreter. Following the interpreter name are the commands or code in the language required by that interpreter.

  8. For better readability, the notebook author can show or hide the commands and command output independently. The first paragraph lists the auth event files stored in HDFS. The output is hidden. Click on the Show output icon to the right of the "Find auth events in HDFS" paragraph.

  9. Metron groups events together and writes the fully enriched and triaged events to HDFS files under the directory /apps/metron/indexing/indexed/. The hdfs dfs -ls shell command lists the auth events stored in HDFS.
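
The paragraph amounts to a single shell command, reproduced here as a sketch (the auth subdirectory is the one used in the next step):

%sh
# List the fully enriched, triaged auth events Metron wrote to HDFS
hdfs dfs -ls /apps/metron/indexing/indexed/auth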

  10. A quick way to explore the events is to load them into a Spark DataFrame, create a temporary table, and use SQL to access the events in the table. The spark.read.json function reads all the JSON auth event files in the /apps/metron/indexing/indexed/auth HDFS directory and loads them into a DataFrame. The createOrReplaceTempView function creates a temporary Spark SQL table called auth_events. Once the table is created, we can query the auth events using SQL.

Click the Run button to the right of the "Register auth events as table so we can query" paragraph to reload the temporary table.

In the spark2.sql "Which users log into many distinct hosts?" paragraph, Zeppelin sends the query over the auth_events temporary table to Spark and displays the results in a table.
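
A minimal sketch of those two paragraphs, with hypothetical user and host field names standing in for the notebook's actual schema:

%spark2
// Read every JSON auth event file in HDFS into a DataFrame,
// then register it as a temporary SQL table named auth_events
val authEvents = spark.read.json("/apps/metron/indexing/indexed/auth")
authEvents.createOrReplaceTempView("auth_events")

%spark2.sql
-- Which users log into many distinct hosts? (illustrative field names)
select user, count(distinct host) as distinct_hosts
from auth_events
group by user
order by distinct_hosts desc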

  11. Zeppelin supports other query result visualizations including bar charts, pie charts, line charts, area charts, and scatter charts. Scroll down to the two line charts "Distinct host logons by time" and "Number of Logons by Time".

  12. Click on the Show editor button on the upper right of the "Distinct host logons by time" paragraph to show the SQL that produced the points in the line graph below. The value of the user in the where clause is parameterized. Zeppelin replaces ${user} in the query with the value entered in the user text field. Parameterized queries are helpful in run books or for threat hunting when you want to use the same visualization for different entities.

To change the visualization, click on the tool bar below the user text entry field.
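
A hypothetical version of the parameterized paragraph, assuming Metron's epoch-millisecond timestamp field and the same illustrative user and host fields, might look like this:

%spark2.sql
-- Zeppelin substitutes ${user} with the value typed into the user text field
select from_unixtime(cast(timestamp / 1000 as bigint), 'yyyy-MM-dd HH:00') as hour,
       count(distinct host) as distinct_hosts
from auth_events
where user = '${user}'
group by from_unixtime(cast(timestamp / 1000 as bigint), 'yyyy-MM-dd HH:00')
order by hour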

  13. Scroll through the visualizations in the rest of the notebook. Go to the bottom of the notebook. You will see an empty new paragraph.

  14. Enter the commands below and click Run. The number of events stored in HDFS should display in a table below.
%spark2.sql
select count(*) from auth_events

If you see the error below, scroll back up to the top of the notebook and click the run button on the "Register auth events as table so we can query" paragraph. Then run the query again:

Table or view not found: auth_events; line 1 pos 21
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

  15. Notebooks automate many common analytics tasks. Once the data is loaded into a Spark DataFrame, you can perform common queries for threat hunting or other explorations. The data loaded into a DataFrame can also be used for training machine learning models.
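
As a sketch of that machine learning angle, assuming the same hypothetical user and host fields, per-user logon behavior can be clustered with Spark MLlib so that outlying users stand out:

%spark2
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

// Hypothetical features: logon volume and distinct host count per user
val perUser = spark.sql(
  "select user, count(*) as logons, count(distinct host) as hosts " +
  "from auth_events group by user")

// Assemble the numeric columns into the feature vector MLlib expects
val assembled = new VectorAssembler()
  .setInputCols(Array("logons", "hosts"))
  .setOutputCol("features")
  .transform(perUser)

// Cluster users by behavior; small or distant clusters merit a closer look
val model = new KMeans().setK(3).setSeed(1L).fit(assembled)
model.transform(assembled).show(10)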

Creating a Solr Banana Dashboard

Banana is another option for visualizing data stored in Solr indices.

  1. In the browser, enter the Banana URL:

http://mobius.local.localdomain:8983/solr/banana

  2. The Basic Dashboard with Pointers appears.

  3. Click on the New icon in the upper right. Select Time-series dashboard.

  4. The New Dashboard Settings dialog appears. Enter mysquid in the Collection Name. Enter timestamp_solr in the Time Field. Click Create.

  5. The New Time Series Dashboard appears with default visualizations. The dashboard consists of horizontal rows. Each row has one or more panels that display data.

  6. Locate Total Hits in the upper right of the dashboard. The Total Hits count is zero. Click on the gear icon in the Total Hits panel.

  7. The Hits Setting dialog opens.

  8. Click on the Panel tab.

  9. Enter guid in the Field. Click Close.

  10. The Total Hits panel now displays the number of mysquid events in the time period for the dashboard.
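
Total Hits reflects the document count in the mysquid collection. As a sanity check, assuming Solr is reachable at the same host as Banana, a direct query returns the same count in its numFound field (ignoring the dashboard's time window):

curl 'http://mobius.local.localdomain:8983/solr/mysquid/select?q=*:*&rows=0&wt=json'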

  11. The Time Window in the upper left controls the time period of the data displayed in the dashboard. Time periods are relative to the current time by default.

  12. Click on the disk icon to save the dashboard. Enter squid in the text entry field. Click the disk icon to save.

  13. To select the dashboard displayed, click the folder icon. Click on the squid link at the bottom of the dialog.

  14. Check Auto-refresh and the dashboard will read the latest data from the index and update the visualization.

  15. Scroll to the bottom of the window and click + Add a row.

  16. Enter Map in the Title and 300px in the Height. Click Create Row.

  17. Click the up arrow for the Map row to move the Map row above the Table. Map should now be the second to last row. Click Close.

  18. An empty row now appears underneath the Event Counts histograms.

  19. Click Add panel to empty row.

  20. Click the + to the right of Panels.

  21. Select map for the Panel Type. Enter Destinations in Title. Select 6 for the Span. The Span controls the width of the panel; a panel extending across the full width of the row is Span 12. Enter enrichments.geo.ip_dst_addr.country. Click Add Panel.

  22. Select sunburst for the Panel Type. Enter Cities in the Title. Select 6 for the Span. Enter enrichments.geo.ip_dst_addr.country,enrichments.geo.ip_dst_addr.city in the Facet Pivot String. Click Add Panel.

  23. Click Close.

  24. The dashboard now shows the location of the events on a map and in a sunburst. The countries with more events have darker colors. Hover over a country and the number of events destined to that country pops up.

  25. Hover over the inner ring of the sunburst to see the breakdown of the number of events destined for each country. The outer ring of the sunburst shows the breakdown of the events destined for each city.

  26. Go to the top right and click the save button.

  27. Add a new row called Bytes and move it above the Table. Click Close.

  28. Scroll the dashboard up and locate the new empty row. Add a histogram panel. Enter Bytes for the Title. Enter 12 in the Span. For Mode, select values. Enter bytes in the Value Field.

  29. Scroll down to Chart Settings. Check Lines and uncheck Bars. Click Add Panel.

  30. A line graph of the bytes over time is now shown on the dashboard.

  31. Click Save to save the dashboard.

Using a Business Intelligence Tool

Many business intelligence tools include connectors to Solr, Elasticsearch, and Spark SQL. For example, ZoomData is convenient for creating visualizations and dashboards from Metron events without programming.

References

Tutorials

Spark Dataframe and DataSet Tutorial

Getting started with Apache Zeppelin Tutorial

Intro to Machine Learning with Apache Spark and Apache Zeppelin Tutorial

Documentation

Apache Spark Component Guide

Apache Zeppelin Component Guide