This repository has been archived by the owner on May 15, 2019. It is now read-only.

Threat Investigation

elopezsa edited this page Sep 21, 2016 · 5 revisions

Purpose and Audience

This section contains a walk-through of the Threat Investigation analyst view. The intended audience is Security Analysts responsible for reviewing the results for potential threats. The Threat Investigation notebook provides a way to perform a more detailed analysis of the connections previously scored as high risk. Users select a day to investigate, start in the Suspicious Connects view, and then move on to the detailed analysis performed with the Threat Investigation Jupyter notebook.

Walk-through

Access the analyst view for suspicious connects: http://"server-ip":8889/files/ui/flow/suspicious.html

Select the date that you want to review. Your screen should now look like this:

The analyst must score the suspicious connections before moving to the Threat Investigation view. For details, refer to the Suspicious Connects Analyst View walk-through.

Select Flows > Threat Investigation from the Open Network Insight menu.

The Threat Investigation web page opens and loads the embedded Jupyter notebook.

Expanded search

You can select any IP from the list and click "Search" to view specific details about it. This runs a query against the flow table, searching the raw data originally collected for all communication between this IP and any other IP address during the day, and collects additional information such as:

  • max & avg number of bytes sent/received
  • max & avg number of packets sent/received
  • destination port
  • source port
  • first & last connection time
  • count of connections
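
The aggregation listed above can be sketched in plain Python. This is only an illustration of the kind of summary the query produces, not the notebook's actual query: the field names (`sip`, `dip`, `sport`, `dport`, `bytes`, `pkts`, `tstart`) and the function name `summarize_flows` are assumptions made for this example.

```python
from collections import defaultdict


def summarize_flows(rows, anchor_ip):
    """Summarize raw flow rows that involve anchor_ip.

    Field names are hypothetical; one summary is produced per
    (source IP, destination IP, source port, destination port).
    """
    groups = defaultdict(list)
    for row in rows:
        # Keep only flows where the anchor IP is source or destination.
        if anchor_ip in (row["sip"], row["dip"]):
            key = (row["sip"], row["dip"], row["sport"], row["dport"])
            groups[key].append(row)

    summary = []
    for (sip, dip, sport, dport), flows in sorted(groups.items()):
        sizes = [f["bytes"] for f in flows]
        packets = [f["pkts"] for f in flows]
        # Timestamps are assumed to share one sortable text format.
        times = [f["tstart"] for f in flows]
        summary.append({
            "sip": sip, "dip": dip, "sport": sport, "dport": dport,
            "max_bytes": max(sizes),
            "avg_bytes": sum(sizes) / len(sizes),
            "max_pkts": max(packets),
            "avg_pkts": sum(packets) / len(packets),
            "first_seen": min(times),
            "last_seen": max(times),
            "conns": len(flows),
        })
    return summary
```

Flows that do not involve the anchor IP are ignored, so the summary only describes the IP selected for the expanded search.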

The full output of this query is stored in the ir-<ip>.csv file. If an expanded search was previously executed on this IP, the system reads the results from the existing file instead of querying the table again, which reduces execution time. The query can take a long time to run, and the time varies depending on whether Hive or Impala is being used.
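
The reuse of a previous result file could look roughly like the following sketch. It assumes Python 3, and the names `load_or_query`, `data_dir`, and `run_query` are hypothetical. The real notebook runs the query through Hive or Impala, which is represented here by a caller-supplied function.

```python
import csv
import os


def load_or_query(ip, data_dir, run_query):
    """Return expanded-search rows for ip, reusing ir-<ip>.csv if present.

    run_query is a placeholder for whatever executes the flow-table query;
    it must return an iterable of rows (lists of values).
    """
    path = os.path.join(data_dir, "ir-{0}.csv".format(ip))

    # A previous expanded search left a file behind: read it, skip the query.
    if os.path.isfile(path):
        with open(path, newline="") as handle:
            return list(csv.reader(handle))

    # No cached file: run the (slow) query once and store its full output.
    rows = list(run_query(ip))
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Note that rows read back from the file are always strings, so callers that need numbers have to convert them.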

Based on the results in this file, the following functions will be executed:

get_in_out_and_twoway_conns()
add_geospatial_info()
add_network_context()

The system will create three dictionaries, one for each of the following:

  • Inbound connections (when the suspicious IP acts only as destination)
  • Outbound connections (when the suspicious IP acts only as source)
  • Two-way connections (when the suspicious IP acts as both source and destination)
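
One way to read the three groups above is per peer IP: a peer is inbound if it only ever connects to the suspicious IP, outbound if the suspicious IP only ever connects to it, and two-way if traffic was initiated in both directions. The sketch below follows that interpretation, which is an assumption. `split_connections` is a hypothetical name, not the notebook's function.

```python
def split_connections(pairs, anchor_ip):
    """Split (source_ip, destination_ip) pairs into three dictionaries.

    Each dictionary is keyed by peer IP and starts with an empty dict
    value that later steps can fill with extra details.
    """
    # Peers the anchor connected to (anchor acted as source).
    as_source = {dst for src, dst in pairs if src == anchor_ip}
    # Peers that connected to the anchor (anchor acted as destination).
    as_destination = {src for src, dst in pairs if dst == anchor_ip}

    twoway = as_source & as_destination
    inbound = as_destination - twoway
    outbound = as_source - twoway

    return (
        {ip: {} for ip in inbound},
        {ip: {} for ip in outbound},
        {ip: {} for ip in twoway},
    )
```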

If an iploc.csv file is available, each dictionary will be updated with the geolocation data for each IP.
If a network_context_1.txt file is available, a description for each identified node will also be added to each dictionary.
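
The optional enrichment can be sketched as follows. The layouts of iploc.csv and network_context_1.txt are not described here, so this example assumes both files have already been parsed into plain dictionaries keyed by IP. The function name `enrich` and the keys `geo` and `context` are hypothetical.

```python
def enrich(conns, geo_by_ip=None, context_by_ip=None):
    """Add geolocation and network context to a connections dictionary.

    Each source is optional: when it is None (file not available),
    the corresponding field is simply not added.
    """
    for ip, info in conns.items():
        if geo_by_ip is not None:
            # Unknown IPs get None rather than raising an error.
            info["geo"] = geo_by_ip.get(ip)
        if context_by_ip is not None:
            info["context"] = context_by_ip.get(ip, "")
    return conns
```

Calling `enrich(inbound)` with no lookups leaves the dictionary unchanged, which mirrors the behavior when neither file is present.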

The connections dictionary is then split into two smaller dictionaries, containing respectively:

  • Top 'n' IPs by number of connections.
  • Top 'n' IPs by bytes transferred.

The number of results stored in the dictionaries (n) can be set by updating the value of the top_results variable.
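
A minimal sketch of the top-n selection, assuming each entry in the connections dictionary carries `conns` and `bytes` totals (hypothetical key names). The value given to `top_results` below is only an example, not the notebook's default.

```python
top_results = 2  # example value only


def top_n(conns, field, n):
    """Return the n entries of conns with the largest value of field."""
    ranked = sorted(conns.items(), key=lambda item: item[1][field], reverse=True)
    # dict() keeps the ranked order on Python 3.7 and later.
    return dict(ranked[:n])


connections = {
    "10.0.0.7": {"conns": 3, "bytes": 9000},
    "10.0.0.8": {"conns": 12, "bytes": 400},
    "10.0.0.9": {"conns": 7, "bytes": 2500},
}
top_by_conns = top_n(connections, "conns", top_results)
top_by_bytes = top_n(connections, "bytes", top_results)
```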

Save Comments

In addition, a web form is displayed under the title 'Threat summary', where the analyst can enter a title and description for the kind of attack or behavior exhibited by the IP address under investigation.

Click on the Save button after entering the data to write it into a CSV file, which eventually will be used in the Storyboard Analyst View.

After the CSV file with the analysis description is created, the following functions generate all graphs and diagrams related to the IP under investigation, to populate the Storyboard Analyst View.

generate_attack_map_file(anchor_ip, top_inbound_b, outbound, twoway)
generate_stats(anchor_ip, top_inbound_b, outbound, twoway, threat_name)
generate_dendro(anchor_ip, top_inbound_b, outbound, twoway, date)
details_inbound(anchor_ip,top_inbound_b)

generate_attack_map_file() - Creates a globe map indicating the trajectory of the connections based on their geolocation. This function depends on having geolocation data for each IP. If you haven't set up a geolocation database file, the map file won't be generated.
Output: globe_<ip>.json

generate_stats() - Creates the horizontal bar graph for the Impact Analysis, which represents the number of inbound, outbound, and two-way connections found.
Output: stats-<ip>.json
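
As an illustration of the counts behind that bar graph, the sketch below builds one entry per direction and writes it as JSON. The actual layout of stats-<ip>.json is not documented here. The structure, the key names, and the helper names `build_stats` and `write_stats` are assumptions.

```python
import json


def build_stats(inbound, outbound, twoway):
    """Return one bar per connection type with the number of peers in it."""
    return [
        {"label": "Inbound", "count": len(inbound)},
        {"label": "Outbound", "count": len(outbound)},
        {"label": "Two-way", "count": len(twoway)},
    ]


def write_stats(path, inbound, outbound, twoway):
    """Write the counts to path as JSON (hypothetical layout)."""
    with open(path, "w") as handle:
        json.dump(build_stats(inbound, outbound, twoway), handle, indent=2)
```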

generate_dendro() - Creates a file linking all the different IPs that have connected to the IP under investigation. This is displayed in the Storyboard under the Incident Progression panel as a dendrogram.
If no network context file is included, the dendrogram is only one level deep. If a network context file is included, additional levels are added to the dendrogram to break down the threat activity.
Output: dendro-<ip>.json
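
The difference between the one-level and multi-level dendrogram can be sketched with a nested `name`/`children` structure, a common shape for tree visualizations. Whether dendro-<ip>.json uses exactly this shape is an assumption, and `build_dendro` is a hypothetical helper.

```python
def build_dendro(anchor_ip, peers, context_by_ip=None):
    """Build a tree rooted at anchor_ip.

    Without context, peers hang directly off the root (one level).
    With context, peers are grouped under their context label first.
    """
    if not context_by_ip:
        children = [{"name": ip} for ip in sorted(peers)]
    else:
        groups = {}
        for ip in sorted(peers):
            label = context_by_ip.get(ip, "unknown")
            groups.setdefault(label, []).append({"name": ip})
        children = [
            {"name": label, "children": nodes}
            for label, nodes in sorted(groups.items())
        ]
    return {"name": anchor_ip, "children": children}
```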

details_inbound() - Queries the flow table for additional details on the IP under investigation and its connections, grouping them by time. The result is a graph showing the number of connections occurring in a customizable timeframe.
Output: sbdet-<ip>.tsv
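
Grouping connections by time can be illustrated as below. The real function does this through a query on the flow table. This sketch does the equivalent bucketing in Python, assuming timestamps in `YYYY-MM-DD HH:MM:SS` format and a window length in minutes that divides 60 evenly. `count_by_window` and `window_minutes` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_by_window(timestamps, window_minutes=5):
    """Count connections per time window.

    Each timestamp is rounded down to the start of its window;
    window_minutes is assumed to divide 60 evenly.
    """
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        # Drop the seconds and the minutes past the window start.
        start = moment - timedelta(
            minutes=moment.minute % window_minutes, seconds=moment.second
        )
        counts[start.strftime("%Y-%m-%d %H:%M")] += 1
    return sorted(counts.items())
```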

add_threat() - This function updates/creates the threats.csv file, appending a new line for every threat analyzed. This file will serve as an index for the Storyboard and is displayed in the 'Executive Threat Briefing' panel.
Output: threats.csv
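
The update-or-create behavior described for threats.csv amounts to an append that writes a header only when the file is new. The sketch below shows that pattern under assumed column names (`ip`, `title`, `description`). The real file's columns are not listed here, and `append_threat` is a hypothetical stand-in, not the notebook's add_threat().

```python
import csv
import os


def append_threat(path, ip, title, description):
    """Append one analyzed threat to the index file, creating it if needed."""
    is_new = not os.path.isfile(path)
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Hypothetical header; written once, when the file is created.
            writer.writerow(["ip", "title", "description"])
        writer.writerow([ip, title, description])
```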

Each function will print a message to let you know if its output file was successfully updated.

Continue to the Storyboard

Once you have saved comments on any suspicious IP, you can continue to the Storyboard to check the results.

Input files

flow_scores.csv  
iploc.csv
network_context_1.txt  

Output files

/oni-oa/data/flow/<date>/threats.csv  
/oni-oa/data/flow/<date>/threat_<ip>.csv  
/oni-oa/data/flow/<date>/sbdet-<ip>.tsv  
/oni-oa/data/flow/<date>/globe_<ip>.json  
/oni-oa/data/flow/<date>/stats-<ip>.json  
/oni-oa/data/flow/<date>/dendro-<ip>.json  
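
For scripts that need to locate these files, the naming pattern above can be captured in a small helper. This is a convenience sketch only: `output_paths` is a hypothetical function, and the date string used in the usage note is an assumed format, since the format of <date> is not specified here. Note that the file names mix underscores (`threat_`, `globe_`) and hyphens (`sbdet-`, `stats-`, `dendro-`).

```python
import posixpath


def output_paths(date, ip, base="/oni-oa/data/flow"):
    """Return the expected output file paths for one day and one IP."""
    day_dir = posixpath.join(base, date)
    names = {
        "threats": "threats.csv",
        "threat": "threat_{0}.csv".format(ip),
        "details": "sbdet-{0}.tsv".format(ip),
        "globe": "globe_{0}.json".format(ip),
        "stats": "stats-{0}.json".format(ip),
        "dendro": "dendro-{0}.json".format(ip),
    }
    return {key: posixpath.join(day_dir, name) for key, name in names.items()}
```

For example, `output_paths("20160921", "10.0.0.5")["stats"]` gives `/oni-oa/data/flow/20160921/stats-10.0.0.5.json`.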

HDFS tables consumed

flow