# Let's Focus on Behavior!
Hunt1 used our knowledge of the scanning and enumeration tool (nmap) to pick up on an anomaly. In Hunt2, we assume our attacker is better at hiding their tracks, so we instead focus on the relationships and behaviors of clients/servers or consumers/producers. These are not trivial for the attacker to manipulate or obfuscate (see the Pyramid of Pain). The user-agent string is controlled by the sending host/attacker and can be easily manipulated. However, the response code comes from the server, so the attacker does not get to set it.

In [7]:
import json
from datetime import datetime, timedelta
import matplotlib.pylab as plot
import matplotlib.pyplot as plt
from matplotlib import dates
import pandas as pd
import numpy as np

import matplotlib
matplotlib.style.use('ggplot')
%matplotlib inline

In [8]:
# Read data from http bro logs
with open("http.log",'r') as infile:
    file_data = infile.read()
    
# Split file by newlines
file_data = file_data.split('\n')

# Remove comment lines and empty lines (the trailing newline leaves an empty string)
http_data = []
for line in file_data:
    if line and not line.startswith("#"):
        http_data.append(line)

## The response codes or status codes from the HTTP requests can help us determine if scanning behavior is occurring against our network. The following will track each response code and the time at which it occurred.

In [9]:
# Analyze status codes
status_code_analysis = {}
status_code_overall = {}
earliest_time = None
latest_time = None
for line in http_data:
    
    # Extract the timestamp
    timestamp = datetime.fromtimestamp(float(line.split('\t')[0]))
    
    # Strip second and microsecond so requests are bucketed by minute
    timestamp = str(timestamp.replace(second=0,microsecond=0))
    
    # Extract the status code
    status_code = line.split('\t')[14]
    
    # Update status code analysis variable
    if status_code not in status_code_analysis.keys():
        status_code_analysis[status_code] = {timestamp: 1}
    else:
        if timestamp not in status_code_analysis[status_code].keys():
            status_code_analysis[status_code][timestamp] = 1
        else:
            status_code_analysis[status_code][timestamp] += 1
            
    # Update overall status code count
    if status_code not in status_code_overall.keys():
        status_code_overall[status_code] = 1
    else:
        status_code_overall[status_code] += 1
    
    # Update our earliest and latest time as needed
    if earliest_time is None or timestamp < earliest_time:
        earliest_time = timestamp
    if latest_time is None or timestamp > latest_time:
        latest_time = timestamp
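
The same tally can also be sketched with `collections.Counter`, which removes the nested `if`/`else` bookkeeping. This is a minimal sketch and not part of the original notebook: the function name `tally_status_codes`, the `status_index` parameter, and the sample rows are hypothetical, and it uses UTC rather than local time so the result does not depend on the machine's timezone.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

def tally_status_codes(lines, status_index=14):
    """Count status codes overall and per minute from tab-separated rows."""
    per_minute = defaultdict(Counter)   # status code -> {minute: count}
    overall = Counter()                 # status code -> total count
    for line in lines:
        if not line or line.startswith('#'):
            continue  # skip empty lines and comment/header lines
        fields = line.split('\t')
        if len(fields) <= status_index:
            continue  # skip short or malformed rows
        # Assumption: UTC is used here; the notebook cell above uses local time
        ts = datetime.fromtimestamp(float(fields[0]), tz=timezone.utc)
        minute = ts.replace(second=0, microsecond=0).strftime('%Y-%m-%d %H:%M')
        code = fields[status_index]
        per_minute[code][minute] += 1
        overall[code] += 1
    return per_minute, overall

# Hypothetical rows: 15 tab-separated fields, timestamp first, status code last
sample = [
    '\t'.join(['0.0'] + ['-'] * 13 + ['200']),
    '\t'.join(['30.0'] + ['-'] * 13 + ['404']),
    '\t'.join(['45.0'] + ['-'] * 13 + ['404']),
    '\t'.join(['60.0'] + ['-'] * 13 + ['404']),
]
per_minute, overall = tally_status_codes(sample)
```

With the real log, `tally_status_codes(http_data)` would return structures shaped like `status_code_analysis` and `status_code_overall` above.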

In [None]:
# Format data for the plot function
status_label = []
data = []
for code in sorted(status_code_overall.keys()):
    status_label.append(str(code) + " (" + str(status_code_overall[code]) + ")")
    data.append(status_code_overall[code])

plot.figure(1,figsize=[8,8])
patches, texts = plot.pie(data, shadow=True, startangle=90)
plot.legend(patches, status_label,loc="best")
plot.title('Status Code Distribution')
plot.axis('equal')
plot.tight_layout()
plot.show()

## The 200 responses are the majority and also don't seem to indicate anything malicious. This process of observing baselines and carving away the good to see the bad is discussed in various Threat Hunting white papers. See https://www.netresec.com/?page=Blog&month=2015-08&post=Rinse-Repeat-Intrusion-Detection

In [None]:
# Remove the 200 status code and re-plot the status codes
# Build a new dict so the original status_code_analysis is left unchanged
status_code_analysis2 = {code: counts for code, counts in status_code_analysis.items() if code != '200'}
# Sort the index so the per-minute timestamps are plotted in chronological order
df2 = pd.DataFrame.from_dict(status_code_analysis2,orient='columns').fillna(0).sort_index()
df2.plot(rot=90, figsize=(12,9))

## In the above graph, there is something interesting occurring with the number and types of response codes. Can you see the anomaly?
The status code spike, around 5am, is indicative of enumeration/scanning tools triggering a multitude of error responses.
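
Spotting the spike by eye works for one graph, but the same idea can be checked programmatically. The following is a rough sketch, not the notebook's method: it flags minutes whose count is far above the median, using the median absolute deviation (MAD) as the measure of spread. The function name `flag_spikes`, the multiplier `k=5.0`, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import median

def flag_spikes(counts_by_minute, k=5.0):
    """Return the minutes whose count exceeds median + k * MAD, sorted."""
    values = list(counts_by_minute.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # Floor the MAD at 1 so a perfectly flat baseline still gives a usable threshold
    threshold = med + k * max(mad, 1.0)
    return sorted(m for m, v in counts_by_minute.items() if v > threshold)

# Hypothetical per-minute counts of non-200 responses
sample_counts = {
    '04:58': 2, '04:59': 3, '05:00': 120,
    '05:01': 95, '05:02': 1, '05:03': 2,
}
spikes = flag_spikes(sample_counts)
```

The median and MAD are used instead of the mean and standard deviation because a large scan burst inflates the latter two and can hide itself. The value of `k` would need tuning against a known-good baseline.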