# Jupyter Notebook for Cybersecurity - Part 1/5

Jupyter Notebook is an open-source interactive tool for large-scale data exploration, transformation, analysis, and visualization. It grew out of the IPython project and is similar to Google Cloud's Datalab.


The data set used in this Jupyter Notebook is the **CobaltStrike_Hunting Google Doc**. The sheet named **Cobalt Strike -Te-k research 2020** was downloaded and saved locally as a CSV file.

The Google Sheet is available at: https://docs.google.com/spreadsheets/d/1bYvBh6NkNYGstfQWnT5n7cSxdhjSn1mduX8cziWSGrw/edit#gid=516128248

# Getting Started

## Install Libraries

First step: at your computer's command prompt, enter the following two commands to install the pandas and matplotlib libraries: `pip3 install pandas` and `pip3 install matplotlib`.

Note: `pip3` is the pip installer for Python 3.
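
Before moving on, it can help to confirm that the libraries are actually visible to the Python interpreter that runs the notebook. The following is a minimal sketch using only the standard library; the `required` list and the `status` dictionary are names chosen here for illustration.

```python
import importlib.util

# Libraries this notebook imports (numpy is installed as a pandas dependency).
required = ["pandas", "numpy", "matplotlib"]

# find_spec returns None when a top-level package cannot be found.
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

If a library is reported as missing, the notebook is probably running under a different Python environment than the one `pip3` installed into.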

## Import Libraries

Second step: import the libraries installed above into your Jupyter Notebook.

In [3]:
import pandas as pd
# data analysis and manipulation tool

import numpy as np
# mathematical functions

import matplotlib.pyplot as plt
# creating static, animated, and interactive visualizations

# choose one plotting backend per session rather than setting both
%matplotlib inline
# renders static images

# %matplotlib notebook
# renders interactive images (alternative to inline)

## Load Data

The third and last step in Getting Started is to load your raw data. This is easiest when the raw data is saved in the directory from which you run Jupyter Notebook.

In [5]:
df = pd.read_csv('cb_servers_small.csv')
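
To experiment with `read_csv` without the downloaded file, the sketch below parses a small in-memory CSV instead. The variable names `sample_csv` and `sample_df` are illustrative, and the two rows are copied from the output shown later in this notebook with only a subset of the columns.

```python
import io

import pandas as pd

# Stand-in for the CSV file: two rows taken from this notebook's output.
sample_csv = io.StringIO(
    "Host,SSL,Port,POST uri,Watermark\n"
    "185.82.126.47,True,443,/submit.php,305419896\n"
    "94.156.174.121,True,443,/ptracking,76803050\n"
)

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # column types inferred by pandas
```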


### Confirm a Successful Data Load

In [6]:
#print just the top (x) rows
print(df.head(3))

              Host   SSL  Port  \
0    54.66.253.144  True   443   
1  103.243.183.250  True   443   
2    185.82.126.47  True   443   

                                             GET uri  \
0  54.66.253.144,/s/ref=nb_sb_noss_1/167-3294888-...   
1                         103.243.183.250,/search.js   
2                               185.82.126.47,/pixel   

                    POST uri  \
0  /N4215/adj/amzn.us.sr.aps   
1                        /hr   
2                /submit.php   

                                          User Agent  Watermark  
0  Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7....  562884990  
1  Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B...  305419896  
2  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...  305419896  


In [7]:
#print just the bottom (x) rows
print(df.tail(3))

              Host   SSL  Port                     GET uri     POST uri  \
1  103.243.183.250  True   443  103.243.183.250,/search.js          /hr   
2    185.82.126.47  True   443        185.82.126.47,/pixel  /submit.php   
3   94.156.174.121  True   443       94.156.174.121,/watch   /ptracking   

                                          User Agent  Watermark  
1  Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B...  305419896  
2  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...  305419896  
3  Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...   76803050  


# Data Exploration

Exploratory data analysis means understanding the data at a high level and summarizing its main characteristics.

In [8]:
# quick snapshot of the dataset
df.head()

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 0 | 54.66.253.144 | True | 443 | 54.66.253.144,/s/ref=nb_sb_noss_1/167-3294888-... | /N4215/adj/amzn.us.sr.aps | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.... | 562884990 |
| 1 | 103.243.183.250 | True | 443 | 103.243.183.250,/search.js | /hr | Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B... | 305419896 |
| 2 | 185.82.126.47 | True | 443 | 185.82.126.47,/pixel | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
| 3 | 94.156.174.121 | True | 443 | 94.156.174.121,/watch | /ptracking | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | 76803050 |


In [9]:
#basic information of the data such as total columns, total entries, data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 7 columns):
Host          4 non-null object
SSL           4 non-null bool
Port          4 non-null int64
GET uri       4 non-null object
POST uri      4 non-null object
User Agent    4 non-null object
Watermark     4 non-null int64
dtypes: bool(1), int64(2), object(4)
memory usage: 196.0+ bytes


In [10]:
# print summary statistics for the numeric columns
# not meaningful for this dataset: Port is constant and Watermark is an identifier, not a measurement
df.describe()

| | Port | Watermark |
|---|---|---|
| count | 4.0 | 4.0 |
| mean | 443.0 | 312632000.0 |
| std | 0.0 | 198616800.0 |
| min | 443.0 | 76803050.0 |
| 25% | 443.0 | 248265700.0 |
| 50% | 443.0 | 305419900.0 |
| 75% | 443.0 | 369786200.0 |
| max | 443.0 | 562885000.0 |
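
Because the mean or standard deviation of a watermark says little, counting how often each value occurs is usually more informative for a column like this. The sketch below assumes a small DataFrame named `sample_df` built from the four rows shown above; the name is illustrative and not part of the original notebook.

```python
import pandas as pd

# Four rows copied from the output above (subset of columns).
sample_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"],
    "Watermark": [562884990, 305419896, 305419896, 76803050],
})

# value_counts tallies each distinct value, most frequent first.
watermark_counts = sample_df["Watermark"].value_counts()
print(watermark_counts)

# nunique reports how many distinct values the column holds.
unique_watermarks = sample_df["Watermark"].nunique()
print(unique_watermarks)
```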


In [16]:
# read just the column headers

df.columns

Index(['Host', 'SSL', 'Port', 'GET uri', 'POST uri', 'User Agent',
       'Watermark'],
      dtype='object')

## Closer Look at Specific Data Values

In [11]:
# read a specific column

df['Host']

0        54.66.253.144
1      103.243.183.250
2        185.82.126.47
3       94.156.174.121
4       194.36.191.118
            ...       
517    167.179.105.132
518       23.106.160.2
519      103.143.28.25
520      98.142.141.43
521        146.0.72.91
Name: Host, Length: 522, dtype: object

In [12]:
# read specific columns
df[['Host', 'User Agent']]

| | Host | User Agent |
|---|---|---|
| 0 | 54.66.253.144 | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.... |
| 1 | 103.243.183.250 | Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B... |
| 2 | 185.82.126.47 | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... |
| 3 | 94.156.174.121 | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... |


In [13]:
# read specific row number
(df.iloc[2])

Host                                              185.82.126.47
SSL                                                        True
Port                                                        443
GET uri                                    185.82.126.47,/pixel
POST uri                                            /submit.php
User Agent    Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
Watermark                                             305419896
Name: 2, dtype: object

In [14]:
# read a range of rows by position (the end position, 5, is excluded)
df.iloc[2:5]

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 2 | 185.82.126.47 | True | 443 | 185.82.126.47,/pixel | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
| 3 | 94.156.174.121 | True | 443 | 94.156.174.121,/watch | /ptracking | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | 76803050 |


In [17]:
# read the value at a specific row, column location
df.iloc[1, 5]

# Note: the count for both rows and columns starts at 0

'Mozilla/5.0 (Linux; Android 6.0; HTC One X10 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0'
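
The position-based lookups above have a label-based counterpart, and slices that run past the end of the data are simply cut short. The following sketch illustrates both points on a four-row stand-in named `sample_df` (an illustrative name, with values copied from the earlier output).

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"],
    "POST uri": ["/N4215/adj/amzn.us.sr.aps", "/hr", "/submit.php", "/ptracking"],
})

# iloc slices by position and excludes the end; positions past the last row are ignored.
rows = sample_df.iloc[2:5]
print(len(rows))

# iloc addresses a cell by integer position, loc by index label and column name.
by_position = sample_df.iloc[1, 1]
by_label = sample_df.loc[1, "POST uri"]
print(by_position, by_label)
```

With only four rows, `iloc[2:5]` returns the rows at positions 2 and 3, which is why a slice can return fewer rows than its bounds suggest.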

## Data Iteration

In [18]:
#iterate rows by reading all of the data for the first row, then the second row, etc.

for index, row in df.iterrows():
    print(index,row)

0 Host                                              54.66.253.144
SSL                                                        True
Port                                                        443
GET uri       54.66.253.144,/s/ref=nb_sb_noss_1/167-3294888-...
POST uri                              /N4215/adj/amzn.us.sr.aps
User Agent    Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7....
Watermark                                             562884990
Name: 0, dtype: object
1 Host                                            103.243.183.250
SSL                                                        True
Port                                                        443
GET uri                              103.243.183.250,/search.js
POST uri                                                    /hr
User Agent    Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B...
Watermark                                             305419896
Name: 1, dtype: object
2 Host                                              18

In [26]:
# iterate over the rows, printing only the "User Agent" column

for index, row in df.iterrows():
    print(index,row['User Agent'])

0 Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
1 Mozilla/5.0 (Linux; Android 6.0; HTC One X10 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0
2 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MASB)
3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)
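
Row-by-row loops are fine for a quick look, but pandas can often produce the same information without an explicit loop. The sketch below shows two such alternatives under the assumption of a small stand-in DataFrame named `sample_df`, using the four User Agent strings printed above.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"],
    "User Agent": [
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
        "Mozilla/5.0 (Linux; Android 6.0; HTC One X10 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MASB)",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)",
    ],
})

# Pull the whole column into a plain Python list in one step.
agents = sample_df["User Agent"].tolist()

# Apply a string test to every row at once; the result is a boolean Series.
is_msie = sample_df["User Agent"].str.contains("MSIE", regex=False)
print(is_msie)

# itertuples is a lighter-weight way to loop when a loop is really needed.
for row in sample_df.itertuples(index=True):
    print(row.Index, row.Host)
```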


In [25]:
#filter on the "POST uri" column with the value of "/submit.php"

df.loc[df['POST uri'] == "/submit.php"]


| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 2 | 185.82.126.47 | True | 443 | 185.82.126.47,/pixel | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
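
The filter above works in two stages: the comparison builds a boolean mask with one True or False per row, and `loc` keeps the rows where the mask is True. The sketch below separates the stages on a stand-in DataFrame named `sample_df`; both `sample_df` and `mask` are illustrative names.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"],
    "POST uri": ["/N4215/adj/amzn.us.sr.aps", "/hr", "/submit.php", "/ptracking"],
})

# Stage 1: the comparison yields a boolean Series aligned with the rows.
mask = sample_df["POST uri"] == "/submit.php"
print(mask)

# Summing the mask counts the matching rows (True counts as 1).
match_count = int(mask.sum())

# Stage 2: loc keeps only the rows where the mask is True.
matches = sample_df.loc[mask]
print(matches)
```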


## Sort Data

In [34]:
#sort by the "Host" column
df.sort_values('Host')

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 295 | 100.24.69.72 | True | 443 | `one.vhy.me,/__utm.gif` | `/___utm.gif` | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.... | 1363933851 |
| 291 | 101.32.186.10 | True | 443 | cs.lg22l.com,/Content | /br | Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like M... | 305419896 |
| 319 | 103.114.162.182 | True | 443 | `156.226.191.234,/_/scs/mail-static/_/js/,djiqo...` | /mail/u/0/ | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
| 253 | 103.114.162.7 | True | 443 | 62.236.206.5,/cx | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 829629313 |
| 436 | 103.126.6.149 | True | 443 | 103.126.6.149,/jquery-3.3.1.min.js | /jquery-3.3.2.min.js | Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:1... | 305419896 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 174 | 95.217.197.78 | True | 443 | oomdatacollect.global.ssl.fastly.net,/pixel.gif | /submit.php | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... | 0 |
| 471 | 95.217.197.85 | True | 443 | oomdatacollect.global.ssl.fastly.net,/pixel.gif | /submit.php | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... | 0 |
| 520 | 98.142.141.43 | True | 443 | www.nameshow.site,/jquery-3.3.1.min.js | /jquery-3.3.2.min.js | Mozilla/5.1 (Windows NT 6.4; Trident/7.1; rv:1... | 305419896 |
| 226 | 98.142.143.100 | True | 443 | d3kgm44zuz83i3.cloudfront.net,/access/ | /radio/xmlrpc/v35 | Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKi... | 305419896 |


In [20]:
#sort by "Host" column and in a descending order
df.sort_values('Host', ascending=False)

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 3 | 94.156.174.121 | True | 443 | 94.156.174.121,/watch | /ptracking | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | 76803050 |
| 0 | 54.66.253.144 | True | 443 | 54.66.253.144,/s/ref=nb_sb_noss_1/167-3294888-... | /N4215/adj/amzn.us.sr.aps | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.... | 562884990 |
| 2 | 185.82.126.47 | True | 443 | 185.82.126.47,/pixel | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
| 1 | 103.243.183.250 | True | 443 | 103.243.183.250,/search.js | /hr | Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B... | 305419896 |
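
Note that the `Host` column holds strings, so the sorts above are alphabetical: "94..." ranks above "185..." because the character "9" sorts after "1". If a numeric ordering of the IP addresses is wanted, one possible approach is to pass a `key` function to `sort_values`. This is a sketch, assuming pandas 1.1 or newer (where the `key` parameter exists) and IPv4 addresses only; the helper name `ip_sort_key` is hypothetical.

```python
import ipaddress

import pandas as pd

sample_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"],
})


def ip_sort_key(hosts: pd.Series) -> pd.Series:
    """Convert each IP address string to its integer value for sorting."""
    return hosts.map(lambda host: int(ipaddress.ip_address(host)))


# Default behaviour: plain string comparison.
text_order = sample_df.sort_values("Host")

# With the key: rows are ordered by the numeric value of the address.
numeric_order = sample_df.sort_values("Host", key=ip_sort_key)

print(text_order["Host"].tolist())
print(numeric_order["Host"].tolist())
```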


## Filter Data

In [24]:
#filter on "User Agent" value of "Shockwave Flash"
df.loc[(df['User Agent'] == 'Shockwave Flash')]

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|


In [22]:
#filter on "User Agent" value of "Shockwave Flash" OR "User Agent" value "Microsoft BITS/7.8"

df.loc[(df['User Agent'] == 'Shockwave Flash') | (df['User Agent'] == 'Microsoft BITS/7.8')]

# Note: to require both conditions ("AND") instead of either one ("OR"), change "|" to "&"

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
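
When several exact values of the same column are of interest, chaining `|` comparisons grows unwieldy; `isin` expresses the same filter with a list. In the sketch below, `sample_df` is an illustrative stand-in, and the two rows with `192.0.2.x` hosts are hypothetical entries (addresses from the documentation range) added only so the filter has something to match.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Host": ["185.82.126.47", "192.0.2.10", "192.0.2.11"],  # last two are hypothetical
    "User Agent": [
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MASB)",
        "Shockwave Flash",
        "Microsoft BITS/7.8",
    ],
})

wanted_agents = ["Shockwave Flash", "Microsoft BITS/7.8"]

# isin is True for rows whose value appears anywhere in the list.
with_isin = sample_df.loc[sample_df["User Agent"].isin(wanted_agents)]

# The equivalent filter written with "|" for comparison.
with_or = sample_df.loc[
    (sample_df["User Agent"] == "Shockwave Flash")
    | (sample_df["User Agent"] == "Microsoft BITS/7.8")
]

print(with_isin)
```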


In [23]:
#filter on a column that "contains" a certain value

df.loc[df['GET uri'].str.contains('login')]

| | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
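
By default `str.contains` is case-sensitive, treats the pattern as a regular expression, and returns a missing value for missing cells, which can raise an error when the result is used as a filter. The sketch below shows one way to make the search more forgiving. The `sample_df` name is illustrative, and the `192.0.2.10,/Login.php` row and the missing value are hypothetical entries added to exercise each option.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "GET uri": [
        "103.243.183.250,/search.js",
        "185.82.126.47,/pixel",
        None,                        # hypothetical missing value
        "192.0.2.10,/Login.php",     # hypothetical row with different casing
    ],
})

# case=False ignores letter case, regex=False matches the text literally,
# and na=False treats missing cells as "no match".
login_mask = sample_df["GET uri"].str.contains(
    "login", case=False, regex=False, na=False
)

login_rows = sample_df.loc[login_mask]
print(login_rows)
```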
