# Jupyter Notebook for Cybersecurity - Part 3/5

**This is part 3/5 of the project. While some of the instructions and code will repeat, it is recommended that you review the previous parts to get a complete understanding of the process.**

Jupyter Notebook is an open-source interactive tool for large-scale data exploration, transformation, analysis, and visualization. It grew out of the IPython project and is similar to Google Cloud's Datalab.

The data set used in this Jupyter Notebook is the **CobaltStrike_Hunting Google Doc**. The sheet named **Cobalt Strike -Te-k research 2020** was downloaded and saved locally as a CSV file.

The link to the Google Sheet is: https://docs.google.com/spreadsheets/d/1bYvBh6NkNYGstfQWnT5n7cSxdhjSn1mduX8cziWSGrw/edit#gid=516128248

# Getting Started

## Install Libraries

First step: in your computer's command prompt, enter the following two commands to install the pandas and matplotlib libraries:

    pip3 install pandas
    pip3 install matplotlib

Note: "pip3" is the pip installer for Python 3.

Note: If you already installed these libraries during the earlier parts, you can skip the above step.

## Import Libraries

Second step: import the libraries installed above into your Jupyter Notebook, along with a few additional ones.

In [2]:
import pandas as pd
# data analysis and manipulation tool

import numpy as np
# mathematical functions

import matplotlib.pyplot as plt
# creating static, animated, and interactive visualizations

%matplotlib inline
# renders static images

%matplotlib notebook
# renders dynamic interactive images; whichever of these two magics runs last takes effect

import re
# regular expression library

import socket
# allows various network operations. Here it is used to get the hostname of an IP address

import ipapi
# https://github.com/ipapi-co/ipapi-python

# import json
# handling json format data

import time
# provides various time-related functions. Here it is used to pause execution for a few seconds

import sys
# access system-specific parameters and functions

In [3]:
df = pd.read_csv('cb_servers_small.csv')

# The default display setting in the pandas library shows just a few rows of the full output. Uncomment the lines below to display all rows:
# pd.set_option('display.max_rows', df.shape[0]+1)
# print(df)

# Default Data Display

By default, the pandas library displays just a few rows of the full output. You can override this to display all of the output by uncommenting the `pd.set_option` line in the cell above.
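As a hedged illustration of the row limit, the sketch below builds a hypothetical 100-row frame (`demo_df`, not part of the original data set) and compares the default output with the output produced when `display.max_rows` is lifted. It uses `pd.option_context` instead of `pd.set_option` so the change is temporary; that is a design choice for the example, not something the notebook requires.

```python
import pandas as pd

# Hypothetical stand-in for the loaded CSV: 100 rows of dummy hosts
demo_df = pd.DataFrame({"Host": [f"10.0.0.{i}" for i in range(100)]})

# With default settings, pandas elides the middle rows of a long frame
truncated = repr(demo_df)

# Temporarily lift the row limit; the setting reverts when the block exits
with pd.option_context("display.max_rows", None):
    full = repr(demo_df)

print(len(truncated.splitlines()), "lines by default")
print(len(full.splitlines()), "lines with the limit lifted")
```

Calling `pd.set_option('display.max_rows', None)` has the same effect but stays in force for the rest of the session.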

### Confirm a Successful Data Load

In [4]:
df.head()

|   | Host | SSL | Port | GET uri | POST uri | User Agent | Watermark |
|---|---|---|---|---|---|---|---|
| 0 | 54.66.253.144 | True | 443 | 54.66.253.144,/s/ref=nb_sb_noss_1/167-3294888-... | /N4215/adj/amzn.us.sr.aps | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.... | 562884990 |
| 1 | 103.243.183.250 | True | 443 | 103.243.183.250,/search.js | /hr | Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B... | 305419896 |
| 2 | 185.82.126.47 | True | 443 | 185.82.126.47,/pixel | /submit.php | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | 305419896 |
| 3 | 94.156.174.121 | True | 443 | 94.156.174.121,/watch | /ptracking | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | 76803050 |
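Beyond eyeballing `df.head()`, a load can also be checked programmatically. The following is a minimal sketch, assuming an in-memory CSV excerpt (`csv_text`, a hypothetical stand-in for `cb_servers_small.csv` that carries only a few of its columns) so that it runs without the real file.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt standing in for cb_servers_small.csv
csv_text = """Host,SSL,Port,POST uri,Watermark
185.82.126.47,True,443,/submit.php,305419896
94.156.174.121,True,443,/ptracking,76803050
"""

demo_df = pd.read_csv(io.StringIO(csv_text))

# Columns this excerpt is expected to contain
expected_columns = {"Host", "SSL", "Port", "POST uri", "Watermark"}
missing_columns = expected_columns - set(demo_df.columns)

print(demo_df.shape)
print(missing_columns or "all expected columns present")
```

With the real file, the same check would use `pd.read_csv('cb_servers_small.csv')` and the full set of column names shown in the output above.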


## Data Iteration

In [8]:
# Lists contents of just the "Host" column
df['Host'].head()

0      54.66.253.144
1    103.243.183.250
2      185.82.126.47
3     94.156.174.121
Name: Host, dtype: object

In [9]:
# Iterate over the "Host" column
for i, row in df.iterrows():
    print(row['Host'])

54.66.253.144
103.243.183.250
185.82.126.47
94.156.174.121


### Save Iteration Results

In [12]:
# Save the results from the above iteration on the "Host" column to a new variable named "ip_list"
ip_list = []
for i, row in df.iterrows():
    host = row['Host']
    ip_list.append(host)
print(ip_list)

['54.66.253.144', '103.243.183.250', '185.82.126.47', '94.156.174.121']
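The row-by-row loop above is easy to follow, but pandas can produce the same list without `iterrows()`. This is a short sketch of that alternative, using a hypothetical frame `demo_df` rebuilt from the four hosts shown above so that it runs on its own. The `unique_ips` step is an optional addition that would avoid looking up the same address twice.

```python
import pandas as pd

# Rebuild the "Host" column from the four addresses shown above
demo_df = pd.DataFrame({
    "Host": ["54.66.253.144", "103.243.183.250", "185.82.126.47", "94.156.174.121"]
})

# Iterate over the column directly instead of over whole rows
for host in demo_df["Host"]:
    print(host)

# Convert the column to a plain Python list in one call
ip_list = demo_df["Host"].tolist()

# Optional: drop repeated addresses while keeping the original order
unique_ips = demo_df["Host"].drop_duplicates().tolist()
```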


## Enrich Data

On their own, the IP addresses are of limited use. We will use a free service (ipapi[.]co) to pull intelligence on the IP addresses, such as their geo-location, ASN, and hosting provider.

In [13]:
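# create an empty list to store results
ip_res_result = []

# iterate over each IP in the "ip_list" that was created earlier and enrich it using the ipapi library
for ip in ip_list:
    try:
        ip_res_result.append(ipapi.location(ip))
    except Exception as e:
        # store a dictionary instead of the raw exception so the later DataFrame conversion still works
        ip_res_result.append({'ip': ip, 'error': str(e)})
    # pause after every request, successful or not, to avoid flooding the free service
    time.sleep(5)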
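The enrichment loop can also be wrapped in a small function so it can be exercised without network access. The sketch below is one possible shape, not the notebook's own code: `enrich_ips`, `fake_lookup`, and `pause_seconds` are hypothetical names, and the two addresses come from ranges reserved for documentation. In the notebook, `ipapi.location` would be passed as the `lookup` argument.

```python
import time

def enrich_ips(ips, lookup, pause_seconds=5.0):
    """Call lookup(ip) for each address and always return a list of dictionaries."""
    results = []
    for ip in ips:
        try:
            record = lookup(ip)
        except Exception as exc:
            # Keep a dictionary-shaped placeholder so later tabular conversion still works
            record = {"ip": ip, "error": str(exc)}
        results.append(record)
        # Pause after every request, whether it succeeded or failed
        time.sleep(pause_seconds)
    return results

def fake_lookup(ip):
    """Hypothetical offline replacement for a real lookup service."""
    if ip == "203.0.113.9":
        raise ValueError("lookup failed")
    return {"ip": ip, "country": "ZZ"}

demo = enrich_ips(["198.51.100.7", "203.0.113.9"], fake_lookup, pause_seconds=0)
print(demo)
```

Passing the lookup as a parameter keeps the error handling testable and makes it straightforward to swap in a different enrichment service later.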

In [15]:
# print results of the "ip_res_result" which now contains enriched data in json format
ip_res_result

[{'asn': 'AS16509',
  'city': 'Sydney',
  'continent_code': 'OC',
  'country': 'AU',
  'country_area': 7686850.0,
  'country_calling_code': '+61',
  'country_capital': 'Canberra',
  'country_code': 'AU',
  'country_code_iso3': 'AUS',
  'country_name': 'Australia',
  'country_population': 24992369.0,
  'country_tld': '.au',
  'currency': 'AUD',
  'currency_name': 'Dollar',
  'in_eu': False,
  'ip': '54.66.253.144',
  'languages': 'en-AU',
  'latitude': -33.8591,
  'longitude': 151.2002,
  'org': 'AMAZON-02',
  'postal': '2000',
  'region': 'New South Wales',
  'region_code': 'NSW',
  'timezone': 'Australia/Sydney',
  'utc_offset': '+1000',
  'version': 'IPv4'},
 {'asn': 'AS133115',
  'city': 'Kwai Chung',
  'continent_code': 'AS',
  'country': 'HK',
  'country_area': 1092.0,
  'country_calling_code': '+852',
  'country_capital': 'Hong Kong',
  'country_code': 'HK',
  'country_code_iso3': 'HKG',
  'country_name': 'Hong Kong',
  'country_population': 7451000.0,
  'country_tld': '.hk',
  ...

# JSON to Pandas Dataframe

The "ip_res_result" list holds one JSON-style record (a Python dictionary) per IP address. We need to convert it to a pandas DataFrame.

In [18]:
# convert json list to Pandas Dataframe "df_json"
df_json = pd.DataFrame.from_records(ip_res_result)
#https://stackoverflow.com/questions/48687857/python-json-list-to-pandas-dataframe
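To illustrate what the conversion does with records whose keys differ, here is a small sketch using two trimmed-down records (`records` and `demo_df` are hypothetical names; the values are a subset of the fields shown in the output above). Each key becomes a column, and a key missing from a record is filled with NaN.

```python
import pandas as pd

# Two trimmed records; the second one has no "latitude" key
records = [
    {"asn": "AS16509", "city": "Sydney", "country": "AU", "latitude": -33.8591},
    {"asn": "AS133115", "city": "Kwai Chung", "country": "HK"},
]

# One row per dictionary, one column per key
demo_df = pd.DataFrame.from_records(records)

print(demo_df)
print(demo_df["latitude"].isna())
```

This is why some cells in the enriched dataframe are empty: the service did not return that field for that address.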

In [19]:
# print the contents of the "df_json" which now contains the earlier JSON data as a dataframe
df_json

|   | asn | city | continent_code | country | country_area | country_calling_code | country_capital | country_code | country_code_iso3 | country_name | ... | languages | latitude | longitude | org | postal | region | region_code | timezone | utc_offset | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AS16509 | Sydney | OC | AU | 7686850.0 | 61 | Canberra | AU | AUS | Australia | ... | en-AU | -33.8591 | 151.2002 | AMAZON-02 | 2000 | New South Wales | NSW | Australia/Sydney | 1000.0 | IPv4 |
| 1 | AS133115 | Kwai Chung | AS | HK | 1092.0 | 852 | Hong Kong | HK | HKG | Hong Kong | ... | zh-HK,yue,zh,en |  |  | HK Kwaifong Group Limited |  | Tsuen Wan |  |  |  | IPv4 |
| 2 | AS52173 | Riga | EU | LV | 64589.0 | 371 | Riga | LV | LVA | Latvia | ... | lv,ru,lt | 56.9496 | 24.0978 | Sia Nano IT | LV-1058 | Riga | RIX | Europe/Riga | 300.0 | IPv4 |
| 3 | AS44901 | Sofia | EU | BG | 110910.0 | 359 | Sofia | BG | BGR | Bulgaria | ... | bg,tr-BG,rom | 42.697708 | 23.321868 | Belcloud LTD |  | Sofia-grad |  |  |  | IPv4 |


# Data Exploration

Basic exploration of the new dataframe "df_json"

In [22]:
# basic information of the data such as total columns, total entries, data types
df_json.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 26 columns):
asn                     4 non-null object
city                    4 non-null object
continent_code          4 non-null object
country                 4 non-null object
country_area            4 non-null float64
country_calling_code    4 non-null object
country_capital         4 non-null object
country_code            4 non-null object
country_code_iso3       4 non-null object
country_name            4 non-null object
country_population      4 non-null float64
country_tld             4 non-null object
currency                4 non-null object
currency_name           4 non-null object
in_eu                   4 non-null bool
ip                      4 non-null object
languages               4 non-null object
latitude                3 non-null float64
longitude               3 non-null float64
org                     4 non-null object
postal                  2 non-null object
region         

In [23]:
# read just headers

df_json.columns

Index(['asn', 'city', 'continent_code', 'country', 'country_area',
       'country_calling_code', 'country_capital', 'country_code',
       'country_code_iso3', 'country_name', 'country_population',
       'country_tld', 'currency', 'currency_name', 'in_eu', 'ip', 'languages',
       'latitude', 'longitude', 'org', 'postal', 'region', 'region_code',
       'timezone', 'utc_offset', 'version'],
      dtype='object')
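A few further exploration steps follow naturally from `info()` and `columns`. The sketch below is an illustration under stated assumptions: `sample` is a hypothetical frame holding five of the columns, with values copied from the enriched output above and `None` marking fields that were empty there. It selects a subset of columns, counts missing values per column, and tallies hosts per country.

```python
import pandas as pd

# A few columns copied from the enriched output; None marks empty fields
sample = pd.DataFrame.from_records([
    {"asn": "AS16509", "country_name": "Australia", "org": "AMAZON-02",
     "latitude": -33.8591, "postal": "2000"},
    {"asn": "AS133115", "country_name": "Hong Kong", "org": "HK Kwaifong Group Limited",
     "latitude": None, "postal": None},
    {"asn": "AS52173", "country_name": "Latvia", "org": "Sia Nano IT",
     "latitude": 56.9496, "postal": "LV-1058"},
    {"asn": "AS44901", "country_name": "Bulgaria", "org": "Belcloud LTD",
     "latitude": 42.697708, "postal": None},
])

# Narrow the view to the fields that describe who hosts each server
summary = sample[["asn", "org", "country_name"]]

# Count missing values per column
missing_counts = sample.isna().sum()

# Tally hosts per country
hosts_per_country = sample["country_name"].value_counts()

print(summary)
print(missing_counts)
print(hosts_per_country)
```

The same three calls apply unchanged to the full `df_json` dataframe.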