# Data manipulation and analysis
The main objective of this notebook is to transform and process the data so that it can be used to build some interesting plots with the d3 library. I think these plots will make it easier to understand the telescope and how its IP addresses behave across all the requests it receives.


## Dataset
This notebook uses a set of datasets that contain different information about the network telescope, including ip.dst data, ip.src data, frame.protocols, time series (UNIX time), and packet volume in bytes. Each dataset consists of a .csv file intended to produce an interesting visualization of that area of the telescope. All the data was collected from a /19 telescope between 2023-12-14 and 2024-01-14.

### IP dst
This dataset contains the number of requests and the volume in bytes received by each of our IP addresses.

### Protocols IP dst
This dataset contains the number of requests and the volume received by each IP address, broken down by the protocols that were collected.

### IP src
This dataset contains information about the attackers' source addresses (ip.src) that sent requests to our telescope: how many different IPs each one attacked, the volume, and the number of packets sent.

### AS information
This dataset shows the source ASes that were used and some geographical location information for them, as well as the number of requests and the volume.

# Table of contents
1. [Requests per IP destination](#Requests-per-IP-destination)

# Requests per IP destination
At this step, our csv stores each ip.dst as a dotted-quad string ('xxx.xxx.xxx.xxx'). We need to convert each address to an integer and split it so that we keep only the last two octets (---.---.xxx.yyy). The first two octets are not needed because they are the same across the whole telescope.
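
To illustrate why only the last two octets vary, here is a minimal sketch using the standard `ipaddress` module. The notebook does not state the telescope prefix, so the network below is an assumption derived from one sample address in `ip_dst.csv` combined with the /19 mask mentioned in the Dataset section.

```python
import ipaddress

# Sample destination address taken from the ip_dst.csv preview in this notebook.
sample = "200.236.63.99"

# Assumption: the telescope is the /19 network that contains the sample address.
# strict=False lets ip_network() accept an address with host bits set.
network = ipaddress.ip_network(f"{sample}/19", strict=False)

# Collect the distinct (first, second) octet pairs and the distinct third octets.
first_two = {tuple(str(addr).split(".")[:2]) for addr in network}
third_octets = sorted({int(str(addr).split(".")[2]) for addr in network})

print(network, network.num_addresses)
print(first_two)
print(third_octets[0], third_octets[-1], len(third_octets))
```

Under that assumption, every address shares the same first two octets, so the third and fourth octets are enough to identify a destination inside the telescope.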

In [1]:
!pip3 install pandas 



In [4]:
import pandas as pd
import ipaddress
df = pd.read_csv('../data/hilbert/csvs/ip_dst.csv')
df.head()

Unnamed: 0,dst,packet_total,volume_total
0,200.236.63.99,285255,18404202
1,200.236.63.98,275735,17846544
2,200.236.63.97,274255,17732546
3,200.236.63.96,276651,17902022
4,200.236.63.95,277501,17957421


# Expanding the ip field
First, we use the ipaddress library to convert ip.dst into an integer, then split the string to extract the third and fourth octets.

In [6]:
df['ip_int'] = df['dst'].apply(lambda x: int(ipaddress.IPv4Address(x)))
df['third_octet'] = df['dst'].apply(lambda x: int(x.split('.')[2]))
df['fourth_octet'] = df['dst'].apply(lambda x: int(x.split('.')[3]))
df.head()

Unnamed: 0,dst,packet_total,volume_total,ip_int,third_octet,fourth_octet
0,200.236.63.99,285255,18404202,3370925923,63,99
1,200.236.63.98,275735,17846544,3370925922,63,98
2,200.236.63.97,274255,17732546,3370925921,63,97
3,200.236.63.96,276651,17902022,3370925920,63,96
4,200.236.63.95,277501,17957421,3370925919,63,95
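
As a sanity check on the `ip_int` column, the sketch below recomputes the integer by hand from the four octets and compares it with `ipaddress.IPv4Address`. The helper name `ipv4_to_int` is hypothetical and only used for illustration.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Combine the four octets of a dotted-quad string into one integer."""
    a, b, c, d = (int(part) for part in address.split("."))
    # Each octet is one base-256 digit, most significant first.
    return a * 256**3 + b * 256**2 + c * 256 + d


# First and last addresses from the preview above.
rows = ["200.236.63.99", "200.236.63.95"]
manual = [ipv4_to_int(ip) for ip in rows]
library = [int(ipaddress.IPv4Address(ip)) for ip in rows]

print(manual)
print(library)
```

Both lists should agree with the `ip_int` values shown in the table for those rows.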


In [7]:
# Saving into a csv
df.to_csv('../data/hilbert/csvs/ip_dst_processed.csv')
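
The `Unnamed: 0` column in the previews is what pandas shows when a csv was written with its index and then read back without `index_col`. If that column is not needed downstream (an assumption, since the d3 code is not shown here), passing `index=False` to `to_csv` avoids adding another one. The sketch below uses an in-memory buffer and a one-row frame instead of the real file.

```python
import io

import pandas as pd

# Small stand-in frame with two of the notebook's columns.
frame = pd.DataFrame({"dst": ["200.236.63.99"], "packet_total": [285255]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)
print(columns_without_index)
```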