# Botnet analysis

Some statistical measures computed on the botnet dataset.

```python
from botdet.data.dataset_botnet import BotnetDataset

import plotly.express as px
import networkx as nx

from collections import OrderedDict
```

We use the _P2P_ dataset. The measurements require the node labels, so we work with the training split.

_NetworkX_ provides many statistical graph measures, so the graphs are loaded in that format. However, building NetworkX graphs is slow, so for heavier computations the same graphs are also loaded as plain dictionaries.

Because of memory limitations, the dataset is not loaded into memory as a whole (`in_memory=False`).

```python
data_nx = BotnetDataset(
  name='p2p',
  split='train',
  graph_format='nx',
  in_memory=False
)

data_dict = BotnetDataset(
  name='p2p',
  split='train',
  graph_format='dict',
  in_memory=False
)
```

## Botnet evolution

First, we check the evolution of the botnet size.

```python
seg = range(len(data_dict))

px.line(
  x=seg,
  y=[data_dict[i]['num_evils'] for i in seg],
  labels={'x': 'time', 'y': 'number of bots'}
)
```

## First network

We now compute some measurements on the first network of the dataset.

First, we extract the botnet (the subgraph induced by the bot nodes) from the full network.

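```python
# `info` was removed in NetworkX 3.0, so this cell needs an older NetworkX release
from networkx.classes.function import info

g = data_nx[0]  # load the first graph only once
bots = [n for n, attr in g.nodes(data=True) if attr['is_bot'] == 1]
botnet = g.subgraph(bots)  # subgraph induced by the bot nodes
print(info(botnet))
```

```
Name: 
Type: Graph
Number of nodes: 3111
Number of edges: 6779
Average degree:   4.3581
```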
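For an undirected graph, the average degree reported above is 2m/n: every edge adds 1 to the degree of both endpoints. For the botnet, 2 · 6779 / 3111 ≈ 4.3581. The sketch below checks this on a hypothetical toy network whose nodes carry an `is_bot` attribute in the same way as the dataset (the graph itself is made up for illustration), using the same induced-subgraph extraction.

```python
import networkx as nx

# Hypothetical toy network: nodes 0-3 are bots, nodes 4-5 are benign hosts
G = nx.Graph()
G.add_nodes_from(range(4), is_bot=1)
G.add_nodes_from([4, 5], is_bot=0)
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4), (4, 5)])

# Same extraction as above: keep the bot nodes and only the edges among them
bots = [n for n, attr in G.nodes(data=True) if attr['is_bot'] == 1]
botnet = G.subgraph(bots)

# The degree sum of an undirected graph is 2m, so the average degree is 2m / n
n, m = botnet.number_of_nodes(), botnet.number_of_edges()
avg_degree = 2 * m / n
print(n, m, avg_degree)  # 4 5 2.5
```

The edge (3, 4) is dropped because node 4 is not a bot, which is why the botnet keeps 5 of the 7 edges.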


### Degree

```python
# degree of every node in the botnet subgraph
dgs = [dg for _, dg in botnet.degree()]
```

```python
px.histogram(x=dgs, labels={'x': 'degree'}, nbins=max(dgs), log_y=True, log_x=True)
```
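The histogram is the degree distribution: for each degree k, the number of nodes with that degree. NetworkX can also compute it directly with `nx.degree_histogram`, which returns a list whose index is the degree. A small sketch on a made-up graph (a star with one extra edge) comparing it with a manual count:

```python
from collections import Counter
import networkx as nx

# Example graph: star with hub 0 and leaves 1..4, plus an edge between two leaves
G = nx.star_graph(4)
G.add_edge(1, 2)

dgs = [dg for _, dg in G.degree()]

# Manual count: degree -> number of nodes with that degree
counts = Counter(dgs)

# nx.degree_histogram returns a list where index k holds the number of degree-k nodes
hist = nx.degree_histogram(G)
print(sorted(counts.items()))  # [(1, 2), (2, 2), (4, 1)]
print(hist)                    # [0, 2, 2, 0, 1]
```

The list form includes zeros for missing degrees, which is convenient for plotting on a linear axis; with `log_x=True`, as in the plot above, the raw degree list is simpler to pass.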

### Clustering

```python
from networkx.algorithms import cluster

# local clustering coefficient of every node: {node: coefficient}
c = cluster.clustering(botnet)
```

```python
from collections import Counter

# number of nodes sharing each clustering value
clt = Counter(c.values())

vals, freq = zip(*sorted(clt.items()))
px.scatter(x=vals, y=freq, log_y=True, labels={'x': 'clustering', 'y': 'frequency'})
```
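For an undirected, unweighted graph, the local clustering coefficient of a node v with degree k(v) is the fraction of pairs of its neighbours that are themselves linked: c(v) = 2·T(v) / (k(v)·(k(v) - 1)), where T(v) is the number of triangles through v; nodes with degree below 2 get 0. The sketch below computes it by hand on a toy graph and checks it against `nx.clustering`; `local_clustering` is a helper written for illustration, not part of NetworkX.

```python
from itertools import combinations
import networkx as nx

def local_clustering(G, v):
    """Fraction of pairs of neighbours of v that are adjacent (unweighted, undirected G)."""
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbour pairs: defined as 0, as in NetworkX
    # each edge between two neighbours closes one triangle through v
    triangles = sum(1 for a, b in combinations(nbrs, 2) if G.has_edge(a, b))
    return 2 * triangles / (k * (k - 1))

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])

manual = {v: local_clustering(G, v) for v in G}
reference = nx.clustering(G)
# node 2 has 3 neighbour pairs and only (0, 1) is linked, so c(2) = 1/3
```

Because many nodes share identical values (0, 1, or simple fractions such as 1/3), counting exact values as in the cell above gives a compact frequency plot.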

### Efficiency

```python
from networkx.algorithms import efficiency_measures

efficiency_measures.local_efficiency(botnet)
```

```
0.9852968321261513
```
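Local efficiency is the average, over all nodes v, of the global efficiency of the subgraph induced by the neighbours of v (v itself excluded). A minimal sketch of that definition, checked against `nx.local_efficiency` on a toy graph; the helper name `local_efficiency_by_definition` is illustrative only.

```python
import networkx as nx

def local_efficiency_by_definition(G):
    """Mean, over all nodes v, of the global efficiency of the subgraph induced by v's neighbours."""
    total = 0.0
    for v in G:
        neighbourhood = G.subgraph(G[v])  # neighbours of v, without v itself
        total += nx.global_efficiency(neighbourhood)
    return total / G.number_of_nodes()

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
mine = local_efficiency_by_definition(G)
ref = nx.local_efficiency(G)
print(mine, ref)
```

Here nodes 0 and 1 contribute 1 (their two neighbours are linked), node 2 contributes 1/3 (only one of its three neighbour pairs is connected) and the pendant node 3 contributes 0, giving 7/12.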

```python
efficiency_measures.global_efficiency(botnet)
```

```
0.39751216428251734
```
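Global efficiency is the average inverse shortest-path distance over all ordered pairs of distinct nodes, E = 1/(n(n - 1)) · Σ 1/d(i, j), where unreachable pairs contribute 0. A sketch of that formula using `nx.all_pairs_shortest_path_length`, compared with `nx.global_efficiency` on a toy graph that includes an isolated node; the helper name is illustrative only.

```python
import networkx as nx

def global_efficiency_by_definition(G):
    """Average of 1/d(i, j) over all ordered pairs of distinct nodes; unreachable pairs count as 0."""
    n = G.number_of_nodes()
    if n < 2:
        return 0.0
    total = 0.0
    for source, lengths in nx.all_pairs_shortest_path_length(G):
        # `lengths` only holds reachable targets, so unreachable pairs add nothing
        total += sum(1 / d for target, d in lengths.items() if target != source)
    return total / (n * (n - 1))

# Toy graph: path 0-1-2 plus an isolated node 3
G = nx.path_graph(3)
G.add_node(3)
mine = global_efficiency_by_definition(G)
print(mine, nx.global_efficiency(G))
```

The connected pairs give 2 · (1 + 1/2 + 1) = 5 over 4 · 3 = 12 ordered pairs, so E = 5/12. Running all-pairs shortest paths costs far more than the local measures on large graphs, which is worth keeping in mind for the full botnet.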