# Calendar Analysis

Our Calendar Analysis is based on a heatmap of attacks, with Day of Month and Time of Day as axes.
We generate one calendar per attacker (source IP).

Copyright (C) GoSecure 2023, 2024

Distributed under the MIT license.

## Installation instructions

This code requires some Python libraries. Note that these libraries might require additional operating system dependencies.

You can install the Python dependencies with:

```
pip install pandas plotly
```

## Import dependencies

In [16]:
import json

# pandas
import numpy as np
import pandas as pd

# plotly
import plotly.express as px
import plotly.graph_objects as go

# Display images in the notebook
from IPython.display import Image

## Load and filter data

In [20]:
# load data
# data is in a one row per attack activity format, contains all attackers, with a timestamp column stored as a string
df = pd.read_csv('rdp-attacks-with-timings-07.csv', index_col=0)

# Set correct type on timestamp field
df['timestamp'] = df['timestamp'].astype('datetime64[s]')

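# enrich dataset with time and day-of-month columns
# (the day column is recomputed here in case the CSV does not already provide it)
df['time'] = df.timestamp.dt.strftime('%H:%M')
df['day'] = df.timestamp.dt.day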

In [21]:
# Filter the data to focus on one adversary
_ip = "5.181.86.95"
att_df = df[(df.clientIp == _ip)].copy().reset_index(drop=True)
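
Before picking an adversary, it can help to rank source IPs by activity. The following is a minimal sketch, assuming the same `clientIp` column as above; the toy DataFrame and the `top_attackers` helper are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the attack DataFrame (illustrative values only)
toy_df = pd.DataFrame({
    "clientIp": [
        "198.51.100.7", "203.0.113.9", "198.51.100.7",
        "198.51.100.7", "203.0.113.9", "192.0.2.1",
    ],
})

def top_attackers(frame, n=2):
    """Return the n most active source IPs with their number of attack rows."""
    # value_counts() sorts by descending count
    return frame["clientIp"].value_counts().head(n)

ranking = top_attackers(toy_df)
print(ranking)
```

Any IP from such a ranking could then be assigned to `_ip` to produce that attacker's calendar.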

## Preparing data for the HeatMap

Here we transform a DataFrame of one attack per row, with many attributes, into a matrix where:

* x: time of day (hour:minute)
* y: day of month
* the cell in column x and row y contains the number of attacks
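
The transformation described above can be sketched on a toy dataset. This is an illustration only: the timestamps are made up, and the final `pivot` step is an assumption added to make the matrix shape visible (the notebook itself keeps the data in long format and lets plotly do the binning).

```python
import pandas as pd

# Toy data: one row per attack (illustrative values only)
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-07-01 10:15:02",
        "2022-07-01 10:15:40",
        "2022-07-01 23:59:10",
        "2022-07-02 10:15:05",
    ])
})
toy["day"] = toy["timestamp"].dt.day
toy["time"] = toy["timestamp"].dt.strftime("%H:%M")

# Long format: one row per (day, time) pair with its attack count
counts = (
    toy.groupby(["day", "time"])
    .size()
    .to_frame(name="attack_count")
    .reset_index()
)

# Wide format: rows are days (y), columns are times (x), cells are counts
matrix = (
    counts.pivot(index="day", columns="time", values="attack_count")
    .fillna(0)
    .astype(int)
)
print(matrix)
```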

In [22]:
# Aggregating to the number of attacks per minute, for each day
sum_df = att_df.groupby(['day', 'time']).size().to_frame(name = 'attack_count').reset_index()

# fill all the missing minutes of the month with zeros
# otherwise the empty portions of the heatmap are collapsed giving a wrong impression
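# note: subtracting MonthBegin(1) from a date already on the 1st would roll back to the previous month,
# so the first day of the month is computed explicitly
first_ts = att_df['timestamp'].iloc[0].normalize()
min_date = first_ts.replace(day=1)
max_date = first_ts + pd.offsets.MonthEnd(0) + pd.offsets.Day(1) - pd.offsets.Second(1)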
dates = pd.date_range(start=min_date, end=max_date, freq='1min')
date_range_df = pd.DataFrame({
    'day': dates.day,
    'time': dates.strftime('%H:%M'),
    'attack_count': 0
})

# Combine the attacker data with the zeros whole month matrix
sum_df = pd.concat([sum_df, date_range_df]).groupby(['day', 'time']).sum().reset_index()

# sort by time
sum_df.sort_values(by=["time"], inplace=True)
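
The zero-filling step can be checked on a small example. This sketch assumes a hypothetical 3-day calendar with 12-hour slots instead of a full month of minutes, so the result stays readable; the counts are made up.

```python
import pandas as pd

# Sparse counts for one attacker (illustrative values only)
sparse = pd.DataFrame({
    "day": [1, 3],
    "time": ["00:00", "12:00"],
    "attack_count": [5, 2],
})

# Zero grid covering every (day, time) slot of the toy calendar
slots = pd.date_range(start="2022-07-01", end="2022-07-03 23:59:59", freq="720min")
grid = pd.DataFrame({
    "day": slots.day,
    "time": slots.strftime("%H:%M"),
    "attack_count": 0,
})

# Stacking both frames and summing per slot keeps the real counts
# and leaves 0 in every slot without attacks
filled = pd.concat([sparse, grid]).groupby(["day", "time"]).sum().reset_index()
print(filled)
```

Every slot appears exactly once in `filled`, so the heatmap keeps its full extent even for days or hours without any activity.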

## Rendering the Calendar of Attack (density heatmap)

In [25]:
# Create the graph
fig = px.density_heatmap(sum_df, x="time", y="day", z="attack_count",
                         marginal_x="histogram", marginal_y="histogram",
                         height=1080, width=1600,
                         title="Heatmap of Attacks by Time of Day and Date in Month",
                        )

# manually control bins to see the whole month (y) and whole days (x) even if there are no attacks
x_bins = pd.date_range(start='2022-01-01', end='2022-01-02', freq='1min').strftime('%H:%M').tolist()[:-1]
y_bins = list(range(1, 33))
fig.update_traces(
    xbins=dict(start=min(x_bins), end=max(x_bins), size=1), 
    ybins=dict(start=min(y_bins), end=max(y_bins), size=1)
)

# pretty ticks on axes
fig.update_xaxes(
    tickformat="%H\n%M",
)
fig.update_yaxes(
    tick0 = 0,
    dtick = 1,
)

fig.data[0].hovertemplate = "time=%{x}<br>day=%{y}<br>nb of attacks=%{z}<extra></extra>"
fig.update_layout(coloraxis_colorbar=dict(title='Nb of Attacks'))
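# feed the marginal histograms (traces 1 and 2) with the raw per-attack rows
# so they reflect the attacker's activity rather than the zero-filled grid
fig.data[1].x = att_df['time']
fig.data[2].y = att_df['day']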

fig.show()

# Optionally, write the heatmap to file (static image export requires the kaleido package)
#fig.write_image("attack-calendar.png")