Intelligent Web Application Firewall
====================================

# Initial Setup

## Download Data
This cell is safe to re-run. It creates the dataset directory if it is missing, then downloads the dataset from the [Systems and Networking Group at UC San Diego](https://www.sysnet.ucsd.edu/projects/url/) unless a local copy already exists.

In [30]:
import pathlib
import urllib.request

DATASET_URL = 'https://www.sysnet.ucsd.edu/projects/url/url.mat'

DATASET_DIRECTORY_PATH = pathlib.Path('dataset')
DATASET_NORMALIZED_PATH = DATASET_DIRECTORY_PATH.joinpath('url.npz')
FEATURE_TYPES_PATH = DATASET_DIRECTORY_PATH.joinpath('feature-types.npz')
DATASET_RAW_PATH = DATASET_DIRECTORY_PATH.joinpath('url.mat')

if DATASET_DIRECTORY_PATH.is_dir():
    print('[✅] Dataset Directory Exists')
else:
    print('[~] Creating Dataset Directory')
    DATASET_DIRECTORY_PATH.mkdir(parents=True, exist_ok=True)

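# The raw file can only be skipped when both caches exist, because the
# loading step falls back to the raw file if either cache is missing.
if DATASET_NORMALIZED_PATH.is_file() and FEATURE_TYPES_PATH.is_file():
    print('[✅] Dataset Cache Files Exist')
elif DATASET_RAW_PATH.is_file():
    print('[✅] Dataset Raw File Exists')
else:
    print('[~] Downloading Dataset File')
    urllib.request.urlretrieve(DATASET_URL, DATASET_RAW_PATH)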

[✅] Dataset Directory Exists
[✅] Dataset Raw File Exists
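
The existence check above treats any file at the raw path as a complete download, so an interrupted transfer would be mistaken for a finished one on the next run. A minimal sketch of one way to guard against that, assuming a hypothetical helper `download_once` and a hypothetical `.part` suffix (neither is part of this notebook), is to download to a temporary name and rename only on success. The demonstration uses a local `file://` URL so it runs without network access.

```python
import pathlib
import tempfile
import urllib.request


def download_once(url, destination):
    """Download url to destination unless the file already exists.

    Hypothetical helper for illustration. Returns True if a download
    happened and False if the existing file was reused.
    """
    destination = pathlib.Path(destination)
    if destination.is_file():
        return False
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a partial file never sits at
    # the final path ('.part' is an assumed naming convention).
    partial = destination.with_name(destination.name + '.part')
    urllib.request.urlretrieve(url, partial)
    partial.replace(destination)
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the remote dataset: a small local file served via file://.
    source = pathlib.Path(tmp) / 'source.bin'
    source.write_bytes(b'example')
    target = pathlib.Path(tmp) / 'dataset' / 'copy.bin'

    first = download_once(source.as_uri(), target)   # performs the download
    second = download_once(source.as_uri(), target)  # reuses the existing file
    content = target.read_bytes()

print(first, second, content)
```

In the notebook itself the call would take `DATASET_URL` and `DATASET_RAW_PATH` as arguments.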


## Load Data
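The dataset is stored as a MATLAB `.mat` file, so we load it with `scipy.io.loadmat`. To avoid repeating that step on later runs, the per-day data and the feature types are each saved as NumPy `.npz` files and reloaded from there when present.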

In [45]:
from scipy.io import loadmat
import numpy as np

if not DATASET_NORMALIZED_PATH.is_file() or not FEATURE_TYPES_PATH.is_file():
    print('[~] Loading Data from MATLAB')
    dataset = loadmat(DATASET_RAW_PATH)

if DATASET_NORMALIZED_PATH.is_file():
    print('[✅] Using Saved Normalized Day Data')
    days = np.load(DATASET_NORMALIZED_PATH, allow_pickle=True)['arr_0']
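else:
    print('[~] Extracting and Saving Day Data')
    # Collect every entry whose key starts with 'Day', in the order loadmat returns them.
    days = list()
    for key in dataset.keys():
        if key.startswith('Day'):
            days.append(dataset.get(key))

    # Convert to an array so `days` has the same type as in the cached branch above.
    days = np.array(days)
    np.savez(DATASET_NORMALIZED_PATH, days)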

if FEATURE_TYPES_PATH.is_file():
    print('[✅] Using Saved Feature Types Data')
    feature_types = np.load(FEATURE_TYPES_PATH)['arr_0']
else:
    print('[~] Creating Feature Types Save')

    feature_types = dataset.get('FeatureTypes')
    np.savez(FEATURE_TYPES_PATH, feature_types)

[✅] Using Saved Normalized Day Data
[✅] Using Saved Feature Types Data
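
The `'arr_0'` key used when reloading comes from `np.savez` itself: arrays passed positionally are stored under the names `arr_0`, `arr_1`, and so on. The following sketch illustrates that round trip with a small made-up object array (the file name and array contents are assumptions for illustration, not the real dataset). It also shows why `allow_pickle=True` is needed when the saved array has `dtype=object`.

```python
import pathlib
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    cache_path = pathlib.Path(tmp) / 'example.npz'  # hypothetical file name

    # Build a 1-D object array whose elements have different lengths,
    # standing in for per-day entries of varying size.
    ragged = np.empty(2, dtype=object)
    ragged[0] = np.arange(3)
    ragged[1] = np.arange(5)

    # Positional arrays are stored under the automatic names arr_0, arr_1, ...
    np.savez(cache_path, ragged)

    # Object arrays are pickled on save, so loading requires allow_pickle=True.
    with np.load(cache_path, allow_pickle=True) as archive:
        keys = list(archive.files)
        restored = archive['arr_0']

print(keys)
print(restored.shape)
```

Passing the array as a keyword argument instead, for example `np.savez(cache_path, days=ragged)`, would store it under the name `days` rather than `arr_0`.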


## Analyze Data

In [53]:
print('Days:', days.shape[0])

Days: 121
