# VAE (variational autoencoder) outlier detector deployment

Wrap a Keras VAE Python model for use as a prediction microservice in seldon-core, and deploy it on seldon-core running on Minikube.

## Dependencies

- [helm](https://github.com/helm/helm)
- [minikube](https://github.com/kubernetes/minikube) (install version 0.25.2)
- [s2i](https://github.com/openshift/source-to-image)

Python packages:
- keras: pip install keras
- tensorflow: https://www.tensorflow.org/install/pip
- scikit-learn: pip install scikit-learn

## Task

The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well-defined times, between which data flows to and from a source IP address to a target IP address under some well-defined protocol. Each connection is labeled either as normal or as an attack.

There are 4 types of attacks in the dataset:
- DOS: denial-of-service, e.g. SYN flood;
- R2L: unauthorized access from a remote machine, e.g. guessing a password;
- U2R: unauthorized access to local superuser (root) privileges;
- probing: surveillance and other probing, e.g. port scanning.
    
The dataset contains about 5 million connection records.

There are 3 types of features:
- basic features of individual connections, e.g. duration of connection
- content features within a connection, e.g. number of failed login attempts
- traffic features within a 2-second window, e.g. number of connections to the same host as the current connection

The outlier detector uses only the continuous features (18 out of 41).
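
As a rough illustration of this feature selection, the sketch below keeps only a fixed list of feature names from a connection record. It assumes that the 18 names used in the payload helper later in this notebook are the continuous subset. The helper `select_features` and the example record are hypothetical and are not part of `train.py` or `utils`.

```python
# The 18 feature names used for the request payload later in this notebook.
CONTINUOUS_FEATURES = [
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]


def select_features(record, names=CONTINUOUS_FEATURES):
    """Return the values of `names` from `record` as floats, in order.

    Hypothetical helper: raises KeyError if a feature is missing.
    """
    missing = [n for n in names if n not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(record[n]) for n in names]


# Made-up record: one categorical field plus the continuous features.
record = {name: 0.0 for name in CONTINUOUS_FEATURES}
record.update({"protocol_type": "tcp", "srv_count": 3, "same_srv_rate": 1.0})
row = select_features(record)  # categorical fields are dropped
```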

## Train locally

Train on a small dataset of normal traffic.

In [22]:
!python train.py \
--dataset 'kddcup99' \
--samples 50000 \
--hidden_layers 1 \
--latent_dim 2 \
--hidden_dim 9 \
--epochs 10 \
--batch_size 32 \
--print_progress \
--save_path './models/'

## Test using Kubernetes cluster on GCP or Minikube

Choose whether to use a Kubernetes cluster on GCP or Minikube.

In [5]:
minikube = False

In [6]:
if minikube:
    !minikube start --memory 4096 --feature-gates=CustomResourceValidation=true \
    --extra-config=apiserver.Authorization.Mode=RBAC
else:
    !gcloud container clusters get-credentials standard-cluster-1 --zone europe-west1-b --project seldon-demos

Starting local Kubernetes v1.9.4 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.


Bind the cluster-wide cluster-admin role to the service account named “default” in the namespace “kube-system”.

In [3]:
!kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole=cluster-admin \
--serviceaccount=kube-system:default

In [4]:
!kubectl create namespace seldon

Set the namespace of the current kubectl context to seldon.

In [5]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Create a tiller service account, give it a cluster-wide cluster-admin role, and initialize helm with it.

In [6]:
!kubectl -n kube-system create sa tiller
!kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
!helm init --service-account tiller

Check the tiller deployment rollout status and deploy the seldon/spartakus helm charts.

In [8]:
!kubectl rollout status deploy/tiller-deploy -n kube-system

In [7]:
!helm install ../../../helm-charts/seldon-core-crd --name seldon-core-crd \
    --set usage_metrics.enabled=true

In [9]:
!helm install ../../../helm-charts/seldon-core --name seldon-core \
        --namespace seldon \
        --set ambassador.enabled=true

Check the deployment rollout status for seldon-core.

In [10]:
!kubectl rollout status deploy/seldon-core-seldon-cluster-manager -n seldon
!kubectl rollout status deploy/seldon-core-seldon-apiserver -n seldon

If Minikube is used, create the docker image for the outlier detector inside Minikube using s2i.

In [11]:
if minikube:
    !eval $(minikube docker-env) && s2i build . seldonio/seldon-core-s2i-python3:0.4-SNAPSHOT outlier-vae:0.1
    img_name = "outlier-vae:0.1"  # image built locally inside Minikube
else:
    img_name = "seldonio/outlier-vae:0.1"

Install the outlier detector helm chart and set the "threshold" and "reservoir_size" hyperparameter values.

In [12]:
!helm install ../../../helm-charts/seldon-outlier-detection \
    --set model.image.name=$img_name \
    --set model.threshold=10 \
    --set model.reservoir_size=50000 \
    --name outlier-detector --set oauth.key=oauth-key \
    --set oauth.secret=oauth-secret \
    --namespace=seldon
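
This notebook does not explain the `threshold` and `reservoir_size` values. The sketch below is an illustration only and rests on two assumptions: that the detector flags an observation when its outlier score exceeds `threshold`, and that it keeps a fixed-size uniform random sample of past observations, of size `reservoir_size`, using classic reservoir sampling. The function names `is_outlier` and `update_reservoir` are hypothetical and are not taken from the seldon-core code.

```python
import random


def is_outlier(score, threshold):
    """Flag an observation whose outlier score exceeds the threshold."""
    return score > threshold


def update_reservoir(reservoir, item, n_seen, reservoir_size, rng=random):
    """Reservoir sampling (Algorithm R).

    `n_seen` is the number of items observed so far, including `item`.
    After each call, `reservoir` is a uniform sample of the items seen.
    """
    if len(reservoir) < reservoir_size:
        reservoir.append(item)  # still filling the reservoir
    else:
        j = rng.randrange(n_seen)  # uniform integer in [0, n_seen)
        if j < reservoir_size:
            reservoir[j] = item  # replace a stored item
    return reservoir


# Hypothetical usage with made-up outlier scores.
rng = random.Random(0)
scores = [0.5, 12.0, 3.2, 0.1, 25.0, 7.7]
reservoir = []
for n, score in enumerate(scores, start=1):
    update_reservoir(reservoir, score, n, reservoir_size=3, rng=rng)

flags = [is_outlier(s, threshold=10) for s in scores]
```

With a reservoir, memory stays bounded no matter how many requests arrive, which is one plausible reason for exposing its size as a hyperparameter.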

## Port forward Ambassador

kubectl port-forward $(kubectl get pods -n seldon -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8003:8080

## Define requests and payload

In [13]:
import json
import requests

def get_payload(arr):
    features = ["srv_count","serror_rate","srv_serror_rate","rerror_rate","srv_rerror_rate","same_srv_rate",
             "diff_srv_rate","srv_diff_host_rate","dst_host_count","dst_host_srv_count","dst_host_same_srv_rate",
             "dst_host_diff_srv_rate","dst_host_same_src_port_rate","dst_host_srv_diff_host_rate",
             "dst_host_serror_rate","dst_host_srv_serror_rate","dst_host_rerror_rate","dst_host_srv_rerror_rate"]
    datadef = {"names":features,"ndarray":arr.tolist()}
    payload = {"meta":{},"data":datadef}
    return payload
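
Before sending a request, it can help to check that every row carries exactly one value per feature name. The sketch below assumes the payload layout produced by `get_payload` above (`data.names` plus `data.ndarray` as a list of rows). `check_payload` is a hypothetical helper, not part of seldon-core.

```python
def check_payload(payload):
    """Return the number of rows if each row matches the feature names.

    Raises ValueError on the first row whose length differs.
    """
    names = payload["data"]["names"]
    rows = payload["data"]["ndarray"]
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(
                f"row {i} has {len(row)} values, expected {len(names)}"
            )
    return len(rows)


# Made-up example with two features instead of the 18 used by the detector.
example = {
    "meta": {},
    "data": {
        "names": ["srv_count", "serror_rate"],
        "ndarray": [[1.0, 0.0], [5.0, 0.2]],
    },
}
n_rows = check_payload(example)
```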

In [14]:
def rest_request_ambassador(deploymentName,request,endpoint="localhost:8003"):
    response = requests.post(
                "http://"+endpoint+"/seldon/"+deploymentName+"/api/v0.1/predictions",
                json=request)
    print(response.status_code)
    print(response.text)
    return response.json()

In [15]:
def send_feedback_rest(deploymentName,request,response,reward,truth,endpoint="localhost:8003"):
    feedback = {
        "request": request,
        "response": response,
        "reward": reward,
        "truth": {"data":{"ndarray":truth.tolist()}}
    }
    ret = requests.post(
         "http://"+endpoint+"/seldon/"+deploymentName+"/api/v0.1/feedback",
        json=feedback)
    print(ret.status_code)
    print(ret.text)
    return ret.text
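
Both REST helpers above build their URL by string concatenation and differ only in the final path segment. A small helper can make that pattern explicit, assuming the same Ambassador route layout as above. `seldon_url` is a hypothetical name and is not part of seldon-core.

```python
def seldon_url(deployment_name, action="predictions", endpoint="localhost:8003"):
    """Build the Ambassador URL for a seldon deployment.

    `action` is the last path segment: "predictions" or "feedback".
    """
    if action not in ("predictions", "feedback"):
        raise ValueError(f"unknown action: {action!r}")
    return f"http://{endpoint}/seldon/{deployment_name}/api/v0.1/{action}"


predictions_url = seldon_url("outlier-detector")
feedback_url = seldon_url("outlier-detector", action="feedback")
```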

## Load and test network intrusion data

In [16]:
from utils import get_kdd_data, generate_batch

data = get_kdd_data() # load dataset
print(data.shape)

Generate a random batch from the data.

In [None]:
import numpy as np

samples = 1
fraction_outlier = 0.
X, labels = generate_batch(data,samples,fraction_outlier)
print(X.shape)
print(labels.shape)

Test the REST requests with the generated data.

In [17]:
request = get_payload(X)

In [17]:
response = rest_request_ambassador("outlier-detector",request,endpoint="localhost:8003")

In [18]:
feedback = send_feedback_rest("outlier-detector",request,response,0,labels,endpoint="localhost:8003")

## Analytics

Install the helm charts for Prometheus and the Grafana dashboard.

In [19]:
!helm install ../../../helm-charts/seldon-core-analytics --name seldon-core-analytics \
    --set grafana_prom_admin_password=password \
    --set persistence.enabled=false \
    --namespace seldon

Attach the label role=locust to the first node.

In [20]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust

## Port forward Grafana dashboard

kubectl port-forward $(kubectl get pods -n seldon -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n seldon 3000:3000

You can then view an analytics dashboard inside the cluster at http://localhost:3000/dashboard/db/prediction-analytics?refresh=5s&orgId=1. Your IP address may be different; get it via minikube ip. Log in with:

Username: admin

Password: password (as set when starting seldon-core-analytics above)

Import the outlier-detector dashboard from ../../../helm-charts/seldon-core-analytics/files/grafana/configs.

## Run simulation

- Sample random network intrusion data with a certain outlier probability.
- Get payload for the observation.
- Make a prediction.
- Send the "true" label with the feedback.

View the progress on the grafana "Outlier Detection" dashboard.

In [21]:
import time
n_requests = 1000
samples = 1
for i in range(n_requests):
    fraction_outlier = .1
    X, labels = generate_batch(data,samples,fraction_outlier)
    request = get_payload(X)
    response = rest_request_ambassador("outlier-detector",request,endpoint="localhost:8003")
    feedback = send_feedback_rest("outlier-detector",request,response,0,labels,endpoint="localhost:8003")    
    time.sleep(1)
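
Alongside the dashboard, the results of such a simulation can be tallied locally. The sketch below assumes that a 0/1 outlier flag has already been extracted from each response (the response layout is not shown in this notebook) and that the true labels use 1 for an attack and 0 for normal traffic. `tally` and the example values are hypothetical.

```python
from collections import Counter


def tally(predicted, truth):
    """Count true/false positives and negatives for 0/1 outlier flags."""
    counts = Counter()
    for p, t in zip(predicted, truth):
        if p and t:
            counts["tp"] += 1  # attack correctly flagged
        elif p and not t:
            counts["fp"] += 1  # normal traffic flagged as outlier
        elif not p and t:
            counts["fn"] += 1  # attack missed
        else:
            counts["tn"] += 1  # normal traffic correctly passed
    return counts


# Made-up flags and labels for five requests.
counts = tally(predicted=[1, 0, 1, 0, 0], truth=[1, 0, 0, 1, 0])
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```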