## Network Traffic Dataset for Malicious Attack Detection

This dataset of network traffic flows was generated by CICFlowMeter and indicates whether each flow is a malicious attack (Bot) or not (Benign).
CICFlowMeter is a network traffic flow generator that computes 69 statistical features per flow, such as duration, number of packets, number of bytes, and packet length, each calculated separately for the forward and reverse directions.
The application outputs a CSV file in which each flow is labeled as either Benign or Bot. The dataset is organized per day; for each day, the raw data, including the network traffic (pcaps) and event logs (Windows and Ubuntu event logs), is recorded per machine. Download the dataset with the wget command below and rename it to Network_Traffic.csv.

In [None]:
! wget https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed+Traffic+Data+for+ML+Algorithms/Friday-02-03-2018_TrafficForML_CICFlowMeter.csv
! mv Friday-02-03-2018_TrafficForML_CICFlowMeter.csv Network_Traffic.csv
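Before training, it can help to confirm that the downloaded file contains both classes. The following is a minimal sketch using only the standard library; the column name `Label` and the sample rows are assumptions made for illustration and should be checked against the header of the actual CSV.

```python
import csv
import io
from collections import Counter

def label_counts(csv_file, label_col="Label"):
    """Count flows per label in a CICFlowMeter-style CSV file object.

    label_col is an assumed column name; adjust it to match the real header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_col] for row in reader)

# Tiny in-memory sample standing in for Network_Traffic.csv (hypothetical rows)
sample = io.StringIO(
    "Flow Duration,Tot Fwd Pkts,Label\n"
    "5,1,Benign\n"
    "4,2,Bot\n"
    "9,3,Benign\n"
)
counts = label_counts(sample)
print(counts)
```

To run it against the real file, open it with `open("Network_Traffic.csv", newline="")` and pass the file object to `label_counts`.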

## Create requirements.txt

In [1]:
%%writefile requirements.txt
cloudpickle==1.1.1
pandas
scikit-learn==0.22.2
imblearn
joblib
numpy
seldon-core
tornado>=6.0.3
tensorflow==1.13.1
keras==2.2.4
google-cloud-storage
kubeflow-tfjob
azure==4.0.0
kubeflow-fairing
kubernetes==10.0.1

Writing requirements.txt


## Install the packages listed in requirements.txt using pip

In [2]:
!pip install --user -r requirements.txt

You should consider upgrading via the 'pip install --upgrade pip' command.


## Restart the Kernel

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)

## Configure Docker credentials

Encode your Docker registry username and password in base64:

echo -n USER:PASSWORD | base64

Then create a config.json file containing your Docker registry URL and the base64 string generated above.

In [None]:
!echo -n USER:PASSWORD | base64

In [2]:
%%writefile config.json
{
    "auths": {
        "https://index.docker.io/v1/": {
            "auth": "<<Provide previous generated base64 string>>"
        }
    }
}

Writing config.json
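The same config.json content can also be produced from Python instead of editing the placeholder by hand. This is a sketch only: `docker_config` is a hypothetical helper name, and the default registry URL is the one used in the cell above.

```python
import base64
import json

def docker_config(user, password, registry="https://index.docker.io/v1/"):
    """Build the Docker auth config dict for one registry.

    The auth value is base64("USER:PASSWORD"), matching
    `echo -n USER:PASSWORD | base64`.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"auths": {registry: {"auth": token}}}

# Placeholder credentials; replace with your own before writing the file
config = docker_config("USER", "PASSWORD")
print(json.dumps(config, indent=4))
```

Writing the result with `json.dump(config, open("config.json", "w"), indent=4)` would give a file of the same shape as the one created above.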


### Create a config-map in the namespace you're using with the docker config

In [3]:
!kubectl create --namespace anonymous configmap docker-config --from-file=./config.json

configmap/docker-config created


## Dockerfile
If the device type is GPU, update the Dockerfile's base image to the tensorflow-gpu image.

In [4]:
device_type = "gpu"  # Provide "cpu" or "gpu"
if device_type == "gpu":
    # Switch the base image tag to the GPU variant
    !sed -i "s/py3/gpu-py3/g" Dockerfile
!cat Dockerfile

FROM tensorflow/tensorflow:1.14.0-gpu-py3
RUN pip install -U scikit-learn
RUN pip install keras pandas imblearn
ADD network_model.py  /opt/network_model.py
ADD Network_Traffic.csv /opt/Network_Traffic.csv
RUN chmod +x /opt/network_model.py  /opt/Network_Traffic.csv
WORKDIR /opt/
RUN mkdir -p /mnt/Model_Network
CMD python network_model.py
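Note that the sed substitution is not idempotent: running the cell twice would turn `gpu-py3` into `gpu-gpu-py3`. A possible alternative, sketched here in Python, rewrites only the `FROM` line and can be applied repeatedly. `set_device_tag` is a hypothetical helper, not part of the notebook.

```python
import re

def set_device_tag(dockerfile_text, device_type):
    """Return the Dockerfile text with the FROM tag set to py3 or gpu-py3.

    Only FROM lines are touched, and an existing "gpu-" prefix is absorbed
    by the pattern, so repeated calls give the same result.
    """
    tag = "gpu-py3" if device_type == "gpu" else "py3"
    lines = []
    for line in dockerfile_text.splitlines():
        if line.startswith("FROM "):
            line = re.sub(r"(?:gpu-)?py3\b", tag, line)
        lines.append(line)
    return "\n".join(lines)

base = "FROM tensorflow/tensorflow:1.14.0-py3\nRUN pip install keras pandas imblearn"
gpu_once = set_device_tag(base, "gpu")
gpu_twice = set_device_tag(gpu_once, "gpu")
restored = set_device_tag(gpu_twice, "cpu")
print(gpu_twice.splitlines()[0])
```

The same function with `device_type="cpu"` restores the CPU tag, which is what the cleanup step at the end of the notebook does with a second sed call.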

## Import Libraries

In [5]:
from kubernetes.client import V1PodTemplateSpec
from kubernetes.client import V1ObjectMeta
from kubernetes.client import V1PodSpec
from kubernetes.client import V1Container
from kubernetes.client import V1VolumeMount
from kubernetes.client import V1Volume
from kubernetes.client import V1PersistentVolumeClaimVolumeSource
from kubernetes.client import V1ResourceRequirements

from kubeflow.tfjob import constants
from kubeflow.tfjob import utils
from kubeflow.tfjob import V1ReplicaSpec
from kubeflow.tfjob import V1TFJob
from kubeflow.tfjob import V1TFJobSpec
from kubeflow.tfjob import TFJobClient


import time
import re, os
import tensorflow as tf
import pandas as pd
import numpy as np
import logging
import sys
import importlib



## Set up Kubeflow Fairing for on-premises training and predictions
Import the fairing library and configure the on-premises environment in which your training or prediction job will run.

In [6]:
from kubernetes import client as k8s_client
from kubernetes.client import rest as k8s_rest
from kubernetes import config as k8s_config
from kubernetes.client.rest import ApiException

from kubeflow import fairing   
from kubeflow.fairing import utils as fairing_utils
from kubeflow.fairing import TrainJob
from kubeflow.fairing.preprocessors.function import FunctionPreProcessor
from kubeflow.fairing.preprocessors import base as base_preprocessor
from kubeflow.fairing.builders.cluster.cluster import ClusterBuilder

from kubeflow.fairing.cloud.k8s import MinioUploader
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource
from kubeflow.fairing import PredictionEndpoint
from kubeflow.fairing.kubernetes.utils import mounting_pvc

BackendClass = getattr(importlib.import_module('kubeflow.fairing.backends'), "KubernetesBackend")
namespace = fairing_utils.get_current_k8s_namespace()
print("Namespace : %s"%namespace)

Namespace : anonymous


## Get minio-service cluster IP to upload docker build context
#### Set DOCKER_REGISTRY
The DOCKER_REGISTRY variable is used to push the newly built image. 
Please change the variable to the registry for which you've configured credentials.

In [7]:
DOCKER_REGISTRY = "edward1723"

k8s_config.load_incluster_config()
api_client = k8s_client.CoreV1Api()
minio_service_endpoint = None

try:
    minio_service_endpoint = api_client.read_namespaced_service(name='minio-service', namespace='kubeflow').spec.cluster_ip
except ApiException as e:
    if e.status == 403:
        logging.warning("The service account doesn't have sufficient privileges "
                        "to get the kubeflow minio-service. "
                        "You will have to manually enter the minio cluster-ip. "
                        "To make this function work ask someone with cluster "
                        "privileges to create an appropriate "
                        "clusterrolebinding by running a command.\n"
                        "kubectl create --namespace=kubeflow rolebinding "
                        "--clusterrole=kubeflow-view "
                        "--serviceaccount=${NAMESPACE}:default-editor "
                        "${NAMESPACE}-minio-view")
        # The f prefix is required so that {e.reason} is interpolated
        logging.error(f"API access denied with reason: {e.reason}")

s3_endpoint = minio_service_endpoint
minio_endpoint = "http://"+s3_endpoint+":9000"
minio_username = "minio"
minio_key = "minio123"
minio_region = "us-east-1"
print(minio_endpoint)


minio_uploader = MinioUploader(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)
minio_context_source = MinioContextSource(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)

http://10.98.188.38:9000
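If the service lookup fails, `minio_service_endpoint` stays `None` and the string concatenation above raises a `TypeError` that does not explain the cause. A small guard, sketched below with a hypothetical helper name, makes that failure explicit; port 9000 is taken from the cell above.

```python
def build_minio_endpoint(cluster_ip, port=9000):
    """Build the MinIO endpoint URL, failing clearly if the IP is unknown."""
    if not cluster_ip:
        raise ValueError(
            "minio-service cluster IP is unknown; enter it manually "
            "or grant the service account read access"
        )
    return "http://{}:{}".format(cluster_ip, port)

# Example with the cluster IP printed by the notebook run above
print(build_minio_endpoint("10.98.188.38"))
```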


## Build docker image

Note: Upload the dataset, the Dockerfile, and network_model.py to the notebook's working directory before running this cell.

In [8]:
#output_map is a map of extra files to add to the notebook.
# It is a map from source location to the location inside the context.
output_map= {
    "Dockerfile": "Dockerfile", #Dockerfile
    "network_model.py":"network_model.py",
    "Network_Traffic.csv": "Network_Traffic.csv"
}
preprocessor = base_preprocessor.BasePreProcessor(output_map=output_map)

preprocessor.preprocess()
builder = ClusterBuilder(registry=DOCKER_REGISTRY, preprocessor=preprocessor, context_source=minio_context_source)

builder.build()

Building image using cluster builder.
Creating docker context: /tmp/fairing_context_5vaiaitz
/tmp/fairing_dockerfile_38wtz9rv already exists in Fairing context, skipping...
Waiting for fairing-builder-dkfzh-tdf9h to start...
Waiting for fairing-builder-dkfzh-tdf9h to start...
Waiting for fairing-builder-dkfzh-tdf9h to start...
Waiting for fairing-builder-dkfzh-tdf9h to start...
Pod started running True


[36mINFO[0m[0003] Resolved base name tensorflow/tensorflow:1.14.0-gpu-py3 to tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[0003] Resolved base name tensorflow/tensorflow:1.14.0-gpu-py3 to tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[0003] Downloading base image tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[0004] Error while retrieving image from cache: getting file info: stat /cache/sha256:e72e66b3dcb9c9e8f4e5703965ae1466b23fe8cad59e1c92c6e9fa58f8d81dc8: no such file or directory
[36mINFO[0m[0004] Downloading base image tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[0005] Built cross stage deps: map[]
[36mINFO[0m[0005] Downloading base image tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[0006] Error while retrieving image from cache: getting file info: stat /cache/sha256:e72e66b3dcb9c9e8f4e5703965ae1466b23fe8cad59e1c92c6e9fa58f8d81dc8: no such file or directory
[36mINFO[0m[0006] Downloading base image tensorflow/tensorflow:1.14.0-gpu-py3
[36mINFO[0m[00

In [9]:
builder.image_tag

'edward1723/fairing-job:FBD61A65'

## Define TFJob Class to create training job

In [10]:
tfjob_name="network-fairing-tfjob"
class Tfjob(object):

    def get_tfjob_params(self):
    
        #Defining a Volume Mount
        volume_mount = V1VolumeMount(name="nfsvolume", mount_path="/mnt/Model_Network")

        #Defining a Persistent Volume Claim
        persistent_vol_claim = V1PersistentVolumeClaimVolumeSource(claim_name="nfs1")

        #Defining a Volume
        volume = V1Volume(name="nfsvolume", persistent_volume_claim=persistent_vol_claim)
        
        if device_type=="gpu":
            #Defining a Container
            container = V1Container(
                name="tensorflow",            
                image=builder.image_tag,
                volume_mounts=[volume_mount],
                resources=V1ResourceRequirements(limits={"nvidia.com/gpu": 1})
            )
        else:
            #Defining a Container
            container = V1Container(
                name="tensorflow",            
                image=builder.image_tag,
                volume_mounts=[volume_mount]
            )
        
        return (volume_mount, persistent_vol_claim, volume, container)
        
    def get_tfjob_nodes(self):
    
        params = self.get_tfjob_params()

        #Defining a Master
        master = V1ReplicaSpec(replicas=1,
                               restart_policy="Never",
                               template=V1PodTemplateSpec(spec=V1PodSpec(
                                                    containers=[params[3]],
                                                    volumes=[params[2]])))
        
        #Defining Worker Spec
        worker = V1ReplicaSpec(replicas=1,
                               restart_policy="Never",
                               template=V1PodTemplateSpec(spec=V1PodSpec(
                                                    containers=[params[3]],
                                                    volumes=[params[2]],
                                                    
                               )))
        
        #Defining Parameter server(PS) Spec
        ps = V1ReplicaSpec(replicas=1,
                               restart_policy="Never",
                               template=V1PodTemplateSpec(spec=V1PodSpec(
                                                    containers=[params[3]],
                                                    volumes=[params[2]])))
        
        return (master,worker,ps)
    
    def create_tfjob(self):
        
        tfjob_node_spec = self.get_tfjob_nodes()
        
        #Defining TFJob
        tfjob = V1TFJob(
            api_version="kubeflow.org/v1",
            kind="TFJob",
            metadata=V1ObjectMeta(name=tfjob_name,namespace=namespace),
            spec=V1TFJobSpec(
                clean_pod_policy="None",
                tf_replica_specs={"PS":tfjob_node_spec[2],"Worker": tfjob_node_spec[1],"Master":tfjob_node_spec[0]}
            )
        )
        
        #Creating TFJob
        tfjob_client = TFJobClient()
        tfjob_client.create(tfjob, namespace=namespace)
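For orientation, the object assembled by the class above corresponds roughly to the following TFJob manifest, written here as a plain dictionary. This is a sketch for illustration rather than the SDK's own serialization; `tfjob_manifest` is a hypothetical helper, and the field names follow the `kubeflow.org/v1` TFJob resource.

```python
import copy
import json

def tfjob_manifest(name, namespace, image, pvc_name="nfs1",
                   mount_path="/mnt/Model_Network", gpu=False):
    """Build a TFJob manifest with one PS, one Worker and one Master replica."""
    container = {
        "name": "tensorflow",
        "image": image,
        "volumeMounts": [{"name": "nfsvolume", "mountPath": mount_path}],
    }
    if gpu:
        # Request one GPU per replica, as in the gpu branch above
        container["resources"] = {"limits": {"nvidia.com/gpu": 1}}
    replica = {
        "replicas": 1,
        "restartPolicy": "Never",
        "template": {"spec": {
            "containers": [container],
            "volumes": [{"name": "nfsvolume",
                         "persistentVolumeClaim": {"claimName": pvc_name}}],
        }},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "cleanPodPolicy": "None",
            # Each role gets its own copy of the same replica spec
            "tfReplicaSpecs": {role: copy.deepcopy(replica)
                               for role in ("PS", "Worker", "Master")},
        },
    }

manifest = tfjob_manifest("network-fairing-tfjob", "anonymous",
                          "edward1723/fairing-job:FBD61A65", gpu=True)
print(json.dumps(manifest, indent=2))
```

All three replica types share the same container and volume, so the only per-job inputs are the image tag, the namespace, and the device type.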

## Define the NetworkServe class used by Kubeflow Fairing
The class must define train() and predict() methods.

In [11]:
class NetworkServe(object):

    def __init__(self):
        self.model = None

    def train(self):
        # Submit the TFJob defined above
        Tfjob().create_tfjob()

    def predict(self, X, feature_names=None):

        feature_col = ['BwdIATMean', 'BwdIATTot', 'BwdPktLenMax', 'BwdPktLenMean', 'FlowDuration', 'FlowIATMean', 'FlowIATStd', 'FwdPSHFlags', 'FwdSegSizeMin', 'InitBwdWinByts']

        # Pack the input values into a tf.train.Example, one float feature per column
        example = tf.train.Example()
        for name, value in zip(feature_col, X):
            example.features.feature[name].float_list.value.append(value)

        # Locate the exported model: the first entry whose name starts with a digit
        path = "/mnt/Model_Network"
        exported_path = None
        for entry in os.listdir(path):
            if re.match('[0-9]', entry):
                exported_path = os.path.join(path, entry)
                break
        if exported_path is None:
            raise FileNotFoundError("No exported model found under %s" % path)

        # Load the SavedModel and run its 'predict' signature on the serialized Example
        predictor = tf.contrib.predictor.from_saved_model(exported_path, signature_def_key='predict')
        output_dict = predictor({"examples": [example.SerializeToString()]})

        print(output_dict.items())
        return output_dict['class_ids']
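The predict() method uses the first directory whose name starts with a digit. If several exports accumulate on the volume, one might prefer the most recent one. The sketch below assumes the export directories are named with numeric timestamps, which network_model.py (not shown here) would need to confirm; `latest_export_dir` is a hypothetical helper.

```python
import os
import tempfile

def latest_export_dir(base_dir):
    """Return the sub-directory of base_dir with the largest all-digit name."""
    candidates = [
        d for d in os.listdir(base_dir)
        if d.isdigit() and os.path.isdir(os.path.join(base_dir, d))
    ]
    if not candidates:
        raise FileNotFoundError("No exported model found under %s" % base_dir)
    return os.path.join(base_dir, max(candidates, key=int))

# Demonstration on a temporary directory with made-up export names
with tempfile.TemporaryDirectory() as base:
    for name in ("1588233000", "1588236600", "logs"):
        os.mkdir(os.path.join(base, name))
    chosen = os.path.basename(latest_export_dir(base))
print(chosen)
```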

## Train Network model remotely on Kubeflow
Kubeflow Fairing packages the NetworkServe class, the training data, and the training job's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the training job on Kubeflow.

In [12]:
train_job = TrainJob(NetworkServe, input_files=["Network_Traffic.csv", "requirements.txt"],
                     pod_spec_mutators = [mounting_pvc(pvc_name="nfs1", pvc_mount_path="/mnt/Model_Network")],
                     docker_registry=DOCKER_REGISTRY, backend=BackendClass(build_context_source=minio_context_source))
train_job.submit()

Using default base docker image: registry.hub.docker.com/library/python:3.6.9
Using builder: <class 'kubeflow.fairing.builders.cluster.cluster.ClusterBuilder'>
Building the docker image.
Building image using cluster builder.
/home/jovyan/.local/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Creating docker context: /tmp/fairing_context_1t205qfk
/home/jovyan/.local/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Waiting for fairing-builder-49klw-q2rhg to start...
Waiting for fairing-builder-49klw-q2rhg to start...
Waiting for fairing-builder-49klw-q2rhg to start...
Waiting for fairing-builder-49klw-q2rhg to start...
Pod started running True


[36mINFO[0m[0003] Resolved base name registry.hub.docker.com/library/python:3.6.9 to registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0003] Resolved base name registry.hub.docker.com/library/python:3.6.9 to registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0003] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0004] Error while retrieving image from cache: getting file info: stat /cache/sha256:036d4ab50fa49df89e746cf1b5369c88db46e8af2fbd08531788e7d920e9a491: no such file or directory
[36mINFO[0m[0004] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0005] Built cross stage deps: map[]
[36mINFO[0m[0005] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0006] Error while retrieving image from cache: getting file info: stat /cache/sha256:036d4ab50fa49df89e746cf1b5369c88db46e8af2fbd08531788e7d920e9a491: no such file or directory
[36mINFO[0m[0006] Downloading base ima

The job fairing-job-bmhl2 launched.
Waiting for fairing-job-bmhl2-gv6kt to start...
Waiting for fairing-job-bmhl2-gv6kt to start...
Waiting for fairing-job-bmhl2-gv6kt to start...
Waiting for fairing-job-bmhl2-gv6kt to start...
Pod started running True




Cleaning up job fairing-job-bmhl2...


'fairing-job-bmhl2'

## Deploy the trained model to Kubeflow for predictions
Kubeflow Fairing packages the NetworkServe class, the trained model, and the prediction endpoint's software prerequisites as a Docker image. Then Kubeflow Fairing deploys and runs the prediction endpoint on Kubeflow.

In [13]:
endpoint = PredictionEndpoint(NetworkServe, input_files=["Network_Traffic.csv", "requirements.txt"],
                              docker_registry=DOCKER_REGISTRY,
                              pod_spec_mutators = [mounting_pvc(pvc_name="nfs1", pvc_mount_path="/mnt/Model_Network")],
                              backend=BackendClass(build_context_source=minio_context_source))
endpoint.create()

Using default base docker image: registry.hub.docker.com/library/python:3.6.9
Using builder: <class 'kubeflow.fairing.builders.cluster.cluster.ClusterBuilder'>
Building the docker image.
Building image using cluster builder.
/home/jovyan/.local/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Creating docker context: /tmp/fairing_context_pg0en7wg
/home/jovyan/.local/lib/python3.6/site-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
Waiting for fairing-builder-2s879-n9b2q to start...
Waiting for fairing-builder-2s879-n9b2q to start...
Waiting for fairing-builder-2s879-n9b2q to start...
Waiting for fairing-builder-2s879-n9b2q to start...
Pod started running True


[36mINFO[0m[0003] Resolved base name registry.hub.docker.com/library/python:3.6.9 to registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0003] Resolved base name registry.hub.docker.com/library/python:3.6.9 to registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0003] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0004] Error while retrieving image from cache: getting file info: stat /cache/sha256:036d4ab50fa49df89e746cf1b5369c88db46e8af2fbd08531788e7d920e9a491: no such file or directory
[36mINFO[0m[0004] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0005] Built cross stage deps: map[]
[36mINFO[0m[0005] Downloading base image registry.hub.docker.com/library/python:3.6.9
[36mINFO[0m[0006] Error while retrieving image from cache: getting file info: stat /cache/sha256:036d4ab50fa49df89e746cf1b5369c88db46e8af2fbd08531788e7d920e9a491: no such file or directory
[36mINFO[0m[0006] Downloading base ima

Deploying the endpoint.
Cluster endpoint: http://fairing-service-pxnbw.anonymous.svc.cluster.local:5000/predict
Prediction endpoint: http://fairing-service-pxnbw.anonymous.svc.cluster.local:5000/predict


## Wait for the prediction pod to be ready

In [14]:
!kubectl get deploy -l fairing-deployer=serving -n anonymous

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
fairing-deployer-xrz2m   1/1     1            1           88s
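Readiness can also be checked programmatically by reading the READY column of the kubectl table. The parser below is a sketch that assumes the column layout shown above; `deployments_ready` is a hypothetical helper. In a notebook, the text could be captured with IPython's `out = !kubectl get deploy ...` syntax and joined with newlines.

```python
def deployments_ready(kubectl_output):
    """Return True if every listed deployment has all replicas ready.

    Expects the table printed by `kubectl get deploy`, header included.
    """
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return False  # header only, or no output at all
    ready_idx = lines[0].split().index("READY")
    for line in lines[1:]:
        ready, desired = line.split()[ready_idx].split("/")
        if int(ready) != int(desired):
            return False
    return True

sample = (
    "NAME                     READY   UP-TO-DATE   AVAILABLE   AGE\n"
    "fairing-deployer-xrz2m   1/1     1            1           88s\n"
)
print(deployments_ready(sample))
```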


## Get the prediction endpoint

In [15]:
endpoint.url

'http://fairing-service-pxnbw.anonymous.svc.cluster.local:5000/predict'

## Call the prediction endpoint
Use the endpoint URL returned by the previous cell.

In [16]:
! curl -v http://fairing-service-pxnbw.anonymous.svc.cluster.local:5000/predict -H "Content-Type: application/x-www-form-urlencoded" -d 'json={"data":{"ndarray":[0.000000, 0.000000, 0.000000, 0.000000, 0.000005, 0.000000, 0.000000, 0.000000, 0.000000, 0.000004]}}'

*   Trying 10.109.244.138...
* TCP_NODELAY set
* Connected to fairing-service-pxnbw.anonymous.svc.cluster.local (10.109.244.138) port 5000 (#0)
> POST /predict HTTP/1.1
> Host: fairing-service-pxnbw.anonymous.svc.cluster.local:5000
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Type: application/x-www-form-urlencoded
> Content-Length: 126
> 
* upload completely sent off: 126 out of 126 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Type: application/json
< Content-Length: 53
< Access-Control-Allow-Origin: *
< Server: Werkzeug/1.0.1 Python/3.6.9
< Date: Thu, 30 Apr 2020 07:59:19 GMT
< 
{"data":{"names":["t:0"],"ndarray":[[0]]},"meta":{}}
* Closing connection 0
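The request body and the response can also be handled from Python. The sketch below builds the same form field (`json=...`) that the curl command sends and extracts the predicted class IDs from a response of the shape shown above. Both helper names are hypothetical, and the network call itself is left out because the endpoint is only reachable from inside the cluster.

```python
import json
from urllib.parse import urlencode, parse_qs

def build_predict_body(values):
    """Encode feature values as the form field json={"data": {"ndarray": [...]}}."""
    payload = {"data": {"ndarray": list(values)}}
    return urlencode({"json": json.dumps(payload)}).encode("ascii")

def parse_prediction(response_text):
    """Extract the ndarray of predicted class IDs from the JSON response."""
    return json.loads(response_text)["data"]["ndarray"]

features = [0.0, 0.0, 0.0, 0.0, 0.000005, 0.0, 0.0, 0.0, 0.0, 0.000004]
body = build_predict_body(features)

# Response body copied from the curl output above
result = parse_prediction('{"data":{"names":["t:0"],"ndarray":[[0]]},"meta":{}}')
print(result)
```

Inside the cluster, the body could be posted with `urllib.request.urlopen(urllib.request.Request(endpoint.url, data=body))`, which sends it as `application/x-www-form-urlencoded` by default.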


## Clean up the prediction endpoint
Delete the prediction endpoint created by this notebook.

In [17]:
endpoint.delete()

Deleting the endpoint. 
Deleted service: anonymous/fairing-service-pxnbw
Deleted deployment: anonymous/fairing-deployer-xrz2m


## Clean up the TFJob

In [18]:
TFJobClient().delete(tfjob_name, namespace=namespace)

{'kind': 'Status',
 'apiVersion': 'v1',
 'metadata': {},
 'status': 'Success',
 'details': {'name': 'network-fairing-tfjob',
  'group': 'kubeflow.org',
  'kind': 'tfjobs',
  'uid': '294d27c9-acfa-4767-baf1-99261bdcf7c3'}}

## Delete config.json and requirements.txt and restore the Dockerfile

In [19]:
!rm -rf config.json requirements.txt
if device_type=="gpu":
    !sed -i "s/gpu-py3/py3/g" Dockerfile
    !cat Dockerfile

FROM tensorflow/tensorflow:1.14.0-py3
RUN pip install -U scikit-learn
RUN pip install keras pandas imblearn
ADD network_model.py  /opt/network_model.py
ADD Network_Traffic.csv /opt/Network_Traffic.csv
RUN chmod +x /opt/network_model.py  /opt/Network_Traffic.csv
WORKDIR /opt/
RUN mkdir -p /mnt/Model_Network
CMD python network_model.py