## Network Traffic Dataset for malicious attack

This dataset of network traffic flow is generated by CICFlowMeter, indicate whether the traffic is malicious attack (Bot) or not (Benign).                             
CICFlowMeter - network traffic flow generator generates 69 statistical features such as Duration, Number of packets, Number of bytes, Length of packets, etc are also calculated separately in the forward and reverse direction.   
The output of the application is the CSV file format with two columns labeled for each flow, namely Benign or Bot.
The dataset has been organized per day, for each day the raw data including the network traffic (Pcaps) and event logs (windows and Ubuntu event Logs) per machine
are recorded.                  Download the dataset from the below wget command line provided.

In [None]:
! wget https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed+Traffic+Data+for+ML+Algorithms/Friday-02-03-2018_TrafficForML_CICFlowMeter.csv

## Install Libraries

In [1]:
! pip install imblearn
!pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


## Restart Notebook Kernel

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)

## Import Libraries

In [2]:
import tensorflow as tf
import pandas as pd
import numpy as np
import tempfile
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder,MinMaxScaler
from sklearn.model_selection import KFold
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.


## Declare Variables

In [3]:
INPUT_FEATURE = 'x'

lstZerodrp = ['Timestamp', 'BwdPSHFlags', 'FwdURGFlags', 'BwdURGFlags', 'CWEFlagCount', 'FwdBytsbAvg', 'FwdPktsbAvg',
              'FwdBlkRateAvg', 'BwdBytsbAvg',
              'BwdBlkRateAvg', 'BwdPktsbAvg']

lstScaledrp = ['FwdPSHFlags', 'FINFlagCnt', 'SYNFlagCnt', 'RSTFlagCnt', 'PSHFlagCnt', 'ACKFlagCnt', 'URGFlagCnt',
               'ECEFlagCnt']

DATA_FILE = 'Network_Traffic.csv'

In [4]:
def read_dataFile():
    """
    Reads data file and returns dataframe result
    """
    chunksize = 10000
    chunk_list = []
    missing_values = ["n/a", "na", "--", "Infinity", "infinity", "Nan", "NaN"]

    for chunk in pd.read_csv(DATA_FILE, chunksize=chunksize, na_values=missing_values):
        chunk_list.append(chunk)
        break
    dataFrme = pd.concat(chunk_list)

    lstcols = []
    for i in dataFrme.columns:
        i = str(i).replace(' ', '').replace('/', '')
        lstcols.append(i)
    dataFrme.columns = lstcols
    dfAllCpy = dataFrme.copy()
    dataFrme = dataFrme.drop(lstZerodrp, axis=1)
    return dataFrme

## Network Traffic Input Dataset 

### Attribute Information
    Features extracted from the captured traffic using CICFlowMeter-V3 = 69
    After removal of noise/unwarranted features, number of feature columns chosen: 10
    Features: FlowDuration,BwdPktLenMax,FlowIATStd,FwdPSHFlags,BwdPktLenMean,FlowIATMean,BwdIATMean,
              FwdSegSizeMin,InitBwdWinByts,BwdPktLenMin
    Flows labelled: Bot or Benign

In [5]:
read_dataFile().head()

Unnamed: 0,DstPort,Protocol,FlowDuration,TotFwdPkts,TotBwdPkts,TotLenFwdPkts,TotLenBwdPkts,FwdPktLenMax,FwdPktLenMin,FwdPktLenMean,...,FwdSegSizeMin,ActiveMean,ActiveStd,ActiveMax,ActiveMin,IdleMean,IdleStd,IdleMax,IdleMin,Label
0,443,6,141385,9,7,553,3773,202,0,61.444444,...,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Benign
1,49684,6,281,2,1,38,0,38,0,19.0,...,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Benign
2,443,6,279824,11,15,1086,10527,385,0,98.727273,...,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Benign
3,443,6,132,2,0,0,0,0,0,0.0,...,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Benign
4,443,6,274016,9,13,1285,6141,517,0,142.777778,...,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Benign


In [6]:
def preprocess_na(dataFrme):
    """
    Removing NA values
    """
    na_lst = dataFrme.columns[dataFrme.isna().any()].tolist()
    for j in na_lst:
        dataFrme[j].fillna(0, inplace=True)
    return dataFrme

In [7]:
def create_features_label(dataFrme):
    """
    Create independent and Dependent Features
    """
    columns = dataFrme.columns.tolist()
    # Filter the columns to remove data we do not want 
    columns = [c for c in columns if c not in ["Label"]]
    # Store the variable we are predicting 
    target = "Label"
    # Define a random state 
    state = np.random.RandomState(42)
    X = dataFrme[columns]
    Y = dataFrme[target]
    return X, Y

In [8]:
def label_substitution(dataFrme):
    """
    Label substitution : 'Benign'as 0, 'Bot'as 1
    """
    dictLabel = {'Benign': 0, 'Bot': 1}
    dataFrme['Label'] = dataFrme['Label'].map(dictLabel)

    LABELS = ['Benign', 'Bot']
    count_classes = pd.value_counts(dataFrme['Label'], sort=True)
    
    # Get the Benign and the Bot values 
    Benign = dataFrme[dataFrme['Label'] == 0]
    Bot = dataFrme[dataFrme['Label'] == 1]
    return dataFrme

In [9]:
def handle_class_imbalance(X,Y):
    """
    Handle Class imbalancement 
    """
#    os_us = SMOTETomek(ratio=0.5)
#    X_res, y_res = os_us.fit_sample(X, Y)
    ros = RandomOverSampler(random_state=50)
    X_res, y_res = ros.fit_sample(X, Y)
    ibtrain_X = pd.DataFrame(X_res,columns=X.columns)
    ibtrain_y = pd.DataFrame(y_res,columns=['Label']) 
    return ibtrain_X,ibtrain_y

In [10]:
def correlation_features(ibtrain_X):
    """
    Feature Selection - Correlation Ananlysis 
    """
    corr = ibtrain_X.corr()
    cor_columns = np.full((corr.shape[0],), True, dtype=bool)
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if corr.iloc[i, j] >= 0.9:
                if cor_columns[j]:
                    cor_columns[j] = False

    dfcorr_features = ibtrain_X[corr.columns[cor_columns]]
    return dfcorr_features

In [11]:
def top_ten_features(dfcorr_features, ibtrain_X, ibtrain_y):
    """
    Feature Selection - SelectKBest : Return best 10 features. 
    """
    feat_X = dfcorr_features
    feat_y = ibtrain_y['Label']
    # apply SelectKBest class to extract top 10 best features
    bestfeatures = SelectKBest(score_func=f_classif, k=10)
    fit = bestfeatures.fit(feat_X, feat_y)
    dfscores = pd.DataFrame(fit.scores_)
    dfcolumns = pd.DataFrame(feat_X.columns)
    # concat two dataframes for better visualization
    featureScores = pd.concat([dfcolumns, dfscores], axis=1)
    featureScores.columns = ['Features', 'Score']  # naming the dataframe columns
    final_feature = featureScores.nlargest(10, 'Score')['Features'].tolist()
    dictLabel1 = {'Benign': 0, 'Bot': 1}
    ibtrain_y['Label'] = ibtrain_y['Label'].map(dictLabel1)
    selected_X = ibtrain_X[final_feature]
    selected_Y = ibtrain_y['Label']
    return selected_X, selected_Y, final_feature

In [12]:
def normalize_data(selected_X, selected_Y):
    """
    Normalize data 
    """
    scaler = MinMaxScaler(feature_range=(0, 1))
    selected_X = pd.DataFrame(scaler.fit_transform(selected_X), columns=selected_X.columns, index=selected_X.index)
    trainX, testX, trainY, testY = train_test_split(selected_X, selected_Y, test_size=0.25)
    print('-----------------------------------------------------------------')
    print("## Final features and Data pre-process for prediction")
    print('-----------------------------------------------------------------')
    print(testX)
    return trainX, testX, trainY, testY

## Definition of Serving Input Receiver Function

In [13]:
def serving_input_receiver_fn():
    """
    This is used to define inputs to serve the model.
    :return: ServingInputReciever
    """
    receiver_tensors = {
        'FlowDuration': tf.placeholder(tf.float32, [None, 1]),
        'BwdPktLenMax': tf.placeholder(tf.float32, [None, 1]),
        'FlowIATStd': tf.placeholder(tf.float32, [None, 1]),
        'BwdPktLenMean': tf.placeholder(tf.float32, [None, 1]),
        'FwdPSHFlags': tf.placeholder(tf.float32, [None, 1]),
        'FlowIATMean': tf.placeholder(tf.float32, [None, 1]),
        'BwdIATMean': tf.placeholder(tf.float32, [None, 1]),
        'FwdSegSizeMin': tf.placeholder(tf.float32, [None, 1]),
        'InitBwdWinByts': tf.placeholder(tf.float32, [None, 1]),
        'BwdPktLenMin': tf.placeholder(tf.float32, [None, 1])

    }

    # Convert give inputs to adjust to the model.
    features = {
        INPUT_FEATURE: tf.concat([
            receiver_tensors['FlowDuration'],
            receiver_tensors['BwdPktLenMax'],
            receiver_tensors['FlowIATStd'],
            receiver_tensors['BwdPktLenMean'],
            receiver_tensors['FwdPSHFlags'],
            receiver_tensors['FlowIATMean'],
            receiver_tensors['BwdIATMean'],
            receiver_tensors['FwdSegSizeMin'],
            receiver_tensors['InitBwdWinByts'],
            receiver_tensors['BwdPktLenMin'],
        ], axis=1)
    }
    return tf.estimator.export.ServingInputReceiver(receiver_tensors=receiver_tensors,
                                                    features=features)

In [14]:
def get_model(trainX, trainY, testX, testY, final_feature):
    
    """
    Training and Evalution
    """
    
    TF_DATA_DIR = os.getenv("TF_DATA_DIR", "/tmp/data/")
    TF_MODEL_DIR = os.getenv("TF_MODEL_DIR", "network/")
    TF_EXPORT_DIR = os.getenv("TF_EXPORT_DIR", "network/")

    train_X = np.asarray(trainX)
    train_y = np.asarray(trainY)
    feature_columns = [tf.feature_column.numeric_column(INPUT_FEATURE, shape=[10])]

    config = tf.estimator.RunConfig(model_dir=TF_MODEL_DIR, save_summary_steps=10, save_checkpoints_steps=10)

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={INPUT_FEATURE: train_X},
        y=train_y,
        batch_size=32,
        num_epochs=10,
        shuffle=False
    )

    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={INPUT_FEATURE: train_X},
        y=train_y,
        batch_size=32,
        num_epochs=10,
        shuffle=True,
        queue_capacity=10,
        num_threads=1
    )

    model = tf.estimator.DNNClassifier(hidden_units=[13, 65, 110],
                                       feature_columns=feature_columns,
                                       model_dir=TF_MODEL_DIR,
                                       n_classes=2, config=config
                                       )

    export_final = tf.estimator.FinalExporter(TF_EXPORT_DIR, serving_input_receiver_fn=serving_input_receiver_fn)

    train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                        max_steps=1)

    eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn,
                                      steps=1,
                                      exporters=export_final,
                                      throttle_secs=1,
                                      start_delay_secs=1)

    result = tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
    print(result)

## Train and Save Network Traffic Model

In [15]:
def main(unused_args):
    
    tf.logging.set_verbosity(tf.logging.INFO)
    
    '''Reads data file and returns dataframe result'''
    dataFrme = read_dataFile()
  
    ''' Removing NA values'''
    dataFrme = preprocess_na(dataFrme)

    '''Create independent and Dependent Features'''
    X, Y = create_features_label(dataFrme)

    '''Label substitution : 'Benign'as 0, 'Bot'as 1'''
    dataFrme = label_substitution(dataFrme)

    '''Handle Class imbalancement'''
    ibtrain_X, ibtrain_y = handle_class_imbalance(X, Y)

    '''Feature Selection - Correlation Ananlysis'''
    dfcorr_features = correlation_features(ibtrain_X)

    '''Feature Selection - SelectKBest : Return best 10 features'''
    selected_X, selected_Y, final_feature = top_ten_features(dfcorr_features, ibtrain_X, ibtrain_y)

    '''Normalize data '''
    trainX, testX, trainY, testY = normalize_data(selected_X, selected_Y)
    
    '''Train and Evaluate'''
    get_model(trainX, trainY, testX, testY, final_feature)

    print('Training finished successfully')
    
if __name__ == "__main__":
    tf.app.run()

-----------------------------------------------------------------
## Final features and Data pre-process for prediction
-----------------------------------------------------------------
       FlowDuration  BwdIATTot  BwdPktLenMax  FlowIATStd  BwdPktLenMean  \
7678   4.808377e-06   0.000000      0.000000    0.000000       0.000000   
7355   4.150038e-06   0.000000      0.000000    0.000000       0.000000   
1638   4.050037e-06   0.000000      0.000000    0.000000       0.000000   
5709   4.183371e-06   0.000000      0.000000    0.000000       0.000000   
276    2.347830e-03   0.000000      0.000000    0.004407       0.000000   
...             ...        ...           ...         ...            ...   
3842   7.796737e-05   0.000075      0.076712    0.000093       0.023585   
11563  8.333409e-09   0.000000      0.000000    0.000000       0.000000   
9802   9.260917e-05   0.000089      0.076712    0.000111       0.023585   
882    9.928674e-01   0.983333      0.034932    0.013026       0

  f = msb / msw
I0421 09:58:49.229977 140556435072832 estimator.py:209] Using config: {'_model_dir': 'network/', '_tf_random_seed': None, '_save_summary_steps': 10, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd5425c41d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


INFO:tensorflow:Not using Distribute Coordinator.


I0421 09:58:49.232479 140556435072832 estimator_training.py:186] Not using Distribute Coordinator.


INFO:tensorflow:Running training and evaluation locally (non-distributed).


I0421 09:58:49.233749 140556435072832 training.py:612] Running training and evaluation locally (non-distributed).


INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10 or save_checkpoints_secs None.


I0421 09:58:49.234976 140556435072832 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 10 or save_checkpoints_secs None.


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


W0421 09:58:49.256448 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


Instructions for updating:
To construct input pipelines, use the `tf.data` module.


W0421 09:58:49.268494 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_queue_runner.py:62: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.


Instructions for updating:
To construct input pipelines, use the `tf.data` module.


W0421 09:58:49.271283 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_functions.py:500: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.


INFO:tensorflow:Calling model_fn.


I0421 09:58:49.278822 140556435072832 estimator.py:1145] Calling model_fn.


Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


W0421 09:58:49.284144 140556435072832 deprecation.py:506] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Instructions for updating:
Use `tf.cast` instead.


W0421 09:58:49.881215 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/canned/head.py:437: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


W0421 09:58:49.947916 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


W0421 09:58:50.136243 140556435072832 deprecation.py:506] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/training/adagrad.py:76: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


INFO:tensorflow:Done calling model_fn.


I0421 09:58:50.174300 140556435072832 estimator.py:1147] Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


I0421 09:58:50.175942 140556435072832 basic_session_run_hooks.py:541] Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


I0421 09:58:50.320286 140556435072832 monitored_session.py:240] Graph was finalized.


INFO:tensorflow:Running local_init_op.


I0421 09:58:50.385766 140556435072832 session_manager.py:500] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0421 09:58:50.391565 140556435072832 session_manager.py:502] Done running local_init_op.


Instructions for updating:
To construct input pipelines, use the `tf.data` module.


W0421 09:58:50.406064 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:875: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.


INFO:tensorflow:Saving checkpoints for 0 into network/model.ckpt.


I0421 09:58:50.619344 140556435072832 basic_session_run_hooks.py:606] Saving checkpoints for 0 into network/model.ckpt.


INFO:tensorflow:loss = 21.61299, step = 1


I0421 09:58:50.821840 140556435072832 basic_session_run_hooks.py:262] loss = 21.61299, step = 1


INFO:tensorflow:Saving checkpoints for 1 into network/model.ckpt.


I0421 09:58:50.823867 140556435072832 basic_session_run_hooks.py:606] Saving checkpoints for 1 into network/model.ckpt.


INFO:tensorflow:Calling model_fn.


I0421 09:58:50.889825 140556435072832 estimator.py:1145] Calling model_fn.


Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


W0421 09:58:51.151136 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py:2027: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.




W0421 09:58:51.540073 140556435072832 metrics_impl.py:804] Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.




W0421 09:58:51.557292 140556435072832 metrics_impl.py:804] Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.


INFO:tensorflow:Done calling model_fn.


I0421 09:58:51.573993 140556435072832 estimator.py:1147] Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2020-04-21T09:58:51Z


I0421 09:58:51.589221 140556435072832 evaluation.py:255] Starting evaluation at 2020-04-21T09:58:51Z


INFO:tensorflow:Graph was finalized.


I0421 09:58:51.678585 140556435072832 monitored_session.py:240] Graph was finalized.


Instructions for updating:
Use standard file APIs to check for files with this prefix.


W0421 09:58:51.681171 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from network/model.ckpt-1


I0421 09:58:51.683492 140556435072832 saver.py:1280] Restoring parameters from network/model.ckpt-1


INFO:tensorflow:Running local_init_op.


I0421 09:58:51.734582 140556435072832 session_manager.py:500] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0421 09:58:51.759773 140556435072832 session_manager.py:502] Done running local_init_op.


INFO:tensorflow:Evaluation [1/1]


I0421 09:58:51.967252 140556435072832 evaluation.py:167] Evaluation [1/1]


INFO:tensorflow:Finished evaluation at 2020-04-21-09:58:52


I0421 09:58:52.136378 140556435072832 evaluation.py:275] Finished evaluation at 2020-04-21-09:58:52


INFO:tensorflow:Saving dict for global step 1: accuracy = 0.53125, accuracy_baseline = 0.53125, auc = 0.7431372, auc_precision_recall = 0.7068181, average_loss = 0.6070869, global_step = 1, label/mean = 0.46875, loss = 19.42678, precision = 0.0, prediction/mean = 0.4088927, recall = 0.0


I0421 09:58:52.138460 140556435072832 estimator.py:2039] Saving dict for global step 1: accuracy = 0.53125, accuracy_baseline = 0.53125, auc = 0.7431372, auc_precision_recall = 0.7068181, average_loss = 0.6070869, global_step = 1, label/mean = 0.46875, loss = 19.42678, precision = 0.0, prediction/mean = 0.4088927, recall = 0.0


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1: network/model.ckpt-1


I0421 09:58:52.242865 140556435072832 estimator.py:2099] Saving 'checkpoint_path' summary for global step 1: network/model.ckpt-1


INFO:tensorflow:Performing the final export in the end of training.


I0421 09:58:52.244974 140556435072832 exporter.py:410] Performing the final export in the end of training.


INFO:tensorflow:Calling model_fn.


I0421 09:58:52.255735 140556435072832 estimator.py:1145] Calling model_fn.


INFO:tensorflow:Done calling model_fn.


I0421 09:58:52.581747 140556435072832 estimator.py:1147] Done calling model_fn.


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


W0421 09:58:52.584490 140556435072832 deprecation.py:323] From /home/jovyan/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


INFO:tensorflow:Signatures INCLUDED in export for Classify: None


I0421 09:58:52.586727 140556435072832 export_utils.py:170] Signatures INCLUDED in export for Classify: None


INFO:tensorflow:Signatures INCLUDED in export for Regress: None


I0421 09:58:52.587942 140556435072832 export_utils.py:170] Signatures INCLUDED in export for Regress: None


INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']


I0421 09:58:52.589061 140556435072832 export_utils.py:170] Signatures INCLUDED in export for Predict: ['predict']


INFO:tensorflow:Signatures INCLUDED in export for Train: None


I0421 09:58:52.590137 140556435072832 export_utils.py:170] Signatures INCLUDED in export for Train: None


INFO:tensorflow:Signatures INCLUDED in export for Eval: None


I0421 09:58:52.591190 140556435072832 export_utils.py:170] Signatures INCLUDED in export for Eval: None


INFO:tensorflow:Signatures EXCLUDED from export because they cannot be be served via TensorFlow Serving APIs:


I0421 09:58:52.592355 140556435072832 export_utils.py:173] Signatures EXCLUDED from export because they cannot be be served via TensorFlow Serving APIs:


INFO:tensorflow:'serving_default' : Classification input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}


I0421 09:58:52.593506 140556435072832 export_utils.py:176] 'serving_default' : Classification input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}


INFO:tensorflow:'classification' : Classification input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}


I0421 09:58:52.594688 140556435072832 export_utils.py:176] 'classification' : Classification input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}


INFO:tensorflow:'regression' : Regression input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}


I0421 09:58:52.595860 140556435072832 export_utils.py:176] 'regression' : Regression input must be a single string Tensor; got {'FlowDuration': <tf.Tensor 'Placeholder:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMax': <tf.Tensor 'Placeholder_1:0' shape=(?, 1) dtype=float32>, 'FlowIATStd': <tf.Tensor 'Placeholder_2:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMean': <tf.Tensor 'Placeholder_3:0' shape=(?, 1) dtype=float32>, 'FwdPSHFlags': <tf.Tensor 'Placeholder_4:0' shape=(?, 1) dtype=float32>, 'FlowIATMean': <tf.Tensor 'Placeholder_5:0' shape=(?, 1) dtype=float32>, 'BwdIATMean': <tf.Tensor 'Placeholder_6:0' shape=(?, 1) dtype=float32>, 'FwdSegSizeMin': <tf.Tensor 'Placeholder_7:0' shape=(?, 1) dtype=float32>, 'InitBwdWinByts': <tf.Tensor 'Placeholder_8:0' shape=(?, 1) dtype=float32>, 'BwdPktLenMin': <tf.Tensor 'Placeholder_9:0' shape=(?, 1) dtype=float32>}




W0421 09:58:52.597010 140556435072832 export_utils.py:182] Export includes no default signature!


INFO:tensorflow:Restoring parameters from network/model.ckpt-1


I0421 09:58:52.645568 140556435072832 saver.py:1280] Restoring parameters from network/model.ckpt-1


INFO:tensorflow:Assets added to graph.


I0421 09:58:52.666398 140556435072832 builder_impl.py:661] Assets added to graph.


INFO:tensorflow:No assets to write.


I0421 09:58:52.668029 140556435072832 builder_impl.py:456] No assets to write.


INFO:tensorflow:SavedModel written to: network/export/network/temp-b'1587463132'/saved_model.pb


I0421 09:58:52.712650 140556435072832 builder_impl.py:421] SavedModel written to: network/export/network/temp-b'1587463132'/saved_model.pb


INFO:tensorflow:Loss for final step: 21.61299.


I0421 09:58:52.789323 140556435072832 estimator.py:368] Loss for final step: 21.61299.


({'accuracy': 0.53125, 'accuracy_baseline': 0.53125, 'auc': 0.7431372, 'auc_precision_recall': 0.7068181, 'average_loss': 0.6070869, 'label/mean': 0.46875, 'loss': 19.42678, 'precision': 0.0, 'prediction/mean': 0.4088927, 'recall': 0.0, 'global_step': 1}, [b'network/export/network/1587463132'])
Training finished successfully


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


## Update  storageUri in network_kfserving.yaml with pvc-name

In [16]:
pvcname = !(echo  $HOSTNAME | sed 's/.\{2\}$//')
pvc = "workspace-"+pvcname[0]
! sed -i "s/nfs/$pvc/g" network_kfserving.yaml
! cat network_kfserving.yaml

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "network-model"
  namespace: anonymous
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://workspace-network/network/export/network"


## Serving Network Traffic Model using kubeflow kfserving

In [17]:
!kubectl apply -f network_kfserving.yaml -n anonymous

inferenceservice.serving.kubeflow.org/network-model created


In [21]:
!kubectl get inferenceservices -n anonymous

NAME            URL                                                                  READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
network-model   http://network-model.anonymous.example.com/v1/models/network-model   True    100                                28s


#### Note:
Wait for inference service READY=\"True\"

## Predict data from serving after setting INGRESS_IP
### Note - Use one of preprocessed row values from Data pre-process for prediction output cell

In [22]:
! curl -v -H "Host: network-model.anonymous.example.com" http://<<INGRESS_IP>>:<<PORT>>/v1/models/network-model:predict -d '{"signature_name":"predict","instances":[{"FlowDuration":[0.45808569] , "BwdPktLenMax":[0.62440011] , "FlowIATStd":[2.35424276], "FwdPSHFlags":[0.45808569] , "BwdPktLenMean":[0.62440011] , "FlowIATMean":[2.35424276] , "BwdIATMean":[0.45808569] , "FwdSegSizeMin":[0.62440011] , "InitBwdWinByts":[2.35424276] , "BwdPktLenMin":[0.62440011]}]}'

*   Trying 10.30.118.172...
* TCP_NODELAY set
* Connected to 10.30.118.172 (10.30.118.172) port 31380 (#0)
> POST /v1/models/network-model:predict HTTP/1.1
> Host: network-model.anonymous.example.com
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 339
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 339 out of 339 bytes
< HTTP/1.1 200 OK
< content-length: 318
< content-type: application/json
< date: Tue, 21 Apr 2020 10:05:58 GMT
< x-envoy-upstream-service-time: 9568
< server: istio-envoy
< 
{
    "predictions": [
        {
            "logits": [-1.10299587],
            "class_ids": [0],
            "classes": ["0"],
            "all_class_ids": [0, 1],
            "logistic": [0.249178976],
            "all_classes": ["0", "1"],
            "probabilities": [0.750821054, 0.249178976]
        }
    ]
* Connection #0 to host 10.30.118.172 left intact
}

## Delete kfserving model & Clean up of stored models

In [23]:
!kubectl delete -f network_kfserving.yaml
!rm -rf /mnt/network
pvcname = !(echo  $HOSTNAME | sed 's/.\{2\}$//')
pvc = "workspace-"+pvcname[0]
! sed -i "s/$pvc/nfs/g" network_kfserving.yaml
! cat network_kfserving.yaml

inferenceservice.serving.kubeflow.org "network-model" deleted
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "network-model"
  namespace: anonymous
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "pvc://nfs/network/export/network"
