## IT Operations: Root Cause Analysis

A data center team wants to build a model that predicts the root cause of issues reported by customers.
They use a system monitoring tool to track CPU, memory, and application latency for all their servers.
In addition, they track specific errors reported by applications.

### Problem Statement
Using data about CPU load, memory load, network delays, and four types of errors, build a model that predicts the root cause of an incident.
A data set is available in which each incident record indicates whether any load issues or errors were observed during that time.

### Loading the Dataset

In [25]:
import pandas as pd
import os
import tensorflow as tf

data = pd.read_csv('root_cause_analysis.csv')

print(data.dtypes)
data.head()

ID                   int64
CPU_LOAD             int64
MEMORY_LEAK_LOAD     int64
DELAY                int64
ERROR_1000           int64
ERROR_1001           int64
ERROR_1002           int64
ERROR_1003           int64
ROOT_CAUSE          object
dtype: object


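| | ID | CPU_LOAD | MEMORY_LEAK_LOAD | DELAY | ERROR_1000 | ERROR_1001 | ERROR_1002 | ERROR_1003 | ROOT_CAUSE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | MEMORY_LEAK |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | MEMORY_LEAK |
| 2 | 3 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | MEMORY_LEAK |
| 3 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | MEMORY_LEAK |
| 4 | 5 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | NETWORK_DELAY |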


### Convert Data
Input data needs to be converted to formats that ML algorithms can consume: the text labels become integer codes, and the target becomes a one-hot vector.

In [26]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

label_encoder = preprocessing.LabelEncoder()
data['ROOT_CAUSE'] = label_encoder.fit_transform(data['ROOT_CAUSE'])

#Convert the pandas DataFrame to a NumPy array
np_symptom = data.to_numpy().astype(float)

#Extract the feature variables (X)
X = np_symptom[:,1:8]

#Extract the target variable (Y), convert to one-hot encoding
Y = np_symptom[:,8]
Y = tf.keras.utils.to_categorical(Y,3)

#Split into training and test data (10% held out for testing)
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.10)

print('Shape of feature variables: ', X_train.shape)
print('Shape of target variable: ', Y_train.shape)

Shape of feature variables:  (900, 7)
Shape of target variable:  (900, 3)
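
The two encoding steps above can be sketched in plain Python to make the mechanics visible. This is an illustration only, not how scikit-learn or Keras implement them; the helper names `encode_labels` and `one_hot` are hypothetical. It relies on `LabelEncoder` assigning integer codes in sorted class order.

```python
def encode_labels(labels):
    """Map each string label to an integer code, using sorted class order."""
    classes = sorted(set(labels))
    index = {name: code for code, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


def one_hot(codes, num_classes):
    """Turn integer codes into one-hot rows of length num_classes."""
    return [[1.0 if col == code else 0.0 for col in range(num_classes)]
            for code in codes]


# Label names taken from the sample rows and predictions in this notebook.
labels = ["MEMORY_LEAK", "NETWORK_DELAY", "DATABASE_ISSUE", "MEMORY_LEAK"]
classes, codes = encode_labels(labels)
targets = one_hot(codes, len(classes))

print(classes)     # ['DATABASE_ISSUE', 'MEMORY_LEAK', 'NETWORK_DELAY']
print(codes)       # [1, 2, 0, 1]
print(targets[0])  # [0.0, 1.0, 0.0]
```

Each one-hot row has one position per class, which is why the target shape printed above has 3 columns.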


### Building and Evaluating the Model

In [27]:
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2

#Setup training parameters
Epochs=20
Batch_Size=64
Verbose=1
Output_Classes=len(label_encoder.classes_)
N_Hidden=128
Validation_Split=0.2

#Create a Keras sequential model
model = tf.keras.models.Sequential()
#Add a Dense layer
model.add(keras.layers.Dense(N_Hidden,
                            input_shape=(7,),
                            name='Dense-Layer-1',
                            activation='relu'))

#Add a second dense layer
model.add(keras.layers.Dense(N_Hidden,
                              name='Dense-Layer-2',
                              activation='relu'))

#Add a softmax layer for categorical prediction
model.add(keras.layers.Dense(Output_Classes,
                             name='Final',
                             activation='softmax'))

#Compile the model
model.compile(
              optimizer='rmsprop',  #Keras default when no optimizer is given
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

#Train the model
model.fit(X_train,
          Y_train,
          batch_size=Batch_Size,
          epochs=Epochs,
          verbose=Verbose,
          validation_split=Validation_Split)


#Evaluate the model against the test dataset and print results
print("\nEvaluation against Test Dataset :\n")
model.evaluate(X_test,Y_test)

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 Dense-Layer-1 (Dense)       (None, 128)               1024      
                                                                 
 Dense-Layer-2 (Dense)       (None, 128)               16512     
                                                                 
 Final (Dense)               (None, 3)                 387       
                                                                 
Total params: 17,923
Trainable params: 17,923
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Evaluation against Test Dataset :



[0.5805673599243164, 0.75]
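
The parameter counts in the model summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the numbers; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 7 input features -> 128 -> 128 -> 3 output classes, as in the model above.
layer_sizes = [7, 128, 128, 3]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [1024, 16512, 387]
print(sum(per_layer))  # 17923
```

The list returned by `model.evaluate` holds the loss followed by the metrics passed to `compile`, so here it reads as `[loss, accuracy]`.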

### Predicting Root Causes

In [30]:
#Pass individual flags to predict the root cause
import numpy as np

CPU_Load=1
Memory_Load=0
Delay=0
Error_1000=0
Error_1001=1
Error_1002=1   #flags are binary (0 or 1), as in the training data
Error_1003=0

prediction = np.argmax(model.predict(
    [[CPU_Load,Memory_Load,Delay,
      Error_1000,Error_1001,Error_1002,Error_1003]]), axis=1)

print(label_encoder.inverse_transform(prediction))

['DATABASE_ISSUE']
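
When passing individual flags, it is easy to get the column order wrong or to pass a value the model never saw during training. The following is a minimal sketch of a guard, assuming every feature is a 0/1 flag (as the sample rows suggest) and that the column order matches the dtypes listing above. `FEATURE_ORDER` and `build_feature_row` are hypothetical names introduced for illustration.

```python
# Column order of the seven features, as listed in the dtypes output (ID excluded).
FEATURE_ORDER = ["CPU_LOAD", "MEMORY_LEAK_LOAD", "DELAY",
                 "ERROR_1000", "ERROR_1001", "ERROR_1002", "ERROR_1003"]


def build_feature_row(flags):
    """Return the flags as a list in training column order, validating each one."""
    missing = [name for name in FEATURE_ORDER if name not in flags]
    if missing:
        raise KeyError(f"missing flags: {missing}")
    row = []
    for name in FEATURE_ORDER:
        value = flags[name]
        # Assumption: every feature is a binary indicator.
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        row.append(value)
    return row


incident = {"CPU_LOAD": 1, "MEMORY_LEAK_LOAD": 0, "DELAY": 0,
            "ERROR_1000": 0, "ERROR_1001": 1, "ERROR_1002": 1, "ERROR_1003": 0}
row = build_feature_row(incident)
print(row)  # [1, 0, 0, 0, 1, 1, 0]
```

The resulting row could then be wrapped in a 2-D array, for example `np.array([row])`, before being handed to `model.predict`.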


In [31]:
#Predicting as a Batch
print(label_encoder.inverse_transform(np.argmax(
        model.predict([[1,0,0,0,1,1,0],
                                [0,1,1,1,0,0,0],
                                [1,1,0,1,1,0,1],
                                [0,0,0,0,0,1,0],
                                [1,0,1,0,1,1,1]]), axis=1 )))

['DATABASE_ISSUE' 'NETWORK_DELAY' 'MEMORY_LEAK' 'DATABASE_ISSUE'
 'DATABASE_ISSUE']
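
The decoding step used in both prediction cells (take the index of the largest softmax probability, then map it back to a class name) can be sketched without any libraries. The probabilities below are made up for illustration and are not outputs of the model above; `decode_predictions` is a hypothetical helper, and the class order assumes the sorted order that `LabelEncoder` produces.

```python
def decode_predictions(probabilities, classes):
    """Pick the highest-probability class name for each row."""
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        decoded.append(classes[best])
    return decoded


classes = ["DATABASE_ISSUE", "MEMORY_LEAK", "NETWORK_DELAY"]

# Illustrative softmax outputs, one row per incident (not real model output).
probabilities = [[0.70, 0.10, 0.20],
                 [0.15, 0.25, 0.60]]

print(decode_predictions(probabilities, classes))
# ['DATABASE_ISSUE', 'NETWORK_DELAY']
```

Keeping the full probability row, rather than only the argmax, can be useful when the top two classes are close and the prediction deserves a second look.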
