# APS SYSTEM FAILURE PREDICTION IN SCANIA TRUCKS

The dataset consists of data collected from heavy Scania trucks in everyday use. The system in focus is the Air Pressure System (APS), which generates pressurized air used in various truck functions, such as braking and gear changes. The positive class consists of failures of a specific APS component. The negative class consists of trucks with failures of components unrelated to the APS. The data is a subset of all available data, selected by experts. The training set contains 60000 examples, of which 59000 belong to the negative class and 1000 to the positive class. The test set contains 16000 examples. There are 171 attributes per record. The data was imported from the UCI ML Repository: https://archive.ics.uci.edu/ml/datasets/APS+Failure+at+Scania+Trucks
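
The class counts above imply a strongly imbalanced problem, which matters when reading accuracy figures. The short sketch below only redoes the arithmetic from the description; the variable names are illustrative and not part of the original notebook.

```python
# Class counts for the training set, as stated in the dataset description
n_neg = 59000
n_pos = 1000
n_total = n_neg + n_pos

# Fraction of positive (APS failure) examples in the training set
positive_rate = n_pos / n_total

# Accuracy of a trivial classifier that always predicts the negative class
majority_baseline = n_neg / n_total

print("positive rate: %.2f%%" % (positive_rate * 100))
print("majority-class baseline accuracy: %.2f%%" % (majority_baseline * 100))
```

A model that never flags a failure would therefore already reach roughly 98.3% accuracy on the training data, so accuracy values are best compared against that baseline.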

In [9]:
! wget https://archive.ics.uci.edu/ml/machine-learning-databases/00421/aps_failure_training_set.csv
! wget https://archive.ics.uci.edu/ml/machine-learning-databases/00421/aps_failure_test_set.csv

# Deep Learning Model with Keras Library

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from keras.preprocessing import sequence
from keras.models import load_model
from keras.layers import Dense
from keras.layers import Input, LSTM
from keras.models import Model
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

train = pd.read_csv("aps_failure_training_set.csv", skiprows=20)
test = pd.read_csv("aps_failure_test_set.csv", skiprows=20)


def pre_process(df):
    # Treat the 'na' marker as missing and impute missing values with 0
    df = df.replace('na', np.nan).fillna(0)
    # Encode the target column: neg -> 0, pos -> 1
    df["class"] = df["class"].map({'neg': 0, 'pos': 1})
    # Convert every column (features and target) to float
    return df.astype(float)



df = pre_process(train)
test_df = pre_process(test)

X = df.drop("class", axis=1)
Y = df["class"]

X_test = test_df.drop("class", axis=1)
Y_test = test_df["class"]


batch_size = 64
epochs = 120

# Baseline model for the neural network: a single hidden layer of 10 neurons.
# Using far fewer hidden units than the 170 input features forces the network
# to compress redundant inputs and rely on the most informative ones.
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=170, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # Compile the model with the logarithmic (binary cross-entropy) loss and the Adam optimizer.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Evaluate model using standardized dataset. 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))



Using TensorFlow backend.


Results: 99.17% (0.08%)
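
Placing the scaler inside the pipeline means that, in each cross-validation fold, the scaling statistics are estimated on the training part only and then reused on the held-out part. The sketch below illustrates that idea in plain Python on made-up numbers; `fit_standardizer`, `standardize`, and the sample values are hypothetical helpers for illustration, not the scikit-learn implementation.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Estimate the mean and standard deviation from training values only."""
    mu = mean(values)
    # Population standard deviation (ddof=0), the convention StandardScaler uses
    sigma = pstdev(values)
    # Assumption: fall back to 1.0 for a constant feature to avoid dividing by zero
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously fitted statistics to any set of values."""
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one feature in a training fold and a held-out fold
train_fold = [2.0, 4.0, 6.0, 8.0]
held_out = [5.0, 10.0]

mu, sigma = fit_standardizer(train_fold)
train_scaled = standardize(train_fold, mu, sigma)
held_out_scaled = standardize(held_out, mu, sigma)

print(train_scaled)
print(held_out_scaled)
```

The held-out values are scaled with the training statistics, so no information from the validation fold leaks into the preprocessing step.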


## Fit model on training set to predict the test set

In [2]:
# Fit the full pipeline (scaler + network) on the entire training set
pipeline.fit(X, Y)

test_prediction = pipeline.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(Y_test, test_prediction)
print("accuracy of the model on test data set: %.2f%% " % (accuracy * 100))



accuracy of the model on test data set: 98.98% 
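
Because positives are rare in this dataset, accuracy alone says little about how many failures are actually caught. As a hedged illustration, the sketch below derives precision and recall from confusion counts on a tiny made-up label set; `confusion_counts` and the toy labels are assumptions for demonstration and do not reproduce the notebook's results.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Toy labels (hypothetical): 8 negatives and 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

toy_accuracy = (tp + tn) / len(y_true)
toy_recall = tp / (tp + fn)        # share of real failures that were detected
toy_precision = tp / (tp + fp)     # share of predicted failures that were real

print("accuracy=%.2f recall=%.2f precision=%.2f"
      % (toy_accuracy, toy_recall, toy_precision))
```

In the notebook itself, the same counts could be computed from `Y_test` and `test_prediction` after flattening the predictions to a 1-D sequence of 0/1 values.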
