# Credit card fraud detection with a neural network

Using the skills I learned from the 365 Data Science Bootcamp, I have created a machine learning model that recognizes fraudulent credit card transactions. The dataset was downloaded from https://www.kaggle.com/mlg-ulb/creditcardfraud

The .csv file contains the transaction data. Columns V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Column 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount, which can be used, for instance, for example-dependent cost-sensitive learning. Column 'Class' is the response variable: it takes the value 1 in case of fraud and 0 otherwise.

This is a classification problem with two classes. The goal is to classify both fraudulent and non-fraudulent transactions as accurately as possible.
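
Because both classes matter, it helps to look at accuracy per class and not only overall. In the original Kaggle data only about 0.17% of transactions are fraudulent, so a model could score high overall accuracy while missing most fraud. The following is a minimal sketch of per-class evaluation; the function name `confusion_counts` and the label lists are hypothetical and are not part of the notebook.

```python
def confusion_counts(targets, predictions):
    """Return (tn, fp, fn, tp) for binary labels where 1 = fraud and 0 = non-fraud."""
    tn = fp = fn = tp = 0
    for target, predicted in zip(targets, predictions):
        if target == 1 and predicted == 1:
            tp += 1
        elif target == 1 and predicted == 0:
            fn += 1
        elif target == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical labels, for illustration only
targets = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(targets, predictions)
accuracy = (tp + tn) / len(targets)   # share of all transactions classified correctly
fraud_recall = tp / (tp + fn)         # share of fraud cases that were caught
non_fraud_recall = tn / (tn + fp)     # share of non-fraud cases left alone

print(f'accuracy: {accuracy:.2f}')
print(f'fraud recall: {fraud_recall:.2f}')
print(f'non-fraud recall: {non_fraud_recall:.2f}')
```

The same counts could be computed from the model's predicted classes on the test set to see how each class contributes to the reported accuracy.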

## Create the machine learning algorithm

### Import the relevant libraries

In [1]:
import numpy as np
import tensorflow as tf

### Data

In [2]:
npz = np.load('creditcard_fraud_train.npz')

# extract the inputs using the keyword under which we saved them
# cast the inputs to float
train_inputs = npz['inputs'].astype(np.float64)
# cast the targets to int so they can be used with sparse_categorical_crossentropy
train_targets = npz['targets'].astype(np.int64)

# load the validation data into the temporary variable
npz = np.load('creditcard_fraud_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

# load the test data into the temporary variable
npz = np.load('creditcard_fraud_test.npz')
test_inputs, test_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)
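
The preprocessing that produced the three `.npz` files is not shown here. The training log reports 8 steps per epoch at a batch size of 100, so the training set holds at most 800 rows, which suggests the data was subsampled. The sketch below shows one way such files could be produced, assuming class balancing by undersampling, standardization with training-set statistics, and an 80/10/10 split. All of these steps are assumptions, and synthetic data stands in for the Kaggle .csv.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Kaggle table: 30 feature columns and a 0/1 class label.
# (Assumption: the real preprocessing would read the downloaded .csv here.)
n_rows, n_features = 2000, 30
features = rng.normal(size=(n_rows, n_features))
labels = (rng.random(n_rows) < 0.05).astype(np.int64)  # roughly 5% "fraud"

# Balance the classes by undersampling the majority class (assumed step).
fraud_idx = np.flatnonzero(labels == 1)
non_fraud_idx = rng.choice(np.flatnonzero(labels == 0), size=fraud_idx.size, replace=False)
keep = rng.permutation(np.concatenate([fraud_idx, non_fraud_idx]))
inputs, targets = features[keep], labels[keep]

# 80/10/10 split into train, validation and test (assumed proportions).
n = inputs.shape[0]
n_train = int(0.8 * n)
n_validation = int(0.1 * n)
splits = {
    'train': slice(0, n_train),
    'validation': slice(n_train, n_train + n_validation),
    'test': slice(n_train + n_validation, n),
}

# Standardize every column using statistics from the training rows only.
mean = inputs[splits['train']].mean(axis=0)
std = inputs[splits['train']].std(axis=0)
inputs = (inputs - mean) / std

# Save each split under the keywords 'inputs' and 'targets'.
out_dir = tempfile.mkdtemp()
for name, rows in splits.items():
    path = os.path.join(out_dir, f'creditcard_fraud_{name}.npz')
    np.savez(path, inputs=inputs[rows], targets=targets[rows])

reloaded = np.load(os.path.join(out_dir, 'creditcard_fraud_train.npz'))
print(reloaded['inputs'].shape, reloaded['targets'].shape)
```

Files written this way can be read back with `np.load` and the `'inputs'` and `'targets'` keywords, as in the cell above.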

### Model
Outline, optimizers, loss, early stopping and training

In [4]:
# Set the input and output sizes
input_size = 30
output_size = 2
hidden_layer_size = 50
    
# define what the model will look like
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

batch_size = 100

# set a maximum number of training epochs
max_epochs = 100

# set an early stopping mechanism
# set patience=2, to be a bit tolerant against random validation loss increases
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

# fit the model
model.fit(train_inputs,
          train_targets,
          batch_size=batch_size,
          epochs=max_epochs,
          # early stopping monitors val_loss and stops training once it no longer improves
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose=2 # print one line per epoch
          )

Epoch 1/100
8/8 - 0s - loss: 0.8062 - accuracy: 0.4828 - val_loss: 0.5994 - val_accuracy: 0.7041
Epoch 2/100
8/8 - 0s - loss: 0.5231 - accuracy: 0.8158 - val_loss: 0.4168 - val_accuracy: 0.9184
Epoch 3/100
8/8 - 0s - loss: 0.3740 - accuracy: 0.8983 - val_loss: 0.3055 - val_accuracy: 0.9388
Epoch 4/100
8/8 - 0s - loss: 0.2814 - accuracy: 0.9301 - val_loss: 0.2315 - val_accuracy: 0.9592
Epoch 5/100
8/8 - 0s - loss: 0.2191 - accuracy: 0.9390 - val_loss: 0.1812 - val_accuracy: 0.9592
Epoch 6/100
8/8 - 0s - loss: 0.1772 - accuracy: 0.9454 - val_loss: 0.1448 - val_accuracy: 0.9592
Epoch 7/100
8/8 - 0s - loss: 0.1458 - accuracy: 0.9581 - val_loss: 0.1191 - val_accuracy: 0.9592
Epoch 8/100
8/8 - 0s - loss: 0.1235 - accuracy: 0.9606 - val_loss: 0.0995 - val_accuracy: 0.9796
Epoch 9/100
8/8 - 0s - loss: 0.1060 - accuracy: 0.9644 - val_loss: 0.0845 - val_accuracy: 0.9796
Epoch 10/100
8/8 - 0s - loss: 0.0929 - accuracy: 0.9670 - val_loss: 0.0734 - val_accuracy: 0.9694
Epoch 11/100
8/8 - 0s - loss:

<tensorflow.python.keras.callbacks.History at 0x1be1f983100>
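
To make the effect of `patience=2` concrete, here is a simplified sketch of the early stopping rule: training stops once the validation loss has failed to improve on its best value for two consecutive epochs. This is an illustration of the idea, not the Keras implementation, and the function `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None if it never stops early."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, for illustration only
val_losses = [0.60, 0.42, 0.31, 0.33, 0.30, 0.32, 0.31, 0.29]

# A single bad epoch (epoch 4) is tolerated; two in a row (epochs 6 and 7) are not.
print(stopping_epoch(val_losses, patience=2))
```

Note that the weights at the stopping epoch are not necessarily the best ones seen. `tf.keras.callbacks.EarlyStopping` accepts `restore_best_weights=True` to roll back to the epoch with the lowest monitored value.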

## Test the model
Test the final predictive power of the model by running it on the test dataset, which the algorithm has never seen before.

In [5]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [6]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.04. Test accuracy: 97.98%
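
To interpret these two numbers: the loss is the mean negative log-probability that the softmax output assigns to the true class, and the accuracy is the share of samples whose most probable class matches the target. A loss of 0.04 corresponds to a geometric-mean probability of about exp(-0.04) ≈ 0.96 for the true class. The sketch below computes both quantities by hand in plain Python; the logits and targets are hypothetical, and the helper functions are illustrative stand-ins, not the Keras code.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(probabilities, targets):
    """Mean negative log-probability of the true class; targets are integer class indices."""
    return -sum(math.log(p[t]) for p, t in zip(probabilities, targets)) / len(targets)


def accuracy(probabilities, targets):
    """Share of samples whose most probable class equals the target."""
    hits = sum(1 for p, t in zip(probabilities, targets) if p.index(max(p)) == t)
    return hits / len(targets)


# Hypothetical output-layer scores for three transactions (class 0 = non-fraud, 1 = fraud)
logits = [[2.0, -1.0], [-0.5, 1.5], [0.3, 0.1]]
targets = [0, 1, 1]

probabilities = [softmax(z) for z in logits]
loss = sparse_categorical_crossentropy(probabilities, targets)
acc = accuracy(probabilities, targets)

print(f'loss: {loss:.2f}, accuracy: {acc * 100:.2f}%')
```

In this toy example the third transaction is misclassified, so it both lowers the accuracy and contributes the largest term to the loss.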
