# Credit Rating Prediction
### Deep Learning Classifiers with Neural Networks and Multi-layer Perceptrons (MLP)

## Introduction

Similar to the classifier in the second project of this portfolio, this project continues to explore how to predict a corporation's credit rating from its financial indicators. Can we predict corporations' credit ratings from historical ratings data and the companies' financials?

To answer this question, I built two Neural Networks, one with a ReLU and one with a sigmoid activation function, and a Multi-layer Perceptron (MLP) classifier. All three predict companies' credit ratings from historical ratings and financial indicators.

In [86]:
# import 
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.neural_network import MLPClassifier

To make meaningful inferences, we want the historical rating data to be sequential, or at least to contain multiple records for each company. We therefore keep only the companies that have more than 2 rating records in the data.
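Below is a toy illustration of the filtering step. It uses a hypothetical three-company table, not the real data. It also shows an equivalent vectorized mask built with `transform("size")`. That version avoids calling a Python lambda once per group, which tends to matter more on large frames.

```python
import pandas as pd

# Toy ratings table: company "A" has 3 records, "B" has 2, "C" has 4.
toy = pd.DataFrame({
    "Symbol": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Rating": ["AA", "AA", "A", "BBB", "BB", "B", "B", "BB", "B"],
})

# groupby(...).filter keeps every row of a group whose predicate is True,
# so companies with 2 or fewer records are dropped entirely.
kept = toy.groupby("Symbol").filter(lambda g: len(g) > 2)

# Equivalent boolean mask: broadcast each group's size back onto its rows.
alt = toy[toy.groupby("Symbol")["Rating"].transform("size") > 2]

print(sorted(kept["Symbol"].unique()))  # ['A', 'C']
print(len(kept))                        # 7
```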

In [123]:
# import data
corporate_pd = pd.read_csv("../corporate_rating.csv")
corporate_pd = corporate_pd.drop(['Name','Date','Rating Agency Name','Sector'],axis = 1)

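# drop corporations with 2 or fewer observations
# https://stackoverflow.com/questions/29836836/how-do-i-filter-a-pandas-dataframe-based-on-value-counts
corporate_filtered = corporate_pd.groupby('Symbol').filter(lambda x: len(x) > 2)
corporate_filtered = corporate_filtered.drop('Symbol',axis = 1)

# convert the ratings to numerical values
# note: this pairing relies on the order in which unique() returns the labels
ratings = corporate_filtered['Rating'].unique()
values = [2,3,1,4,5,6,9,7,0,8]
# assign the result back instead of replace(..., inplace=True) on a column,
# which may not update the DataFrame under pandas copy-on-write
corporate_filtered['Rating'] = corporate_filtered['Rating'].map(dict(zip(ratings, values)))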
corporate_filtered.head()

| | Rating | currentRatio | quickRatio | cashRatio | daysOfSalesOutstanding | netProfitMargin | pretaxProfitMargin | grossProfitMargin | operatingProfitMargin | returnOnAssets | ... | effectiveTaxRate | freeCashFlowOperatingCashFlowRatio | freeCashFlowPerShare | cashPerShare | companyEquityMultiplier | ebitPerRevenue | enterpriseValueMultiple | operatingCashFlowPerShare | operatingCashFlowSalesRatio | payablesTurnover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.945894 | 0.426395 | 0.09969 | 44.203245 | 0.03748 | 0.049351 | 0.176631 | 0.06151 | 0.041189 | ... | 0.202716 | 0.437551 | 6.810673 | 9.809403 | 4.008012 | 0.049351 | 7.057088 | 15.565438 | 0.058638 | 3.906655 |
| 1 | 3 | 1.033559 | 0.498234 | 0.20312 | 38.991156 | 0.044062 | 0.048857 | 0.175715 | 0.066546 | 0.053204 | ... | 0.074155 | 0.541997 | 8.625473 | 17.40227 | 3.156783 | 0.048857 | 6.460618 | 15.91425 | 0.067239 | 4.002846 |
| 2 | 3 | 0.963703 | 0.451505 | 0.122099 | 50.841385 | 0.032709 | 0.044334 | 0.170843 | 0.059783 | 0.032497 | ... | 0.214529 | 0.513185 | 9.693487 | 13.103448 | 4.094575 | 0.044334 | 10.49197 | 18.888889 | 0.074426 | 3.48351 |
| 3 | 3 | 1.019851 | 0.510402 | 0.176116 | 41.161738 | 0.020894 | -0.012858 | 0.138059 | 0.04243 | 0.02569 | ... | 1.816667 | -0.14717 | -1.015625 | 14.440104 | 3.63095 | -0.012858 | 4.080741 | 6.901042 | 0.028394 | 4.58115 |
| 4 | 3 | 0.957844 | 0.495432 | 0.141608 | 47.761126 | 0.042861 | 0.05377 | 0.17772 | 0.065354 | 0.046363 | ... | 0.166966 | 0.451372 | 7.135348 | 14.257556 | 4.01278 | 0.05377 | 8.293505 | 15.808147 | 0.058065 | 3.85779 |
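The encoding above pairs the labels returned by `unique()` with a hard-coded list of codes. The result therefore depends on the order in which each label first appears in the data. Below is a sketch of an order-independent alternative. It assumes the ten labels are the letter grades AAA, AA, A, BBB, BB, B, CCC, CC, C and D, encoded from best (0) to worst (9). That ordering is our assumption, so check it against `corporate_filtered['Rating'].unique()` before relying on it. The names `RATING_ORDER` and `encode_ratings` are hypothetical.

```python
import pandas as pd

# ASSUMED ordering of rating labels, best to worst; verify against the real data.
RATING_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]
RATING_TO_CODE = {label: code for code, label in enumerate(RATING_ORDER)}

def encode_ratings(labels: pd.Series) -> pd.Series:
    """Map rating labels to ordinal codes 0 (best) .. 9 (worst)."""
    codes = labels.map(RATING_TO_CODE)
    # Labels missing from the mapping become NaN; fail loudly instead of silently.
    unknown = labels[codes.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"unmapped rating labels: {list(unknown)}")
    return codes.astype(int)

example = pd.Series(["A", "BBB", "AAA", "D"])
print(encode_ratings(example).tolist())  # [2, 3, 0, 9]
```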


In [124]:
corporate_filtered_np = corporate_filtered.to_numpy()
corporate_filtered_np

array([[ 2.        ,  0.9458936 ,  0.42639463, ..., 15.56543837,
         0.05863769,  3.90665455],
       [ 3.        ,  1.03355902,  0.49823374, ..., 15.91424968,
         0.06723853,  4.00284605],
       [ 3.        ,  0.96370344,  0.45150542, ..., 18.88888889,
         0.07442633,  3.48350951],
       ...,
       [ 5.        ,  0.88387525,  0.84255282, ...,  1.5753285 ,
         0.28363421,  2.30016775],
       [ 5.        ,  0.91171323,  0.74835646, ...,  1.07444056,
         0.21778343,  1.99760765],
       [ 6.        ,  1.0850071 ,  1.02637452, ...,  2.25865015,
         0.25260643,  1.86568167]])

## Split the data into 10% testing and 90% training

To estimate how well the models generalize, we split the filtered data into 90% training and 10% testing. We first train each model on the training data and then measure its prediction accuracy on the test set.

In [125]:
# sample 10% of the total data in the test set
def split_train_test(data):
    np.random.seed(0)
    # sample 10% of the total number of indices 
    index = np.random.choice(len(data), size = len(data) // 10, replace = False)
    # save 10% in the test set
    test = data[index,:]
    # save the rest in the training set
    train = np.delete(data, index, axis = 0)
    return train,test

In [126]:
train,test = split_train_test(corporate_filtered_np)
print(test.shape)
print(train.shape)

(173, 26)
(1565, 26)
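Below is a quick sanity check on the split. It uses a stand-in array with the same shape as `corporate_filtered_np`: 1738 rows (173 + 1565) and 26 columns, with column 0 holding a row id. The check confirms that the test set has `len(data) // 10` rows and that no row lands in both sets. This sketch uses a local `np.random.default_rng` instead of reseeding the global state with `np.random.seed`. The rows it picks therefore differ from the notebook's split.

```python
import numpy as np

def split_train_test(data, seed=0):
    # Same idea as the notebook's function, but with a local Generator
    # instead of reseeding NumPy's global random state.
    rng = np.random.default_rng(seed)
    index = rng.choice(len(data), size=len(data) // 10, replace=False)
    test = data[index, :]
    train = np.delete(data, index, axis=0)
    return train, test

# Stand-in for corporate_filtered_np: column 0 holds a unique row id.
n_rows, n_cols = 1738, 26
data = np.zeros((n_rows, n_cols))
data[:, 0] = np.arange(n_rows)

train, test = split_train_test(data)
print(test.shape, train.shape)  # (173, 26) (1565, 26)

# No row id appears in both sets, and together they cover every row.
train_ids, test_ids = set(train[:, 0]), set(test[:, 0])
assert train_ids.isdisjoint(test_ids)
assert len(train_ids | test_ids) == n_rows
```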


## Split the data into ratings and features

Each of the deep learning models takes the 25 numerical features in the data as input and outputs a predicted credit rating class. The function defined below splits the data into features (the input, every column after the first) and ratings (the output, the first column).

In [127]:
def split_rating_features(data):
    ratings = data[:,0]
    features = data[:,1:]
    return ratings, features

In [128]:
test_ratings, test_features = split_rating_features(test)
train_ratings, train_features = split_rating_features(train)
test_features.shape

(173, 25)

## Neural Network with ReLU activation function

Here we build the first Neural Network, which uses the ReLU activation function.

This model has 3 layers:
- A fully connected hidden layer that takes the 25 input features and outputs a vector of size 25
- A dropout layer that randomly sets 20% of the hidden units to zero during training
- An output layer that produces a vector of length 10, one score (logit) per credit rating class
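Below is a NumPy sketch of what this model computes for one batch, with random weights standing in for trained ones. The names `W1`, `b1`, `W2`, `b2` and `forward` are illustrative, not Keras internals. Dropout is shown in its "inverted" form, which matches what Keras does in training mode: each unit is zeroed with probability 0.2 and the survivors are scaled by 1/(1 - 0.2). At inference, dropout does nothing. The last line counts the trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 25, 25, 10

# Randomly initialised weights standing in for trained Keras parameters.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(x, training=False, rate=0.2):
    h = np.maximum(0.0, x @ W1 + b1)      # Dense(25, activation='relu')
    if training:
        # Inverted dropout: zero each unit with probability `rate`,
        # scale survivors so the expected activation is unchanged.
        keep = rng.random(h.shape) >= rate
        h = h * keep / (1.0 - rate)
    return h @ W2 + b2                    # Dense(10): raw logits, no softmax

x = rng.normal(size=(4, n_features))      # a batch of 4 fake companies
logits = forward(x)
print(logits.shape)                       # (4, 10)

# Parameters = weights + biases of each Dense layer (Dropout has none).
n_params = (n_features * n_hidden + n_hidden) + (n_hidden * n_classes + n_classes)
print(n_params)                           # 910
```

The same arithmetic for the sigmoid model's Dense(20) hidden layer gives (25·20 + 20) + (20·10 + 10) = 730 parameters.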

In [130]:
# build a neural network with 3 layers
# the output layer has vector size of 10 since we have 10 credit ratings
model1 = tf.keras.models.Sequential([
  tf.keras.layers.Dense(25, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Defining the loss function
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model1.compile(optimizer = 'adam',
              loss = loss_fn,
              metrics = ['accuracy'])
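With `from_logits=True`, the final Dense(10) layer outputs raw scores and the loss applies the softmax internally. Below is a NumPy sketch of that computation. `sparse_ce_from_logits` is our own helper, not the Keras implementation. It gives a useful reference point for reading the test losses reported below: a model that assigns equal probability to all ten classes scores ln 10 ≈ 2.30.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy for integer class labels, computed from raw logits."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # log-softmax, subtracting the row max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class in each row
    return -log_probs[np.arange(len(labels)), labels].mean()

# All-zero logits = uniform guess over 10 classes, so the loss is ln(10).
print(sparse_ce_from_logits(np.zeros((3, 10)), [0, 4, 9]))  # ≈ 2.302585 (ln 10)
```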

In [136]:
# fit the model
train_features = np.asarray(train_features).astype('float32')
model1.fit(train_features, train_ratings, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x221db3e6f70>

In [132]:
# test the model
model1.evaluate(test_features, test_ratings, verbose=2)

6/6 - 0s - loss: 2086.7722 - accuracy: 0.2717 - 120ms/epoch - 20ms/step


[2086.772216796875, 0.27167630195617676]

The Neural Network with the ReLU activation function reaches a test accuracy of only 27%, with a very large test loss of about 2087. That is not good enough for predicting credit ratings. Next, we build another Neural Network that uses the sigmoid activation function instead.
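One plausible contributor to the ReLU model's very large test loss is that the inputs are unscaled. In the preview above, `daysOfSalesOutstanding` sits around 40 to 50, while the margin ratios are small fractions. A common remedy, which this notebook does not try, is to standardize every feature with the training set's mean and standard deviation. The same transform is then applied to the test set. Below is a minimal NumPy sketch on fake data; the helper name `standardize` is ours. scikit-learn's `StandardScaler` does the same thing with `fit` on the training set and `transform` on both sets.

```python
import numpy as np

def standardize(train_X, test_X, eps=1e-8):
    """Scale features using statistics from the training set only."""
    mean = train_X.mean(axis=0)
    std = train_X.std(axis=0)
    # eps guards against division by zero for constant columns
    return (train_X - mean) / (std + eps), (test_X - mean) / (std + eps)

rng = np.random.default_rng(0)
# Two fake features on very different scales: a margin (~0.05) and days outstanding (~45).
train_X = np.column_stack([rng.normal(0.05, 0.02, 500), rng.normal(45.0, 10.0, 500)])
test_X = np.column_stack([rng.normal(0.05, 0.02, 50), rng.normal(45.0, 10.0, 50)])

train_s, test_s = standardize(train_X, test_X)
# Training columns now have mean ~0 and standard deviation ~1.
print(train_s.mean(axis=0), train_s.std(axis=0))
```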

## Neural Network with Sigmoid activation function

In [135]:
model2 = tf.keras.models.Sequential([
  tf.keras.layers.Dense(20, activation='sigmoid'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Defining the loss function
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model2.compile(optimizer = 'adam',
              loss = loss_fn,
              metrics = ['accuracy'])

# fit the model
train_features = np.asarray(train_features).astype('float32')
model2.fit(train_features, train_ratings, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x221da1b7e80>

In [137]:
# test the model
model2.evaluate(test_features,  test_ratings, verbose=2)

6/6 - 0s - loss: 1.6277 - accuracy: 0.3179 - 139ms/epoch - 23ms/step


[1.6276652812957764, 0.3179190754890442]

Compared to the ReLU model, the sigmoid model achieves a higher test accuracy (32% vs. 27%) and a far lower test loss. While experimenting with the number of training epochs, I also noticed that the accuracy of both models actually decreases when the number of epochs gets too large, which suggests overfitting.
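Early stopping is one alternative to hand-tuning `epochs`. Part of the training data is held out for validation, and training stops once the validation score stops improving. Keras provides this as `tf.keras.callbacks.EarlyStopping`, which takes `monitor`, `patience` and `restore_best_weights` arguments and is used together with `validation_split` in `fit`. The plain-Python sketch below only illustrates the stopping rule. The helper `best_epoch` and the accuracy values are made up.

```python
def best_epoch(val_scores, patience=2):
    """Return (best_epoch, stop_epoch) for a metric where higher is better.

    Training stops once `patience` consecutive epochs fail to beat the best score.
    Epochs are 1-indexed to match Keras' "Epoch k/N" logs.
    """
    best_score, best, waited = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best, epoch
    return best, len(val_scores)

# Hypothetical validation accuracies that peak at epoch 3 and then decline.
print(best_epoch([0.22, 0.28, 0.31, 0.30, 0.29, 0.27]))  # (3, 5)
```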

## Multi-layer Perceptron Neural Network

The two Keras models above each have a single hidden layer between the input and output layers. A Multi-layer Perceptron (MLP) is the same kind of fully connected feed-forward network, but it can stack multiple hidden layers in between.

The parameters for an MLP model in the sklearn library include:

- hidden_layer_sizes: a tuple in which each element represents one hidden layer and its value is the number of neurons in that layer.
- learning_rate_init: the initial learning rate, which controls the step size when updating the weights.
- activation: the activation function for the hidden layers. The options are identity, logistic, tanh, and relu; relu is the default.
- random_state: the seed for the random number generation used for weight and bias initialization.
- verbose: whether to print progress messages to standard output.

Here we build an MLP classifier with 4 hidden layers of 12 neurons each.
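Below is a NumPy sketch of the forward pass for this architecture. The weights follow the same layout as `MLPClassifier`'s fitted `coefs_` and `intercepts_`: one (n_in, n_out) matrix and one bias vector per layer. Hidden layers use ReLU and the output layer uses softmax. The weights are random placeholders, and seeding NumPy with 5 does not reproduce sklearn's `random_state=5` initialization. The 10-unit output assumes all ten ratings appear in `train_ratings`. `MLPClassifier.predict` maps the argmax index back through `classes_`.

```python
import numpy as np

layer_sizes = [25, 12, 12, 12, 12, 10]   # input, four hidden layers, output (10 classes assumed)
rng = np.random.default_rng(5)

# One (n_in, n_out) weight matrix and one bias vector per layer.
coefs = [rng.normal(scale=0.3, size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
intercepts = [np.zeros(n) for n in layer_sizes[1:]]

def mlp_predict_proba(X):
    a = X
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        a = np.maximum(0.0, a @ W + b)   # ReLU hidden layers (sklearn's default)
    z = a @ coefs[-1] + intercepts[-1]
    z -= z.max(axis=1, keepdims=True)    # numerically stable softmax over classes
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

X = rng.normal(size=(3, 25))
proba = mlp_predict_proba(X)
print(proba.shape, proba.argmax(axis=1))  # (3, 10) and one class index per row

n_params = sum(W.size + b.size for W, b in zip(coefs, intercepts))
print(n_params)                           # 910
```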

In [154]:
# MLP model
model3 = MLPClassifier(hidden_layer_sizes=(12,12,12,12),
                    random_state=5,
                    verbose=True,
                    learning_rate_init=0.01)

In [155]:
# fit the data
model3.fit(train_features,train_ratings)

Iteration 1, loss = 3.72984181
Iteration 2, loss = 2.45539575
Iteration 3, loss = 2.41872758
Iteration 4, loss = 2.42863294
Iteration 5, loss = 2.33681491
Iteration 6, loss = 2.09645417
Iteration 7, loss = 2.00837448
Iteration 8, loss = 1.93551072
Iteration 9, loss = 1.89590582
Iteration 10, loss = 1.89562857
Iteration 11, loss = 1.86693820
Iteration 12, loss = 1.79030502
Iteration 13, loss = 1.76603240
Iteration 14, loss = 1.71707904
Iteration 15, loss = 1.68011463
Iteration 16, loss = 1.67766815
Iteration 17, loss = 1.65139638
Iteration 18, loss = 1.64835176
Iteration 19, loss = 1.62308430
Iteration 20, loss = 1.62282310
Iteration 21, loss = 1.61585122
Iteration 22, loss = 1.61054233
Iteration 23, loss = 1.60315863
Iteration 24, loss = 1.59611257
Iteration 25, loss = 1.59021970
Iteration 26, loss = 1.58569127
Iteration 27, loss = 1.57938786
Iteration 28, loss = 1.57313440
Iteration 29, loss = 1.56804517
Iteration 30, loss = 1.56376281
Iteration 31, loss = 1.55562172
Iteration 32, los

In [156]:
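# calculate the classification error (misclassification rate): the fraction of
# samples whose predicted class differs from the true class
# (despite the function name, this is a 0-1 loss, not a mean squared error)
def classification_mse(class_truth, pred_class):
    error = 0
    for i in range(len(class_truth)):
        if class_truth[i] != pred_class[i]:
            error = error + 1
    return error/len(class_truth)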
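`classification_mse` counts mismatches in a Python loop. Below is a vectorized NumPy sketch of the same misclassification rate; `misclassification_rate` is our own name. For accuracy directly, scikit-learn provides `sklearn.metrics.accuracy_score(y_true, y_pred)`, and a fitted classifier's `score(X, y)` returns the same value.

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of positions where the predicted class differs from the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

y_true = [3, 3, 4, 2, 5]
y_pred = [3, 4, 4, 3, 5]
print(misclassification_rate(y_true, y_pred))  # 0.4
```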

In [157]:
# make predictions on the test set
preds=model3.predict(test_features)

# accuracy = 1 - misclassification rate
print("The accuracy of the MLP model is ", 1-classification_mse(test_ratings, preds))

print(preds)

The accuracy of the MLP model is  0.32369942196531787
[3. 4. 3. 3. 3. 3. 3. 5. 2. 4. 3. 3. 3. 4. 8. 3. 3. 3. 4. 3. 2. 5. 3. 3.
 4. 3. 3. 3. 4. 3. 3. 3. 4. 4. 3. 4. 3. 3. 5. 4. 4. 3. 4. 3. 3. 3. 4. 4.
 4. 4. 3. 3. 3. 4. 3. 4. 3. 3. 3. 3. 5. 3. 3. 3. 3. 3. 3. 4. 4. 3. 3. 3.
 1. 3. 3. 3. 3. 4. 3. 3. 2. 3. 4. 3. 3. 3. 4. 3. 4. 3. 4. 3. 4. 3. 3. 3.
 3. 3. 3. 2. 2. 3. 3. 3. 3. 3. 3. 5. 4. 3. 3. 2. 3. 4. 3. 3. 3. 3. 3. 3.
 2. 5. 3. 2. 4. 3. 5. 3. 3. 3. 4. 3. 4. 4. 3. 3. 3. 2. 3. 3. 4. 3. 3. 4.
 3. 3. 3. 3. 5. 3. 3. 3. 3. 3. 4. 2. 3. 3. 3. 4. 4. 3. 3. 4. 3. 3. 3. 3.
 3. 3. 3. 5. 3.]


With the default ReLU activation function, the MLP model barely improves on the sigmoid model above: both reach an accuracy of about 32% (32.4% vs. 31.8%). The predictions are heavily concentrated. Most test companies are assigned class 3, the rest fall mostly into classes 2, 4 and 5, and the other ratings are rarely predicted.
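Because the predictions cluster in a few classes, two quick checks help put the roughly 32% accuracy in context. The first is how the predictions are distributed across classes. The second is the accuracy of a trivial model that always predicts the most frequent training rating. Below is a sketch with toy labels; the helper names are ours. In the notebook, the inputs would be `train_ratings`, `test_ratings` and `preds`. scikit-learn's `DummyClassifier(strategy="most_frequent")` gives the same baseline. If the MLP barely beats it, the model has learned little beyond the class frequencies.

```python
import numpy as np

def class_counts(labels):
    """Map each class to how often it occurs."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training class."""
    values, counts = np.unique(np.asarray(train_labels), return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(np.asarray(test_labels) == majority))

# Toy labels only, not the notebook's data.
train_labels = [3, 3, 3, 2, 4, 4, 3, 5]
test_labels = [3, 2, 3, 4]
print(class_counts([3, 3, 4, 3, 2]))                          # {2: 1, 3: 3, 4: 1}
print(majority_baseline_accuracy(train_labels, test_labels))  # 0.5
```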

## Summary

Of the three neural networks we built, none reaches a satisfactory accuracy of over 50%; all are even lower than the random forest classifier in the last project. This is likely because the models are trained on a limited number of data points, and because credit ratings depend on more than the financial performance indicators available here. Future improvements could include training on more data and adding more features to the data set.

### References
- Multilayer Perceptron Neural Network Tutorial: https://machinelearninggeek.com/multi-layer-perceptron-neural-network-using-python/
- Select observations based on value counts: https://stackoverflow.com/questions/29836836/how-do-i-filter-a-pandas-dataframe-based-on-value-counts