### Multilayer Perceptron Classifier
* Multilayer perceptron classifier (MLPC) is a classifier based on the feedforward artificial neural network. An MLPC consists of multiple layers of nodes, and each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. All other nodes map inputs to outputs by taking a linear combination of the inputs with the node's weights w and bias b and then applying an activation function. This can be written in matrix form for an MLPC with k + 1 layers. The number of nodes in the output layer corresponds to the number of classes. MLPC employs backpropagation to learn the model and uses the logistic loss function for optimization.
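
The layer-by-layer mapping described above is usually written as $\mathbf{y}(\mathbf{x}) = f_K(\dots f_2(\mathbf{w}_2^T f_1(\mathbf{w}_1^T \mathbf{x} + b_1) + b_2) \dots + b_K)$. The following is a minimal pure-Python sketch of that forward pass, assuming sigmoid activations in the intermediate layers and softmax in the output layer (the activations Spark documents for MLPC). The helper names (`affine`, `forward`) and the toy weights are hypothetical and only illustrate the computation; this is not how Spark implements it internally.

```python
import math

def sigmoid(z):
    # Logistic activation, assumed here for intermediate layers
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the max before exponentiating for numerical stability
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    # weights holds one row per output node: z_j = sum_i w_ji * x_i + b_j
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(x, params):
    # params is a list of (weights, bias) pairs, one per non-input layer
    a = x
    for weights, bias in params[:-1]:
        a = [sigmoid(z) for z in affine(weights, bias, a)]
    weights, bias = params[-1]
    return softmax(affine(weights, bias, a))

# Toy network with layer sizes 2 -> 3 -> 2 and hand-picked (hypothetical) weights
params = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [-0.4, 0.6, 0.1]], [0.05, -0.05]),
]
probs = forward([1.0, 2.0], params)
print(probs)  # one probability per class
```

The output list has one entry per class and sums to 1, which is why the size of the output layer has to match the number of classes.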

In [None]:
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.sql import SparkSession
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

In [None]:
spark = SparkSession.builder.appName("Multilayer Perceptron classifier").getOrCreate()
data = spark.read.csv("Data/loan_data.csv", header=True, inferSchema=True)
data.show()

In [None]:
data.columns

In [None]:
data = data.drop("purpose")
data.show()

In [None]:
#Feature assembler
from pyspark.ml.feature import VectorAssembler
featureassembler = VectorAssembler(inputCols = ['credit_policy',
                                                    'int_rate',
                                                    'installment',
                                                    'log_annual_inc',
                                                    'dti',
                                                    'fico',
                                                    'days_with_cr_line',
                                                    'revol_bal',
                                                    'revol_util',
                                                    'inq_last_6mths',
                                                    'delinq_2yrs',
                                                    'pub_rec'], outputCol = 'features')

output = featureassembler.transform(data)
output.show()

In [None]:
finalized_data = output.select("features", "not_fully_paid")
finalized_data.show()

In [None]:
train, test = finalized_data.randomSplit([0.6, 0.4], 1234)
# Specify layers for the neural network:
# input layer of size 12 (one node per assembled feature), two intermediate
# layers of size 5 and 4, and an output layer of size 2 (classes 0 and 1)
layers = [12, 5, 4, 2]
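
The first entry of `layers` has to equal the length of the assembled feature vector, and the last entry has to equal the number of classes. As a small sketch, the list can be derived from the feature columns rather than typed by hand. The helper `build_layers` and the name `layers_sketch` are hypothetical, and the class count of 2 assumes `not_fully_paid` only takes the values 0 and 1.

```python
def build_layers(num_features, hidden_sizes, num_classes):
    # Input size, then the chosen hidden sizes, then one output node per class
    if num_features < 1 or num_classes < 2:
        raise ValueError("need at least one feature and two classes")
    return [num_features, *hidden_sizes, num_classes]

# Same columns that were passed to the VectorAssembler above
feature_cols = ['credit_policy', 'int_rate', 'installment', 'log_annual_inc',
                'dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util',
                'inq_last_6mths', 'delinq_2yrs', 'pub_rec']

# Assumption: binary label, so two output nodes
layers_sketch = build_layers(len(feature_cols), [5, 4], 2)
print(layers_sketch)
```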

* Model Training

In [None]:
trainer = MultilayerPerceptronClassifier(featuresCol = "features", labelCol = "not_fully_paid", maxIter = 100, layers = layers, blockSize = 128, seed = 1234)

In [None]:
model = trainer.fit(train)

In [None]:
#Generate predictions on the test set
result = model.transform(test)

In [None]:
result.show()

* Model Evaluations

In [None]:
MulticlassClassificationEvaluator()

In [None]:
#Model Evaluations
predictionAndLabels = result.select("prediction", "not_fully_paid")
evaluator = MulticlassClassificationEvaluator(labelCol = "not_fully_paid", metricName = "accuracy")

In [None]:
print("Test set Accuracy = " + str(evaluator.evaluate(predictionAndLabels)))
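
For reference, the accuracy metric is the fraction of rows whose prediction equals the label. The sketch below recomputes it by hand on a small made-up sample; the `accuracy` helper and the toy values are hypothetical and are not taken from the loan data.

```python
def accuracy(predictions, labels):
    # Fraction of positions where the predicted class matches the true class
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Made-up sample: predictions are doubles (as in Spark's prediction column), labels are ints
toy_predictions = [0.0, 0.0, 1.0, 0.0, 1.0]
toy_labels = [0, 0, 1, 1, 1]
toy_accuracy = accuracy(toy_predictions, toy_labels)
print("Toy accuracy = " + str(toy_accuracy))
```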