### Breast cancer prediction example

In this notebook, we'll walk through training a neural network to classify breast tumors as malignant or benign from a dataset of patient features.

In [20]:
@file:DependsOn("../target/kgrad-1.0.jar")

import kgrad.Value
import kgrad.mlp.MLP

In [21]:
%useLatestDescriptors
%use dataframe

Define the path to the CSV file containing the breast cancer dataset

In [22]:
val path = "breast_cancer.csv"

Read the CSV file into a DataFrame

In [23]:
var breastCancerDataset = DataFrame.readCSV(path)

Update the 'diagnosis' column: Replace 'M' (Malignant) with '0.0' and 'B' (Benign) with '1.0'

In [24]:
breastCancerDataset = breastCancerDataset
    .update { col(1) }.where { get(df().getColumn(1)) == "M" }.with { "0.0" }
    .update { col(1) }.where { get(df().getColumn(1)) == "B" }.with { "1.0" }

Parse the 'diagnosis' column to ensure it is treated as numeric

In [25]:
breastCancerDataset = breastCancerDataset.parse("diagnosis")

Print the summary statistics and the first few rows of the dataset

In [26]:
breastCancerDataset.describe()

name,type,count,unique,nulls,top,freq,mean,std,min,median,max
id,Int,569,569,0,842302.0,1,30371831.432337,125020585.612224,8670.0,906024.0,911320502.0
diagnosis,Double,569,2,0,1.0,357,0.627417,0.483918,0.0,1.0,1.0
radius_mean,Double,569,456,0,12.34,4,14.127292,3.524049,6.981,13.37,28.11
texture_mean,Double,569,479,0,15.7,3,19.289649,4.301036,9.71,18.84,39.28
perimeter_mean,Double,569,522,0,82.61,3,91.969033,24.298981,43.79,86.24,188.5
area_mean,Double,569,539,0,512.2,3,654.889104,351.914129,143.5,551.1,2501.0
smoothness_mean,Double,569,474,0,0.1007,5,0.09636,0.014064,0.05263,0.09587,0.1634
compactness_mean,Double,569,537,0,0.1206,3,0.104341,0.052813,0.01938,0.09263,0.3454
concavity_mean,Double,569,537,0,0.0,13,0.088799,0.07972,0.0,0.06154,0.4268
concave points_mean,Double,569,542,0,0.0,13,0.048919,0.038803,0.0,0.0335,0.2012


In [27]:
breastCancerDataset.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,untitled
842302,0.0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,0.0,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,0.0,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,0.0,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.01867,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,0.0,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


Normalize the feature columns (excluding the 'diagnosis' column)

Each feature is standardized to a z-score, computed per column: (value - mean) / standard deviation

In [28]:
breastCancerDataset = breastCancerDataset.update { cols(2 until breastCancerDataset.columnsCount() - 1) }
    .perRowCol { row, col -> ((row[col.name()] as Double) - (col as DataColumn<Double>).mean()) / col.std() }
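As a sanity check on the formula, here is a minimal sketch that standardizes a plain list of doubles without the DataFrame API. The helper names `sampleStd` and `standardize` are hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1), which appears consistent with the `radius_mean` statistics printed by `describe()`.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: sample standard deviation (divides by n - 1).
fun sampleStd(values: List<Double>): Double {
    val mean = values.average()
    val sumSq = values.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumSq / (values.size - 1))
}

// Hypothetical helper: z-score every value in a column.
fun standardize(values: List<Double>): List<Double> {
    val mean = values.average()
    val std = sampleStd(values)
    return values.map { (it - mean) / std }
}

// Worked example with the radius_mean statistics from describe():
// first row 17.99, mean 14.127292, std 3.524049.
val zRadius = (17.99 - 14.127292) / 3.524049

// A small column: after standardizing, the values are centered on 0.
val zColumn = standardize(listOf(2.0, 4.0, 6.0))
```

`zRadius` comes out at roughly 1.0961, which agrees with the first normalized `radius_mean` value in the output of the next cell.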

Print the first few rows of the normalized dataset

In [29]:
breastCancerDataset.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,untitled
842302,0.0,1.0961,-2.071512,1.268817,0.98351,1.567087,3.280628,2.650542,2.530249,2.215566,2.253764,2.487545,-0.564768,2.83054,2.485391,-0.213814,1.315704,0.72339,0.660239,1.147747,0.906286,1.885031,-1.358098,2.301575,1.999478,1.306537,2.614365,2.107672,2.294058,2.748204,1.935312,
842517,0.0,1.828212,-0.353322,1.684473,1.90703,-0.826235,-0.486643,-0.023825,0.547662,0.001391,-0.867889,0.498816,-0.875473,0.263095,0.741749,-0.604819,-0.692317,-0.440393,0.259933,-0.804742,-0.099356,1.80434,-0.368879,1.533776,1.888827,-0.375282,-0.430066,-0.14662,1.086129,-0.243675,0.280943,
84300903,0.0,1.578499,0.455786,1.565126,1.557513,0.941382,1.052,1.36228,2.03544,0.938859,-0.397658,1.227596,-0.779398,0.85018,1.180298,-0.296744,0.814257,0.212889,1.423575,0.236827,0.293301,1.510541,-0.023953,1.346291,1.455004,0.526944,1.08198,0.854222,1.953282,1.151242,0.201214,
84348301,0.0,-0.768233,0.253509,-0.592166,-0.763792,3.280667,3.399917,1.914213,1.450431,2.864862,4.906602,0.326087,-0.110312,0.286341,-0.288125,0.689095,2.741868,0.818798,1.114027,4.72852,2.045711,-0.281217,0.133866,-0.24972,-0.549538,3.391291,3.889975,1.987839,2.173873,6.040726,4.930672,
84358402,0.0,1.748758,-1.150804,1.775011,1.824624,0.280125,0.538866,1.369806,1.427237,-0.009552,-0.561956,1.269426,-0.789549,1.27207,1.18931,1.481763,-0.048477,0.827742,1.143199,-0.360775,0.498889,1.297434,-1.465481,1.337363,1.219651,0.220362,-0.313119,0.61264,0.728618,-0.86759,-0.396751,


Split the dataset into training (80%) and testing (20%) sets

In [30]:
val trainDataset = breastCancerDataset[0 until (breastCancerDataset.rowsCount() * 0.8).toInt()]
val testDataset = breastCancerDataset[(breastCancerDataset.rowsCount() * 0.8).toInt() until breastCancerDataset.rowsCount()]
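The split is purely positional: the first 80% of rows become the training set and the rest the test set, with no shuffling. The sketch below reproduces the index arithmetic on plain lists; `splitIndex` and `trainTestSplit` are hypothetical helpers, not part of the DataFrame API.

```kotlin
// Hypothetical helper: index of the first test row for a given training fraction.
fun splitIndex(rowCount: Int, trainFraction: Double): Int =
    (rowCount * trainFraction).toInt()

// Hypothetical helper: split any list into (train, test) by position, without shuffling.
fun <T> trainTestSplit(rows: List<T>, trainFraction: Double): Pair<List<T>, List<T>> {
    val cut = splitIndex(rows.size, trainFraction)
    return rows.subList(0, cut) to rows.subList(cut, rows.size)
}

// With the 569 rows reported by describe(): 569 * 0.8 = 455.2, truncated to 455.
val cut = splitIndex(569, 0.8)
val trainSize = cut        // 455
val testSize = 569 - cut   // 114
```

The 114 test rows match the denominator in the accuracy figures printed during training.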

Extract the target variable 'y' for the training set and convert to a list of Values

In [31]:
val ys = trainDataset.select { col(1) }.values().toList().map { Value(it as Double) }

Extract the feature variables 'x' for the training set, convert to a list of Values, and chunk by number of features

In [32]:
val xs = trainDataset.select { cols(2 until trainDataset.columnsCount() - 1) }.values(byRows = true).toList().map {
    Value(it as Double)
}.chunked(trainDataset.columnsCount() - 3)
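The dataset has 33 columns (`id`, `diagnosis`, 30 features, and a trailing empty column), so `columnsCount() - 3` is 30, the number of features per row. The reshaping step itself can be illustrated on a small, made-up list: values read row by row are flattened, then `chunked` regroups them into one list per row.

```kotlin
// Six values read row by row from a hypothetical table with 2 rows and 3 feature columns.
val flat = listOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
val featureCount = 3

// chunked() regroups the flat list into one list per row.
val rows = flat.chunked(featureCount)
```

Here `rows` holds two lists of three values each, in the original row order.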

Extract the target variable 'y' for the testing set and convert to a list of Values

In [33]:
val yTest = testDataset.select { col(1) }.values().toList().map { Value(it as Double) }

Extract the feature variables 'x' for the testing set, convert to a list of Values, and chunk by number of features

In [34]:
val xTest = testDataset.select { cols(2 until testDataset.columnsCount() - 1) }.values(byRows = true).toList().map {
    Value(it as Double)
}.chunked(testDataset.columnsCount() - 3)

Initialize a Multi-Layer Perceptron (MLP) with 30 input features, 15 hidden units, and 1 output unit

In [35]:
val n = MLP(30, listOf(15, 1))

Print the number of parameters in the network

In [36]:
println("Number of network params: ${n.parameters().size}")

Number of network params: 481
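The count of 481 can be checked by hand. The sketch below assumes a fully connected network in which every neuron has one weight per input plus one bias; `countParameters` is a hypothetical helper, not a kgrad function.

```kotlin
// Hypothetical helper: parameters in a fully connected network,
// assuming every neuron has one weight per input plus one bias.
fun countParameters(inputSize: Int, layerSizes: List<Int>): Int {
    var previous = inputSize
    var total = 0
    for (size in layerSizes) {
        total += previous * size + size  // weights + biases for this layer
        previous = size
    }
    return total
}

// (30 * 15 + 15) + (15 * 1 + 1) = 465 + 16 = 481
val paramCount = countParameters(30, listOf(15, 1))
```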


Define the number of training iterations

In [37]:
val learningIterations = 100

Training loop for the specified number of iterations

In [38]:
for (i in 0..learningIterations) {

    // Forward pass: compute the predicted values
    val yPred = xs.flatMap { n(it) }

    // Compute the loss as the sum of squared errors (SSE) over the training set
    val loss = ys.zip(yPred) { y, pred ->
        (y - pred).pow(2.0)
    }.reduce { a, b -> a + b }

    // Zero the gradients of the network before updating
    n.zeroGrad()

    // Backward pass: compute the gradients
    loss.backward()

    // Define the learning rate with a decay schedule
    val learningRate = max(
        0.002 * (((learningIterations) - i).toDouble() / (learningIterations).toDouble()),
        0.0001
    )

    // Update the network parameters using the computed gradients
    n.parameters().forEach {
        it.data += -learningRate * it.grad
    }

    // Print progress every 10 iterations
    if (i % 10 == 0) {
        // Evaluate the model on the test set
        val yPredTest = xTest.flatMap { n(it) }
        val correctPredictions = yTest.zip(yPredTest) { y, pred ->
            if (y.data.toInt() == pred.data.roundToInt()) {
                1
            } else {
                0
            }
        }.reduce { a, b -> a + b }

        // Print the current iteration, loss, learning rate, and accuracy on the test set
        println("$i\tloss: ${"%.2f".format(loss.data)}" +
                "\t\tlearningRate: ${"%.8f".format(learningRate)}\t\t" +
                "accuracy: ${"%.2f".format((correctPredictions.toDouble() / yPredTest.size.toDouble()) * 100)}%" +
                " ($correctPredictions/${yPredTest.size})")
    }
}

0	loss: 1011.24		learningRate: 0.00200000		accuracy: 34.21% (39/114)
10	loss: 58.15		learningRate: 0.00180000		accuracy: 85.96% (98/114)
20	loss: 64.69		learningRate: 0.00160000		accuracy: 85.09% (97/114)
30	loss: 78.91		learningRate: 0.00140000		accuracy: 93.86% (107/114)
40	loss: 51.69		learningRate: 0.00120000		accuracy: 91.23% (104/114)
50	loss: 36.63		learningRate: 0.00100000		accuracy: 96.49% (110/114)
60	loss: 20.52		learningRate: 0.00080000		accuracy: 96.49% (110/114)
70	loss: 16.26		learningRate: 0.00060000		accuracy: 97.37% (111/114)
80	loss: 15.62		learningRate: 0.00040000		accuracy: 97.37% (111/114)
90	loss: 15.28		learningRate: 0.00020000		accuracy: 97.37% (111/114)
100	loss: 15.15		learningRate: 0.00010000		accuracy: 97.37% (111/114)


### Training loop: detailed explanation
The training loop iterates over `0..learningIterations`, an inclusive range, so with `learningIterations = 100` it makes 101 passes (iterations 0 through 100). Here’s what happens in each iteration:

1. **Forward Pass:**
   
     ```kotlin
     val yPred = xs.flatMap { n(it) }
     ```
          
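The network takes the input features (`xs`) and computes the predicted values (`yPred`). The `flatMap` function applies the neural network (`n`) to each row in `xs` and flattens the per-row output lists into a single list. Because the output layer has a single unit, `yPred` holds one prediction per training row.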

2. **Loss Computation:**
   
     ```kotlin
     val loss = ys.zip(yPred) { y, pred ->
         (y - pred).pow(2.0)
     }.reduce { a, b -> a + b }
     ```
     
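The loss is the sum of squared errors (SSE) between the actual values (`ys`) and the predicted values (`yPred`). The `zip` function pairs each actual value with its corresponding prediction and computes the squared difference (`pow(2.0)`), and `reduce` sums these differences to get the total loss. Because the sum is not divided by the number of samples, this is not a mean squared error in the strict sense; the two differ only by a constant factor, which scales the gradients and therefore the effective step size.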
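The difference between a summed and an averaged squared error is easy to see on plain doubles. The sketch below uses hypothetical helpers (`sumSquaredError`, `meanSquaredError`) and made-up numbers; it does not use kgrad's `Value`.

```kotlin
// Hypothetical helper: sum of squared errors, the quantity the snippet above computes.
fun sumSquaredError(targets: List<Double>, predictions: List<Double>): Double =
    targets.zip(predictions) { y, pred -> (y - pred) * (y - pred) }.sum()

// Hypothetical helper: mean squared error, which additionally divides by the sample count.
fun meanSquaredError(targets: List<Double>, predictions: List<Double>): Double =
    sumSquaredError(targets, predictions) / targets.size

val targets = listOf(1.0, 0.0, 1.0, 1.0)
val predictions = listOf(0.5, 0.5, 1.0, 0.0)
val sse = sumSquaredError(targets, predictions)   // 0.25 + 0.25 + 0.0 + 1.0 = 1.5
val mse = meanSquaredError(targets, predictions)  // 1.5 / 4 = 0.375
```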

3. **Zero Gradients:**
   
     ```kotlin
     n.zeroGrad()
     ```
     
Before computing the gradients for the current iteration, the gradients from the previous iteration are reset to zero. This is necessary to prevent accumulation of gradients across iterations.

4. **Backward Pass:**
   
     ```kotlin
     loss.backward()
     ```
     
The backward pass computes the gradients of the loss with respect to each parameter in the network. This process uses backpropagation to propagate the error backward through the network.

5. **Parameter Update:**
   - **Compute the Learning Rate:**

     ```kotlin
     val learningRate = max(
         0.002 * (((learningIterations) - i).toDouble() / (learningIterations).toDouble()),
         0.0001
     )
     ```
     
The learning rate follows a linear decay schedule with a floor. It starts at 0.002 and shrinks in proportion to the number of remaining iterations, while `max` keeps it from dropping below 0.0001. The linear term reaches 0.0001 at iteration 95, so the floor applies for the last few iterations. Taking smaller steps as training progresses helps stabilize convergence.
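To inspect the schedule in isolation, it can be wrapped in a small function. This is a sketch: `learningRateAt` and its parameter names are hypothetical, with defaults copied from the training loop.

```kotlin
import kotlin.math.max

// Hypothetical helper wrapping the schedule from the training loop.
fun learningRateAt(
    iteration: Int,
    totalIterations: Int,
    initialRate: Double = 0.002,
    minRate: Double = 0.0001
): Double {
    // Fraction of iterations still to run: 1.0 at the start, 0.0 at the end.
    val remaining = (totalIterations - iteration).toDouble() / totalIterations.toDouble()
    return max(initialRate * remaining, minRate)
}

// Rates at a few iterations for totalIterations = 100.
val schedule = listOf(0, 50, 90, 95, 100).associateWith { learningRateAt(it, 100) }
```

The values are approximately 0.002, 0.001, 0.0002, 0.0001, and 0.0001, in line with the `learningRate` column of the training output.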

   - **Update the Parameters:**
   
     ```kotlin
     n.parameters().forEach {
         it.data += -learningRate * it.grad
     }
     ```
     
Each parameter in the network is updated using gradient descent. The new value of each parameter is obtained by subtracting the product of the learning rate and the gradient from the current value of the parameter.
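The same update rule can be tried on a one-parameter problem where the gradient is known in closed form. In this sketch, `Param` is a hypothetical stand-in for a trainable parameter (kgrad's `Value` is not used), and the function being minimized, f(w) = (w - 3)^2, is chosen only for illustration.

```kotlin
// Hypothetical stand-in for a trainable parameter; kgrad's Value is not used here.
class Param(var data: Double, var grad: Double = 0.0)

// One gradient-descent step: move each parameter against its gradient.
fun step(params: List<Param>, learningRate: Double) {
    params.forEach { it.data += -learningRate * it.grad }
}

// Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
fun minimize(start: Double, learningRate: Double, steps: Int): Double {
    val w = Param(start)
    repeat(steps) {
        w.grad = 2 * (w.data - 3.0)  // overwrite the gradient each step
        step(listOf(w), learningRate)
    }
    return w.data
}

val wFinal = minimize(start = 0.0, learningRate = 0.1, steps = 100)
```

With a learning rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so `wFinal` ends up very close to 3.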

6. **Evaluation (every 10 iterations):**
   - **Evaluate the Model:**
   
     ```kotlin
     if (i % 10 == 0) {
         val yPredTest = xTest.flatMap { n(it) }
         val correctPredictions = yTest.zip(yPredTest) { y, pred ->
             if (y.data.toInt() == pred.data.roundToInt()) {
                 1
             } else {
                 0
             }
         }.reduce { a, b -> a + b }
     ```
     
     
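Every 10 iterations, the model is evaluated on the test set. The network computes predictions (`yPredTest`) for the test inputs (`xTest`). A prediction counts as correct when the network output, rounded to the nearest integer, equals the label. The `zip` function pairs each label with its prediction and maps the pair to 1 or 0, and `reduce` sums these flags to get the number of correct predictions.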

   - **Calculate Accuracy:**
   
     ```kotlin
         println("$i\tloss: ${"%.2f".format(loss.data)}" +
                 "\t\tlearningRate: ${"%.8f".format(learningRate)}\t\t" +
                 "accuracy: ${"%.2f".format((correctPredictions.toDouble() / yPredTest.size.toDouble()) * 100)}%" +
                 " ($correctPredictions/${yPredTest.size})")
         }
     ```
     
The accuracy is the number of correct predictions divided by the size of the test set, expressed as a percentage. It is printed together with the current iteration number, the training loss, and the learning rate.
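The rounding-based accuracy check can be reproduced on plain doubles. This is a minimal sketch with a hypothetical `accuracy` helper and made-up outputs; it uses `count` instead of `zip` plus `reduce`, which gives the same total and also works on an empty list.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper: fraction of outputs that round to the correct 0/1 label.
fun accuracy(labels: List<Double>, outputs: List<Double>): Double {
    val correct = labels.zip(outputs).count { (y, pred) -> y.toInt() == pred.roundToInt() }
    return correct.toDouble() / labels.size
}

val labels = listOf(1.0, 0.0, 1.0, 0.0)
val outputs = listOf(0.9, 0.2, 0.4, -0.1)  // raw network outputs, not probabilities
val acc = accuracy(labels, outputs)         // 3 of 4 correct: 0.4 rounds to 0, but its label is 1
```

Since the outputs are unbounded, a value such as 1.6 would round to 2 and count as wrong for either class.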
