
# Breast Cancer Diagnosis with NN

This notebook contains practical examples and exercises for Applied AI-DL and Optimisation.

*Created by Hansi Hettiarachchi*

*Updated by Muhammad Afzal, Feb 2025*



This tutorial will guide you through the process of building and optimising neural network models targeting a real-world problem.

**Importing Libraries and Setting Seeds**

```python
# import libraries
import pandas as pd
import seaborn as sns

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn import metrics

# set random seeds to get reproducible results
import os
seed = 100
# PYTHONHASHSEED only affects Python processes started after this line;
# the running interpreter's hash seed is fixed at startup
os.environ['PYTHONHASHSEED'] = str(seed)
keras.utils.set_random_seed(seed)  # set all random seeds for the program (Python, NumPy, and TensorFlow)
```

Import the necessary libraries and set a random seed so that repeated runs give the same results on the same hardware and library versions (some GPU operations can still be nondeterministic).

## Understanding the problem and data set

I use the [Breast Cancer Wisconsin (Diagnostic) Data Set](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) for this tutorial.

Features available with this data set are computed from a digitised image of a fine needle aspirate (FNA) of a breast mass. They describe the characteristics of the cell nuclei present in the image.<br>
The diagnosed labels are 'M' and 'B', which correspond to malignant and benign.  

The targeted problem is to predict the tumour type given the features computed from digitised images. Let's train a simple neural network to make this prediction.

### Load and analyse the data set

```python
# load the data set
# pd.read_csv accepts a local file path or a URL, such as the raw GitHub URL below
df = pd.read_csv('https://raw.githubusercontent.com/HHansi/Applied-AI-Course/main/DL/data/cancer_data.csv')

# summarise the details
print(f'Number of entries: {len(df)}')
df.head()
```

```
Number of entries: 569
```

|  | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


This loads the dataset and shows its first five rows. Each of the 569 rows has an `id`, the `diagnosis` label and 30 numeric features.

## Extracting labels and features

```python
# extract labels
y = df['diagnosis']

print(y.value_counts())
```

```
diagnosis
B    357
M    212
Name: count, dtype: int64
```


Separate the target variable `y` and check the class distribution: 357 benign ('B') and 212 malignant ('M') samples, so the classes are somewhat imbalanced.
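A useful reference point for the accuracies reported later is the majority-class baseline: a classifier that always predicts the most common label. The sketch below computes it in plain Python from the class counts printed above; the helper name `majority_baseline` is our own and not part of any library.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)

# Rebuild the label list from the value_counts() output: 357 'B' and 212 'M'
labels = ['B'] * 357 + ['M'] * 212
label, acc = majority_baseline(labels)
print(f"Always predicting {label!r} is right {acc:.1%} of the time")
```

On the full dataset this baseline is about 62.7%. On a particular validation split it shifts with that split's class mix, so a model should clearly beat it before its accuracy is taken as evidence of learning.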

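```python
# remove unnecessary columns: 'id' is only an identifier and 'diagnosis' is the label
X = df.drop(['id', 'diagnosis'], axis=1)

X.info()  # info() prints its summary itself and returns None, so no print() is needed
X.head()
```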

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   radius_mean              569 non-null    float64
 1   texture_mean             569 non-null    float64
 2   perimeter_mean           569 non-null    float64
 3   area_mean                569 non-null    float64
 4   smoothness_mean          569 non-null    float64
 5   compactness_mean         569 non-null    float64
 6   concavity_mean           569 non-null    float64
 7   concave points_mean      569 non-null    float64
 8   symmetry_mean            569 non-null    float64
 9   fractal_dimension_mean   569 non-null    float64
 10  radius_se                569 non-null    float64
 11  texture_se               569 non-null    float64
 12  perimeter_se             569 non-null    float64
 13  area_se                  569 non-null    float64
 14  smoothness_se            5
```

|  | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | fractal_dimension_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


Drop the `id` column, which is only a sample identifier, and the `diagnosis` label, leaving the 30 numeric features in `X`.

Since the labels are strings ('M' and 'B'), they need to be converted into numeric values. <br>
This can easily be done with scikit-learn's `LabelEncoder`.

```python
# create LabelEncoder for labels
le = LabelEncoder()
le.fit(y)  # learns the classes in sorted order: 'B' -> 0, 'M' -> 1
```

Fit the encoder so that it learns the mapping from class names to integers; the next cell applies this mapping, because the model needs numeric targets.
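Conceptually, `LabelEncoder` numbers the distinct class names in sorted order, so 'B' becomes 0 and 'M' becomes 1. The plain-Python sketch below reproduces that mapping for illustration only; it is not scikit-learn's implementation, and `encode_labels` is a hypothetical helper.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order of the label names."""
    classes = sorted(set(labels))                        # ['B', 'M'] for this dataset
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]

classes, encoded = encode_labels(['M', 'B', 'B', 'M', 'B'])
print(classes)   # ['B', 'M']
print(encoded)   # [1, 0, 0, 1, 0]
```

After `le.fit(y)`, the attribute `le.classes_` holds the same sorted list of class names.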

```python
# Convert labels into numeric values
y = le.transform(y)

y = pd.Series(y)
print(y.value_counts())
```

```
0    357
1    212
Name: count, dtype: int64
```


The labels are now numeric: 0 stands for benign (357 samples) and 1 for malignant (212 samples).

## M1

Let's build a model, M1, that uses all 30 features, defined with the Keras `Sequential` API.

```python
X.head()
```

|  | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | fractal_dimension_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


Inspect the feature matrix `X` to confirm that it contains only the 30 numeric features.

### Split Data

```python
# split data to train and validation sets
X_train2, X_val2, y_train2, y_val2 = train_test_split(X, y, test_size=0.3, random_state=100)
print(f'training data set size: {len(X_train2)}')
print(f'validation data set size: {len(X_val2)}')
```

```
training data set size: 398
validation data set size: 171
```


Split the data into a training set (70%, 398 samples) and a validation set (30%, 171 samples); `random_state=100` makes the split repeatable.
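The sizes printed above follow from how `train_test_split` handles a fractional `test_size`: scikit-learn rounds the test count up and gives the remainder to training. The sketch below reproduces that arithmetic in plain Python and also derives the number of batches per epoch for `batch_size=50`, which matches the `8/8` in the M1 training log. The helper `split_sizes` is hypothetical, written only for illustration.

```python
import math

def split_sizes(n_samples, test_size):
    """Train/test counts for a fractional test_size, rounding the test count up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_val = split_sizes(569, 0.3)
print(n_train, n_val)                 # 398 171

# Keras processes the training set in ceil(n_train / batch_size) batches per epoch
steps_per_epoch = math.ceil(n_train / 50)
print(steps_per_epoch)                # 8
```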

### Build Model

```python
# define the keras model
model2 = keras.Sequential()
model2.add(keras.Input(shape=(30,)))  # 30 input features; Keras 3 warns against passing input_dim to a layer
model2.add(layers.Dense(64, activation='relu'))
model2.add(layers.Dense(32, activation='relu'))
model2.add(layers.Dense(1, activation='sigmoid'))

model2.summary()
```



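Define M1: a feed-forward network with two hidden ReLU layers (64 and 32 units) and one sigmoid output unit, whose value is the predicted probability that the tumour is malignant (label 1).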
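The parameter count that `model2.summary()` reports can be worked out by hand: a fully connected (`Dense`) layer has one weight per input-output pair plus one bias per output unit. A short plain-Python check follows; `dense_params` is our own helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [30, 64, 32, 1]   # input features, two hidden layers, sigmoid output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))   # [1984, 2080, 33] 4097
```

With only 398 training samples and about 4,100 trainable parameters, the model has ample capacity to overfit, which is one reason to monitor the validation metrics during training.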

```python
# compile the keras model
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# train model
model2.fit(X_train2, y_train2, batch_size=50, epochs=60, validation_data=(X_val2, y_val2))
```

```
Epoch 1/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 4s 204ms/step - accuracy: 0.3833 - loss: 53.5180 - val_accuracy: 0.5965 - val_loss: 13.2729
Epoch 2/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6300 - loss: 15.1962 - val_accuracy: 0.4678 - val_loss: 2.3658
Epoch 3/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.3878 - loss: 5.8810 - val_accuracy: 0.8070 - val_loss: 0.5711
Epoch 4/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.7587 - loss: 1.9026 - val_accuracy: 0.7602 - val_loss: 0.9622
Epoch 5/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.6924 - loss: 1.3860 - val_accuracy: 0.8713 - val_loss: 0.6711
Epoch 6/60
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.8532 - loss: 0.9489 - val_accuracy: 0.9240 - val_loss: 0.3202
Epoch 7/60
...
```


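Compile and train M1 with binary cross-entropy and the Adam optimizer for 60 epochs of 8 mini-batches each. The very large loss in the first epochs (53.5 in epoch 1) is likely linked to the unscaled inputs: `area_mean` takes values in the hundreds to thousands, while `smoothness_mean` is around 0.1.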

**Exercise**

1. Find the optimal model M2 by experimenting with at least three loss functions on the same dataset.
2. Find the optimal model M3 by experimenting with at least three optimisation methods on the same dataset.
3. Find the best model M4 by combining the most effective loss function and optimizer strategy from models M2 and M3.
4. Do you think that increasing or decreasing the batch size has any effect on model performance? Provide your answer with evidence.

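**Finding the Optimal Loss Function (M2)**
* Binary Crossentropy: the standard choice for binary classification with a sigmoid output; it is the negative log-likelihood of the labels, so confident wrong predictions are penalised heavily.
* Mean Squared Error (MSE): usually a poor fit for classification; combined with a sigmoid output, its gradients shrink as the sigmoid saturates, which can stall learning.
* Hinge Loss: used by margin-based classifiers such as SVMs. Keras expects labels of -1 or 1 (it converts 0/1 labels), and the loss suits an unbounded output better than a sigmoid, whose output is confined to (0, 1).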
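To see how the three losses treat the same prediction, the sketch below evaluates simplified per-sample versions in plain Python, where `p` stands for the sigmoid output. It is an illustration, not Keras's implementation; the `hinge` function maps 0/1 labels to -1/+1, mirroring the conversion described in the Keras `hinge` documentation.

```python
import math

def bce(y, p):
    """Binary cross-entropy for a label y in {0, 1} and predicted probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mse(y, p):
    """Squared error between the label and the predicted probability."""
    return (y - p) ** 2

def hinge(y, p):
    """Hinge loss after mapping the 0/1 label to -1/+1."""
    t = 2 * y - 1
    return max(0.0, 1 - t * p)

# A confident correct prediction, a confident wrong one, and a near-perfect negative
for y, p in [(1, 0.9), (0, 0.9), (0, 0.01)]:
    print(f"y={y} p={p}: bce={bce(y, p):.4f} mse={mse(y, p):.4f} hinge={hinge(y, p):.4f}")
```

The confident mistake (y=0, p=0.9) costs about 2.30 under cross-entropy but only 0.81 under MSE, so cross-entropy pushes harder to correct it. For a 0 label the hinge loss equals 1 + p, which never drops below 1 while p stays in (0, 1), so with a sigmoid output the hinge loss cannot reach zero on negative samples.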

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd

# Define function to train model with different loss functions
def train_model_with_loss(loss_function):
    model = keras.Sequential([
        keras.Input(shape=(30,)),  # 30 input features
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss=loss_function, optimizer='adam', metrics=['accuracy'])

    model.fit(X_train2, y_train2, batch_size=50, epochs=30, validation_data=(X_val2, y_val2), verbose=0)

    # Evaluate model on the validation set
    loss, accuracy = model.evaluate(X_val2, y_val2, verbose=0)
    return loss_function, accuracy

# List of loss functions
loss_functions = ['binary_crossentropy', 'mean_squared_error', 'hinge']

# Train and evaluate models
results_M2 = [train_model_with_loss(loss) for loss in loss_functions]

# Display results
df_results_M2 = pd.DataFrame(results_M2, columns=['Loss Function', 'Validation Accuracy'])
print(df_results_M2)
```




```
         Loss Function  Validation Accuracy
0  binary_crossentropy             0.935673
1   mean_squared_error             0.596491
2                hinge             0.403509
```


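**Finding the Optimal Optimizer (M3)**
* Adam: adapts a per-parameter step size and adds momentum; a common default (Keras default learning rate 0.001).
* SGD (Stochastic Gradient Descent): simple, but its step is proportional to the gradient, so it usually needs learning-rate tuning (Keras default learning rate 0.01, no momentum).
* RMSprop: divides each parameter's step by the root of a running average of its recent squared gradients, giving a per-parameter step size (Keras default learning rate 0.001).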
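To make the difference between these update rules concrete, the sketch below computes the very first update each optimizer would apply to a single weight, for gradients of increasing size, using the Keras default hyperparameters. It follows the textbook formulas (Keras places the small epsilon term slightly differently, which does not matter at these magnitudes), and the function names are our own.

```python
import math

def sgd_first_step(g, lr=0.01):
    """Plain SGD without momentum: the step is proportional to the gradient."""
    return -lr * g

def rmsprop_first_step(g, lr=0.001, rho=0.9, eps=1e-7):
    """RMSprop: divide by the root of a running average of squared gradients (starting at 0)."""
    v = (1 - rho) * g * g
    return -lr * g / (math.sqrt(v) + eps)

def adam_first_step(g, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam at step t = 1, with bias correction of both moment estimates."""
    m_hat = ((1 - beta1) * g) / (1 - beta1)
    v_hat = ((1 - beta2) * g * g) / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

for g in (0.5, 50.0, 5000.0):
    print(f"g={g:>7}: sgd={sgd_first_step(g):.6f} "
          f"rmsprop={rmsprop_first_step(g):.6f} adam={adam_first_step(g):.6f}")
```

The SGD step grows in proportion to the gradient (a gradient of 5000 moves the weight by 50), while the adaptive methods step by roughly the learning rate (Adam) or about 3.16 times it (RMSprop), whatever the gradient's size. Large raw feature values such as `area_mean` can produce large gradients, which may be one reason plain SGD at its default learning rate struggled in this experiment; that is a plausible explanation, not something the experiment isolates.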

```python
# Function to train model with different optimizers
def train_model_with_optimizer(optimizer):
    model = keras.Sequential([
        keras.Input(shape=(30,)),  # 30 input features
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    model.fit(X_train2, y_train2, batch_size=50, epochs=30, validation_data=(X_val2, y_val2), verbose=0)

    loss, accuracy = model.evaluate(X_val2, y_val2, verbose=0)
    return optimizer, accuracy

# List of optimizers (passed as strings, so each uses its Keras default settings)
optimizers = ['adam', 'sgd', 'rmsprop']

# Train and evaluate models
results_M3 = [train_model_with_optimizer(opt) for opt in optimizers]

# Display results
df_results_M3 = pd.DataFrame(results_M3, columns=['Optimizer', 'Validation Accuracy'])
print(df_results_M3)
```




```
  Optimizer  Validation Accuracy
0      adam             0.918129
1       sgd             0.596491
2   rmsprop             0.865497
```


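**Finding the Best Model (M4)**
* Combine the best loss function from M2 with the best optimizer from M3.
* In the runs above these are binary cross-entropy and Adam. `train_model_with_loss()` already fixes the optimizer to `'adam'`, so `train_model_with_loss('binary_crossentropy')` trains M4 with these settings; if another optimizer had won, that function would need an extra optimizer argument.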
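The selection step for M4 can be written in a few lines of plain Python. The sketch below copies the validation accuracies printed in the M2 and M3 result tables into dictionaries (the variable names are illustrative) and picks the best entry of each.

```python
# Validation accuracies copied from the M2 and M3 result tables
m2_accuracy = {'binary_crossentropy': 0.935673, 'mean_squared_error': 0.596491, 'hinge': 0.403509}
m3_accuracy = {'adam': 0.918129, 'sgd': 0.596491, 'rmsprop': 0.865497}

best_loss = max(m2_accuracy, key=m2_accuracy.get)
best_optimizer = max(m3_accuracy, key=m3_accuracy.get)
print(best_loss, best_optimizer)   # binary_crossentropy adam
```

Picking winners from single runs is fragile when scores are close. Here the margins over the runners-up (0.936 against 0.596, and 0.918 against 0.865) are wide enough that the choice is less sensitive to run-to-run noise.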

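**Effect of Batch Size**

To determine whether batch size affects performance:

* Run models with batch_size=16, batch_size=50, and batch_size=128.
* Compare validation accuracy, keeping the loss (binary cross-entropy), the optimizer (Adam) and the number of epochs (30) fixed.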
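Because every run uses the same 30 epochs, the batch size also changes how many weight updates each model receives. The sketch below counts them in plain Python for the 398 training samples; `total_updates` is our own helper.

```python
import math

def total_updates(n_train, batch_size, epochs):
    """Gradient updates: batches per epoch (the last one may be partial) times epochs."""
    return math.ceil(n_train / batch_size) * epochs

for bs in (16, 50, 128):
    print(bs, total_updates(398, bs, 30))
```

The batch-size-16 model makes 750 updates against 120 for batch size 128, about six times as many, so any accuracy gap mixes the effect of the batch size with the effect of extra training. Fixing the number of updates instead of the number of epochs is one way to separate the two.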

```python
batch_sizes = [16, 50, 128]

def train_model_with_batch_size(batch_size):
    model = keras.Sequential([
        keras.Input(shape=(30,)),  # 30 input features
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.fit(X_train2, y_train2, batch_size=batch_size, epochs=30, validation_data=(X_val2, y_val2), verbose=0)

    loss, accuracy = model.evaluate(X_val2, y_val2, verbose=0)
    return batch_size, accuracy

# Train models with different batch sizes
results_batch = [train_model_with_batch_size(bs) for bs in batch_sizes]

# Display results
df_results_batch = pd.DataFrame(results_batch, columns=['Batch Size', 'Validation Accuracy'])
print(df_results_batch)
```




```
   Batch Size  Validation Accuracy
0          16             0.947368
1          50             0.935673
2         128             0.929825
```


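**Expected Insights**

* Loss Function: binary cross-entropy should perform best, and it did here (0.936 validation accuracy, against 0.596 for MSE and 0.404 for hinge). The MSE and hinge scores add up to exactly 1, which suggests that each of those models assigns the same class to every validation sample.
* Optimizer: Adam is a strong default and scored highest here (0.918), ahead of RMSprop (0.865). SGD with its default learning rate scored 0.596, but SGD with a tuned learning rate (and momentum) can be competitive.
* Batch Size: smaller batches scored slightly higher (16: 0.947, 50: 0.936, 128: 0.930), but the gaps amount to one to three validation samples out of 171 and come from a single run per setting. The same configuration (binary cross-entropy, Adam, batch size 50, 30 epochs) scored 0.936 in the loss experiment and 0.918 in the optimizer experiment, presumably because each model starts from different random weights, so run-to-run variation is comparable to the batch-size effect. Commonly reported tendencies:
    * Smaller batch sizes give noisier gradient estimates, which can improve generalization, but each epoch usually takes longer.
    * Larger batch sizes make each epoch faster but may generalize less well.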
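One way to judge whether the batch-size differences are meaningful is to convert the accuracies into counts of correct validation predictions and compare the gaps with the sampling noise of an accuracy measured on 171 samples. The sketch below uses the binomial standard error sqrt(p(1 - p)/n), a rough approximation that treats validation samples as independent; the accuracies are copied from the batch-size results table.

```python
import math

n_val = 171
accuracy = {16: 0.947368, 50: 0.935673, 128: 0.929825}   # from the batch-size experiment

for bs, acc in accuracy.items():
    correct = round(acc * n_val)                          # number of correct predictions
    std_err = math.sqrt(acc * (1 - acc) / n_val)          # binomial standard error of the accuracy
    print(f"batch {bs:>3}: {correct}/{n_val} correct, accuracy {acc:.3f} ± {std_err:.3f}")
```

The largest gap, between batch sizes 16 and 128, is three samples (about 0.018), roughly one standard error, so a single run per setting cannot establish a batch-size effect. Repeating each run with several seeds and comparing the averages would give stronger evidence.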
