Author: Cristina Caro González

Contact: crismat02@gmail.com

**ENHANCING AUTOMATED DETECTION AND CLASSIFICATION OF SKIN CANCER USING ADVANCED IMAGE FILTERING, FEATURE EXTRACTION, AND SYNTHETIC DATA.**

In [None]:
from google.colab import drive

In [None]:
drive.mount('/content/drive')

In [None]:
from google.colab.patches import cv2_imshow

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
from glob import glob
import seaborn as sns
from PIL import Image
np.random.seed(123)
from sklearn.preprocessing import label_binarize
from sklearn.metrics import confusion_matrix
import itertools

import keras
from keras.utils import to_categorical  # used for converting labels to one-hot encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras import backend as K
from tensorflow.keras.layers import BatchNormalization

from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


In [None]:
# 1. Function to plot the model's training/validation accuracy and loss
def plot_model_history(model_history):
    history = model_history.history
    # Older Keras versions log 'acc'/'val_acc'; newer ones log 'accuracy'/'val_accuracy'.
    acc_key = 'acc' if 'acc' in history else 'accuracy'
    n_epochs = len(history[acc_key])
    epochs = range(1, n_epochs + 1)
    # Roughly ten evenly spaced ticks; the step belongs inside np.arange, not set_xticks.
    ticks = np.arange(1, n_epochs + 1, max(1, n_epochs // 10))
    fig, axs = plt.subplots(1, 2, figsize=(15, 5))
    # summarize history for accuracy
    axs[0].plot(epochs, history[acc_key])
    axs[0].plot(epochs, history['val_' + acc_key])
    axs[0].set_title('Model Accuracy')
    axs[0].set_ylabel('Accuracy')
    axs[0].set_xlabel('Epoch')
    axs[0].set_xticks(ticks)
    axs[0].legend(['train', 'val'], loc='best')
    # summarize history for loss
    axs[1].plot(epochs, history['loss'])
    axs[1].plot(epochs, history['val_loss'])
    axs[1].set_title('Model Loss')
    axs[1].set_ylabel('Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].set_xticks(ticks)
    axs[1].legend(['train', 'val'], loc='best')
    plt.show()

In [None]:
from os import chdir # to alter your working directory
chdir("/content/drive/MyDrive/TFM Cancer AI/TFM/")

In [None]:
base_skin_dir = os.path.join("/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel")

# Merging images from both folders HAM10000_images_part1.zip and HAM10000_images_part2.zip into one dictionary

imageid_path_dict = {os.path.splitext(os.path.basename(x))[0]: x
                     for x in glob(os.path.join(base_skin_dir, '*', '*.jpg'))}

# This dictionary is useful for displaying more human-friendly labels later on

lesion_type_dict = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions ',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'
}

In [None]:
skin_df = pd.read_csv(os.path.join(base_skin_dir, 'HAM10000_metadata.csv'))

# Creating New Columns for better readability

skin_df['path'] = skin_df['image_id'].map(imageid_path_dict.get)
skin_df['cell_type'] = skin_df['dx'].map(lesion_type_dict.get)
skin_df['cell_type_idx'] = pd.Categorical(skin_df['cell_type']).codes
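
The integer codes in `cell_type_idx` come from `pd.Categorical`, which sorts the category labels before numbering them. The following is a minimal sketch of how the code-to-label mapping can be recovered; `toy_df` and `idx_to_label` are hypothetical names used only for illustration, and the three rows are made up rather than taken from the metadata file.

```python
import pandas as pd

# Hypothetical toy frame standing in for skin_df (not the real metadata).
toy_df = pd.DataFrame(
    {'cell_type': ['Melanoma', 'Dermatofibroma', 'Melanoma', 'Actinic keratoses']}
)

categorical = pd.Categorical(toy_df['cell_type'])
toy_df['cell_type_idx'] = categorical.codes

# Categories are sorted, so code i corresponds to categories[i].
idx_to_label = dict(enumerate(categorical.categories))
print(idx_to_label)
```

Keeping such a mapping around makes it easier to label confusion matrices later without re-deriving the order by hand.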

In [None]:
# Now let's look at a sample of skin_df to check the newly created columns
skin_df.head()

In [None]:
skin_df.isnull().sum()

In [None]:
# Fill missing ages with the mean age (assignment avoids in-place modification of a column view).
skin_df['age'] = skin_df['age'].fillna(skin_df['age'].mean())

In [None]:
skin_df.isnull().sum()

In [None]:
print(skin_df.dtypes)

In [None]:
fig, ax1 = plt.subplots(1, 1, figsize= (10, 5))
skin_df['cell_type'].value_counts().plot(kind='bar', ax=ax1)

The next plot shows the technical validation field (ground truth), `dx_type`, to see the distribution of its four categories:
1. Histopathology (histo): Histopathologic diagnoses of excised lesions were performed by specialized dermatopathologists.
2. Confocal: Reflectance confocal microscopy is an in-vivo imaging technique with near-cellular resolution; some facial benign keratoses were verified by this method.
3. Follow-up: If nevi monitored by digital dermatoscopy did not show any changes during 3 follow-up visits or 1.5 years, the dataset authors accepted this as evidence of biologic benignity. Only nevi, and no other benign diagnoses, were labeled with this type of ground truth, because dermatologists usually do not monitor dermatofibromas, seborrheic keratoses, or vascular lesions.
4. Consensus: For typical benign cases without histopathology or follow-up, the dataset provides an expert-consensus rating by authors PT and HK. They applied the consensus label only if both authors independently gave the same unequivocal benign diagnosis. Lesions with this type of ground truth were usually photographed for educational reasons and did not need further follow-up or biopsy for confirmation.
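
Beyond the overall bar chart, it can help to see how each ground-truth type is distributed across diagnoses. The sketch below cross-tabulates the two columns; the rows are hypothetical, and the `dx_type` strings used here are assumed spellings, not values read from `HAM10000_metadata.csv`.

```python
import pandas as pd

# Hypothetical rows; the real values come from HAM10000_metadata.csv.
toy_df = pd.DataFrame({
    'dx':      ['nv', 'nv', 'mel', 'bkl', 'nv'],
    'dx_type': ['follow_up', 'histo', 'histo', 'consensus', 'follow_up'],
})

# Rows: diagnosis; columns: how the ground truth was established.
ground_truth_table = pd.crosstab(toy_df['dx'], toy_df['dx_type'])
print(ground_truth_table)
```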

In [None]:
skin_df['dx_type'].value_counts().plot(kind='bar')

In [None]:
skin_df['localization'].value_counts().plot(kind='bar')

In [None]:
skin_df['age'].hist(bins=40)

In [None]:
skin_df['sex'].value_counts().plot(kind='bar')

In [None]:
from PIL import Image
import concurrent.futures

def load_resize_save(image_path, output_dir):
    img = Image.open(image_path)
    img = img.resize((100, 75), Image.LANCZOS)  # LANCZOS (formerly Image.ANTIALIAS, removed in Pillow 10) is a high-quality resampling filter: each output pixel is a weighted average of the surrounding input pixels.
    output_path = os.path.join(output_dir, os.path.basename(image_path))
    img.save(output_path)
    return output_path

output_dir = "/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized"
os.makedirs(output_dir, exist_ok=True)

with concurrent.futures.ProcessPoolExecutor() as executor:
    skin_df['path'] = list(executor.map(load_resize_save, skin_df['path'], [output_dir]*len(skin_df)))
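
One detail worth keeping in mind when resizing: PIL expects the target size as (width, height), while the NumPy view of the same image is (height, width, channels). The sketch below illustrates this with a synthetic image; the 600x450 source size is an assumption made for the demo, not something read from the dataset.

```python
import numpy as np
from PIL import Image

# Synthetic 600x450 RGB image (source size assumed for illustration).
original = Image.new('RGB', (600, 450), color=(180, 120, 100))

# PIL takes (width, height) ...
resized = original.resize((100, 75), Image.LANCZOS)

# ... while the NumPy view is (height, width, channels).
array = np.asarray(resized)
print(resized.size, array.shape)
```

This is why a model fed these images needs an input shape of `(75, 100, 3)` rather than `(100, 75, 3)`.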


In [None]:
skin_df

In [None]:
#skin_df['image'] = skin_df['path'].map(lambda x: np.asarray(Image.open(x).resize((100,75))))

skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/skin_df_resize.csv', index=False)

In [None]:
skin_df

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

n_samples = 5
fig, m_axs = plt.subplots(7, n_samples, figsize = (4*n_samples, 3*7))
for n_axs, (type_name, type_rows) in zip(m_axs, skin_df.sort_values(['cell_type']).groupby('cell_type')):
    n_axs[0].set_title(type_name)
    for c_ax, (_, c_row) in zip(n_axs, type_rows.sample(n_samples, random_state=1234).iterrows()):
        img = mpimg.imread(c_row['path'])  # Load the image from disk
        c_ax.imshow(img)
        c_ax.axis('off')
fig.savefig('category_samples.png', dpi=300)


The next cell no longer uses the in-memory `skin_df['image']` column. Instead, it reads each file listed in `skin_df['path']` with `cv2.imread(x)` and takes `.shape` of the result: `cv2.imread()` returns a NumPy array, so `.shape` gives the image dimensions. Both versions of the check are currently commented out.
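
As an alternative sketch (a swapped-in technique, not the notebook's own method), the size distribution can also be collected with PIL, which reads image dimensions from the file header without decoding every pixel. The helper name `image_size_counts` and the temporary demo files are hypothetical.

```python
import os
import tempfile
from collections import Counter
from PIL import Image

def image_size_counts(paths):
    """Count (width, height) pairs; Image.open loads pixel data lazily."""
    counts = Counter()
    for path in paths:
        with Image.open(path) as img:
            counts[img.size] += 1
    return counts

# Hypothetical demo files written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, size in enumerate([(100, 75), (100, 75), (120, 90)]):
    path = os.path.join(tmp_dir, f'demo_{i}.png')
    Image.new('RGB', size).save(path)
    paths.append(path)

size_counts = image_size_counts(paths)
print(size_counts)
```

Note that PIL reports (width, height), whereas the `.shape` of an array from `cv2.imread` is (height, width, channels).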

In [None]:
# Checking the image size distribution
# skin_df['image'].map(lambda x: x.shape).value_counts()

import cv2
from skimage import io

# Checking the image size distribution
#skin_df['path'].map(lambda x: cv2.imread(x).shape).value_counts()



In [None]:
#features=skin_df.drop(columns=['cell_type_idx', 'dx', 'cell_type'],axis=1)
#target=skin_df['cell_type_idx']

In [None]:
skin_df
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)

In [None]:
skin_df_withoutExtraVariables= skin_df.copy()

In [None]:
# We introduce an additional variable to measure the average color of the image:

import cv2
import pandas as pd
from joblib import Parallel, delayed
from skimage import io

# Define a function to calculate the average color of an image.
def calculate_mean_color(row):
    # Load the image
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # We calculate the average color. io.imread returns RGB, so cv2.mean yields (R, G, B).
    mean_colors = cv2.mean(image)[:3]

    # Return the color values in (red, green, blue) order.
    return mean_colors[0], mean_colors[1], mean_colors[2]

# Load the dataset
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the average colors in parallel.
results = Parallel(n_jobs=-1)(delayed(calculate_mean_color)(row) for _, row in skin_df.iterrows())

# Separate the results into different lists.
mean_red, mean_green, mean_blue = zip(*results)

# Add the results to the dataset.
skin_df['mean_red'] = mean_red
skin_df['mean_green'] = mean_green
skin_df['mean_blue'] = mean_blue

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
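
Channel order matters for per-channel means: `skimage.io.imread` returns arrays in RGB order, while `cv2.imread` returns BGR. The following NumPy-only sketch, on a synthetic pure-red image, shows how the same pixel data yields reversed means under the two conventions.

```python
import numpy as np

# Synthetic RGB image: pure red, shape (height, width, channels).
rgb_image = np.zeros((75, 100, 3), dtype=np.uint8)
rgb_image[..., 0] = 255

# Averaging over the two spatial axes gives one value per channel, in RGB order.
mean_red, mean_green, mean_blue = rgb_image.mean(axis=(0, 1))

# A BGR array (as returned by cv2.imread) holds the same data with channels reversed.
bgr_image = rgb_image[..., ::-1]
bgr_means = bgr_image.mean(axis=(0, 1))
print(mean_red, mean_green, mean_blue, bgr_means)
```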


In [None]:
skin_df.head()

In [None]:
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, feature, color

def calculate_texture_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Scale the values to a range of 0-255 and convert them to integers.
    gray_image = (gray_image * 255).astype(np.uint8)

    # Calculate the GLCM (Gray-Level Co-occurrence Matrix).
    glcm = feature.greycomatrix(gray_image, distances=[5], angles=[0], levels=256, symmetric=True, normed=True)

    # Calculate the texture features.
    contrast = feature.greycoprops(glcm, prop='contrast')
    homogeneity = feature.greycoprops(glcm, prop='homogeneity')
    energy = feature.greycoprops(glcm, prop='energy')

    # Return the features.
    return contrast[0, 0], homogeneity[0, 0], energy[0, 0]

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the texture features in parallel.
results = Parallel(n_jobs=-1)(delayed(calculate_texture_features)(row) for _, row in skin_df.iterrows())

# Separate the results into different lists.
contrast, homogeneity, energy = zip(*results)

# Add the results to the dataset.
skin_df['contrast'] = contrast
skin_df['homogeneity'] = homogeneity
skin_df['energy'] = energy

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
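
To make the GLCM contrast feature more concrete, here is a small NumPy sketch that builds a symmetric, normalised co-occurrence matrix for a horizontal offset and computes contrast as the sum of P(i, j) * (i - j)^2. The function `glcm_contrast` is a hypothetical illustration with only four gray levels; it is not the scikit-image implementation.

```python
import numpy as np

def glcm_contrast(gray, distance=1, levels=4):
    """Contrast of a symmetric, normalised GLCM for a horizontal offset."""
    left = gray[:, :-distance].ravel()
    right = gray[:, distance:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1)   # count co-occurring gray-level pairs
    glcm = glcm + glcm.T                # symmetric: count each pair in both directions
    glcm /= glcm.sum()                  # normalise to joint probabilities
    i, j = np.indices(glcm.shape)
    return float(np.sum(glcm * (i - j) ** 2))

flat = np.zeros((4, 4), dtype=np.int64)        # uniform patch
stripes = np.tile(np.array([0, 3]), (4, 2))    # rows of 0, 3, 0, 3
print(glcm_contrast(flat), glcm_contrast(stripes))
```

A uniform patch has zero contrast, while the striped patch, where every neighbouring pair differs by three levels, has contrast 9.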


In [None]:
skin_df.head()

In [None]:
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, filters

def calculate_edge_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Calculate the image gradient using the Sobel operator.
    edge_sobel = filters.sobel(gray_image)

    # Calculate the edge features.
    edge_mean = np.mean(edge_sobel)
    edge_std = np.std(edge_sobel)

    # Return the features.
    return edge_mean, edge_std

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the edge features in parallel.
results = Parallel(n_jobs=-1)(delayed(calculate_edge_features)(row) for _, row in skin_df.iterrows())

# Separate the results into different lists.
edge_mean, edge_std = zip(*results)

# Add the results to the dataset.
skin_df['edge_mean'] = edge_mean
skin_df['edge_std'] = edge_std

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)


In [None]:
skin_df.head()

In [None]:
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, measure, filters
from skimage.segmentation import clear_border
from scipy import ndimage as ndi

def calculate_shape_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Apply a threshold to segment the lesion.
    threshold = filters.threshold_otsu(gray_image)
    binary_image = gray_image > threshold

    # Clean up small objects in the binary image.
    binary_image = clear_border(binary_image)
    label_objects, _ = ndi.label(binary_image)
    sizes = np.bincount(label_objects.ravel())
    mask_sizes = sizes > 20
    mask_sizes[0] = 0
    clean_binary = mask_sizes[label_objects]

    # Find the contours of the lesion.
    contours = measure.find_contours(clean_binary, 0.8)

    # Ensure at least one contour has been found.
    if len(contours) != 0:
        # Choose the longest contour
        contour = max(contours, key=len)

        # Calculate the area and perimeter of the lesion.
        area = cv2.contourArea(contour.astype(np.float32))
        perimeter = cv2.arcLength(contour.astype(np.float32), True)

        # Calculate the compactness and roundness of the lesion.
        if perimeter == 0:
            compactness = 0
        else:
            compactness = 4. * np.pi * (area / (perimeter * perimeter))

        roundness = 4 * area / (np.pi * np.max(contour[:,0])**2)

        # Return the features
        return area, perimeter, compactness, roundness

    # If no contour is found, return zeros.
    else:
        return 0, 0, 0, 0

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the shape features in parallel.
results = Parallel(n_jobs=-1)(delayed(calculate_shape_features)(row) for _, row in skin_df.iterrows())

# Separate the results into different lists.
area, perimeter, compactness, roundness = zip(*results)

# Add the results to the dataset.
skin_df['area'] = area
skin_df['perimeter'] = perimeter
skin_df['compactness'] = compactness
skin_df['roundness'] = roundness

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
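
The compactness used above, 4πA / P², is the isoperimetric quotient: it equals 1 for a perfect circle and decreases as the outline becomes more irregular. A minimal sketch with analytic shapes (the standalone `compactness` helper is hypothetical) shows the expected values.

```python
import math

def compactness(area, perimeter):
    """Isoperimetric quotient 4*pi*A / P^2; returns 0 for a degenerate contour."""
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * area / perimeter ** 2

radius = 10.0
circle = compactness(math.pi * radius ** 2, 2 * math.pi * radius)
side = 10.0
square = compactness(side ** 2, 4 * side)
print(circle, square)
```

The circle gives 1 and the square gives π/4, so values well below 1 suggest an elongated or ragged border.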



In [None]:
skin_df.head()

In [None]:
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color

def calculate_fft_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Calculate the Fourier Transform of the image.
    # The fft2 function returns the two-dimensional Fourier Transform.
    # And fftshift shifts the zero-frequency component to the center of the spectrum.

    f = np.fft.fft2(gray_image)
    fshift = np.fft.fftshift(f)

    # Calculate the magnitude of the Fourier spectrum, which is typically visualized.
    # The log is used to reduce the dynamic range; otherwise, the low frequencies
    # might dominate the visualization.
    magnitude_spectrum = 20*np.log1p(np.abs(fshift))

    # Calculate the features of the Fourier spectrum.
    # In this case, we're calculating the mean and standard deviation.
    fft_mean = np.mean(magnitude_spectrum)
    fft_std = np.std(magnitude_spectrum)

    # Return the features.
    return fft_mean, fft_std

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the Fourier features in parallel.
results = Parallel(n_jobs=-1)(delayed(calculate_fft_features)(row) for _, row in skin_df.iterrows())

# Separate the results into different lists.
fft_mean, fft_std = zip(*results)

# Add the results to the dataset.
skin_df['fft_mean'] = fft_mean
skin_df['fft_std'] = fft_std

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
skin_df.head()
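
The sketch below illustrates what the Fourier features summarise, using a synthetic grayscale image (a constant background plus a horizontal sinusoid) instead of a lesion photo. After `fftshift`, the zero-frequency term sits at the centre of the spectrum and the sinusoid appears as two symmetric peaks.

```python
import numpy as np

# Synthetic grayscale image: constant background plus a horizontal sinusoid.
height, width = 75, 100
x = np.arange(width)
row = 0.5 + 0.25 * np.sin(2 * np.pi * 5 * x / width)   # 5 cycles across the width
gray = np.tile(row, (height, 1))

spectrum = np.fft.fftshift(np.fft.fft2(gray))
magnitude = 20 * np.log1p(np.abs(spectrum))

# After fftshift the zero-frequency (DC) term sits at the centre of the array.
center = (height // 2, width // 2)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(magnitude), magnitude.shape))
print(peak, magnitude.mean(), magnitude.std())
```

Because the DC term dominates, the mean and standard deviation of the log-magnitude mostly reflect overall brightness and how much energy is spread into higher frequencies.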

In [None]:
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, filters

def calculate_density(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Determine a threshold for the image.
    threshold = filters.threshold_otsu(gray_image)

    # Calculate the "density" as the proportion of pixels above the threshold.
    density = np.sum(gray_image > threshold) / np.prod(gray_image.shape)

    # Return the density.
    return density

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the density in parallel.
density = Parallel(n_jobs=-1)(delayed(calculate_density)(row) for _, row in skin_df.iterrows())

# Add the results to the dataset.
skin_df['density'] = density

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
skin_df.head()

In [None]:
# Hu Moments

import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, filters

def calculate_hu_moments(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Determine a threshold for the image.
    threshold = filters.threshold_otsu(gray_image)

    # Binarize the image.
    binary_image = gray_image > threshold

    # Calculate the moments of the image.
    moments = cv2.moments(binary_image.astype(np.uint8))

    # Calculate the Hu moments from these moments.
    hu_moments = cv2.HuMoments(moments)

    # Return the features.
    return hu_moments[:, 0]

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the Hu moments in parallel.
hu_moments = Parallel(n_jobs=-1)(delayed(calculate_hu_moments)(row) for _, row in skin_df.iterrows())

# Stack the results into a 2-D array (one row per image).
hu_moments = np.array(hu_moments)

# Add the results to the dataset.
for i in range(hu_moments.shape[1]):
    skin_df['hu_moment_' + str(i)] = hu_moments[:, i]

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
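
Hu moments are built from normalised central moments, which is what makes them insensitive to where the shape sits in the image. As a hedged illustration, the sketch below computes only the first invariant, φ₁ = η₂₀ + η₀₂, in plain NumPy and checks that translating a rectangle leaves it unchanged. `first_hu_moment` is a hypothetical helper, not the OpenCV routine.

```python
import numpy as np

def first_hu_moment(mask):
    """phi_1 = eta_20 + eta_02, from normalised central moments of a binary mask."""
    rows, cols = np.nonzero(mask)
    m00 = rows.size                            # zeroth moment = foreground pixel count
    mu20 = np.sum((cols - cols.mean()) ** 2)   # central moments about the centroid
    mu02 = np.sum((rows - rows.mean()) ** 2)
    # eta_pq = mu_pq / m00^(1 + (p + q) / 2); for p + q = 2 the exponent is 2.
    return (mu20 + mu02) / m00 ** 2

mask = np.zeros((75, 100), dtype=bool)
mask[10:30, 20:60] = True                      # 20x40 rectangle

shifted = np.roll(mask, shift=(15, 25), axis=(0, 1))   # same shape, translated
print(first_hu_moment(mask), first_hu_moment(shifted))
```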


In [None]:
skin_df.head()

In [None]:
# LAB Color Space Feature:

import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color

def calculate_lab_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to the LAB color space.
    lab_image = color.rgb2lab(image)

    # Calculate the LAB color features as the average of each component.
    lab_features = np.mean(lab_image, axis=(0, 1))

    # Return the features.
    return lab_features

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the LAB color features in parallel.
lab_features = Parallel(n_jobs=-1)(delayed(calculate_lab_features)(row) for _, row in skin_df.iterrows())

# Stack the results into a 2-D array (one row per image).
lab_features = np.array(lab_features)

# Add the results to the dataset.
for i in range(lab_features.shape[1]):
    skin_df['lab_feature_' + str(i)] = lab_features[:, i]

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
skin_df.head()

In [None]:
# Haralick Features
import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, feature, filters

def calculate_haralick_features(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Determine a threshold for the image.
    threshold = filters.threshold_otsu(gray_image)

    # Binarize the image.
    binary_image = gray_image > threshold

    # Calculate the GLCM of the binary mask and extract its contrast (the only Haralick-style property used here).
    glcm = feature.greycomatrix(binary_image.astype(np.uint8), distances=[5], angles=[0], levels=256, symmetric=True, normed=True)
    haralick_features = feature.greycoprops(glcm, 'contrast')

    # Return the features.
    return haralick_features[0]

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the Haralick features in parallel.
haralick_features = Parallel(n_jobs=-1)(delayed(calculate_haralick_features)(row) for _, row in skin_df.iterrows())

# Stack the results into a 2-D array (one row per image).
haralick_features = np.array(haralick_features)

# Add the results to the dataset.
for i in range(haralick_features.shape[1]):
    skin_df['haralick_feature_' + str(i)] = haralick_features[:, i]

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
skin_df.head()


In [None]:
# Eccentricity Features:

import cv2
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from skimage import io, color, measure, filters

def calculate_eccentricity(row):
    # Load the image.
    image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/' + row['image_id'] + '.jpg'
    image = io.imread(image_path)

    # Convert the image to grayscale.
    gray_image = color.rgb2gray(image)

    # Determine a threshold for the image.
    threshold = filters.threshold_otsu(gray_image)

    # Binarize the image.
    binary_image = gray_image > threshold

    # Label the regions of the image.
    labeled_image = measure.label(binary_image)

    # Calculate the properties of the regions.
    regions = measure.regionprops(labeled_image)

    # Select the largest region; return 0 if thresholding produced no foreground region.
    if not regions:
        return 0.0
    largest_region = max(regions, key=lambda region: region.area)

    # Calculate the eccentricity of the largest region.
    eccentricity = largest_region.eccentricity

    # Return the eccentricity.
    return eccentricity

# Load the dataset.
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

# Calculate the eccentricity in parallel.
eccentricities = Parallel(n_jobs=-1)(delayed(calculate_eccentricity)(row) for _, row in skin_df.iterrows())

# Add the results to the dataset.
skin_df['eccentricity'] = eccentricities

# Save the updated dataset.
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv', index=False)
skin_df.head()


In [None]:
skin_df.head()

In [None]:
skin_df.shape

In [None]:
skin_df= pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/HAM10000_all.csv')

In [None]:
skin_df.columns

In [None]:
features=skin_df.drop(columns=['dx','cell_type', 'cell_type_idx'],axis=1)
target=skin_df['cell_type_idx']

In [None]:
features

In [None]:
features.columns

In [None]:
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(features, target, test_size=0.20,random_state=1234, stratify=target)

In [None]:
x_train_o.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/x_train_o.csv', index=False)
x_test_o.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/x_test_o.csv', index=False)
y_train_o.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_train_o.csv', index=False)
y_test_o.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_test_o.csv', index=False)


In [None]:
# Rebuild the training and test datasets
train_df = pd.concat([x_train_o, y_train_o], axis=1)
test_df = pd.concat([x_test_o, y_test_o], axis=1)

In [None]:
train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/train_df_recons.csv', index=False)
test_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/test_df_recons.csv', index=False)

In [None]:
train_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/train_df_recons.csv')
test_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/test_df_recons.csv')
x_train_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/x_train_o.csv')
x_test_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/x_test_o.csv')
y_train_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_train_o.csv')
y_test_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_test_o.csv')

In [None]:
y_train_o

In [None]:
# Find the majority class in the training set.
class_counts = y_train_o['cell_type_idx'].value_counts()
majority_class = class_counts.idxmax()
minority_classes = y_train_o['cell_type_idx'].unique()
minority_classes = minority_classes[minority_classes != majority_class]


In [None]:
# Under-sample the majority class in the training set.
image_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/resized/'

# Create an image data generator.
datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.1, zoom_range=0.1, horizontal_flip=True, fill_mode='nearest')

# Create a directory to save the augmented images.
augmented_path = '/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/augmented/'
if not os.path.exists(augmented_path):
    os.mkdir(augmented_path)

majority_df = train_df[train_df['cell_type_idx'] == majority_class]
majority_df = resample(majority_df, replace=False, n_samples=3000)


In [None]:
majority_df

In [None]:
# Over-sample the minority classes in the training set.
minority_dfs = []
for minority_class in minority_classes:
    minority_df = train_df[train_df['cell_type_idx'] == minority_class]
    minority_df = resample(minority_df, replace=True, n_samples=1800)
    minority_dfs.append(minority_df)


In [None]:
minority_classes

In [None]:
# Combine the under-sampled and over-sampled samples.
balanced_train_df = pd.concat([majority_df] + minority_dfs)
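
The balancing strategy above combines under-sampling of the majority class with over-sampling (with replacement) of the others. The sketch below reproduces the idea on a hypothetical toy frame using `DataFrame.sample` instead of `sklearn.utils.resample`; the class sizes and target counts are made up for illustration.

```python
import pandas as pd

# Hypothetical imbalanced frame: class 0 is the majority.
toy_df = pd.DataFrame({'cell_type_idx': [0] * 50 + [1] * 5 + [2] * 8})

majority_class = toy_df['cell_type_idx'].value_counts().idxmax()
parts = []
for label, group in toy_df.groupby('cell_type_idx'):
    if label == majority_class:
        # Under-sample without replacement.
        parts.append(group.sample(n=20, replace=False, random_state=0))
    else:
        # Over-sample with replacement, so rows repeat.
        parts.append(group.sample(n=12, replace=True, random_state=0))

balanced = pd.concat(parts, ignore_index=True)
print(balanced['cell_type_idx'].value_counts().sort_index())
```

Because over-sampled rows are exact duplicates, a later random train/validation split can place copies of the same image in both subsets, which tends to inflate validation scores.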


In [None]:
balanced_train_df
balanced_train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/balanced_train_df.csv', index=False)

In [None]:
balanced_train_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/balanced_train_df.csv')

In [None]:
fig, ax1 = plt.subplots(1, 1, figsize= (10, 5))
balanced_train_df['cell_type_idx'].value_counts().plot(kind='bar', ax=ax1)

In [None]:
from google.colab import files
balanced_train_df.to_csv('balanced_train_df.csv', index=False)
files.download('balanced_train_df.csv')

In [None]:
# No augmented images are merged at this point: augmented_train_df is simply another name for balanced_train_df.
augmented_train_df = balanced_train_df

In [None]:
augmented_train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/augmented_train_df.csv', index=False)

In [None]:
augmented_train_df

In [None]:
from google.colab import files
augmented_train_df.to_csv('augmented_train_df.csv', index=False)
files.download('augmented_train_df.csv')

In [None]:
augmented_train_df

In [None]:
x_test_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/x_test_o.csv')
y_test_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_test_o.csv')

In [None]:
augmented_train_df

In [None]:
def load_images(image_paths):
    images = []
    for img_path in image_paths:
        img = cv2.imread(img_path)
        if img is not None:
            images.append(img)
        else:
            print(f'Failed to load image: {img_path}')
    return np.array(images)


In [None]:
import cv2
from skimage import io

x_train_balanced = load_images(augmented_train_df['path'].tolist())
y_train_balanced = augmented_train_df['cell_type_idx'].tolist()  # Using 'cell_type_idx' as the label column.



In [None]:
# Load images for testing from x_test_o
x_test = load_images(x_test_o['path'].tolist())
y_test = y_test_o['cell_type_idx'].tolist()  # Likewise for the test data

# Calculate mean and standard deviation for training images
x_train_mean = np.mean(x_train_balanced, axis=(0, 1, 2))
x_train_std = np.std(x_train_balanced, axis=(0, 1, 2))

# Normalize both training and testing images with the same mean and standard deviation
x_train_balanced = (x_train_balanced - x_train_mean)/x_train_std
x_test = (x_test - x_train_mean)/x_train_std
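
The normalisation above computes statistics on the training images only and reuses them for the test images, which avoids leaking test information into preprocessing. A minimal sketch on random arrays (hypothetical data with the same layout of images, height, width, channels) shows the per-channel broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical image batches, shape (n_images, height, width, channels).
train = rng.integers(0, 256, size=(8, 75, 100, 3)).astype(np.float64)
test = rng.integers(0, 256, size=(4, 75, 100, 3)).astype(np.float64)

# Statistics come from the training set only, one value per channel.
channel_mean = train.mean(axis=(0, 1, 2))
channel_std = train.std(axis=(0, 1, 2))

# Broadcasting applies the (3,) vectors across the last axis of both arrays.
train_norm = (train - channel_mean) / channel_std
test_norm = (test - channel_mean) / channel_std
print(train_norm.mean(axis=(0, 1, 2)), train_norm.std(axis=(0, 1, 2)))
```

The training set ends up with zero mean and unit variance per channel; the test set will be close to, but not exactly at, those values.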


In [None]:
augmented_train_df

In [None]:
augmented_train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/augmented_train_df.csv', index=False)
balanced_train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/balanced_train_df.csv', index=False)
train_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/train_df.csv', index=False)
test_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/test_df.csv', index=False)
skin_df.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/skin_df.csv', index=False)
y_test_o.to_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_test_o.csv', index=False)

In [None]:
augmented_train_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/augmented_train_df.csv')
balanced_train_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/balanced_train_df.csv')
train_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/train_df.csv')
test_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/test_df.csv')
skin_df = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/skin_df.csv')
y_test_o = pd.read_csv('/content/drive/MyDrive/TFM Cancer AI/TFM/Datos cáncer piel/Exports/y_test_o.csv')

In [None]:
balanced_train_df

In [None]:
# Perform one-hot encoding on the labels

# Extract the target variable from the balanced training set
y_train_balanced = augmented_train_df['cell_type_idx']

# Perform one-hot encoding on the labels
y_train = to_categorical(y_train_balanced, num_classes = 7)
y_test = to_categorical(y_test_o, num_classes = 7)
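
For readers unfamiliar with one-hot encoding, the following NumPy sketch shows the same transformation on three labels: each class index selects one row of a 7x7 identity matrix. The `one_hot` helper is a hypothetical stand-in for illustration, not the Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    labels = np.asarray(labels, dtype=int).ravel()
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([0, 4, 6], num_classes=7)
print(encoded)
```

Taking `argmax` along the last axis recovers the original integer labels.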

In [None]:
x_train, x_validate, y_train, y_validate = train_test_split(x_train_balanced, y_train, test_size=0.1, random_state=2, stratify=y_train_balanced)

In [None]:
y_train.shape

In [None]:
!pip install tensorflow-addons

In [None]:
# One-hot encoding for 'sex' and 'localization'.
augmented_train_df = pd.get_dummies(augmented_train_df, columns=['sex', 'localization'])

In [None]:
test_df = pd.get_dummies(test_df, columns=['sex', 'localization'])
augmented_test_df = test_df
augmented_test_df
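
When `pd.get_dummies` is applied to the training and test frames separately, the resulting columns can differ if a category appears in only one of them. The sketch below, on hypothetical toy frames, shows one way to align the test columns with the training columns using `reindex`.

```python
import pandas as pd

# Hypothetical frames: 'ear' appears only in train, 'foot' only in test.
train = pd.DataFrame({'sex': ['male', 'female'], 'localization': ['back', 'ear']})
test = pd.DataFrame({'sex': ['female', 'male'], 'localization': ['back', 'foot']})

train_enc = pd.get_dummies(train, columns=['sex', 'localization'])
test_enc = pd.get_dummies(test, columns=['sex', 'localization'])

# Reindex test to the training columns: missing categories become 0,
# and categories unseen in training are dropped.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_aligned.columns))
```

This keeps the feature matrix passed to the model the same width for both splits.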

In [None]:
from tensorflow.keras import backend as K

def f1_score(y_true, y_pred):
    # Calculation of precision and recall
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))

    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())

    # We calculate F1 score
    f1_val = 2 * (precision * recall) / (precision + recall + K.epsilon())
    return f1_val
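
The backend-based function above pools true positives over all classes and samples in a batch. A NumPy mirror of the same arithmetic, applied to a small hypothetical batch, shows what it computes; the function name `batch_f1` and the example values are illustrative assumptions.

```python
import numpy as np

def batch_f1(y_true, y_pred, eps=1e-7):
    """NumPy mirror of the backend-based F1: counts are pooled over all classes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # A prediction counts only if its probability rounds to 1 (i.e. exceeds 0.5)
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + eps)
    recall = true_positives / (possible_positives + eps)
    return 2 * precision * recall / (precision + recall + eps)

# Hypothetical batch: 4 samples, 3 classes; the third sample is misclassified
y_true_example = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred_example = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

f1_example = batch_f1(y_true_example, y_pred_example)
print(f"{f1_example:.4f}")  # 0.7500
```

Here three of the four samples are true positives, four entries are predicted positive, and four are actually positive, so precision and recall are both 3/4. Note that a softmax output whose largest probability stays below 0.5 contributes no predicted positive at all, which is one way this batch-level metric can differ from the weighted F1-score computed with scikit-learn.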

In [None]:
## Model 5 in the Thesis Document
import numpy as np
import tensorflow_addons as tfa
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization, Input, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import backend as K

# Metric F1 score
def f1_metric(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    f1_val = 2 * (precision * recall) / (precision + recall + K.epsilon())
    return f1_val

# Load the pretrained DenseNet169 model.
densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')
intermediate_layer_model = Model(inputs=densenet_model.input, outputs=densenet_model.get_layer('conv5_block32_concat').output)

# Extract features using the intermediate layer.
x_train_features = intermediate_layer_model.predict(x_train)
x_validate_features = intermediate_layer_model.predict(x_validate)

# Hybrid model architecture.
input_images = Input(shape=x_train_features.shape[1:])
flatten = Flatten()(input_images)
dense1 = Dense(256, activation='relu')(flatten)
bn1 = BatchNormalization()(dense1)
dropout1 = Dropout(0.5)(bn1)
dense2 = Dense(128, activation='relu')(dropout1)
bn2 = BatchNormalization()(dense2)
dropout2 = Dropout(0.5)(bn2)

input_features = Input(shape=(augmented_train_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).shape[1],))
concat = concatenate([dropout2, input_features])
dense3 = Dense(64, activation='relu')(concat)
dropout3 = Dropout(0.5)(dense3)
output = Dense(7, activation='softmax')(dropout3)

densenet_hybrid_model = Model(inputs=[input_images, input_features], outputs=output)
densenet_hybrid_model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy', f1_metric])

# Data Augmentation
datagen = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.1,
    width_shift_range=0.2,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True
)

features_all = augmented_train_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).values
features_train, features_validate = train_test_split(features_all, test_size=0.1, random_state=2, stratify=y_train_balanced)

#Training.
densenet_hybrid_model.fit(datagen.flow([x_train_features, features_train], y_train, batch_size=30),
                          epochs=30,
                          validation_data=([x_validate_features, features_validate], y_validate))

# Evaluation and prediction.
x_test_features = intermediate_layer_model.predict(x_test)
y_pred = densenet_hybrid_model.predict([x_test_features, augmented_test_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).values])
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

# F1-Score
f1 = f1_score(y_true, y_pred_classes, average='weighted')
print(f"F1-Score: {f1:.4f}")

In [None]:
# We verify the data dimensions
print("x_train_balanced:", x_train_balanced.shape)
print("x_train_features:", x_train_features.shape)
print("features_train:", features_train.shape)
print("y_train_img:", y_train_img.shape)

print("\nx_validate_images:", x_validate_images.shape)
print("x_validate_features:", x_validate_features.shape)
print("features_validate:", features_validate.shape)
print("y_validate_img:", y_validate_img.shape)


In [None]:
print(augmented_train_df.columns)

In [None]:
augmented_train_df

In [None]:
# Model 4 of the Thesis Document. Without using any pre-trained model.

from keras.regularizers import l2
import tensorflow_addons as tfa

def create_model():
    input_shape = (75, 100, 3)
    num_classes = 7
    l2_reg_rate = 0.01

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same',
                    input_shape=input_shape, kernel_regularizer=l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same',
                    kernel_regularizer=l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), activation='relu', padding='Same',
                    kernel_regularizer=l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Conv2D(64, (3, 3), activation='relu', padding='Same',
                    kernel_regularizer=l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(0.40))

    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model


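# Define the optimizer. Note that create_model() compiles with the default 'adam'
# string, so this instance is not passed to the model.
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)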

# Set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# With data augmentation to prevent overfitting
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images

datagen.fit(x_train)

from sklearn.model_selection import StratifiedKFold
epochs = 50
batch_size = 20
# Define 3-fold cross validation
kfold = StratifiedKFold(n_splits=3, shuffle=True)

# Convert labels from categorical to integer for stratifiedKFold
y_train_int = np.argmax(y_train, axis=1)

cvscores = []
for train, test in kfold.split(x_train, y_train_int):
  # create model
  model = create_model()  # Create model should return a new instance of your model

  # data augmentation for current fold
  datagen.fit(x_train[train])

  # Fit the model with current fold
  history = model.fit(datagen.flow(x_train[train], y_train[train], batch_size=batch_size),
                                epochs=epochs, validation_data=(x_train[test], y_train[test]),
                                verbose=1, steps_per_epoch=x_train[train].shape[0] // batch_size,
                                callbacks=[learning_rate_reduction])

  # evaluate the model with current fold
  scores = model.evaluate(x_train[test], y_train[test], verbose=0)
  print(f"{model.metrics_names[1]}: {[f'{i:.2f}%' for i in scores[1]*100]}")
  cvscores.append(scores[1] * 100)

print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))

loss, accuracy = model.evaluate(x_test, y_test, verbose=1)
loss_v, accuracy_v = model.evaluate(x_validate, y_validate, verbose=1)
print("Validation: f1 = %f  ;  loss_v = %f" % (np.mean(accuracy_v), np.mean(loss_v)))
print("Test: f1 = %f  ;  loss = %f" % (np.mean(accuracy), np.mean(loss)))


# Save the model
model.save("model.h5")

loss, accuracy = model.evaluate(x_test, y_test, verbose=1)
loss_v, accuracy_v = model.evaluate(x_validate, y_validate, verbose=1)
print("Validation: f1 = %s ; loss_v = %f" % ([f'{i:.2f}%' for i in accuracy_v*100], loss_v))
print("Test: f1 = %s ; loss = %f" % ([f'{i:.2f}%' for i in accuracy*100], loss))
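
With `tfa.metrics.F1Score(num_classes=7)` and its default `average=None`, each fold yields a vector of per-class F1 values rather than a single number. Calling `np.std` on the list of vectors then mixes the spread between classes with the spread between folds. A sketch of summarizing the folds separately follows; the numbers are illustrative placeholders, not results from the thesis, and `summarize_folds` is a hypothetical helper.

```python
import numpy as np

# Placeholder per-class F1 vectors, one per fold (illustrative numbers only).
# Each vector has one entry per lesion class.
fold_scores = [
    np.array([0.50, 0.60, 0.70, 0.40, 0.80, 0.90, 0.60]),
    np.array([0.55, 0.65, 0.65, 0.45, 0.75, 0.95, 0.55]),
    np.array([0.45, 0.55, 0.75, 0.35, 0.85, 0.85, 0.65]),
]

def summarize_folds(per_class_scores):
    """Return (macro F1 per fold, mean across folds, std across folds)."""
    stacked = np.vstack(per_class_scores)   # shape: (n_folds, n_classes)
    macro_per_fold = stacked.mean(axis=1)   # one scalar per fold
    return macro_per_fold, macro_per_fold.mean(), macro_per_fold.std()

macro, mean_f1, std_f1 = summarize_folds(fold_scores)
print("Macro F1 per fold:", np.round(macro, 4))
print(f"Mean: {mean_f1:.4f} (+/- {std_f1:.4f})")
```

Reducing each fold to one macro value first means the reported standard deviation reflects only fold-to-fold variation.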

The next cell (Model 1) loads DenseNet169 pretrained on ImageNet, without its classification head, and uses it as a fixed feature extractor for the training and validation images. The extracted feature maps are flattened and passed to an MLP (Multi-Layer Perceptron) with two hidden layers (128 and 64 units) and a 7-way softmax output. The MLP is trained for 12 epochs and then evaluated on the test set with the weighted F1-score.

In [None]:
# Model 1 in the Thesis Document
import numpy as np
import tensorflow_addons as tfa
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load pre-trained DenseNet169 model
densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

# Extract features using the pretrained DenseNet169 model.
x_train_features = densenet_model.predict(x_train)
x_validate_features = densenet_model.predict(x_validate)

# Flatten the feature data.
x_train_flattened = x_train_features.reshape(x_train_features.shape[0], -1)
x_validate_flattened = x_validate_features.reshape(x_validate_features.shape[0], -1)

# Define the MLP (Multi-Layer Perceptron) model for classification.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(x_train_flattened.shape[1],)))
model.add(Dense(64, activation='relu'))
model.add(Dense(7, activation='softmax'))

# Compile the model using the F1-score metric.
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=7)])

# Train the model
model.fit(x_train_flattened, y_train, validation_data=(x_validate_flattened, y_validate), epochs=12, batch_size=30)

# Predict the labels for the test set.
x_test_features = densenet_model.predict(x_test)
x_test_flattened = x_test_features.reshape(x_test_features.shape[0], -1)
y_pred = model.predict(x_test_flattened)
y_pred = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

# Calculate F1 Score
f1 = f1_score(y_true, y_pred, average='weighted')
print(f"F1-Score: {f1}")

In [None]:
print(x_train_features.shape)
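
The shape printed above can be anticipated by hand. The sketch below estimates the output shape of DenseNet169's last dense block for a 75 x 100 input, assuming the standard Keras layout (7x7 stride-2 stem, 3x3 stride-2 max pool, growth rate 32, and three transitions that halve channels and spatial size). The helper name is hypothetical.

```python
def densenet169_feature_shape(height, width):
    """Estimate the (H, W, C) shape of DenseNet169's last dense-block output.

    Assumes the standard Keras layout: 7x7 stride-2 stem with padding 3,
    3x3 stride-2 max pool with padding 1, then three transitions that each
    apply a 2x2 stride-2 average pool and halve the channel count.
    """
    def stem(size):
        size = (size + 2 * 3 - 7) // 2 + 1   # 7x7 conv, stride 2, padding 3
        return (size + 2 * 1 - 3) // 2 + 1   # 3x3 max pool, stride 2, padding 1

    h, w = stem(height), stem(width)
    channels, growth_rate = 64, 32
    blocks = [6, 12, 32, 32]                 # dense-block sizes for DenseNet169
    for i, n_layers in enumerate(blocks):
        channels += n_layers * growth_rate   # each layer concatenates 32 new maps
        if i < len(blocks) - 1:              # a transition follows all but the last block
            channels //= 2
            h, w = h // 2, w // 2
    return h, w, channels

shape = densenet169_feature_shape(75, 100)
print(shape)                            # (2, 3, 1664)
print(shape[0] * shape[1] * shape[2])   # 9984 values per image after flattening
```

A 2 x 3 spatial grid is very coarse, which is worth keeping in mind when image-style augmentation is applied to these feature maps instead of the original images.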

In [None]:
#IMPROVED: In this code, the pretrained DenseNet169 model is initially loaded and employed to extract features from the training and validation images.
#Subsequently, the features are flattened, and an MLP (Multi-Layer Perceptron) model is applied for classification. Ultimately, the model is trained and assessed on the test set.
#Model 2 in the Thesis Document

import numpy as np
import tensorflow_addons as tfa
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the pretrained DenseNet169 model.
densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

# Obtain the intermediate layer.
intermediate_layer_model = Model(inputs=densenet_model.input, outputs=densenet_model.get_layer('conv5_block32_concat').output)

# Extract features using the intermediate layer.
x_train_features = intermediate_layer_model.predict(x_train)
x_validate_features = intermediate_layer_model.predict(x_validate)

# Define the MLP (Multi-Layer Perceptron) model for classification.
model = Sequential()
model.add(Flatten(input_shape=x_train_features.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))

# Compile the model with the F1-score metric.
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=[tfa.metrics.F1Score(num_classes=7)])

# Data Augmentation.
datagen = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.1,
    width_shift_range=0.2,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True
)

# Fit the data generator to the training set.
datagen.fit(x_train_features)

# Train the model with data augmentation.
model.fit(datagen.flow(x_train_features, y_train, batch_size=30),
          validation_data=(x_validate_features, y_validate),
          epochs=30)

# Predict labels for the test set.
x_test_features = intermediate_layer_model.predict(x_test)
y_pred = model.predict(x_test_features)
y_pred = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

# Calculate the F1-score.
f1 = f1_score(y_true, y_pred, average='weighted')
print(f"F1-Score: {f1}")



In [None]:
# Model 3 in the Thesis document. Using Densenet pre-trained model

import tensorflow as tf
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow_addons as tfa

def create_model():
    num_classes = 7
    l2_reg_rate = 0.01

    # Load a fresh pretrained DenseNet169 on every call, so that weights
    # fine-tuned in one fold do not carry over into the next fold.
    densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

    model = Sequential()
    model.add(densenet_model)
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model

# Set a learning rate annealer.
learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# Use data augmentation to prevent overfitting.

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images

datagen.fit(x_train)

from sklearn.model_selection import StratifiedKFold
epochs = 50
batch_size = 20
# Define 3-fold cross validation
kfold = StratifiedKFold(n_splits=3, shuffle=True)

# Convert labels from categorical to integer for stratifiedKFold
y_train_int = np.argmax(y_train, axis=1)

cvscores = []
for train, test in kfold.split(x_train, y_train_int):
  # create model
  model = create_model()  # Create model should return a new instance of your model

  # data augmentation for current fold
  datagen.fit(x_train[train])

  # Fit the model with current fold
  history = model.fit(datagen.flow(x_train[train], y_train[train], batch_size=batch_size),
                                epochs=epochs, validation_data=(x_train[test], y_train[test]),
                                verbose=1, steps_per_epoch=x_train[train].shape[0] // batch_size,
                                callbacks=[learning_rate_reduction])

  # evaluate the model with current fold
  scores = model.evaluate(x_train[test], y_train[test], verbose=0)
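
The cross-validation loop relies on `StratifiedKFold` to keep the class proportions similar in every fold. A small sketch with hypothetical imbalanced labels illustrates this. The `random_state` argument is an assumption added here so that the folds are reproducible; the loop above does not set it.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced integer labels: 12 samples of class 0, 6 of class 1
labels = np.array([0] * 12 + [1] * 6)
samples = np.arange(len(labels)).reshape(-1, 1)  # stand-in for the image arrays

# random_state is an assumption added so the split is the same on every run
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

fold_class_counts = []
for train_idx, val_idx in kfold.split(samples, labels):
    # Count how many samples of each class land in this validation fold
    counts = np.bincount(labels[val_idx], minlength=2)
    fold_class_counts.append(counts.tolist())

print(fold_class_counts)  # every fold holds 4 samples of class 0 and 2 of class 1
```

Each validation fold keeps the 2:1 class ratio of the full label set, which matters for a dataset where some lesion types are rare.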

In [None]:
# We evaluate in the test subset
test_scores = model.evaluate(x_test, y_test, verbose=1)

print(f"Test loss: {test_scores[0]}")
print(f"Test F1-score: {test_scores[1]}")

In [None]:
# Model 6 in the Thesis Document = Model 3+ using extra features

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout, BatchNormalization, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import tensorflow_addons as tfa
from sklearn.model_selection import train_test_split, KFold

epochs = 50
batch_size = 20

# Load the pretrained DenseNet169 model.
densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

# Extract features using the pretrained DenseNet169 model.
intermediate_layer_model = Model(inputs=densenet_model.input, outputs=densenet_model.get_layer('conv5_block32_concat').output)

# Obtain the training features.
x_train_features = intermediate_layer_model.predict(x_train)

# Hybrid model architecture.

# Image portion.
input_images = Input(shape=x_train_features.shape[1:])
flatten = Flatten()(input_images)
dense_img = Dense(128, activation='relu')(flatten)
bn_img = BatchNormalization()(dense_img)
dropout_img = Dropout(0.5)(bn_img)

# Additional features portion.
input_features = Input(shape=(augmented_train_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).shape[1],))

# Concatenation.
concat = concatenate([dropout_img, input_features])
dense_concat = Dense(64, activation='relu')(concat)
dropout_concat = Dropout(0.5)(dense_concat)
output = Dense(7, activation='softmax')(dropout_concat)

# Create the model.
hybrid_model = Model(inputs=[input_images, input_features], outputs=output)
hybrid_model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=7)])

# Data augmentation (same settings as Model 3).
datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=False,
    vertical_flip=False)

datagen.fit(x_train)

features_all = augmented_train_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).values
features_train, features_validate = train_test_split(features_all, test_size=0.1, random_state=2, stratify=y_train_balanced)

# Initialize KFold.
kfold = KFold(n_splits=3, shuffle=True, random_state=42)
y_train_int = np.argmax(y_train, axis=1)

# Training with KFold.
for train, test in kfold.split(x_train, y_train_int):

    # Data augmentation for the current fold.
    datagen.fit(x_train[train])

    history = hybrid_model.fit(datagen.flow([x_train_features[train], features_train[train]], y_train[train], batch_size=batch_size),
                               epochs=epochs, validation_data=([x_train_features[test], features_train[test]], y_train[test]),
                               verbose=1, steps_per_epoch=x_train[train].shape[0] // batch_size,
                               callbacks=[ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=1, min_lr=0.00001)])

    scores = hybrid_model.evaluate([x_train_features[test], features_train[test]], y_train[test], verbose=0)

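# Evaluate on the test set. The hybrid model expects two inputs:
# the DenseNet feature maps and the tabular features.
x_test_features = intermediate_layer_model.predict(x_test)
features_test = augmented_test_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).values
test_scores = hybrid_model.evaluate([x_test_features, features_test], y_test, verbose=1)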

print(f"Test loss: {test_scores[0]}")
print(f"Test F1-score: {test_scores[1]}")

In [None]:
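# The hybrid model expects two inputs: DenseNet feature maps and tabular features.
test_scores = hybrid_model.evaluate(
    [intermediate_layer_model.predict(x_test),
     augmented_test_df.drop(columns=['lesion_id', 'image_id', 'dataset', 'cell_type_idx', 'path', 'dx_type']).values],
    y_test, verbose=1)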

print(f"Test loss: {test_scores[0]}")
print(f"Test F1-score: {test_scores[1]}")


In [None]:
import tensorflow as tf
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense, BatchNormalization, Dropout, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from sklearn.model_selection import StratifiedKFold
import tensorflow_addons as tfa
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

num_classes = 7
l2_reg_rate = 0.01

# Number of tabular features fed to the second input. This assumes that
# features_train (built in the Model 6 cell) holds the columns used below.
extra_feature_shape = features_train.shape[1]

def create_model():
    """Build a fresh hybrid model: DenseNet169 image branch plus tabular features."""
    # Load the pre-trained DenseNet169 model
    densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')
    input_images = Input(shape=(75, 100, 3))
    densenet_features = densenet_model(input_images)
    flatten_features = Flatten()(densenet_features)

    # Additional features
    input_features = Input(shape=(extra_feature_shape,))
    concatenated_features = concatenate([flatten_features, input_features])

    # Build the hybrid model
    dense1 = Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate))(concatenated_features)
    bn1 = BatchNormalization()(dense1)
    dropout1 = Dropout(0.5)(bn1)
    output = Dense(num_classes, activation='softmax')(dropout1)

    hybrid_model = Model(inputs=[input_images, input_features], outputs=output)
    hybrid_model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])
    return hybrid_model

# Set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# With data augmentation to prevent overfitting
datagen = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=False,
    vertical_flip=False
)
datagen.fit(x_train)

epochs = 50
batch_size = 20

# Define 3-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True)

cvscores = []
for train, test in kfold.split(x_train, y_train_int):
    # Create a new instance of the hybrid model
    model = create_model()

    # Data augmentation for the current fold
    datagen.fit(x_train[train])

    # Fit the model with the current fold
    history = model.fit(
        datagen.flow([x_train[train], features_train[train]], y_train[train], batch_size=batch_size),
        epochs=epochs, validation_data=([x_train[test], features_train[test]], y_train[test]),
        verbose=1, steps_per_epoch=x_train[train].shape[0] // batch_size,
        callbacks=[learning_rate_reduction]
    )

    # Evaluate the model with the current fold
    scores = model.evaluate([x_train[test], features_train[test]], y_train[test], verbose=0)
    cvscores.append(scores[1] * 100)

# Calculate the mean and standard deviation of F1 scores from cross-validation
mean_f1_score = np.mean(cvscores)
std_f1_score = np.std(cvscores)
print("Mean F1-Score: {:.2f}%, Std F1-Score: {:.2f}%".format(mean_f1_score, std_f1_score))


In [None]:
'''
# NOT USED DUE TO TAKING TOO LONG TO TRAIN. Using the pretrained Xception model and also using k-folds.

import numpy as np
from sklearn.model_selection import KFold
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the base model (Xception).
def create_model():
    base_model = Xception(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

    # Obtain the intermediate layer.
    intermediate_layer_model = Model(inputs=base_model.input, outputs=base_model.layers[-2].output)

    # Define the MLP model for classification.
    model = Sequential()
    model.add(Flatten(input_shape=(3, 3, 2048)))  # Adjust the shape according to Xception's output.
    model.add(Dense(256, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(7, activation='softmax'))

    # Compile the model using the F1-score metric.
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=[tfa.metrics.F1Score(num_classes=7)])

    return intermediate_layer_model, model

# Data Augmentation.
datagen = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.1,
    width_shift_range=0.2,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True
)

# k-fold implementation.
n_splits = 3
kf = KFold(n_splits=n_splits)
fold = 1

for train_idx, validate_idx in kf.split(x_train):
    print(f"Training on fold {fold}")

    x_train_fold = x_train[train_idx]
    y_train_fold = y_train[train_idx]
    x_validate_fold = x_train[validate_idx]
    y_validate_fold = y_train[validate_idx]

    # Create the model and extract features.
    intermediate_layer_model, model = create_model()
    x_train_features = intermediate_layer_model.predict(x_train_fold)
    x_validate_features = intermediate_layer_model.predict(x_validate_fold)

    # Fit the data generator to the training set.
    datagen.fit(x_train_features)

    # Train the model with data augmentation.
    model.fit(datagen.flow(x_train_features, y_train_fold, batch_size=500),
              validation_data=(x_validate_features, y_validate_fold),
              epochs=20)

    fold += 1
'''

In [None]:
# Model 5 in the Thesis Document. I removed k-folds because it takes too long to train.

import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the base model (Xception).
def create_model():
    base_model = Xception(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

    # Obtain the intermediate layer.
    intermediate_layer_model = Model(inputs=base_model.input, outputs=base_model.layers[-2].output)

    # Define the MLP model for classification.
    model = Sequential()
    model.add(Flatten(input_shape=(3, 3, 2048)))  # Adjust the shape according to Xception's output.
    model.add(Dense(256, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(7, activation='softmax'))

    # Compile the model using the F1-score metric.
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.5), metrics=[tfa.metrics.F1Score(num_classes=7)])

    return intermediate_layer_model, model

# Data Augmentation.
datagen = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.1,
    width_shift_range=0.2,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True
)

# Create the model and extract features.
intermediate_layer_model, model = create_model()
x_train_features = intermediate_layer_model.predict(x_train)

# Fit the data generator to the training set.
datagen.fit(x_train_features)

# Train the model with data augmentation.
model.fit(datagen.flow(x_train_features, y_train, batch_size=50),
          epochs=20)


In [None]:
# Process the test images through the intermediate model.
x_test_features = intermediate_layer_model.predict(x_test)

# Obtain predictions from the MLP model.
predictions = model.predict(x_test_features)

# Convert the predictions into labels.
predicted_labels = np.argmax(predictions, axis=1)
true_labels = np.argmax(y_test, axis=1)

accuracy = np.sum(predicted_labels == true_labels) / len(true_labels)
print(f"Accuracy on test set: {accuracy:.4f}")
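
Accuracy alone can look favorable on an imbalanced test set, which is why the other models report an F1-score. The following NumPy sketch computes both metrics from integer labels; the helper name and the example labels are hypothetical, and the weighted F1 here follows the usual support-weighted average of per-class F1 values.

```python
import numpy as np

def accuracy_and_weighted_f1(true_labels, predicted_labels, num_classes):
    """Return (accuracy, support-weighted F1) for integer class labels."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    accuracy = float(np.mean(true_labels == predicted_labels))
    weighted_f1 = 0.0
    for c in range(num_classes):
        tp = np.sum((predicted_labels == c) & (true_labels == c))
        fp = np.sum((predicted_labels == c) & (true_labels != c))
        fn = np.sum((predicted_labels != c) & (true_labels == c))
        denom = 2 * tp + fp + fn
        f1_c = 2 * tp / denom if denom > 0 else 0.0  # F1 is 0 for an empty class
        support = np.sum(true_labels == c)           # number of true samples of class c
        weighted_f1 += f1_c * support / len(true_labels)
    return accuracy, float(weighted_f1)

# Hypothetical labels: 8 samples of class 0 and 2 of class 1;
# the classifier predicts class 0 for every sample.
true_example = [0] * 8 + [1] * 2
pred_example = [0] * 10
acc, wf1 = accuracy_and_weighted_f1(true_example, pred_example, num_classes=2)
print(f"Accuracy: {acc:.4f}, weighted F1: {wf1:.4f}")
```

In this example the classifier never detects the minority class, yet accuracy is 0.8, while the weighted F1 drops to about 0.71 because the missed class contributes an F1 of zero.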


In [None]:
'''# Model 8. NOT USED because training took too long and the Google Colab session crashed.

efficientnet_model = EfficientNetB0(include_top=False, input_shape=(75, 100, 3), weights='imagenet')
densenet_model = DenseNet169(include_top=False, input_shape=(75, 100, 3), weights='imagenet')
resnet_model = ResNet152(include_top=False, input_shape=(75, 100, 3), weights='imagenet')
inception_model = InceptionV3(include_top=False, input_shape=(75, 100, 3), weights='imagenet')

def create_efficientnet_model(params):
    num_classes = 7
    l2_reg_rate = params['l2_reg_rate']
    dropout_rate = params['dropout_rate']

    model = Sequential()
    model.add(efficientnet_model)
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model

def create_densenet_model(params):
    num_classes = 7
    l2_reg_rate = params['l2_reg_rate']
    dropout_rate = params['dropout_rate']

    model = Sequential()
    model.add(densenet_model)
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model

def create_resnet_model(params):
    num_classes = 7
    l2_reg_rate = params['l2_reg_rate']
    dropout_rate = params['dropout_rate']

    model = Sequential()
    model.add(resnet_model)
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model

def create_inception_model(params):
    num_classes = 7
    l2_reg_rate = params['l2_reg_rate']
    dropout_rate = params['dropout_rate']

    model = Sequential()
    model.add(inception_model)
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_rate)))
    model.add(BatchNormalization())
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=[tfa.metrics.F1Score(num_classes=num_classes)])

    return model

# Set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# With data augmentation to prevent overfitting
datagen = ImageDataGenerator(
        featurewise_center=False,
        samplewise_center=False,
        featurewise_std_normalization=False,
        samplewise_std_normalization=False,
        zca_whitening=False,
        rotation_range=30,
        zoom_range=0.1,
        width_shift_range=0.2,
        height_shift_range=0.1,
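        horizontal_flip=True,
        vertical_flip=True)

# fit() only computes statistics for featurewise normalization or ZCA whitening,
# which are all disabled above, so this call does not change the generator.
datagen.fit(x_train)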

# Define hyperparameters for grid search
hyperparams_grid = {
    'l2_reg_rate': [0.01, 0.001],
    'dropout_rate': [0.3, 0.4, 0.5]
}

# Define number of epochs, batch size, and kfolds
epochs = 20
batch_size = 15
kfolds = 2

# Convert labels from categorical to integer for stratifiedKFold
y_train_int = np.argmax(y_train, axis=1)

# Grid search for each model
models = {
    "EfficientNet": (create_efficientnet_model, efficientnet_model),
    "DenseNet": (create_densenet_model, densenet_model),
    "ResNet": (create_resnet_model, resnet_model),
    "InceptionNet": (create_inception_model, inception_model)
}

results = {}

for model_name, (create_model_func, base_model) in models.items():
    best_score = 0
    best_params = {}
    best_model = None
    for params in ParameterGrid(hyperparams_grid):
        cvscores = []
        # Fixed seed so every hyperparameter combination is scored on the same folds
        kfold = StratifiedKFold(n_splits=kfolds, shuffle=True, random_state=123)
        for train, test in kfold.split(x_train, y_train_int):
            # Build and compile a new classification head for this fold.
            # The pretrained base (base_model) is a shared object, so its weights
            # carry over between folds unless it is frozen or rebuilt.
            model = create_model_func(params)

            # data augmentation for current fold
            datagen.fit(x_train[train])

            # Fit the model on the augmented training part of the current fold
            history = model.fit(datagen.flow(x_train[train], y_train[train], batch_size=batch_size),
                                epochs=epochs, validation_data=(x_train[test], y_train[test]),
                                verbose=1, steps_per_epoch=x_train[train].shape[0] // batch_size,
                                callbacks=[learning_rate_reduction])

            # Evaluate on the held-out part of the fold; scores[0] is the loss, scores[1] the F1 metric
            scores = model.evaluate(x_train[test], y_train[test], verbose=0)
            cvscores.append(scores[1] * 100)

        mean_score = np.mean(cvscores)
        if mean_score > best_score:
            best_score = mean_score
            best_params = params
            # This keeps the model from the last fold of the best configuration
            best_model = model

    results[model_name] = {
        'best_score': best_score,
        'best_params': best_params
    }

# Create a DataFrame with the results
results_df = pd.DataFrame(results).T
results_df.index.name = "Model"
results_df.columns = ['Best Score', 'Best Params']

# Print and save the results
print(results_df)
results_df.to_csv('grid_search_results.csv')
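
The size of the search above follows directly from the grid and the fold count. As a rough check of the training budget, the sketch below enumerates the same combinations with `itertools.product` (used here as a stand-in for `ParameterGrid`) and multiplies by the number of folds, architectures and epochs. The variable names `n_architectures`, `fits_per_architecture`, `total_fits` and `total_epochs` are introduced only for this illustration.

```python
import itertools

# Values copied from the grid-search cell above
hyperparams_grid = {
    'l2_reg_rate': [0.01, 0.001],
    'dropout_rate': [0.3, 0.4, 0.5],
}
n_architectures = 4   # EfficientNet, DenseNet, ResNet, InceptionNet
kfolds = 2
epochs = 20

# Every combination of the listed values (the same set that ParameterGrid enumerates)
combinations = [dict(zip(hyperparams_grid, values))
                for values in itertools.product(*hyperparams_grid.values())]

# One model is trained per combination and per fold
fits_per_architecture = len(combinations) * kfolds
total_fits = n_architectures * fits_per_architecture
total_epochs = total_fits * epochs

print(len(combinations), fits_per_architecture, total_fits, total_epochs)
# prints: 6 12 48 960
```

With these settings the search trains 48 models for up to 960 epochs in total, which is worth keeping in mind before widening the grid or raising `kfolds`.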

In [None]:
!pip install tensorflow-addons

In [None]:
import h5py
import numpy as np
from contextlib import redirect_stdout
from io import StringIO

# Redirect the printed summary to a string buffer
buffer = StringIO()
with redirect_stdout(buffer):
    model.summary()
architecture_summary = buffer.getvalue().encode()

# Hard-coded values converted to plain Python floats so that h5py can store them
data = [float(x) for x in np.array([2.0896919, 2.1128857, 2.1081853])]

# Save the architecture and weights of the model
with h5py.File("best_model_EfficientNet.h5", "w") as file:
    file.create_dataset("architecture", data=np.array(architecture_summary))
    file.create_dataset("data", data=data)

    # Save the weights as separate datasets
    for i, weight in enumerate(model.get_weights()):
        file.create_dataset(f"weight_{i}", data=weight)

In [None]:
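# Save the current model in Keras HDF5 format.
# Note: this overwrites the file of the same name written with h5py in the previous cell.
# If the cells are run in order, `model` is the last model trained by the grid search,
# which is not necessarily the best EfficientNet configuration.
model.save("best_model_EfficientNet.h5")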


In [None]:
# Mean and spread of the cross-validation scores from the last grid-search configuration
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))

# The compiled metric is tfa.metrics.F1Score, so despite their names
# `accuracy` and `accuracy_v` hold per-class F1 scores, not accuracy.
loss, accuracy = model.evaluate(x_test, y_test, verbose=1)
loss_v, accuracy_v = model.evaluate(x_validate, y_validate, verbose=1)
# F1 averaged over the classes
print("Validation: f1 = %f  ;  loss_v = %f" % (np.mean(accuracy_v), np.mean(loss_v)))
print("Test: f1 = %f  ;  loss = %f" % (np.mean(accuracy), np.mean(loss)))
# Save the model
#model.save("model.h5")
# Per-class F1 as percentages
print("Validation: f1 = %s ; loss_v = %f" % ([f'{i:.2f}%' for i in accuracy_v*100], loss_v))
print("Test: f1 = %s ; loss = %f" % ([f'{i:.2f}%' for i in accuracy*100], loss))
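
Because the F1 metric here yields one value per class, `np.std(cvscores)` mixes the variation between classes with the variation between folds. A minimal sketch of one alternative, assuming that the fold-to-fold spread is the quantity of interest: reduce each fold to a single macro-F1 first, then take the mean and standard deviation over folds. The numbers in `example_cvscores` are hypothetical and only stand in for the arrays appended to `cvscores`.

```python
import numpy as np

# Hypothetical per-class F1 scores (in percent) for two folds and seven classes,
# standing in for the arrays that the grid search appends to `cvscores`
example_cvscores = [
    np.array([60.0, 70.0, 80.0, 50.0, 40.0, 90.0, 30.0]),
    np.array([64.0, 72.0, 82.0, 52.0, 44.0, 92.0, 28.0]),
]

# Macro-F1 per fold: unweighted mean over the classes
fold_macro_f1 = np.array([scores.mean() for scores in example_cvscores])

# Spread between folds, compared with the spread over every class-and-fold entry
std_between_folds = fold_macro_f1.std()
std_all_entries = np.std(example_cvscores)

print("macro-F1 per fold:", fold_macro_f1)
print("%.2f%% (+/- %.2f%%)" % (fold_macro_f1.mean(), std_between_folds))
print("std over all entries: %.2f%%" % std_all_entries)
```

In this toy example the folds agree closely while the classes differ widely, so the two standard deviations tell very different stories about the stability of the model.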


In [None]:
'''# Fit the model
epochs = 5
batch_size = 5
history = model.fit_generator(datagen.flow(x_train,y_train, batch_size=batch_size),
                              epochs = epochs, validation_data = (x_validate,y_validate),
                              verbose = 1, steps_per_epoch=x_train.shape[0] // batch_size
                              , callbacks=[learning_rate_reduction])'''

In [None]:
'''epochs = 50
batch_size = 15

# create model
model = create_model()

# data augmentation
datagen.fit(x_train)

# Fit the model
history = model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
                    epochs=epochs, validation_data=(x_validate, y_validate),
                    verbose=1, steps_per_epoch=x_train.shape[0] // batch_size,
                    callbacks=[learning_rate_reduction])'''

In [None]:
def plot_model_history(model_history):
    hist = model_history.history
    # F1Score reports one value per class, so average over classes to draw a single curve per split
    f1 = [np.mean(v) for v in hist['f1_score']]
    val_f1 = [np.mean(v) for v in hist['val_f1_score']]
    epoch_axis = range(1, len(f1) + 1)
    # Tick step of at least 1: len // 10 is 0 for runs shorter than 10 epochs, which np.arange rejects
    ticks = np.arange(1, len(f1) + 1, max(1, len(f1) // 10))
    fig, axs = plt.subplots(1,2,figsize=(15,5))
    # summarize history for f1 score
    axs[0].plot(epoch_axis, f1)
    axs[0].plot(epoch_axis, val_f1)
    axs[0].set_title('Model F1 Score')
    axs[0].set_ylabel('F1 Score')
    axs[0].set_xlabel('Epoch')
    axs[0].set_xticks(ticks)
    axs[0].legend(['train', 'val'], loc='best')
    # summarize history for loss
    axs[1].plot(epoch_axis, hist['loss'])
    axs[1].plot(epoch_axis, hist['val_loss'])
    axs[1].set_title('Model Loss')
    axs[1].set_ylabel('Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].set_xticks(ticks)
    axs[1].legend(['train', 'val'], loc='best')
    plt.show()

plot_model_history(history)


In [None]:
# Function to plot confusion matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Normalize before drawing so that the colours and the cell labels agree
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Two decimals for normalized rates, plain integers for raw counts
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Predict the values from the validation dataset
Y_pred = model.predict(x_validate)
# Convert predicted probabilities to class indices
Y_pred_classes = np.argmax(Y_pred,axis = 1)
# Convert one-hot validation labels to class indices
Y_true = np.argmax(y_validate,axis = 1)
# Compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)



# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(7))

In [None]:
label_frac_error = 1 - np.diag(confusion_mtx) / np.sum(confusion_mtx, axis=1)
plt.bar(np.arange(7),label_frac_error)
plt.xlabel('True Label')
plt.ylabel('Fraction classified incorrectly')
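
The fraction classified incorrectly per true label is one minus the per-class recall. The same confusion matrix also gives per-class precision and F1. The sketch below shows one way to derive all three with NumPy, assuming true labels on the rows as in `confusion_matrix(Y_true, Y_pred_classes)`. The helper `per_class_metrics` and the matrix `toy_cm` are made up for this illustration and are not part of the notebook.

```python
import numpy as np

def per_class_metrics(cm):
    """Return (precision, recall, f1) arrays from a confusion matrix with true labels on the rows."""
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    support = cm.sum(axis=1)      # number of samples per true class
    predicted = cm.sum(axis=0)    # number of samples per predicted class
    # Divide only where the denominator is non-zero; empty classes get a score of 0
    recall = np.divide(true_pos, support, out=np.zeros_like(true_pos), where=support > 0)
    precision = np.divide(true_pos, predicted, out=np.zeros_like(true_pos), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(true_pos), where=denom > 0)
    return precision, recall, f1

# Small made-up 3-class matrix, for illustration only
toy_cm = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 0, 5]])
precision, recall, f1 = per_class_metrics(toy_cm)
print("recall:", recall)
print("fraction misclassified:", 1 - recall)
print("precision:", precision)
print("f1:", f1)
```

Calling `per_class_metrics(confusion_mtx)` on the notebook's matrix would give `1 - recall` equal to `label_frac_error`, with precision showing which predicted labels absorb the errors.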