Data set: 
Link to the data set: https://stanfordmlgroup.github.io/competitions/mura/
MURA consists of images of 7 types: hand, humerus, forearm, finger, wrist, elbow, shoulder. The purpose is to build a model that classifies images as negative ('healthy' - 0) or positive ('abnormal' - 1). The archive contains a 'train' set and a 'valid' set; the 'valid' set is used here as the test set. Note that the total data is very large (approx. 3 GB).

Data extraction:
Using a separate script, I split the image paths by body part. The folders 'train_specific_paths' and 'valid_specific_paths' contain the .csv files with the paths needed to extract the data.
Use the functions extract_data (to extract one specific kind of image, e.g. 'elbow') and extract_all_data. Both return a 4-tuple: NumPy array of train images, labels of train images, NumPy array of test images, labels of test images.
Note that the first parameter of extract_data and extract_all_data is the path to the folder that contains MURA, so change 'D:\\python' as needed to make the example work.
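
Both the label and the body part can be read from an image path, which is what the path-splitting script and the extraction functions rely on. The following is a minimal sketch, assuming the paths follow the layout .../XR_<KIND>/patientNNNNN/studyN_<positive|negative>/imageN.png. The example paths and the helper names label_from_path and kind_from_path are hypothetical and not part of the notebook:

```python
def label_from_path(path):
    """Return 1 for an abnormal ('positive') study, 0 otherwise."""
    return 1 if 'positive' in path else 0


def kind_from_path(path):
    """Return the lower-case body part, e.g. 'elbow' for an 'XR_ELBOW' folder."""
    for part in path.split('/'):
        if part.startswith('XR_'):
            return part[len('XR_'):].lower()
    return None


# Hypothetical example paths in the assumed layout.
paths = [
    'MURA-v1.1/train/XR_ELBOW/patient00011/study1_negative/image1.png',
    'MURA-v1.1/train/XR_WRIST/patient00006/study1_positive/image2.png',
]

# Keep only the elbow images and derive a label for every path.
elbow_paths = [p for p in paths if kind_from_path(p) == 'elbow']
labels = [label_from_path(p) for p in paths]
print(elbow_paths)
print(labels)
```

Writing the filtered paths to one .csv per body part would give files of the kind stored in 'train_specific_paths' and 'valid_specific_paths'.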

Model:

Given that the task is image classification, a convolutional neural network (CNN) is the most suitable type of model. Below is my reasoning for the design decisions made.

Preprocessing: MURA images are either RGB or greyscale. Because X-ray images carry no colour information that matters for the classification, converting them to greyscale is preferable. The images have different shapes, so I chose to resize all of them to (512,512): 512 is the maximum value their dimensions can take, so no image has to be downscaled and no information is lost.
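
Resizing everything to (512,512) has a memory cost that is worth estimating before loading the data. Here is a rough sketch, using the 4931 elbow training images reported by the example further down. The helper array_size_gib is hypothetical, and the float64 case assumes that scaled images end up stored as 64-bit floats:

```python
import numpy as np


def array_size_gib(n_images, shape, dtype):
    """Size in GiB of an (n_images, height, width, 1) array of the given dtype."""
    n_values = n_images * shape[0] * shape[1]
    return n_values * np.dtype(dtype).itemsize / 2**30


n_elbow_train = 4931  # number of elbow training images in the example output

# Raw greyscale images are stored as 8-bit integers.
print(round(array_size_gib(n_elbow_train, (512, 512), np.uint8), 2))
# Assumption: standardized images are stored as 64-bit floats.
print(round(array_size_gib(n_elbow_train, (512, 512), np.float64), 2))
```

Under these assumptions the elbow training set alone needs about 1.2 GiB unscaled and about 9.6 GiB when stored as 64-bit floats.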

Choosing the right parameters: using the k_fold_cross_validation (k = 5) and classic_validation functions, I selected the values for the 2 pairs of convolution layers (each followed by max pooling) and the size of the fully connected dense layer. 3 different models were trained, and the one with the best accuracy on the test set was selected.
Accuracy of the presented model: 61% on the test set
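
The idea behind k-fold cross-validation can be sketched on indices alone: shuffle, split into k folds, and use each fold once for validation while the remaining folds are used for training. This is a minimal NumPy sketch and not the notebook's k_fold_cross_validation. The helper name k_fold_indices is hypothetical, and np.array_split is used so that no sample is dropped when the data size is not divisible by k:

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut the shuffled indices into k nearly equal folds.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx


# Every sample is used for validation exactly once across the k rounds.
for train_idx, valid_idx in k_fold_indices(n_samples=12, k=5):
    print(len(train_idx), len(valid_idx))
```

The index arrays can be used directly on the image and label arrays, e.g. data_x[train_idx] and data_y[train_idx].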

Metrics used: accuracy and confusion matrix (see conf_matrix function)
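
Accuracy alone hides how the errors are distributed between the two classes, which the confusion matrix makes visible. The sketch below derives accuracy, sensitivity and specificity from a 2x2 matrix in the layout that sklearn's confusion_matrix returns for labels 0 and 1 (rows are true labels, columns are predictions). The helper summarize is hypothetical, and the counts are made-up illustration values, not results of the presented model:

```python
import numpy as np


def summarize(cm):
    """Derive summary metrics from a 2x2 confusion matrix [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = np.asarray(cm)
    return {
        'accuracy': (tp + tn) / (tn + fp + fn + tp),
        'sensitivity': tp / (tp + fn),  # share of abnormal images detected
        'specificity': tn / (tn + fp),  # share of healthy images recognized
    }


# Made-up counts for illustration only.
example_cm = [[50, 10], [20, 20]]
print(summarize(example_cm))
```

For an abnormality detector, sensitivity is often the more important number, because a missed abnormal image is usually the costlier error.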
Dealing with overfitting: because of the large size of the train set, overfitting can be an issue. I added dropout layers (a small rate of 0.2 after the convolution layers and 0.5 after the dense layer) and maintained a validation set (in classic_validation, 20% of the train data is held out for validation).

Limitations: The large size of the data set makes training very slow (approx. 5 hours to train a model on elbow images alone), so more experimentation is required to improve the model.

In [1]:
import numpy as np 
import os 
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
from csv import reader
from sklearn.preprocessing import scale
from tensorflow.keras.layers import Conv1D, Conv2D, MaxPooling2D, Flatten, Dense, Dropout 
from tensorflow.keras import models
from sklearn.metrics import confusion_matrix

In [2]:
def extract_data(source, kind, reshape, scale):
    train_imgs, train_vals = extract_helper(source, 'train', kind, reshape, scale)
    test_imgs, test_vals = extract_helper(source, 'valid', kind, reshape, scale)
    return train_imgs, train_vals, test_imgs, test_vals

def extract_all_data(source, reshape, scale):
    # Use the helper that reads the CSV covering all body parts.
    train_imgs, train_vals = extract_all_helper(source, 'train', reshape, scale)
    test_imgs, test_vals = extract_all_helper(source, 'valid', reshape, scale)
    return train_imgs, train_vals, test_imgs, test_vals


def extract_helper(source, torv, kind, reshape, scale):
    os.chdir(source+'\\MURA-v1.1')
    os.chdir(torv+'_specific_paths')
    file = open(torv+'_image_paths_'+kind+'.csv')
    return extract(source, file, reshape, scale)

def extract_all_helper(source, torv, reshape, scale):
    os.chdir(source+'\\MURA-v1.1')
    file = open(torv+'_image_paths_.csv')
    return extract(source, file, reshape, scale)
    
def extract(source, file, reshape, do_scale):
    # The flag is named do_scale so that it does not shadow sklearn's scale().
    readCSV = reader(file)
    imgs = []
    vals = []
    for row in readCSV:
        # Load the image as greyscale and resize it to the common shape.
        im = cv2.imread(source+'\\'+row[0], cv2.IMREAD_GRAYSCALE)
        im = np.array(cv2.resize(im, reshape))
        if do_scale:
            im = scale(im)
        imgs.append(im)
        # The label is encoded in the image path ('positive' or 'negative').
        vals.append(1 if 'positive' in row[0] else 0)
    file.close()
    imgs = np.array(imgs)
    vals = np.array(vals)
    # Add a trailing channel axis: (n, height, width) -> (n, height, width, 1).
    imgs = np.expand_dims(imgs, axis=3)
    return imgs, vals

In [3]:
# *** Example ***
e_train_x, e_train_y, e_test_x, e_test_y = extract_data('D:\\python','elbow', (512,512), False)
print(e_train_x.shape)
print(e_test_x.shape)

(4931, 512, 512, 1)
(465, 512, 512, 1)


In [8]:
def classic_validation(model, data_x, data_y, batch_size, number_of_epochs):
    l = len(data_y)
    rate = int(l*0.8)
    p = np.random.permutation(l)
    data_x = data_x[p]
    data_y = data_y[p]
    train_x = data_x[:rate]
    train_y = data_y[:rate]
    valid_x = data_x[rate:]
    valid_y = data_y[rate:]    
    model.fit(train_x, train_y, batch_size = batch_size, epochs = number_of_epochs)
    score = model.evaluate(valid_x,valid_y)[1]
    model.fit(valid_x, valid_y, batch_size = batch_size, epochs = number_of_epochs)
    return score, model

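def k_fold_cross_validation(k, model, data_x, data_y, batch_size, number_of_epochs):
    l = len(data_y)
    p = np.random.permutation(l)
    data_x = data_x[p]
    data_y = data_y[p]
    # Split the shuffled data into k equal folds (any remainder is dropped).
    folds_x = []
    folds_y = []
    for i in range(k):
        folds_x.append(data_x[(l//k)*i: (l//k)*(i+1)])
        folds_y.append(data_y[(l//k)*i: (l//k)*(i+1)])
    score = 0
    for i in range(k):
        # clone_model copies the architecture with fresh weights but returns an
        # uncompiled model, so compile it with the same settings as the example model.
        model_copy = models.clone_model(model)
        model_copy.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        # Train on every fold except fold i, then evaluate on fold i.
        for j in range(k):
            if j != i:
                model_copy.fit(folds_x[j], folds_y[j], batch_size = batch_size, epochs = number_of_epochs)
        score += model_copy.evaluate(folds_x[i], folds_y[i])[1]
    # The last copy has seen every fold except the last one; fit it on that fold too.
    model = model_copy
    model.fit(folds_x[k-1], folds_y[k-1], batch_size = batch_size, epochs = number_of_epochs)
    return score/k, model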

def conf_matrix(model, data_x, data_y):
    y_pred = model.predict(data_x).flatten().tolist()
    y_true = data_y.tolist()
    for i in range(len(y_pred)):
        y_pred[i] = round(y_pred[i])
    return confusion_matrix(y_true, y_pred)
    

In [9]:
# *** Example ***

model = models.Sequential()
model.add(Conv2D(8, (8, 8), activation='relu', input_shape=(512,512,1), padding = 'same'))
model.add(Conv2D(8, (8, 8), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Dropout(0.2))

model.add(Conv2D(16, (8, 8), activation='relu', padding = 'same'))
model.add(Conv2D(16, (8, 8), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

s, model = classic_validation(model, e_train_x, e_train_y, 8, 5)
model.evaluate(e_test_x, e_test_y)
print(conf_matrix(model, e_test_x, e_test_y))


KeyboardInterrupt: 