# Tissue Classification using Neural Networks
In this lab we will classify images of tissue texture using a fully connected neural network. The dataset we will be using is available here: http://dx.doi.org/10.5281/zenodo.53169.

![alt text](https://www.researchgate.net/profile/Jakob_Kather/publication/303998214/figure/fig7/AS:391073710002224@1470250646407/Representative-images-from-our-dataset-Here-the-first-10-images-of-every-tissue-class.png)

The above figure shows the 8 different classes of tissue we will be trying to identify. 

In [0]:
# Imports
from __future__ import print_function
import os
import numpy as np
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

## Step 1
* Load the data (done for you)
 * The "data" variable stores 5000 images of shape 150x150. This means data has shape (5000, 150, 150). These images are loaded here as grayscale.
 * The "labels" variable stores 5000 labels (0-7). This means "labels" has shape (5000,)
* Split data into training and testing subsets (left up to you)
 * Check out the sklearn function train_test_split from sklearn.model_selection
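
As a hedged illustration of the split described above, the sketch below calls `train_test_split` on small stand-in arrays (the array sizes and the names `demo_data` and `demo_labels` are assumptions, not the lab data). Passing `stratify` keeps the class proportions equal in both subsets, and `random_state` makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 160 tiny "images" with 8 balanced classes.
rng = np.random.default_rng(0)
demo_data = rng.integers(0, 256, size=(160, 4, 4), dtype=np.uint8)
demo_labels = np.repeat(np.arange(8), 20)

# stratify keeps class proportions equal in both subsets;
# random_state makes the shuffle reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_data, demo_labels, test_size=0.25, stratify=demo_labels, random_state=0
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))  # number of test samples per class
```

Whether stratification matters for the real dataset depends on how balanced its classes are; it is shown here only as an option worth considering.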

In [2]:
! git clone https://github.com/BeaverWorksMedlytics/Week3_public.git

# Build the path to the data folder. No need to change directories
# The cells below load 5 files (rgb01.npz through rgb05.npz)
data_dir = os.path.join( os.getcwd(), 'Week3_public', 'data', 'crc')

Cloning into 'Week3_public'...
remote: Counting objects: 55, done.
remote: Total 55 (delta 0), reused 0 (delta 0), pack-reused 55
Unpacking objects: 100% (55/55), done.


In [110]:
#### ATTEMPT 1 #####
# Load data and split into training, testing sets
y = np.load(os.path.join(data_dir, 'rgb01.npz'))
labels = y['labels']
data = y['rgb_data']
data = data[:,:,:,0] #takes only the r value
label_str = y['label_str']
label_str = label_str.tolist() # this is to convert label_str back to a dictionary
y = []

print(data.shape)
for ii in range(2,6):
    filename = os.path.join(data_dir, 'rgb0' + str(ii) + '.npz')
    print('loading ', filename)
    y = np.load(filename)
    labels = np.append(labels, y['labels'], axis=0)
    data = np.append(data, y['rgb_data'][:,:,:,0], axis=0)
    print(data.shape)
    y = []


print( data.shape )
print( labels.shape )

(1000, 150, 150)
loading  /content/Week3_public/data/crc/rgb02.npz
(2000, 150, 150)
loading  /content/Week3_public/data/crc/rgb03.npz
(3000, 150, 150)
loading  /content/Week3_public/data/crc/rgb04.npz
(4000, 150, 150)
loading  /content/Week3_public/data/crc/rgb05.npz
(5000, 150, 150)
(5000, 150, 150)
(5000,)


In [123]:
#### ATTEMPT 2 #####
# Load data and split into training, testing sets
y = np.load(os.path.join(data_dir, 'rgb01.npz'))
labels = y['labels']
data = y['rgb_data']
data = data[:,:,:,0:3] # keeps all three channels (R, G, B)
label_str = y['label_str']
label_str = label_str.tolist() # this is to convert label_str back to a dictionary
y = []

print(data.shape)
for ii in range(2,6):
    filename = os.path.join(data_dir, 'rgb0' + str(ii) + '.npz')
    print('loading ', filename)
    y = np.load(filename)
    labels = np.append(labels, y['labels'], axis=0)
    data = np.append(data, y['rgb_data'][:,:,:,0:3], axis=0)
    print(data.shape)
    y = []


print( data.shape )
print( labels.shape )

(1000, 150, 150, 3)
loading  /content/Week3_public/data/crc/rgb02.npz
(2000, 150, 150, 3)
loading  /content/Week3_public/data/crc/rgb03.npz
(3000, 150, 150, 3)
loading  /content/Week3_public/data/crc/rgb04.npz
(4000, 150, 150, 3)
loading  /content/Week3_public/data/crc/rgb05.npz
(5000, 150, 150, 3)
(5000, 150, 150, 3)
(5000,)


In [111]:
##### ATTEMPT 1 ######
num_images, nrows, ncols = data.shape

# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.25)
print("Current separation:", X_train.shape, y_train.shape)

# convert the labels from 1-D arrays to categorical type 
print("Previous Label:", y_train[0])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
print("New Label Shape:", y_train.shape)

Current separation: (3750, 150, 150) (3750,)
Previous Label: 0
New Label Shape: (3750, 8)


In [124]:
##### ATTEMPT 2 ######
num_images, nrows, ncols, nrgb = data.shape

# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.25)
print("Current separation:", X_train.shape, y_train.shape)

# convert the labels from 1-D arrays to categorical type 
print("Previous Label:", y_train[0])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
print("New Label Shape:", y_train.shape)

Current separation: (3750, 150, 150, 3) (3750,)
Previous Label: 3
New Label Shape: (3750, 8)
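
To make the label conversion above concrete, here is a minimal NumPy sketch of one-hot encoding. It is meant to mirror what `to_categorical` produces for integer labels, not to reproduce the Keras implementation; the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

demo_labels = np.array([0, 3, 7, 3])
encoded = one_hot(demo_labels, num_classes=8)
print(encoded.shape)

# argmax along the class axis recovers the original integer labels.
decoded = np.argmax(encoded, axis=1)
print(decoded)
```

The `argmax` step is the same operation used later in the lab to turn one-hot labels and predictions back into class indices.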


## Normalize and Reshape Data
All images should be normalized to the range 0-1 by dividing by 255.

Additionally, because this is an ANN, not a CNN, we need to reshape the data to be one dimensional. In the training and test data, collapse the row and column dimensions into one dimension using reshape().
#### Note
* Using the L\*a\*b\* colorspace : If you convert your images to the L\*a\*b\* colorspace, the scaling factor will change. Each channel in this colorspace has a different range, so normalization involves scaling each channel separately. Additionally, the a\* and b\* channels can take negative values. This also needs to be taken into account.
* Using the HSV/HSI colorspace : Similar considerations apply if you are using the HSV/HSI colorspace. The only difference is that the HSV/HSI colorspace will have all positive values.
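
The per-channel scaling mentioned in the note can be sketched as follows. The helper `scale_channels` is hypothetical, and the ranges in `lab_ranges` are nominal values assumed for illustration; the actual range depends on the conversion routine you use, so check it before relying on these numbers.

```python
import numpy as np

def scale_channels(images, channel_ranges):
    """Map each channel from its (low, high) range onto 0-1."""
    images = np.asarray(images, dtype=np.float64)
    lows = np.array([low for low, _ in channel_ranges], dtype=np.float64)
    highs = np.array([high for _, high in channel_ranges], dtype=np.float64)
    # Broadcasting applies one (low, high) pair per channel along the last axis.
    return (images - lows) / (highs - lows)

# Nominal ranges, assumed for illustration only.
lab_ranges = [(0.0, 100.0), (-128.0, 127.0), (-128.0, 127.0)]

# One 1x2 "image" whose pixels sit at the extremes of each channel.
demo = np.array([[[[0.0, -128.0, 127.0], [100.0, 127.0, -128.0]]]])
scaled = scale_channels(demo, lab_ranges)
print(scaled)
```

Subtracting the lower bound before dividing is what handles the negative part of a channel; dividing by a single constant, as in the RGB case, would leave negative inputs negative.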

In [0]:
# Assuming we are using the RGB colorspace
# Normalize all images so that they are 0-1


# Reshape the data 


In [112]:
##### ATTEMPT 1 ########
### just divide by 255 #####

n_images, n_rows, n_cols = X_train.shape
X_train = X_train/255
X_train = X_train.reshape((n_images, n_rows*n_cols))

print('After reshape:', X_train.shape)




After reshape: (3750, 22500)


In [113]:
#### ATTEMPT 1 ######
#normalize the test data
n_images, n_rows, n_cols = X_test.shape
X_test = X_test/255
X_test = X_test.reshape((n_images, n_rows * n_cols))

print('Current y_test', y_test)

Current y_test [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 1. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]]


In [125]:
#### ATTEMPT 2 #####
n_images, n_rows, n_cols, nrgb = X_train.shape


X_train = X_train/255
X_train = X_train.reshape((n_images, n_rows*n_cols*nrgb))

print('After reshape:', X_train.shape)

#normalize the test data
n_images, n_rows, n_cols, nrgb = X_test.shape
X_test = X_test/255
X_test = X_test.reshape((n_images, n_rows * n_cols * nrgb))

print('Current y_test', y_test)

After reshape: (3750, 67500)
Current y_test [[0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 1. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 1. 0. ... 0. 0. 0.]]


In [0]:
### ATTEMPT 3 ########



In [114]:
### testing what it looks like #####
for i in range(X_train[2].shape[0]):
  if X_train[2][i] > 0:
    print("Not Zero")
print('Sample Image:', X_train[2])

Not Zero
...
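
The check above prints one line per nonzero pixel, which floods the output. A more compact sanity check, sketched here on a small stand-in array (an assumption, not a real image), is to count the nonzero entries and look at the value range instead.

```python
import numpy as np

# Stand-in for one flattened, normalized image (an assumption for illustration).
sample = np.array([0.0, 0.2, 0.0, 0.9, 1.0, 0.0])

nonzero_count = np.count_nonzero(sample)
print("nonzero pixels:", nonzero_count, "of", sample.size)
print("min:", sample.min(), "max:", sample.max())
```

For the lab data, replacing `sample` with `X_train[2]` should give the same kind of summary in two lines of output.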

## Step 2
At this point, the data has been split into training and testing sets and normalized. We will now design a fully connected neural network for texture classification. 

<img src="http://cs231n.github.io/assets/nn1/neural_net2.jpeg" width="50%"></img>

( Image from http://cs231n.github.io/convolutional-networks/ )

When designing a fully connected network for classification, we have several decisions to make.

**Network Architecture**
* How many layers will our network have ?
* How many neurons per layer ?
* What is an appropriate batch size, learning rate and number of training epochs ?

**Data input**
* Do we use the raw data ?
    * RGB or just gray channel ?
* Does the use of different colorspaces lead to better results for a given network architecture ?
* Can we use any of the texture features from the previous lab as inputs to this model ?
* How does data augmentation affect the results ? 

Other considerations that we will not be exploring :
* What is the trade-off between input data sizes and batch size ?
* Is the GPU always the appropriate platform for training ?
* How does hardware influence inputs and batch sizes for a given desired accuracy ?
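
To show what the layer and neuron choices above actually control, here is a rough NumPy sketch of the forward pass of a fully connected network with ReLU hidden layers and a softmax output. The layer sizes and random weights are hypothetical, and this is not how Keras implements it internally; it only illustrates the computation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, params):
    """Run a batch through hidden ReLU layers and a softmax output layer."""
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = relu(x @ w + b)
    return softmax(x @ w_out + b_out)

rng = np.random.default_rng(0)
layer_sizes = [100, 16, 16, 8]  # hypothetical: 100 inputs, two hidden layers, 8 classes
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

batch = rng.random((5, 100))
probs = forward(batch, params)
print(probs.shape)        # one row of class probabilities per input
print(probs.sum(axis=1))  # each row sums to 1
```

Changing `layer_sizes` is the NumPy analogue of adding `Dense` layers or changing their unit counts in the Keras model.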

In [0]:
# Define the data shapes based on your decision to use rgb or grayscale or other colorspaces or texture features or
# some combination of these inputs
num_classes = 8 
input_shape = nrows*ncols

In [0]:
#### ATTEMPT 2 ####
num_classes = 8 
input_shape = nrows*ncols*nrgb

## Step 3
Design your network here using Keras

In [0]:
import tensorflow as tf

In [117]:
#### ATTEMPT 1 #####

# Create your network
model = []
model = Sequential()

# Add input layer
model.add(Dense(64, activation=tf.nn.relu, input_shape = (nrows * ncols, )))
# Add fully connected layers 
model.add(Dense(64, activation=tf.nn.relu))
# See Dense : https://keras.io/layers/core/#dense

# Add final output layer - This should have as many neurons as the number
# of classes we are trying to identify
model.add(tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax))


model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_21 (Dense)             (None, 64)                1440064   
_________________________________________________________________
dense_22 (Dense)             (None, 64)                4160      
_________________________________________________________________
dense_23 (Dense)             (None, 8)                 520       
Total params: 1,444,744
Trainable params: 1,444,744
Non-trainable params: 0
_________________________________________________________________


In [138]:
### ATTEMPT 2 ######

# Create your network
model = []
model = Sequential()

# Add input layer
model.add(Dense(32, activation=tf.nn.relu, input_shape = (nrows * ncols * nrgb, )))
# Add fully connected layers 
model.add(Dense(32, activation=tf.nn.relu))
# See Dense : https://keras.io/layers/core/#dense

# Add final output layer - This should have as many neurons as the number
# of classes we are trying to identify
model.add(tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax))


model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_27 (Dense)             (None, 32)                2160032   
_________________________________________________________________
dense_28 (Dense)             (None, 32)                1056      
_________________________________________________________________
dense_29 (Dense)             (None, 8)                 264       
Total params: 2,161,352
Trainable params: 2,161,352
Non-trainable params: 0
_________________________________________________________________
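
The parameter counts in the two summaries above can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output. The helper below is a hypothetical sketch of that arithmetic, using the layer sizes from the two attempts.

```python
def dense_param_counts(input_dim, layer_units):
    """Return the parameter count of each Dense layer: weights plus biases."""
    counts = []
    fan_in = input_dim
    for units in layer_units:
        counts.append(fan_in * units + units)
        fan_in = units
    return counts

attempt_1 = dense_param_counts(150 * 150, [64, 64, 8])
attempt_2 = dense_param_counts(150 * 150 * 3, [32, 32, 8])
print(attempt_1, sum(attempt_1))  # [1440064, 4160, 520] 1444744
print(attempt_2, sum(attempt_2))  # [2160032, 1056, 264] 2161352
```

Almost all parameters sit in the first layer, so the input size (grayscale versus RGB) dominates the model size even when the hidden layers are halved.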


## Step 4
Compile the model you designed. Compiling a Keras model configures the loss function, the optimizer, and the metrics that will be used during training.

In [0]:
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
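
For intuition about the `categorical_crossentropy` loss named in the compile call, here is a small NumPy sketch of the quantity it measures: the mean over samples of the negative log-probability assigned to the true class. The function below is an illustration written for this note, not the Keras implementation, and the `eps` clipping value is an assumption.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
uniform = np.full((2, 3), 1.0 / 3.0)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

A model that guesses uniformly over k classes has a loss of ln(k), which is about 2.08 for the 8 tissue classes; a training loss that stays near that value suggests the network has not learned much yet.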

## Step 5
Train model

In [142]:
y = model.fit(X_train, y_train, batch_size = 20, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10

Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10

Epoch 10/10
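
The `batch_size` and `epochs` arguments determine how many weight updates the training run performs. The sketch below, with a hypothetical helper `iterate_minibatches`, shows how one epoch over 3750 training samples breaks into mini-batches of 20; it illustrates the bookkeeping only and is not the Keras training loop.

```python
import math
import numpy as np

def iterate_minibatches(num_samples, batch_size, rng):
    """Yield index arrays covering a shuffled dataset once (one epoch)."""
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

num_samples, batch_size = 3750, 20  # values used in the fit call above
steps_per_epoch = math.ceil(num_samples / batch_size)

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(num_samples, batch_size, rng))
print(steps_per_epoch, len(batches), len(batches[-1]))  # 188 188 10
```

With 10 epochs this comes to 1880 updates, so a larger batch size means fewer, smoother updates for the same number of epochs.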


## Step 6
See how your model performs by using it for inference.
* What is the accuracy of classification ?
* Change your model, re-compile and test. Can you improve the accuracy of the model ?


In [0]:
# predict labels - use the test set for prediction
pred_labels = model.predict(X_test)

In [144]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

# We need to convert the one-hot arrays y_test and pred_labels into integer vectors
# in order to use them in the calculation of the confusion matrix (i.e. convert from one-hot to integers)
mat = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(pred_labels, axis = 1))
acc = accuracy_score(np.argmax(y_test, axis=1), np.argmax(pred_labels, axis=1))
print(acc)
print(mat)

0.3152
[[ 30   1  33   4  18  34  18   6]
 [ 23   5  11  19  15  40  48   5]
 [ 27   2  45   6  13  39  15   2]
 [ 11   6   8  25  17  67  15   5]
 [ 32   9  10  27  24  23  22   1]
 [  1   0   0   3   0 171   3   0]
 [ 23   6  14  20  21  45  23  12]
 [ 11   7   0  12   0  14  32  71]]
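
The confusion matrix above carries more information than the single accuracy number. As a sketch, the snippet below re-enters the printed matrix and derives the overall accuracy plus per-class recall and precision with NumPy; in scikit-learn's convention rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predictions.
mat = np.array([
    [30, 1, 33, 4, 18, 34, 18, 6],
    [23, 5, 11, 19, 15, 40, 48, 5],
    [27, 2, 45, 6, 13, 39, 15, 2],
    [11, 6, 8, 25, 17, 67, 15, 5],
    [32, 9, 10, 27, 24, 23, 22, 1],
    [1, 0, 0, 3, 0, 171, 3, 0],
    [23, 6, 14, 20, 21, 45, 23, 12],
    [11, 7, 0, 12, 0, 14, 32, 71],
])

accuracy = np.trace(mat) / mat.sum()         # correct predictions / all predictions
recall = np.diag(mat) / mat.sum(axis=1)      # per true class
precision = np.diag(mat) / mat.sum(axis=0)   # per predicted class

print(round(accuracy, 4))  # 0.3152
print(np.round(recall, 2))
print(np.round(precision, 2))
```

Class 5 is recognized far more reliably than the others, while the column for class 5 also collects many samples from other classes, so its precision is much lower than its recall.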


In [0]:
plt.figure(figsize=(8,6))
plt.imshow(mat, cmap='hot', interpolation='nearest')
plt.grid(False)
plt.colorbar()
# confusion_matrix puts true labels on rows (y axis) and predictions on columns (x axis)
plt.xlabel('predicted label')
plt.ylabel('true label')
plt.show()

## Assignment
* In Step 3 design your own network
* Does the model perform better if you use all three RGB channels ?
* How does the performance change when using the L\*a\*b\* colorspace ?


In [0]:

# Load data as RGB
y = np.load(os.path.join(data_dir, 'rgb01.npz'))
labels = y['labels']
data_rgb = y['rgb_data']
label_str = y['label_str']
label_str = label_str.tolist() # this is to convert label_str back to a dictionary
y = []

print(data_rgb.shape)
for ii in range(2,6):
    filename = os.path.join(data_dir, 'rgb0' + str(ii) + '.npz')
    print('loading ', filename)
    y = np.load(filename)
    labels = np.append(labels, y['labels'], axis=0)
    data_rgb = np.append(data_rgb, y['rgb_data'])
    print(data_rgb.shape)
    y = []

data_rgb = data_rgb.astype('float')
data_rgb = data_rgb.reshape(5000, 150, 150, 3)

print( data_rgb.shape )
print( labels.shape )

num_images, nrows, ncols, dims = data_rgb.shape

(1000, 150, 150, 3)
loading  /content/Week3_public/data/crc/rgb02.npz
(135000000,)
loading  /content/Week3_public/data/crc/rgb03.npz
(202500000,)
loading  /content/Week3_public/data/crc/rgb04.npz
(270000000,)
loading  /content/Week3_public/data/crc/rgb05.npz
(337500000,)
(5000, 150, 150, 3)
(5000,)
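
In the loading loop above, `np.append` is called without `axis`, which flattens the array on every iteration (hence the one-dimensional shapes in the output) and requires the final `reshape`. One possible alternative, sketched here on small stand-in arrays rather than the real files, is to collect the chunks in a list and join them once with `np.concatenate`.

```python
import numpy as np

# Stand-ins for the per-file arrays (an assumption; the real files hold
# 1000 images of shape 150x150x3 each).
chunks = [np.full((10, 6, 6, 3), fill_value=i, dtype=np.uint8) for i in range(5)]
chunk_labels = [np.full(10, i % 8) for i in range(5)]

# Join once along the image axis; the trailing dimensions are preserved,
# so no reshape is needed afterwards.
data_demo = np.concatenate(chunks, axis=0).astype(np.float32)
labels_demo = np.concatenate(chunk_labels, axis=0)
print(data_demo.shape, labels_demo.shape)  # (50, 6, 6, 3) (50,)
```

Joining once also avoids copying the growing array on every iteration, which matters more as the number of files increases.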
