# Lung Classification Tutorial

#### In this course you will learn how to run a deep learning experiment, from loading data and training a model to applying that model to a test dataset.

This course will build upon the knowledge gained in the first lesson and will utilize a much larger dataset.

In this course you will build a deep learning model that identifies whether an X-ray of the lungs contains an opacity.

The dataset comes from the [Radiological Society of North America](http://www.rsna.org/) (RSNA) Pneumonia Detection Challenge, hosted on Kaggle: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge

The project ID on MD.ai is `LxR6zdR2`.


<img src="images/lesson2_datasetImage.png">

Some cells in this notebook are left for you to complete as part of the assignment. These cells have:

```python
#--------EDIT THIS CELL------------
```

at the top of the cell.

For instance, a few cells down you will need to set up the variable 'p' to store the data for this project.

```python
#--------EDIT THIS CELL------------

# Load project data into a variable 'p'
```

The 'helper_utils.ipynb' file has details that can be used if you get stuck.


In [None]:
# Include the mdai module
!pip install --upgrade --quiet mdai
import mdai
mdai.__version__

In [None]:
# Add mdai client
mdai_client = mdai.Client(domain='public.md.ai', access_token="ENTER TOKEN")

In [None]:
#--------EDIT THIS CELL------------

# Load project data into a variable 'p'
# The project ID is "LxR6zdR2"

In [None]:
p.show_label_groups()

In [None]:
#--------EDIT THIS CELL------------

# map the label ids to class ids as a dictionary object.

In [None]:
# print label dictionary and set up

print(labels_dict)
p.set_labels_dict(labels_dict)
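
As a rough illustration of the shape of this mapping, the sketch below builds a dictionary from label IDs to integer class IDs and checks that the class IDs run from 0 to `n_classes - 1`. The label IDs `L_AAAAAA` and `L_BBBBBB` and the helper `check_labels_dict` are hypothetical placeholders, not values from this project; the real IDs are the ones printed by `p.show_label_groups()`.

```python
# Hypothetical label IDs for illustration only; replace them with the
# IDs that p.show_label_groups() prints for this project.
example_labels_dict = {
    "L_AAAAAA": 0,  # e.g. the "no opacity" label
    "L_BBBBBB": 1,  # e.g. the "lung opacity" label
}


def check_labels_dict(labels, n_classes):
    """Return True if the class IDs are exactly 0 .. n_classes - 1."""
    return sorted(labels.values()) == list(range(n_classes))


print(check_labels_dict(example_labels_dict, n_classes=2))
```

A check like this can catch a duplicated or skipped class ID before the dictionary is passed to `p.set_labels_dict`.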

In [None]:
# show dataset ID and label mappings
p.show_datasets()

### Display label classes

In [None]:
dataset = p.get_dataset_by_id('D_ao3XWQ')
dataset.prepare()
dataset.show_classes()

In [None]:
anns = dataset.get_annotations()

In [None]:
# Separate dataset into train, val, and test

train_dataset, val_dataset = mdai.common_utils.train_test_split(dataset, validation_split = 0.98)
val_dataset, test_dataset = mdai.common_utils.train_test_split(val_dataset, validation_split = 0.995)
test_dataset, test_dataset2 = mdai.common_utils.train_test_split(test_dataset, validation_split = 0.90)
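
The three chained splits above can be hard to read at a glance. The sketch below works out what share of the full dataset each subset receives, assuming that `validation_split` is the fraction assigned to the second dataset returned by `train_test_split` (an assumption about the library, not something this notebook states). The helper `chained_split_fractions` is hypothetical and only does arithmetic.

```python
def chained_split_fractions(splits):
    """Return the share of the full dataset kept at each split, plus the remainder.

    Assumes each split keeps (1 - fraction) of its input in the first
    returned dataset and hands `fraction` of it on to the next split.
    """
    remaining = 1.0
    kept = []
    for fraction in splits:
        kept.append(remaining * (1.0 - fraction))
        remaining *= fraction
    return kept, remaining


kept, leftover = chained_split_fractions([0.98, 0.995, 0.90])
for name, share in zip(["train", "val", "test"], kept):
    print(f"{name}: {share:.2%}")
print(f"test_dataset2: {leftover:.2%}")
```

Under that assumption only a small share of the images is used for training, which keeps each epoch short at the cost of seeing less data.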

In [None]:
anns = dataset.get_annotations(labels_dict.keys(), verbose=True)

In [None]:
train_image_ids = train_dataset.get_image_ids()
val_image_ids = val_dataset.get_image_ids()

# visualize a few train and validation images
mdai.visualize.display_images(train_image_ids[:2], cols=2)
mdai.visualize.display_images(val_image_ids[:2], cols=2)

In [None]:
# Example: extract the pixel array from the DICOM data
import numpy as np
# get image pixel data
pixel_array = mdai.visualize.load_dicom_image(train_image_ids[0], to_RGB=False, rescale=True)
print(np.shape(pixel_array))
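
The model parameters defined later expect three channels, while a grayscale X-ray is typically a single 2D array. The sketch below shows one way to rescale a 2D array to the 0-255 range and repeat it across three channels. It runs on a small synthetic array rather than a real DICOM image, and `to_uint8_rgb` is a hypothetical helper, not part of `mdai` and not necessarily what `load_dicom_image` does internally.

```python
import numpy as np


def to_uint8_rgb(pixels):
    """Min-max rescale a 2D array to 0-255 and repeat it across 3 channels."""
    pixels = pixels.astype(np.float64)
    span = pixels.max() - pixels.min()
    if span == 0:
        # A constant image carries no contrast; map it to all zeros.
        scaled = np.zeros_like(pixels)
    else:
        scaled = (pixels - pixels.min()) / span * 255.0
    gray = scaled.astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


# Synthetic stand-in for a DICOM pixel array
fake_xray = np.arange(16, dtype=np.int16).reshape(4, 4) * 100
rgb = to_uint8_rgb(fake_xray)
print(rgb.shape, rgb.dtype, rgb.min(), rgb.max())
```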

In [None]:
# Import keras module
from keras import applications
from keras.models import Model, Sequential
from keras.layers import Input, Dropout, Flatten, Dense, GlobalAveragePooling2D, Conv2D
# Import from keras.layers directly; the keras.layers.convolutional path is not available in every Keras release
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

In [None]:
# Define model parameters
img_width = 128
img_height = 128
epochs = 20

params = {
    'dim': (img_width, img_height),
    'batch_size': 8,
    'n_classes': 2,
    'n_channels': 3,
    'shuffle': True,
}
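
To make these parameters concrete, the sketch below allocates arrays with the shapes that one batch would have if the data generator yields images of shape `(batch_size, *dim, n_channels)` and one-hot targets of shape `(batch_size, n_classes)`. That layout is an assumption about the generator, and `empty_batch` is a hypothetical helper used only to show the shapes.

```python
import numpy as np

# Same values as the params dictionary above, minus 'shuffle'
example_params = {"dim": (128, 128), "batch_size": 8, "n_classes": 2, "n_channels": 3}


def empty_batch(dim, batch_size, n_classes, n_channels):
    """Allocate arrays with the shapes one batch is assumed to have."""
    images = np.zeros((batch_size, *dim, n_channels), dtype=np.float32)
    class_ids = np.zeros(batch_size, dtype=int)  # every sample in class 0
    one_hot = np.eye(n_classes, dtype=np.float32)[class_ids]
    return images, one_hot


images, targets = empty_batch(**example_params)
print(images.shape, targets.shape)
```

The input layer of the model should match the per-image part of the first shape, and the final layer should have as many units as the second dimension of the targets.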

# Begin Defining Model

Here we build up a very basic CNN architecture (similar in nature to the VGG class of architectures).

Here is where you can feel free to experiment with different architectures and tune the hyperparameters of the network. You should observe differences in training performance, as well as the amount of time required to fully train the network.

Try changing the number of kernels in the network from 32 down to 16.

For example:

```python
conv1 = Conv2D(16, (3,3), activation = 'relu', padding='same')(inputs)
```

Or changing the size of the filter kernels from 3x3 to 5x5:

```python
conv1 = Conv2D(32, (5,5), activation = 'relu', padding='same')(inputs)
```

Or changing the activation function of the layer from ReLU to tanh:

```python
conv1 = Conv2D(32, (3,3), activation = 'tanh', padding='same')(inputs)
```

How do these parameters affect performance and training time?
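
One way to start answering this is to count trainable parameters. A 2D convolution has one weight per kernel position, input channel, and filter, plus one bias per filter. The sketch below applies that formula to the three variants above, assuming the layer sits directly on a 3-channel input; `conv2d_param_count` is a hypothetical helper, not a Keras function.

```python
def conv2d_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Trainable parameters of a 2D convolution layer."""
    kernel_h, kernel_w = kernel_size
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


for filters, kernel in [(32, (3, 3)), (16, (3, 3)), (32, (5, 5))]:
    count = conv2d_param_count(filters, kernel, in_channels=3)
    print(f"{filters} filters, {kernel[0]}x{kernel[1]} kernel: {count} parameters")
```

Halving the filters halves the count, while moving from 3x3 to 5x5 kernels nearly triples it. The activation function does not change the parameter count at all, so any difference it makes shows up only in how training behaves.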

In [None]:
#--------EDIT THIS CELL------------

# Create a CNN model to train
# This can be similar to the one used in the previous notebook
# (i.e. chest vs. abdomen X-ray)

# End Defining Model

In [None]:
from mdai.utils import keras_utils

train_generator = keras_utils.DataGenerator(train_dataset, **params)
val_generator = keras_utils.DataGenerator(val_dataset, **params)

In [None]:
import tensorflow as tf
# Ask TensorFlow to allocate GPU memory on demand rather than reserving it all up front
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True

# Train Model

In [None]:
# Set callback functions to early stop training and save the best model so far
callbacks = [
    EarlyStopping(monitor='val_loss', patience=4, verbose=2),
    ModelCheckpoint(filepath='best_model_lesson2.h5', monitor='val_accuracy',
                    save_best_only=True, verbose=2)
]

history = model.fit_generator(
            generator=train_generator,
            epochs=epochs,
            callbacks=callbacks,
            verbose=1,
            validation_data=val_generator,
            use_multiprocessing=True,
            workers=8)
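
The `patience=4` setting can be read as "stop once the monitored value has failed to improve for four epochs in a row". The sketch below is a simplified, pure-Python illustration of that rule on a made-up list of validation losses; it is not the Keras implementation, and `early_stop_epoch` is a hypothetical helper.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best value: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 2
example_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.6]
print(early_stop_epoch(example_losses, patience=4))
```

In this made-up run training stops before the final, lower loss is ever reached, which is the trade-off a small patience value makes. Note also that in the cell above `EarlyStopping` watches `val_loss` while `ModelCheckpoint` watches `val_accuracy`, so the saved checkpoint is the epoch with the best validation accuracy, which need not be the epoch with the lowest validation loss.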

In [None]:
#--------EDIT THIS CELL------------

# Write code to plot learning curves
# for both training and validation (accuracy and loss)

# Create the Test dataset

In [None]:
model.load_weights('best_model_lesson2.h5')
test_dataset.prepare()
print(len(test_dataset.image_ids))

In [None]:
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.* calls below
from PIL import Image

for image_id in test_dataset.image_ids[80:100]:

    image = mdai.visualize.load_dicom_image(image_id, to_RGB=True)
    image = Image.fromarray(image)
    image = image.resize((img_width, img_height))

    x = np.expand_dims(image, axis=0)
    y_prob = model.predict(x)
    y_classes = y_prob.argmax(axis=-1)

    title = 'Pred: ' + test_dataset.class_id_to_class_text(y_classes[0]) + ', Prob:' + str(round(y_prob[0][y_classes[0]], 3))

    plt.figure()
    plt.title(title)
    plt.imshow(image)
    plt.axis('off')

plt.show()
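
The title string in the loop above is built from the index of the largest probability and the probability itself. The sketch below isolates that step on a made-up prediction array so it can be checked without a trained model. The class names in `example_class_names` and the helper `describe_prediction` are hypothetical; in the notebook the text comes from `test_dataset.class_id_to_class_text`.

```python
import numpy as np

# Hypothetical class names for illustration only
example_class_names = {0: "No Opacity", 1: "Lung Opacity"}


def describe_prediction(y_prob, class_names):
    """Format the top class and its probability for a single-image batch."""
    class_id = int(y_prob.argmax(axis=-1)[0])
    probability = round(float(y_prob[0][class_id]), 3)
    return f"Pred: {class_names[class_id]}, Prob: {probability}"


# Made-up model output for one image: shape (1, n_classes)
example_prob = np.array([[0.2, 0.8]])
print(describe_prediction(example_prob, example_class_names))
```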

### Success!!!

Feel free to keep working in this notebook to:

- Develop new models
- Work on ways to evaluate models
- Try any other experiments you find interesting
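
As one possible starting point for evaluating models, the sketch below computes accuracy, sensitivity, and specificity from lists of true and predicted class IDs. It assumes a binary problem in which class `1` means an opacity is present; the example labels are made up, and `binary_metrics` is a hypothetical helper rather than part of any library used here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for binary class IDs."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a class is absent
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels for eight images
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(example_true, example_pred))
```

Sensitivity and specificity are worth reporting alongside accuracy because a classifier can score well on accuracy alone when one class dominates the dataset.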