<a href="https://colab.research.google.com/github/balling/ml-hackpack/blob/master/Chest_X_Ray.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A bit of setup

Make sure you run the two cells below!

In [0]:
#@title Import the necessary libraries
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from skimage.io import imread
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from numpy.random import seed
from IPython.display import Image, display

# TensorFlow 1.x exposes `set_random_seed`; TensorFlow 2.x renamed it to `tf.random.set_seed`
set_random_seed = getattr(tf, "set_random_seed", None) or tf.random.set_seed

In [0]:
#@title Download the images
!git clone https://github.com/balling/ml-hackpack.git

# Explore the Chest X-Ray Dataset
Choose an X-ray file from the dropdown, or pick a row from [here](https://github.com/balling/ml-hackpack/blob/master/data/train-labels.csv)

In [0]:
xray_file = "00000013_033.png" #@param ["00000013_004.png", "00000006_000.png", "00000009_000.png", "00000013_011.png", "00000013_039.png", "00000013_043.png", "00000013_033.png"] {allow-input: true}
image = imread('ml-hackpack/data/train/%s'% xray_file, True)
display(Image(url= 'https://github.com/balling/ml-hackpack/blob/master/data/train/%s?raw=true' % xray_file, width=300))
np.set_printoptions(edgeitems=10, linewidth=200)
print("The computer sees a %d x %d array:" % image.shape)
print(image)

### ![question](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Icon-round-Question_mark.svg/400px-Icon-round-Question_mark.svg.png =30x) The numbers in each cell range from 0 to 255. Can you guess what 0 and 255 represent?

# Let's start building the model!

### The following code reads in the images for training and testing:

In [0]:
NUM_IMG = 100

def get_training_data(train_path, labels_path):
    # Read the label table first so that images and labels stay aligned row by row.
    # (os.listdir returns files in arbitrary order, which may not match the CSV.)
    # Assumes the CSV has an "Image Index" column of filenames, as used later in this notebook.
    labels_df = pd.read_csv(labels_path).head(NUM_IMG)

    # Load each listed image as a 2D grayscale array
    features = []
    for filename in labels_df["Image Index"]:
        train_image = imread(os.path.join(train_path, filename), True)
        features.append(np.asarray(train_image))

    # 0 for 'No Finding', 1 for any finding
    labels = (labels_df["Finding Labels"] != 'No Finding').values.astype('float64')

    # Add a single channel axis and scale the pixels to the range [0, 1]
    images = np.expand_dims(np.array(features), axis=3).astype('float32') / 255
    return images, labels
	
X_train, y_train = get_training_data("ml-hackpack/data/train/", "ml-hackpack/data/train-labels.csv")
X_test, y_test = get_training_data("ml-hackpack/data/test/", "ml-hackpack/data/test-labels.csv")
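
To make the preprocessing above more concrete, here is a minimal sketch on a tiny made-up batch. The 2 x 2 "images" and the label strings are invented for illustration only; the real X-rays in this notebook are much larger. It shows how the channel axis and the scaling change the array, and how a finding string becomes a 0 or a 1.

```python
import numpy as np

# Two made-up 2 x 2 grayscale "images" with pixel values from 0 to 255
fake_images = np.array([[[0, 255], [128, 64]],
                        [[255, 255], [0, 0]]], dtype='uint8')
print(fake_images.shape)  # (2, 2, 2): (number of images, height, width)

# Same steps as in get_training_data: add a channel axis, then scale to [0, 1]
batch = np.expand_dims(fake_images, axis=3).astype('float32') / 255
print(batch.shape)               # (2, 2, 2, 1)
print(batch.min(), batch.max())  # 0.0 1.0

# Made-up label strings turned into 0 (no finding) or 1 (some finding)
fake_findings = ['No Finding', 'Effusion']
fake_labels = np.array([0 if f == 'No Finding' else 1 for f in fake_findings])
print(fake_labels)  # [0 1]
```

The extra axis of size 1 is the "channel" dimension that `Conv2D` expects; a color image would have 3 channels there instead.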

### Now construct our neural network


---



`model = Sequential()` means we are going to stack the layers (added below) one by one just like pancakes:

![stack](https://live.staticflickr.com/7265/7548486620_1f0c65a58e_q_d.jpg)

(image credit: [Crave Malay Mail](https://www.flickr.com/photos/cravemmail/7548486620/in/photostream/))

---





In [0]:
# Fix the random seeds so that each run gives the same result (for demonstration purposes)
seed(42)
set_random_seed(42)

# We are going to just stack layers on top one by one
model = Sequential()

# Add 2 dimensional convolution layer
model.add(Conv2D(4, (3, 3), strides=(2,2), activation='relu', input_shape=(1024, 1024, 1), data_format='channels_last'))

# Add 2 dimensional Max pool
model.add(MaxPooling2D(pool_size=(2,2)))

# Add Dropout with 5% probability
model.add(Dropout(rate=0.05))

# Add another 2 dimensional convolution layer
model.add(Conv2D(4, (3, 3), strides=(2,2), activation='relu'))

# Flatten the output and add two fully connected (dense) layers
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 
# Start training the model with our training data set
model.fit(X_train, y_train, batch_size=8, epochs=8, verbose=1)

# Evaluate our model with new unseen data
score = model.evaluate(X_test, y_test, verbose=0)
print("The model diagnosed %d%% of the x-rays correctly!" % (score[1]*100))
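
It can help to see how the image shrinks as it passes through the stack. The sketch below is plain arithmetic, not Keras code: it assumes the Keras defaults used above (no padding, i.e. `padding='valid'`, and a pooling stride equal to the pool size). The helper `conv_output_size` is a hypothetical name introduced only for this illustration.

```python
def conv_output_size(size, kernel, stride):
    """Output width/height of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1

size = 1024                                                # input image is 1024 x 1024 x 1
after_conv1 = conv_output_size(size, kernel=3, stride=2)         # first Conv2D
after_pool = conv_output_size(after_conv1, kernel=2, stride=2)   # MaxPooling2D
after_conv2 = conv_output_size(after_pool, kernel=3, stride=2)   # second Conv2D

# Flatten turns the 4 feature maps into one long vector
flattened = after_conv2 * after_conv2 * 4

# Dense(1024): one weight per (input, unit) pair plus one bias per unit
dense_params = flattened * 1024 + 1024

print(after_conv1, after_pool, after_conv2)  # 511 255 127
print(flattened)                             # 64516
print(dense_params)                          # 66065408
```

Under these assumptions, almost all of the model's weights sit in the first dense layer, which is worth keeping in mind given that only 100 images are used for training.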

# Taking a closer look into what our model is predicting

Choose an X-ray file from the dropdown, or pick a row from [here](https://github.com/balling/ml-hackpack/blob/master/data/test-labels.csv)

In [0]:
test_file = "00000023_001.png" #@param ["00000023_002.png", "00000023_001.png", "00000023_004.png", "00000031_000.png", "00000030_000.png", "00000032_008.png", "00000032_009.png"]
labels_df = pd.read_csv("ml-hackpack/data/test-labels.csv")
image = imread('ml-hackpack/data/test/%s'% test_file, True)
display(Image(url= 'https://github.com/balling/ml-hackpack/blob/master/data/test/%s?raw=true' % test_file, width=300))
# Preprocess exactly as in training: add batch and channel axes, then scale to [0, 1]
x = np.expand_dims(np.expand_dims(image, axis=0), axis=3).astype('float32') / 255
predictions = model.predict(x)
expert = labels_df.loc[labels_df['Image Index'] == test_file, 'Finding Labels'].values[0]
print('Our model thinks there is a %.f%% chance that this person has a lung disease, whereas the human expert diagnosis is %s' % (predictions[0][0]*100, expert))

### ![question](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Icon-round-Question_mark.svg/400px-Icon-round-Question_mark.svg.png =30x) Do you think this is a good model for lung disease diagnosis?  Why or why not?
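
One way to start exploring this question is to compare the model's accuracy with a very simple baseline: always guessing the most common label. The sketch below uses made-up labels, and `majority_baseline_accuracy` is a hypothetical helper written for this illustration, not part of Keras.

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label (0 or 1)."""
    labels = np.asarray(labels)
    positive_fraction = float(labels.mean())
    return max(positive_fraction, 1 - positive_fraction)

# Made-up labels: 7 x-rays with no finding (0) and 3 with a finding (1)
example_labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
baseline = majority_baseline_accuracy(example_labels)
print(baseline)  # 0.7
```

In this notebook, `majority_baseline_accuracy(y_test)` could be compared with `score[1]`. If the two numbers are close, the model may not have learned much more than which label is most common.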