# Clinical Heart Failure Detection Using Whole-Slide Images of H&E Tissue

Version 0.01

#### References - Data Preparation
- Reading an image
  - matplotlib: https://stackoverflow.com/questions/9298665/cannot-import-scipy-misc-imread
  - pathlib: https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f#:~:text=To%20use%20it%2C%20you%20just,for%20the%20current%20operating%20system.
  - OpenCV: https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/
- Load multiple images into a numpy array
  - glob / os.listdir: https://stackoverflow.com/questions/39195113/how-to-load-multiple-images-in-a-numpy-array
  - glob / cv2: https://medium.com/@muskulpesent/create-numpy-array-of-images-fecb4e514c4b

## Data Preparation

### Understand folder structure and number of images available

**Training/Validation**
- \..\training\fold_1: 770 training images
- \..\training\test_fold_1: 374 validation images
- Total: 770 + 374 = 1144 images

**Test**
- \..\held-out_validation: 1155 test images
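
The counts above can be checked directly from disk. The following is a minimal sketch, assuming the data root contains the `training/fold_1`, `training/test_fold_1`, and `held-out_validation` folders that the glob patterns in this notebook use. The helper name `count_pngs` and the choice of `"."` as data root are hypothetical.

```python
from pathlib import Path


def count_pngs(data_root, folders):
    """Return {folder: number of .png files directly inside it}."""
    root = Path(data_root)
    return {folder: len(list((root / folder).glob("*.png"))) for folder in folders}


# Folder names as used by the glob patterns in this notebook.
# Using "." as the data root is an assumption; adjust to your layout.
FOLDERS = ["training/fold_1", "training/test_fold_1", "held-out_validation"]

if __name__ == "__main__":
    for folder, n in count_pngs(".", FOLDERS).items():
        print(f"{folder}: {n} images")
```

A folder that does not exist is reported with a count of 0, so a wrong data root shows up as all-zero counts rather than an exception.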

### Load libraries to aid in converting images to arrays

We convert the images to NumPy arrays so that they can be fed to our CNN model.

In [7]:
# install OpenCV package
# pip install opencv-python

In [8]:
import cv2

In [9]:
import glob

In [10]:
import numpy as np

### Convert Train Images to Array

In [11]:
# read all the filenames with extension as 'png' into the filelist
filelist_train = glob.glob('training/fold_1/*.png')

In [12]:
# confirm the list contains the expected number of files
len(filelist_train)

770

In [13]:
# read 1st file in the list
img = cv2.imread(filelist_train[0])

In [14]:
# read every file into a list of image arrays
train = []
for file in filelist_train:
    img = cv2.imread(file)
    train.append(img)

In [15]:
# confirm the list contains the expected number of images
len(train)

770

In [16]:
# train is a list
type(train)

list

In [17]:
# convert the list to a numpy array of float32 values
train = np.array(train, dtype = 'float32')

In [18]:
# check the shape to confirm it is ready for the CNN
# (number of instances, height, width, number of channels)
# number of instances = number of images
# number of channels = 3, as cv2.imread loads these as color images (BGR order)
train.shape

(770, 250, 250, 3)

### Convert Validation Images to Array

In [19]:
# read all the filenames with extension as 'png' into the filelist
filelist_validation = glob.glob('training/test_fold_1/*.png')

In [20]:
# read every file into a list of image arrays
validation = []
for file in filelist_validation:
    img = cv2.imread(file)
    validation.append(img)

In [21]:
# convert the list to a numpy array of float32 values
validation = np.array(validation, dtype = 'float32')

In [22]:
# check the shape to confirm it is ready for CNN
validation.shape

(374, 250, 250, 3)
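
A CNN with a fixed input size needs the training and validation arrays to agree on the per-image shape. The following small check is a sketch using a hypothetical helper `same_image_shape`, with zero-filled stand-in arrays instead of the real data.

```python
import numpy as np


def same_image_shape(*arrays):
    """True if all arrays are 4-D and agree on (height, width, channels)."""
    if not all(a.ndim == 4 for a in arrays):
        return False
    return len({a.shape[1:] for a in arrays}) == 1


# Stand-in arrays with the per-image shape reported above
# (small batch sizes are used here only to keep memory low)
train_demo = np.zeros((2, 250, 250, 3), dtype="float32")
validation_demo = np.zeros((3, 250, 250, 3), dtype="float32")
print(same_image_shape(train_demo, validation_demo))
```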

### Convert Test Images to Array

In [23]:
# read all the filenames with extension as 'png' into the filelist
filelist_test = glob.glob('held-out_validation/*.png')

In [24]:
# read every file into a list of image arrays
test = []
for file in filelist_test:
    img = cv2.imread(file)
    test.append(img)

In [25]:
# convert the list to a numpy array of float32 values
test = np.array(test, dtype = 'float32')

In [26]:
# check the shape to confirm it is ready for CNN
test.shape

(1155, 250, 250, 3)
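
Two properties of these arrays are worth keeping in mind before training: `cv2.imread` returns channels in BGR order, and casting to `float32` keeps the original 0 to 255 value range. The notebook does not state what input the model expects, so the following is only a sketch of one common preprocessing choice (reorder to RGB, scale to [0, 1]). The helper name `to_rgb_unit_range` is hypothetical.

```python
import numpy as np


def to_rgb_unit_range(batch_bgr):
    """Reverse the channel axis (BGR -> RGB) and scale 0..255 values to 0..1."""
    rgb = batch_bgr[..., ::-1]
    return (rgb / 255.0).astype("float32")


# Tiny stand-in batch: one 1x1 image that is pure blue in BGR order
demo = np.array([[[[255.0, 0.0, 0.0]]]], dtype="float32")
print(to_rgb_unit_range(demo))  # blue ends up in the last channel
```

Whichever choice is made, the same transformation should be applied to the training, validation, and test arrays.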