# SIIM: Step-by-Step Image Detection for Beginners 
## Part 2. Basic Modeling - Simplest Image Classification Models using Keras

👉 Part 1. [EDA to Preprocessing](https://www.kaggle.com/songseungwon/siim-covid-19-detection-10-step-tutorial-1)

👉 Mini Part. [Preprocessing for Multi-Output Regression that Detect Opacities](https://www.kaggle.com/songseungwon/siim-covid-19-detection-mini-part-preprocess)

### Thanks for the helpful reference:

`load dataset (with original image size info)`
- [Resized to 256px JPG](https://www.kaggle.com/xhlulu/siim-covid19-resized-to-256px-jpg)

> Index
```
Step 1. Load Data and Trim for use
     1-a. load train-dataframe
     1-b. load meta-dataframe
     1-c. load image data array
     1-d. calculate image resize ratio information
Step 2. Image Pre-Classification with Data generator
     2-a. classify image id by opacity types
     2-b. sort image files into each type's folder
     2-c. data generation, split train/valid set
Step 3. Modeling I - Basic Multiclass classifier
     3-a. import libraries
     3-b. basic modeling with keras api
     3-c. model compile
     3-d. save model checkpoint
     3-e. model fit
     3-f. model evaluate & save
     3-g. reload model & model summary
Step 4. Modeling II - Multiclass classifier using EfficientNet(Transfer Learning)
     4-a. Load the EfficientNet and try it out
     4-b. Improving performance with an appropriate form
```

## Step 1. Load Data and Trim for use

### 1-a. load train-dataframe

In [None]:
import pandas as pd

In [None]:
# Kaggle input path; change it if you run this notebook locally
train_df = pd.read_csv('/kaggle/input/siimcovid19-train-data-that-opacitycount-added/train_df.csv')

In [None]:
train_df.head()

We don't use the DICOM (`.dcm`) files, so drop the `Path` column.

In [None]:
train_df.drop(columns='Path', inplace=True)

In [None]:
train_df.head()

Then add an `Opacity` column: 1 if an opacity was detected in the image, otherwise 0.

In [None]:
train_df['Opacity'] = train_df.apply(lambda row : 1 if row.label.split(' ')[0]=='opacity' else 0, axis=1)
train_df
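The row-wise `apply` above works. A vectorized equivalent built on pandas string methods produces the same column and is usually faster on large frames. Below is a minimal sketch on toy label strings. The values are made up and only mimic the `label` format, where the first space-separated token is either `opacity` or something else.

```python
import pandas as pd

# Toy stand-in for train_df['label'] (made-up values)
labels = pd.Series(['opacity 1 10 20 30 40', 'none 1 0 0 1 1'])

# first token == 'opacity' -> 1, otherwise 0 (same rule as the lambda above)
opacity = labels.str.split(' ').str[0].eq('opacity').astype(int)
print(opacity.tolist())
```

In the notebook, this would read `train_df['Opacity'] = train_df['label'].str.split(' ').str[0].eq('opacity').astype(int)`.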

In [None]:
train_df.drop(columns=['Unnamed: 0'], inplace=True)
train_df

### 1-b. load meta-dataframe

We need the original size of each image. Later, it is used to compute the resize ratio, which maps the bounding-box coordinates of each opacity onto the resized image.
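Here is a minimal sketch of that mapping. It assumes each box is given as `(x, y, width, height)` in original-image pixels and that the resized image is 256 x 256. `scale_box` is a hypothetical helper, not part of this notebook. Each coordinate is multiplied by the ratio of its own axis: width ratio for x and width, height ratio for y and height.

```python
def scale_box(box, orig_height, orig_width, resized=256):
    """Map an (x, y, width, height) box from original pixels to a resized square image."""
    height_ratio = resized / orig_height  # same idea as 'height_ratio' in step 1-d
    width_ratio = resized / orig_width    # same idea as 'width_ratio' in step 1-d
    x, y, w, h = box
    return (x * width_ratio, y * height_ratio, w * width_ratio, h * height_ratio)

# Example: a 4096 x 2048 (height x width) original shrunk to 256 x 256
print(scale_box((800, 1600, 400, 320), orig_height=4096, orig_width=2048))
# (100.0, 100.0, 50.0, 20.0)
```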

In [None]:
meta_df = pd.read_csv('/kaggle/input/siim-covid19-resized-to-256px-jpg/meta.csv')

In [None]:
meta_df.head()

- Y(height) : `dim0` 
- X(width) : `dim1`


In [None]:
meta_df.info()

In [None]:
meta_df.split.unique()

In [None]:
import warnings
warnings.filterwarnings(action='ignore')

In [None]:
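# .copy() creates an independent frame, so the in-place drop below does not act on a slice of meta_df
train_meta_df = meta_df.loc[meta_df.split=='train'].copy()
train_meta_df.drop('split',axis=1,inplace=True)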
train_meta_df.columns = ['id', 'origin_img_height','origin_img_width']
train_meta_df.info()

In [None]:
train_meta_df

In [None]:
train_df.head()

In [None]:
# preview: keep only the part of the id before '_'
train_df['id'].apply(lambda x : x.split('_')[0])


In [None]:
train_df['id'] = train_df['id'].apply(lambda x : x.split('_')[0])

In [None]:
train_df.head()

In [None]:
train_df = pd.merge(train_df, train_meta_df, on='id')

In [None]:
train_df.head()
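By default, `pd.merge` performs an inner join. Any training image without a row in `meta.csv` would therefore be dropped silently. A minimal sketch for checking the join before relying on it follows. The toy frames `left` and `right` are hypothetical stand-ins for `train_df` and `train_meta_df`.

```python
import pandas as pd

# Toy frames (made-up values)
left = pd.DataFrame({'id': ['a', 'b', 'c'], 'Opacity': [1, 0, 1]})
right = pd.DataFrame({'id': ['a', 'b'],
                      'origin_img_height': [2000, 3000],
                      'origin_img_width': [2500, 2400]})

# how='left' keeps every training row; validate raises MergeError on duplicate ids;
# indicator=True adds a '_merge' column that records where each row was found
merged = pd.merge(left, right, on='id', how='left',
                  validate='one_to_one', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
print(missing)  # ids that have no size information
```

If `missing` is empty, the inner join used above keeps every row.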

### 1-c. load image data array

In [None]:
path = '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/'
train_imgs_path = list(train_df['id'].apply(lambda x : path + x + '.jpg').values)
train_imgs_path[:10]

Load and display a sample image.

In [None]:
import matplotlib.pyplot as plt

In [None]:
img = plt.imread(train_imgs_path[0])

In [None]:
img.shape

In [None]:
plt.imshow(img, cmap='gray');

In [None]:
import numpy as np

In [None]:
train_imgs = []
n_imgs = len(train_imgs_path)
for i, img_path in enumerate(train_imgs_path, start=1):
    train_imgs.append(plt.imread(img_path))
    # report progress every 1000 images and once more at the end
    if i % 1000 == 0 or i == n_imgs:
        print('{} / {}'.format(i, n_imgs))

In [None]:
type(train_imgs)

In [None]:
train_imgs = np.array(train_imgs)

In [None]:
train_imgs.shape

Add a channel axis (3-D to 4-D array, with a single grayscale channel).

In [None]:
train_imgs_path[0]

In [None]:
train_imgs[:,:,:,np.newaxis].shape

In [None]:
train_imgs_4dim = train_imgs[:,:,:,np.newaxis]
train_imgs_4dim.shape
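Keras `Conv2D` layers expect input of shape `(batch, height, width, channels)`, which is why a channel axis is added. The short sketch below compares equivalent ways to add it. A zero array stands in for `train_imgs`.

```python
import numpy as np

imgs = np.zeros((5, 256, 256), dtype=np.uint8)  # stand-in for train_imgs: (N, H, W)

a = imgs[:, :, :, np.newaxis]      # the indexing used in this notebook
b = np.expand_dims(imgs, axis=-1)  # equivalent function call
c = imgs[..., None]                # None is an alias for np.newaxis

print(a.shape, b.shape, c.shape)   # all (5, 256, 256, 1)
print(np.shares_memory(a, imgs))   # True: a view, so no pixel data is copied
```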

And a quick EDA:

In [None]:
len(train_imgs)

In [None]:
train_imgs[0].min(), train_imgs[0].max()

In [None]:
train_imgs[13].min(), train_imgs[13].max()
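Instead of checking individual images by hand, NumPy reductions over the image axes give the pixel range of every image at once. Below is a minimal sketch. `pixel_ranges` is a hypothetical helper, and the toy array stands in for `train_imgs`.

```python
import numpy as np

def pixel_ranges(imgs):
    """Return per-image minima, per-image maxima and the global (min, max).

    imgs is a stacked (N, H, W) array such as train_imgs.
    """
    per_min = imgs.min(axis=(1, 2))
    per_max = imgs.max(axis=(1, 2))
    return per_min, per_max, (int(imgs.min()), int(imgs.max()))

# Toy stack of two 2x2 "images"
demo = np.array([[[0, 10], [20, 30]],
                 [[5, 5], [255, 100]]], dtype=np.uint8)
per_min, per_max, overall = pixel_ranges(demo)
print(per_min, per_max, overall)  # [0 5] [ 30 255] (0, 255)
```

On the real data, `pixel_ranges(train_imgs)` would show whether every image uses the full 0 to 255 range.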

### 1-d. calculate image resize ratio information

In [None]:
train_df['origin_img_height']

In [None]:
RESIZED = 256  # side length of the resized JPGs, matching the 256 x 256 model input used later
train_df['height_ratio'] = RESIZED / train_df['origin_img_height']
train_df['height_ratio']

In [None]:
train_df['origin_img_width']

In [None]:
train_df['width_ratio'] = RESIZED / train_df['origin_img_width']
train_df['width_ratio']

In [None]:
train_df

## Step 2. Image Pre-Classification with Data generator

### 2-a. classify image id by Opacity types

In [None]:
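# the four study-level class columns, in the order used by the folders below:
# Negative, Typical, Indeterminate, Atypical
types = list(train_df.columns[5:9])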
types

In [None]:
path

In [None]:
train_imgs.shape

### 2-b. sort image files into each type's folder

Create a folder for each class **in advance**, then save each image into the folder of its class. `flow_from_directory` later uses these folder names as the class labels.

In [None]:
# -p: create parent folders too, and don't fail if they already exist (e.g. on re-run)
!mkdir -p ./genData/Negative ./genData/Typical ./genData/Indeterminate ./genData/Atypical

In [None]:
# Negative for Pneumonia
imgs_Negative = list(train_df[train_df[types[0]]==1].index)
for idx in imgs_Negative:
    plt.imsave('./genData/Negative/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Typical Appearance
imgs_Typical = list(train_df[train_df[types[1]]==1].index)
for idx in imgs_Typical:
    plt.imsave('./genData/Typical/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Indeterminate Appearance
imgs_Indeterminate = list(train_df[train_df[types[2]]==1].index)
for idx in imgs_Indeterminate:
    plt.imsave('./genData/Indeterminate/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Atypical Appearance
imgs_Atypical = list(train_df[train_df[types[3]]==1].index)
for idx in imgs_Atypical:
    plt.imsave('./genData/Atypical/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')
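The four cells above repeat one pattern. Here is a stdlib-only sketch of the same step written as a loop. `plan_class_folders` and `CLASS_NAMES` are hypothetical names, and the folder order is assumed to match `types[0]` to `types[3]`. `os.makedirs(..., exist_ok=True)` makes it safe to re-run. Instead of saving, the helper returns `(row position, save path)` pairs, so the actual image writing stays in the notebook (shown in the commented loop).

```python
import os
import tempfile

# Assumed to be in the same order as types[0] .. types[3]
CLASS_NAMES = ['Negative', 'Typical', 'Indeterminate', 'Atypical']

def plan_class_folders(root, image_ids, one_hot_rows):
    """Create root/<class>/ folders and return (row position, save path) pairs.

    one_hot_rows[i] is the 4-value class row for image_ids[i]; rows without a 1 are skipped.
    """
    for name in CLASS_NAMES:
        os.makedirs(os.path.join(root, name), exist_ok=True)
    plan = []
    for i, (image_id, row) in enumerate(zip(image_ids, one_hot_rows)):
        for name, flag in zip(CLASS_NAMES, row):
            if flag == 1:
                plan.append((i, os.path.join(root, name, '{}.jpg'.format(image_id))))
    return plan

# Demo on a throwaway folder with toy ids and labels
demo_root = tempfile.mkdtemp()
demo_plan = plan_class_folders(demo_root, ['id1', 'id2', 'id3'],
                               [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]])
print(demo_plan)

# In the notebook (assuming train_imgs follows train_df's row order):
# for i, save_path in plan_class_folders('./genData', train_df['id'], train_df[types].values):
#     plt.imsave(save_path, train_imgs[i], cmap='gray')
```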

### 2-c. data generation, split train/valid set

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
idg = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=3,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=0.05,
    horizontal_flip=False,
    fill_mode='reflect',
    validation_split=0.2
)

In [None]:
data_path = './genData'
batch_size = 64
target_size = (256, 256)
class_mode = 'categorical'
color_mode = 'grayscale'

In [None]:
train_gen = idg.flow_from_directory(
    data_path,
    batch_size=batch_size,
    target_size=target_size,
    class_mode=class_mode,
    color_mode=color_mode,
    subset = 'training'
)

valid_gen = idg.flow_from_directory(
    data_path,
    batch_size = batch_size,
    target_size = target_size,
    class_mode = class_mode,
    color_mode=color_mode,
    subset = 'validation'
)
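`validation_split=0.2` combined with `subset=` splits each class folder rather than sampling at random. As far as I can tell from the Keras source, `flow_from_directory` sorts the file names in each class folder, assigns the first 20% to `'validation'`, and assigns the rest to `'training'`. The sketch below imitates that rule in plain Python. Treat it as an assumption about the library internals, not as its API. `split_class_files` is a hypothetical helper.

```python
def split_class_files(filenames, validation_split=0.2):
    """Imitate the assumed per-folder split: sort, first fraction -> validation."""
    files = sorted(filenames)
    cut = int(validation_split * len(files))
    return files[cut:], files[:cut]  # (training, validation)

# Toy folder with ten files listed in arbitrary order
names = ['img{}.jpg'.format(i) for i in (7, 2, 9, 0, 4, 1, 8, 3, 6, 5)]
train_files, valid_files = split_class_files(names)
print(valid_files)  # ['img0.jpg', 'img1.jpg']
```

If this reading is right, the split is deterministic. A second `ImageDataGenerator(rescale=1. / 255, validation_split=0.2)` without augmentation should therefore select the same validation files, and using it for `valid_gen` would keep the augmentation in `idg` from also being applied to the validation images.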

## Step 3. Modeling I - Basic Multiclass classifier

### 3-a. import libraries

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint

### 3-b. basic modeling with keras api

In [None]:
model = Sequential([
    Conv2D(64, (3,3), activation='relu', input_shape=(256, 256,1)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dense(4, activation='softmax')
])
model.summary() 
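The first `Dense` layer dominates the parameter count in this summary. The pure-Python sketch below reproduces the shape arithmetic. It assumes `Conv2D`'s default `'valid'` padding, so each 3x3 convolution trims 2 pixels, and that `MaxPooling2D(2,2)` halves the size with flooring. `basic_cnn_stats` is a hypothetical helper.

```python
def basic_cnn_stats(input_size=256, in_channels=1, filters=(64, 64, 128, 128),
                    dense_units=128, n_classes=4):
    """Return (final feature-map side, flattened length, total parameters)."""
    size, channels, params = input_size, in_channels, 0
    for f in filters:
        params += 3 * 3 * channels * f + f  # Conv2D 3x3 kernel weights + biases
        size = (size - 2) // 2               # 'valid' 3x3 conv, then 2x2 max pooling
        channels = f
    flat = size * size * channels                   # Flatten (Dropout has no weights)
    params += flat * dense_units + dense_units      # Dense(128)
    params += dense_units * n_classes + n_classes   # Dense(4)
    return size, flat, params

print(basic_cnn_stats())  # (14, 25088, 3470916)
```

The spatial size goes 256, 127, 62, 30, 14. The `Dense(128)` layer alone holds 25088 * 128 + 128 = 3,211,392 of the roughly 3.47 million weights.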

### 3-c. model compile

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
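`categorical_crossentropy` compares the one-hot label `y` with the softmax output `p` as `-sum(y * log(p))`, which is the negative log-probability of the true class. The small sketch below implements that formula. It is not Keras's exact implementation, which also clips predictions. It shows that a model predicting all four classes uniformly has a loss of ln 4 (about 1.386), a useful baseline when reading the validation loss.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(y_true * log(y_pred)) for one sample; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

uniform = categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
confident = categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
print(round(uniform, 3), round(confident, 3))  # 1.386 0.03
```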

### 3-d. save model checkpoint

In [None]:
filepath = 'my_checkpoint.ckpt'
cp = ModelCheckpoint(
    filepath = filepath,
    save_weights_only = True,
    save_best_only = True,
    monitor = 'val_loss',
    verbose=1
)

### 3-e. model fit

In [None]:
epochs = 1 # just for test
model.fit(
    train_gen,
    validation_data = (valid_gen),
    epochs = epochs,
    callbacks=[cp]
)

### 3-f. model evaluate & save

In [None]:
model.load_weights(filepath)

In [None]:
model.evaluate(valid_gen)

In [None]:
model.save('./model/basic_cnn.h5')

### 3-g. reload model & model summary

In [None]:
import tensorflow as tf

In [None]:
mymodel = tf.keras.models.load_model('./model/basic_cnn.h5')

In [None]:
mymodel.summary()

## Step 4. Modeling II - Multiclass classifier using EfficientNet(Transfer Learning)

### 4-a. Load the EfficientNet and try it out

In [None]:
from tensorflow.keras.applications import EfficientNetB0

In [None]:
efc = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(256,256,3))
efc.trainable=False

In [None]:
model = Sequential([
    efc,
    Flatten(),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dense(4, activation='softmax')
])
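With `efc.trainable=False`, only the head after `efc` is trained. The sketch below gives a rough count of the head's trainable parameters. It assumes EfficientNetB0's last feature map has 1280 channels and a side length of 1/32 of the input. `head_params` is a hypothetical helper. `'gap'` stands for `GlobalAveragePooling2D`, which the Keras fine-tuning example referenced in this notebook uses instead of `Flatten`.

```python
def head_params(input_size, channels=1280, dense_units=256, n_classes=4, pooling='flatten'):
    """Return (features entering Dense(256), trainable head parameters)."""
    side = input_size // 32  # assumes input_size is a multiple of 32
    features = side * side * channels if pooling == 'flatten' else channels
    params = features * dense_units + dense_units + dense_units * n_classes + n_classes
    return features, params

print(head_params(256))                 # (81920, 20972804)
print(head_params(224))                 # (62720, 16057604)
print(head_params(224, pooling='gap'))  # (1280, 328964)
```

With `Flatten`, the head is far larger than a pooled head, which is one design choice worth revisiting when the model overfits.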

In [None]:
model.summary()

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

In [None]:
filepath = 'my_checkpoint_efc.ckpt'
cp = ModelCheckpoint(
    filepath = filepath,
    save_weights_only = True,
    save_best_only = True,
    monitor = 'val_loss',
    verbose=1
)

In [None]:
epochs=1
model.fit(
    train_gen,
    validation_data=(valid_gen),
    epochs=epochs,
    callbacks=[cp]
)

In [None]:
model.load_weights(filepath)

In [None]:
model.evaluate(valid_gen)

The performance is not much different from that of the basic CNN model.


In fact, EfficientNet (here, specifically EfficientNetB0) is designed for 224 x 224 input images, and it expects raw pixel values in the 0 to 255 range. Normalization is built into the model itself, so the data must be passed in without rescaling.

Reference: [Image classification via fine-tuning with EfficientNet](https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/)

Let's use the model as recommended in the official documentation.
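Here is why rescaling hurts in this case. If the Keras EfficientNet begins with its own 1/255 rescaling (followed by normalization), as assumed here, then data already scaled by `ImageDataGenerator(rescale=1./255)` ends up squeezed into [0, 1/255]. A small NumPy illustration of the arithmetic:

```python
import numpy as np

raw = np.array([0.0, 128.0, 255.0])  # pixel values as loaded, no rescale
pre_rescaled = raw / 255.0           # what ImageDataGenerator(rescale=1./255) feeds the model

# Assumption: the model's first step divides by 255 once more
seen_from_raw = raw / 255.0
seen_from_pre = pre_rescaled / 255.0

print(seen_from_raw.max(), seen_from_pre.max())  # 1.0 vs about 0.0039
```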

### 4-b.  Improving performance with an appropriate form

In [None]:
idg = ImageDataGenerator(
    # no rescale: EfficientNet expects raw 0-255 pixel values
    rotation_range=3,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=0.05,
    horizontal_flip=False,
    fill_mode='reflect',
    validation_split=0.2
)

In [None]:
data_path = './genData'
batch_size = 64
target_size = (224, 224)
class_mode = 'categorical'
color_mode = 'rgb'  # EfficientNetB0 expects 3-channel input; grayscale JPGs are converted to RGB on load

In [None]:
train_gen = idg.flow_from_directory(
    data_path,
    batch_size=batch_size,
    target_size=target_size,
    class_mode=class_mode,
    color_mode=color_mode,
    subset = 'training'
)

valid_gen = idg.flow_from_directory(
    data_path,
    batch_size = batch_size,
    target_size = target_size,
    class_mode = class_mode,
    color_mode=color_mode,
    subset = 'validation'
)

In [None]:
efc = EfficientNetB0(weights='imagenet',
                     include_top=False, 
                     input_shape=(224,224,3),
                     drop_connect_rate=0.4)
efc.trainable=False

In [None]:
model = Sequential([
    efc,
    Flatten(),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dense(4, activation='softmax')
])

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

In [None]:
filepath = 'my_checkpoint_efc_224.ckpt'
cp = ModelCheckpoint(
    filepath = filepath,
    save_weights_only = True,
    save_best_only = True,
    monitor = 'val_loss',
    verbose=1
)

In [None]:
epochs=1
model.fit(
    train_gen,
    validation_data=(valid_gen),
    epochs=epochs,
    callbacks=[cp]
)

In [None]:
model.load_weights(filepath)

In [None]:
model.evaluate(valid_gen)


After the first epoch, accuracy improved noticeably (by approximately 13%). If the model is trained for more epochs, we can expect the performance gap to grow further.

In this kernel, I built the simplest models with minimal code. Now try to build a model that performs better than these, using more complex architectures and more effective data!