# SIIM: Step-by-Step Image Detection for Beginners 
## Part 2. Basic Modeling - Simplest Image Classification Models using Keras

👉 Part 1. [EDA to Preprocessing](https://www.kaggle.com/songseungwon/siim-covid-19-detection-10-step-tutorial-1)

👉 Mini Part. [Preprocessing for Multi-Output Regression that Detect Opacities](https://www.kaggle.com/songseungwon/siim-covid-19-detection-mini-part-preprocess)

### Thanks for the helpful reference:

`load dataset (includes original image size info)`
- [Resized to 256px JPG](https://www.kaggle.com/xhlulu/siim-covid19-resized-to-256px-jpg)

> Index
```
Step 1. Load Data and Trim for use
     1-a. load train-dataframe
     1-b. load meta-dataframe
     1-c. load image data array
     1-d. calculate image resize ratio information
Step 2. Image Pre-Classification with Data generator
     2-a. classify image id by opacity types
     2-b. sort image files into each type's folder
     2-c. data generation, split train/valid set
Step 3. Modeling I - Basic Multiclass classifier
     3-a. import libraries
     3-b. basic modeling with keras api
     3-c. model compile
     3-d. save model checkpoint
     3-e. model fit
     3-f. model evaluate & save
     3-g. reload model & model summary
Step 4. Modeling II - Multiclass classifier using EfficientNet(Transfer Learning)
     4-a. Load the EfficientNet and try it out
     4-b. Improving performance with an appropriate form
```

In [None]:
#!pip install -U tensorflow==2.5.0
#import tensorflow as tf
#print(tf.__version__)

## Step 1. Load Data and Trim for use

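**Training data is the data used for learning; it drives the weight updates.**

**Validation data is used for hyperparameter tuning and for early stopping (ending training early).**

**Test data is not used during training; it is used to verify accuracy.**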

https://serokell.io/blog/machine-learning-testing
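
The three roles above can be sketched with a plain index split. This is a minimal illustration only, assuming a made-up 70/15/15 split and a hypothetical helper named `split_indices`; the notebook itself relies on Keras's `validation_split` and on the competition's separate test images.

```python
import numpy as np

def split_indices(n_samples, valid_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_valid = int(round(n_samples * valid_frac))
    test_idx = order[:n_test]                     # held out, only for the final check
    valid_idx = order[n_test:n_test + n_valid]    # tuning and early stopping
    train_idx = order[n_test + n_valid:]          # drives the weight updates
    return train_idx, valid_idx, test_idx

train_idx, valid_idx, test_idx = split_indices(100)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The three index sets are disjoint, so no test sample can leak into training.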

### 1-a. load train-dataframe

In [None]:
import pandas as pd

In [None]:
# train_df = pd.read_csv('/kaggle/input/siimcovid19-train-data-that-opacitycount-added/train_df.csv')
# local
train_df = pd.read_csv('/kaggle/input/siimcovid19-train-data-that-opacitycount-added/train_df.csv')

In [None]:
train_df.head()

We don't use the DICOM (.dcm) files, so drop the 'Path' column.

In [None]:
# Drop the 'Path' column (columns= already selects the column axis, so axis=1 is not needed)
train_df.drop(columns='Path', inplace=True)

In [None]:
train_df.head()

Then add an 'Opacity' column: the value is 1 if an opacity was detected, otherwise 0.

In [None]:
train_df['Opacity'] = train_df.apply(lambda row : 1 if row.label.split(' ')[0]=='opacity' else 0, axis=1)
train_df
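
The row-wise `apply` above can also be written with pandas string methods. A minimal sketch on a made-up three-row frame; the `label` values are illustrative, not taken from the dataset.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "label": [
        "opacity 1 789.3 582.4 1815.9 2499.7",
        "none 1 0 0 1 1",
        "opacity 1 100 200 300 400",
    ]
})

# True where the first space-separated token is exactly 'opacity'
is_opacity = demo_df["label"].str.split(" ").str[0].eq("opacity")
demo_df["Opacity"] = is_opacity.astype(int)
print(demo_df["Opacity"].tolist())
```

Both forms give the same column; the string-method version avoids calling a Python function once per row.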

In [None]:
train_df.drop(columns=['Unnamed: 0'], inplace=True)
train_df

### 1-b. load meta-dataframe

We need the size of the individual images. This is necessary later to calculate the ratio and find the coordinates of the box border to detect the opacity.

In [None]:
meta_df = pd.read_csv('/kaggle/input/siim-covid19-resized-to-256px-jpg/meta.csv')

In [None]:
meta_df.head()

- Y(height) : `dim0` 
- X(width) : `dim1`


In [None]:
meta_df.info()

In [None]:
meta_df.split

In [None]:
meta_df.split.unique()

In [None]:
import warnings
warnings.filterwarnings(action='ignore')

In [None]:
train_meta_df = meta_df.loc[meta_df.split=='train'].copy()  # copy so the in-place drop acts on an independent frame
train_meta_df.drop('split',axis=1,inplace=True)
# drop the 'split' column
train_meta_df.columns = ['id', 'origin_img_height','origin_img_width']
train_meta_df.info()

In [None]:
train_meta_df

In [None]:
train_df.head()

In [None]:
# test the lambda: split the id on '_' and take the first part
train_df['id'].apply(lambda x : x.split('_')[0])

In [None]:
train_df['id'] = train_df['id'].apply(lambda x : x.split('_')[0])

In [None]:
train_df.head()

In [None]:
train_df = pd.merge(train_df, train_meta_df, on='id')
# pd.merge joins the two DataFrames on the shared 'id' column

In [None]:
train_df.head()
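
`pd.merge` defaults to an inner join, so rows whose `id` has no match in the metadata are dropped without any message. A small sketch of one way to check for that, using toy frames with hypothetical ids; `how`, `validate`, and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

left = pd.DataFrame({"id": ["a", "b", "c"], "Opacity": [1, 0, 1]})
right = pd.DataFrame({"id": ["a", "b"], "origin_img_height": [2000, 3000]})

# how='left' keeps every row of `left`; indicator=True adds a '_merge' column
# validate='many_to_one' raises if an id appears more than once in `right`
merged = pd.merge(left, right, on="id", how="left",
                  validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean the inner join used in the notebook loses no rows.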

### 1-c. load image data array

In [None]:
path = '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/'
train_imgs_path = list(train_df['id'].apply(lambda x : path + x + '.jpg').values)
# lambda is an anonymous function: lambda argument: return value
train_imgs_path[:10]

Test sample image

In [None]:
import matplotlib.pyplot as plt

In [None]:
img = plt.imread(train_imgs_path[0])

In [None]:
img.shape

In [None]:
plt.imshow(img, cmap='gray')

In [None]:
import numpy as np

In [None]:
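train_imgs = []
total = len(train_imgs_path)
for i, img_path in enumerate(train_imgs_path, start=1):
    img = plt.imread(img_path)
    train_imgs.append(img)  # append adds the element to the end of the list
    # report progress every 1000 images and once more at the end
    if i % 1000 == 0:
        print('{} / {}'.format(i, total))
    elif i == total:
        print('{} / {} (End)'.format(i, total))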

In [None]:
type(train_imgs)

In [None]:
train_imgs = np.array(train_imgs)

In [None]:
train_imgs.shape

**add Channel (3dim to 4dim, gray)**

In [None]:
train_imgs_path[0]

In [None]:
train_imgs[:,:,:,np.newaxis].shape
# np.newaxis adds a new dimension of size 1

In [None]:
train_imgs_4dim = train_imgs[:,:,:,np.newaxis]
train_imgs_4dim.shape
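
There are a few equivalent ways to add the trailing channel axis. A short sketch on a dummy array (the batch of five blank images is made up for the example):

```python
import numpy as np

batch = np.zeros((5, 256, 256), dtype=np.uint8)  # 5 grayscale images, no channel axis

with_newaxis = batch[:, :, :, np.newaxis]     # the form used in this notebook
with_ellipsis = batch[..., np.newaxis]        # same result, independent of the rank
with_expand = np.expand_dims(batch, axis=-1)  # function form

print(with_newaxis.shape, with_ellipsis.shape, with_expand.shape)
```

All three produce a view of shape `(5, 256, 256, 1)`, which matches the `input_shape=(256, 256, 1)` the model expects per image.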

**And a simple EDA**

https://toukei-lab.com/eda
EDA stands for exploratory data analysis: the process of getting to know the data.

In [None]:
len(train_imgs)

In [None]:
min(train_imgs[0].reshape(-1)), max(train_imgs[0].reshape(-1))
# .reshape(-1) flattens the image into a 1-D array

In [None]:
min(train_imgs[13].reshape(-1)), max(train_imgs[13].reshape(-1))

### 1-d. calculate image resize ratio information

In [None]:
train_df['origin_img_height']

In [None]:
# the images were resized to 256 px, so the scale factor is 256 / original size
train_df['height_ratio'] = train_df['origin_img_height'].apply(lambda x : 256/x)
train_df['height_ratio']

In [None]:
train_df['origin_img_width']

In [None]:
train_df['width_ratio'] = train_df['origin_img_width'].apply(lambda x : 256/x)
train_df['width_ratio']
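
These ratios are what later map a bounding box from the original image onto the resized one. A minimal sketch, assuming the resized images are 256 x 256 and using a hypothetical helper `scale_box` with made-up numbers:

```python
def scale_box(box, origin_height, origin_width, target_size=256):
    """Map (xmin, ymin, xmax, ymax) from original pixels to the resized image."""
    width_ratio = target_size / origin_width
    height_ratio = target_size / origin_height
    xmin, ymin, xmax, ymax = box
    # x coordinates scale with the width ratio, y coordinates with the height ratio
    return (xmin * width_ratio, ymin * height_ratio,
            xmax * width_ratio, ymax * height_ratio)

scaled = scale_box((512, 1024, 1024, 2048), origin_height=2048, origin_width=2048)
print(scaled)
```

Dividing by the same ratios maps a predicted box back to the original resolution.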

In [None]:
train_df
# The data used for training is now ready

## Step 2. Image Pre-Classification with Data generator

### 2-a. classify image id by Opacity types

In [None]:
types = list(train_df.columns[5:9])  # names of columns 6 through 9 (the four class labels)
types

In [None]:
path

In [None]:
train_imgs.shape

### 2-b. sort image files into each type's folder

Create folders for each class **in advance**, and save images in each folder.

In [None]:
!mkdir ./genData
!mkdir ./genData/Negative
!mkdir ./genData/Typical
!mkdir ./genData/Indeterminate
!mkdir ./genData/Atypical

In [None]:
# Negative for Pneumonia
imgs_Negative = list(train_df[train_df[types[0]]==1].index)
for idx in imgs_Negative:
    plt.imsave('./genData/Negative/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Typical Appearance
imgs_Typical = list(train_df[train_df[types[1]]==1].index)
for idx in imgs_Typical:
    plt.imsave('./genData/Typical/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Indeterminate Appearance
imgs_Indeterminate = list(train_df[train_df[types[2]]==1].index)
for idx in imgs_Indeterminate:
    plt.imsave('./genData/Indeterminate/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')

In [None]:
# Atypical Appearance
imgs_Atypical = list(train_df[train_df[types[3]]==1].index)
for idx in imgs_Atypical:
    plt.imsave('./genData/Atypical/{}.jpg'.format(train_df.loc[idx,'id']), train_imgs[idx], cmap='gray')
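
The four cells above repeat one pattern, so a loop over the class columns is an option. The sketch below only builds the destination paths on a toy frame. The class column names are assumptions standing in for the entries of `types`; the folder names match the ones created above.

```python
import os
import pandas as pd

demo_df = pd.DataFrame({
    "id": ["img1", "img2", "img3"],
    "Negative for Pneumonia": [1, 0, 0],
    "Typical Appearance": [0, 1, 0],
    "Indeterminate Appearance": [0, 0, 0],
    "Atypical Appearance": [0, 0, 1],
})

# class column -> folder name (column names are assumed for this example)
folders = {
    "Negative for Pneumonia": "Negative",
    "Typical Appearance": "Typical",
    "Indeterminate Appearance": "Indeterminate",
    "Atypical Appearance": "Atypical",
}

save_plan = []  # (row index, destination path) pairs
for column, folder in folders.items():
    for idx in demo_df.index[demo_df[column] == 1]:
        path = os.path.join("genData", folder, "{}.jpg".format(demo_df.loc[idx, "id"]))
        save_plan.append((idx, path))

print(save_plan)
```

Each pair could then be passed to `plt.imsave(path, train_imgs[idx], cmap='gray')`, as in the cells above.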

### 2-c. data generation, split train/valid set

https://keras.io/ja/preprocessing/image/

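Image preprocessing. At first I had no idea why this step is needed; the post below explains it.


> http://wild-data-chase.com/index.php/2019/02/04/post-370/

The generator augments the images used for training (it increases their variety). Here it also doubles as the step that produces the validation data, through `validation_split`.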

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
idg = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=3,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=0.05,
    horizontal_flip=False,
    fill_mode='reflect',
    validation_split=0.2
)

In [None]:
data_path = './genData'
batch_size = 64
target_size = (256, 256)
class_mode = 'categorical'
color_mode = 'grayscale'

In [None]:
train_gen = idg.flow_from_directory(
    data_path,
    batch_size=batch_size,
    target_size=target_size,
    class_mode=class_mode,
    color_mode=color_mode,
    subset = 'training'
)

valid_gen = idg.flow_from_directory(
    data_path,
    batch_size = batch_size,
    target_size = target_size,
    class_mode = class_mode,
    color_mode=color_mode,
    subset = 'validation'
)
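
When `classes` is not passed, `flow_from_directory` assigns label indices to the subfolders in alphanumeric order, so index 0 is not `Negative` here. A small sketch that reproduces the mapping for the four folders created above; on the real generator, `train_gen.class_indices` holds the same information.

```python
folder_names = ["Negative", "Typical", "Indeterminate", "Atypical"]

# the label index follows the sorted folder names
class_indices = {name: index for index, name in enumerate(sorted(folder_names))}
print(class_indices)
```

Keeping this mapping at hand matters later, when the four softmax outputs have to be turned back into class names.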

## Step 3. Modeling I - Basic Multiclass classifier

### 3-a. import libraries

Reference:
https://note.nkmk.me/python-tensorflow-keras-basics/

In [None]:
import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import EarlyStopping

### 3-b. basic modeling with keras api

In [None]:
def basic_cnn_model():
    # create model
    model = Sequential()
    model.add(Conv2D(64, (3, 3), input_shape=(256, 256, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Note: Dense on a 4-D feature map acts only on the channel axis (like a 1x1 convolution)
    model.add(Dense(128, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(4, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
    return model
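
To see how large the flattened vector becomes, the spatial size can be traced by hand: a 3x3 convolution with the default `valid` padding shortens each side by 2, 2x2 max pooling halves it (rounding down), and `Dense` leaves it unchanged. A sketch with a hypothetical helper `trace_size`:

```python
def trace_size(size, layers):
    """Follow the side length of a square feature map through a layer list."""
    for layer in layers:
        if layer == "conv3":      # 3x3 convolution, 'valid' padding, stride 1
            size = size - 2
        elif layer == "pool2":    # 2x2 max pooling, stride 2
            size = size // 2
        elif layer == "dense":    # acts on channels only, spatial size unchanged
            pass
        else:
            raise ValueError("unknown layer: {}".format(layer))
    return size

# layer order of basic_cnn_model up to Flatten
layers = ["conv3", "pool2", "conv3", "pool2", "conv3", "pool2", "dense", "pool2"]
side = trace_size(256, layers)
flat_features = side * side * 128  # 128 channels after the last Dense
print(side, flat_features)
```

The same numbers should appear in the output shapes that `model.summary()` reports.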

In [None]:
model = basic_cnn_model()

### 3-d. save model checkpoint

In [None]:
checkpoint_path = 'my_checkpoint.ckpt'
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = ModelCheckpoint(
    filepath = checkpoint_path,
    save_weights_only = True,
    save_best_only = True,
    monitor = 'val_loss',
    verbose=1
)

**EarlyStopping settings**

In [None]:
# EarlyStopping settings
early_stopping =  EarlyStopping(
                            monitor='val_loss',
                            min_delta=0.0,
                            patience=2,)
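
The effect of `patience` and `min_delta` can be illustrated without Keras. The function below is a simplified re-implementation written for this tutorial, not the Keras source, and the loss values are made up.

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:      # no improvement for `patience` epochs in a row
                return epoch
    return None

stopped_at = stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7])
print(stopped_at)
```

With `patience=2`, training ends at epoch 4 in this example, even though epoch 5 would have improved; a larger `patience` trades extra training time for tolerance to such dips.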

### 3-e. model fit

Run the training.

In [None]:
epochs = 1 # just for test
model.fit(
    train_gen,
    validation_data = (valid_gen),
    epochs = epochs,
    callbacks=[cp_callback, early_stopping]
)

In [None]:
epochs = 20  # at most 20 epochs; early stopping is already configured
model.fit(
    train_gen,
    validation_data = (valid_gen),
    epochs = epochs,
    callbacks=[cp_callback, early_stopping]
)

### 3-f. model evaluate & save

In [None]:
model.load_weights(checkpoint_path)

In [None]:
model.evaluate(valid_gen)

In [None]:
os.makedirs('./model', exist_ok=True)  # make sure the target folder exists
model.save('./model/basic_cnn.h5')

### 3-g. reload model & model summary

In [None]:
import tensorflow as tf

mymodel = tf.keras.models.load_model('./model/basic_cnn.h5')

mymodel.summary()

tf.keras.utils.plot_model(mymodel, "mymodel.png", show_shapes=True)

In [None]:
results = mymodel.evaluate(valid_gen)

In [None]:
print('accuracy =', results[1], 'loss =', results[0])

# Test

Prepare the test data

In [None]:
test_meta_df = meta_df.loc[meta_df.split=='test'].copy()
# keep only the id column: drop 'split', 'dim1', and 'dim0'
test_meta_df.drop(['split', 'dim1', 'dim0'], axis=1, inplace=True)
test_meta_df.columns = ['id']
test_meta_df.info()

In [None]:
test_meta_df

In [None]:
test_path = '/kaggle/input/siim-covid19-resized-to-256px-jpg/test/' # absolute path
test_imgs_path = list(test_meta_df['id'].apply(lambda x : test_path + x + '.jpg').values)
test_imgs_path[:10]

In [None]:
img = plt.imread(test_imgs_path[0])
img.shape
plt.imshow(img, cmap='gray')

In [None]:
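test_imgs = []
total = len(test_imgs_path)
for i, test_img_path in enumerate(test_imgs_path, start=1):
    img = plt.imread(test_img_path)
    test_imgs.append(img)  # append adds the element to the end of the list
    # report progress every 100 images and once more at the end
    if i % 100 == 0:
        print('{} / {}'.format(i, total))
    elif i == total:
        print('{} / {} (End)'.format(i, total))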

In [None]:
type(test_imgs)

In [None]:
test_imgs = np.array(test_imgs)
test_imgs.shape

In [None]:
test_imgs_path[0]

In [None]:
test_imgs[:,:,:,np.newaxis].shape
# np.newaxis adds a new dimension of size 1

In [None]:
test_imgs_4dim = test_imgs[:,:,:,np.newaxis]
test_imgs_4dim.shape

In [None]:
"""
!mkdir ./test_genData

test_data_gen = ImageDataGenerator(rescale=1./255)
test_generator = test_data_gen.flow_from_directory(
    directory='../input/siim-covid19-resized-to-256px-jpg/test',
    target_size=(256, 256), 
    color_mode='grayscale', 
    classes=None, 
    class_mode='categorical', 
    batch_size=64, 
    shuffle=False, 
    seed=None, 
    save_to_dir='./test_genData', 
    save_prefix='', 
    save_format='jpg', 
    follow_links=False, )
"""

In [None]:
# scale pixels to [0, 1], matching the rescale=1/255 used by the training generator
my_predictions = mymodel.predict(test_imgs_4dim / 255.0)
my_predictions

In [None]:
len(my_predictions)

In [None]:
df2 = pd.DataFrame(my_predictions, columns=['0','1','2','3'])
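# join matches on the index, so reset it to 0..n-1 to line the ids up with df2
new_df2 = test_meta_df.reset_index(drop=True).join(df2)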
new_df2
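
Each row of `my_predictions` holds four softmax scores. A sketch of turning such rows into class names, assuming the label order is the alphanumeric folder order that `flow_from_directory` uses; the scores below are made up.

```python
import numpy as np

# sorted folder names: Atypical, Indeterminate, Negative, Typical
class_names = sorted(["Negative", "Typical", "Indeterminate", "Atypical"])

# made-up softmax outputs for three images
demo_probs = np.array([
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.05, 0.10, 0.80],
    [0.70, 0.10, 0.10, 0.10],
])

# argmax picks the column with the highest score in each row
predicted = [class_names[i] for i in demo_probs.argmax(axis=1)]
print(predicted)
```

The column names `'0'` to `'3'` in `df2` could be replaced by `class_names` to make the table easier to read.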

In [None]:
"""
opacity_list = ['opacity'] * 1263
results_list=[results[1]] * 1263
str(results_list)
df4 = pd.DataFrame(np.array(opacity_list))
df5 = pd.DataFrame(np.array(results_list))
"""

In [None]:
"""
new_df2 = df2[0]+ ""+df2[1]+""+df2[2]+""+df2[3]
"""

In [None]:
"""
df1 = pd.DataFrame(test_meta_df)
df3 = df1.join(new_df2)
df3.to_csv('my_predictions.csv')
pd.read_csv('my_predictions.csv')
"""

In [None]:
"""
results = predictions.argmax(axis=1)
print(results)
"""

In [None]:
# Read the submission file
import pandas as pd
sub_df = pd.read_csv('/kaggle/input/siim-covid19-detection/sample_submission.csv')
sub_df

In [None]:
!pip install tqdm
from tqdm import tqdm 

In [None]:
# Prediction loop for submission
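predictions = []
known_ids = set(new_df2['id'])  # `in` on a Series checks the index, so compare against the values

for i in range(len(sub_df)):
    row = sub_df.loc[i]
    id_name = row.id.split('_')[0]
    id_level = row.id.split('_')[-1]

    if id_level == 'study':
        # do study-level classification
        predictions.append("Negative 1 0 0 1 1") # dummy prediction

    elif id_level == 'image':
        if id_name in known_ids:
            # scores of this image (columns 1:4 of new_df2), joined as text; not a real box
            scores = new_df2.loc[new_df2['id'] == id_name].iloc[0, 1:4]
            predictions.append("Opacity " + str(results[1]) + " " + " ".join(str(s) for s in scores))
        else:
            predictions.append("None 1 0 0 1 1")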

In [None]:
import os
os.getcwd()
os.chdir('./working')
os.getcwd()

In [None]:
sub_df['PredictionString'] = predictions
sub_df.to_csv('submission.csv', index=False)
sub_df
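
The dummy study-level string could later be replaced with real scores. A sketch with a hypothetical helper `study_prediction_string`, modeled on the dummy string above: one `label confidence 0 0 1 1` group per class. The lowercase label spelling and the group order are assumptions, so check the competition's evaluation page before relying on them.

```python
def study_prediction_string(scores):
    """Join per-class scores into 'label conf 0 0 1 1' groups."""
    parts = []
    for label, score in scores.items():
        # the trailing '0 0 1 1' is the placeholder box used by the dummy prediction
        parts.append("{} {:.4f} 0 0 1 1".format(label, score))
    return " ".join(parts)

# made-up scores for one study
demo_scores = {"negative": 0.6, "typical": 0.2, "indeterminate": 0.1, "atypical": 0.1}
demo_string = study_prediction_string(demo_scores)
print(demo_string)
```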

### Info. Efficient Net V2


https://colab.research.google.com/github/google/automl/blob/master/efficientnetv2/tfhub.ipynb#scrollTo=E32RGKBEWq76

https://qiita.com/T-STAR/items/a04b559421ef20a970ec

https://github.com/lukemelas/EfficientNet-PyTorch

Sneak peek
https://qiita.com/kitfactory/items/4024dcdbd1034d15927b

Tutorial
https://colab.research.google.com/github/google/automl/blob/master/efficientnetv2/tutorial.ipynb#scrollTo=U2oz3r1LUDzr

**Install package and download source code/image.**