# Baseline Model

## Table of Contents
1. [Model Choice](#model-choice)
2. [Feature Selection](#feature-selection)
3. [Implementation](#implementation)
4. [Evaluation](#evaluation)


```python
from tensorflow.keras.applications import ResNet152V2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras import preprocessing
import numpy as np

BATCHSIZE = 32  # images per batch
EPOCHS = 1      # the baseline trains the new head for a single epoch
```


## Model Choice

At its core, my idea is a simple image classification problem. As a baseline I therefore chose an established model that is already pre-trained (on ImageNet) for this kind of task: ResNet152V2.

For the baseline I use ResNet152V2 without its top (classification) layer and add a new output layer with 4 nodes, one per class. The pre-trained layers are frozen, and only the new output layer is trained, for a single epoch.
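The new top layer is small. Global average pooling turns the ResNet feature map into a single 2048-value vector per image, and a softmax dense layer maps that vector to 4 class probabilities. The NumPy sketch below shows that computation. It assumes random values in place of real ResNet features. The 8×8 spatial size assumes the loader's default 256×256 input, since ResNet downsamples by a factor of 32. The names `head_forward`, `features`, `W` and `b` are hypothetical.

```python
import numpy as np

def head_forward(feature_map, weights, bias):
    """Global average pooling followed by a dense layer with softmax."""
    pooled = feature_map.mean(axis=(1, 2))            # (batch, channels)
    logits = pooled @ weights + bias                   # (batch, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
channels, num_classes = 2048, 4

# Random stand-in for base_model output: 2 images, 8x8 spatial grid, 2048 channels.
features = rng.normal(size=(2, 8, 8, channels))
W = rng.normal(scale=0.01, size=(channels, num_classes))
b = np.zeros(num_classes)

probs = head_forward(features, W, b)
print(probs.shape)       # (2, 4): one probability per class and image
print(W.size + b.size)   # 8196 trainable parameters in the new head
```

With the base frozen, these 2048 × 4 + 4 = 8,196 weights should be the only trainable parameters that `model.summary()` reports.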


## Feature Selection

The dataset I am using contains only four classes of white blood cells:
- Eosinophil
- Lymphocyte
- Monocyte
- Neutrophil

I use all four classes.
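When labels are inferred from the directory structure, `image_dataset_from_directory` sorts the class subdirectories alphanumerically and numbers them from 0. These integer labels are what the 4-node output layer and the sparse cross-entropy loss work with. The folder names in this small sketch are hypothetical and only illustrate the ordering.

```python
# Hypothetical subdirectory names, one per class, in arbitrary on-disk order.
class_dirs = ["Neutrophil", "Monocyte", "Eosinophil", "Lymphocyte"]

# Sort alphanumerically and number from 0, as the loader does for inferred labels.
class_names = sorted(class_dirs)
label_of = {name: index for index, name in enumerate(class_names)}
print(label_of)
# {'Eosinophil': 0, 'Lymphocyte': 1, 'Monocyte': 2, 'Neutrophil': 3}
```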


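```python
from tensorflow.keras.applications.resnet_v2 import preprocess_input

# load images from ../1_DatasetCharacteristics/train/ and split them 80/20 into training and test subsets;
# using the same seed in both calls keeps the two subsets disjoint
train_data = preprocessing.image_dataset_from_directory(
    '../1_DatasetCharacteristics/train/',
    validation_split=0.2,
    subset='training',
    seed=123,
    batch_size=BATCHSIZE,
)

test_data = preprocessing.image_dataset_from_directory(
    '../1_DatasetCharacteristics/train/',
    validation_split=0.2,
    subset='validation',
    seed=123,
    batch_size=BATCHSIZE,
)

# ResNet152V2 expects pixel values scaled to [-1, 1], not the raw [0, 255] values the loader returns
train_data = train_data.map(lambda images, labels: (preprocess_input(images), labels))
test_data = test_data.map(lambda images, labels: (preprocess_input(images), labels))
```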
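The two loader calls produce complementary subsets because each one shuffles the same file list with the same seed. One call then keeps the first 80% and the other keeps the last 20%. The NumPy sketch below mirrors that idea. It is not the library's internal code. The helper `train_validation_split` and the file count of 1000 are hypothetical.

```python
import numpy as np

def train_validation_split(num_files, validation_split, seed):
    """Shuffle file indices with a fixed seed, then cut off the last fraction for validation."""
    rng = np.random.RandomState(seed)
    indices = np.arange(num_files)
    rng.shuffle(indices)
    num_val = int(validation_split * num_files)
    return indices[:-num_val], indices[-num_val:]

# Two separate calls with the same seed, like the 'training' and 'validation' calls.
train_idx, _ = train_validation_split(1000, 0.2, seed=123)
_, val_idx = train_validation_split(1000, 0.2, seed=123)

print(len(train_idx), len(val_idx))               # 800 200
print(len(np.intersect1d(train_idx, val_idx)))    # 0: the subsets do not overlap
```

If the two calls used different seeds, each would shuffle differently, and some images could land in both subsets.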


## Implementation

Implementation of the baseline: the pre-trained ResNet152V2 model with only the changes needed for this task (a new 4-class output layer).

```python
# load ResNet152V2 with ImageNet weights, without its top classification layer
base_model = ResNet152V2(weights='imagenet', include_top=False)

# freeze the pre-trained layers so that only the new head is trained
base_model.trainable = False

# add new top layers: pool the feature maps, then classify into the 4 cell types
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(4, activation='softmax')(x)

# create the new model
model = Model(inputs=base_model.input, outputs=predictions)

# compile the model; the loader's integer labels match sparse categorical cross-entropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# print model summary
model.summary()

# train the model; the dataset is already batched by the loader (32 images per batch by default),
# so no batch_size argument is passed here
model.fit(train_data, epochs=EPOCHS)
```
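Because the base is frozen, the ResNet acts as a fixed feature extractor, and training reduces to fitting a softmax classifier on the pooled features. The NumPy sketch below illustrates that idea under several assumptions. Synthetic, class-dependent vectors stand in for real ResNet features. Plain mini-batch gradient descent replaces the Adam optimizer used above. All sizes and the learning rate are hypothetical. Only `W` and `b` are updated, and the loss is sparse categorical cross-entropy, as in the compiled model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_features, num_classes = 256, 16, 4
batch_size, learning_rate = 32, 0.05

# Synthetic stand-in for frozen, pooled features: each class has its own mean vector.
class_means = rng.normal(scale=1.0, size=(num_classes, num_features))
labels = rng.integers(0, num_classes, size=num_samples)
features = class_means[labels] + rng.normal(size=(num_samples, num_features))

# The trainable head: the only parameters that change during training.
W = np.zeros((num_features, num_classes))
b = np.zeros(num_classes)

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_cce(probs, y):
    """Sparse categorical cross-entropy: mean of -log p(true class)."""
    return float(-np.mean(np.log(probs[np.arange(len(y)), y])))

loss_before = sparse_cce(softmax(features @ W + b), labels)

# One epoch of mini-batch gradient descent on the head only.
order = rng.permutation(num_samples)
for start in range(0, num_samples, batch_size):
    idx = order[start:start + batch_size]
    x, y = features[idx], labels[idx]
    grad_logits = softmax(x @ W + b)
    grad_logits[np.arange(len(y)), y] -= 1.0   # d(loss)/d(logits) = p - onehot(y)
    grad_logits /= len(y)
    W -= learning_rate * x.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0)

loss_after = sparse_cce(softmax(features @ W + b), labels)
print(f"loss before: {loss_before:.3f}, after one epoch: {loss_after:.3f}")
```

Starting from zero weights, every class gets probability 0.25, so the initial loss is ln 4 ≈ 1.386. One epoch of updates to the head alone lowers it.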

## Evaluation

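The model is judged by the accuracy of its predictions on the held-out test split (20% of the images).

After around 10 training runs, the accuracy was around 28%. That is only slightly above the 25% expected from guessing uniformly at random among the four classes.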
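Two simple reference points help put an accuracy figure in context. The first is the expected accuracy of uniform random guessing, which is 1/4 for four classes regardless of class balance. The second is the accuracy of always predicting the most frequent class. The class counts in this sketch are hypothetical placeholders, not the real dataset's numbers.

```python
import numpy as np

# Hypothetical per-class image counts for the test split (placeholders, not measured values).
class_counts = np.array([600, 620, 610, 590])
num_classes = len(class_counts)

uniform_guess_accuracy = 1 / num_classes                            # expected accuracy of random guessing
majority_class_accuracy = class_counts.max() / class_counts.sum()  # always predict the largest class

print(f"uniform guessing: {uniform_guess_accuracy:.3f}")
print(f"majority class:   {majority_class_accuracy:.3f}")
```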

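```python
# collect predictions and labels in the same pass over the test set: the dataset is
# reshuffled on every iteration, so iterating it twice would misalign predictions and labels
pred_batches, label_batches = [], []
for images, labels in test_data:
    probs = model.predict_on_batch(images)
    pred_batches.append(np.argmax(probs, axis=1))
    label_batches.append(labels.numpy())
predictions = np.concatenate(pred_batches)
actual = np.concatenate(label_batches)
print(predictions)
print(actual)

# calculate accuracy
accuracy = np.mean(predictions == actual)
print(f'Accuracy: {accuracy}')

# save accuracy to file
with open('baseline-accuracy.txt', 'w') as f:
    f.write(f'Accuracy: {accuracy}')
```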
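`image_dataset_from_directory` shuffles by default (`shuffle=True`), and the resulting dataset yields a new order on each pass. Predictions collected in one pass and labels collected in a separate pass are therefore paired with unrelated images. That pairing pulls even a perfect model toward chance accuracy. The small NumPy simulation below shows the effect. It assumes 1000 hypothetical, uniformly distributed labels, and the helper `one_pass` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(123)
labels = rng.integers(0, 4, size=1000)   # hypothetical labels of 1000 test images

def one_pass(rng):
    """Labels in the order seen during one pass over a dataset that reshuffles every pass."""
    return labels[rng.permutation(len(labels))]

# A perfect classifier: its prediction for each image equals that image's label.
first_pass = one_pass(rng)
predictions = first_pass.copy()          # predictions gathered during pass 1

second_pass = one_pass(rng)              # labels gathered in a separate pass 2
misaligned_accuracy = np.mean(predictions == second_pass)
aligned_accuracy = np.mean(predictions == first_pass)

print(f"labels from a second pass: {misaligned_accuracy:.3f}")  # near chance level
print(f"labels from the same pass: {aligned_accuracy:.3f}")     # 1.000
```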
