# Kaggle. Plant Pathology 2020. Transfer Learning + Stacking
**Identify the category of foliar diseases in apple trees**

# Contents
* [<font size=4>Plan</font>](#1)
* [<font size=4>Results</font>](#2)
* [<font size=4>Preparing the ground</font>](#3)
    * [TPU Config](#3.1)
    * [Load Labels](#3.2)
    * [Visualize one leaf](#3.3)
    * [Image processing and augmentation](#3.4)
    * [Visualize results](#3.5)
    * [Save predictions](#3.6)
    * [Learning Rate](#3.7)
* [<font size=4>Modeling</font>](#4)
    * [EfficientNetB7](#4.1)
    * [ResNet50V2](#4.2)
    * [InceptionResNetV2](#4.3)
    * [InceptionV3](#4.4)
    * [Xception](#4.5)
    * [ResNet152V2](#4.6)
    * [NASNetLarge](#4.7)
    * [DenseNet201](#4.8)
* [<font size=4>Stacking</font>](#5)
    * [Avg](#5.1)
    * [Ridge meta model](#5.2)
    * [MLPClassifier meta model](#5.3)
    * [Entropy](#5.4)


# Plan <a id="1"></a>
1. Split train/val 0.8/0.2 and choose the 3 best models by val_acc.
2. Re-split train/val 0.85/0.15 and retrain the 3 best models.
3. Stack the 3 best models' predictions on the validation set.
4. Stack the 3 best models' predictions on the test set.
5. Train a meta model on the stacked validation predictions.
6. Predict the test set with the meta model.
7. Compare the result with the plain average of the 3 best models' predictions.

# Results  <a id="2"></a>

1. EfficientNetB7: acc: 0.000, val_acc: 0.000, LB: 0.977 (0.972)
2. ResNet50V2: 60 epochs, 1024: acc: 0.9815, val_acc: 0.9644
3. ResNet50: 100 epochs: acc: 0.92542, val_acc: 0.9233
4. IncResNetV2: 100 epochs: acc: 0.9908, val_acc: 0.9534
5. InceptionV3: 40 epochs: acc: 0.9972, val_acc: 0.9562
6. Xception: 40 epochs: acc: 0.9993, val_acc: 0.9726, LB: 0.954
7. ResNet152V2: 40 epochs: acc: 0.9993, val_acc: 0.9562
8. Avg of IncV3, Xcept, ResNet152V2, 40 epochs: LB: 0.948

100 epochs, batch size 16 × replicas, validation split 0.15, image size 800 × 800

1. Xcept: loss: 0.0032 - categorical_accuracy: 0.9993 - val_loss: 0.0465 - val_categorical_accuracy: 0.9818, LB: 0.971
2. ResNet50V2: loss: 0.0119 - categorical_accuracy: 0.9974 - val_loss: 0.1597 - val_categorical_accuracy: 0.9453
3. InceptionV3: loss: 0.0018 - categorical_accuracy: 0.9993 - val_loss: 0.0475 - val_categorical_accuracy: 0.9635
4. EffNetB7: loss: 0.0041 - categorical_accuracy: 0.9948 - val_loss: 0.1023 - val_categorical_accuracy: 0.9714
5. NasNet: loss: 0.0011 - categorical_accuracy: 0.9993 - val_loss: 0.4238 - val_categorical_accuracy: 0.9343

100 epochs, batch size 16 × replicas, validation split 0.15, image size 533 × 800, Avg. LB 0.971
1. InceptionV3: loss: 0.0079 - categorical_accuracy: 0.9974 - val_loss: 0.1028 - val_categorical_accuracy: 0.9745
2. Xcept: loss: 0.0033 - categorical_accuracy: 0.9987 - val_loss: 0.0774 - val_categorical_accuracy: 0.9635 
3. EffNetB7, 30 epochs: loss: 0.0066 - categorical_accuracy: 0.9974 - val_loss: 0.0755 - val_categorical_accuracy: 0.9745
4. ResNet152V2: val_categorical_accuracy: 0.9635
5. DenseNet201: loss: 0.0039 - categorical_accuracy: 0.9987 - val_loss: 0.0451 - val_categorical_accuracy: 0.9854, LB 0.963

100 epochs, batch size 16 × replicas, no validation split, image size 533 × 800
1. Entropy selection: InceptionV3 + DenseNet201 + EffNetB7. LB: 0.969, 0.970
2. Average: InceptionV3 + DenseNet201 + EffNetB7. LB: 0.974
3. Entropy selection, 5 models. LB: 0.970
4. Average, 5 models. LB: 0.972

# Preparing the ground  <a id="3"></a>

In [None]:
import numpy as np 
import pandas as pd 
import os
import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.layers as L
from tensorflow.keras.models import Model  # use tf.keras, not standalone keras, to avoid mixing the two

import warnings
warnings.filterwarnings("ignore")

### TPU Config  <a id="3.1"></a>

In [None]:
from kaggle_datasets import KaggleDatasets

AUTO = tf.data.experimental.AUTOTUNE
# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection. No parameters necessary if TPU_NAME environment variable is set. On Kaggle this is always the case.
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()  # Default distribution strategy in TensorFlow. Works on CPU and single GPU.

def seed_everything(seed=0):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'

seed = 2048
seed_everything(seed)
print("REPLICAS: ", strategy.num_replicas_in_sync)

# Data access
GCS_DS_PATH = KaggleDatasets().get_gcs_path()

# Configuration
#BATCH_SIZE = 8 * strategy.num_replicas_in_sync
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
EPOCHS = 100
image_size1 = 800
#image_size1 = 533
image_size2 = 800

### Load Labels  <a id="3.2"></a>

In [None]:
def format_path(st):
    return GCS_DS_PATH + '/images/' + st + '.jpg'

In [None]:
from sklearn.model_selection import train_test_split

train = pd.read_csv('/kaggle/input/plant-pathology-2020-fgvc7/train.csv')
test = pd.read_csv('/kaggle/input/plant-pathology-2020-fgvc7/test.csv')
sub = pd.read_csv('/kaggle/input/plant-pathology-2020-fgvc7/sample_submission.csv')

train_paths = train.image_id.apply(format_path).values
test_paths = test.image_id.apply(format_path).values
train_labels = train.loc[:, 'healthy':].values

valid_dataset = []
SPLIT_VALIDATION = True
if SPLIT_VALIDATION:
    train_paths, valid_paths, train_labels, valid_labels = train_test_split(
        train_paths, train_labels, test_size=0.15, random_state=24)
    valid_labels_df = pd.DataFrame({'healthy': valid_labels[:, 0], 
                                  'multiple_diseases': valid_labels[:, 1], 
                                  'rust': valid_labels[:, 2], 
                                  'scab': valid_labels[:, 3]})
    valid_labels_df.to_csv('valid_labels.csv', index=False)
    
train_labels_df = pd.DataFrame({'healthy': train_labels[:, 0], 
                              'multiple_diseases': train_labels[:, 1], 
                              'rust': train_labels[:, 2], 
                              'scab': train_labels[:, 3]})
train_labels_df.to_csv('train_labels.csv', index=False)
    
STEPS_PER_EPOCH = train_labels.shape[0] // BATCH_SIZE
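
For a concrete sense of the numbers, here is the split and step arithmetic as a small standalone script. It assumes `train.csv` has 1,821 rows and that the notebook runs on an 8-core TPU (`strategy.num_replicas_in_sync == 8`); neither value is read from the data here. With a float `test_size`, scikit-learn rounds the validation count up, which the script mirrors with `math.ceil`.

```python
import math

# Assumptions for illustration only (not read from the data):
N_TRAIN_ROWS = 1821   # assumed number of rows in train.csv
REPLICAS = 8          # assumed TPU v3-8, i.e. strategy.num_replicas_in_sync

BATCH_SIZE = 16 * REPLICAS                       # 128, as in the config cell
n_valid = math.ceil(0.15 * N_TRAIN_ROWS)         # validation share rounded up
n_train = N_TRAIN_ROWS - n_valid

steps_per_epoch = n_train // BATCH_SIZE          # same formula as STEPS_PER_EPOCH
leftover = n_train - steps_per_epoch * BATCH_SIZE
valid_batches = math.ceil(n_valid / BATCH_SIZE)  # valid_dataset keeps its last partial batch

print(n_train, n_valid, steps_per_epoch, leftover, valid_batches)
```

Because `train_dataset` calls `.repeat()` before `.batch()`, the leftover images are not discarded: the stream just continues, so each "epoch" is a fixed number of batches and epoch boundaries drift through the shuffled data.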

### Visualize one leaf  <a id="3.3"></a>

In [None]:
from matplotlib import pyplot as plt

img = plt.imread('../input/plant-pathology-2020-fgvc7/images/Train_500.jpg')
print(img.shape)
plt.imshow(img)

### Image processing and augmentation  <a id="3.4"></a>

In [None]:
def decode_image(filename, label=None, image_size=(image_size1, image_size2)):
    bits = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(bits, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, image_size)
    
    if label is None:
        return image
    else:
        return image, label

def data_augment(image, label=None):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    
    #image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
    #image = tf.image.random_saturation(image, lower=0.5, upper=1.5)

    #Make sure the image is still in [0, 1]
    image = tf.clip_by_value(image, 0.0, 1.0)
    
    if label is None:
        return image
    else:
        return image, label

In [None]:
train_dataset = (
    tf.data.Dataset
    .from_tensor_slices((train_paths, train_labels))
    .map(decode_image, num_parallel_calls=AUTO)
    .cache()
    .map(data_augment, num_parallel_calls=AUTO)
    .repeat()
    .shuffle(512)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

if SPLIT_VALIDATION:
    valid_dataset = (
        tf.data.Dataset
        .from_tensor_slices((valid_paths, valid_labels))
        .map(decode_image, num_parallel_calls=AUTO)
        .batch(BATCH_SIZE)
        .cache()
        .prefetch(AUTO)
    )

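test_dataset = (
    tf.data.Dataset
    .from_tensor_slices(test_paths)
    .map(decode_image, num_parallel_calls=AUTO)
    # Random flips are applied at prediction time too, so each test image is seen in a randomly flipped orientation
    .map(data_augment, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
)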

train_dataset_for_pred = (
    tf.data.Dataset
    .from_tensor_slices((train_paths, train_labels))
    .map(decode_image, num_parallel_calls=AUTO)
    .cache()
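    # Random flips are applied here too, so these train-set predictions are made on randomly flipped images
    .map(data_augment, num_parallel_calls=AUTO)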
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

### Visualize results  <a id="3.5"></a>

In [None]:
import matplotlib.pyplot as plt

def plot_learning(history):
    acc = history.history['categorical_accuracy']
    loss = history.history['loss']
    if SPLIT_VALIDATION: 
        val_acc = history.history['val_categorical_accuracy']
        val_loss = history.history['val_loss']

    epochs = range(len(acc))

    plt.plot(epochs, acc, 'bo', label='Training accuracy')
    if SPLIT_VALIDATION: plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
    plt.title('Accuracy')
    plt.legend()

    plt.figure()

    plt.plot(epochs, loss, 'bo', label='Training Loss')
    if SPLIT_VALIDATION: plt.plot(epochs, val_loss, 'b', label='Validation Loss')
    plt.title('Loss')
    plt.legend()

    plt.show()
    

### Save predictions  <a id="3.6"></a>

In [None]:
def save_preds(model_name, pred_train, pred_test, pred_val=None):
    
    sub.loc[:, 'healthy':] = pred_test
    filename_test = 'preds_' + model_name + '_test.csv'
    sub.to_csv(filename_test, index=False)

    train_labels_df.loc[:, 'healthy':] = pred_train
    filename_train = 'preds_' + model_name + '_train.csv'
    train_labels_df.to_csv(filename_train, index=False)

    if SPLIT_VALIDATION:    
        valid_labels_df.loc[:, 'healthy':] = pred_val
        filename_val = 'preds_' + model_name + '_val.csv'
        valid_labels_df.to_csv(filename_val, index=False)

### Learning Rate  <a id="3.7"></a>

In [None]:
LR_START = 0.00001
LR_MAX = 0.0001 * strategy.num_replicas_in_sync
LR_MIN = 0.00001
LR_RAMPUP_EPOCHS = 15
LR_SUSTAIN_EPOCHS = 3
LR_EXP_DECAY = .8

def lrfn(epoch):
    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = (LR_MAX - LR_MIN) * LR_EXP_DECAY**(epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS) + LR_MIN
    return lr
    
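lr_callback = [
    # EarlyStopping monitors val_loss by default, so it only takes effect when SPLIT_VALIDATION is True
    tf.keras.callbacks.EarlyStopping(patience=10),
    tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)
]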
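
The schedule ramps linearly from `LR_START` to `LR_MAX`, holds, then decays exponentially towards `LR_MIN`. The standalone copy of `lrfn` below evaluates it over 100 epochs, assuming 8 replicas so that `LR_MAX = 0.0008`; the replica count is an assumption, everything else is copied from the cell above.

```python
# Copy of the Learning Rate cell's constants; REPLICAS = 8 is an assumption
REPLICAS = 8
LR_START = 0.00001
LR_MAX = 0.0001 * REPLICAS
LR_MIN = 0.00001
LR_RAMPUP_EPOCHS = 15
LR_SUSTAIN_EPOCHS = 3
LR_EXP_DECAY = .8


def lrfn(epoch):
    if epoch < LR_RAMPUP_EPOCHS:
        # linear warm-up from LR_START towards LR_MAX
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        # hold at the peak
        lr = LR_MAX
    else:
        # exponential decay towards LR_MIN
        lr = (LR_MAX - LR_MIN) * LR_EXP_DECAY**(epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS) + LR_MIN
    return lr


schedule = [lrfn(epoch) for epoch in range(100)]
for epoch in (0, 5, 14, 15, 18, 19, 30, 99):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.6f}")
```

One detail this makes visible: the decay branch starts at exponent 0, so epoch 18 still returns `LR_MAX` and the peak is effectively held for four epochs (15 to 18) rather than the three set by `LR_SUSTAIN_EPOCHS`. With `EarlyStopping(patience=10)` active, training may also end well before epoch 100.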

# Modeling  <a id="4"></a>

### EfficientNetB7  <a id="4.1"></a>

In [None]:
!pip install efficientnet
import efficientnet.tfkeras as efn

In [None]:
with strategy.scope():
    model_EffNetB7 = tf.keras.Sequential([
        efn.EfficientNetB7(
            input_shape=(image_size1, image_size2, 3),
            weights='imagenet',
            include_top=False,
            pooling='avg'
        ),
        L.Dense(4, activation='softmax')
    ])

    # Use categorical_accuracy like the other models: plot_learning() reads history['categorical_accuracy']
    model_EffNetB7.compile(optimizer='adam',
                           loss='categorical_crossentropy',
                           metrics=['categorical_accuracy'])

In [None]:
history = model_EffNetB7.fit(
        train_dataset, 
        steps_per_epoch=STEPS_PER_EPOCH,
        callbacks=lr_callback,
        epochs=EPOCHS,
        validation_data=valid_dataset if SPLIT_VALIDATION else None
    )

In [None]:
pred_train_EffNetB7 = model_EffNetB7.predict(train_dataset_for_pred)
pred_test_EffNetB7 = model_EffNetB7.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_EffNetB7 = model_EffNetB7.predict(valid_dataset)
    save_preds('EffNetB7', pred_train_EffNetB7, pred_test_EffNetB7, pred_val_EffNetB7)
else:
    save_preds('EffNetB7', pred_train_EffNetB7, pred_test_EffNetB7)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### ResNet50V2  <a id="4.2"></a>

In [None]:
from tensorflow.keras.applications.resnet_v2 import ResNet50V2

with strategy.scope():
    model_ResNet50V2 = tf.keras.Sequential([
                    ResNet50V2(
                        input_shape=(image_size1, image_size2, 3),
                        weights='imagenet',
                        include_top=False
                    ),
                    L.GlobalMaxPooling2D(),

                    L.Dense(1024, activation='relu'),
                    L.Dropout(0.5),
                    L.BatchNormalization(),

                    L.Dense(4, activation='softmax')
                ])
        
    model_ResNet50V2.compile(
        optimizer = 'adam',
        loss = 'categorical_crossentropy',
        metrics=['categorical_accuracy']
    )

In [None]:
history = model_ResNet50V2.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_ResNet50V2 = model_ResNet50V2.predict(train_dataset_for_pred)
pred_test_ResNet50V2 = model_ResNet50V2.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_ResNet50V2 = model_ResNet50V2.predict(valid_dataset)
    save_preds('ResNet50V2', pred_train_ResNet50V2, pred_test_ResNet50V2, pred_val_ResNet50V2)
else:
    save_preds('ResNet50V2', pred_train_ResNet50V2, pred_test_ResNet50V2)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### InceptionResNetV2  <a id="4.3"></a>

In [None]:
from tensorflow.keras.applications import InceptionResNetV2

with strategy.scope():
    model_IncResNetV2 = tf.keras.Sequential([
                InceptionResNetV2(
                    input_shape=(image_size1, image_size2, 3),
                    weights='imagenet',
                    include_top=False
                ),
                L.GlobalMaxPooling2D(),

                L.Dense(512, activation='relu'),
                L.Dropout(0.5),
                L.BatchNormalization(),

                L.Dense(4, activation='softmax')
            ])
        
    model_IncResNetV2.compile(
        optimizer = 'adam',
        loss = 'categorical_crossentropy',
        metrics=['categorical_accuracy']
    )

In [None]:
history = model_IncResNetV2.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None,
)

In [None]:
pred_train_IncResNetV2 = model_IncResNetV2.predict(train_dataset_for_pred)
pred_test_IncResNetV2 = model_IncResNetV2.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_IncResNetV2 = model_IncResNetV2.predict(valid_dataset)
    save_preds('IncResNetV2', pred_train_IncResNetV2, pred_test_IncResNetV2, pred_val_IncResNetV2)
else:
    save_preds('IncResNetV2', pred_train_IncResNetV2, pred_test_IncResNetV2)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### InceptionV3  <a id="4.4"></a>

In [None]:
from tensorflow.keras.applications.inception_v3 import InceptionV3

with strategy.scope(): 
    model_IncV3 = tf.keras.Sequential([ InceptionV3( input_shape=(image_size1, image_size2, 3), 
                                                                       weights='imagenet', 
                                                                       include_top=False ), 
                                                    L.GlobalMaxPooling2D(), 
                                                    L.Dense(4, activation='softmax')
                                                    ])
    model_IncV3.compile(optimizer='adam',
                  loss = 'categorical_crossentropy', 
                  metrics=['categorical_accuracy'])

In [None]:
history = model_IncV3.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_IncV3 = model_IncV3.predict(train_dataset_for_pred)
pred_test_IncV3 = model_IncV3.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_IncV3 = model_IncV3.predict(valid_dataset)
    save_preds('IncV3', pred_train_IncV3, pred_test_IncV3, pred_val_IncV3)
else:
    save_preds('IncV3', pred_train_IncV3, pred_test_IncV3)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### Xception  <a id="4.5"></a>

In [None]:
from tensorflow.keras.applications import Xception

with strategy.scope(): 
    
    model_Xcept = tf.keras.Sequential([Xception(input_shape=(image_size1, image_size2, 3),
                                                            weights='imagenet',
                                                            include_top=False),
                                             L.GlobalAveragePooling2D(),
                                             L.Dense(4, activation='softmax')
                                             ])
        
    model_Xcept.compile(loss="categorical_crossentropy", 
                        optimizer= 'adam', 
                        metrics=["categorical_accuracy"])

In [None]:
history = model_Xcept.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_Xcept = model_Xcept.predict(train_dataset_for_pred)
pred_test_Xcept = model_Xcept.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_Xcept = model_Xcept.predict(valid_dataset)
    save_preds('Xcept', pred_train_Xcept, pred_test_Xcept, pred_val_Xcept)
else:
    save_preds('Xcept', pred_train_Xcept, pred_test_Xcept)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### ResNet152V2  <a id="4.6"></a>

In [None]:
from tensorflow.keras.applications import ResNet152V2

with strategy.scope():
    model_ResNet152V2 = tf.keras.Sequential([ResNet152V2(input_shape=(image_size1, image_size2, 3),
                                                            weights='imagenet',
                                                            include_top=False),
                                             L.GlobalAveragePooling2D(),
                                             L.Dense(4, activation='softmax')
                                             ])
    
    model_ResNet152V2.compile(loss="categorical_crossentropy", 
                              optimizer= 'adam', 
                              metrics=["categorical_accuracy"])

In [None]:
history = model_ResNet152V2.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_ResNet152V2 = model_ResNet152V2.predict(train_dataset_for_pred)
pred_test_ResNet152V2 = model_ResNet152V2.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_ResNet152V2 = model_ResNet152V2.predict(valid_dataset)
    save_preds('ResNet152V2', pred_train_ResNet152V2, pred_test_ResNet152V2, pred_val_ResNet152V2)
else:
    save_preds('ResNet152V2', pred_train_ResNet152V2, pred_test_ResNet152V2)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### NASNetLarge  <a id="4.7"></a>

In [None]:
from tensorflow.keras.applications.nasnet import NASNetLarge
with strategy.scope():    
    model_NASNet = tf.keras.Sequential([NASNetLarge( input_shape=(image_size1, image_size2, 3), 
                                                                       weights='imagenet', 
                                                                       include_top=False ), 
                                                    L.GlobalMaxPooling2D(), 
                                                    L.Dense(4, activation='softmax')
                                                    ])
    model_NASNet.compile(optimizer='adam',
                  loss = 'categorical_crossentropy', 
                  metrics=['categorical_accuracy'])

In [None]:
history = model_NASNet.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_NASNet = model_NASNet.predict(train_dataset_for_pred)
pred_test_NASNet = model_NASNet.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_NASNet = model_NASNet.predict(valid_dataset)
    save_preds('NASNet', pred_train_NASNet, pred_test_NASNet, pred_val_NASNet)
else:
    save_preds('NASNet', pred_train_NASNet, pred_test_NASNet)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

### DenseNet201  <a id="4.8"></a>

In [None]:
from tensorflow.keras.applications import DenseNet201

with strategy.scope():
    model_DenseNet201 = tf.keras.Sequential([DenseNet201(input_shape=(image_size1, image_size2, 3),
                                                            weights='imagenet',
                                                            include_top=False),
                                             L.GlobalAveragePooling2D(),
                                             L.Dense(4, activation='softmax')
                                             ])
    
    model_DenseNet201.compile(loss="categorical_crossentropy", 
                              optimizer= 'adam', 
                              metrics=["categorical_accuracy"])

In [None]:
history = model_DenseNet201.fit(
    train_dataset, 
    epochs=EPOCHS, 
    callbacks=lr_callback,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=valid_dataset if SPLIT_VALIDATION else None
)

In [None]:
pred_train_DenseNet201 = model_DenseNet201.predict(train_dataset_for_pred)
pred_test_DenseNet201 = model_DenseNet201.predict(test_dataset)

if SPLIT_VALIDATION:
    pred_val_DenseNet201 = model_DenseNet201.predict(valid_dataset)
    save_preds('DenseNet201', pred_train_DenseNet201, pred_test_DenseNet201, pred_val_DenseNet201)
else:
    save_preds('DenseNet201', pred_train_DenseNet201, pred_test_DenseNet201)

In [None]:
plot_learning(history)

In [None]:
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)  # Clear TPU memory between models

# Stacking  <a id="5"></a>
1. Load the test and validation predictions of each model.
1. Build the train DataFrame and the test DataFrame.
1. Train the meta model.
1. Get predictions.
1. Save predictions.
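
Steps 1 and 2 can also be done from the CSV files that `save_preds` writes, which lets the stacking step run in a separate session without retraining. A minimal sketch, assuming the files sit in the working directory under the `preds_<model>_<split>.csv` names used above; `load_stack_features` and `CLASS_COLUMNS` are hypothetical helpers, not part of the original notebook.

```python
import os

import numpy as np
import pandas as pd

# Label columns in the order used by sample_submission.csv and save_preds
CLASS_COLUMNS = ["healthy", "multiple_diseases", "rust", "scab"]


def load_stack_features(model_names, split, directory="."):
    """Hypothetical helper: read preds_<model>_<split>.csv for each model and
    place each model's four class columns side by side.

    split is "val", "test" or "train"; the result has shape
    (n_images, 4 * len(model_names)), with models in the order given.
    """
    blocks = []
    for name in model_names:
        path = os.path.join(directory, f"preds_{name}_{split}.csv")
        # The test file also carries an image_id column; selecting by name skips it
        blocks.append(pd.read_csv(path)[CLASS_COLUMNS].to_numpy())
    return np.concatenate(blocks, axis=1)


# Usage in the notebook (model names as passed to save_preds):
# X_meta_train = load_stack_features(["EffNetB7", "IncV3", "Xcept"], "val")
# X_meta_test = load_stack_features(["EffNetB7", "IncV3", "Xcept"], "test")
# y_meta = pd.read_csv("valid_labels.csv")[CLASS_COLUMNS].to_numpy()
```

The same model order must be used for the validation and test matrices, otherwise the meta model's learned weights are applied to the wrong columns.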

### Avg  <a id="5.1"></a>

In [None]:
preds_avg = (pred_test_EffNetB7 + pred_test_IncV3 + pred_test_Xcept) / 3
sub.loc[:, 'healthy':] = preds_avg
sub.to_csv('submission_avg_3model_NoSplit_800-533.csv', index=False)
sub.head()
# LB 0.98
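
To compare the average against the meta models on the validation split before spending a submission, the validation predictions can be scored locally. The sketch below assumes, as we understand the competition rules, that the leaderboard metric is the mean column-wise ROC AUC, and implements it in plain NumPy with the rank-sum (Mann-Whitney) formula; `sklearn.metrics.roc_auc_score` should give the same per-column values. The function names are hypothetical.

```python
import numpy as np


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def roc_auc(y_true, scores):
    """Binary ROC AUC from the rank-sum statistic."""
    y_true = np.asarray(y_true).astype(bool)
    n_pos = y_true.sum()
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both classes present")
    ranks = _average_ranks(scores)
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def mean_columnwise_auc(y_true, y_pred):
    """Average the per-class AUCs over the label columns."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean([roc_auc(y_true[:, k], y_pred[:, k])
                          for k in range(y_true.shape[1])]))


# Usage in the notebook (requires SPLIT_VALIDATION = True):
# preds_val_avg = (pred_val_EffNetB7 + pred_val_IncV3 + pred_val_Xcept) / 3
# print(mean_columnwise_auc(valid_labels, preds_val_avg))
```

Scoring the Ridge or MLP meta models this way would be optimistic, since they are fitted on these same validation predictions; a held-out slice of the validation set or cross-validation would be needed for a fair comparison.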

### Ridge meta model  <a id="5.2"></a>

In [None]:
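# Meta-model training features: the base models' predictions on the validation split
# (requires SPLIT_VALIDATION = True, since pred_val_* only exist then)
pred_train = np.concatenate((pred_val_EffNetB7, pred_val_IncV3, pred_val_Xcept), axis=1)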
pred_train.shape

In [None]:
pred_test = np.concatenate((pred_test_EffNetB7, pred_test_IncV3, pred_test_Xcept), axis=1)
pred_test.shape

In [None]:
valid_labels.shape

In [None]:
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1, random_state=241)  # lowercase name so the Ridge class is not shadowed
ridge.fit(pred_train, valid_labels)
predictions = ridge.predict(pred_test)

In [None]:
sub.loc[:, 'healthy':] = predictions
sub.to_csv('submission_predict_ridge.csv', index=False)
sub.head()
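
The `Ridge` meta model is multi-output linear regression with an L2 penalty, so its outputs are not guaranteed to lie in [0, 1] or to sum to 1 per row. The NumPy sketch below is meant to show what it computes: it centres the stacked features and targets, solves `(Xc^T Xc + alpha*I) W = Xc^T Yc`, and recovers an unpenalised intercept, which is how we understand scikit-learn's `Ridge` with `fit_intercept=True` on dense input (results should match up to solver tolerance). `ridge_fit`, `ridge_predict` and `to_probabilities` are hypothetical names; the clip-and-renormalise step is an optional post-processing idea, not something the notebook does.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form multi-output ridge regression with an unpenalised intercept.

    X: (n_samples, n_features), e.g. stacked base-model probabilities
    Y: (n_samples, n_targets), e.g. one-hot labels
    Returns W: (n_features, n_targets) and b: (n_targets,)
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    # Solve the regularised normal equations rather than forming an inverse
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(A, Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def ridge_predict(X, W, b):
    return np.asarray(X, dtype=float) @ W + b


def to_probabilities(raw, eps=1e-6):
    """Optional post-processing: clip to [eps, 1] and renormalise each row to sum to 1."""
    clipped = np.clip(raw, eps, 1.0)
    return clipped / clipped.sum(axis=1, keepdims=True)


# Usage in the notebook:
# W, b = ridge_fit(pred_train, valid_labels, alpha=1.0)
# probs = to_probabilities(ridge_predict(pred_test, W, b))
```

Row renormalisation can change the ordering of scores within a column, so if it is used it is worth checking on held-out data whether it helps the column-wise ranking.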

### MLPClassifier meta model  <a id="5.3"></a>

In [None]:
from sklearn.neural_network import MLPClassifier

MLP_clf = MLPClassifier(max_iter=400)
MLP_clf.fit(pred_train, valid_labels)

In [None]:
predictionMLP = MLP_clf.predict(pred_test)

In [None]:
sub.loc[:, 'healthy':] = predictionMLP
sub.to_csv('submission_3models_MLPReg.csv', index=False)
sub.head()
# LB 0.961 regression, 0.925 classifier

### Entropy  <a id="5.4"></a>

In [None]:
from scipy.stats import entropy

ent1 = entropy(pred_test_EffNetB7, base=2, axis = 1)
ent2 = entropy(pred_test_IncV3, base=2, axis = 1)
ent3 = entropy(pred_test_DenseNet201, base=2, axis = 1)
ent4 = entropy(pred_test_Xcept, base=2, axis = 1)
ent5 = entropy(pred_test_ResNet152V2, base=2, axis = 1)
entropies = np.array([ent1, ent2, ent3, ent4, ent5]).transpose()
entropies.shape

selected = np.argmin(entropies, axis = 1)

In [None]:
# Per-model test predictions, in the same order as the entropy columns above
model_preds = [pred_test_EffNetB7, pred_test_IncV3, pred_test_DenseNet201,
               pred_test_Xcept, pred_test_ResNet152V2]

# For each test image, keep the prediction of the model with the lowest entropy
submission_size = len(selected)
for i in range(submission_size):
    sub.loc[i, 'healthy':] = model_preds[selected[i]][i]
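
The loop above picks, for each test image, the prediction of whichever model is most confident, i.e. has the lowest Shannon entropy. The same selection can be written without a Python loop. This NumPy sketch assumes each `pred_test_*` array has shape `(n_images, 4)` with rows summing to 1 (softmax outputs), and computes base-2 entropy directly instead of calling `scipy.stats.entropy`, which would additionally renormalise each row; `min_entropy_select` is a hypothetical name.

```python
import numpy as np


def min_entropy_select(model_preds):
    """For each row, return the prediction of the model with the lowest entropy.

    model_preds: list of arrays, each (n_images, n_classes), rows summing to 1.
    Returns (selected_preds, chosen_model_index).
    """
    stacked = np.stack(model_preds, axis=1)      # (n_images, n_models, n_classes)
    safe = np.clip(stacked, 1e-12, 1.0)          # avoid log2(0); zero probabilities still contribute 0
    ent = -(stacked * np.log2(safe)).sum(axis=2)  # (n_images, n_models), in bits
    chosen = ent.argmin(axis=1)                  # ties go to the earliest model, as with np.argmin
    rows = np.arange(stacked.shape[0])
    return stacked[rows, chosen], chosen


# Usage in the notebook:
# preds, selected = min_entropy_select([pred_test_EffNetB7, pred_test_IncV3, pred_test_DenseNet201,
#                                       pred_test_Xcept, pred_test_ResNet152V2])
# sub.loc[:, 'healthy':] = preds
```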

In [None]:
sub.to_csv('submission.csv', index=False)