*1st place solution*  
*Team: vecxoz*  
*License: MIT*  

### Summary

Many thanks to the hosts for the very interesting challenge and congrats to all participants!

My solution is based on a CNN-LSTM architecture with an EfficientNet-B7 backbone, trained on pairs of 512x512 images. I used the original images without any cleaning or special preprocessing. The dataset consists of 14,613 pairs in total: roughly 11.7k for training and 2.9k for validation. I set up 5-fold cross-validation and tried ensembling the 5 fold models, but the ensemble was no better than the single model from the first fold. For augmentation I used flips and rotations by multiples of 90 degrees. Optimization was performed with the Adam optimizer and a constant learning rate (1e-4). The architecture has substantial capacity thanks to the large backbone and the large LSTM block (1024 units), and it learns very quickly: a single epoch is enough.

A significant score improvement came from linear calibration. Visual inspection of the test set shows that plants `LT_1088` and `LT_1089` have a different appearance and evolve differently over time compared to the `LT` plants in the training set. This is a classic example of so-called "data drift", i.e. a distribution shift in the test set. There are various approaches to accommodating a new distribution, but I used a simple calibration: multiplicative (linear) coefficients applied to my model's predictions. Depending on the data split and seeds, the raw private score of my model is around 5.1 and the calibrated private score is around 4.2.
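
The calibration boils down to multiplying each prediction by a factor chosen per group. The sketch below is a minimal stdlib version, assuming every prediction comes with a species and a time-series id. The helper name `calibrate` and the toy predictions and series ids are hypothetical; the coefficients are the ones used in the calibration cell at the end of the notebook. Rules are applied in order, so factors compound when a prediction matches more than one rule.

```python
from typing import Dict, List


def calibrate(preds: List[float],
              species: List[str],
              series: List[str],
              species_scale: Dict[str, float],
              series_scale: Dict[str, float]) -> List[float]:
    """Multiply each prediction by its species factor, then by its time-series factor.

    Groups that are not listed keep a factor of 1.0.
    """
    out = []
    for p, sp, ts in zip(preds, species, series):
        p = p * species_scale.get(sp, 1.0)  # species-level correction
        p = p * series_scale.get(ts, 1.0)   # extra correction for drifted plants
        out.append(p)
    return out


# Toy inputs (made up); coefficients taken from the notebook's calibration cell
preds = [10.0, 10.0, 10.0]
species = ["BC", "LT", "LT"]
series = ["BC_01", "LT_05", "LT_1088"]
print(calibrate(preds, species, series,
                species_scale={"LT": 1.10, "BC": 1.05},
                series_scale={"LT_1088": 0.65, "LT_1089": 0.65}))
```

Because the test set has no labels, such coefficients cannot be fitted on it directly; here they are simply inputs to the function.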

Acknowledgement: thanks to the [TRC program](https://sites.research.google/trc/about/), I had the opportunity to run experiments on a TPUv3-8.

### System requirements
```
12 CPU, 48 GB RAM, V100-32GB GPU  

Ubuntu 18.04  
Python: 3.6.9  
CUDA: 11.1  
cuDNN: 8.0.4  

numpy==1.19.5  
pandas==1.1.5  
Pillow==8.2.0  
scikit-learn==0.24.2  
tensorflow==2.4.1  
tensorflow-addons==0.12.0  
efficientnet==1.1.1  

Execution time: <1 hour  
```

### Environment setup

```
mkdir $HOME/solution
cd $HOME/solution
# Download file "open.zip"
unzip open.zip
mv open/train_dataset open/test_dataset open/sample_submission.csv ./
rm -rf open.zip open
# Copy this notebook into $HOME/solution
# Now we have the following structure:

$HOME/
    solution/
        test_dataset/
        train_dataset/
        notebook.ipynb # this notebook
        sample_submission.csv
```
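
Before running the notebook, it can help to verify this layout. A small stdlib sketch; the helper `missing_entries` is hypothetical and not part of the solution. Besides the entries shown above, it also checks `test_dataset/test_data.csv`, which the notebook reads when building the test dataframe.

```python
import os
from typing import List

# Entries expected under $HOME/solution after the setup steps above
EXPECTED = ["train_dataset", "test_dataset", "test_dataset/test_data.csv", "sample_submission.csv"]


def missing_entries(root: str, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected files/directories that do not exist under `root`."""
    return [name for name in expected if not os.path.exists(os.path.join(root, name))]


if __name__ == "__main__":
    root = os.path.join(os.path.expanduser("~"), "solution")
    missing = missing_entries(root)
    print("Missing:", missing if missing else "nothing")
```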

### Import

In [None]:
import os
import sys
import glob
import math
import random
import itertools
import collections
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.model_selection import GroupKFold
import tensorflow as tf
print('tf:', tf.__version__)
import tensorflow_addons as tfa
print('tfa:', tfa.__version__)
import efficientnet.tfkeras as efn

### Settings

In [None]:
class args:
    data_dir = os.path.join(os.path.expanduser('~'), 'solution')             # Root directory with the datasets and CSV files
    data_tfrec_dir = os.path.join(os.path.expanduser('~'), 'solution/tfrec') # Directory containing TFRecord files
    data_preds_dir = os.path.join(os.path.expanduser('~'), 'solution/preds') # Directory where predictions will be saved
    seed = 547895                     # Seed
    mixed_precision = 'mixed_float16' # Mixed precision. E.g.: mixed_float16, mixed_bfloat16, or None
    job = 'train_test'                # Job to run. Possible values: train, test, or train_test
    n_folds = 5                       # Number of folds
    initial_fold = 0                  # Initial fold (from 0)
    final_fold = 1                    # Final fold (from 1). To train all folds set equal to n_folds
    n_channels = 3                    # Number of image channels
    volume = 2                        # Number of 3-channel images in a sequence (volume)
    dim = 512                         # Image size in pixels
    n_examples_train = 11_691         # Number of training examples. This value is used to define an epoch
    n_epochs = 1                      # Number of epochs to train
    batch_size = 8                    # Batch size
    lr = 1e-4                         # Learning rate
    aug_number = 5                    # Number of train-time augmentations. 0 means no aug
    tta_number = 0                    # Number of test-time augmentations. 0 means no tta
    buffer_size = 1024                # Shuffle buffer size for tf.data
    create_tfrecords = True           # True to create TFRecords (first run), False to skip (next runs)
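
With `aug_number = 5` and `aug_percentage = 0.5` (hard-coded in `aug` in the Definitions cell), the random id is drawn from `round(5 / 0.5) = 10` equally likely values. The stdlib enumeration below computes the exact outcome probabilities of the scheme the `aug` docstring describes, assuming that only ids below `aug_number` trigger a transformation. The function name `aug_probabilities` is hypothetical.

```python
from fractions import Fraction


def aug_probabilities(aug_number: int, aug_percentage: float = 0.5):
    """Exact probability of each outcome for one call of the sampling scheme.

    An integer id is drawn uniformly from [0, round(aug_number / aug_percentage));
    ids 0 .. aug_number-1 select aug_1 .. aug_<aug_number>, all other ids mean "none".
    """
    if aug_number == 0:
        return {"none": Fraction(1)}
    maxval = round(aug_number / aug_percentage)
    probs = {"aug_%d" % (i + 1): Fraction(1, maxval) for i in range(aug_number)}
    probs["none"] = Fraction(maxval - aug_number, maxval)
    return probs


print(aug_probabilities(5))
```

With the default settings, each of the five transformations (flips and 90/180/270-degree rotations) has probability 1/10, and half of the training examples pass through unchanged.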

### Definitions

In [None]:
def nCr(n, r):
    """
    Number of combinations from n by r
    """
    return math.factorial(n) // math.factorial(r) // math.factorial(n-r)


def seeder(seed):
    """
    Set seed
    """
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    return seed


class TFRecordProcessor(object):
    """
    Write TFRecord files based on Pandas dataframe
    """
    def __init__(self):
        self.n_examples = 0
    def _bytes_feature(self, value):
        if isinstance(value, type(tf.constant(0))):
            value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
    def _int_feature(self, value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    def _float_feature(self, value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    def _process_example(self, ind, A, B, C, D):
        self.n_examples += 1
        feature = collections.OrderedDict()
        feature['image_id'] = self._bytes_feature([A[ind].encode('utf-8')])
        feature['image'] =    self._bytes_feature([tf.io.read_file(B[ind][0]).numpy(), 
                                                   tf.io.read_file(B[ind][1]).numpy()])
        feature['label_id'] = self._bytes_feature([C[ind].encode('utf-8')])
        feature['label'] =    self._int_feature(D[ind])
        example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
        self._writer.write(example_proto.SerializeToString())
    def write_tfrecords(self, A, B, C, D, n_shards=1, file_out='train.tfrecord'):
        n_examples_per_shard = A.shape[0] // n_shards
        n_examples_remainder = A.shape[0] %  n_shards   
        self.n_examples = 0
        for shard in range(n_shards):
            self._writer = tf.io.TFRecordWriter('%s-%05d-of-%05d' % (file_out, shard, n_shards))
            start = shard * n_examples_per_shard
            if shard == (n_shards - 1):
                end = (shard + 1) * n_examples_per_shard + n_examples_remainder
            else:
                end = (shard + 1) * n_examples_per_shard
            print('Shard %d of %d: (%d examples)' % (shard, n_shards, (end - start)))
            for i in range(start, end):
                self._process_example(i, A, B, C, D)
                print(i, end='\r')
            self._writer.close()
        return self.n_examples


def init_tfdata(files_glob, deterministic=True, batch_size=32, auto=-1, 
                parse_example=None, mod=None, aug=None, tta=None, norm=None, 
                repeat=False, buffer_size=None, cache=False):
    """
    Init tf.data.TFRecordDataset
    """
    options = tf.data.Options()
    options.experimental_deterministic = deterministic
    files = tf.data.Dataset.list_files(files_glob, 
                                       shuffle=not deterministic).with_options(options)
    print('N tfrec files:', len(files))
    #
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=auto)
    ds = ds.with_options(options)
    ds = ds.map(parse_example, num_parallel_calls=auto)
    #
    if mod:
        ds = ds.map(mod, num_parallel_calls=auto)
    if aug:
        ds = ds.map(aug, num_parallel_calls=auto)
    if tta:
        ds = ds.map(tta, num_parallel_calls=auto)
    if norm:
        ds = ds.map(norm, num_parallel_calls=auto)
    if repeat:
        ds = ds.repeat()
    if buffer_size:
        ds = ds.shuffle(buffer_size=buffer_size, 
                        reshuffle_each_iteration=True)
    ds = ds.batch(batch_size=batch_size)
    ds = ds.prefetch(auto)
    if cache:
        ds = ds.cache()
    #
    return ds


def fn(image):
    """
    Parse JPG image
    """
    image = tf.image.decode_jpeg(image, channels=args.n_channels, dct_method='INTEGER_ACCURATE')
    image = tf.image.resize(image, [args.dim, args.dim])
    image = tf.cast(image, dtype=tf.uint8)
    return image


def parse_example(example_proto):
    feature_description = {
        'image':    tf.io.FixedLenFeature([args.volume], tf.string),
        'label':    tf.io.FixedLenFeature([], tf.int64),
    }
    d = tf.io.parse_single_example(example_proto, feature_description)
    image = tf.map_fn(fn, d['image'], dtype=tf.uint8)
    image = tf.image.resize(image, [args.dim, args.dim])
    image = tf.reshape(image, [args.volume, args.dim, args.dim, args.n_channels])
    image = tf.cast(image, tf.uint8)
    label = tf.cast(d['label'], tf.int32)
    return image, label


def aug_0(image, label):
    return image, label
def aug_1(image, label):
    return tf.image.flip_left_right(image), label
def aug_2(image, label):
    return tf.image.flip_up_down(image), label
def aug_3(image, label):
    return tfa.image.rotate(image, math.pi/4*2), label #  90
def aug_4(image, label):
    return tfa.image.rotate(image, math.pi/4*4), label # 180
def aug_5(image, label):
    return tfa.image.rotate(image, math.pi/4*6), label # 270
def aug_6(image, label):
    return tf.image.central_crop(tfa.image.rotate(image, math.pi/4*1), 0.7), label #  45
def aug_7(image, label):
    return tf.image.central_crop(tfa.image.rotate(image, math.pi/4*3), 0.7), label # 135
def aug_8(image, label):
    return tf.image.central_crop(tfa.image.rotate(image, math.pi/4*5), 0.7), label # 225
def aug_9(image, label):
    return tf.image.central_crop(tfa.image.rotate(image, math.pi/4*7), 0.7), label # 315


tta_func_list = [aug_0, aug_1, aug_2, aug_3, aug_4, aug_5, aug_6, aug_7, aug_8, aug_9][:args.tta_number + 1]


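def aug(image, label):
    """
    Runs at most one transformation per call; some transformation is applied with probability `aug_percentage`.
    Only aug_1 ... aug_<aug_number> can be selected, so each specific transformation has probability
    `aug_percentage` / `aug_number`
    """
    aug_percentage = 0.5
    aug_maxval = round(args.aug_number / aug_percentage)
    #
    if args.aug_number != 0:
        aug_id = tf.random.uniform([], minval=0, maxval=aug_maxval, dtype=tf.int32)
        # Map ids >= aug_number to -1 ("no transformation") so only the first aug_number transforms are used
        aug_id = tf.where(aug_id < args.aug_number, aug_id, tf.constant(-1, dtype=tf.int32))
    #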
    if args.aug_number == 0:
        pass
    elif aug_id == 0:
        image, label = aug_1(image, label)
    elif aug_id == 1:
        image, label = aug_2(image, label)
    elif aug_id == 2:
        image, label = aug_3(image, label)
    elif aug_id == 3:
        image, label = aug_4(image, label)
    elif aug_id == 4:
        image, label = aug_5(image, label)
    elif aug_id == 5:
        image, label = aug_6(image, label)
    elif aug_id == 6:
        image, label = aug_7(image, label)
    elif aug_id == 7:
        image, label = aug_8(image, label)
    elif aug_id == 8:
        image, label = aug_9(image, label)
    return image, label


def norm(image, label):
    image = tf.image.resize(image, [args.dim, args.dim])
    image = tf.reshape(image, [args.volume, args.dim, args.dim, args.n_channels])
    image = tf.cast(image, tf.float32)
    label = tf.cast(label, tf.int32)
    image = image / 255.0
    return image, label


def init_model(print_summary=True):
    #
    inp = tf.keras.layers.Input(shape=(args.volume, 
                                       args.dim, 
                                       args.dim, 
                                       args.n_channels), dtype=tf.float32)
    back = efn.EfficientNetB7(input_shape=(args.dim, 
                                           args.dim, 
                                           args.n_channels),
                              weights='imagenet', include_top=False)
    pool_2d = tf.keras.layers.GlobalAveragePooling2D()
    #
    x = tf.keras.layers.TimeDistributed(back)(inp)
    x = tf.keras.layers.TimeDistributed(pool_2d)(x)
    x = tf.keras.layers.LSTM(1024, return_sequences=False)(x)
    x = tf.keras.layers.Dense(300, activation='relu')(x)
    out = tf.keras.layers.Dense(1, activation='linear')(x)
    #
    model = tf.keras.models.Model(inp, out, name='model')
    model.compile(optimizer=tf.keras.optimizers.Adam(args.lr), 
                  loss='mse',
                  metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse')])
    if print_summary:
        model.summary()
    return model
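
`TimeDistributed` applies the same backbone, with shared weights, to each image of the pair, and the LSTM then reads the resulting sequence of pooled feature vectors. In the real model, a 512x512 input gives a 16x16 feature map per image, pooled to a 2560-dimensional vector for EfficientNet-B7 without its top. The NumPy sketch below mimics only this shape handling. It assumes a hypothetical `toy_backbone` (stride-32 subsampling plus a 1x1 projection to 8 channels) in place of EfficientNet-B7.

```python
import numpy as np


def toy_backbone(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a CNN: stride-32 subsampling + 1x1 projection.

    x: (N, H, W, C) images -> (N, H // 32, W // 32, K) feature maps.
    """
    return x[:, ::32, ::32, :] @ w


def time_distributed_features(batch: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the backbone to every time step, then global-average-pool.

    batch: (B, T, H, W, C) -> (B, T, K), the sequence an LSTM would consume.
    """
    b, t, h, wd, c = batch.shape
    frames = batch.reshape(b * t, h, wd, c)  # fold time into the batch axis
    fmap = toy_backbone(frames, w)           # same weights for every frame
    pooled = fmap.mean(axis=(1, 2))          # GlobalAveragePooling2D
    return pooled.reshape(b, t, -1)          # unfold back to (B, T, K)


rng = np.random.default_rng(0)
batch = rng.random((8, 2, 512, 512, 3), dtype=np.float32)  # batch x volume x dim x dim x channels
w = rng.standard_normal((3, 8)).astype(np.float32)
print(time_distributed_features(batch, w).shape)  # (8, 2, 8)
```

The reshape trick is what makes the weights shared: both images of a pair go through exactly the same function before the LSTM sees them in order.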

### Create data

Image resizing and TFRecord creation can be skipped if they were already done in a previous run:
```
# args.create_tfrecords = False # skip
```

In [None]:
if args.create_tfrecords:
    print('Resize all images to 1024x1024 and compress as JPG to speed up data loading')
    for in_dir, out_dir in (('train_dataset', 'train_dataset_jpg_1024'), ('test_dataset', 'test_dataset_jpg_1024')):
        print(in_dir, '->', out_dir)
        files = sorted(glob.glob(os.path.join(args.data_dir, '%s/*/*/*.png' % in_dir)))
        os.makedirs(os.path.join(args.data_dir, out_dir), exist_ok=True)
        for counter, file in enumerate(files):
            out_file = file.replace(in_dir, out_dir)
            out_file = out_file.replace('.png', '.jpg')
            os.makedirs(os.path.dirname(out_file), exist_ok=True)
            # JPEG cannot store alpha or palette modes, so convert to RGB before saving
            Image.open(file).convert('RGB').resize((1024, 1024)).save(out_file)
            print(counter, end='\r')

print('Create train pairs (combinations)')

n_files = 0
n_combs = 0
combs = []
for dir_species in sorted(glob.glob(os.path.join(args.data_dir, 'train_dataset_jpg_1024/*'))):
    for dir_timeseries in sorted(glob.glob(os.path.join(dir_species, '*'))):
        files = sorted(glob.glob(os.path.join(dir_timeseries, '*.jpg')))
        combs.extend(list(itertools.combinations(files, 2)))
        n = len(files)
        n_files += n
        n_combs += nCr(n, 2)
print('N pairs:', n_combs)

print('Create train dataframe')

image_id = []
image = []
label = []
label_id = []
group = []
for counter, comb in enumerate(combs):
    image_id.append('comb_%06d' % counter)
    image.append(sorted(comb))
    delta = int(comb[1].split('.')[-2][-2:]) - int(comb[0].split('.')[-2][-2:])
    label.append(delta)
    label_id.append(str(delta))
    group.append(comb[0].split('/')[-2])

train_df = pd.DataFrame()
train_df['image_id'] = image_id
train_df['image'] = image
train_df['label_id'] = label_id
train_df['label'] = label
train_df['group'] = group
train_df['fold_id'] = 0

print('Create test dataframe')

files = sorted(glob.glob(os.path.join(args.data_dir, 'test_dataset_jpg_1024/*/*/*.jpg')))
test_df = pd.read_csv(os.path.join(args.data_dir, 'test_dataset/test_data.csv'))

image_id = []
image = []
label = []
label_id = []
group = []
for counter, row in test_df.iterrows():
    image_id.append('comb_%06d' % row['idx'])
    pair = []
    for file in files:
        if row['before_file_path'] in file:
            pair.append(file)
    for file in files:
        if row['after_file_path'] in file:
            pair.append(file)
    image.append(pair)
    label.append(0)
    label_id.append('0')
    group.append('Unk')

test_df = pd.DataFrame()
test_df['image_id'] = image_id
test_df['image'] = image
test_df['label_id'] = label_id
test_df['label'] = label
test_df['group'] = group
test_df['fold_id'] = 0

print('Create train split')

n_splits = 5
kf = GroupKFold(n_splits=n_splits)
for fold_id, (train_index, val_index) in enumerate(kf.split(train_df, train_df['label'].values, 
                                                            groups=train_df['group'].values)):
    train_df.loc[train_df.index.isin(val_index), 'fold_id'] = fold_id
train_df = train_df.sample(frac=1.0, random_state=34)

if args.create_tfrecords:
    print('Create TFRecords')
    os.makedirs(args.data_tfrec_dir, exist_ok=True)
    tfrp = TFRecordProcessor()
    
    for fold_id in range(len(train_df['fold_id'].unique())):
        print('Fold:', fold_id)
        n_written = tfrp.write_tfrecords(
            train_df[train_df['fold_id'] == fold_id]['image_id'].values,
            train_df[train_df['fold_id'] == fold_id]['image'].values,
            train_df[train_df['fold_id'] == fold_id]['label_id'].values,
            train_df[train_df['fold_id'] == fold_id]['label'].values,
            #
            n_shards=1, 
            file_out=os.path.join(args.data_tfrec_dir, 'fold.%d.tfrecord' % fold_id))
    
    n_written = tfrp.write_tfrecords(
        test_df['image_id'].values,
        test_df['image'].values,
        test_df['label_id'].values,
        test_df['label'].values,
        #
        n_shards=1,
        file_out=os.path.join(args.data_tfrec_dir, 'test.tfrecord'))
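
Each training example is a pair of images from the same time series, labeled with the difference between the two day indices, so a series of n images yields nCr(n, 2) = n(n-1)/2 pairs. The stdlib sketch below illustrates this step. It assumes file names ending in a two-digit day index, as the label parsing above implies; the paths are made up and `make_pairs` is a hypothetical helper.

```python
import itertools
import os
from typing import List, Tuple


def make_pairs(files: List[str]) -> List[Tuple[str, str, int]]:
    """Return (earlier, later, day_delta) for every pair of images in one time series."""
    def day(path: str) -> int:
        stem, _ = os.path.splitext(path)  # '.../day_07.jpg' -> '.../day_07'
        return int(stem[-2:])             # last two characters are the day index
    pairs = []
    for a, b in itertools.combinations(sorted(files), 2):
        pairs.append((a, b, day(b) - day(a)))
    return pairs


series = ["BC_01/day_00.jpg", "BC_01/day_03.jpg", "BC_01/day_07.jpg", "BC_01/day_12.jpg"]
for a, b, delta in make_pairs(series):
    print(a, b, delta)
```

Sorting keeps the earlier image first, so every label is positive. Because all pairs stay inside one series and the notebook uses the series directory as the `GroupKFold` group, every pair of a given plant lands in the same fold.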

### Train and predict

In [None]:
for fold_id in range(args.initial_fold, args.final_fold):
    print('\n*****')
    print('Fold:', fold_id)
    print('*****\n')
    #--------------------------------------------------------------------------
    os.makedirs(args.data_preds_dir, exist_ok=True)
    #--------------------------------------------------------------------------
    print('Clear session...')
    tf.keras.backend.clear_session()
    #--------------------------------------------------------------------------
    print('Set fold-specific seed...')
    _ = seeder(args.seed + fold_id)
    #--------------------------------------------------------------------------
    print('Allow growth')
    tf.config.experimental.set_memory_growth(device=tf.config.list_physical_devices('GPU')[0], enable=True)
    # tf.config.set_logical_device_configuration(tf.config.list_physical_devices('GPU')[0], 
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=30_000)])
    #--------------------------------------------------------------------------
    if args.mixed_precision is not None:
        print('Init mixed precision:', args.mixed_precision)
        policy = tf.keras.mixed_precision.experimental.Policy(args.mixed_precision)
        tf.keras.mixed_precision.experimental.set_policy(policy)
    else:
        print('Using default precision:', tf.keras.backend.floatx())
    #--------------------------------------------------------------------------
    print('FULL BATCH SHAPE: %d x %d x %d x %d x %d' % (args.batch_size,
                                                        args.volume,
                                                        args.dim,
                                                        args.dim,
                                                        args.n_channels))
    #--------------------------------------------------------------------------
    # Init distribution strategy (default strategy, i.e. no distribution across devices)
    print('Init accelerator')
    strategy = tf.distribute.get_strategy()
    #--------------------------------------------------------------------------
    # Globs
    all_fold_ids = np.array(range(args.n_folds))
    train_fold_ids = all_fold_ids[all_fold_ids != fold_id]
    train_glob = os.path.join(args.data_tfrec_dir, 'fold.[%d%d%d%d].tfrecord*' % tuple(train_fold_ids))
    val_glob   = os.path.join(args.data_tfrec_dir, 'fold.[%d].tfrecord*' % fold_id)
    test_glob  = os.path.join(args.data_tfrec_dir, 'test.tfrecord*')
    print('TRAIN GLOB:', train_glob)
    print('VAL   GLOB:', val_glob)
    print('TEST  GLOB:', test_glob)
    #--------------------------------------------------------------------------
    print('Init datasets')
    train_ds = init_tfdata(train_glob, 
                           deterministic=True,  
                           batch_size=args.batch_size, 
                           auto=-1,
                           parse_example=parse_example, 
                           aug=aug, 
                           norm=norm,
                           repeat=True,
                           buffer_size=args.buffer_size, 
                           cache=False)
    val_ds = init_tfdata(val_glob, 
                         deterministic=True,  
                         batch_size=args.batch_size, 
                         auto=-1,
                         parse_example=parse_example,
                         norm=norm,
                         repeat=False,  
                         buffer_size=None,
                         cache=False)
    #--------------------------------------------------------------------------
    print('Init model')
    with strategy.scope():
        model = init_model(print_summary=True)
    #--------------------------------------------------------------------------
    print('Init callbacks')
    weight_file = 'model-f%d-e{epoch:03d}-{val_loss:.4f}-{val_%s:.4f}.h5' % (fold_id, 'rmse')
    call_ckpt = tf.keras.callbacks.ModelCheckpoint(weight_file,
                                                   monitor='val_rmse',
                                                   save_best_only=False,
                                                   save_weights_only=True,
                                                   mode='auto',
                                                   verbose=1)
    #-------------------------------------------------------------------------- 
    if 'train' in args.job:
        print('Fit (fold %d)' % fold_id)
        h = model.fit(
            train_ds,
            steps_per_epoch=args.n_examples_train // args.batch_size,
            epochs=args.n_epochs,
            validation_data=val_ds,
            callbacks=[call_ckpt])
    #--------------------------------------------------------------------------
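    # Load the latest checkpoint for this fold (names sort by epoch; since save_best_only=False, it is not necessarily the best one)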
    m = sorted(glob.glob('model-f%d*.h5' % fold_id))[-1]
    print('Load model (fold %d): %s' % (fold_id, m))
    model.load_weights(m)
    #--------------------------------------------------------------------------
    # TTA
    #--------------------------------------------------------------------------
    for tta_id in range(len(tta_func_list)):
        print('Init datasets for prediction (fold %d, tta %d)' % (fold_id, tta_id))
        test_ds = init_tfdata(test_glob, 
                              deterministic=True,  
                              batch_size=args.batch_size, 
                              auto=-1,
                              parse_example=parse_example,
                              tta=tta_func_list[tta_id],
                              norm=norm,
                              repeat=False,  
                              buffer_size=None,
                              cache=False)
        #--------------------------------------------------------------------------
        # Predict test
        if 'test' in args.job:
            print('Predict TEST (fold %d, tta %d)' % (fold_id, tta_id))
            y_pred_test = model.predict(test_ds, verbose=1)
            np.save(os.path.join(args.data_preds_dir, 
                                 'y_pred_test_fold_%d_tta_%d.npy' % (fold_id, tta_id)), y_pred_test)
        #--------------------------------------------------------------------------
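
The training glob relies on a shell-style character class: for fold 0 it becomes `fold.[1234].tfrecord*`, which matches the four other folds but neither the validation fold nor the test file. Python's `fnmatch` implements the same `[...]` syntax, so the selection can be checked without TensorFlow. This sketch assumes single-digit fold ids and that `tf.io.matching_files` treats such a simple pattern the same way. `fold_globs` is a hypothetical helper that generalizes the hard-coded `%d%d%d%d`.

```python
import fnmatch


def fold_globs(fold_id: int, n_folds: int = 5):
    """Build train/val patterns like the training loop does (single-digit fold ids only)."""
    train_ids = [i for i in range(n_folds) if i != fold_id]
    train_glob = "fold.[%s].tfrecord*" % "".join(str(i) for i in train_ids)
    val_glob = "fold.[%d].tfrecord*" % fold_id
    return train_glob, val_glob


# File names as produced by write_tfrecords with n_shards=1
files = ["fold.%d.tfrecord-00000-of-00001" % i for i in range(5)] + ["test.tfrecord-00000-of-00001"]
train_glob, val_glob = fold_globs(0)
print(train_glob, fnmatch.filter(files, train_glob))
print(val_glob, fnmatch.filter(files, val_glob))
```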

### Create raw submission

In [None]:
# Load predictions and sample submission
y_pred = np.load(os.path.join(args.data_preds_dir, 'y_pred_test_fold_0_tta_0.npy'))
subm_df = pd.read_csv(os.path.join(args.data_dir, 'sample_submission.csv'))
subm_df['time_delta'] = y_pred.ravel()  # predictions have shape (N, 1)

# Replace non-positive values (if present) with 1.0
subm_df.loc[subm_df['time_delta'] <= 0, 'time_delta'] = 1.0

# Save
subm_df[['idx', 'time_delta']].to_csv('submission_raw.csv', index=False)
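
The cell above reads only `y_pred_test_fold_0_tta_0.npy`, in line with the single first-fold model described in the summary. Suppose more folds (`final_fold = n_folds`) or TTA passes (`tta_number > 0`) are run. One possible way to combine them is a plain average of the saved arrays. The summary reports that the 5-fold ensemble did not beat the single model, so this is an optional sketch, not the author's procedure. `average_predictions` is a hypothetical helper that assumes the file naming used in the training loop.

```python
import glob
import os

import numpy as np


def average_predictions(preds_dir: str, pattern: str = "y_pred_test_fold_*_tta_*.npy") -> np.ndarray:
    """Load every saved prediction array matching `pattern` and return their mean as a 1-D array."""
    files = sorted(glob.glob(os.path.join(preds_dir, pattern)))
    if not files:
        raise FileNotFoundError("no prediction files in %s" % preds_dir)
    stacked = np.stack([np.load(f).ravel() for f in files])  # (n_files, n_test)
    return stacked.mean(axis=0)
```

For example, `subm_df['time_delta'] = average_predictions(args.data_preds_dir)` would replace the single-file load before the non-positive values are clipped.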

### Create calibrated submission

In [None]:
# Create data groups for calibration
test_df['idx'] = test_df['image_id'].map(lambda x: int(x.split('_')[-1]))
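# Path layout: .../test_dataset_jpg_1024/<species>/<time series>/<file>.jpg (indexing from the end works for any $HOME depth)
test_df['species'] = test_df['image'].map(lambda x: x[0].split('/')[-3])
test_df['ts'] = test_df['image'].map(lambda x: x[0].split('/')[-3] + '_' + x[0].split('/')[-2])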
subm_df = pd.merge(subm_df, test_df[['idx', 'species', 'ts']], on='idx', how='left')

# Apply calibration coefficients
# LT_1088 and LT_1089 also belong to species LT, so their net factor is 1.10 * 0.65 = 0.715
subm_df.loc[subm_df['species'] == 'LT', 'time_delta'] *= 1.10
subm_df.loc[subm_df['species'] == 'BC', 'time_delta'] *= 1.05
subm_df.loc[(subm_df['ts'] == 'LT_1088') | (subm_df['ts'] == 'LT_1089'), 'time_delta'] *= 0.65

# Save
subm_df[['idx', 'time_delta']].to_csv('submission_calibrated.csv', index=False)