# Overview of the competition (as presented on [Kaggle](https://www.kaggle.com/c/siim-isic-melanoma-classification/overview))

Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection—potentially aided by data science—can make treatment more effective.

In this competition, you’ll identify melanoma in images of skin lesions. In particular, you’ll use images within the same patient and determine which are likely to represent a melanoma. Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists.

Melanoma is a deadly disease, but if caught early, most melanomas can be cured with minor surgery. Image analysis tools that automate the diagnosis of melanoma will improve dermatologists' diagnostic accuracy. Better detection of melanoma has the opportunity to positively impact millions of people.

# Data

The dataset contains 33,126 dermoscopic training images of unique benign and malignant skin lesions from over 2,000 patients. For each image, we are provided with the following information:

* dermoscopic image;
* patient_id - unique patient identifier;
* sex - the sex of the patient (when unknown, will be blank);
* age_approx - approximate patient age at time of imaging;
* anatom_site_general_challenge - location of imaged site;
* diagnosis - detailed diagnosis information (train only);
* benign_malignant - indicator of malignancy of imaged lesion;
* target - binarized version of the target variable;

We will use an external dataset that repackages the competition data as TFRecord files, a format that is convenient for machine learning training, as described [here](https://www.kaggle.com/cdeotte/melanoma-256x256). Our input images have size 256x256; several other sizes are available.


## What are we predicting?

We are predicting a binary target for each image. Your model should predict the probability (floating point) between 0.0 and 1.0 that the lesion in the image is malignant (the target). In the training data, train.csv, the value 0 denotes benign, and 1 indicates malignant.

## How are predictions evaluated?

Submissions are evaluated on area under the ROC curve (AUC) between the predicted probability and the observed target. This metric is particularly useful here because the data is highly imbalanced: only 1.8% of the images are malignant, so a model that always predicts benign would reach 98.2% accuracy. The AUC is the probability that the classifier assigns a higher score to a randomly chosen positive example than to a randomly chosen negative example (see [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) for a proof of this interpretation).
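
As a small illustration of this interpretation, the sketch below computes AUC directly as the fraction of (positive, negative) pairs that are ranked correctly, and compares it with accuracy for a constant "always benign" predictor. The toy labels, the scores, and the helper name `pairwise_auc` are made up for the example; they are not competition data.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score vs every negative score
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy data: 2 malignant lesions among 10 images.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A model that always predicts "benign" looks good on accuracy but not on AUC.
always_benign = np.zeros(10)
accuracy = float(np.mean((always_benign >= 0.5) == y_true))
auc_constant = pairwise_auc(y_true, always_benign)

# A model that ranks most malignant images above benign ones.
y_score = np.array([0.1, 0.2, 0.05, 0.3, 0.15, 0.7, 0.25, 0.1, 0.6, 0.9])
auc_model = pairwise_auc(y_true, y_score)

print(accuracy, auc_constant, auc_model)
```

The constant predictor gets every pair tied, so its AUC is 0.5 regardless of how imbalanced the labels are.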

# Machine Learning Algorithm for predicting melanoma

## Non-technical overview of the model

For a computer, an image is simply an array of numbers representing pixel intensities across 3 channels (red, green and blue).

![](https://www.kdnuggets.com/wp-content/uploads/image-classification-cat-1.jpg)

[Source of the image](https://www.kdnuggets.com/2017/08/convolutional-neural-networks-image-recognition.html)

Convolutional Neural Networks (CNNs) are designed to process these arrays of numbers and extract *meaningful* interpretations of the images. These interpretations contain information such as the locations of edges or distinctive patterns, and they allow a CNN model to make a classification decision (in our case, whether a lesion is benign or malignant). For example, after going through a series of transformations as part of a model, the above cat image could have the one-dimensional array representation

$$A = [2.17, -0.44, 3.01, 4.002, -11]$$

While the original input is also given as an array, slight variations in the position, rotation, or contrast of an image can yield very different input arrays. It is thus essential for a CNN model to learn to account for these variations, so that the final array representation is robust and pictures of the same entity end up having similar array representations.
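
To make this sensitivity concrete, here is a deliberately extreme toy case (not part of the pipeline in this notebook): a fine striped texture shifted by a single pixel. A human would call the two pictures the same texture, yet every entry of the raw array changes.

```python
import numpy as np

# An 8x8 grayscale "image" of one-pixel-wide vertical stripes: 0, 1, 0, 1, ...
image = np.tile(np.array([0.0, 1.0]), (8, 4))

# The same texture shifted one pixel to the right (with wrap-around).
shifted = np.roll(image, shift=1, axis=1)

# Fraction of raw pixel values that differ between the two arrays.
fraction_changed = float(np.mean(image != shifted))
print(image.shape, fraction_changed)
```

Real photographs are far less extreme, but the example shows why comparing raw pixel arrays is a poor measure of visual similarity.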

During training, the model simultaneously learns two things:

1. how to effectively represent the image as a one-dimensional array;
2. how to use that representation to make the final call.

For melanoma classification, we start from a model that already performs well on the first of these two tasks. Specifically, we download a pre-trained model that achieved high accuracy when asked to discriminate across 1,000 classes of images, and we fine-tune it to learn melanoma image embeddings by training it on our dataset. We also complement the model by having it learn, from scratch, useful representations of the other metadata features provided (sex, age, etc.). During training, the model thus performs the following steps:

1. fine-tune a pre-trained image embedder;
2. learn useful embeddings for the additional metadata features;
3. combine the image and the metadata embeddings;
4. use the final embedding to decide if the result is benign or malignant.

## Technical specs of the model

While one can readily obtain great results by using only a powerful image classification model, we will create a model that also makes use of the additional metadata provided. 
For this, we will combine a pretrained image model with an embedding of the additional metadata that we define ourselves. The steps are as follows:
1. Download a pretrained image classifier that also performs very well on **transfer learning**;
2. Remove the top (output) layer of the image model, so that we actually have a pretrained **image embedder**;
3. Define embeddings for the rest of the metadata;
4. Combine the image embedder with the metadata embeddings and train the resulting model on the provided data.

The model's architecture is illustrated below

![model architecture](https://i.ibb.co/4mKk1Jb/model-with-metadata.png)

For Steps 1 and 2 above, we will use an EfficientNet model. As described in the 2019 [paper](https://arxiv.org/pdf/1905.11946.pdf), this family of models achieved state-of-the-art performance on both image classification and transfer learning, while being significantly smaller (in number of parameters) and faster than the best existing models. Moreover, the TensorFlow implementation of the model allows us to download the model with weights pretrained on ImageNet and without the top layer, as desired.

To increase the model's performance, in addition to creating a combined metadata + image model, the following techniques have been used:
1. augment images using techniques such as cropping, rotation, shear;
2. apply several test-time augmentations to each test image and average the resulting predictions to obtain the final prediction;
3. use [label smoothing](https://towardsdatascience.com/what-is-label-smoothing-108debd7ef06) and dropout for regularization;
4. use a learning rate scheduler, which determines the size and evolution of learning rate during training.

The techniques listed above were already provided in the public kernel https://www.kaggle.com/agentauers/incredible-tpus-finetune-effnetb0-b6-at-once; what is presented below are merely small adaptations.
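
The test-time augmentation step can be sketched with NumPy as follows. This is a minimal illustration under one assumption: the predictions arrive as a flat array in which the first `n_images` entries come from the first augmented pass over the test set, the next `n_images` from the second pass, and so on. The helper name `average_tta` and the numbers are hypothetical.

```python
import numpy as np

def average_tta(flat_preds, n_images, tta_steps):
    """Average predictions stored pass after pass: [pass 0 | pass 1 | ...]."""
    flat_preds = np.asarray(flat_preds, dtype=float)
    # Drop any surplus predictions produced by the last, partially filled batch.
    flat_preds = flat_preds[: n_images * tta_steps]
    per_pass = flat_preds.reshape(tta_steps, n_images)  # one row per augmented pass
    return per_pass.mean(axis=0)                        # one averaged value per image

# 3 test images, 2 augmented passes over the test set.
flat = np.array([0.1, 0.8, 0.4,   # pass 0
                 0.3, 0.6, 0.4])  # pass 1
averaged = average_tta(flat, n_images=3, tta_steps=2)
print(averaged)
```

Averaging only makes sense if every pass visits the images in the same order, which is why the test dataset must not be shuffled.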

## Model's performance

The model below achieved 0.9224 AUC in the competition (for comparison, the first-place solution achieved 0.9490). Since we didn't fix a random seed while training the model, your performance might differ slightly when reproducing the steps below, due to the random initializations and dropout masks used during training. Using TPUs, training the model below takes approximately one hour.

# Acknowledgements

I would like to thank my friend Raluca Turc for working together with me on this competition.
I would like to thank everyone who made their work public as part of the competition and allowed other participants like me to learn from it. In particular, I would like to thank [Chris Deotte](https://www.kaggle.com/cdeotte) and the author of [this kernel](https://www.kaggle.com/agentauers/incredible-tpus-finetune-effnetb0-b6-at-once) (user [AgentAuers](https://www.kaggle.com/agentauers)) for preparing the data pipeline in such a nice and easily adaptable way.

# Reproducing the model

## Instructions for running the cells below on [Kaggle](http://www.kaggle.com)

In the right-hand-side toolbar:
1. Toggle on the "Internet" button.
2. Import the "melanoma-256x256" dataset.
3. Switch the accelerator to TPU.

## Installations & Imports

In [None]:
!pip install -q efficientnet

In [None]:
import math
import os
import random
import re

import numpy as np
import pandas as pd

import efficientnet.tfkeras as efn
import tensorflow as tf
import tensorflow.keras.backend as K
import tensorflow.keras.layers as layers

from kaggle_datasets import KaggleDatasets

from tensorflow.keras.utils import plot_model
from tensorflow.keras.preprocessing.image import random_rotation, random_shear, random_shift, random_zoom
from tensorflow.keras.models import load_model
from tensorflow import feature_column as fc
from tensorflow.compat.v1.tpu.experimental import embedding_column as tpu_embedding_col

## Data Pipeline

The pipeline below for loading tf records and training has been adapted from the kernel https://www.kaggle.com/agentauers/incredible-tpus-finetune-effnetb0-b6-at-once

## Load the external dataset
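
The cell below counts examples by parsing the number embedded in each TFRecord file name rather than by reading the files. The sketch here shows the same regular expression on two file names; the names are hypothetical and only assume the `<prefix>-<count>.tfrec` pattern that the regular expression expects.

```python
import re

def count_items_from_names(filenames):
    """Sum the example counts encoded between '-' and '.' in each file name."""
    pattern = re.compile(r"-([0-9]*)\.")
    return sum(int(pattern.search(name).group(1)) for name in filenames)

# Hypothetical file names following the "<prefix>-<count>.tfrec" pattern.
example_files = ["train00-2071.tfrec", "train01-2116.tfrec"]
total = count_items_from_names(example_files)
print(total)
```

This avoids a full pass over the data, at the cost of trusting that the file names are accurate.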

In [None]:
BASEPATH = "../input/siim-isic-melanoma-classification"
df_train = pd.read_csv(os.path.join(BASEPATH, 'train.csv'))
df_test  = pd.read_csv(os.path.join(BASEPATH, 'test.csv'))
df_sub   = pd.read_csv(os.path.join(BASEPATH, 'sample_submission.csv'))

GCS_PATH    = KaggleDatasets().get_gcs_path('melanoma-256x256')
files_train = np.sort(np.array(tf.io.gfile.glob(GCS_PATH + '/train*.tfrec')))
files_test  = np.sort(np.array(tf.io.gfile.glob(GCS_PATH + '/test*.tfrec')))

def count_data_items(filenames):
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) 
         for filename in filenames]
    return np.sum(n)

## Configure the TPU

In [None]:
DEVICE = "TPU"
if DEVICE == "TPU":
    print("connecting to TPU...")
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        print('Running on TPU ', tpu.master())
    except ValueError:
        print("Could not connect to TPU")
        tpu = None

    if tpu:
        try:
            print("initializing  TPU ...")
            tf.config.experimental_connect_to_cluster(tpu)
            tf.tpu.experimental.initialize_tpu_system(tpu)
            strategy = tf.distribute.experimental.TPUStrategy(tpu)
            print("TPU initialized")
        except Exception:
            # Fall back to the default strategy so that `strategy` is always defined.
            print("failed to initialize TPU")
            DEVICE = "GPU"
    else:
        DEVICE = "GPU"

if DEVICE != "TPU":
    print("Using default strategy for CPU and single GPU")
    strategy = tf.distribute.get_strategy()

if DEVICE == "GPU":
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    

AUTO     = tf.data.experimental.AUTOTUNE
REPLICAS = strategy.num_replicas_in_sync
print(f'REPLICAS: {REPLICAS}')

## Define the image loading pipeline

### Define pipeline configuration

In [None]:
CFG = {
    # image augmentation configs
    'read_size': 256, 
    'crop_size': 250,
    'net_size': 248,
    'rot': 180.0,
    'shr': 1.5,
    'hzoom': 6.0,
    'wzoom': 6.0,
    'hshift': 6.0,
    'wshift': 6.0,
    
    # learning rate scheduler configs
    'LR_START': 0.000003,
    'LR_MAX': 0.000020,
    'LR_MIN': 0.000001,
    'LR_RAMPUP_EPOCHS': 5,
    'LR_SUSTAIN_EPOCHS': 0,
    'LR_EXP_DECAY': 0.8,
    
    # model configs
    'model_name': 'EfficientNetB6',
    'batch_size': 16,
    'epochs': 50,
    'optimizer': 'adam',
    'label_smooth_fac': 0.01,
    'FC_size': 256,
    'tta_steps': 25} # test time augmentation

### Functions for reading the tfrecords

In [None]:
def read_labeled_tfrecord(example):
    tfrec_format = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'sex': tf.io.FixedLenFeature([], tf.int64),
        'age_approx': tf.io.FixedLenFeature([], tf.int64),
        'anatom_site_general_challenge': tf.io.FixedLenFeature([], tf.int64),
        # These features are available for training only.
        #'diagnosis': tf.io.FixedLenFeature([], tf.int64),
        #'patient_id': tf.io.FixedLenFeature([], tf.int64),
        'target': tf.io.FixedLenFeature([], tf.int64)
    }           
    example = tf.io.parse_single_example(example, tfrec_format)
    target = example.pop('target')
    return example, target


def read_unlabeled_tfrecord(example, return_image_names):
    tfrec_format = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'sex': tf.io.FixedLenFeature([], tf.int64),
        'age_approx': tf.io.FixedLenFeature([], tf.int64),
        'anatom_site_general_challenge': tf.io.FixedLenFeature([], tf.int64),
        'image_name': tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(example, tfrec_format)
    return example, example['image_name'] if return_image_names else 0

### Function for image augmentations

In [None]:
def prepare_image(img, augment=True):    
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [CFG['read_size'], CFG['read_size']])
    img = tf.cast(img, tf.float32) / 255.0
    
    if augment:
        img = transform(img)
        img = tf.image.random_crop(img, [CFG['crop_size'], CFG['crop_size'], 3])
        img = tf.image.random_flip_left_right(img)
        img = tf.image.random_hue(img, 0.01)
        img = tf.image.random_saturation(img, 0.7, 1.3)
        img = tf.image.random_contrast(img, 0.8, 1.2)
        img = tf.image.random_brightness(img, 0.1)

    else:
        img = tf.image.central_crop(img, CFG['crop_size'] / CFG['read_size'])
                                   
    img = tf.image.resize(img, [CFG['net_size'], CFG['net_size']])
    img = tf.reshape(img, [CFG['net_size'], CFG['net_size'], 3])
    return img


def get_mat(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    # Returns a 3x3 transformation matrix that maps pixel indices
        
    # CONVERT DEGREES TO RADIANS
    rotation = math.pi * rotation / 180.
    shear    = math.pi * shear    / 180.

    def get_3x3_mat(lst):
        return tf.reshape(tf.concat([lst],axis=0), [3,3])
    
    # ROTATION MATRIX
    c1   = tf.math.cos(rotation)
    s1   = tf.math.sin(rotation)
    one  = tf.constant([1],dtype='float32')
    zero = tf.constant([0],dtype='float32')
    
    rotation_matrix = get_3x3_mat([c1,   s1,   zero, 
                                   -s1,  c1,   zero, 
                                   zero, zero, one])    
    # SHEAR MATRIX
    c2 = tf.math.cos(shear)
    s2 = tf.math.sin(shear)    
    
    shear_matrix = get_3x3_mat([one,  s2,   zero, 
                                zero, c2,   zero, 
                                zero, zero, one])        
    # ZOOM MATRIX
    zoom_matrix = get_3x3_mat([one/height_zoom, zero,           zero, 
                               zero,            one/width_zoom, zero, 
                               zero,            zero,           one])    
    # SHIFT MATRIX
    shift_matrix = get_3x3_mat([one,  zero, height_shift, 
                                zero, one,  width_shift, 
                                zero, zero, one])
    
    return K.dot(K.dot(rotation_matrix, shear_matrix), 
                 K.dot(zoom_matrix,     shift_matrix))

def transform(image):    
    # input image - is one image of size [dim,dim,3] not a batch of [b,dim,dim,3]
    # output - image randomly rotated, sheared, zoomed, and shifted
    DIM = CFG["read_size"]
    XDIM = DIM%2 #fix for size 331
    
    rot = CFG['rot'] * tf.random.normal([1], dtype='float32')
    shr = CFG['shr'] * tf.random.normal([1], dtype='float32') 
    h_zoom = 1.0 + tf.random.normal([1], dtype='float32') / CFG['hzoom']
    w_zoom = 1.0 + tf.random.normal([1], dtype='float32') / CFG['wzoom']
    h_shift = CFG['hshift'] * tf.random.normal([1], dtype='float32') 
    w_shift = CFG['wshift'] * tf.random.normal([1], dtype='float32') 

    # GET TRANSFORMATION MATRIX
    m = get_mat(rot,shr,h_zoom,w_zoom,h_shift,w_shift) 

    # LIST DESTINATION PIXEL INDICES
    x   = tf.repeat(tf.range(DIM//2, -DIM//2,-1), DIM)
    y   = tf.tile(tf.range(-DIM//2, DIM//2), [DIM])
    z   = tf.ones([DIM*DIM], dtype='int32')
    idx = tf.stack( [x,y,z] )
    
    # ROTATE DESTINATION PIXELS ONTO ORIGIN PIXELS
    idx2 = K.dot(m, tf.cast(idx, dtype='float32'))
    idx2 = K.cast(idx2, dtype='int32')
    idx2 = K.clip(idx2, -DIM//2+XDIM+1, DIM//2)
    
    # FIND ORIGIN PIXEL VALUES           
    idx3 = tf.stack([DIM//2-idx2[0,], DIM//2-1+idx2[1,]])
    d    = tf.gather_nd(image, tf.transpose(idx3))
        
    return tf.reshape(d,[DIM, DIM,3])

In [None]:
def prepare_image_in_example(example, augment):
    example["image"] = prepare_image(example["image"], augment=augment)
    return example


def remove_image_name(example):
    example.pop('image_name', None)
    return example


def get_dataset(files, augment=False, shuffle=False, repeat=False, 
                labeled=True, return_image_names=True,is_test=False):
    
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=AUTO)
    ds = ds.cache()
    
    if repeat:
        ds = ds.repeat()
    
    if shuffle: 
        ds = ds.shuffle(1024*8)
        opt = tf.data.Options()
        opt.experimental_deterministic = False
        ds = ds.with_options(opt)
        
    if labeled: 
        ds = ds.map(read_labeled_tfrecord, num_parallel_calls=AUTO)
    else:
        ds = ds.map(lambda example: read_unlabeled_tfrecord(example, return_image_names), 
                    num_parallel_calls=AUTO)     
    
    ds = ds.map(
                lambda example, imgname_or_label: (prepare_image_in_example(example, augment=augment), imgname_or_label), 
                num_parallel_calls=AUTO)
    
    if is_test:
        ds = ds.map(
            lambda example, imgname_or_label: (remove_image_name(example), imgname_or_label))
    
    ds = ds.batch(CFG['batch_size'] * REPLICAS)
    ds = ds.prefetch(AUTO)
    
    if is_test:
        return ds

    ds = ds.map(lambda img, label: (img, tuple([label])))
    return ds


## Create the dataset

In [None]:
ds_train = get_dataset(
    files_train, augment=True, shuffle=True, repeat=True)

## Create the image embedder

**Important note**: While we want to optimize for AUC, the AUC function is not differentiable, so it cannot be used directly as a training loss. Practice has shown that minimizing binary cross-entropy is a good indirect way to optimize for AUC. For more details, see [here](https://towardsdatascience.com/explicit-auc-maximization-70beef6db14e).
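
The binary cross-entropy loss in this notebook is combined with label smoothing. As a rough NumPy sketch of what that does (written for illustration, not taken from the Keras source), hard labels are pulled slightly towards 0.5 before the loss is computed, so extremely confident predictions stop being optimal. The helper names and example probabilities are made up; the factor 0.01 matches `label_smooth_fac` in the configuration.

```python
import numpy as np

def smooth_labels(y, factor):
    """Pull hard 0/1 labels towards 0.5 by the given smoothing factor."""
    return y * (1.0 - factor) + 0.5 * factor

def binary_cross_entropy(y, p, eps=1e-7):
    """Mean binary cross-entropy between targets y and predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

y = np.array([0.0, 1.0])                  # one benign, one malignant example
y_smooth = smooth_labels(y, factor=0.01)  # targets become 0.005 and 0.995

p_overconfident = np.array([1e-4, 1.0 - 1e-4])
p_matching = np.array([0.005, 0.995])     # predictions equal to the smoothed targets

loss_overconfident = binary_cross_entropy(y_smooth, p_overconfident)
loss_matching = binary_cross_entropy(y_smooth, p_matching)
print(y_smooth, loss_overconfident, loss_matching)
```

With smoothed targets, the loss is lowest when the prediction equals the smoothed label, which acts as a mild regularizer against overconfidence.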

In [None]:
def get_efficientnet_model():
    """Builds an EfficientNet model."""
    model_input = tf.keras.Input(shape=(CFG['net_size'], CFG['net_size'], 3), name='image')
    dummy = tf.keras.layers.Lambda(lambda x:x)(model_input)    
    outputs = []    

    constructor = getattr(efn, CFG['model_name'])
    x = constructor(include_top=False, weights='imagenet', 
                    input_shape=(CFG['net_size'], CFG['net_size'], 3), 
                    pooling='avg')(dummy)

    outputs.append(x)
    
    model = tf.keras.Model(model_input, outputs, name=CFG['model_name'])
    model.summary()
    
    return model

In [None]:
def compile_model(model_fn):
    """Compiles a model by calling model_fn()."""
    with strategy.scope():
        model = model_fn()
        losses = tf.keras.losses.BinaryCrossentropy(label_smoothing = CFG['label_smooth_fac'])
        
        model.compile(
            optimizer=CFG['optimizer'],
            loss=losses,
            experimental_run_tf_function=False,
            metrics=[tf.keras.metrics.AUC(name='auc')])
    return model

## Define learning rate scheduler
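
To give a feel for the shape of the schedule defined in this section (linear ramp-up, optional plateau, then exponential decay towards a floor), here is a pure-Python sketch that reuses the values from the configuration. The replica count of 8 is an assumption made for the example; in the notebook it comes from `strategy.num_replicas_in_sync`.

```python
def lr_at(epoch, lr_start=3e-6, lr_max=2e-5 * 8, lr_min=1e-6,
          ramp_epochs=5, sustain_epochs=0, decay=0.8):
    """Learning rate for a given epoch: ramp up, hold, then decay exponentially."""
    if epoch < ramp_epochs:
        # Linear interpolation from lr_start to lr_max.
        return (lr_max - lr_start) / ramp_epochs * epoch + lr_start
    if epoch < ramp_epochs + sustain_epochs:
        return lr_max
    # Exponential decay from lr_max towards lr_min.
    return (lr_max - lr_min) * decay ** (epoch - ramp_epochs - sustain_epochs) + lr_min

schedule = [lr_at(epoch) for epoch in range(50)]
for epoch in (0, 2, 5, 6, 10, 49):
    print(epoch, f"{schedule[epoch]:.2e}")
```

Under these assumptions the rate peaks at epoch 5 and then shrinks by roughly 20% per epoch until it approaches `LR_MIN`.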

In [None]:
def get_lr_callback():
    lr_start = CFG['LR_START']
    lr_max = CFG['LR_MAX'] * strategy.num_replicas_in_sync
    lr_min = CFG['LR_MIN']
    lr_ramp_ep = CFG['LR_RAMPUP_EPOCHS']
    lr_sus_ep = CFG['LR_SUSTAIN_EPOCHS']
    lr_decay = CFG['LR_EXP_DECAY']
   
    def lrfn(epoch):
        if epoch < lr_ramp_ep:
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
            
        elif epoch < lr_ramp_ep + lr_sus_ep:
            lr = lr_max
            
        else:
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min
        return lr

    lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=False)
    return lr_callback

## Define the final model that also trains on metadata

In [None]:
def get_efficientnet_with_metadata_model():
    """Defines a model that uses both the image and metadata like sex & age."""

    image_embedder = get_efficientnet_model()
 
    metadata_inputs = {name: layers.Input(name=name, shape=(), dtype="int64")
                       for name in ["sex", "age_approx", "anatom_site_general_challenge"]}

    all_inputs = {"image": image_embedder.input}
    all_inputs.update(metadata_inputs)

    sex = fc.categorical_column_with_identity("sex", num_buckets=2)
    age_in_years = fc.bucketized_column(fc.numeric_column("age_approx"), boundaries=list(range(100)))
    
    ALL_SITES = ['head/neck', 'upper extremity', 'lower extremity', 'torso', 'palms/soles', 'oral/genital']
    site = fc.categorical_column_with_identity(key="anatom_site_general_challenge", num_buckets=len(ALL_SITES))
    
    feature_columns = {
        # Single continuous value (either 0.0 or 1.0).
        "sex": fc.numeric_column("sex", dtype=tf.float32),
        # One-hot vector of size = 101 (the 100 boundaries 0..99 yield 101 buckets)
        "year": fc.indicator_column(age_in_years),
        # Normalized age between 0 and 1; The categorical columns above fail to illustrate that e.g. 34yo is closer to
        # 35yo than to 100yo, since all buckets are independent.
        "age_continuous": fc.numeric_column("age_approx", normalizer_fn=lambda age: tf.cast(age, tf.float32) / 100.0),
        # One-hot vector with one entry per unique site.
        "anatom_site_general_challenge": fc.indicator_column(site),
    }
    
    if DEVICE != "TPU":
        # One-hot vector of size = 200 (2 sexes x 100 years). Doesn't work on TPU (there's no kernel implementation)
        feature_columns["sex_x_age"] =  fc.indicator_column(fc.crossed_column([sex, age_in_years], hash_bucket_size=200))
    
    metadata_embedding = layers.DenseFeatures(feature_columns.values(), name="metadata")(metadata_inputs)
    hidden_layer_meta_1 = tf.keras.layers.Dense(256, activation='relu', name="hidden_layer_meta_1")(metadata_embedding)
    hidden_layer_meta_2 = tf.keras.layers.Dense(32, activation='relu', name='hidden_layer_meta_2')(hidden_layer_meta_1)

    dropout = tf.keras.layers.Dropout(0.5)(image_embedder.output)
    final_embedding = layers.concatenate([hidden_layer_meta_2, dropout])
    hidden_layer = tf.keras.layers.Dense(CFG['FC_size'], name="hidden_layer")(final_embedding)
    output = tf.keras.layers.Dense(1, activation='sigmoid', name="probability")(hidden_layer)
    return tf.keras.Model(all_inputs, output)

plot_model(get_efficientnet_with_metadata_model(), to_file='model_with_metadata.png', show_shapes=True, show_layer_names=True)

## Train the model

In [None]:
steps_train = count_data_items(files_train) / (CFG['batch_size'] * REPLICAS)

model_with_metadata = compile_model(get_efficientnet_with_metadata_model)
model_with_metadata.summary()

model_with_metadata.fit(
    ds_train,
    verbose=1,
    steps_per_epoch=steps_train,
    epochs=CFG['epochs'],
    callbacks=[get_lr_callback()]
)

## Submission code

In [None]:
CFG['batch_size'] = 256

cnt_test = count_data_items(files_test)
steps = cnt_test / (CFG['batch_size'] * REPLICAS) * CFG['tta_steps']
ds_testAug = get_dataset(files_test, augment=True, repeat=True, 
                         labeled=False, return_image_names=False, is_test=True)

predictions = model_with_metadata.predict(ds_testAug, verbose=1, steps=steps)

# Use test time image augmentation and take the mean of all values
preds = np.stack(predictions)
preds = preds[:,:cnt_test* CFG['tta_steps']]
preds = preds[:df_test.shape[0]*CFG['tta_steps']]
preds = np.stack(np.split(preds, CFG['tta_steps']),axis=1)
preds = np.mean(preds, axis=1)
final_predictions = preds.reshape(-1)

## Sort image names and create submission file

The submission is sorted by image_name, but the dataset yields the images in a different order. We therefore traverse the test dataset once more to capture the image_names, join this list with the predictions, and sort by image_name.

In [None]:
ds = get_dataset(files_test, augment=False, repeat=False, 
                 labeled=False, return_image_names=True)

image_names = np.array([img_name[0].numpy().decode("utf-8") 
                        for example, img_name in iter(ds.unbatch())])

submission = pd.DataFrame(dict(
    image_name=image_names,
    target=final_predictions))

submission = submission.sort_values('image_name')
filename=f'{CFG["model_name"]}-{CFG["FC_size"]}-{CFG["epochs"]}epc-{CFG["label_smooth_fac"]}_lbsf.csv'
submission.to_csv(filename, index=False)

submission.head()