# Final Submission

This notebook is an edited version of our final submission to the PANDAS challenge. This notebook takes the images in the test set and converts them from .tiff to .jpeg, then it breaks the image down into a series of tiles, after which it stitches together the top tiles 25 tiles based on the least amount of blank space in the tile. Once the images are preprocessed, they are fed to a series of models and the results averaged to get the final prediction (more on this in our model blend evaluation notebook). Our top submission, put together by Eliot, used four high performing models.

<a id="ToC"></a>
# Table of contents

[Create Models](#part-one) Add or remove models from the blend

[Image Preprocessing](#part-two)

[Inference Function](#part-three)

[Run Inference](#part-four) See here for testing combonations of models and veiwing performance


In [1]:
### Imports

import numpy as np
import pandas as pd
import os
pd.options.mode.chained_assignment = None


%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.vision import *

import os
from PIL import Image
import pandas as pd
import numpy as np
import cv2
import PIL
import random
import openslide
import skimage.io
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import Image, display

In [2]:
### DataFrames

path = Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample')

pd.set_option('mode.chained_assignment', None)

train = pd.read_csv('../input/prostate-cancer-grade-assessment/train.csv')
train[['primary Gleason', 'secondary Gleason']] = train.gleason_score.str.split('+',expand=True)
display(train.head())

train['id'] = train['image_id'] + '.jpeg'
train_isup = train[['id', 'isup_grade']]
train_primary = train[['id', 'primary Gleason']]
train_secondary = train[['id', 'secondary Gleason']]

train_isup['isup_grade'] = train_isup['isup_grade'].astype(int)


train_isup['grade_1'] = pd.Series([1 if x >= 1 else 0 for x in train_isup['isup_grade']], index=train_isup.index)
train_isup['grade_2'] = pd.Series([1 if x >= 2 else 0 for x in train_isup['isup_grade']], index=train_isup.index)
train_isup['grade_3'] = pd.Series([1 if x >= 3 else 0 for x in train_isup['isup_grade']], index=train_isup.index)
train_isup['grade_4'] = pd.Series([1 if x >= 4 else 0 for x in train_isup['isup_grade']], index=train_isup.index)
train_isup['grade_5'] = pd.Series([1 if x >= 5 else 0 for x in train_isup['isup_grade']], index=train_isup.index)

train_bin = train_isup[['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5']]

test_df = pd.read_csv('../input/prostate-cancer-grade-assessment/test.csv')
sample_submission = pd.read_csv('../input/prostate-cancer-grade-assessment/sample_submission.csv')

Unnamed: 0,image_id,data_provider,isup_grade,gleason_score,primary Gleason,secondary Gleason
0,0005f7aaab2800f6170c399693a96917,karolinska,0,0+0,0,0
1,000920ad0b612851f8e01bcc880d9b3d,karolinska,0,0+0,0,0
2,0018ae58b01bdadc8e347995b69f99aa,radboud,4,4+4,4,4
3,001c62abd11fa4b57bf7a6c603a11bb9,karolinska,4,4+4,4,4
4,001d865e65ef5d2579c190a0e0350d8f,karolinska,0,0+0,0,0


<a id="part-one"></a>
# Create Models

Object data is a databunch object which resizes all input images to size 448, data_512 resizes all input images to size 512, data_686 uses size 686, and so on and so fourth. The commented out codeblocks contain models that were not selected to be part of our final submission.

[Return to Table of Contents](#ToC)

In [3]:
# WINDOW_SIZE = 200 ### Values for merge large dataset
# STRIDE = 100
# K = 25


data = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=448,
                              bs=8,
                              ds_tfms=get_transforms()
        ).normalize(imagenet_stats)

data_512 = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=512,
                              bs=8,
                              ds_tfms=get_transforms()
        ).normalize(imagenet_stats)

data_686 = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=686,
                              bs=8,
                              ds_tfms=get_transforms()
        ).normalize(imagenet_stats)

data_784 = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=784,
                              bs=4,
                              ds_tfms=get_transforms()
        ).normalize(imagenet_stats)
data_616 = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=616,
                              bs=10,
                              ds_tfms=get_transforms()
        ).normalize(imagenet_stats)                                  
data_840 = ImageDataBunch.from_df(path=Path('/kaggle/input/pandas-to-jpeg-with-tiles-sample'),
                              df=train_bin.loc[:1000], #train_isup (the actual target), train_primary, train_secondary
                              valid_pct=0.25,
                              label_col=['grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'],
                              size=840,
                              bs=8,
                              ds_tfms=get_transforms()                                  
                                  
                                  
        ).normalize(imagenet_stats)

### Model objects:
The cells below instantiate each model that is used for inference. To add new models simply copy one of the cells below and modify the object name, model path, Databunch, and load name.

In [4]:
learners = [] #this variable will hold a list of models that will be iterated through at inference time
learner_dfs = [] #this variable will hold a list of dfs that will be iterated through to store predictions from each model

In [5]:
# model 0
learn_1 = cnn_learner(data, models.densenet161, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
Model_Path = Path('/kaggle/input/densenet161-with-tiles-progressive-resize-448/')
learn_1.model_dir = Model_Path
learn_1.load('checkpoint-5');
Model_Path = Path('/kaggle/working/')
learn_1.model_dir = Model_Path

learners += [learn_1]

learn_df_1 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
learner_dfs += [learn_df_1]

In [6]:
# learn_2 = cnn_learner(data, models.vgg16_bn, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/with-tiles-progressive-resize-448/')
# learn_2.model_dir = Model_Path
# learn_2.load('checkpoint-5')
# Model_Path = Path('/kaggle/working/')
# learn_2.model_dir = Model_Path

# learners += [learn_2]

# learn_df_2 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_2]

In [7]:
## new tiles panda 718 model 1
learn_2 = cnn_learner(data, models.vgg16_bn, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
Model_Path = Path('/kaggle/input/new-tiles-panda-challenge-with-fastai-718')
learn_2.model_dir = Model_Path
learn_2.load('checkpoint-3')
Model_Path = Path('/kaggle/working/')
learn_2.model_dir = Model_Path

learners += [learn_2]

learn_df_2 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
learner_dfs += [learn_df_2]

In [8]:
# learn_3 = cnn_learner(data_512, models.vgg16_bn, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/panda-models/')
# learn_3.model_dir = Model_Path
# learn_3.load('checkpoint-3')
# Model_Path = Path('/kaggle/working/')
# learn_3.model_dir = Model_Path

# learners += [learn_3]

# learn_df_3 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_3]

In [9]:
## new tiles panda 719 model 2
# learn_3 = cnn_learner(data, models.vgg16_bn, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/fork-of-new-tiles-panda-challenge-with-fastai-719')
# learn_3.model_dir = Model_Path
# learn_3.load('checkpoint-3')
# Model_Path = Path('/kaggle/working/')
# learn_3.model_dir = Model_Path

# learners += [learn_3]

# learn_df_3 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_3]

In [10]:
# learn_4 = cnn_learner(data, models.densenet161, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/densenet161-with-tiles-progressive-resize-448/')
# learn_4.model_dir = Model_Path
# learn_4.load('checkpoint-6')
# Model_Path = Path('/kaggle/working/')
# learn_4.model_dir = Model_Path

# learners += [learn_4]

# learn_df_4 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_4]

In [11]:
# model 3
# learn_5 = cnn_learner(data, models.vgg16_bn, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/with-tiles-progressive-resize-448/')
# learn_5.model_dir = Model_Path
# learn_5.load('checkpoint-4')
# Model_Path = Path('/kaggle/working/')
# learn_5.model_dir = Model_Path

# learners += [learn_5]

# learn_df_5 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_5]

In [12]:
# learn_6 = cnn_learner(data, models.resnet50, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/resnet-with-tiles-progressive-resize')
# learn_6.model_dir = Model_Path
# learn_6.load('checkpoint-6')
# Model_Path = Path('/kaggle/working/')
# learn_6.model_dir = Model_Path

# learners += [learn_6]

# learn_df_6 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_6]

In [13]:
# model 4
learn_7 = cnn_learner(data_686, models.densenet169, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
Model_Path = Path('/kaggle/input/densenet169size686/')
learn_7.model_dir = Model_Path
learn_7.load('dense169-checkpoint-4')
Model_Path = Path('/kaggle/working/')
learn_7.model_dir = Model_Path

learners += [learn_7]

learn_df_7 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
learner_dfs += [learn_df_7]

In [14]:
# model 5
learn_8 = cnn_learner(data_686, models.densenet169, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
Model_Path = Path('/kaggle/input/densenet169size686/')
learn_8.model_dir = Model_Path
learn_8.load('dense169-checkpoint-5')
Model_Path = Path('/kaggle/working/')
learn_8.model_dir = Model_Path

learners += [learn_8]

learn_df_8 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
learner_dfs += [learn_df_8]

In [15]:
# learn_9 = cnn_learner(data_686, models.resnet50, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/densenet169size686/')
# learn_9.model_dir = Model_Path
# learn_9.load('resnet50-checkpoint-5')
# Model_Path = Path('/kaggle/working/')
# learn_9.model_dir = Model_Path

# learners += [learn_9]

# learn_df_9 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_9]

In [16]:
# learn_10 = cnn_learner(data_784, models.densenet201, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/pandas-size-784-models/')
# learn_10.model_dir = Model_Path
# learn_10.load('dense201-checkpoint-5')
# Model_Path = Path('/kaggle/working/')
# learn_10.model_dir = Model_Path

# learners += [learn_10]

# learn_df_10 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_10]

In [17]:
# learn_11 = cnn_learner(data_784, models.densenet201, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/pandas-size-784-models/')
# learn_11.model_dir = Model_Path
# learn_11.load('dense201-checkpoint-4')
# Model_Path = Path('/kaggle/working/')
# learn_11.model_dir = Model_Path

# learners += [learn_11]

# learn_df_11 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_11]

In [18]:
# learn_12 = cnn_learner(data_784, models.resnet101, metrics=accuracy , pretrained=False) #resnet101, densenet161, vgg16_bn
# Model_Path = Path('/kaggle/input/pandas-size-784-models/')
# learn_12.model_dir = Model_Path
# learn_12.load('resnet101-checkpoint-2')
# Model_Path = Path('/kaggle/working/')
# learn_12.model_dir = Model_Path

# learners += [learn_12]

# learn_df_12 = pd.DataFrame(columns=['id', 'grade_1', 'grade_2', 'grade_3', 'grade_4', 'grade_5'])
# learner_dfs += [learn_df_12]

In [19]:
# This cell is just to confirm all models where added succesfully

for l in learners:
    print('i')

i
i
i
i


In [20]:
# Creates the output directory for new images

try:
    os.mkdir('resized-test')
except:
    pass

<a id="part-two"></a>
# Image Preprocessing:
The cell below contains the code which is used to preprocess the .tiff files into .jpeg images.

[return to Table of Contents](#ToC)

In [21]:
def compute_statistics(image):
    """
    Args:
        image                  numpy.array   multi-dimensional array of the form WxHxC
    
    Returns:
        ratio_white_pixels     float         ratio of white pixels over total pixels in the image 
    """
    width, height = image.shape[0], image.shape[1]
    num_pixels = width * height
    
    num_white_pixels = 0
    
    summed_matrix = np.sum(image, axis=-1)
    # Note: A 3-channel white pixel has RGB (255, 255, 255)
    num_white_pixels = np.count_nonzero(summed_matrix > 620)
    ratio_white_pixels = num_white_pixels / num_pixels
    
    green_concentration = np.mean(image[1])
    blue_concentration = np.mean(image[2])
    
    return ratio_white_pixels, green_concentration, blue_concentration

def select_k_best_regions(regions, k=20):
    """
    Args:
        regions               list           list of 2-component tuples first component the region, 
                                             second component the ratio of white pixels
                                             
        k                     int            number of regions to select
    """
    regions = [x for x in regions if x[3] > 180 and x[4] > 180]
    k_best_regions = sorted(regions, key=lambda tup: tup[2])[:k]
    return k_best_regions

def get_k_best_regions(coordinates, image, window_size=1024):#window size 512 is default
    regions = {}
    for i, tup in enumerate(coordinates):
        x, y = tup[0], tup[1]
        regions[i] = image[x : x+window_size, y : y+window_size, :]
    
    return regions

def generate_patches(slide_path, window_size=200, stride=256, k=20):#stride 128
    
    image = skimage.io.MultiImage(slide_path)[-2]
    image = np.array(image)
    
    max_width, max_height = image.shape[0], image.shape[1]
    regions_container = []
    i = 0
    
    while window_size + stride*i <= max_height:
        j = 0
        
        while window_size + stride*j <= max_width:            
            x_top_left_pixel = j * stride
            y_top_left_pixel = i * stride
            
            patch = image[
                x_top_left_pixel : x_top_left_pixel + window_size,
                y_top_left_pixel : y_top_left_pixel + window_size,
                :
            ]
            
            ratio_white_pixels, green_concentration, blue_concentration = compute_statistics(patch)
            
            region_tuple = (x_top_left_pixel, y_top_left_pixel, ratio_white_pixels, green_concentration, blue_concentration)
            regions_container.append(region_tuple)
            
            j += 1
        
        i += 1
    
    k_best_region_coordinates = select_k_best_regions(regions_container, k=k)
    k_best_regions = get_k_best_regions(k_best_region_coordinates, image, window_size)
    
    return image, k_best_region_coordinates, k_best_regions

def display_images(regions, title):
    fig, ax = plt.subplots(5, 4, figsize=(15, 15))
    
    for i, region in regions.items():
        ax[i//4, i%4].imshow(region)
    
    fig.suptitle(title)
    
def glue_to_one_picture(image_patches, window_size=200, k=32):
    side = int(np.sqrt(k))
    image = np.zeros((side*window_size, side*window_size, 3), dtype=np.int16)
        
    for i, patch in image_patches.items():
        x = i // side
        y = i % side
        image[
            x * window_size : (x+1) * window_size,
            y * window_size : (y+1) * window_size,
            :
        ] = patch
    
    return image

The cell below contains the values by which the images are preprocessed.

In [22]:
WINDOW_SIZE = 200
STRIDE = 100
K = 25

<a id="part-three"></a>
# Inference function:
The function below iterates through each of the files in the test dataframe, first preprocessing the images, then running inference on each image by each model, and taking the averaged result. Since we iterate through the images one at a time there isn't a significant advantage to running inference on a GPU over a CPU, this notebook we ran with a CPU to conserve Kaggle's GPU quota. 

[return to Table of Contents](#ToC)

In [23]:
# filler refers to what value is returned if the models fail to produce a valid inference for a particular case.
# 0 is the default for production cases, -1 if what we use for testing to see where errors are occurring
def inference_rt(data_dir, df, filler=0, evaluation_mode=False):
    df['isup_grade'] = pd.Series([np.NaN for x in df['image_id']], index=df.index)
    lst = []
    for i in df['image_id']:
        try:
            url = data_dir + i + '.tiff'
            image, best_coordinates, best_regions = generate_patches(url, window_size=WINDOW_SIZE, stride=STRIDE, k=K)
            glued_image = glue_to_one_picture(best_regions, window_size=WINDOW_SIZE, k=K)
            cv2.imwrite(i+".jpeg", glued_image)
            img = open_image('/kaggle/working/'+i+'.jpeg')
            s_1 = []
            s_2 = []
            s_3 = []
            s_4 = []
            s_5 = []
            s_all = [s_1, s_2, s_3, s_4, s_5]
            for l in learners:
                pred_class,pred_idx,outputs = l.predict(img)
                s_1 += [float(outputs[0])]
                s_2 += [float(outputs[1])]
                s_3 += [float(outputs[2])]
                s_4 += [float(outputs[3])]
                s_5 += [float(outputs[4])]
            count = 0
            for s in s_all:
                s_all[count] = round(np.mean(s))
                count+=1
            _ = sum(s_all)
            df['isup_grade'].loc[df['image_id'] == i] = int(_)
        except:
            df['isup_grade'].loc[df['image_id'] == i] = filler
    return df

In [24]:
#sample_submission['isup_grade'].loc[df['image_id'] == i] = filler

# Quadratic Weighted Kappa Function

In [25]:
from sklearn.metrics import confusion_matrix

def quadratic_kappa(actuals, preds, N=6):
    """This function calculates the Quadratic Kappa Metric used for Evaluation in the PetFinder competition
    at Kaggle. It returns the Quadratic Weighted Kappa metric score between the actual and the predicted values 
    of adoption rating."""
    w = np.zeros((N,N))
    O = confusion_matrix(actuals, preds)
    for i in range(len(w)): 
        for j in range(len(w)):
            w[i][j] = float(((i-j)**2)/(N-1)**2)
    
    act_hist=np.zeros([N])
    for item in actuals: 
        act_hist[item]+=1
    
    pred_hist=np.zeros([N])
    for item in preds: 
        pred_hist[item]+=1
                         
    E = np.outer(act_hist, pred_hist);
    E = E/E.sum();
    O = O/O.sum();
    
    num=0
    den=0
    for i in range(len(w)):
        for j in range(len(w)):
            num+=w[i][j]*O[i][j]
            den+=w[i][j]*E[i][j]
    return (1 - (num/den))

<a id="part-four"></a>
# Run Inference!

When running this notebook normally the `if os.path.exists('../input/prostate-cancer-grade-assessment/test_images'):` will return false, when submitting the notebook to the competition this statement will return true. This codeblock will run each of the test images through preprocessing, run inference, and record the outputs to a submission dataframe.

The else block will perform the same task as above, however it will run it on a sample of the training dataset (sample size based on `evaluation_cases`). The purpose of this is to evaluate the performance of the models, and check for errors. The output dataframe has the predicted values in the 'isup_grade' column and the 'actual' column has the actual values. This will also show the quadratic weighted kappa score on the training set, setting `evaluation_casses` to 1000 tends to get results close to the score on the test data.

[return to Table of Contents](#ToC)

In [26]:
%%time
evaluation_cases = 20 # number of cases to check when running validation

if os.path.exists('../input/prostate-cancer-grade-assessment/test_images'):
    data_dir = '../input/prostate-cancer-grade-assessment/test_images/'
    print('inference!')
    df_out = inference_rt(data_dir, test_df)
    df_out['isup_grade'] = df_out['isup_grade'].astype(int)
    sample_submission = df_out[['image_id','isup_grade']].copy()
else:
    data_dir = '../input/prostate-cancer-grade-assessment/train_images/'
    print('Evaluation inference!')
    train_sample = train_isup.copy()
    train_sample['image_id'] = pd.Series([x.rstrip('.jepg') for x in train_sample.id], index=train_sample.index) #train_9k['id'].rstrip('.jepg')
    train_sample = train_sample[['image_id', 'isup_grade']].head(evaluation_cases)
    train_eval = inference_rt(data_dir, train_sample, filler=-1)
    train_eval['actual'] = train_isup['isup_grade'].head(evaluation_cases)
    train_eval['isup_grade'] = train_eval['isup_grade'].astype(int)
    sample_submission = train_eval[['image_id','isup_grade']].copy()
    display(train_eval)
    print('Quadratic Weighted Kappa: ', quadratic_kappa(train_eval['isup_grade'].astype(int), train_eval['actual'].astype(int)))

Evaluation inference!


Unnamed: 0,image_id,isup_grade,actual
0,0005f7aaab2800f6170c399693a96917,0,0
1,000920ad0b612851f8e01bcc880d9b3d,0,0
2,0018ae58b01bdadc8e347995b69f99aa,4,4
3,001c62abd11fa4b57bf7a6c603a11bb9,4,4
4,001d865e65ef5d2579c190a0e0350d8f,0,0
5,002a4db09dad406c85505a00fb6f6144,1,0
6,003046e27c8ead3e3db155780dc5498,-1,1
7,0032bfa835ce0f43a92ae0bbab6871cb,1,1
8,003a91841da04a5a31f808fb5c21538a,1,1
9,003d4dd6bd61221ebc0bfb9350db333f,1,1


Quadratic Weighted Kappa:  0.85
CPU times: user 3min 52s, sys: 10.3 s, total: 4min 2s
Wall time: 2min 28s


In [27]:
sample_submission.to_csv('submission.csv', index=False)

In [28]:
sample_submission

Unnamed: 0,image_id,isup_grade
0,0005f7aaab2800f6170c399693a96917,0
1,000920ad0b612851f8e01bcc880d9b3d,0
2,0018ae58b01bdadc8e347995b69f99aa,4
3,001c62abd11fa4b57bf7a6c603a11bb9,4
4,001d865e65ef5d2579c190a0e0350d8f,0
5,002a4db09dad406c85505a00fb6f6144,1
6,003046e27c8ead3e3db155780dc5498,-1
7,0032bfa835ce0f43a92ae0bbab6871cb,1
8,003a91841da04a5a31f808fb5c21538a,1
9,003d4dd6bd61221ebc0bfb9350db333f,1


# Submit to competition:
To submit the notebook to the competition and evaluate, first commit the notebook, then from the committed draft go to the outputs section, select 'submission.csv' and click submit to competition.
