Requirement


1. Folder
    * The './mart' folder must contain the pickle data saved in part (1)
    * The './saved_models' folder must already exist
2. Package
    * Python 3.6
    * tensorflow == 1.5.0, keras == 2.2.2
    * pandas, numpy, os, time, pickle, tensorflow, keras, matplotlib

## Loading the Data

### Data Import from Pickle
* Load X_AE from the data saved in part (1)

In [1]:
import pandas as pd 
import numpy as np
import os, time, pickle
import tensorflow as tf
import keras
from keras.layers.recurrent import GRU
from keras.models import Sequential, Model
from keras.layers import *
from keras.optimizers import *

def write_pickle(data, path, file_name):
    with open("".join([path, '/', file_name, '.pkl']), 'wb') as f:
        pickle.dump(data, f)


def read_pickle(path, file_name):
    with open("".join([path, '/', file_name, '.pkl']), 'rb') as f:
        return pickle.load(f)

mart_path = './mart'
time_length = 44
timelength = time_length



In [2]:
X_AE = read_pickle(mart_path, '44_X_AE')
feature_cols = read_pickle(mart_path, '44_feature_cols')

In [3]:
seq_cols = feature_cols[:time_length]
print(seq_cols)

meta_cols = feature_cols[time_length:]
print(meta_cols)

['amount_43.0', 'amount_42.0', 'amount_41.0', 'amount_40.0', 'amount_39.0', 'amount_38.0', 'amount_37.0', 'amount_36.0', 'amount_35.0', 'amount_34.0', 'amount_33.0', 'amount_32.0', 'amount_31.0', 'amount_30.0', 'amount_29.0', 'amount_28.0', 'amount_27.0', 'amount_26.0', 'amount_25.0', 'amount_24.0', 'amount_23.0', 'amount_22.0', 'amount_21.0', 'amount_20.0', 'amount_19.0', 'amount_18.0', 'amount_17.0', 'amount_16.0', 'amount_15.0', 'amount_14.0', 'amount_13.0', 'amount_12.0', 'amount_11.0', 'amount_10.0', 'amount_9.0', 'amount_8.0', 'amount_7.0', 'amount_6.0', 'amount_5.0', 'amount_4.0', 'amount_3.0', 'amount_2.0', 'amount_1.0', 'amount_0']
['installments_2', 'installments_3', 'installments_4', 'installments_5', 'installments_6', 'installments_7', 'installments_8', 'installments_9', 'installments_10', 'installments_12', 'installments_15', 'installments_18', 'installments_20', 'installments_22', 'installments_24', 'installments_36', 'week_1', 'week_2', 'week_3', 'week_4', 'week_5', 'wee

### Data Preprocessing for Sequence Learning

In [4]:
def preproc_for_seq(array):
    
    array_seq = []
    
    # seq_cols
    array_seq.append(array[:,:timelength].reshape((array.shape[0], timelength, 1)))
    
    # meta_cols
    array_seq.append(array[:,timelength:])
    
    return array_seq

X_AE_seq = preproc_for_seq(X_AE)
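For reference, here is a small NumPy sketch of what `preproc_for_seq` returns. It uses a hypothetical random array in place of `X_AE` and assumes 44 sequence columns followed by the meta columns; the meta-column count of 30 is made up for illustration.

```python
import numpy as np

timelength = 44   # number of sequence columns (amount_43.0 ... amount_0)
n_meta = 30       # hypothetical number of meta columns, for illustration only

def preproc_for_seq(array):
    """Split a 2D feature matrix into [sequence tensor, meta matrix]."""
    seq = array[:, :timelength].reshape((array.shape[0], timelength, 1))  # (n, 44, 1) for Conv1D
    meta = array[:, timelength:]                                          # (n, n_meta) for Dense
    return [seq, meta]

X_demo = np.random.rand(5, timelength + n_meta)  # 5 hypothetical rows
seq, meta = preproc_for_seq(X_demo)
print(seq.shape, meta.shape)  # (5, 44, 1) (5, 30)
```

The two-element list matches the two `Input` layers of the model, so it can be passed directly as `x` to `model.fit`.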

## Representation Learning - Modeling

### Define Architecture
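* Uses a ResNet architecture.
* The original ResNet applies Conv2D to image data; because this data is a sequence, Conv1D is used instead.
* The sequence data goes through the ResNet branch and the meta data through a DNN (Dense) branch.
* The two branch outputs are concatenated, and both inputs are reconstructed from the concatenation (autoencoder). The loss function is mean_squared_error.
* Input -> Concatenate (encoder)
* Concatenate -> output (decoder)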
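The downsampling in the sequence branch defined in In [6] can be traced by hand: with `padding='same'`, each stride-2 layer outputs `ceil(length / 2)` time steps. A plain-Python sketch, with the layer counts and widths copied from In [6]:

```python
import math

def same_padding_length(length, stride):
    """Output length of a Keras 1D conv/pool layer with padding='same'."""
    return math.ceil(length / stride)

length = same_padding_length(44, 1)        # Conv1D(16, 3), stride 1: stays 44
length = same_padding_length(length, 2)    # MaxPooling1D(2, padding='same'): 22
stage_lengths = [length]
for _ in range(3):                         # res_unit_stride to 32, 64, then 128 channels
    length = same_padding_length(length, 2)
    stage_lengths.append(length)
print(stage_lengths)  # [22, 11, 6, 3]

seq_features = 128    # GlobalAveragePooling1D keeps only the 128 channels
meta_features = 20    # Dense(20) on the meta columns
code_dim = seq_features + meta_features
print(code_dim)       # 148
```

So the encoder compresses each row into a 148-dimensional code (`merges`), which is what the decoder reconstructs from and what part 3 fine-tunes on.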

In [5]:
def res_unit(inputs, channels):
    x = BatchNormalization()(inputs)
    x = Activation('relu')(x)
    x = Conv1D(channels, kernel_size=3, padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv1D(channels, kernel_size=3, padding='same', use_bias=False)(x)
    added = Add()([inputs, x])
    return added

def res_unit_stride(inputs, channels):
    x = BatchNormalization()(inputs)
    x = Activation('relu')(x)
    x = Conv1D(channels, kernel_size=3, strides=2, padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv1D(channels, kernel_size=3, padding='same', use_bias=False)(x)
    conv = Conv1D(channels, kernel_size=1, strides=2, padding='same', use_bias=False)(inputs)
    added = Add()([conv, x])
    return added
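In `res_unit` the shortcut is the identity, which only works when the input already has `channels` channels and the same length. `res_unit_stride` changes both, so its shortcut goes through a kernel-size-1 `Conv1D` with stride 2. A minimal NumPy sketch of that projection, using hypothetical random weights and no bias (matching `use_bias=False`): at the lengths used here no padding is needed, so the layer simply takes every second time step and applies a channel-mixing matrix.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.standard_normal((22, 16))    # (time steps, channels) after the 16-channel stage
W = rng.standard_normal((16, 32))    # hypothetical 1x1 kernel: 16 -> 32 channels

# 1x1 conv, stride 2, padding='same': keep steps 0, 2, 4, ... then mix channels
shortcut = x[::2] @ W
print(shortcut.shape)  # (11, 32), matching the residual branch output

# Same result written as an explicit loop over output positions
manual = np.stack([x[2 * t] @ W for t in range(11)])
```

Because both paths end with shape `(11, 32)`, `Add()([conv, x])` is well defined.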

In [6]:
# for seq_cols, Conv1D
seq_in = Input(shape=(time_length,1))
x = Conv1D(16, kernel_size=3, activation='relu', padding='same')(seq_in)
x = MaxPooling1D(2, padding='same')(x)
x =  res_unit(x, 16)
x =  res_unit(x, 16)
x =  res_unit(x, 16)
x =  res_unit_stride(x, 32)
x =  res_unit(x, 32)
x =  res_unit(x, 32)
x =  res_unit(x, 32)
x =  res_unit_stride(x, 64)
x =  res_unit(x, 64)
x =  res_unit(x, 64)
x =  res_unit(x, 64)
x =  res_unit(x, 64)
x =  res_unit_stride(x, 128)
x =  res_unit(x, 128)
x =  res_unit(x, 128)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = GlobalAveragePooling1D()(x)
seq_out = x

# for meta_cols
meta_in = Input(shape=(len(meta_cols),))
meta_out = Dense(20, kernel_initializer='he_normal', activation='elu')(meta_in)

# Concat
merges = Concatenate()([seq_out, meta_out])

# decode
seq_decode = Dense(time_length, kernel_initializer='he_normal')(merges)
seq_decode = Reshape((time_length,1))(seq_decode)
meta_decode = Dense(len(meta_cols), kernel_initializer='he_normal')(merges)

model = Model(inputs=[seq_in, meta_in], outputs=[seq_decode, meta_decode])

In [7]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 44, 1)        0                                            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 44, 16)       64          input_1[0][0]                    
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 22, 16)       0           conv1d_1[0][0]                   
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 22, 16)       64          max_pooling1d_1[0][0]            
__________________________________________________________________________________________________
activation

In [8]:
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
model.compile(loss=['mean_squared_error', 'mean_squared_error'], optimizer=optimizer)

### Define Callback
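* The LRFinder class (from the cyclical learning rate work it cites) is defined for learning-rate control. As implemented, it raises the learning rate linearly from min_lr to max_lr rather than cycling it, and in this notebook it is defined but not passed to model.fit.
* epochs = 100, batch_size = 128, no train/validation split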
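`LRFinder.clr()` is a linear ramp: after batch `k` the learning rate is set to `min_lr + (max_lr - min_lr) * k / total_iterations`. The plain-Python sketch below tabulates the resulting per-batch schedule using the part-3 settings (`min_lr=1e-5`, `max_lr=1e-3`, 100 epochs) and a hypothetical training set of 1,000 rows; `linear_lr_schedule` is an illustrative helper, not part of Keras.

```python
import math

def linear_lr_schedule(min_lr, max_lr, steps_per_epoch, epochs):
    """Learning rate in effect for each batch under LRFinder's linear ramp."""
    total_iterations = steps_per_epoch * epochs
    # Batch 1 runs at min_lr (set in on_train_begin); batch k+1 runs at clr() after k updates
    return [min_lr + (max_lr - min_lr) * k / total_iterations for k in range(total_iterations)]

steps = math.ceil(1000 / 128)   # hypothetical epoch_size of 1,000 rows -> 8 batches per epoch
lrs = linear_lr_schedule(1e-5, 1e-3, steps, epochs=100)
print(len(lrs), lrs[0], lrs[-1])
```

When the ramp spans the whole training run, as in part 3, it acts as a slow warm-up rather than the short range test the docstring describes.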

In [9]:
%matplotlib inline
from keras.callbacks import Callback
import matplotlib.pyplot as plt
import keras.backend as K

class LRFinder(Callback):
    '''
    A simple callback for finding the optimal learning rate range for your model + dataset. 
    
    # Usage
        ```python
            lr_finder = LRFinder(min_lr=1e-5, 
                                 max_lr=1e-2, 
                                 steps_per_epoch=np.ceil(epoch_size/batch_size), 
                                 epochs=3)
            model.fit(X_train, Y_train, callbacks=[lr_finder])
            
            lr_finder.plot_loss()
        ```
    
    # Arguments
        min_lr: The lower bound of the learning rate range for the experiment.
        max_lr: The upper bound of the learning rate range for the experiment.
        steps_per_epoch: Number of mini-batches in the dataset. Calculated as `np.ceil(epoch_size/batch_size)`. 
        epochs: Number of epochs to run experiment. Usually between 2 and 4 epochs is sufficient. 
        
    # References
        Blog post: jeremyjordan.me/nn-learning-rate
        Original paper: https://arxiv.org/abs/1506.01186
    '''
    
    def __init__(self, min_lr=1e-5, max_lr=1e-2, steps_per_epoch=None, epochs=None):
        super().__init__()
        
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.total_iterations = steps_per_epoch * epochs
        self.iteration = 0
        self.history = {}
        
    def clr(self):
        '''Calculate the learning rate.'''
        x = self.iteration / self.total_iterations 
        return self.min_lr + (self.max_lr-self.min_lr) * x
        
    def on_train_begin(self, logs=None):
        '''Initialize the learning rate to the minimum value at the start of training.'''
        logs = logs or {}
        K.set_value(self.model.optimizer.lr, self.min_lr)
        
    def on_batch_end(self, batch, logs=None):
        '''Record previous batch statistics and update the learning rate.'''
        logs = logs or {}
        self.iteration += 1

        self.history.setdefault('lr', []).append(K.get_value(self.model.optimizer.lr))
        self.history.setdefault('iterations', []).append(self.iteration)

        for k, v in logs.items():
            self.history.setdefault(k, []).append(v)
            
        K.set_value(self.model.optimizer.lr, self.clr())
 
    def plot_lr(self):
        '''Helper function to quickly inspect the learning rate schedule.'''
        plt.plot(self.history['iterations'], self.history['lr'])
        plt.yscale('log')
        plt.xlabel('Iteration')
        plt.ylabel('Learning rate')
        
    def plot_loss(self):
        '''Helper function to quickly observe the learning rate experiment results.'''
        plt.plot(self.history['lr'], self.history['loss'])
        plt.xscale('log')
        plt.xlabel('Learning rate')
        plt.ylabel('Loss')

In [10]:
epochs = 100
batch_size = 128

### AE Training

In [None]:
history = model.fit(X_AE_seq, X_AE_seq,
                    batch_size=batch_size, epochs=epochs)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100

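## Saving the Encoder for Future Fine-Tuning
* The encoder (the autoencoder up to its middle, concatenated layer) is saved so that the modeling stage can start training from it.
* Both the encoder and the model (= the AE) are saved, but only the encoder will be used for training.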

In [None]:
def save_keras_model(model, filename):
    model_json = model.to_json()
    with open('{}.json'.format(filename), 'w') as json_file:
        json_file.write(model_json)
    model.save_weights('{}.h5'.format(filename))

def load_keras_model(filename):
    from keras.models import model_from_json
    # Rebuild the architecture from JSON, then load the trained weights
    with open('{}.json'.format(filename), 'r') as json_file:
        loaded_model_json = json_file.read()
    model = model_from_json(loaded_model_json)
    model.load_weights('{}.h5'.format(filename))
    return model

In [None]:
model_dir = './saved_models'
filename = '{}/ResNetAE'.format(model_dir)

save_keras_model(model, filename)

In [None]:
encoder = Model(inputs=[seq_in, meta_in], outputs=merges)

In [None]:
filename = '{}/ResNetAE_encoder'.format(model_dir)

save_keras_model(encoder, filename)

Requirement

1. Folder
     * The './mart' folder must contain the pickle data saved in part (1)
     * The './saved_models' folder must contain the Keras model saved in part (2)
     * The './ckpt' folder must already exist
2. Package
     * Python 3.6
     * pandas, numpy, os, time, pickle, tensorflow, keras, matplotlib, sklearn

# part3

## Loading the Data
* Load (X, y) from the data saved in part (1).
* To measure the model's generalization performance, split the data into train / validation sets at a 0.85 : 0.15 ratio.
* Preprocess the sequence data for learning with the preproc_for_seq function.
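`train_test_split(..., test_size=0.15)` rounds the validation count up and gives the remainder to training. The NumPy sketch below reproduces those split sizes together with a seeded shuffle. It is an illustration only: `split_sizes` and `random_split` are hypothetical helpers, and they will not reproduce sklearn's exact row assignment for `random_state=42`.

```python
import math
import numpy as np

def split_sizes(n_samples, test_size=0.15):
    """Train/validation sizes with sklearn-style rounding (test = ceil, train = remainder)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def random_split(X, y, test_size=0.15, seed=42):
    """Shuffle row indices once and cut them into train/validation parts."""
    n_train, _ = split_sizes(len(X), test_size)
    idx = np.random.RandomState(seed).permutation(len(X))
    train_idx, valid_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

print(split_sizes(1000))  # (850, 150)
```

The same permutation indexes both `X` and `y`, so each label stays attached to its row.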

In [None]:
import pandas as pd 
import numpy as np
import os, time, pickle
import tensorflow as tf
import keras
from keras.layers.recurrent import GRU
from keras.models import Sequential, Model
from keras.layers import *
from keras.optimizers import *

def write_pickle(data, path, file_name):
    with open("".join([path, '/', file_name, '.pkl']), 'wb') as f:
        pickle.dump(data, f)


def read_pickle(path, file_name):
    with open("".join([path, '/', file_name, '.pkl']), 'rb') as f:
        return pickle.load(f)

mart_path = './mart'
time_length = 44
timelength = time_length

In [None]:
X, y = read_pickle(mart_path, '44_develop_data')
feature_cols = read_pickle(mart_path, '44_feature_cols')

In [None]:
seq_cols = feature_cols[:time_length]
print(seq_cols)

meta_cols = feature_cols[time_length:]
print(meta_cols)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.15, random_state=42)

In [None]:
def preproc_for_seq(array):
    
    array_seq = []
    
    # seq_cols
    array_seq.append(array[:,:timelength].reshape((array.shape[0], timelength, 1)))
    
    # meta_cols
    array_seq.append(array[:,timelength:])
    
    return array_seq

X_seq = preproc_for_seq(X)
y_seq = y

X_train_seq = preproc_for_seq(X_train)
X_valid_seq = preproc_for_seq(X_valid)

## Predictive Modeling - ResNet & DNN Fine-Tuning

### Loading the Encoder
* Load the encoder saved in part (2)

In [None]:
def save_keras_model(model, filename):
    model_json = model.to_json()
    with open('{}.json'.format(filename), 'w') as json_file:
        json_file.write(model_json)
    model.save_weights('{}.h5'.format(filename))

def load_keras_model(filename):
    from keras.models import model_from_json
    # Rebuild the architecture from JSON, then load the trained weights
    with open('{}.json'.format(filename), 'r') as json_file:
        loaded_model_json = json_file.read()
    model = model_from_json(loaded_model_json)
    model.load_weights('{}.h5'.format(filename))
    return model

In [None]:
model_dir = './saved_models'
filename = '{}/ResNetAE_encoder'.format(model_dir)
encoder = load_keras_model(filename)

### Define Architecture
* Stack a single Dense layer on top of the encoder output.
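The head is a single linear unit on the 148-dimensional encoder output (128 features from the ResNet branch plus 20 from the meta branch), so it adds 148 weights and 1 bias. The NumPy sketch below shows the forward pass using hypothetical random inputs and weights; the initializer scale is not meant to match Keras' `he_normal`.

```python
import numpy as np

rng = np.random.RandomState(0)
code_dim = 128 + 20                            # GlobalAveragePooling1D output + Dense(20) meta output
merges = rng.standard_normal((4, code_dim))    # hypothetical encoder outputs for 4 rows

W = rng.standard_normal((code_dim, 1)) * 0.1   # hypothetical weights of Dense(1)
b = np.zeros(1)                                # bias, initialized to zero

y_hat = merges @ W + b                         # Dense(1) with linear activation
n_params = W.size + b.size
print(y_hat.shape, n_params)  # (4, 1) 149
```

Keeping the head this small means almost all of the capacity comes from the pretrained encoder, whose weights are all trainable here and therefore fine-tuned together with the head.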

In [None]:
seq_in, meta_in = encoder.inputs
merges = encoder.outputs[0]
# named `output` so it does not overwrite the label array `y` loaded earlier
output = Dense(1, kernel_initializer='he_normal', name='output')(merges)

model = Model(inputs=[seq_in, meta_in], outputs=output)

In [None]:
model.summary()

### Define Loss & Optimizer
* The loss is the evaluation function announced by the competition organizers multiplied by (-1).
* custom_loss implements this.
* The optimizer is Adam with a learning rate of 0.001.
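Reading the score off `custom_loss` (and `get_score` in the Evaluation section), with $y_i$ the true value and $\hat y_i$ the prediction, the score and the loss minimized here are:

$$
\text{score} = \sum_{i:\ \hat y_i \le y_i} \frac{13}{365}\,\hat y_i \;+\; \sum_{i:\ \hat y_i > y_i} \left(y_i - \hat y_i\right),
\qquad
\mathcal{L} = -\,\text{score}
$$

Under-predicting earns only $13/365$ of the prediction, while over-predicting costs the full excess $\hat y_i - y_i$, so the loss is strongly asymmetric and pushes predictions toward slightly below the true value.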

In [None]:
import keras.backend as K

def custom_loss(y_true, y_pred):
    default_position = tf.less(y_true, y_pred)
    profit_position = tf.logical_not(default_position)
    pos = tf.reduce_sum(y_pred*tf.cast(profit_position, tf.float32)*13/365)
    neg = tf.reduce_sum((y_true-y_pred)*tf.cast(default_position, tf.float32))
    loss = -pos-neg
    return loss
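To sanity-check the sign conventions, the NumPy re-implementation of `custom_loss` below (the name `custom_loss_np` is hypothetical) can be evaluated on a toy batch containing one under-prediction and one over-prediction.

```python
import numpy as np

def custom_loss_np(y_true, y_pred):
    """NumPy mirror of custom_loss: the negative of the competition score."""
    default_position = y_true < y_pred            # over-prediction
    profit_position = ~default_position           # prediction at or below the true value
    pos = np.sum(y_pred * profit_position * 13 / 365)
    neg = np.sum((y_true - y_pred) * default_position)
    return -pos - neg

y_true = np.array([100.0, 100.0])
y_pred = np.array([90.0, 110.0])
loss = custom_loss_np(y_true, y_pred)
# pos = 90 * 13 / 365 (about 3.205), neg = 100 - 110 = -10, so loss is about 6.795
```

Within the profit region the loss falls by 13/365 per unit of prediction, and within the default region it rises by 1 per unit, which is the asymmetry described in the bullet points above.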

In [None]:
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
model.compile(loss=custom_loss, optimizer=optimizer)

### Define Callbacks
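* The LRFinder class (from the cyclical learning rate work it cites) controls the learning rate. As implemented, it raises the learning rate linearly from min_lr to max_lr over training rather than cycling it.
* epochs = 100, batch_size = 128
* The ModelCheckpoint callback saves the model every time the validation loss reaches a new minimum.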
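With `save_best_only=True` and `mode='min'`, a file is written only when the validation loss strictly improves, and the path template is filled with the 1-based epoch and that epoch's logs. The plain-Python sketch below simulates this. `saved_checkpoints` is a hypothetical helper, and the loss values are made up rather than taken from the actual run.

```python
ckpt_path = './ckpt/ResNetFinetuning_{epoch:02d}_valloss{val_loss:.2f}.hdf5'

def saved_checkpoints(val_losses, path=ckpt_path):
    """Files written by ModelCheckpoint(save_best_only=True, mode='min')."""
    best = float('inf')
    saved = []
    for epoch_index, val_loss in enumerate(val_losses):
        if val_loss < best:          # strict improvement -> save
            best = val_loss
            # fill the template with the 1-based epoch and the epoch's val_loss
            saved.append(path.format(epoch=epoch_index + 1, val_loss=val_loss))
    return saved

# Hypothetical validation losses for 5 epochs (not from the actual run)
print(saved_checkpoints([1.30, 1.21, 1.25, 1.19, 1.19]))
```

The same formatting explains file names such as `ResNetFinetuning_76_valloss1.07.hdf5`, which is loaded later in this notebook.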

In [None]:
%matplotlib inline
from keras.callbacks import Callback
import matplotlib.pyplot as plt
import keras.backend as K

class LRFinder(Callback):
    '''
    A simple callback for finding the optimal learning rate range for your model + dataset. 
    
    # Usage
        ```python
            lr_finder = LRFinder(min_lr=1e-5, 
                                 max_lr=1e-2, 
                                 steps_per_epoch=np.ceil(epoch_size/batch_size), 
                                 epochs=3)
            model.fit(X_train, Y_train, callbacks=[lr_finder])
            
            lr_finder.plot_loss()
        ```
    
    # Arguments
        min_lr: The lower bound of the learning rate range for the experiment.
        max_lr: The upper bound of the learning rate range for the experiment.
        steps_per_epoch: Number of mini-batches in the dataset. Calculated as `np.ceil(epoch_size/batch_size)`. 
        epochs: Number of epochs to run experiment. Usually between 2 and 4 epochs is sufficient. 
        
    # References
        Blog post: jeremyjordan.me/nn-learning-rate
        Original paper: https://arxiv.org/abs/1506.01186
    '''
    
    def __init__(self, min_lr=1e-5, max_lr=1e-2, steps_per_epoch=None, epochs=None):
        super().__init__()
        
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.total_iterations = steps_per_epoch * epochs
        self.iteration = 0
        self.history = {}
        
    def clr(self):
        '''Calculate the learning rate.'''
        x = self.iteration / self.total_iterations 
        return self.min_lr + (self.max_lr-self.min_lr) * x
        
    def on_train_begin(self, logs=None):
        '''Initialize the learning rate to the minimum value at the start of training.'''
        logs = logs or {}
        K.set_value(self.model.optimizer.lr, self.min_lr)
        
    def on_batch_end(self, batch, logs=None):
        '''Record previous batch statistics and update the learning rate.'''
        logs = logs or {}
        self.iteration += 1

        self.history.setdefault('lr', []).append(K.get_value(self.model.optimizer.lr))
        self.history.setdefault('iterations', []).append(self.iteration)

        for k, v in logs.items():
            self.history.setdefault(k, []).append(v)
            
        K.set_value(self.model.optimizer.lr, self.clr())
 
    def plot_lr(self):
        '''Helper function to quickly inspect the learning rate schedule.'''
        plt.plot(self.history['iterations'], self.history['lr'])
        plt.yscale('log')
        plt.xlabel('Iteration')
        plt.ylabel('Learning rate')
        
    def plot_loss(self):
        '''Helper function to quickly observe the learning rate experiment results.'''
        plt.plot(self.history['lr'], self.history['loss'])
        plt.xscale('log')
        plt.xlabel('Learning rate')
        plt.ylabel('Loss')

In [None]:
epochs = 100
batch_size = 128
epoch_size = len(X_train)

lr_finder = LRFinder(min_lr=1e-5, 
                     max_lr=1e-3, 
                     steps_per_epoch=np.ceil(epoch_size/batch_size), 
                     epochs=epochs)

early_stop = keras.callbacks.EarlyStopping(patience=5, monitor='val_loss')

ckpt_dir = './ckpt'
ckpt_path = ckpt_dir + '/ResNetFinetuning_{epoch:02d}_valloss{val_loss:.2f}.hdf5'
ckpt = keras.callbacks.ModelCheckpoint(ckpt_path, monitor='val_loss', verbose=0, save_best_only=True, mode='min')

#callbacks = [lr_finder]

### Model Training

In [None]:
history = model.fit(X_train_seq, y_train,
                    batch_size=batch_size, epochs=epochs,
                    callbacks=[ckpt, lr_finder],
                    validation_data=(X_valid_seq, y_valid), shuffle=True)

### Best Model Loading
* According to the training log above, new best validation losses were recorded at four epochs: 37, 76, 91, and 97.
* The overall best is epoch 97, but the validation loss stopped improving for two long stretches (epochs 37 to 76 and 76 to 91), so overfitting has to be suspected.
* Submitting each of these checkpoints to the competition gave the following scores:
    * epoch 37 = 53.8752
    * epoch 76 = 52.48691 (3rd place)
    * epoch 91 = 66.65879
    * epoch 97 = 92
* Therefore, the ResNet + DNN fine-tuned model from epoch 76 (training loss 1.15, validation loss 1.07, test score 52.48691) can be regarded as a strong single model that generalizes well without overfitting.

In [None]:
model.load_weights(ckpt_dir +'/ResNetFinetuning_76_valloss1.07.hdf5')

In [None]:
model_dir = './saved_models'
filename = '{}/ResNetFinetuning'.format(model_dir)

save_keras_model(model, filename)

## Prediction

### Predicting y for the test and validation data
* Test data: for the submission
* Validation data: for stacking and similar methods (not done here)
* y is scaled, so inverse_transform is applied to map the values back to the original scale

In [None]:
X_test = read_pickle(mart_path, '{}_X_test'.format(timelength))
y_scaler = read_pickle(mart_path, '{}_y_scaler'.format(timelength))

X_test_seq = preproc_for_seq(X_test)

y_pred = model.predict(X_test_seq)
y_pred_valid = model.predict(X_valid_seq)

y_pred = y_scaler.inverse_transform(y_pred)
y_pred_valid = y_scaler.inverse_transform(y_pred_valid)
y_true_valid = y_scaler.inverse_transform(y_valid)

### Building the submission file from the test predictions
* In the part (1) Wrangling step, store_id was modified to avoid duplicates, so test_data is reloaded to recover the original store_id values.
* These are combined with the test predictions to create the submission file.
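The submission relies on `test_data['store_id'].unique()` listing the stores in the same order as the rows of `X_test`. `Series.unique()` returns values in order of first appearance. The pandas sketch below uses hypothetical store IDs and predictions; whether the row order of `X_test` really matches this order cannot be checked from this notebook and is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical raw test rows: several transactions per store
test_data = pd.DataFrame({'store_id': [5, 5, 3, 3, 3, 9], 'amount': [10, 20, 5, 5, 5, 7]})
y_pred = np.array([[120.0], [80.0], [45.0]])   # hypothetical (n_stores, 1) model output

store_ids = test_data['store_id'].unique()     # order of first appearance: 5, 3, 9
assert len(store_ids) == len(y_pred)           # one prediction per store

submission = pd.DataFrame({'store_id': store_ids,
                           'total_sales': y_pred.reshape(len(y_pred))})
print(submission)
```

If the wrangling step in part (1) had sorted stores differently, the IDs and predictions would be silently misaligned, so the length check above is a cheap safeguard.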

In [None]:
# Data path and file names
data_path = "./data/"
train_filename = "train.csv"
test_filename = "test.csv"

train_data = pd.read_csv(data_path + train_filename)
test_data = pd.read_csv(data_path + test_filename)


In [None]:
submission = pd.DataFrame({'store_id': test_data['store_id'].unique(), 'total_sales': y_pred.reshape(len(y_pred))})
submission.to_csv('submission_ResNetFinetuning76epoch.csv', index=False)

### Saving the validation predictions for later stacking / blending

In [None]:
write_pickle(y_pred_valid, ".", 'y_pred_valid_ResNetFinetuning76epoch')
write_pickle(y_true_valid, ".", 'y_true_valid_ResNetFinetuning76epoch')

## Evaluation
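* The evaluation metric provided by the competition organizers is implemented as a Python function.
* Applying the evaluation function to the validation set gave a score of 101732063, which is 0.4692700349674595 of the upper-bound (perfect-prediction) score, i.e. about 46.9%.
* This is within the top 5 of all my evaluation results so far, and the same model also scored well when submitted.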
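With a perfect prediction (`y_pred == y_true`) every row falls in the profit branch, so the upper bound equals `13/365 * sum(y_true)`. `get_percentage_to_upperbd` returns a fraction of that bound (multiply by 100 for a percentage). The toy NumPy check below uses made-up values and a hypothetical `score` helper that mirrors `get_score` without the printing.

```python
import numpy as np

def score(y_true, y_pred):
    """Competition score, mirroring get_score without the printing."""
    default_position = y_true < y_pred
    pos = np.sum(y_pred[~default_position] * 13 / 365)
    neg = np.sum(y_true[default_position] - y_pred[default_position])
    return pos + neg

y_true = np.array([365.0, 730.0, 365.0])      # hypothetical true sales
y_pred = np.array([300.0, 730.0, 400.0])      # hypothetical predictions

upper = score(y_true, y_true)                  # 13/365 * 1460 = 52
ratio = score(y_true, y_pred) / upper          # fraction of the upper bound
print(upper, ratio)
```

A single over-prediction (the third row) wipes out most of the profit earned on the other rows, which is why the ratio stays far below 1 even for reasonable predictions.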

In [None]:
def get_score(y_true, y_pred, verbose=True):
    
    assert len(y_true) == len(y_pred)
    
    y_true = y_true.reshape(len(y_true),)
    y_pred = y_pred.reshape(len(y_true),)
    
    default_position = y_true - y_pred < 0
    profit_position = ~default_position
    pos = np.sum(y_pred[profit_position]*13/365)
    neg = np.sum(y_true[default_position] - y_pred[default_position])
    score = pos + neg
    if verbose:
        print("The positive score: ")
        print(pos)
        print("The negative score: ")
        print(neg)
        print("The total score: ")
        print(score)
    return score

In [None]:
score = get_score(y_true_valid, y_pred_valid)

In [None]:
def get_percentage_to_upperbd(y_true, y_pred, verbose=False):
    upperbd_score = get_score(y_true, y_true, verbose=False)
    score = get_score(y_true, y_pred, verbose=False)
    percentage = score/upperbd_score
    if verbose:
        print("The upper bound score: ")
        print(upperbd_score)
        print("The obtained score: ")
        print(score)
        print("The percentage to upper bound: ")
        print(percentage)
    return percentage

In [None]:
percentage = get_percentage_to_upperbd(y_true_valid, y_pred_valid, verbose=True)