###### American Express - Default Prediction 
## Predict If A Customer Will Default in the Future ...
The objective of this competition is to predict the probability that a customer does not pay back their credit card balance amount in the future based on their monthly customer profile. The target binary variable is calculated by observing 18 months performance window after the latest credit card statement, and if the customer does not pay due amount in 120 days after their latest statement date it is considered a default event.

<img style="float: center;" src="https://img.freepik.com/free-vector/brain-with-digital-circuit-programmer-with-laptop-machine-learning-artificial-intelligence-digital-brain-artificial-thinking-process-concept-vector-isolated-illustration_335657-2246.jpg?w=2000" width = '550'>
<a href='https://www.freepik.com/vectors/machine-learning'>Machine learning vector created by vectorjuice - www.freepik.com</a>

#### Data Description
The dataset contains aggregated profile features for each customer at each statement date. Features are anonymized and normalized, and fall into the following general categories:

* D_* = Delinquency variables
* S_* = Spend variables
* P_* = Payment variables
* B_* = Balance variables
* R_* = Risk variables

With the following features being categorical:

**['B_30', 'B_38', 'D_114', 'D_116', 'D_117', 'D_120', 'D_126', 'D_63', 'D_64', 'D_66', 'D_68']**

Your task is to predict, for each customer_ID, the probability of a future payment default (target = 1).

Note that the negative class has been subsampled for this dataset at 5%, and thus receives a 20x weighting in the scoring metric.

**Files**
* train_data.csv - training data with multiple statement dates per customer_ID
* train_labels.csv - target label for each customer_ID
* test_data.csv - corresponding test data; your objective is to predict the target label for each customer_ID
* sample_submission.csv - a sample submission file in the correct format

---

## My Strategy, or How I Will Aproach this Competition...
We have data from many Customers and there is many points of information by for each of the customers, the target labels are only one per customer id so aggregation will be requiered, from here there is quie a lot of possibilities, this is what I will folow in this Notebook...

#### Loading the Datasets
The datasets is massive so I will rely on other Kaggles optimized datasets stored in a feather format to make my life easier in this competition.

#### Quick EDA
The typical analysis that I always like to complete to undertstand the dataset better...
* Information of the datasets, size and others.
* Simple visualization of the first few records.
* Data statistical analalysis using describe.
* Visualization of the number of NaNs.
* Understanding the amount of unique records.

#### Exploring the Target Variable
Nothing in particular dataset seems to be quite inbalanced so I will get back to this part later...

#### Structuring the Datasets
Here is where everything happens, because we have time-base data o multiple points per customer we are trying to aggregate the information in certain way that's practical:
* Statistical aggregation for numeric features
* Only keep the last know record for analysis
* Statictical aggregation for categorical features

#### Feature Engineering
At this point the only thing that I can consider some type of feature will be the aggregation of the datasets, as I mentioned in the previous point
* Statistical aggregation
* Only keep the last know record for analysis

#### Label Encoding
Because there is quite a lot of categorical variables and this is a NN model I will use the following encoding technique:
* OneHot encoder, only train in the train dataset and applyed on test

#### Fill NaNs**
At this point just to get started, I will fill everything with ceros, probably not a good idea.
* Fill NaNs with 0

#### Model Development and Training
I'm going to go first with an NN in the last few competitions the NN models have been working quite well also we have so much data.
* Simple NN tested, layer after later.
* I also tested a more complex NN, that I learned from Ambross with Skip conections.

#### Predictions and Submission
No much details here, just the simple average of all the predictions across multiple folds.
* Average predictions across 5 folds

---

## Updates
#### 05/28/2022
* Build the initial model using Neuronal Nets and simple agg strategy (Last data point).
* Evaluated the model and uploaded for Ranking.

#### 05/29/2022
* Improve model architecture.
* Really dive deep into Feature Engineering (Not much here, memory is a big challenge)

#### 05/30/2022
* ...

---

## Resources, Inspiration
I have taken Ideas or learned quite a lot from the Notebooks below, please check also if you like my work.

* https://www.kaggle.com/code/ambrosm/amex-keras-quickstart-1-training/notebook
* ...
* ...
* ...

---

# 1.0 Loading Model Libraries...

In [1]:
%%time
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/amex-default-prediction/train_labels.csv
/kaggle/input/parquet-files-amexdefault-prediction/train_data.ftr
/kaggle/input/parquet-files-amexdefault-prediction/test_data.ftr
CPU times: user 555 µs, sys: 81 µs, total: 636 µs
Wall time: 460 µs


In [2]:
%%time
import datetime # ...

CPU times: user 7 µs, sys: 1 µs, total: 8 µs
Wall time: 12.9 µs


---

# 2.0 Setting the Notebook Parameters and Default Configuration...

In [3]:
%%time
# I like to disable my Notebook Warnings.
import warnings
warnings.filterwarnings('ignore')

CPU times: user 33 µs, sys: 5 µs, total: 38 µs
Wall time: 43.2 µs


In [4]:
%%time
# Notebook Configuration...

# Amount of data we want to load into the Model...
DATA_ROWS = None
# Dataframe, the amount of rows and cols to visualize...
NROWS = 50
NCOLS = 15
# Main data location path...
BASE_PATH = '...'

CPU times: user 5 µs, sys: 1 µs, total: 6 µs
Wall time: 9.78 µs


In [5]:
%%time
# Configure notebook display settings to only use 2 decimal places, tables look nicer.
pd.options.display.float_format = '{:,.5f}'.format
pd.set_option('display.max_columns', NCOLS) 
pd.set_option('display.max_rows', NROWS)

CPU times: user 104 µs, sys: 15 µs, total: 119 µs
Wall time: 128 µs


---

# 3.0 Loading the Dataset Information (Using Feather)...

In [6]:
import gc
from sklearn.preprocessing import StandardScaler, QuantileTransformer, OneHotEncoder, OrdinalEncoder
def get_data(fill_values, get_test=False):
    # Load the CSV information into a Pandas DataFrame...
    trn_data = pd.read_feather('../input/parquet-files-amexdefault-prediction/train_data.ftr')
    trn_lbls = pd.read_csv('/kaggle/input/amex-default-prediction/train_labels.csv').set_index('customer_ID')
    if(get_test):
        tst_data = pd.read_feather('../input/parquet-files-amexdefault-prediction/test_data.ftr')

    #%%time
    #sub = pd.read_csv('/kaggle/input/amex-default-prediction/sample_submission.csv')

    ## 6.1 Training Dataset...

    # We have 458913 customers. and we have 458913 train labels...

    # Calculates the amount of information by costumer or records available...
    trn_num_statements = trn_data.groupby('customer_ID').size().sort_index()

    # Create a new dataset based on aggregated information
    trn_agg_data = (trn_data
                    .groupby('customer_ID')
                    .tail(1)
                    .set_index('customer_ID', drop=True)
                    .sort_index()
                    .drop(['S_2'], axis='columns'))
    del trn_data
    # Merge the labels from the labels dataframe
    trn_agg_data['target'] = trn_lbls.target
    del trn_lbls
    trn_agg_data['num_statements'] = trn_num_statements
    del trn_num_statements
    
    trn_agg_data.reset_index(inplace = True, drop = True) # forget the customer_IDs

    ## 6.2 Test Dataset...

    # Calculates the amount of information by costumer or records available...
    if(get_test):
        tst_num_statements = tst_data.groupby('customer_ID').size().sort_index()

        # Create a new dataset based on aggregated information
        tst_agg_data = (tst_data
                        .groupby('customer_ID')
                        .tail(1)
                        .set_index('customer_ID', drop=True)
                        .sort_index()
                        .drop(['S_2'], axis='columns'))
        del tst_data
        # Merge the labels from the labels dataframe
        tst_agg_data['num_statements'] = tst_num_statements
        del tst_num_statements
        tst_agg_data.reset_index(inplace = True, drop = True) # forget the customer_IDs

    # 7.0 Label / One-Hot Encoding the Categorical Variables...

    ## 7.1 One Hot Encoding Configuration...

    # One-hot Encoding Configuration
    cat_features = ['B_30', 'B_38', 'D_114', 'D_116', 'D_117', 'D_120', 'D_126', 'D_63', 'D_64', 'D_66', 'D_68']

    #trn_agg_data[cat_features] = trn_agg_data[cat_features].astype(object)
    trn_not_cat_features = [f for f in trn_agg_data.columns if f not in cat_features]
    if(get_test):
        tst_not_cat_features = [f for f in tst_agg_data.columns if f not in cat_features]

    #encoder = OneHotEncoder(drop = 'first', sparse = False, dtype = np.float32, handle_unknown = 'ignore')
    encoder = OrdinalEncoder()
    trn_encoded_features = encoder.fit_transform(trn_agg_data[cat_features])
    #feat_names = list(encoder.get_feature_names())

    ## 7.2 Train Dataset One Hot Encoding...

    # One-hot Encoding
    trn_encoded_features = pd.DataFrame(trn_encoded_features)
    #trn_encoded_features.columns = feat_names

    trn_agg_data = pd.concat([trn_agg_data[trn_not_cat_features], trn_encoded_features], axis = 1)

    ## 7.3 Test Dataset One-Hot Encoding...
    if(get_test):
        # One-hot Encoding
        tst_encoded_features = encoder.transform(tst_agg_data[cat_features])
        tst_encoded_features = pd.DataFrame(tst_encoded_features)
        #tst_encoded_features.columns = feat_names

        tst_agg_data = pd.concat([tst_agg_data[tst_not_cat_features], tst_encoded_features], axis = 1)
        tst_agg_data.head()

    features = [f for f in trn_agg_data.columns if f != 'target' and f != 'customer_ID']
    
    c = trn_agg_data[features].columns.str
    cs = [c.startswith('S_', False), c.startswith('P_', False), c.startswith('B_', False), c.startswith('R_', False), c.startswith('D_', False)]
    cs = [trn_agg_data[features].columns[c_i] for c_i in cs]
    #
    # Impute missing values
    # Old fill type values
    for i_fill in range(len(fill_values)):
        if(fill_values[i_fill]==0):
            trn_agg_data[cs[i_fill]].fillna(value = 0, inplace = True)
            if(get_test):
                tst_agg_data[cs[i_fill]].fillna(value = 0, inplace = True)
        elif(fill_values[i_fill]==1):
            trn_agg_data[cs[i_fill]].fillna(value = np.nanmean(trn_agg_data[cs[i_fill]]), inplace = True)
            if(get_test):
                tst_agg_data[cs[i_fill]].fillna(value = np.nanmean(trn_agg_data[cs[i_fill]]), inplace = True)         
        elif(fill_values[i_fill]==2):
            trn_agg_data[cs[i_fill]].fillna(value = np.nanquantile(trn_agg_data[cs[i_fill]], .25), inplace = True)
            if(get_test):
                tst_agg_data[cs[i_fill]].fillna(value = np.nanquantile(trn_agg_data[cs[i_fill]], .25), inplace = True)       
        elif(fill_values[i_fill]==3):
            trn_agg_data[cs[i_fill]].fillna(value = np.nanquantile(trn_agg_data[cs[i_fill]], .75), inplace = True)
            if(get_test):
                tst_agg_data[cs[i_fill]].fillna(value = np.nanquantile(trn_agg_data[cs[i_fill]], .75), inplace = True)            
    #Fill all others
    trn_agg_data.fillna(value = 0, inplace = True)
    if(get_test):
        tst_agg_data.fillna(value = 0, inplace = True)

    # 10.0 NN Development

    # Release some memory by deleting the original DataFrames...
    gc.collect()
    if(get_test==False):
        tst_agg_data = 0
    return trn_agg_data, tst_agg_data, features

## 10.1 Loading Specific Model Libraries...

In [7]:
%%time
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ReduceLROnPlateau, LearningRateScheduler, EarlyStopping
from tensorflow.keras.layers import Dense, Input, InputLayer, Add, BatchNormalization, Dropout, Concatenate
from tensorflow.keras.utils import plot_model
from sklearn.metrics import log_loss

from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler
import random

CPU times: user 1.49 s, sys: 251 ms, total: 1.74 s
Wall time: 1.74 s


---

## 10.2 Amex Metric, Function...

In [8]:
%%time
# From https://www.kaggle.com/code/inversion/amex-competition-metric-python

def amex_metric(y_true, y_pred, return_components=False) -> float:
    """Amex metric for ndarrays"""
    
    def top_four_percent_captured(df) -> float:
        """Corresponds to the recall for a threshold of 4 %"""
        
        df['weight'] = df['target'].apply(lambda x: 20 if x==0 else 1)
        four_pct_cutoff = int(0.04 * df['weight'].sum())
        df['weight_cumsum'] = df['weight'].cumsum()
        df_cutoff = df.loc[df['weight_cumsum'] <= four_pct_cutoff]
        return (df_cutoff['target'] == 1).sum() / (df['target'] == 1).sum()
    
    
    def weighted_gini(df) -> float:
        df['weight'] = df['target'].apply(lambda x: 20 if x==0 else 1)
        df['random'] = (df['weight'] / df['weight'].sum()).cumsum()
        total_pos = (df['target'] * df['weight']).sum()
        df['cum_pos_found'] = (df['target'] * df['weight']).cumsum()
        df['lorentz'] = df['cum_pos_found'] / total_pos
        df['gini'] = (df['lorentz'] - df['random']) * df['weight']
        return df['gini'].sum()

    
    def normalized_weighted_gini(df) -> float:
        """Corresponds to 2 * AUC - 1"""
        
        df2 = pd.DataFrame({'target': df.target, 'prediction': df.target})
        df2.sort_values('prediction', ascending=False, inplace=True)
        return weighted_gini(df) / weighted_gini(df2)

    
    df = pd.DataFrame({'target': y_true.ravel(), 'prediction': y_pred.ravel()})
    df.sort_values('prediction', ascending=False, inplace=True)
    g = normalized_weighted_gini(df)
    d = top_four_percent_captured(df)

    if return_components: return g, d, 0.5 * (g + d)
    return 0.5 * (g + d)

CPU times: user 6 µs, sys: 1e+03 ns, total: 7 µs
Wall time: 12.4 µs


---

## 10.3 Defining the NN Model Architecture...

## 10.3.1 Architecture 01, Simple NN

In [9]:
%%time
def nn_model():
    '''
    '''
    regularization = 4e-4
    activation_func = 'swish'
    inputs = Input(shape = (len(features)))
    
    x = Dense(256, 
              #use_bias  = True, 
              kernel_regularizer = tf.keras.regularizers.l2(regularization), 
              activation = activation_func)(inputs)
    
    x = BatchNormalization()(x)
    
    x = Dense(64, 
              #use_bias  = True, 
              kernel_regularizer = tf.keras.regularizers.l2(regularization), 
              activation = activation_func)(x)
    
    x = BatchNormalization()(x)
    
    x = Dense(64, 
          #use_bias  = True, 
          kernel_regularizer = tf.keras.regularizers.l2(regularization), 
          activation = activation_func)(x)
    
    x = BatchNormalization()(x)

    x = Dense(32, 
              #use_bias  = True, 
              kernel_regularizer = tf.keras.regularizers.l2(regularization), 
              activation = activation_func)(x)
    
    x = BatchNormalization()(x)

    x = Dense(1, 
              #use_bias  = True, 
              #kernel_regularizer = tf.keras.regularizers.l2(regularization),
              activation = 'sigmoid')(x)
    
    model = Model(inputs, x)
    
    return model

CPU times: user 5 µs, sys: 1e+03 ns, total: 6 µs
Wall time: 11 µs


---

## 10.3.2 Architecture 02, Concatenated NN

In [10]:
%%time
def nn_model(features):
    regularization = 4e-4
    activation_func = 'swish'
    inputs = Input(shape = (len(features)))

    x0 = Dense(256,
               kernel_regularizer = tf.keras.regularizers.l2(regularization), 
               activation = activation_func)(inputs)
    x1 = Dense(128,
               kernel_regularizer = tf.keras.regularizers.l2(regularization),
               activation = activation_func)(x0)
    x1 = Dense(64,
               kernel_regularizer = tf.keras.regularizers.l2(regularization),
               activation = activation_func)(x1)
    x1 = Dense(32,
           kernel_regularizer = tf.keras.regularizers.l2(regularization),
           activation = activation_func)(x1)
    
    x1 = Concatenate()([x1, x0])
    x1 = Dropout(0.1)(x1)
    
    x1 = Dense(16, kernel_regularizer=tf.keras.regularizers.l2(regularization),activation=activation_func,)(x1)
    
    x1 = Dense(1, 
              #kernel_regularizer=tf.keras.regularizers.l2(regularization),
              activation='sigmoid')(x1)
    
    model = Model(inputs, x1)
    
    return model
    

CPU times: user 8 µs, sys: 0 ns, total: 8 µs
Wall time: 12.9 µs


---

## 10.5 Defining Model Training Parameters...

In [11]:
%%time
# Defining model parameters...
BATCH_SIZE         = 256
EPOCHS             = 20 
EPOCHS_COSINEDECAY = 20
DIAGRAMS           = True
USE_PLATEAU        = False
INFERENCE          = False
VERBOSE            = 0 
TARGET             = 'target'

CPU times: user 5 µs, sys: 1e+03 ns, total: 6 µs
Wall time: 11.4 µs


---

## 10.6 Defining the Model Training Configuration...

In [12]:
 %%time
# Defining model training function...
def fit_model(X_train, y_train, X_val, y_val, run = 0):
    '''
    '''
    lr_start = 0.01
    start_time = datetime.datetime.now()
    
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)

    epochs = EPOCHS    
    lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.7, patience = 4, verbose = VERBOSE)
    es = EarlyStopping(monitor = 'val_loss',patience = 12, verbose = 1, mode = 'min', restore_best_weights = True)
    tm = tf.keras.callbacks.TerminateOnNaN()
    callbacks = [lr, es, tm]
    
    # Cosine Learning Rate Decay
    if USE_PLATEAU == False:
        epochs = EPOCHS_COSINEDECAY
        lr_end = 0.0002

        def cosine_decay(epoch):
            if epochs > 1:
                w = (1 + math.cos(epoch / (epochs - 1) * math.pi)) / 2
            else:
                w = 1
            return w * lr_start + (1 - w) * lr_end
        
        lr = LearningRateScheduler(cosine_decay, verbose = 0)
        callbacks = [lr, tm]
    
    # Model Initialization...
    model = nn_model(features)
    optimizer_func = tf.keras.optimizers.Adam(learning_rate = lr_start)
    loss_func = tf.keras.losses.BinaryCrossentropy()
    model.compile(optimizer = optimizer_func, loss = loss_func)
    
    
    X_val = scaler.transform(X_val)
    validation_data = (X_val, y_val)
    
    history = model.fit(X_train, 
                        y_train, 
                        validation_data = validation_data, 
                        epochs          = epochs,
                        verbose         = VERBOSE,
                        batch_size      = BATCH_SIZE,
                        shuffle         = True,
                        callbacks       = callbacks
                       )
    print("Model fitted")
    history_list = [history.history]
    
    print(f'Training Loss: {history_list[-1]["loss"][-1]:.5f}, Validation Loss: {history_list[-1]["val_loss"][-1]:.5f}')
    callbacks, es, lr, tm, history = None, None, None, None, None
    
    
    y_val_pred = model.predict(X_val, batch_size = BATCH_SIZE, verbose = VERBOSE).ravel()
    amex_score = amex_metric(y_val.values, y_val_pred, return_components = False)
    
    print(f'Fold {run} | {str(datetime.datetime.now() - start_time)[-12:-7]}'
          f'| Amex Score: {amex_score:.5f}')
    
    print('')
    
    #score_list.append(amex_score)
    
    #tst_data_scaled = scaler.transform(tst_agg_data[features])
    #tst_pred = model.predict(tst_data_scaled)
    #predictions.append(tst_pred)
    print(amex_score)
    return amex_score

CPU times: user 6 µs, sys: 2 µs, total: 8 µs
Wall time: 14.1 µs


---

## 10.7 Creating a Model Training Loop and Cross Validating in 5 Folds... 

In [13]:
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score, roc_curve
import math
def train(trn_agg_data, features):
    score_list = []
    kf = KFold(n_splits = 5)
    for fold, (trn_idx, val_idx) in enumerate(kf.split(trn_agg_data)):
        X_train, X_val = trn_agg_data.iloc[trn_idx][features], trn_agg_data.iloc[val_idx][features]
        y_train, y_val = trn_agg_data.iloc[trn_idx][TARGET], trn_agg_data.iloc[val_idx][TARGET]
        print("Fold",fold)
        score_list.append(fit_model(X_train, y_train, X_val, y_val))
    current_score = np.mean(score_list)
    print(f'OOF AUC: {current_score:.5f}')
    return current_score

In [14]:
import sys
def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera,  https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

for name, size in sorted(((name, sys.getsizeof(value)) for name, value in locals().items()),
                         key= lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))

                           _i6:  5.6 KiB
                           _ii:  2.6 KiB
                          _i12:  2.6 KiB
                           _i8:  1.7 KiB
                           _i9:  1.3 KiB
                          _i10:  1.1 KiB
                StandardScaler:  1.0 KiB
           QuantileTransformer:  1.0 KiB
                 OneHotEncoder:  1.0 KiB
                OrdinalEncoder:  1.0 KiB


In [15]:
%%time
gc.collect()
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score, roc_curve
import math
score_array = np.zeros((5,4))
# Create empty lists to store NN information...
best_fill_type = [0, 0, 0, 0, 0]
trn_agg_data, tst_agg_data, features = get_data(best_fill_type)
best_score = train(trn_agg_data, features)
length = len(features)
del trn_agg_data, tst_agg_data, features
# Define kfolds for training purposes...
for col in range(len(best_fill_type)):
    print("Feature type", col)
    score_array[col,0] = best_score

    best_fill_type_current = 0
    for fill_type in range(1, 4):
        new_fill = np.copy(best_fill_type)
        new_fill[col] = fill_type
        trn_agg_data, tst_agg_data, features = get_data(new_fill)
        print("got data")
        current_score = train(trn_agg_data, features)
        score_array[col, fill_type] = current_score
        if(current_score > best_score):
            best_fill_type = new_fill
            best_score = current_score  
            print("Best fill type list is now", best_fill_type)
        #free space
        del trn_agg_data, tst_agg_data, features
        gc.collect()
        for name, size in sorted(((name, sys.getsizeof(value)) for name, value in globals().items()),
                         key= lambda x: -x[1])[:10]:
            print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))

Fold 0


2022-08-09 15:16:51.346500: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-08-09 15:16:52.056823: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Model fitted
Training Loss: 0.23066, Validation Loss: 0.23230
Fold 0 | 03:05| Amex Score: 0.77791

0.7779084630048432
Fold 1
Model fitted
Training Loss: 0.23075, Validation Loss: 0.23274
Fold 0 | 02:57| Amex Score: 0.77708

0.7770769957549349
Fold 2
Model fitted
Training Loss: 0.23120, Validation Loss: 0.23025
Fold 0 | 02:58| Amex Score: 0.77950

0.7795020010305063
Fold 3
Model fitted
Training Loss: 0.23179, Validation Loss: 0.22933
Fold 0 | 02:57| Amex Score: 0.78017

0.7801666138912626
Fold 4
Model fitted
Training Loss: 0.23158, Validation Loss: 0.22857
Fold 0 | 02:57| Amex Score: 0.78102

0.7810156232080048
OOF AUC: 0.77913
Feature type 0
got data
Fold 0
Model fitted
Training Loss: 0.23082, Validation Loss: 0.23274
Fold 0 | 02:57| Amex Score: 0.77730

0.7773020120007867
Fold 1
Model fitted
Training Loss: 0.23065, Validation Loss: 0.23221
Fold 0 | 02:57| Amex Score: 0.77728

0.7772838370296578
Fold 2
Model fitted
Training Loss: 0.23102, Validation Loss: 0.22999
Fold 0 | 02:57| Amex S

In [16]:
print("Score array:\n",score_array)
print("Best score: ", best_score)
print("Best fill type:", best_fill_type)

Score array:
 [[0.77913394 0.77918542 0.779731   0.77978482]
 [0.77978482 0.77969781 0.78017576 0.77982197]
 [0.78017576 0.77967023 0.77968225 0.78005755]
 [0.78017576 0.7798931  0.78003162 0.78024421]
 [0.78024421 0.77970224 0.77997113 0.77983789]]
Best score:  0.7802442079037997
Best fill type: [3 2 0 3 0]


---

# 11.0 Model Prediction and Submissions

In [17]:
%%time
sub.head()

NameError: name 'sub' is not defined

In [18]:
%%time
sub['prediction'] = np.array(predictions).mean(axis = 0)

NameError: name 'predictions' is not defined

In [19]:
%%time
sub.to_csv('my_submission.csv', index = False)

NameError: name 'sub' is not defined

In [20]:
%%time
sub.head()

NameError: name 'sub' is not defined

---