In this notebook, we train an autoregressive RNN to predict customers' future profiles. The model takes a customer's past profiles as input and predicts that customer's profiles for each of the next 13 months. Training is self-supervised. We never use the target column ("default or not"), because the customer's own later profiles serve as the labels. The predicted future profiles will then serve as strong features for predicting a customer's likelihood of defaulting. Specifically, we will:

- Create the datasets and the models for the autoregressive RNN
- Train the model with PyTorch Lightning
- Use the trained model to predict customers' future profiles
- Trace the final model with TorchScript and save it, together with its config file, for Triton Inference Server

In [1]:
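import os
# Restrict this process to a single GPU; this must be set before CUDA is initialized.
os.environ['CUDA_VISIBLE_DEVICES'] = '7'

import numpy as np
import pytorch_lightning as pl
from torch.utils.data import DataLoader
import torch
import cudf
from tqdm import tqdm
import gc
from pathlib import Path
# Local module with the dataset classes, the config loader and the model definitions.
from rnn import TrainRnnDataset, ValidRnnDataset, TestRnnDataset, load_yaml, RNN, AutoRegressiveRNN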

In [2]:
PATH = '/raid/data/ml/kaggle/amex'
os.listdir(PATH)

['test.parquet',
 'train_labels.csv.zip',
 'train_labels.csv',
 'amex-data-integer-dtypes-parquet-format.zip',
 'train.parquet']

### Datasets

In [3]:
train = cudf.read_parquet(f'{PATH}/train.parquet')
# Integer customer ids and real datetimes, so each customer's statements sort chronologically.
train['cid'], _ = train.customer_ID.factorize()
train['S_2'] = cudf.to_datetime(train['S_2'])
train = train.sort_values(['cid','S_2'])
train = train.reset_index(drop=True)
print(train.shape)
train.head()

(5531451, 191)


|   | customer_ID | S_2 | P_2 | D_39 | B_1 | B_2 | R_1 | S_3 | D_41 | B_3 | ... | D_137 | D_138 | D_139 | D_140 | D_141 | D_142 | D_143 | D_144 | D_145 | cid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000099d6bd597052cdcda90ffabf56573fe9d7c79be5f... | 2017-03-09 | 0.938469 | 0 | 0.008724 | 1.006838 | 0.009228 | 0.124035 | 0.0 | 0.004709 | ... | -1 | -1 | 0 | 0 | 0.0 |  | 0 | 0.00061 | 0 | 0 |
| 1 | 0000099d6bd597052cdcda90ffabf56573fe9d7c79be5f... | 2017-04-07 | 0.936665 | 0 | 0.004923 | 1.000653 | 0.006151 | 0.12675 | 0.0 | 0.002714 | ... | -1 | -1 | 0 | 0 | 0.0 |  | 0 | 0.005492 | 0 | 0 |
| 2 | 0000099d6bd597052cdcda90ffabf56573fe9d7c79be5f... | 2017-05-28 | 0.95418 | 3 | 0.021655 | 1.009672 | 0.006815 | 0.123977 | 0.0 | 0.009423 | ... | -1 | -1 | 0 | 0 | 0.0 |  | 0 | 0.006986 | 0 | 0 |
| 3 | 0000099d6bd597052cdcda90ffabf56573fe9d7c79be5f... | 2017-06-13 | 0.960384 | 0 | 0.013683 | 1.0027 | 0.001373 | 0.117169 | 0.0 | 0.005531 | ... | -1 | -1 | 0 | 0 | 0.0 |  | 0 | 0.006527 | 0 | 0 |
| 4 | 0000099d6bd597052cdcda90ffabf56573fe9d7c79be5f... | 2017-07-16 | 0.947248 | 0 | 0.015193 | 1.000727 | 0.007605 | 0.117325 | 0.0 | 0.009312 | ... | -1 | -1 | 0 | 0 | 0.0 |  | 0 | 0.008126 | 0 | 0 |


In [4]:
'target' in train.columns

False

Note that we don't need `train_labels.csv`. This dataframe has no `target` column, so training our RNN is completely self-supervised.
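To make the self-supervision concrete: the labels come from the data itself, because each customer's later statements act as targets for the earlier ones. The following is a minimal NumPy sketch. It assumes one customer's statements are stored as a `(T, F)` array sorted by `S_2`, and it uses a hypothetical helper `make_windows`. This helper is not part of the `rnn` module, and `TrainRnnDataset` may build its pairs differently.

```python
import numpy as np


def make_windows(profile, in_len, out_len):
    """Split one customer's (T, F) statement history into (input, target) pairs.

    Hypothetical helper: each input is `in_len` consecutive statements and the
    target is the `out_len` statements that immediately follow it.
    """
    T, F = profile.shape
    xs, ys = [], []
    for start in range(T - in_len - out_len + 1):
        xs.append(profile[start:start + in_len])
        ys.append(profile[start + in_len:start + in_len + out_len])
    if not xs:  # history too short to form a single pair
        return (np.empty((0, in_len, F), dtype=profile.dtype),
                np.empty((0, out_len, F), dtype=profile.dtype))
    return np.stack(xs), np.stack(ys)


# Toy history: 13 monthly statements with 3 features each.
profile = np.arange(13 * 3, dtype=np.float32).reshape(13, 3)
x_win, y_win = make_windows(profile, in_len=5, out_len=1)
print(x_win.shape, y_win.shape)  # (8, 5, 3) (8, 1, 3)
```

No column from `train_labels.csv` appears anywhere in this construction. Both inputs and targets are slices of the same profile history.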

In [5]:
config = load_yaml('rnn.yaml')

Config(model='rnn', epochs=5, batch_size=512, seq=5, H1=512, H2=128, layers=1, E=192, dropout=0, lr=0.001, wd=0.0, tcols='all')


In [6]:
%%time

train_ds = TrainRnnDataset(train,config)
valid_ds = ValidRnnDataset(train,config)
test_ds = TestRnnDataset(train,config)

RnnDataset not used columns:
['customer_ID', 'cid', 'S_2']
RnnDataset not used columns:
['customer_ID', 'cid', 'S_2']
RnnDataset not used columns:
['customer_ID', 'cid', 'S_2']
CPU times: user 5.16 s, sys: 7.81 s, total: 13 s
Wall time: 13 s


In [7]:
del train
gc.collect()

395

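### Model Definition

The `RNN` class (used for training) and the `AutoRegressiveRNN` class (used for inference) are defined in the local `rnn` module imported above. They share the same parameter names, since the trained `RNN` checkpoint loads into `AutoRegressiveRNN` with `strict=True`.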
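The module printouts in the inference section show the architecture: a single-layer `GRU(177, 512, batch_first=True)` followed by a `Linear(512, 177)` head. Each step therefore maps a 177-feature profile to a prediction of the next one. The following is a minimal sketch of how a 13-month autoregressive rollout over such a network could work. It assumes each prediction is fed back in as the next input. `SketchAutoRegressiveRNN` is a hypothetical stand-in, not the class in `rnn.py`.

```python
import torch
import torch.nn as nn


class SketchAutoRegressiveRNN(nn.Module):
    """Hypothetical stand-in for rnn.AutoRegressiveRNN; the real class may differ."""

    def __init__(self, num_feas=177, hidden=512, out_steps=13):
        super().__init__()
        self.gru = nn.GRU(num_feas, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_feas)
        self.out_steps = out_steps

    def forward(self, x):
        # x: (batch, in_seq_len, num_feas), the observed profile history.
        h_seq, h = self.gru(x)              # encode the whole history
        step = self.out(h_seq[:, -1:, :])   # first predicted month: (batch, 1, num_feas)
        preds = [step]
        for _ in range(self.out_steps - 1):
            # Feed the previous prediction back in as the next month's input.
            h_seq, h = self.gru(step, h)
            step = self.out(h_seq)
            preds.append(step)
        return torch.cat(preds, dim=1)      # (batch, out_steps, num_feas)


sketch_model = SketchAutoRegressiveRNN().eval()
with torch.no_grad():
    yp_sketch = sketch_model(torch.randn(4, 5, 177))
print(yp_sketch.shape)  # torch.Size([4, 13, 177])
```

The sketch's parameter counts are 1,061,376 for the GRU and 90,801 for the linear head. These match the `1.1 M` and `90.8 K` in the training summary, which is consistent with the assumed layer sizes but does not prove how the real `forward` is written.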

### Training

In [8]:
x_dim, y_dim = train_ds.get_x_y_dims()
print('x_dim, y_dim', x_dim, y_dim)
model = RNN(x_dim,y_dim,config)

x_dim, y_dim 0 177


In [9]:
batch_size = config.batch_size
cpu_workers = 4
train_dl = DataLoader(train_ds, batch_size=batch_size,
                shuffle=True, num_workers=cpu_workers,
                drop_last=True)

valid_dl = DataLoader(valid_ds, batch_size=batch_size,
                shuffle=False, num_workers=cpu_workers,
                drop_last=False)

test_dl = DataLoader(test_ds, batch_size=batch_size,
                    shuffle=False, num_workers=cpu_workers,
                    drop_last=False)

In [10]:
EPOCHS = config.epochs
msgs = {}
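# Keep the checkpoint with the lowest 'valid_last' metric logged by the model.
checkpoint_callback = pl.callbacks.ModelCheckpoint(monitor='valid_last', mode='min')
trainer = pl.Trainer(accelerator='gpu', devices=1, max_epochs=EPOCHS,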
                     callbacks=[checkpoint_callback],
                     )

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


In [11]:
trainer.fit(model, train_dataloaders=train_dl, 
                val_dataloaders=valid_dl)

Missing logger folder: /home/nfs/jiweil/rapids/triton_amex/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [7]

  | Name | Type   | Params
--------------------------------
0 | gru  | GRU    | 1.1 M 
1 | out  | Linear | 90.8 K
--------------------------------
1.2 M     Trainable params
0         Non-trainable params
1.2 M     Total params
4.609     Total estimated model params size (MB)



`Trainer.fit` stopped: `max_epochs=5` reached.


In [12]:
del trainer, model

### Inference: generating RNN features

In [13]:
model = AutoRegressiveRNN(x_dim,y_dim,config)

In [14]:
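# Load the best checkpoint kept by ModelCheckpoint; in this run it was
# 'lightning_logs/version_0/checkpoints/epoch=4-step=4480.ckpt'.
weights = torch.load(checkpoint_callback.best_model_path, map_location='cpu')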

In [15]:
model.load_state_dict(weights['state_dict'],strict=True)

<All keys matched successfully>

In [16]:
model.forward

<bound method AutoRegressiveRNN.forward of AutoRegressiveRNN(
  (gru): GRU(177, 512, batch_first=True)
  (out): Linear(in_features=512, out_features=177, bias=True)
)>

In [17]:
model.eval()
model.cuda()

AutoRegressiveRNN(
  (gru): GRU(177, 512, batch_first=True)
  (out): Linear(in_features=512, out_features=177, bias=True)
)

In [18]:
len(test_dl)

897

In [19]:
%%time

res = []
with torch.no_grad():  # no autograd graph is needed at inference time
    for x in tqdm(test_dl, total=len(test_dl)):
        x = x.cuda()
        yp = model(x)
        res.append(yp.cpu().numpy())

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 897/897 [00:12<00:00, 71.22it/s]

CPU times: user 1min 17s, sys: 10.7 s, total: 1min 27s
Wall time: 12.6 s





In [20]:
rnn_feas = np.concatenate(res)
rnn_feas.shape

(458913, 13, 177)

In [21]:
np.save('rnn_feas.npy',rnn_feas)
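The saved array holds 13 predicted months of all 177 features for each customer. To use it as input to a tabular default classifier, it first has to be flattened to two dimensions. The following is a minimal sketch. It assumes the rows follow the same customer order as the dataset (`cid`), and the `rnn_m{month}_f{feature}` column names are hypothetical.

```python
import numpy as np
import pandas as pd


def flatten_rnn_feas(feas):
    """Turn a (customers, months, features) array into a 2-D feature table.

    Column names are hypothetical: rnn_m{month}_f{feature}.
    """
    n, months, num_feas = feas.shape
    # Row-major reshape: all features of month 0 come first, then month 1, ...
    flat = feas.reshape(n, months * num_feas)
    cols = [f"rnn_m{m}_f{f}" for m in range(months) for f in range(num_feas)]
    return pd.DataFrame(flat, columns=cols)


# Toy stand-in for np.load('rnn_feas.npy'): 2 customers, 13 months, 3 features.
toy_feas = np.random.rand(2, 13, 3).astype(np.float32)
table = flatten_rnn_feas(toy_feas)
print(table.shape)  # (2, 39)
```

On the real array this yields 13 × 177 = 2,301 columns per customer. Keeping only some of the predicted months may be preferable if that width is a problem.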

In [22]:
x.shape

torch.Size([161, 5, 177])
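`x` is the last batch produced by the inference loop. The test loader uses `batch_size=512` and `drop_last=False` over 458,913 customers, so the final batch holds only the remainder. The arithmetic:

```python
import math

n_customers = 458_913   # rows in rnn_feas, one per customer
batch_size = 512        # config.batch_size

n_batches = math.ceil(n_customers / batch_size)   # drop_last=False keeps the partial batch
last_batch = n_customers - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 897 161
```

This matches `len(test_dl) == 897` and the leading dimension of `x.shape`.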

### Write the Triton config file and compile (trace) the model with TorchScript

In [23]:
def generate_config(model_name, in_seq_len, num_feas, out_seq_len):
    """Write a Triton config.pbtxt for the TorchScript model stored under `model_name/`."""

    config_text = f"""name: "{model_name}"
platform: "pytorch_libtorch"
max_batch_size : 42134
input [
  {{
    name: "input__0"
    data_type: TYPE_FP32
    dims: [{in_seq_len}, {num_feas}]
  }}
]
output [
  {{
    name: "output__0"
    data_type: TYPE_FP32
    dims: [{out_seq_len}, {num_feas}]
  }}
]

dynamic_batching {{
  max_queue_delay_microseconds: 100
}}"""
    config_path = os.path.join(model_name, 'config.pbtxt')
    with open(config_path, 'w') as file_:
        file_.write(config_text)

    return config_path
# reshape {{ shape: [ 1, {in_seq_len}, {num_feas}]}}

In [24]:
model_name='AutoRegressiveRNN'
Path(f'{model_name}/1').mkdir(parents=True, exist_ok=True)

generate_config(model_name=model_name, 
                in_seq_len=x.shape[1], 
                num_feas=x.shape[2], 
                out_seq_len=yp.shape[1])

'AutoRegressiveRNN/config.pbtxt'

In [25]:
%%time

traced_script_module = torch.jit.trace(model, x[:1,:,:])
traced_script_module.save(f"{model_name}/1/model.pt")

CPU times: user 372 ms, sys: 12.1 ms, total: 384 ms
Wall time: 389 ms
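`torch.jit.trace` records only the operations executed for the example input, so any Python control flow is frozen into the graph. The trace above used a batch of one (`x[:1]`), while Triton will batch requests dynamically. A quick sanity check is therefore to round-trip the traced module through serialization and compare it against the eager model on a larger batch. The following self-contained sketch runs that check on a small toy GRU; `TinyGRU` and its sizes are hypothetical.

```python
import io

import torch
import torch.nn as nn


class TinyGRU(nn.Module):
    """Hypothetical toy model standing in for AutoRegressiveRNN."""

    def __init__(self, num_feas=4, hidden=8):
        super().__init__()
        self.gru = nn.GRU(num_feas, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_feas)

    def forward(self, x):
        h_seq, _ = self.gru(x)
        return self.out(h_seq)


def traced_matches_eager(model, example, check_batch, atol=1e-5):
    """Trace `model` on `example`, save and reload it in memory,
    then compare it with the eager model on `check_batch`."""
    model.eval()
    with torch.no_grad():
        buffer = io.BytesIO()
        torch.jit.save(torch.jit.trace(model, example), buffer)
        buffer.seek(0)
        loaded = torch.jit.load(buffer)
        return torch.allclose(loaded(check_batch), model(check_batch), atol=atol)


torch.manual_seed(0)
toy_gru = TinyGRU()
# Trace with a batch of 1, then check a batch of 16 (mirrors x[:1] vs. batched requests).
print(traced_matches_eager(toy_gru, torch.randn(1, 5, 4), torch.randn(16, 5, 4)))
```

In the notebook, `traced_matches_eager(model, x[:1], x)` would run the same check on the real model and the last inference batch.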
