PWC-Net-small model mixed-precision training (with cyclical learning rate schedule)
=======================================================

In this notebook we:
- Use a small model (no dense or residual connections) with a 6-level pyramid, and upsample level 2 by 4 to obtain the final flow prediction
- Train the PWC-Net-small model on a mix of the `FlyingChairs` and `FlyingThings3DHalfRes` datasets using a Cyclic<sub>short</sub> schedule of our own
- The Cyclic<sub>short</sub> schedule oscillates between `5e-04` and `1e-05` for `200,000` steps with a stepsize of `40,000` (both step counts are defined for a batch size of `8` and are rescaled to the actual batch size)
- The training is done using mixed precision with a loss scaler of `128.0` and a batch size of `16`

Below, look for `TODO` references and customize this notebook based on your own needs.

## Reference

[2018a]<a name="2018a"></a> Sun et al. 2018. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. [[arXiv]](https://arxiv.org/abs/1709.02371) [[web]](http://research.nvidia.com/publication/2018-02_PWC-Net%3A-CNNs-for) [[PyTorch (Official)]](https://github.com/NVlabs/PWC-Net/tree/master/PyTorch) [[Caffe (Official)]](https://github.com/NVlabs/PWC-Net/tree/master/Caffe)

In [1]:
"""
pwcnet_train.ipynb

PWC-Net model training.

Written by Phil Ferriere

Licensed under the MIT License (see LICENSE for details)

Tensorboard:
    [win] tensorboard --logdir=E:\\repos\\tf-optflow\\tfoptflow\\pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16
    [ubu] tensorboard --logdir=/media/EDrive/repos/tf-optflow/tfoptflow/pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16
"""
from __future__ import absolute_import, division, print_function
import sys
from copy import deepcopy
import tensorflow as tf

from dataset_base import _DEFAULT_DS_TRAIN_OPTIONS
from dataset_flyingchairs import FlyingChairsDataset
from dataset_flyingthings3d import FlyingThings3DHalfResDataset
from dataset_mixer import MixedDataset
from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TRAIN_OPTIONS

## TODO: Set this first!

In [3]:
# TODO: You MUST set dataset_root to the correct path on your machine!
if sys.platform.startswith("win"):
    _DATASET_ROOT = 'E:/datasets/'
else:
    _DATASET_ROOT = '/Vol1/dbstore/datasets/'
_FLYINGCHAIRS_ROOT = _DATASET_ROOT + 'FlyingChairs_release'
_FLYINGTHINGS3DHALFRES_ROOT = _DATASET_ROOT + 'FlyingThings3D_HalfRes'
    
# TODO: You MUST adjust the settings below based on the number of GPU(s) used for training
# Set controller device and devices
# A one-GPU setup would be something like controller='/device:GPU:0' and gpu_devices=['/device:GPU:0']
# Here, we train on a single GPU (GPU 2) and use GPU 0 as the controller, as shown below
gpu_devices = ['/device:GPU:2']
controller = '/device:GPU:0'

# TODO: You MUST adjust this setting below based on the amount of memory on your GPU(s)
# Batch size
batch_size = 16

# Train on `FlyingChairs+FlyingThings3DHalfRes` mix

## Load the dataset

In [4]:
# TODO: You MUST set the batch size based on the capabilities of your GPU(s)
# Load the training dataset
ds_opts = deepcopy(_DEFAULT_DS_TRAIN_OPTIONS)
ds_opts['in_memory'] = False                          # Too many samples to keep in memory at once, so don't preload them
ds_opts['aug_type'] = 'heavy'                         # Apply all supported augmentations
ds_opts['batch_size'] = batch_size * len(gpu_devices) # Use a multiple of 8; here, 16 x 1 GPU = 16
ds_opts['crop_preproc'] = (256, 448)                  # Crop to a smaller input size
ds1 = FlyingChairsDataset(mode='train_with_val', ds_root=_FLYINGCHAIRS_ROOT, options=ds_opts)
ds_opts['type'] = 'into_future'
ds2 = FlyingThings3DHalfResDataset(mode='train_with_val', ds_root=_FLYINGTHINGS3DHALFRES_ROOT, options=ds_opts)
ds = MixedDataset(mode='train_with_val', datasets=[ds1, ds2], options=ds_opts)

In [5]:
# Display dataset configuration
ds.print_config()


Dataset Configuration:
  verbose              False
  in_memory            False
  crop_preproc         (256, 448)
  scale_preproc        None
  tb_test_imgs         False
  random_seed          1969
  val_split            0.03
  aug_type             heavy
  aug_labels           True
  fliplr               0.5
  flipud               0.5
  translate            (0.5, 0.05)
  scale                (0.5, 0.05)
  batch_size           16
  type                 into_future
  mode                 train_with_val
  train size           41731
  val size             1292


## Configure the training

In [6]:
# Start from the default options
nn_opts = deepcopy(_DEFAULT_PWCNET_TRAIN_OPTIONS)
nn_opts['verbose'] = True
nn_opts['ckpt_dir'] = './pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/'
nn_opts['batch_size'] = ds_opts['batch_size']
nn_opts['x_shape'] = [2, ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 3]
nn_opts['y_shape'] = [ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 2]
nn_opts['use_tf_data'] = True # Use tf.data reader
nn_opts['gpu_devices'] = gpu_devices
nn_opts['controller'] = controller

# Use the PWC-Net-small model (no dense or residual connections) with a 6-level pyramid; flow is predicted at level 3
nn_opts['use_dense_cx'] = False
nn_opts['use_res_cx'] = False
nn_opts['pyr_lvls'] = 6
nn_opts['flow_pred_lvl'] = 3

# Use mixed precision training
nn_opts['use_mixed_precision'] = True 
nn_opts['loss_scaler'] = 128.
nn_opts['x_dtype'] = tf.float32
nn_opts['y_dtype'] = tf.float32

# More options
nn_opts['max_to_keep'] = 50
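
The `loss_scaler` setting above reflects a common mixed-precision technique: the loss is multiplied by a constant before backpropagation so that small gradients survive the cast to `float16`, and the gradients are divided by the same constant before the weight update. The following is a minimal NumPy sketch of that idea only; it is not the code used by `ModelPWCNet`, and the gradient value `1e-8` is a made-up number chosen to show the underflow.

```python
import numpy as np

LOSS_SCALER = 128.0  # same value as nn_opts['loss_scaler']

# Hypothetical tiny gradient, chosen only to illustrate underflow
true_grad = 1e-8

# Cast directly to float16: the value is below the smallest float16
# subnormal (about 6e-8), so it rounds to zero and the update is lost
unscaled_fp16 = np.float16(true_grad)

# Scaling the loss scales every gradient by the same factor (chain rule),
# which lifts this value into the representable float16 range
scaled_fp16 = np.float16(true_grad * LOSS_SCALER)

# Unscale in float32 before applying the weight update
recovered = np.float32(scaled_fp16) / np.float32(LOSS_SCALER)

print(unscaled_fp16, scaled_fp16, recovered)
```

The recovered value is only approximately equal to the original gradient, because the scaled value is still rounded to the nearest `float16` number.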

In [7]:
# Set the learning rate schedule. This schedule is defined for a single GPU using a batch size of 8.
# Further down, we rescale it to the actual batch size and number of GPUs.
nn_opts['lr_policy'] = 'cyclic'
nn_opts['cyclic_lr_max'] = 5e-04
nn_opts['cyclic_lr_base'] = 1e-05
nn_opts['cyclic_lr_stepsize'] = 40000
nn_opts['max_steps'] = 200000

# Rescale the schedule to the effective batch size (batch_size x number of GPUs; 16 here).
nn_opts['max_steps'] = int(nn_opts['max_steps'] * 8 / ds_opts['batch_size'])
nn_opts['cyclic_lr_stepsize'] = int(nn_opts['cyclic_lr_stepsize'] * 8 / ds_opts['batch_size'])
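
To make the rescaled schedule concrete, here is a minimal sketch of a cyclical learning rate. It assumes the `triangular2` policy, in which the amplitude is halved after each full cycle. That policy is an assumption rather than something read from the `ModelPWCNet` source, and `cyclic_lr` is a hypothetical helper name.

```python
import math

def cyclic_lr(step, lr_base=1e-05, lr_max=5e-04, stepsize=20000):
    """Triangular2 cyclical learning rate (amplitude halves after each full cycle)."""
    cycle = math.floor(1 + step / (2 * stepsize))  # 1-based index of the current cycle
    x = abs(step / stepsize - 2 * cycle + 1)       # distance from the cycle's peak, in [0, 1]
    amplitude = (lr_max - lr_base) / (2 ** (cycle - 1))
    return lr_base + amplitude * max(0.0, 1.0 - x)

# Schedule defined for a batch size of 8, rescaled to the batch size of 16 used here
batch_size = 16
max_steps = int(200000 * 8 / batch_size)  # 100000 steps in total
stepsize = int(40000 * 8 / batch_size)    # 20000 steps per half-cycle

for step in (100, 21300, 43100, 81700):
    print(step, round(cyclic_lr(step, stepsize=stepsize), 6))
```

Under these assumptions the four printed values round to `0.000012`, `0.000468`, `0.000048` and `0.000020`, which agree with the `lr` values logged at the same iterations in the training output of this run.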

In [8]:
# Instantiate the model and display the model configuration
nn = ModelPWCNet(mode='train_with_val', options=nn_opts, dataset=ds)
nn.print_config()

Building model...
... model built.
Configuring training ops...
... training ops configured.
Initializing model with random values for initial training...

... model initialized

Model Configuration:
  verbose                True
  ckpt_dir               ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/
  max_to_keep            50
  x_dtype                <dtype: 'float32'>
  x_shape                [2, 256, 448, 3]
  y_dtype                <dtype: 'float32'>
  y_shape                [256, 448, 2]
  train_mode             train
  adapt_info             None
  sparse_gt_flow         False
  display_step           100
  snapshot_step          1000
  val_step               1000
  val_batch_size         -1
  tb_val_imgs            pyramid
  tb_test_imgs           None
  gpu_devices            ['/device:GPU:2']
  controller             /device:GPU:0
  use_tf_data            True
  use_mixed_precision    True
  loss_scaler            128.0
  batch_size             16
  lr_policy              c

## Train the model

In [9]:
# Train the model
nn.train()

Start training from scratch...
Instructions for updating:
Use `tf.data.experimental.shuffle_and_repeat(...)`.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
2019-04-19 13:36:19 Iter 100 [Train]: loss=62.34, epe=18.92, lr=0.000012, samples/sec=18.2, sec/step=0.880, eta=1 day, 0:25:09
2019-04-19 13:38:18 Iter 200 [Train]: loss=60.31, epe=18.34, lr=0.000015, samples/sec=23.2, sec/step=0.688, eta=19:04:48
2019-04-19 13:40:14 Iter 300 [Train]: loss=76.40, epe=23.36, lr=0.000017, samples/sec=23.3, sec/step=0.686, eta=18:59:31
2019-04-19 13:42:13 Iter 400 [Train]: loss=60.12, epe=18.33, lr=0.000020, samples/sec=23.4, sec/step=0.683, eta=18:54:30
2019-04-19 13:44:14 Iter 500 [Train]: loss=61.29, epe=18.68, lr=0.000022, samples/sec=23.0, sec/step=0.695, eta=19:12:58
2019-04-19 13:46:13 Iter 600 [Train]: loss=62.48, epe=19.07, lr=0.000025, samples/sec=23.1, sec/step=0.691, eta=19:05:17
2019-04-19 13:48:07 Iter 700 [Train]: loss=62.70, epe=19.11, lr=0.000027, samples/se

2019-04-19 15:18:34 Iter 5100 [Train]: loss=61.27, epe=18.97, lr=0.000135, samples/sec=22.7, sec/step=0.704, eta=18:33:45
2019-04-19 15:20:33 Iter 5200 [Train]: loss=61.49, epe=19.06, lr=0.000137, samples/sec=23.0, sec/step=0.695, eta=18:17:53
2019-04-19 15:22:32 Iter 5300 [Train]: loss=59.16, epe=18.35, lr=0.000140, samples/sec=23.3, sec/step=0.688, eta=18:06:07
2019-04-19 15:24:32 Iter 5400 [Train]: loss=64.74, epe=20.06, lr=0.000142, samples/sec=22.3, sec/step=0.719, eta=18:53:10
2019-04-19 15:26:29 Iter 5500 [Train]: loss=59.29, epe=18.39, lr=0.000145, samples/sec=22.9, sec/step=0.699, eta=18:20:28
2019-04-19 15:28:28 Iter 5600 [Train]: loss=59.86, epe=18.61, lr=0.000147, samples/sec=22.9, sec/step=0.698, eta=18:18:28
2019-04-19 15:30:25 Iter 5700 [Train]: loss=57.72, epe=17.92, lr=0.000150, samples/sec=23.6, sec/step=0.679, eta=17:46:45
2019-04-19 15:32:23 Iter 5800 [Train]: loss=58.27, epe=18.09, lr=0.000152, samples/sec=23.0, sec/step=0.695, eta=18:10:38
2019-04-19 15:34:20 Iter

2019-04-19 17:06:59 Iter 10600 [Train]: loss=59.30, epe=18.58, lr=0.000270, samples/sec=23.1, sec/step=0.691, eta=17:10:12
2019-04-19 17:08:45 Iter 10700 [Train]: loss=59.98, epe=18.81, lr=0.000272, samples/sec=24.8, sec/step=0.645, eta=15:59:49
2019-04-19 17:10:36 Iter 10800 [Train]: loss=62.29, epe=19.48, lr=0.000275, samples/sec=23.4, sec/step=0.682, eta=16:54:24
2019-04-19 17:12:29 Iter 10900 [Train]: loss=60.59, epe=18.97, lr=0.000277, samples/sec=23.3, sec/step=0.685, eta=16:57:34
2019-04-19 17:14:19 Iter 11000 [Train]: loss=61.88, epe=19.32, lr=0.000279, samples/sec=23.4, sec/step=0.683, eta=16:52:52
2019-04-19 17:15:01 Iter 11000 [Val]: loss=52.99, epe=16.44
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-11000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-11000
2019-04-19 17:17:18 Iter 11100 [Train]: loss=59.80, epe=18.71, lr=0.000282, sample

2019-04-19 18:54:17 Iter 16000 [Val]: loss=51.41, epe=15.94
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-16000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-16000
2019-04-19 18:56:39 Iter 16100 [Train]: loss=60.82, epe=19.06, lr=0.000404, samples/sec=22.2, sec/step=0.720, eta=16:47:30
2019-04-19 18:58:33 Iter 16200 [Train]: loss=61.81, epe=19.32, lr=0.000407, samples/sec=23.3, sec/step=0.686, eta=15:58:42
2019-04-19 19:00:24 Iter 16300 [Train]: loss=59.08, epe=18.50, lr=0.000409, samples/sec=23.2, sec/step=0.689, eta=16:00:44
2019-04-19 19:02:17 Iter 16400 [Train]: loss=59.31, epe=18.58, lr=0.000412, samples/sec=22.2, sec/step=0.719, eta=16:42:14
2019-04-19 19:03:52 Iter 16500 [Train]: loss=59.44, epe=18.60, lr=0.000414, samples/sec=29.4, sec/step=0.545, eta=12:38:06
2019-04-19 19:05:39 Iter 16600 [Train]: loss=93.24, epe=28.73, lr=0.000417, sample

2019-04-19 20:37:36 Iter 21300 [Train]: loss=56.80, epe=17.77, lr=0.000468, samples/sec=23.4, sec/step=0.683, eta=14:55:32
2019-04-19 20:39:25 Iter 21400 [Train]: loss=61.72, epe=19.34, lr=0.000466, samples/sec=23.3, sec/step=0.688, eta=15:00:57
2019-04-19 20:41:15 Iter 21500 [Train]: loss=60.85, epe=19.74, lr=0.000463, samples/sec=24.1, sec/step=0.665, eta=14:30:09
2019-04-19 20:43:07 Iter 21600 [Train]: loss=62.02, epe=19.42, lr=0.000461, samples/sec=23.8, sec/step=0.673, eta=14:39:53
2019-04-19 20:44:58 Iter 21700 [Train]: loss=60.38, epe=18.93, lr=0.000458, samples/sec=23.5, sec/step=0.680, eta=14:47:36
2019-04-19 20:46:49 Iter 21800 [Train]: loss=59.96, epe=18.76, lr=0.000456, samples/sec=23.7, sec/step=0.676, eta=14:41:27
2019-04-19 20:48:41 Iter 21900 [Train]: loss=61.49, epe=19.22, lr=0.000453, samples/sec=23.1, sec/step=0.691, eta=14:59:44
2019-04-19 20:50:31 Iter 22000 [Train]: loss=58.90, epe=18.44, lr=0.000451, samples/sec=23.4, sec/step=0.684, eta=14:49:32
2019-04-19 20:51

2019-04-19 22:24:42 Iter 26800 [Train]: loss=86.56, epe=30.01, lr=0.000333, samples/sec=23.5, sec/step=0.682, eta=13:52:21
2019-04-19 22:26:35 Iter 26900 [Train]: loss=58.78, epe=18.40, lr=0.000331, samples/sec=23.4, sec/step=0.684, eta=13:52:45
2019-04-19 22:28:23 Iter 27000 [Train]: loss=58.90, epe=18.43, lr=0.000328, samples/sec=23.6, sec/step=0.679, eta=13:45:54
2019-04-19 22:28:58 Iter 27000 [Val]: loss=54.90, epe=17.02
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-27000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-27000
2019-04-19 22:31:07 Iter 27100 [Train]: loss=63.79, epe=19.99, lr=0.000326, samples/sec=26.2, sec/step=0.610, eta=12:21:39
2019-04-19 22:32:59 Iter 27200 [Train]: loss=63.65, epe=20.47, lr=0.000324, samples/sec=23.2, sec/step=0.689, eta=13:55:53
2019-04-19 22:34:53 Iter 27300 [Train]: loss=72.95, epe=22.77, lr=0.000321, sample

... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-32000
2019-04-19 23:57:58 Iter 32100 [Train]: loss=60.66, epe=18.99, lr=0.000204, samples/sec=33.5, sec/step=0.477, eta=9:00:17
2019-04-19 23:59:23 Iter 32200 [Train]: loss=58.49, epe=18.31, lr=0.000201, samples/sec=34.1, sec/step=0.470, eta=8:50:37
2019-04-20 00:00:48 Iter 32300 [Train]: loss=58.51, epe=18.33, lr=0.000199, samples/sec=34.2, sec/step=0.468, eta=8:47:35
2019-04-20 00:02:15 Iter 32400 [Train]: loss=59.95, epe=18.76, lr=0.000196, samples/sec=34.1, sec/step=0.469, eta=8:48:00
2019-04-20 00:03:40 Iter 32500 [Train]: loss=58.79, epe=18.40, lr=0.000194, samples/sec=34.0, sec/step=0.471, eta=8:49:34
2019-04-20 00:05:06 Iter 32600 [Train]: loss=59.83, epe=18.70, lr=0.000191, samples/sec=33.4, sec/step=0.479, eta=8:57:38
2019-04-20 00:06:33 Iter 32700 [Train]: loss=60.59, epe=18.94, lr=0.000189, samples/sec=33.8, sec/step=0.473, eta=8:50:19
2019-04-20 00:08:00 Iter 32800 [Train]: loss=59.05, epe=18.4

2019-04-20 01:21:35 Iter 37600 [Train]: loss=64.73, epe=20.27, lr=0.000069, samples/sec=34.0, sec/step=0.471, eta=8:10:05
2019-04-20 01:23:02 Iter 37700 [Train]: loss=70.95, epe=22.47, lr=0.000066, samples/sec=33.7, sec/step=0.475, eta=8:13:35
2019-04-20 01:24:29 Iter 37800 [Train]: loss=63.08, epe=19.78, lr=0.000064, samples/sec=34.5, sec/step=0.464, eta=8:01:16
2019-04-20 01:25:56 Iter 37900 [Train]: loss=59.65, epe=18.68, lr=0.000061, samples/sec=34.6, sec/step=0.463, eta=7:58:56
2019-04-20 01:27:21 Iter 38000 [Train]: loss=58.17, epe=18.22, lr=0.000059, samples/sec=33.7, sec/step=0.475, eta=8:10:51
2019-04-20 01:27:54 Iter 38000 [Val]: loss=50.81, epe=15.74
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-38000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-38000
2019-04-20 01:29:44 Iter 38100 [Train]: loss=58.32, epe=18.26, lr=0.000057, samples/sec

2019-04-20 02:44:44 Iter 43000 [Val]: loss=53.44, epe=16.56
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-43000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-43000
2019-04-20 02:46:35 Iter 43100 [Train]: loss=58.78, epe=18.39, lr=0.000048, samples/sec=32.3, sec/step=0.495, eta=7:49:09
2019-04-20 02:48:01 Iter 43200 [Train]: loss=58.59, epe=18.33, lr=0.000049, samples/sec=32.5, sec/step=0.492, eta=7:45:38
2019-04-20 02:49:29 Iter 43300 [Train]: loss=58.82, epe=18.40, lr=0.000050, samples/sec=32.5, sec/step=0.493, eta=7:45:44
2019-04-20 02:50:56 Iter 43400 [Train]: loss=61.11, epe=19.08, lr=0.000052, samples/sec=33.0, sec/step=0.485, eta=7:37:22
2019-04-20 02:52:22 Iter 43500 [Train]: loss=61.31, epe=19.19, lr=0.000053, samples/sec=32.8, sec/step=0.488, eta=7:39:11
2019-04-20 02:53:48 Iter 43600 [Train]: loss=58.78, epe=18.39, lr=0.000054, samples/sec

2019-04-20 04:08:49 Iter 48400 [Train]: loss=60.37, epe=18.92, lr=0.000113, samples/sec=33.0, sec/step=0.484, eta=6:56:29
2019-04-20 04:10:15 Iter 48500 [Train]: loss=62.95, epe=19.76, lr=0.000114, samples/sec=32.8, sec/step=0.488, eta=6:58:47
2019-04-20 04:11:44 Iter 48600 [Train]: loss=56.98, epe=17.85, lr=0.000115, samples/sec=32.1, sec/step=0.498, eta=7:06:50
2019-04-20 04:13:13 Iter 48700 [Train]: loss=55.34, epe=17.31, lr=0.000117, samples/sec=33.2, sec/step=0.482, eta=6:51:52
2019-04-20 04:14:40 Iter 48800 [Train]: loss=58.66, epe=18.40, lr=0.000118, samples/sec=33.1, sec/step=0.483, eta=6:52:07
2019-04-20 04:16:08 Iter 48900 [Train]: loss=60.75, epe=18.98, lr=0.000119, samples/sec=32.9, sec/step=0.486, eta=6:53:52
2019-04-20 04:17:36 Iter 49000 [Train]: loss=74.50, epe=23.24, lr=0.000120, samples/sec=32.5, sec/step=0.492, eta=6:58:20
2019-04-20 04:18:09 Iter 49000 [Val]: loss=52.67, epe=16.34
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcne

2019-04-20 05:33:57 Iter 53900 [Train]: loss=60.02, epe=18.79, lr=0.000180, samples/sec=32.6, sec/step=0.491, eta=6:17:04
2019-04-20 05:35:26 Iter 54000 [Train]: loss=60.76, epe=19.03, lr=0.000182, samples/sec=32.5, sec/step=0.492, eta=6:17:04
2019-04-20 05:35:59 Iter 54000 [Val]: loss=49.79, epe=15.43
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-54000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-54000
2019-04-20 05:37:51 Iter 54100 [Train]: loss=59.93, epe=18.77, lr=0.000183, samples/sec=32.3, sec/step=0.496, eta=6:19:31
2019-04-20 05:39:19 Iter 54200 [Train]: loss=60.42, epe=19.37, lr=0.000184, samples/sec=33.3, sec/step=0.481, eta=6:07:07
2019-04-20 05:40:48 Iter 54300 [Train]: loss=62.64, epe=19.60, lr=0.000185, samples/sec=32.7, sec/step=0.489, eta=6:12:48
2019-04-20 05:42:16 Iter 54400 [Train]: loss=57.85, epe=18.12, lr=0.000186, samples/sec

2019-04-20 06:58:30 Iter 59300 [Train]: loss=63.69, epe=19.92, lr=0.000246, samples/sec=33.1, sec/step=0.484, eta=5:28:15
2019-04-20 06:59:55 Iter 59400 [Train]: loss=59.65, epe=18.71, lr=0.000248, samples/sec=33.0, sec/step=0.484, eta=5:27:44
2019-04-20 07:01:22 Iter 59500 [Train]: loss=59.27, epe=18.52, lr=0.000249, samples/sec=33.1, sec/step=0.484, eta=5:26:32
2019-04-20 07:02:49 Iter 59600 [Train]: loss=58.44, epe=18.31, lr=0.000250, samples/sec=33.3, sec/step=0.480, eta=5:23:17
2019-04-20 07:04:17 Iter 59700 [Train]: loss=67.51, epe=21.28, lr=0.000251, samples/sec=33.1, sec/step=0.483, eta=5:24:29
2019-04-20 07:05:45 Iter 59800 [Train]: loss=60.18, epe=18.86, lr=0.000253, samples/sec=33.4, sec/step=0.479, eta=5:21:10
2019-04-20 07:07:14 Iter 59900 [Train]: loss=57.22, epe=17.93, lr=0.000254, samples/sec=33.0, sec/step=0.485, eta=5:24:26
2019-04-20 07:08:40 Iter 60000 [Train]: loss=58.08, epe=18.18, lr=0.000255, samples/sec=32.5, sec/step=0.492, eta=5:28:09
2019-04-20 07:09:13 Iter

2019-04-20 08:23:04 Iter 64800 [Train]: loss=61.11, epe=19.15, lr=0.000196, samples/sec=33.2, sec/step=0.482, eta=4:43:01
2019-04-20 08:24:32 Iter 64900 [Train]: loss=57.49, epe=18.00, lr=0.000195, samples/sec=33.7, sec/step=0.475, eta=4:37:49
2019-04-20 08:25:58 Iter 65000 [Train]: loss=61.23, epe=19.19, lr=0.000194, samples/sec=34.4, sec/step=0.465, eta=4:31:31
2019-04-20 08:26:30 Iter 65000 [Val]: loss=50.69, epe=15.70
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-65000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-65000
2019-04-20 08:28:21 Iter 65100 [Train]: loss=63.26, epe=19.95, lr=0.000193, samples/sec=31.6, sec/step=0.507, eta=4:54:38
2019-04-20 08:29:49 Iter 65200 [Train]: loss=61.59, epe=19.30, lr=0.000191, samples/sec=32.6, sec/step=0.490, eta=4:44:24
2019-04-20 08:31:15 Iter 65300 [Train]: loss=71.97, epe=22.46, lr=0.000190, samples/sec

2019-04-20 09:49:13 Iter 70300 [Train]: loss=61.23, epe=19.17, lr=0.000129, samples/sec=32.8, sec/step=0.487, eta=4:01:12
2019-04-20 09:50:41 Iter 70400 [Train]: loss=86.10, epe=29.83, lr=0.000128, samples/sec=32.4, sec/step=0.494, eta=4:03:42
2019-04-20 09:52:07 Iter 70500 [Train]: loss=58.24, epe=18.26, lr=0.000126, samples/sec=32.7, sec/step=0.489, eta=4:00:21
2019-04-20 09:53:35 Iter 70600 [Train]: loss=63.06, epe=19.91, lr=0.000125, samples/sec=32.3, sec/step=0.495, eta=4:02:33
2019-04-20 09:55:03 Iter 70700 [Train]: loss=59.01, epe=18.47, lr=0.000124, samples/sec=32.9, sec/step=0.486, eta=3:57:20
2019-04-20 09:56:29 Iter 70800 [Train]: loss=61.93, epe=19.40, lr=0.000123, samples/sec=32.3, sec/step=0.495, eta=4:01:08
2019-04-20 09:57:57 Iter 70900 [Train]: loss=61.91, epe=19.40, lr=0.000121, samples/sec=32.2, sec/step=0.496, eta=4:00:37
2019-04-20 09:59:26 Iter 71000 [Train]: loss=61.68, epe=19.33, lr=0.000120, samples/sec=32.1, sec/step=0.498, eta=4:00:54
2019-04-20 09:59:59 Iter

2019-04-20 11:18:04 Iter 76000 [Val]: loss=52.92, epe=16.39
Saving model...
... model wasn't saved -- its score (16.39) doesn't outperform other checkpoints
2019-04-20 11:19:46 Iter 76100 [Train]: loss=58.19, epe=nan, lr=0.000058, samples/sec=34.8, sec/step=0.460, eta=3:03:16
2019-04-20 11:21:08 Iter 76200 [Train]: loss=60.77, epe=19.01, lr=0.000057, samples/sec=34.2, sec/step=0.468, eta=3:05:39
2019-04-20 11:22:32 Iter 76300 [Train]: loss=60.93, epe=19.06, lr=0.000055, samples/sec=33.7, sec/step=0.474, eta=3:07:19
2019-04-20 11:23:58 Iter 76400 [Train]: loss=58.92, epe=18.47, lr=0.000054, samples/sec=34.2, sec/step=0.467, eta=3:03:51
2019-04-20 11:25:21 Iter 76500 [Train]: loss=63.91, epe=20.42, lr=0.000053, samples/sec=34.3, sec/step=0.467, eta=3:02:51
2019-04-20 11:26:46 Iter 76600 [Train]: loss=57.96, epe=18.17, lr=0.000052, samples/sec=34.6, sec/step=0.462, eta=3:00:22
2019-04-20 11:28:10 Iter 76700 [Train]: loss=59.99, epe=18.75, lr=0.000050, samples/sec=34.2, sec/step=0.468, eta

2019-04-20 12:43:08 Iter 81700 [Train]: loss=59.54, epe=18.63, lr=0.000020, samples/sec=32.8, sec/step=0.488, eta=2:28:50
2019-04-20 12:44:33 Iter 81800 [Train]: loss=58.25, epe=18.23, lr=0.000021, samples/sec=32.9, sec/step=0.487, eta=2:27:43
2019-04-20 12:45:56 Iter 81900 [Train]: loss=62.23, epe=19.49, lr=0.000022, samples/sec=33.9, sec/step=0.472, eta=2:22:30
2019-04-20 12:47:22 Iter 82000 [Train]: loss=60.83, epe=19.05, lr=0.000022, samples/sec=32.4, sec/step=0.493, eta=2:27:59
2019-04-20 12:47:49 Iter 82000 [Val]: loss=52.32, epe=16.21
Saving model...
... model wasn't saved -- its score (16.21) doesn't outperform other checkpoints
2019-04-20 12:49:25 Iter 82100 [Train]: loss=61.18, epe=19.13, lr=0.000023, samples/sec=32.8, sec/step=0.487, eta=2:25:22
2019-04-20 12:50:51 Iter 82200 [Train]: loss=57.90, epe=18.11, lr=0.000023, samples/sec=32.4, sec/step=0.494, eta=2:26:40
2019-04-20 12:52:14 Iter 82300 [Train]: loss=59.58, epe=18.65, lr=0.000024, samples/sec=33.1, sec/step=0.484, e

2019-04-20 14:06:17 Iter 87500 [Train]: loss=59.09, epe=18.49, lr=0.000056, samples/sec=31.9, sec/step=0.502, eta=1:44:38
2019-04-20 14:07:37 Iter 87600 [Train]: loss=59.43, epe=18.59, lr=0.000057, samples/sec=32.8, sec/step=0.487, eta=1:40:40
2019-04-20 14:08:57 Iter 87700 [Train]: loss=57.92, epe=18.12, lr=0.000057, samples/sec=31.9, sec/step=0.502, eta=1:42:55
2019-04-20 14:10:21 Iter 87800 [Train]: loss=57.99, epe=18.15, lr=0.000058, samples/sec=32.3, sec/step=0.495, eta=1:40:45
2019-04-20 14:11:42 Iter 87900 [Train]: loss=59.38, epe=18.59, lr=0.000058, samples/sec=32.1, sec/step=0.498, eta=1:40:27
2019-04-20 14:13:00 Iter 88000 [Train]: loss=60.18, epe=18.86, lr=0.000059, samples/sec=33.4, sec/step=0.479, eta=1:35:47
2019-04-20 14:13:28 Iter 88000 [Val]: loss=49.56, epe=15.36
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-88000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthi

2019-04-20 15:15:01 Iter 93100 [Train]: loss=59.23, epe=18.53, lr=0.000090, samples/sec=58.1, sec/step=0.275, eta=0:31:40
2019-04-20 15:16:04 Iter 93200 [Train]: loss=58.56, epe=18.32, lr=0.000091, samples/sec=58.6, sec/step=0.273, eta=0:30:58
2019-04-20 15:17:06 Iter 93300 [Train]: loss=61.63, epe=19.24, lr=0.000091, samples/sec=58.2, sec/step=0.275, eta=0:30:41
2019-04-20 15:18:07 Iter 93400 [Train]: loss=58.52, epe=18.30, lr=0.000092, samples/sec=58.2, sec/step=0.275, eta=0:30:14
2019-04-20 15:19:11 Iter 93500 [Train]: loss=60.32, epe=18.88, lr=0.000093, samples/sec=58.4, sec/step=0.274, eta=0:29:40
2019-04-20 15:20:14 Iter 93600 [Train]: loss=59.18, epe=18.53, lr=0.000093, samples/sec=58.5, sec/step=0.273, eta=0:29:10
2019-04-20 15:21:17 Iter 93700 [Train]: loss=60.27, epe=18.88, lr=0.000094, samples/sec=58.4, sec/step=0.274, eta=0:28:47
2019-04-20 15:22:19 Iter 93800 [Train]: loss=61.38, epe=19.20, lr=0.000095, samples/sec=58.2, sec/step=0.275, eta=0:28:24
2019-04-20 15:23:23 Iter

2019-04-20 16:16:58 Iter 98700 [Train]: loss=59.15, epe=18.52, lr=0.000125, samples/sec=58.2, sec/step=0.275, eta=0:05:57
2019-04-20 16:18:01 Iter 98800 [Train]: loss=58.88, epe=18.43, lr=0.000125, samples/sec=58.2, sec/step=0.275, eta=0:05:30
2019-04-20 16:19:04 Iter 98900 [Train]: loss=59.93, epe=18.77, lr=0.000126, samples/sec=58.2, sec/step=0.275, eta=0:05:03
2019-04-20 16:20:08 Iter 99000 [Train]: loss=59.48, epe=18.62, lr=0.000126, samples/sec=58.2, sec/step=0.275, eta=0:04:35
2019-04-20 16:20:29 Iter 99000 [Val]: loss=51.54, epe=15.98
Saving model...
INFO:tensorflow:./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-99000 is not in all_model_checkpoint_paths. Manually adding it.
... model saved in ./pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16_lvl3/pwcnet.ckpt-99000
2019-04-20 16:21:47 Iter 99100 [Train]: loss=58.68, epe=18.39, lr=0.000127, samples/sec=57.3, sec/step=0.279, eta=0:04:12
2019-04-20 16:22:50 Iter 99200 [Train]: loss=60.19, epe=18.85, lr=0.000128, samples/sec
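
Because the log above is long, a small parser can help pull out the validation scores. The following is a minimal sketch using only the standard library. The regular expression is inferred from the `[Val]` lines shown above, and `parse_val_lines` is a hypothetical helper that is not part of the training code.

```python
import re

# Pattern inferred from lines such as:
# 2019-04-19 17:15:01 Iter 11000 [Val]: loss=52.99, epe=16.44
VAL_RE = re.compile(r"Iter (\d+) \[Val\]: loss=([\d.]+), epe=([\d.]+)")

def parse_val_lines(lines):
    """Return (iteration, loss, epe) tuples for every validation line."""
    results = []
    for line in lines:
        match = VAL_RE.search(line)
        if match:
            results.append((int(match.group(1)),
                            float(match.group(2)),
                            float(match.group(3))))
    return results

# A few lines copied from the log above; the [Train] line is ignored by the parser
sample_log = [
    "2019-04-19 17:15:01 Iter 11000 [Val]: loss=52.99, epe=16.44",
    "2019-04-20 05:35:59 Iter 54000 [Val]: loss=49.79, epe=15.43",
    "2019-04-20 14:13:28 Iter 88000 [Val]: loss=49.56, epe=15.36",
    "2019-04-20 14:13:00 Iter 88000 [Train]: loss=60.18, epe=18.86, lr=0.000059, "
    "samples/sec=33.4, sec/step=0.479, eta=1:35:47",
]
val_scores = parse_val_lines(sample_log)
best_iter, _, best_epe = min(val_scores, key=lambda r: r[2])
print(best_iter, best_epe)
```

Note that a validation line reporting `nan` would not match this pattern and would be skipped silently, so a stricter parser may be preferable when diagnosing a diverging run.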

In [10]:
import numpy as np
# Skew-symmetric (cross-product) matrix of the vector (3, 0, 0)
t = np.array([[0,0,0],[0,0,-3],[0,3,0]])

In [13]:
e = np.array([[1,0,0],[0,1,0],[0,0,1]])

In [14]:
e@t

array([[ 0,  0,  0],
       [ 0,  0, -3],
       [ 0,  3,  0]])

In [15]:
# Singular value decomposition: t = u @ diag(s) @ vh
u, s, vh = np.linalg.svd(t, full_matrices=True)

In [22]:
s

array([3., 3., 0.])

In [26]:
# Singular values arranged as the diagonal matrix diag(3, 3, 0)
s_full = np.array([[3,0,0],[0,3,0],[0,0,0]])

In [20]:
z= np.array([[0,1,0],[-1,0,0],[0,0,0]])
z

array([[ 0,  1,  0],
       [-1,  0,  0],
       [ 0,  0,  0]])

In [24]:
w = np.array([[0,-1,0],[1,0,0],[0,0,1]])
w

array([[ 0, -1,  0],
       [ 1,  0,  0],
       [ 0,  0,  1]])

In [21]:
u @ z @ np.transpose(u)

array([[ 0.,  0.,  0.],
       [ 0.,  0., -1.],
       [ 0.,  1.,  0.]])

In [27]:
u @ w @ s_full @ u.T

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  3.],
       [ 0., -3.,  0.]])

In [29]:
u @ w.T @ vh

array([[ 1.,  0.,  0.],
       [ 0., -1.,  0.],
       [ 0.,  0., -1.]])

In [18]:
u, '\n', s, '\n', vh

(array([[ 0.,  0.,  1.],
        [ 0., -1.,  0.],
        [-1.,  0.,  0.]]),
 '\n',
 array([3., 3., 0.]),
 '\n',
 array([[-0., -1., -0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.]]))

## Training log

Here are the training curves for the run above:

![](img/pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16/loss.png)
![](img/pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16/epe.png)
![](img/pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16/lr.png)
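
The `epe` curve reports the average end-point error, conventionally defined as the mean Euclidean distance between predicted and ground-truth flow vectors. Here is a minimal NumPy sketch of that definition. It is not the implementation used by `ModelPWCNet`, and the function name and toy arrays are illustrative only.

```python
import numpy as np

def average_epe(flow_pred, flow_gt):
    """Mean Euclidean distance between flow vectors; inputs have shape [H, W, 2]."""
    diff = flow_pred.astype(np.float32) - flow_gt.astype(np.float32)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))

# Toy example using the same [H, W, 2] layout as y_shape in the model configuration
flow_gt = np.zeros((256, 448, 2), dtype=np.float32)
flow_pred = np.zeros_like(flow_gt)
flow_pred[..., 0] = 3.0  # every pixel is off by 3 px horizontally
flow_pred[..., 1] = 4.0  # and by 4 px vertically

print(average_epe(flow_pred, flow_gt))  # 5.0, since sqrt(3^2 + 4^2) = 5 at every pixel
```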