This notebook implements the trial-run cycle for autoencoders.

* Define
    * trial name
    * environment to work with (MyBallsEnv, MyArmEnv, GymArmEnv)
    * dataset to use (a set of three npz files for training, testing and grid visualization; each file contains many angle/image pairs)
    * model builder
    * list of parameters to try
    * number of epochs to run 
    * output directory
* Run
    * fails if output directory exists, otherwise creates it
    * chooses a set of model parameters
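
The "list of parameters to try" can be expanded into one parameter dict per round. The following is a minimal sketch, not the notebook's actual mechanism: the `expand_grid` helper and the `beta` key are hypothetical, and only `epochs` is passed to the builder in the cells below.

```python
import itertools

def expand_grid(grid):
    """Return one dict per combination of the values in `grid` (hypothetical helper)."""
    keys = sorted(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[k] for k in keys))]

# Illustrative values only; "beta" is an assumed parameter name.
param_grid = {"epochs": [2, 10, 50], "beta": [1.0, 4.0]}
param_sets = expand_grid(param_grid)

for round_idx, params in enumerate(param_sets):
    print(round_idx, params)
```

Each resulting dict could then be handed to a round as its `params` argument.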

Thursday 4/03/2021

* Visualize YYs (third output sheet or animated GIF) to assess decoder performance
* Rerun with [2, 10, 50] epochs
* Implement beta-VAE (see beta-vae notebook)

Next:

* Explore impact of changing filters chain down to 2x (2,2)
* Scatterplot 3D
* How does it fluctuate depending on network architecture, nlats, training protocol, etc.?
* Explore fold area in L0/L1 space. For each frame show:
    * image of the gym robot,
    * colored L0/L1 scatterplot, with a red cross showing the current latent state,
    * image reconstructed by the autoencoder

### imports

In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib inline
import matplotlib.pyplot as plt

%load_ext tensorboard

import numpy as np
import os, datetime

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from src.config import output_data_dir

import logging
logger = logging.getLogger(__name__)


### helpers

In [2]:
def cycle_autoencoder(ae, dataset, prefix="", N=10, vae=False):
    """Encode the dataset images, decode them again, return latents and reconstructions.

    `prefix` and `N` are currently unused.
    """
    Y = dataset['images']

    L = ae['enc'].predict(Y)
    if vae:
        L = L[2]  # VAE encoder returns [z_mean, z_log_var, z]; keep z
    L = np.array(L)

    YY = ae['dec'].predict(L)
    YY = np.array(YY)

    out_data = dict()
    out_data['latvars'] = L
    out_data['rec_images'] = YY

    return out_data
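
The encode/decode cycle above can be illustrated without TensorFlow. The sketch below uses two hypothetical stub classes in place of the real Keras models; their only assumption is the interface the helper relies on, namely that the VAE encoder's `predict()` returns `[z_mean, z_log_var, z]` and the decoder maps latents back to 64x64x1 images.

```python
import numpy as np

class StubEncoder:
    """Hypothetical stand-in for a VAE encoder: predict() returns [z_mean, z_log_var, z]."""
    def predict(self, images):
        n = images.shape[0]
        z_mean = images.reshape(n, -1)[:, :2]   # pretend the first two pixels are the latents
        z_log_var = np.zeros((n, 2))
        return [z_mean, z_log_var, z_mean]      # no sampling noise in this stub, so z == z_mean

class StubDecoder:
    """Hypothetical stand-in for a decoder: maps (n, 2) latents to (n, 64, 64, 1) images."""
    def predict(self, latents):
        # Fill each image with its first latent value, via broadcasting.
        return np.zeros((latents.shape[0], 64, 64, 1)) + latents[:, :1, None, None]

images = np.random.default_rng(0).random((5, 64, 64, 1))
enc_out = StubEncoder().predict(images)
latvars = np.asarray(enc_out[2])                # same selection as the vae=True branch: take z
rec_images = StubDecoder().predict(latvars)

print(latvars.shape, rec_images.shape)
```

The point is only the shapes: one latent row per input image, and one reconstructed image per latent row.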

### environment

In [3]:
# define ae builder method and environment name
#import src.models.vae
#ae_builder = src.models.vae.build_autoencoder

import src.models.sigma_vae
ae_builder = src.models.sigma_vae.build_autoencoder

ENV_NAME = "twoballs"

### subdirectories

In [4]:
TRIAL_DIR = os.path.join(output_data_dir, ENV_NAME) # FIXME

# fail if TRIAL_DIR exists
assert not os.path.exists(TRIAL_DIR), "%s already exists" % TRIAL_DIR
    
# create trial subdirs
TENSORBOARD_LOGS_DIR =  "%s/tensorboard-logs" % TRIAL_DIR
TRAINED_MODELS_DIR = "%s/trained-models" % TRIAL_DIR
#DATA_DIR = "%s/data" % TRIAL_DIR
IMGS_DIR = "%s/imgs" % TRIAL_DIR

#for dir in [TENSORBOARD_LOGS_DIR, TRAINED_MODELS_DIR, DATA_DIR, IMGS_DIR]:
for subdir in [TENSORBOARD_LOGS_DIR, TRAINED_MODELS_DIR, IMGS_DIR]:  # avoid shadowing the builtin dir()
    os.makedirs(subdir, exist_ok=True)
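
The "fail if the trial directory exists, otherwise create it" rule can also be expressed through `os.makedirs(..., exist_ok=False)`, which raises `FileExistsError` by itself. A minimal sketch, assuming a hypothetical `create_trial_dirs` helper and run inside a temporary directory so that no real output is touched:

```python
import os
import tempfile

def create_trial_dirs(trial_dir, subdirs=("tensorboard-logs", "trained-models", "imgs")):
    """Create trial_dir and its subdirectories; raise FileExistsError if trial_dir exists."""
    os.makedirs(trial_dir, exist_ok=False)   # fails loudly instead of reusing a previous trial
    paths = {name: os.path.join(trial_dir, name) for name in subdirs}
    for path in paths.values():
        os.makedirs(path)
    return paths

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    trial_dir = os.path.join(tmp, "twoballs")
    paths = create_trial_dirs(trial_dir)
    created = sorted(os.listdir(trial_dir))
    try:
        create_trial_dirs(trial_dir)         # a second call on the same directory must fail
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

print(created, second_call_failed)
```

Unlike an `assert`, this check is not stripped when Python runs with `-O`.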

### load datasets

In [5]:
from src.data.dataset import load_datasets

datasets = load_datasets(ENV_NAME)

2021-03-07 12:37:26,656 - src.data.dataset - INFO - Loading from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_rand_15000.npz ...
2021-03-07 12:37:27,643 - src.data.dataset - INFO - Loaded 15000 datapoints from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_rand_15000.npz
2021-03-07 12:37:27,643 - src.data.dataset - INFO - Loading from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_rand_1000.npz ...
2021-03-07 12:37:27,707 - src.data.dataset - INFO - Loaded 1000 datapoints from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_rand_1000.npz
2021-03-07 12:37:27,707 - src.data.dataset - INFO - Loading from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_grid_20_500.npz ...
2021-03-07 12:37:29,023 - src.data.dataset - INFO - Loaded 20000 datapoints from C:\Users\alexa\Documents\dvp\autoenc2\data\processed\twoballs_grid_20_500.npz


### train

In [6]:
from src.models import save_models, save_models_weights
from src.visualization.lat import visualize_lat_space, animate_Y_YYs

def save_visualization(dataset, dataset_grid_out, round, epoch):
    vis_fname = "%s/round-%d_epoch-%d.png" % (IMGS_DIR, round, epoch)
    ani_fname = "%s/round-%d_epoch-%d.mp4" % (IMGS_DIR, round, epoch)

    # save 4x4 visualization of pairwise plots of latvars vs angles
    fig, axs = visualize_lat_space(dataset['grid'], dataset_grid_out, sheet=1)
    fig.savefig(vis_fname)
    
    #vis_fname2 = "%s/%sb.png" % (IMGS_DIR, suffix)
    #fig, axs = visualize_lat_space(dataset['grid'], dataset_grid_out, sheet=2)
    #fig.savefig(vis_fname2)

    # save MP4
    NANIFRAMES = 20
    logger.debug("saving ani visualization (for %d datapoints from grid dataset) into %s" % (NANIFRAMES, ani_fname))
    rng = np.random.default_rng()
    ani_is = rng.choice(dataset['grid']['images'].shape[0], size=NANIFRAMES, replace=False) # choose 20 distinct random items from the grid dataset
    animate_Y_YYs(dataset['grid']['images'][ani_is], dataset_grid_out['rec_images'][ani_is], outfile=ani_fname)

    plt.close('all')

def run_round(datasets, ae_builder, params, round):
    #if os.path.isfile(latvars_fname):
    #    logger.error("%s exists, skipping ..." % latvars_fname)
    #    return
    
    print("*** run_round(%d, %s)" % (round, str(params)))
       
    models = ae_builder((64, 64, 1), 2, params)

    tensorboard_logdir = os.path.join(("%s/round-%d" % (TENSORBOARD_LOGS_DIR, round)),
                                      datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    tensorboard_callback = keras.callbacks.TensorBoard(tensorboard_logdir) #, histogram_freq=1)

    for epoch in range(params['epochs']):
        logger.info("round-%d_epoch-%d: fit" % (round, epoch))
        models['ae'].fit(datasets['train']['images'], callbacks=[tensorboard_callback],
               initial_epoch=epoch, epochs=epoch+1, batch_size=128, verbose=0)
        save_models_weights(models, "%s/round-%d_epoch-%d" % (TRAINED_MODELS_DIR, round, epoch))

        #logger.info("round-%d_epoch-%d: cycle" % (round, epoch))
        #dataset_grid_out = cycle_autoencoder(models, datasets['grid'], vae=True)
        #save_visualization(datasets, dataset_grid_out, round, epoch)
    
        # save angles & latvars for test dataset
        dataset_test_out = cycle_autoencoder(models, datasets['test'], vae=True)
        latvars_fname = "%s/round-%d_epoch-%d.csv" % (IMGS_DIR, round, epoch)
        angles_latvars = np.hstack((datasets['test']['angles'], dataset_test_out['latvars']))
        np.savetxt(latvars_fname, angles_latvars, delimiter=',')
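
The per-epoch CSV written above stores the angles and the latent variables side by side, so reading it back means splitting the columns again. A minimal round-trip sketch with made-up values; the number of angle columns (two here) is an assumption, and an in-memory buffer stands in for the real file:

```python
import io
import numpy as np

# Hypothetical stand-ins: 4 test samples, 2 angles and 2 latent variables each.
angles = np.array([[0.0, 0.1], [0.5, 0.6], [1.0, 1.1], [1.5, 1.6]])
latvars = np.array([[-1.0, 2.0], [-0.5, 1.5], [0.0, 1.0], [0.5, 0.5]])

buf = io.StringIO()                      # stands in for a round-N_epoch-M.csv file
np.savetxt(buf, np.hstack((angles, latvars)), delimiter=",")
buf.seek(0)

# Read the table back and split it at the (assumed) number of angle columns.
table = np.loadtxt(buf, delimiter=",")
n_angles = angles.shape[1]
angles_back, latvars_back = table[:, :n_angles], table[:, n_angles:]

print(table.shape)
```

Because the file has no header row, the column split has to be known by whoever reads it.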

In [7]:
keras.backend.clear_session()
for round in range(10):
    run_round(datasets, ae_builder, {'epochs': 100}, round)

*** run_round(0, {'epochs': 100})


2021-03-07 12:37:30,627 - __main__ - INFO - round-0_epoch-0: fit
2021-03-07 12:37:37,999 - __main__ - INFO - round-0_epoch-1: fit
2021-03-07 12:37:42,148 - __main__ - INFO - round-0_epoch-2: fit
2021-03-07 12:37:46,413 - __main__ - INFO - round-0_epoch-3: fit
2021-03-07 12:37:50,596 - __main__ - INFO - round-0_epoch-4: fit
2021-03-07 12:37:54,785 - __main__ - INFO - round-0_epoch-5: fit
2021-03-07 12:37:59,122 - __main__ - INFO - round-0_epoch-6: fit
2021-03-07 12:38:03,386 - __main__ - INFO - round-0_epoch-7: fit
2021-03-07 12:38:07,832 - __main__ - INFO - round-0_epoch-8: fit
2021-03-07 12:38:12,111 - __main__ - INFO - round-0_epoch-9: fit
2021-03-07 12:38:16,314 - __main__ - INFO - round-0_epoch-10: fit
2021-03-07 12:38:20,471 - __main__ - INFO - round-0_epoch-11: fit
2021-03-07 12:38:24,637 - __main__ - INFO - round-0_epoch-12: fit
2021-03-07 12:38:28,831 - __main__ - INFO - round-0_epoch-13: fit
2021-03-07 12:38:32,963 - __main__ - INFO - round-0_epoch-14: fit
2021-03-07 12:38:37,

*** run_round(1, {'epochs': 100})


2021-03-07 12:44:38,399 - __main__ - INFO - round-1_epoch-1: fit
2021-03-07 12:44:42,585 - __main__ - INFO - round-1_epoch-2: fit
2021-03-07 12:44:46,790 - __main__ - INFO - round-1_epoch-3: fit
2021-03-07 12:44:50,993 - __main__ - INFO - round-1_epoch-4: fit
2021-03-07 12:44:55,226 - __main__ - INFO - round-1_epoch-5: fit
2021-03-07 12:44:59,482 - __main__ - INFO - round-1_epoch-6: fit
2021-03-07 12:45:03,663 - __main__ - INFO - round-1_epoch-7: fit
2021-03-07 12:45:07,860 - __main__ - INFO - round-1_epoch-8: fit
2021-03-07 12:45:12,029 - __main__ - INFO - round-1_epoch-9: fit
2021-03-07 12:45:16,505 - __main__ - INFO - round-1_epoch-10: fit
2021-03-07 12:45:20,695 - __main__ - INFO - round-1_epoch-11: fit
2021-03-07 12:45:24,923 - __main__ - INFO - round-1_epoch-12: fit
2021-03-07 12:45:29,148 - __main__ - INFO - round-1_epoch-13: fit
2021-03-07 12:45:33,364 - __main__ - INFO - round-1_epoch-14: fit
2021-03-07 12:45:37,675 - __main__ - INFO - round-1_epoch-15: fit
2021-03-07 12:45:41

*** run_round(2, {'epochs': 100})


2021-03-07 12:51:39,771 - __main__ - INFO - round-2_epoch-1: fit
2021-03-07 12:51:44,041 - __main__ - INFO - round-2_epoch-2: fit
2021-03-07 12:51:48,317 - __main__ - INFO - round-2_epoch-3: fit
2021-03-07 12:51:52,578 - __main__ - INFO - round-2_epoch-4: fit
2021-03-07 12:51:56,815 - __main__ - INFO - round-2_epoch-5: fit
2021-03-07 12:52:01,442 - __main__ - INFO - round-2_epoch-6: fit
2021-03-07 12:52:05,650 - __main__ - INFO - round-2_epoch-7: fit
2021-03-07 12:52:09,895 - __main__ - INFO - round-2_epoch-8: fit
2021-03-07 12:52:14,103 - __main__ - INFO - round-2_epoch-9: fit
2021-03-07 12:52:18,310 - __main__ - INFO - round-2_epoch-10: fit
2021-03-07 12:52:22,548 - __main__ - INFO - round-2_epoch-11: fit
2021-03-07 12:52:26,755 - __main__ - INFO - round-2_epoch-12: fit
2021-03-07 12:52:31,014 - __main__ - INFO - round-2_epoch-13: fit
2021-03-07 12:52:35,217 - __main__ - INFO - round-2_epoch-14: fit
2021-03-07 12:52:39,462 - __main__ - INFO - round-2_epoch-15: fit
2021-03-07 12:52:43

*** run_round(3, {'epochs': 100})


2021-03-07 12:58:44,108 - __main__ - INFO - round-3_epoch-1: fit
2021-03-07 12:58:48,368 - __main__ - INFO - round-3_epoch-2: fit
2021-03-07 12:58:52,707 - __main__ - INFO - round-3_epoch-3: fit
2021-03-07 12:58:57,366 - __main__ - INFO - round-3_epoch-4: fit
2021-03-07 12:59:01,587 - __main__ - INFO - round-3_epoch-5: fit
2021-03-07 12:59:05,844 - __main__ - INFO - round-3_epoch-6: fit
2021-03-07 12:59:10,052 - __main__ - INFO - round-3_epoch-7: fit
2021-03-07 12:59:14,313 - __main__ - INFO - round-3_epoch-8: fit
2021-03-07 12:59:18,520 - __main__ - INFO - round-3_epoch-9: fit
2021-03-07 12:59:22,727 - __main__ - INFO - round-3_epoch-10: fit
2021-03-07 12:59:26,995 - __main__ - INFO - round-3_epoch-11: fit
2021-03-07 12:59:31,196 - __main__ - INFO - round-3_epoch-12: fit
2021-03-07 12:59:35,444 - __main__ - INFO - round-3_epoch-13: fit
2021-03-07 12:59:39,652 - __main__ - INFO - round-3_epoch-14: fit
2021-03-07 12:59:43,905 - __main__ - INFO - round-3_epoch-15: fit
2021-03-07 12:59:48

*** run_round(4, {'epochs': 100})


2021-03-07 13:05:49,196 - __main__ - INFO - round-4_epoch-1: fit
2021-03-07 13:05:53,470 - __main__ - INFO - round-4_epoch-2: fit
2021-03-07 13:05:57,730 - __main__ - INFO - round-4_epoch-3: fit
2021-03-07 13:06:02,023 - __main__ - INFO - round-4_epoch-4: fit
2021-03-07 13:06:06,261 - __main__ - INFO - round-4_epoch-5: fit
2021-03-07 13:06:10,607 - __main__ - INFO - round-4_epoch-6: fit
2021-03-07 13:06:14,846 - __main__ - INFO - round-4_epoch-7: fit
2021-03-07 13:06:19,131 - __main__ - INFO - round-4_epoch-8: fit
2021-03-07 13:06:23,408 - __main__ - INFO - round-4_epoch-9: fit
2021-03-07 13:06:27,646 - __main__ - INFO - round-4_epoch-10: fit
2021-03-07 13:06:31,907 - __main__ - INFO - round-4_epoch-11: fit
2021-03-07 13:06:36,146 - __main__ - INFO - round-4_epoch-12: fit
2021-03-07 13:06:40,423 - __main__ - INFO - round-4_epoch-13: fit
2021-03-07 13:06:44,661 - __main__ - INFO - round-4_epoch-14: fit
2021-03-07 13:06:48,881 - __main__ - INFO - round-4_epoch-15: fit
2021-03-07 13:06:53

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[128,64,64,1] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[{{node GatherV2}}]]
	 [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[128,64,64,1] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[{{node GatherV2}}]]
	 [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[gradient_tape/sub_1/Shape/_8]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_448223]

Function call stack:
train_function -> train_function
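
The error above is a CPU out-of-memory failure during round 4, after four complete rounds in the same kernel. One possible explanation, not confirmed by this output, is that memory accumulates across rounds. If that is the case, running each round in its own Python process would return all memory to the operating system when the round ends. A minimal standard-library sketch; `run_round_isolated` is a hypothetical helper and the child program is a placeholder that only prints what it would do:

```python
import subprocess
import sys

def run_round_isolated(round_idx, epochs):
    """Run one round in a child Python process; return (exit code, captured stdout).

    The inline child program is a placeholder: a real version would import
    the project's modules and call run_round() there.
    """
    child_code = (
        "import sys\n"
        "round_idx, epochs = int(sys.argv[1]), int(sys.argv[2])\n"
        "print('round %d: would train for %d epochs' % (round_idx, epochs))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(round_idx), str(epochs)],
        capture_output=True, text=True,
    )
    return result.returncode, result.stdout.strip()

results = [run_round_isolated(r, 100) for r in range(3)]
for returncode, output in results:
    print(returncode, output)
```

A non-zero exit code from one round would then not take the remaining rounds down with it.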


### tensorboard

In [None]:
#%tensorboard --logdir ..\data\output\twoballs