# Training Weak Lensing Maps / Cosmic Web Slices:

This is the script for training the cosmoGAN algorithm. If you want to train the algorithm on your own dataset, be sure to do the following:

* Set the output size accordingly. The "output_size" variable should be equal to the side length of the input dataset images (e.g. 256 for 256x256 px images).
* Currently the algorithm only works for square images, i.e. images with equal height and width (e.g. 512x512 px).
* The training dataset should be a Python list/array of shape No. of images x height x width (N x H x W). The get_data() function reshapes the input array by adding an extra dimension corresponding to the color dimension of the training images.
* cosmoGAN can also be trained on RGB images (multidimensional arrays). For more information see the notebook on preparing Illustris data.
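
The checklist above can be sketched as a small pre-flight check. This is only an illustration and not part of the cosmoGAN code: the helper `check_training_array`, the random stand-in images and the file name `my_dataset.npy` are all hypothetical.

```python
import os
import tempfile

import numpy as np


def check_training_array(data, output_size):
    """Check that `data` is a stack of square greyscale images (N x H x W)."""
    if data.ndim != 3:
        raise ValueError("expected shape (N, H, W), got %s" % (data.shape,))
    n_images, height, width = data.shape
    if height != width:
        raise ValueError("images must be square, got %d x %d" % (height, width))
    if height != output_size:
        raise ValueError(
            "output_size=%d does not match image size %d" % (output_size, height)
        )
    return n_images


# Hypothetical stand-in data: 8 random greyscale images of 256 x 256 px.
images = np.random.rand(8, 256, 256).astype(np.float32)
n_images = check_training_array(images, output_size=256)

# Save as .npy so that np.load() can read it back; the path is a placeholder.
path = os.path.join(tempfile.mkdtemp(), "my_dataset.npy")
np.save(path, images)
print(n_images, np.load(path, mmap_mode="r").shape)
```

The saved file could then be passed to the training script through the "datafile" setting, with "output_size" set to the same side length.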

**Importing the libraries:**

In [1]:
import os
import tensorflow as tf
import sys
import time
import numpy as np
import pprint
import functions
from functions import train_dcgan
import warnings
warnings.filterwarnings('ignore')




In [2]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1305401620340962448
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 2944228838210510154
physical_device_desc: "device: XLA_CPU device"
]


**GPU settings:**

The training script was tested with both GPU and CPU-only access. If you encounter problems when training with a GPU, the most likely cause is a memory issue: the dataset is too big to fit. If you need to disable the GPU, the following command can be used:

In [3]:
## If you encounter any problems running the training script with the GPU enabled, you can disable it using:
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
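
To judge whether memory is likely to be the problem, it can help to estimate how much space the dataset takes when fully loaded. The sketch below is a rough, hedged illustration: `dataset_gib` is a hypothetical helper, the 8000-image dataset is an invented example, and the figure covers only the raw array, not the network weights or activations. Note also that CUDA_VISIBLE_DEVICES generally has to be set before TensorFlow first initialises the GPU, so it is safest to set it at the very top of the notebook.

```python
import os

import numpy as np


def dataset_gib(n_images, height, width, dtype=np.float32):
    """Size in GiB of an (N, H, W) array of the given dtype held in memory."""
    n_bytes = n_images * height * width * np.dtype(dtype).itemsize
    return n_bytes / 1024.0 ** 3


# Hypothetical example: 8000 maps of 256 x 256 px stored as float32.
size = dataset_gib(8000, 256, 256)
print("%.2f GiB" % size)  # 1.95 GiB

# Hide all CUDA devices from libraries that initialise the GPU after this point.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```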

**A function to load the training data:**

Here we define the function that loads the training data. When training on greyscale images (i.e. cosmic-web slices and weak lensing convergence maps), the data array is reshaped, according to the "data_format" setting, into either No. x height x width x color dimension (NHWC) or No. x color dimension x height x width (NCHW) format. For Illustris data (i.e. 3-D RGB arrays) this is not required.

In [4]:
#dataset_type = "Illustris"
dataset_type = "CW_WL"

def get_data():
    data = np.load(config.datafile, mmap_mode='r')

    if dataset_type != "Illustris":
        if config.data_format == 'NHWC':  ## This is the data format: Number of images x height x width x color dimension
            data = np.expand_dims(data, axis=-1)
        else: # 'NCHW'
            data = np.expand_dims(data, axis=1)

    return data
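
The effect of the two `np.expand_dims` calls can be seen on a small array. This is a minimal sketch in which a hypothetical zero-filled array stands in for the loaded data file:

```python
import numpy as np

# Hypothetical stand-in for the loaded file: 4 greyscale images of 64 x 64 px.
data = np.zeros((4, 64, 64), dtype=np.float32)

nhwc = np.expand_dims(data, axis=-1)  # color axis appended as the last axis
nchw = np.expand_dims(data, axis=1)   # color axis inserted after the image index

print(nhwc.shape)  # (4, 64, 64, 1)
print(nchw.shape)  # (4, 1, 64, 64)

# The two layouts hold the same values; a transpose converts between them.
same = np.array_equal(np.transpose(nchw, (0, 2, 3, 1)), nhwc)
print(same)  # True
```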


**Training settings:**

In [5]:
## The important parameters to get right here are batch_size, c_dim and data_format. The batch_size is
## the number of dataset samples used per training iteration. It also controls the number of samples
## produced by the generator neural network after training. The c_dim parameter is the number of color
## channels in the training dataset, i.e. c_dim = 1 for greyscale image arrays and c_dim = 3 for
## full-color RGB arrays (e.g. the Illustris dataset). The data_format also has to match the dataset
## you are using. It is highly recommended to reshape your data into NHWC format if you are
## training the GAN on your own data.

flags = tf.app.flags
flags.DEFINE_string("dataset", "cosmic_web", "The name of the dataset")
#flags.DEFINE_string("datafile", "./data/cosmogan_maps_256_8k_1.npy", "Training dataset file location")
flags.DEFINE_string("datafile", "./data/stack_z0p0_fR_f1_f7_a_250_test.npy", "Training dataset file location")
flags.DEFINE_integer("epoch", 5000, "Epochs to train before stopping")
flags.DEFINE_float("learning_rate", 0.00009, "The learning rate parameter")
flags.DEFINE_float("beta1", 0.5, "Momentum term of adam (default value: 0.5)")
flags.DEFINE_float("flip_labels", 0.01, "Probability of flipping labels (default value: 0.01)")
flags.DEFINE_integer("z_dim", 256, "Dimension of noise vector z")
flags.DEFINE_integer("nd_layers", 4, "Number of discriminator convolutional layers (default value: 4)")
flags.DEFINE_integer("ng_layers", 4, "Number of generator conv_T layers (default value: 4)")
flags.DEFINE_integer("gf_dim", 64, "Dimension of generator filters in last conv layer (default value: 64)")
flags.DEFINE_integer("df_dim", 64, "Dimension of discriminator filters in first conv layer (default value: 64)")
flags.DEFINE_integer("batch_size", 64, "The size of batch images (default value: 64)")
flags.DEFINE_integer("output_size", 256, "The size of the output images to produce (default value: 64 for weak lensing maps and 256 for CW slices)")
flags.DEFINE_integer("c_dim", 1, "Dimension of image color. 1 = greyscale image, 3 = RGB image")
flags.DEFINE_string("data_format", "NHWC", "data format (NHWC = No. x height x width x color dimension while NCHW = No. x color dimension x height x width)")
flags.DEFINE_boolean("transpose_matmul_b", False, "Transpose matmul B matrix for performance [False]")
flags.DEFINE_string("checkpoint_dir", "./checkpoints/checkpoint_name", "Directory name to save the checkpoints (default value: ./checkpoints/checkpoint_name)")
flags.DEFINE_string("experiment", "run_0", "Tensorboard run directory name (run_0)")
flags.DEFINE_boolean("save_every_step", False, "Save a checkpoint after every step (default value: False)")
flags.DEFINE_boolean("verbose", True, "Print loss on every step (default value: True)")
config = flags.FLAGS

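## Jupyter passes a "-f <connection file>" argument to the kernel; defining a dummy "f" flag
## keeps the flag parser from rejecting it.
tf.app.flags.DEFINE_string('f', '', 'kernel')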
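
Before starting a long run, the settings described above can be compared against the array returned by get_data(). The following is a hedged sketch, not part of the cosmoGAN code: `Settings` is a hypothetical stand-in for the flags object, `check_settings` is an illustrative helper, and the rule that the dataset should hold at least one full batch is an assumption about how the training loop consumes data.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the flags object, holding only the fields checked here.
Settings = namedtuple("Settings", "batch_size output_size c_dim data_format")


def check_settings(data, cfg):
    """Return a list of mismatches between a 4-D data array and the settings."""
    if data.ndim != 4:
        return ["expected a 4-D array, got %d-D" % data.ndim]
    if cfg.data_format == "NHWC":
        n, h, w, c = data.shape
    elif cfg.data_format == "NCHW":
        n, c, h, w = data.shape
    else:
        return ["unknown data_format %r" % (cfg.data_format,)]

    problems = []
    if (h, w) != (cfg.output_size, cfg.output_size):
        problems.append("image size %dx%d != output_size %d" % (h, w, cfg.output_size))
    if c != cfg.c_dim:
        problems.append("%d color channels != c_dim %d" % (c, cfg.c_dim))
    # Assumption: training needs at least one full batch of images.
    if n < cfg.batch_size:
        problems.append("%d images < batch_size %d" % (n, cfg.batch_size))
    return problems


# Small hypothetical example: 4 greyscale 32 x 32 images in NHWC layout.
cfg = Settings(batch_size=2, output_size=32, c_dim=1, data_format="NHWC")
good = np.zeros((4, 32, 32, 1), dtype=np.float32)
bad = np.zeros((4, 1, 32, 32), dtype=np.float32)  # NCHW data with NHWC settings

print(check_settings(good, cfg))  # []
print(check_settings(bad, cfg))   # reports the size and channel mismatches
```

An empty list means the array and the settings agree on layout, image size and channel count.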


In [None]:
pprint.PrettyPrinter().pprint(config.__flags)
train_dcgan(get_data(), config)