## Exercise 2.3 - Training the network

Now that we have learned how to prepare the dataset and how to initialize the models, we get to the most exciting step: training the model. In this last part of the exercise, you will train the network and implement some methods to monitor and evaluate the training.

First, we need some packages and methods from our scripts:

In [1]:
import tensorflow as tf
import sys
import time
import os
import csv
from keras.optimizers import Adam
from keras.backend import clear_session

# add folders data_processing and model to path so that we can read in our own modules from this folder
sys.path.append("data_processing")
sys.path.append("model")

# import our own methods 
from load_datasets import load_train_dataset, load_test_dataset
from model_setup import generator, discriminator
from training_methods import train_step  # defined under model/training_methods.py 
from training_methods import eval_example_images  # defined under model/training_methods.py 
from visualize_data import plot_images_at_epoch  # defined under data_processing/visualize_data.py

2024-05-28 20:51:56.483919: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Next, we check which devices are available on this machine. If a GPU is present we use it, since training is much faster there; otherwise we fall back to the CPU.

In [2]:
# ask for GPU
devices = tf.config.experimental.list_physical_devices("GPU")
if len(devices) > 0:
    tf.config.experimental.set_memory_growth(devices[0], enable=True)

# if not there, ask for CPU
else:
    devices = tf.config.experimental.list_physical_devices("CPU")

print(devices)

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


Now we load the datasets in the same way as in exercise 2.1.

In [6]:
# path to project directory
PROJECT_DIR = "/net/merisi/pbigalke/teaching/METFUT2024/CGAN_Pix2Pix_MSG"

# paths to train and val image sets
TRAIN_PATH = f"{PROJECT_DIR}/VIS_IR_images/train"
TEST_PATH = f"{PROJECT_DIR}/VIS_IR_images/val"

# define image and batch size
IMAGE_SIZE = 128  # do NOT change
BATCH_SIZE = 10 

# load training and test datasets
train_dataset = load_train_dataset(TRAIN_PATH, BATCH_SIZE)
test_dataset = load_test_dataset(TEST_PATH)

### (a) Monitoring training procedure

Below we have already prepared a function that trains the network.
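For orientation, `train_step` returns four losses. Its definition lives in model/training_methods.py and is not shown here, so the following is only a sketch of how such losses are commonly combined in a Pix2Pix setup: the generator's total loss is an adversarial (GAN) term plus a weighted L1 term, and the discriminator loss sums a "real" and a "fake" term. The weight `LAMBDA = 100`, the helper names and the toy numbers are assumptions for illustration, not the exercise's actual code.

```python
import math

LAMBDA = 100  # assumed weight of the L1 term (the value used in the Pix2Pix paper)

def bce_from_logits(logits, target):
    """Mean binary cross-entropy between sigmoid(logits) and a constant target (0 or 1)."""
    total = 0.0
    for x in logits:
        # numerically stable form: max(x, 0) - x * target + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * target + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

def l1_distance(real, fake):
    """Mean absolute difference between two flat lists of pixel values."""
    return sum(abs(r - f) for r, f in zip(real, fake)) / len(real)

def generator_losses(disc_logits_on_fake, real_pixels, fake_pixels):
    """Return (total, gan, l1): the generator wants the discriminator to say 'real' (1)."""
    gan_loss = bce_from_logits(disc_logits_on_fake, 1.0)
    l1_loss = l1_distance(real_pixels, fake_pixels)
    return gan_loss + LAMBDA * l1_loss, gan_loss, l1_loss

def discriminator_loss(disc_logits_on_real, disc_logits_on_fake):
    """Real images should be classified as 1, generated images as 0."""
    return (bce_from_logits(disc_logits_on_real, 1.0)
            + bce_from_logits(disc_logits_on_fake, 0.0))

# toy values standing in for discriminator outputs and image pixels
total, gan, l1 = generator_losses([0.0, 0.0], [0.5, -0.5], [0.4, -0.3])
print(f"gen_total_loss={total:.3f}, gen_gan_loss={gan:.3f}, gen_l1_loss={l1:.3f}")
print(f"discr_loss={discriminator_loss([0.0], [0.0]):.3f}")
```

With a large L1 weight, the total generator loss is dominated by the pixel-wise similarity term, which is worth keeping in mind when reading the logged values.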

In [3]:
# define training procedure
def fit(generator, discriminator, gen_optimizer, discr_optimizer, train_dataset, test_dataset, epochs,
        outpath_img, loss_file_out):

    start_training = time.time()

    # loop over number of epochs
    for epoch in range(epochs):
        print(f"Epoch {epoch+1}/{epochs}")
        start_epoch = time.time()

        # loop over batches in training dataset
        for n, (ir_img, real_vis_img) in train_dataset.enumerate():
            # perform training step
            discr_loss, gen_total_loss, gen_gan_loss, gen_l1_loss = train_step(
                ir_img, real_vis_img, generator, discriminator, gen_optimizer, discr_optimizer)

        # save losses of the last batch of this epoch to csv file
        # (int()/float() convert the scalar tensors to plain Python numbers,
        #  assuming train_step returns scalar losses)
        with open(loss_file_out, 'a', newline='') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([epoch, int(n), float(discr_loss), float(gen_total_loss), float(gen_l1_loss)])

        ##### evaluate progress by plotting some example images

        # select random images from test dataset
        test_ir_img, test_fake_vis_img, test_real_vis_img = eval_example_images(test_dataset, generator, BATCH_SIZE)

        # save plot to
        example_imgs = f"{outpath_img}/example_images_epoch{epoch}.png"

        ##### TODO
        ##### plot all three test images in one plot:
        ##### the infra-red image, the generated fake visible image and the real visible image
        ##### Hint: look for inspiration in data_processing/visualize_data.py
        plot_images_at_epoch(test_ir_img, test_fake_vis_img, test_real_vis_img,
                             output_file=example_imgs,
                             normalized=True)
        #####

        # calculate the test loss and print it:
        print("TODO: implement test loss, accuracy etc.")

        # print some information on the progress of training
        print("TODO: save losses in an array and return for later plotting")
        print(f"Generator loss: {float(gen_total_loss):.2f}, Discriminator loss: {float(discr_loss):.2f}")
        print(f"Time for epoch {epoch+1}: {(time.time()-start_epoch)/60.:.2f} min.")
        print(f"Total runtime: {(time.time()-start_training)/60.:.2f} min.")
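The TODO about saving losses for later plotting can be approached in several ways. One option, sketched here with a hypothetical helper class `LossHistory` (not part of the exercise scripts), is to collect the per-batch losses during an epoch and store their mean when the epoch ends, so that `fit` could return the history once training is done. The plain Python floats below stand in for the values returned by `train_step`.

```python
from collections import defaultdict

class LossHistory:
    """Hypothetical helper: collects per-batch losses and keeps one mean per epoch."""

    def __init__(self):
        self._batch_values = defaultdict(list)  # losses of the epoch in progress
        self.epoch_means = defaultdict(list)    # one entry per finished epoch

    def add_batch(self, **losses):
        # float() also works for scalar tensors, so train_step outputs could be passed directly
        for name, value in losses.items():
            self._batch_values[name].append(float(value))

    def end_epoch(self):
        # average all batches seen since the previous call, then reset
        for name, values in self._batch_values.items():
            self.epoch_means[name].append(sum(values) / len(values))
        self._batch_values.clear()

# made-up (discr_loss, gen_loss) pairs for two epochs with two batches each
fake_epochs = [[(0.8, 12.0), (0.6, 10.0)], [(0.5, 9.0), (0.3, 7.0)]]

history = LossHistory()
for epoch_batches in fake_epochs:
    for discr_loss, gen_loss in epoch_batches:
        history.add_batch(discr_loss=discr_loss, gen_loss=gen_loss)
    history.end_epoch()

print(dict(history.epoch_means))
```

Averaging over all batches gives a smoother curve than logging only the last batch of each epoch, at the cost of a small amount of bookkeeping inside the batch loop.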

In [4]:
# define number of epochs (start with a low number e.g. 3 and once the code is done you can run the whole training)
EPOCHS = 3  # 150

# path where to store example images
OUT_IMG = f"{PROJECT_DIR}/output/training_IR_VIS/example_images"
if not os.path.exists(OUT_IMG):
    os.makedirs(OUT_IMG)

# file where to store losses
LOSS_FILE = f"{PROJECT_DIR}/output/training_IR_VIS/losses.csv"
with open(LOSS_FILE, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    field = ["epoch", "batch", "discr_loss", "gen_loss", "vis_similarity"]
    writer.writerow(field)

# clear previous training session
clear_session()

# set up generator
gen_model = generator()
gen_model.summary()

# set up discriminator
discr_model = discriminator()
discr_model.summary()

# create optimizers for generator and discriminator
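# (the keyword is learning_rate; the old alias lr is deprecated in recent Keras versions)
gen_optimizer = Adam(learning_rate=2e-4, beta_1=0.5)
discr_optimizer = Adam(learning_rate=2e-4, beta_1=0.5)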



# train the model
fit(gen_model, discr_model, gen_optimizer, discr_optimizer, train_dataset, test_dataset, EPOCHS, 
    outpath_img=OUT_IMG, loss_file_out=LOSS_FILE)

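Once training has run, the loss file can be read back for plotting. A minimal sketch using only the standard library, assuming the column names written to `LOSS_FILE` above; the function name `read_losses` and the demo values are made up for illustration.

```python
import csv
import os
import tempfile

def read_losses(path):
    """Return the CSV columns as lists of floats, keyed by column name."""
    columns = {}
    with open(path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            for name, value in row.items():
                columns.setdefault(name, []).append(float(value))
    return columns

# demo: write a small file with the same header as LOSS_FILE, then read it back
demo_path = os.path.join(tempfile.mkdtemp(), "losses.csv")
with open(demo_path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["epoch", "batch", "discr_loss", "gen_loss", "vis_similarity"])
    writer.writerow([0, 99, 1.2, 15.0, 0.14])
    writer.writerow([1, 99, 0.9, 11.5, 0.10])

losses = read_losses(demo_path)
print(losses["gen_loss"])
```

The resulting lists can be handed to a plotting library, for example `plt.plot(losses["epoch"], losses["gen_loss"])` with matplotlib.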