### Step 1. Prepare the training dataset:
To prepare the dataset of face images, follow the structure found in `DATASETS/example_dataset`. The dataset should contain a `VIS` directory with visible-spectrum images and a `NIR` directory with the corresponding near-infrared images.
Images should follow the naming convention `{identity}_{sample_name}.jpg`, and corresponding images in the `VIS` and `NIR` directories should share the same file name.
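
Before moving on, it can help to verify the layout described above. The following is a minimal sketch, not part of the repository: the helper `check_dataset` and the regular expression are hypothetical, and the pattern assumes the identity is everything before the first underscore.

```python
import re
from pathlib import Path

# Assumed reading of `{identity}_{sample_name}.jpg`: the identity is the text
# before the first underscore (an assumption, not confirmed by the repository).
NAME_PATTERN = re.compile(r"^(?P<identity>[^_]+)_(?P<sample>.+)\.jpg$")


def check_dataset(root):
    """Return a list of human-readable problems found in the dataset layout."""
    root = Path(root)
    problems = []
    names = {}
    for spectrum in ("VIS", "NIR"):
        folder = root / spectrum
        if not folder.is_dir():
            problems.append(f"missing directory: {folder}")
            names[spectrum] = set()
            continue
        names[spectrum] = {p.name for p in folder.iterdir() if p.is_file()}
        # Flag files that do not follow the naming convention.
        for name in sorted(names[spectrum]):
            if not NAME_PATTERN.match(name):
                problems.append(
                    f"{spectrum}/{name}: does not match {{identity}}_{{sample_name}}.jpg"
                )
    # Every image must exist under the same name in both directories.
    for name in sorted(names["VIS"] - names["NIR"]):
        problems.append(f"VIS/{name}: no NIR counterpart")
    for name in sorted(names["NIR"] - names["VIS"]):
        problems.append(f"NIR/{name}: no VIS counterpart")
    return problems
```

Calling `check_dataset("DATASETS/example_dataset")` returns an empty list when every image is correctly named and paired.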


### Step 2. Add identity features to the training dataset:
To create identity features for the training images, run the `create_training_identity_features.py` script. The identity features are saved to the `identity_features.json` file in the dataset directory.


In [None]:
data_folder = "DATASETS/example_dataset"
recognition_model = "Arcface_files/ArcFace_r100_ms1mv3_backbone.pth"

!bash ./docker_run.sh python create_training_identity_features.py --data_folder=$data_folder --rec_model=$recognition_model --gpu_device_number=0 --all_or_one="all"

The script relies on the following arguments: 
* `--data_folder` should point to the dataset directory with `VIS` and `NIR` subdirectories
* `--rec_model` should point to the `.pth` file of a pretrained recognition model
* `--gpu_device_number` determines which GPU to use (e.g. `--gpu_device_number=0`)
* `--all_or_one` determines whether to keep the identity features of every image in the dataset (`all`) or only the single most representative identity feature per identity (`one`)
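
To illustrate the difference between `all` and `one`, here is a rough sketch. How the script actually picks the "most representative" feature is not stated here, so the criterion below (the feature closest to the per-identity mean by cosine similarity) is an assumption, as are the helper names and the flat `{file_name: vector}` input structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_representative(features):
    """Assumed criterion: the vector closest to the mean of its group."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    return max(features, key=lambda f: cosine_similarity(f, mean))


def select_features(features_by_image, all_or_one):
    """Map hypothetical '{identity}_{sample_name}.jpg' keys to the kept features."""
    if all_or_one == "all":
        # Keep one feature vector per image.
        return dict(features_by_image)
    # Group vectors by identity (text before the first underscore).
    grouped = {}
    for name, vector in features_by_image.items():
        grouped.setdefault(name.split("_", 1)[0], []).append(vector)
    # Keep a single vector per identity.
    return {identity: most_representative(vecs) for identity, vecs in grouped.items()}
```

With `one`, the result has one entry per identity rather than one entry per image, which reduces the number of conditioning vectors available during training.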


### Step 3. Train the identity-conditioned StyleGAN2 model:
To train the identity-conditioned StyleGAN2 of ArcBiFaceGAN, use the `training.py` script as follows:


In [None]:

path_to_training_dataset = "DATASETS/example_dataset"
output_folder = "EXPERIMENTS/training_output"

NIR_loss_weight = 0.1

!bash ./docker_run.sh python training.py --NIR_loss_weight=$NIR_loss_weight --cfg="auto" --snap=20 --gpus=1 --mirror=1 --gpu_device_number=0 --batch=12  --data=$path_to_training_dataset  --outdir=$output_folder #--cond=1


The script relies on the following arguments: 
* `--data` should point to the training dataset with `VIS` and `NIR` subdirectories
* `--outdir` determines the output directory
* `--NIR_loss_weight` defines the weight of the NIR Discriminator in the final loss calculation
* `--cfg` determines the model configuration (e.g. number of blocks, image resolution)
* `--snap` defines the frequency of snapshots during training
* `--batch` determines the batch size
* `--mirror=1` enables horizontal flipping of training images
* `--gpu_device_number` selects which GPU to use when training on a single GPU
* `--gpus` sets the number of GPUs to use for multi-GPU training (only works in certain environments)
* `--cond=0` can be used to disable training based on the identity condition

For details on other possible arguments and available configurations, see the [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada-pytorch) documentation.
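
As a rough intuition for `--NIR_loss_weight`, the sketch below shows one plausible way a weighted NIR discriminator term could enter the total loss. The function name and the simple weighted sum are assumptions made for illustration; the actual loss in `training.py` may be structured differently.

```python
def combined_discriminator_loss(vis_loss, nir_loss, nir_loss_weight):
    """Hypothetical combination: only the NIR term is scaled by the weight."""
    return vis_loss + nir_loss_weight * nir_loss


# Compare the two weights used in this notebook on made-up loss values.
for weight in (0.1, 0.5):
    total = combined_discriminator_loss(vis_loss=1.0, nir_loss=2.0, nir_loss_weight=weight)
    print(f"NIR_loss_weight={weight}: total={total:.2f}")
```

Under this assumption, a larger weight makes the generator pay more attention to the realism of the NIR images relative to the VIS images.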


### Step 3.5. Continue training with updated NIR loss weight:
To continue training from a saved checkpoint use the `--resume` argument, i.e. `--resume={path_to_pretrained_model}`. 


In [None]:
path_to_training_dataset = "DATASETS/example_dataset"
path_to_pretrained_pkl_model = ""  # set this to the path of a saved .pkl checkpoint
output_folder = "EXPERIMENTS/training_output_continued"

NIR_loss_weight = 0.5

!bash ./docker_run.sh python training.py --NIR_loss_weight=$NIR_loss_weight --cfg="auto" --snap=20 --gpus=1 --mirror=1 --gpu_device_number=0 --batch=12 --data=$path_to_training_dataset --resume=$path_to_pretrained_pkl_model --outdir=$output_folder --cond=1
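
To fill in `path_to_pretrained_pkl_model`, the latest checkpoint from the previous run has to be located. The helper below is a hypothetical sketch: it assumes `training.py` keeps the StyleGAN2-ADA convention of writing `network-snapshot-<counter>.pkl` files into run subdirectories of the output folder, which may not hold for every setup.

```python
import re
from pathlib import Path

# Assumption: snapshots follow the StyleGAN2-ADA naming `network-snapshot-<counter>.pkl`.
SNAPSHOT_PATTERN = re.compile(r"^network-snapshot-(\d+)\.pkl$")


def find_latest_snapshot(output_folder):
    """Return the snapshot with the highest counter under output_folder, or None."""
    best = None
    # Search recursively, since each training run gets its own subdirectory.
    for path in Path(output_folder).rglob("network-snapshot-*.pkl"):
        match = SNAPSHOT_PATTERN.match(path.name)
        if match is None:
            continue
        # Ties on the counter are broken by path, so later run directories win.
        key = (int(match.group(1)), str(path))
        if best is None or key > best[0]:
            best = (key, path)
    return None if best is None else best[1]
```

For example, `find_latest_snapshot("EXPERIMENTS/training_output")` returns a `Path` that can be converted with `str(...)` and assigned to `path_to_pretrained_pkl_model`, or `None` if no snapshot was found.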


### Step 4. Generate synthetic recognition datasets:
To generate data using ArcBiFaceGAN, use the `generate_recognition_data.py` script as follows:

In [None]:
path_to_pretrained_pkl_model = ""  # set this to the .pkl file of the trained model
recognition_model = "Arcface_files/ArcFace_r100_ms1mv3_backbone.pth"
output_folder = "EXPERIMENTS/synthetic_output_example"
path_to_training_identity_features = "DATASETS/example_dataset/identity_features.json"
ids = 100
samples_per_id = 32
seed = 0
gpu_device_number = 0

!bash ./docker_run.sh python generate_recognition_data.py --gen_model=$path_to_pretrained_pkl_model --rec_model=$recognition_model --outdir=$output_folder --training_ids=$path_to_training_identity_features --ids=$ids --samples_per_id=$samples_per_id --seed=$seed --gpu_device_number=$gpu_device_number  

The script relies on the following arguments: 
* `--gen_model` should point to the `.pkl` file of the identity-conditioned StyleGAN2 model that was trained in the previous step
* `--rec_model` should point to the `.pth` file of the pretrained recognition model to be used for filtering
* `--training_ids` should point to the `.json` file of training identity features (i.e. identities of real-world subjects)
* `--outdir` determines the output directory
* `--ids` defines the number of synthetic identities to be generated
* `--samples_per_id` controls the number of samples to be generated per synthetic identity
* `--seed` determines which starting seed to use
* `--truncation` controls the truncation factor of the latent space (see the [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada-pytorch) documentation)
* `--gpu_device_number` determines which GPU device to use (e.g. `0` or `1`)
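
For readers unfamiliar with the truncation factor, the sketch below illustrates the standard StyleGAN truncation trick on plain Python lists: each latent vector is pulled toward the average latent. The function name `truncate` is illustrative only, and how `generate_recognition_data.py` applies `--truncation` internally is assumed to follow this scheme.

```python
def truncate(w, w_avg, truncation):
    """Interpolate a latent vector toward the average latent (truncation trick)."""
    return [avg + truncation * (value - avg) for value, avg in zip(w, w_avg)]


w = [2.0, -2.0]      # made-up latent vector
w_avg = [0.0, 1.0]   # made-up average latent

print(truncate(w, w_avg, 1.0))  # no truncation: the latent is unchanged
print(truncate(w, w_avg, 0.5))  # halfway between the latent and the average
print(truncate(w, w_avg, 0.0))  # full truncation: collapses to the average
```

Lower values generally trade sample diversity for image quality, so the setting affects how varied the samples of each synthetic identity are.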