
CFG Parameters in the [net] section



  1. [net] section
    • batch=1 - number of samples (images, letters, ...) that will be processed in one batch

    • subdivisions=1 - number of mini_batches in one batch, mini_batch size = batch/subdivisions, so the GPU processes mini_batch samples at once, and the weights are updated once per batch samples (1 iteration processes batch images; see the sketch after this list)

    • width=416 - network size (width), so every image will be resized to the network size during Training and Detection

    • height=416 - network size (height), so every image will be resized to the network size during Training and Detection

    • channels=3 - network size (channels), so every image will be converted to this number of channels during Training and Detection

    • inputs=256 - network size (inputs) is used for non-image data: letters, prices, any custom data

    • max_chart_loss=20 - max value of Loss in the image chart.png
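
For illustration, a minimal sketch of the batch/subdivisions arithmetic described above (plain Python, not Darknet code; batch=64 and subdivisions=16 are just typical detector values, not the defaults listed here):

```python
# Sketch of the batch / subdivisions / mini_batch relationship.
batch = 64          # images processed per iteration (one weights update)
subdivisions = 16   # how many mini-batches one batch is split into

mini_batch = batch // subdivisions   # samples the GPU holds at once
print(mini_batch)   # 4  -> GPU processes 4 images at a time
print(batch)        # 64 -> weights are updated once per 64 images (1 iteration)
```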

For training only

  • Contrastive loss:

    • contrastive=1 - use Supervised contrastive loss for training Classifier (should be used with [contrastive] layer)

    • unsupervised=1 - use Unsupervised contrastive loss for training Classifier on images without labels (should be used with contrastive=1 parameter and with [contrastive] layer)

  • Data augmentation:

    • angle=0 - randomly rotates images by up to ±angle degrees during training (classification only)

    • saturation = 1.5 - randomly changes saturation of images during training

    • exposure = 1.5 - randomly changes exposure (brightness) during training

    • hue=.1 - randomly changes hue (color) during training https://en.wikipedia.org/wiki/HSL_and_HSV

    • blur=1 - blur is applied randomly 50% of the time: if blur=1, the background is blurred (objects are kept sharp) with blur_kernel=31; if blur>1, the whole image is blurred with blur_kernel=blur (detection only, and only if OpenCV is used)

    • min_crop=224 - minimum size of randomly cropped image (classification only)

    • max_crop=448 - maximum size of randomly cropped image (classification only)

    • aspect=.75 - aspect ratio can be changed during cropping from 0.75 to 1/0.75 (classification only)

    • letter_box=1 - keeps the aspect ratio of loaded images during training (detection training only; to use it during detection-inference, add the flag -letter_box at the end of the detection command)

    • cutmix=1 - use CutMix data augmentation (for Classifier only, not for Detector)

    • mosaic=1 - use Mosaic data augmentation (4 images in one)

    • mosaic_bound=1 - limits the size of objects when mosaic=1 is used (does not allow bounding boxes to leave the borders of their images when Mosaic-data-augmentation is used)

    • data augmentation in the last [yolo]-layer

      • jitter=0.3 - randomly changes the size of the image and its aspect ratio from x(1 - 2*jitter) to x(1 + 2*jitter) (illustrated in the sketch after this list)
      • random=1 - randomly resizes the network size every 10 batches (iterations) from x(1/1.4) to x1.4, keeping the initial aspect ratio of the network size
    • adversarial_lr=1.0 - changes all detected objects to make them unlike themselves from the neural network's point of view (the network performs an adversarial attack on itself)

    • attention=1 - shows points of attention during training

    • gaussian_noise=1 - adds Gaussian noise
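
As a sketch of the jitter range stated above (plain Python, not Darknet's actual sampling code): for jitter=0.3 the random scale factor falls between 0.4 and 1.6:

```python
# Sketch of the jitter range x(1 - 2*jitter) .. x(1 + 2*jitter).
import random

jitter = 0.3
low, high = 1 - 2 * jitter, 1 + 2 * jitter   # 0.4 .. 1.6 for jitter=0.3
scale = random.uniform(low, high)            # random size/aspect-ratio factor
print(low, high, scale)
```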

  • Optimizer:

    • momentum=0.9 - momentum of the optimizer: how much the history of previous updates affects the current change of the weights

    • decay=0.0005 - weight decay: a weaker updating of the weights for typical features, which reduces the effect of imbalance in the dataset (optimizer) http://cs231n.github.io/neural-networks-3/

    • learning_rate=0.001 - initial learning rate for training

    • burn_in=1000 - warm-up: for the first 1000 iterations the learning rate ramps up as current_learning_rate = learning_rate * pow(iterations / burn_in, power) = 0.001 * pow(iterations/1000, 4), where power=4 by default (see the sketch after this list)

    • max_batches = 500200 - training will run for this number of iterations (batches)

    • policy=steps - policy for changing the learning rate: constant (by default), sgdr, steps, step, sig, exp, poly, random (e.g., if policy=random, the current learning rate is set to learning_rate * pow(rand_uniform(0,1), power))

    • power=4 - if policy=poly - the learning rate will be learning_rate * pow(1 - current_iteration / max_batches, power)

    • sgdr_cycle=1000 - if policy=sgdr - the initial number of iterations in cosine-cycle

    • sgdr_mult=2 - if policy=sgdr - multiplier for cosine-cycle https://towardsdatascience.com/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163

    • steps=8000,9000,12000 - if policy=steps - at these numbers of iterations the learning rate will be multiplied by the corresponding scales factor

    • scales=.1,.1,.1 - if policy=steps - e.g. if steps=8000,9000,12000, scales=.1,.1,.1 and the current iteration number is 10000, then current_learning_rate = learning_rate * scales[0] * scales[1] = 0.001 * 0.1 * 0.1 = 0.00001

    • label_smooth_eps=0.1 - use label smoothing for training Classifier
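
The learning-rate formulas above can be checked with a minimal Python sketch; it implements only the burn_in, steps, and poly rules exactly as stated on this page (Darknet's actual scheduler supports more policies and details):

```python
# Sketch of the learning-rate schedule formulas from this page.
def current_lr(iteration, learning_rate=0.001, burn_in=1000, power=4,
               policy="steps", steps=(8000, 9000, 12000), scales=(0.1, 0.1, 0.1),
               max_batches=500200):
    if iteration < burn_in:                  # warm-up phase
        return learning_rate * (iteration / burn_in) ** power
    if policy == "steps":                    # multiply by scales[i] after steps[i]
        lr = learning_rate
        for step, scale in zip(steps, scales):
            if iteration >= step:
                lr *= scale
        return lr
    if policy == "poly":                     # polynomial decay
        return learning_rate * (1 - iteration / max_batches) ** power
    return learning_rate                     # constant fallback

print(current_lr(500))    # 0.001 * (500/1000)**4 = 6.25e-05 (burn_in phase)
print(current_lr(10000))  # 0.001 * 0.1 * 0.1 = 1e-05 (steps 8000 and 9000 passed)
```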

For training Recurrent networks:

  • Object Detection/Tracking on Video - if [conv-lstm] or [crnn] layers are used in addition to [connected] and [convolutional] layers

  • Text generation - if [lstm] or [rnn] layers are used in addition to [connected] layers

    • track=1 - if set to 1, training will be performed in recurrent style on image sequences

    • time_steps=16 - training will be performed on a random image sequence that contains 16 images from the train.txt file

      • for [convolutional]-layers: mini_batch = time_steps*batch/subdivisions
      • for [conv_lstm]-recurrent-layers: mini_batch = batch/subdivisions and sequence=16
    • augment_speed=3 - if set to 3, then every 1st, 2nd, or 3rd image can be used randomly, i.e. 16 images with indexes 0, 1, 2, ..., 15 or 110, 113, 116, ..., 155 from the train.txt file

    • sequential_subdivisions=8 - a lower value increases the sequence of images: if time_steps=16, batch=16 and sequential_subdivisions=8, then time_steps*batch/sequential_subdivisions = 16*16/8 = 32 sequential images will be loaded with the same data augmentation, so the model will be trained on sequences of 32 video frames (see the sketch below)

    • seq_scales=0.5,0.5 - increases the sequence of images at certain steps, i.e. the coefficients by which the original sequential_subdivisions value will be multiplied (and batch will be divided, so the weights will be updated less frequently) at the corresponding steps, if policy=steps or policy=sgdr is used
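
A minimal sketch (plain Python, not Darknet code) of the sequence arithmetic described above, reproducing the 16*16/8 = 32 example and the augment_speed index pattern:

```python
# Sketch of the sequence-length and frame-index arithmetic from this page.
time_steps = 16
batch = 16
sequential_subdivisions = 8

# Number of sequential frames loaded with the same data augmentation:
sequence_len = time_steps * batch // sequential_subdivisions
print(sequence_len)   # 16*16/8 = 32 video frames per training sequence

# augment_speed=3: a sequence may use every 1st, 2nd, or 3rd frame.
start, stride = 110, 3   # stride chosen randomly from 1..augment_speed
indexes = [start + i * stride for i in range(time_steps)]
print(indexes)           # [110, 113, 116, ..., 155]
```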