Model Training ‐ Comparison ‐ [Epochs x Repeats]

Models | Logs | Graphs | Configs

One of the first questions, and one that every guide answers in its own way, is: "What's better: 100 epochs with 1 repeat, 50 epochs with 2 repeats, or 25 epochs with 4 repeats?" More generally, does the ratio of epochs to repeats affect the result if the total number of steps remains the same?

Compared values:

100x1 - B,
50x2,
20x5,
10x10.
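To make the "same total number of steps" premise concrete, here is a minimal sketch of the arithmetic. The formula (steps per epoch = images x repeats / batch size, rounded up) is the usual convention in repeat-based trainers, but it is an assumption here, as are the dataset size and batch size, which the comparison above does not state.

```python
import math


def total_steps(num_images: int, epochs: int, repeats: int, batch_size: int = 1) -> int:
    """Total optimizer steps, assuming each epoch sees every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Hypothetical values, chosen only for illustration.
NUM_IMAGES = 20
BATCH_SIZE = 1

# (epochs, repeats) pairs from the comparison above.
configs = [(100, 1), (50, 2), (20, 5), (10, 10)]

steps = {
    f"{epochs}x{repeats}": total_steps(NUM_IMAGES, epochs, repeats, BATCH_SIZE)
    for epochs, repeats in configs
}

for name, count in steps.items():
    print(f"{name}: {count} steps")
```

With a batch size of 1, every configuration gives the same step count. With larger batch sizes, the per-epoch rounding can make the totals differ slightly between configurations.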

DLR(step)

The peak DLR varies slightly between runs, but it shows no clear trend as the ratio changes.

Loss(step)

Because the number of epochs differs between runs while the total number of steps remains constant, we compare the runs using the loss(step) graph.

The graphs are almost identical, and the lowest ones differ only slightly, due to the time difference between training the models. The only notable observation is that with more epochs and fewer repeats the graph tends to be more irregular: at 10x10 it is very smooth, while at 100x1 it is much more jagged.

While many results may look a little different, the quality appears to be nearly identical across all of them.

CONCLUSION

One nuance is affected by this ratio, and we will discuss it a bit later. For now, however, there seems to be no difference, and the quality of the results depends more on randomness. The only advantage of having more epochs is that, if desired, you can choose from a larger number of models.

Next - Model Training ‐ Comparison - [Resolution]

