A dataset (or data set) is a collection of data used for a machine learning training job.
Machine learning typically works with three datasets:
- Training dataset
  The dataset used to train the model. The model learns its weights and other parameters from this data.
- Validation dataset
  The validation dataset is used to evaluate a model during the training process. It helps machine learning engineers tune hyperparameters during model development. The model does not learn from the validation dataset, and a validation dataset is optional.
- Test dataset
  The test dataset provides the gold standard used to evaluate the model. It is used only after the model is completely trained.
See Jason Brownlee’s article for more detail.
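The three-way split described above can be illustrated with a minimal sketch in plain Java. It is not DJL's API: the `DatasetSplit` class, its `split` method, and the 80/10/10 ratios are hypothetical names and values chosen for illustration. The sketch shuffles sample indices with a fixed seed and cuts them into three disjoint subsets, so no sample is shared between the training, validation, and test datasets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of DJL): splits sample indices into three disjoint subsets. */
final class DatasetSplit {

    final List<Integer> train;
    final List<Integer> validation;
    final List<Integer> test;

    private DatasetSplit(List<Integer> train, List<Integer> validation, List<Integer> test) {
        this.train = train;
        this.validation = validation;
        this.test = test;
    }

    /**
     * Shuffles the indices 0..size-1 and assigns them to train, validation, and test.
     * Whatever is left after the train and validation shares goes to the test subset.
     */
    static DatasetSplit split(int size, double trainRatio, double validationRatio, long seed) {
        if (size < 0 || trainRatio < 0 || validationRatio < 0
                || trainRatio + validationRatio > 1.0) {
            throw new IllegalArgumentException("invalid size or ratios");
        }

        List<Integer> indices = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        // A fixed seed makes the split reproducible across runs.
        Collections.shuffle(indices, new Random(seed));

        // Clamp the boundaries so rounding can never push them past the end.
        int trainEnd = Math.min(size, (int) Math.floor(size * trainRatio));
        int validationEnd =
                Math.min(size, trainEnd + (int) Math.floor(size * validationRatio));

        // Copy the sublists so each subset is independent of the shuffled list.
        return new DatasetSplit(
                new ArrayList<>(indices.subList(0, trainEnd)),
                new ArrayList<>(indices.subList(trainEnd, validationEnd)),
                new ArrayList<>(indices.subList(validationEnd, size)));
    }

    public static void main(String[] args) {
        // Assumed ratios for illustration: 80% train, 10% validation, 10% test.
        DatasetSplit s = DatasetSplit.split(100, 0.8, 0.1, 42L);
        System.out.println("train=" + s.train.size()
                + " validation=" + s.validation.size()
                + " test=" + s.test.size());
    }
}
```

Setting `validationRatio` to `0` yields an empty validation subset, which matches the point above that a validation dataset is optional. The test indices should be set aside and used only after training is complete.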
DJL provides a number of built-in basic and standard datasets. These datasets are used to train deep learning models.