Machine Learning CONFIGuration, mlconfig file

mlconfig file

This section briefly describes the structure of the mlconfig file and how to create it from scratch. The mlconfig file is the central part of the ANNdotNET tool, since it is the basic file required to run deep learning with ANNdotNET.

To write a proper mlconfig file, the user should define 8 keywords:

configid: - unique identifier of the mlconfig,
metadata: - meta information about the data set,
features: - defines the features for the model,
labels: - defines the labels for the model,
network: - defines the neural network model to be trained,
learning: - defines the learning parameters,
training: - defines the training parameters,
paths: - defines the paths to files needed during training and evaluation.

Each keyword consists of several parameter names (pn) and parameter values (pv), which must be specified on one line. If a keyword spans more than one line, the mlconfig file will not be parsed correctly. An mlconfig keyword defines its set of parameters by listing parameter names, each followed by its parameter values.

So every keyword is written using one of the following templates: [mlkeyword]:[pv] or [mlkeyword]:[pn]:[pv]

When there is more than one parameter, '|' is used as the parameter separator. For example:

[mlkeyword]:|[pn1]:[pv1]|[pn2]:[pv2]

A parameter name (pn) may have one or more parameter values (pv). Parameter values are separated by a space ' ' or a semicolon ';'. For example:

[mlkeyword]:|[pn1]:[pv1] [pv2] or [mlkeyword]:|[pn1]:[pv1];[pv2]

Semicolons and spaces can be mixed when a single parameter value consists of several parts. For example, an input feature can be a vector, a matrix, or a tensor, so its dimension may be given by several numbers. The following is correct mlconfig syntax:

[mlkeyword]:|[pn1]:[pv11;pv12] [pv2] [pv3]

This syntax indicates that the first parameter value consists of two numbers (pv11 and pv12), while the second and third parameter values are single values.
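The splitting rules above can be illustrated with a small Python sketch. It is not ANNdotNET's own parser (ANNdotNET is written in C#); `Param`, `split_values` and `parse_line` are hypothetical names, and treating a segment without ':' as "first token is the name" (as in the `features:` lines later in this section) is an assumption drawn from the examples.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Param:
    name: str | None          # None for the single-value form keyword:value
    values: list[list[str]]   # one entry per space-separated value, split on ';'


def split_values(text: str) -> list[list[str]]:
    """A space separates values; ';' separates the parts of one compound value."""
    # Empty parts (e.g. from a trailing ';') are dropped.
    return [[part for part in token.split(";") if part] for token in text.split()]


def parse_line(line: str) -> tuple[str, list[Param]]:
    """Split one mlconfig line into its keyword and parameters (illustrative sketch)."""
    keyword, sep, rest = line.strip().partition(":")
    if not sep:
        raise ValueError(f"missing ':' after the keyword in {line!r}")
    if not rest.startswith("|"):
        # Single-value form: [mlkeyword]:[pv]
        return keyword, [Param(None, split_values(rest))]
    params = []
    for segment in rest.split("|")[1:]:
        segment = segment.strip()
        if not segment:
            continue
        name, has_colon, body = segment.partition(":")
        if not has_colon:
            # Assumed form without ':' (as in features:/labels:): first token is the name
            name, _, body = segment.partition(" ")
        params.append(Param(name, split_values(body)))
    return keyword, params


keyword, params = parse_line("mlkeyword:|pn1:pv11;pv12 pv2 pv3")
print(keyword, params[0].name, params[0].values)
# mlkeyword pn1 [['pv11', 'pv12'], ['pv2'], ['pv3']]
```

Splitting only on the first ':' of each segment keeps any later colons inside the value intact.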

Making comments in the `mlconfig` file

The mlconfig file may contain as many empty lines as you like. To add a comment, start the line with an exclamation mark ‘!’. The order of the keywords is irrelevant. Every keyword must be defined on a single line.
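The rules for empty lines, comments and keyword order can be sketched as a reader that collects one entry per keyword. This is a hypothetical helper, not part of ANNdotNET; stripping leading whitespace before checking for '!' and requiring the keyword to be a plain identifier are assumptions made for the sketch.

```python
from __future__ import annotations


def read_mlconfig(text: str) -> dict[str, str]:
    """Map each keyword to the rest of its line (hypothetical helper)."""
    entries: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("!"):
            continue  # empty lines and '!' comments are ignored
        keyword, sep, rest = line.partition(":")
        if not sep or not keyword.isidentifier():
            # Lines that do not start with 'keyword:' are rejected, for example
            # the continuation of a keyword split across two lines.
            raise ValueError(f"line without a 'keyword:' prefix: {line!r}")
        entries[keyword] = rest  # a dict makes the keyword order irrelevant
    return entries


sample = """!Iris mlconfig file

labels:|species 3 0
!Information about features
features:|NumFeatures 4 0 |Product 10 0
"""
print(read_mlconfig(sample))
# {'labels': '|species 3 0', 'features': '|NumFeatures 4 0 |Product 10 0'}
```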

For example, the following content represents a typical mlconfig file:

!ANNdotNET v1.5
!Iris mlconfig file iris.mlconfig
!configid represents the unique identifier of the configuration
modelid:33fe0968-d640-4b53-97dc-982dcf2b1cad

!metadata contains information about the data set.
metadata:|Column01:sepal_length;Numeric;Feature;Ignore;|Column02:sepal_width;Numeric;Feature;Ignore;|Column03:petal_length;Numeric;Feature;Ignore;|Column04:petal_width;Numeric;Feature;Ignore;|Column05:species;Category;Label;Ignore;setosa;versicolor;virginica

!Information about features. The line contains
! two groups of features: NumFeatures and Product feature

features:|NumFeatures 4 0 |Product 10 0

!Information about label
labels:|species 3 0

!Network configuration
network:|Layer:Normalization 0 0 0 None 0 0 |Layer:Dense 5 0 0 ReLU 0 0 |Layer:Dense 3 0 0 Softmax 0 0

!Learning parameter information
learning:|Type:SGDLearner |LRate:0.01 |Momentum:0.9 |Loss:CrossEntropyWithSoftmax |Eval:ClassificationAccuracy|L1:0|L2:0

!Training parameters information
training:|Type:default |BatchSize:65 |Epochs:1000 |Normalization:0 |RandomizeBatch:False |SaveWhileTraining:1 |ProgressFrequency:50 |ContinueTraining:0 |TrainedModel:models model_at_952of1000_epochs_TimeSpan_636720117054117391

!Components of the mlconfig paths
paths:|Training:data\\mldataset_train |Validation:data\\mldataset_valid |Test:data\\mldataset_valid |TempModels:temp_models |Models:models |Result:FFModel_result.csv |Logs:log

configid

Each mlconfig file has a unique identifier (GUID). The GUID value is generated automatically and serves as the identifier for several components used by the ANNdotNET MLEngine. When the mlconfig file is created manually, it can be any string value. The syntax for the keyword is: modelid:[guid_string]

metadata:

The metadata keyword contains meta information about the data set. The information is arranged as a list of columns, where each column contains: name, type, kind of variable, missing value type and, for a categorical variable, its class values.

metadata:|Column01:[name];[type];[kind];[missingtype] |Column02:[name];[type];[kind];[missingtype];[cv1];[cv2];[cv3]|Column03:....

Values cv1, cv2, cv3 are the class values of the categorical variable in the second column.

Note: The number of columns must be the same as the number of dimensions of features and labels.
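The column layout can be read with a short Python sketch that turns the part after `metadata:` into one record per column. The function name and the record keys are hypothetical; dropping the empty entry produced by a trailing ';' follows the iris example, where feature columns end with ';'.

```python
from __future__ import annotations


def parse_metadata(rest: str) -> list[dict]:
    """Parse the part after 'metadata:' into one record per column (sketch)."""
    columns = []
    for segment in rest.split("|"):
        segment = segment.strip()
        if not segment:
            continue
        column_id, _, body = segment.partition(":")
        parts = [p for p in body.split(";") if p]  # drop the empty tail after a trailing ';'
        if len(parts) < 4:
            raise ValueError(f"{column_id}: expected name;type;kind;missingtype")
        name, vtype, kind, missing = parts[:4]
        columns.append({"id": column_id, "name": name, "type": vtype, "kind": kind,
                        "missing": missing, "classes": parts[4:]})  # class values, if any
    return columns


# Two of the iris columns from the example file
iris = ("|Column01:sepal_length;Numeric;Feature;Ignore;"
        "|Column05:species;Category;Label;Ignore;setosa;versicolor;virginica")
for column in parse_metadata(iris):
    print(column["name"], column["kind"], column["classes"])
```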

features: and labels:

When describing features and labels in the mlconfig file, the following signatures should be applied:

features:|[featurename1] [dim] [isSparsedata] |[featurename2] [dim1;dim2] [isSparsedata] ...
labels:|[labelname1] [dim] [isSparsedata] |[labelname2] [dim1;dim2;dim3] [isSparsedata] ...

Each feature and label is defined by 3 parameters:

name - name of the variable. The name must be the same as the name in the data set file.
dimension - dimension of the variable.
issparsedata - indicates whether the data is written in sparse format (1) or not (0).

So, for the iris data set, the features and labels are defined as:

features:|iris_measures 1;4 0  
labels:|species 3 0

This means the iris data set has 4 features grouped under the name iris_measures, with dimensions 1x4, and the data is not written in sparse format. The labels keyword defines species as a one-hot encoded vector of 3 classes, also not sparse.
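A minimal Python sketch of reading a `features:` or `labels:` definition into (name, shape, is_sparse) triples, assuming the three-token form described above. `parse_variables` is a hypothetical name and it expects only the part after the keyword.

```python
from __future__ import annotations


def parse_variables(rest: str) -> list[tuple[str, tuple[int, ...], bool]]:
    """Parse the part after 'features:' or 'labels:' into (name, shape, is_sparse)."""
    variables = []
    for segment in rest.split("|"):
        tokens = segment.split()
        if not tokens:
            continue
        if len(tokens) != 3:
            raise ValueError(f"expected 'name dimension issparsedata', got {segment.strip()!r}")
        name, dims, sparse = tokens
        shape = tuple(int(d) for d in dims.split(";"))  # '1;4' -> (1, 4)
        variables.append((name, shape, sparse == "1"))
    return variables


print(parse_variables("|iris_measures 1;4 0"))  # [('iris_measures', (1, 4), False)]
print(parse_variables("|species 3 0"))          # [('species', (3,), False)]
```

Checking for exactly three tokens catches a definition with a stray extra value early instead of misreading it.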

Consider the following definition of features and labels:

features:|year 3 1 |month 12 1|shop 52 1|item 5100 1|cnt_past3m 3 0
labels:|item_cnt_month 1 0

As can be seen, this example has 5 different groups of features, described as follows:

feature name: year, with 3 dimensions, and it is sparse data (1).
feature name: month, with 12 dimensions, and it is sparse data (1).
feature name: shop, with 52 dimensions, and it is sparse data (1).
feature name: item, with 5100 dimensions, and it is sparse data (1).
feature name: cnt_past3m, with 3 dimensions, and it is not (0) sparse.
label name: item_cnt_month, with 1 dimension, and it is not (0) sparse.

The definition of features and labels is closely related to how the mlready data set is generated. For the above definition, one row of the corresponding mlready data set looks like this:

|year 2:1 |item 3906:1 |cnt_past3m 0.02696543 0.02696543 0.02696543|item_cnt_month 0.02696543

The row contains 3 groups of features and one group of labels. Two feature groups (year and item) are categorical and written in sparse index:value form, and one (cnt_past3m) is numeric. The label is numeric as well, since it has only one dimension. It is recommended to review other examples to see how features and labels are defined.
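The row layout resembles CNTK Text Format (CTF): each group starts with `|name`, sparse groups list `index:value` pairs and dense groups list plain numbers. The sketch below rebuilds the example row; `ctf_row` is a hypothetical helper, and it assumes every sparse group is one-hot (a single `index:1` entry), as in the example.

```python
from __future__ import annotations


def ctf_row(groups: list[tuple]) -> str:
    """Build one row from (name, dim, is_sparse, value) tuples.

    Sparse groups take a single one-hot index and are written as 'index:1';
    dense groups take a list of exactly `dim` numbers.
    """
    parts = []
    for name, dim, is_sparse, value in groups:
        if is_sparse:
            if not 0 <= value < dim:
                raise ValueError(f"{name}: index {value} outside 0..{dim - 1}")
            parts.append(f"|{name} {value}:1")
        else:
            if len(value) != dim:
                raise ValueError(f"{name}: expected {dim} values, got {len(value)}")
            parts.append(f"|{name} " + " ".join(str(v) for v in value))
    return " ".join(parts)


row = ctf_row([
    ("year", 3, True, 2),
    ("item", 5100, True, 3906),
    ("cnt_past3m", 3, False, [0.02696543] * 3),
    ("item_cnt_month", 1, False, [0.02696543]),
])
print(row)
```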

Now assume the mlconfig file describes image classification. In this case the features and labels come from a map file containing image paths and label indices.
In the mlconfig file the features are defined as:

features:|features 3;98;98;1 0   
labels:|labels 2 0

The image feature is described by 4 numbers (channel;height;width;augmentation). Here the image is 3x98x98 (3 channels, 98 height and 98 width). The last of the semicolon-separated values indicates whether image data augmentation is used:

1- means augmentation is enabled, while
0- means augmentation is disabled.

The last parameter value 0 means the data is not sparse.
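A short Python sketch of interpreting the four semicolon-separated image values. `parse_image_dims` is a hypothetical name, and the `input_size` entry (channels x height x width) is added only for illustration.

```python
def parse_image_dims(dims: str) -> dict:
    """Interpret 'channel;height;width;augmentation' from an image feature."""
    channels, height, width, augmentation = (int(x) for x in dims.split(";"))
    if augmentation not in (0, 1):
        raise ValueError("augmentation flag must be 0 or 1")
    return {"channels": channels, "height": height, "width": width,
            "augmentation": augmentation == 1,
            "input_size": channels * height * width}


print(parse_image_dims("3;98;98;1"))
# {'channels': 3, 'height': 98, 'width': 98, 'augmentation': True, 'input_size': 28812}
```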

network:

The network: keyword defines the network model. It can be of two types:

default - all network parameters are provided in the mlconfig file,
custom - the network model is provided as a C# method in an extension project.

To define a custom network model, the following line must be used:

network:|Layer:Custom

So, when the first layer is of Custom type, the API tries to call the delegate method passed as the last argument of MachineLearning.Run. If the custom model implementation is not provided, an exception is thrown with a message describing the problem.

For the default network type, various network models can be defined. The network keyword has the following signature:

network:|Layer:[Type] [Param1] [Param2] [Param3] [FParam] [BParam1] [BParam2] |Layer:[Type] [Param1] [Param2] [Param3] [FParam] [BParam1] [BParam2]|Layer:...

As can be seen, the network consists of layers. Each layer has 7 parameters that may be defined. You can have as many layers as you like, which means you can build a network of arbitrary size.

The Type parameter can have the following values:

Normalization - implements a normalization layer,
Scale - implements a scaling layer: each feature is multiplied by a scalar value. This layer is handy when modelling time series, in order to normalize data,
Dense - implements a dense layer,
Embedding - implements an embedding layer,
DropOut - implements a dropout layer,
LSTM - implements a Long Short-Term Memory layer,
NALU - implements a Neural Arithmetic Logic Unit layer,
Conv1D - 1D convolution layer,
Conv2D - 2D convolution layer,
Polling1D - 1D pooling layer,
Polling2D - 2D pooling layer,
CudaStackedLSTM - implementation of the CNTK stacked LSTM recurrent layer,
CudaStackedGRU - implementation of the CNTK stacked GRU recurrent layer.

The layer parameters can be summarized as follows, in the order they appear in the signature:

Type [layertype] - name of the layer type.
Param1 [number] - the first numeric parameter of the layer. Its meaning depends on the layer.
Param2 [number] - the second numeric parameter of the layer.
Param3 [number] - the third numeric parameter of the layer.
FParam [name] - name of the function used by the layer (activation, arithmetic, or other function).
BParam1 [0,1] - first Boolean parameter.
BParam2 [0,1] - second Boolean parameter.
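The seven-value layer form can be parsed into named fields with a Python sketch like the one below. `Layer` and `parse_network` are hypothetical names; the sketch assumes the numeric parameters are integers (as in every example here), expects only the part after `network:`, and does not handle the single-value `Layer:Custom` form.

```python
from __future__ import annotations

from typing import NamedTuple


class Layer(NamedTuple):
    type: str      # e.g. Dense, LSTM, Drop
    param1: int
    param2: int
    param3: int
    fparam: str    # function name, e.g. ReLU, TanH, None
    bparam1: bool
    bparam2: bool


def parse_network(rest: str) -> list[Layer]:
    """Parse the part after 'network:' (default network type only)."""
    layers = []
    for segment in rest.split("|"):
        segment = segment.strip()
        if not segment:
            continue
        _, _, body = segment.partition(":")  # drop the 'Layer' parameter name
        tokens = body.split()
        if len(tokens) != 7:
            raise ValueError(f"expected 7 layer values, got {body!r}")
        t, p1, p2, p3, f, b1, b2 = tokens
        layers.append(Layer(t, int(p1), int(p2), int(p3), f, b1 == "1", b2 == "1"))
    return layers


for layer in parse_network("|Layer:LSTM 240 240 0 TanH 1 1 |Layer:Dense 20 0 0 TanH 0 0"):
    print(layer.type, layer.param1, layer.fparam)
```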

A schematic picture of a sample network model, presented as a list of layers, is shown below:

The last layer in the sequence must be the output layer.

Example of bike sharing network model

The Bike Sharing example can be found and opened from the Start Page of the ANNdotNET GUI Tool. The graphical representation of the model is shown in the image above. The following line defines the network model in the mlconfig file:

network:|Layer:Normalization 0 0 0 None 0 0 |Layer:Embedding 10 0 0 None 0 0 |Layer:LSTM 240 240 0 TanH 1 1 |Layer:Drop 0 0 20 None 0 0 |Layer:Dense 20 0 0 TanH 0 0 |Layer:Dense 1 0 0 None 0 0

The model is an EmbeddedLSTM network consisting of several different layers. Since we are dealing with numerical data, the first layer is Normalization, which normalizes the numerical feature values. Only numerical values are normalized by this layer; categorical features remain unchanged. Next, an Embedding layer is added to reduce the number of features: the 40 input features are reduced to 10. Then an LSTM layer with 240 output dimensions is added, followed by dropout and dense layers. Since the label has dimension 1, the last layer in the network must also have output dimension 1. The last layer has no activation function, since the output can take any value.
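The requirement that the last layer match the label dimension can be checked with a small Python sketch. `output_matches_label` is a hypothetical helper, and it assumes (as in the Dense layers of the examples) that Param1 of the last layer is its output dimension.

```python
def output_matches_label(network: str, label_dim: int) -> bool:
    """Assumes Param1 of the last layer is its output dimension, as in the examples."""
    segments = [s.strip() for s in network.split("|") if s.strip()]
    _, _, body = segments[-1].partition(":")  # drop the 'Layer' name
    tokens = body.split()  # [Type, Param1, Param2, Param3, FParam, BParam1, BParam2]
    return int(tokens[1]) == label_dim


bike_net = ("network:|Layer:Normalization 0 0 0 None 0 0 |Layer:Embedding 10 0 0 None 0 0 "
            "|Layer:LSTM 240 240 0 TanH 1 1 |Layer:Drop 0 0 20 None 0 0 "
            "|Layer:Dense 20 0 0 TanH 0 0 |Layer:Dense 1 0 0 None 0 0")
print(output_matches_label(bike_net, 1))  # True: one-dimensional label
```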

learning:

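The learning: keyword defines the learning parameters. The signature of the keyword is the following:

learning:|Type:[learner] |LRate:[rate] |Momentum:[momentum] |Loss:[lossfunction] |Eval:[evalfunction]

The learning keyword has 5 parameters: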

Type: - defines the learner type. Currently supported learners: SGDLearner, MomentumSGDLearner, FSAdaGradLearner, AdaGradLearner, AdamLearner.
LRate: - the learning rate, a real value. Example (|LRate:0.01)
Momentum: - defines the momentum for the learner. Example (|Momentum:0.9)
Loss: - defines the loss function for the learner. Supported loss functions: SquaredError, BinaryCrossEntropy, CrossEntropyWithSoftmax. Example (|Loss:SquaredError)
Eval: - defines the function used to evaluate the model during testing and evaluation. Supported functions: SquaredError, ClassificationError, ClassificationAccuracy, RMSError, MSError.
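A Python sketch that reads the `learning:` parameters and checks them against the supported names listed above. `parse_learning` is a hypothetical helper; extra parameters such as `L1` and `L2` (which appear in the iris file) are kept as raw strings because their format is not described here.

```python
from __future__ import annotations

# Supported values as listed in this section
LEARNERS = {"SGDLearner", "MomentumSGDLearner", "FSAdaGradLearner",
            "AdaGradLearner", "AdamLearner"}
LOSSES = {"SquaredError", "BinaryCrossEntropy", "CrossEntropyWithSoftmax"}
EVALS = {"SquaredError", "ClassificationError", "ClassificationAccuracy",
         "RMSError", "MSError"}


def parse_learning(rest: str) -> dict[str, object]:
    """Parse the part after 'learning:' and check it against the supported names."""
    params: dict[str, object] = {}
    for segment in rest.split("|"):
        segment = segment.strip()
        if segment:
            name, _, value = segment.partition(":")
            params[name] = value  # unknown parameters stay as strings
    for key, allowed in (("Type", LEARNERS), ("Loss", LOSSES), ("Eval", EVALS)):
        if params.get(key) not in allowed:
            raise ValueError(f"unsupported or missing {key}: {params.get(key)!r}")
    for key in ("LRate", "Momentum"):
        if key in params:
            params[key] = float(params[key])  # real-valued parameters
    return params


print(parse_learning("|Type:AdaGradLearner |LRate:0.01 |Momentum:0.9 "
                     "|Loss:SquaredError |Eval:SquaredError"))
```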

The following supported loss and evaluation functions are defined directly in the CNTK library:

SquaredError - used for regression models,
ClassificationError - used for classification problems,
BinaryCrossEntropy - computes the binary cross entropy (aka logistic loss) between the output and the target,
CrossEntropyWithSoftmax - computes the cross entropy between the target vector and the softmax of the output vector.

Several additional custom functions are implemented, and they can be used in training models:

RMSError - root mean square error,
MSError - mean squared error.
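The two custom functions follow the standard definitions of mean squared error and root mean square error, sketched below in plain Python. This shows only the formulas; how ANNdotNET applies them over minibatches is not described here, and the function names are hypothetical.

```python
import math


def ms_error(predicted, target):
    """Mean squared error: the average of the squared differences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def rms_error(predicted, target):
    """Root mean square error: the square root of the mean squared error."""
    return math.sqrt(ms_error(predicted, target))


print(ms_error([0.0, 0.0], [2.0, 2.0]), rms_error([0.0, 0.0], [2.0, 2.0]))  # 4.0 2.0
```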

Example of learning parameters

The learning parameters for the Bike Sharing example in the ANNdotNET GUI tool look like this:

The learner type is AdaGradLearner, with a learning rate of 0.01 and momentum of 0.9. Both the loss function and the evaluation function are SquaredError.

training:

The first parameter, Type:, indicates whether the default or a custom-implemented minibatch source is used. Possible values: default, custom.
BatchSize: - defines the batch size for the trainer. Possible values range from 1 to the size of the training data set. Example (|BatchSize:125)
Epochs: - defines the number of passes the trainer makes over all samples of the training data set. Example (|Epochs:12)
Normalization: - defines whether the network model contains the Normalization layer, which normalizes the data. The Normalization layer is described in the network: section.
SaveWhileTraining: - indicates whether the MLFactory saves models during the training process. This option should be used when we expect the model may overfit during training.
RandomizeBatch: - indicates whether batches are generated randomly during training.
ProgressFrequency: - defines how often training progress is reported to the caller.
TrainedModel: - once the model is trained, this parameter holds the path of the trained model.
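A Python sketch that reads the `training:` parameters and converts the numeric and Boolean ones. `parse_training` and `to_bool` are hypothetical helpers; accepting both `0/1` and `False/True` follows the iris file, which uses both spellings. `Normalization` and `ContinueTraining` are left as raw strings because their exact value format is not specified in this section.

```python
from __future__ import annotations


def to_bool(text: str) -> bool:
    """Accept both 0/1 and False/True, since the examples use both spellings."""
    lowered = text.strip().lower()
    if lowered in ("1", "true"):
        return True
    if lowered in ("0", "false"):
        return False
    raise ValueError(f"not a Boolean value: {text!r}")


def parse_training(rest: str) -> dict[str, object]:
    """Parse the part after 'training:' (illustrative sketch)."""
    params: dict[str, object] = {}
    for segment in rest.split("|"):
        segment = segment.strip()
        if segment:
            name, _, value = segment.partition(":")
            params[name] = value
    if params.get("Type") not in ("default", "custom"):
        raise ValueError(f"Type must be default or custom, got {params.get('Type')!r}")
    for key in ("BatchSize", "Epochs", "ProgressFrequency"):
        if key in params:
            params[key] = int(params[key])
    for key in ("RandomizeBatch", "SaveWhileTraining"):
        if key in params:
            params[key] = to_bool(params[key])
    if params.get("BatchSize", 1) < 1:
        raise ValueError("BatchSize must be at least 1")
    return params


print(parse_training("|Type:default |BatchSize:500 |Epochs:12 |RandomizeBatch:0 |ProgressFrequency:2"))
```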

A simple example of the training keyword, as defined in the Solar Production example:

The training process uses the default minibatch source with a batch size of 500, no batch randomization, no normalization, saving models during training, and reporting progress every 2 epochs. The following image shows the training parameters in the GUI tool:

paths:

The last keyword is paths, which contains the file paths the mlconfig file needs in order to work correctly.

Training: - path of the training data set file,
Validation: - path of the validation data set file,
Test: - path of the test data set file used for evaluation,
TempModels: - path where models are stored during the training process,
Models: - path where the best model is stored once the training process finishes,
Result: - full path of the file where the results of the evaluation process are stored,
Logs: - path of the log folder, where log files are stored.

These parameters are self-explanatory; an example of their use in the Solar Production example is shown below:

As can be seen, the paths keyword defines the paths of the training, validation and test data sets, the locations for storing models during training and at the final stage, and the path where the results are stored when the model is evaluated.
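A Python sketch that reads the `paths:` entries and reports which of the seven listed entries are missing. `parse_paths` is a hypothetical helper; splitting each entry on its first ':' only is a design choice that would keep a Windows drive letter such as `C:\` inside the value.

```python
from __future__ import annotations

# The seven path entries listed in this section
EXPECTED_PATHS = ("Training", "Validation", "Test", "TempModels",
                  "Models", "Result", "Logs")


def parse_paths(rest: str) -> tuple[dict[str, str], list[str]]:
    """Return the path entries and the names of expected entries that are missing."""
    paths: dict[str, str] = {}
    for segment in rest.split("|"):
        segment = segment.strip()
        if segment:
            name, _, value = segment.partition(":")  # split on the first ':' only
            paths[name] = value
    missing = [name for name in EXPECTED_PATHS if name not in paths]
    return paths, missing


paths, missing = parse_paths(r"|Training:data\mldataset_train |Test:data\mldataset_valid")
print(missing)  # ['Validation', 'TempModels', 'Models', 'Result', 'Logs']
```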

Other information about mlconfig file

The mlconfig file can contain comments, which are useful for explaining parameters and options. Besides comments, the separators are an important part of the mlconfig file.

There are four kinds of separators:

: - colon (double point),
| - vertical line,
' ' - space,
; - semicolon.

The colon : separates the keyword from its parameters, as well as parameter names from their values. The vertical line | separates parameters within a keyword. The space ' ' separates parameter values. The semicolon ; separates the elements of a list within a single parameter value.

So, let’s see the following example:

network:|Layer:LSTM 240 240 0 TanH 1 1 |Layer:Dense 20 0 0 TanH 0 0

The network keyword is separated from its parameters (|Layer...) by ':'. Then '|' separates the individual network layers (|Layer:LSTM 240 240 0 TanH 1 1 and |Layer:Dense 20 0 0 TanH 0 0). Each layer's parameter values are separated by spaces. If a parameter value is a list, its elements are separated by semicolons.

Note: A space is not allowed after a colon (double point).
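The rule can be enforced with a one-line check before a file is loaded. This is a hypothetical lint helper, not an ANNdotNET feature; it simply reports the position of every ':' that is followed by whitespace.

```python
from __future__ import annotations

import re


def spaces_after_colon(line: str) -> list[int]:
    """Return the positions of ':' characters that are followed by whitespace."""
    return [match.start() for match in re.finditer(r":\s", line)]


print(spaces_after_colon("learning:|LRate: 0.01"))  # [15]
print(spaces_after_colon("learning:|LRate:0.01"))   # []
```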

ANNdotNET v1.0 - deep learning tool