XML model
All models are stored in XML format. For example, a model generated by `nn-init` will look like this:
```xml
<transform type="Affine" input-dim="20" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Affine" input-dim="64" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Affine" input-dim="64" output-dim="2" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Softmax" input-dim="2" output-dim="2" />
```
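Note that the file contains several top-level `<transform>` elements, so it is an XML fragment rather than a single-rooted document. The toolkit parses it with rapidxml in C++, but if you want to inspect a model yourself, any strict XML parser can read it once you wrap it in a dummy root. A minimal Python sketch (illustrative only, not part of the toolkit):

```python
import xml.etree.ElementTree as ET

# A shortened model fragment in the format shown above.
model_text = """
<transform type="Affine" input-dim="20" output-dim="64">
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
"""

# Wrap the fragment in a dummy root so a strict XML parser accepts it.
root = ET.fromstring("<model>" + model_text + "</model>")

# List each layer's type and dimensions.
for t in root.iter("transform"):
    print(t.get("type"), t.get("input-dim"), "->", t.get("output-dim"))
```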
Don't worry about indentation (tabs or spaces); I use the 3rd-party library rapidxml, which is very robust about whitespace.

BUT be careful: rapidxml is a lightweight XML parser. It can only parse

```xml
<transform type="Sigmoid" input-dim="2" output-dim="2" />
```

NOT this (which is common in HTML):

```xml
<transform type=Sigmoid input-dim=2 output-dim=2 />
```

Remember to quote attribute values, like `type="Sigmoid"`.

Also, empty nodes like `<weight></weight>` and `<bias></bias>` are okay.
My program `nn-train` will fill them with random numbers (drawn from a normalized uniform distribution).
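The exact initializer is defined inside `nn-train`; the sketch below shows one common reading of "normalized uniform", the Glorot/Xavier scheme that draws weights from U(-L, L) with L = sqrt(6 / (fan_in + fan_out)). Treat the formula as an assumption about what the toolkit does, not documented behavior:

```python
import math
import random

def normalized_uniform_init(fan_in, fan_out, rng=random.Random(0)):
    """Glorot/Xavier 'normalized' uniform init (assumed scheme, see lead-in)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Fill the first Affine layer's 20x64 weight matrix from the example model.
W = normalized_uniform_init(20, 64)
```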
You can change the activation function by replacing the `type` attribute in `<transform ... />`, like this:

```xml
<transform type="tanh" input-dim="2" output-dim="2" />
```

The `type` attribute is case-insensitive: either `type="tanh"` or `type="Tanh"` will be fine, since it is converted to lower case when parsed.
Here's the list of activation functions available now:
- Sigmoid
- Tanh
- ReLU
- Softplus
- Softmax (last layer only)
- Convolution
- SubSample
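For reference, the element-wise activations in this list can be sketched in Python using their standard textbook definitions (Convolution and SubSample are structural layers, so they are omitted here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def softplus(x):
    return math.log(1.0 + math.exp(x))

def softmax(xs):
    # Subtract the max for numerical stability; applied to the last layer only.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```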
Take dropout for example: you can add it after an activation function (e.g. Sigmoid) in the above model by inserting:

```xml
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
```

The attribute `dropout-ratio` specifies what fraction of hidden nodes to drop. In this case, `dropout-ratio="0.3"` means 30% of the hidden nodes will be randomly turned off.
The result will be:

```xml
<transform type="Affine" input-dim="20" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
<transform type="Affine" input-dim="64" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
<transform type="Affine" input-dim="64" output-dim="2" momentum="0.100000" learning-rate="0.100000" >
  <weight></weight>
  <bias></bias>
</transform>
<transform type="Softmax" input-dim="2" output-dim="2" />
```
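To make `dropout-ratio` concrete, here is a minimal Python sketch of what a Dropout transform does at training time. It is illustrative only; whether the toolkit rescales the surviving activations by 1/(1 - ratio) is an implementation detail not stated here:

```python
import random

def dropout(activations, ratio, rng=random.Random(0)):
    """Randomly zero out roughly `ratio` of the activations (no rescaling shown)."""
    return [0.0 if rng.random() < ratio else a for a in activations]

# A 64-unit hidden layer, matching the Dropout transform above.
hidden = [0.5] * 64
out = dropout(hidden, 0.3)
```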