
Hard bound on learning rate/computed derivative results in low-precision failures #6

@seldridge

The kludge of forcing the derivative and computed learning rate up to the smallest representable value in the current fixed-point precision may cause instability at low fixed-point precisions.

I'm more aware of this problem as it relates to the computed learning rate, i.e.,

computed learning rate = learning rate / # items in a batch
As the number of items in a batch increases, the computed learning rate becomes ever smaller. However, at low fixed-point precisions (e.g., 7 fractional bits), a batch size of 64 is already the largest that keeps a reasonable learning rate of 0.5 representable (0.5 / 64 = 2^-7, the smallest representable value). Allowing the number of batch items to grow substantially beyond this causes problems. For example, with a 7-bit fractional representation and 2048 batch items, the smallest learning rate we can effectively represent is 16 (2048 * 2^-7). This is nearly guaranteed to cause instability.
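
To make the numbers concrete, here is a minimal sketch (not the fann-xfiles implementation; the function and the clamp-to-one-step behavior are assumptions for illustration) showing how forcing the computed learning rate up to the smallest representable fixed-point step turns a requested learning rate of 0.5 into an effective learning rate of 16 when using 7 fractional bits and 2048 batch items:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical illustration: quantize the computed learning rate
 * (learning rate / batch size) to a fixed-point value with `frac_bits`
 * fractional bits, forcing it to the smallest representable step when
 * it would otherwise truncate to zero (the "kludge" described above). */
static double quantized_lr(double learning_rate, int batch_items, int frac_bits) {
  double step = 1.0 / (double)(1 << frac_bits);  /* smallest representable value */
  double computed = learning_rate / (double)batch_items;
  int32_t fixed = (int32_t)(computed / step);    /* truncate to fixed point */
  if (fixed == 0) fixed = 1;                     /* clamp up to one step */
  return fixed * step;
}

int main(void) {
  /* 7 fractional bits, batch of 64: 0.5 / 64 == 2^-7, still representable. */
  printf("%f\n", quantized_lr(0.5, 64, 7) * 64);     /* effective LR == 0.5  */
  /* 7 fractional bits, batch of 2048: clamping to 2^-7 inflates the
   * effective learning rate to 2048 * 2^-7 == 16. */
  printf("%f\n", quantized_lr(0.5, 2048, 7) * 2048); /* effective LR == 16.0 */
  return 0;
}
```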

There are a couple of ways to get around this:

- Use a larger internal precision for the learning rate computation (see the sketch after this list)
- Limit the batch size to prevent artificially inflating the learning rate
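
A rough sketch of the first option, assuming we only widen the precision of the division itself (the format names, bit widths, and function below are hypothetical, not existing fann-xfiles code):

```c
#include <stdint.h>

/* Hypothetical workaround: carry the learning rate division at a wider
 * internal precision (here 16 fractional bits) instead of rounding the
 * quotient to the network's 7-bit fixed-point format. */
#define FRAC_BITS_NET      7   /* fractional bits of the network's format   */
#define FRAC_BITS_INTERNAL 16  /* wider precision used for the LR quotient  */

/* learning_rate_fixed is in the network's Q.7 format. The quotient is kept
 * in Q.16, so 0.5 / 2048 = 2^-12 remains nonzero internally. */
static int32_t computed_lr_internal(int32_t learning_rate_fixed, int32_t batch_items) {
  int64_t widened = (int64_t)learning_rate_fixed
                    << (FRAC_BITS_INTERNAL - FRAC_BITS_NET);
  return (int32_t)(widened / batch_items);
}
```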

fann-xfiles will currently fail if this behavior is detected, but that is not a legitimate solution.
