
Hard bound on learning rate/computed derivative results in low-precision failures #6

@seldridge

The kludge of forcing the derivative and computed learning rate up to the smallest representable value in the current fixed-point precision may cause instability at low fixed-point precisions.

I'm more aware of this problem as it relates to the computed learning rate, i.e.,

computed learning rate = learning rate / # items in a batch
As the number of items in a batch increases, the computed learning rate becomes ever smaller. However, at low fixed-point precisions (e.g., 7 fractional bits), a batch size of 64 is already the largest that keeps a reasonable learning rate of 0.5 representable (0.5 / 64 = 2^-7, the smallest representable value). Allowing the number of batch items to grow substantially beyond this causes problems. For example, with a 7-bit fractional representation and 2048 batch items, the smallest learning rate we can effectively represent is 16 (2048 * 2^-7). This is nearly guaranteed to cause instability.
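
To make the numbers concrete, here is a minimal sketch (not the fann-xfiles implementation; the function and the clamp-to-one-step behavior are assumptions for illustration) showing how forcing the computed learning rate up to the smallest representable fixed-point step turns a requested learning rate of 0.5 into an effective learning rate of 16 when using 7 fractional bits and 2048 batch items:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical illustration: quantize the computed learning rate
 * (learning rate / batch size) to a fixed-point value with `frac_bits`
 * fractional bits, forcing it to the smallest representable step when
 * it would otherwise truncate to zero (the "kludge" described above). */
static double quantized_lr(double learning_rate, int batch_items, int frac_bits) {
  double step = 1.0 / (double)(1 << frac_bits);  /* smallest representable value */
  double computed = learning_rate / (double)batch_items;
  int32_t fixed = (int32_t)(computed / step);    /* truncate to fixed point */
  if (fixed == 0) fixed = 1;                     /* clamp up to one step */
  return fixed * step;
}

int main(void) {
  /* 7 fractional bits, batch of 64: 0.5 / 64 == 2^-7, still representable. */
  printf("%f\n", quantized_lr(0.5, 64, 7) * 64);     /* effective LR == 0.5  */
  /* 7 fractional bits, batch of 2048: clamping to 2^-7 inflates the
   * effective learning rate to 2048 * 2^-7 == 16. */
  printf("%f\n", quantized_lr(0.5, 2048, 7) * 2048); /* effective LR == 16.0 */
  return 0;
}
```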

There are a couple of ways to get around this:

- Use a larger internal precision for the learning rate computation (see the sketch after this list)
- Limit the batch size to prevent artificially inflating the learning rate
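
A rough sketch of the first option, assuming we only widen the precision of the division itself (the format names, bit widths, and function below are hypothetical, not existing fann-xfiles code):

```c
#include <stdint.h>

/* Hypothetical workaround: carry the learning rate division at a wider
 * internal precision (here 16 fractional bits) instead of rounding the
 * quotient to the network's 7-bit fixed-point format. */
#define FRAC_BITS_NET      7   /* fractional bits of the network's format   */
#define FRAC_BITS_INTERNAL 16  /* wider precision used for the LR quotient  */

/* learning_rate_fixed is in the network's Q.7 format. The quotient is kept
 * in Q.16, so 0.5 / 2048 = 2^-12 remains nonzero internally. */
static int32_t computed_lr_internal(int32_t learning_rate_fixed, int32_t batch_items) {
  int64_t widened = (int64_t)learning_rate_fixed
                    << (FRAC_BITS_INTERNAL - FRAC_BITS_NET);
  return (int32_t)(widened / batch_items);
}
```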

fann-xfiles will currently fail if this behavior is detected, but that is not a legitimate solution.
