An autoencoder for deep independence learning and distribution tying.
My goal is to solve catastrophic interference when tying two distributions.
DeepMind's technique for preventing it (elastic weight consolidation) uses sensitivity measures to keep track of which neurons can be remolded and which should be left alone. In many respects it has an edge, but it also comes with hard parameter tuning and ever-increasing stiffness (at a certain point, no new information can be learned). Mathematically, keeping the full Fisher information tensor is impractical, so DeepMind only approximates it with an un-rotated (diagonal) covariance. Research results in machine learning have suggested over and over again that the best way is to keep old data around for retraining (through batch or mini-batch replay). But I do not think we have to go that far to solve this problem. If we only manage to extract certain statistics from the data, that should be enough.
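A minimal sketch of that sensitivity-penalty idea as I understand it; `ewc_penalty`, `fisher_diag`, `w_old`, and `lam` are all illustrative names, not this project's code:

```python
import numpy as np

def ewc_penalty(w, w_old, fisher_diag, lam):
    """Quadratic penalty that stiffens each weight in proportion to its
    estimated sensitivity (a diagonal Fisher information approximation)."""
    return 0.5 * lam * np.sum(fisher_diag * (w - w_old) ** 2)

# Hypothetical usage: total_loss = task_loss + ewc_penalty(w, w_old, f, lam).
# As such penalties accumulate across tasks, the network grows ever stiffer,
# which is the failure mode noted above.
```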
Interestingly, the gradient of a function is nothing more than the correlation between its input and output. Though the amount of data may grow toward infinity, the dimensionality of the correlation is usually much smaller (if it were larger, we should have used rote learning instead). When a new example comes in, we can compute its correlation and add it to our running statistics.
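A minimal sketch of what "adding it up" means, assuming inputs `x` and outputs (or errors) `y`; the running sum of outer products is the statistic, and its storage depends only on the dimensions, never on how many examples we have seen:

```python
import numpy as np

class RunningCorrelation:
    """Accumulates the input-output correlation sum(x y^T) online.
    Storage stays d_in x d_out regardless of the number of examples."""

    def __init__(self, d_in, d_out):
        self.corr = np.zeros((d_in, d_out))
        self.count = 0

    def update(self, x, y):
        self.corr += np.outer(x, y)
        self.count += 1

    def mean(self):
        return self.corr / max(self.count, 1)
```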
The momentum method is the closest resemblance, but it is not the same thing. Momentum smooths the gradients over optimization steps, while my suggestion smooths over examples. However, it is not every model whose statistics we can keep efficiently; some require an infinite amount of storage (like conventional non-linear neural networks). Some are bounded, like models with piecewise-linear activation functions, but for those the alternative of keeping the example data itself is more attainable.
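To make the contrast concrete, a sketch with made-up data; `beta`, the stand-in output `y`, and the loop are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
beta, v = 0.9, np.zeros((d, d))      # momentum state
stats, n = np.zeros((d, d)), 0       # pooled per-example statistics

for _ in range(100):
    x = rng.normal(size=d)                         # incoming example
    y = 0.1 * x + rng.normal(scale=0.01, size=d)   # stand-in output/error
    g = np.outer(x, y)                             # per-example correlation

    # Momentum: exponential smoothing across optimization *steps*;
    # old contributions decay away.
    v = beta * v + (1.0 - beta) * g

    # This proposal: pool across *examples*; the pool never forgets.
    stats += g
    n += 1

pooled = stats / n   # fixed-size statistic, however large n grows
```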
The only model whose statistics we can keep efficiently is the autoencoder, and that is enough for me...
tl;dr: a non-linear autoencoder with mirrored ReLU bases. Please refer to my blog for more detail.
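For concreteness, a minimal NumPy sketch of a one-layer autoencoder with mirrored (tied) ReLU bases, i.e. the decoder reuses the transposed encoder weights. This is my own illustration of the idea, not the project's actual TensorFlow implementation, and all names and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, lr = 64, 32, 1e-2
W = rng.normal(scale=0.1, size=(d_in, d_hidden))  # the shared (mirrored) basis

def relu(a):
    return np.maximum(a, 0.0)

for step in range(1000):
    x = rng.normal(size=d_in)
    h = relu(x @ W)            # encode with W
    x_hat = h @ W.T            # decode with the mirrored basis W^T
    err = x_hat - x

    # Gradient of 0.5 * ||x_hat - x||^2 w.r.t. the single shared W:
    # encoder path (through the ReLU mask) plus decoder path.
    grad = np.outer(x, (err @ W) * (h > 0)) + np.outer(err, h)
    W -= lr * grad
```

Tying the bases halves the parameters and keeps encode and decode consistent with each other, which is what makes the statistics compact enough to maintain.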
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- TensorFlow r1.0
- OpenCV
- numpy
- matplotlib
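A possible way to install the requirements; the pip package names and pin are assumptions on my part (in particular, TensorFlow r1.0 predates current pip releases and may need an older Python):

```
pip install "tensorflow==1.0.*" opencv-python numpy matplotlib
```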
```
usage: main.py [-h] [--layers LAYERS] [--load] [--coeff COEFF] [--eval EVAL]
               [--rate RATE] [--skip SKIP] [--limit LIMIT]
               [--boot_skip BOOT_SKIP] [--boot_limit BOOT_LIMIT]
               [--infer INFER]

optional arguments:
  -h, --help            show this help message and exit
  --layers LAYERS       ex: '100, 100, 100'
  --load                load weight
  --coeff COEFF         update rate
  --eval EVAL           evaluation coefficient
  --rate RATE           learning rate
  --skip SKIP           run skip
  --limit LIMIT         run limit
  --boot_skip BOOT_SKIP
                        bootstrap skip
  --boot_limit BOOT_LIMIT
                        bootstrap limit
  --infer INFER         total inference steps
```
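For example, a hypothetical invocation (the flag values here are illustrative only):

```
python main.py --layers '100, 100, 100' --rate 0.001 --limit 10000 --infer 5
```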
This project is licensed under the MIT License - see the LICENSE.md file for details