
SparseLearning-nim

My playground for training sparse neural networks in Nim.

Summary

This is a simplified explanation of how the algorithms work.

The idea is that a neural network with dense weights has sparse activations for any individual input. This is rarely exploited; instead, dense algorithms are used. But how do we find the largest activations without calculating all activations first? The answer is: hashing!

Let's start off with how a linear layer works. You have an input row-vector x of shape (1, n_features) and weights W of shape (n_features, n_outputs). The output of the linear layer (assuming zero bias/offset) is x*W, which has shape (1, n_outputs). The key is to realize how each element of the output is calculated: the n-th output is the dot product of x and the n-th column of W, a simple dot product between two vectors. This will be important in the next step, the hashing.
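As a minimal sketch in Nim (hypothetical names, not this library's API), this is all a bias-free linear layer does: each output element is one dot product between x and a column of W.

```nim
# Minimal sketch: a bias-free linear layer where each output element
# is the dot product between the input x and one column of W.
proc linearForward(x: seq[float], W: seq[seq[float]]): seq[float] =
  ## x has length n_features, W has shape (n_features, n_outputs).
  let nFeatures = x.len
  let nOutputs = W[0].len
  result = newSeq[float](nOutputs)
  for n in 0 ..< nOutputs:
    var acc = 0.0
    for j in 0 ..< nFeatures:
      acc += x[j] * W[j][n]   # dot(x, n-th column of W)
    result[n] = acc

when isMainModule:
  let x = @[1.0, 2.0, 3.0]                           # shape (1, 3)
  let W = @[@[0.1, 0.4], @[0.2, 0.5], @[0.3, 0.6]]   # shape (3, 2)
  echo linearForward(x, W)                           # 2 outputs, one dot product each
```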

The idea of the hashing is to use a locality-sensitive hash: a hash function for which two vectors with a small angle between them are likely to land in the same bucket. This matters because the dot product between two vectors, x*y = |x|*|y|*cos(φ), depends on the angle φ between them through cos(φ). Assuming the weight vectors are roughly equal in magnitude, the angle largely determines how large the dot product becomes. So the task can be reformulated as a search problem: the columns of W that are closest in angle to x are the ones most likely to produce high activations.

By constructing a hash table that sorts nearby points into the same bins, the weight vectors can be stored in the hash table. When we want to compute the output for an input vector x, we hash x and only evaluate the dot products against the weight vectors that share its bin. This way we do far fewer dot products, since most weight vectors can be skipped. Add to that the fact that the output of a layer is itself sparse, so the dot products against the (dense) weight vectors of the next layer require fewer multiplications, because most of the input elements are zero.
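One common way to build such a hash is signed random projections (SimHash): each hash bit is the sign of the vector's projection onto a random hyperplane, so vectors with a small angle between them tend to get the same bits. The sketch below (assumed names, not this library's API) buckets the weight vectors by their hash and, at query time, only returns the columns that collide with x.

```nim
import std/[tables, random]

type RandomProjHash = object
  planes: seq[seq[float]]   # one random hyperplane per hash bit

proc initHash(nBits, dim: int): RandomProjHash =
  ## Draw nBits random hyperplanes in dim dimensions.
  result.planes = newSeq[seq[float]](nBits)
  for i in 0 ..< nBits:
    result.planes[i] = newSeq[float](dim)
    for j in 0 ..< dim:
      result.planes[i][j] = rand(2.0) - 1.0

proc hashVec(h: RandomProjHash, v: seq[float]): int =
  ## One bit per hyperplane: the sign of the projection onto it.
  for i, plane in h.planes:
    var dot = 0.0
    for j in 0 ..< v.len:
      dot += plane[j] * v[j]
    if dot > 0:
      result = result or (1 shl i)

when isMainModule:
  randomize(42)
  let nFeatures = 8
  let nOutputs = 16
  let h = initHash(nBits = 4, dim = nFeatures)

  # Random "weights": one weight vector (a column of W) per output neuron.
  var W = newSeq[seq[float]](nOutputs)
  for i in 0 ..< nOutputs:
    W[i] = newSeq[float](nFeatures)
    for j in 0 ..< nFeatures:
      W[i][j] = rand(2.0) - 1.0

  # Build the hash table: store each weight vector's index in its bucket.
  var buckets = initTable[int, seq[int]]()
  for i in 0 ..< nOutputs:
    buckets.mgetOrPut(h.hashVec(W[i]), @[]).add i

  # At inference time: hash x and only do dot products with colliding columns.
  var x = newSeq[float](nFeatures)
  for j in 0 ..< nFeatures:
    x[j] = rand(2.0) - 1.0
  let candidates = buckets.getOrDefault(h.hashVec(x), @[])
  echo "Candidate output neurons: ", candidates
```

In practice several independent hash tables are usually built, so that a relevant weight vector is unlikely to be missed by one unlucky hash.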

Assume the fraction of activations we keep is s and that a dense layer requires N^2 numerical operations. Then we only need to compute s*N of the output nodes, and since the input is also sparse, each dot product only needs about s*N operations. Hand-wavingly, this gives a final complexity of s^2*N^2. So with a sparsity of s = 10%, only s^2 = 1% of the weights are used in the forward pass, and in the backward pass only that 1% of the weights has to be updated. Hashing the input has complexity O(N), so it is cheap if an efficient enough hashing algorithm is used.
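Written out, with s the kept fraction and N the layer width (so a dense layer costs N^2 operations):

$$
\underbrace{sN}_{\text{outputs computed}} \cdot \underbrace{sN}_{\text{nonzero inputs per dot product}} = s^2 N^2, \qquad s = 0.1 \;\Rightarrow\; s^2 = 0.01 .
$$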

The beauty of this is that it could work with existing pretrained models, because it uses dense weights. So in theory a library that implements these algorithms could run existing networks faster, and on CPUs.
