Coren Bialik, 2018.
This is a GPU implementation of Generalized Orthogonal Least-Squares (GOLS) following Hashemi and Vikalo. It solves the following problem: $$ \min_x \; \lVert A x - b \rVert_2 \quad \text{subject to} \quad \lVert x \rVert_0 \leq k $$
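To make the objective concrete, here is a minimal CPU reference sketch of greedy orthogonal least-squares column selection in NumPy. This is only an illustration of the problem being solved, not this repo's CUDA implementation, and `ols_select` is a name invented for this sketch:

```python
import numpy as np

def ols_select(A, b, k):
    """Greedily pick k columns of A to (approximately) minimize
    ||A[:, S] x - b||_2 over supports S with |S| <= k."""
    m, n = A.shape
    support = []
    for _ in range(k):
        best_j, best_norm = None, None
        for j in range(n):
            if j in support:
                continue
            # candidate support: current picks plus column j
            S = support + [j]
            x, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            r_norm = np.linalg.norm(b - A[:, S] @ x)
            if best_norm is None or r_norm < best_norm:
                best_j, best_norm = j, r_norm
        support.append(best_j)
    # final least-squares fit on the selected support
    x, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
    return support, x
```

This O(k·n) loop over exact least-squares refits is exactly what the GPU version avoids by batching the work through cuBLAS, but it is handy as a ground truth when checking results on small problems.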
It can therefore be applied to a range of sparse approximation problems, such as supervised feature selection or classification, or just generally serve as an everyday-carry tool for your sparse approximation tasks.
The implementation is based on CUDA/cuBLAS; it uses ModernGPU for buffer allocation and copies, and CUB's caching allocator as a memory pool.
Note: some of the things that keep this short of a production-level implementation:
- error handling and reporting would have to be improved
- CTA launch configurations are not optimal for all problem sizes; small problems in particular could benefit from better parallelization
Basic:

```shell
git clone --recursive URL
mkdir build
cd build
cmake ..
make
```

Or simply:

```shell
python setup.py install
```
There's a Python interface based on pybind11 (btw: <3), a C interface, and a C++ interface (`gols.hxx`).
```python
import gols
help(gols.solve)
```
See the Jupyter notebook in `./play` for example usage.