Legendre nnls #287

craigwarner-ufastro · 2024-03-12T02:33:44Z

Non-negative least squares (NNLS) is significantly faster than the general case bounded value least squares (BVLS). For archetypes, BVLS had been used where the archetype term had a bounded solution from 0 to inf and all of the legendre term solutions were allowed to float. By adding additional terms to our basis vector that are the inverse of each legendre term, we are able to use NNLS instead, e.g., we have negative copies, v3 = -v1 and v4 = -v2 so that in our solution we could have c1 > 0, c3 = 0 for a positive result and the opposite for a negative.

This also required a modification to the prior array which was added to the M vector. The results using NNLS now agree to within np.allclose to the original BVLS solution, with a significant speed gain. The Finding best redshift section takes 45s on 4 GPU + 4 CPU versus 99s using BVLS. However this only applies to when using GPUs.

When using 64 CPUs, additional negative legendre terms + NNLS are 113s for this section versus 74s for BVLS. This is because the computation time for various array operations such as dot products on the larger arrays with the added terms overwhelms the speed gains from NNLS vs BVLS and thus when using CPU mode, we do NOT use the NNLS method and maintain the original BVLS.

vector (in tdata) so that NNLS can be used instead of BVLS. Fixed bug when ValueError excepted in rebinning

negative legendre terms. Refactored a little to clean up. For archetypes with legendre terms, NNLS is now actually used with additional terms of -1*each legendre term so that we can use the faster NNLS to get the same result as BVLS.

because on CPU, the additional computation time for the larger array sizes dominates any time savings of NNLS vs BVLS

coveralls · 2024-03-12T02:36:19Z

coverage: 37.979% (-0.6%) from 38.533%
when pulling fd76617 on legendre_nnls
into 028848f on main.

craigwarner-ufastro · 2024-03-12T02:36:27Z

Note: This also includes a bug fix related to #278

sbailey · 2024-04-12T16:45:20Z

@craigwarner-ufastro thanks for this update and apologies for the late review.

First of all, a bug:

srun -n 4 --gpu-bind=map_gpu:3,2,1,0 --cpu-bind=cores rrdesi_mpi --gpu \
  -i $CFS/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/coadd-6-80605-thru20210205.fits \
  --archetypes /global/cfs/cdirs/desi/users/abhijeet/new-archetypes/rrarchetype-galaxy.fits \
  -o $SCRATCH/redrock/redrock-legnnls.fits
...
--- Process 0 raised an exception ---
Proc 0: Traceback (most recent call last):
Proc 0:   File "/global/common/software/desi/users/sjbailey/redrock/py/redrock/external/desi.py", line 916, in rrdesi
    scandata, zfit = zfind(targets, dtemplates, mpprocs,
Proc 0:   File "/global/common/software/desi/users/sjbailey/redrock/py/redrock/zfind.py", line 602, in zfind
    if archetype._solver_method == 'bvls':
Proc 0: AttributeError: 'NoneType' object has no attribute '_solver_method'

MPICH Notice [Rank 0] [job id 24313752.5] [Fri Apr 12 09:37:16 2024] [nid002260] - Abort(0) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0

Second, this update required a lot of delicate bookkeeping about the number of legendre coefficients, setting up the prior, etc. impacting multiple functions in multiple files. I'm concerned about that from a maintenance perspective. It feels like this could/should be an internal detail of the solver, not something that requires the "user" to setup that negative/positive legendre trick themselves and then unpack the results. i.e. could you move all of this logic into per_camera_coeff_with_least_square_batch itself? If it receives method='bvls' and use_gpu=True, it would automatically switch to the +/- legendre nnls method, including padding the arrays and expanding the prior as needed? Can that be implemented with similar speed gains, while isolating all of the messy bookkeeping to a single location?

negative legendre terms on the GPU in lieu of BVLS into per_camera_coeff_with_least_square_batch. Moved prior_on_coeffs into zscan.py, called from within per_camera_coeff_with_least_square_batch. Pass prior_sigma, a scalar, insted of the array from fitz to archetypes. Bug fix on case where only some object types, not all, have an archetype.

sbailey · 2024-04-29T20:06:22Z

Thanks for the updates, @craigwarner-ufastro . Please resolve the merge conflicts that have arisen here due to other merge-conflicts. If they aren't trivial to resolve in-place, please don't merge main into this branch, which would make the changes in this PR harder to evaluate mixed in with all the other changes. If needed, ok to make a new branch from this branch and then rebase that off of main and start a new PR. Ping me on Slack if that operation description is confusing.

Back to the updates: I'd like to take this one step further to avoid legendre-specific manipulations to tdata prior to calling the solver, e.g. in archetypes.nearest_neighbor_model:

tdata[hs] = np.append(tdata[hs], legendre[hs].transpose()[None,:,:], axis=2)
                if per_camera and self._solver_method == 'bvls' and use_gpu:
                    #Using BVLS and GPU - append negative copies of legendre terms as well to use NNLS with additional bases to approximate BVLS
                    tdata[hs] = np.append(tdata[hs], -1*legendre[hs].transpose()[None,:,:], axis=2)

and in zfind.zfind:

        tot_deg_legendre = deg_legendre
        for key in archetypes:
            if archetypes[key]._solver_method == 'bvls':
                tot_deg_legendre = deg_legendre*2
...
        maxcoeff = max(np.max([t.template.nbasis for t in templates]), ncamera*(tot_deg_legendre)+n_nearest)

in the second case, I'm not even sure that is correct because we are reserving deg_legendre*2 coefficients but those extra 2x coefficients are for the BVLS -> NNLS trick and shouldn't be needed for the output arrays.

Suggestion: rather than manipulating tdata (template data) to add legendre terms and then calling per_camera_coeff_with_least_square_batch(... tdata, nleg=...), could you pass tdata and the normal legendre array, and let per_camera_coeff_with_least_square_batch derive nleg from the shape of legendre and whether it needs to do the BLVS -> NNLS trick and tdata manipulation internally based upon method and use_gpu?

i.e. from a user-perspective, turn this into "if you want to include legendre terms, pass in the legendre template arrays to use" and all other details of how those are combined with tdata etc. become internal to per_camera_coeff_with_least_square_batch. Thoughts?

terms with 0 coefficients and make the coeffs negative for negative terms. Refactored to remove the last bit of the BVLS-NNLS trick from archetypes.py and zfind.py by passing binned instead of tdata to per_camera_coeff_with_least_square_batch.

involve using a list of bands instead of ncam to be more general.

craigwarner-ufastro added 3 commits February 27, 2024 13:48

Modified archetypes using BVLS to add -1*all legendre terms to the basis

c4132a5

vector (in tdata) so that NNLS can be used instead of BVLS. Fixed bug when ValueError excepted in rebinning

Only use additional negative legendre terms and NNLS when in GPU mode

7911c5b

because on CPU, the additional computation time for the larger array sizes dominates any time savings of NNLS vs BVLS

Merge branch 'main' into legendre_nnls

bbf187b

sbailey assigned craigwarner-ufastro Apr 15, 2024

craigwarner-ufastro added 2 commits May 9, 2024 15:36

Refactor to bring more in-line with main where the conflicts mostly

fd76617

involve using a list of bands instead of ncam to be more general.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Legendre nnls #287

Legendre nnls #287

craigwarner-ufastro commented Mar 12, 2024

coveralls commented Mar 12, 2024 •

edited

craigwarner-ufastro commented Mar 12, 2024

sbailey commented Apr 12, 2024

sbailey commented Apr 29, 2024

Legendre nnls #287

Are you sure you want to change the base?

Legendre nnls #287

Conversation

craigwarner-ufastro commented Mar 12, 2024

coveralls commented Mar 12, 2024 • edited

craigwarner-ufastro commented Mar 12, 2024

sbailey commented Apr 12, 2024

sbailey commented Apr 29, 2024

coveralls commented Mar 12, 2024 •

edited