Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abhijeet per camera gpu #253

Closed

Conversation

craigwarner-ufastro
Copy link
Contributor

Use batch rebin, transmission_Lyman, and calc_zchi2 operations.

Speed gains: without archetypes, redrock on 4 GPU / 4 CPU takes 14.8s
reported total run time, 7.3s of which is in the fine redshift scan.
Comparatively with 64 CPU and 0 GPU the base redrock runs in 40.0s
reported total run time with 6.7s spent in fine redshift scan.

Adding the base archetypes option (without per-camera or nearest
neighbor) raises CPU times to 63.8s overall and 28.1s in fine z scan so
about 60% increase overall and about 4x slower in fine z scan.

With the new code the "batch" CPU mode slightly improves this to 60.0s
and 24.2s.

With the new GPU code, it runs on 4 GPU / 4 CPU in 22.8s total and 14.3s
for fine z scan, a 50% overall increase but only 2x speed increase in
the fine z scan, a big improvement from CPU times.

Also updated transmission_Lyman to return None when given scalar z to
match behavior when given an array of redshifts, in the case where the
wavelength range is not affected by Lyman transmission. There is no
need in this case to calculate an array of all ones and then
additionally multiply the rebinned data by it.

Have yet to update --per-camera and -n_nearest options so right now
there are placeholders so that it does not crash that simply loop over
the existing CPU-mode logic. These will be updated shortly.

@craigwarner-ufastro craigwarner-ufastro changed the base branch from main to abhijeet_per_camera September 19, 2023 04:33
@craigwarner-ufastro
Copy link
Contributor Author

Timing update - rows after the blank line are new results

Mode            CPU     GPU     Findz   Total   Clock
Orig             4      4        7.3    14.8    18.6
Orig            64      0        6.7    40.0    49.1
Arch only       64      0       28.1    63.8    71.7
Arch only batch 64      0       24.2    60.0    68.0
Arch only GPU    4      4       14.3    22.8    27.4
Arch+PC+NN      64      0       40.1    76.4    93.5
Arch+PerCam      4      4       376.7  389.0   400.6

Arch only batch 64      0       24.3    60.1    67.9
Arch only GPU    4      4       12.7    21.8    26.0
Arch+NN batch   64      0       24.6    60.3    69.7
Arch+NN GPU      4      4       15.5    23.9    28.1

craigwarner-ufastro and others added 7 commits October 4, 2023 15:05
Modified solve_matrices algorithm to accept PCA or BVLS as a method.
Re-orgianized logic for GPU to take advantage of existing
calc_zchi2_batch algorithm.
For some strange reason refactoring tdata to add rows for color data in
batch in 3d array on CPU takes way longer than a python loop over 2d
arrays so I left two methods for per_camera - one for GPU and one for
CPU - for now while this gets resolved.

This is not the cleanest but it is the fastest and we can revisit if we
change our mind on that.
cleanly without a speed loss in runtime.

Removed old methods Tbs_for_archetypes,
per_camera_coeff_with_least_square, and
per_camera_coeff_with_least_square_cpu and renamed the remaining method
to per_camera_coeff_with_least_square_batch.

Updated 1->n_nbh where Abhijeet pointed out errors had been made.
@coveralls
Copy link

coveralls commented Nov 2, 2023

Coverage Status

coverage: 35.613% (-0.5%) from 36.147%
when pulling bd5983a on abhijeet_per_camera_gpu
into 201399a on abhijeet_per_camera.

abhi0395 and others added 3 commits November 3, 2023 09:37
Rolled back changes by Abhijeet done while attempting to merge to
restore per camera batch method.
@craigwarner-ufastro
Copy link
Contributor Author

Added NNLS to solve_matrices and restored per_camera_coeff_with_least_square_batch in zscan.py from my previous version.

@sbailey
Copy link
Collaborator

sbailey commented Nov 28, 2023

See #259 . I think we want to merge these changes into the abhijeet_per_camera_priors_gpu branch, rather than the abhijeet_per_camera branch which appears to be superseded.

…et_per_camera_prior_gpu. Added prior as optional arg to calc_zchi2_batch to do this.
@sbailey
Copy link
Collaborator

sbailey commented Jan 4, 2024

changes in this PR were incorporated into PR #255, so closing this one without merging.

@sbailey sbailey closed this Jan 4, 2024
@sbailey sbailey deleted the abhijeet_per_camera_gpu branch January 4, 2024 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants