
Compute continua for a batch of spectra #1

Open
teaghan opened this issue Jul 28, 2022 · 1 comment

@teaghan

teaghan commented Jul 28, 2022

Hi there,

Currently I am estimating the continuum for each spectrum separately using

cont, cont_err, seg, seg_err = nn.normalize(wave_grid, spectrum)

However, to make use of the efficiency of the neural network, it would be nice if I could feed a batch of spectra at a time. Would this capability be possible?

Thanks!

@RozanskiT
Owner

Hi!

Of course we can think of such a new feature, but let me clarify a bit and think out loud. :-)

Under the hood, nn.normalize(wave_grid, spectrum) splits the spectrum into partially overlapping fragments (8192 samples by default). These fragments are then collected into a batch and passed through the network, so hardware acceleration is already used at this stage. Because of this approach, each sample in the spectrum has many corresponding pseudo-continuum predictions. From these, the weighted mean (used as the final pseudo-continuum prediction) and the standard deviation (treated as a proxy for uncertainty) are computed.
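The fragment-and-average idea above can be sketched in a few lines of NumPy. This is a minimal illustration and not the ProcessSpectrum code. The function name `sliding_window_predict`, the default step of 4096 samples, the Hann-taper weights, and the `model` callable are all assumptions. `model` stands for anything that maps a `(n_fragments, window)` array to predictions of the same shape.

```python
import numpy as np


def sliding_window_predict(flux, model, window=8192, step=4096):
    """Hypothetical sketch of a sliding-window pseudo-continuum estimate.

    Splits a 1-D spectrum into overlapping fragments, runs them through
    `model` as a single batch, and combines the overlapping predictions
    into a per-sample weighted mean and weighted standard deviation.
    """
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    if n < window:
        raise ValueError("spectrum is shorter than one window")
    # Window starts every `step` samples; add one more so the last window
    # ends exactly at the final sample.
    starts = list(range(0, n - window + 1, step))
    if starts[-1] != n - window:
        starts.append(n - window)
    # All fragments form one (n_fragments, window) batch for the network.
    batch = np.stack([flux[s:s + window] for s in starts])
    preds = np.asarray(model(batch))
    # Assumed weighting: a Hann taper so fragment edges count less; the
    # small offset keeps every weight positive (no division by zero).
    w = np.hanning(window) + 1e-3
    wsum = np.zeros(n)
    m1 = np.zeros(n)  # running weighted sum of predictions
    m2 = np.zeros(n)  # running weighted sum of squared predictions
    for s, p in zip(starts, preds):
        wsum[s:s + window] += w
        m1[s:s + window] += w * p
        m2[s:s + window] += w * p ** 2
    mean = m1 / wsum
    # Weighted variance E_w[p^2] - mean^2, clipped at 0 against round-off.
    std = np.sqrt(np.clip(m2 / wsum - mean ** 2, 0.0, None))
    return mean, std
```

Every fragment of a spectrum goes into one call to `model`. That is why the network already receives a batch even when only a single spectrum is normalized.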

You can take a closer look at how these details are handled in the ProcessSpectrum class, but it is nothing more than a "sliding window":

class ProcessSpectrum:

Now to the point. What you are suggesting could be really beneficial if we want to apply the method to more than, let's say, 1000 spectra. For this purpose it should be relatively simple to modify this class so that it accepts a list of spectra (wavelengths and fluxes, or maybe just filenames, etc.) as input and applies the same preprocessing ("sliding window") to form batches sized to fit your GPU. That would probably give a decent speedup.
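Pooling fragments from many spectra could look roughly like the following hypothetical sketch. The names `fragment_starts` and `batched_predict`, the `batch_size` parameter, and the plain (unweighted) averaging are assumptions, not SUPPNet's API. The point is only that fragments from different spectra can share fixed-size batches and are scattered back afterwards using their (spectrum, start) index.

```python
import numpy as np


def fragment_starts(n, window, step):
    """Start indices of overlapping windows covering samples 0..n-1."""
    if n < window:
        raise ValueError("spectrum is shorter than one window")
    starts = list(range(0, n - window + 1, step))
    if starts[-1] != n - window:
        starts.append(n - window)  # last window ends exactly at the final sample
    return starts


def batched_predict(spectra, model, window=8192, step=4096, batch_size=64):
    """Hypothetical sketch: pool fragments from all spectra, run them through
    `model` in batches of at most `batch_size`, and average the overlapping
    predictions back onto each spectrum (unweighted mean)."""
    spectra = [np.asarray(f, dtype=float) for f in spectra]
    # One (spectrum index, start sample) entry per fragment, across all spectra.
    index = [(i, s) for i, f in enumerate(spectra)
             for s in fragment_starts(len(f), window, step)]
    # Run the model batch by batch; a batch may mix fragments of different spectra.
    preds = []
    for b in range(0, len(index), batch_size):
        chunk = index[b:b + batch_size]
        batch = np.stack([spectra[i][s:s + window] for i, s in chunk])
        preds.append(np.asarray(model(batch)))
    preds = np.concatenate(preds)
    # Scatter predictions back to their spectra and average the overlaps.
    sums = [np.zeros(len(f)) for f in spectra]
    counts = [np.zeros(len(f)) for f in spectra]
    for (i, s), p in zip(index, preds):
        sums[i][s:s + window] += p
        counts[i][s:s + window] += 1
    return [total / c for total, c in zip(sums, counts)]
```

In practice `batch_size` would be chosen so that one batch fits in GPU memory, and `model` would wrap the network's prediction call.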

So this can be done and might be useful in such a case. It would take two steps:

  1. Adding proper GPU support (here I mean preparing a Docker image for SUPPNet; I have had it in mind for a long time but haven't had time to work on it).
  2. Adapting the ProcessSpectrum class to properly batch and process an arbitrary number of spectra.

Maybe it is the right time for these developments. What are your thoughts?

Thanks for your interest!
