
Add distance to the BEAST #38

Closed
karllark opened this issue Feb 21, 2017 · 10 comments

@karllark
Member

Adding distance is needed for work in the Magellanic Clouds.

The fastest way to do this from a human standpoint is to do multiple BEAST runs with different distances on a uniform grid. The results can then be used to regenerate all the standard BEAST outputs with the distance information included and distance as the 7th fit parameter.

Basically (given a uniform distance grid):

  • For all the 1D pPDFs (except distance), the 1D pPDFs for the different distances can be added together to generate the 1D pPDFs including marginalization over distance.
    Need to make sure we are not normalizing the 1D pPDFs for output.
  • The 1D pPDF for distance can be created by simply summing the 1D pPDF of any other parameter within each distance run, as this provides the marginalization over all the other parameters.
  • For the stats file, all the parameters (p50, p13, p87, exp, etc.) need to be regenerated from the 1D pPDFs (just like they are during the fitting).
  • For the max values, use the set of Pmax values for each distance and find the maximum one to set the new Pmax value (and min chisqr, etc.).
  • For the sparse nD likelihoods, I think the solution is to merge all of them into one file, creating a sparse likelihood that is n times larger than for a single-distance run (more thought is needed on whether this is correct from a sampling standpoint).

Code is needed to do all this semi-seamlessly to avoid human error (e.g., include the distance grid in datamodel.py and provide scripts to set up the n distance BEAST grids and merge the results). A sketch of the merging step is below.
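
A minimal sketch of the merging steps above, assuming each per-distance run can provide its unnormalized 1D pPDFs as parameter -> (bins, pdf) arrays (the data layout and function names here are hypothetical, not the actual BEAST file format):

```python
import numpy as np

def merge_1dpdfs(runs):
    """Merge unnormalized 1D pPDFs from n single-distance BEAST runs.

    runs : list, one entry per distance bin, of dicts mapping
        parameter name -> (bin_centers, unnormalized_pdf).
    Returns the distance-marginalized 1D pPDFs plus the 1D pPDF
    for distance itself.
    """
    merged = {}
    params = list(runs[0].keys())
    # Summing the unnormalized pPDFs over the distance runs
    # marginalizes over distance for each other parameter.
    for p in params:
        bins = runs[0][p][0]
        merged[p] = (bins, np.sum([r[p][1] for r in runs], axis=0))
    # The distance pPDF: summing any one parameter's pPDF within a
    # run marginalizes over all the other parameters.
    ref = params[0]
    merged["distance"] = np.array([r[ref][1].sum() for r in runs])
    return merged

def stats_from_pdf(bins, pdf, fracs=(0.16, 0.50, 0.84)):
    """Recompute percentile-style stats from a merged, unnormalized 1D pPDF."""
    cdf = np.cumsum(pdf)
    cdf = cdf / cdf[-1]
    return np.interp(fracs, cdf, bins)
```

Note this only works if the per-run pPDFs are left unnormalized, which is why the first bullet flags the output normalization.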

@karllark
Member Author

karllark commented Mar 4, 2017

One thought: we should put in distance just like the other 6 parameters. Then we can make a grid that has a range of distances if we want, or a grid with a single distance. This would provide flexibility in handling distance. Maybe the grid will get small enough to allow enough distance bin points - or computers will get enough RAM - or the BEAST will be set up for true parallel computation, with different parts of the grid going to different nodes.

@karllark karllark added this to the BEAST v1.5 milestone Nov 30, 2017
@karllark karllark changed the title Add distance to the BEAST (fast method) Add distance to the BEAST Nov 30, 2017
@drvdputt

I think that adding distances to the grid should not actually increase the amount of RAM used. Since distance just rescales everything, it wouldn't make sense to store separate SEDs for the models at different distances.
When the distances are put into the model grid as an extra axis, we would need some trick to refer to the same SED for multiple grid points.
I think the better option is to keep the distances on a separate grid, and implement a loop somewhere near the end, when the probabilities are calculated. I will investigate these sections of the code and see how this fits in.
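
A rough sketch of this idea, assuming the grid SEDs are stored at a single reference distance and the likelihood loop rescales fluxes on the fly (the names and the simple chi^2 likelihood are illustrative, not the actual BEAST API):

```python
import numpy as np

def lnp_over_distances(obs_flux, obs_unc, model_seds, d_ref, distances):
    """ln(likelihood) for every (model, distance) pair.

    model_seds : (n_models, n_bands) fluxes stored at distance d_ref.
    Returns an (n_models, n_distances) array, with the SEDs rescaled
    on the fly so only one copy of the grid is kept in RAM.
    """
    # Flux scales as the inverse square of the distance.
    scale = (d_ref / np.asarray(distances)) ** 2        # (n_distances,)
    scaled = model_seds[:, None, :] * scale[None, :, None]
    chi2 = np.sum(((obs_flux - scaled) / obs_unc) ** 2, axis=-1)
    return -0.5 * chi2
```

As the following comments point out, this only handles the physics grid; the flux-dependent observation model does not rescale this simply.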

@mfouesneau
Member

Distance can first be a deterministically optimized parameter for each model on the grid. Then you get a posterior predictive distribution of distances.

optimal distance:
log(D) = 1/5 * (1/log10(det(Cov))) * (obs - model)^T @ Cov^-1 @ (obs - model)
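
One concrete way to realize this per-model optimization: assuming a pure distance change shifts every band by a common distance modulus mu in magnitude space, the best-fit mu has a closed form as a generalized-least-squares weighted mean (a sketch under that assumption, not necessarily the exact expression above):

```python
import numpy as np

def optimal_distance(obs_mag, model_mag, cov):
    """Per-model optimal distance via the distance modulus.

    A pure distance change adds the same constant mu to every band,
    so the best-fit mu is the inverse-covariance-weighted mean of
    the per-band magnitude residuals (ndarray inputs assumed).
    """
    r = obs_mag - model_mag
    cinv = np.linalg.inv(cov)
    ones = np.ones_like(r)
    mu = (ones @ cinv @ r) / (ones @ cinv @ ones)
    # mu = 5 * log10(D / 10 pc)
    return 10.0 ** (mu / 5.0 + 1.0)   # distance in pc
```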

@karllark
Member Author

Adding distance this way is possible for the physics model. But it is not clear to me that this carries over to the observation model. The noise model (unc and bias) is mapped to the physics models via their fluxes. Changing the distance changes the fluxes, and so changes the noise model. This is the case for the toothpick noise model. The more complicated trunchen model includes the covariance matrix and is directly mapped to the physics models. Remember, we have found that the observation model is critical for good BEAST fits.

At this point, I don't see a solution that does not just make distance another BEAST variable and multiply the model size by the number of distance bins. I'm very open to other solutions, but any solution has to address both the physics and observation models.
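
To illustrate the non-linearity concern, a toy toothpick-style lookup (ast_flux, ast_bias, ast_unc are hypothetical AST-derived tables, not BEAST internals):

```python
import numpy as np

def noise_at_distance(model_flux, d_ref, d, ast_flux, ast_bias, ast_unc):
    """Re-evaluate a flux-dependent (toothpick-style) noise model
    after moving a model from distance d_ref to distance d."""
    scaled_flux = model_flux * (d_ref / d) ** 2
    # The bias/unc curves are non-linear in flux, so they must be
    # re-interpolated at the scaled flux; simply rescaling the
    # d_ref bias and unc by (d_ref/d)**2 would be wrong.
    bias = np.interp(scaled_flux, ast_flux, ast_bias)
    unc = np.interp(scaled_flux, ast_flux, ast_unc)
    return bias, unc
```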

@mfouesneau
Member

True, I forgot about the AST based noise model. We're back to "how to make efficient sets of ASTs?"
Learning the covariant noise properties smells strongly of a Gaussian Process to me. @davidwhogg ?

@karllark
Member Author

Making efficient sets of ASTs is an important question and one we need more work on - especially for the trunchen model.

But I think the issue here is different. The BEAST works on the combination of the physics+observation model. As the observation model (unc and bias) is attached to the models (and not the data), it is inherent in our method that distance will not just scale the combined physics+observation model. This is because distance scales the physics model, but requires a different non-linear mapping for the observation model, since the observation model is highly non-linear with flux.

@galaxyumi
Contributor

I think we also need to think about for what science cases we would really need the distance determination. I doubt we can determine distances of individual stars beyond the LMC and SMC, except perhaps for bright stars in nearby galaxies.

@karllark
Member Author

I have been thinking/hoping that distance could be added in a way that allows for a single value for the cases where multiple values are not needed (as @galaxyumi notes above). So, this would be like metallicity now, where a single value is allowed. When distance is multi-valued, a prior can be included. Including distance is not necessarily so that it can be derived by the BEAST; at a minimum, it can be included as a nuisance parameter and marginalized over.
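
For example, a hypothetical datamodel.py-style entry could make the single- and multi-valued cases look the same (all names here are illustrative, not existing BEAST settings):

```python
import numpy as np

# Single value: grid size unchanged, distance is effectively fixed.
distances = [60.0]          # kpc
# Multiple values: distance becomes a grid axis that can be
# fit or marginalized over as a nuisance parameter.
# distances = np.linspace(50.0, 70.0, 11)
distance_unit = "kpc"
# Prior used when distance is multi-valued.
distance_prior_model = {"name": "flat"}
```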

@karllark
Member Author

Right now, I think it might be good to just add distance as the 7th parameter and allow the RAM to get larger. Then we could figure out a solution to the large RAM separately. One solution would be to split the extra-large grid into pieces, run each piece, and reassemble the results into a merged solution. Splitting by distance bins is one "easy" way to think about this. Merging the subrun results (sparse likelihoods, 1D pPDFs, etc.) should be straightforward; see the sketch below.
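
A minimal sketch of reassembling the per-subgrid "best value" stats after such a split (the data layout is hypothetical; the 1D pPDF merging would follow the summation scheme at the top of this issue):

```python
def merge_best_stats(subruns):
    """Combine per-subgrid best-fit stats into one merged result.

    subruns : list of dicts, one per subgrid run, each holding
        'Pmax' (max posterior), 'chi2min', and 'Pmax_params'
        (the parameter values at Pmax).
    """
    best = max(subruns, key=lambda s: s["Pmax"])
    return {
        "Pmax": best["Pmax"],
        "Pmax_params": best["Pmax_params"],
        "chi2min": min(s["chi2min"] for s in subruns),
    }
```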

@karllark
Member Author

Done. New work on splitting/merging the grid is tracked in a different issue.
