
Add distance to the BEAST #38

Closed
karllark opened this issue Feb 21, 2017 · 10 comments

@karllark
Member

Adding distance is needed for work in the Magellanic Clouds.

The fastest way to do this from a human standpoint is to do multiple BEAST runs with different distances on a uniform grid. The results can then be used to regenerate all the standard BEAST outputs with the distance information included and distance as the 7th fit parameter.

Basically (given a uniform distance grid):

  • For all the 1D pPDFs (except distance), the 1D pPDFs for the different distances can be added together to generate the 1D pPDFs including marginalization over distance.
    Need to make sure we are not normalizing the 1D pPDFs for output.
  • The 1D pPDF for distance can be created by simply summing the 1D pPDF of any other parameter within each distance run, as this provides the marginalization over all the other parameters.
  • For the stats file, all the parameters (p50, p13, p87, exp, etc.) need to be regenerated from the 1D pPDFs (just like they are during the fitting).
  • For the max values, use the set of Pmax values for each distance and find the maximum one to set the new Pmax value (and min chisqr, etc.).
  • For the sparse nD likelihoods, I think the solution is to merge all of them into one file, creating a sparse likelihood that is n times larger than for a single-distance run (more thought is needed on whether this is correct from a sampling standpoint).

Code is needed to do all this semi-seamlessly to avoid human error (e.g., include the distance grid in datamodel.py and provide scripts to set up the n distance BEAST grids and merge the results). A sketch of the merging step is below.
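
A minimal sketch of the merging steps above, assuming each per-distance run can provide its unnormalized 1D pPDFs as parameter -> (bins, pdf) arrays (the data layout and function names here are hypothetical, not the actual BEAST file format):

```python
import numpy as np

def merge_1dpdfs(runs):
    """Merge unnormalized 1D pPDFs from n single-distance BEAST runs.

    runs : list, one entry per distance bin, of dicts mapping
        parameter name -> (bin_centers, unnormalized_pdf).
    Returns the distance-marginalized 1D pPDFs plus the 1D pPDF
    for distance itself.
    """
    merged = {}
    params = list(runs[0].keys())
    # Summing the unnormalized pPDFs over the distance runs
    # marginalizes over distance for each other parameter.
    for p in params:
        bins = runs[0][p][0]
        merged[p] = (bins, np.sum([r[p][1] for r in runs], axis=0))
    # The distance pPDF: summing any one parameter's pPDF within a
    # run marginalizes over all the other parameters.
    ref = params[0]
    merged["distance"] = np.array([r[ref][1].sum() for r in runs])
    return merged

def stats_from_pdf(bins, pdf, fracs=(0.16, 0.50, 0.84)):
    """Recompute percentile-style stats from a merged, unnormalized 1D pPDF."""
    cdf = np.cumsum(pdf)
    cdf = cdf / cdf[-1]
    return np.interp(fracs, cdf, bins)
```

Note this only works if the per-run pPDFs are left unnormalized, which is why the first bullet flags the output normalization.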

@karllark
Member Author

karllark commented Mar 4, 2017

One thought: we should put in distance just like the other 6 parameters. Then we can make a grid that has a range of distances if we want, or a grid with a single distance. This would provide flexibility in handling distance. Maybe the grid will get small enough to allow enough distance bin points - or computers will get enough RAM - or the BEAST will be set up for true parallel computation, with different parts of the grid going to different nodes.

@karllark karllark added this to the BEAST v1.5 milestone Nov 30, 2017
@karllark karllark changed the title Add distance to the BEAST (fast method) Add distance to the BEAST Nov 30, 2017
@drvdputt

I think that adding distances to the grid should not actually increase the amount of RAM used. Since distance just rescales everything, it wouldn't make sense to store separate SEDs for the models at different distances.
When the distances are put into the model grid as an extra axis, we would need some trick to refer to the same SED for multiple grid points.
I think the better option is to keep the distances on a separate grid, and implement a loop somewhere near the end, when the probabilities are calculated. I will investigate these sections of the code and see how this fits in.
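
A rough sketch of this idea, assuming the grid SEDs are stored at a single reference distance and the likelihood loop rescales fluxes on the fly (the names and the simple chi^2 likelihood are illustrative, not the actual BEAST API):

```python
import numpy as np

def lnp_over_distances(obs_flux, obs_unc, model_seds, d_ref, distances):
    """ln(likelihood) for every (model, distance) pair.

    model_seds : (n_models, n_bands) fluxes stored at distance d_ref.
    Returns an (n_models, n_distances) array, with the SEDs rescaled
    on the fly so only one copy of the grid is kept in RAM.
    """
    # Flux scales as the inverse square of the distance.
    scale = (d_ref / np.asarray(distances)) ** 2        # (n_distances,)
    scaled = model_seds[:, None, :] * scale[None, :, None]
    chi2 = np.sum(((obs_flux - scaled) / obs_unc) ** 2, axis=-1)
    return -0.5 * chi2
```

As the following comments point out, this only handles the physics grid; the flux-dependent observation model does not rescale this simply.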

@mfouesneau
Member

Distance can first be a deterministically optimized parameter for each model on the grid. Then you get a posterior predictive distribution of distances.

optimal distance:
log(D) = 1/5 * (1/log10(det(Cov))) * (obs - model)^T @ Cov^-1 @ (obs - model)
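
One concrete way to realize this per-model optimization: assuming a pure distance change shifts every band by a common distance modulus mu in magnitude space, the best-fit mu has a closed form as a generalized-least-squares weighted mean (a sketch under that assumption, not necessarily the exact expression above):

```python
import numpy as np

def optimal_distance(obs_mag, model_mag, cov):
    """Per-model optimal distance via the distance modulus.

    A pure distance change adds the same constant mu to every band,
    so the best-fit mu is the inverse-covariance-weighted mean of
    the per-band magnitude residuals (ndarray inputs assumed).
    """
    r = obs_mag - model_mag
    cinv = np.linalg.inv(cov)
    ones = np.ones_like(r)
    mu = (ones @ cinv @ r) / (ones @ cinv @ ones)
    # mu = 5 * log10(D / 10 pc)
    return 10.0 ** (mu / 5.0 + 1.0)   # distance in pc
```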

@karllark
Member Author

Adding distance this way is possible for the physics model. But it is not clear to me that this carries over to the observation model. The noise model (unc and bias) is mapped to the physics models via their fluxes. Changing the distance changes the fluxes, and so changes the noise model. This is the case for the toothpick noise model. The more complicated trunchen model includes the covariance matrix and is directly mapped to the physics models. Remember, we have found that the observation model is critical for good BEAST fits.

At this point, I don't see a solution that does not just make distance another BEAST variable and multiply the model size by the number of distance bins. I'm very open to other solutions, but any solution has to address both the physics and observation models.
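
To illustrate the non-linearity concern, a toy toothpick-style lookup (ast_flux, ast_bias, ast_unc are hypothetical AST-derived tables, not BEAST internals):

```python
import numpy as np

def noise_at_distance(model_flux, d_ref, d, ast_flux, ast_bias, ast_unc):
    """Re-evaluate a flux-dependent (toothpick-style) noise model
    after moving a model from distance d_ref to distance d."""
    scaled_flux = model_flux * (d_ref / d) ** 2
    # The bias/unc curves are non-linear in flux, so they must be
    # re-interpolated at the scaled flux; simply rescaling the
    # d_ref bias and unc by (d_ref/d)**2 would be wrong.
    bias = np.interp(scaled_flux, ast_flux, ast_bias)
    unc = np.interp(scaled_flux, ast_flux, ast_unc)
    return bias, unc
```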

@mfouesneau
Member

True, I forgot about the AST based noise model. We're back to "how to make efficient sets of ASTs?"
Learning the covariant noise properties smells strongly of a Gaussian Process to me. @davidwhogg ?

@karllark
Member Author

Making efficient sets of ASTs is an important question and one we need more work on - especially for the trunchen model.

But I think the issue here is different. The BEAST works on the combination of the physics+observation model. As the observation model (unc and bias) is attached to the models (and not the data), it is inherent in our method that distance will not just scale the combined physics+observation model. This is because distance scales the physics model, but requires a different non-linear mapping for the observation model, since the observation model is highly non-linear with flux.

@galaxyumi
Contributor

I think we also need to think about for what science cases we would really need the distance determination. I doubt we can determine distances of individual stars beyond the LMC and SMC, except perhaps for bright stars in nearby galaxies.

@karllark
Member Author

I have been thinking/hoping that distance could be added in a way that allows for a single value for the cases where multiple values are not needed (as @galaxyumi notes above). So, this would be like metallicity now, where a single value is allowed. When distance is multi-valued, a prior can be included. Including distance is not necessarily so that it can be derived by the BEAST; at a minimum, it can be included as a nuisance parameter and marginalized over.
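
For example, a hypothetical datamodel.py-style entry could make the single- and multi-valued cases look the same (all names here are illustrative, not existing BEAST settings):

```python
import numpy as np

# Single value: grid size unchanged, distance is effectively fixed.
distances = [60.0]          # kpc
# Multiple values: distance becomes a grid axis that can be
# fit or marginalized over as a nuisance parameter.
# distances = np.linspace(50.0, 70.0, 11)
distance_unit = "kpc"
# Prior used when distance is multi-valued.
distance_prior_model = {"name": "flat"}
```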

@karllark
Member Author

Right now, I think it might be good to just add distance as the 7th parameter and allow the RAM to get larger. Then we could figure out a solution to the large RAM separately. One solution would be to split the extra-large grid into pieces, run each piece, and reassemble the results into a merged solution. Splitting by distance bins is one "easy" way to think about this. Merging the subrun results (sparse likelihoods, 1D pPDFs, etc.) should be straightforward; see the sketch below.
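
A minimal sketch of reassembling the per-subgrid "best value" stats after such a split (the data layout is hypothetical; the 1D pPDF merging would follow the summation scheme at the top of this issue):

```python
def merge_best_stats(subruns):
    """Combine per-subgrid best-fit stats into one merged result.

    subruns : list of dicts, one per subgrid run, each holding
        'Pmax' (max posterior), 'chi2min', and 'Pmax_params'
        (the parameter values at Pmax).
    """
    best = max(subruns, key=lambda s: s["Pmax"])
    return {
        "Pmax": best["Pmax"],
        "Pmax_params": best["Pmax_params"],
        "chi2min": min(s["chi2min"] for s in subruns),
    }
```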

@karllark
Member Author

Done. New work on splitting/merging the grid is tracked in a different issue.
