multivariable distributions - T and P separate or together #17

abigailsnyder · 2020-07-08T18:01:04Z

@claudiatebaldi @kdorheim While working through the code in more depth for enhancement #16, I've combed more carefully, line by line, through the nested functions in data_raw/L3_fit_dirichlet_params.R and data_raw/jobrun.zsh. As far as I can tell, the code estimates the parameters of a multivariate beta (Dirichlet) distribution for the temperature data, and a separate set of parameters for the precipitation data.
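To make the "separate" reading concrete, here is a minimal Python sketch of fitting one Dirichlet to T fractions and an independent one to P fractions. It assumes 12 monthly fractions per year, stored one row per year with each row summing to 1, and uses hypothetical synthetic data. The helper `fit_dirichlet_mom` is hypothetical and uses a simple method-of-moments estimator; the actual fitting in data_raw/L3_fit_dirichlet_params.R may use a different estimator (for example maximum likelihood).

```python
import numpy as np

def fit_dirichlet_mom(fractions):
    """Hypothetical helper (not from an2month): method-of-moments Dirichlet fit.

    fractions: array of shape (n_years, n_months); each row sums to 1.
    """
    fractions = np.asarray(fractions, dtype=float)
    m = fractions.mean(axis=0)         # E[x_k] = alpha_k / alpha_0
    v = fractions.var(axis=0, ddof=1)  # Var[x_k] = m_k (1 - m_k) / (alpha_0 + 1)
    # Each month yields an estimate of the concentration alpha_0; average them.
    alpha0 = np.mean(m * (1.0 - m) / v - 1.0)
    return alpha0 * m

# Hypothetical training data: monthly fractions of an annual value, one row per year.
rng = np.random.default_rng(42)
true_alpha_T = np.linspace(4.0, 8.0, 12)
true_alpha_P = np.linspace(1.0, 3.0, 12)
frac_T = rng.dirichlet(true_alpha_T, size=5000)
frac_P = rng.dirichlet(true_alpha_P, size=5000)

# "Separate" treatment: two independent 12-parameter fits, one per variable.
alpha_T = fit_dirichlet_mom(frac_T)
alpha_P = fit_dirichlet_mom(frac_P)
print(alpha_T.round(2))
print(alpha_P.round(2))
```

Together the two fits also have 24 parameters, but nothing in either fit looks at the other variable.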

I didn't catch the separate treatment of T and P when I was first learning the an2month package, partly because of how the functions are nested, and partly because it differs from the very early notes I contributed (around Dec 2018) on what the sampling should look like; I wasn't involved in the actual implementation after that. Then so many issues came up with how fldgen was being called in the pipeline that I didn't return to this until last week/this week.

So do we want to keep T and P separate as currently implemented, or estimate all 24 parameters together (as I initially thought was happening)? And are there any thoughts on continuing to use a multivariate beta distribution?
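One consequence of fitting and sampling T and P separately is that a sampled year's T fractions carry no information about its P fractions, so any T-P dependence present in the training data cannot show up in the draws. The sketch below checks this with hypothetical concentration parameters (12 months per variable assumed) and hypothetical helper names (`sample_separately`, `cross_corr`). It illustrates the current separate scheme only, not a proposed joint one.

```python
import numpy as np

def sample_separately(alpha_T, alpha_P, n, rng):
    """Hypothetical helper: draw n years of T and P fractions from two independent Dirichlets."""
    t = rng.dirichlet(alpha_T, size=n)
    p = rng.dirichlet(alpha_P, size=n)  # drawn with no reference to t
    return t, p

def cross_corr(t, p):
    """Hypothetical helper: correlation of T month i with P month j, as a (months x months) matrix."""
    k = t.shape[1]
    full = np.corrcoef(np.hstack([t, p]), rowvar=False)
    return full[:k, k:]

# Hypothetical concentration parameters, 12 months per variable.
rng = np.random.default_rng(0)
t, p = sample_separately(np.full(12, 6.0), np.full(12, 2.0), 20000, rng)
c = cross_corr(t, p)
# Every T-P correlation is only sampling noise around zero.
print(np.abs(c).max())
```

Jointly sampling T and P would mean replacing the two independent `dirichlet` calls with a single draw from a distribution that couples the two variables.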


abigailsnyder · 2020-07-08T18:14:21Z

Per @claudiatebaldi, we would expect T and P to be jointly estimated and jointly sampled.

@abigailsnyder will:

- update the code in data_raw to jointly estimate 24 parameters, giving a joint multivariable beta distribution of the T and P fractions, and add more documentation to those functions;
- refit the models;
- open a PR.

Then go back into the monthly_downscaling code, update the sampling to be joint, and add the options outlined in #16.

abigailsnyder · 2020-07-08T18:33:20Z

In terms of updating the sampling to be joint, it looks like the separate sampling for each variable happens in the cassandra components code (lines 964-977):
https://github.com/JGCRI/cassandra/blob/master/cassandra/components.py

That explains why it's harder to tell from the R monthly_downscaling sampling code than from the data_raw/... training code that the variables are being treated separately.

So the R sampling code will need to be updated, and the cassandra code will need to be updated as well. FYI @crvernon

abigailsnyder assigned abigailsnyder, claudiatebaldi and kdorheim Jul 8, 2020

abigailsnyder mentioned this issue Jul 9, 2020 in general workflow documentation for the package #21 (Open)

abigailsnyder closed this as completed Jul 10, 2020
