multivariable distributions - T and P separate or together #17

Closed
abigailsnyder opened this issue Jul 8, 2020 · 2 comments

@abigailsnyder

@claudiatebaldi @kdorheim While working through the code in more depth for enhancement #16, I combed line by line through the nested functions in data_raw/L3_fit_dirichlet_params.R and data_raw/jobrun.zsh. I think the code estimates the parameters of a multivariate beta distribution for the temperature data, and a separate set of parameters for the precipitation data. At least, I think so.

I didn't catch this when I was first learning the an2month package, I think because of how the functions are nested. Treating T and P separately also differs from the very early notes I contributed on what the sampling should look like (around Dec 2018), and I wasn't involved in the actual work after that. Then so many issues came up with how fldgen was being called in the pipeline that I didn't return to this until last week/this week.

So do we want to keep T and P separate, the way they're implemented now, or do we want to estimate all 24 parameters together (as I initially thought was happening)? Also, any thoughts on continuing to use a multivariate beta distribution?
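
To make the question concrete, here is a minimal sketch of what I mean by "separate": monthly T fractions and monthly P fractions are each drawn from their own 12-parameter distribution, with nothing linking the two draws. This is Python rather than the package's R, it uses the standard construction of a Dirichlet (multivariate beta) draw as normalized independent gamma draws, and the names `sample_dirichlet`, `sample_separately`, `alpha_t`, and `alpha_p` and all parameter values are made up for illustration. It is not the code in data_raw or monthly_downscaling.

```python
import random

MONTHS = 12


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_separately(alpha_t, alpha_p, rng):
    """Sample monthly T and P fractions from two independent 12-parameter fits."""
    t_frac = sample_dirichlet(alpha_t, rng)  # uses only the T parameters
    p_frac = sample_dirichlet(alpha_p, rng)  # uses only the P parameters
    return t_frac, p_frac


# Hypothetical parameter values, for illustration only (not fitted values).
alpha_t = [5.0] * MONTHS
alpha_p = [2.0] * MONTHS

rng = random.Random(0)
t_frac, p_frac = sample_separately(alpha_t, alpha_p, rng)
```

Each vector sums to 1 on its own, and the P draw never sees the T draw or the T parameters. A joint version would replace the two calls with one draw from a single distribution fitted to T and P together.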

@abigailsnyder

Per @claudiatebaldi, the expectation is that T and P are jointly estimated and jointly sampled.

@abigailsnyder will

  • update the code in data_raw to jointly estimate 24 parameters, giving a joint multivariate beta distribution of T and P fractions, and add more documentation to those functions.
  • refit the models.
  • open a PR

Then go back into the monthly_downscaling code and update the sampling to be joint, as well as add the options outlined in #16.

@abigailsnyder

In terms of updating the sampling to be joint, it looks like the separate sampling for each variable is happening in the cassandra components code:
https://github.com/JGCRI/cassandra/blob/master/cassandra/components.py Lines 964-977

That explains why it's harder to tell that the variables are treated separately from the R monthly_downscaling sampling code than from the data_raw/... training code.

So the R sampling code will have to be updated, and then the cassandra code will have to be updated as well. FYI @crvernon
