
Separate cloudy and improve CES #32
Merged · 2 commits · Feb 16, 2020

Conversation

@bielim (Contributor) commented Feb 13, 2020

Revise CES code.
Changes include the following:

  • Created a separate module GModel that contains all the functionality to run the forward model G; anything Cloudy-related is contained in that module. The idea is that this module (and specifically the function run_G) is all the user has to modify for a specific CES application (at least for applications where G is a simple model that can be run without using a cluster etc.)
  • The module "Observations.jl" replaces "Truth.jl" - instead of a Truth object, there is now an "Obs" object that contains the observations. In the general case, CES will deal with observations coming from some external observing system rather than with artificial, model-generated "observations" - in the latter case, the observation can be thought of as the "truth" (since they result from running the model with the known "true" parameters), but more generally the name "observations" is probably more appropriate.
  • All "helper functions" (including some new ones) have been moved to Utilities.jl
  • Documented the main structs (EKIObj, GPObj, MCMCObj, Obs)
  • Made the covariance (kernel) an (optional) input argument to the GP emulator.
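The separation described in the bullets above can be sketched as follows. This is an illustrative sketch in Python (the actual CES code is Julia), and the signatures of run_G and the Obs container are assumptions mirroring the names in this PR, not the real API; a toy quadratic model stands in for Cloudy.

```python
import numpy as np

def run_G(params):
    """Application-specific forward model G: parameters -> model output.

    In the proposed layout, this is the only function a user must replace
    for a new CES application (here, a toy quadratic model stands in for
    Cloudy; the signature is an assumption, not the actual CES API).
    """
    a, b = params
    x = np.linspace(0.0, 1.0, 10)
    return a * x**2 + b

class Obs:
    """Container for observations and their noise covariance (cf. the
    Obs struct in Observations.jl; fields here are illustrative)."""
    def __init__(self, samples, cov):
        self.samples = np.asarray(samples)  # (n_samples, n_data)
        self.cov = np.asarray(cov)          # observational noise covariance

    def mean(self):
        # Mean observation across samples
        return self.samples.mean(axis=0)

# Model-generated "observations": run G at known "true" parameters and add noise,
# so these observations can be thought of as the "truth".
true_params = (2.0, 0.5)
noisy = [run_G(true_params) + 0.01 * np.random.default_rng(i).standard_normal(10)
         for i in range(20)]
obs = Obs(noisy, 0.01**2 * np.eye(10))
print(obs.mean().shape)
```

The point of the layout is that everything above the Obs class is application-specific, while the rest of the CES pipeline only sees the Obs container.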

@bielim bielim self-assigned this Feb 13, 2020
@tapios commented Feb 13, 2020

This is looking really good! Thanks, @bielim, @charleskawczynski and all who have worked on it.

@charleskawczynski (Member) left a comment

  • We probably shouldn't ignore any src/ code in .gitignore (src/run_cloudy_uq.jl); if this code is needed, we can include it in an example / test
  • It might have been easier to track some of the changes with git mv OldFileName NewFileName, then make the changes to Observations.jl (it looks like Truth.jl has changed to Observations.jl). But this is fine since things are still in flux

(Review threads on src/MCMC.jl: outdated, resolved.)
@charleskawczynski (Member) commented

Also, as a general note, I think it's best we change the convention of using I for integer type parameters to IT, so that it doesn't clash with LinearAlgebra's identity matrix I. But we can change this in another PR.

@odunbar (Collaborator) left a comment

Do you still use the Zscore? I can't see it - therefore is it worth keeping?

In this vein, I suggest we always transform the inputs to GP training and prediction. We find the mean and covariance of the input parameters, then do a "Z score":

new_inputs = input_cov^{-1/2} * (inputs - input_mean)

Train the GP on this, and apply the same transform to the inputs of the prediction function.

It makes the GP hyperparameters that users may wish to set more independent of the problem (e.g., they become scaled approximately correctly with respect to the geometry of the training points, which in CES is quite important as it can change a lot).

It will require storing input_mean and sqrt(inv(cov)) in the GP struct, but these are small.
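The proposed transform can be sketched as follows. This is a minimal illustration in Python (the CES code is Julia), with hypothetical helper names (fit_whitening, whiten); it is not the CES implementation, just the linear-algebra idea: subtract the input mean and multiply by the inverse symmetric square root of the input covariance, so the transformed training points have (approximately) zero mean and identity covariance.

```python
import numpy as np

def fit_whitening(inputs):
    """Compute the input mean and cov^{-1/2} from training inputs.

    inputs: (n_samples, n_params) array. These two quantities are what
    would be stored in the GP struct; both are small.
    """
    mean = inputs.mean(axis=0)
    cov = np.cov(inputs, rowvar=False)
    # Inverse symmetric square root via eigendecomposition
    vals, vecs = np.linalg.eigh(cov)
    sqrt_inv_cov = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return mean, sqrt_inv_cov

def whiten(inputs, mean, sqrt_inv_cov):
    """new_inputs = cov^{-1/2} * (inputs - mean).

    Applied both to GP training inputs and to prediction inputs."""
    return (inputs - mean) @ sqrt_inv_cov

# Correlated, badly scaled training inputs
rng = np.random.default_rng(0)
x = rng.multivariate_normal([3.0, -1.0], [[4.0, 1.2], [1.2, 2.0]], size=5000)
mu, T = fit_whitening(x)
z = whiten(x, mu, T)
# z should have roughly zero mean and identity covariance
print(np.round(z.mean(axis=0), 1), np.round(np.cov(z, rowvar=False), 1))
```

After this transform, GP hyperparameters (e.g. length scales) refer to a standardized geometry, which is the problem-independence the comment is after.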

@bielim (Contributor, Author) commented Feb 14, 2020

Added an updated test/Cloudy/runtests.jl so that the tests run with the latest CES code. @charleskawczynski: GModel.jl currently imports Cloudy modules; I'm hoping that your code beautification skills can make this go away :-)

@codecov bot commented Feb 14, 2020

Codecov Report

Merging #32 into master will decrease coverage by 5.54%.
The diff coverage is 71.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #32      +/-   ##
==========================================
- Coverage   66.66%   61.12%   -5.55%     
==========================================
  Files          11       12       +1     
  Lines         474      481       +7     
==========================================
- Hits          316      294      -22     
- Misses        158      187      +29
Impacted Files                  Coverage          Δ
src/CalibrateEmulateSample.jl   100% <ø>          (ø) ⬆️
src/GModel.jl                   100% <100%>       (ø)
src/EKS.jl                      100% <100%>       (ø) ⬆️
src/EKI.jl                      100% <100%>       (ø) ⬆️
src/Histograms.jl               59.21% <100%>     (+0.54%) ⬆️
src/MCMC.jl                     79% <100%>        (-10.8%) ⬇️
src/Observations.jl             17.85% <17.85%>   (ø)
src/Utilities.jl                66.66% <66.66%>   (-33.34%) ⬇️
src/GPEmulator.jl               56% <83.33%>      (-1.65%) ⬇️
... and 2 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7b52c0...647c62e.

@odunbar (Collaborator) left a comment

Hey @bielim, maybe I missed this, but the changes you have made to the modeling etc. are for an experiment in runtests.jl, i.e., a specific run when Test is called? This is fine with me, and the changes look good! But does this mean that your Jupyter notebook will not run any more?

@charleskawczynski (Member) commented

But does this mean that your Jupyter notebook will not run any more?

I'd be fine with removing the Jupyter notebooks altogether and replacing them with Literate.jl examples in the docs :)

@bielim (Contributor, Author) commented Feb 14, 2020

@odunbar: The changes I made (GModel.jl, Observations.jl, etc.) are for the general CES code, but in an additional step I also updated test/Cloudy/runtests.jl, which tests the whole CES pipeline on a Cloudy application. That test is basically the Cloudy Jupyter notebook I uploaded, just in the form of an automated test instead of a notebook, so in a way the notebook is now redundant. Because of that, I haven't updated the notebook yet - I had a feeling that @charleskawczynski wouldn't be opposed to removing it altogether :-)

The argument one can make for keeping it is that some users like the interactive interface of notebooks as a friendly way of being introduced to a topic/code; the disadvantage, of course, is that it would be an additional thing to keep in sync with the evolving CES and Cloudy code base. I have no strong opinion on this and am fine with deleting the notebook, but I would also be willing to make an effort to keep it up to date.

@bielim (Contributor, Author) commented Feb 14, 2020

But does this mean that your Jupyter notebook will not run any more?

And just to give a clear answer to that question: At the moment the notebook is not running, but if we decide that we want to keep it in the CES repo, I will go ahead and update it to make it run again.

@charleskawczynski (Member) commented

bors r+

bors bot added a commit that referenced this pull request Feb 15, 2020
32: Separate cloudy and improve CES r=charleskawczynski a=bielim

Revise CES code.
Changes include the following:

- Created a separate module GModel that contains all the functionality to run the forward model G --> anything Cloudy-related is contained in that module. The idea is that this module (and specifically the function run_G) is all the user has to modify for a specific CES application (at least for applications where G is a simple model that can be run without using a cluster etc.)
- The module "Observations.jl" replaces "Truth.jl" - instead of a Truth object, there is now an "Obs" object that contains the observations. In the general case, CES will deal with observations coming from some external observing system rather than with artificial, model-generated "observations" - in the latter case, the observations can be thought of as the "truth" (since they result from running the model with the known "true" parameters), but more generally the name "observations" is probably more appropriate.
- All "helper functions" (including some new ones) have been moved to Utilities.jl
- Documented the main structs (EKIObj, GPObj, MCMCObj, Obs)
- Made the covariance (kernel) an (optional) input argument to the GP emulator.

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
@bors bot (Contributor) commented Feb 15, 2020

Build failed

@charleskawczynski (Member) commented

bors r+

bors bot added a commit that referenced this pull request Feb 15, 2020
32: Separate cloudy and improve CES r=charleskawczynski a=bielim

(Commit message identical to the bors commit above.)

Co-authored-by: Charles Kawczynski <kawczynski.charles@gmail.com>
Co-authored-by: Charles_Kawczynski <kawczynski.charles@gmail.com>
@bors bot (Contributor) commented Feb 16, 2020

@bors bors bot merged commit 647c62e into master Feb 16, 2020
@bors bors bot deleted the mb/Separate_cloudy_and_improve_CES branch February 16, 2020 02:11
@bielim (Contributor, Author) commented Feb 17, 2020

Do you still use the Zscore? I can't see it - therefore is it worth keeping?

In this vein, I suggest we always transform the inputs to GP training and prediction. We find the mean and covariance of the input parameters, then do a "Z score":

new_inputs = input_cov^{-1/2} * (inputs - input_mean)

Train the GP on this, and apply the same transform to the inputs of the prediction function.

It makes the GP hyperparameters that users may wish to set more independent of the problem (e.g., they become scaled approximately correctly with respect to the geometry of the training points, which in CES is quite important as it can change a lot).

It will require storing input_mean and sqrt(inv(cov)) in the GP struct, but these are small.

@odunbar: I will implement this on a separate feature branch!

4 participants