Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KQCEF #706

Merged
merged 16 commits into from
Aug 26, 2024
Merged

KQCEF #706

merged 16 commits into from
Aug 26, 2024

Conversation

iangrooms
Copy link
Contributor

@iangrooms iangrooms commented Jul 23, 2024

Description:

  • A new distribution module kde_distribution_mod that uses kernel density estimation (kde) and quadrature to approximate the distribution from which an ensemble is drawn
  • A new module rootfinding_mod that provides a different implementation of inv_cdf. The iteration starts with the secant method until two successive approximations bracket the solution, at which point it switches to an improved bisection method from Oliveira & Takahashi, ACM Trans Math Soft, 2020. I have not extensively compared the performance of this implementation with the latest implementation of inv_cdf in DART.
  • Adds KDE_DISTRIBUTION (KDE = kernel density estimation) to distribution_params_mod
  • Adds KERNEL_QCEF to algorithm_info_mod
  • Adds code to assim_tools_mod to use the new obs-space filter
  • Adds the ability to use the cdf and quantile function associated with the new distribution to probit_transform_mod
  • A new program test_kde_dist has been added that runs unit tests on many of the methods in the new modules

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

As noted above, many of the new methods have unit tests. I have also used this code in this paper. Finally, I have run both the Lorenz-63 and Lorenz-96 models with the new filter (and with the associated probit transforms) on this branch and verified that they perform as well as or better than the EAKF (though slower).

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

First pass at adding the kernel density estimation based QCEF into
latest version of DART.
Fixes how the word 'quantile' is used in the code, along with
variable names. The output of the inverse cdf (aka the quantile
function) is a quantile. The input to the quantile function is
a probability.
This commit moves the obs_increment_kde subroutine out of assim_tools
and into the kde distribution module. It also changes whitespace,
adds the copyright header, and fixes minor bugs in the probit mod.
Copy link
Member

@hkershaw-brown hkershaw-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iangrooms are you planning to add documentation?

@iangrooms
Copy link
Contributor Author

I'm not sure how to do it. Is that on a separate repository? Sorry for the dumb question - first time PR to DART.

@hkershaw-brown
Copy link
Member

I'm not sure how to do it. Is that on a separate repository? Sorry for the dumb question - first time PR to DART.

no problem, thanks for this pull request btw!

The documentation is in this repo, its the .rst files, e.g.:
DART/guide/qceff_probit.rst

They get turned into html via readthedocs & sphinx, so for this pull request the docs are available to view here:
https://dart-documentation--706.org.readthedocs.build/en/706/. They'll rebuild when you commit to your branch.

If you want to build docs locally you can (see building locally)

Or if you have the documentation in text, but don't want to get involved writing .rst we can put them in .rst format.

Cheers,
Helen

@iangrooms
Copy link
Contributor Author

Excellent - thanks for the explanation. I've use .rst before so I'll try to add the documentation.

Copy link
Member

@hkershaw-brown hkershaw-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ian,

First read though, bunch of comments, mostly small.

I think the main takeaway is which errors need to be fatal vs just messages.

Cheers,
Helen

assimilation_code/modules/assimilation/assim_tools_mod.f90 Outdated Show resolved Hide resolved
call test_kde()
call finalize_utilities()

end program test_kde_mod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HKershaw missing an eof on this file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

developer_tests/test_kde_dist/test_kde_dist.rst Outdated Show resolved Hide resolved
developer_tests/test_kde_dist/work/input.nml Outdated Show resolved Hide resolved

use gamma_distribution_mod, only : gamma_cdf, inv_gamma_cdf, gamma_mn_var_to_shape_scale, &
gamma_gamma_prod

use bnrh_distribution_mod, only : inv_bnrh_cdf, bnrh_cdf, inv_bnrh_cdf_like

use kde_distribution_mod, only : kde_cdf_params, inv_kde_cdf_params, obs_dist_types, &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a style naming convention, we use _type for derived types. @HKershaw This would be quite a few lines of code to change, so if we switch this we'll do a global search & replace single commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will hold off on this for now.

Based on suggestions by @hkershaw-brown, I changed some errors from
messages to fatal errors. Also replaced the case where the observation
distribution is truncated-normal and the observation error variance
is non-positive with a fatal error.
Many minor changes following suggestions from @hkershaw-brown. I
also removed an unused parameter in rootfinding.mod

public :: obs_error_info, probit_dist_info, obs_inc_info, &
init_algorithm_info_mod, end_algorithm_info_mod, &
EAKF, ENKF, BOUNDED_NORMAL_RHF, UNBOUNDED_RHF, &
GAMMA_FILTER, KERNEL, OBS_PARTICLE
GAMMA_FILTER, KERNEL, OBS_PARTICLE, KERNEL_QCEF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlaucar note from the standup about KERNEL_QCEF choice of name

KERNEL is in the existing code but not selectable as a filter kind with QCEFF (since v11+).

subroutine obs_increment_kernel(ens, ens_size, obs, obs_var, obs_inc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to KDE_FILTER

@mjs2369
Copy link
Contributor

mjs2369 commented Jul 31, 2024

Thanks so much for this @iangrooms !

I have just a couple small comments. The first is that I saw that you have a commit where you move obs_increment_kde out of assim_tools_mod and into kde_distribution_mod.f90. What is the reasoning behind this decision since all of the other obs_increment_x subroutines for the various filter kinds are still located in the assim_tools_mod?

The second is similar to Helen's question on the naming of KERNEL_QCEF - will we want to replace the old filter_kind KERNEL with this new code (i.e. remove obs_increment_kernel and other relevant code) or keep both filter_kinds? @hkershaw-brown what are your thoughts on this?

@iangrooms
Copy link
Contributor Author

iangrooms commented Aug 2, 2024

I saw that you have a commit where you move obs_increment_kde out of assim_tools_mod and into kde_distribution_mod.f90. What is the reasoning behind this decision since all of the other obs_increment_x subroutines for the various filter kinds are still located in the assim_tools_mod?

@jlaucar suggested that particular change. I think the idea is that subroutines that are closely tied to a specific distribution should be located within that distribution's module.

I'll go with whatever naming convention you prefer. The KERNEL filter kind is fundamentally different from the new KERNEL_QCEF - the new one is not a better version of the old one, so it makes more sense to me to keep both kinds. KERNEL is more like a Gaussian Mixture following Anderson & Anderson 1999 whereas the new KERNEL_QCEF is conceptually like the BNRHF but with different details.

(edit:spelling)

hkershaw-brown and others added 4 commits August 7, 2024 07:57
test_kde_dist -> kde_test. This is for convention and for    gitignore (program and directory not the same name)
Updated input.nml for preprocess. Use default term level for errors
Added test_kde_dist executable to .gitignore
tests print pass/fail NCAR#706 (comment)
@hkershaw-brown
Copy link
Member

hkershaw-brown commented Aug 14, 2024

note probably 2weeks ~+ before jla/Ian get to this to finish up.

todo notes for then

  • doc for when to expect Bandwidth = 0 not needed (advice from @jlaucar this is fixed)
  • root finiding (advice from @jlaucar will have both root finding methods)

Adds logic to handle the case where the observation is bounded,
and all the observations _in the interior_ (i.e. not on the
boundaries) are identical. This commit only fixes this case
for the obs increment subroutine; the probit transform will
be handled separately.
@iangrooms
Copy link
Contributor Author

@hkershaw-brown is there anything outstanding before this can be merged?

I spoke with @jlaucar yesterday. I decided to rename from KERNEL_QCEF to KDE_FILTER. It matches the name of the distribution module as well as the distribution used in the probit transform, and it will hopefully clear up confusion stemming from having two filters with KERNEL in the name.

Regarding the rootfinding module and the new inv_cdf method that it provides: The main reason for using it with the new kernel density estimate (kde) methods is that it uses fewer evaluations of the cdf per step as compared to the existing inv_cdf. Evaluating the kde cdf can be quite a bit more expensive than evaluating the cdf for other distributions.

Copy link
Member

@hkershaw-brown hkershaw-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good Ian.
Super cool to get this into dart.
We'll get this released next week (avoiding the curse of releasing on Fridays)

For the release notes, do you want a link to the paper mentioned in the comments:
"A Quantile-Conserving Ensemble Filter Based on Kernel-Density Estimation"
by Grooms & Riedel (Remote Sensing, 2024; DOI:10.3390/rs16132377)

And let us know if you want any other attributions in the release notes, typically we'd do:
contributed by Ian Grooms

Cheers,
Helen

@hkershaw-brown hkershaw-brown added the release! bundle with next release label Aug 23, 2024
@iangrooms
Copy link
Contributor Author

I do think it would help to link to the paper in the release notes, and 'Contributed by Ian Grooms' is plenty of attribution for me. Thanks very much for your help with this PR!

@hkershaw-brown hkershaw-brown merged commit 7cee8df into NCAR:main Aug 26, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release! bundle with next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants