Feature request: add a QC flag instead of crashing filter when an ensemble member is beyond the bounds set in qceff_table.csv #681

Open
hgrhgy opened this issue May 17, 2024 · 12 comments
Assignees
Labels
Bug Something isn't working QCEFF quantile conserving filters

Comments

@hgrhgy

hgrhgy commented May 17, 2024

Use case
Add a new QC flag to obs_seq.final when an ensemble member is beyond the bounds set in the qceff_table.csv file.

Is your feature request related to a problem?
When some member values are outside the bounds configured in qceff_table.csv, the filter program crashes with the error: Smallest ensemble member less than lower bound -3.100498191521694E-004 0.000000000000000E+000.

Describe your preferred solution
Instead of filter crashing, the observations involved in updating this member could be left unassimilated, with a new QC flag added to obs_seq.final to tell the user why they were not assimilated.

Describe any alternatives you have considered
Alternatively, add a control option in the namelist file that tells filter to keep running when the error occurs.

@hkershaw-brown hkershaw-brown added the Bug Something isn't working label May 17, 2024
@hkershaw-brown
Member

Hi @hgrhgy, I think this may be a bug. Do you have a test case you can share that reproduces this error?

Also can you let us know:

  • which version of DART you are using (git describe --tags)
  • which model(s) you are working with
  • which compiler (e.g. gfortran --version) you are using to build DART
  • whether you are running filter with single precision (r4)

Does your input state contain out-of-bounds values?

@hkershaw-brown
Member

hkershaw-brown commented May 17, 2024

There is a fix_bound_violations namelist option in probit_transform_nml

&probit_transform_nml
fix_bound_violations = .true.
/

Try this and see if it affects your run of filter.

fix_bound_violations corrects bounds violations in transform_to_probit, but only for small round-off errors. However, at first glance -3.100498191521694E-004 appears to be a fairly large bound violation.
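
To illustrate the difference between a round-off violation and a large one, here is a minimal Python sketch. It is not DART's code: the function name fix_bounds_sketch and the tolerance of 1e-12 are assumptions made purely for illustration.

```python
def fix_bounds_sketch(ensemble, lower_bound, tolerance=1e-12):
    """Clamp members that undershoot lower_bound by at most `tolerance`.

    Hypothetical helper: the name and the default tolerance are assumptions,
    not values taken from DART.
    """
    fixed = []
    for value in ensemble:
        if value >= lower_bound:
            fixed.append(value)
        elif lower_bound - value <= tolerance:
            # Tiny undershoot: treat it as round-off and snap to the bound.
            fixed.append(lower_bound)
        else:
            # Too large to be round-off: refuse to repair it silently.
            raise ValueError(
                f"member {value!r} is below lower bound {lower_bound!r}"
            )
    return fixed


# A round-off sized violation is repaired.
repaired = fix_bounds_sketch([0.5, -1.0e-15, 2.0], lower_bound=0.0)

# The value reported in this issue is far outside the assumed tolerance.
try:
    fix_bounds_sketch([0.5, -3.100498191521694e-04], lower_bound=0.0)
    large_violation_rejected = False
except ValueError:
    large_violation_rejected = True
```

Under these assumptions, a violation of about 3e-4 is many orders of magnitude larger than round-off, so an option of this kind would not be expected to hide it.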

@hgrhgy
Author

hgrhgy commented May 18, 2024

Thanks for replying, @hkershaw-brown.
The DART version is v11.0.1, the compiler is Intel Fortran, the precision is r8, and the model is the GEOS-Chem carbon simulation.
You may be able to reproduce the problem by setting the lower bound to a large value.

I tried setting fix_bound_violations = .true., but it did not solve the problem.
The table CSV file is attached:
qceff_table.csv

@hgrhgy
Author

hgrhgy commented May 21, 2024

If fix_bound_violations = .false., the error occurs in function bnrh_cdf in bnrh_distribution_mod.f90.
If fix_bound_violations = .true., the error occurs in function fix_bounds in probit_transform_mod.f90.

In the qceff table file, I set the lower bound to zero for obs_error_info, probit_inflation, probit_state and obs_inc_info.
I have checked that the negative number does not come from the observed states; negative values are discarded in the forward operator.
So the negative value may come from the extended state, but the lower bound for the extended state is not set in the qceff table file.

@hkershaw-brown
Member

Thanks for the update, @hgrhgy.

I think if the negative values are discarded, this would be a failure in the forward operator, so that particular forward operator would not be part of the extended state (it would be skipped).
We don't have the GEOS-Chem model, but I think I can create an out-of-bounds forward operator with any model to take a closer look at what is going on.

I think either:

  • out-of-bounds forward operators should be QC'd to a fail but are not (we may need to enforce failing for out-of-bounds values after calling the forward operator, rather than expecting the obs_def_mod to do this), or
  • possibly the extended state bounds are not being handled properly.

@hkershaw-brown hkershaw-brown self-assigned this May 31, 2024
hkershaw-brown added a commit to hkershaw-brown/DART that referenced this issue Jun 5, 2024
out-of-bounds forward operator
@hkershaw-brown
Member

reproducer:
https://github.com/hkershaw-brown/DART/tree/out-of-bounds-fwd
two observation lorenz_96_tracer_advection
fwd operator out of bounds.

hkershaw-brown added a commit to hkershaw-brown/DART that referenced this issue Jun 14, 2024
…perator values to filter_assim

forward operators that produced out-of-bounds values are given the QC value DARTQC_OUT_OF_BOUNDS
tested with forcing lorenz_96_tracer_advection to give out-of-bounds values in model_interpolate

see NCAR#681 for user reported problem.
hkershaw-brown added a commit that referenced this issue Jun 14, 2024
…perator values to filter_assim

forward operators that produced out-of-bounds values are given the QC value DARTQC_OUT_OF_BOUNDS
tested with forcing lorenz_96_tracer_advection to give out-of-bounds values in model_interpolate

see #681 for user reported problem.
@hkershaw-brown
Member

hkershaw-brown commented Jun 14, 2024

Hi @hgrhgy, the branch
https://github.com/NCAR/DART/tree/qc-for-out-of-bounds-fwd-ops has a fix to catch any forward operators with out-of-bounds errors. It sets the QC to DARTQC_OUT_OF_BOUNDS (41).

Can you give this a try and let me know if this solves your problem?
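
The kind of check described above can be sketched in Python as follows. This is only an illustration, not the code on the branch: the function name qc_for_observation, the QC_OK value of 0, and the rule that a single out-of-bounds member flags the whole observation are all assumptions. Only the value 41 for DARTQC_OUT_OF_BOUNDS comes from this thread.

```python
DARTQC_OUT_OF_BOUNDS = 41  # QC value quoted in this thread
QC_OK = 0  # assumed "no problem" value for this sketch


def qc_for_observation(expected_obs, lower_bound=None, upper_bound=None):
    """Return a QC value for one observation.

    expected_obs holds the forward-operator result for each ensemble member.
    Hypothetical rule: if any member is outside the bounds, the observation
    is flagged instead of the program stopping.
    """
    for value in expected_obs:
        if lower_bound is not None and value < lower_bound:
            return DARTQC_OUT_OF_BOUNDS
        if upper_bound is not None and value > upper_bound:
            return DARTQC_OUT_OF_BOUNDS
    return QC_OK


# All members respect the lower bound of zero.
in_bounds_qc = qc_for_observation([0.1, 0.4, 0.0], lower_bound=0.0)

# One member is slightly negative, so the observation is flagged.
flagged_qc = qc_for_observation([0.1, -3.1e-4, 0.2], lower_bound=0.0)
```

A flagged observation would then be skipped during assimilation and the QC value recorded, rather than the run aborting.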

edit @hkershaw-brown double check bitwise on this

@hgrhgy
Author

hgrhgy commented Jun 17, 2024

Hi @hkershaw-brown, I have merged commit bcf41d1 into my own branch, but it did not solve the problem. I debugged in detail with gdb; the stack is shown below. The program crashed in the same function, bnrh_cdf, for different reasons.

CASE 1: The inflation probit is out of bounds.
[image: gdb stack]

CASE 2: I then disabled the lower-bound condition for the inflation probit and debugged with the same breakpoint. The error changed to:
[image: gdb output]

I expected commit bcf41d1 to solve case 2, but it didn't. The new code is added at line 537 in assim_tools_module, and the crash occurs at line 500, before the added code.
I don't know whether the Failed to converge for quantile warning has any impact on the errors.
Also, I am confused about the difference between the inflation bound settings in input.nml and in qceff_table.csv.

Let me know if any other information is needed.

The gdb logs:
gdb_log_for_state_out_of_bound.txt
gdb_log_for_inflation_out_of_bound.txt

@hkershaw-brown
Member

The Failed to converge for quantile warning is not a good sign.
Also, is this the same input (options and files) that gave the first reported out-of-bounds error? If so, there may be other problems with your code. It is hard to tell without the code or the input files.

Is your code available on GitHub? If so, please provide the repository.
It looks from the gdb_log output as though you're using code from https://github.com/apmizzi/DART_Chem rather than DART v11.0.1.

Before going further into this, I'd like to make sure this is something that we can reproduce with DART.

@hgrhgy
Author

hgrhgy commented Jun 17, 2024

It's the same input that gave the first reported problem. The qceff_table.csv may have changed, because I turned the lower bound on or off for each option (obs_error_info, probit_inflation, probit_state and obs_inc_info) to test which option caused the error.

The code was cloned from the DART tag v11.0.1, then some forward operators from https://github.com/apmizzi/DART_Chem and the GEOS-Chem model code were merged in, so some of the log output is in Arthur's log style.

The code is not currently on GitHub.

@hkershaw-brown
Member

Hi @hgrhgy
We're limited in the support we can provide for private code.
I'd recommend you check that your input data respects the bounds set for the QCEFF options.
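
As a rough illustration of such a check, the Python sketch below scans an ensemble for values outside given bounds and reports where they occur. The function name find_bound_violations and the sample values are made up for this example; reading the real ensemble from model files is left out.

```python
def find_bound_violations(members, lower_bound=None, upper_bound=None):
    """Return (member_index, element_index, value) for each out-of-bounds value.

    Hypothetical helper: `members` is a list of state vectors, one per
    ensemble member.
    """
    violations = []
    for m, state in enumerate(members):
        for i, value in enumerate(state):
            below = lower_bound is not None and value < lower_bound
            above = upper_bound is not None and value > upper_bound
            if below or above:
                violations.append((m, i, value))
    return violations


# Made-up two-member ensemble; the second member has one negative value.
ensemble = [
    [0.2, 0.0, 1.5],
    [0.3, -3.1e-4, 1.4],
]
violations = find_bound_violations(ensemble, lower_bound=0.0)
```

Running a scan like this on the input ensemble before starting filter would show whether the negative values are already present in the input or are introduced later.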

For questions about the scientific options of the QCEFF, dart@ucar.edu is the best place to ask.

@hgrhgy
Author

hgrhgy commented Jun 24, 2024

I understand the limits on support for private code. I have tried the branch qc-for-out-of-bounds-fwd-ops, and the problem can't be reproduced in the lorenz_96_tracer_advection model. I'll fully re-check my input data to ensure it respects the QCEFF bounds, and try to compare the filtering process between the two models. Thank you very much for your help, @hkershaw-brown.
