Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new use case of python embedding for point observations with NRL innovation files. #635

Open
6 of 21 tasks
JohnHalleyGotway opened this issue Sep 25, 2020 · 3 comments
Open
6 of 21 tasks
Assignees
Labels
Milestone

Comments

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Sep 25, 2020

Describe the New Feature

This task is to create a python script to read NRL innovation files and use python embedding to pass them into the ascii2nc tool. Once that script works well, collaborate with NRL staff to develop a new METplus use case which calls ascii2nc to prepare the point observations and then either Point-Stat or Ensemble-Stat to verify model output against them. Need to get direction from Liz Satterfield for direction on these details.

See details about the formatting of the NRL innovation files as a comment on this issue.

Consider breaking down the multiple steps for this task into sub-issues:
(1) Write and test a python embedding script to process the input file with ascii2nc.
(2) See comments for a description of how Liz would like (1) to be configurable.
Should filtering logic be added to the python-embedding script or to the ascii2nc tool?
(3) Gather sample model data which should be verified using these point observations and develop of METplus use case for these steps.
(4) This data may also be suitable for python embedding in Stat-Analysis to compute statistics directly using the obs and innovation values. Explore this option and potentially include it in the use case as well.

Acceptance Testing

Input NRL innovations files can be found in kiowa:/d1/projects/METplus/METplus_Development/feature_635/innovation_data
Contact Liz Satterfield for the model data to be used in use case development.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

METplus-4.0 release

Funding Source

2700021

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
  • Select milestone to next major version milestone or "Future Versions"

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s), Project(s), Milestone, and Linked issues
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added component: use case wrapper component: use case configuration priority: high High Priority type: new feature Make it do something new requestor: Navy/NRL Naval Research Laboratory alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle labels Sep 25, 2020
@JohnHalleyGotway JohnHalleyGotway added this to the METplus-4.0 milestone Sep 25, 2020
@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Sep 25, 2020

Email from Liz on 9/23/2020:
The innovation files are ascii files with a 75 line header. The column
description is below. If you all could provide a reader to get these files
into MET that would be great! I'm happy to provide a sample data set and
any additional information that you may need.

  1. n: Observation number for indexing obs.
  2. ob: Observation value.
  3. bk_ob: Background (short-term forecast) value, interpolated in space and time to the observation location/time.
  4. t_bk_ob: Supplemental information required for assimilating the observation.
  5. xiv_ob: Innovation, or difference between the observation and the background value (e.g., ob - bk_ob)
  6. err_ob: Observation error standard deviation
  7. etc_ob: Supplemental information required for assimilating the observation.
  8. lat_ob: latitude of the observation.
  9. lon_ob: longitude of the observation.
  10. p_ob: pressure level of observation; except for radiances.
  11. jvarty: Jvarty_ob in the code, this is the variable type code.
  12. insty: Insty_ob in the code; instrument type code.
  13. nvp: Number of observations in the "profile"
  14. ichk: Quality indicator.
  15. idt: Time difference in seconds from the center of the time window. For a 6-hr time window, the values range from -10800 to +10800.
  16. c_pf_ob: Observation platform identifier (16 characters). For radiances,
    this includes sensor name (AMSU-A), channel number, ascending/descending
    flag and assimilate flag (A, M, R - for assimilate, monitor or reject). For
    radiosonde, it includes the block/station number.
  17. c_db_ob: More platform identifiers (10 characters). For radiances, this
    is includes the satellite name (NOAA15), and land/sea/ice flag. For
    radiosondes, it includes the retransmission status.
  18. idp: NAVDAS computes the pressure change over the assimilation window and stores here.
  19. lsi: Land/sea/ice flag
  20. rej: Reject flag
  21. bkerr: background error standard deviation, in observation space.
  22. cob: analysis solution, in observation space prior to being projected back into model space
  23. resid: residual in observation space, or H(analysis - background).
    A read statement would look like:
C=textscan(fileID,
'%*7d%8f%*1c%8f%*1c%*8f%*1c%8f%9f%*1c%*9f%*1c%9f%*1c%9f%*1c%11f%*1c%2d%*1c%3
d%*1c%5d%*1c%*4d%*1c%7d%*2c%16c%*2c%10c%*1c%*4d%*1c%*2d%*1c%3d%*1c%7f%*1c%*7
f%*1c%9f%*1c%9f%*[^\n]','headerlines',75,'whitespace','','delimiter','/n');
fortran   = '7s 9s 1x 8s 1x 8s 1x 8s 9s 1x 9s 1x 9s 1x 9s 1x 11s ' +\
                 '1x 2s 1x 3s 1x 5s 1x 4s 1x 7s 2x 16s 2x 10s 1x 4s 1x 2s '
+\
                '1x 3s 1x 7s 1x 7s 1x 9s 1x 9s'

We now output h5 files instead of ascii. I put both formats and the python converter on the ftp directory.
It would be great to have something like pb2nc, where we could choose what data types to output to the nc file via a config. Also having the innovation and residual information will be helpful for DA diagnostics.

Putting the obs values into ascii2nc will get us to the verification task and is where we likely need to start. Having matched pairs would also be useful.

There is additional information in the file that would be helpful to retain (you could treat it as an additional observation type) , in particular:

  1. xiv: the innovation value
  2. resid: residual in observation space
  3. the prescribed errors oberr and bkerr
  4. bk_ob the background in observation space
  5. the QC flags

If that information is available along with the ob value, this enables us to implement some DA diagnostics fairly easily.

@TaraJensen TaraJensen removed the alert: NEED ACCOUNT KEY Need to assign an account key to this issue label Sep 28, 2020
@TaraJensen TaraJensen added this to To do in METplus-4.0.0-beta1 (10/26/20) via automation Sep 28, 2020
@TaraJensen TaraJensen removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Sep 28, 2020
@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Oct 1, 2020

On 10/1/2020, Tara noted that the funding to pay for this must be spent by the end of 2020.

@DanielAdriaansen
Copy link
Contributor

The prototype for this work exists in branch feature_635_nrlinnov. I decided to add the filtering capability in the Python embedding for now, via pandas. The pandas filters are configurable via [user_env_vars] in METplus, as is the mapping between NRL column names and MET 11-column names. Processing is somewhat slow, but these are very large files. The ASCII format takes ~ 3 minutes per file while the HDF5 files are about half of that or less (1-1.5 minutes per file) thus I would recommend using HDF5 input files whenever possible.

@georgemccabe georgemccabe added this to To do in METplus-4.1.0-beta2 (8/31/21) via automation Jul 12, 2021
@georgemccabe georgemccabe moved this from To do to In progress in METplus-4.1.0-beta2 (8/31/21) Jul 27, 2021
@JohnHalleyGotway JohnHalleyGotway added this to To do in METplus-4.1.0-beta3 (10/06/21) via automation Aug 17, 2021
@georgemccabe georgemccabe added this to To do in METplus-4.1.0-beta4 (11/16/21) via automation Oct 4, 2021
@georgemccabe georgemccabe added this to To Do in METplus-4.1.0-beta6 03/01/22) via automation Jan 11, 2022
@georgemccabe georgemccabe added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Jan 18, 2022
@georgemccabe georgemccabe modified the milestone: METplus-5.0.0 Feb 8, 2022
@georgemccabe georgemccabe removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Apr 26, 2022
@georgemccabe georgemccabe added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Dec 10, 2022
@DanielAdriaansen DanielAdriaansen added type: new use case Add a new use case and removed alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle type: new feature Make it do something new labels May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🏗 In progress
Development

No branches or pull requests

6 participants