diagnostic bug in CICE when outputting 4d fields #62

Open

scrallen opened this issue Jul 29, 2021 · 22 comments
@scrallen

Background:
For each grid point and ice category, CICE computes its thermodynamics over nkice vertical layers (nkice = 4 in ACCESS-OM2).

Under the BL99 thermodynamics option, the conductive salinity profile (aka 'ice internal salinity', Sinz) is fixed and prescribed by a function - therefore we know what values to expect. For a given point and thickness category the expected values for the four layers are [0.64920187, 2.354581, 3.0310922, 3.1892977].
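
For reference, a minimal sketch of that prescribed profile (assuming the standard CICE BL99 constants saltmax = 3.2, nsal = 0.407, msal = 0.573 and nilyr = 4) reproduces these values:

import numpy as np

# BL99-style prescribed salinity profile; constants assumed from CICE defaults
nilyr = 4
saltmax, nsal, msal = 3.2, 0.407, 0.573
zn = (np.arange(nilyr) + 0.5) / nilyr   # normalised mid-layer depths
sprofile = 0.5 * saltmax * (1.0 - np.cos(np.pi * zn ** (nsal / (msal + zn))))
print(sprofile)   # ~ [0.6492, 2.3546, 3.0311, 3.1893]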

The Issue:
However, if Sinz is output (f_sinz is the namelist field flag), the values do not appear as I expect.

Here's an example taken from the 1° model, given an Xarray DataArray:

In [4]: type(sinz)
Out[4]: xarray.core.dataarray.DataArray

In [5]: sinz.shape
Out[5]: (1, 5, 4, 300, 360)

Selecting a point for a given time (the time dimension has length 1 in this case), lat and lon (a point with ice, i.e. aice > 0), and the first thickness category:

In [6]: sinz.isel(time=0,ni=30,nj=40,nc=0).values
Out[6]: array([0.64920187, 0.64920187, 0.64920187, 0.64920187], dtype=float32)

These are not the four distinct layer values we expect.

Consider the same time,lat,lon for all layers and thickness categories:

In [8]: sinz.isel(time=0,ni=30,nj=40).shape
Out[8]: (5, 4)
In [9]: sinz.isel(time=0,ni=30,nj=40).values
Out[9]: 
array([[0.64920187, 0.64920187, 0.64920187, 0.64920187],
       [0.64920187, 2.354581  , 2.354581  , 2.354581  ],
       [2.354581  , 2.354581  , 3.0310922 , 3.0310922 ],
       [3.0310922 , 3.0310922 , 3.0310922 , 3.1892977 ],
       [3.1892977 , 3.1892977 , 3.1892977 , 3.1892977 ]], dtype=float32)

The values appear to be ordered along the wrong dimensions. What I think should be the correct answer can be retrieved (for this time,lat,lon) by:

In [10]: temp = sinz.isel(time=0,ni=30,nj=40).values

In [11]: temp.reshape((4,5)).transpose()
Out[11]: 
array([[0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ],
       [0.64920187, 2.354581  , 3.0310922 , 3.1892977 ]], dtype=float32)

These code snippets were produced in ipython using xarray, but the same values are obtained with a variety of tools, including inspecting the netCDF file with ncview.

If there is an issue here, this may affect other 4d fields like Tinz, but these may be harder to diagnose from their values, as they aren't prescribed and fixed like Sinz.
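
As a stop-gap for analysis, the same correction can be applied to the whole field. A sketch, assuming the swapped-index hypothesis is right and the dimensions are (time, nc, nkice, nj, ni) as above:

def fix_4d_field(da, ncat=5, nkice=4):
    # Reinterpret the stored (nc, nkice) block as if it had been written in
    # (nkice, nc) order, then transpose it back to (nc, nkice).
    vals = da.values
    nt, _, _, nj, ni = vals.shape
    fixed = vals.reshape(nt, nkice, ncat, nj, ni).transpose(0, 2, 1, 3, 4)
    return da.copy(data=fixed)

sinz_fixed = fix_4d_field(sinz)   # e.g. the DataArray from In [4] above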

An example output file with the Sinz variable can be found on gadi at:

/home/548/sxa548/access-om2-sample_output/iceh.2018-08-15.nc
@scrallen
Author

It would be fantastic if someone familiar with CICE could confirm what I'm seeing!

If I'm right, I'm wondering if this is a subscript error - that is, the thickness category and layer indices are misordered somewhere in the code. I suspect this is linked to writing output, so I have started looking in files such as ice_history.F90 and ice_history_shared.F90, but I can't spot anything obviously amiss.

@aekiss
Contributor

aekiss commented Nov 29, 2021

Hi @scrallen, sorry I didn't reply earlier. I can confirm that I see the same thing with Sinz in your
/home/548/sxa548/access-om2-sample_output/iceh.2018-08-15.nc.
Did you have any luck spotting a bug?

We have saved Sinz and Tinz for part of cycle 2 and all of cycle 3 of the 0.1deg IAF, but as these runs use mushy thermodynamics it is not so obvious whether there's a problem there.

@scrallen
Author

Hi, no I looked as carefully as I could, but couldn't see where things might be wrong. Someone more skilled than me may spot something.

I confess I've moved on to focusing on configurations with mushy ice and other things for the time being, so this slipped off my radar.
However, if the same error also occurs with mushy thermodynamics it may be harder to diagnose, unless the internal profiles are very obviously wrong/unphysical/unexpected.

@aekiss
Contributor

aekiss commented Nov 29, 2021

Yes, this requires further investigation.
My notebook to look at this is here: https://github.com/aekiss/notebooks/blob/master/check-sinz-tinz.ipynb

@aekiss
Contributor

aekiss commented Nov 30, 2021

The time axis is also missing for some (all?) Sinz and Tinz daily and monthly outputs from IAF cycle 2, e.g.

ncdump -h /g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle2/output471/ice/OUTPUT/iceh.2014-10-daily.nc

yields

float Tinz_m(nc, nkice, nj, ni) ;
...
float Sinz_m(nc, nkice, nj, ni) ;

Not sure how this occurred.

There is a time axis in cycles 3 and 4, e.g.

ncdump -h /g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle3/output488/ice/OUTPUT/iceh.1958-01-daily.nc

yields

	float Tinz(time, nc, nkice, nj, ni) ;
	float Sinz(time, nc, nkice, nj, ni) ;

@aekiss
Contributor

aekiss commented Jan 25, 2022

The time axis was missing from Sinz and Tinz in cycle 2 prior to concatenation of the daily files (see updated https://github.com/aekiss/notebooks/blob/master/check-sinz-tinz.ipynb), so data were lost from Sinz and Tinz in the concatenation.

I have retained the unconcatenated files from 2014-05-01 onward in /scratch/x77/aek156/access-om2/archive/01deg_jra55v140_iaf_cycle2/output4??/ice/OUTPUT/iceh.????-??-??.nc* in case we want to try to fix them by adding a time axis prior to concatenation. This covers nearly all the Sinz, Tinz output from cycle 2 (I deleted Jan-April 2014 before I thought to check this).
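
One possible way to do that (a sketch using xarray; the filename, and whether the variables in these files are named Sinz/Tinz or Sinz_m/Tinz_m, are assumptions):

import xarray as xr

ds = xr.open_dataset("iceh.2014-10-01.nc")   # illustrative unconcatenated daily file
for name in ("Sinz", "Tinz"):
    if name in ds and "time" not in ds[name].dims:
        # promote to a length-1 time dimension so the variable concatenates
        # along time like the other history fields
        ds[name] = ds[name].expand_dims("time")
ds.to_netcdf("iceh.2014-10-01.fixed.nc")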

Apart from the lack of a time axis, the salt profile looks sensible in cycle 2 (at least in some places):

[figure: salinity profile from a cycle 2 output file]

in contrast to /home/548/sxa548/access-om2-sample_output/iceh.2018-08-15.nc, which looks like:

[figure: salinity profile from iceh.2018-08-15.nc]

It would be worth looking at the CICE versions used for these runs - perhaps there was a commit that fixed the time axis but messed up the index order.

@aekiss
Contributor

aekiss commented Jan 31, 2022

cycle 2 used cice_auscom_3600x2700_722p_26e6159_libaccessom2_a6e5d87.exe
cycle 3 used cice_auscom_18x15.3600x2700_1682p_9864af2_libaccessom2_44e8821.exe

The commits between 26e6159 and 9864af2 are:

9864af2 add 01deg_18x15 build option to Makefile
532a2df Merge branch 'master' of github.com:COSIMA/cice5
2633afb Add build config for res 3600x2700 and blocksize of 18x15.
ee387d9 Merge pull request #50 from COSIMA/221-model-startup-time
49de88b Update PIO.
b7f876e Grid array read fix. (see 751a252) https://github.com/COSIMA/access-om2/issues/221
3838ccf Send to ocean before receiving. https://github.com/COSIMA/access-om2/issues/221
751a252 Don't broadcast grid arrays. https://github.com/COSIMA/access-om2/issues/221
d45eb2b Check def var in restart. #34
8895750 Merge pull request #2 from COSIMA/master
015877c Merge pull request #49 from COSIMA/tweaks
a134394 set DNetCDF_Fortran_LIBRARY to a library file rather than a directory to keep cmake happy
67ed5dc use netcdf/4.7.4p for 0.25deg config so that PIO will work
5765c34 use https rather than git protocol for ParallelIO submodule
7c74942 Introduce namelist parameter to control block padding. https://github.com/COSIMA/cice5/issues/34
3f0022c Don't do block padding unless it is needed for netCDF parallel IO. https://github.com/COSIMA/cice5/issues/34
5ecd733 Remove debugging code. #34
5b4f6af Merge pull request #48 from COSIMA/34-pio
c1ae3f4 Allow configurable chunk size. #34
e9575cd Merge parallel netcdf.
4c84f46 Simplify build of ParallelIO
07f3ceb Simplify PIO init - using fewer iotasks doesn't appear to improve performance.
43ad528 Chunksize testing. #34
5b9806d Fix netcdf error messages only. #34
e750397 Introduce namelist parameters for parallel IO. #34
647efe1 io chunking test. #34
ade84a9 Convert to real() before writing NetCDF. #34
a4adc63 Fix bad work array shapes. #34
b93342c Fixes to parallel netcdf code - tested with all ice diagnostics. #34
9afdd10 Complete parallel netcdf code. #34
764f167 Code to make sure that all PEs have the same number of blocks - needed for parallel collective IO. #34
8e13fab Parallel netcdf test code working. #34
96dd6fa Parallel netcdf test code. #34
ead64ba Improved error handling - lot's more to go.
42223fc Whitespace fixes only.
d687345 Fix some variable names in error messages.
1bc8f3b Add ParallelIO module.
a0e7b10 Tested 1deg parallelIO with deflation. #34
26e6159 Merge pull request #47 from russfiedler/wombat

a4adc63 Fix bad work array shapes. #34 looks potentially relevant

One big change is switching to parallel IO.

These are the changes between those commits: https://github.com/COSIMA/cice5/compare/26e6159..9864af2

There are lots of changes to io_netcdf/ice_history_write.F90, including to the subroutines write_3d_and_4d_variables, write_3d_and_4d_variables_parallel and put_4d_with_blocks:
https://github.com/COSIMA/cice5/compare/26e6159..9864af2#diff-d797700daec4fe292f1a753f74beaa20bd24883a06a1cc32ae0a23ddae023c95

@aekiss
Contributor

aekiss commented Jan 31, 2022

@scrallen what CICE executable did you use for /home/548/sxa548/access-om2-sample_output/iceh.2018-08-15.nc?

@scrallen
Author

My access-om2 git log has the following:

Merge: ba96ff8 43568e5
Author: Stewart Allen <stewart.allen@bom.gov.au>
Date:   Mon Apr 26 23:22:35 2021 +1000

    pull of bug fix to chio. set to 0.006 in cice_in.nml to reproduce previous runs

Forgive my limited git skills, but I believe that points to this ACCESS-OM2 commit which includes this commit of CICE, dfb833a. As my commit message suggests, I pulled to fix the chio bug.

Does that help?

@scrallen
Author

I might add, the commit I pulled was made some time after those in the span you listed.

@aekiss
Contributor

aekiss commented Jan 31, 2022

Thanks - to double-check, what is the full path to the cice exe given in config.yaml? The cice git hash should be in the filename before _libaccessom2.

@scrallen
Author

/g/data/ep4/sxa548/access-om2/bin/cice_auscom_360x300_24p_5bbfec7_libaccessom2_44e8821.exe
so 5bbfec7.

@scrallen
Author

Oh, that's a local commit, as I was changing the source code. The git log in the cice src directory has:

Merge: 16fcc6d c1c6a47
Author: russfiedler <russfiedler@users.noreply.github.com>
Date:   Tue Mar 9 16:38:49 2021 +1100

    Merge pull request #56 from COSIMA/iss55
    
    don't ignore chio namelist value - fixes issue 55

so I suggest it could be c1c6a47

@aekiss
Contributor

aekiss commented Jan 31, 2022

It may become clearer if you inspect the git history with gitk --all

@russfiedler
Contributor

The code in io_netcdf/ice_history_write.F90 is very odd. It looks like 5D variables (including time) are being written with a 4D call with start=(/1,1,k,ic/), count=(/nx_global,ny_global,1,1/), which looks inconsistent to me. It definitely means there can be only one time record per file. I'm sure there should be an extra ,1 in the start and count arguments.
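
For comparison, writing one time record of a (time, nc, nkice, nj, ni) variable needs a position along every one of the five dimensions. A sketch of the equivalent write using the netCDF4 Python package (sizes and names illustrative, not the CICE code):

import numpy as np
from netCDF4 import Dataset

ncat, nkice, nj, ni = 5, 4, 300, 360
with Dataset("example.nc", "w") as f:
    f.createDimension("time", None)   # unlimited record dimension
    for dim, size in (("nc", ncat), ("nkice", nkice), ("nj", nj), ("ni", ni)):
        f.createDimension(dim, size)
    var = f.createVariable("Sinz", "f4", ("time", "nc", "nkice", "nj", "ni"))
    slab = np.zeros((nj, ni), dtype="f4")
    for ic in range(ncat):
        for k in range(nkice):
            # every dimension gets an index, analogous to giving start/count
            # an entry for all five dimensions in nf90_put_var
            var[0, ic, k, :, :] = slab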

@aekiss
Contributor

aekiss commented Jan 31, 2022

Well spotted @russfiedler - I guess you mean this 4D bit of write_3d_and_4d_variables? It looks like a copy-paste of this 3D bit above it, so I agree it would need more indices. [see below]

@russfiedler
Contributor

I'm sure it applies to all the variables. The 2D variables are written the same way. Also, it's still in CICE6. I looked at the API for nf90_put_var and I'm sure it's wrong.

https://www.unidata.ucar.edu/software/netcdf/docs-fortran/f90-variables.html

@aekiss
Contributor

aekiss commented Jan 31, 2022

Hmm, interesting - but we haven't seen this issue in anything other than the 4D (not including time) vars.

Also we may actually be using write_3d_and_4d_variables_parallel, since we do parallel IO. [see below]

I also misunderstood the loop limits defined here in my last comment - I now think both sections I referred to are actually for 4D (not including time), so it's OK that they are nearly identical.

@aekiss
Contributor

aekiss commented Jan 31, 2022

Another correction - we don't set history_parallel_io in cice_in.nml in cycle 3 and it defaults to false, so we don't use write_3d_and_4d_variables_parallel.

In fact, in cycle 3 we get parallel IO by building with PIO (see #34), which sets IODIR to io_pio so we should be looking at io_pio/ice_history_write.F90 rather than io_netcdf/ice_history_write.F90 to debug the swapped indices.

@aekiss
Contributor

aekiss commented Jan 31, 2022

The lack of a time index in cycle 2 is a separate bug, presumably in io_netcdf/ice_history_write.F90 since it was built with setenv IO_TYPE netcdf.

@aekiss
Contributor

aekiss commented Apr 13, 2022

So we have 2 separate problems:

  1. Stewart's original issue of swapped indices in 4d fields
  2. missing time axis in cycle 2, which seems to have been fixed by the changes in the executables used in more recent runs (cycles 3 and 4)

Issue 1. is unresolved, and affects both salinity and temperature profiles (Sinz and Tinz) in current configurations.
e.g. these are area averaged S and T for Jan 1958 in 01deg_jra55v140_iaf_cycle4 before and after correction via .reshape((4,5)) (the depth index is positive downwards, and categories are in order of increasing thickness):

[figures: area-averaged Sinz and Tinz category profiles for Jan 1958, before and after the reshape correction]
download-3

See notebook for details: https://nbviewer.org/github/aekiss/notebooks/blob/8ec1ef20eba5b34408f27e6feaf71c8a11e7484c/check-sinz-tinz.ipynb#Check-S-and-T-profiles-in-cycle-4

@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/output-internal-sea-ice-temperatures-from-access-om2/1531/9
