problematic regional diagnostics for certain ocean layout #87

Closed
hguo-gfdl opened this issue Apr 8, 2019 · 8 comments

@hguo-gfdl

I used MOM6 tagged "om4/v1.0.1" and dumped regional oceanic diagnostics for the Caribbean Windward Passage. The variable "vmo" (Ocean Mass Y Transport) looked unreasonable when I used a certain ocean layout, for example,
ocn ranks="4671" threads="1" layout = "90,72" io_layout = "1,4" mask_table="mask_table.1809.90x72".

The untarred history file is
gfdl:/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/history/tmp_650/06500101.ocean_Windward_Passage.nc

The stdout is gfdl:/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/ascii/06500101.ascii_out.tar

Could you please take a look? Please let me know if any other information is needed.

Thanks.

@underwoo (Member) commented Apr 8, 2019

More information from @hguo-gfdl. Some is included above, but this helps clarify the issue:

Email from @hguo-gfdl

I ran into problematic regional output for "vmo" (Ocean Mass Y Transport) along the Caribbean Windward Passage after changing the ocean layout. Originally I used around 8100 PEs, as below:

<freInclude name="resourceparams_A1152x2_O5791_production">
   <resources jobWallclock="16:00:00" segRuntime="2:30:00">
      <atm ranks="1152" threads="2"   layout = "8,24"   io_layout = "1,4" />
      <lnd                            layout = "8,24"   io_layout = "1,4" />
      <ice                            layout = "144,8"  io_layout = "1,4" />
      <ocn ranks="5791" threads="1"   layout = "90,90"  io_layout = "1,5"  mask_table="mask_table.2309.90x90"/>
   </resources>
</freInclude>

After reducing to 6408 PEs, "vmo" at the Windward Passage reached extremely large values, which was unreasonable:

<freInclude name="resourceparams_A864x2_O4671_production">
   <resources jobWallclock="16:00:00" segRuntime="2:40:00">
      <atm ranks="864" threads="2"    layout = "6,24"   io_layout = "1,4" />
      <lnd                            layout = "6,24"   io_layout = "1,4" />
      <ice                            layout = "72,12"  io_layout = "1,4" />
      <ocn ranks="4671" threads="1"   layout = "90,72"  io_layout = "1,4" mask_table="mask_table.1809.90x72"/>
   </resources>
</freInclude>

The xml for the above run is on gaea:

/ncrc/home2/oar.gfdl.cmip6/xml/CM4/CM4Bling_am4p0c96L33_OM4p25_piControl_C.xml

The untarred history file is at gfdl:

/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/history/tmp_650/06500101.ocean_Windward_Passage.nc

The stdout is in gfdl:

/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/ascii/06500101.ascii_out.tar
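For reference, both ocn rank counts above are consistent with their mask tables, assuming the standard FMS naming convention mask_table.<n_masked>.<nx>x<ny> (ranks = nx*ny - n_masked). A quick sanity check:

```python
# Sanity check, assuming the FMS convention mask_table.<n_masked>.<nx>x<ny>:
# the ocn rank count equals the layout size minus the number of masked
# (all-land) domains.
for ranks, (nx, ny), mask_table in [
    (5791, (90, 90), "mask_table.2309.90x90"),
    (4671, (90, 72), "mask_table.1809.90x72"),
]:
    n_masked = int(mask_table.split(".")[1])
    assert ranks == nx * ny - n_masked  # 8100 - 2309 = 5791; 6480 - 1809 = 4671
```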

@underwoo (Member) commented Apr 8, 2019

@hguo-gfdl could you please point me to a file that has the correct (or what looks correct) output for the vmo variable? I grabbed two random files from /archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/history and see no difference. It would be easier if you could point me to two versions of the variable: one with the expected output and one with the bad output.

@hguo-gfdl (Author)

@underwoo The expected output:
gfdl:/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/history/tmp_293/02930101.ocean_Windward_Passage.nc

The bad output:
gfdl:/archive/oar.gfdl.cmip6/CM4/warsaw_201710_om4_v1.0.1/CM4_piControl_C/gfdl.ncrc4-intel16-prod-openmp/history/tmp_650/06500101.ocean_Windward_Passage.nc

@underwoo (Member) commented Apr 9, 2019 via email

@hguo-gfdl (Author)

@underwoo I just added read permission on these files. Please let me know if there are any problems.

Thanks,

@underwoo (Member)

@hguo-gfdl this might be related to another issue, #52, which points to an inconsistency in how diag_manager_mod names axes. I will need to run a test case, both to see if I can reproduce the error and to check what information is in the uncombined history files. Hopefully I will have some information next week.

@underwoo (Member)

@hguo-gfdl I was able to reproduce the issue, and it does appear to be a diag_manager issue; more than likely it is not related to what we see in #52.

In your case, with this layout, diag_manager produces two files (since this is a regional diagnostic). One of those files has the variables volcello, thetao, and so; the other has the variables vmo and vo. Uncombined, the variable outputs look correct. When combined, only the variables vmo and vo exist in the combined file, but they contain the data for volcello and thetao, respectively.
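To illustrate, here is a minimal sketch of how one could list which variables each per-PE regional file actually contains; the netCDF4 package and the ".NNNN"-suffixed uncombined filenames are assumptions about this run:

```python
# Minimal sketch: list the diagnostic variables present in each uncombined
# per-PE regional file. The glob pattern is hypothetical; substitute the
# actual uncombined history file names.
import glob
from netCDF4 import Dataset

for path in sorted(glob.glob("06500101.ocean_Windward_Passage.nc.*")):
    with Dataset(path) as nc:
        # Skip coordinate variables (those sharing a dimension name).
        diag_vars = sorted(v for v in nc.variables if v not in nc.dimensions)
        print(path, diag_vars)
```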

We will continue to look into this issue and try to come up with a fix.

Thank you for reporting this.

@underwoo (Member)

We think we have discovered the issue, and it is something we do not know how to fix at this time. We believe what is happening is that at different resolutions, some of the MPI processes may not have access to all of the same variables within small regions, due to the different grids used in the model. The regional output process writes a file per MPI process and simply dumps the data to the file. If a variable doesn't exist within the grid on a particular MPI process, the variable is unregistered on that process. Note that this particular case will only happen with certain variables on the ocean, ice, and land component grids.

In a future patch we will update the workflow to not combine files that do not have the same set of variables. This will keep the workflow from overwriting data. However, it is not a fix for this issue.
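As an illustration of that guard (not the actual FRE/mppnccombine change; the filename pattern and the netCDF4 dependency are assumptions), a pre-combine check might look like this:

```python
# Minimal sketch of a pre-combine consistency check: refuse to combine
# distributed files whose variable sets differ.
import glob
import sys
from netCDF4 import Dataset

def variable_sets(pattern):
    """Map each file matching `pattern` to its set of variable names."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        with Dataset(path) as nc:
            result[path] = frozenset(nc.variables)
    return result

sets = variable_sets("06500101.ocean_Windward_Passage.nc.*")  # hypothetical pattern
if len(set(sets.values())) > 1:
    sys.exit("Variable sets differ across distributed files; refusing to combine.")
```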

If we discover a method to correct this behavior, we will let you know. For now, I will close this as something we will not fix.
