Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Id fix for 32 bit #51

Merged
merged 19 commits into from
Jun 20, 2017
Merged

Id fix for 32 bit #51

merged 19 commits into from
Jun 20, 2017

Conversation

adcroft
Copy link
Member

@adcroft adcroft commented Jun 19, 2017

  • Replaces the 32-bit iceberg_num with:
    • a 64-bit id within the code, and
    • two 32-bit id-like variables in the restart files.
  • Reads (old) restart files with 32-bit iceberg_num and generates new IDs for bergs.
  • Always writes the new restart file.
  • Trajectory files no long have the "marker" entries but instead all records have the 64-bit equivalent id.
  • This should not change answers but just changes the book keeping internally and for i/o.
  • I removed the "read_orig" ability since maintaining compatibility with 3 year old formats is impossible when I don't have any such files around to test with.

I've assigned this to @nikizadehgfdl to handle - we need confirmation this fixes the issue in Bill Hurlin's run. Once (assuming) it has passed your tests, can you complete the merge by clicking green buttons? I believe that Bill then just needs to use the merged version and his run should magically continue from where it stopped.

@sternalon Can you check that the bond restarts, and indeed the bonds, still work? I don't have a test case with bonds active. Note that the restart file format is different in order to accommodate the 64-bit id. I did not code up a backward compatibility for bond restarts because I couldn't debug it. If we need backward compatibility, it will not be hard to follow how it works in read_restart_bergs() - there's about 5 lines that handle it. There are a lot of changes in all but made in small commits so should not be to hard to follow the code evolution.

- Two new functions to cobine two 32-bit integers into a single 64-bit
  integer and back again.
  - When we wrote the iceberg_num (iceberg id code) we assumed integers
    were 64-bit but by default they are 32-bit with the compilation options
    used with FMS codes, and netcdf only supports 32-bit integers!
- Also renamed unitTests() to unit_tests().
- A new member %id is a 64-bit integer for a better constructed id.
- New function, generate_id(), returns a 64-bit integer hash of
  counter and position.
- This calls the new generate_id() function and stores
  the 64-bit %id in the iceberg everywhere we first create
  %iceberg_num.
- When the restart file exists but has zero length data in then
  the model was correctly NOT allocating but was then proceeding to
  try and read into the non-allocated array. This avoids that condition.
- I thought this was fixed before but apparently not.
- There were unnecessary use and public lines for the functions
  pack_berg_into_buffer2() and unpack_berg_from_buffer2().
- Previously the pack_berg_into_buffer2() and unpack_berg_from_buffer2()
  routines would place particular members in the berg type into a
  particular position of a 2D array. This makes adding variables or
  re-ordering variables cumbersome since fixed integer locations have to
  be made consistent.
- Now we treat a row of the 2d array as a fifo stack so that only the
  order matters in the code, and the absolute position does not need to
  appear in the code. The order of packing and unpacking must still be
  consistent.
- In order to switch over to using the 64-bit IDs I needed to be able
  calculate new 64-bit IDs from old 32-bit IDs which involved splitting
  the generate_id() function into more modular parts.
- As for pack_berg_into_buffer2(), the trajectory packing and unpacking
  now uses the FIFO stack-style approach.
- %id is set everywhere that %iceberg_num is set.
- When resading restarts we convert the 32-bit id to a 64-bit id.
- New bergs are assigned both a 32-bit and 64-bit id.
- To reduce amount of code that needs to edited when deprecating $iceberg_num
  I've uniformly updated every print statement that uses %iceberg_num.
- gnu was casting a constant integer to the kind of the result, while
  intel and pgi compilers are making all constants integer*4. This commit
  forces the expression to be evaluated in integer*8 terms.
- Everywhere tests are made now uses %id instead of $iceberg_num,
  except for the restart files.
- Removed code used to read legacy restarts.
- Added FATAL error message if "read_orig" is set in the namelist.
- Add id (as id_cnt,id_ij) to restart files (not yet used when reading).
- I can not figure out why we would initialize the counter to 1 and then
  increment after generating a berg. Starting at 0 and incrementing before
  generating a berg means the counter is now a counter of how many bergs
  were calved in that cell.
- When reading the 32-bit iceberg_num from restarts we will need to
  recover the starting i,j for the berg.
- Also removed comments abotu moving read_restart_calving().
- We now read id_cnt and id_ij to create an id, unless iceberg_num
  is present.
- No longer write iceberg_num (ever).
- iceberg_num now only appears when reading an old restart with the
  32-bit reperesentation of iceberg_num.
@nikizadehgfdl
Copy link
Contributor

I reproduced the bug (by providing year 225 restart of Bill's coupled model run as initCond) and verified that
this commit fixes the problem.

My regression tests also passed, except for one, the iceberg distributed restart does not work.
Given the urgency for this fix I am recommending it to be pulled in and I filed issue #52 for the distributed restarts (which is not actively used yet).

@adcroft adcroft merged commit a527054 into NOAA-GFDL:dev/gfdl Jun 20, 2017
adcroft added a commit to adcroft/icebergs that referenced this pull request Jun 20, 2017
…arts

- When a restart file is empty (zero record length) we were skipping the
  read step to avoid segmentation violations when passing unallocated arrays
  to mpp. However, an mpp_sync inside mpp_io then caused some PEs to hang.
- In order to call mpp_io on all PEs we are now allocating zero-sized
  arrays which happily allows the everything to work inside mpp.
- Closes NOAA-GFDL#52
- Also closes NOAA-GFDL#48 which should have been closed with PRs NOAA-GFDL#50 and NOAA-GFDL#51 for
  which this commit is a further fix.
@adcroft adcroft deleted the id-fix-for-32-bit branch July 15, 2020 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants