Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates for Focal (Ubuntu 20.04) including gcc10 and updated CI #311

Merged
merged 45 commits into from
May 7, 2021

Conversation

tmbgreaves
Copy link
Contributor

@tmbgreaves tmbgreaves commented Apr 22, 2021

This PR addresses:

  • Fixes for building on Ubuntu Focal and Ubuntu Groovy
  • CI migration to switch on testing on Focal and Groovy
  • Test fixes on Bionic as a result of CI migration
  • Test fixes for passing on Focal and Groovy

The substantial part of the PR is ready for review, with some (hopefully) minor test tweaks outstanding.

I'd welcome any thoughts and reviews on the CI migration from any/all developers - there is an ongoing push to try and move as many of the Fluidity codes as possible to Actions from Jenkins due to ageing Jenkins hardware and no clear way forward for the service. My goal is to try and get solid alternatives to Jenkins for all Fluidity codebases in place and switched to over the next month or two.

stephankramer and others added 21 commits March 29, 2021 17:09
See https://gcc.gnu.org/gcc-10/porting_to.html
This comes up in a few calls to petsc in particular in combination with
PETSC_NULL_XXX
In the case of MatCreateAIJ/MatCreateBAIJ it turns out we shouldn't have
been using PETSC_NULL_INTEGER in the first place. See https://lists.mcs.anl.gov/pipermail/petsc-users/2018-May/035319.html

Partly fixes #308
Next up is h5hut which doesn't build under gcc10 currently.
Stricter interface checking in gfortran10 is not going to work with
netcdf interface thus we need to disable that check. For now only in
ocean_forcing/ with gfortran >=10

Adds new AX_COMPARE_VERSION macro to compare versions more easily and
introduces new @FCFLAGS_ALLOW_INTERFACE_MISMATCH@  that get substituted
to -fallow-interface-mismatch with gfortran>=10 and is empty otherwise
so we can selectively apply this flag in Makefiles.
Not sure why this helps - is it the "external mpi_bcast" ?
This commit adds a workflow to build and run unit, shor, and medium
tests on Ubuntu LTS (bionic and focal) and latest release (groovy)
via GitHub Actions, along with a supporting Dockerfile for the
container in which the build and tests run.

The base image Dockerfiles are also overhauled to speed up CI run
times.

All three Ubuntu runs are condensed into one workflow as the JUnit
publishing stage publishes directly to the repo and Actions only
shows the published reports on one workflow.

Note: testing is currently restricted to the branch this commit
initially went to (gcc10-fixes) and this will need to be updated
before merge, if this commit is included in a merge.
Fixing paths to artifacts, both in upload and reading.
Setting FCFLAGS for the build to pick up zoltan.mod; fairly sure
this shouldn't be necessary and there is almost certainly a more
elegant configure fix, but this should at least progress the build.

Drop when better fix is present..
Core oversubscription should be switched on for tests through
testharness (see issue #182) but doesn't appear to be working,
so explicitly turning it on in the base images.
At present Fluidity doesn't support GMSH > v.2
I think the issue is that random_symmetric matrix produces matrices that
can be arbitrarily badly conditioned. Instead use random_posdef_matrix
that has a lower bound on the smallest eigenvalue.
This example runs in ~3h which is an order of magnitude longer
than the otherwise longest medium test - moving to long.
The medium test suite has grown to be extremely large, to the point
that it doesn't fit in hosted CI runs any more. A significant amount
of this appears to be down to a large number of short-running tests
which by their runtime (sub-300s) should be in short tests. This
commit reassigns them to short testing, which increases the short
testing total runtime but critically gets the medium testing total
runtime back comfortably under 6h (~4.5h).
With the test suites rebalanced so medium testing fits into the
6h time limit for github-hosted runners, Actions no longer needs
to use self-hosted runners.
Now self-hosted runners aren't used, github runners always have a clean
environment so the clean-up step isn't needed any more.
Chris has changed jobs and isn't working on Fluidity any more,
so changing test ownership to Rhodri for checking failures shown
up by Actions testing on this branch.
Chris has changed jobs and is no longer working on Fluidity, so this
example is moved to Rhodri's ownership.
This brings Actions to the point of running all the tests that run
on Jenkins.
With this branch moving to PR, switch over test triggers to master
as the PR will pick up on that.
Copy link
Contributor

@angus-g angus-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks pretty good to me! The Actions workflow is slick and seems to pick up failing tests just fine. I guess there would still be some tweaks needed to actually get the tests passing: things like the particle_entrainment test being (seemingly?) wildly off... It doesn't look like there were any substantial code changes, but I'm not really qualified to speculate beyond that!

@tmbgreaves
Copy link
Contributor Author

Thanks, @angus-g ! Feedback from your point of view as a developer on the Actions side is particularly appreciated. Rhod has the failing tests on his list to look at when time allows, hopefully over the next couple of weeks. I think the failures also exist in master, just not being tested, but it would be good to merge at a state where master tested with a full suite of passes.

@tmbgreaves tmbgreaves changed the title Updates for Focal/Groovy including gcc10 and updated CI Updates for Focal (20.04) including gcc10 and updated CI May 3, 2021
@tmbgreaves tmbgreaves changed the title Updates for Focal (20.04) including gcc10 and updated CI Updates for Focal (Ubuntu 20.04) including gcc10 and updated CI May 3, 2021
@tmbgreaves
Copy link
Contributor Author

Testing of vlong tests initiated at : https://github.com/FluidityProject/longtests/actions

This will almost certainly be buggy at first. It's running on Jenkins hardware.

@tmbgreaves
Copy link
Contributor Author

@drhodrid - Almost there with your tests, thank you! Last one, hopefully a tolerance issue, in particle_prescribed_vortex_flow running on Focal I'm seeing:

Max cons error 0.010923752666083475

against a test of 'less than 0.01'. Can this be relaxed to pass?

@tmbgreaves
Copy link
Contributor Author

@stephankramer @jrper - wondering if you have any thoughts on what might be going on with the one wierd remaining short test failure of square-convection-parallel-trivial ?

What I'm seeing is that it fairly consistently segfaults when running in the short test suite on github runners but so far has never segfaulted in the same container run on my local system. I've tried setting it up on a separate repo and have had it both segfault and run successfully on github runners, sometimes on sequential runs in the same container. I've not yet managed to catch it segfaulting to get a backtrace out of it.

Does this touch anywhere unusual in the codebase that might be a culprit, or do anything unusual?

I'm out of ideas...

@tmbgreaves
Copy link
Contributor Author

tmbgreaves commented May 5, 2021

Question moved to issue #316

2 similar comments
@tmbgreaves
Copy link
Contributor Author

tmbgreaves commented May 5, 2021

Question moved to issue #316

@tmbgreaves
Copy link
Contributor Author

tmbgreaves commented May 5, 2021

Question moved to issue #316

@drhodrid
Copy link
Contributor

drhodrid commented May 5, 2021 via email

@stephankramer
Copy link
Contributor

@tmbgreaves : re. running the longtest as well. Do I understand correctly all these are now separate jobs that in principle could run in paralllel? If so, yes sounds great - and it looks good from the interface. I guess it depends a bit whether this is going to affect the overall completion time before reaching the green light...

I'll take a look at square-convection-parallel-trivial to see if can reproduce...

Adding sudo access for the Fluidity user in actions
containers to aid debugging.
@tmbgreaves
Copy link
Contributor Author

@stephankramer - yes, that's exactly what I'm aiming for in terms of them running in parallel. At the moment I'm seeing a bit of slowdown on the run as a whole - perhaps adding ~2.5h to the already ~5h run, which may extend a little as the last few longtests/examples are fixed and added back in. The limit here is the 20 github runners we get at a time. My feeling is that it's worth it for the hopefully significantly better coverage we get by regularly running most of the longtests against PRs.

I think the time-to-green-light is probably not too significant on the timescale of getting code reviews etc.

There is also the 'vlong' suite of tests which I didn't manage to fit onto github runners, likely going to be on the order of 20 from examples and longtests, and which I'll aim to run via the longtests repo on self-hosted hardware in a similar weekly schedule to the way we used to run all longtests, but now in a much more controlled environment.

I'd suggest that if we can get to the point that square-convection-parallel-trivial is passing we move in the direction of final reviews and (hopefully) merging this PR with whatever tests are passing, given that should then be overall green, and we can then drip-feed in the remaining longtests as and when they're fixed.

@stephankramer
Copy link
Contributor

All sounds good re. longtests - we're not in a situation where we have many active PRs in parallel - so that still sounds manageable

Re. the square-convection-parallel-trivial test case: I do have a backtrace now. It's segfaulting in the packing of detectors in parallel, line 455 of Detector_Tools.F90. Unfortunately it seems a bit heisenbuggy - if I rebuild with debugging it doesn't hit the segfault - and some of the variables I'd like to inspect are <optimized-out>. @drhodrid - in case it's relevant to failures you're looking at...

Slightly off-topic @tmbgreaves - but I was enjoying the flawless reproducability by running the relevant docker image. It seems the docker image is very bare bone (which is good!) - but it would be nice if the fluidity user could have sudo access so I can install stuff in a debugging session. At the moment it doesn't seem to have any editors, not even vi!

@tmbgreaves
Copy link
Contributor Author

Thank you very much for the debugging on square-convection-parallel-trivial, @stephankramer . I don't think there's much I can do, but if there is, do shout.

I was waiting for someone to request more useful features in the container, @stephankramer , and being lazy implementing until someone did ;-) I'll add in sudo now - that's a very good idea.

@tmbgreaves
Copy link
Contributor Author

Sudo available in containers from 48e1d2a (ie, next build of this PR).

@tmbgreaves
Copy link
Contributor Author

tmbgreaves commented May 6, 2021

It's spurious pending fixes to square-convection-parallel-trivial but Actions has now completed a fully-passing run which took just under 7.5h, confirming the <3h additional time for the longtest suite running thus far.

Thoughts on going ahead with an Actions/Focal merge now given square-convection-parallel-trivial isn't a bug introduced by this merge, vs. holding on getting a fix there if it looks like a tractable problem to solve?

@stephankramer
Copy link
Contributor

Yes, happy if you want to go ahead. Progress on square-convection-parallel-trivial in separate issue #317 I think it's a real issue, but likely already in master...

@tmbgreaves tmbgreaves merged commit 4ef793b into master May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bugfixing gcc10
7 participants