Updates for Focal (Ubuntu 20.04) including gcc10 and updated CI #311
Conversation
See https://gcc.gnu.org/gcc-10/porting_to.html. This comes up in a few calls to PETSc, in particular in combination with PETSC_NULL_XXX. In the case of MatCreateAIJ/MatCreateBAIJ it turns out we shouldn't have been using PETSC_NULL_INTEGER in the first place; see https://lists.mcs.anl.gov/pipermail/petsc-users/2018-May/035319.html. Partly fixes #308. Next up is h5hut, which doesn't currently build under gcc10.
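For reference, here is a minimal hypothetical sketch (stand-in names only, not the actual Fluidity/PETSc calls) of the implicit-interface rank mismatch that gfortran 10 now treats as an error:

```fortran
! Hypothetical sketch of the class of diagnostic gfortran 10 emits for
! implicit-interface procedures: "Rank mismatch between actual argument
! at (1) and actual argument at (2)". Compile check only:
!   gfortran -c rank_mismatch_demo.f90
program rank_mismatch_demo
  implicit none
  integer :: nnz(4)      ! per-row nonzero counts, as you would really pass
  integer :: null_arg    ! scalar stand-in, as PETSC_NULL_INTEGER was used
  external :: create_matrix   ! no explicit interface, like old F77-style bindings

  nnz = 5
  null_arg = 0
  call create_matrix(nnz)       ! rank-1 actual argument here ...
  call create_matrix(null_arg)  ! ... rank-0 here: an error with gfortran >= 10
end program rank_mismatch_demo
```

Earlier gfortran versions accept this (at most with a warning); gfortran 10 rejects the second call unless -fallow-argument-mismatch is passed.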
Stricter interface checking in gfortran 10 is not going to work with the netcdf interface, so we need to disable that check, for now only in ocean_forcing/ with gfortran >= 10. Adds a new AX_COMPARE_VERSION macro to compare versions more easily and introduces a new @FCFLAGS_ALLOW_INTERFACE_MISMATCH@ substitution, which expands to -fallow-argument-mismatch with gfortran >= 10 and is empty otherwise, so we can selectively apply this flag in Makefiles.
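As a hedged illustration of the pattern behind this (hypothetical routine name, not the real netcdf binding): an external procedure with no explicit interface called with differently typed arguments in the same file is an error under gfortran >= 10, and -fallow-argument-mismatch downgrades it to a warning:

```fortran
! Hypothetical sketch: nf_get_data stands in for an F77-style library
! entry point with no explicit interface, called with differently typed
! buffers, which gfortran >= 10 rejects by default.
!   gfortran -c read_fields.f90                            -> error
!   gfortran -c -fallow-argument-mismatch read_fields.f90  -> warning only
subroutine read_fields(ncid)
  implicit none
  integer, intent(in) :: ncid
  integer :: int_field(10)
  real    :: real_field(10)
  external :: nf_get_data

  call nf_get_data(ncid, int_field)   ! integer buffer here ...
  call nf_get_data(ncid, real_field)  ! ... real buffer here: type mismatch
end subroutine read_fields
```

The @FCFLAGS_ALLOW_INTERFACE_MISMATCH@ substitution then lets individual Makefiles (ocean_forcing/ for now) opt into the relaxed behaviour only where the legacy interfaces require it.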
Not sure why this helps - is it the "external mpi_bcast"?
This commit adds a workflow to build and run unit, short, and medium tests on Ubuntu LTS (bionic and focal) and the latest release (groovy) via GitHub Actions, along with a supporting Dockerfile for the container in which the build and tests run. The base image Dockerfiles are also overhauled to speed up CI run times. All three Ubuntu runs are condensed into one workflow, as the JUnit publishing stage publishes directly to the repo and Actions only shows the published reports on one workflow. Note: testing is currently restricted to the branch this commit initially went to (gcc10-fixes); this will need to be updated before merge, if this commit is included in a merge. Also fixes paths to artifacts, both in upload and reading.
Setting FCFLAGS for the build to pick up zoltan.mod. Fairly sure this shouldn't be necessary and there is almost certainly a more elegant configure fix, but this should at least let the build progress. Drop when a better fix is in place.
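For context, the failure this works around is purely a module-path one; a minimal sketch (assuming the Zoltan F90 interface module is named zoltan, as its .mod filename suggests; the include path is an assumption and depends on the Ubuntu packaging):

```fortran
! Any source containing a bare "use zoltan" fails with
! "Fatal Error: Cannot open module file 'zoltan.mod'" unless the
! directory holding zoltan.mod is on the module search path, e.g.
!   gfortran -c -I/usr/include check_zoltan.f90
! (the -I path is an assumption, not the verified package location).
subroutine check_zoltan_build()
  use zoltan
  implicit none
end subroutine check_zoltan_build
```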
Core oversubscription should be switched on for tests by testharness (see issue #182), but that doesn't appear to be working, so it is explicitly turned on in the base images.
At present Fluidity doesn't support GMSH versions newer than v2.
I think the issue is that random_symmetric_matrix produces matrices that can be arbitrarily badly conditioned. Instead use random_posdef_matrix, which has a lower bound on the smallest eigenvalue.
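To sketch the reasoning (this is an illustration, not the femtools random_posdef_matrix implementation): R*R^T is symmetric positive semi-definite but its smallest eigenvalue can still be arbitrarily close to zero, so shifting the spectrum by a fixed amount guarantees a lower bound:

```fortran
! Illustration of the idea behind a positive-definite test matrix with a
! guaranteed lower bound on its smallest eigenvalue; a sketch, not the
! femtools random_posdef_matrix implementation.
module random_spd_sketch
  implicit none
contains
  function random_spd_matrix(n, eig_min) result(A)
    integer, intent(in) :: n
    real(kind=8), intent(in) :: eig_min
    real(kind=8) :: A(n, n), R(n, n)
    integer :: i

    call random_number(R)
    ! R * R^T is symmetric positive semi-definite, but its smallest
    ! eigenvalue can still be arbitrarily close to zero ...
    A = matmul(R, transpose(R))
    ! ... so shift the spectrum: every eigenvalue is now >= eig_min,
    ! keeping the matrix well enough conditioned for a fixed test
    ! tolerance to be meaningful.
    do i = 1, n
      A(i, i) = A(i, i) + eig_min
    end do
  end function random_spd_matrix
end module random_spd_sketch
```

The unit test tolerance can then be chosen relative to the guaranteed smallest eigenvalue rather than hoping a random symmetric matrix happens to be well conditioned.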
This example runs in ~3h, which is an order of magnitude longer than the otherwise longest medium test - moving to long.
The medium test suite has grown to be extremely large, to the point that it no longer fits in hosted CI runs. A significant amount of this appears to be down to a large number of short-running tests which, by their runtime (sub-300s), should be in the short suite. This commit reassigns them to short testing, which increases the short testing total runtime but, critically, gets the medium testing total runtime back comfortably under 6h (~4.5h).
With the test suites rebalanced so medium testing fits into the 6h time limit for github-hosted runners, Actions no longer needs to use self-hosted runners.
Now that self-hosted runners aren't used, GitHub runners always have a clean environment, so the clean-up step isn't needed any more.
Chris has changed jobs and isn't working on Fluidity any more, so changing test ownership to Rhodri for checking failures shown up by Actions testing on this branch.
Chris has changed jobs and is no longer working on Fluidity, so this example is moved to Rhodri's ownership.
This brings Actions to the point of running all the tests that run on Jenkins.
With this branch moving to a PR, switch the test triggers over to master, as the PR will pick up on that.
This all looks pretty good to me! The Actions workflow is slick and seems to pick up failing tests just fine. I guess there would still be some tweaks needed to actually get the tests passing: things like the particle_entrainment test being (seemingly?) wildly off... It doesn't look like there were any substantial code changes, but I'm not really qualified to speculate beyond that!
Thanks, @angus-g! Feedback from your point of view as a developer on the Actions side is particularly appreciated. Rhod has the failing tests on his list to look at when time allows, hopefully over the next couple of weeks. I think the failures also exist in master, just not being tested, but it would be good to merge at a state where master is tested with a full suite of passes.
Testing of vlong tests has been initiated at https://github.com/FluidityProject/longtests/actions - this will almost certainly be buggy at first. It's running on Jenkins hardware.
@drhodrid - Almost there with your tests, thank you! Last one, hopefully a tolerance issue: in particle_prescribed_vortex_flow running on Focal I'm seeing "Max cons error 0.010923752666083475" against a test of 'less than 0.01'. Can this be relaxed to pass?
@stephankramer @jrper - wondering if you have any thoughts on what might be going on with the one weird remaining short test failure of square-convection-parallel-trivial? What I'm seeing is that it fairly consistently segfaults when running in the short test suite on GitHub runners, but so far has never segfaulted in the same container run on my local system. I've tried setting it up on a separate repo and have had it both segfault and run successfully on GitHub runners, sometimes on sequential runs in the same container. I've not yet managed to catch it segfaulting to get a backtrace out of it. Does this touch anywhere unusual in the codebase that might be a culprit, or do anything unusual? I'm out of ideas...
Question moved to issue #316
Hi Tim,
Yes - I think you can safely relax this one.
R
@tmbgreaves: re. running the longtests as well - do I understand correctly that all these are now separate jobs that in principle could run in parallel? If so, yes, sounds great - and it looks good from the interface. I guess it depends a bit on whether this is going to affect the overall completion time before reaching the green light... I'll take a look at square-convection-parallel-trivial to see if I can reproduce it...
Adding sudo access for the Fluidity user in actions containers to aid debugging.
@stephankramer - yes, that's exactly what I'm aiming for in terms of them running in parallel. At the moment I'm seeing a bit of slowdown on the run as a whole - perhaps adding ~2.5h to the already ~5h run - which may extend a little as the last few longtests/examples are fixed and added back in. The limit here is the 20 GitHub runners we get at a time. My feeling is that it's worth it for the hopefully significantly better coverage we get by regularly running most of the longtests against PRs, and the time-to-green-light is probably not too significant on the timescale of getting code reviews etc. There is also the 'vlong' suite of tests which I didn't manage to fit onto GitHub runners - likely on the order of 20 tests from examples and longtests - which I'll aim to run via the longtests repo on self-hosted hardware on a weekly schedule, similar to the way we used to run all longtests but now in a much more controlled environment. I'd suggest that if we can get to the point that square-convection-parallel-trivial is passing, we move towards final reviews and (hopefully) merging this PR with whatever tests are passing, given that should then be overall green, and we can then drip-feed in the remaining longtests as and when they're fixed.
All sounds good re. longtests - we're not in a situation where we have many active PRs in parallel, so that still sounds manageable. Re. the square-convection-parallel-trivial test case: I do have a backtrace now. It's segfaulting in the packing of detectors in parallel, line 455 of Detector_Tools.F90. Unfortunately it seems a bit heisenbuggy - if I rebuild with debugging it doesn't hit the segfault, and some of the variables I'd like to inspect are optimised out. Slightly off-topic @tmbgreaves - but I was enjoying the flawless reproducibility of running the relevant docker image. It seems the docker image is very bare-bones (which is good!), but it would be nice if the fluidity user could have sudo access.
Thank you very much for the debugging on square-convection-parallel-trivial, @stephankramer. I don't think there's much I can do, but if there is, do shout. I was waiting for someone to request more useful features in the container, and being lazy about implementing them until someone did ;-) I'll add sudo now - that's a very good idea.
Sudo available in containers from 48e1d2a (i.e., the next build of this PR).
Somewhat artificially, pending fixes to square-convection-parallel-trivial, Actions has now completed a fully-passing run, which took just under 7.5h, confirming the <3h additional time for the longtest suite as run so far. Thoughts on going ahead with an Actions/Focal merge now, given that square-convection-parallel-trivial isn't a bug introduced by this merge, vs. holding out for a fix there if it looks like a tractable problem to solve?
Yes, happy if you want to go ahead. Progress on square-convection-parallel-trivial is in separate issue #317. I think it's a real issue, but likely already in master...
This PR addresses updates for Focal (Ubuntu 20.04), including fixes for building with gcc10 (partly addressing #308), and migration of CI testing to GitHub Actions.
The substantial part of the PR is ready for review, with some (hopefully) minor test tweaks outstanding.
I'd welcome any thoughts and reviews on the CI migration from any/all developers - there is an ongoing push to move as many of the Fluidity codebases as possible from Jenkins to Actions, due to ageing Jenkins hardware and no clear way forward for that service. My goal is to have solid alternatives to Jenkins in place, and switched over to, for all Fluidity codebases within the next month or two.