Updates for Focal (Ubuntu 20.04) including gcc10 and updated CI #311
Conversation
See https://gcc.gnu.org/gcc-10/porting_to.html. This comes up in a few calls to PETSc, in particular in combination with PETSC_NULL_XXX. In the case of MatCreateAIJ/MatCreateBAIJ it turns out we shouldn't have been using PETSC_NULL_INTEGER in the first place; see https://lists.mcs.anl.gov/pipermail/petsc-users/2018-May/035319.html. Partly fixes #308. Next up is h5hut, which doesn't currently build under gcc10.
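For reference, here is a minimal hypothetical sketch (stand-in names only, not the actual Fluidity/PETSc calls) of the implicit-interface rank mismatch that gfortran 10 now treats as an error:

```fortran
! Hypothetical sketch of the class of diagnostic gfortran 10 emits for
! implicit-interface procedures: "Rank mismatch between actual argument
! at (1) and actual argument at (2)". Compile check only:
!   gfortran -c rank_mismatch_demo.f90
program rank_mismatch_demo
  implicit none
  integer :: nnz(4)      ! per-row nonzero counts, as you would really pass
  integer :: null_arg    ! scalar stand-in, as PETSC_NULL_INTEGER was used
  external :: create_matrix   ! no explicit interface, like old F77-style bindings

  nnz = 5
  null_arg = 0
  call create_matrix(nnz)       ! rank-1 actual argument here ...
  call create_matrix(null_arg)  ! ... rank-0 here: an error with gfortran >= 10
end program rank_mismatch_demo
```

Earlier gfortran versions accept this (at most with a warning); gfortran 10 rejects the second call unless -fallow-argument-mismatch is passed.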
Stricter interface checking in gfortran 10 is not going to work with the netcdf interface, so we need to disable that check, for now only in ocean_forcing/ with gfortran >= 10. Adds a new AX_COMPARE_VERSION macro to compare versions more easily and introduces a new @FCFLAGS_ALLOW_INTERFACE_MISMATCH@ substitution, which expands to -fallow-argument-mismatch with gfortran >= 10 and is empty otherwise, so we can selectively apply this flag in Makefiles.
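As a hedged illustration of the pattern behind this (hypothetical routine name, not the real netcdf binding): an external procedure with no explicit interface called with differently typed arguments in the same file is an error under gfortran >= 10, and -fallow-argument-mismatch downgrades it to a warning:

```fortran
! Hypothetical sketch: nf_get_data stands in for an F77-style library
! entry point with no explicit interface, called with differently typed
! buffers, which gfortran >= 10 rejects by default.
!   gfortran -c read_fields.f90                            -> error
!   gfortran -c -fallow-argument-mismatch read_fields.f90  -> warning only
subroutine read_fields(ncid)
  implicit none
  integer, intent(in) :: ncid
  integer :: int_field(10)
  real    :: real_field(10)
  external :: nf_get_data

  call nf_get_data(ncid, int_field)   ! integer buffer here ...
  call nf_get_data(ncid, real_field)  ! ... real buffer here: type mismatch
end subroutine read_fields
```

The @FCFLAGS_ALLOW_INTERFACE_MISMATCH@ substitution then lets individual Makefiles (ocean_forcing/ for now) opt into the relaxed behaviour only where the legacy interfaces require it.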
Not sure why this helps - is it the "external mpi_bcast"?
This commit adds a workflow to build and run unit, short, and medium tests on Ubuntu LTS (bionic and focal) and the latest release (groovy) via GitHub Actions, along with a supporting Dockerfile for the container in which the build and tests run. The base image Dockerfiles are also overhauled to speed up CI run times. All three Ubuntu runs are condensed into one workflow, as the JUnit publishing stage publishes directly to the repo and Actions only shows the published reports on one workflow. Note: testing is currently restricted to the branch this commit initially went to (gcc10-fixes); this will need to be updated before merge, if this commit is included in a merge. Also fixes paths to artifacts, both in upload and reading.
Setting FCFLAGS for the build to pick up zoltan.mod. Fairly sure this shouldn't be necessary and there is almost certainly a more elegant configure fix, but this should at least let the build progress. Drop when a better fix is in place.
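For context, the failure this works around is purely a module-path one; a minimal sketch (assuming the Zoltan F90 interface module is named zoltan, as its .mod filename suggests; the include path is an assumption and depends on the Ubuntu packaging):

```fortran
! Any source containing a bare "use zoltan" fails with
! "Fatal Error: Cannot open module file 'zoltan.mod'" unless the
! directory holding zoltan.mod is on the module search path, e.g.
!   gfortran -c -I/usr/include check_zoltan.f90
! (the -I path is an assumption, not the verified package location).
subroutine check_zoltan_build()
  use zoltan
  implicit none
end subroutine check_zoltan_build
```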
Core oversubscription should be switched on for tests by testharness (see issue #182), but that doesn't appear to be working, so it is explicitly turned on in the base images.
At present Fluidity doesn't support GMSH versions newer than v2.
I think the issue is that random_symmetric_matrix produces matrices that can be arbitrarily badly conditioned. Instead use random_posdef_matrix, which has a lower bound on the smallest eigenvalue.
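To sketch the reasoning (this is an illustration, not the femtools random_posdef_matrix implementation): R*R^T is symmetric positive semi-definite but its smallest eigenvalue can still be arbitrarily close to zero, so shifting the spectrum by a fixed amount guarantees a lower bound:

```fortran
! Illustration of the idea behind a positive-definite test matrix with a
! guaranteed lower bound on its smallest eigenvalue; a sketch, not the
! femtools random_posdef_matrix implementation.
module random_spd_sketch
  implicit none
contains
  function random_spd_matrix(n, eig_min) result(A)
    integer, intent(in) :: n
    real(kind=8), intent(in) :: eig_min
    real(kind=8) :: A(n, n), R(n, n)
    integer :: i

    call random_number(R)
    ! R * R^T is symmetric positive semi-definite, but its smallest
    ! eigenvalue can still be arbitrarily close to zero ...
    A = matmul(R, transpose(R))
    ! ... so shift the spectrum: every eigenvalue is now >= eig_min,
    ! keeping the matrix well enough conditioned for a fixed test
    ! tolerance to be meaningful.
    do i = 1, n
      A(i, i) = A(i, i) + eig_min
    end do
  end function random_spd_matrix
end module random_spd_sketch
```

The unit test tolerance can then be chosen relative to the guaranteed smallest eigenvalue rather than hoping a random symmetric matrix happens to be well conditioned.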
This example runs in ~3h, which is an order of magnitude longer than the otherwise longest medium test - moving to long.
The medium test suite has grown to be extremely large, to the point that it no longer fits in hosted CI runs. A significant amount of this appears to be down to a large number of short-running tests which, by their runtime (sub-300s), should be in the short suite. This commit reassigns them to short testing, which increases the short testing total runtime but, critically, gets the medium testing total runtime back comfortably under 6h (~4.5h).
With the test suites rebalanced so medium testing fits into the 6h time limit for github-hosted runners, Actions no longer needs to use self-hosted runners.
Now that self-hosted runners aren't used, GitHub runners always have a clean environment, so the clean-up step isn't needed any more.
Chris has changed jobs and isn't working on Fluidity any more, so changing test ownership to Rhodri for checking failures shown up by Actions testing on this branch.
Chris has changed jobs and is no longer working on Fluidity, so this example is moved to Rhodri's ownership.
This brings Actions to the point of running all the tests that run on Jenkins.
With this branch moving to a PR, switch the test triggers over to master, as the PR will pick up on that.
This all looks pretty good to me! The Actions workflow is slick and seems to pick up failing tests just fine. I guess there would still be some tweaks needed to actually get the tests passing: things like the particle_entrainment test being (seemingly?) wildly off... It doesn't look like there were any substantial code changes, but I'm not really qualified to speculate beyond that!
Thanks, @angus-g! Feedback from your point of view as a developer on the Actions side is particularly appreciated. Rhod has the failing tests on his list to look at when time allows, hopefully over the next couple of weeks. I think the failures also exist in master, just not being tested, but it would be good to merge at a state where master is tested with a full suite of passes.
Testing of vlong tests has been initiated at https://github.com/FluidityProject/longtests/actions - this will almost certainly be buggy at first. It's running on Jenkins hardware.
@drhodrid - Almost there with your tests, thank you! Last one, hopefully a tolerance issue: in particle_prescribed_vortex_flow running on Focal I'm seeing "Max cons error 0.010923752666083475" against a test of 'less than 0.01'. Can this be relaxed to pass?
@stephankramer @jrper - wondering if you have any thoughts on what might be going on with the one weird remaining short test failure of square-convection-parallel-trivial? What I'm seeing is that it fairly consistently segfaults when running in the short test suite on GitHub runners, but so far has never segfaulted in the same container run on my local system. I've tried setting it up on a separate repo and have had it both segfault and run successfully on GitHub runners, sometimes on sequential runs in the same container. I've not yet managed to catch it segfaulting to get a backtrace out of it. Does this touch anywhere unusual in the codebase that might be a culprit, or do anything unusual? I'm out of ideas...
Question moved to issue #316
Hi Tim,
Yes - I think you can safely relax this one.
R
@tmbgreaves: re. running the longtests as well - do I understand correctly that all these are now separate jobs that in principle could run in parallel? If so, yes, sounds great - and it looks good from the interface. I guess it depends a bit on whether this is going to affect the overall completion time before reaching the green light... I'll take a look at square-convection-parallel-trivial to see if I can reproduce it...
Adding sudo access for the Fluidity user in actions containers to aid debugging.
@stephankramer - yes, that's exactly what I'm aiming for in terms of them running in parallel. At the moment I'm seeing a bit of slowdown on the run as a whole - perhaps adding ~2.5h to the already ~5h run - which may extend a little as the last few longtests/examples are fixed and added back in. The limit here is the 20 GitHub runners we get at a time. My feeling is that it's worth it for the hopefully significantly better coverage we get by regularly running most of the longtests against PRs, and the time-to-green-light is probably not too significant on the timescale of getting code reviews etc. There is also the 'vlong' suite of tests which I didn't manage to fit onto GitHub runners - likely on the order of 20 tests from examples and longtests - which I'll aim to run via the longtests repo on self-hosted hardware on a weekly schedule, similar to the way we used to run all longtests but now in a much more controlled environment. I'd suggest that if we can get to the point that square-convection-parallel-trivial is passing, we move towards final reviews and (hopefully) merging this PR with whatever tests are passing, given that should then be overall green, and we can then drip-feed in the remaining longtests as and when they're fixed.
All sounds good re. longtests - we're not in a situation where we have many active PRs in parallel, so that still sounds manageable. Re. the square-convection-parallel-trivial test case: I do have a backtrace now. It's segfaulting in the packing of detectors in parallel, line 455 of Detector_Tools.F90. Unfortunately it seems a bit heisenbuggy - if I rebuild with debugging it doesn't hit the segfault, and some of the variables I'd like to inspect are optimised out. Slightly off-topic @tmbgreaves - but I was enjoying the flawless reproducibility of running the relevant docker image. It seems the docker image is very bare-bones (which is good!), but it would be nice if the fluidity user could have sudo access.
Thank you very much for the debugging on square-convection-parallel-trivial, @stephankramer. I don't think there's much I can do, but if there is, do shout. I was waiting for someone to request more useful features in the container, and being lazy about implementing them until someone did ;-) I'll add sudo now - that's a very good idea.
Sudo available in containers from 48e1d2a (i.e., the next build of this PR).
Somewhat artificially, pending fixes to square-convection-parallel-trivial, Actions has now completed a fully-passing run, which took just under 7.5h, confirming the <3h additional time for the longtest suite as run so far. Thoughts on going ahead with an Actions/Focal merge now, given that square-convection-parallel-trivial isn't a bug introduced by this merge, vs. holding out for a fix there if it looks like a tractable problem to solve?
Yes, happy if you want to go ahead. Progress on square-convection-parallel-trivial is in separate issue #317. I think it's a real issue, but likely already in master...
This PR addresses updates for Focal (Ubuntu 20.04), including fixes for building with gcc10 (partly addressing #308), and migration of CI testing to GitHub Actions.
The substantial part of the PR is ready for review, with some (hopefully) minor test tweaks outstanding.
I'd welcome any thoughts and reviews on the CI migration from any/all developers - there is an ongoing push to move as many of the Fluidity codebases as possible from Jenkins to Actions, due to ageing Jenkins hardware and no clear way forward for that service. My goal is to have solid alternatives to Jenkins in place, and switched over to, for all Fluidity codebases within the next month or two.