D3.8: Continuous integration platform for multi-platform build/test. #67
Comments
cc @embray |
Per my comment here I think we should look into drafting a brief proposal to some cloud hosting companies (I mentioned Google, Digital Ocean, and Microsoft, though any others could be valuable as well) to see if they would be willing to donate computing resources for Sage builds. @saraedum has done some fantastic work potentially improving Sage's continuous integration processes, with support for saving intermediate build results as Docker images (in a process that I believe could be valuable to other projects as well). The biggest hurdle now is just running up against the resource limitations of free-tier cloud computing/CI platforms. This is particularly a problem for building Sage from scratch and running the full test suite--intermediate builds typically fit within the resource limitations of some providers' free tiers. However, we don't need much more than what is already provided. ODK could probably pay for such a thing short term, but that funding is only available for the next ~1 year. It would be nice if we could convince one or more of these companies to make a small donation for open source mathematics software. |
For the record: I got in touch with Robert Bradshaw who gave me some suggestions for approaching google. |
I'm not really sure what to do. @saraedum has achieved a lot in this area, and has several months' worth of testing to report on. But it's only testing--none of this has been used in production, and I don't think it would be a very interesting report if we can't report gains in real development. I believe we will be able to make such a report, but it will take time. Also, the specific work Julian does is, as you say, not really equivalent to what this deliverable is about. But the original deliverable was also a bit vague... |
I would suggest that any such report be very brief, because the details are very technical and uninteresting to reviewers. The biggest achievement is in making Docker containers of intermediate build results so that we can more efficiently do incremental continuous integration of Sage; and then the pushing to Binder, which makes it easier to test new changes to Sage without having to build them oneself. |
I will be ready to add a section with an overview of the current state of continuous integration in GAP. In addition to the private Jenkins CI instance that we use for wrapping and testing release candidates, checking for GAP package updates and testing new versions of GAP packages, we now use Travis CI to run a number of tests for the main GAP repositories at https://travis-ci.org/gap-system/. These include not only CI tests for the main GAP development repository, but also package integration tests that use a set of Docker containers built in various settings (https://hub.docker.com/r/gapsystem/). We have a standard CI setup for GAP packages, which package authors may adopt and customise. GAP packages using Travis CI in the gap-packages VO can be seen at https://travis-ci.org/gap-packages/ (some more at https://gap-packages.github.io/). Also, GAP and its packages use CodeCov to measure code coverage: see https://codecov.io/gh/gap-system/gap for GAP and https://codecov.io/gh/gap-packages/ for GAP packages. |
I think we have quite a bit of material in terms of reporting on best practices for CI, even if some things are still experimental. I agree with Erik, though: let's keep it short! We need a volunteer to start working on the report. I don't feel knowledgeable enough for this (and have other deadlines). @saraedum, would you agree to start the write-up? |
The mostly unaddressed multi-platform question remains my biggest concern for the state of this deliverable. |
I remember that. But it seems that Julian's help in drafting the report would be very valuable here.
Let's not strive to stick to a few paragraphs that were written 4 years ago by people (me) who didn't have a clear view of the matter at the time. We ended up not working toward a "common infrastructure" because we realized that wouldn't be useful. We just have to say that, and report on the things we've done.
Yup, we missed that goal. I can't think of any possible excuse other than "it's difficult". However maybe we can argue how other factors are improving the portability of our software. |
Another bit of text from the proposal is:
I thought I had discussed this before on this issue, but apparently not. There are a few confusing/inaccurate aspects of the wording here. It reads "SAGE's buildbot" as though Buildbot were a Sage-specific tool, which it is not. Presumably this is referring to Sage's use of the generic tool Buildbot (https://buildbot.net/) for continuous integration, which also includes Sage's fleet of donated machines for running Buildbot builds on. Like other self-hosted CI software, Buildbot consists of a single "master"--a centralized server which schedules units of work and also provides a web server for viewing results (http://build.sagemath.org)--and one or more machines that serve as "workers", to which the master distributes units of work. Any number of workers can be registered to a single master to accept work from that master. Point being, "Sage's buildbot" consists of two major components: the server which runs the Buildbot "master", and a whole fleet of donated systems, across multiple platforms, which perform builds doled out to them by the master. This is not, as the proposal states, "x86-64/Linux specific". Sage's fleet of donated workers includes multiple Linux distributions, mostly 64-bit but some 32-bit, as well as one OSX (10.11) machine owned by @vbraun. There are no Windows workers yet, but @slel has been offering to help with that, and supposedly there is ODK money to pay for one. One thing I have concluded is that it would be best to have dedicated Windows hardware. Getting back to the point: Perhaps the original idea here was that other projects might wish to leverage Sage's Buildbot worker fleet. That makes sense in principle, I think, though in that case those projects would need to administer their own Buildbot "master" (it does not make a lot of sense to share a master for multiple, diverse projects), and then request access to specific worker machines that they might like to use.
However--and @vbraun can chime in here--Sage itself taxes the workers quite heavily much of the time, which would probably just slow down builds for other projects. Most of these are not machines that can sustain being builders for any arbitrary number of projects. Some of them are quite beefy though. This would also require those projects to use Buildbot, which they might prefer not to. As a slight alternative, the same machines can also be used as workers for other CI systems--for example there's no reason a single machine can't be both a Buildbot worker and a GitLab CI worker. I think the most useful thing ODK could in theory supply here is funding for some really big OSX and Windows machines that could be shared by other projects. My experience has been that getting access to powerful OSX and Windows machines is the biggest hurdle to testing on those platforms. If the ODK-affiliated projects would like to get together and agree on some system specs, and where to host them, and if there is enough budget to be found, then we could order such systems and make them available for multiple open source projects to share. I believe that something like that would be the most useful outcome of this, but I don't know how feasible that is. |
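To make the master/worker split described above a bit more concrete, here is a toy Python model of it. This is only a sketch and not Buildbot's actual API or configuration format: the `Master` and `Worker` classes, the platform labels, and the round-robin policy are all assumptions made up for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A build machine registered with the master (hypothetical model)."""
    name: str
    platform: str
    assigned: list = field(default_factory=list)


class Master:
    """Toy scheduler: hands each build request to a worker of the requested platform."""

    def __init__(self):
        # One queue of workers per platform label.
        self._workers = defaultdict(deque)

    def register(self, worker):
        self._workers[worker.platform].append(worker)

    def dispatch(self, project, platform):
        pool = self._workers.get(platform)
        if not pool:
            return None  # no worker for this platform, e.g. no Windows machine yet
        worker = pool[0]
        pool.rotate(-1)  # simple round-robin among workers of the same platform
        worker.assigned.append(project)
        return worker.name


master = Master()
master.register(Worker("linux-a", "linux-64"))
master.register(Worker("linux-b", "linux-64"))
master.register(Worker("mac-a", "osx"))

requests = ["linux-64", "linux-64", "linux-64", "osx", "windows"]
assignments = [master.dispatch("sage", platform) for platform in requests]
print(assignments)
```

In this model the request for a "windows" build simply comes back unassigned, which is the gap discussed above: the scheduling side is easy, and what is missing is the hardware to register.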
See my above comments, however. It's a bit ranty so the TL;DR is that we could achieve exactly this if we had powerful dedicated OSX and Windows machines that could be shared by multiple projects, so that we don't have to keep facing the question of: "Ok, now how do I get a Windows/OSX machine to build this on?" I feel like that is a question that is faced repeatedly by many projects, and usually comes down to maybe someone has some hardware (and a license!) they can donate, maybe not. Who knows. |
I know; I agree. We've talked about that. I'm just trying to contextualize things and figure out how to define the scope of "what we have done" as it relates to this deliverable. |
Somewhat like @embray already pointed out, we could say that we have a docker image that is now integrated into Sage's development workflow (almost, at least) and gets updated automatically with every beta release of Sage... This gives projects that rely on math software the possibility to use it as a basis for their CI needs if they rely on something that is a part of the SageMath distribution. For most projects I don't see the point of using the buildbot (but I have almost never used it myself so I might be missing something here.) It's usually much more convenient to use one of the free CIs and that's where the docker image or sage on conda-forge (and its dependencies) are a requirement. |
I met with @slel yesterday and we discussed the next steps of getting CI for Linux, OSX & Windows. I could imagine that we could say something about "common infrastructure" by saying that we are building up a GitLab CI runner system that other projects could use. We have several ideas on how to get lots of Linux runners but that's not so important here as the free runners provided by say GitLab CI are sufficient for most projects (except for Sage which has to be able to build its SageMath distribution from scratch.) As I recently got Windows 10 to be reasonably fast on QEMU/KVM, we are trying to see if we can use his ODK funds to build a big Linux machine that can spawn Windows GitLab runners through libvirt. For OSX the situation is more complicated as you are only allowed to run two instances of OSX on one piece of Apple hardware, and I really want to go the QEMU/KVM path. So my plan would be to try to provide an array of Mac Minis that run the same software stack as the Windows machine. I just bought a Mac Mini for 150€ off ebay and I'll see whether the stack that works nicely on my laptop also works there (I don't see why it shouldn't but I have to try still…). |
We could have budgeted this in the proposal. It's a shame we didn't. There may still be money left in Versailles, and I'd be happy to use it for this purpose, but, with no engineers in the CS department, it's probably going to be complicated to host the machine. However @jpflori, who helped draft this deliverable, just moved to a company that's been mentioned in this thread. Maybe he has some good advice 😉 |
Btw., I am in Paris until at least Friday if you guys want to meet up in person. |
That'd be a nightmare to host at a university!
Unfortunately I will only be back in 2 weeks. |
My first impression is that you are talking about buying infrastructure here. Reimbursement for infrastructure costs is very tricky in EU projects and in any case must never be done around the end of the project. |
Okay, so something along the lines of:
|
Then perhaps we don't do that right now, but do explain why it would be something that would be very useful to have some means of funding for :) |
I know. @slel is going to ask his admins whether they'd be cool about him having a small stack of these in his office. And they do stack nicely :P |
@alex-konovalov Thanks; that's interesting. Perhaps there is some more future opportunity to cross-pollinate on this. E.g. even if you prefer to continue using Jenkins (which is fine by me, I've used it in the past) it would probably be easier for you if the Jenkins master were hosted somewhere beyond your local firewall such that other GAP developers can view your results. There used to be a service called Shining Panda that was offering free Jenkins hosting for open source projects--they provided some degree of build running on their own machines, but also allowed you to submit build results from your own build workers. Shining Panda has since stopped doing that, though, because they couldn't find a way to make it profitable. But maybe if there were a home for hosting the Jenkins build master that were more easily accessible that would be useful (e.g. just as Sage has build.sagemath.org). |
@saraedum It occurred to us that if it would help save time, for explaining your work on Sage you could practically take the ticket description from https://trac.sagemath.org/ticket/24655 and clean it up a bit--especially focusing on the screenshots and the benchmarks. It should be pretty non-technical: Assume the reader doesn't know any specifics about Sage development (except perhaps that it's challenging). I think what we already have on that ticket is not a bad starting point with a bit of cleanup. There should also be a bit more by way of introduction, explaining the motivations (especially what the "patchbot" is and why it currently fails to live up to our needs). |
I pushed some more of the completed portions of this deliverable in 2789704 if anyone wants to have a look. There's still work to be done, and I'm waiting for bits from @alex-konovalov and @saraedum: it might be helpful to you guys to look at what I've already written in order to place your reports in context (they still need only be brief). |
@jdemeyer One other question: Do you know why Cysignals isn't using Travis to test on OSX? |
Travis CI doesn't support Python projects on OS X. People do use various workarounds, such as pretending to be a C project and then manually installing Python. Somebody suggested using Conda to test cysignals in a more portable way. That could be made to work, I just need to finish that (see sagemath/cysignals#78) |
FYI, we used to use Travis for OSX checks, but stopped due to reliability/availability issues: We just spent too much time restarting broken OS X builds, or waiting hours for Travis to finally run the OS X jobs (long after all Linux jobs completed), which caused major delays in our workflows (esp. during GAP Days, where we typically trigger a gazillion builds). |
Huh. I actually did not know that OSX is officially spelled "macOS" now. I had to look that up. |
@fingolfin Thank you for that clarification; that's good to know and I will note that in the report. @alex-konovalov Thanks for your additions--I'm looking them over now! I'll ping you if I need anything else but I think for the most part I'll be able to take it from here. |
Really? How strange. I know that didn't use to be the case, so I wonder when that changed. Sure, for pure (and by "pure" I mean not even using low-level interfaces like ctypes and dlopen) Python projects it doesn't make much sense to test on "macOS", but for anything involving compiling C code it does... :/ |
There is a crucial "not" missing here. It should read, the need for a "common infrastructure for CI did not align with our original expectations". Does that clarify it, or does it still require additional restructuring? What I am trying to communicate here--and which I want to clarify a little later--is that the idea (as this deliverable was originally seemingly envisioned) of sharing Sage's Buildbot with other projects did not pan out. But there is still a need (in my opinion) for shared infrastructure w.r.t. hardware, and especially easy access to non-free OSes, and for various use cases beyond just CI. Thank you otherwise for proofreading; all the rest of your edits look good! |
@nthiery The main text of this report is mostly done, modulo a few minor TODOs. Most of it has already been well-reviewed by others, so no need for deep scrutiny if you don't have time. But I would like it if you could look over the few concluding paragraphs I added, especially the last paragraph about working with EGI. Thanks! |
@nthiery I've made my final updates to this report. It's ready for final review and submission from my end. Give me a ping if there are any last-minute fixes needed. |
@embray thanks a lot! Let's unify style and put your full name or our initials in OpenDreamKit/WP3/D3.8/report.tex Line 7 in a5a3e54
|
Submitted! Thank you so much Erik and Julian for the big step in Sage's continuous integration and for the report. I did not get to read it all yet, but am looking forward to it. It would make a very nice blog post to disseminate the lessons learned. Thanks Alex for the additional insight you provided from GAP's perspective! One small last step (does not need to be tonight): please edit the first entry of this issue to look like, e.g., that of #64, including in particular a copy of the introduction. |
@alex-konovalov, @embray |
Submitted for real, after spelling all first names in full. |
Let's unify style and put your full name or our initials in
Ah shoot, I missed your comment while submitting. Oh well, please
proceed and fix that; it won't be in the pdf on EU's portal, but we
can still have it in our repo.
|
Puzzled with the last comment - was email delayed on the way ;-) ? |
Exactly ... I moved from one spot to the other, got some weird network configuration issue, and the e-mail got stuck on my machine for some time ... |
No big deal, but for what it's worth: b034d46 |
Done. |
In this report we look at what some OpenDreamKit-affiliated projects have
achieved in the areas of continuous integration and multi-platform building and
testing.
Continuous integration (CI) in software development is a process whereby work
performed by one or more developers on a software project is regularly merged
together into a single, central software repository (referred to as the
'mainline'), and the software built and tested with success or failure of the
build reported quickly back to the developers of the project. This helps to
ensure that individual developers' changes do not conflict with each other or
otherwise "break the build", and provides rapid feedback when breaking changes
are introduced into the mainline. Both the process, and the associated tools
(e.g. automated continuous integration servers) are an essential part of the
day-to-day work of developers on those projects that use it.
Modern CI requires server infrastructure. At the very least one server is needed
to both perform software builds and serve (usually through a web-based UI) reports
back to developers so that they are kept regularly up-to-date on the
"health" of the build. For some projects -- especially those that support
multiple software platforms -- continuous integration infrastructure can involve
a whole fleet of hardware systems, each of which performs builds and tests of
the software and report results back to a central server which collates them
into a single multi-platform build report for developers to examine.
Unsurprisingly, as the CI needs of a project grow, so too does the size of its
CI infrastructure, and the time, financial resources, and expertise required to
maintain it.
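As a rough illustration of the collation step described above, the following Python sketch folds per-platform results into a single multi-platform report. It is only a minimal model under stated assumptions: the `(platform, step, passed)` tuple format, the function names, and the sample data are hypothetical and do not correspond to the output of any particular CI server.

```python
from collections import defaultdict


def collate(results):
    """Fold (platform, step, passed) tuples into one report keyed by platform."""
    report = defaultdict(lambda: {"passed": 0, "failed": []})
    for platform, step, passed in results:
        if passed:
            report[platform]["passed"] += 1
        else:
            report[platform]["failed"].append(step)
    return dict(report)


def is_healthy(report):
    """The build is healthy only if no platform has a failed step."""
    return all(not entry["failed"] for entry in report.values())


# Hypothetical results as they might arrive from three build machines.
results = [
    ("linux-64", "build", True),
    ("linux-64", "test", True),
    ("linux-32", "build", True),
    ("linux-32", "test", False),
    ("osx", "build", False),
]

report = collate(results)
for platform in sorted(report):
    entry = report[platform]
    status = "ok" if not entry["failed"] else "FAILED: " + ", ".join(entry["failed"])
    print(f"{platform}: {entry['passed']} passed, {status}")
print("healthy:", is_healthy(report))
```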
The Sage project, being quite large both in terms of number of contributors
and in terms of overall code base (and by extension the length of time required
to build the software and run its test suite) has non-trivial CI needs, and to
address this it has, over time, amassed a small multi-platform fleet of build
machines as part of its "buildbot" infrastructure (based on the
Buildbot CI software framework), as well
as expertise needed to maintain that infrastructure. One of the original aims
of this deliverable was to see if other projects under the OpenDreamKit umbrella
could benefit from using Sage's buildbots, and thus achieve better
multi-platform CI. Additionally, we would look into widening the set of
platforms supported by Sage's buildbot infrastructure -- in particular adding
Windows builds to coincide with Sage's newfound Windows support (see
D3.7).
In practice, the needs of the OpenDreamKit community as a whole with respect to
multi-platform CI, and in particular the need for a "common infrastructure"
for CI, did not align with our original expectations, for reasons that are
enumerated in the following sections. Nevertheless, significant achievements
were made by OpenDreamKit projects in the area of CI, and there are lessons
learned that we are communicating through this report with plans for future
cross-pollination on the subject. Our experiences have also taught us that
although there is not a one-size-fits-all solution to CI, there remains a clear
need in the community for easier access to multi-platform build and development
infrastructure, especially for non-free operating systems such as Windows and
macOS.