Consider migrating off Travis CI #1770

Closed
webknjaz opened this issue Feb 22, 2019 · 33 comments
Labels
test Improvement to quality assurance: CI/CD, testing, building

Comments

@webknjaz
Member

webknjaz commented Feb 22, 2019

Today I saw some disturbing news on the Internet:

TL;DR: Idera (Travis CI's new overlord) has unexpectedly fired an enormous number of engineers. There are rumors/suggestions that the Travis CI platform itself may be trashed.

With that in mind, there are several candidates:

  • Azure DevOps (formerly Pipelines), which AFAIR allows up to 10 simultaneous jobs. I've got positive feedback from various FOSS maintainers, including but not limited to tox, virtualenv, CPython, etc. Supports GNU/Linux, Windows and macOS (see the sketch after this list).
  • Circle CI offers 4 simultaneous jobs for GNU/Linux (container-based only) and supports non-parallelized macOS (VM-based) on request.
  • AppVeyor, the most popular CI for testing Windows-based projects; over the last year they've also added GNU/Linux support. Jobs are moderately throttled.
  • GitLab CI supports integration with GitHub, but only as a 1-year free trial. Mostly GNU/Linux container-based unless you bring your own runners (which implies additional costs).
  • Shippable, which supports mostly GNU/Linux unless you bring your own nodes. Heavily throttled on the free tier.
  • GitHub Actions. It's currently in a limited public beta, but we have access. It's container-based with a limited runtime: for example, you cannot spawn other containers from within those, and you probably can't spawn VMs either. But we could offload linters and probably unit tests there.
  • Zuul (as proposed by @pabelanger)
  • something else I haven't played with yet

  • there's an extra option, which is basically using all of them and load-balancing jobs across multiple CIs, which helps overcome throttling a bit
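
For illustration only, a minimal sketch of what an Azure Pipelines matrix could look like for us (the image names and the tox step are assumptions, not a tested config):

```yaml
# azure-pipelines.yml -- hedged sketch, not a tested configuration
jobs:
- job: tox
  strategy:
    matrix:
      linux:
        vmImage: 'ubuntu-16.04'
      macos:
        vmImage: 'macOS-10.13'
      windows:
        vmImage: 'vs2017-win2016'
    maxParallel: 10          # the plan reportedly allows up to 10 parallel jobs
  pool:
    vmImage: $(vmImage)
  steps:
  - script: python -m pip install tox && python -m tox
    displayName: Run the tox test suite
```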
@webknjaz webknjaz added the test Improvement to quality assurance: CI/CD, testing, building label Feb 22, 2019
@decentral1se
Contributor

decentral1se commented Feb 22, 2019

"Capitalism Considered Harmful"

Can we just do what Ansible core does and get the same guarantees and workflows? It uses Shippable, right? Can Red Hat pay for a CI account for this project so we don't get limited? Financial support considered useful :)

EDIT: Thanks for raising this issue! Very important, it seems.

@unlikelyzero

There's a pretty good awesome list here: https://github.com/ligurio/awesome-ci

@webknjaz
Member Author

@decentral1se Ansible already spends a lot of funds on Shippable CI for the Ansible Core engine (and we already create an enormous load on it), and I'm not aware of any funds dedicated to extending it. OTOH I'm not the right person to comment on this. And yet I think it's typical for businesses to pay for things only if they see a business need/opportunity in them.

Additionally, if we had a wider community, we could think about deploying http://buildbot.net/. I know it's used by the CPython project, and AFAIK it relies on various community members providing their machines as nodes, with a centralized orchestration system placing jobs on them.

Based on this, I'd like to explore freely available resources first.

Another point is that we would probably benefit from distributing jobs across various CIs regardless of whether Travis CI goes down, simply because we have a lot of heavy jobs in our test suite and more are coming.

@webknjaz

This comment has been minimized.

@pabelanger
Contributor

I would like to see us add Zuul to the list. We have access to Zuul resources in the ansible-network org, and I believe we can discuss opening it up to other teams. In fact, we already have AWX using it; it would be great to have another project.

@pabelanger
Contributor

Last week, I had some time to hack on this a little. If you are interested in the results of zuul running molecule tests, you can follow along at:

ansible-network/sandbox#27

@decentral1se
Contributor

And yet I think it's typical for businesses to pay for things only if they see business need/opportunities in such things.

The fact that Molecule is now ansible/molecule tells me this is already the case.

@webknjaz
Member Author

webknjaz commented Mar 1, 2019

@decentral1se Not really; the funding process is still separate. Molecule is not being sold as a supported product; at this point, it's mostly community-based.

@gundalow
Contributor

gundalow commented Mar 25, 2019

Some good discussion here. I've closed #1874 so we can have just one place to discuss Molecule CI.

I thought a summary of where we are up to would be good:

  • Activity in the Molecule repo is increasing and will continue to increase = more test runs
  • Zuul is still a few months away from being ready for general use
  • We have a need to make CI less painful sooner
  • If we want to pay for CI we will do that using Shippable [1]
  • I'm happy to go and request some cash to increase Shippable resources
  • Adding some smarts to our test framework so that a docs-only PR doesn't trigger full integration tests would also help [2]

[1] It's much cheaper per node to add resources in Shippable than in Travis. Also, as resources are added at the GitHub org level (i.e. github.com/ansible), those paid resources are available to Ansible, ansible-lint and Molecule (the three projects that use paid Shippable), so it benefits multiple projects by giving us a larger common CI resource pool.
[2] This is how ansible-test works in the Ansible repo, and it makes a huge difference. Looking at the files changed in a PR would allow us to do this.
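
For a rough idea of how [2] could look on the current Travis setup, here's a hedged sketch (the `lint` and `functional` tox env names are assumptions, not our actual configuration):

```yaml
# .travis.yml fragment -- sketch only: skip the heavy envs on docs-only changes
script:
  - tox -e lint  # cheap checks always run
  - |
    # TRAVIS_COMMIT_RANGE is provided by Travis for both pushes and PRs
    if git diff --name-only "$TRAVIS_COMMIT_RANGE" | grep -qv '^docs/'; then
      tox -e functional
    else
      echo "Docs-only change detected, skipping integration tests"
    fi
```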

@ssbarnea
Member

@gundalow To me it seems that in the short term we should try to piggyback on Ansible's Shippable, and in the medium term get Zuul running.

@gundalow
Contributor

@ssbarnea Yup, that's my thinking as well. We can discuss this during Wednesday's meeting.

  • Rewriting .travis.yml as shippable.yml will take some work
  • Independently looking at not running full tests on docs only changes would also help
  • I can work on increasing Shippable node count

@webknjaz
Member Author

@gundalow Sounds good. I'd like to improve the publishing flow, though, maybe by exploiting GitHub Apps (not sure about Actions yet).

@pabelanger
Contributor

Regarding Zuul: realistically, we are likely looking at the 1st of May as the go-live date for the new zuul.ansible.com. I am hoping for sooner, but we first need to migrate ansible-network, then awx / ansible-runner.

@gundalow
Contributor

OK, update to my comments in #1770 (comment)

Given Zuul is closer than I thought (yay), I'll set up a Travis CI subscription just for Molecule to use for the next few months till Zuul is here.

Given our non-trivial .travis.yml, this avoids the "double porting cost" of Travis -> Shippable -> Zuul.

@decentral1se
Contributor

Great news then!

@gundalow
Contributor

Question: How many concurrent jobs do we want to be able to run?

The Travis plans seem to be

Bootstrap
1 concurrent jobs
$69 per month

Startup
2 concurrent jobs
$129 per month

Small Business
5 concurrent jobs
$249 per month

Premium
10 concurrent jobs
$489 per month

@themr0c
Contributor

themr0c commented Mar 27, 2019

Assuming that doubling the number of concurrent jobs halves the duration of a build (I know this assumption is borderline):

| Plan | Build duration | Max builds per day |
| --- | --- | --- |
| 5 concurrent jobs | 6 hours | 4 |
| 10 concurrent jobs | 3 hours? | 8? |

4 builds per day is very close to the current workload; it does not leave room for much workload increase.

Considering this, I would vote for an increase to 10 concurrent jobs.

@gundalow
Contributor

We now have a concurrent job count of 10 on Travis-CI. We can monitor this. Feedback welcome via this issue.

@webknjaz
Member Author

@gundalow is that for the repo or for the whole org?

@webknjaz
Member Author

I've cancelled a bunch of things and now it all just looks stuck

@gundalow
Contributor

@webknjaz https://www.traviscistatus.com/ shows there have been issues with Linux nodes not booting between 14:50 and 18:09 UTC, though they are still cleaning things up.

The 10 nodes are for the org, though I don't see much traffic on the other projects.

@pabelanger
Contributor

ansible-network/sandbox#31 shows an example of how molecule can be tested on https://dashboard.zuul.ansible.com. For the example, I just used the existing tox testing for python36 on a fedora-29 node.

@webknjaz
Member Author

@pabelanger I see that uses old Check Status API. How can it be transitioned to Checks API?

@ssbarnea
Member

@pabelanger I see that uses old Check Status API. How can it be transitioned to Checks API?

If I remember well, @pabelanger said at some point that porting to the Checks API was on its way, the only blocker being lack of time.

@webknjaz
Member Author

Currently, it's exceptionally hard to navigate to the proper log pages so I'd say that proper reporting is critical here.

@ssbarnea
Member

@webknjaz It is happening, but I would not say that Zuul logs are easier to read; in fact, the opposite.

Having faced this issue in my own projects already, I ended up using pytest and pytest-html as a wrapper for molecule calls in order to make the output easier to read. Look at https://review.rdoproject.org/r/#/c/20240/ - click the py27 job and see the last pytest run, which is mainly an execution of molecule. As you can imagine, this output scales well with multiple executions.

@webknjaz
Member Author

@ssbarnea Please show me where I can see the log for molecule tests from this example: #1770 (comment)

I'm having a hard time finding the view for that :(
I don't have a mental model for working with Zuul.
All my attempts to look at it failed.

@pabelanger
Contributor

@webknjaz This is just the default log collection done by our tox-based jobs; if you point me to which specific logs you'd like collected, this can be handled in a post-run playbook.

Also, the check status API support is a work in progress; when I last checked, we were waiting for github3.py to add support, and for somebody to write the patch.

@webknjaz
Member Author

It's not the Check Status API - that one is what's used currently. I'm talking about the Checks API.

I just want the output of whatever was run during the job execution and can't find it anywhere.

@pabelanger
Contributor

Is this an example of the output you are looking for? https://logs.zuul.ansible.com/31/31/af348c98d67a9f59028159c2a9656049428f1692/check/molecule-tox-py37-ansible27-unit/9379045/job-output.html#l437

Please keep in mind, I have zero experience with how molecule developers do testing or what the expected output should look like. This was my 1-hour attempt to show how zuul.ansible.com could run a tox-based job. Any sort of log collection, reporting, or handling of failing tests can all be fixed; I'm happy to work with people on that.

It looks like the test runner is pytest; by default, the jobs today are set up to use testr / stestr and subunit, which is why some of the additional logging / reporting is missing.

@webknjaz
Member Author

@pabelanger yes, thanks! It's just that I don't know how to navigate to that URL by myself. And it's a blocker as most folks will probably have the same problem. So this must be solved in order to have a successful transition.

And it's nice that you can refer to lines in logs. I like that. Though, it'd be nice if you could link a range of lines like in other CIs. Travis/GitHub and a lot of others use a URI fragment like #L12-14 to achieve this.

Our entrypoint for everything is tox, so the main objective would be setting up the mapping to tox envs.
In Travis, I've also parametrized jobs based on Ansible version + test type.

I'm curious whether it's possible to generate job matrices in a compact way in Zuul.

Regarding logs, I think the main thing I want to see is just raw, line-addressable logs. Additionally, it'd be nice to see a rendered representation of the xunit reports which pytest is able to generate out of the box.
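
For reference, a minimal sketch of roughly how that Travis-side parametrization looks (the TOXENV names here are illustrative assumptions, not the repo's actual .travis.yml):

```yaml
# .travis.yml sketch -- env names are assumptions
language: python
matrix:
  include:
    - python: "3.7"
      env: TOXENV=py37-ansible27-unit
    - python: "3.7"
      env: TOXENV=py37-ansible27-functional
    - python: "3.6"
      env: TOXENV=py36-ansible26-unit
install:
  - pip install tox
script:
  - tox
```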

@pabelanger
Contributor

The good news here is that we can propose any changes to make Zuul's GitHub integration better, and we can likely drive a lot of that design. What we'd need to do is come up with what the spec would look like and propose it upstream to the Zuul mailing list. Then we can start contributing that work.

For logs, we have full control over how they are generated and presented; today this is just the default. We can totally support multi-line selection: today we use an Ansible role (htmlify-logs) [1] to generate the pages before we upload them into Swift, so we just need to determine what the JavaScript looks like and update the role to include it. I can poke around online and see if I can find an example.

As for job matrices in Zuul, I would say yes. Zuul jobs themselves do not support templating of variables or parameters; however, we do have the ability to do job inheritance. We can create a base molecule tox job that applies to all tox envs, then create child jobs as we need more specific variables.
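
A hedged sketch of what that inheritance could look like in a .zuul.yaml (the `tox` parent job and the `tox_envlist` variable come from the shared zuul-jobs library; the nodeset and most job names here are assumptions for illustration):

```yaml
# .zuul.yaml sketch -- nodeset and job names are illustrative assumptions
- job:
    name: molecule-tox-base
    parent: tox                 # shared tox job from zuul-jobs
    description: Base job for molecule's tox environments.
    nodeset: fedora-29

- job:
    name: molecule-tox-py37-ansible27-unit
    parent: molecule-tox-base
    vars:
      tox_envlist: py37-ansible27-unit

- job:
    name: molecule-tox-py37-ansible27-functional
    parent: molecule-tox-base
    vars:
      tox_envlist: py37-ansible27-functional

- project:
    check:
      jobs:
        - molecule-tox-py37-ansible27-unit
        - molecule-tox-py37-ansible27-functional
```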

If you can point me to an example of the xunit reports you have for molecule today, I can write up a post-run playbook to collect them properly.

[1] http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/htmlify-logs
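
As for that post-run playbook, a hedged sketch of what it could look like (the report path is an assumption; pytest would produce the file with something like `pytest --junitxml=junit.xml`):

```yaml
# post-run.yaml sketch -- the report location is an assumption
- hosts: all
  tasks:
    - name: Pull the JUnit XML report into the Zuul log directory
      synchronize:
        src: "{{ ansible_user_dir }}/{{ zuul.project.src_dir }}/junit.xml"
        dest: "{{ zuul.executor.log_root }}/"
        mode: pull
      failed_when: false   # don't fail the job if no report was produced
```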

@webknjaz
Member Author

We don't have xunit reports yet. I have used them in other projects, though. It's an easy-to-use built-in feature of pytest.
