
Add search support for Job groups #3116

Closed
oded-dd opened this issue Jan 22, 2019 · 7 comments

Comments

@oded-dd

oded-dd commented Jan 22, 2019

What challenge are you facing?

We are trying to build one pipeline for all of our services, instead of splitting it into multiple pipelines (as all the pipelines are basically the same but work with different repos).
We have ~130 repos, and we are using job groups as filters per service (repo) and per environment.
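
For illustration, the groups look something like this (a minimal sketch; the service and job names here are hypothetical):

```yaml
# groups act as named filters over one big set of jobs
# (service and job names are hypothetical)
groups:
- name: service-a        # one group per repo/service
  jobs: [service-a-test, service-a-deploy-staging, service-a-deploy-prod]
- name: service-b
  jobs: [service-b-test, service-b-deploy-staging, service-b-deploy-prod]
- name: all-staging      # one group per environment, across all services
  jobs: [service-a-deploy-staging, service-b-deploy-staging]
- name: all-prod
  jobs: [service-a-deploy-prod, service-b-deploy-prod]
```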

[screenshot: the pipeline page with per-service and per-environment job groups]

What would make this better?

We would like to be able to search for a specific job group.

A search bar would be an amazing addition.

@vito
Member

vito commented Jan 22, 2019

What do you mean by search? A way to filter the groups themselves from the pipeline page?

Guessing Cmd+F isn't good enough?

What is the benefit of having them all in one pipeline if they work with different repos? It really sounds like they should be separate pipelines, perhaps sharing a common template. 🤔

@oded-dd
Author

oded-dd commented Jan 23, 2019

The pipeline is identical for all services; hence, I see no sense in using separate pipelines.
I would prefer to have one huge pipeline which allows me to follow the behavior of all services (and filter with job groups) rather than multiple pipelines.

One of the use cases would be being able to see, in one place (using a job group), the status of all services on the master branch.

Cmd+F can work, but the UI is not user-friendly with so many repositories.

See below:

[screenshot: the pipeline page overflowing with group tabs]

@vito
Member

vito commented Jan 23, 2019

That screenshot doesn't exactly look ideal. 🙂

Sorry, but this still feels like it's kind of going against the design of pipelines and groups. Just because they're nearly identical doesn't mean they should share one config - in fact, to me that sounds like a maintenance nightmare. Pipelines should be separate when possible.

We have a similar use case with our various resources, but they all share a template and are configured as separate pipelines instead. They can be all viewed at once via the dashboard - which is designed for precisely this use case, and even supports search. Overall it's much more maintainable and easy to use, monitor, and interact with (I can actually click the job boxes).
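
Roughly, the template approach looks like this (a minimal sketch rather than our exact config; the var name and file paths are hypothetical):

```yaml
# template.yml - one shared config, instantiated once per resource with e.g.:
#   fly -t ci set-pipeline -p git-resource -c template.yml -v resource_name=git-resource
# (a sketch; the var name and task file path are hypothetical)
resources:
- name: repo
  type: git
  source:
    uri: https://github.com/concourse/((resource_name))

jobs:
- name: unit-test
  plan:
  - get: repo
    trigger: true
  - task: unit
    file: repo/ci/unit.yml
```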

Here's what our dashboard looks like - visible at https://ci.concourse-ci.org/:

[screenshot: the ci.concourse-ci.org dashboard, one pipeline per resource]

You can even scope it to a team via the search bar and bookmark the search: https://ci.concourse-ci.org/?search=team%3Aresources

@ari-becker

That screenshot is the result of ~750 lines (not including types) of Dhall expressions, which generated a ~71,000-line pipeline YAML. 750 lines of code is quite easy to maintain. Also, the job boxes are quite easy to interact with / click on when a narrower job group is selected (since job groups are basically just UI filters).

If anything, keeping the entire build and deploy process in a single pipeline makes maintenance easier for us.

  • It allows us to separate pipelines only for truly independent processes, e.g. a separate pipeline for nightly business report generation. Sure, we could organize these "independent processes" by splitting them into different teams, but then instead of having too many job groups, we'd have too many teams, and the UI for searching / filtering / aggregating by team is even worse - and, arguably, intentionally so, since different teams aren't supposed to look at each other's pipelines.
  • Getting a 10,000 foot view of the entire system is easier with one pipeline. If the system is composed of many pipelines, not all of these pipelines may fit on a single screen, and in high-density mode it is impossible to get a relative sense of where errors are in each pipeline (did the pipeline fail because a pull request failed, or because a production deployment failed?). With a single pipeline, we can simply zoom out, and get a sense not just of where but how widespread the issue is.
  • Routine operations with fly are much easier with a single pipeline. If I want to pause just my deployment, or just my report generation, I can pause that specific pipeline with fly -t <team> pause-pipeline -p <pipeline>. If I want to pause just the deployment of a specific service, or the generation of one specific report, I can pause the specific job that deploys that service with fly -t <team> pause-job -j <job>. This setup affords me a lot of control. If I split my service deployments into multiple pipelines, and want to pause all production deployments but not earlier jobs like pull request builds or unrelated pipelines like my report generation, how can I do that? Because pipelines cannot be labeled as being logically related, I either have to adopt a naming convention for the pipelines so that I know which ones to pause (which is fragile), or I have to group all of the relevant pipelines under a single team. That sounds like a suitable solution, until we as a company grow and need to use team separation for access control for different actual teams in the company, at which point figuring out which teams' pipelines should be paused and which should not becomes more of a maintenance nightmare. Even after I find the relevant pipelines to affect, I then need to find the relevant jobs to pause within those pipelines, which poses a similar problem, one which we solve today with a naming standard for the jobs; that works (since the job definitions are generated) but is not ideal.
  • Splitting pipelines for services which are deployed independently but have common, duplicated resources sounds wasteful to me. If I have a single copy of a resource in a shared pipeline, then Concourse needs to check this resource only once and, based on the results, trigger dozens or hundreds of relevant jobs, which may or may not start simultaneously depending on the number of workers I have, but entirely within Concourse's spec. If I have to duplicate this resource dozens or hundreds (or more?) of times in order to separate my services into separate pipelines, then Concourse needs to carry out the same de-facto checks for de-jure different resources, which may cause Concourse to run afoul of external limitations (e.g. rate limits) or cause subtle scheduling bugs (e.g. a resource that has been duplicated 100 times with version: latest picking up different versions if the resource was updated while Concourse was checking the version for each of its copies) - see the sketch just below.
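
As a concrete illustration of the duplication (a sketch, not our real config; the repo URI and job names are hypothetical):

```yaml
# the same resource stanza copy-pasted into each of N per-service pipelines;
# N pipelines means N de-facto identical checks against the same upstream
# (hypothetical URI and names)
resources:
- name: deploy-scripts
  type: git
  source:
    uri: https://github.com/example-org/deploy-scripts
    branch: master

jobs:
- name: deploy-service-a
  plan:
  - get: deploy-scripts
    trigger: true   # each pipeline's copy may observe a different "latest"
```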

Given that job groups are basically just UI filters, the feature request is really just asking for a better way to manage these filters if there are a lot of them (as there are when they're generated).

@vito
Member

vito commented Jan 24, 2019

At this point I think this is a discussion of where we are now, where we can be in one small step (by adding filters for the filters), and where we should be with how pipelines are meant to be used.

My main concern here is with the long-term. In the short-term, adding filters for the groups is fine, but I want to make sure I understand the use case properly, as right now this still feels like it goes against the intended usage of pipelines, which is going to make things feel weird in situations like this simply because Concourse is not designed to support it.

The core issue here for me is that a lot of the things you're using massive pipelines for, the dashboard is intended to solve. But it doesn't work for you for various reasons you've outlined.

> It allows us to separate pipelines only for truly independent processes, e.g. a separate pipeline for nightly business report generation. Sure, we could organize these "independent processes" by splitting them into different teams, but then instead of having too many job groups, we'd have too many teams, and the UI for searching / filtering / aggregating by team is even worse - and, arguably, intentionally so, since different teams aren't supposed to look at each other's pipelines.

I don't want to suggest using teams for this, but I don't see how it's worse - it's supported via the search bar on the dashboard - I linked an example in #3116 (comment). And to clarify, there's nothing wrong with different teams looking at each other's pipelines, and this becomes easier in 5.0 with RBAC where teams can allow read-only access.

Rather than teams, this sounds like it may be a use case for #532, and based on your feedback I can think of a few more ideas around it. For example, it sounds like you care a lot about being able to pause the whole thing at once - I could see that being supported by pausing entire hierarchies. That would allow for both fine-grained pausing (of one of the 'groups' in your pipeline) and coarse-grained (when you want to pause the whole dang thing).

> Getting a 10,000 foot view of the entire system is easier with one pipeline. If the system is composed of many pipelines, not all of these pipelines may fit on a single screen, and in high-density mode it is impossible to get a relative sense of where errors are in each pipeline (did the pipeline fail because a pull request failed, or because a production deployment failed?). With a single pipeline, we can simply zoom out, and get a sense not just of where but how widespread the issue is.

Getting a 10,000 foot view of the entire system was the core reason people wanted the dashboard view, so something must be wrong here. It sounds like the only problem is not being able to fit all the pipeline thumbnails on one page, but so long as your pipelines are gigantic that's obviously going to be even harder. Wouldn't a simpler fix be to just zoom out (Ctrl+-) on the dashboard view, too? Or do we need to adjust the scaling on the dashboard overall?

> Routine operations with fly are much easier with a single pipeline. [...]

This again sounds like it would be improved by something like #532.

> Splitting pipelines for services which are deployed independently but have common, duplicated resources sounds wasteful to me. [...]

This point is no longer true with #2386 which will be in 5.0 (off by default for now). Equivalent resource definitions, even across teams, will only result in one version history and one check container. Scheduling will therefore be equivalent across the pipelines as they'll all see the same versions at the same time.


At the end of the day, the concern here is that by using pipelines in ways that go against the design of the rest of Concourse, there may be all kinds of awkward user flows that you're experiencing. So I want to make sure I understand why doing so is still the best option for you, so that we can find and fix the gaps in the right places.

Thanks for the feedback so far! I'm going to leave this open for now. I'd be happy to review a PR for this.

@vito vito added the web-ui label Jan 24, 2019
@ari-becker

> I don't want to suggest using teams for this, but I don't see how it's worse - it's supported via the search bar on the dashboard - I linked an example in #3116 (comment). And to clarify, there's nothing wrong with different teams looking at each other's pipelines, and this becomes easier in 5.0 with RBAC where teams can allow read-only access.

My thinking here is tied to permissions. RBAC in 5.0 will help in this regard, but I'm still uncomfortable with it, since the notion of teams in Concourse would no longer line up with the real-life organizational hierarchy, and that serves as a point of confusion.

> Rather than teams, this sounds like it may be a use case for #532, and based on your feedback I can think of a few more ideas around it. For example, it sounds like you care a lot about being able to pause the whole thing at once - I could see that being supported by pausing entire hierarchies. That would allow for both fine-grained pausing (of one of the 'groups' in your pipeline) and coarse-grained (when you want to pause the whole dang thing).

Looking at #532, it seems like it could be a great solution, especially if enhanced tooling is built around hierarchical pipelines, as you point out. My question regarding #532 is around the nature of the hierarchy - a fly set-pipeline -p x/y/z interface is well-suited for fan-out hierarchies, but not fan-in hierarchies (consider for instance #313, where I brought up the use case of a common job for applying infrastructure changes where the changes may be supplied by different source jobs/pipelines).

> Wouldn't a simpler fix be to just zoom out (Ctrl+-) on the dashboard view, too? Or do we need to adjust the scaling on the dashboard overall?

Job groups allow me to specify exactly which jobs are shown. The search bar on the dashboard view makes it easier to find specific pipelines, but does not allow me to define an exact subset of pipelines to be seen. Even if it did (e.g. by labeling pipelines and searching by label), it's faster and easier to press a button that represents a predefined filter than to type a filter into a search box afresh each time, and that's particularly important as non- or less-technical people are hired (management, support) for whom workflows need to be as simple and foolproof as possible. The use case involves a wall monitor that shows real-time status - it is important for the wall monitor to show exactly and only the status of relevant jobs.

Case in point - in the screenshot, we have job groups per deployable service, as well as job groups like all-pull-request, all-staging-deploy, and all-prod-deploy. This makes it easy for product owners to see what's relevant to them (the job group per deployable service contains everything they need), for project / community managers to see what's relevant to them (all-pull-request), and for ops/platform/infrastructure engineers to see what's relevant to them (all-staging-deploy, all-prod-deploy). #532 may help with this, depending on what an implementation of it would look like.

> This point is no longer true with #2386 which will be in 5.0 (off by default for now). Equivalent resource definitions, even across teams, will only result in one version history and one check container. Scheduling will therefore be equivalent across the pipelines as they'll all see the same versions at the same time.

Awesome 👍 can't come quickly enough.

@stale

stale bot commented Jul 16, 2019

Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important.

If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward?

If no activity is observed within the next week, this issue will be ~~exterminated~~ closed, in accordance with our stale issue process.

@stale stale bot added the wontfix label Jul 16, 2019
@stale stale bot closed this as completed Jul 23, 2019