Skip to content
This repository has been archived by the owner on Oct 4, 2024. It is now read-only.

Restrict merge access to most GOV.UK repos #52

Closed
wants to merge 1 commit into from

Conversation

benthorner
Copy link
Contributor

@benthorner benthorner commented Jun 23, 2020

https://trello.com/c/TUB9OGWD/152-prepare-and-run-a-beta-trial-of-continuous-deployment

Deadline for discussion: 2020-06-30

This restriction is necessary for apps with Continuous Deployment
enabled i.e. where "merge" is the same as "deploy". Currently, the
set of developers without production access may include people without
security clearance, for whom we have to assume a worst case scenario.

As part of moving to Continuous Deployment, we will incrementally
override each repo to add this restriction. Rather than adding lots
of overrides, we should try to make a clear policy decision about
whether non-production developers should be able to merge PRs.

Exceptions to this:

- govuk-puppet - some changes only apply to Integration, such as
initial PRs for someone joining GOV.UK [1]

Reasons for this change:

- We avoid building up a set of overrides for Continuous Deployment.

- We more clearly align deploying and change with merging it (since
the person merging it is always able to).

- People deploying changes can have more visibility of them, having
merged the changes they are deploying themselves.

- A higher barrier to merge for new joiners should promote the good
habit of doing branch deploys for any manual testing.

Reasons against this change:

- Some loss of satisfaction of merging your own PR, even though you
couldn't actually deploy it to Production.

- Some additional workload on developers with production access, who
will now be asked to merge PRs, as well as deploy them.

[1]: https://docs.publishing.service.gov.uk/manual/get-started.html#6-get-ssh-access-to-integration

Follow-up tasks:

  • Update our docs to cover the restriction
  • Send an email round to make everyone aware

@benthorner benthorner changed the title WIP Restrict merge access to most GOV.UK repos Jun 23, 2020
This restriction is necessary for apps with Continuous Deployment
enabled i.e. where "merge" is the same as "deploy". Currently, the
set of developers without production access may include people without
security clearance, for whom we have to assume a worst case scenario.

As part of moving to Continuous Deployment, we will incrementally
override each repo to add this restriction. Rather than adding lots
of overrides, we should try to make a clear policy decision about
whether non-production developers should be able to merge PRs.

Exceptions to this:

- govuk-puppet - some changes only apply to Integration, such as
initial PRs for someone joining GOV.UK [1]

Reasons for this change:

- We avoid building up a set of overrides for Continuous Deployment.

- We more clearly align deploying and change with merging it (since
the person merging it is always able to).

- People deploying changes can have more visibility of them, having
merged the changes they are deploying themselves.

- A higher barrier to merge for new joiners should promote the good
habit of doing branch deploys for any manual testing.

Reasons against this change:

- Some loss of satisfaction of merging your own PR, even though you
couldn't actually deploy it to Production.

- Some additional workload on developers with production access, who
will now be asked to merge PRs, as well as deploy them.

[1]: https://docs.publishing.service.gov.uk/manual/get-started.html#6-get-ssh-access-to-integration
@barrucadu
Copy link
Contributor

This isn't going to cause a problem for my team, but for now we're working outside the usual apps, on stuff which is currently only deployed to the PaaS (and all of us have production access anyway). So I'll leave it to people with more normal teams to approve.

Though I'm not too sure about

  • govuk-puppet - some changes only apply to Integration, such as
    initial PRs for someone joining GOV.UK [1]

Does that mean only people with production access will be able to approve PRs for puppet, so they can verify that the change isn't going to affect staging or production?

@benthorner
Copy link
Contributor Author

@barrucadu perhaps I need to clarify what I mean by an "exclusion". The affect of this change will be that only developers with production access will be able to click the "merge" button; anyone can still approve.

I'm suggesting we exclude Puppet (i.e. keep the repo as it is now) because it's sort of OK to merge a PR without deploying further if it only applies to Integration (that, and we're not likely to enable CD for it!).

Does that answer your question?

@issyl0
Copy link
Contributor

issyl0 commented Jun 24, 2020

This will restrict merging Dependabot's many PRs (although maybe less soon!) to people with (permanent?) production access. Currently, there are significantly fewer production-cleared developers compared to the stupid number of repos, and a stupid number of Dependabot PRs. Is that sustainable? Should we relate "learning about how to keep software up to date, effectively test it and be confident in its operation" with "being able to deploy to Production"?

Partially related: we're potentially going to hire more juniors, how do we ensure that they are getting the rounded education they need in how to be software engineers if they can't self-merge their own PRs and sometimes see the pitfalls and brokenness of doing that too early, and debug broken stuff in integration? I know that times have changed, but I'm not sure I'd have got where I am today - having diligently waited years for production access - with these restrictions if they'd been around six years ago. Y'know? How do we ensure that people take care and pride in the changes that they make, if they don't get the satisfaction of merging them themselves? It's about trust at the end of the day, and I think that this erodes some of that.

I have more thoughts, but I can't form them in my brain right now and at the risk of this turning into an essay, I'm going to stop here. This should give more than enough to think about!

@barrucadu
Copy link
Contributor

that, and we're not likely to enable CD for it!

Ok, that clears things up for me. Thanks.

@benthorner
Copy link
Contributor Author

@issyl0 thanks for your thoughts. I'll try and respond here.

This will restrict merging Dependabot's many PRs (although maybe less soon!) to people with (permanent?) production access. Is that sustainable?

I think so. Anyone can still review and approve a PR, which is the real time consuming part. Having to ask someone to click a green button before deploying isn't a significant change.

how do we ensure that they are getting the rounded education they need in how to be software engineers if they can't self-merge their own PRs and sometimes see the pitfalls and brokenness of doing that too early, and debug broken stuff in integration?

It sounds like there's a need to let people break Integration on the master branch, so they learn to not break Integration on the master branch. How do they learn this at the moment?

How do we ensure that people take care and pride in the changes that they make, if they don't get the satisfaction of merging them themselves?

Are you saying we need to be able to press a green button in order to motivate us to do a good job?

It's about trust at the end of the day, and I think that this erodes some of that.

Our current process means that, of the people without prod access, we can't tell who is security cleared and not. The argument of this PR is really that we're a little too trusting of this group at the moment.

How trusting should we be of people without production access?

@theseanything
Copy link
Contributor

Seems like the main downside is more "process" / "friction" for teams (e.g. checking if changes merged, finding some to merge, making sure that they also deploy it etc. etc). Although maybe this isn't that much more compared to the overhead of just needing someone else to deploy.

The end goal is enabling CD, which should overall reduce the "process". For me it's a question of "does the extra work of slowly adding manual overrides as we rollout CD worth the extra "process" of merge restrictions without CD?" And if we do want add this blanket rule, should we wait closer until we are rolling out CD across all apps to minimise extra "process"?

@benthorner
Copy link
Contributor Author

benthorner commented Jun 29, 2020

Thanks for explaining your thoughts @theseanything. If we think of this PR in isolation, then I agree there's little motivation to make a change. The reason I've raised this is because I can see it being a thorny issue in the upcoming RFC and subsequent rollout for Continuous Deployment. This would be a shame, because what we're talking about here isn't related to CD - it's just a tangential compatibility change. I'll try and list the alternatives I see:

  • Rewrite our security processes to avoid the need for this. This seems like an unbounded goal to me, and quite out-of-scope of the Continuous Deployment work, so I'm trying to avoid this if alternatives exist (which they do).

  • List this as a consequence on the RFC. I don't want to do that, as it will steer the discussion away from Continuous Deployment and could jeopardise the RFC in general.

  • Suggest an incremental approach on the RFC. This runs the risk of discovering an issue down the road that block the rollout of CD (despite the RFC). It could also cause confusion: which repos have the restriction vs. which don't.

  • Make the blanket change here and now. This seems to be the lowest risk option, and gives us some time to learn about any issues instead of them becoming blocking later on. And like you say, it's not that much more overhead.

So to answer your questions: I think it is worth making this change now: partly for the advantages ☝️, which support CD but may also be beneficial in general; and partly for the learning we get from applying it on its own.

How do you see it playing out?

@theseanything
Copy link
Contributor

  • Rewrite our security processes to avoid the need for this. This seems like an unbounded goal to me, and quite out-of-scope of the Continuous Deployment work, so I'm trying to avoid this if alternatives exist (which they do).

Definitely agree that we shouldn't trying to change our security processes now. (but possibly worth revisiting in future)

How do you see it playing out?

To be honest, I really don't have a strong view for either way.

  • Make the blanket change here and now. This seems to be the lowest risk option, and gives us some time to learn about any issues instead of them becoming blocking later on. And like you say, it's not that much more overhead.
  • Suggest an incremental approach on the RFC. This runs the risk of discovering an issue down the road that block the rollout of CD (despite the RFC). It could also cause confusion: which repos have the restriction vs. which don't.

Slightly lean towards the incremental approach - as it allows us to rollout slowly and tackle issues as they come up, and avoids getting into an all or nothing situation. Accompanying it with CD will potentially lessen any "process" for teams and helps people realise why the change.

From what we've noticed on Smart Answers, it's really been a non-issue. There was slight confusion as to why someone couldn't merge - but it was easy to explain and didn't result in any consequences.

@richardTowers
Copy link

Thanks for raising this Ben. Sorry for not getting involved in the discussion sooner.

My main concern is that this will be disempowering for developers without "production access". Admittedly they're mostly disempowered already by not being able to deploy their own changes, so this may not be making things that much worse.

When we're moving to continuous deployment I think it's a small price to pay to introduce this restriction, but I'm not sure about rolling it out everywhere in preparation - that feels like it will be disempowering with little benefit. I guess that makes me in favour of an incremental approach (i.e. add this restriction as we roll CD out to new apps).

I know you said you wanted to avoid rewriting our whole security process, but (without wanting to derail the conversation entirely) I think it's worth considering it here.

I think the main issue is that our levels of permission aren't granular enough to represent the jobs that we want people to do.

Currently, when someone starts on GOV.UK they get access to push code, deploy to integration, ssh onto integration machines etc., but there's no way for them to affect production (other than merging a PR and waiting for some patsy to deploy it for them).

The next level of access is full production access, which basically means being an administrator of everything - ssh onto any machine, super admin in signon, AWS admin, etc.

Because production access means potentially using a bunch of very sharp tools, we want to make sure people aren't going to cut themselves (or the rest of us) before giving them that level of access. Consequently, that means there's a long, arduous process people have to go through before getting production access.

Gating being able to merge PRs on having done two shadow 2ndline shifts and one ordinary 2ndline shift feels overly burdensome.

Personally, I think it would be better to have something more along the lines of:

  • New starters get read / write access to repos, but can't merge PRs. They get ssh / AWS access to integration (as they do now)
  • Once somebody's been on a team for a few weeks and understands our processes, they're allowed to merge PRs and deploy them, but not administer the environment (which would require us giving them some non-admin deploy Jenkins access).
  • Once somebody's done their 2ndline shifts they get full production access (SSH etc.), as they do now.

... but that probably wants a separate RFC.

What I'd suggest regarding this PR is:

  • Let's continue to restrict merge access only to repositories which are continuously deployed, rather than restricting merge access everywhere
  • I'll volunteer to raise a separate RFC to discuss updating our security stance to allow a middle level of access where people can merge and deploy to production, but don't have administrative access.
  • We can revisit this decision if needed (e.g. if my RFC is rejected, or if managing the CD rollout gets difficult due to the inconsistency).

@issyl0
Copy link
Contributor

issyl0 commented Jun 30, 2020

My main concern is that this will be disempowering for developers without "production access".

Disempowering! That's the word that sums up my long, emotive, meandering comment above.

@benthorner
Copy link
Contributor Author

@richardTowers thanks for all your thoughts on this. I like the sound of the revised process for granting access - CD should make it easier to implement, since we won't actually need to grant access beyond Integration.

Looking over the comments, I'm reading the conclusion as: close this PR and apply it incrementally. Before I do this, I'd like us to agree (in principle) what the future will look like, which is really why I raised this PR:

  1. Close this PR
  2. (In a couple of weeks) Incrementally enable CD for a few more repos, after applying this restriction to them
  3. (In a couple of weeks) Write an RFC for CD, which will recommend the same, incremental restriction approach
  4. (Soonish) Write another RFC to revise our process for granting access to stuff

@issyl0 @richardTowers @theseanything @barrucadu have I got that right? (You can just 👍). I'd much rather found out any lingering concerns we have now than have a new version of this discussion on the RFC.

@benthorner benthorner closed this Jul 1, 2020
@benthorner benthorner deleted the invert-prod-merge-access branch July 1, 2020 16:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants