
Should we have some automated build testing? #384

Closed
huwd opened this issue Aug 6, 2020 · 12 comments

@huwd
Member

huwd commented Aug 6, 2020

Was chatting with @injms and thinking: perhaps once a week a bot / GitHub Action pulls this repo and tries to run make all-apps, then reports back in some way if any build fails?

Strikes me that there are a few images in here that are lesser used, but might be returned to.
Especially whilst we're in a world of pausing projects to jump onto pressing national concerns, it would be good if there was a way to monitor which of the images have rotted away whilst we weren't looking.

Is this something we could do with a GitHub Action?

@huwd huwd self-assigned this Aug 6, 2020
@benthorner
Contributor

I've thought about this myself in the past. It's a hard problem to solve, especially on a local machine (where these apps may be in all manner of strange under-development states). Since all the error state is inside containers, it could be horrendous to try and debug a problem happening inside an action. The sheer number of steps to run make all-apps to completion means that the task is also highly likely to fail for transient reasons.

  • Suggestion: remove make all-apps, as it's not something we need/recommend to run

Perhaps it's worth breaking this down:

  • What problems could we prevent with simpler, unit-style tests?
  • What problems will never happen again?
  • What problems arise externally to this repo?

One thing to note: it's not necessary for make <app> to pass in order for development to happen. make <app> does a "full" build to cover any API dependencies; that's not necessary for tests.

@huwd
Member Author

huwd commented Aug 6, 2020

Sure, good headings, so my starting point would be something like this:

What problems could we prevent with simpler, unit-style tests?

Where might those unit tests live? The problem I'd be hoping to address with a change is:

I return to an app after an absence and find I cannot build the govuk-docker image because updates have been made to the app that were not reflected in the docker-compose.yml or Makefile.

I wonder if it's a similar problem to Smokey? I forget it exists, and I can deploy my changes because all tests pass within the repo... but I forget to check if the Docker image still builds, and don't see Smokey fail until 2ndline gives me a tap on the shoulder.

🤔 I wonder if there's something like: after a successful deploy, trigger an attempt to build the deployed app if it is on an allowlist that govuk-docker supports. If the build fails, then report it somewhere?

What problems will never happen again?

I suspect you have some in mind, but I guess this will be the ones where we're doing initial setup - right? Things I'd expect to change are what an app depends_on:, and perhaps things like versions of databases or images that get pulled. Those could introduce incompatibilities, right?

What problems arise externally to this repo?

That's likely nearly all of them - as we're making changes to the apps and not these files. @injms, you had some examples where changes to Whitehall or some of its dependencies had caused it to stop building - could you give us a case study here?

@injms
Contributor

injms commented Aug 7, 2020

One example is a naming change for the database exports on S3 caused the local Whitehall database replication to fail - fixed in #277.

Reinstalling Whitehall is tricky - so I'm guessing that most people won't try it unless they really have to (and I count myself among those people!). Because of this, Whitehall can remain unbuildable without anyone noticing. It's this kind of small change in a far-off dependency that I'd like to catch before it becomes a problem.

Another thing last week was that to get Whitehall (frontend) running locally I needed to bump Rails to 5.2, and then change Whitehall's docker-compose.yml to use bin/rails server rather than rails server - or it would have a "can't find rails" error.

I think that it'd be really great to have some way of checking that a clean build works, from git clone to make {app} to ./replicate-{database-type}.sh {app} to govuk-docker-up (and anything inbetween that I've missed).
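
That sequence could be sketched as a small shell function - illustrative only: the clone URL is the obvious one for this repo, the replicate script name follows the pattern in the comment above, and real apps may well need extra steps in between.

```shell
#!/bin/sh
# Sketch of the clean-build check described above, end to end.
# Assumptions: the replicate script lives at the repo root and is named
# replicate-<database-type>.sh, per the comment; adjust per app.
clean_build_check() {
  app=$1
  db=$2   # database type, e.g. mysql

  git clone https://github.com/alphagov/govuk-docker &&
    cd govuk-docker &&
    make "$app" &&
    "./replicate-${db}.sh" "$app" &&
    govuk-docker-up
}

# Hypothetical usage: clean_build_check whitehall mysql
```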

Whilst I thought about GitHub Actions, it's the thing I'm most comfortable with - so likely there's an "I have a hammer and every problem looks like a nail" thought process here.

@benthorner
Contributor

Let's see what we can do. First, we need to make this issue closable: so far, if we achieved the stated goals - every app totally works in a throw-away environment - we would have re-implemented GOV.UK and 2ndline.

To clarify: when I was talking about "unit" tests, I actually meant these "integration" ones 🙄

Second, we need to make this issue worthwhile: the effort required to close it should be less than the effort required to fix the individual problems we are experiencing. So let's focus on those specific problems first:

I'm pleased to find a lot of the issues we're experiencing are easily preventable. Ultimately, I'm trying to avoid this repo ending up like this one - which, of course, we all love.

What problems arise externally to this repo?

I think we may find: relatively few. The reason for this is that, at least in theory, anyone making a change to a repo will be doing so with GOV.UK Docker as their dev environment, finding the problem, and fixing it.

Should we have some automated build testing?

Perhaps. I'm seeing it as a high-effort endeavour. On the other hand, I appreciate some of the examples above have come out of doing this manually. However, I suspect the problems and potential fixes above will tide us over for quite some time, so it may be imperfect, but good enough, to assume someone will run make all-apps every so often.

Next steps

  • We could try and implement some or all of the proposals above. Maybe put them into a card to work on.

  • We should continue this discussion to agree what is worthwhile about this issue, and when it is closable.

  • Make sure we can run make all-apps to completion at the moment, on a few different machines.

Thanks for raising this, by the way. I've been putting off thinking about that last Mongo DB problem, and this has been a good prompt to actually think about it.

@huwd
Member Author

huwd commented Aug 17, 2020

I think those are good principles, and I like how focused and simple govuk-docker is. I can appreciate where steps have been taken in bits like the Docker basefile to prevent complexity creeping in.

I'm interested that you see this as a high-effort thing. I guess, to go back to the first comment, you said:

The sheer number of steps to run make all-apps to completion means that the task is also highly likely to fail for transient reasons.

Could I get a better sense of what you were thinking about there? Is that perhaps like Docker Hub has gone down for 5 minutes or something?

I'm still wondering if the lowest complexity thing would be to:

  • Make sure we can run make all-apps to completion at the moment, on a few different machines. (<- so I'll keep going through and trying to fix issues where the build falls over, then ask a few others to try and build off master.)
  • Once we're happy declare the repo "clean".
  • Set up a github action that says:
    1. Once a week, run make all-apps. If it errors, report the repo it errored on (and 🤷‍♂️ at the idea you'd need to find others after it).
    2. Perhaps also set up something that says: if there are any changes within a project folder, rebuild just that project. (I'd know how to do this on Concourse; would need to check for a GitHub Action, but I'm sure there's a way to do monitoring?)
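
Step 1 could look something like the following sketch (assumptions: a Makefile per directory under projects/, as in this repo, and a purely illustrative exclusion list). The point is to keep going past failures and report them all at the end, rather than stopping at the first one:

```shell
#!/bin/sh
# Sketch: build every project with `make <app>` from the repo root,
# continuing past failures and reporting all of them at the end.
build_all_apps() {
  failed=""
  for makefile in projects/*/Makefile; do
    [ -f "$makefile" ] || continue          # no projects/ dir: nothing to do
    app=$(basename "$(dirname "$makefile")")
    case "$app" in
      generic-ruby-library|bouncer) continue ;;   # illustrative exclusions
    esac
    make "$app" || failed="$failed $app"
  done
  if [ -n "$failed" ]; then
    echo "Failed builds:$failed"
    return 1
  fi
  echo "All apps built"
}
```

The non-zero exit code is what a scheduled CI job would use to decide whether to report.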

The bit I'm really unsure of is where we could do this. I guess building everything will take a lot of minutes / resources, so I'm wondering if Concourse and PaaS would be a better place?

So my suggestion would be:
Either we all agree that the list @benthorner gave above is a set of "one off" teething issues, and that once we've got make all-apps working this eventuality will be far rarer? (@injms, do you think that is true of the instance you had?)

Or, having got make all-apps to work, we try some kind of conditional building with either Ian's GitHub Action hammer, or my Concourse one?

@benthorner
Contributor

I'm interested that you see this as a high-effort thing.
...
Could I get a better sense of what you were thinking about there?

Sure, it's worth thinking about how such an approach would work:

  • We could make this a prerequisite check for PRs. It would be weird to do this on individual app repos, as it would take some time to set up and look odd next to existing GitHub Actions. Having make all-apps as a PR check on this repo would slow down PRs a lot, and is mostly unnecessary (we seldom make changes that affect every app).

  • We could make this a scheduled job. This leads to the question: who will care when the job fails - enough to actually fix the failures? I'm basing my scepticism on the perpetual non-green state of the running GOV.UK platforms.

The reason for the transient failures is that running make for an app involves connecting to the Internet: rubygems.org, github.com, various package sites (if we're rebuilding an image). So "lots of apps" * "lots of sites" = "doom".
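
As an aside, transient network failures are sometimes absorbed with a simple retry wrapper - a sketch only, with arbitrary attempt count and back-off, and it obviously doesn't help with genuine breakage:

```shell
#!/bin/sh
# Sketch: retry a flaky command a fixed number of times before giving
# up, to absorb transient failures (e.g. a rubygems.org blip).
retry() {
  attempts=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1      # still failing after the final attempt
    fi
    n=$((n + 1))
    sleep 1         # brief, arbitrary back-off between attempts
  done
}

# e.g. retry 3 make whitehall
```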

If we're playing this agile, a simpler starting point could be to plan another make-all-apps-and-fix-everything piece of work, e.g. in 6 months. If we find it's useful, we could schedule another; at some point, we may think it's worth doing more frequently (e.g. if we're not catching things quickly enough), up to the point where we automate it.

We can also do some of the above suggestions to help prevent recurring problems.


How about:

  • A card to "Run make all-apps to completion and raise/fix all issues"

  • A card to "Add tests for preventable config issues":

    • Investigate forcing all Rails apps to run in "privileged" mode (to prevent this)
    • Investigate forcing all Ruby commands in Makefiles run with bundle / bin (to prevent this)
    • Investigate using a generic step for Postgres / Mongo create/migrate (to prevent this)
  • A card to "Add a PR template to ensure individual apps actually work":

    • Does make <app> run to completion, without errors?
    • Can you run the tests without any failures?
    • Can you start the app (if applicable) and visit key parts of it?

Once we've done these things, we could consider this issue closed. How does that sound? Would you be willing to write up some cards (on the "GOV.UK Developer Tools" board) to work on this?

@huwd
Member Author

huwd commented Sep 30, 2020

Humm - I was trying to go through these all, and generated a bunch of issues, but realised I was tripping myself up by trying to exclude "broken" ones that would then be called upon by others and break. That was a bad testing methodology on my part.

I've deleted those issues, apologies if folks got email alerts.

Nonetheless, here's what I've got so far - these are real, I believe:

@huwd
Member Author

huwd commented Oct 7, 2020

Good news, I can get everything to build except:

After that they all build 🎉

So I'll track down the last of these, ensure we get good tickets, then tick them off.
After which I'll just periodically run a make on everything to check it's still working, and we can review whether anything more complicated is needed to ensure everything remains well.

If others want to try, here's the command I ran.
Start in your govuk-docker repo:

ls projects/*/Makefile | xargs -L 1 dirname | xargs -L 1 basename | grep -v generic-ruby-library | grep -v bouncer | grep -v specialist-publisher | xargs make

@benthorner
Contributor

Whoop, thorough job @huwd 🏅.

specialist-publisher 🤔 humm, thought we'd solved this one... but perhaps not. Will try to reproduce some more and open a new issue.

I tried this myself and make specialist-publisher runs to completion. Have you pulled recently?

So i'll track down the last of these, ensure we get good tickets then tick them off.

Is there anything we want to carry forward from my comment?

@huwd
Member Author

huwd commented Oct 15, 2020

Humm yeah just confirmed make specialist-publisher works on mine too.
Ok so as of today everything builds...

I guess yeah, what do we want to do now?

@huwd
Member Author

huwd commented Oct 15, 2020

How about:

  • A card to "Run make all-apps to completion and raise/fix all issues"

  • A card to "Add tests for preventable config issues":

    • Investigate forcing all Rails apps to run in "privileged" mode (to prevent this)
    • Investigate forcing all Ruby commands in Makefiles run with bundle / bin (to prevent this)
    • Investigate using a generic step for Postgres / Mongo create/migrate (to prevent this)

  • A card to "Add a PR template to ensure individual apps actually work":

    • Does make <app> run to completion, without errors?
    • Can you run the tests without any failures?
    • Can you start the app (if applicable) and visit key parts of it?

Once we've done these things, we could consider this issue closed. How does that sound? Would you be willing to write up some cards (on the "GOV.UK Developer Tools" board) to work on this?

This sounds like a good plan to me - will write up some cards, and count this as closed.

@huwd huwd closed this as completed Oct 15, 2020
@benthorner
Contributor

Thanks @huwd!

Investigate forcing all Rails apps to run in "privileged" mode (to prevent this)

This one shouldn't be necessary for much longer, once this is merged in.
