Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor app doc classes & build task #2918

Merged
merged 15 commits into from
Jan 15, 2021
Merged

Refactor app doc classes & build task #2918

merged 15 commits into from
Jan 15, 2021

Conversation

ChrisBAshton
Copy link
Contributor

@ChrisBAshton ChrisBAshton commented Jan 6, 2021

This PR tackles some of the technical debt that has built up over time. See commits for details.

The method does actually return an array of elements of type `App`,
so `pages` is quite a disingenuous name, all the more problematic
now that we have a concept of imported app 'docs' (synonymous with
pages).

Renaming 'pages' to 'apps' makes it clearer what object types we
are dealing with.
We're already checking if a repo is private in the
'consume_docs_folder' method in AppDocs, but this will be removed
in the next commit. It's safer to move the check to a lower level
in the code, to ensure that there's no possible way we can
accidentally retrieve private information through these methods.

I had to stop using `app_name` and start using `github_repo_name`
for `govuk-content-store-examples`, which is actually called
`govuk-content-schemas` on GitHub.

I also had to explicitly reject private repos from ProxyPages
and api_layout, as otherwise the build would error with
`RuntimeError: alphagov/govuk-knowledge-graph not found` (which
is a private repository). Hopefully we can continue to refactor
govuk-developer-docs to be less brittle around this.
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
- Move govuk-content-schemas check later on in the task, as it's
  quite slow.
- Suppress the very noisy `HTTP.get` output from the
  `verify_deployable_apps` and `check_puppet_names` tasks, so it's
  easier to figure out where in the task we are.
- Remove the `/tmp/govuk-content-schemas` before cloning, to a)
  stop the task from borking when this dir already exists, and b)
  ensure it is always up to date.
- Clarify that you do now NEED a GitHub auth token to build or
  test govuk-developer-docs, given that there are so many API
  requests made now that we import external documentation.
  Documents a useful shortcut for retrieving your auth token
  without exposing it in your bash history.
These tests would often fail if the cache exists already (from an
earlier test run or build). In the case of document_types_spec.rb,
the keys would sometimes return in a different order, so I've
now arranged them alphabetically before asserting them. And in the
case of the CSV tests, I've bypassed having to worry about the
cache and simply mock the method calls rather than stub the
network requests. This has led to easier to read tests!
The way these tests were written meant that the cache got cleared
when the tests were run, so if we'd just gone through the
expensive (many requests, multiple minutes) build process, we'd
have to go through it allllll again to get requests in the cache.

We can now build the project from scratch (i.e. make lots of HTTP
requests), then run the test suite, then build the project again
(this time with no HTTP requests, as the requests are cached), and
it all works nicely.
There's quite a lot of mocking going on, but I think this is an
improvement on what we had previously (you no longer need to
scroll in order to read the entire test, for a start!).

This test is mainly about how the responses from network requests
and method calls are combined into an array result, so rather than
do all the complicated setup to get those method calls returning
something we can test against, it's better to simply mock the
return values of those method calls.
@ChrisBAshton ChrisBAshton force-pushed the refactor branch 5 times, most recently from d55bed1 to 6f53c71 Compare January 15, 2021 08:05
@ChrisBAshton ChrisBAshton changed the title WIP: Refactor Refactor app doc classes & build task Jan 15, 2021
@ChrisBAshton ChrisBAshton marked this pull request as ready for review January 15, 2021 08:31
Comment on lines 12 to 14
def public_repo
double("Public repo", private_repo?: false, default_branch: "master")
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really following the explanation in 5acf5ad, aren't lets reinitialised for every test which uses them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tried switching it back to a let, and it worked! I found it did work when running github_repo_fetcher_spec.rb directly, but would fail when running the entire test suite. However, I've just tried a couple of runs and it seems to pass every time, so some wires must have got crossed somewhere.

I've now fixed up the commit to use a let instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I think it just happens intermittently. An unrelated change just failed:

Failures:

  1) GitHubRepoFetcher#repo fetches a repo from cache if it exists
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> received unexpected message :name with (no args)
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:19:in `block (3 levels) in <top (required)>'

  2) GitHubRepoFetcher#repo raises error if no repo is found
     Failure/Error:
       expect {
         GitHubRepoFetcher.instance.repo("something-not-here")
       }.to raise_error(StandardError)

       expected StandardError, got #<RSpec::Mocks::ExpiredTestDoubleError: #<Double "Public repo"> was originally created in one example...only last for one example, and you need to create a new one in each example you wish to use it for.> with backtrace:
         # ./app/github_repo_fetcher.rb:8:in `block in repo'
         # ./app/github_repo_fetcher.rb:8:in `each'
         # ./app/github_repo_fetcher.rb:8:in `find'
         # ./app/github_repo_fetcher.rb:8:in `repo'
         # ./spec/app/github_repo_fetcher_spec.rb:34:in `block (4 levels) in <top (required)>'
         # ./spec/app/github_repo_fetcher_spec.rb:33:in `block (3 levels) in <top (required)>'
     # ./spec/app/github_repo_fetcher_spec.rb:33:in `block (3 levels) in <top (required)>'

  3) GitHubRepoFetcher#repo fetches a repo from GitHub if it doesn't exist in the cache
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:27:in `block (3 levels) in <top (required)>'

  4) GitHubRepoFetcher#readme retrieves the README content from the GitHub CDN
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./app/github_repo_fetcher.rb:14:in `readme'
     # ./spec/app/github_repo_fetcher_spec.rb:70:in `block (3 levels) in <top (required)>'

  5) GitHubRepoFetcher#readme returns nil if no README exists
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./app/github_repo_fetcher.rb:14:in `readme'
     # ./spec/app/github_repo_fetcher_spec.rb:90:in `block (3 levels) in <top (required)>'

Finished in 1.06 seconds (files took 2.33 seconds to load)
1488 examples, 5 failures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's unfortunate. Well I'm happy to approve a PR restoring the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly I just tried it and the issue still persists: https://github.com/alphagov/govuk-developer-docs/runs/1733868114

I'll have to rearchitect the tests I think.

We had some doubles that were duplicated in places, and an
unnecessary `let(:repo)` that was only used by one test.
Having the `App` class nested inside `AppDocs` made it harder to
test, and a much longer file to read and understand. The
functionality of both the `AppDocs` and `App` classes has now
grown to the point where it makes sense to treat them as disparate
classes in their own right.
Now that we have a concept of 'documentation from a given app's
repo', the term 'AppDocs' is misleading - it suggests that it is
that. When actually it is a collection of what we now call `App`s
(as per the class in `app.rb`), and docs are separate. To attempt
to remove the confusion, I've renamed the class. I'll rename some
of its methods next.
`.apps` made sense when this was called `AppDocs`, but now is just
a repetition of the class name. `.all` is more appropriate, and
follows traditional Railsy naming conventions.
Duplication of the word 'app' is unnecessary.
In several places, we needed to just retrieve the public repos.
So I've encapsulated this in its own method.
We had an unnecessary `active_app_pages` helper method, where a
simple scope on the `Applications` class would do. We also had
duplicated filtering logic in several places, all of which just
wanted to retrieve the non-retired apps.
We do have distinct concepts of `app_name` vs `repo_name`; often
they are the same thing, but can be overridden in config, e.g.
https://github.com/alphagov/govuk-developer-docs/blob/91185cc86710ccf39f170f3743a2e3f9b5c12f7d/data/applications.yml#L341-L342

GitHubRepoFetcher does indeed rely on the repo name rather than
the app name, so we should make that clearer in the code.
@ChrisBAshton ChrisBAshton merged commit 77c983c into master Jan 15, 2021
@ChrisBAshton ChrisBAshton deleted the refactor branch January 15, 2021 11:06
ChrisBAshton added a commit that referenced this pull request Jan 20, 2021
In theory, a `let` should create a new variable each time, but
we're seeing errors like:

```
     GitHubRepoFetcher#repo fetches a repo from GitHub if it doesn't exist in the cache
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:27:in `block (3 levels) in <top (required)>'
```

We'll try switching to a method call so that it creates a new one
every time. See discussion:
#2918 (comment)
ChrisBAshton added a commit that referenced this pull request Jan 20, 2021
In theory, a `let` should create a new variable each time, but
we're seeing errors like:

```
     GitHubRepoFetcher#repo fetches a repo from GitHub if it doesn't exist in the cache
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:27:in `block (3 levels) in <top (required)>'
```

We'll try switching to a method call so that it creates a new one
every time. See discussion:
#2918 (comment)
ChrisBAshton added a commit that referenced this pull request Jan 20, 2021
We're intermittently seeing errors like:

```
     GitHubRepoFetcher#repo fetches a repo from GitHub if it doesn't exist in the cache
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:27:in `block (3 levels) in <top (required)>'
```

I've tried switching `let(:public_repo)` for a method (see discussion):
#2918 (comment)

...but that did not have the intended effect. I believe we need to
stop manipulating the `CACHE` global itself to fix our tests.
ChrisBAshton added a commit that referenced this pull request Jan 21, 2021
We're intermittently seeing errors like:

```
     GitHubRepoFetcher#repo fetches a repo from GitHub if it doesn't exist in the cache
     Failure/Error: all_alphagov_repos.find { |repo| repo.name == repo_name } || raise("alphagov/#{repo_name} not found")
       #<Double "Public repo"> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.
     # ./app/github_repo_fetcher.rb:8:in `block in repo'
     # ./app/github_repo_fetcher.rb:8:in `each'
     # ./app/github_repo_fetcher.rb:8:in `find'
     # ./app/github_repo_fetcher.rb:8:in `repo'
     # ./spec/app/github_repo_fetcher_spec.rb:27:in `block (3 levels) in <top (required)>'
```

I've tried switching `let(:public_repo)` for a method (see discussion):
#2918 (comment)

...but that did not have the intended effect. I believe we need to
stop manipulating the `CACHE` global itself to fix our tests.
ChrisBAshton added a commit that referenced this pull request Feb 2, 2021
Fixes #2940. Bug introduced in #2918.

Tests were sensitive to their order of execution. State from a
previous run of tests can remain on the `GitHubRepoFetcher`
singleton object.

The Singleton pattern is somewhat discouraged [1] [2], though it
can be useful to refer to a common instance of a class where its
fetch methods are computationally expensive and we rely heavily
on caching.

This commit follows the guidance in the second referenced article,
to retain the Singleton presentation of the class but also to
allow it to be instantiated (making it much easier to test). I've
also added a test to capture and verify the intended `.instance`
singleton behaviour.

[1]: https://www.rubyguides.com/2018/05/singleton-pattern-in-ruby/
[2]: https://8thlight.com/blog/josh-cheek/2012/10/20/implementing-and-testing-the-singleton-pattern-in-ruby.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants