Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import docs contents of all repositories #2910

Merged
merged 8 commits into from
Dec 30, 2020
Merged

Conversation

ChrisBAshton
Copy link
Contributor

@ChrisBAshton ChrisBAshton commented Dec 24, 2020

This PR pulls in top-level *.md files from the docs/ folder of all GOV.UK repositories, rather than just those opted in with consume_docs_folder: true. This is so that we surface all documentation more easily and have a consistent approach to pulling in external documentation. It also gives us the option of hosting documentation closer to the code (advantageous when documenting rake tasks, for example) and an opportunity to consolidate fragmented docs (deleting less useful docs and putting effort into maintaining one resource).

Trello: https://trello.com/c/UyfVvwYm/183-consume-all-docs-folders

Visual changes

There are minimal visual changes. I've removed the https://docs.publishing.service.gov.uk/apis.html landing page and now external docs have app.html as a parent (as evidenced by the navigation at the top of the screenshot). There will be a follow-up PR (#2912) for improved visual changes.

Before After
Screenshot 2020-12-29 at 14 45 29 Screenshot 2020-12-29 at 14 46 12

Logic changes

  • I've switched the consume_docs_folder functionality to be opt out rather than opt in. That is, setting consume_docs_folder: false on an app in applications.yml will prevent the Developer Docs from pulling in content from its docs folder.
  • I've removed the concept of 'API documentation'. There is now just 'App documentation', for all apps, regardless of whether or not the app is an API. Routes previously available at /apis/** will be redirected to their /apps/** equivalent. It is perhaps best explained with a diff, which demonstrates the more RESTful nature of these slugs:
  /apps/publishing-api.html
- /apis/publishing-api/dependency-resolution.html
+ /apps/publishing-api/dependency-resolution.html

I've also manually tested that private repositories' docs folders are not imported.

Performance impact

  • The compiled search.json is now 6.8 MB in size, compared to the current 5.7 MB, reflecting the extra documents that can be searched. The current one is "only" 1.3 MB when g-zipped, so there should only be a moderate dip in performance. The Middleman setup doesn't support GZip locally so it's hard to say for sure what the impact will be.
  • As a result of the additional GitHub API requests we're making, the build time has increased from about 2m26s to about 3m15s (about a 25% increase). This is unfortunate but necessary, and I would say is still a reasonable build time.

@ChrisBAshton ChrisBAshton force-pushed the import-all-docs branch 4 times, most recently from abe8b9a to f1a3e0d Compare December 29, 2020 15:12
@ChrisBAshton ChrisBAshton marked this pull request as ready for review December 29, 2020 15:29
Copy link
Contributor

@benthorner benthorner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking forward to this - now we can see just how much (probably out-of-date 😬) documentation we actually have 😮 ☂️.

I've left a couple of more major comments where I think you've missed a few things. I also wondered: since none of the apps are opting out of consuming docs, can we just delete the code to support that? (It would make it quite a lot simpler.)

Finally, just a minor comment that I wasn't sure what the first commit was about, as it just repeats the diff. I ended up reading it a few times to try and understand it. For me, the important word to include would be "fix" e.g. "Fix returning no docs across all apps".

app/github_repo_fetcher.rb Show resolved Hide resolved
data/applications.yml Show resolved Hide resolved
@@ -10,9 +10,7 @@ def self.resources

def self.api_docs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def self.api_docs
def self.app_docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

app_docs already exists, so I've renamed the existing one to app_overviews (more appropriate, I reckon) and then applied the rename as suggested above.

Copy link
Contributor

@benthorner benthorner Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I agree that's clearer. Looking again, I see there's a terminology conflict in the use of "docs": sometimes it refers to "docs in the repo"; sometimes to "manual pages". For me, the most confusing instance is AppDocs.apps_with_docs 🤯. One other potential smell here is the encapsulation of AppDocs, which seems to be leaking into ProxyPages - the latter is taking on some of the responsibility of the former by adapting the GitHub response. Maybe making more use of the App class address both these issues?

(This was already an issue, so out-of-scope for this PR. I think it would be good to follow-up on, though, as I imagine it will be confusing for someone looking at this code in future.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppDocs.apps_with_docs has never sat right with me! Agreed, we can refactor that out when we remove the consume_docs_folder functionality.

I seem to remember running into serialisation-related issues when delegating some of ProxyPage's concerns to the App, but will give this another go when that refactor PR comes. 👍

@@ -1,6 +1,6 @@
---
layout: api_layout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
layout: api_layout
layout: app_layout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only suggestion I'm hesitant about, as it involves a UI change. By removing the api_layout, we lose the menu that shows the imported docs, e.g.

Screenshot 2020-12-29 at 16 56 42

I was hoping to tackle UI changes in #2912 instead. I experimented with moving these new menu items to the app_layout but it felt very cluttered, so this felt like a possible area of contention which I wanted to move to a subsequent PR rather than try to tack on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. I see the application_layout is currently used here. However, I'm confused about how the layout affects the menu, which also lead me to question: how will this PR affect the current "Apps" menu (the one which shows "Application ownership by team")?

(I think there's definitely scope for improvement in the current navigation: it's really unclear what the relationship between the left sidebar and the header is - they seem to be totally different navigation systems!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layout affects the menu ('sidebar') here - it sets the content for the sidebar. API layout has its own sidebar which I've left relatively untouched for the purposes of just enabling 'consume docs' functionality across all repos. It therefore looks no different in this branch.

In a spike, I tried removing the API layout and moving the sidebar content into Application layout, but it gets very noisy. I'm taking a different approach in #2912.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Thanks for explaining. And I see the home page sidebar is statically inconsistent with the header links. Lovely! I think I see where you're going in the follow-up PR - I'll be interested to see what you do with the sidebar!

Some apps have no `docs/` folder, in which case GitHubRepoFetcher
returns `nil`. When ProxyPages encountered this `nil`, rather than
just returning an empty array `[]` to the mapped application, it
returned _early_ so that the overall return value for the `api_docs`
method was `[]`, clearing any legitimate app doc values that came
before it.
We'll soon be pulling in docs from all repositories, whether they
are APIs or not. The old link structure therefore no longer makes
sense.

This commit re-slugs imported documents like so:

```diff
  /apps/publishing-api.html
- /apis/publishing-api/dependency-resolution.html
+ /apps/publishing-api/dependency-resolution.html
 ```

This is a more RESTful URL structure.

It also sets up redirects for all existing imported docs. I expect
we'll be able to remove these redirects in a few weeks/months when
we can be sure nothing is linking to them.
This used to act as the hub for collating all external API
documentation, but we're moving towards a model where _all_
external documentation is imported, so 'Apps' will be a better
parent than 'APIs'.
Some docs have different encoding, which breaks Middleman.
This fix takes the same approach as ExternalDoc does here:
https://github.com/alphagov/govuk-developer-docs/blob/34b8e6ebaf85e4c81081dacc722b8f574f9e4699/app/external_doc.rb#L38
`consume_docs_folder` used to default to `false` and was an opt in
functionality. It is now opt-out functionality.
Now that we import _all_ repositories' `docs/` folders, we've run
into an issue whereby any repo without a `docs/` folder returns
`nil`, and we can't call `.each` on `nil`.

This probably indicates that our interface could be improved a bit,
although it is useful to be able to query whether there are any
docs by checking if `nil` rather than counting how many docs are
returned and then concluding that there aren't any docs after all
if 0 is returned. That said, we'll be changing the UI in subsequent
commits, so I'm not too worried about improving the underlying
code just yet.
Tested this by manually setting email-alert-api's 'private_repo'
property to 'true'. We should never be fetching docs from private
repositories.
There was an existing 'app_docs' method, which I've renamed more
appropriately to 'app_overviews', as it represent the 'overview'
page for each app. Example:
https://docs.publishing.service.gov.uk/apps/asset-manager.html

As opposed to 'app docs', which suggests the docs associated with
the apps, e.g.
https://docs.publishing.service.gov.uk/apis/email-alert-api/analytics.html
(which will now become
https://docs.publishing.service.gov.uk/apps/email-alert-api/analytics.html)
Copy link
Contributor Author

@ChrisBAshton ChrisBAshton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @benthorner! RE removing consume_docs_folder functionality... it could certainly be removed, as no repos are currently using it. But I could see it being useful to be able to disable docs being imported per repo. I'd certainly want to keep it around for a little while in case when we roll this out we run into some unforeseen issues :) I can put it on my list to remove the code later if it hasn't been used in X weeks / months?

@benthorner
Copy link
Contributor

No worries - thanks for taking the time to look over my comments! Your replies have been helpful in giving me a bit more context about how this repo works. Just a couple of threads to tie up there, but hopefully nothing major.

What issues do you think might necessitate consume_docs_folder: false? I wonder if we could consider removing it sooner - it should be trivial to add the functionality back if we do need it in future (we'd need to do a PR anyway to update the config for whichever repo needed it). I agree we don't need to remove it in this PR, though 👍.

@ChrisBAshton
Copy link
Contributor Author

What issues do you think might necessitate consume_docs_folder: false?

Perhaps a repo has too many noisy & not particularly useful documents, which are reducing the prominence of other search results. Or an imported document contains a special character that the import process crashes on (not that I know of any that would!). There may be no dev time to actually sit down and consolidate the documents, so it's quicker to just temporarily disable the docs import.

I agree we shouldn't need it for long though. Perhaps just a few days, once this is merged and we can prove it's all working fine on production.

I think I've addressed all your comments - let me know if I've missed anything! I suspect the more involved conversations may happen in #2912! 😄

Copy link
Contributor

@benthorner benthorner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we shouldn't need it for long though. Perhaps just a few days, once this is merged and we can prove it's all working fine on production.

Yup, that sounds reasonable to me.

title: title,
markdown: contents,
title: title.to_s.force_encoding("UTF-8"),
markdown: contents.to_s.force_encoding("UTF-8"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I forgot to suggest this could also have a test to show the kind of thing it's protecting against.

Copy link
Contributor Author

@ChrisBAshton ChrisBAshton Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current test setup is pretty horrific when mocking the actual file contents of a doc. Rather than add to the cruft, I'm hoping to refactor this in a future PR, where I'll retrospectively add a test for UTF-8 conversion 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that's not very easy to extend 😬. Sounds like we're on the same page with all the follow-up changes 👍.

@ChrisBAshton ChrisBAshton merged commit bab3650 into master Dec 30, 2020
@ChrisBAshton ChrisBAshton deleted the import-all-docs branch December 30, 2020 16:58
ChrisBAshton added a commit that referenced this pull request Jan 4, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.

A side effect of removing the `consume_docs_folder` method is that
we've removed the check for whether the repo in question is a
private repository. This logic is now moved closer to the code
where it matters, i.e. `readme` and `docs`, to prevent us from
accidentally downloading and displaying private content.
ChrisBAshton added a commit that referenced this pull request Jan 6, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit that referenced this pull request Jan 6, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit that referenced this pull request Jan 6, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit that referenced this pull request Jan 6, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit that referenced this pull request Jan 6, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit that referenced this pull request Jan 14, 2021
Since #2910, _all_ apps have their `docs/` folder imported. We
have not identified a reason why any of them should be opted out,
so to simplify the code & tests, we've now removed the relevant
code.
ChrisBAshton added a commit to alphagov/email-alert-api that referenced this pull request May 7, 2021
Importing from the govuk-developer-docs so that the docs can live
closer to the code, making it easier to make changes to both at
the same time, as well as to have relevant email-alert-api docs
locally when checking out the repo. This was not possible prior to
alphagov/govuk-developer-docs#2910, which
enabled external doc importing into the govuk-developer-docs.

Future devs can dig into the commit history of the govuk-developer-docs
versions of these files at these URLs:

- https://github.com/alphagov/govuk-developer-docs/blame/2289779c12930502177bc2d43c10dd385c834c90/source/manual/load-test-email-alert-api.html.md
- https://github.com/alphagov/govuk-developer-docs/blame/2289779c12930502177bc2d43c10dd385c834c90/source/manual/receiving-emails-from-email-alert-api-in-integration-and-staging.html.md
- https://github.com/alphagov/govuk-developer-docs/blame/2289779c12930502177bc2d43c10dd385c834c90/source/manual/bulk-email.html.md
ChrisBAshton added a commit that referenced this pull request May 7, 2021
We now have a prefix of `apps/` rather than `apis/`, since
#2910
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants