Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use definitions for shared properties in details #509

Merged
merged 25 commits into from Feb 3, 2017
Merged

Use definitions for shared properties in details #509

merged 25 commits into from Feb 3, 2017

Conversation

@fofr
Copy link
Contributor

@fofr fofr commented Feb 2, 2017

Follow up to #505
Please review commit by commit.

Features that share a name should share a definition. This:

  • avoids the risk that a format implements a feature that clashes with
    the name of another feature
  • avoids the risk of implementing the same feature in a slightly
    different way in multiple places
  • makes it harder to write details.json poorly

Add definitions for shared features

  • Try to add useful descriptions for each definition
  • All schema updates are backwards compatible
  • Many definitions match original property name, but others use a new name to represent a definition for a more specific concept (eg documents property uses "attachments_with_thumbnails" definition)
  • Add body definition that let's body be a string or an array containing two strings – govspeak and HTML
  • Add definition for confusing tags property and document usage
  • Policies and finders share many properties, these now use definitions
    • Policies had additionalProperties: true set on facets and filters, but on inspection of content the structure of the data was identical to finders. They've been switched to use the same definitions, removing any differences and reducing risk of drift

Content schema technical debt found

  • first_public_at needs to be replaced with first_published_at, documented in new definitions
  • public_updated_at is provided at the top level but coming_soon and unpublishing formats still use their version in the details hash 4ae1373
  • Travel advice has an ordered list of countries in the details, this could be an ordered list of links now links can be ordered
  • Formats still have withdraw_notices in their details hash, despite being provided at the top level (eg c8b8ee6)
  • The groups feature on service manual topics, topics and mainstream browse was slightly different
    • Topics and mainstream have been made consistent. The schema contains some leftover changes from #203 - Properties marked as deprecated for future removal
    • Created a separate definition for groups that use content_ids rather than paths
    • groups for topics, mainstream browse and service manual could all use content_ids and links hash, similar to the behaviour of groups in document collections
  • Subtle differences between change_notes and child_section_groups (0fddc8b) on manuals and HMRC manuals
    • HMRC manuals are very similar, except they use section_ids for many things – in this case change notes are associated by section_id rather than a section content item or path
  • The email_alert_signup format uses summary for content, it should use description
  • email_signup_link should be using dependency resolution and links, rather than path and URI a2cec84
  • Travel advice contains publishing_request_id but it's not clear that it works, or is being used. According to @gpeng the team wrapped up before completion of this feature. 8f0673c
  • Manual sections (and HMRC manual sections) define their parent using "manual" in the details, this could be "parent" in the links hash
  • Manuals define their organisations in the details, they should use links
  • Finders have a beta flag, it can be set by the publisher and rendered by frontend, no content is currently using and the feature if it remains should switch to using the phase property at the top level
  • HMRC manuals and email alert signups use custom breadcrumb properties
    • No breadcrumbs display on email alert signups, this needs to be fixed, and should use the common patter – basing breadcrumbs on the parent in the links hash
    • HMRC manuals is harder to update as it's based on section_ids and we don't control the publisher
  • Policies uses a custom nation_applicability property but should use the more widely used national_applicability definition

+109,999 −5,483 🤔

* Features that share a name should share a definition
* Avoids the risk that a format implements a feature that clashes with
the name of another feature
* Avoids the risk of implementing the same feature in a slightly
different way in multiple places
* Makes it harder to write `details.json` poorly
* Run validation as part of default Rake task
@fofr fofr force-pushed the force-definitions branch from 9c95daa to 4a53ca4 Feb 2, 2017
@fofr
Copy link
Contributor Author

@fofr fofr commented Feb 2, 2017

Test failure looks unrelated:
/usr/lib/rbenv/versions/2.2.3/bin/ruby: No such file or directory -- test/unit/published_edition_presenter_test.rb (LoadError)

I think that's related to #491

@fofr fofr requested review from h-lame, gpeng, tijmenb and jennyd Feb 2, 2017
@tijmenb
tijmenb approved these changes Feb 2, 2017
Copy link
Contributor

@tijmenb tijmenb left a comment

This is fantastic 🥇

I've reviewed the individual commits and it all looks good.

The big increase in lines is expected because we ship the entire definitions.json with each schema. One solution would be to rewrite the schema generation to only include the schemas we need.

@36degrees
Copy link
Member

@36degrees 36degrees commented Feb 3, 2017

Looks great.

Publisher is now being tested as part of continuous-integration/jenkins/branch on new CI – you can safely ignore the "Verify publisher against content schemas" failure which is coming from old CI.

It'd be good to get a better understanding of the groups technical debt which affects the service manual, so I can get a card in our backlog to resolve it 👍

Copy link
Contributor

@h-lame h-lame left a comment

My comments are definitely not blockers, more for discussion. I'm not 100% sure about the things I've suggested we could merge, as I don't know if the titles and section_ids that cause differences between hmrc and govuk manuals have important semantic meanings even if they're ignored or treated as optional by the frontend rendering app.

@@ -7,7 +7,7 @@
],
"properties": {
"body": {
"type": "string"
"$ref": "#/definitions/body"

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

This feels mildly dangerous. The frontend for rendering hmrc manuals doesn't understand that a body can be anything but a string. If someone started submitting manuals with a multi-part govspeak+html body the schema would be accepted, but the frontend would break. I don't know if other frontend apps would behave the same for bodies that used to be strings but now might be multi-part.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

Good point. I will split this into two definitions, one for each.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

Updated in 1d1176d

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

👍

@@ -13,17 +13,7 @@
"minProperties": 1,
"properties": {
"tags": {
"type": "object",
"description": "DEPRECATED: The tags used to match subscribers lists",

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

We're losing these custom descriptions. Is it possible to keep them?

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

I left these out as I wasn't certain they applied to all uses of tags. Happy to put them back in.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

On closer inspection, this description was replaced with the following, which I think is clearer:

Field used by email-alert-api to trigger email alerts for subscriptions to topics (gov.uk/topic) and policies (gov.uk/policies).

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

I think that description is good, I guess I was concerned by the "DEPRECATED" bit from here which we've lost (I assume this has some semantic meaning about these formats where publisher will stop sending tags soon?) and the description of tags on hmrc_manuals which doesn't indicate its got anything to do with email-alert-api (although it may do - I'm just not sure).

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

Oh. Seems that "tags" were removed from the hmrc-manuals-api (see alphagov/hmrc-manuals-api#128) that translates from the hmrc publisher and publishing-api so maybe we can drop tags from that schema entirely. I think the "DEPRECATED" tag might still be useful, although it's more about the details/tags key than the definition per se.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

Should it be removed from the HMRC manuals schema altogether? (in a separate PR)

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

I think removed from hmrc manuals in a separate PR makes a lot of sense, yes. Will give us a chance to make sure there aren't any copies lurking in the content-store that still have it, which we can republish if needs be.

@@ -526,6 +526,30 @@
"$ref": "#/definitions/absolute_path"
}
}
},
"manual_organisations": {
"description": "A manual’s organisations. TODO: Switch to use organisations in links",

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

I think we might be able to drop these already. hmrc manuals don't look like they have the organisation in the details hash at all (but do have it in the links) and govuk manuals appear to have both. We control govuk manuals so we could drop it and republish things relatively easily. The frontend has already been updated to prefer the links.organisation over the details.organisation (see: alphagov/manuals-frontend#169).

"type": "object",
"additionalProperties": false,
"required": [
"section_id",

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

We could merge these definitions as although section_id is "required" for hmrc manuals change notes - it's not actually used by manuals-frontend when rendering, so a single definition with optional section_id would work.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

Yay 👍 , I will add as a further commit.

This comment has been minimized.

@fofr

fofr Feb 3, 2017
Author Contributor

I'm going to approach this in a separate PR, I'd prefer to minimise the risk with this larger set of changes.

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

Fair enough.

}
}
}
}

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

Like the previous definition for change notes I think we could merge these; the rendering app handles missing titles for groups and section_ids for sections so we can make both optional. One fun quirk is that manuals-frontend has different rendering paths for hmrc manuals vs govuk manuals and doesn't render group titles at all for govuk even though they are required by the schema (I think because all govuk manuals have one group and the title is always "Contents" - the publisher doesn't let you create groups or give them titles).

@@ -9,7 +9,7 @@
],
"properties": {
"beta": {
"type": "boolean"
"$ref": "#/definitions/finder_beta"

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

Is it going to be a problem that this now allows null, when before it had to be true or false? Is it worth checking the rendering app for this - if it uses the same finder frontend it's probably fine.

This comment has been minimized.

This comment has been minimized.

@h-lame

h-lame Feb 3, 2017
Contributor

👍

@gpeng
gpeng approved these changes Feb 3, 2017
Copy link
Contributor

@gpeng gpeng left a comment

Great work. Nice one 👍

fofr added 19 commits Feb 1, 2017
* Switch all uses of body to use the definition
* Body can either be a string, or an object with an HTML string and a
govspeak string
* Manuals and Service Manual use body for something that is not the
main content of the page. This could be switched to something more
appropriate.
* Use definitions with descriptions for these fields
* Document why both exist and the path to removing one of them
Withdrawn notices are being removed from details as they are now top
level. Until this is finally removed, use definitions to prevent drift.
The usage of this property is unclear. We tried to remove it once
before but had to revert:
#292
* Coming soon and unpublishing formats use public_updated_at in the
details, should instead use the top-level version
* Travel advice index should switch to using links for countries now
they can be ordered
The “documents” concept is shared between formats with lists of
attachments and collections which has a list of document links.

* Create a new definition for attachments
These links appear to point at content items. Include note about
switching to use the links hash instead, which may be more reliable.
There are slight inconsistencies in the property:
* Topic and mainstream browse page should be the same, but the schema
contains some leftover changes from
#203 - include
those properties in definition but mark as deprecated for later removal
* Create a separate definition for service manual which uses content
IDs rather than paths for its lists
Make it clearer that internal_name relates specifically to taxonomy
admin tools.
This feature was built before organisations were provided in links with
dependency resolution.
* Create separate definitions for each and document the subtle
difference
Create separate definitions for the slightly different features between
the two. Document differences in description.
* Create definitions for features shared between policies and finders
* Remove `additionalProperties: true` from policy filters and policy
facet item, make them conform to the same definition as finders.
Checked that all published policies confirm to finder rules.
Note that this could still be used but that the `phase` label at the
top level is now favoured.
There’s a subtle difference between this summary and the finder ones.
Here the summary is marked as required and can’t be set to null.

This summary behaves like a description or a body and should use one of
those fields instead (description is empty)
fofr added 5 commits Feb 2, 2017
* HMRC manuals uses sections to generate breadcrumb
* Email alert signup has a custom field that never worked:
alphagov/email-alert-frontend#28
* No published content uses it
* It is not set by the publisher
* It is not rendered by the frontend
It’s not commonly used, but it’s shared between placeholders and the
service manual, with a suggestion it’s used for emails.

It’s also provided at the top level but doesn’t appear to be used there
yet.
Switch to the already defined definition
@fofr fofr force-pushed the force-definitions branch from 4a53ca4 to 4508bc7 Feb 3, 2017
@fofr
Copy link
Contributor Author

@fofr fofr commented Feb 3, 2017

It's worth noting that this PR hasn't looked for cases of the same feature being used across formats but with different property names.

@h-lame
h-lame approved these changes Feb 3, 2017
@fofr fofr merged commit a5aaa2e into master Feb 3, 2017
14 checks passed
14 checks passed
Verify calendars against content schemas Build #572 succeeded on Jenkins
Details
Verify contacts-admin against content schemas Build #1775 succeeded on Jenkins
Details
Verify contacts-frontend against content schemas Build #1320 succeeded on Jenkins
Details
Verify government-frontend against content schemas Build #1802 succeeded on Jenkins
Details
Verify govuk_schemas_gem against content schemas Build #494 succeeded on Jenkins
Details
Verify hmrc-manuals-api against content schemas Build #1459 succeeded on Jenkins
Details
Verify licence-finder against content schemas Build #418 succeeded on Jenkins
Details
Verify manuals-frontend against schema examples Build #1635 succeeded on Jenkins
Details
Verify manuals-publisher against content schemas Build #1783 succeeded on Jenkins
Details
Verify multipage-frontend against content schemas Build #266 succeeded on Jenkins
Details
Verify specialist-frontend against content schemas Build #1552 succeeded on Jenkins
Details
Verify static components against schema examples Build #1177 succeeded on Jenkins
Details
Verify travel-advice-publisher against content schemas Build #1474 succeeded on Jenkins
Details
continuous-integration/jenkins/branch This commit looks good
Details
@fofr fofr deleted the force-definitions branch Feb 3, 2017
@fofr
Copy link
Contributor Author

@fofr fofr commented Feb 10, 2017

Content schema tech debt captured in a Trello board here:
https://trello.com/b/BOULnRrS/content-schema-tech-debt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.