Define how to track multi-page documents #773

Ndpnt · 2022-03-09T13:51:43Z

Context and Problem Statement

Some documents are divided into several sub-documents across many web pages.
For example, the Community Guidelines for Twitter or Facebook are divided, whereas those for TikTok are written in one document.
Currently multi-page documents are not tracked.

Solutions considered

Option 1: Create a document type for each sub-document

For example in the Twitter.json declaration file:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines - Hateful conduct policy": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    },
    "Community Guidelines - Violent and Graphic Content": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines - Hateful conduct policy.md
├─ Community Guidelines - Violent and Graphic Content.md

Implications

New document types have to be defined
A convention on how to handle undivided documents has to be defined

Pros:

No new major concepts
Already available, no archivist update needed

Cons:

Multiplies document types
Looks like a workaround
May lead to inconsistency if some contributors do not follow the convention on how to handle an undivided document. See remaining questions.
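
To make the convention issue concrete, here is a minimal sketch, assuming sub-documents are named `<parent type> - <sub-document>` as in the declaration above. The `group_by_parent` helper and the `SEPARATOR` constant are hypothetical and not part of the Open Terms Archive engine. It shows that, with option 1, the link between sub-documents exists only in the naming convention:

```python
from collections import defaultdict

SEPARATOR = " - "  # assumed naming convention; nothing in the engine enforces it


def group_by_parent(document_types):
    """Group flat document type names by the prefix found before the separator."""
    groups = defaultdict(list)
    for name in document_types:
        parent, _, child = name.partition(SEPARATOR)
        # An undivided document has no separator, hence no child part.
        groups[parent].append(child or None)
    return dict(groups)


types = [
    "Privacy Policy",
    "Community Guidelines - Hateful conduct policy",
    "Community Guidelines - Violent and Graphic Content",
]
groups = group_by_parent(types)
```

If a contributor picks another separator or another parent name, the grouping silently breaks, which is the inconsistency risk listed in the cons.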

Option 2: Concatenate all sub-documents in one document

For example in the Twitter.json declaration file:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": [
      {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      {
        "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    ]
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md

Pros:

No new major concepts
Simplifies document comparison across different services as there is only one document

Cons:

Breaks the invariant of one snapshot for one version
Generates a version of a document that does not really exist
May lead to inconsistency as contributors will have to arbitrarily choose the order of sub-documents
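
The ordering issue can be illustrated with a minimal sketch. The `concatenate` function and the placeholder texts are assumptions made for illustration; they do not reflect how the engine would actually assemble a version:

```python
def concatenate(sub_document_texts):
    """Join sub-document texts, in declaration order, into one virtual version."""
    return "\n\n".join(text.strip() for text in sub_document_texts)


# Placeholder contents standing in for the filtered sub-documents.
hateful_conduct = "# Hateful conduct policy\n\n(placeholder text)"
violent_content = "# Violent and Graphic Content\n\n(placeholder text)"

version_a = concatenate([hateful_conduct, violent_content])
version_b = concatenate([violent_content, hateful_conduct])

# Same sub-documents, different declaration order: the two versions differ,
# so merely reordering the declaration would be recorded as a change.
order_matters = version_a != version_b
```

Two contributors declaring the same pages in a different order would therefore produce different versions of the same "document".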

Option 3: Allow sub-documents to be defined in one document as sub-document types

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Implications:

Need to define the sub-document type concept and see what it can imply globally
Need to define allowed sub-document types for each document type

Pros:

Relatively straightforward concept

Cons:

New concept increases complexity for new contributors
Contributors may be tempted to split unified documents into several sub-documents
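
As a rough sketch of how the nested declaration could map to the file structure above, assuming a leaf document is recognised by its `fetch` key and anything else is a container of sub-documents. The `version_paths` function is hypothetical:

```python
def version_paths(service, documents, prefix=""):
    """Walk a nested declaration and list the version files it would yield."""
    paths = []
    for name, entry in documents.items():
        if "fetch" in entry:
            # Leaf: an actual page to track.
            paths.append(f"{service}/{prefix}{name}.md")
        else:
            # Container: recurse, nesting one directory level deeper.
            paths.extend(version_paths(service, entry, f"{prefix}{name}/"))
    return paths


twitter_documents = {
    "Community Guidelines": {
        "Hateful conduct policy": {
            "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        },
        "Violent and Graphic Content": {
            "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
        },
    },
}
paths = version_paths("Twitter", twitter_documents)
```

Nothing in this traversal limits the depth, so the allowed nesting level would have to be enforced separately.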

Option 4: Introduce the concept of sections

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "sections": {
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
          "select": [ "#twtr-main" ],
          "filters": "removeReturnToTopButton"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
          "select": [ "#twtr-main" ],
          "filters": "removeReturnToTopButton"
        }
      }
    }
  }
}

Or factorized version:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton",
      "sections": {
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Implications:

Need to define the section concept and see what it can imply globally

Pros:

Section concept can be used in many other documents, not just those divided into several sub-documents
Can increase metadata tracked by OTA

Cons:

New concept increases complexity for new contributors
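
A minimal sketch of how the factorized version could be resolved, assuming `select` and `filters` declared at document level act as defaults that each section may override. The `resolve_sections` helper and the `SHARED_KEYS` list are hypothetical:

```python
SHARED_KEYS = ("select", "filters")  # assumed set of factorizable keys


def resolve_sections(document):
    """Return each section with the document-level defaults merged in."""
    defaults = {key: document[key] for key in SHARED_KEYS if key in document}
    return {
        name: {**defaults, **section}  # section values win over defaults
        for name, section in document.get("sections", {}).items()
    }


URL = "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"

expanded = {
    "sections": {
        "Hateful conduct policy": {
            "fetch": URL,
            "select": ["#twtr-main"],
            "filters": "removeReturnToTopButton",
        },
    },
}
factorized = {
    "select": ["#twtr-main"],
    "filters": "removeReturnToTopButton",
    "sections": {"Hateful conduct policy": {"fetch": URL}},
}
same = resolve_sections(expanded) == resolve_sections(factorized)
```

Under this assumption both syntaxes describe the same sections, so the factorized form would only be a shorthand.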

Remaining questions:

For options 1, 3 and 4, about consistency:

Do already unified documents have to be split in many documents for consistency? (Option B)

Which resulting file structure is expected:

Option A:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Option B:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

For option 2, about snapshots:

How should snapshots be stored? With a suffix in their filename? (like $documentType-part-1.html, $documentType-part-2.html, …)

Which snapshot ID is used as reference for related version?

How do we store snapshot ID used as reference in git version commit?

Possible solutions:

Start tracking Community Guidelines/Hateful conduct policy

This version was recorded after filtering snapshots with Mongo IDs:
  - $id1
  - $id2
  - $id3

Start tracking Community Guidelines/Hateful conduct policy

This version was recorded after filtering snapshots:
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id1
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id2
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id3
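
A minimal sketch of the first format, assuming the version commit message lists one snapshot ID per bullet. Both functions are hypothetical helpers and the IDs are placeholders; the sketch only shows that such a message can be written and read back without loss:

```python
HEADER = "This version was recorded after filtering snapshots with Mongo IDs:"


def version_commit_message(title, snapshot_ids):
    """Build a version commit message referencing several snapshots."""
    lines = [f"Start tracking {title}", "", HEADER]
    lines += [f"  - {snapshot_id}" for snapshot_id in snapshot_ids]
    return "\n".join(lines)


def snapshot_ids_from_message(message):
    """Read back the snapshot IDs listed as bullets in a commit message."""
    return [
        line.strip()[2:]
        for line in message.splitlines()
        if line.strip().startswith("- ")
    ]


ids = ["id1", "id2", "id3"]  # placeholder IDs
message = version_commit_message("Community Guidelines", ids)
recovered = snapshot_ids_from_message(message)
```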

For option 3, about nesting:

Which nesting level is allowed?

For option 3 and 4, about storage:

How do we store a sub-document in the git commit?

Start tracking Community Guidelines/Hateful conduct policy

This version was recorded after filtering snapshot with Mongo $id

How do we store a section in the git commit?

Start tracking Community Guidelines#Hateful conduct policy

This version was recorded after filtering snapshot with Mongo $id

Questions to bear in mind when choosing an appropriate solution:

What does each solution involve when adding a document?
What does each solution involve in document maintenance?
What does each solution imply for the history system, dataset generation and rewriting process?

Some thoughts

After discussion, it seems that option 2 can be abandoned mainly because it generates a document that does not really exist.
Options 3 and 4 seem very similar, and section and sub-document may appear to be different terms for the same underlying concept. In fact, they imply very different things. The concept of a sub-document type is similar to the existing document type; it only adds nesting. Sub-document types could therefore be defined and centralized for the document types where it makes sense, and used sparingly. This solution implies no arbitrary choice from contributors. The concept of section, by contrast, is more flexible. Sections could be chosen arbitrarily by contributors and used in all document types without a centralized definition. Even if the allowed sections for a document type are defined and centralized to avoid inconsistency between documents, the concept itself suggests a more open usage.
In the long term, it seems that option 3 and option 4 will coexist as they bring different elements. But in the short term, it seems that option 3 is the most appropriate to the problem from a conceptual point of view.

martinratinaud · 2022-03-10T05:13:21Z

Thanks for this very detailed explanation.

I believe that sections AND sub-document types must both be centralized; otherwise, the resulting datasets may be unusable.

So sub documents would have to be defined relative to their parent document type.

In that sense, I do not see that much difference between option 3 and 4 anymore but would rather go for the option 4 syntax, which permits the factorizing of select and filters

Ndpnt · 2022-03-14T09:31:01Z

For option 4, I suggest an update to the factorised version as I find it more understandable:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "sections": {
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton",
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}
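
One consequence of this variant, sketched below under the assumption that `select` and `filters` are treated as reserved keys inside `sections`. The `split_sections` helper and the `RESERVED_KEYS` set are hypothetical. Shared settings and section names live in the same object, so a parser has to tell them apart:

```python
RESERVED_KEYS = {"select", "filters"}  # assumed; cannot be used as section names


def split_sections(sections):
    """Separate shared settings from section entries and merge them."""
    shared = {key: value for key, value in sections.items() if key in RESERVED_KEYS}
    return {
        name: {**shared, **entry}
        for name, entry in sections.items()
        if name not in RESERVED_KEYS
    }


sections = {
    "select": ["#twtr-main"],
    "filters": "removeReturnToTopButton",
    "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
    },
    "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
    },
}
resolved = split_sections(sections)
```

With this layout, no section could ever be named `select` or `filters`, which may or may not be acceptable.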

MattiSG · 2022-03-14T10:56:33Z

We have strong time pressure to support multi-page documents for Community Guidelines in the context of the French presidential election.

A first analysis of the ability to align Community Guidelines subdocuments is not very conclusive: shared types cover between 60% (Twitter) and 100% (TikTok, LinkedIn) of Community Guidelines subdocuments, with 80% for YouTube, Instagram and Facebook. Option 1 would mean losing the non-covered ones; option 2 would mean creating a non-existing, virtual document; options 3 and 4 would mean opening up divergence for documents and making them incomparable. It seems impossible to decide what is most appropriate for Open Terms Archive at this stage.

Thus, we'll use real options to try out both options 1 and 4 in parallel, as they seem to be the most sustainable and the most divergent. If we have enough time, we'll also try option 3.

This means 2 (or 3) instances from experimental feature branches will run in parallel on a dedicated server. We will track documents this way and feed the results to analysts. We will conclude on the effectiveness and relevance of each option end of April.

I will share here data on Community Guidelines alignment this week.

Ndpnt · 2022-03-14T15:36:32Z

As discussed with @clementbiron, we should also track the index page of Community Guidelines as it may contain important content.

It will therefore have an impact on each option as follows:

Option 1:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": [ "#twtr-main" ]
    },
    "Community Guidelines - Hateful conduct policy": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    },
    "Community Guidelines - Violent and Graphic Content": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines - Hateful conduct policy.md
├─ Community Guidelines - Violent and Graphic Content.md

Option 2: Not relevant

Option 3:

{
  "name": "Twitter",
  "documents": {
    …
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": "#main",
      "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Option 4:

{
  "name": "Twitter",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://twitter.com/en/privacy",
      "select": ["main"]
    },
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": "#main",
      "sections": {
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton",
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md
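
A minimal sketch of how the index page and its sections could both end up in the file structure above, assuming a document-level `fetch` yields `<type>.md` and each non-reserved key under `sections` yields `<type>/<section>.md`. The `version_paths` function and the `RESERVED_KEYS` set are hypothetical:

```python
RESERVED_KEYS = {"select", "filters"}  # assumed shared settings inside "sections"


def version_paths(service, documents):
    """List the version files for documents with an optional index page."""
    paths = []
    for doc_type, declaration in documents.items():
        if "fetch" in declaration:
            # The document itself, or the index page of a multi-page document.
            paths.append(f"{service}/{doc_type}.md")
        for section in declaration.get("sections", {}):
            if section not in RESERVED_KEYS:
                paths.append(f"{service}/{doc_type}/{section}.md")
    return paths


documents = {
    "Privacy Policy": {"fetch": "https://twitter.com/en/privacy", "select": ["main"]},
    "Community Guidelines": {
        "fetch": "https://help.twitter.com/en/rules-and-policies",
        "select": "#main",
        "sections": {
            "select": ["#twtr-main"],
            "filters": "removeReturnToTopButton",
            "Hateful conduct policy": {
                "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
            },
            "Violent and Graphic Content": {
                "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
            },
        },
    },
}
paths = version_paths("Twitter", documents)
```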

MattiSG · 2022-03-14T16:43:52Z

Community guidelines ontology

@Ndpnt collected the titles of the Community Guidelines subtypes of Facebook, Instagram, YouTube, Twitter, LinkedIn and TikTok. I then tried to align each of these subtypes and to give a more generic name that would be fit for Open Terms Archive.

Interesting points

Facebook and Instagram share the exact same subtypes.
LinkedIn and TikTok have only one document with subtitles; it makes sense to use them for sections, but not for separate documents.
Twitter has significantly more entries than all others.

Aligned

When lines have an empty header, this means there is ambiguity as to which platform document should get this type.

I believe that each “subtype” should be prefixed with Community Guidelines — .

Open Terms Archive Candidate Subtype	Facebook & Instagram	YouTube	Twitter	LinkedIn	TikTok
Self-harm	Safety - Suicide and Self-Injury	Sensitive content - Suicide and self-harm	Safety and cybercrime - Suicide and Self-harm Policy	Do not share harmful or shocking material	Suicide, self-harm, and disordered eating
Hate Speech	Objectionable Content - Hate Speech	Violent or dangerous content - Hate speech	Safety and cybercrime - Hateful conduct policy	Do not be hateful	Hateful behavior
Child Sexual Exploitation	Safety - Child Sexual Exploitation, Abuse and Nudity	Sensitive content - Child safety	Safety and cybercrime - Child sexual exploitation policy		Minor Safety
Violence Incitement	Violence And Criminal Behavior - Violence and Incitement	Violent or dangerous content - Harmful or dangerous content	Safety and cybercrime - Glorification of violence policy	Do not threaten, incite, or promote violence	Dangerous acts and challenges
	Objectionable Content - Violent and Graphic Content	Violent or dangerous content - Violent or graphic content	Safety and cybercrime - Sensitive media policy		Violent and graphic content
Violent Organizations	Violence And Criminal Behavior - Dangerous Individuals and Organizations	Violent or dangerous content - Violent criminal organizations	Safety and cybercrime - Violent organizations policy	Do not post terrorist content or promote terrorism	Violent extremism
	Violence And Criminal Behavior - Coordinating Harm and Promoting Crime
Spam	Integrity And Authenticity - Spam	Spam & deceptive practices - Spam, deceptive practices & scams	Platform integrity and authenticity - Platform manipulation and spam policy	Do not engage in spam or scam	Integrity and authenticity
	Violence And Criminal Behavior - Fraud and Deception		Platform integrity and authenticity - Financial scam policy
Regulated Goods	Violence And Criminal Behavior - Restricted Goods and Services	Regulated goods - Sale of illegal or regulated goods or services	Safety and cybercrime - Illegal or certain regulated goods or services		Illegal activities and regulated goods
Harassment	Safety - Bullying and Harassment	Violent or dangerous content - Harassment and cyberbullying	Safety and cybercrime - Abusive behavior	Do not harass or bully	Bullying and harassment
			Platform integrity and authenticity - Coordinated harmful activity
		Regulated goods - Firearms
Misinformation	Integrity And Authenticity - Misinformation	Misinformation - Misinformation		Do not share false or misleading content
		Misinformation - Elections misinformation
		Misinformation - COVID-19 medical misinformation	Platform integrity and authenticity - COVID-19 misleading information policy
		Misinformation - Vaccine misinformation
Intellectual Property	Respecting Intellectual Property - Intellectual Property		Intellectual property - Copyright policy	Respect the intellectual property of others and do not violate the intellectual property rights of others	Copyright and trademark infringement
			Intellectual property - Counterfeit policy
			Intellectual property - Trademark policy
			Intellectual property - Automated copyright claims for live video
Adult Nudity	Objectionable Content - Adult Nudity and Sexual Activity	Sensitive content - Nudity and sexual content			Adult nudity and sexual activities
Sexual Solicitation	Objectionable Content - Sexual Solicitation			Do not engage in unwanted advances
Inauthentic Behaviour / Platform Manipulation	Integrity And Authenticity - Inauthentic Behavior	Spam & deceptive practices - Fake engagement	Platform integrity and authenticity - Platform manipulation and spam policy	Interference with LinkedIn	Platform security
Privacy Violations	Safety - Privacy Violations		Safety and cybercrime - Private information policy	Respect others' privacy
	Integrity And Authenticity - Account Integrity and Authentic Identity		Platform integrity and authenticity - Impersonation policy	Do not create a fake profile or falsify information about yourself
	Safety - Adult Sexual Exploitation		Safety and cybercrime - Non-consensual nudity policy
Reach Amplification			Platform Use Guidelines - About specific instances when a Tweet’s reach may be limited		Ineligible for the For You Feed
Overview			General - The Twitter Rules
Scraping				Unauthorized access and use
Terms Updates			Platform Use Guidelines - Updates to our Terms of Service and Privacy Policy
Deceased Users			General - Deceased individuals

Unclassified

I did not manage to align these documents. They should either be read in full to understand where they could fit, or be left out.

Facebook & Instagram	YouTube	Twitter
Safety - Human Exploitation	Sensitive content - Vulgar language	Platform integrity and authenticity - Distribution of hacked materials policy
Integrity And Authenticity - Cybersecurity	Spam & deceptive practices - Impersonation	Platform integrity and authenticity - Ban evasion policy
Content-Related Requests And Decisions - User Requests	Spam & deceptive practices - External links	Platform integrity and authenticity - Parody, newsfeed, commentary, and fan account policy
Content-Related Requests And Decisions - Additional Protection of Minors	Spam & deceptive practices - Additional policies	Platform integrity and authenticity - Civic integrity policy
Integrity And Authenticity - Memorialization		Platform integrity and authenticity - Synthetic and manipulated media policy
		General - Username squatting policy
		Safety and cybercrime - Violent threats policy

Platform specific

These documents depend on platform features and have no reason to be tracked with a shared name.

With option 1, they would be dropped.

YouTube	Twitter
Sensitive content - Thumbnails	Platform Use Guidelines - Twitter Moments guidelines and principles
Spam & deceptive practices - Playlists	Platform Use Guidelines - Notices on Twitter and what they mean
	Platform Use Guidelines - Curation style guide
	Platform Use Guidelines - Super Follows policy
	Platform Use Guidelines - Ticketed Spaces policy

Country specific

Twitter has a document named “Platform Use Guidelines - Reporting false information in France”.

Twitter

On top of all of the above documents, Twitter goes really deep in specification and also adds some usage guidance that could be considered as parts of a manual.

Platform Use Guidelines - Report violations
Platform Use Guidelines - Our range of enforcement options
Platform Use Guidelines - Fair use policy
Platform Use Guidelines - Content Monetization Standards
Platform Use Guidelines - Guidelines for Promotions on Twitter
Platform Use Guidelines - About search rules and restrictions
Platform Use Guidelines - Twitter, our services, and corporate affiliates
Platform Use Guidelines - How to report security vulnerabilities
Platform Use Guidelines - About Twitter limits
Platform Use Guidelines - Defending and respecting the rights of people using our service
Platform Use Guidelines - About rules and best practices with account behaviors
Platform Use Guidelines - About Twitter’s APIs
Platform Use Guidelines - About government and state-affiliated media account labels on Twitter
Platform Use Guidelines - Automation rules
Platform Use Guidelines - Inactive account policy
Platform Use Guidelines - About country withheld content
Platform Use Guidelines - About public-interest exceptions on Twitter
Platform Use Guidelines - Additional information about data processing
Platform Use Guidelines - Our approach to policy development and enforcement philosophy

MattiSG · 2022-03-15T15:46:14Z

The above table has been implemented in #778.

ckatzenbach · 2022-03-31T16:46:02Z

Great to see these detailed discussions! We have struggled with a lot of these things at www.pga.hiig.de as well – and only responded with manual curation. I am curious to look at the results of the test runs – where do you stand currently with regard to the decision? I must confess that conceptually I am much more inclined to go for option 4 (or 3) than option 1. As part of our work at the PGA we have seen how all major platforms have evolved their community guidelines from single-page documents into these nested websites of explanation. So I'd very much argue that this is "one thing" but that it has gotten much more complex over the years. And the wording and categorization are also changing over time. So this will remain a challenge – but much better to keep this under the umbrella of "community guidelines" than to have 10-20 separate document types that change names every other year and also re-integrate, bifurcate etc. This space is very much in flux.

MattiSG · 2022-03-31T17:01:25Z

Thanks @ckatzenbach! This idea that this space will keep on evolving is very relevant indeed. Even if we succeeded in creating an ontology for the current document set, we have to assess the chance that it would be stable over time.

where do you stand currently with regard to the decision?

As mentioned in #773 (comment), we are collecting data and feedback and intend to conclude on the effectiveness and relevance of each option end of April 🙂

pg-adrian · 2022-04-05T21:57:25Z

Hi everyone, I'm Adrian and I worked with @ckatzenbach on the (historical) collection of these multi-page documents for the Platform Governance Archive. As he said, we ran into some of the exact same issues and questions during our collection process so it is very interesting to read your discussion here! Maybe it is helpful for you to hear about our experience and the solution that we ended up with.

The first realization that we had when investigating the historical evolution of platform policies and collecting the documents is that what you called "the ontology of the Community Guidelines" is sometimes not as straightforward as one might expect.

In the case of Facebook, it is still relatively clear from my perspective. Here, the Community Guidelines (initially called 'Content Code of Conduct' then 'Facebook Community Standards') evolved from a document that was completely displayed on one URL into first an interactive document with drop-down sections and then a multi-page document. But even today as it is spread across many different URLs, I think it is very clear that Facebook considers all of these subsites as part of one document: The Facebook Community Standards.

Grouping these subsites into one document, from my perspective, does therefore not mean creating an artificial document. Much rather, the historical evolution of Facebook's Community Standards shows that the multi-page format should not be seen as a splitting up of the Community Guidelines into many sub-parts but much rather the contemporary form of displaying the document and making it easier to navigate for users. From my perspective, it therefore makes sense to puzzle the different Community Standards back together into one document because that's what they are from Facebook's/the user perspective and because that creates a document that can be compared to other platforms' Community Guidelines.

Now in the case of Twitter, it is a bit harder to define what actually constitutes their Community Guidelines: Do they, as you suggest, encompass all of the 75 subsites currently linked on their "Rules and policies" overview page (https://help.twitter.com/en/rules-and-policies#general)? Or should they rather be understood as "The Twitter Rules" page (https://help.twitter.com/en/rules-and-policies/twitter-rules) and the 18 selected sub policies that are linked there? (This is the option that we went for).

These two options by themselves actually raised an ontological question for the collection: Are the Community Guidelines what platforms define as their Community Guidelines or do they actually encompass all of the platforms' rules that regulate their community in some way? That would mean that rules or policies that are spelled out in sections of a site which are not part of the officially defined "Community Guidelines", for instance on help pages - as it very often happens - would also form part of a platform's Community Guidelines. For reasons of feasibility and practicability, we opted for taking the platforms' own definition of their "Community Guidelines" as the reference point for our collection.

In the case of Twitter, this meant considering "The Twitter Rules" page as their Community Guidelines, because this is what the company has generally and historically considered as their Community Guidelines. Our team member João can explain this decision in more detail because he went deep into the history of the Twitter Rules. Another argument for this approach would be that, as @MattiSG also noted, Twitter's "Rules and Policies" page includes many usage guidance/information pages such as "Updates to our Terms of Service and Privacy Policy" which are probably better classified as help pages than as rules/policies for the community.

For Twitter, we hence decided to collect "The Twitter Rules", meaning that we collected the main page and the first sublevel of the policies that are linked on this page (if I understood it correctly this is what you referred to as nesting level). In practice this meant that we first had to create a timeline which denotes when subpolicies became part of or were removed from the Twitter Rules page. It is important to note that some of these subpages existed before they became part of the Twitter Rules or continue to exist after they are removed from the index pages. I guess as a general takeaway this means that taking an index page as the starting point for the collection entails monitoring when sublinks appear/are removed from this page. This is due to the fact that sections are sometimes merged or added to/removed from the master document.
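
As an illustration of what such monitoring could look like, here is a minimal sketch that compares the links found in two snapshots of an index page. It only relies on Python's standard `html.parser`; the `diff_links` function and the sample HTML are assumptions, not a description of how either archive actually works:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def diff_links(previous_html, current_html):
    """Report which sublinks appeared or disappeared between two snapshots."""
    previous, current = extract_links(previous_html), extract_links(current_html)
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}


# Made-up snapshots of an index page, for illustration only.
before = '<a href="/rules/hateful-conduct">A</a><a href="/rules/spam">B</a>'
after = '<a href="/rules/hateful-conduct">A</a><a href="/rules/impersonation">C</a>'
changes = diff_links(before, after)
```

A non-empty result would signal that the set of sub-documents to track needs to be reviewed.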

I have to admit that I did not understand all of the technicalities of your discussion above regarding the difference between option 3 and 4, so I cannot say how all of this influences your decision or speaks in favor of one or the other option. Generally however I would say that:

Compiling subsections from different URLs into one document does not necessarily create an artificial document
In terms of document maintenance, defining an index page as a starting point would entail automatically or manually monitoring when new sublinks/URLs are added to/removed from the overarching page
I find your grouping of the subsections very impressive and interesting for the comparison of specific Community Guideline sections but it does in my eyes not erase the meaningfulness of also having one compiled version of all rules
I agree that treating all subsections as their own document as in option 1 probably leads to a level of complexity in which it is hard to keep an overview

I'm very sorry for the length of this post and hope this is in any way helpful for your decision! It's actually quite helpful for us to spell our procedure out again in this discussion :)

MattiSG · 2022-05-04T16:11:08Z

Comparison of implemented options 1 and 4

As announced, we compared the results of running side-by-side implementations of options 1 and 4 for 7 weeks. Here are our results and observations 🙂

Common observations

Most Community Guidelines documents could be tracked within a fixed ontology of types, and detecting their changes did yield value to analysts.
Open Terms Archive scaled well with no other modification than additional document types.
Mass changes triggered notifications across many documents, leading to spam, as when Twitter mangled URLs (see OpenTermsArchive/france-elections-versions@3e472cd, OpenTermsArchive/france-elections-versions@d31174e and 5 other documents). This can happen with any other set of documents from the same service, but is made worse with the given implementation since the number of documents is significantly larger.
Listing sections of documents risks pushing contributors towards wanting to list sections for arbitrary document types, which is not supported.

Declarations

For Facebook, the resulting declaration was 101 lines for option 1 vs 83 lines for option 4. You can find them in full below.

Option 1

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"]
    },
    "Community Guidelines - Self-harm": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Hate Speech": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Child Exploitation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Violence Incitement": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Violent Organizations": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Spam": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Regulated Goods": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Harassment": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Misinformation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Intellectual Property": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Adult Nudity": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Sexual Solicitation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Platform Manipulation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Privacy Violations": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Deceased Users": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    }
  }
}

Option 4

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"],
      "sections": {
        "select": ["._9nrm", "._9q49", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"],
        "Self-harm": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/"
        },
        "Hate Speech": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/"
        },
        "Child Exploitation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/"
        },
        "Violence Incitement": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/"
        },
        "Violent Organizations": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/"
        },
        "Spam": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/"
        },
        "Regulated Goods": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/"
        },
        "Harassment": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/"
        },
        "Misinformation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/"
        },
        "Intellectual Property": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/"
        },
        "Adult Nudity": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/"
        },
        "Sexual Solicitation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/"
        },
        "Platform Manipulation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/"
        },
        "Privacy Violations": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/"
        }
      }
    },
    "Deceased Users": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/",
      "select": ["._9nrm", "._9q49", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    }
  }
}

Factoring selectors in option 4 improves readability compared to option 1.
Avoiding repetition of type prefix in option 4 improves readability compared to option 1.
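
To make the relationship between the two declarations above concrete, here is a minimal Python sketch that expands an option 4 sections object into option 1 style entries. It assumes that select, remove and filter inside sections are the shared keys and that every other key is a section name. The function name expand_sections is hypothetical and is not part of Open Terms Archive.

```python
SHARED_KEYS = ("select", "remove", "filter")  # assumed set of factored keys


def expand_sections(doc_type, declaration):
    """Expand an option 4 declaration into option 1 style entries."""
    declaration = dict(declaration)  # shallow copy, the input is left untouched
    sections = declaration.pop("sections", {})
    shared = {key: sections[key] for key in SHARED_KEYS if key in sections}
    expanded = {doc_type: declaration}
    for name, section in sections.items():
        if name in SHARED_KEYS:
            continue
        # Keys set on a section take precedence over the shared ones.
        expanded[f"{doc_type} - {name}"] = {**shared, **section}
    return expanded


community_guidelines = {
    "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
    "select": ["._9ntw"],
    "remove": ["._9nxl", "._9ntv", ".img"],
    "sections": {
        "select": ["._9nrm", "._9q49", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"],
        "Spam": {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/"
        },
    },
}

expanded = expand_sections("Community Guidelines", community_guidelines)
for name in expanded:
    print(name)
```

The type prefix and the selectors are written once in option 4 and repeated mechanically in option 1, which is where the difference in declaration length comes from.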

Snapshots and versions

The folder in option 4 is surprising: why is this subtype more present than others?
Having both a folder and a file with the same name in option 4 is surprising: why is this both a document and a folder?

The overrepresentation of Community Guidelines files in option 1 is surprising: why is this subtype more present than others? It also harms readability of the whole folder.

Listing sections of documents risks making the user want to list sections for other document types.

Reliability and maintenance

No difference was measured between the two options.

Conclusion

After implementation, neither of the tested solutions emerges as a clear winner. Along with comments from the PGA team (thanks @pg-adrian for your detailed message 🙇), this reinforces the validity of option 2, where all sub-documents are consolidated into a single one.

The blocking point that was identified was the risk of voiding the promise that documents tracked by Open Terms Archive can hold up in court, since the resulting document could not easily be cited: it cannot be referenced by a single URL or date.

However, this could be handled by ensuring snapshots are still integral copies of the documents found online. While admittedly to a lesser extent, versions are already “recreated” from snapshots. Thus, as long as the resulting version references its source snapshots properly, it seems acceptable to consolidate them as part of a minimal readability improvement process.
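
As a rough illustration of that idea, the Python sketch below joins the content of several snapshots into one version while keeping a reference to each source snapshot. The Snapshot fields, the identifiers, the example.com URLs and the consolidate function are all hypothetical; the sketch also skips the filtering that normally turns a snapshot into a version.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """Integral copy of one source page (illustrative fields only)."""
    id: str
    url: str
    fetched_at: datetime
    content: str


def consolidate(snapshots):
    """Join page contents in order and record which snapshots they came from."""
    return {
        "content": "\n\n".join(s.content for s in snapshots),
        "source_snapshots": [s.id for s in snapshots],
        # Date of the most recent page, one possible convention among others.
        "fetched_at": max(s.fetched_at for s in snapshots),
    }


pages = [
    Snapshot("snap-1", "https://example.com/guidelines",
             datetime(2022, 5, 4, 16, 0, tzinfo=timezone.utc), "Overview"),
    Snapshot("snap-2", "https://example.com/guidelines/spam",
             datetime(2022, 5, 4, 16, 1, tzinfo=timezone.utc), "Spam"),
]
version = consolidate(pages)
print(version["source_snapshots"])
```

Each snapshot stays untouched and individually citable by URL and date; only the version is a consolidated artefact.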

Detailed proposals for implementation of option 2 will be published in this RFC by next week.

MattiSG · 2022-05-04T16:16:05Z

Reframing of problem statement

Declaring, maintaining and analysing the results of tracking community guidelines of several services, along with comments in this RFC, led us to the following reframing. This reframing does not impact the currently explored solution space, but it will hopefully help in avoiding confusion and managing expectations.

Defining pages vs sections

The case of community guidelines is one where service providers have implemented their content sectioning by spreading it across separate web pages.

However, this is not the only solution that they use. Solutions such as accordions have the same aim as splitting across pages: improving legibility. In the case of accordions, we have no problem representing the folded information in a single continuous flow. Similarly, we already split some pages into documents: when Terms of Service and Privacy Policy are on the same webpage, declaration files enable selecting each of those independently through select, even though they share the same fetch source page. This demonstrates that we do prioritise documents over source pages.

In this RFC, most of the proposed solutions conflated sections and pages. This distinction was not present in Open Terms Archive until now, because every document we handled was contained in at most one page. Community Guidelines demonstrate that a single document can be made up of two or more pages.

Distinguishing pages and sections support in Open Terms Archive

These topics are different, and it is important not to mix them. One is about how to track a document that is split across multiple pages through its sections. The other is how to annotate sections across a document, no matter if they are on a single page or on many.

Postponing section support

While section support has value, it should not impact the data collection phase: Open Terms Archive is unique in part thanks to its separation of snapshots and versions. Section splitting (or annotation) should be an additional step in the pipeline, one that never puts snapshot authenticity or version consolidation at risk.

There are known users for such a feature: Apolia created their own script to split content last year; ToS;DR does split terms into sections to enable annotating and ranking them.

However, this RFC aims at extending OTA's current solid implementation of document tracking towards multi-page documents. The opportunity of adding section support at the same time is misleading. Such a feature will be handled independently, in its own timeframe.

pg-adrian · 2022-05-05T19:35:44Z

Great to hear that our experiences were helpful for you in reframing the problem statement and specifying your conceptualization of the relationship between documents, sections and pages. Looking forward to reading the proposals for the implementation of option 2.

Ndpnt · 2022-05-11T13:43:56Z

Proposals for implementation of option 2

Option 2A:

Declare an array instead of an object for a document type, where each entry of this array is a document declaration.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": [
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      }
    ]
  }
}

Option 2B:

Declare an array for each key inside a document declaration, with one entry per page.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": ["removeEmptyAnchorsLinks", "removeTrackingIDs", "removeLocaleFromUrls"],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": ["removeEmptyAnchorsLinks", "removeTrackingIDs", "removeLocaleFromUrls"],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": [
        "https://transparency.fb.com/fr-fr/policies/community-standards",
        "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/"
      ],
      "select": [
        "._9ntw",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c"
      ],
      "remove": [
        "._9nxl, ._9ntv, .img",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_"
      ]
    }
  }
}
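
Option 2B relies on the i-th entries of fetch, select and remove describing the same page. The Python sketch below shows one way such a declaration could be turned back into per-page declarations, and why a length check would be needed. The function name pages_from_parallel_arrays is hypothetical and does not reflect an existing Open Terms Archive API.

```python
def pages_from_parallel_arrays(declaration):
    """Pair the i-th fetch, select and remove entries of an option 2B declaration."""
    fetch = declaration["fetch"]
    select = declaration["select"]
    # remove is treated as optional here; that is an assumption of this sketch.
    remove = declaration.get("remove", [None] * len(fetch))
    if not (len(fetch) == len(select) == len(remove)):
        raise ValueError("fetch, select and remove must have the same length")
    return [
        {"fetch": url, "select": sel, "remove": rem}
        for url, sel, rem in zip(fetch, select, remove)
    ]


declaration = {
    "fetch": [
        "https://transparency.fb.com/fr-fr/policies/community-standards",
        "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
    ],
    "select": ["._9ntw", "._9nrm, ._9p7c"],
    "remove": ["._9nxl, ._9ntv, .img", "._9p72, svg, ._9ooi, ._9q3_"],
}
pages = pages_from_parallel_arrays(declaration)
print(pages[1]["fetch"])
```

Adding or removing a page means editing three arrays at matching positions, which is easy to get wrong by hand.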

Option 2C:

A mix of Option 2A and Option 2B, where only the fetch key can accept an array. It allows factoring out select, remove and filters for an array of pages to fetch.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": [
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
        "select": ["._9ntw"],
        "remove": ["._9nxl"]
      },
      {
        "fetch": [
          "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/"
        ],
        "select": ["._9nrm"],
        "remove": ["._9p72"]
      },
      {
        "fetch": [
          "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/"
        ],
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      }
    ]
  }
}
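
A minimal Python sketch of how an option 2C array could be flattened into one declaration per page, assuming that fetch is either a single URL or a list of URLs and that all other keys of an entry are shared by its URLs. The function name flatten_pages is hypothetical.

```python
def flatten_pages(entries):
    """Turn option 2C entries into one declaration per page, preserving order."""
    pages = []
    for entry in entries:
        urls = entry["fetch"]
        if isinstance(urls, str):  # a single URL is still accepted
            urls = [urls]
        shared = {key: value for key, value in entry.items() if key != "fetch"}
        pages.extend({"fetch": url, **shared} for url in urls)
    return pages


entries = [
    {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
        "select": ["._9ntw"],
        "remove": ["._9nxl"],
    },
    {
        "fetch": [
            "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
            "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
        ],
        "select": ["._9nrm"],
        "remove": ["._9p72"],
    },
]
pages = flatten_pages(entries)
print(len(pages))
```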

Option 2D:

Add a pages key to the document declaration: an array of document declarations. When a required key is not defined in one of these entries, the value defined at the root of the document declaration is used. This also allows factoring out select, remove and filters.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"],
      "pages": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}
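
The fallback rule of option 2D can be sketched in a few lines of Python. This is only an illustration: the list of inheritable keys and the function name resolve_pages are assumptions, not an existing implementation.

```python
INHERITED_KEYS = ("select", "remove", "filter")  # assumed inheritable keys


def resolve_pages(declaration):
    """Apply root-level keys to every page that does not define them itself."""
    defaults = {key: declaration[key] for key in INHERITED_KEYS if key in declaration}
    # Page-level keys come last so that they override the root-level defaults.
    return [{**defaults, **page} for page in declaration["pages"]]


community_guidelines = {
    "select": ["._9ntw"],
    "remove": ["._9nxl", "._9ntv", ".img"],
    "pages": [
        {"fetch": "https://transparency.fb.com/fr-fr/policies/community-standards"},
        {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"],
        },
    ],
}

resolved = resolve_pages(community_guidelines)
for page in resolved:
    print(page["fetch"], page["select"])
```

A page-level value replaces the root-level one entirely rather than being merged with it, which keeps the rule easy to predict.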

MattiSG · 2022-05-15T14:19:48Z

Thank you very much @Ndpnt for this work of consolidation!

Reading through these examples, I am quite clearly in favour of 2D. I like how having a dedicated key makes it explicit and enables validation with no ambiguity: either we have a fetch or we have a pages key, there is no mysterious syntax to know about arrays.

In particular, I dislike 2A and 2C because they are ambiguous on the intention: what does it mean that a document type maps to an array? Are there as many Community Guidelines as there are entries in the array?
2B, while not elegant, at least seems at the right level of nesting to me.

I have one suggestion for improvement though. So far, we have deliberately stuck to using verbs for every entry of a document declaration. I would suggest that we keep that behaviour, and use a verb such as merge, assemble, consolidate, join, combine, fuse, meld…

Naming

In order to sort through synonyms, I ran a Google Trends search to find the most common term. The most common term for this operation seems to be “to merge”, followed by “to combine”. I confirmed this by running the same search with “PDF” or “pages” instead of “documents”.

However, I am a bit concerned that, in a context of operation where we rely a lot on Git, “merge” becomes ambiguous with the eponymous Git operation, when it is something very different that we want to describe here. Thus, I offer:

Option 2.D.i

Same as 2D, just renaming pages to combine. I also suggest writing the factored keys after the combine key, in order to further distinguish these declarations from the (much more common) single-page ones.

Support a combine key in document declarations, that contains an array of objects with fetch and optionally select, remove, filter keys; in this case, the select, remove, filter specified at the same level as combine are considered as default for every entry in the array.

Formal definition

Redefine document declaration as single-page declaration or multipage declaration.
Define page declaration as almost the same as the current document declaration, except that its select key is made optional.
Define single-page declaration as a page declaration with mandatory select.
Define multipage declaration as an object with a mandatory combine key containing at least 2 page declarations, and optionally select, remove and filter keys.
- These keys at the multipage declaration level apply to each page declaration that does not define them itself.
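
The definitions above can be read as a small validation routine. The Python sketch below is one possible interpretation, not a specification: it assumes that fetch and combine are mutually exclusive, and that each combined page must end up with a select, either its own or the one defined next to combine. The function name classify is hypothetical.

```python
def classify(declaration):
    """Return "single-page" or "multipage", or raise ValueError if invalid."""
    has_fetch = "fetch" in declaration
    has_combine = "combine" in declaration
    if has_fetch == has_combine:
        raise ValueError("expected exactly one of fetch or combine")
    if has_fetch:
        if "select" not in declaration:
            raise ValueError("a single-page declaration requires select")
        return "single-page"
    pages = declaration["combine"]
    if len(pages) < 2:
        raise ValueError("combine requires at least 2 pages")
    for page in pages:
        if "fetch" not in page:
            raise ValueError("every combined page requires fetch")
        # select may come from the page itself or from the multipage level.
        if "select" not in page and "select" not in declaration:
            raise ValueError("no select available for " + page["fetch"])
    return "multipage"


single = {"fetch": "https://example.com/terms", "select": ["main"]}
multi = {
    "combine": [
        {"fetch": "https://example.com/guidelines"},
        {"fetch": "https://example.com/guidelines/spam", "select": ["article"]},
    ],
    "select": ["main"],
}
print(classify(single), classify(multi))
```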

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ],
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"]
    }
  }
}

martinratinaud · 2022-05-16T06:21:00Z

Thanks for these very neat and understandable proposals.

I'm in favor of option 2.D.i, which is the easiest to understand.
I'm fine with combine (and I agree merge would be too ambiguous in our context).

clementbiron · 2022-05-16T07:05:41Z

Thanks @Ndpnt @MattiSG, it is very clear and complete!

I'm also in favor of 2.D.i, and using combine seems to me to be a great idea, as it avoids introducing the notion of page 👍

MattiSG · 2022-05-16T07:30:27Z

Looping in @Amustache @LVerneyPEReN @afisher3578 @Manu1400 @streitlua for them to vote on these options or suggest improvements 🙂

Ndpnt · 2022-05-16T07:53:58Z

Thanks @MattiSG for the relevant improvement of option 2D.

I am wondering whether the factored keys are easily understandable. 🤔
Should we introduce a specific term to wrap these keys?

It could be share, or with (combine … with …), or another term.

For example:

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ],
      "with": {
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      }
    }
  }
}

or

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": {
        "pages": [
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
          {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"]
          },
          {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"]
          },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
        ],
        "with": {
          "select": ["._9ntw"],
          "remove": ["._9nxl", "._9ntv", ".img"]
        }
      }
    }
  }
}

MattiSG · 2022-05-16T08:05:16Z

Thanks @Ndpnt for this proposal! It is interesting, but is currently lacking the level of formality that would enable us to debate it properly in the context of an RFC, as it diverges without offering a clear option that we could decide on. Could we please evolve this into a formal option, with a clear naming proposal? 🙂 Thanks!

Ndpnt · 2022-05-16T08:25:51Z

I understand, you are right. I am going to do this in the next few days.

streitl · 2022-05-17T18:00:50Z

I also like option 2.D.i - it's easy to understand how it works and the writing overheads are minimal.

I agree that the "with" key could improve natural language readability, but I don't think it is necessary. In any case, I prefer @Ndpnt's second proposal.

Ndpnt · 2022-05-18T07:54:39Z

Option 2.D.i.a

Same as 2.D.i, but with the factorized values made explicit as default values with the suffix …Default, for example selectDefault.

In this context, I suggest writing the defaults key before the combine key, as it is more common to have defaults set before their replacements.

Formal definition

Redefine document declaration as single-page declaration or multipage declaration.
Define page declaration as almost the same as the current document declaration, but with its select, remove and filter keys made optional; only fetch is required.
Define single-page declaration as a page declaration with mandatory fetch and select.
Define multipage declaration as an object with a mandatory combine key containing at least 2 single-page declarations, and optionally selectDefault, removeDefault and filterDefault keys.
- When these keys are set at the multipage declaration level, they apply to every page declaration that does not define them itself.
- These keys should be defined before the combine key
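
As a rough illustration of the rule above, here is a minimal Python sketch of how a multipage declaration using the …Default suffix could be expanded into complete page declarations. The function name resolve_suffixed_defaults and the example.com URLs are hypothetical and only serve the illustration; this is not the Open Terms Archive implementation.

```python
# Hypothetical helper illustrating option 2.D.i.a; not the actual engine code.
OVERRIDABLE_KEYS = ("select", "remove", "filter")


def resolve_suffixed_defaults(declaration):
    """Return the pages of a multipage declaration with the defaults applied."""
    resolved = []
    for page in declaration["combine"]:
        full_page = dict(page)  # copy, so the input declaration is left untouched
        for key in OVERRIDABLE_KEYS:
            default_key = key + "Default"
            # A value set on the page wins; the default only fills a gap.
            if key not in full_page and default_key in declaration:
                full_page[key] = declaration[default_key]
        resolved.append(full_page)
    return resolved


declaration = {
    "selectDefault": ["._9ntw"],
    "removeDefault": ["._9nxl", "._9ntv", ".img"],
    "combine": [
        {"fetch": "https://example.com/guidelines"},
        {
            "fetch": "https://example.com/guidelines/spam",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"],
        },
    ],
}

pages = resolve_suffixed_defaults(declaration)
```

With this input, the first page would inherit both defaults, while the second keeps its own select and remove values.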

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "selectDefault": ["._9ntw"],
      "removeDefault": ["._9nxl", "._9ntv", ".img"],
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}

Ndpnt · 2022-05-18T07:54:58Z

Option 2.D.i.b

Same as 2.D.i, but with the factorized values made explicit as default values within a new key defaults.

In this context, I suggest writing the defaults key before the combine key, as it is more common to have defaults set before their replacements.

Formal definition

Redefine document declaration as single-page declaration or multipage declaration.
Define page declaration as almost the same as the current document declaration, but with its select, remove and filter keys made optional; only fetch is required.
Define single-page declaration as a page declaration with mandatory fetch and select.
Define multipage declaration as an object with a mandatory combine key containing at least 2 single-page declarations, and optionally a defaults key. The defaults key may contain the optional keys select, remove and filter, but at least one of them is required.
- Keys defined in the defaults key at the multipage declaration level apply to every page declaration that does not define them itself.
- The key defaults should be defined before the combine key
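
The constraints in this definition can be checked mechanically. The following Python sketch is one possible reading of them, not the actual validation used by Open Terms Archive. The function name validate_multipage is hypothetical, and the rule that each page needs a select either of its own or from defaults is an assumption drawn from "mandatory fetch and select".

```python
# Hypothetical validator illustrating option 2.D.i.b; not the actual engine code.
DEFAULTABLE_KEYS = {"select", "remove", "filter"}


def validate_multipage(declaration):
    """Return a list of error messages; an empty list means the declaration is valid."""
    errors = []
    pages = declaration.get("combine")
    if not isinstance(pages, list) or len(pages) < 2:
        errors.append("combine must list at least 2 page declarations")
        pages = []  # nothing sensible to check page by page
    defaults = declaration.get("defaults")
    if defaults is not None and not DEFAULTABLE_KEYS & set(defaults):
        errors.append("defaults must contain at least one of select, remove, filter")
    for index, page in enumerate(pages):
        if "fetch" not in page:
            errors.append(f"page {index} is missing fetch")
        # Assumption: select must come from the page itself or from defaults.
        if "select" not in page and "select" not in (defaults or {}):
            errors.append(f"page {index} has no select and no default select")
    return errors


valid_declaration = {
    "defaults": {"select": ["._9ntw"], "remove": ["._9nxl"]},
    "combine": [
        {"fetch": "https://example.com/guidelines"},
        {"fetch": "https://example.com/guidelines/spam", "select": ["._9nrm"]},
    ],
}

errors = validate_multipage(valid_declaration)
```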

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "defaults": {
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      },
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}

clementbiron · 2022-05-18T08:30:03Z

Thanks Nico for these proposals.

I am in favor of option 2.D.i because in options 2.D.i.a and 2.D.i.b the syntax is too different between a declaration for one page and a multipage declaration. This could introduce complexity and confusion.

But I would be curious to know the opinion of users who are less used to manipulating this syntax.

Ndpnt · 2022-05-18T08:54:16Z

I am in favor of option 2.D.i because in options 2.D.i.a and 2.D.i.b the syntax is too different between a declaration for one page and a multipage declaration. This could introduce complexity and confusion.

In option 2.D.i, we have select and remove keys without a fetch key, and I'm not sure it is obvious to contributors that they are default values that will be applied to each page declared in the combine key when they are missing there. So, in fact, the syntax is already different and there is a kind of magic. I think it's better to expose the magic and be explicit.
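
To make this implicit behaviour concrete, here is a minimal Python sketch, assuming that in option 2.D.i every key written next to combine acts as a default for each page. The function name expand_factored_keys and the example.com URLs are hypothetical; this is only one way the expansion could work.

```python
# Hypothetical helper illustrating the implicit defaults of option 2.D.i.
def expand_factored_keys(declaration):
    """Copy the keys declared next to combine into every page that lacks them."""
    shared = {key: value for key, value in declaration.items() if key != "combine"}
    # In a dict merge the later entries win, so page-level keys override shared ones.
    return [{**shared, **page} for page in declaration["combine"]]


declaration = {
    "combine": [
        {"fetch": "https://example.com/guidelines"},
        {"fetch": "https://example.com/guidelines/spam", "select": ["._9nrm"]},
    ],
    "select": ["._9ntw"],
    "remove": ["._9nxl", "._9ntv", ".img"],
}

pages = expand_factored_keys(declaration)
```

Nothing in the declaration itself signals that select and remove are fallbacks: the behaviour lives entirely in the merge, which is the "magic" discussed above.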

martinratinaud · 2022-05-18T11:26:00Z

I voted through emojis as discussed in retrospective.

Also, I believe 2.D.i with the defaults at the top is enough, and more readable than a defaults key or Default-suffixed keys.

afisher3578 · 2022-05-18T12:03:41Z

I also found option 2.D.i to still be easy to understand, even as someone that is new to this syntax.

MattiSG · 2022-05-19T08:08:46Z

Thanks everyone for your inputs and contributions on this first semi-formal RFC! 💖 I'm glad of the direction we're taking and the good collaboration around it 😊

We'll leave this open until next Tuesday for any additional comments. Until then, let's all try to stay focused on either casting votes on existing propositions, adding new ones formally, or adding objective data points 🙂

MattiSG · 2022-05-19T08:13:26Z

I noticed that, when we had a brief, transient issue with fetching documents on Instagram, we received a huge number of notifications (and the same again when the issue resolved itself) because the number of declared documents was very large in the implementation of option 1. The fact that all of the community guidelines were inaccessible at the same moment, and not other documents, is another hint that they are treated as a single group by platforms. As a maintainer, receiving all these notifications and trying to fix them was made needlessly more complex by having 20 documents instead of a single one.

In my view, this very much goes in favour of concatenating (option 2), which was the path we were already on anyway 😉

MattiSG · 2022-05-19T09:08:05Z

Another example of a multi-page document that is not Community Guidelines: AdMob policies and restrictions.

Ndpnt · 2022-06-20T09:15:02Z

Hi all,
As there have been no additional comments for a month, I think we can reach a first conclusion for this RFC on how to track and declare multi-page documents.

Among all proposed solutions, the one that received the most upvotes is Option 2.D.i, so this is the option we retain.
We will propose an implementation of this solution and test it against reality to see if any adjustments are necessary.

Thanks again to everyone who participated in this RFC.

MattiSG · 2023-01-17T16:22:12Z

This RFC was fully implemented a few months ago in #891 (congrats @Ndpnt!), and has since demonstrated its reliability in production in two separate collections (France-Elections and PGA). The only thing still missing before we can close this issue is the related user documentation.

MattiSG · 2023-01-17T16:25:09Z

Moved documentation issue to the docs repository: OpenTermsArchive/docs#32.

Thanks again everyone for your contributions 🙇

Ndpnt added the RFC Request for comments label Mar 9, 2022

MattiSG mentioned this issue Mar 16, 2022

Create a france-elections instance demonstrating sections support #780

Closed

9 tasks

This was referenced Mar 16, 2022

Add Twitter Community Guidelines OpenTermsArchive/france-elections-declarations#13

Merged

Follow a multi-page document #216

Closed

MattiSG assigned Ndpnt May 3, 2022

MattiSG assigned MattiSG and unassigned Ndpnt Jun 8, 2022

MattiSG removed their assignment Jan 17, 2023

MattiSG mentioned this issue Jan 17, 2023

Document multipage documents declarations OpenTermsArchive/docs#32

Closed

MattiSG closed this as completed Jan 17, 2023
