Define how to track multi-page documents #773

Closed
Ndpnt opened this issue Mar 9, 2022 · 34 comments
Labels
RFC Request for comments

Comments

@Ndpnt
Member

Ndpnt commented Mar 9, 2022

Context and Problem Statement

Some documents are divided into several sub-documents across many web pages.
For example, the Community Guidelines for Twitter or Facebook are divided, whereas those for TikTok are written in one document.
Currently multi-page documents are not tracked.

Solutions considered

Option 1: Create a document type for each sub-document

For example in the Twitter.json declaration file:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines - Hateful conduct policy": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    },
    "Community Guidelines - Violent and Graphic Content": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines - Hateful conduct policy.md
├─ Community Guidelines - Violent and Graphic Content.md

Implications

  • New document types have to be defined
  • A convention on how to handle undivided documents has to be defined

Pros:

  • No new major concepts
  • Already available, no archivist update needed

Cons:

  • Multiplies document types
  • Looks like a workaround
  • May lead to inconsistency if some contributors do not follow the convention on how to handle an undivided document. See remaining questions.

Option 2: Concatenate all sub-documents in one document

For example in the Twitter.json declaration file:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": [
      {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      {
        "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    ]
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md

Pros:

  • No new major concepts
  • Simplifies document comparison across different services, as there is only one document

Cons:

  • Breaks the invariant of one snapshot for one version
  • Generates a version of a document that does not really exist
  • May lead to inconsistency as contributors will have to arbitrarily choose the order of sub-documents
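
The ordering concern can be illustrated with a minimal sketch. It assumes that option 2 would simply join the filtered text of each page in declaration order; the `concatenate` helper and the placeholder texts are hypothetical and do not come from the Open Terms Archive codebase.

```python
def concatenate(parts):
    """Join already-filtered sub-document texts in the order given."""
    return "\n\n".join(parts)


# Placeholder texts standing in for the filtered content of each page.
hateful_conduct = "Hateful conduct policy\n(placeholder text)"
violent_content = "Violent and Graphic Content\n(placeholder text)"

version_a = concatenate([hateful_conduct, violent_content])
version_b = concatenate([violent_content, hateful_conduct])

# Same pages, different declaration order, different recorded version.
print(version_a == version_b)
```

Under this assumption, two contributors declaring the same pages in a different order would record different versions of the same document.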

Option 3: Allow sub-documents to be defined in one document as sub-document type

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md
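
As a rough sketch of how the nested declaration could map to this file structure, the hypothetical `version_paths` helper below treats any entry with a `fetch` key as a page and any other entry as a directory of sub-documents. It is an illustration under that assumption, not the actual implementation, and it does not handle an entry that both has its own `fetch` and contains sub-documents.

```python
def version_paths(service, documents, parents=()):
    """List one Markdown path per fetchable entry, recursing into nested entries."""
    paths = []
    for name, node in documents.items():
        if "fetch" in node:
            # Leaf: a page that is fetched and recorded as a file.
            paths.append("/".join((service, *parents, name)) + ".md")
        else:
            # Container: becomes a directory holding its sub-documents.
            paths.extend(version_paths(service, node, (*parents, name)))
    return paths


declaration = {
    "name": "Twitter",
    "documents": {
        "Community Guidelines": {
            "Hateful conduct policy": {
                "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
                "select": ["#twtr-main"],
                "filters": "removeReturnToTopButton",
            },
            "Violent and Graphic Content": {
                "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
                "select": ["#twtr-main"],
                "filters": "removeReturnToTopButton",
            },
        },
    },
}

paths = version_paths(declaration["name"], declaration["documents"])
for path in paths:
    print(path)
```

Because the walk is recursive, nothing in this sketch limits the nesting depth, which is why the allowed nesting level has to be decided explicitly.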

Implications:

  • Need to define the sub-document type concept and see what it can imply globally
  • Need to define allowed sub-document types for each document type

Pros:

  • Relatively straightforward concept

Cons:

  • New concept increases complexity for new contributors
  • Contributors may be tempted to split unified documents into several sub-documents

Option 4: Introduce the concept of sections

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "sections": {
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
          "select": [ "#twtr-main" ],
          "filters": "removeReturnToTopButton"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
          "select": [ "#twtr-main" ],
          "filters": "removeReturnToTopButton"
        }
      }
    }
  }
}

Or factorized version:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton",
      "sections": {
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}
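
One possible reading of the factorized form is that keys declared at document level act as defaults for every section. The sketch below assumes exactly that; `resolve_sections` is a hypothetical helper, and the rule that section-level keys override shared ones is an assumption, not something the proposal specifies.

```python
def resolve_sections(document):
    """Return each section's settings, with document-level keys as defaults."""
    shared = {key: value for key, value in document.items() if key != "sections"}
    return {
        name: {**shared, **section}  # section-level keys override shared ones
        for name, section in document.get("sections", {}).items()
    }


community_guidelines = {
    "select": ["#twtr-main"],
    "filters": "removeReturnToTopButton",
    "sections": {
        "Hateful conduct policy": {
            "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
            "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        },
    },
}

resolved = resolve_sections(community_guidelines)
print(resolved["Hateful conduct policy"])
```

With this reading, the factorized and non-factorized declarations resolve to the same per-section settings.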

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Implications:

  • Need to define the section concept and see what it can imply globally

Pros:

  • Section concept can be used in many other documents, not just those divided into several sub-documents
  • Can increase metadata tracked by OTA

Cons:

  • New concept increases complexity for new contributors

Remaining questions:

  • For options 1, 3 and 4, about consistency:

    Do already unified documents have to be split in many documents for consistency? (Option B)

    Which resulting file structure is expected:

    Option A:

    TikTok/
    ├─ Privacy Policy.md
    ├─ Terms of Service.md
    ├─ Community Guidelines.md
    Twitter/
    ├─ Privacy Policy.md
    ├─ Terms of Service.md
    ├─ Community Guidelines/
    │  ├─ Hateful conduct policy.md
    │  ├─ Violent and Graphic Content.md
    

    Option B:

    TikTok/
    ├─ Privacy Policy.md
    ├─ Terms of Service.md
    ├─ Community Guidelines/
    │  ├─ Hateful conduct policy.md
    │  ├─ Violent and Graphic Content.md
    Twitter/
    ├─ Privacy Policy.md
    ├─ Terms of Service.md
    ├─ Community Guidelines/
    │  ├─ Hateful conduct policy.md
    │  ├─ Violent and Graphic Content.md
    
  • For option 2, about snapshots:

How should snapshots be stored? With a suffix in their filename? (like $documentType-part-1.html, $documentType-part-2.html, …)

Which snapshot ID is used as reference for related version?

How do we store snapshot ID used as reference in git version commit?

Possible solutions:

Start tracking Community Guidelines/Hateful conduct policy

This version was recorded after filtering snapshots with Mongo IDs:
  - $id1
  - $id2
  - $id3

Start tracking Community Guidelines/Hateful conduct policy

This version was recorded after filtering snapshots:
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id1
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id2
  - https://github.com/OpenTermsArchive/snapshots-dating/commit/$id3
  • For option 3, about nesting:

    Which nesting level is allowed?

  • For option 3 and 4, about storage:

    How do we store sub-document in git commit?

    Start tracking Community Guidelines/Hateful conduct policy
    
    This version was recorded after filtering snapshot with Mongo $id
    

    How do we store section in git commit?

    Start tracking Community Guidelines#Hateful conduct policy
    
    This version was recorded after filtering snapshot with Mongo $id
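
    The two commit message formats above differ only in the separator. A minimal sketch, assuming a hypothetical `commit_message` helper that is not part of the existing codebase:

```python
def commit_message(document_type, part, snapshot_id, separator):
    """Build a 'Start tracking' message; separator is '/' or '#'."""
    title = f"Start tracking {document_type}{separator}{part}"
    body = f"This version was recorded after filtering snapshot with Mongo {snapshot_id}"
    return f"{title}\n\n{body}"


# "$id" is kept as the placeholder used in the messages above.
sub_document = commit_message("Community Guidelines", "Hateful conduct policy", "$id", "/")
section = commit_message("Community Guidelines", "Hateful conduct policy", "$id", "#")
print(sub_document)
print(section)
```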
    

Questions to bear in mind when choosing an appropriate solution:

  • What does each solution involve when adding a document?
  • What does each solution involve for document maintenance?
  • What does each solution imply for the history system, dataset generation and rewriting process?

Some thoughts

  • After discussion, it seems that option 2 can be abandoned mainly because it generates a document that does not really exist.
  • Options 3 and 4 seem very similar, and section and sub-document may appear to be different terms for the same underlying concept. In fact, they imply quite different things. The concept of a sub-document type is similar to the existing document type; it only adds the concept of nesting. Sub-document types could therefore be defined and centralized for the document types where they make sense, and used sparingly. This solution implies no arbitrary choice from contributors. The concept of section, by contrast, is more flexible. Sections could be chosen arbitrarily by contributors and used in all document types without a centralized definition. Even if the allowed sections for a document type are defined and centralized to avoid inconsistency between documents, the concept itself suggests a more open usage.
  • In the long term, it seems that option 3 and option 4 will coexist as they bring different elements. But in the short term, it seems that option 3 is the most appropriate to the problem from a conceptual point of view.
@Ndpnt Ndpnt added the RFC Request for comments label Mar 9, 2022
@martinratinaud
Member

Thanks for this very detailed explanation.

I believe that sections AND sub-document types must be centralized; otherwise, the resulting datasets may be unusable.

So sub-documents would have to be defined relative to their parent document type.

In that sense, I no longer see much difference between options 3 and 4, but would rather go for the option 4 syntax, which allows factorizing select and filters.

@Ndpnt
Member Author

Ndpnt commented Mar 14, 2022

For option 4, I suggest an update to the factorised version as I find it more understandable:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "sections": {
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton",
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}
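
With this variant, shared settings and section names live side by side in the same object, so a parser has to tell them apart by key. The sketch below assumes that `select` and `filters` are the only reserved keys; `SHARED_KEYS` and `split_sections` are hypothetical names used for illustration.

```python
SHARED_KEYS = {"select", "filters"}  # assumed reserved names, not an official list


def split_sections(sections):
    """Separate shared settings from named sections inside a 'sections' object."""
    shared = {k: v for k, v in sections.items() if k in SHARED_KEYS}
    named = {k: v for k, v in sections.items() if k not in SHARED_KEYS}
    return shared, named


sections = {
    "select": ["#twtr-main"],
    "filters": "removeReturnToTopButton",
    "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
    },
    "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
    },
}

shared, named = split_sections(sections)
print(sorted(shared))
print(sorted(named))
```

One consequence of this layout, under the stated assumption, is that a section could never be named after a reserved key.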

@MattiSG
Member

MattiSG commented Mar 14, 2022

We have strong time pressure to support multi-page documents for Community Guidelines in the context of the French presidential election.

A first analysis of the ability to align Community Guidelines subdocuments is not very conclusive: shared types cover between 60% (Twitter) and 100% (TikTok, LinkedIn) of Community Guidelines subdocuments, with 80% for YouTube, Instagram and Facebook. Option 1 would mean losing the non-covered ones; option 2 would mean creating a non-existing, virtual document; options 3 and 4 would mean opening up divergence for documents and making them incomparable. It seems impossible to decide what is most appropriate for Open Terms Archive at this stage.

Thus, we'll use real options to try out both options 1 and 4 in parallel, as they seem to be the most sustainable and the most divergent. If we have enough time, we'll also try option 3.

This means 2 (or 3) instances from experimental feature branches will run in parallel on a dedicated server. We will track documents this way and feed the results to analysts. We will conclude on the effectiveness and relevance of each option end of April.

I will share here data on Community Guidelines alignment this week.

@Ndpnt
Member Author

Ndpnt commented Mar 14, 2022

As discussed with @clementbiron, we should also track the index page of Community Guidelines as it may contain important content.

It will therefore have an impact on each option as follows:

Option 1:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": [ "#twtr-main" ]
    },
    "Community Guidelines - Hateful conduct policy": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    },
    "Community Guidelines - Violent and Graphic Content": {
      "fetch": "https://help.twitter.com/en/rules-and-policies/violent-groups",
      "select": [ "#twtr-main" ],
      "filters": "removeReturnToTopButton"
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines - Hateful conduct policy.md
├─ Community Guidelines - Violent and Graphic Content.md

Option 2: Not relevant

Option 3:

{
  "name": "Twitter",
  "documents": {
    
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": "#main",
      "Hateful conduct policy": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      },
      "Violent and Graphic Content": {
        "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence",
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton"
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

Option 4:

{
  "name": "Twitter",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://twitter.com/en/privacy",
      "select": ["main"]
    },
    "Community Guidelines": {
      "fetch": "https://help.twitter.com/en/rules-and-policies",
      "select": "#main",
      "sections": {
        "select": [ "#twtr-main" ],
        "filters": "removeReturnToTopButton",
        "Hateful conduct policy": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
        },
        "Violent and Graphic Content": {
          "fetch": "https://help.twitter.com/en/rules-and-policies/glorification-of-violence"
        }
      }
    }
  }
}

Resulting file structure:

TikTok/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
Twitter/
├─ Privacy Policy.md
├─ Terms of Service.md
├─ Community Guidelines.md
├─ Community Guidelines/
│  ├─ Hateful conduct policy.md
│  ├─ Violent and Graphic Content.md

@MattiSG
Member

MattiSG commented Mar 14, 2022

Community guidelines ontology

@Ndpnt collected the titles of the Community Guidelines subtypes of Facebook, Instagram, YouTube, Twitter, LinkedIn and TikTok. I then tried to align each of these subtypes and to give a more generic name that would be fit for Open Terms Archive.

Interesting points

  • Facebook and Instagram share the exact same subtypes.
  • LinkedIn and TikTok have only one document with subtitles; it makes sense to use them for sections, but not for separate documents.
  • Twitter has significantly more entries than all others.

Aligned

When lines have an empty header, this means there is ambiguity as to which platform document should get this type.

I believe that each “subtype” should be prefixed with Community Guidelines — .

Open Terms Archive Candidate Subtype Facebook & Instagram YouTube Twitter LinkedIn TikTok
Self-harm Safety - Suicide and Self-Injury Sensitive content - Suicide and self-harm Safety and cybercrime - Suicide and Self-harm Policy Do not share harmful or shocking material Suicide, self-harm, and disordered eating
Hate Speech Objectionable Content - Hate Speech Violent or dangerous content - Hate speech Safety and cybercrime - Hateful conduct policy Do not be hateful Hateful behavior
Child Sexual Exploitation Safety - Child Sexual Exploitation, Abuse and Nudity Sensitive content - Child safety Safety and cybercrime - Child sexual exploitation policy Minor Safety
Violence Incitement Violence And Criminal Behavior - Violence and Incitement Violent or dangerous content - Harmful or dangerous content Safety and cybercrime - Glorification of violence policy Do not threaten, incite, or promote violence Dangerous acts and challenges
Objectionable Content - Violent and Graphic Content Violent or dangerous content - Violent or graphic content Safety and cybercrime - Sensitive media policy Violent and graphic content
Violent Organizations Violence And Criminal Behavior - Dangerous Individuals and Organizations Violent or dangerous content - Violent criminal organizations Safety and cybercrime - Violent organizations policy Do not post terrorist content or promote terrorism Violent extremism
Violence And Criminal Behavior - Coordinating Harm and Promoting Crime
Spam Integrity And Authenticity - Spam Spam & deceptive practices - Spam, deceptive practices & scams Platform integrity and authenticity - Platform manipulation and spam policy Do not engage in spam or scam Integrity and authenticity
Violence And Criminal Behavior - Fraud and Deception Platform integrity and authenticity - Financial scam policy
Regulated Goods Violence And Criminal Behavior - Restricted Goods and Services Regulated goods - Sale of illegal or regulated goods or services Safety and cybercrime - Illegal or certain regulated goods or services Illegal activities and regulated goods
Harassment Safety - Bullying and Harassment Violent or dangerous content - Harassment and cyberbullying Safety and cybercrime - Abusive behavior Do not harass or bully Bullying and harassment
Platform integrity and authenticity - Coordinated harmful activity
Regulated goods - Firearms
Misinformation Integrity And Authenticity - Misinformation Misinformation - Misinformation Do not share false or misleading content
Misinformation - Elections misinformation
Misinformation - COVID-19 medical misinformation Platform integrity and authenticity - COVID-19 misleading information policy
Misinformation - Vaccine misinformation
Intellectual Property Respecting Intellectual Property - Intellectual Property Intellectual property - Copyright policy Respect the intellectual property of others and do not violate the intellectual property rights of others Copyright and trademark infringement
Intellectual property - Counterfeit policy
Intellectual property - Trademark policy
Intellectual property - Automated copyright claims for live video
Adult Nudity Objectionable Content - Adult Nudity and Sexual Activity Sensitive content - Nudity and sexual content Adult nudity and sexual activities
Sexual Solicitation Objectionable Content - Sexual Solicitation Do not engage in unwanted advances
Inauthentic Behaviour / Platform Manipulation Integrity And Authenticity - Inauthentic Behavior Spam & deceptive practices - Fake engagement Platform integrity and authenticity - Platform manipulation and spam policy Interference with LinkedIn Platform security
Privacy Violations Safety - Privacy Violations Safety and cybercrime - Private information policy Respect others' privacy
Integrity And Authenticity - Account Integrity and Authentic Identity Platform integrity and authenticity - Impersonation policy Do not create a fake profile or falsify information about yourself
Safety - Adult Sexual Exploitation Safety and cybercrime - Non-consensual nudity policy
Reach Amplification Platform Use Guidelines - About specific instances when a Tweet’s reach may be limited Ineligible for the For You Feed
Overview General - The Twitter Rules
Scraping Unauthorized access and use
Terms Updates Platform Use Guidelines - Updates to our Terms of Service and Privacy Policy
Deceased Users General - Deceased individuals

Unclassified

I did not manage to align these documents. They should either be read in full to understand where they could fit, or be left out.

Facebook & Instagram YouTube Twitter
Safety - Human Exploitation Sensitive content - Vulgar language Platform integrity and authenticity - Distribution of hacked materials policy
Integrity And Authenticity - Cybersecurity Spam & deceptive practices - Impersonation Platform integrity and authenticity - Ban evasion policy
Content-Related Requests And Decisions - User Requests Spam & deceptive practices - External links Platform integrity and authenticity - Parody, newsfeed, commentary, and fan account policy
Content-Related Requests And Decisions - Additional Protection of Minors Spam & deceptive practices - Additional policies Platform integrity and authenticity - Civic integrity policy
Integrity And Authenticity - Memorialization Platform integrity and authenticity - Synthetic and manipulated media policy
General - Username squatting policy
Safety and cybercrime - Violent threats policy

Platform specific

These documents depend on platform features and have no reason to be tracked with a shared name.

With option 1, they would be dropped.

YouTube Twitter
Sensitive content - Thumbnails Platform Use Guidelines - Twitter Moments guidelines and principles
Spam & deceptive practices - Playlists Platform Use Guidelines - Notices on Twitter and what they mean
  Platform Use Guidelines - Curation style guide
  Platform Use Guidelines - Super Follows policy
  Platform Use Guidelines - Ticketed Spaces policy

Country specific

Twitter has a document named “Platform Use Guidelines - Reporting false information in France”.

Twitter

On top of all of the above documents, Twitter goes really deep in specification and also adds some usage guidance that could be considered as parts of a manual.

  • Platform Use Guidelines - Report violations
  • Platform Use Guidelines - Our range of enforcement options
  • Platform Use Guidelines - Fair use policy
  • Platform Use Guidelines - Content Monetization Standards
  • Platform Use Guidelines - Guidelines for Promotions on Twitter
  • Platform Use Guidelines - About search rules and restrictions
  • Platform Use Guidelines - Twitter, our services, and corporate affiliates
  • Platform Use Guidelines - How to report security vulnerabilities
  • Platform Use Guidelines - About Twitter limits
  • Platform Use Guidelines - Defending and respecting the rights of people using our service
  • Platform Use Guidelines - About rules and best practices with account behaviors
  • Platform Use Guidelines - About Twitter’s APIs
  • Platform Use Guidelines - About government and state-affiliated media account labels on Twitter
  • Platform Use Guidelines - Automation rules
  • Platform Use Guidelines - Inactive account policy
  • Platform Use Guidelines - About country withheld content
  • Platform Use Guidelines - About public-interest exceptions on Twitter
  • Platform Use Guidelines - Additional information about data processing
  • Platform Use Guidelines - Our approach to policy development and enforcement philosophy

@MattiSG
Member

MattiSG commented Mar 15, 2022

The above table has been implemented in #778.

@ckatzenbach

Great to see these detailed discussions! We have struggled with a lot of these things at www.pga.hiig.de as well, and only responded with manual curation. I am curious to look at the results of the test runs: where do you stand currently with regard to the decision? I must confess that conceptually I am much more inclined to go for option 4 (or 3) than option 1. As part of our work at the PGA we have seen how all major platforms have evolved their community guidelines from single-page documents into these nested websites of explanation. So I'd very much argue that this is "one thing" but that it has gotten much more complex over the years. And the wording and categorization are also changing over time. So this will remain a challenge, but it is much better to keep this under the umbrella of "community guidelines" than to have 10-20 separate document types that change names every other year and also re-integrate, bifurcate etc. This space is very much in flux.

@MattiSG
Member

MattiSG commented Mar 31, 2022

Thanks @ckatzenbach! This idea that this space will keep on evolving is very relevant indeed. Even if we succeeded in creating an ontology for the current document set, we would have to assess the chance that it would be stable over time.

where do you stand currently with regard to the decision?

As mentioned in #773 (comment), we are collecting data and feedback and intend to conclude on the effectiveness and relevance of each option end of April 🙂

@pg-adrian

pg-adrian commented Apr 5, 2022

Hi everyone, I'm Adrian and I worked with @ckatzenbach on the (historical) collection of these multi-page documents for the Platform Governance Archive. As he said, we ran into some of the exact same issues and questions during our collection process so it is very interesting to read your discussion here! Maybe it is helpful for you to hear about our experience and the solution that we ended up with.

The first realization that we had when investigating the historical evolution of platform policies and collecting the documents is that what you called "the ontology of the Community Guidelines" is sometimes not as straightforward as one might expect.

In the case of Facebook, it is still relatively clear from my perspective. Here, the Community Guidelines (initially called 'Content Code of Conduct' then 'Facebook Community Standards') evolved from a document that was completely displayed on one URL into first an interactive document with drop-down sections and then a multi-page document. But even today as it is spread across many different URLs, I think it is very clear that Facebook considers all of these subsites as part of one document: The Facebook Community Standards.

Grouping these subsites into one document, from my perspective, does therefore not mean creating an artificial document. Much rather, the historical evolution of Facebook's Community Standards shows that the multi-page format should not be seen as a splitting up of the Community Guidelines into many sub-parts but much rather the contemporary form of displaying the document and making it easier to navigate for users. From my perspective, it therefore makes sense to puzzle the different Community Standards back together into one document because that's what they are from Facebook's/the user perspective and because that creates a document that can be compared to other platforms' Community Guidelines.

Now in the case of Twitter, it is a bit harder to define what actually constitutes their Community Guidelines: Do they, as you suggest, encompass all of the 75 subsites currently linked on their "Rules and policies" overview page (https://help.twitter.com/en/rules-and-policies#general)? Or should they rather be understood as "The Twitter Rules" page (https://help.twitter.com/en/rules-and-policies/twitter-rules) and the 18 selected sub policies that are linked there? (This is the option that we went for).

These two options by themselves actually raised an ontological question for the collection: Are the Community Guidelines what platforms define as their Community Guidelines or do they actually encompass all of the platforms' rules that regulate their community in some way? That would mean that rules or policies that are spelled out in sections of a site which are not part of the officially defined "Community Guidelines", for instance on help pages (as very often happens), would also form part of a platform's Community Guidelines. For reasons of feasibility and practicability, we opted for taking the platforms' own definition of their "Community Guidelines" as the reference point for our collection.

In the case of Twitter, this meant considering "The Twitter Rules" page as their Community Guidelines, because this is what the company has generally and historically considered as their Community Guidelines. Our team member João can explain this decision in more detail because he went deep into the history of the Twitter Rules. Another argument for this approach would be that, as @MattiSG also noted, Twitter's "Rules and Policies" page includes many usage guidance/information pages such as "Updates to our Terms of Service and Privacy Policy" which are probably better classified as help pages than as rules/policies for the community.

For Twitter, we hence decided to collect "The Twitter Rules", meaning that we collected the main page and the first sublevel of the policies that are linked on this page (if I understood it correctly this is what you referred to as nesting level). In practice this meant that we first had to create a timeline which denotes when subpolicies became part of or were removed from the Twitter Rules page. It is important to note that some of these subpages existed before they became part of the Twitter Rules or continue to exist after they are removed from the index pages. I guess as a general takeaway this means that taking an index page as the starting point for the collection entails monitoring when sublinks appear/are removed from this page. This is due to the fact that sections are sometimes merged or added to/removed from the master document.
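
The monitoring described here can be sketched as a set comparison between the sublinks found on two successive captures of an index page. The `diff_links` helper is hypothetical, and the two link lists are illustrative only: they reuse URLs quoted earlier in this thread and do not reflect the real history of the Twitter Rules page.

```python
def diff_links(previous, current):
    """Compare two sets of sub-page URLs taken from an index page."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


base = "https://help.twitter.com/en/rules-and-policies/"
# Illustrative captures, not real historical data.
earlier = [base + "hateful-conduct-policy", base + "violent-groups"]
later = [base + "hateful-conduct-policy", base + "glorification-of-violence"]

changes = diff_links(earlier, later)
print(changes)
```

Extracting the sublinks from the index page itself is left out of this sketch; a removed link only means the page is no longer listed, not that it stopped existing.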

I have to admit that I did not understand all of the technicalities of your discussion above regarding the difference between option 3 and 4, so I cannot say how all of this influences your decision or speaks in favor of one or the other option. Generally however I would say that:

  • Compiling subsections from different URLs into one document does not necessarily create an artificial document
  • In terms of document maintenance, defining an index page as a starting point would entail automatically or manually monitoring when new sublinks/URLs are added to/removed from the overarching page
  • I find your grouping of the subsections very impressive and interesting for the comparison of specific Community Guideline sections but it does in my eyes not erase the meaningfulness of also having one compiled version of all rules
  • I agree that treating all subsections as their own document as in option 1 probably leads to a level of complexity in which it is hard to keep an overview

I'm very sorry for the length of this post and hope this is in any way helpful for your decision! It's actually quite helpful for us to spell our procedure out again in this discussion :)

@MattiSG
Member

MattiSG commented May 4, 2022

Comparison of implemented options 1 and 4

As announced, we compared the results of running side-by-side implementations of options 1 and 4 for 7 weeks. Here are our results and observations 🙂

Common observations

  • Most community guidelines documents could be tracked within a fixed type ontology, and detecting those changes did yield value to analysts.
  • Open Terms Archive scaled well with no other modification than additional document types.
  • Mass changes triggered notifications across many documents, leading to spam, as when Twitter mangled URLs (see OpenTermsArchive/france-elections-versions@3e472cd, OpenTermsArchive/france-elections-versions@d31174e and 5 other documents). This can happen with any other set of documents from the same service, but is made worse with the given implementation since the number of documents is significantly larger.
  • Listing sections of documents risks pushing contributors towards wanting to list sections for arbitrary document types, which is not supported.

Declarations

For Facebook, the resulting declaration was 101 lines for option 1 vs 83 lines for option 4. You can find them in full below.

Option 1

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"]
    },
    "Community Guidelines - Self-harm": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Hate Speech": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Child Exploitation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Violence Incitement": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Violent Organizations": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Spam": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Regulated Goods": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Harassment": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Misinformation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Intellectual Property": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Adult Nudity": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Sexual Solicitation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Platform Manipulation": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Community Guidelines - Privacy Violations": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    },
    "Deceased Users": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/",
      "select": ["._9nrm", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    }
  }
}

Option 4

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"],
      "sections": {
        "select": ["._9nrm", "._9q49", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"],
        "Self-harm": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/"
        },
        "Hate Speech": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/"
        },
        "Child Exploitation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/"
        },
        "Violence Incitement": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/"
        },
        "Violent Organizations": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/"
        },
        "Spam": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/"
        },
        "Regulated Goods": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/"
        },
        "Harassment": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/"
        },
        "Misinformation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/"
        },
        "Intellectual Property": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/"
        },
        "Adult Nudity": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/"
        },
        "Sexual Solicitation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/"
        },
        "Platform Manipulation": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/"
        },
        "Privacy Violations": {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/"
        }
      }
    },
    "Deceased Users": {
      "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/",
      "select": ["._9nrm", "._9q49", "._9p7c"],
      "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
    }
  }
}

  • Factoring selectors in option 4 improves readability compared to option 1.
  • Avoiding repetition of type prefix in option 4 improves readability compared to option 1.

Snapshots and versions

  • The folder in option 4 is surprising: why is this subtype more present than others?
  • Having both a folder and a file with the same name in option 4 is surprising: why is this both a document and a folder?

[Screenshot, 2022-05-04 19:02:30]

  • The overrepresentation of Community Guidelines files in option 1 is surprising: why is this subtype more present than others? It also harms readability of the whole folder.

[Screenshot, 2022-05-04 19:00:08]

  • Listing sections of documents risks making the user want to list sections for other document types.

Reliability and maintenance

No difference was measured between the two options.

Conclusion

After implementation, neither of the tested solutions emerges as a clear winner. Along with comments from the PGA team (thanks @pg-adrian for your detailed message 🙇), this reinforces the validity of option 2, where all documents are consolidated into a single one.

The blocking point that was identified was the risk of voiding the promise that documents tracked by Open Terms Archive can hold in court, since the resulting document could not easily be cited as it cannot be referenced by a single URL or date.

However, this could be handled by ensuring snapshots are still integral copies of the documents found online. While admittedly to a lower extent, versions are already “recreated” from snapshots. Thus, as long as the resulting version references its source snapshots properly, it seems acceptable to consolidate them as part of a minimal readability improvement process.
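
To make the "version references its source snapshots" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Snapshot` and `Version` shapes, the use of a SHA-256 digest as snapshot identifier, and the `extract` callable standing in for the existing filtering step. It is not how Open Terms Archive stores records.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record: an integral copy of one fetched page."""
    url: str
    fetched_at: str  # ISO 8601 timestamp
    raw: str         # untouched page content

    @property
    def id(self) -> str:
        # Assumption: identify a snapshot by a digest of its raw content.
        return hashlib.sha256(self.raw.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Version:
    """Hypothetical record: one consolidated, readable document."""
    content: str
    snapshot_ids: tuple[str, ...]  # every source snapshot, in page order


def consolidate(snapshots: Sequence[Snapshot], extract: Callable[[str], str]) -> Version:
    """Build one version from several snapshots without altering them."""
    parts = [extract(snapshot.raw) for snapshot in snapshots]
    return Version(
        content="\n\n".join(parts),
        snapshot_ids=tuple(snapshot.id for snapshot in snapshots),
    )


snapshots = [
    Snapshot("https://example.com/guidelines", "2022-05-04T19:00:00Z", "  Index  "),
    Snapshot("https://example.com/guidelines/spam", "2022-05-04T19:00:05Z", "  Spam rules  "),
]
version = consolidate(snapshots, extract=str.strip)
```

The point of the sketch is that the snapshots stay untouched and citable on their own, while the version only carries pointers back to them.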

Detailed proposals for implementation of option 2 will be published in this RFC by next week.

@MattiSG
Member

MattiSG commented May 4, 2022

Reframing of problem statement

Declaring, maintaining and analysing the results of tracking community guidelines of several services, along with comments in this RFC, led us to the following reframing. This reframing does not impact the currently explored solution space, but it will hopefully help in avoiding confusion and managing expectations.

Defining pages vs sections

The case of community guidelines is one where service providers have implemented their content sectioning by spreading it across separate web pages.

However, this is not the only solution that they use. Solutions such as accordions have the same aim as splitting across pages: improving legibility. In the case of accordions, we have no problem representing the folded information in a single continuous flow. Similarly, we already split some pages into documents: when Terms of Service and Privacy Policy are on the same webpage, declaration files enable selecting each of those independently through select, even though they share the same fetch source page. This demonstrates that we do prioritise documents over source pages.

In the case of this RFC, most of the proposed solutions confused sections with pages. This distinction was not needed in Open Terms Archive until now, because every document we handled fit within at most one page. Community Guidelines demonstrate the possibility of having 2 or more pages constituting one document.

Distinguishing pages and sections support in Open Terms Archive

These topics are different, and it is important not to mix them. One is about how to track a document that is split across multiple pages through its sections. The other is how to annotate sections across a document, no matter if they are on a single page or on many.

Postponing section support

While section support has value, it should not impact the data collection phase: Open Terms Archive is unique in part thanks to its separation of snapshots and versions. Section splitting (or annotation) should be an additional step in the pipeline, one that never puts snapshot authenticity or version consolidation at risk.

There are known users for such a feature: Apolia created their own script to split content last year; ToS;DR does split terms into sections to enable annotating and ranking them.

However, this RFC aims at extending OTA's current solid implementation of document tracking towards multi-page documents. The opportunity of adding section support at the same time is misleading. Such a feature will be handled independently, in its own timeframe.

@pg-adrian

Great to hear that our experiences were helpful for you in reframing the problem statement and specifying your conceptualization of the relationship between documents, sections and pages. Looking forward to reading the proposals for the implementation of option 2.

@Ndpnt
Member Author

Ndpnt commented May 11, 2022

Proposals for implementation of option 2

Option 2A:

Declare an array instead of an object for a document type, where each entry of the array is a document declaration.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": [
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      },
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
        "select": ["._9nrm", "._9p7c"],
        "remove": ["._9p72", "svg", "._9ooi", "._9q3_"]
      }
    ]
  }
}

Option 2B:

Declare an array for each key inside a document declaration, with one entry per page.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": ["removeEmptyAnchorsLinks", "removeTrackingIDs", "removeLocaleFromUrls"],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": ["removeEmptyAnchorsLinks", "removeTrackingIDs", "removeLocaleFromUrls"],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "fetch": [
        "https://transparency.fb.com/fr-fr/policies/community-standards",
        "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/",
        "https://transparency.fb.com/fr-fr/policies/community-standards/memorialization/"
      ],
      "select": [
        "._9ntw",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c",
        "._9nrm, ._9p7c"
      ],
      "remove": [
        "._9nxl, ._9ntv, .img",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_",
        "._9p72, svg, ._9ooi, ._9q3_"
      ]
    }
  }
}
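
One practical consequence of option 2B is that the `fetch`, `select` and `remove` arrays have to stay aligned index by index. The following Python sketch shows how such a declaration could be expanded into one declaration per page, and where a length mismatch would surface. The function name `expand_parallel_arrays` and the example URLs are hypothetical; this is an illustration of the idea, not the Open Terms Archive implementation.

```python
def expand_parallel_arrays(declaration: dict) -> list[dict]:
    """Turn a 2B-style declaration into one declaration per page."""
    fetches = declaration["fetch"]
    pages = [{"fetch": url} for url in fetches]
    for key in ("select", "remove"):
        values = declaration.get(key)
        if values is None:
            continue  # key not used by this document
        if len(values) != len(fetches):
            # Adding or removing a URL without touching the other arrays
            # silently shifts selectors, so refuse to guess.
            raise ValueError(
                f"'{key}' has {len(values)} entries for {len(fetches)} pages"
            )
        for page, value in zip(pages, values):
            page[key] = value
    return pages


example = {
    "fetch": [
        "https://example.com/guidelines",
        "https://example.com/guidelines/spam",
    ],
    "select": ["._9ntw", "._9nrm, ._9p7c"],
    "remove": ["._9nxl, ._9ntv, .img", "._9p72, svg"],
}
pages = expand_parallel_arrays(example)
```

With sixteen pages, as in the declaration above, keeping three arrays in sync by hand is where this option is most likely to go wrong.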

Option 2C:

A mix of options 2A and 2B where only the fetch key can accept an array. It allows factoring select, remove and filters across an array of pages to fetch.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": [
      {
        "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards",
        "select": ["._9ntw"],
        "remove": ["._9nxl"]
      },
      {
        "fetch": [
          "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/"
        ],
        "select": ["._9nrm"],
        "remove": ["._9p72"]
      },
      {
        "fetch": [
          "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/",
          "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/"
        ],
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      }
    ]
  }
}

Option 2D:

Add a pages key to the document declaration, holding an array of document declarations. When a required key is missing from an entry, the value defined for that key at the root of the document declaration is used instead. This also allows factoring select, remove and filters.

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"],
      "pages": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}

@MattiSG
Member

MattiSG commented May 15, 2022

Thank you very much @Ndpnt for this work of consolidation!

Reading through these examples, I am quite clearly in favour of 2D. I like how having a dedicated key makes it explicit and enables validation with no ambiguity: either we have a fetch or we have a pages key, there is no mysterious syntax to know about arrays.

In particular, I dislike 2A and 2C because they are ambiguous on the intention: what does it mean that a document type maps to an array? Are there as many Community Guidelines as there are entries in the array?
2B, while not elegant, at least seems at the right level of nesting to me.

I have one suggestion for improvement though. So far, we have deliberately stuck to verbs for every key of a document declaration. I would suggest that we keep that convention and use a verb such as merge, assemble, consolidate, join, combine, fuse or meld.

Naming

In order to sort through synonyms, I ran a Google Trends search to find the most common term. The most common term for this operation seems to be “to merge”, followed by “to combine”. I confirmed this by running the same search with “PDF” or “pages” instead of “documents”.

[Screenshot, 2022-05-15 16:59:48]

However, I am a bit concerned that, in a context where we rely a lot on Git, “merge” becomes ambiguous with the eponymous Git operation, when it is something very different that we want to describe here. Thus, I offer:

Option 2.D.i

Same as 2D, just renaming pages to combine. I also suggest writing the factored keys after the combine key, in order to further distinguish these declarations from the (much more common) single-page ones.

Support a combine key in document declarations that contains an array of objects with fetch and, optionally, select, remove and filter keys; in this case, the select, remove and filter specified at the same level as combine are considered defaults for every entry in the array.

Formal definition

  • Redefine document declaration as single-page declaration or multipage declaration.
  • Define page declaration as almost the same as the current document declaration, except that its select key is made optional.
  • Define single-page declaration as a page declaration with mandatory select.
  • Define multipage declaration as an object with a mandatory combine key containing at least 2 page declarations, and optionally select, remove and filter keys.
    • These keys at the multipage declaration level apply to each page declaration that does not define them itself.
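
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resolve_pages` is hypothetical, entries of combine are treated as page declarations whose missing select, remove and filter fall back to the multipage-level values, and fetch and combine are assumed to be mutually exclusive. The example URLs are placeholders.

```python
PAGE_KEYS = ("select", "remove", "filter")  # keys that may be factored out


def resolve_pages(declaration: dict) -> list[dict]:
    """Return one fully resolved page declaration per page to fetch."""
    if "combine" not in declaration:
        # Single-page declaration: fetch and select are both mandatory.
        for key in ("fetch", "select"):
            if key not in declaration:
                raise ValueError(f"single-page declaration requires '{key}'")
        return [dict(declaration)]

    if "fetch" in declaration:
        # Assumption: a declaration has either fetch or combine, never both.
        raise ValueError("'fetch' and 'combine' are mutually exclusive")
    pages = declaration["combine"]
    if len(pages) < 2:
        raise ValueError("'combine' requires at least 2 page declarations")

    defaults = {key: declaration[key] for key in PAGE_KEYS if key in declaration}
    resolved = []
    for page in pages:
        if "fetch" not in page:
            raise ValueError("every entry of 'combine' requires 'fetch'")
        merged = {**defaults, **page}  # page-level keys override the defaults
        if "select" not in merged:
            raise ValueError(f"no 'select' available for {page['fetch']}")
        resolved.append(merged)
    return resolved


community_guidelines = {
    "combine": [
        {"fetch": "https://example.com/guidelines"},
        {"fetch": "https://example.com/guidelines/spam", "select": ["._9nrm"]},
    ],
    "select": ["._9ntw"],
    "remove": ["._9nxl"],
}
pages = resolve_pages(community_guidelines)
```

Here the first page inherits both select and remove, while the second keeps its own select and inherits only remove.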

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ],
      "select": ["._9ntw"],
      "remove": ["._9nxl", "._9ntv", ".img"]
    }
  }
}

@martinratinaud
Member

Thanks for these very neat and understandable proposals.

I'm in favor of option 2.D.i, which is the easiest to understand.
I'm fine with combine (and I agree merge would be too ambiguous in our context).

@clementbiron
Member

Thanks @Ndpnt @MattiSG, it is very clear and complete!

I'm also in favor of 2.D.i, and using combine seems to me a great idea since it avoids introducing the notion of page 👍

@MattiSG
Member

MattiSG commented May 16, 2022

Looping in @Amustache @LVerneyPEReN @afisher3578 @Manu1400 @streitlua for them to vote on these options or suggest improvements 🙂

@Ndpnt
Member Author

Ndpnt commented May 16, 2022

Thanks @MattiSG for the relevant improvement to option 2.D.

I am wondering whether factored keys are easily understandable. 🤔
Should we introduce a specific term to wrap these keys?

It could be share, or with (combinewith …), or another term.

For example:

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ],
      "with": {
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      }
    }
  }
}

or

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "combine": {
        "pages": [
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
          {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"]
          },
          {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"]
          },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
          { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
        ],
        "with": {
          "select": ["._9ntw"],
          "remove": ["._9nxl", "._9ntv", ".img"]
        }
      }
    }
  }
}

@MattiSG
Member

MattiSG commented May 16, 2022

Thanks @Ndpnt for this proposal! It is interesting, but is currently lacking the level of formality that would enable us to debate it properly in the context of an RFC, as it diverges without offering a clear option that we could decide on. Could we please evolve this into a formal option, with a clear naming proposal? 🙂 Thanks!

@Ndpnt
Member Author

Ndpnt commented May 16, 2022

I understand, you are right. I am going to do this in the next few days.

@streitl

streitl commented May 17, 2022

I also like option 2.D.i: it's easy to understand how it works and the writing overhead is minimal.

I agree that the "with" key could improve natural language readability, but I don't think it's necessary. In any case, I prefer @Ndpnt's second proposition.

@Ndpnt
Member Author

Ndpnt commented May 18, 2022

Option 2.D.i.a

Same as 2.D.i, but with the factorized values made explicit as default values with the suffix …Default, for example selectDefault.

In this context, I suggest writing the …Default keys before the combine key, as it is more common to have defaults set before their replacements.

Formal definition

  • Redefine document declaration as single-page declaration or multipage declaration.
  • Define page declaration as almost the same as the current document declaration, except that its select, remove and filter keys are made optional; only fetch is required.
  • Define single-page declaration as a page declaration with mandatory fetch and select.
  • Define multipage declaration as an object with a mandatory combine key containing at least 2 single-page declarations, and optionally selectDefault, removeDefault and filterDefault keys.
    • These keys at the multipage declaration level are applied to each page declaration that does not define the corresponding key itself.
    • These keys should be defined before the combine key.
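
As a rough illustration of the rule above, here is a minimal Python sketch of how the …Default keys could be resolved into per-page declarations. The function name resolve_pages is hypothetical and is not the engine's implementation; only the key names (combine, selectDefault, removeDefault, filterDefault) come from this proposal.

```python
# Hypothetical helper sketching option 2.D.i.a; not the engine's actual code.
OVERRIDABLE_KEYS = ("select", "remove", "filter")


def resolve_pages(declaration):
    """Return one resolved page declaration per entry of `combine`."""
    pages = []
    for page in declaration["combine"]:
        resolved = dict(page)
        for key in OVERRIDABLE_KEYS:
            default_key = key + "Default"
            # A key set on the page itself always wins over the default.
            if key not in resolved and default_key in declaration:
                resolved[key] = declaration[default_key]
        pages.append(resolved)
    return pages


declaration = {
    "selectDefault": ["._9ntw"],
    "removeDefault": ["._9nxl", "._9ntv", ".img"],
    "combine": [
        {"fetch": "https://transparency.fb.com/fr-fr/policies/community-standards"},
        {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"],
        },
    ],
}

pages = resolve_pages(declaration)
```

With this sketch, the first page inherits selectDefault and removeDefault, while the spam page keeps its own select and remove.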

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "selectDefault": ["._9ntw"],
      "removeDefault": ["._9nxl", "._9ntv", ".img"],
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}

@Ndpnt
Member Author

Ndpnt commented May 18, 2022

Option 2.D.i.b

Same as 2.D.i, but with the factorized values made explicit as default values within a new key defaults.

In this context, I suggest writing the defaults key before the combine key, as it is more common to have defaults set before their replacements.

Formal definition

  • Redefine document declaration as single-page declaration or multipage declaration.
  • Define page declaration as almost the same as the current document declaration, except that its select, remove and filter keys are made optional; only fetch is required.
  • Define single-page declaration as a page declaration with mandatory fetch and select.
  • Define multipage declaration as an object with a mandatory combine key containing at least 2 single-page declarations, and optionally a defaults key. The defaults key may contain the optional keys select, remove and filter, but at least one of them is required.
    • Keys defined in the defaults key at the multipage declaration level are applied to each page declaration that does not define them itself.
    • The defaults key should be defined before the combine key.
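
To make the constraints above easier to discuss, here is a minimal Python sketch of how a defaults key could be validated and merged into each page. The function name resolve_with_defaults and the error messages are hypothetical; only the key names (combine, defaults, select, remove, filter) and the rules (at least 2 entries in combine, at least one key in defaults) come from this proposal.

```python
# Hypothetical helper sketching option 2.D.i.b; not the engine's actual code.
ALLOWED_DEFAULTS = {"select", "remove", "filter"}


def resolve_with_defaults(declaration):
    """Merge the optional `defaults` object into each page of `combine`."""
    pages = declaration["combine"]
    if len(pages) < 2:
        raise ValueError("combine must contain at least 2 entries")

    defaults = declaration.get("defaults")
    if defaults is not None:
        if not defaults:
            raise ValueError("defaults must define at least one key")
        unknown = set(defaults) - ALLOWED_DEFAULTS
        if unknown:
            raise ValueError(f"unsupported keys in defaults: {sorted(unknown)}")

    # Page-level keys are unpacked last so that they override the defaults.
    return [{**(defaults or {}), **page} for page in pages]


declaration = {
    "defaults": {
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"],
    },
    "combine": [
        {"fetch": "https://transparency.fb.com/fr-fr/policies/community-standards"},
        {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"],
        },
    ],
}

pages = resolve_with_defaults(declaration)
```

Grouping the shared values under a single defaults object keeps the merge to one dictionary unpacking per page, at the cost of one extra nesting level in the declaration.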

Example

{
  "name": "Facebook",
  "documents": {
    "Privacy Policy": {
      "fetch": "https://fr-fr.facebook.com/privacy/explanation/",
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "select": ["div[role=\"main\"]"],
      "remove": ["._5tko"],
      "executeClientScripts": true
    },
    "Terms of Service": {
      "fetch": "https://fr-fr.facebook.com/legal/terms/plain_text_terms",
      "select": ["div[role=\"main\"]"],
      "remove": ["footer[role=\"contentinfo\"]"],
      "filter": [
        "removeEmptyAnchorsLinks",
        "removeTrackingIDs",
        "removeLocaleFromUrls"
      ],
      "executeClientScripts": true
    },
    "Community Guidelines": {
      "defaults": {
        "select": ["._9ntw"],
        "remove": ["._9nxl", "._9ntv", ".img"]
      },
      "combine": [
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/suicide-self-injury/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/child-sexual-exploitation-abuse-nudity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/violence-incitement/" },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        {
          "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/spam/",
          "select": ["._9nrm", "._9p7c"],
          "remove": ["._9p72"]
        },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/regulated-goods/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/bullying-harassment/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/misinformation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/intellectual-property/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/adult-nudity-sexual-activity/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/sexual-solicitation/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/inauthentic-behavior/" },
        { "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/privacy-violations-image-privacy-rights/" }
      ]
    }
  }
}

@clementbiron
Member

Thanks Nico for these proposals.

I am in favor of option 2.D.i because in options 2.D.i.a and 2.D.i.b the syntax is too different between a declaration for one page and a multipage declaration. This could introduce complexity and confusion.

But I would be curious to know the opinion of users who are less used to manipulating this syntax.

@Ndpnt
Member Author

Ndpnt commented May 18, 2022

I am in favor of option 2.D.i because in options 2.D.i.a and 2.D.i.b the syntax is too different between a declaration for one page and a multipage declaration. This could introduce complexity and confusion.

In option 2.D.i, we have select and remove keys without a fetch key, and I'm not sure it's obvious to contributors that they are default values that will be applied to each page declared in the combine key when they are missing. So, in fact, the syntax is already different and there is a kind of magic. I think it's better to expose the magic and be explicit.
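
To make that implicit behaviour concrete, here is a minimal Python sketch of what option 2.D.i asks a reader to infer. The function name expand_combine is hypothetical, and limiting the shared keys to select, remove and filter is an assumption made for this illustration, not a description of the engine.

```python
# Hypothetical helper sketching the implicit inheritance of option 2.D.i.
# Assumption: only select, remove and filter are shared with the pages.
SHARED_KEYS = ("select", "remove", "filter")


def expand_combine(declaration):
    """Copy document-level keys onto every page that does not set them."""
    shared = {key: declaration[key] for key in SHARED_KEYS if key in declaration}
    # Page-level keys are unpacked last, so they override the shared ones.
    return [{**shared, **page} for page in declaration["combine"]]


declaration = {
    "combine": [
        {"fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/hate-speech/"},
        {
            "fetch": "https://transparency.fb.com/fr-fr/policies/community-standards/dangerous-individuals-organizations/",
            "select": ["._9nrm", "._9p7c"],
            "remove": ["._9p72"],
        },
    ],
    "select": ["._9ntw"],
    "remove": ["._9nxl", "._9ntv", ".img"],
}

pages = expand_combine(declaration)
```

Nothing in the declaration itself says that the trailing select and remove are fallbacks; that knowledge lives only in the expansion step.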

@martinratinaud
Member

I voted through emojis, as discussed in the retrospective.

Also, I believe 2.D.i with the defaults at the top is enough, and more readable than a defaults key or Default-suffixed keys.

@afisher3578

I also found option 2.D.i easy to understand, even as someone who is new to this syntax.

@MattiSG
Member

MattiSG commented May 19, 2022

Thanks everyone for your inputs and contributions on this first semi-formal RFC! 💖 I'm glad of the direction we're taking and the good collaboration around it 😊

We'll leave this open until next Tuesday for any additional comments. Until then, let's all try to stay focused on either casting votes on existing propositions, adding new ones formally, or adding objective data points 🙂

@MattiSG
Member

MattiSG commented May 19, 2022

I noticed that, when we had a brief, transient issue with fetching documents on Instagram, we received a huge number of notifications (and the same again when the issue resolved itself) because the number of declared documents was very large in the implementation of option 1. The fact that all of the community guidelines were inaccessible at the same moment, and not other documents, is another hint that platforms treat them as a single group. As a maintainer, receiving all these notifications and trying to fix them was made needlessly more complex by having 20 documents instead of a single one.

In my view, this very much goes in favour of concatenating (option 2), which was the path we were already on anyway 😉

[Screenshot, 2022-05-19 11:14:10]

@MattiSG
Member

MattiSG commented May 19, 2022

Another example of a multi-page document that is not Community Guidelines: AdMob policies and restrictions.

@MattiSG MattiSG assigned MattiSG and unassigned Ndpnt Jun 8, 2022
@Ndpnt
Member Author

Ndpnt commented Jun 20, 2022

Hi all,
As there have been no additional comments for a month, I think we can reach a first conclusion for this RFC on how to track and declare multi-page documents.

Among all proposed solutions, the one that received the most upvotes is Option 2.D.i, so this is the option we retain.
We will propose an implementation of this solution and test it against reality to see if any adjustments are necessary.

Thanks again to everyone who participated in this RFC.

@MattiSG MattiSG removed their assignment Jan 17, 2023
@MattiSG
Member

MattiSG commented Jan 17, 2023

This RFC was fully implemented a few months ago in #891 (congrats @Ndpnt!), and has since demonstrated its reliability in production in two separate collections (France-Elections and PGA). The only thing still missing before we can close this issue is the related user documentation.

@MattiSG
Member

MattiSG commented Jan 17, 2023

Moved documentation issue to the docs repository: OpenTermsArchive/docs#32.

Thanks again everyone for your contributions 🙇

@MattiSG MattiSG closed this as completed Jan 17, 2023