SIP 1 - Efficient Operation Mapping #766

Closed · csuwildcat opened this issue Jun 23, 2020 · 21 comments
@csuwildcat (Member) commented Jun 23, 2020

  SIP: 1
  Upgrade-Type: Hard Fork (to guarantee outcomes)
  Title: Efficient DID Operation Mapping
  Author: Daniel Buchner <daniel.buchner@microsoft.com>
  Comments-Summary: No comments yet.
  Comments-URI: https://github.com/decentralized-identity/sidetree/sips/1.md
  Status: Draft
  Created: 2020-06-23

Summary

By segregating the proving data contained in the operation entries currently housed in the Anchor File and Map File (for Recovery, Deactivate, and Update operations), it is possible to realize a dramatic ~75% reduction in the minimum dataset required to trustlessly resolve DIDs.

The effect of moving this data to segregated Proving Files is that the Anchor and Map Files become lightweight, spam-protected operation indexes, allowing nodes of various configurations to defer acquisition of proving data in a JIT fashion.

Motivation

These changes would make initialization of many node types faster, more efficient, and, most importantly, operationally feasible for the average user-operator. Sustainable node operation on consumer hardware is a key requirement for any decentralized network of this class, so keeping network storage growth comfortably 'under the line' of the commodity storage cost curve and bandwidth growth curves is essential. While such curves lack precision, examining the trajectory of storage and bandwidth against the waning cadence of the Kryder's Law and Edholm's Law doubling conjectures suggests that 2-3 terabytes of growth per annum in a network's minimum required dataset is the top end of sustainability for a system that features peer-based replication of data and defers CPU-intensive tasks to a JIT compilation/resolution phase.

Requirements

  • Target an upper limit of 2-3 terabytes per year of growth for the minimum required dataset, assuming a sustained rate of 1000 operations per second (a back-of-the-envelope check of this budget follows this list).
  • Push as much data out of the primary indexing files (Anchor and Map Files) as possible.
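
As a rough sanity check of this budget, using the ~275-byte and ~65-byte per-entry figures cited later in this thread (an illustration, not normative numbers):

// Back-of-the-envelope check of the 2-3 TB/year budget (TypeScript).
// The entry sizes are figures cited later in this thread; the constant
// names are illustrative, not part of the spec.
const OPS_PER_SECOND = 1000;
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000

const bytesPerYear = (entryBytes: number): number =>
  entryBytes * OPS_PER_SECOND * SECONDS_PER_YEAR;

const TB = 1e12;
console.log(bytesPerYear(275) / TB); // current full entries: ~8.67 TB/year
console.log(bytesPerYear(65) / TB);  // segregated index entries: ~2.05 TB/year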

Technical Proposal

The primary technical changes center on moving proving data out of the Anchor File and Map File, leaving those files to act as bare-minimum indexes that give a node global awareness of possible operations for any DID in the system. The proposed changes include the addition of two new intermediary files between the Anchor and Chunk Files. All changes to the existing Anchor and Map Files, as well as the new Proving Files, are as follows:

Anchor File

The Anchor File would be modified in the following ways:

  1. Add a new CAS URI link to a Retained Proving File, which contains the signed operation data that previously existed in the recover and deactivate operation entries.
  2. Add a new CAS URI link to a Transient Proving File, which contains the signed operation data that previously existed in the update operation entries of the Map File.
  3. Modify the create operation across the spec to reflect the fact that the recovery_commitment is the hash of the hash of the JWK value being committed to (the eventual reveal_value being the single hash of that JWK).
  4. Modify the recover and deactivate operation entries to include only the did_suffix and reveal_value properties. The reveal_value is the hash of the JWK in the signed_data object that was relocated to the Retained Proving File; its corresponding commitment is the hash of this reveal_value.
{
  "retained_proving_file": CAS_URI,
  "transient_proving_file": CAS_URI,
  "map_file": CAS_URI,
  "writer_lock_id": OPTIONAL_LOCKING_VALUE,
  "operations": {
    "create": [
      {
        "suffix_data": { // Base64URL encoded
          "delta_hash": DELTA_HASH,
          "recovery_commitment": COMMITMENT_HASH
        }
      },
      {...}
    ],
    "recover": [
      {
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ],
    "deactivate": [
      {
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ]
  }
}
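
To illustrate the deferred (JIT) acquisition model described in the Summary, here is a minimal sketch of how a light node might process an Anchor File: index the lightweight entries immediately, and fetch proving data only when a DID is actually resolved. The type shapes and the fetchFromCas helper are illustrative assumptions, not part of the spec.

interface AnchorFileIndexEntry {
  did_suffix: string;
  reveal_value: string; // multihash of the JWK
}

interface AnchorFile {
  retained_proving_file: string;  // CAS URI
  transient_proving_file: string; // CAS URI
  map_file: string;               // CAS URI
  operations: {
    recover?: AnchorFileIndexEntry[];
    deactivate?: AnchorFileIndexEntry[];
  };
}

// Hypothetical CAS client; a real node would fetch from IPFS or similar.
async function fetchFromCas(casUri: string): Promise<Uint8Array> {
  throw new Error(`CAS fetch not implemented: ${casUri}`);
}

class LightNode {
  private index = new Map<string, AnchorFileIndexEntry[]>();

  // Phase 1: build the global operation index from the lightweight entries.
  ingestAnchorFile(anchor: AnchorFile): void {
    for (const entry of anchor.operations.recover ?? []) {
      const list = this.index.get(entry.did_suffix) ?? [];
      list.push(entry);
      this.index.set(entry.did_suffix, list);
    }
    // deactivate (and, via the Map File, update) entries are indexed the same way
  }

  // Phase 2 (JIT): pull proving data only when a DID is resolved.
  async resolve(didSuffix: string, anchor: AnchorFile): Promise<void> {
    if (!this.index.has(didSuffix)) return;
    const provingBytes = await fetchFromCas(anchor.retained_proving_file);
    // ...verify each signed_data entry against its indexed reveal_value, then apply
  }
}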

Map File

The Map File would be modified in the following ways:

  1. Modify the update operation entries to include only the did_suffix and reveal_value properties. The reveal_value is the hash of the JWK in the signed_data object that was relocated to the Transient Proving File; its corresponding commitment is the hash of this reveal_value.
{
  "chunks": [
    { "chunk_file_uri": CHUNK_HASH },
    {...}
  ],
  "operations": {
    "update": [
      {
        "did_suffix": DID_SUFFIX,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ]
  }
}

Retained Proving File

The Retained Proving File will contain the following:

  1. The signed_data portions of the recover and deactivate operation entries that previously lived in the Anchor File are now present in the operations object under their respective properties, and MUST be ordered in the same index order as their corresponding entries in the Anchor File (see the index-join sketch after the structure below).
{
  "operations": {
    "recover": [
      {
        "signed_data": { // Base64URL encoded, compact JWS
          "protected": {...},
          "payload": {
            "recovery_commitment": COMMITMENT_HASH,
            "recovery_key": JWK_OBJECT,
            "delta_hash": DELTA_HASH
          },
          "signature": SIGNATURE_STRING
        }
      },
      {...}
    ],
    "deactivate": [
      {
        "signed_data": { // Base64URL encoded, compact JWS
          "protected": {...},
          "payload": {
            "did_suffix": SUFFIX_STRING,
            "recovery_key": JWK_OBJECT
          },
          "signature": SIGNATURE_STRING
        }
      },
      {...}
    ]
  }
}
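
Because the only linkage between an index entry and its proving data is array position, a node correlates them by zipping the two arrays. A minimal sketch, reusing the illustrative AnchorFileIndexEntry type from the earlier example:

interface RetainedProvingEntry {
  signed_data: string; // Base64URL-encoded compact JWS
}

// Pair each lightweight Anchor File entry with its signed_data by index.
// Failing on a length mismatch enforces the MUST-order requirement above.
function joinByIndex(
  indexEntries: AnchorFileIndexEntry[],
  provingEntries: RetainedProvingEntry[]
): Array<AnchorFileIndexEntry & RetainedProvingEntry> {
  if (indexEntries.length !== provingEntries.length) {
    throw new Error('Anchor File and Retained Proving File entry counts differ');
  }
  return indexEntries.map((entry, i) => ({ ...entry, ...provingEntries[i] }));
}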

Transient Proving File

The Transient Proving File will contain the following:

  1. The signed_data portions of the update operation entries that previously lived in the Map File are now present in the operations object under their respective properties, and MUST be ordered in the same index order as their corresponding entries in the Map File.
{
  "operations": {
    "update": [
      {
        "did_suffix": DID_SUFFIX,
        "signed_data": { // Base64URL encoded, compact JWS
          "protected": {...},
          "payload": {
            "update_key": JWK_OBJECT,
            "delta_hash": DELTA_HASH
          },
          "signature": SIGNATURE_STRING
        }   
      },
      {...}
    ]
  }
}

Operation Data Changes

  1. Commitments are now the hash of the hash of the revealed JWK value (i.e., the hash of the reveal_value), versus just the single hash, as they are currently. A sketch of this commit/reveal relationship follows.
  2. The revealed values in the Anchor and Map Files are the hash of the JWK, not the JWK itself, as is currently the case.
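
A minimal sketch of the new commit/reveal relationship, using Node's built-in crypto with SHA-256 standing in for the spec's multihash function (the actual algorithm, encoding, and JWK canonicalization details are out of scope here):

import { createHash } from 'crypto';

// SHA-256 hex as a stand-in for the spec's multihash; illustrative only.
const hash = (data: string): string =>
  createHash('sha256').update(data).digest('hex');

// The JWK would be canonicalized to a stable string before hashing.
const canonicalJwk = '{"crv":"secp256k1","kty":"EC","x":"...","y":"..."}';

const revealValue = hash(canonicalJwk); // published in Anchor/Map File entries
const commitment = hash(revealValue);   // the hash of the hash of the JWK

// Verification: a node checks the revealed value against the prior commitment.
console.assert(hash(revealValue) === commitment);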
csuwildcat added the sip label Jun 23, 2020
@OR13 (Contributor) commented Jun 23, 2020

@thehenrytsai @Therecanbeonlyone1969 any idea how this growth rate stacks up against bitcoin/ethereum growth rates?
Obviously those ledgers do things other than DIDs as well, but it would be interesting to put the "requirements" in the context of other real-world production systems.

@OR13 (Contributor) commented Jun 23, 2020

should we consider eliminating the base64url encoding at the same time to stretch the storage gain to the limit?

@troyronda (Collaborator) commented Jun 23, 2020

Suggest renaming "transient", as the eventual meaning is that it could be pruned after checkpoints rather than being transient at the current time.

@tplooker (Member) commented Jun 23, 2020

Suggested alternative syntax for the Anchor File:

{
  "map_file": CAS_URI,
  "writer_lock_id": OPTIONAL_LOCKING_VALUE,
  "operations": {
    "create": [
      {
        "file_ref": CAS_URI,
        "suffix_data": { // Base64URL encoded
          "delta_hash": DELTA_HASH,
          "recovery_commitment": COMMITMENT_HASH
        }
      },
      {...}
    ],
    "recover": [
      {
        "file_ref": CAS_URI,
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ],
    "deactivate": [
      {
        "file_ref": CAS_URI,
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ]
  }
}

file_ref could actually be a JSON pointer CAS_URI
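
For illustration only (this form is hypothetical, not something the comment specifies): such a file_ref could combine a CAS URI with an RFC 6901 JSON Pointer fragment addressing the entry's position inside the proving file.

// Hypothetical pointer-style reference into a proving file:
// <CAS URI>#<JSON Pointer to the matching entry>
const fileRef = 'Qm...provingFileHash#/operations/recover/3';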

Feedback:

  • Does not achieve the space-saving goals; the same CAS_URI would be repeated per operation.

@troyronda (Collaborator)

I think enabling checkpoints and pruning is important, so a structure that enables that aspect is useful.

@csuwildcat (Member, Author)

Just want to note that the current file structures already implicitly support the addition of a checkpoint/pruning mechanism. This is about reducing the minimum dataset required to run a light node by ~75% or more.

@OR13 (Contributor) commented Jun 23, 2020

I'm generally in favor of this proposal, but I'm a bit worried about how we go about implementing it.

Here is my proposal:

  1. We inventory the set of features we believe we are shipping support for in spec v1.
  2. We determine what level of testing is required to believe that a feature is supported in spec v1.
  3. We create issues to ensure those tests exist in the reference implementation.
  4. We close those issues when the tests exist.
  5. We publish spec v1 and the reference implementation, and we bump to v1.1.
  6. We open issues for the core set of features in v1.1 (probably the same as v1).
  7. We close those issues when we have tests that prove they work.
  8. We publish spec v1.1 and the reference implementation.

Vendors that don't have production customers can choose to skip spec v1 and jump to v1.1; vendors who can't "wipe their production database" can use spec v1 until spec v1.1 is ready to migrate to.

We target SIP-1 to spec v1.1.

@OR13 (Contributor) commented Jun 23, 2020

We need to be careful to have a stable, rigorous, and confidence-building release process and versioning system. I think it's dangerously confidence-destroying to rewrite versions and refuse to publish, versus choosing to publish regular versions with clear changes, tests, and documentation to support each release. (Our reference implementation does a good job of this; we need to ensure the spec does as well.)

@csuwildcat (Member, Author)

@OR13 how about we cut an official version of the spec, as it stands now, as 0.1.0, and use this change as an opportunity to do a proper minor version bump of the spec in accordance with the version descriptions in the spec?

@OR13 (Contributor) commented Jun 23, 2020

I'm fine as long as we cut a version before we attempt to implement a SIP. Ideally we try to make it as clean a version as we can, by closing out any low-hanging fruit before the cut.

@OR13 (Contributor) commented Jun 23, 2020

it can be v0.1.0 and SIP-1 can target v0.2.0 or whatever... features should be planned to target versions...

@csuwildcat (Member, Author)

Aside: are folks here OK if I do a PR to add this general SIP template as a start for that sort of thing? I was thinking of creating a SIP directory with MD files in it that would render just like our specs do.

@csuwildcat (Member, Author) commented Jun 24, 2020

@tplooker I don't think the pointer URI to a place inside the linked file is worth it if we can do the same thing via a 0-byte alternative, given it degrades the primary goal of SIP 1. However, if we changed our minds about it, we could always add it later in a way that Sidetree-based implementations could push out via a rather straightforward upgrade.

@csuwildcat (Member, Author)

@troyronda and others: if we don't want to go with Transient, what are some names for the files that will be cyclically eliminated after checkpoint pruning occurs?

@tplooker (Member)

To further optimize the above proposal, we could remove an additional base64 encoding of suffix_data if we instead relied on JCS to canonicalize the structure.
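
For context, JCS (RFC 8785) yields a deterministic byte representation of a JSON object, so hashes can be computed over the canonical bytes directly rather than over a base64url string. A minimal sketch, assuming the canonicalize npm package (an illustration, not a spec change):

import canonicalize from 'canonicalize'; // RFC 8785 JSON Canonicalization Scheme
import { createHash } from 'crypto';

const suffixData = {
  recovery_commitment: 'EiB...', // placeholder values for illustration
  delta_hash: 'EiA...',
};

// Canonicalization makes key order and whitespace deterministic, so the
// canonical string can be hashed directly with no base64url detour.
const canonical = canonicalize(suffixData)!;
const suffixDataHash = createHash('sha256').update(canonical).digest('hex');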

@OR13 (Contributor) commented Jun 30, 2020

Let's take the encoding/performance debate to #781.

Any tests/proof for the "75%" reduction claim being made here?

@csuwildcat (Member, Author)

@OR13 here's the test: the entries with proving data were 275 bytes, and the new size of the entries without proving data is 65 bytes, a reduction of ~76.4% in the minimum dataset required for a node to boot up and have a global index of all op entries.

@OR13 (Contributor) commented Jun 30, 2020

^ nice test, you must code a lot ; )

thehenrytsai added commits referencing this issue between Nov 25 and Dec 7, 2020, including:
* feat(ref-imp): #766 - added support to validate reveal value as a hash
* feat(ref-imp): #766 - added test for applying operation with different reveal value algorithm
* chore(ref-imp): hiked nodejs version support to 12 and 14
@thehenrytsai (Collaborator)

Fully implemented.

Reference implementation automation moved this from 2020 November to Done Dec 7, 2020