Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new "proof reissue" command #493

Merged
merged 3 commits into from
Aug 30, 2022
Merged

Conversation

thomasjfox
Copy link
Contributor

@thomasjfox thomasjfox commented Aug 27, 2022

PoC implementation for this discussion: #492
"How to deal with a person leaving an organization?"

This commands allows to find existing proofs
and sign them again with a different id.

This comes in handy when an id is going be revoked but
the trust of the existing reviews should be retained by a new id.

Scenarios:

  • Old id got compromised
  • Person leaves an organization, so organization can reissue
    existing reviews using a different id to maintain trust level.

Checklist:

  • cargo +nightly fmt --all
  • Modify CHANGELOG.md if applicable

Code is originally based upon the "proof find" code.

@thomasjfox
Copy link
Contributor Author

thomasjfox commented Aug 27, 2022

To test the code I reissued all 1500+ reviews I had available. That worked fine and I added protection against this "reissue all of them" mode.

One corner case though: The iterator returned by db.get_pkg_reviews_for_source(PROJECT_SOURCE_CRATES_IO) acts funny for one crate + version combination. It's the "default" crate with version 0.1.2. On the first run it returns both existing reviews and reissues them.

One review for this crate has the git "revision" field omitted.

If the command crev proof reissue --no-commit --crate default --vers 0.1.2 is invoked again, the iterator returns just one of the two reviews signed by the new id. I suspect there's some internal hash map keeping just one of the two reissued reviews.

What happens is that one of the two reviews then gets reissued again on every "proof reissue" invocation. I tried to debug the behaviour for like two hours but I guess it's better if someone with real knowledge of the internal data structures comments on this minor glitch.

@thomasjfox
Copy link
Contributor Author

thomasjfox commented Aug 27, 2022

btw: I noticed struct PackageInfo in package_info.rs serializes revision like this:

    #[serde(skip_serializing_if = "proof::equals_default", default)]
    pub revision: String,
    #[serde(
        rename = "revision-type",
        skip_serializing_if = "proof::equals_default_revision_type",
        default = "proof::default_revision_type"
    )]
    pub revision_type: String,

while struct Revision in revision.rs serializes it like this:

    pub revision: String,
    #[serde(
        rename = "revision-type",
        skip_serializing_if = "proof::equals_default_revision_type",
        default = "proof::default_revision_type"
    )]

If I remove #[serde(skip_serializing_if = "proof::equals_default", default)] from pub revision: String in struct PackageInfo, the iterator returned from db.get_pkg_reviews_for_source(PROJECT_SOURCE_CRATES_IO) acts consistent.

I must humbly admit that I have no idea what I'm changing there. 🙈
This affects reissuing reviews of the default crate with version 0.1.2 corner case above.

@dpc
Copy link
Collaborator

dpc commented Aug 27, 2022

👍 on the idea. One thing that would be probably good to have is adding a prefix to the comment about the whole thing being "reissued". Or possibly maybe an additional field for the purpose altogether, somehow linking to the previous version.

I need to find some more time to look at the issues you've pointed out.

@thomasjfox
Copy link
Contributor Author

One corner case though: The iterator returned by db.get_pkg_reviews_for_source(PROJECT_SOURCE_CRATES_IO) acts funny for one crate + version combination. It's the "default" crate with version 0.1.2. On the first run it returns both existing reviews and reissues them.

Ok, I think I figured it out by giving the code a fresh look: db.get_pkg_reviews_for_source() operates on package_reviews which is defined as

// pkg_review_id by package information, nicely grouped
package_reviews:
    BTreeMap<Source, BTreeMap<Name, BTreeMap<Version, HashSet<PkgVersionReviewId>>>>,

PkgVersionReviewId is defined as

/// * author's ID
/// * pkg source
/// * pkg name
/// * pkg version
pub struct PkgVersionReviewId {
    from: Id,
    package_version_id: proof::PackageVersionId,
}

Basically the HashMap stores just one review from the same crev id for the same PackageVersionId. Since reviews can be overwritten, the function update_to_more_recent() makes sure to store just the latest one for the "unique" combination above.

How should this be handled when reissuing existing reviews?

The current PoC implementation calling reissue_review.touch_date(); will mess up if multiple reviews for the same crate + version (+ source) combination are available and all get rewritten to the same crev id.

One way to fix this the easy way is to restrict all proofs-to-be-reissued to the same source id. This would avoid the multiple reviews refer to the same PackageVersionId situation completely. The --author argument would then be mandatory.

Regarding the other feedback:

One thing that would be probably good to have is adding a prefix to the comment about the whole thing being "reissued". Or possibly maybe an additional field for the purpose altogether, somehow linking to the previous version.

Good idea. The information is already in the commit message and it would be way more useful to have it in the actual proof. Proofs don't seem to have a uuid, but the Signature seems to be unique per proof. Would that be a good field to use for the reference?

Thanks for your feedback so far and please no rush with this. I might need the feature in many months time only, so lots of time to cook this further. 😀

@thomasjfox
Copy link
Contributor Author

+1 on the idea. One thing that would be probably good to have is adding a prefix to the comment about the whole thing being "reissued". Or possibly maybe an additional field for the purpose altogether, somehow linking to the previous version.

May be like this in the proof yaml format:

----- BEGIN CREV PROOF -----
kind: package review
version: -1
date: 2022-08-27T20:09:44.970188805+02:00
from:
  id-type: crev
  id: 89j3NY-J-XNtcLZeTgOy5Vp1gd1rYPHlCGoytg9wYuQ         <-- current id
  url: https://github.com/owner/repo.git
reissue-for:
  id-type: crev
  id: 6E-JFdHO008HicLRuRBi9TSedIhdscjOAfpokc5Jl0U         <-- old id
  url: https://github.com/other-owner/some-repo.git
  date: 2020-05-25T20:42:42.970188805+02:00               <-- previous review date
..
comment: |-
  Minor changes compared to previous version.

  [Review was reissued by id 89j3NY-J-XNtcLZeTgOy5Vp1gd1rYPHlCGoytg9wYuQ]

Not sure if modifying the comment is a good idea if the information is in the meta data already. The metadata is probably the way.

@dpc
Copy link
Collaborator

dpc commented Aug 27, 2022

will mess up if multiple reviews for the same crate + version (+ source) combination are available and all get rewritten to the same crev id.

In crev from-id + source + package name + version is an unique key of a review. If the same ID creates newer (higher timestampt) review for the exact package version, the old review is completely overwritten and ignored. This is how crev achives "mutability" in a distributed, immutable, eventually consistent world of proof circulation.

There's no need to re-issue an old review version, only the most recent one, IMO. The old versions are to forgotten anyway.

@dpc
Copy link
Collaborator

dpc commented Aug 27, 2022

Reviewes can be keyed only by their checksum to save on space. I would probably go with something more generic than reissue-from, because

original:
  id: <digest of the original>
  comment: "Reissued after X left the XYZ"

?

I could imagine some people making duplicates for some other reasons so a human readable reason is needed.

I'm happy to keep bikesheading the exact names.

@thomasjfox
Copy link
Contributor Author

thomasjfox commented Aug 28, 2022

I've implemented the original: format and pushed two more commits. Those commits should later on be squashed into the original commit.

Referencing the proof digest was a pita to implement since the existing data structure proof::Common didn't store it yet. I conservatively named the new variable "digest_on_load" to prevent future programmers from mistaking this as an up-to-date proof digest after modifications.

The CLI is not very friendly to search by proof digest, so I also included the crev id of the original proof for the human reader. Storing the proof digest now is more or less an investment into the future.

The --author and --comment arguments are mandatory now. Making --author mandatory will avoid the collision when reissuing multiple reviews for the same crate+version from multiple source crev ids. If one needs to reissue reviews for multiple crev ids, I don't see a problem with them calling the tool for each crev id.

This is the output for a reissued review:

----- BEGIN CREV PROOF -----
kind: package review
version: -1
date: 2022-08-28T11:19:35.643176546+02:00
from:
  id-type: crev
  id: 89j3NY-J-XNtcLZeTgOy5Vp1gd1rYPHlCGoytg9wYuQ
  url: https://github.com/intra2net/crev-proofs-playground.git
original:
  id-type: crev
  id: FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
  proof_digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0
  comment: Reissue to replace compromised id from stolen laptop
package:
  source: https://crates.io
  name: default
  version: 0.1.2
  digest: YuxzyXhCHZYMi4__Hj_hCzkQyxRLrZjDqL8usLqA4QY
review:
  thoroughness: high
  understanding: high
  rating: strong
comment: |-
  I'm the author. And this crate is 3 lines of trivial code.
----- SIGN CREV PROOF -----
FimnMpCGMKrjid-q43d_rQ2UQxP4HoTbEALghfw4SwOCoD0XZ9_JyvHc8ICG1OFYq7_nPLvvQ_SZcUZODWEdAQ
----- END CREV PROOF -----

This is how the generated git commit message looks like:

Signed existing review for default v0.1.2 with different id

New id: 89j3NY-J-XNtcLZeTgOy5Vp1gd1rYPHlCGoytg9wYuQ
Previous id: FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
Previous proof digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE

@thomasjfox
Copy link
Contributor Author

There's one (new) ergonomic problem with the original: field: When one overwrites a review the was reissued previously. Right now the editor doesn't show the "original:" fields, this can be fixed easily.

The questions is: Do we want to keep the original: reference on overwrites or not?
I'm unsure what makes the most sense here.

This is an example of such an overwrite:

----- BEGIN CREV PROOF -----
kind: package review
version: -1
date: 2022-08-28T11:40:06.009446836+02:00
from:
  id-type: crev
  id: 89j3NY-J-XNtcLZeTgOy5Vp1gd1rYPHlCGoytg9wYuQ
  url: https://github.com/intra2net/crev-proofs-playground.git
original:
  id-type: crev
  id: FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
  proof_digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0
  comment: Reissue to replace compromised id from stolen laptop
package:
  source: https://crates.io
  name: default
  version: 0.1.2
  digest: YuxzyXhCHZYMi4__Hj_hCzkQyxRLrZjDqL8usLqA4QY
review:
  thoroughness: high
  understanding: high
  rating: strong
comment: |-
  Just hangin' out.
----- SIGN CREV PROOF -----
jg7aU8fueiBmVfU3tR17nz3UWVCFRUGfzmVm-sCx6KXtDzXwtMLY_3QP4s-ffJQ2xUtGcaqhMg9aXF1jenveDw
----- END CREV PROOF -----

I detected this behaviour by chance while testing the new feature.

@dpc
Copy link
Collaborator

dpc commented Aug 28, 2022

An overwrite would no longer be a straight re-signing of the original, I think, so probably no need preserve original.

@dpc
Copy link
Collaborator

dpc commented Aug 28, 2022

  proof_digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0

Maybe just digest? If not then we use - as separator in other fields already.

@dpc
Copy link
Collaborator

dpc commented Aug 28, 2022

Sorry, for throwing you a curveball, but that digest_on_load is not necessary. The digest storing and lookup should go through ProofDB. TheProofDB::add_proof is already handed a Proof that has a digest field. The problem is that I stupidly decided to track stuff in ProofDB by signature which is ... problematic at least. I should have used digest for everything from the start and wanted to fix it sometime soon anyway. So ids_to_trust_proof_signatures would become ids_to_trust_proof_digest and so on. With that looking up digest of something should be easy (feel free to add an additional mapping if you need it). Depending how you feel about it, you can just add such mapping and move on, or you can try to fix ProofDB yourself and make it use digest for all the keys/ids there. AFAIU, the TimestampedSignature should dissapear and everything should use TimestampedDigest.

@thomasjfox
Copy link
Contributor Author

An overwrite would no longer be a straight re-signing of the original, I think, so probably no need preserve original.

ok, thanks, I'll make sure to clear it on the next improvement round.

@thomasjfox
Copy link
Contributor Author

  proof_digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0

Maybe just digest? If not then we use - as separator in other fields already.

I thought including the "proof-" prefix would help people not to mistake it for the digest of the crate files. '-' as separator is a good idea.

Given that one can see the "digest:" below "package:" in relative distance in the yaml file, I'm not sure if it's an issue at all since people will notice they will differ for sure.

So if you prefer the shorter field name, we can go with that.

@thomasjfox
Copy link
Contributor Author

Sorry, for throwing you a curveball, but that digest_on_load is not necessary. The digest storing and lookup should go through ProofDB. TheProofDB::add_proof is already handed a Proof that has a digest field. The problem is that I stupidly decided to track stuff in ProofDB by signature which is ... problematic at least. I should have used digest for everything from the start and wanted to fix it sometime soon anyway.

that explains a few things for me ^^ I mentioned this here #493 (comment) at the end: "Proofs don't seem to have a uuid, but the Signature seems to be unique per proof. Would that be a good field to use for the reference?"

So I was looking for a unique identifier and somehow mistook Signature for it since it's used in a lot of places in the code. 😁

I will have to stop working on this soon for a while (AFK) but I will come back to this feature since it's the missing puzzle piece for organization use (for me). May be I get lucky and the ProofDb is refactored from Signature to Digest as key by then. Please don't feel rushed or pressured though.

I don't have any problem rebasing my code later on and throw some parts away. That's normal development process and the knowledge I gained from the inner workings let me to fix up #495 in almost no time.

@thomasjfox
Copy link
Contributor Author

An overwrite would no longer be a straight re-signing of the original, I think, so probably no need preserve original.

ok, thanks, I'll make sure to clear it on the next improvement round.

Implemented this now along with the "proof-digest" field rename. I've rebased the branch on master but didn't squash the commits yet.

@thomasjfox thomasjfox changed the title PoC: Add new "proof reissue" command Add new "proof reissue" command Aug 29, 2022
@thomasjfox
Copy link
Contributor Author

Depending how you feel about it, you can just add such mapping and move on

new mapping table has been added. 😁 I also introduced a newtype proof::Digest which can be used in more places in the future and won't be mistaken for a package digest by the code.

Besides may be bikeshedding about some naming or console output here and there I think that should complete the feature?

@dpc
Copy link
Collaborator

dpc commented Aug 29, 2022

Everything is perfect. It's just the document format bikesheding.

I'm looking at:

original:
  id-type: crev
  id: FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
  proof_digest: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0
  comment: Reissue to replace compromised id from stolen laptop

And maybe we should remove id and id-type. Why? Because the reader of this document doesn't have a good way to verify it anyway (if read by human), and it is redundant (if handled by code). Also - just to save some space.

As for "proof digest", maybe just proof: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0 + comment? No need to say it's a "digest". This proof is "re-creating" (re-issuing) another proof, here is the id (proof) + human-readable explanation, the end.

@thomasjfox
Copy link
Contributor Author

And maybe we should remove id and id-type. Why? Because the reader of this document doesn't have a good way to verify it anyway (if read by human), and it is redundant (if handled by code). Also - just to save some space.

the original crev id might help a human fellow to use "proof find" and locate the original review. As it's not too important and the tooling around "search for proof digest" might improve in the future, I've axed it.

As for "proof digest", maybe just proof: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0 + comment? No need to say it's a "digest". This proof is "re-creating" (re-issuing) another proof, here is the id (proof) + human-readable explanation, the end.

good idea.

This is the output from the latest code:

from:
  id-type: crev
  id: Ja1uYcBfo4pxqkHEgaHNAdBtvZdQ667nDGZyhrCkJs0
  url: https://github.com/intra2net/crev-proofs.git
original:
  proof: 3t1MkzxOO_ZY_bt-r5oVeqZUMVS2of0bb7rZK6_xqE0
  comment: Shiny new key
package:
  source: https://crates.io
  name: default
  version: 0.1.2
  digest: YuxzyXhCHZYMi4__Hj_hCzkQyxRLrZjDqL8usLqA4QY

I'm going to clean up the branch now.

Will be used by upcoming "proof reissue" command
to look up the proof digest.
This commands allows to find existing proofs
and sign them again with a different id.

This comes in handy when an id is going be revoked but
the trust of the existing reviews should be retained by a new id.

Scenarios:
- Old id got compromised
- Person leaves an organization, so organization can reissue
  existing reviews using a different id to maintain trust level.
@dpc dpc merged commit 8bd006c into crev-dev:master Aug 30, 2022
@dpc
Copy link
Collaborator

dpc commented Aug 30, 2022

Thank you!

@thomasjfox thomasjfox deleted the proof-reissue-cmd branch August 30, 2022 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants