Handling of package metadata in repodata.json for artifact verification #10713
Comments
A few metrics:
Performance of artifact verification
It takes about 200 µs to verify a package metadata signature.
conda-content-trust snippet...
import json
import time

from conda_content_trust.authentication import verify_delegation as verify_trust_delegation
from conda_content_trust.signing import wrap_as_signable

# Load the index and wrap one package's metadata together with its signature.
with open("./noarch/repodata.json", 'r') as f:
    j = json.load(f)
signable = wrap_as_signable(j["packages"]["test-package-0.1-0.tar.bz2"])
signable['signatures'].update(j["signatures"]["test-package-0.1-0.tar.bz2"])

# Load the trust roles (the root role is loaded but not used by the timing loop).
with open("1.root.json") as f:
    trusted_root = json.load(f)
with open("key_mgr.json") as f:
    key_mgr = json.load(f)

# Time N verifications of the same signable against the key_mgr delegation.
N = 10000
start = time.time()
for i in range(N):
    verify_trust_delegation('pkg_mgr', signable, key_mgr)
print("Verification takes:", round((time.time() - start) * 1e6 / N, 0), "us/package metadata")

repodata sample...
{
"info": {
"subdir": "noarch"
},
"packages": {
"test-package-0.1-0.tar.bz2": {
"build": "0",
"build_number": 0,
"depends": [],
"license": "BSD",
"license_family": "BSD",
"md5": "2a8595f37faa2950e1b433acbe91d481",
"name": "test-package",
"noarch": "generic",
"sha256": "b908ffce2d26d94c58c968abf286568d4bcf87d1cfe6c994958351724a6f6988",
"size": 5719,
"subdir": "noarch",
"timestamp": 1613117294885,
"version": "0.1"
}
},
"packages.conda": {},
"removed": [],
"repodata_version": 1,
"signatures": {
"test-package-0.1-0.tar.bz2": {
"f46b5a7caa43640744186564c098955147daa8bac4443887bc64d8bfee3d3569": {
"signature": "0a50063539baf249970f1d08b07f00f544e2d87982826790e9ec6e80874ad90aec21a9607cf38bb58897163533c39cb4a4f1c741a7f8e9e4f67e2ff2087d2d00"
}
}
}
}
roles...
Root role:
{
"signatures": {
"2b920f88531576643ada0a632915d1dcdd377557647093f29cbe251ba8c33724": {
"other_headers": "04001608001d1621040673d781a8b80bcb7b002040ac7bc8bcf821360d050260b687a1",
"signature": "8eecc8f58df848f7af0188fbb47f99a0f2622f8a32ab8ede6340507fc48b8785c96a217c17889d39154c290d99ac0bb6ca75c971f913778598dbab368b49040e"
}
},
"signed": {
"delegations": {
"key_mgr": {
"pubkeys": [
"013ddd714962866d12ba5bae273f14d48c89cf0773dee2dbf6d4561e521c83f7"
],
"threshold": 1
},
"root": {
"pubkeys": [
"2b920f88531576643ada0a632915d1dcdd377557647093f29cbe251ba8c33724"
],
"threshold": 1
}
},
"expiration": "2022-06-01T19:16:49Z",
"metadata_spec_version": "0.6.0",
"timestamp": "2021-06-01T19:16:49Z",
"type": "root",
"version": 1
}
}
Key mgr role:
{
"signatures": {
"013ddd714962866d12ba5bae273f14d48c89cf0773dee2dbf6d4561e521c83f7": {
"signature": "20d8728ae8ba212e6229f9a69b3de14cd747fcd20cfaa1c5d39111cc6aad7a94036187a6c49e13a531d08c282a0d11b07c276d0f0773dc5344f54a14fb0d7700"
}
},
"signed": {
"delegations": {
"pkg_mgr": {
"pubkeys": [
"f46b5a7caa43640744186564c098955147daa8bac4443887bc64d8bfee3d3569"
],
"threshold": 1
}
},
"expiration": "2022-06-01T19:16:49Z",
"metadata_spec_version": "0.6.0",
"timestamp": "2021-06-01T19:16:49Z",
"type": "key_mgr",
"version": 1
}
}
Storing signed metadata
Impact on the binary cache file (pickled/.solv) of storing package metadata as a string on the deserialized object:
Evaluation made on
Is that with libsolv commit e13455d011710a99ef1dfb33432044cc7eae0efb?
It was done with the branch used for the related PR on libsolv linked just above.
Hi there, thank you for your contribution! This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs. If you would like this issue to remain open please:
NOTE: If this issue was closed prematurely, please leave a comment. Thanks!
Not stale
Hi!
This is a discussion about the handling of package metadata in the repodata.json index file. It concerns both the conda and conda-content-trust projects.
Introduction/Context
Package signature
Signing the metadata of a package in repodata.json is currently done over a canonicalized JSON object at /packages/<fn>, producing a signature stored at /signatures/<fn>/<keyid> as a JSON object.
To verify the metadata, exactly the same data has to be used: any difference in content produces a verification/signature error.
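To illustrate why the verified bytes must match the signed bytes exactly, here is a minimal sketch. It assumes canonicalization is `json.dumps` with sorted keys and a fixed indent (the actual conda-content-trust routine may differ), and it uses a SHA-256 digest as a stand-in for the signed payload rather than a real signature. The `canonicalize` helper and the extra `fn` field are hypothetical.

```python
import hashlib
import json


def canonicalize(obj):
    """Hypothetical canonical form: sorted keys, fixed indent, UTF-8 bytes."""
    return json.dumps(obj, indent=2, sort_keys=True).encode("utf-8")


# Metadata as it would appear under /packages/<fn> (subset of the sample record).
signed_record = {"name": "test-package", "version": "0.1", "depends": []}

# Same content, different key order: canonicalization hides the difference.
reordered = {"version": "0.1", "depends": [], "name": "test-package"}

# Same package, but a field was added after parsing: the payload changes.
modified = dict(signed_record, fn="test-package-0.1-0.tar.bz2")

digest = hashlib.sha256(canonicalize(signed_record)).hexdigest()
same = hashlib.sha256(canonicalize(reordered)).hexdigest() == digest
changed = hashlib.sha256(canonicalize(modified)).hexdigest() != digest
print(same, changed)
```

Key order does not matter after canonicalization, but any field added or altered after parsing changes the payload, so a signature computed over the original record would no longer verify.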
Verification process
That is why verification has been implemented before any update of the content parsed from repodata.json: https://github.com/chenghlee/conda/blob/240287064b3b095c6ed304c3fcdadb659e888b76/conda/core/subdir_data.py#L554
From my understanding of the current implementation (do not hesitate to point out missing or incorrect info):
PackageRecord
metadata_signature_status
may be logged, for packages that are not concerned by the current operation
Propositions
Package signature
Another implementation would be to check signatures of packages lazily before fetching them:
repodata.json files)
Storage of signed package metadata
For the proposed implementation, as well as for other possible technical solutions, it would be useful to be able to reproduce or store the signed package metadata.
Here are a few possible solutions I would like to discuss:
Normalization
Normalizing the signed package metadata looks like a good way to provide flexibility about where signatures are verified, with no extra memory cost, no I/O, and little CPU usage.
An example of a file->deserialization->serialization difficulty is:
depends key will be deserialized as an empty array/vector depends: []
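The round-trip difficulty can be sketched as follows. This assumes the case meant here is a record whose depends key is absent from the file; the `Record` dataclass is a hypothetical stand-in for whatever structure the parser produces, not the actual conda or libsolv type.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical stand-in for a deserialized package record."""
    name: str
    version: str
    # Default applied when the key is absent from the file (assumption).
    depends: List[str] = field(default_factory=list)


# Metadata as written in the index: no "depends" key at all.
original_text = '{"name": "test-package", "version": "0.1"}'
record = Record(**json.loads(original_text))

# Re-serializing emits the defaulted field, so the text no longer matches.
round_tripped = json.dumps(asdict(record))
print(round_tripped)
print(round_tripped == original_text)
```

A normalization rule (for example, always writing an explicit empty list, or always omitting it) would have to be applied on both the signing and the verifying side for the two serializations to agree.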
Here is a first proposition of keys:
Type definitions are well defined in https://github.com/conda/schemas and it would be nice to have a dedicated JSON schema for this. It might be redundant with https://github.com/conda/schemas/blob/master/repodata-1.schema.json, in which case that schema could be used as-is (TBD).
Storing signed metadata
Another option would be to parse the repodata.json file only once. Since deserialization modifies the resulting structure(s) (adding some information, changing other fields), re-serialization is not 100% consistent with the initial data, so we could also attach the serialized initial metadata to each record.
This storage would be >95% duplicated information and may be prohibitive for large repodata files (memory footprint).
Signing a hash (SHA-256) of the metadata instead might make this storage affordable. The security impact has to be assessed.
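A rough sketch of the size difference between the two storage options, using the sample record from this thread. It assumes a hypothetical `canonicalize` helper (`json.dumps` with sorted keys) and assumes the signature would be computed over the digest, which is not how signing works today.

```python
import hashlib
import json


def canonicalize(obj):
    """Hypothetical canonical form: sorted keys, fixed indent, UTF-8 bytes."""
    return json.dumps(obj, indent=2, sort_keys=True).encode("utf-8")


# Record taken from the repodata sample discussed in this thread.
packages = {
    "test-package-0.1-0.tar.bz2": {
        "build": "0",
        "build_number": 0,
        "depends": [],
        "license": "BSD",
        "license_family": "BSD",
        "md5": "2a8595f37faa2950e1b433acbe91d481",
        "name": "test-package",
        "noarch": "generic",
        "sha256": "b908ffce2d26d94c58c968abf286568d4bcf87d1cfe6c994958351724a6f6988",
        "size": 5719,
        "subdir": "noarch",
        "timestamp": 1613117294885,
        "version": "0.1",
    }
}

# Option A: keep the full serialized metadata next to each record.
stored_strings = {fn: canonicalize(meta) for fn, meta in packages.items()}

# Option B: keep only the SHA-256 digest, captured before any modification.
stored_digests = {
    fn: hashlib.sha256(raw).digest() for fn, raw in stored_strings.items()
}

string_bytes = sum(len(raw) for raw in stored_strings.values())
digest_bytes = sum(len(d) for d in stored_digests.values())
print(f"full metadata: {string_bytes} bytes, digests: {digest_bytes} bytes")
```

The digest costs a fixed 32 bytes per package regardless of how large the record is, whereas the stored string grows with the dependency list and other fields.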
Parsing repodata multiple times
After solving and just before fetching data, the repodata.json files are stored in the cache folder, so it is still possible to parse them again to get the original package metadata for the targets to be downloaded.
This option has the advantage of working without any change, but I would not recommend it for performance reasons.
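For reference, re-reading the cached index for only the packages about to be fetched could look roughly like this. The `original_metadata` helper and the cache layout are hypothetical; the key names (`packages`, `packages.conda`, `signatures`) follow the repodata sample shown in this thread.

```python
import json
import os
import tempfile


def original_metadata(repodata_path, filenames):
    """Re-read a cached repodata.json; return untouched metadata and signatures."""
    with open(repodata_path) as f:
        repodata = json.load(f)
    # Merge both package sections; either may be missing from the file.
    packages = {**repodata.get("packages", {}), **repodata.get("packages.conda", {})}
    signatures = repodata.get("signatures", {})
    return {fn: (packages[fn], signatures.get(fn, {})) for fn in filenames}


# Demo with a tiny index written to a temporary cache folder.
cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "repodata.json")
sample = {
    "packages": {"test-package-0.1-0.tar.bz2": {"name": "test-package", "version": "0.1"}},
    "packages.conda": {},
    "signatures": {"test-package-0.1-0.tar.bz2": {"<keyid>": {"signature": "<hex>"}}},
}
with open(path, "w") as f:
    json.dump(sample, f)

result = original_metadata(path, ["test-package-0.1-0.tar.bz2"])
metadata, sigs = result["test-package-0.1-0.tar.bz2"]
print(metadata, sigs)
```

Every call parses the whole file again, which is the performance cost mentioned above; for large channels that cost is paid once per subdir involved in the transaction.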
Feedback much appreciated!