Commit
add ability to download cached workspace (#520)
* create "stale" field on workspace state
  A provider that downloads its workspace state directly cannot assume that this state is a valid basis for a future incremental update, and should mark the downloaded workspace as stale.
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* WIP add configs
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* lint fix
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* [wip] working on vunnel results db listing
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* update and tests for safe_extract_tar
  Now that we're using it for more than one thing, make an extractor that generally prevents path traversal.
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* [wip] adding tests for fetching listing and archives
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* [wip] add more negative tests for provider tests
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* unit test for new workspace changes
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* replace the workspace results instead of overlaying
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* clean up hasher implementation
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* add tests for prep workspace from listing entry
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* do not include inputs in tar test fixture
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* vunnel fetch existing workspace working
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* add unit test for full update flow
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* update existing unit tests for new config values
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* add unit test for default behavior of new configs
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* lint fix
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* add missing annotations import
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* Use 3.9 compatible annotations
  Relying on `from __future__ import annotations` doesn't work with mashumaro.
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* validate that enabling import results requires host and path
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* rename listing field and add schema
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* only require github token when downloading
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* add zstd support
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* add tests for zstd support
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* add tests for _has_newer_archive
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* fix tests for zstd
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* show stderr to log when git commands fail
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* move import_results to common field on provider
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* add concept for distribution version
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* single source of truth for provider schemas
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* add distribution-version to schema, provider state, and listing entry
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* clear workspace on different dist version
  Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
* fix defaulting logic and update tests
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* default distribution version and path
  Signed-off-by: Will Murphy <will.murphy@anchore.com>
* make "" and None both use default path
  Signed-off-by: Will Murphy <will.murphy@anchore.com>

---------
Signed-off-by: Will Murphy <will.murphy@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Co-authored-by: Alex Goodman <wagoodman@users.noreply.github.com>
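The commit message mentions a `safe_extract_tar` helper that "generally prevents path traversal". The helper itself is not shown on this page, so the following is only a minimal sketch of the idea. The signature, the `ValueError`, and the decision to reject links outright are assumptions made for illustration, not the code from this commit.

```python
import os
import tarfile


def safe_extract_tar(tar: tarfile.TarFile, destination: str) -> None:
    """Extract `tar` into `destination`, refusing members that would land outside it.

    Hypothetical signature: the real helper in the repository may differ.
    """
    root = os.path.realpath(destination)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        # commonpath() equals root only when target is root itself or lives beneath it
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"refusing to extract outside of {root}: {member.name}")
        # assumption: links are rejected entirely rather than resolved and checked
        if member.issym() or member.islnk():
            raise ValueError(f"refusing to extract link: {member.name}")
    tar.extractall(root)
```

Checking every member before extracting anything means a malicious archive is rejected as a whole instead of being partially unpacked.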
1 parent 6b4fa38, commit 90b176c
Showing 41 changed files with 1,967 additions and 127 deletions.
@@ -0,0 +1,17 @@
# `ListingDocument` JSON Schema

This schema governs the `listing.json` file used when providers are configured to fetch pre-computed results (by using `import_results_enabled`). The listing file is how the provider knows what results are available, where to fetch them from, and how to validate them.

See `src/vunnel.distribution.ListingDocument` for the root object that represents this schema.

## Updating the schema

Versioning the JSON schema must be done manually by copying the existing JSON schema into a new `schema-x.y.z.json` file and manually making the necessary updates (or by using an online tool such as https://www.liquid-technologies.com/online-json-to-schema-converter).

This schema is versioned according to the "SchemaVer" guidelines, which diverge slightly from Semantic Versioning to suit data models.

Given a version number format `MODEL.REVISION.ADDITION`:

- `MODEL`: increment when you make a breaking schema change which will prevent interaction with any historical data
- `REVISION`: increment when you make a schema change which may prevent interaction with some historical data
- `ADDITION`: increment when you make a schema change that is compatible with all historical data
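The three SchemaVer components can be read as a compatibility signal between a reader and a stored document. The sketch below is one possible interpretation of the rules above; the `SchemaVer` class, the `compatibility` function, and the three result labels are hypothetical names introduced for illustration and are not part of vunnel.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVer":
        # expects exactly three dot-separated integers, e.g. "1.0.2"
        model, revision, addition = (int(part) for part in text.split("."))
        return cls(model, revision, addition)


def compatibility(reader: SchemaVer, data: SchemaVer) -> str:
    """Classify how safely a reader built for `reader` can consume data written under `data`."""
    if reader.model != data.model:
        return "incompatible"  # a MODEL bump breaks interaction with all historical data
    if reader.revision != data.revision:
        return "partially compatible"  # a REVISION bump may break some historical data
    return "compatible"  # only ADDITION differs (or nothing does)
```

For example, `compatibility(SchemaVer.parse("1.0.0"), SchemaVer.parse("1.0.3"))` evaluates to `"compatible"`, while a reader at `1.0.0` facing data at `2.0.0` is classified as `"incompatible"`.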
@@ -0,0 +1,66 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "provider": {
      "type": "string"
    },
    "available": {
      "type": "object",
      "properties": {
        "1": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": {
                "distribution_checksum": {
                  "type": "string"
                },
                "built": {
                  "type": "string"
                },
                "checksum": {
                  "type": "string"
                },
                "url": {
                  "type": "string"
                },
                "version": {
                  "type": "integer"
                }
              },
              "required": [
                "built",
                "checksum",
                "distribution_checksum",
                "url",
                "version"
              ]
            }
          ]
        }
      }
    }
  },
  "required": [
    "schema",
    "available",
    "provider"
  ]
}
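To make the shape described by this schema concrete, here is a small sketch of a document that follows it, plus a helper that picks the most recently built entry for a given version key. Every value in `listing` (provider name, URLs, checksums, dates, schema version) is an invented placeholder, and `newest_entry` is a hypothetical helper, not a function from this commit.

```python
from datetime import datetime

# the entry fields marked as required by the schema above
ENTRY_REQUIRED = {"built", "checksum", "distribution_checksum", "url", "version"}

# placeholder document: all values are made up for illustration
listing = {
    "schema": {"version": "1.0.0", "url": "https://example.com/schema-1.0.0.json"},
    "provider": "example-provider",
    "available": {
        "1": [
            {
                "built": "2023-05-02T00:00:00+00:00",
                "checksum": "xxhash64:1234567890abcdef",
                "distribution_checksum": "sha256:1234567890abcdef1234567890abcdef",
                "url": "https://example.com/example-provider/newer.tar.zst",
                "version": 1,
            },
            {
                "built": "2023-05-01T00:00:00+00:00",
                "checksum": "xxhash64:fedcba0987654321",
                "distribution_checksum": "sha256:fedcba0987654321fedcba0987654321",
                "url": "https://example.com/example-provider/older.tar.gz",
                "version": 1,
            },
        ]
    },
}


def newest_entry(document: dict, version: int):
    """Return the most recently built entry for `version`, or None if there is none."""
    entries = document["available"].get(str(version), [])
    for entry in entries:
        missing = ENTRY_REQUIRED - entry.keys()
        if missing:
            raise ValueError(f"listing entry is missing fields: {sorted(missing)}")
    return max(entries, key=lambda e: datetime.fromisoformat(e["built"]), default=None)
```

Note that JSON object keys are always strings, which is why the lookup converts the integer version to `"1"` before indexing `available`.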
@@ -0,0 +1,80 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "provider-workspace-state",
  "description": "describes the filesystem state of a provider workspace directory",
  "properties": {
    "provider": {
      "type": "string"
    },
    "urls": {
      "type": "array",
      "items": [
        {
          "type": "string"
        }
      ]
    },
    "store": {
      "type": "string"
    },
    "timestamp": {
      "type": "string"
    },
    "listing": {
      "type": "object",
      "properties": {
        "digest": {
          "type": "string"
        },
        "path": {
          "type": "string"
        },
        "algorithm": {
          "type": "string"
        }
      },
      "required": [
        "digest",
        "path",
        "algorithm"
      ]
    },
    "version": {
      "type": "integer",
      "description": "version describing the result data shape + the provider processing behavior semantics"
    },
    "distribution_version": {
      "type": "integer",
      "description": "version describing purely the result data shape"
    },
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "stale": {
      "type": "boolean",
      "description": "set to true if the workspace is stale and cannot be used for an incremental update"
    }
  },
  "required": [
    "provider",
    "urls",
    "store",
    "timestamp",
    "listing",
    "version",
    "schema"
  ]
}
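The `stale` and `distribution_version` fields exist so that a consumer can decide whether an existing workspace is a valid basis for an incremental update. A rough sketch of such a check follows. The function name, the treatment of a missing `stale` field as `False`, and the fallback of `distribution_version` to `1` are assumptions for illustration and may not match how the provider code in this commit behaves.

```python
# the top-level fields marked as required by the schema above
REQUIRED_FIELDS = ("provider", "urls", "store", "timestamp", "listing", "version", "schema")


def can_update_incrementally(state: dict, expected_distribution_version: int) -> bool:
    """Decide whether a parsed workspace state document may seed an incremental update."""
    missing = [name for name in REQUIRED_FIELDS if name not in state]
    if missing:
        raise ValueError(f"state is missing required fields: {missing}")
    # "stale" is optional in the schema; assumption: absent means not stale
    if state.get("stale", False):
        return False
    # assumption: a state written before distribution versions existed counts as version 1
    return state.get("distribution_version", 1) == expected_distribution_version
```

A caller that receives `False` would fall back to a full rebuild (or clear the workspace) rather than layering new results on top of the existing ones.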
@@ -0,0 +1,89 @@
from __future__ import annotations

import datetime
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse

import iso8601
from mashumaro.mixins.dict import DataClassDictMixin

from vunnel import schema as schema_def

DB_SUFFIXES = {".tar.gz", ".tar.zst"}


@dataclass
class ListingEntry(DataClassDictMixin):
    # the date this archive was built relative to the data enclosed in the archive
    built: str

    # the URL where the vunnel provider archive is located
    url: str

    # the digest of the archive referenced at the URL.
    # Note: all checksums are labeled with "algorithm:value" (e.g. sha256:1234567890abcdef1234567890abcdef)
    distribution_checksum: str

    # the digest of the checksums file within the archive referenced at the URL
    # Note: all checksums are labeled with "algorithm:value" (e.g. xxhash64:1234567890abcdef)
    enclosed_checksum: str

    # the provider distribution version this archive was built with (different than the provider version)
    distribution_version: int = 1

    def basename(self) -> str:
        basename = os.path.basename(urlparse(self.url, allow_fragments=False).path)
        if not _has_suffix(basename, suffixes=DB_SUFFIXES):
            msg = f"entry url is not a db archive: {basename}"
            raise RuntimeError(msg)

        return basename

    def age_in_days(self, now: datetime.datetime | None = None) -> int:
        if not now:
            now = datetime.datetime.now(tz=datetime.timezone.utc)
        return (now - iso8601.parse_date(self.built)).days


@dataclass
class ListingDocument(DataClassDictMixin):
    # mapping of distribution versions to a list of ListingEntry objects denoting archives available for download
    available: dict[int, list[ListingEntry]]

    # the provider name this document is associated with
    provider: str

    # the schema information for this document
    schema: schema_def.Schema = field(default_factory=schema_def.ProviderListingSchema)

    @classmethod
    def new(cls, provider: str) -> ListingDocument:
        return cls(available={}, provider=provider)

    def latest_entry(self, schema_version: int) -> ListingEntry | None:
        # entries are keyed by distribution version and kept newest-first by add()
        if schema_version not in self.available:
            return None

        if not self.available[schema_version]:
            return None

        return self.available[schema_version][0]

    def add(self, entry: ListingEntry) -> None:
        if not self.available.get(entry.distribution_version):
            self.available[entry.distribution_version] = []

        self.available[entry.distribution_version].append(entry)

        # keep listing entries sorted by date (rfc3339 formatted entries, which iso8601 is a superset of)
        self.available[entry.distribution_version].sort(
            key=lambda x: iso8601.parse_date(x.built),
            reverse=True,
        )


def _has_suffix(el: str, suffixes: set[str] | None) -> bool:
    if not suffixes:
        return True
    return any(el.endswith(s) for s in suffixes)
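The comments in `ListingEntry` state that every checksum is labeled as `algorithm:value`. As a hedged illustration of how a downloaded archive could be checked against `distribution_checksum`, the sketch below splits the label and recomputes the digest with the standard library. The names `split_checksum`, `matches`, and `SUPPORTED` are hypothetical, and `xxhash64` is deliberately left out because it needs a third-party package; this is not the hasher implementation from the commit.

```python
import hashlib

# assumption: only stdlib algorithms are handled in this sketch
SUPPORTED = {"sha256", "sha512"}


def split_checksum(labeled: str) -> tuple[str, str]:
    """Split an "algorithm:value" label into its two parts."""
    algorithm, separator, value = labeled.partition(":")
    if not separator or not algorithm or not value:
        raise ValueError(f"expected 'algorithm:value', got {labeled!r}")
    return algorithm, value


def matches(data: bytes, labeled: str) -> bool:
    """Return True when the digest of `data` equals the value in the label."""
    algorithm, expected = split_checksum(labeled)
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()
```

Carrying the algorithm inside the value lets a listing mix digest types (a cryptographic hash for the archive, a fast hash for the enclosed checksums file) without adding a separate field for each.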