add ability to download cached workspace #520

Merged
merged 35 commits on Mar 27, 2024

Commits (35)
8b79acc
create "stale" field on workspace state
willmurphyscode Mar 18, 2024
9d17b8b
WIP add configs
willmurphyscode Mar 19, 2024
4e16c21
lint fix
willmurphyscode Mar 19, 2024
44901fc
[wip] working on vunnel results db listing
wagoodman Mar 19, 2024
4d54514
update and tests for safe_extract_tar
willmurphyscode Mar 19, 2024
8834a49
[wip] adding tests for fetching listing and archives
wagoodman Mar 19, 2024
dc7200f
[wip] add more negative tests for provider tests
wagoodman Mar 19, 2024
46b8127
unit test for new workspace changes
willmurphyscode Mar 20, 2024
075405a
replace the workspace results instead of overlaying
willmurphyscode Mar 20, 2024
94d8158
clean up hasher implementation
wagoodman Mar 20, 2024
a3c472c
add tests for prep workspace from listing entry
willmurphyscode Mar 20, 2024
4a87c4c
do not include inputs in tar test fixture
wagoodman Mar 20, 2024
5257544
vunnel fetch existing workspace working
willmurphyscode Mar 20, 2024
1886e9c
add unit test for full update flow
willmurphyscode Mar 21, 2024
7d1202b
update existing unit tests for new config values
willmurphyscode Mar 21, 2024
54cdfd6
add unit test for default behavior of new configs
willmurphyscode Mar 21, 2024
329ff37
lint fix
willmurphyscode Mar 21, 2024
92d4878
add missing annotations import
willmurphyscode Mar 21, 2024
6c8de19
Use 3.9 compatible annotations
willmurphyscode Mar 22, 2024
ee32714
validate that enabling import results requires host and path
willmurphyscode Mar 22, 2024
c930822
rename listing field and add schema
wagoodman Mar 23, 2024
3c083a1
only require github token when downloading
wagoodman Mar 23, 2024
27cf42c
add zstd support
wagoodman Mar 24, 2024
28e5c16
add tests for zstd support
wagoodman Mar 25, 2024
a3ca88b
add tests for _has_newer_archive
willmurphyscode Mar 25, 2024
1631623
fix tests for zstd
wagoodman Mar 25, 2024
27696a3
show stderr to log when git commands fail
wagoodman Mar 25, 2024
b150d02
move import_results to common field on provider
willmurphyscode Mar 25, 2024
68b3197
add concept for distribution version
wagoodman Mar 25, 2024
21e2bd0
single source of truth for provider schemas
wagoodman Mar 25, 2024
62186c0
add distribution-version to schema, provider state, and listing entry
wagoodman Mar 25, 2024
c3d65c1
clear workspace on different dist version
wagoodman Mar 25, 2024
edc2c65
fix defaulting logic and update tests
willmurphyscode Mar 26, 2024
aa9bea8
default distribution version and path
willmurphyscode Mar 26, 2024
2c3c498
make "" and None both use default path
willmurphyscode Mar 27, 2024
150 changes: 148 additions & 2 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -57,6 +57,8 @@ importlib-metadata = "^7.0.1"
xsdata = {extras = ["cli", "lxml", "soap"], version = ">=22.12,<25.0"}
pytest-snapshot = "^0.9.0"
mashumaro = "^3.10"
iso8601 = "^2.1.0"
zstandard = "^0.22.0"

[tool.poetry.group.dev.dependencies]
pytest = ">=7.2.2,<9.0.0"
17 changes: 17 additions & 0 deletions schema/provider-archive-listing/README.md
@@ -0,0 +1,17 @@
# Provider Archive Listing JSON Schema

This schema governs the `listing.json` file used when providers are configured to fetch pre-computed results (by using `import_results_enabled`). The listing file is how the provider knows what results are available, where to fetch them from, and how to validate them.

See `vunnel.distribution.ListingDocument` in `src/vunnel/distribution.py` for the root object that represents this schema.

## Updating the schema

Versioning the JSON schema is a manual process: copy the existing schema into a new `schema-x.y.z.json` file and make the necessary updates by hand (or with an online tool such as https://www.liquid-technologies.com/online-json-to-schema-converter).

This schema is versioned according to the "SchemaVer" guidelines, which diverge slightly from Semantic Versioning to better suit data models.

Given a version number format `MODEL.REVISION.ADDITION`:

- `MODEL`: increment when you make a breaking schema change which will prevent interaction with any historical data
- `REVISION`: increment when you make a schema change which may prevent interaction with some historical data
- `ADDITION`: increment when you make a schema change that is compatible with all historical data
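The SchemaVer rules above can be sketched as a small helper (illustrative only, not part of vunnel; the function name and change labels are invented here):

```python
# Illustrative sketch: bump a SchemaVer version string
# ("MODEL.REVISION.ADDITION") given the kind of schema change.
def bump_schemaver(version: str, change: str) -> str:
    model, revision, addition = (int(p) for p in version.split("."))
    if change == "model":  # breaks interaction with all historical data
        return f"{model + 1}.0.0"
    if change == "revision":  # may break interaction with some historical data
        return f"{model}.{revision + 1}.0"
    if change == "addition":  # compatible with all historical data
        return f"{model}.{revision}.{addition + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump_schemaver("1.0.1", "addition"))  # → 1.0.2
```

Note that, unlike SemVer, a `MODEL` bump resets both lower components, mirroring how a breaking data-model change invalidates all prior revisions.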
66 changes: 66 additions & 0 deletions schema/provider-archive-listing/schema-1.0.0.json
@@ -0,0 +1,66 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"schema": {
"type": "object",
"properties": {
"version": {
"type": "string"
},
"url": {
"type": "string"
}
},
"required": [
"version",
"url"
]
},
"provider": {
"type": "string"
},
"available": {
"type": "object",
"properties": {
"1": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"distribution_checksum": {
"type": "string"
},
"built": {
"type": "string"
},
"checksum": {
"type": "string"
},
"url": {
"type": "string"
},
"version": {
"type": "integer"
}
},
"required": [
"built",
"checksum",
"distribution_checksum",
"url",
"version"
]
}
]
}
}
}
},
"required": [
"schema",
"available",
"provider"
]
}
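To make the schema concrete, here is a hypothetical `listing.json` instance (all values invented) expressed as a Python dict, with a spot-check of the required keys the schema declares:

```python
# Hypothetical listing document matching the required fields of
# schema-1.0.0.json; provider name, URLs, and checksums are invented.
listing = {
    "schema": {
        "version": "1.0.0",
        "url": "https://example.com/provider-archive-listing/schema-1.0.0.json",
    },
    "provider": "wolfi",
    "available": {
        "1": [
            {
                "built": "2024-03-25T12:00:00Z",
                "version": 1,
                "distribution_checksum": "sha256:1234567890abcdef",
                "checksum": "xxh64:1234567890abcdef",
                "url": "https://example.com/wolfi/results.tar.zst",
            },
        ],
    },
}

# spot-check the top-level and per-entry required keys from the schema
assert {"schema", "available", "provider"} <= listing.keys()
entry = listing["available"]["1"][0]
assert {"built", "checksum", "distribution_checksum", "url", "version"} <= entry.keys()
```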
80 changes: 80 additions & 0 deletions schema/provider-workspace-state/schema-1.0.2.json
@@ -0,0 +1,80 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "provider-workspace-state",
"description": "describes the filesystem state of a provider workspace directory",
"properties": {
"provider": {
"type": "string"
},
"urls": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"store": {
"type": "string"
},
"timestamp": {
"type": "string"
},
"listing": {
"type": "object",
"properties": {
"digest": {
"type": "string"
},
"path": {
"type": "string"
},
"algorithm": {
"type": "string"
}
},
"required": [
"digest",
"path",
"algorithm"
]
},
"version": {
"type": "integer",
"description": "version describing the result data shape + the provider processing behavior semantics"
},
"distribution_version": {
"type": "integer",
"description": "version describing purely the result data shape"
},
"schema": {
"type": "object",
"properties": {
"version": {
"type": "string"
},
"url": {
"type": "string"
}
},
"required": [
"version",
"url"
]
},
"stale": {
"type": "boolean",
"description": "set to true if the workspace is stale and cannot be used for an incremental update"
}
},
"required": [
"provider",
"urls",
"store",
"timestamp",
"listing",
"version",
"schema"
]
}
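A minimal workspace state document satisfying this schema might look like the following (values invented). Note that `distribution_version` and `stale` are absent from the `required` list, so older state files without them remain valid:

```python
# Hypothetical workspace state document for schema-1.0.2; the provider
# name, URLs, and digests here are invented for illustration.
state = {
    "provider": "wolfi",
    "urls": ["https://packages.wolfi.dev/os/security.json"],
    "store": "sqlite",
    "timestamp": "2024-03-25T12:00:00Z",
    "listing": {"digest": "1234567890abcdef", "path": "checksums", "algorithm": "xxh64"},
    "version": 1,
    "schema": {"version": "1.0.2", "url": "https://example.com/schema-1.0.2.json"},
}

# "distribution_version" and "stale" are optional; everything else is required
required = {"provider", "urls", "store", "timestamp", "listing", "version", "schema"}
assert required <= state.keys()
```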
50 changes: 45 additions & 5 deletions src/vunnel/cli/config.py
@@ -2,13 +2,33 @@

import os
from dataclasses import dataclass, field, fields
from typing import Any
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
from collections.abc import Generator

import mergedeep
import yaml
from mashumaro.mixins.dict import DataClassDictMixin

from vunnel import providers
from vunnel import provider, providers


@dataclass
class ImportResults:
"""
These are the defaults for all providers. Corresponding
fields on specific providers override these values.
"""

host: str = ""
path: str = "{provider_name}/listing.json"
enabled: bool = False


@dataclass
class CommonProviderConfig:
import_results: ImportResults = field(default_factory=ImportResults)


@dataclass
@@ -26,12 +26,32 @@ class Providers:
ubuntu: providers.ubuntu.Config = field(default_factory=providers.ubuntu.Config)
wolfi: providers.wolfi.Config = field(default_factory=providers.wolfi.Config)

common: CommonProviderConfig = field(default_factory=CommonProviderConfig)

def __post_init__(self) -> None:
for name in self.provider_names():
runtime_cfg = getattr(self, name).runtime
if runtime_cfg and isinstance(runtime_cfg, provider.RuntimeConfig):
if runtime_cfg.import_results_enabled is None:
runtime_cfg.import_results_enabled = self.common.import_results.enabled
if not runtime_cfg.import_results_host:
runtime_cfg.import_results_host = self.common.import_results.host
if not runtime_cfg.import_results_path:
runtime_cfg.import_results_path = self.common.import_results.path

def get(self, name: str) -> Any | None:
for f in fields(Providers):
if self._normalize_name(f.name) == self._normalize_name(name):
return getattr(self, f.name)
for candidate in self.provider_names():
if self._normalize_name(candidate) == self._normalize_name(name):
return getattr(self, candidate)
return None

@staticmethod
def provider_names() -> Generator[str, None, None]:
for f in fields(Providers):
if f.name == "common":
continue
yield f.name

@staticmethod
def _normalize_name(name: str) -> str:
return name.lower().replace("-", "_")
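The `__post_init__` defaulting above can be sketched in isolation (simplified stand-in classes, not the real vunnel config types): each provider's runtime settings fall back to the common `import_results` values only when unset (`None` for the enabled flag, empty/`None` for host and path).

```python
# Simplified sketch of the common-to-provider defaulting behavior;
# Common and Runtime are stand-ins for vunnel's actual config classes.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Common:
    enabled: bool = False
    host: str = ""
    path: str = "{provider_name}/listing.json"


@dataclass
class Runtime:
    import_results_enabled: bool | None = None
    import_results_host: str | None = None
    import_results_path: str | None = None


def apply_defaults(rt: Runtime, common: Common) -> Runtime:
    # None means "not set" for the flag; ""/None mean "not set" for strings
    if rt.import_results_enabled is None:
        rt.import_results_enabled = common.enabled
    if not rt.import_results_host:
        rt.import_results_host = common.host
    if not rt.import_results_path:
        rt.import_results_path = common.path
    return rt


rt = apply_defaults(Runtime(), Common(enabled=True, host="https://example.com"))
print(rt.import_results_path)  # → {provider_name}/listing.json
```

Using `if not rt.import_results_host` (rather than `is None`) is what lets both `""` and `None` fall through to the common default, matching the "make \"\" and None both use default path" commit in this PR.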
89 changes: 89 additions & 0 deletions src/vunnel/distribution.py
@@ -0,0 +1,89 @@
from __future__ import annotations

import datetime
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse

import iso8601
from mashumaro.mixins.dict import DataClassDictMixin

from vunnel import schema as schema_def

DB_SUFFIXES = {".tar.gz", ".tar.zst"}


@dataclass
class ListingEntry(DataClassDictMixin):
# the date this archive was built relative to the data enclosed in the archive
built: str

# the provider distribution version this archive was built with (different than the provider version)
distribution_version: int

# the URL where the vunnel provider archive is located
url: str

# the digest of the archive referenced at the URL.
# Note: all checksums are labeled with "algorithm:value" ( e.g. sha256:1234567890abcdef1234567890abcdef)
distribution_checksum: str

# the digest of the checksums file within the archive referenced at the URL
# Note: all checksums are labeled with "algorithm:value" ( e.g. xxhash64:1234567890abcdef)
enclosed_checksum: str

def basename(self) -> str:
basename = os.path.basename(urlparse(self.url, allow_fragments=False).path)
if not _has_suffix(basename, suffixes=DB_SUFFIXES):
msg = f"entry url is not a db archive: {basename}"
raise RuntimeError(msg)

return basename

def age_in_days(self, now: datetime.datetime | None = None) -> int:
if not now:
now = datetime.datetime.now(tz=datetime.timezone.utc)
return (now - iso8601.parse_date(self.built)).days


@dataclass
class ListingDocument(DataClassDictMixin):
# mapping of provider versions to a list of ListingEntry objects denoting archives available for download
available: dict[int, list[ListingEntry]]

# the provider name this document is associated with
provider: str

# the schema information for this document
schema: schema_def.Schema = field(default_factory=schema_def.ProviderListingSchema)

@classmethod
def new(cls, provider: str) -> ListingDocument:
return cls(available={}, provider=provider)

def latest_entry(self, schema_version: int) -> ListingEntry | None:
if schema_version not in self.available:
return None

if not self.available[schema_version]:
return None

return self.available[schema_version][0]

def add(self, entry: ListingEntry) -> None:
if not self.available.get(entry.distribution_version):
self.available[entry.distribution_version] = []

self.available[entry.distribution_version].append(entry)

# keep listing entries sorted by date (rfc3339 formatted entries, which iso8601 is a superset of)
self.available[entry.distribution_version].sort(
key=lambda x: iso8601.parse_date(x.built),
reverse=True,
)


def _has_suffix(el: str, suffixes: set[str] | None) -> bool:
if not suffixes:
return True
return any(el.endswith(s) for s in suffixes)