
Support downloading GitHub release assets #48

Merged
merged 1 commit on May 17, 2021
73 changes: 46 additions & 27 deletions README.rst
@@ -22,9 +22,9 @@
| `Issues <https://github.com/con/tinuous/issues>`_
| `Changelog <https://github.com/con/tinuous/blob/master/CHANGELOG.md>`_

``tinuous`` is a command for downloading build logs and (for GitHub Actions
only) artifacts for a GitHub repository from GitHub Actions, Travis-CI.com,
and/or Appveyor.
``tinuous`` is a command for downloading build logs and (for GitHub
only) artifacts & release assets for a GitHub repository from GitHub Actions,
Travis-CI.com, and/or Appveyor.

Installation
============
@@ -128,6 +128,13 @@ keys:
current working directory) under which the run's artifacts will be
saved. If this is not specified, no artifacts will be downloaded.

``releases_path``
*(optional)* A template string that will be instantiated for each
(non-draft, non-prerelease) GitHub release to produce the path for
the directory (relative to the current working directory) under
which the release's assets will be saved. If this is not
specified, no release assets will be downloaded.

``workflows``
*(optional)* A list of the filenames for the workflows for which to
retrieve assets. The filenames should only consist of the workflow
@@ -224,11 +231,12 @@ A sample config file:

repo: datalad/datalad
vars:
path_prefix: '{year}/{month}/{day}/{ci}/{type}/{type_id}/{commit}'
path_prefix: '{year}//{month}//{day}/{ci}/{type}/{type_id}/{commit}'
ci:
github:
path: '{path_prefix}/{wf_name}/{number}/'
artifacts_path: '{path_prefix}/{wf_name}/{number}-artifacts/'
path: '{path_prefix}/{wf_name}/{number}/logs/'
artifacts_path: '{path_prefix}/{wf_name}/{number}/artifacts/'
releases_path: '{path_prefix}/'
workflows:
- test_crippled.yml
- test_extensions.yml
@@ -255,50 +263,61 @@ A sample config file:
Path Templates
--------------

The path at which assets for a given workflow run or build job are saved is
determined by instantiating the path template string given in the configuration
file for the corresponding CI system. A template string is a filepath
containing placeholders of the form ``{field}``, where the available
placeholders are:
The path at which assets for a given workflow run, build job, or release are
saved is determined by instantiating the appropriate path template string given
in the configuration file for the corresponding CI system. A template string
is a filepath containing placeholders of the form ``{field}``, where the
available placeholders are:

=================== ==========================================================
Placeholder Definition
=================== ==========================================================
``{year}`` The four-digit year in which the build was started
``{month}`` The two-digit month in which the build was started
``{day}`` The two-digit day in which the build was started
``{hour}`` The two-digit hour at which the build was started
``{minute}`` The two-digit minute at which the build was started
``{second}`` The two-digit second at which the build was started
``{year}`` The four-digit year in which the build was started or the
release was published
``{month}`` The two-digit month in which the build was started or the
release was published
``{day}`` The two-digit day in which the build was started or the
release was published
``{hour}`` The two-digit hour at which the build was started or the
release was published
``{minute}`` The two-digit minute at which the build was started or the
release was published
``{second}`` The two-digit second at which the build was started or the
release was published
``{ci}`` The name of the CI system (``github``, ``travis``, or
``appveyor``)
``{type}`` The event type that triggered the build (``cron``, ``pr``,
or ``push``)
or ``push``), or ``release`` for GitHub releases
``{type_id}`` Further information on the triggering event; for ``cron``,
this is a timestamp for the start of the build; for
``pr``, this is the number of the associated pull request,
or ``UNK`` if it cannot be determined; for ``push``, this
is the name of the branch to which the push was made (or
possibly the tag that was pushed, if using Appveyor)
``{commit}`` The hash of the commit the build ran against
possibly the tag that was pushed, if using Appveyor); for
``release``, this is the name of the tag
``{commit}`` The hash of the commit the build ran against or that was
tagged for the release
``{abbrev_commit}`` The first seven characters of the commit hash
``{number}`` The run number of the workflow run (GitHub) or the build
number (Travis and Appveyor)
number (Travis and Appveyor) [1]_
``{status}`` The success status of the workflow run (GitHub) or job
(Travis and Appveyor); the exact strings used depend on
the CI system
the CI system [1]_
``{common_status}`` The success status of the workflow run or job, normalized
into one of ``success``, ``failed``, ``errored``, or
``incomplete``
``{wf_name}`` *(GitHub only)* The name of the workflow
``incomplete`` [1]_
``{wf_name}`` *(GitHub only)* The name of the workflow [1]_
``{wf_file}`` *(GitHub only)* The basename of the workflow file
(including the file extension)
``{run_id}`` *(GitHub only)* The unique ID of the workflow run
(including the file extension) [1]_
``{run_id}`` *(GitHub only)* The unique ID of the workflow run [1]_
``{job}`` *(Travis and Appveyor only)* The number of the job,
without the build number prefix (Travis) or the job ID
string (Appveyor)
string (Appveyor) [1]_
=================== ==========================================================

.. [1] These placeholders are only available for ``path`` and
``artifacts_path``, not ``releases_path``

All timestamps and timestamp components are in UTC.
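The template mechanics described above can be sketched in a few lines. This is an illustrative reconstruction, not tinuous's actual ``expand_template`` helper: it assumes the placeholders behave like Python ``str.format`` fields and that custom variables (such as ``path_prefix`` from the sample config) are expanded before the main template. The names ``expand``, ``fields``, and ``custom_vars`` are invented for the sketch.

```python
# Hypothetical sketch of path-template instantiation, assuming
# str.format-style placeholder substitution with custom variables
# expanded first. The real expand_template may differ in details.
fields = {
    "year": "2021", "month": "05", "day": "17",
    "ci": "github", "type": "release", "type_id": "v0.5.0",
    "commit": "0123456789abcdef0123456789abcdef01234567",
}
custom_vars = {
    "path_prefix": "{year}//{month}//{day}/{ci}/{type}/{type_id}/{commit}",
}

def expand(template: str, fields: dict, custom: dict) -> str:
    # Instantiate each custom variable against the standard fields,
    # then fill the main template with both sets of values.
    expanded = {name: tmpl.format(**fields) for name, tmpl in custom.items()}
    return template.format(**{**fields, **expanded})

print(expand("{path_prefix}/", fields, custom_vars))
# 2021//05//17/github/release/v0.5.0/0123456789abcdef0123456789abcdef01234567/
```

Note the double slashes survive expansion: in a DataLad-enabled setup they mark boundaries where nested datasets are created.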

Path templates may also contain custom placeholders defined in the top-level
148 changes: 131 additions & 17 deletions src/tinuous/__main__.py
@@ -22,6 +22,7 @@
from datalad.api import Dataset
from dateutil.parser import isoparse
from github import Github
from github.Repository import Repository
from github.Workflow import Workflow
from github.WorkflowRun import WorkflowRun
from in_place import InPlace
@@ -178,17 +179,24 @@ def client(self) -> Github:

@cached_property
def dl_session(self) -> requests.Session:
"""
`requests.Session` used for downloading resources and other actions not
supported by Pygithub
"""
s = requests.Session()
s.headers["Authorization"] = f"token {self.token}"
return s

@cached_property
def ghrepo(self) -> Repository:
return self.client.get_repo(self.repo)

def get_workflows(self) -> Iterator[Workflow]:
repo = self.client.get_repo(self.repo)
if self.workflows is None:
yield from repo.get_workflows()
yield from self.ghrepo.get_workflows()
else:
for wffile in self.workflows:
yield repo.get_workflow(wffile)
yield self.ghrepo.get_workflow(wffile)

def get_assets(
self, event_types: List[EventType], artifacts: bool = False
@@ -263,6 +271,45 @@ def get_artifacts(self, run: WorkflowRun) -> Iterator[Tuple[str, str]]:
yield (artifact["name"], artifact["archive_download_url"])
url = r.links.get("next", {}).get("url")

def get_release_assets(self) -> Iterator["GHReleaseAsset"]:
log.info("Fetching releases newer than %s", self.since)
for rel in self.ghrepo.get_releases():
if rel.draft:
log.info("Release %s is draft; skipping", rel.tag_name)
continue
if rel.prerelease:
log.info("Release %s is prerelease; skipping", rel.tag_name)
continue
ts = ensure_aware(rel.published_at)
if ts <= self.since:
continue
self.register_build(ts, True) # TODO: Set to False for drafts?
log.info("Found release %s", rel.tag_name)
r = self.dl_session.get(
f"https://api.github.com/repos/{self.repo}/git/refs/tags/{rel.tag_name}"
)
r.raise_for_status()
tagobj = r.json()["object"]
if tagobj["type"] == "commit":
commit = tagobj["sha"]
elif tagobj["type"] == "tag":
r = self.dl_session.get(tagobj["url"])
r.raise_for_status()
commit = r.json()["object"]["sha"]
else:
raise RuntimeError(
f"Unexpected type for tag {rel.tag_name}: {tagobj['type']!r}"
)
for asset in rel.get_assets():
yield GHReleaseAsset(
session=self.dl_session,
published_at=ts,
tag_name=rel.tag_name,
commit=commit,
name=asset.name,
download_url=asset.browser_download_url,
)
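The tag-dereferencing branch above covers Git's two tag flavors: a lightweight tag's ref points directly at a commit, while an annotated tag's ref points at a tag object that must be resolved once more. The logic can be isolated as a standalone sketch; the API payloads here are stubbed stand-ins, not real GitHub responses:

```python
# Sketch of resolving a refs/tags/<name> object to a commit SHA,
# mirroring the commit/tag branch in get_release_assets. The "object"
# dicts below imitate the shape of GitHub's Git refs API payloads.
def resolve_tag(ref_obj: dict, fetch) -> str:
    if ref_obj["type"] == "commit":
        # Lightweight tag: the ref already points at the commit.
        return ref_obj["sha"]
    elif ref_obj["type"] == "tag":
        # Annotated tag: follow the tag object's URL to the wrapped commit.
        return fetch(ref_obj["url"])["object"]["sha"]
    raise RuntimeError(f"Unexpected tag object type: {ref_obj['type']!r}")

lightweight = {"type": "commit", "sha": "abc123"}
annotated = {"type": "tag", "sha": "def456", "url": "https://api.example/tagobj"}
stub_responses = {"https://api.example/tagobj": {"object": {"sha": "abc123"}}}

print(resolve_tag(lightweight, stub_responses.get))  # abc123
print(resolve_tag(annotated, stub_responses.get))    # abc123
```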


class GHAAsset(Asset):
session: requests.Session
@@ -416,6 +463,59 @@ def download(self, path: Path) -> List[Path]:
return list(iterfiles(target_dir))


class GHReleaseAsset(BaseModel):
session: requests.Session
published_at: datetime
tag_name: str
commit: str
name: str
download_url: str

class Config:
# To allow requests.Session:
arbitrary_types_allowed = True

def path_fields(self) -> Dict[str, str]:
utc_date = self.published_at.astimezone(timezone.utc)
return {
"year": utc_date.strftime("%Y"),
"month": utc_date.strftime("%m"),
"day": utc_date.strftime("%d"),
"hour": utc_date.strftime("%H"),
"minute": utc_date.strftime("%M"),
"second": utc_date.strftime("%S"),
"ci": "github",
"type": "release",
"type_id": self.tag_name,
"commit": self.commit,
"abbrev_commit": self.commit[:7],
}

def expand_path(self, path_template: str, vars: Dict[str, str]) -> str:
return expand_template(path_template, self.path_fields(), vars)

def download(self, path: Path) -> List[Path]:
target = path / self.name
if target.exists():
log.info(
"Asset %s for release %s already downloaded to %s; skipping",
self.name,
self.tag_name,
target,
)
return []
path.mkdir(parents=True, exist_ok=True)
log.info(
"Downloading asset %s for release %s to %s",
self.name,
self.tag_name,
target,
)
r = self.session.get(self.download_url, stream=True)
stream_to_file(r, target)
return [target]
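The ``stream_to_file`` helper called above is defined elsewhere in the module; a minimal version under the usual ``requests`` streaming pattern (``stream=True`` plus chunked ``iter_content`` writes, so a large release asset never has to fit in memory) might look like the following. This is an assumption about its shape, not the actual implementation, and the ``_FakeResponse`` stub exists only to keep the demo self-contained:

```python
import os
import tempfile

# Hypothetical minimal stream_to_file: write the response body to disk
# in fixed-size chunks rather than buffering the whole asset in memory.
def stream_to_file(r, target):
    r.raise_for_status()
    with open(target, "wb") as fp:
        for chunk in r.iter_content(chunk_size=65536):
            fp.write(chunk)

class _FakeResponse:
    # Stub standing in for a requests.Response in this sketch.
    def raise_for_status(self):
        pass
    def iter_content(self, chunk_size):
        yield b"release "
        yield b"asset"

target = os.path.join(tempfile.mkdtemp(), "asset.bin")
stream_to_file(_FakeResponse(), target)
```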


class Travis(CISystem):
@staticmethod
def get_auth_token() -> str:
@@ -725,6 +825,7 @@ def get_system(self, repo: str, since: datetime, token: str) -> CISystem:

class GitHubConfig(CIConfig):
artifacts_path: Optional[str] = None
releases_path: Optional[str] = None
workflows: Optional[List[str]] = None

@staticmethod
@@ -876,11 +977,10 @@ def fetch(cfg: Config, state: str, sanitize_secrets: bool) -> None:
ds.create(force=True, cfg_proc=cfg.datalad.cfg_proc)
logs_added = 0
artifacts_added = 0
relassets_added = 0
for name, cicfg in cfg.ci.items():
get_artifacts = getattr(cicfg, "artifacts_path", None) is not None
log.info(
"Fetching logs%s from %s", " and artifacts" if get_artifacts else "", name
)
log.info("Fetching resources from %s", name)
try:
since = datetime.fromisoformat(since_stamps[name])
except KeyError:
@@ -898,13 +998,7 @@ def fetch(cfg: Config, state: str, sanitize_secrets: bool) -> None:
else:
raise AssertionError(f"Unexpected asset type {type(obj).__name__}")
if cfg.datalad.enabled:
dspaths = path.split("//")
if "" in dspaths:
raise click.UsageError("Path contains empty '//'-delimited segment")
for i in range(1, len(dspaths)):
dsp = "/".join(dspaths[:i])
if not Path(dsp).exists():
ds.create(dsp, cfg_proc=cfg.datalad.cfg_proc)
ensure_datalad(ds, path, cfg.datalad.cfg_proc)
paths = obj.download(Path(path))
if isinstance(obj, BuildLog):
logs_added += len(paths)
@@ -913,17 +1007,27 @@ def fetch(cfg: Config, state: str, sanitize_secrets: bool) -> None:
sanitize(p, cfg.secrets, cfg.allow_secrets_regex)
elif isinstance(obj, Artifact):
artifacts_added += len(paths)
if isinstance(cicfg, GitHubConfig) and cicfg.releases_path is not None:
assert isinstance(ci, GitHubActions)
for asset in ci.get_release_assets():
path = asset.expand_path(cicfg.releases_path, cfg.vars)
if cfg.datalad.enabled:
ensure_datalad(ds, path, cfg.datalad.cfg_proc)
paths = asset.download(Path(path))
relassets_added += len(paths)
since_stamps[name] = ci.new_since().isoformat()
log.debug("%s timestamp floor updated to %s", name, since_stamps[name])
with open(state, "w") as fp:
json.dump(since_stamps, fp)
log.info("%d logs downloaded", logs_added)
if get_artifacts:
log.info("%d artifacts downloaded", artifacts_added)
if cfg.datalad.enabled and (logs_added or artifacts_added):
log.info("%d artifacts downloaded", artifacts_added)
log.info("%d release assets downloaded", relassets_added)
if cfg.datalad.enabled and (logs_added or artifacts_added or relassets_added):
msg = f"[tinuous] {logs_added} logs added"
if get_artifacts:
if artifacts_added:
msg += f", {artifacts_added} artifacts added"
if relassets_added:
msg += f", {relassets_added} release assets added"
ds.save(recursive=True, message=msg)


@@ -993,5 +1097,15 @@ def iterfiles(dirpath: Path) -> Iterator[Path]:
yield p


def ensure_datalad(ds: Dataset, path: str, cfg_proc: Optional[str]) -> None:
dspaths = path.split("//")
if "" in dspaths:
raise click.UsageError("Path contains empty '//'-delimited segment")
for i in range(1, len(dspaths)):
dsp = "/".join(dspaths[:i])
if not Path(dsp).exists():
ds.create(dsp, cfg_proc=cfg_proc)
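The ``//`` convention that ``ensure_datalad`` enforces can be illustrated in isolation: each ``//`` boundary marks a prefix at which a nested dataset is created before the asset is downloaded. The helper name ``dataset_prefixes`` is invented for this sketch; it reproduces only the splitting logic, not the ``ds.create`` calls:

```python
# Sketch of the '//'-delimited nesting rule from ensure_datalad:
# every prefix ending at a '//' boundary becomes a nested dataset.
def dataset_prefixes(path: str) -> list:
    parts = path.split("//")
    if "" in parts:
        raise ValueError("Path contains empty '//'-delimited segment")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]

print(dataset_prefixes("2021//05//17/github/release/v0.5.0"))
# ['2021', '2021/05']
```

So for the sample config's ``path_prefix`` of ``{year}//{month}//{day}/...``, datasets are nested per year and per month, while the rest of the path is plain directories (``Path`` collapses the doubled slashes when the files are written).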


if __name__ == "__main__":
main()