dagss edited this page Apr 1, 2012 · 13 revisions

Build Artifact Cache

bac is a command-line tool written in Python that manages the build artifact cache directory. The most important aspect of BAC is what it does not do and instead leaves as the responsibility of the caller.

BAC is meant to be used only by package management systems (or by a package developer during debugging).

The BAC store

The BAC store is a directory structure on disk that looks roughly like this:

$bacstore/sources/python-2.7.1.tar.gz  # cached downloaded tarball
$bacstore/sources/numpy.git            # raw git repository
$bacstore/sourcedb/hKUWhBuneltGSN4s0N-LMOpG27Q # see bac fetch command

$bacstore/builds/python-hvfkN-qlp-zhXR3cuerq6jd2Z7g/bin/python # already produced build
$bacstore/builds/...

$bacstore/tmp/numpy-6dcfXufJLW3J6S-9rRe4vUlBj5g/.... # build in progress or failed

Source fetching

The first feature of BAC is managing downloading source code and labeling it, so that it can be referred to in build instructions:

bac fetch /path/to/bacstore http://python.org/ftp/python/2.7.2/Python-2.7.2.tar.bz2 [optional md5 hash]
bac fetch /path/to/bacstore https://github.com/numpy/numpy.git 72185d34170369ec07e8e84ed18d2f6a814e327a

Each of these commands will

  • Download the given sources to $bacstore/sources (this may be a wget or a git clone or a git remote add ...; git fetch ...)
  • Make a label (such as rDR41po8gfpi5g9cNpYWWk5easQ)
  • Create a database file in $bacstore/sourcedb/$label containing the location of the downloaded source (i.e. a tarfile; or a git repo + a git commit)
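
The labeling and database steps above can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how labels are computed. The sample labels are 27 URL-safe characters, which is consistent with an unpadded base64 encoding of a SHA-1 digest, so the sketch uses that. The function names make_label and record_source, the choice of hashing the URL and commit (rather than, say, the tarball contents), and the JSON layout of the sourcedb entry are all hypothetical.

```python
import base64
import hashlib
import json
import os


def make_label(*parts):
    """Hash the given strings and return a 27-character URL-safe label.

    Assumption: SHA-1 plus unpadded URL-safe base64; the real scheme is not
    specified on this page.
    """
    h = hashlib.sha1()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return base64.urlsafe_b64encode(h.digest()).decode("ascii").rstrip("=")


def record_source(bacstore, url, commit=None):
    """Write a sourcedb entry for an already-downloaded source; return its label."""
    label = make_label(url, commit or "")
    entry = {"url": url, "commit": commit}  # hypothetical entry layout
    dbdir = os.path.join(bacstore, "sourcedb")
    os.makedirs(dbdir, exist_ok=True)
    with open(os.path.join(dbdir, label), "w") as f:
        json.dump(entry, f, sort_keys=True)
    return label
```

The download itself (wget, git clone, git fetch) is left out on purpose; the sketch only covers the bookkeeping that makes a source addressable by label.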

The sister command takes a label for a set of sources and unpacks them:

bac unpack rDR41po8gfpi5g9cNpYWWk5easQ [dest-dir]

(be aware of the git archive command when implementing this).

Building artifacts

The following command:

bac build /path/to/bacstore mybuild.json

ensures that the build specified by mybuild.json is available in the store (building it if necessary), and returns its location. The build specification file is described below.

On success, the command prints the path to the built package on standard output. The package may have been built on the fly or found already existing in the cache. (Some status information should also be emitted along the way, so that one can tail log files etc.; all of it should be easily parseable by a scripted caller.)

In the event of a failure, a directory will be left in $bacstore/tmp that can be inspected for post-mortem debugging. It is the responsibility of the caller to remove this.
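
The caching behaviour described above can be sketched roughly as below. It is not BAC's implementation: the function name ensure_build and the run_build callable are hypothetical, and building under $bacstore/tmp and then renaming into $bacstore/builds is an assumption based on the store layout shown earlier.

```python
import os


def ensure_build(bacstore, artifact_id, run_build):
    """Return the path of the artifact, building it only if it is not cached.

    artifact_id is e.g. 'python-hvfkN-qlp-zhXR3cuerq6jd2Z7g'; run_build(workdir)
    is a hypothetical callable that performs the build and raises on failure.
    """
    final = os.path.join(bacstore, "builds", artifact_id)
    if os.path.isdir(final):
        return final  # cache hit: nothing to do

    work = os.path.join(bacstore, "tmp", artifact_id)
    # Raises FileExistsError if a previous failed build was never cleaned up.
    os.makedirs(work)
    # If run_build raises, `work` is deliberately left behind for post-mortem
    # debugging; removing it is the caller's job.
    run_build(work)

    os.makedirs(os.path.dirname(final), exist_ok=True)
    os.rename(work, final)  # publish only after success
    return final
```

Renaming only after success means a directory under builds/ never holds a half-finished build, which is what lets a plain existence check serve as the cache lookup.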

bac check /path/to/bacstore mybuild.json

Checks whether the build is already present in the store.

bac debug /path/to/bacstore mybuild.json

Like build, but only goes through setting up the environment for the build, then echoes the commands that it would have executed had a build been requested, then drops into a shell.

More commands that should be the responsibility of BAC:

  • Garbage collection

Build specification

The build specification fully describes a build, so that it can be hashed. Example:

{ "name" : "mypackage",
  "dependencies" :
    { "numpy" : "numpy-345wr23wrfw3r4w",
      "blas" : "ATLAS-32rasdfasdfasdf",
      "gcc" : "gcc-324qwed32e2q3d",
      "python" : "python-q324rfaewfcwqrf",
      "bash" : "bash-34raewfvq23rw"
    },
  "depend-files" : ["/usr/include/foo.h"],
  "copy-files" : ["/path/to/qsnake/spkgs/mypackage.install"],
  "files" : [
    {"filename" : "mypackage.ini",
      "contents" : [
          "somearg = foo"
      ]}
  ],
  "sources" : "6dcfXufJLW3J6S-9rRe4vUlBj5g",
  "build" :
      { "interpreter" : "$bash/bin/bash --someflag $1",
        "script" : [
            "source $gcc/build_env",
            "export USE_FROBNICATOR=True",
            "export CFLAGS=-O3",
            "bash mypackage.install"]
     },
  "hash-payload": ["foo"]
}

(Note: a list of strings represents a multi-line string, one element per line, encoded as JSON.)

name
Will be prepended to the hash string in the cache (for human-readability).
dependencies
Specifies existing build artifacts that this build depends upon. If any of them is not already present, the build fails. The full, expanded list of dependencies must be given (BAC does not know that numpy itself once depended on Python). Each dependency is made available through the environment variables when the build script runs. These strings are included in the hash.
depend-files
(Unordered list) These files
copy-files
(Unordered list) Lists files reachable on the local filesystem. The contents of these files are included in the hash, and the files are copied to the build directory.
files
Files that should be created verbatim in the build directory.
sources
The label for the tarball or git commit for the sources that should be built (as passed to bac unpack). This must have been previously downloaded using bac fetch.
build/interpreter
The interpreter command that is used (including arguments). Can be a command to be looked up in the environment PATH (bash), or a full path (starts with /), or it can start with $ and one of the keys of the dependencies dict, in which case one can select a binary from a previously built artifact.
build/script
The script given to the interpreter. If $1 appears in the interpreter command line, the script is saved to a file and that file is passed as that argument. Otherwise, the script is fed on standard input.
hash-payload
(Ordered list) Additional bytes to throw into the hash (for the purposes of the caller)
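
Since the specification exists so that a build can be hashed, a rough sketch of such a hash may help. Everything here is an assumption rather than BAC's actual scheme: the function name hash_build_spec, the use of SHA-1 over a canonical JSON serialization, and in particular the treatment of depend-files (its description above is incomplete, so the sketch simply hashes the contents like copy-files).

```python
import base64
import hashlib
import json


def hash_build_spec(spec, read_file=None):
    """Return '<name>-<digest>' for a build spec dict.

    read_file(path) -> bytes is injectable for testing; by default files are
    read from the local filesystem.
    """
    if read_file is None:
        def read_file(path):
            with open(path, "rb") as f:
                return f.read()

    doc = dict(spec)
    # Unordered lists are sorted so that listing order does not change the
    # hash; file contents, not just paths, go into the hash.
    for key in ("depend-files", "copy-files"):
        doc[key] = [
            [path, hashlib.sha1(read_file(path)).hexdigest()]
            for path in sorted(spec.get(key, []))
        ]
    # sort_keys gives a canonical serialization of dicts; ordered lists such
    # as 'hash-payload' and the build script keep their order.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    tag = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "%s-%s" % (spec["name"], tag)
```

With this scheme, reordering copy-files leaves the result unchanged, while reordering hash-payload or editing a copied file produces a different artifact name.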

The build environment

By specification, the build process should always. When one depends on the host environment, this should be captured in the hash.

However, actually achieving this relies on trusting the packages' build scripts. They are encouraged to use an LD_PRELOAD sandbox for this purpose, but BAC has no way to enforce it.

As hinted at above by the source $gcc/build_env line, there should be ways to quickly set up the necessary build environment provided by the build artifacts one depends on. BAC may lay down some guidelines/conventions/utilities for this, or even build a full throw-away prefix for the build (details TBD).
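
To make the interplay between the dependencies, build/interpreter and build/script fields more concrete, here is a small sketch of how the environment and the interpreter command line could be derived. The helper names build_environment and interpreter_command are hypothetical, and so is the assumption that each dependency key maps to the artifact's directory under $bacstore/builds.

```python
import os
import shlex
import string


def build_environment(bacstore, dependencies):
    """Map each dependency key to the path of its artifact in the store."""
    return {
        key: os.path.join(bacstore, "builds", artifact)
        for key, artifact in dependencies.items()
    }


def interpreter_command(interpreter, env, script_path):
    """Expand $dep references and $1 in the interpreter line.

    Returns (argv, use_stdin). use_stdin is True when the interpreter line
    contains no $1, in which case the script should be fed on standard input.
    """
    argv = []
    use_stdin = True
    for word in shlex.split(interpreter):
        if word == "$1":
            argv.append(script_path)
            use_stdin = False
        else:
            # safe_substitute leaves unknown $names untouched.
            argv.append(string.Template(word).safe_substitute(env))
    return argv, use_stdin
```

For the example specification, the interpreter line `$bash/bin/bash --someflag $1` would expand to the bash binary inside the bash artifact, followed by `--someflag` and the path of the saved script.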

Post-processing

The build artifact directories from a successful build are write-protected.
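
One plausible way to write-protect an artifact directory is to clear the write permission bits on everything below it. The sketch below is only an illustration; the function name write_protect is hypothetical and the page does not say which mechanism BAC uses.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _clear_write_bits(path):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def write_protect(root):
    """Clear the write bits on root and everything below it."""
    # Walk bottom-up so a directory is locked only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would follow the link to its target
            _clear_write_bits(path)
    _clear_write_bits(root)
```

Note that garbage collection would have to restore write permission on the directories before it can delete such an artifact.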

Profile building

Borrowing terminology from Nix, a profile is a build artifact that contains symlinks into all of its dependencies.

It may make sense to either build this functionality into the bac executable, or to do it in a build-script that is fed to BAC the usual way.
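
Whichever option is chosen, the core of profile building is a symlink farm. A minimal sketch follows, assuming a profile mirrors the directory layout of each dependency and links individual files; the function name build_profile and the policy of failing on conflicting paths are assumptions, not part of the design above.

```python
import os


def build_profile(profile_dir, dependency_paths):
    """Symlink every file of every dependency into profile_dir.

    dependency_paths is a list of artifact directories. Raises FileExistsError
    if two dependencies provide the same relative path.
    """
    for dep in dependency_paths:
        dep = os.path.abspath(dep)  # absolute targets keep the links valid
        for dirpath, dirnames, filenames in os.walk(dep):
            rel = os.path.relpath(dirpath, dep)
            target_dir = os.path.normpath(os.path.join(profile_dir, rel))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                os.symlink(os.path.join(dirpath, name),
                           os.path.join(target_dir, name))
```

Linking files rather than whole directories lets several dependencies contribute to the same bin/ or lib/ directory in the profile.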

Fake build artifacts for host system software

One may wish to have fake build artifacts for system software simply to make the build process simpler for depending packages. If a package needs perl to build, it could access $perl/bin/perl, and then one would need a fake perl build artifact if one wishes to use the system perl.

This should probably be implemented without any support in BAC. E.g., one can call perl -v and put the results in hash-payload or files, and then let the build script be a series of ln -s commands.
