dagss edited this page Apr 5, 2012 · 13 revisions

Build Artifact Cache

bac is a command-line tool written in Python that manages the build artifact cache directory. The most important aspect of BAC is what it does not do and instead leaves as the responsibility of the caller.

BAC is a tool which is only used by package management systems (or a package developer during debugging).

The BAC store

The BAC store is a directory structure on disk that looks roughly like this:

$bacstore/sources/python-2.7.1.tar.gz  # cached downloaded tarball
$bacstore/sources/numpy.git            # raw git repository
$bacstore/sourcedb/hKUWhBuneltGSN4s0N-LMOpG27Q # see bac fetch command

$bacstore/builds/python-hvfkN-qlp-zhXR3cuerq6jd2Z7g/bin/python # already produced builds
$bacstore/builds/...

$bacstore/tmp/numpy-6dcfXufJLW3J6S-9rRe4vUlBj5g/.... # build in progress or failed

Source fetching

The first feature of BAC is managing downloading source code and labeling it, so that it can be referred to in build instructions:

bac fetch /path/to/bacstore http://python.org/ftp/python/2.7.2/Python-2.7.2.tar.bz2 [optional md5 hash]
bac fetch /path/to/bacstore https://github.com/numpy/numpy.git 72185d34170369ec07e8e84ed18d2f6a814e327a

Each of these commands will

  • Download the given sources to $bacstore/sources (this may be a wget or a git clone or a git remote add ...; git fetch ...)
  • Make a label (such as rDR41po8gfpi5g9cNpYWWk5easQ)
  • Create a database file in $bacstore/sourcedb/$label containing the location of the downloaded source (i.e. a tarfile; or a git repo + a git commit)
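As a rough illustration of the labeling step, the sketch below derives a label from a description of the source. The page does not say how labels are computed. The example labels are 27 characters drawn from the URL-safe base64 alphabet, which is what an unpadded SHA-1 digest would produce, so that scheme is assumed here. The function name `make_source_label` and the record keys are hypothetical.

```python
import base64
import hashlib
import json


def make_source_label(record):
    """Return a short label for a source record (assumed scheme, not BAC's)."""
    # Serialize with sorted keys so equal records always hash identically.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(payload).digest()
    # URL-safe base64 without '=' padding: 20 bytes -> 27 characters.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical record for the git example above.
record = {
    "type": "git",
    "repo": "numpy.git",
    "commit": "72185d34170369ec07e8e84ed18d2f6a814e327a",
}
label = make_source_label(record)
print(label, len(label))
```

Hashing the description rather than the download location means that the same commit fetched twice would map to the same label, which is one plausible reason for using a content-derived label at all.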

The sister command takes a label for a set of sources and unpacks them:

bac unpack rDR41po8gfpi5g9cNpYWWk5easQ [dest-dir]

(Be aware of the git archive command when implementing this.)
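One way the unpack step could look is sketched below. The format of the database file is not specified on this page, so the JSON record with `type`, `name`, and `commit` keys is purely an assumption, as is the function name `unpack`. For git sources, `git archive` writes a tar stream of a single commit to standard output, so no checkout is needed.

```python
import json
import os
import subprocess
import tarfile


def unpack(bacstore, label, dest):
    """Unpack the sources recorded under `label` into `dest` (sketch)."""
    # Assumed layout: sourcedb/<label> is a JSON file describing the source.
    with open(os.path.join(bacstore, "sourcedb", label)) as f:
        record = json.load(f)
    source = os.path.join(bacstore, "sources", record["name"])
    os.makedirs(dest, exist_ok=True)

    if record["type"] == "tarball":
        with tarfile.open(source) as tar:
            tar.extractall(dest)
    elif record["type"] == "git":
        # Stream one commit out of the bare repository as a tar archive.
        proc = subprocess.Popen(
            ["git", "--git-dir=" + source, "archive", "--format=tar",
             record["commit"]],
            stdout=subprocess.PIPE,
        )
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            tar.extractall(dest)
        if proc.wait() != 0:
            raise RuntimeError("git archive failed")
    else:
        raise ValueError("unknown source type: %r" % record["type"])
```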

Building artifacts

The following command:

bac build /path/to/bacstore mybuild.json

ensures that the build specified by mybuild.json is available in the store (building it if necessary) and returns its location. The build specification file is described below.

On success, the command prints the resulting path to the built package on standard output. The package may have been built on the fly or found already existing in the cache. (Some status information should also be provided along the way so that one can tail log files etc.; all output should be easily parseable by a scripted caller.)

In the event of a failure, a directory will be left in $bacstore/tmp that can be inspected for post-mortem debugging. It is the responsibility of the caller to remove this.
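From the point of view of a scripted caller, the contract above might be used as in the following sketch. The command line is the one shown on this page. Two things are assumptions: that failure is signalled by a non-zero exit status, and that the artifact path is the last line of standard output. The function names are hypothetical.

```python
import subprocess


def bac_build_argv(bacstore, spec_path):
    """Command line for `bac build`, as shown above."""
    return ["bac", "build", bacstore, spec_path]


def ensure_built(bacstore, spec_path):
    """Run `bac build` and return the artifact path it prints on stdout."""
    result = subprocess.run(
        bac_build_argv(bacstore, spec_path),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Assumption: failure is signalled by a non-zero exit status.
        # The failed build directory stays in $bacstore/tmp; removing it
        # is the caller's job.
        raise RuntimeError("bac build failed for %s" % spec_path)
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("bac build printed no artifact path")
    # Assumption: the path is the last line; status output may precede it.
    return lines[-1]
```

The caller never needs to know whether a build actually ran; it only consumes the returned path.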

bac check /path/to/bacstore mybuild.json

Checks whether the build is already present in the store.

bac debug /path/to/bacstore mybuild.json

Like build, but only goes through setting up the environment for the build, then echoes the commands that it would have executed had a build been requested, then drops into a shell.

More commands that should be the responsibility of BAC:

  • Garbage collection

Build specification

The build specification fully describes a build, so that it can be hashed. Example:

{ "name" : "mypackage",
  "dependencies" :
    { "numpy" : "numpy-345wr23wrfw3r4w",
      "blas" : "ATLAS-32rasdfasdfasdf",
      "gcc" : "gcc-324qwed32e2q3d",
      "python" : "python-q324rfaewfcwqrf",
      "bash" : "bash-34raewfvq23rw"
    },
  "depend-files" : ["/usr/include/foo.h"],
  "copy-files" : ["/path/to/qsnake/spkgs/mypackage.install"],
  "files" : [
    { "filename" : "build.sh",
      "contents" : [
            "source $gcc/build_env",
            "export USE_FROBNICATOR=True",
            "export CFLAGS=-O3",
            "bash mypackage.install"
      ]
    },
    { "filename" : "mypackage.ini",
      "contents" : [
          "somearg = foo"
      ]
    }
  ],
  "sources" : "6dcfXufJLW3J6S-9rRe4vUlBj5g",
  "build_cmd" : ["$bash/bin/bash", "build.sh"],
  "hash-payload": ["foo"]
}

(Note: the lists of strings under contents are multi-line strings encoded as JSON, one list element per line.)

name
Will be prepended to the hash string in the cache (for human-readability).
dependencies
Specifies existing build artifacts that this build depends on. If these are not already present, the build fails. The full, expanded list of dependencies must be given (BAC does not know that, for example, numpy once upon a time depended on Python). Each dependency will be present in the environment variables when running the build script. These strings are included in the hash.
depend-files
(Unordered list) The contents of these files are included in the hash.
copy-files
(Unordered list) Lists files reachable on the local filesystem. The contents of these files are included in the hash, and the files are copied to the build directory.
files
Files that should be created verbatim in the build directory.
sources
The label for the tarball or git commit for the sources that should be built (as passed to bac unpack). This must have been previously downloaded using bac fetch.
build_cmd
Command to run (with arguments) to do the build and install, provided as a list. Often this runs a script which is given in files. The command (first list element) can either be a command to be looked up in the environment PATH (bash), a full path (starts with /), or a binary from a previously built artifact (starts with $, e.g., $perl/bin/perl, provided perl is given in dependencies).
hash-payload
(Ordered list) Additional bytes to throw into the hash (for whatever purposes the caller decides; e.g. some system state that cannot otherwise be tracked, such as the OS version etc.). Callers are encouraged to use user-readable strings so that it is obvious to a casual developer what is being hashed.
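A minimal sketch of how the fields above could be folded into a single hash follows. The hash function, the field order, and the framing are all assumptions; only the ordered/unordered distinction and the rule that file contents enter the hash come from the descriptions above. Whether file paths themselves should be hashed is not specified, so only contents are used here. The name is prepended for readability rather than hashed.

```python
import base64
import hashlib
import json


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def hash_build_spec(spec, read_file=read_bytes):
    """Return '<name>-<digest>' for a build spec (assumed scheme)."""
    h = hashlib.sha1()

    def feed(tag, data):
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(("%s:%d:" % (tag, len(data))).encode("utf-8"))
        h.update(data)

    # Dependency strings, files, sources and build_cmd are hashed as given.
    for key in ("dependencies", "files", "sources", "build_cmd"):
        feed(key, json.dumps(spec.get(key), sort_keys=True).encode("utf-8"))

    # Unordered lists: sort so that listing order does not change the hash.
    for key in ("depend-files", "copy-files"):
        for path in sorted(spec.get(key, [])):
            feed(key, read_file(path))

    # Ordered list: hashed in the order the caller gave.
    for item in spec.get("hash-payload", []):
        feed("hash-payload", item.encode("utf-8"))

    digest = base64.urlsafe_b64encode(h.digest()).decode("ascii").rstrip("=")
    return "%s-%s" % (spec["name"], digest)
```

The `read_file` parameter exists only so that the function can be exercised without touching the real filesystem.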

The build environment

By specification, the build process should always produce the same results when run, and the caller should not care whether a build happens or the built artifact is found in cache. When the host environment changes in a way that requires a rebuild, the caller should trigger the rebuild by making sure the hash of the build changes. (However, commands for forcing a rebuild may be provided to facilitate debugging.)

Actually achieving build isolation and reproducibility is still the responsibility of BAC's caller. Callers are encouraged to use an LD_PRELOAD sandbox to enforce this. There is no way for BAC to enforce anything.

As hinted at above in the source $gcc/build_env line, there should be ways to quickly set up the necessary build environment provided by the build artifacts one depends on. BAC may lay down some guidelines/conventions/utilities for this, or even build a full throw-away prefix for the build (details TBD).
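Since the details are still to be decided, the following is only a sketch of one possible convention: each dependency name becomes an environment variable pointing at its artifact directory under $bacstore/builds, and the first element of build_cmd is resolved according to the three cases given in the build specification. The function names and the exact variable layout are assumptions.

```python
import os
import shutil


def dependency_env(bacstore, dependencies):
    """Map each dependency name to its artifact directory (assumed layout)."""
    return {
        name: os.path.join(bacstore, "builds", artifact)
        for name, artifact in dependencies.items()
    }


def resolve_command(build_cmd, env):
    """Resolve the first element of build_cmd according to the three cases."""
    cmd = build_cmd[0]
    if cmd.startswith("$"):
        # Binary from a previously built artifact, e.g. $bash/bin/bash.
        name, _, rest = cmd[1:].partition("/")
        if name not in env:
            raise KeyError("%s is not listed in dependencies" % name)
        resolved = os.path.join(env[name], rest)
    elif cmd.startswith("/"):
        resolved = cmd  # full path, used as given
    else:
        resolved = shutil.which(cmd)  # looked up in PATH
        if resolved is None:
            raise FileNotFoundError(cmd)
    return [resolved] + list(build_cmd[1:])
```

A build script could then refer to `$gcc`, `$bash`, and so on, as the example specification does.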

Post-processing

The build artifact directories from a successful build are write-protected. Other post-processing may be deemed necessary too and should be built into BAC.
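Write protection could be implemented along these lines. This is a sketch, not a description of what BAC does: it clears the write bits for user, group, and others on everything under an artifact directory and leaves symlinks alone. The function name is hypothetical.

```python
import os
import stat


def write_protect(root):
    """Remove write permission from every file and directory under root."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    # Walk bottom-up so a directory is locked only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # chmod would follow the link
                os.chmod(path, os.stat(path).st_mode & ~write_bits)
        os.chmod(dirpath, os.stat(dirpath).st_mode & ~write_bits)
```

Garbage collection would need to restore write permission on directories before deleting an artifact.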

Profile building

Borrowing terminology from Nix, a profile is a build artifact that contains symlinks into all of its dependencies.

It may make sense to either build this functionality into BAC, or to do it in a build-script that is fed to BAC the usual way.
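If done in a build script, the core of profile building could be as small as the sketch below. It mirrors the directory tree of each dependency and symlinks every file into the profile. How to handle two dependencies that provide the same file is left open on this page, so the sketch simply raises an error. The function name is hypothetical.

```python
import os


def build_profile(profile_dir, dependency_dirs):
    """Create symlinks in profile_dir to every file in the given artifacts."""
    for dep in dependency_dirs:
        dep = os.path.abspath(dep)  # absolute targets keep the links valid
        for dirpath, dirnames, filenames in os.walk(dep):
            rel = os.path.relpath(dirpath, dep)
            target_dir = os.path.normpath(os.path.join(profile_dir, rel))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                link = os.path.join(target_dir, name)
                if os.path.lexists(link):
                    # Two artifacts provide the same file; policy is TBD.
                    raise FileExistsError(link)
                os.symlink(os.path.join(dirpath, name), link)
```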

Fake build artifacts for host system software

One may wish to have fake build artifacts for system software simply to make the build process simpler for depending packages. If a package needs perl to build, it could access $perl/bin/perl, and then one would need a fake perl build artifact if one wishes to use the system perl.

This should probably be implemented without any support in BAC. E.g., one can call perl -v and put the results in hash-payload or files, and then let the build script be a series of ln -s commands.
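A caller could generate such a specification as sketched below. The field names follow the build specification on this page. The `$PREFIX` variable naming the artifact directory is hypothetical, since this page does not say how a build script learns where to install, and the function name is likewise made up for illustration.

```python
import posixpath


def fake_artifact_spec(name, version_text, links):
    """Build a spec whose 'build' only symlinks host files into the artifact.

    links maps a path inside the artifact (e.g. 'bin/perl') to a host path.
    """
    script = []
    for rel, host_path in sorted(links.items()):
        # $PREFIX is a hypothetical variable naming the artifact directory.
        script.append('mkdir -p "$PREFIX/%s"' % posixpath.dirname(rel))
        script.append('ln -s %s "$PREFIX/%s"' % (host_path, rel))
    return {
        "name": name,
        "dependencies": {},
        "files": [{"filename": "build.sh", "contents": script}],
        "build_cmd": ["bash", "build.sh"],
        # The version output changes the hash when the host perl changes.
        "hash-payload": [version_text],
    }


# The caller would pass the real output of `perl -v` as version_text.
spec = fake_artifact_spec(
    "perl", "<output of perl -v>", {"bin/perl": "/usr/bin/perl"}
)
```

Because the version text is part of hash-payload, upgrading the system perl yields a different artifact hash, and dependent packages are rebuilt as with any other dependency change.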
