Skip to content

Latest commit

 

History

History
1740 lines (1223 loc) · 69 KB

database.rst

File metadata and controls

1740 lines (1223 loc) · 69 KB

Database

As of version 0.8.0, Buildbot has used a database as part of its storage backend. This section describes the database connector classes, which allow other parts of Buildbot to access the database. It also describes how to modify the database schema and the connector classes themselves.

Note

Buildbot is only half-migrated to a database backend. Build and builder status information is still stored on disk in pickle files. This is difficult to fix, although work is underway.

Database Overview

All access to the Buildbot database is mediated by database connector classes. These classes provide a functional, asynchronous interface to other parts of Buildbot, and encapsulate the database-specific details in a single location in the codebase.

The connector API, defined below, is a stable API in Buildbot, and can be called from any other component. Given a master master, the root of the database connectors is available at master.db, so, for example, the state connector's getState method is master.db.state.getState.

The connectors all use SQLAlchemy Core to achieve (almost) database-independent operation. Note that the SQLAlchemy ORM is not used in Buildbot. Database queries are carried out in threads, and report their results back to the main thread via Twisted Deferreds.

Schema

The database schema is maintained with SQLAlchemy-Migrate. This package handles the details of upgrading users between different schema versions.

The schema itself is considered an implementation detail, and may change significantly from version to version. Users should rely on the API (below), rather than performing queries against the database itself.

API

types

Identifier

An "identifier" is a nonempty unicode string of limited length, containing only ASCII alphanumeric characters along with - (dash) and _ (underscore), and not beginning with a digit Wherever an identifier is used, the documentation will give the maximum length in characters. The function :pybuildbot.util.identifiers.isIdentifier is useful to verify a well-formed identifier.

buildrequests

double: BuildRequests; DB Connector Component

builds

double: Builds; DB Connector Component

steps

double: Steps; DB Connector Component

logs

double: Logs; DB Connector Component

buildsets

double: Buildsets; DB Connector Component

buildslaves

double: BuildSlaves; DB Connector Component

changes

double: Changes; DB Connector Component

changesources

double: ChangeSources; DB Connector Component

schedulers

double: Schedulers; DB Connector Component

sourcestamps

double: SourceStamps; DB Connector Component

state

double: State; DB Connector Component

users

double: Users; DB Connector Component

masters

double: Masters; DB Connector Component

builders

double: Builders; DB Connector Component

Writing Database Connector Methods

The information above is intended for developers working on the rest of Buildbot, and treating the database layer as an abstraction. The remainder of this section describes the internals of the database implementation, and is intended for developers modifying the schema or adding new methods to the database layer.

Warning

It's difficult to change the database schema significantly after it has been released, and very disruptive to users to change the database API. Consider very carefully the future-proofing of any changes here!

The DB Connector and Components

Direct Database Access

The connectors all use SQLAlchemy Core as a wrapper around database client drivers. Unfortunately, SQLAlchemy is a synchronous library, so some extra work is required to use it in an asynchronous context like Buildbot. This is accomplished by deferring all database operations to threads, and returning a Deferred. The ~buildbot.db.pool.Pool class takes care of the details.

A connector method should look like this:

def myMethod(self, arg1, arg2):
    def thd(conn):
        q = ... # construct a query
        for row in conn.execute(q):
            ... # do something with the results
        return ... # return an interesting value
    return self.db.pool.do(thd)

Picking that apart, the body of the method defines a function named thd taking one argument, a Connection <sqlalchemy:sqlalchemy.engine.base.Connection> object. It then calls self.db.pool.do, passing the thd function. This function is called in a thread, and can make blocking calls to SQLAlchemy as desired. The do method will return a Deferred that will fire with the return value of thd, or with a failure representing any exceptions raised by thd.

The return value of thd must not be an SQLAlchemy object - in particular, any ResultProxy <sqlalchemy:sqlalchemy.engine.base.ResultProxy> objects must be parsed into lists or other data structures before they are returned.

Warning

As the name thd indicates, the function runs in a thread. It should not interact with any other part of Buildbot, nor with any of the Twisted components that expect to be accessed from the main thread -- the reactor, Deferreds, etc.

Queries can be constructed using any of the SQLAlchemy core methods, using tables from ~buildbot.db.model.Model, and executed with the connection object, conn.

Database Schema

Database connector methods access the database through SQLAlchemy, which requires access to Python objects representing the database tables. That is handled through the model.

Caching

Connector component methods that get an object based on an ID are good candidates for caching. The ~buildbot.db.base.cached decorator makes this automatic:

In most cases, getter methods return a well-defined dictionary. Unfortunately, Python does not handle weak references to bare dictionaries, so components must instantiate a subclass of dict. The whole assembly looks something like this:

class ThDict(dict):
    pass

class ThingConnectorComponent(base.DBConnectorComponent):

    @base.cached('thdicts')
    def getThing(self, thid):
        def thd(conn):
            ...
            thdict = ThDict(thid=thid, attr=row.attr, ...)
            return thdict
        return self.db.pool.do(thd)

Tests

It goes without saying that any new connector methods must be fully tested!

You will also want to add an in-memory implementation of the methods to the fake classes in master/buildbot/test/fake/fakedb.py. Non-DB Buildbot code is tested using these fake implementations in order to isolate that code from the database code.

The keys and types used in the return value from a connector's get methods are described in :bbmaster/buildbot/test/util/validation.py, via the dbdict module-level value. This is a dictionary of DictValidator objects, one for each return value.

These values are used within test methods like this:

rv = yield self.db.masters.getMaster(7)
validation.verifyDbDict(self, 'masterdict', rv)

Modifying the Database Schema

Changes to the schema are accomplished through migration scripts, supported by SQLAlchemy-Migrate. In fact, even new databases are created with the migration scripts -- a new database is a migrated version of an empty database.

The schema is tracked by a version number, stored in the migrate_version table. This number is incremented for each change to the schema, and used to determine whether the database must be upgraded. The master will refuse to run with an out-of-date database.

To make a change to the schema, first consider how to handle any existing data. When adding new columns, this may not be necessary, but table refactorings can be complex and require caution so as not to lose information.

Create a new script in :bbmaster/buildbot/db/migrate/versions, following the numbering scheme already present. The script should have an update method, which takes an engine as a parameter, and upgrades the database, both changing the schema and performing any required data migrations. The engine passed to this parameter is "enhanced" by SQLAlchemy-Migrate, with methods to handle adding, altering, and dropping columns. See the SQLAlchemy-Migrate documentation for details.

Next, modify :bbmaster/buildbot/db/model.py to represent the updated schema. Buildbot's automated tests perform a rudimentary comparison of an upgraded database with the model, but it is important to check the details -key length, nullability, and so on can sometimes be missed by the checks. If the schema and the upgrade scripts get out of sync, bizarre behavior can result.

Also, adjust the fake database table definitions in :bbmaster/buildbot/test/fake/fakedb.py according to your changes.

Your upgrade script should have unit tests. The classes in :bbmaster/buildbot/test/util/migration.py make this straightforward. Unit test scripts should be named e.g., test_db_migrate_versions_015_remove_bad_master_objectid.py.

The master/buildbot/test/integration/test_upgrade.py also tests upgrades, and will confirm that the resulting database matches the model. If you encounter implicit indexes on MySQL, that do not appear on SQLite or Postgres, add them to implied_indexes in master/buidlbot/db/model.py.

Database Compatibility Notes

Or: "If you thought any database worked right, think again"

Because Buildbot works over a wide range of databases, it is generally limited to database features present in all supported backends. This section highlights a few things to watch out for.

In general, Buildbot should be functional on all supported database backends. If use of a backend adds minor usage restrictions, or cannot implement some kinds of error checking, that is acceptable if the restrictions are well-documented in the manual.

The metabuildbot tests Buildbot against all supported databases, so most compatibility errors will be caught before a release.

Index Length in MySQL

single: MySQL; limitations

MySQL only supports about 330-character indexes. The actual index length is 1000 bytes, but MySQL uses 3-byte encoding for UTF8 strings. This is a longstanding bug in MySQL - see "Specified key was too long; max key length is 1000 bytes" with utf8. While this makes sense for indexes used for record lookup, it limits the ability to use unique indexes to prevent duplicate rows.

InnoDB has even more severe restrictions on key lengths, which is why the MySQL implementation requires a MyISAM storage engine.

Transactions in MySQL

single: MySQL; limitations

Unfortunately, use of the MyISAM storage engine precludes real transactions in MySQL. transaction.commit() and transaction.rollback() are essentially no-ops: modifications to data in the database are visible to other users immediately, and are not reverted in a rollback.

Referential Integrity in SQLite and MySQL

single: SQLite; limitations

single: MySQL; limitations

Neither MySQL nor SQLite enforce referential integrity based on foreign keys. Postgres does enforce, however. If possible, test your changes on Postgres before committing, to check that tables are added and removed in the proper order.

Subqueries in MySQL

single: MySQL; limitations

MySQL's query planner is easily confused by subqueries. For example, a DELETE query specifying id's that are IN a subquery will not work. The workaround is to run the subquery directly, and then execute a DELETE query for each returned id.

If this weakness has a significant performance impact, it would be acceptable to conditionalize use of the subquery on the database dialect.