Connect processes into powerful data pipelines with a simple git-like filesystem interface
Latest commit 20ba8d2 (Feb 20, 2017) by talex5, committed on GitHub: Merge pull request #489 from talex5/history (Record history of builds)
api go: Do not squash enoent in Snapshot.Read into "" Oct 21, 2016
bridge Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
ci ci: indicate when something can be rebuilt Feb 20, 2017
doc Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
examples/ocaml-client Replace error strings in client API with variants Feb 3, 2017
pkg ci: fix the META file after the datakit-github change in #480 Feb 9, 2017
scripts Move all of the github code into bridge/github Aug 18, 2016
src Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
tests Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
.dockerignore Update .dockerignore Jan 24, 2017
.gitattributes Add a .gitattributes file May 19, 2016
.gitignore Add more files to .gitignore Aug 19, 2016
.merlin bridge: enable Prometheus monitoring Jan 19, 2017
9p.md Update README Feb 2, 2017
CHANGES.md More CHANGELOG updates Feb 9, 2017
CONTRIBUTING.md Add a CONTRIBUTING file May 17, 2016
Dockerfile self-ci: test DataKit and Prometheus too Jan 25, 2017
Dockerfile.bridge-local-git Move Prometheus libraries into a separate repository Feb 7, 2017
Dockerfile.ci ci: add link to build history on GitHub Feb 20, 2017
Dockerfile.client Move Prometheus libraries into a separate repository Feb 7, 2017
Dockerfile.github Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
Dockerfile.server Move Prometheus libraries into a separate repository Feb 7, 2017
LICENSE.md Switch to topkg Jun 28, 2016
MAINTAINERS convert maintainers file to toml May 18, 2016
Makefile Add a multi-release helper Feb 10, 2017
README.md Update README Feb 2, 2017
_tags Split datakit-github into datakit-github and datakit-bridge-github Feb 9, 2017
appveyor.yml ci: pin dev packages in CircleCI and Appveyor using a .dev version Feb 9, 2017
check-libev.ml Updates for Lwt 3 API changes Jan 11, 2017
circle.yml ci: pin dev packages in CircleCI and Appveyor using a .dev version Feb 9, 2017
datakit-bridge-github.descr Add missing descr files Feb 10, 2017
datakit-bridge-github.opam Add constraint to uri version for datakit-bridge-github Feb 14, 2017
datakit-bridge-local-git.descr Add missing descr files Feb 10, 2017
datakit-bridge-local-git.opam Update opam constraints Feb 9, 2017
datakit-ci.descr Add missing descr files Feb 10, 2017
datakit-ci.opam Improve descr/opam files Feb 10, 2017
datakit-client.descr Improve descr/opam files Feb 10, 2017
datakit-client.opam Improve descr/opam files Feb 10, 2017
datakit-github.descr datakit-github: fix opam description Feb 10, 2017
datakit-github.opam github: add a constraint for uri version Feb 14, 2017
datakit-server.descr Update opam package descriptions Oct 3, 2016
datakit-server.opam Improve descr/opam files Feb 10, 2017
myocamlbuild.ml Add library for writing DataKit-based Continuous Integration systems Oct 27, 2016
opam Update opam constraints Feb 9, 2017

README.md

DataKit -- Orchestrate applications using a Git-like dataflow

DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data.

DataKit is currently used as the coordination layer for HyperKit, the hypervisor component of Docker for Mac and Windows, and for the DataKitCI continuous integration system.



There are several components in this repository:

  • src contains the main DataKit service. This is a Git-like database to which other services can connect.
  • ci contains DataKitCI, a continuous integration system that uses DataKit to monitor repositories and store build results.
  • ci/self-ci is the CI configuration for DataKitCI that tests DataKit itself.
  • bridge/github is a service that monitors repositories on GitHub and syncs their metadata with a DataKit database. For example, when a pull request is opened or updated, the bridge commits that information to DataKit. Conversely, if you commit a status message to DataKit, the bridge pushes it to GitHub.
  • bridge/local is a drop-in replacement for bridge/github that just monitors a local Git repository. This is useful for local testing.

Quick Start

The easiest way to use DataKit is to start both the server and the client in containers.

To expose a Git repository as a 9p endpoint on port 5640 on a private network, run:

$ docker network create datakit-net # create a private network
$ docker run -it --net datakit-net --name datakit -v <path/to/git/repo>:/data docker/datakit

Note: The --name datakit option is mandatory. It will allow the client to connect to a known name on the private network.

You can then start a DataKit client, which will mount the 9p endpoint and expose the database as a filesystem API:

# In another terminal
$ docker run -it --privileged --net datakit-net docker/datakit:client
$ ls /db
branch     remotes    snapshots  trees

Note: the --privileged option is needed because the container will have to mount the 9p endpoint into its local filesystem.

Now you can explore, edit and script /db. See the Filesystem API for more details.
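As a minimal sketch of scripting against the mounted database, the helper below checks that a mount point exposes the four top-level directories shown in the `ls /db` output above. The function name check_db_layout is hypothetical and not part of DataKit; only the directory names and the default /db path come from this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataKit): verify that a mounted
# database exposes the top-level directories listed by `ls /db`.
check_db_layout() {
    db="${1:-/db}"   # mount point; /db is the path used in the client container
    rc=0
    for entry in branch remotes snapshots trees; do
        if [ -d "$db/$entry" ]; then
            echo "ok: $db/$entry"
        else
            echo "missing: $db/$entry"
            rc=1
        fi
    done
    # Return non-zero if any expected directory was absent.
    return "$rc"
}
```

Inside the client container, calling check_db_layout with no argument inspects /db; a non-zero exit status would suggest the 9p endpoint is not mounted as expected.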

Building

The easiest way to build the DataKit project is to use Docker (which is what the start-datakit.sh script does under the hood):

docker build -t docker/datakit:server -f Dockerfile.server .
docker build -t docker/datakit -f Dockerfile .
docker run -p 5640:5640 -it --rm docker/datakit --listen-9p=tcp://0.0.0.0:5640

These commands will expose the database's 9p endpoint on port 5640.

If you want to build the project from source without Docker, you will need to install OCaml and opam. Then run:

$ make depends
$ make && make test

For information about command-line options:

$ datakit --help

Prometheus metric reporting

Run with --listen-prometheus 9090 to expose metrics at http://*:9090/metrics.

Note: there is no encryption and no access control. You are expected to run the database in a container and not to expose this port to the outside world. To collect the metrics, either run a Prometheus service in a container on the same Docker network, or, if you want to collect metrics remotely, front the service with nginx or a similar proxy.
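To check by hand that metrics are being served, a sketch along the following lines may help. The function names metrics_url and fetch_metrics are hypothetical and not part of DataKit. The sketch assumes that curl is available, that the server was started with --listen-prometheus 9090, and that it is reachable under the container name datakit used in the Quick Start.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DataKit).

# Build the metrics URL for a given host and port.
metrics_url() {
    host="${1:-datakit}"   # assumed: container name from the Quick Start
    port="${2:-9090}"      # port passed to --listen-prometheus
    echo "http://$host:$port/metrics"
}

# Fetch the metrics over plain HTTP (no encryption, as noted above).
# Assumes curl is installed; fails with a non-zero status on HTTP errors.
fetch_metrics() {
    curl --silent --fail "$(metrics_url "$@")"
}
```

From a container attached to the same Docker network, running fetch_metrics datakit 9090 should print the metrics text if the endpoint is up.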

Language bindings

  • Go bindings are in the api/go directory.
  • OCaml bindings are in the api/ocaml directory. See examples/ocaml-client for an example.

Licensing

DataKit is licensed under the Apache License, Version 2.0. See LICENSE.md for the full license text.