Motivation

chrisvana edited this page Dec 2, 2014 · 39 revisions

This project was primarily a learning process for me as I tried integrating open source software, and it tries to address the frustrations I discovered along the way.

Background

Fresh out of college, I spent ~8 years working at Google. As a result, my exposure to open source software (until now) has been relatively limited. I had a few surprises.

The highlights:

  • Integrating open source software is a pain.
  • Open source software is skewed heavily to lower-complexity libraries.

Anecdote

In March of 2013, Google announced that it was shutting down Google Reader. Some folks were understandably annoyed by this, and asked Google to open source the software rather than shut it down completely. While a reasonable request, it was also extremely naive from a technical standpoint.

Even if the folks maintaining Reader had wanted to open source the code (keep in mind a lot of folks at Google did not want to see it go either), doing it would have been a gargantuan task. This really boils down to how software development happens inside of a large company like Google.

The build system inside of Google (see Google's blog post) makes it incredibly easy to build software using large modular blocks of code. You want a crawler? Add a few lines here. You need an RSS parser? Add a few more lines. A large distributed, fault tolerant datastore? Sure, add a few more lines. These are building blocks and services that are shared by many projects, and easy to integrate.

Your build file might look something like this:

cc_library(name = "foo",
           srcs = [ "my_rss_fetcher.cc" ],
           hdrs = [ "my_rss_fetcher.h" ],
           deps = [ "//crawler:service", "//strings:xml_parser", "//bigtable:client" ])

That 4-line BUILD file now depends on maybe a few hundred thousand lines of code spread out over hundreds of sub-projects. Now imagine that someone asks you to open source it.
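
To make the scale argument concrete, here is a minimal Python sketch of how a target's transitive dependency closure grows past its direct deps. Only the three direct deps of the target come from the BUILD file above; the label ":foo", every other label in the graph, and the transitive_deps helper are hypothetical, invented purely for illustration. It does not describe how Google's build system or Repobuild actually resolves dependencies.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: target label -> direct deps.
# Only ":foo" -> its three deps mirrors the BUILD file above;
# all deeper edges are made up for this sketch.
DEPS: Dict[str, List[str]] = {
    ":foo": ["//crawler:service", "//strings:xml_parser", "//bigtable:client"],
    "//crawler:service": ["//net:http", "//strings:url"],
    "//strings:xml_parser": ["//strings:unicode"],
    "//bigtable:client": ["//net:rpc", "//storage:sstable"],
    "//net:http": ["//net:rpc"],
    "//strings:url": ["//strings:unicode"],
    "//net:rpc": [],
    "//strings:unicode": [],
    "//storage:sstable": [],
}


def transitive_deps(target: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every label reachable from target, excluding target itself."""
    seen: Set[str] = set()
    stack = list(graph.get(target, []))
    while stack:
        label = stack.pop()
        if label in seen:
            continue  # shared deps are only counted once
        seen.add(label)
        stack.extend(graph.get(label, []))
    return seen


if __name__ == "__main__":
    direct = DEPS[":foo"]
    closure = transitive_deps(":foo", DEPS)
    print(len(direct), "direct deps,", len(closure), "transitive deps")
```

Even in this toy graph the closure is several times larger than the list of deps written in the BUILD file; in a real codebase each of those labels stands for its own sources and its own deps, which is where the hundreds of thousands of lines come from.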

You might be able to release my_rss_fetcher.cc, but that really does not help anyone without the 300K other lines of code it requires to compile. A project like Reader might have 100 such modules (everything from the crawl to UI templating and JavaScript), and depend on millions of lines of code.

Here is the upside: Because of all the available libraries, it took maybe a few hours to write a simple, scalable, fault-tolerant RSS fetcher. And look, we created another library someone else inside of Google could integrate into their project. Imagine what happens when you have thousands of engineers building re-usable modules (on top of other modules) over the course of a decade. Turtles (awesome turtles) all the way down. Or maybe more like the Galactic Library (Uplift Saga).

Open source

This sort of Lego-like development process does not happen as cleanly in the open source world. Some tools (e.g. Maven) do manage to make this a lot easier, at least under certain controlled conditions. Hosted source control (e.g. GitHub) has also lowered the effort required to do open source development. In practice, though, what it truly enabled (speculating) was a large number of small projects rather than an increase in individual project complexity.

As a result of this state of affairs (more speculation), there is a complexity barrier in open source that has not changed significantly in the last few years. This creates a gap between what is easily obtainable at a company like Google versus an open source project. Not to say that something like Reader could not be done as an open source project, just that it would be significantly more difficult. Further, if one did build an open source Reader, subcomponents would not be easily reusable to bootstrap other projects without a lot more effort.

This is something of a shame. There are a huge number of open source developers in the world (citation needed). A modular build system designed to take advantage of open source development should allow complexity far beyond what is possible at an organization like Google. This does not exist yet (open source does not surpass large company development), but it should be attainable.

This build system

Repobuild, at its core, is a step towards enabling modular development. It uses BUILD configuration files (like many build systems) that are designed to be relatively easy to read, write, and plug together. This sacrifices a lot of the scriptability that other build tools enable, but makes it easier to connect disparate projects without requiring a lot of context on how they work.

In summary, I wanted a build system that would:

  • Allow modular development
  • Have simple, readable declarations
  • Be declarative, not procedural (procedural build files tend to get too messy and hard to read/modify)
  • Not be a real programming language (same reason as above)
  • Make it easy to build a large interconnected codebase on something as amorphous as the web
  • Enable complex open source libraries to emerge
  • Avoid requiring any sort of configuration magic (like pre-installing libraries on your system)

Some similar build systems come very close, and are quite good. They tend to tip slightly towards scriptable, procedural messiness. Many are oriented around a single language. If useful, though, Repobuild should be able to wrap them.

Goals

This build system is functional and used for real products. At a minimum, though, I hope this provides a proof of concept for other build systems. Ideally, this system eventually provides the required functionality for large open source projects.

Please take a look at the main page, or try one of the tutorials to get started.