The plan for phase 0

James Geddes edited this page Jan 17, 2019 · 8 revisions

Background

This note is a first pass at a plan for phase 0 of nocell. The aim of phase 0 is to produce, by the end of March 2019, some kind of prototype demonstration.

Nocell in brief

Nocell is intended to be a programming language, suitable for building the sorts of models that are presently built as spreadsheets by government and business. The idea is that a nocell program will “compile to” a spreadsheet. In this way, we hope to obtain the benefits of text-based programming languages (modularity, abstraction, code re-use, version control, documentation) without losing the benefits of spreadsheet modelling (widespread adoption and understanding, perceived simplicity, immediacy).

Furthermore, nocell is intended to be a probabilistic programming language, supporting inference and prediction.

The intended user of nocell is already a reasonably sophisticated user of Excel who may not have significant programming experience.

Desiderata

The following are somewhat inchoate thoughts without a great deal of technical detail, intended mostly to provide some inspiration.

Things that spreadsheets do well

There are, presumably, good reasons for the popularity of spreadsheets as a sort of end-user programming tool. It would be really useful not to lose these benefits. I am not entirely sure what those reasons are but the following seem to me to be worthy of some thought:

  1. Immediacy. A spreadsheet is at once a model and the output of the execution of that model. Changing a cell immediately changes all dependent cells and the impact is immediately visible. This sort of feedback seems to help users a lot. Can we do something similar? Perhaps along the lines of Bret Victor’s work? One thing in our favour is that the sort of programs you write in spreadsheets are guaranteed to terminate (and in reasonably short order, too).
  2. Deferred thinking. One of the nice things about Excel is that you can build models bit by bit. Here are some examples:
    1. You can refer to cells that don’t yet have things in them;
    2. You can insert cells (which move all the other cells around) and then relink the dependency graph;
    3. You don’t have to think about naming things until later.

    These are all ways of deferring having to think about something until you are in a better place to think about it, and are therefore perhaps a kind of “top-down programming” technique.

    See for example the Hazel language, which provides a live programming environment for programs with holes.

  3. Concreteness. People struggle with abstraction. In a spreadsheet, one doesn’t define functions: one instead writes down some computation on specific numbers (or at any rate specific cells) and then abstracts the function (for example, by copying it across a row). I’m not really sure if there’s anything to be done here, since I am also claiming that abstraction is a huge part of what programming actually is. (By the way, this talk was very interesting: Eugenia Cheng, “Conveying the power of abstraction”. And maybe Microsoft’s work on Programming by Example?)

Things that are good about programming

On the other hand, I don’t want to lose what’s great about programming. In particular, programs are written in some language and represented as a text file, editable in any editor (although some editors or IDEs may offer useful interactions like instantaneous recompilation, tooltips, type-driven completion, and so on).

How can we win?

Finally, my sense is that for a novel system to succeed, there must be at least one thing it does that couldn’t be done otherwise. People will put up with novelty if it enables them to do something they couldn’t do before, or anyway couldn’t do without a huge amount of effort.

For example, ssh would set your DISPLAY environment variable to tunnel your remote X session back over the ssh connection. More pertinently, spreadsheet programs themselves (specifically VisiCalc) were the reason you bought and put up with owning a computer.

The “real” feature of nocell is that it’s a programming language, having abstraction, modularity, and so on. But I suspect that, at least on first glance, the intended user will see those features as part of the learning curve. We must tempt them with something else. Some ideas for this are:

  1. Well-designed (instant!), automatic formatting for standard types of structures. We might be able to lay out functions and tables nicely (with rules and so on) with little effort from the user. We could generate output for different backends (Excel, HTML, text, …). In any case, nocell should have some concept of stylesheets.
  2. Probabilistic programming. Given that the model is very restricted we could probably write quite a good inference engine. It would be very nice if the user could start by writing a deterministic model and then “upgrade” to a probabilistic one. We need to figure out some way of representing distributions in Excel.
  3. A library of straightforward solutions for common difficult problems. For example “looking things up in a table” (or relational operations in general) is a common task, usually solved with VLOOKUP (or, slightly better, INDEX/MATCH).
  4. Size. What makes a large spreadsheet large is often simply that the same formulae and structures are repeated over and over again. I’m imagining that, say, the power sector in the 2050 Calculator could be a single page of code, most of which would be comments or documentation. (Although, I suppose this is really just a restatement of the advantages of a programming language.)
  5. Documentation. cf. Literate programming, but I don’t have a good sense of how this might work.
  6. Physical units (and other types). Given that the language is likely to be non–Turing complete, perhaps we could offer a lot of help for the user by doing type inference, ranging from checking (and converting!) physical units, to helping out with incomplete programs, emitting good error messages, and so on.

    Can we have no compile-time errors?! (Similar to the way in which Elm has no run-time errors, but even more so.) In other words, any input would produce some output (ideally, as one is typing).
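As a rough illustration of the “checking (and converting!) physical units” idea in item 6, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `UNITS` table, the `Quantity` class, and the choice to express results in the left-hand operand’s unit are hypothetical and not part of any nocell design.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical unit table: unit name -> (dimension, factor to the base unit).
# A real system would need compound units; this table is illustrative only.
UNITS: Dict[str, Tuple[str, float]] = {
    "W": ("power", 1.0),
    "MW": ("power", 1e6),
    "GW": ("power", 1e9),
    "h": ("time", 1.0),
    "day": ("time", 24.0),
}


@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit drawn from UNITS."""
    value: float
    unit: str

    def to(self, unit: str) -> "Quantity":
        """Convert to another unit of the same dimension, or raise TypeError."""
        dim_from, factor_from = UNITS[self.unit]
        dim_to, factor_to = UNITS[unit]
        if dim_from != dim_to:
            raise TypeError(
                f"cannot convert {self.unit} ({dim_from}) to {unit} ({dim_to})"
            )
        return Quantity(self.value * factor_from / factor_to, unit)

    def __add__(self, other: "Quantity") -> "Quantity":
        # Checking and converting happen in one step: the right-hand operand
        # is converted to the left-hand unit, which fails on a mismatch.
        return Quantity(self.value + other.to(self.unit).value, self.unit)


total = Quantity(1.0, "GW") + Quantity(500.0, "MW")
```

Here `total` is expressed in GW, while adding a power to a time, such as `Quantity(1.0, "GW") + Quantity(1.0, "h")`, raises a `TypeError`. In nocell the same check would presumably happen during type inference rather than at evaluation time.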

Architecture

A spreadsheet may be thought of as a very restricted functional programming language. Every non-empty cell is either a value or a function of the values of a fixed set of other cells. The directed graph of these dependencies does not contain cycles.[fn:1] Thus, I propose that we define this language, and implement nocell as a compiler to this language. In fact, I think it makes sense to have a three-level architecture, as follows, from the lowest level to the highest:

grid
“Bytecode for a virtual spreadsheet.” Grid should express those features of spreadsheets that we wish to support: cells, arranged in two-dimensional grids; built-ins corresponding to spreadsheet functions; values, likely to be string, float (perhaps integer), and error; formatting (possibly semantic). It may contain other features (e.g., ranges that can be collapsed) if we want to support those in some system.

The point of grid is to separate the “compile to a spreadsheet” challenge from the “produce an Excel file” challenge.

cell
An intermediate language intended to be an abstract model of “generalised spreadsheets”. Not Turing-complete, possibly by allowing structural recursion only (e.g., maps and folds) over structures of bounded size. Perhaps typed (cf. Haskell Core, where the claim is that being fully typed is a good check on the compiler).

nocell
Whatever the proper language is. “Runtime” means “compile to cell.”
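
To make the underlying model concrete (every non-empty cell is either a value or a function of a fixed set of other cells, with no cycles in the dependency graph), here is a minimal sketch in Python. The names `Formula`, `evaluate`, and `sheet` are hypothetical and chosen only for illustration; this is not a proposal for the actual representation at any of the three levels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Formula:
    """A cell computed from a fixed tuple of other named cells."""
    inputs: Tuple[str, ...]
    fn: Callable[..., float]


Cell = Union[float, Formula]


def evaluate(sheet: Dict[str, Cell]) -> Dict[str, float]:
    """Evaluate every cell, raising ValueError on a circular reference."""
    values: Dict[str, float] = {}
    in_progress: Set[str] = set()

    def value_of(name: str) -> float:
        if name in values:
            return values[name]  # already computed
        if name in in_progress:
            raise ValueError(f"circular reference involving {name!r}")
        in_progress.add(name)
        cell = sheet[name]
        if isinstance(cell, Formula):
            result = cell.fn(*(value_of(i) for i in cell.inputs))
        else:
            result = float(cell)
        in_progress.discard(name)
        values[name] = result
        return result

    for name in sheet:
        value_of(name)
    return values


sheet: Dict[str, Cell] = {
    "price": 2.5,
    "quantity": 4.0,
    "total": Formula(("price", "quantity"), lambda p, q: p * q),
}
results = evaluate(sheet)
```

Because each cell depends on a fixed, finite set of other cells and cycles are rejected, evaluation always terminates, which is the property that makes the restricted language attractive as a compilation target.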

Notes on version alpha

  1. One output sheet only.
  2. All references are named, and are converted to relative references in the output.
  3. Version 0 of nocell is whatever the JAGS language is.
  4. Version 0 of cell is whatever looks intermediate between JAGS and grid.
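
Item 2 above (named references converted to relative references) might be sketched as follows, assuming the compiler has already assigned each name a position on the sheet. The `layout` mapping and the function names are hypothetical; the output uses Excel’s R1C1 notation, in which `R[-1]C` means “one row up, same column”.

```python
import re
from typing import Dict, Tuple

Position = Tuple[int, int]  # (row, column)


def _axis(letter: str, offset: int) -> str:
    # R1C1 notation omits the brackets when the offset is zero.
    return letter if offset == 0 else f"{letter}[{offset}]"


def relative_ref(target: Position, origin: Position) -> str:
    """R1C1-style reference to `target` as seen from the cell at `origin`."""
    return (_axis("R", target[0] - origin[0])
            + _axis("C", target[1] - origin[1]))


def to_relative(formula: str, origin: Position,
                layout: Dict[str, Position]) -> str:
    """Replace each named reference in `formula` with a relative reference."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name not in layout:
            return name  # leave function names and unknown identifiers alone
        return relative_ref(layout[name], origin)

    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, formula)


# Hypothetical layout: three named cells stacked in one column.
layout = {"price": (0, 1), "quantity": (1, 1), "total": (2, 1)}
rewritten = to_relative("price * quantity", layout["total"], layout)
```

With this layout, the formula for `total` becomes `R[-2]C * R[-1]C`. A real implementation would work on a parsed expression rather than on text, so that a name could never be confused with a built-in function.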

Further reading

See Further reading.

Footnotes

[fn:1] Well, there is a notion of “circular references” in Excel but I propose to leave that for another day.