Automatic code generation for Scala functions and expressions via the Curry-Howard isomorphism
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
docs
project
src
.gitignore
.travis.yml
LICENSE
README.md
build.sbt
sonatype.sbt

README.md

Build Status Coverage Status License Github Tag Maven Central

Quotation

These are all interesting examples, but is there a practical side to Curry-Howard isomorphism? Probably not in everyday programming. -- Bartosz Milewski (2015), Category Theory for Programmers, Chapter 9: Function types.

The curryhoward library aims to use the Curry-Howard isomorphism as a tool for practical applications.

curryhoward

This is a library for automatic implementation of Scala expressions via the Curry-Howard isomorphism.

Quick start:

// build.sbt
libraryDependencies += "io.chymyst" %% "curryhoward" % "latest.integration"

// Scala
import io.chymyst.ch.implement

// Automatically derive code for this function, based on its type signature:
def f[X, Y]: (X  Y)  (Int  X)  (Int  Y) = implement

// The macro `implement` will automatically generate this code for the function body:
// {
//  (g: X ⇒ Y) ⇒ (r: Int ⇒ X) ⇒ (e: Int) ⇒ g(r(e))
// }

Features

  • Automatically fill in the function body, given the function's type alone (implement)
  • Automatically generate an expression of a specified type from given arguments (ofType)
  • Works as a macro at compile time; when a type cannot be implemented, emit a compile-time error
  • Supports function types, tuples, sealed traits / case classes / case objects
  • Both conventional Scala syntax def f[T](x: T): T and curried syntax def f[T]: T ⇒ T can be used equivalently
  • Java-style argument groups can be used, e.g. A ⇒ (B, C) ⇒ D, in addition to using a tuple type, e.g. A ⇒ ((B, C)) ⇒ D
  • When a type can be implemented in more than one way, heuristics ("least information loss") are used to prefer implementations that are more likely to satisfy algebraic laws
  • Emit a compile-time error when a type can be implemented in more than one way despite using heuristics
  • Debugging and logging options are available
  • DSL for inspecting the generated code (STLC lambda-terms) at run time; facilities are provided for symbolic evaluation and checking equational laws
  • Tutorial explains how to do that in detail

Usage

There are two main functions, implement and ofType.

The function implement works when defining methods, and is used syntactically as a placeholder for the method's code:

import io.chymyst.ch.implement

def f[X, Y]: X  Y  X = implement

// The code `(x: X) ⇒ (y: Y) ⇒ x` is generated for the body of the function `f`.
  
f(123)("abc") // returns 123

// The applicative `map2` function for the Reader monad.
def map2[E, A, B, C](readerA: E  A, readerB: E  B, f: A  B  C): E  C = implement
// Generates the code (e: E) ⇒ f(readerA(e), readerB(e)) 

The function ofType is designed for generating expressions using previously computed values:

import io.chymyst.ch.ofType

case class User(name: String, id: Long)

val x: Int = 123
val s: String = "abc"
val f: Int  Long = _.toLong // whatever

val userId = ofType[User](f, s, x).id
// Generates the expression User(s, f(x)) from previously computed values f, s, x, and returns the `id`

The generated code is purely functional and assumes that all given values and types are free of side effects.

The code is generated by the implement and ofType macros at compile time.

A compile-time error occurs when there are several inequivalent implementations for a given type, or if the type cannot be implemented at all.

See the tutorial for more details.

How it works

The macros implement and ofType examine the given type expression via reflection (at compile time). The type expression is rewritten as a formula the intuitionistic propositional logic (IPL) with universally quantified propositions.

This is possible due to the Curry-Howard isomorphism, which maps functions with fully parametric types to theorems in the (IPL) with universally quantified propositions.

For example, the type signature of the function

def f[X, Y]: X  Y  X = (x: X)  (y: Y)  x

is mapped to the propositional theorem ∀ X : ∀ Y : X ⇒ (Y ⇒ X) in the IPL.

The resulting IPL formula is passed on to an IPL theorem prover. The theorem prover performs an exhaustive proof search to look for possible lambda-terms (terms in the simply-typed lambda-calculus, STLC) that implement the given type. After that, the terms are simplified, so that equivalent terms that are different only by alpha-conversion, beta-conversion, or eta-conversion are eliminated.

Finally, the terms are measured according to their "information loss score", and sorted so that one or more terms with the least information loss are returned (and all other terms ignored). The Scala macro then converts the lambda-term(s) into a Scala expression, which is substituted instead of implement into the right-hand side of the function definition.

All this happens at compile time, so compilation may take longer if a lot of terms are being generated.

If there are additional values available for constructing the expression, the types of those additional values are added as extra function arguments.

For example, if required to implement a type Option[B] given value x of type Option[A] and value f of type A ⇒ Option[B], the library will first rewrite the problem as that of implementing a function type Option[A] ⇒ (A ⇒ Option[B]) ⇒ Option[B] with type parameters A and B. Having obtained a solution, i.e. a term of that type, the library will then apply this term to arguments x and f. The resulting term will be returned as Scala code that uses the given values x and f.

In addition, the curryhoward library provides a DSL for manipulating the generated lambda-calculus terms symbolically. This DSL can be used to rigorously verify algebraic laws (at run time).

The current implementation uses an IPL theorem prover based on the sequent calculus called LJT as presented in:

D. Galmiche, D. Larchey-Wendling. Formulae-as-Resources Management for an Intuitionistic Theorem Prover (1998). In 5th Workshop on Logic, Language, Information and Computation, WoLLIC'98, Sao Paulo.

Some changes were made to the order of LJT rules, and some trivial additional rules implemented, in order to generate as many as possible different implementations, and also in order to support Scala case classes and case objects (i.e. named conjunctions).

The original presentation of LJT is found in:

R. Dyckhoff, Contraction-Free Sequent Calculi for Intuitionistic Logic, The Journal of Symbolic Logic, Vol. 57, No. 3, (Sep., 1992), pp. 795-807.

For a good overview of approaches to IPL theorem proving, see these talk slides:

R. Dyckhoff, Intuitionistic Decision Procedures since Gentzen (2013)

See the youtube presentation for more details about how curryhoward works.

This lecture is a pedagogical explanation of the Curry-Howard correspondence in the context of functional programming.

Unit tests

sbt test

Build the tutorial (thanks to the tut plugin):

sbt tut

Revision history

  • 0.3.7 Implement the typeExpr macro instead of the old test-only API. Detect and use vals from the immediately enclosing class. Minor performance improvements and bug fixes (alpha-conversion for STLC terms). Tests for automatic discovery of some monads.
  • 0.3.6 STLC terms are now emitted for implement as well; the JVM bytecode limit is obviated; fixed bug with recognizing Function10.
  • 0.3.5 Added :@@ and @@: operations to the STLC interpreter. Fixed a bug whereby Tuple2(x._1, x._2) was not simplified to x. Fixed other bugs in alpha-conversion of type parameters.
  • 0.3.4 Reduced verbosity by default. Fixed a bug uncovered during the demo in the February 2018 meetup presentation.
  • 0.3.3 Automatic renaming of type variables in lambda-terms; anyOfType; minor bug fixes.
  • 0.3.2 More aggressive simplification of named conjunctions; a comprehensive lambda-term API with a new tutorial section.
  • 0.3.1 Code cleanups, minor fixes, first proof-of-concept for symbolic law checking via lambda-terms API.
  • 0.3.0 Experimental API for obtaining lambda-terms. Simplified the internal code by removing the type parameter T from AST types.
  • 0.2.4 Support named type aliases and ordering heuristics for conjunctions and disjunctions; bug fixes for conventional function types not involving type parameters and for eta-contraction
  • 0.2.3 Fix stack overflow when using recursive types (code is still not generated for recursive functions); implement loop detection in proof search; bug fixes for alpha-conversion of type-Lambdas
  • 0.2.2 Bug fix for back-transform in rule named-&R
  • 0.2.1 Checking monad laws for State monad; fix some problems with alpha-conversion of type-Lambdas
  • 0.2.0 Implement allOfType; use eta-contraction to simplify and canonicalize terms (simplify until stable); cache sequents already seen, limit the search tree by information loss score to avoid exponential blow-up in some examples
  • 0.1.0 Initial release; support case classes and tuples; support implement and ofType; full implementation of LJT calculus

Status

  • The theorem prover for the full IPL is working and appears to be error-free
  • Heuristics ("least information loss") seem to be working adequately in many cases
  • STLC interpreter is reasonably full-featured
  • 92% unit test coverage
  • Cross-compiled to Scala 2.11.11 and 2.12.4, published on Maven
  • This project is now included in the Scala community build

Roadmap

  • Implement type class derivation for standard type classes (Functor, Contravariant, Foldable, Traversable, Monad, Applicative, etc.) whenever possible and unambiguous; make this compatible with scalaz and cats libraries, so that one can write implicit val x: Functor[P] = implement
  • Keep trying to use curryhoward for production code and see what new features may be desirable

Known limitations

  • No support for non-case classes, classes with non-standard constructors, or class hierarchies other than a flat sealed trait / final case class hierarchies.
  • Very limited support for recursive case classes (including List): generated code may fail and, in particular, cannot contain recursive functions. A non-recursive example that fails to generate sensible code: T ⇒ List[T] (the generated code always returns empty list).
  • Functions with zero arguments are currently not supported, e.g. ofType[Int ⇒ () ⇒ Int] will not compile.
  • If the type contains lots of repeated copies of the same type parameter, or lots of Option[T], heuristics will sometimes fail to produce the desired implementation.

Examples of functionality

The following code examples show how various functions are implemented automatically, given their type.

"Weak" Peirce's law:

def f[A, B]: ((((A  B)  A)  A)  B)  B = implement

"Weak" law of tertium non datur:

def f[A, B]: (Either[A, A  B]  B)  B = implement

Automatic implementations of pure, map, and flatMap for the Reader monad:

def pure[E, A]: A  (E  A) = implement
def map[E, A, B]: (E  A)  (A  B)  (E  B) = implement
def flatMap[E, A, B]: (E  A)  (A  E  B)  (E  B) = implement

Constant (non-parametric) types are treated as type parameters:

def f[A, B]: A  Int  (A, Int) = implement

f("abc")(123) // returns the tuple ("abc", 123)

Tuples, sealed traits, and case classes/case objects are supported, including Option and Either:

def eitherCommut[A, B]: Either[A, B]  Either[B, A] = implement

def eitherAssoc[A, B, C]: Either[A, Either[B, C]]  Either[Either[A, B], C] = implement

Case objects (and case classes with zero-argument constructors) are treated as named versions of the Unit type.

Case classes and sealed traits can be nested and can have type parameters.

Type aliases (type P[T] = ...) are supported as well.

Lambda-terms can be obtained and manipulated symbolically.

type R[X, A] = X  A

def mapReader[X, A, B]: R[X, A]  (A  B)  R[X, B] = implement
// mapReader is now a compiled function of the required type
val mapReaderTerm = mapReader.lambdaTerm
// mapReaderTerm is a lambda-term representing the code of mapReader
mapReaderTerm.prettyPrint // returns the string "a ⇒ b ⇒ c ⇒ b (a c)"

Symbolic computations with lambda-terms can be used for a rigorous verification of equational laws for the generated code. See the tutorial for some examples of such computations.

Supported syntax

Code can be generated based on a given type and possibly on given values:

  1. def f[...](...): ... = implement -- the type and extra values are specified on the left-hand side
  2. ofType[...](...) -- the type and extra values are specified within an expression
  3. allOfType[...](...) -- similar to ofType[...](...), except that now all inequivalent implementations with the lowest information loss are returned
  4. anyOfType[...](...) - similar to allOfType except all found implementations are returned, including those with higher information loss
import io.chymyst.ch._

// Conventional Scala syntax for functions.
def f1[T, U](x: T, y: T  U) : (T, U) = implement

// Fully or partially curried functions are supported.
def f2[T, U](x: T): (T  U)  (T, U) = implement

def f3[T, U]: T  (T  U)  (T, U) = implement

// Generating code within expressions.
final case class User(name: String, id: Long)

val f: Int  Long = _.toLong

ofType[User](123, "abc", f).id // This expression evaluates to `123L`.

If the implement macro is used as the body of a class method, values from the class constructor and val members may be used:

import io.chymyst.ch._

final case class User[A](name: String, id: A) {
  def map[B](f: A  B): User[B] = implement
}

Debugging and logging

The logging options are controlled via the JVM property "curryhoward.log". The value of this property is a comma-separated list of keywords. The full logging is switched on by putting -Dcurryhoward.log=macros,terms,prover,verbose on the Java or SBT command line.

The verbose logging option will print the lambda-term that the macro functions generate, and a warning when more than one term was found. The macros logging option will print the code that the macro functions generate. The prover logging option will print the steps in the proof search, including the new sequents generated by each applied rule. The trace logging option will print more debugging information about the proof search. The terms logging option will print the terms generated, in the short notation.

With none of these options given, only minimal diagnostic messages are printed, with terms in a short notation:

  • information message when a term is returned
  • warning message when several implementations are found with the same (lowest) information loss

Heuristics for choosing different implementations

If the theorem prover finds several alternative implementations of a type, it attempts to find the implementation with the smallest "information loss".

The "information loss" of a term is defined as an integer number being the sum of:

  • the number of (curried) arguments that are ignored by some function,
  • the number of tuple parts that are computed but subsequently not used,
  • the number of case clauses that do not use their arguments,
  • the number of disjunction or conjunction parts that are inserted out of order,
  • the number of times an input value is used (if more than once).

Choosing the smallest "information loss" is a heuristic that enables automatic choice of the correct implementations for a large number of cases (but, of course, not in all cases).

For example, here are correct implementations of pure, map, and flatMap for the State monad:

def pure[S, A]: A  (S  (A, S)) = implement
def map[S, A, B]: (S  (A, S))  (A  B)  (S  (B, S)) = implement
def flatMap[S, A, B]: (S  (A, S))  (A  S  (B, S))  (S  (B, S)) = implement

Note that there are several inequivalent implementations for the State monad's map and flatMap, but only one of them loses no information -- and thus has a higher chance of satisfying the correct laws.

The theorem prover does not check any equational laws. It selects the terms with the smallest level of information loss, in hopes that there will be only one term selected, and that it will be the desired term that satisfies equational laws of whatever sort the users expect.

The theorem prover will generate a (compile-time) error when there are two or more implementations with the smallest level of information loss.

If there are several possible implementations but only one implementation with the smallest level of information loss, the theorem prover will choose that implementation but print a warning message such as

Warning:scalac: type (S → (A, S)) → (A → B) → S → (B, S) has 2 implementations (laws need checking?)

This message means that the resulting implementation is probably the right one, but there was a choice to be made. If there exist some equational laws that apply to this function, the laws need to be checked by the user (e.g. in unit tests).

In some cases, there are several inequivalent implementations that all have the same level of "information loss." The function allOfType returns all these implementations.

An experimental API is provided for examining the lambda-terms corresponding to the returned implementations. The tutorial gives more detail about using that API.

The lambda-term API is experimental because, in particular, it exposes the internal implementation of STLC, which could change in future versions of curryhoward.

Examples of heuristic choice

The "smallest information loss" heuristic allows us to select the "better" implementation in the following example:

def optionId[X]: Option[X]  Option[X] = implement

optionId(Some(123)) == 123
optionId(None) == None

There are two possible implementations of the type Option[X] ⇒ Option[X]: the "trivial" implementation (always returns None), and the "interesting" implementation (returns the same value as given). The "trivial" implementation is rejected by the algorithm because it ignores the information given in the original data, so it has higher "information loss".

Another heuristic is to prefer implementations that use more parts of a disjunction. This avoids implementations that e.g. always generate the Some subtype of Option and never generate None.

Another heuristic is to prefer implementations that preserve the order of parts in conjunctions and disjunctions.

For example,

def f[X]: ((X, X, X))  (X, X, X) = implement

will generate the code for the identity function on triples, that is, ((a, b, c)) ⇒ (a, b, c). There are many other implementations of this type, for example ((a, b, c)) ⇒ (b, c, a). However, these implementations do not preserve the order of the elements in the input data, which is considered as a "loss of information".

The analogous parts-ordering heuristic is used for disjunctions, which selects the most reasonable implementation of

def f[X]: Either[X, X]  Either[X, X] = implement