A simple program specializer (based on partial evaluation) for a subset of Scheme. It includes a binding time analyzer, a residual program generator and an arity raiser.
Racket
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
examples
recurs
tests
.gitignore
.travis.yml
readme.rst
tests.rkt
unmix.rkt
x-ann.rkt
x-ann.sex
x-macro.rkt
x-match.rkt
x-match.sex
x-misc.rkt
x-synt.rkt
xann.rkt
xann.sex
xar.rkt
xar.sex
xaraa.rkt
xaraa.sex
xarps.rkt
xarps.sex
xarta.rkt
xarta.sex
xcgr.rkt
xcgr.sex
xctmw.rkt
xctmw.sex
xctmwrl.rkt
xctmwrl.sex
xensg.rkt
xensg.sex
xfcd.rkt
xfcd.sex
xgen.rkt
xgen.sex
xggg.rkt
xio.rkt
xmainpe.rkt
xmainpe.sex
xmenu.rkt
xpcd.rkt
xpcd.sex
xpe.ann
xpe.rkt
xpe.sex
xpiu.rkt
xpiu.sex
xpost.rkt
xpost.sex
xpre.rkt
xpre.sex
xresfn.rkt
xresfn.sex
xsepsd.rkt
xsepsd.sex
xsettings.rkt

readme.rst

THE SPECIALIZER UNMIX

Build status

v4.0 for Racket

(based on v3.0 https://code.google.com/p/unmix/)

Sergei A.Romanenko

Keldysh Institute of Applied Mathematics

Russian Academy of Sciences

Miusskaya Sq.4, SU-125047, Moscow, Russia

Ported to Racket by Danil Annenkov, Irkutsk, Russia

annenkov@ib-soft.ru

July, 1990

Revised: December 1992, December 1993, December 2013

COPYRIGHT NOTICE

Permission to use, copy, modify, and distribute this software and its documentation for any personal or educational use without fee is hereby granted, provided that:

  • This copyright notice is retained in both source code and supporting documentation.
  • Modified versions of this software cannot be redistributed unless accompanied by a complete history (date, author, description) of modifications made; the intension here is to give appropriate credit to those involved, whilst simultaneously ensuring that any recipient can determine the origin of the software.
  • These same conditions will also be applied to any software system derived either in full or in part from this system.

The name Unmix is not a trademark, registered or otherwise, and you are free to mention this name in published material, public and private correspondence, and other documents without restriction or obligation.

Unmix is provided as is without express or implied warranty.

VERSION 4.0 CHANGES

All the functionality of v3.0 is available.

Most significant changes made during porting to Racket:

  • used Racket module system ("require" instead of "load");
  • added explicit conversion to mutable pairs in some places (Racket pair is immutable by default and "set-car!" and "set-cdr!" can't be applied);
  • added some macros and utils to handle mutable pairs;
  • target programs generated as Racket modules with ".rkt" extension (and can be ran using Racket);
  • Robert Glueck's recursive version (in "resurs" dir) merged to single "xmainpe.rkt" file (there are three files in v3.0: "xmainpe.scm", "xpe.scm", "xggg.scm");
  • added "xsettings.rkt". There you can set iterative or recursive mode, target program output format etc;

WHAT IS UNMIX?

If a program has several input parameters, it can be "specialized" with respect to the values of some of the parameters, in which case we get a "residual" program with less input parameters than the original program.

The parameters whose values are known at the time the program is being specialized are referred to as "static" parameters, all other parameters are referred to as "dynamic" ones.

Unmix is a system that specializes programs written in a subset of Scheme, and consists of two phases: "preprocessor" and "generator of residual programs". The specialization is done in two steps.

At the first step, preprocessor takes a Scheme program and a descriptor of the program's input parameters. The program is a non-empty sequence of function definitions. The first function definition in the program is assumed to be the "main" function of the program. The descriptor is a string of letters "s" and "d", the length of the descriptor being equal to the number of the main function's parameters.

The descriptor of the program's parameters classifies each parameter of the main function as either static or dynamic. If a parameter is static, the corresponding letter in the descriptor is "s", otherwise the letter is "d".

Preprocessor takes as input a program and a descriptor and produces an "annotated" version of the original program, which contains some directions for the generator of residual programs.

At the second step, the generator of residual programs takes as input an annotated program and a sequence of file names (separated by one or more spaces). The number of file names must be equal to the number of the program's static input parameters. Each of the files must contain zero, one, or more Scheme S-expressions to be used as values of the corresponding static parameters.

THE STRUCTURE OF THE PREPROCESSOR

The preprocessing consists of several stages.

First, the original program is "desugared", i.e. compiled from Scheme to Mixwell, which is the internal language of the specializer.

Second, all variables and operations in the program are classified as either static or dynamic and this information is inserted into the program. The result is an annotated version of the original program. Annotated Mixwell programs will be regarded as programs written in Mixwell-Ann, a version of Mixwell supplemented with additional constructs to express annotations.

Second, some of the defined function calls in the annotated program are annotated as "residual" in order to avoid infinite unfolding of function calls and duplication of function calls that may result from some calls being unfolded.

THE STRUCTURE OF THE GENERATOR

The generator consists of "partial evaluator", which generates residual program by symbolically evaluating Mixwell expressions, and "postprocessor", which performs some additional transformations of the residual program and then translates the residual Mixwell program to Scheme.

THE STRUCTURE OF THE POSTPROCESSOR

The postprocessing comprises the following stages.

  1. the first Call Graph Reduction,
  2. Arity Raising,
  3. the second Call Graph Reduction,
  4. Compilation from Mixwell into Scheme.

More information may be found in the source programs of Unmix.

HOW TO RUN UNMIX?

This version of Unmix can be run under Racket.

Use
racket unmix.scm

to run UNMIX from project root directory.

UNMIX can be ran using DrRacket: load unmix.scm to environment and run it (press F5).

If UNMIX ran from project root, use relative paths to programs to be specialized.

For example:

examples/zip

You can also use unmix.scm from directories with programs to be specialized. In that case all program files can be accessed without specifying the path, just file name.

When Unmix starts it displays a menu on the screen, which provides further information.

THE INPUT LANGUAGE OF UNMIX

Unmix itself is written in "Scheme with EXtensions" (files with the extention sex). Before being loaded and executed, the sex-files have to be compiled to scm-files (with the extension scm), containing programs in Scheme without extensions.

For bootstrapping reasons, several parts of Unmix have been written in Scheme directly, so that they are contained in files with the extension scm and needn't be compiled.

Here is the syntax of the subset of Scheme accepted by the specializer Unmix:

<Program> ::=  <ProcDef> <ProcDef>*               ;; Program

<ProcDef>  ::=  (define (<Pname> <Vname>*) <Exp>) ;; Procedure
                                                  ;; definition

<Exp>     ::=  <Vname>                      ;; Variable
          |    (quote <S-expression>)       ;; Constant
          |    <Literal>                    ;; Literal constant
          |    (if <Exp> <Exp> <Exp>)       ;; Conditional
          |    (let (<Binding>*) <Exp>)     ;; Let-expression
          |    (rcall (<Pname> <Exp>*))     ;; Residual call
          |    (generalize <Exp>)           ;; Generalizer
          |    (<Pname> <Exp>*)             ;; Procedure call
          |    (<Mname> <S-Expression>*)    ;; Macro

<Binding> ::=  (<Vname> <Exp>)              ;; Local binding

<Pname>   ::=  <Symbol>                     ;; Procedure name
<Vname>   ::=  <Symbol>                     ;; Variable name
<Mname>   ::=  <Symbol>                     ;; Macro name

<Literal> ::= <boolean> | <number> | <character>
          |   <string> | <vector>

All procedures called in the program must be without side-effects. For this reason, the terms "procedure" and "function" will be used in the description of Unmix interchangeably.

Constructs (generalize <Exp>) and (rcall (<Pname> <Exp>*)) are used to insert into the program hand-made annotations, which permit the user to control the specializer. They are useless for ordinary programming.

Construct (generalize <Exp>) tells the specializer that the result of specializing <Exp> must be dynamic, even if <Exp> is static. When the program is compiled in the usual way, this construct is equivalent to <Exp>.

Construct (rcall (<Pname> <Exp>*)) tells the specializer that the result of specializing the procedure call (<Pname> <Exp>*) must be a residual call. When the program is compiled in the usual way, this construct is equivalent to (<Pname> <Exp>*).

Some useful macro definitions may be found in the file "x-match.sex". File "x-synt.scm" contains a definition of "extend-syntax", a powerful tool for defining macro extensions.

THE LANGUAGE MIXWELL

Mixwell is the internal language of the specializer Unmix. Here is its syntax:

<Program> ::=  <ProcDef> <ProcDef>*

<ProcDef> ::=  (<Pname> (<Vname>*) = <Exp>)

<Exp>     ::=  <Vname>                      ;; Variable
          |    (quote <S-expression>)       ;; Constant
          |    (if <Exp> <Exp> <Exp>)       ;; Conditional
          |    (call  <Pname> <Exp>*)       ;; Defined function call
          |    (rcall <Pname> <Exp>*)       ;; Defined function call
          |    (xcall <Pname> <Exp>*)       ;; External function call
          |    (<Pname> <Exp>*)             ;; External function call

<Pname>   ::=  <Symbol>                     ;; Procedure name
<Vname>   ::=  <Symbol>                     ;; Variable name

The construct (call <Pname> <Exp>*) is a call on the procedure <Pname> defined in the program, which will be unfolded during partial evaluation.

The construct (rcall <Pname> <Exp>*) is a call on the procedure <Pname> defined in the program, which will give rise to a residual call during partial evaluation.

The construct (xcall <Pname> <Exp>*) is a call on the procedure <Pname> defined somewhere outside the program. If <Pname> is different from the symbols STATIC, IFS, IFD, RCALL, CALL, and XCALL, the keyword XCALL is omitted and the construct takes the form (<Pname> <Exp>*).

THE LANGUAGE MIXWELL-ANN

<Ann-Program> ::=
    <RP-Names> <D-Program> <S-Program>      ;; Program
<RP-Names>    ::=  (<Pname>*)               ;; Residual procedure
                                               ;; names
<D-Program>   ::=                           ;; Dynamic program
    (<A-ProcDef> <A-ProcDef>*)
<S-Program>   ::=  (<ProcDef>*)             ;; Static program
<A-ProcDef>   ::=
    (<Pname> <ParList> <ParList> = <A-Exp>) ;; Annotated procedure
                                               ;; definition
<ParList>     ::=  (<Vname>*)               ;; Parameter list

<A-Exp>  ::=
    <Vname>                                 ;; Variable
  | (static <Exp>)                          ;; Static subexpression
  | (ifs <Exp> <A-Exp> <A-Exp>)             ;; Static conditional
  | (ifd <A-Exp> <A-Exp> <A-Exp>)           ;; Dynamic conditional
  | (call  <Pname> (<Exp>*) (<A-Exp>*))     ;; Unfoldable defined
                                               ;; function call
  | (rcall <Pname> (<Exp>*) (<A-Exp>*))     ;; Residual defined
                                               ;; function call
  | (xcall <Pname> <A-Exp>*)                ;; External function call
  | (<Pname> <A-Exp>*)                      ;; External function call

EXAMPLES

In addition to the files that contain the specializer, there are several files containing a number of example programs to be specialized. Here we list the programs with some suggestions about the way in which they can be specialized.

Program Source file Parameter description Static data
Zipper zip.sex SD zip123.dat
Maximum substring mcs.sex SD mcs123.dat
MP Interpreter mp.sex SD mprev.dat
TM Interpreter tm.sex SDDD tmtst.dat
Parser prs.sex SD prsexp.dat

REFERENCES

[Barzdin 88] G.Barzdin. Mixed Computation and Compiler Basis. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 15-26, North-Holland, 1988.

[Beckman 76] L.Beckman, A.Haraldson, O.Oskarsson, E.Sandewall. A Partial
Evaluator, and Its Use as a Programming Tool. Artificial Intelligence, 7(4):319-357, 1976.
[Bulyonkov 84] M.A.Bulyonkov. Polyvariant Mixed Computation for Analyzer
Programs. Acta Informatica, 21:473-484, 1984.
[Burstall 77] R.M.Burstall and J.Darlington. A Transformation System for
Developing Recursive Programs. Journal of the ACM, 24(1):44-67, 1977.
[Dixon 71] J.Dixon. The Specializer, a Method of Automatically Writing
Computer Programs. Division of Computer Research and Technology, National Institute of Health, Bethenda, Maryland, 1971.
[Ershov 78] On the Essence of Compilation. In E.J.Neuhold, editor,
Formal Description of Programming Concepts, pages 391-420, North-Holland, 1978.
[Ershov 81] A.P.Ershov. The Transformational Machine: Theme and
Variations. In J.Grushka and M.Chytil, editors, Mathematical Foundations of Computer Science, Strbske Pleso, Czechoslovakia, pages 16-32, Lecture Notes in Computer Science, Vol.118, Springer-Verlag, 1981.
[Futamura 71] Partial Evaluation of Computation Process - An Approach to
a Compiler-Compiler. Systems, Computers, Controls, 2(5):45-50, 1971.
[Hughes 88] J.Hughes. Backward Analysis of Functional Programs. In
D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 187-208, North-Holland, 1988.
[Jones 85] N.D.Jones, P.Sestoft and H.Sondergaard. An Experiment in
Partial Evaluation: The Generation of a Compiler Generator. In J.-P.Jouannaud, editor, Rewriting Techniques and Applications, Dijon, France, pages 124-140, Lecture Notes in Computer Science, Vol.202, Springer-Verlag, 1985.
[Jones 86] N.D.Jones and A.Mycroft. Data Flow Analysis of Applicative
Programs Using Minimal Function Graphs. In Thirteens ACM Symposium on Principles of Programming Languages, St.Petersburg, Florida, pages 296-306, ACM, 1986.
[Jones 88] Automatic Program Specialization: A Re-Examination from Basic
Principles. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 225-282, North-Holland, 1988.
[Mogensen 88] T.Mogensen. Partially Static Structures in a
Self-Applicable Partial Evaluator. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 325-347, North-Holland, 1988.
[Ostrovski 88] B.N.Ostrowski. Implementation of Controlled Mixed
Computation in System for Automatic Development of Language-Oriented Parsers. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 385-403, North-Holland, 1988.
[Romanenko 88] S.A.Romanenko. A Compiler Generator Produced by a
Self-Applicable Specializer Can Have a Surprisingly Natural and Understandable Structure. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 445-463, North-Holland, 1988.
[Romanenko 90] S.A.Romanenko. Arity Raiser and Its Use in Program
Specialization. In N.Jones, editor, ESOP '90, 3rd European Symposium on Programming, Copenhagen, Denmark, May 15-18, 1990, pages 341-360, Lecture Notes in Computer Science, Vol. 432, Springer-Verlag, 1990.
[Sestoft 86] The Structure of a Self-Applicable Partial Evaluator. In
H.Ganzinger and N.D.Jones, editors, Programs as Data Objects, Copenhagen, Denmark, 1985, pages 236-256, Lecture Notes in Computer Science, Vol. 217, Springer-Verlag, 1986.
[Schmidt 86] D.A.Schmidt. Denotational Semantics. Allyn and Bacon,
Boston, 1986.
[Sestoft 88] P.Sestoft. Automatic Call Unfolding in a Partial Evaluator.
In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 485-506, North-Holland, 1988.
[Turchin 72] V.F.Turchin. Equivalent Transformation of Recursive
Functions Defined in Refal. In Teoriya Yazykov i Metody Programmirovaniya. Trudy Simposiuma, pages 31-42, Alushta-Kiev, 1972 (in Russian).
[Turchin 79] V.F.Turchin. A Supercompiler System Based on the Language
Refal. SIGPLAN Notices, 14(2):46-54, February 1979.
[Turchin 82] V.F.Turchin, R.M.Nirenberg and D.V.Turchin. Experiments
with a Supercompiler. In 1982 ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, pages 47-55, ACM, 1982.
[Turchin 86] V.F.Turchin. The Concept of a Supercompiler. ACM
Transactions on Programming Languages and Systems, 8(3):292-325, July 1986.
[Turchin 88] V.F.Turchin. The Algorithm of Generalization in the
Supercompiler. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 531-549, North-Holland, 1988.
[Wadler 88] P.Wadler. Deforestation: Transforming Programs to Eliminate
Trees. In European Symposium on Programming, Lecture Notes in Computer Science, Springer-Verlag, 1988.

APPENDIX. SOME MACROS USED IN UNMIX

Unmix, as well as the example programs, has been written in Scheme extended with the following macros.

GENERALIZED CASE-EXPRESSION

(MATCH  (arg ...)
        (pat ...  & guard => exp ...) ...)

The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are then matched against the corresponding patterns "pat ...". If the matching succeeds for some clause

(pat ... & guard => exp ...)

the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expression "guard" is evaluated in the extended environment. If the result of "guard" is not #f, the expressions "exp ..." are evaluated in the extended environment, otherwise the next clause is tried. If the guard is #t, "& guard" may be omitted.

The patterns have the following syntax:

<pat> ::= '<S-exp>             matches <S-exp>.
        | <literal>            matches <literal>.
        | <var>                matches anything, <var> is bound.
        | _                    matches anything.
        | (<var> as <pat>)     matches <pat>, <var> is bound.
        | (<pat> . <pat>)      matches a pair with <pat>'s as elements.

<var> ::= <symbol>
<literal> ::=
        | ()
        | <boolean>
        | <number>
        | <character>
        | <string>
        | <vector>

GENERALIZED LET-EXPRESSION

(WITH  ((pat arg) ...) exp ...)

The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are supposed to match the patterns "pat ...", in which case the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expressions "exp ..." are evaluated in the extended environment. If some of "S-exp ..." do not match against patterns "pat ...", the result of the form WITH is unspecified, because there is no actual analysis of the structure of "S-exp ...". The syntax of patterns is exactly the same as in the case of the form MATCH.

The form

(WITH* ((pat1 arg1) . (pat arg) ...) exp ...)

is equivalent to

(WITH ((pat1 arg1)) (WITH* ((pat arg) ...) exp ...)

RESTRICTED GENERALIZED CASE-EXPRESSION

(SELECT (arg ...)
        (rpat ...  & guard => exp ...) ...)

The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are then matched against the corresponding restricted patterns "rpat ...". If the matching succeeds for some clause

(rpat ... & guard => exp ...)

the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expression "guard" is evaluated in the extended environment. If the result of "guard" is not #f, the expressions "exp ..." are evaluated in the extended environment, otherwise the next clause is tried. If the guard is #t, "& guard" may be omitted.

The syntax of restricted patterns coincides with that of the ordinary patterns appearing in the construct MATCH described above, but their meaning is slightly different.

If a restricted pattern <pat> doesn't have the form (<pat'> . <pat''>), it has the same meaning as the ordinary pattern <pat>.

If an S-expression <S-exp> is not a pair, the result of matching <S-exp> against a pattern (<pat'> . <pat''>) is unspecified (i.e. matching <S-exp> against such a pattern may produce either an error or unpredictable results).

If an S-expression <S-exp> is a pair (<S-exp'> . <S-exp''>), and a pattern <pat> has the form (<pat'> . ()), then <S-exp> matches <pat>, iff <S-exp'> matches <pat'>. In other words, a restricted pattern of the form (<pat'> . ()) is completely equivalent to the restricted pattern (<pat'> . _).

If an S-expression <S-exp> is a pair (<S-exp'> . <S-exp''>), and a pattern has the form (<pat'> . <pat''>), where <pat''> is not (), then <S-exp> matches <pat>, iff <S-exp'> matches <pat'> and <S-exp''> matches <pat''>.

The fact that restricted patterns are less careful at examining S-expressions than the ordinary patterns are, enables them to be compiled into efficient code.

RCALL

(RCALL (fname arg ...))

This construct is used for telling the specializer that the function call (fname arg ...) is a residual one. In all other respects this construct is equivalent to (fname arg ...).

GENERALIZE

(GENERALIZE exp)

This construct is used for telling the specializer that the result of specializing (GENERALIZE exp) must be dynamic even if exp is static. In all other respects this construct is equivalent to exp.