Programming methodology continues to rapidly change.
There have been vast improvements in increasing the reliability
of programs.
Several interesting changes involve the use of interpreted languages
compared to compiled languages.
Some languages offered translations into an intermediate code representation.
Some translate into machine specific code.
The result is that interpreted languages are now approaching the speed of
compiled languages.
There are cases where interpreted languages are faster than compiled
languages due to run-time optimization strategies.
One such interpreted language is
javascript:
it is standards based, it runs quickly, and its syntax is familiar.
javascript
started in the Web browser and is now available on the server
side too. Programming environments like nodejs and
spidermonkey have made javascript
available in many situations.
Along with the Web came a number of document markup languages, such as HTML.
These languages provided a means of displaying documents on screen
and could be formatted into print.
Over time, HTML
document information was separated from its display.
The display of HTML
documents were encoded in CSS.
These languages seemed syntactically heavy and inappropriate for transmitting
data between machines with limited bandwidth.
This led to the development of JSON to represent simple data structures in
an readable format that borrowed its structure from javascript
.
JSON
is easily used to provide configuration information to programs,
although JSON
fields can appear in any order.
Given the complexity of HTML
documents a number of authoring languages were
developed, such as markdown.
The purpose of the languages was to facilitate
the generation of HTML
documents.
There are many other authoring languages that have been used
in the past, such as TeX and
troff.
Although TeX
pre-dates the Web, MathJax can be used
to render TeX
based math formulas.
One service that makes use of javascript
and JSON
is couchdb
database management system server. It uses JSON
for the configuration of its
database and for delivering data to clients.
couchapp
is a client-server application framework based
on a Web browser front-end and a couchdb
back-end.
A feature of this framework is that all the configuration can be written using Web
standards: CSS
, HTML
, and javascript
. Since couchdb
uses
HTTP and
HTTPS communicating
with Web browsers is relatively straight forward.
This seems an appropriate time to revisit the 'Literate Programming' idea of Knuth, published in 1984, in the context of Web standards.
markover.js is a javascript
library implemented as a
CommonJS module.
This document is a literate program that can be used to make the markover
application.
It describes the source code used in markover
and can generate the source code.
That is, the same authoring document (the markover
document)
is used tangle
itself into a program (the source code) and to weave
itself into a viewable document (the display document).
The markover
document is written in markdown
.
The primary source code target is a javascript
object.
The javascript
object might be used as a configuration file (in the form of a JSON
object),
or as a couchapp (as a JSON
object or a CommonJS module),
or a javascript library (a CommonJS module).
The primary display document target is HTML
, CSS
and MathJax
.
Other tools have targeted similar platforms, like the
literate
option in coffeescript.
However, literate coffeescript lacks the ability to tangle
code. The order of the code is constrained by the needs of the
programming language compiler. markover
is more similiar to
Knuth's system by allowing greater flexibility in the order that
the program is presented. But it doesn't provide as much
flexibility as Knuth's system.
markover
has enough flexibility to produce source code for many text based programming
language. However, it is designed to work with a single javascript
object.
markdown
has had a number of extensions added to it.
One set, called GFM (GitHub Flavored Markdown), is used by markover
.
In GFM markdown
a section of code is marked off, like this:
console.log('Hello, World!')
which is produced by:
```js
console.log('Hello, World!')
```
The word after the fence normally identifies the language that the code block is written in.
markover
extends this syntax to include a field name.
This is used to add
the code block to the object field. When the field name is repeated, the additional
code is appended to the object field. So, when markover
processes a markdown
document, it looks
for lines starting with a fence, followed by an optional language id, a hash tag, an optional
plus sign, followed by the field name.
The source code is appended line by line until the closing fence is found.
Code blocks that don't have a field name are ignored.
The optional plus sign is a flag that the field should be quoted in the output.
This conveniently reduces the need to use a large number of escape characters in the
markover
document.
By flagging a field as a quoted string,
markover
will make appropriate escape sequences.
This keeps the structure of fields fairly neat
and more consistent with the embedded languages
that are sometimes used inside the strings.
For example:
```json#foo
[1,2]
```
will set the field 'foo' to be an array with two values. Whereas,
```json#+foo
[1,2]
```
will set the field to the string "[1,2]".
There are a few special markers used in markover
.
A language of noweave
indicates
that the code block will not be included in the weave
output.
Field names of #first
and #last
are used to
include arbitrary beginning and ending material, respectively.
A missing language with a field name (like #foo
) will default
to json
.
By convention markover
documents have a few common fields.
A title
which names the document and a tag
which is a one-line
description of the document. You may include a version
field as well.
markover
Web Literate Programming
The current version of this document is:
0.0.3
That's all there is to know about the structure of markover
documents.
There are no specific dependencies on the target language.
For a CommonJS module there is usually some beginning and ending material that is needed. For this module, the object is written within an anonymous function so that the name space won't be polluted.
!(function() {
var markover =
The trailing material allows the object to be used in several environments. This is attached to the anonymous function after the literate object is declared. It does several checks to determine which environment it is operating in.
if (typeof module !== 'undefined' && typeof exports === 'object') {
module.exports = markover
} else if (typeof define === 'function' && define.amd) {
define(function() { return markover })
} else {
this.markover = markover
}
}).call(function () {
return this || (typeof window !== 'undefined' ? window : global)
})
It seems like every system as standard boilerplate material that must be included.
HTML
documents has quite a bit.
Here is the typical start of an HTML document, which comes before the actual header.
<!DOCTYPE html>
<html lang="en" class="">
<head>
The HTML
has a section the separates the header from the body of the document.
</head>
<body>
And some ending material.
</body>
</html>
These meta provide additional information about the display document encoding. Setting the viewport scale helps display the document on small screens.
<meta charset="utf-8">
<meta http-equiv="Content-Language" content="en">
<meta name="viewport" content="initial-scale=1">
The following code enables MathJax
on the Web.
It requires a configuration block before the MathJax code can be included.
In order to avoid confusion with other uses of symbols, markover
only
uses \\(
and \\)
to delimit inline math equations.
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
showProcessingMessages: false,
tex2jax: { inlineMath: [['\\(','\\)']] },
TeX: { equationNumbers: {autoNumber: "AMS"} }
});
</script>
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
For example, here is an equation that will be processed with MathJax
and appear as in-line
equations in the text: \( x \lt y \) or gamma \( \Gamma \). However if MathJax
isn't properly
included, the in-line equations will still be readable. MathJax
also provides display equations,
such as:
The display document makes some use of jquery. This tools handles
most of the inconsistentcies between Web browser and simplifies the manipulation of
HTML
documents.
<script type="text/javascript"
src="http://code.jquery.com/jquery-2.1.3.min.js">
</script>
JQuery is used to load highlight.js
parsers for the languages used in this document.
There isn't a one-to-one relationship between languages and parsers. For example,
the HTML
parser is located in xml
.
<script type="text/javascript">
["bash", "xml", "javascript", "json", "css", "markdown"].forEach(function(e) {
var s = $('<script>')
s.type = 'text/javascript'
s.src = 'http://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/languages/' + e + '.min.js'
$('head').append(s)
})
</script>
The format of the display document is largely controlled by the CSS
.
Part of the CSS
is based on the highlight.js rendering style.
The following stylesheet
details how the rest of the document is displayed.
Is this necessary? No. Instead of including the stylesheet here, you can link to a stylesheet
in the header section. Or, parts of the the style can be included in the header.
For example, there are standard highlight.js
style sheets available on CDNs
that
can be included for the language syntax highlighting. In this case, I wanted a little more
control over the presentation on screen and in print.
I've created a complete style sheet here. This style sheet will display well on a variety of screen sizes and in print.
This first section resets all HTML
elements to reasonable default values.
This is used in the heading of the page and has selectors that effect elements
throughout the document.
html, body, div, span, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre,
abbr, address, cite, code, del, dfn, em, img, ins, kbd, q, samp,
small, strong, sub, sup, var, b, i, dl, dt, dd, ol, ul, li, a,
fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, figcaption, figure,
footer, header, hgroup, menu, nav, section, summary, time, mark, audio, video {
display: block;
margin:0;
padding:0;
border:0;
outline:0;
line-height: 140%;
font-size: 100%;
vertical-align:baseline;
text-decoration: none;
background:transparent;
}
a, code, span, b, i, em { display: inline; }
The following stylesheet operates within the content generated by applying weave
to the source document.
markover
marks up 3 areas before the main document body:
title, tag, and a table of contents.
Title and tag are each in a single element and only appear once in the display document.
#content {
font-family: "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
}
#content #title {
clear: both;
text-align: center;
font-size: 140%;
font-weight: bold;
margin: 2em 0 0 0;
}
#content #tag {
clear: both;
text-align: center;
font-size: 110%;
margin: 0 0 1em 0;
}
The table of contents, in a single toc
id, is slightly more complex.
The table is on the left for wide windows only.
#content #toc {
padding: 0 0 0 1em;
max-width: 96%;
page-break-after: always;
}
@media screen and (min-width: 641px) {
#content #toc {
float: left;
max-width: 26%;
}
}
#content .tocsn { float: left; }
#content .tocsr { overflow: hidden; }
#content #toc ol {
list-style-type: none;
}
#content #toc ol ol {
margin: 0 0 0 1em;
}
The main document is in a single doc
id.
Make the toc and doc with the same margins.
The 'overflow: hidden' prevents the main text from flowing around the float content.
#content #doc,
#content #toc {
margin: 0 1em 0 1em;
}
#content p { text-indent: 1em; }
#content code { font-size: 120%; }
#content pre code {
font-family: "Courier New", Courier, monospace;
font-size: 100%;
}
@media screen and (min-width: 641px) {
#content #doc {
overflow: hidden;
max-width: 70%;
}
}
Field names are in a field
class.
#content pre .field {
display:block;
margin: 1em 1em 0em 0em;
}
#content pre .field::after {
content: " :=";
}
The code blocks are in the code
tag and
have classes that identify their language.
#content pre code {
display: block;
margin: 1em;
padding: 1em;
border: solid 1px #aaa;
}
@media screen {
#content pre code {
overflow: auto;
max-height: 30em;
}
}
@media print {
#content pre {
page-break-inside: avoid;
}
#content pre code {
overflow: visible;
word-wrap: break-word;
}
}
Each language uses a slightly different light background color.
@media screen {
#content .lang-js { background-color: ivory; }
#content .lang-md { background-color: lightyellow; }
#content .lang-html { background-color: floralwhite; }
#content .lang-json { background-color: honeydew; }
#content .lang-css { background-color: cornsilk; }
#content .lang-sh { background-color: lemonchiffon; }
}
The header font sizes are set.
#content h1,
#content h2,
#content h3,
#content h4,
#content h5,
#content h6 {
margin: 0.5em 0em;
page-break-after: avoid
}
#content h1 { font-size: 140%; }
#content h2 { font-size: 130%; }
#content h3 { font-size: 120%; }
#content h4,
#content h5,
#content h6 { font-size: 110%; }
These styles handle user interactions.
#content a:hover { color: #aaa; }
#content a:visited,
#content a { color: #000;}
@media screen {
#content a {
text-decoration: underline;
}
}
Links are displayed in the print document.
@media print {
#content a {
text-decoration: none;
}
#content #doc a[href]:after {
content: " (" attr(href) ")";
font-size: 90%;
}
}
highlight.js
specific styles for code blocks.
Basic type markup:
.hljs {
overflow: auto;
padding: 0.5em;
color: #333;
}
.hljs-comment, .diff .hljs-header, .hljs-javadoc {
color: #999;
font-style: italic;
}
.hljs-keyword, .css .rule .hljs-keyword, .hljs-winutils, .nginx .hljs-title,
.hljs-subst, .hljs-request, .hljs-status {
color: #333;
font-weight: bold;
}
.hljs-number, .hljs-hexcolor, .ruby .hljs-constant {
color: #088;
}
.hljs-string, .hljs-tag .hljs-value, .hljs-phpdoc, .hljs-dartdoc, .tex .hljs-formula {
color: #d14;
}
.hljs-title, .hljs-id, .scss .hljs-preprocessor {
color: #900;
font-weight: bold;
}
.hljs-list .hljs-keyword, .hljs-subst {
font-weight: normal;
}
.hljs-class .hljs-title, .hljs-type, .vhdl .hljs-literal, .tex .hljs-command {
color: #458;
font-weight: bold;
}
Additional markup
.hljs-tag, .hljs-tag .hljs-title, .hljs-rules .hljs-property, .django .hljs-tag .hljs-keyword {
color: #000080;
font-weight: normal;
}
.hljs-attribute, .hljs-variable, .lisp .hljs-body {
color: #008080;
}
.hljs-regexp {
color: #009926;
}
.hljs-built_in {
color: #0086b3;
}
.hljs-symbol, .ruby .hljs-symbol .hljs-string, .lisp .hljs-keyword,
.clojure .hljs-keyword, .scheme .hljs-keyword, .tex .hljs-special,
.hljs-prompt {
color: #990073;
}
Other uncommon specialize markup:
.hljs-preprocessor, .hljs-pragma, .hljs-pi, .hljs-doctype, .hljs-shebang,
.hljs-cdata {
color: #999;
font-weight: bold;
}
.hljs-deletion {
background: #fdd;
}
.hljs-addition {
background: #dfd;
}
.diff .hljs-change {
background: #0086b3;
}
.hljs-chunk {
color: #aaa;
}
Using CSS
it is possibly to have a separate
layout for print media.
markover
overrides certain values for print media.
@media print {
.hljs {
overflow: visible;
word-wrap: break-word;
}
.hljs-number, .hljs-hexcolor, .ruby .hljs-constant {
color: #888;
}
.hljs-string, .hljs-tag .hljs-value, .hljs-phpdoc, .hljs-dartdoc, .tex .hljs-formula {
color: #ddd;
}
.hljs-title, .hljs-id, .scss .hljs-preprocessor {
color: #999;
}
.hljs-class .hljs-title, .hljs-type, .vhdl .hljs-literal, .tex .hljs-command {
color: #888;
}
.hljs-tag, .hljs-tag .hljs-title, .hljs-rules .hljs-property, .django .hljs-tag .hljs-keyword {
color: #888;
}
.hljs-attribute, .hljs-variable, .lisp .hljs-body {
color: #888;
}
.hljs-regexp {
color: #999;
}
.hljs-symbol, .ruby .hljs-symbol .hljs-string, .lisp .hljs-keyword,
.clojure .hljs-keyword, .scheme .hljs-keyword, .tex .hljs-special,
.hljs-prompt {
color: #999;
}
.hljs-built_in {
color: #888;
}
.hljs-deletion {
background: #ddd;
}
.hljs-addition {
background: #ddd;
}
.diff .hljs-change {
background: #888;
}
}
This implementation is based on marked,
which is a javascript
markdown parser with a lot of flexibility.
It provides a number of options that are very useful for markover
.
weaveStream, tangleStream and untangleStream are based on nodejs
stream
transformation. They read from a source stream, transform the input,
and write out to an output stream.
I'm using a lazy implementation of the nodejs
stream transform.
Instead of processing the input buffer piece by piece, I collect
the entire stream into a pool
. At the end, the entire pool
is flushed to the output pipe. It takes a single argument which
is a set of options which should include a convert
function that
converts a string into the transformed string.
The default convert
function does not transform the string.
The options are passed to the convert
function.
pool
calls the convert
function in the context of the
optional self
object.
This gives the convert
function access to all the facilities
of the identified object (methods and members).
function(options) {
options = options || {}
var dam = new require('stream').Transform()
dam.options = options
dam.options.convert = dam.options.convert || function(string, options) { return string }
dam.options.self = dam.options.self || this
dam.lake = []
dam._transform = function(chunk, encoding, done) {
dam.lake.push(chunk.toString())
done()
}
dam._flush = function(done) {
dam.push(dam.options.convert.call(dam.options.self, dam.lake.join(''), dam.options))
dam.push('\n')
dam.lake = []
done()
}
return dam
}
For displaying a document in HTML
there are certain character
sequences that must be escaped and then unescaped.
function (html, encode) {
return html
.replace(!encode ? /&(?!#?\w+;)/g : /&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, ''');
}
function(html) {
return html.replace(/&([#\w]+);/g, function(_, n) {
n = n.toLowerCase();
if (n === 'colon') return ':';
if (n === 'lt') return '<';
if (n === 'gt') return '>';
if (n === 'quot') return '"';
if (n.charAt(0) === '#') {
return n.charAt(1) === 'x'
? String.fromCharCode(parseInt(n.substring(2), 16))
: String.fromCharCode(+n.substring(1));
}
return '';
})
}
The tangle
process is the most important process in markover
.
The tangle
process has to exist before markover
can work.
During my initial implementation I prototyped javascript
code that could tangle
a complex markdown
document.
Then I inserted the tangle
prototype into a prototype
markover
document.
After further adjustments I produced a markdown
document
that could be tangled into a functional identical tangle
prototype.
At this point I could make improvements to the document,
tangle
it, verify the results, and update the tangle
process.
If you don't have a tangle
process, you can manually create
a prototype by applying the tangle
algorithm described here
to this document. Once your prototype works, you can use it
to produce the tangle
process described in this document.
markover
uses marked
to parse markover
documents.
This parser has enough flexibility to implement the main
features of markover
: tangle
and weave
.
I haven't investigated all the options that marked
provides. But I have found a set of options that produce
good results. The gfm
option, which is 'GitHub Flavored Markdown'
is particularly important to turn on.
This option implements the fence style code blocks that
markover
is dependent on.
All the renderer output functions are set to noop
except
for code
.
Quoting is an important element of markover
.
It uses % escape sequences to convert
native codes into escape codes.
The following fields describe all of the formating
that the tangle
process uses in creating source code
from markover
documents.
{%n%n
%n}%n%n
,%n%n
"
":
Once a markover
document has been tangled,
there are 3 results: the starting material,
the fields, and the ending material.
The starting and ending material are passed
straight through.
The tangleFormat
function handles the
processing of the fields. The fields are
stored in an object. For each field, the
delimited field name is emitted and then
the value.
For javascript
output the delimited field
is in double quotes.
For languages that don't use objects and
fields, the delimited field name could be
put in a source code comment.
Not that fields are not in any particular
order.
The final part of the processing is to translate the % escape sequences back to their original representation.
function(ff, fields, lf) {
var res = [ff, this.tangleFormatPrefix]
var firsttime = true
var v
for (var e in fields) {
v = fields[e]
if (firsttime) {
firsttime = false
} else {
res.push(this.tangleFormatDelimiter)
}
res.push(this.tangleFormatFieldPrefix + e + this.tangleFormatFieldSuffix)
res.push(v)
}
res.push(this.tangleFormatSuffix)
res.push(lf)
return res.join('')
.replace(/%q/gm, '\"')
.replace(/%n/gm, '\n')
.replace(/%t/gm, '\t')
.replace(/%s/gm, '\\')
.replace(/%p/gm , '%')
}
Given a makeover
source string tangleJSON
returns the converted source code using marked
.
Options can have an array of field names that
should be excluded from the output.
function (str, options) {
options = options || {}
options.exclude = options.exclude || []
We get a reference to the Renderer object so that it can be modified as needed.
var marked = require('marked')
var p = new marked.Renderer()
marked.setOptions({
renderer: p,
gfm: true,
tables: true,
breaks: false,
pedantic: false,
sanitize: true,
smartLists: true,
smartypants: false
})
The main part of the rendered used in
tangle
is the code
function which
handles the code blocks.
Code blocks without field names are ignored.
Quoted code blocks are encoded in strings.
Then the entire string is % escaped.
Each code is stored as a triple of strings: field, quote, & text.
p.code = function(code, lang, escaped) {
if (!lang)
return ''
var field = lang.replace(/\S*#[+]?(\S*)/, "$1")
if (lang == field)
return ''
var quote = lang.replace(/\S*#([+]?)\S*/, "$1")
lang = lang.replace(/(\S*)#[+]?\S*/, "$1")
if (lang == '') lang = 'json'
if (quote == '+') {
code = ('"' +
code
.replace(/\\/gm, '\\\\')
.replace(/\"/gm /*"*/, '\\"')
.replace(/\n/gm , '\\n')
.replace(/\t/gm , '\\t')
+ '"')
}
var esccode = code
.replace(/%/gm , '%p')
.replace(/"/gm /*"*/, '%q')
.replace(/\n/gm, '%n')
.replace(/\t/gm, '%t')
.replace(/\\/gm, '%s')
return ',["' + field + '", "' + (quote == '+') + '", "' + esccode + '"]'
}
All other rendering blocks are stopped.
function noop() { return ''}
p.blockquote = noop
p.html = noop
p.heading = noop
p.hr = noop
p.list = noop
p.listitem = noop
p.paragraph = noop
p.table = noop
p.tablerow = noop
p.tablecell = noop
p.strong = noop
p.em = noop
p.codespan = noop
p.br = noop
p.del = noop
p.link = noop
p.image = noop
marked
does all the parsing at this point.
The resulting set of triples is converted into
an array of objects.
var doc = '[' + marked(str).substring(1) + ']'
var obj = JSON.parse(doc)
var fields = {}
var ff = ''
var lf = ''
Each triple is decoded and appended to any existing text. There is a subtle difference in the quoted versus non-quoted appending: the quoted append inserts an escaped newline, while the non-quoted inserts a % escaped newline (which is converted into an actual newline in the output. Note that inserting these newlines prevents quote fields from being concatenated without a newline character.
var res = obj.forEach(function(e) {
var k = e[0]
var q = (e[1] == "true")
var v = e[2]
var doit = true
if (options.exclude.indexOf(k) != -1)
doit = false
if (doit) {
if (fields[k] != undefined) {
if (q) {
v = fields[k].substring(0,fields[k].length-2) + '\\n' + v.substring(2)
} else {
v = fields[k] + '%n' + v
}
}
fields[k] = v
}
})
Once all the fields are prepared, the special first and last
field names are located and the results are formatted for output.
If the sourcetarget
options is given, then the quoted source document
is stored at the location.
if (fields["first"] != undefined) {
ff = fields["first"]
delete fields["first"]
}
if (fields["last"] != undefined) {
lf = fields["last"]
delete fields["last"]
}
if (options.sourcetarget) {
fields[options.sourcetarget] =
('"' +
str
.replace(/\\/gm, '\\\\')
.replace(/\"/gm /*"*/, '\\"')
.replace(/\n/gm , '\\n')
.replace(/\t/gm , '\\t')
.replace(/%/gm , '%p')
+ '"')
}
return this.tangleFormat(ff, fields, lf)
}
tangleStream
can produce a transform object or
apply it to stdin and stdout. The input stream is
pooled, then tangled and pushed to the output stream.
function (opts) {
var options = {convert: this.tangleJSON, self:this}
for(var i in opts)
options[i] = opts[i]
var p = this.pool(options)
if (!p.options.notstdio) {
process.stdin.pipe(p).pipe(process.stdout)
return true
}
return p
}
The weave
process complements tangle
by producing
a readable document that describes the program's operation.
Many document systems, such as doxygen
use the programming languages structure to attach documentation.
Literate programming emphasizes the documentation structure
over the programming structure.
Since the tangle
process provides field names for targets and allows
fields to be concatenated, the markover
document layout can
be structured in the order that makes the most sense for the
description of the program.
Of course, parts of a program might be best described in the
order the the programming language requires, and this can easily be
done in markover
as well.
The weave
process maintains the order that the markover
document is written in.
It also weaves into the source code additional markup to
clarify the syntax of the code.
By processing the headings in the markover
document
a table of contents can be developed.
As headings are rendered into HTML
, they are appended
to a toc
. Each toc
element starts as a triple: text,
level, id.
An array n
is maintained to enumerate the headings.
function(text, level, raw, toc, prefix) {
var i = raw.toLowerCase().replace(/[^\w]+/g, '-')
var n = [0, 0, 0, 0, 0, 0, 0]
HTML
defines 6 levels of headings.
This seems more than sufficient.
if (level > 6) level = 6
else if (level < 1) level = 1
toc.push([text, level, i])
The n
array is regenerated to reflect the
current heading count.
var l
for (var x = 0; x < toc.length; x++) {
l = toc[x][1]
n[l] = n[l] + 1
while (++l < toc.length)
n[l] = 0
}
A section number is calculated from the n
array
and added to the toc
element.
var sn = n.slice(1, level+1).join('.') + (level == 1 ? '.' : '')
toc[toc.length-1].push(sn)
And the HTML
rendering of the heading is returned, lightly formatted.
return '<h'
+ level
+ ' id="'
+ prefix
+ i
+ '">'
+ sn + ' ' + text
+ '</h'
+ level
+ '>\n'
}
Highlighting code is handled by highlight.js
.
This utility function returns a code block that
is highlighted according to the identified language.
function (lang, code) {
var hljs = require('highlight.js')
code = hljs.highlight(lang, this.unescape(code), true)
return code
}
This function formats a code block for the HTML
document.
The formatting consists of labelling parts of the code with
HTML
tags with suitable classes. Code blocks that don't
have an identified language are not highlighted.
function(code, lang, escaped, fields, prefix) {
var c = code
if (!lang) {
return '<pre><code>'
+ this.escape(c)
+ '\n</code></pre>'
}
Code blocks that have a language, but don't have a field name are highlighted.
var i = lang.indexOf('#')
if (i == -1) {
if (lang == "noweave") return ''
return '<pre><code class="'
+ prefix
+ this.escape(lang, true)
+ '">'
+ this.weaveHighlight(lang, c).value
+ '\n</code></pre>\n'
}
Code blocks that have a field name are highlighted
as well unless the language is noweave
.
Field names are appended and retained in
weaving too. These field names can be used in the
output of the HTML
document.
This is useful for the defining the structure
of the HTML
document.
var field = lang.substring(i+1)
lang = lang.substring(0, i)
if (lang == "") lang = "json"
if (field.substring(0,1) == '+') {
field = field.substring(1)
}
if (fields[field] == undefined) {
fields[field] = c
} else {
fields[field] = fields[field] + '\n' + c
}
if (lang == "noweave") return ''
return '<pre><div class="field">'
+ field + '</div>'
+ '<code class="'
+ prefix
+ this.escape(lang, true)
+ '">'
+ this.weaveHighlight((lang != "" ? lang : "json"), c).value
+ '\n</code></pre>\n'
}
The markover
parsing for weaving is done by marked
.
We start with the standard marked
rendered and then
modify routines to prepare the HTML
generation.
function () {
var marked = require('marked')
var p = new marked.Renderer()
marked.setOptions({
renderer: p,
gfm: true,
tables: true,
breaks: false,
pedantic: false,
sanitize: true,
smartLists: true,
smartypants: false
})
Create the toc
, field
, and self
members to keep
the needed state during parsing.
p.field = []
p.toc = []
p.self = this
Process the heading with this object so that heading can access the utility functions. Do the same for code blocks. This returns the parser and the renderer.
p.heading = function (text, level, raw) {
return this.self.heading.call(this.self, text, level, raw, p.toc, this.options.headerPrefix)
}
p.code = function (code, lang, escaped) {
return this.self.weaveCode.call(this.self, code, lang, escaped, p.field, this.options.langPrefix)
}
return [marked, p]
}
This function formats the final HTML
document
using marked
.
function (str, options) {
options = options || {}
var m = this.weaveMarked()
var marked = m[0]
var p = m[1]
var doc = marked.call(this, str)
Utility function
function f(l,d) { return this.unescape(p.field[l] || d) }
Format the basic elements of the document. Use reasonable default values where applicable.
var title = f("title", "Weave Output")
var tag = f("tag", "from weaving")
var headstylesheet = f("headstylesheet", "")
var stylesheet = f("stylesheet", "")
var headline = f("head", "<!DOCTYPE html><html><head>")
var transitionline = f("transition", "</head><body>")
var tailline = f("tail", "</body></html>") + '\n'
var headtitle = '<title>' + title + ' - ' + tag + '</title>\n'
var titleline = '<div id="title">' + title + '</div>'
var tagline = '<div id="tag">' + tag + '</div>'
var headstyleline = headstylesheet == '' ? '' : '<style>' + headstylesheet + '</style>'
var styleline = stylesheet == '' ? '' : '<style>' + stylesheet + '</style>'
Create and mark up the table of contents. Open or close ordered list elements whenever there is a change in the heading level.
var tocitems = p.toc.map(function(e, i, a) {
var ll = 0
var o = ''
if (i > 0) ll = a[i-1][1]
if (ll < e[1]) {
while (ll++ < e[1])
o = o + '<ol>'
} else if (ll > e[1]) {
while (ll-- > e[1])
o = o + '</ol>'
}
return (
o + '<li><div class="tocsn">' + e[3] + ' </div><div class="tocsr"><a href="#' + e[2] + '">'
+ e[0] + '</a></div>'
)
})
Append enough ordered list ending tags and put the results in toc
var toc = ''
var ll
var o
if (p.toc.length > 0) {
ll = p.toc[p.toc.length-1][1]
o = ''
while (ll-- > 0)
o = o + '</ol>'
toc = '<div id="toc"><h1>Contents</h1>\n' + tocitems.join('\n') + o + '</div>\n'
}
Return the assembled document.
The bulk of the document is in the doc
variable.
var heading = ''
var tailing = ''
if (options.contentonly != true) {
heading = headline + headtitle + headstyleline + styleline + transitionline
tailing = tailline
} else
heading = styleline
return (heading +
'<div id="content">' +
titleline + tagline + toc +
'<div id="doc">' + doc + '</div>' +
'</div>' + tailing)
}
weaveStream
can produce a transform object or
apply it to stdin and stdout. The input stream is
pooled, then weaved and pushed to the output stream.
function (opts) {
var options = {convert: this.weaveJSON, self:this}
for(var i in opts)
options[i] = opts[i]
var p = this.pool(options)
if (!p.options.notstdio) {
process.stdin.pipe(p).pipe(process.stdout)
return true
}
return p
}
untangle is a utility that embeds a code block
into a markover
document.
Basically it has a fence prefix and suffix.
function (opts) {
opts = opts || {}
var options
options = {
convert: function (str, options) {
return '```\n' + str + '```\n'
},
self: this
}
for(var i in opts)
options[i] = opts[i]
var p = this.pool(options)
if (!options.notstdio) {
process.stdin.pipe(p).pipe(process.stdout)
return true
}
return p
}
To bootstrap try:
node -e "require('./markover').tangleStream()" < README.md > markover1.js && node -e "require('./markover1').tangleStream()" < README.md > markover2.js && diff markover2.js markover1.js && mv markover1.js markover.js
You can rebuild markover
from this document (assuming it is called README.md) by running the following in a Unix shell:
node -e "require('./markover').tangleStream()" < README.md > markover1.js && mv markover1.js markover.js
To weave:
node -e "require('./markover').weaveStream()" < README.md > index.html
If you have used npm install
to get this package then don't use the relative path.
markover
documents do not display completely correctly on standard markdown
renders used on GitHub and npmjs.
The field names that are embedded in the code block fence are not interpreted.
The MathJax
processing is not rendered either.
And, of course, there is no table of contents.
Nevertheless the displayed markover
document is readable.
The raw text of the markover
documents are more readable.
The raw text of the HTML
display document is not very readable:
the markdown
text isn't too bad, but the
code blocks are extremely difficult to read.
The output CommonJS
module is very readable. It is included in
the source since it is needed to bootstrap markover
.
Downloading the package and viewing the HTML
document is the best way to
see markover
in action.
markover
offers authors the ability to write software in an order that is
most suitable for understanding the code.
It has facilities for presenting documentation on screen and in print with
full access to TeX
equation formatting.
The markdown
document is readable as plain text and easily manipulated with marked
.
Yet untangling
source code into a markover
document is trivial.
Moving from traditional programming to literate programming can be done incrementally.
In writing markover
I found that it very easy to add reminders.
There was no extra effort to embed notes into 'structure comments'.
While reviewing the markover
document or the HTML
document,
I naturally revisited the notes to revise them or implement the ideas.
Including references to related work is straight forward and will be
help reader's interested in more detailed information.
In short, writing markover
in this style improved the implementation.
Is it possible to write documents for every program? Yes, to varying degrees. Experimental code might have little documentation. Difficult algorithms implementations might have a lot more. Code with many contributors might have more documenation. In any case, the software can start with little documentation and incremental be improved as needed.
Knuth's system offers parameterize macros. MathJax
does have this ability in math
mode.
But this implementation only uses MathJax
for the display document.
Including this capability might be the next required step.
markover
has some ability to substitute field values in the HTML
output.
Perhaps the an escape sequence could be used in a code block to subsitute the
contents of another field. A pair of backtick characters might be used for the escape
sequence, like:
console.log("This is ``title``")
Care would be needed to avoid mutually recursive macro substitutions.
Adding a table of contents was straight forward given the capabilities of the marked
library.
Adding an index would be considerably more challenging.
Another feature of Knuth's system is a trailing footnote that indicates what other sections
of the document contributed to the current block of code.
Using a series of regex pattern matching as in highlight.js
could yield good results for
a wide range of languages.
Unlike Knuth's system, markover
has a fairly straight forward mechanism for tangling code.
Knuth's system provides a lot more flexibility for re-ordering code and inserting code at
arbitrary places in the code.
markover
is more restrictive, only providing the field names as targets for code blocks.
It seems like this is enough flexibility and it allows the author to structure code blocks
in a readable fashion that is copied into place producing fairly readable source code.
If a more fine grain target is need, the source code can be divided into component functions
or subroutines. Then each subroutine can be targeted separately.
Today there are sophisticate IDEs for javascript and other languages.
The generated source files resemble the markover
document.
It should be possible for good programmers to work with the generated
source files and modify the original document.
A possible enhancement is to generate a source map file along with the
source code. Tools like uglify2
can read input source map files and generate source map files for its
mangled output. A tool like this could be included in markover
itself.
In Knuth's system he went to some lengths to avoid using system libraries,
much less 3rd party software.
The development of the Web has turned this equation around.
A tool like markover
would be much more difficult to create without
using libraries like marked
and highlight.js
.
CDN
providers improve user experience by downloading libraries
quickly to their systems.
The downside is the need to update programs as these resources change.