The Goofy Markup Language Processor
Last Update: Version 1.4.4, December 19, 2017.
Slight changes to CLI, new "lines" functionality. See changelog.md.
Update November 8, 2017: See Simple Markup for a 300-line version of this code for simple markup to HTML. This code will get one more update and Simple Markup will take its place.
The term Markup Language Processor reflects its origins as a program to convert "marked up" text to HTML. It has since turned into something completely different: a set of functions for converting text in nearly any conceivable way via a User Defined Data File.
The main documentation is viewed through index.php run under your localhost.
The idea was to be able to change the converted output, not by modifying code, but by modifying the data that defines the output conversions.
I saw the basic idea of a program that only converts one markup to HTML as myopic. For example, Markdown only does Markdown; Textile only does Textile; BBCode only does BBCode; etc.
The reason for GMLP was to convert any markup language to HTML with a definition file for each markup language. So GMLP does Markdown, Textile, BBCode, etc. And anyone can add their own definition file.
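The idea of one engine with a definition per markup language can be sketched in a few lines. This is only an illustration written in Python, not GMLP's PHP code: the names MARKDOWN_RULES, BBCODE_RULES and apply_rules are hypothetical, and each "definition" here is reduced to a bare list of regular expression rules.

```python
import re

# Hypothetical, minimal "definitions": one list of (pattern, replacement)
# rules per markup language. These are not GMLP's real definition files.
MARKDOWN_RULES = [(r"\*\*(.+?)\*\*", r"<strong>\1</strong>")]
BBCODE_RULES = [(r"\[b\](.+?)\[/b\]", r"<strong>\1</strong>")]


def apply_rules(text, rules):
    """Apply every rule in order; the function knows nothing about any markup."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_rules("a **bold** word", MARKDOWN_RULES))
print(apply_rules("a [b]bold[/b] word", BBCODE_RULES))
```

The same function handles both inputs; only the data passed to it changes.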
Markdown support is only slowly being finished and Textile support has been removed. (The Markdown code was written "hastily, sloppily" and needs yet another rewrite.)
But GMLP is not just for markup to HTML. It is a fully functional "any text input to any text output" processor, with no programming required, though basic familiarity with PHP and regular expressions is.
- A really small PHP API and not a Class.
- Not a single string literal test for markup in the code.
- Conversions are based on user defined regular expressions.
- User defined functions can be added to support conversions.
- Can be integrated into a Blog/CMS type website (though the code is a bit slow).
- Can be run from the command line to convert files.
The algorithm is currently about 800 lines and a basic definition file (data and code) is about 300 lines, which means basic Markdown conversion in 1100 lines (though there is some tweaking to do for it). (Compared to Markdown.pl, at about 1400 lines, the size of the code is not that relevant; but Markdown can only do Markdown.)
- The code lacks proper test/benchmark code, though work on that has begun.
- Some of this code is a bit sloppy and unclear. The documentation is poor.
- The "character" markup code ("emphasis") is lame.
The changes made to the input text are defined by a specially formatted PHP associative array of "actions" to perform on the input data.
A definition file has sections for conversions based on characters, words, lines and blocks (multi-line), with each section defining "rules" for how to convert the input text.
For more complex conversions the output can be supported by user defined functions. And there are "hooks" to run functions with the data at certain points in the conversion process.
There are additional sections for internal options and for defining paragraph marks and end of line terminators.
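As a rough illustration of the structure described above, here is a minimal sketch in Python (GMLP itself is PHP and uses an associative array). Every name in it is an assumption made for the example: the section keys, the "post" hook, the rule format and the functions shout and convert are hypothetical and are not GMLP's actual definition format. The blocks section is left out to keep it short.

```python
import re


def shout(match):
    """Hypothetical user-defined function: build an upper-cased <h1> header."""
    return "<h1>" + match.group(1).upper() + "</h1>"


# Hypothetical definition: sections of (pattern, replacement) rules, where a
# replacement is either a string or a user-defined function, plus hooks.
DEFINITION = {
    "lines": [(r"^= (.+)$", shout)],
    "words": [(r"\bteh\b", "the")],
    "characters": [(r"\*(.+?)\*", r"<em>\1</em>")],
    "hooks": {"post": [str.strip]},
}


def convert(text, definition):
    """Run each line through the rule sections, then run the post hooks."""
    converted = []
    for line in text.split("\n"):
        for section in ("lines", "words", "characters"):
            for pattern, replacement in definition.get(section, []):
                line = re.sub(pattern, replacement, line)
        converted.append(line)
    result = "\n".join(converted)
    for hook in definition.get("hooks", {}).get("post", []):
        result = hook(result)
    return result


print(convert("= Intro\nThis is *very* simple.\n", DEFINITION))
```

Changing the output means editing DEFINITION, not convert, which is the point of keeping the rules in data.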
Please note that the following Definition Files (and their included code) are not fully completed. That is a fault of implementation and not a limitation, which means they will work as expected as soon as I (or anybody else) finish their code.
The Definition Files
Definition Files are in the
- markdown: Markdown to HTML
- txt2markdown: plain text to Markdown
- htm2markdown: HTML to Markdown
- txt2html: plain text to HTML
- html2txt: HTML to plain text
- md2txt: Markdown to plain text
- bbcode: BBCode to HTML
- replace: string search and replace (example)
- comments-strip comment lines/blocks
- htmlstrip: remove all HTML from input
- phpdoc: Phpdoc function comments to PHP.NET style documentation
Markdown is nearly complete. It's about 400 lines now.
HTML to Markdown
HTML to Markdown has just been started (one day's worth of coding) and is based on to-markdown by Dom Christie. (It is a good example of the kind of things that can
Text to Markdown
Text to Markdown is experimental: a way to format plain text documents in a particular style, without using Markdown, so that they can be readily converted to Markdown. For example, a line of only UPPERCASE WORDS is converted to a # header, and a line of only Camel Case words to a ## header.
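The header convention can be sketched as two line-level patterns. This is a simplified illustration in Python, not the txt2markdown definition itself: the patterns, the function name text_to_markdown and the reading of "Camel Case" as capitalized words separated by single spaces are all assumptions.

```python
import re

# Assumed patterns: words separated by single spaces, nothing else on the line.
UPPER_LINE = re.compile(r"^[A-Z][A-Z0-9]*(?: [A-Z][A-Z0-9]*)*$")
CAMEL_LINE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")


def text_to_markdown(text):
    """Prefix all-uppercase lines with '# ' and capitalized-word lines with '## '."""
    out = []
    for line in text.split("\n"):
        if UPPER_LINE.match(line):
            out.append("# " + line)
        elif CAMEL_LINE.match(line):
            out.append("## " + line)
        else:
            out.append(line)  # ordinary text passes through unchanged
    return "\n".join(out)


print(text_to_markdown("INTRODUCTION\nGetting Started\nplain text here."))
```

Lines with punctuation or mixed case match neither pattern, so ordinary sentences are left alone.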
Text to HTML
Text to HTML is new and similar to Text to Markdown, but the output is HTML (like Text -> MD -> HTML without the MD).
GMLP comes with index.php, which will convert the main document gmlp.txt as well as the .php source files and the Markdown and To-Markdown test files.
The default markup is called "Goofy Markup" and should be considered example only.
There is also a CLI executable, gmlp, for command line converting and testing (see