Markdown Formatter

Vladimir Schneider edited this page Jan 27, 2020 · 19 revisions

The Formatter renders the AST as markdown, applying formatting options that clean up the source and make it consistent. It also provides an API that lets extensions contribute their own formatting options and handle rendering of markdown for custom nodes.

ℹ️ In versions prior to 0.60.0, formatter functionality was implemented in the flexmark-formatter module and required an additional dependency.

The formatter module also implements an API for translating markdown documents to another language, described in Translation Helper API.

The Formatter class is a renderer that outputs markdown, formatted according to the specified options. Use it in place of HtmlRenderer to get formatted markdown. It can also be used to convert indentation from one ParserEmulationProfile to another:

package com.vladsch.flexmark.samples;

import com.vladsch.flexmark.formatter.Formatter;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.profile.pegdown.Extensions;
import com.vladsch.flexmark.profile.pegdown.PegdownOptionsAdapter;
import com.vladsch.flexmark.util.data.DataHolder;
import com.vladsch.flexmark.util.data.MutableDataSet;

public class PegdownToCommonMark {
    private static final DataHolder OPTIONS = PegdownOptionsAdapter.flexmarkOptions(
            Extensions.ALL
    );

    static final MutableDataSet FORMAT_OPTIONS = new MutableDataSet();
    static {
        // copy extensions from Pegdown compatible to Formatting, but leave the rest default
        FORMAT_OPTIONS.set(Parser.EXTENSIONS, Parser.EXTENSIONS.get(OPTIONS));
    }

    static final Parser PARSER = Parser.builder(OPTIONS).build();
    static final Formatter RENDERER = Formatter.builder(FORMAT_OPTIONS).build();

    // use the PARSER to parse pegdown indentation rules and RENDERER to render CommonMark

}

This converts pegdown 4-space indentation to CommonMark list item text column indentation.
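The comment in the sample leaves the actual conversion step open. The following is a minimal sketch of that step, assuming a flexmark-java version that has the `com.vladsch.flexmark.util.data` and `com.vladsch.flexmark.util.ast` packages, and the pegdown profile module on the classpath. The class name `PegdownToCommonMarkDemo` and the input string are made up for illustration.

```java
import com.vladsch.flexmark.formatter.Formatter;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.profile.pegdown.Extensions;
import com.vladsch.flexmark.profile.pegdown.PegdownOptionsAdapter;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.DataHolder;
import com.vladsch.flexmark.util.data.MutableDataSet;

// Hypothetical demo class: same setup as PegdownToCommonMark, plus parse and render calls.
public class PegdownToCommonMarkDemo {
    public static void main(String[] args) {
        // Parser options emulating pegdown with all extensions enabled
        DataHolder options = PegdownOptionsAdapter.flexmarkOptions(Extensions.ALL);

        // Formatter options: reuse the parser's extensions, keep everything else default
        MutableDataSet formatOptions = new MutableDataSet();
        formatOptions.set(Parser.EXTENSIONS, Parser.EXTENSIONS.get(options));

        Parser parser = Parser.builder(options).build();
        Formatter formatter = Formatter.builder(formatOptions).build();

        // Illustrative pegdown-style input using 4-space list indentation
        String pegdown = "#Heading\n"
                + "* list item\n"
                + "    > block quote\n"
                + "    lazy continuation\n";

        Node document = parser.parse(pegdown);         // parse with pegdown rules
        String commonMark = formatter.render(document); // render as formatted markdown
        System.out.println(commonMark);
    }
}
```

The parser and the formatter are given separate option sets on purpose: the parser needs the pegdown indentation rules, while the formatter is left on its defaults so that it writes CommonMark indentation.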

Pegdown Input
#Heading
-----
paragraph text 
lazy continuation

* list item
    > block quote
    lazy continuation

~~~info
        with uneven indent
           with uneven indent
     indented code
~~~ 

        with uneven indent
           with uneven indent
     indented code
1. numbered item 1   
1. numbered item 2   
1. numbered item 3   
    - bullet item 1   
    - bullet item 2   
    - bullet item 3   
        1. numbered sub-item 1   
        1. numbered sub-item 2   
        1. numbered sub-item 3   
    
    ~~~info
            with uneven indent
               with uneven indent
         indented code
    ~~~ 
    
            with uneven indent
               with uneven indent
         indented code
CommonMark Output

Converted to CommonMark indents, ATX heading spaces added, blank lines added, fenced and indented code indents minimized:

# Heading

-----

paragraph text
lazy continuation

* list item

  > block quote
  > lazy continuation

~~~info
   with uneven indent
      with uneven indent
indented code
~~~

       with uneven indent
          with uneven indent
    indented code

1. numbered item 1
2. numbered item 2
3. numbered item 3
   - bullet item 1
   - bullet item 2
   - bullet item 3
     1. numbered sub-item 1
     2. numbered sub-item 2
     3. numbered sub-item 3

   ~~~info
      with uneven indent
         with uneven indent
   indented code
   ~~~

          with uneven indent
             with uneven indent
       indented code

The full sample source is available in PegdownToCommonMark.java.
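The "code indents minimized" step in the output above can be pictured as removing the indentation shared by all content lines. The sketch below illustrates that idea with plain Java. It is not flexmark's implementation; the class and method names are hypothetical, and it assumes the indentation consists of spaces only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "minimize indent"; not taken from flexmark.
public class MinimizeIndentSketch {
    /** Removes the leading-space prefix shared by all non-blank lines. */
    public static List<String> minimizeIndent(List<String> lines) {
        int common = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // blank lines do not limit the common indent
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') indent++;
            common = Math.min(common, indent);
        }
        if (common == Integer.MAX_VALUE) common = 0; // every line was blank

        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Only blank lines can be shorter than the common indent; emit them as empty
            result.add(line.length() >= common ? line.substring(common) : "");
        }
        return result;
    }

    public static void main(String[] args) {
        // Content lines of the fenced code block from the pegdown input
        List<String> code = Arrays.asList(
                "        with uneven indent",
                "           with uneven indent",
                "     indented code");
        for (String line : minimizeIndent(code)) {
            System.out.println(line);
        }
    }
}
```

Applied to the fenced code content of the pegdown input, the shared indent is five spaces, so the relative indentation of the first two lines is preserved while the last line moves to column zero.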

Options

Extensions that handle formatting of their custom nodes can, and do, provide their own formatting options.

The following options are defined in the Formatter class:

  • FORMATTER_EMULATION_PROFILE: default Parser.PARSER_EMULATION_PROFILE, emulation profile to use for formatting. Can be used to change indenting rules from the ones used by the parser.

  • MAX_BLANK_LINES: default 2, maximum number of blank lines to keep in the file

  • MAX_TRAILING_BLANK_LINES: default 1, maximum trailing blank lines in file

  • SPACE_AFTER_ATX_MARKER: default DiscretionaryText.ADD, handling of space after atx marker

  • SETEXT_HEADER_EQUALIZE_MARKER: default true, when true equalizes the setext marker to header text length

  • ATX_HEADER_TRAILING_MARKER: default EqualizeTrailingMarker.AS_IS, trailing atx # markers:

    • AS_IS: do nothing
    • ADD: add a trailing marker with the same number of # as the opening marker
    • EQUALIZE: if a trailing marker is present, make it the same number of # as the opening marker
    • REMOVE: remove
  • THEMATIC_BREAK: default (String)null, string to use for thematic break. null means leave as is.

  • BLOCK_QUOTE_BLANK_LINES: default true, add blank lines around block quotes

  • BLOCK_QUOTE_MARKERS: default BlockQuoteMarker.ADD_COMPACT_WITH_SPACE:

    • AS_IS: no change, first line marker is propagated to full block quote content
    • ADD_COMPACT: use > and >>..>> for nested block quotes
    • ADD_COMPACT_WITH_SPACE: use > and >>..>> followed by a space for nested block quotes
    • ADD_SPACED: use > and > > ..> > for nested block quotes
  • INDENTED_CODE_MINIMIZE_INDENT: default true, when true will remove extra indent common to all content lines

  • FENCED_CODE_MINIMIZE_INDENT: default true, when true will remove extra indent common to all content lines

  • FENCED_CODE_MATCH_CLOSING_MARKER: default true, when true opening marker will be used for closing marker

  • FENCED_CODE_SPACE_BEFORE_INFO: default false, when true a space will be added between open marker and info string

  • FENCED_CODE_MARKER_LENGTH: default 3, minimum code fence marker length

  • FENCED_CODE_MARKER_TYPE: default CodeFenceMarker.ANY:

    • ANY: no change, whatever is used
    • BACK_TICK: change to back ticks
    • TILDE: change to ~
  • LIST_ADD_BLANK_LINE_BEFORE: default false, when true will add a blank line before the first list item if it follows a paragraph

  • LIST_RENUMBER_ITEMS: default true, when true renumbers the ordered list items

  • LIST_BULLET_MARKER: default ListBulletMarker.ANY:

    • ANY: no change
    • DASH: change all to -
    • ASTERISK: change all to *
    • PLUS: change all to +
  • LIST_NUMBERED_MARKER: default ListNumberedMarker.ANY:

    • ANY: no change
    • DOT: change all to .
    • PAREN: change all to )
  • LIST_SPACING: default ListSpacing.AS_IS:

    • AS_IS: no change
    • LOOSEN: loose if has loose item
    • TIGHTEN: tight if has tight item
    • LOOSE: always loose
    • TIGHT: always tight
  • REFERENCE_PLACEMENT: default ElementPlacement.AS_IS:

    • AS_IS: no change
    • DOCUMENT_TOP: put all references at top of document
    • GROUP_WITH_FIRST: group all with first reference
    • GROUP_WITH_LAST: group all with last reference
    • DOCUMENT_BOTTOM: put all references at bottom of document
  • REFERENCE_SORT: default ElementPlacementSort.AS_IS, only applies if REFERENCE_PLACEMENT is not AS_IS

    • AS_IS: no change
    • SORT: sort in alphabetical order by reference text
    • SORT_UNUSED_LAST: sort in alphabetical order by reference text, put unreferenced ones last
  • KEEP_IMAGE_LINKS_AT_START: default false, when true image links will always be wrapped to be the first non space on the line

  • KEEP_EXPLICIT_LINKS_AT_START: default false, when true explicit links will always be wrapped to be the first non space on the line

  • KEEP_HARD_LINE_BREAKS: default true, when false hard line breaks are eliminated along with their EOL.

  • KEEP_SOFT_LINE_BREAKS: default true, when false soft line breaks are eliminated. Useful for formatting markdown for processors that treat soft line breaks as hard line breaks.

  • APPEND_TRANSFERRED_REFERENCES: default false, when true will append transferred references to the bottom of the document being formatted.

  • SKIP_FENCED_CODE: default false, when true will convert fenced code to indented code in generated markdown.

  • SKIP_CHAR_ESCAPE: default false, when true will not escape special characters.

  • RIGHT_MARGIN: default 0, since 0.60.0, when greater than 0 text will be wrapped to the given margin.

  • APPLY_SPECIAL_LEAD_IN_HANDLERS: default true, since 0.60.0, when true will escape special lead-in characters which wrap to the beginning of a line and un-escape any which wrap away from the beginning of a line. Used to prevent special characters inside a paragraph body from starting a new element when wrapped to the beginning of a line.
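As a rough illustration of how the options above are applied, the sketch below sets a few of the integer and boolean keys on a MutableDataSet and passes it to the Formatter builder. It assumes flexmark-java 0.60.0 or later (needed for RIGHT_MARGIN) on the classpath. The class name `FormatterOptionsSketch`, the chosen values, and the input string are illustrative only.

```java
import com.vladsch.flexmark.formatter.Formatter;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;

// Hypothetical demo class; the option keys are the ones listed above.
public class FormatterOptionsSketch {
    public static void main(String[] args) {
        MutableDataSet options = new MutableDataSet()
                .set(Formatter.MAX_BLANK_LINES, 1)           // keep at most one consecutive blank line
                .set(Formatter.LIST_RENUMBER_ITEMS, true)    // renumber ordered list items
                .set(Formatter.FENCED_CODE_MARKER_LENGTH, 3) // minimum code fence marker length
                .set(Formatter.RIGHT_MARGIN, 80);            // wrap text at column 80, since 0.60.0

        // The same option set is shared by the parser and the formatter here
        Parser parser = Parser.builder(options).build();
        Formatter formatter = Formatter.builder(options).build();

        // Illustrative input with extra blank lines and un-numbered ordered items
        String input = "#Heading\n\n\n\n"
                + "1. first\n"
                + "1. second\n"
                + "1. third\n";

        Node document = parser.parse(input);
        System.out.println(formatter.render(document));
    }
}
```

Options that take enum values, such as LIST_BULLET_MARKER or REFERENCE_PLACEMENT, are set the same way with the corresponding enum constant.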

Source Code Samples

The best source of sample code is the existing extensions that implement custom node formatting, along with the formatter module itself:

  • flexmark-ext-abbreviation
  • flexmark-ext-definition
  • flexmark-ext-footnotes
  • flexmark-ext-jekyll-front-matter
  • flexmark-ext-tables
  • flexmark-formatter