Fix separators in blocks #403

Merged
merged 13 commits into from
Feb 13, 2024
Conversation

lostenderman
Collaborator

Closes #376.

Follows #377.

@Witiko
Owner

Witiko commented Feb 9, 2024

Testfile testfiles/CommonMark_0.30/block_quotes/014.test:

Some commands produced unexpected outputs:

  • Command context --once --luatex --nonstopmode test.tex (template input) produced expected output.
  • Command context --once --luatex --nonstopmode test.tex (template verbatim) produced unexpected output with the following diff:
    *** /tmp/tmpeei3arit/test-expected-031.log
    --- /tmp/tmpeei3arit/test-actual-031.log
    ***************
    *** 2,5 ****
    --- 2,8 ----
      blockQuoteBegin
      emphasis: foo
      blockQuoteEnd
    + interblockSeparator
    + blockQuoteBegin
    + blockQuoteEnd
      END document

@lostenderman: At first glance, this seems to be an issue with trailing newlines, i.e. one of the templates ends the input with a trailing newline whereas the other does not. This should make no difference to the parser.

@Witiko
Owner

Witiko commented Feb 10, 2024

At first glance, this seems to be an issue with trailing newlines, i.e. one of the templates ends the input with a trailing newline whereas the other does not.

Using the Docker image built by the CI for commit 0ae98fc from this PR, I checked that this is not an issue with trailing newlines:

$ docker run --rm -it 'ghcr.io/witiko/markdown:0ae98fc2-latest-no_docs'
$ markdown-cli <<< $'>\n> *foo*\n>  '  # Parse text from `testfiles/CommonMark_0.30/block_quotes/014.test`.
\markdownRendererDocumentBegin
\markdownRendererBlockQuoteBegin
\markdownRendererEmphasis{foo}
\markdownRendererBlockQuoteEnd \markdownRendererDocumentEnd

$ markdown-cli <<< $'>\n> *foo*\n>  \n'  # Add a trailing newline.
\markdownRendererDocumentBegin
\markdownRendererBlockQuoteBegin
\markdownRendererEmphasis{foo}
\markdownRendererBlockQuoteEnd \markdownRendererDocumentEnd

$ markdown-cli <<< $'>\n> *foo*\n>  \n\n'  # Add another trailing newline.
\markdownRendererDocumentBegin
\markdownRendererBlockQuoteBegin
\markdownRendererEmphasis{foo}
\markdownRendererBlockQuoteEnd \markdownRendererDocumentEnd

Instead, the issue seems to be with the way ConTeXt handles trailing spaces at the end of a line in verbatim input. In most TeX engines, trailing spaces are removed when TeX reads input. Therefore, users cannot type hard line breaks using trailing spaces. This is a known shortcoming of using verbatim input to type markdown. In ConTeXt, the last two trailing spaces or tabs of each line are instead replaced by a pair of tabs while markdown input is being read, so that TeX does not remove them:

markdown/markdown.dtx

Lines 35290 to 35324 in 822abcc

% The \mref{startmarkdown} and \mref{stopmarkdown} macros are implemented using the
% \mref{markdownReadAndConvert} macro.
%
% In Knuth's \TeX, trailing spaces are removed very early on when a line is
% being put to the input buffer.~[@knuth86b, sec. 31]. According to
% @eijkhout92 [sec. 2.2], this is because ``these spaces are hard to see in
% an editor''. At the moment, there is no option to suppress this behavior in
% (Lua)\TeX, but \Hologo{ConTeXt} MkIV funnels all input through its own input
% handler. This makes it possible to suppress the removal of trailing spaces
% in \Hologo{ConTeXt} MkIV and therefore to insert hard line breaks into
% markdown text.
%
% \end{markdown}
% \begin{macrocode}
\startluacode
  document.markdown_buffering = false
  local function preserve_trailing_spaces(line)
    if document.markdown_buffering then
      line = line:gsub("[ \t][ \t]$", "\t\t")
    end
    return line
  end
  resolvers.installinputlinehandler(preserve_trailing_spaces)
\stopluacode
\begingroup
  \catcode`\|=0%
  \catcode`\\=12%
  |gdef|startmarkdown{%
    |ctxlua{document.markdown_buffering = true}%
    |markdownReadAndConvert{\stopmarkdown}%
                           {|stopmarkdown}}%
  |gdef|stopmarkdown{%
    |ctxlua{document.markdown_buffering = false}%
    |markdownEnd}%
|endgroup
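
To make the effect of that input line handler concrete, here is a minimal shell sketch that applies the same substitution to the three lines of the test input. The helper name `preserve_trailing_spaces` is borrowed from the Lua code, but this `sed` version is only an illustration written for this comment, not part of the package, and it assumes that the handler sees each line without its line terminator.

```shell
# Hypothetical helper (not part of the Markdown package): mimics the Lua
# substitution line:gsub("[ \t][ \t]$", "\t\t") on every input line.
TAB="$(printf '\t')"
preserve_trailing_spaces() {
  sed "s/[ ${TAB}][ ${TAB}]\$/${TAB}${TAB}/"
}

# The three lines of the test input; the last one ends in two spaces.
# `sed -n l` prints tabs as \t and marks the end of each line with $.
printf '>\n> *foo*\n>  \n' | preserve_trailing_spaces | sed -n l
```

The first two lines pass through unchanged, and the last line comes out as `>\t\t$`, i.e. the block quote marker followed by two tabs. A line that ends in a single space is left alone, because the pattern needs two trailing whitespace characters.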

By replacing the trailing spaces in the last line of the markdown text from testfiles/CommonMark_0.30/block_quotes/014.test with two tabs, I can reproduce the error from the CI (see #403 (comment)):

$ docker run --rm -it 'ghcr.io/witiko/markdown:0ae98fc2-latest-no_docs'
$ markdown-cli <<< $'>\n> *foo*\n>  '  # Parse text from `testfiles/CommonMark_0.30/block_quotes/014.test`.
\markdownRendererDocumentBegin
\markdownRendererBlockQuoteBegin
\markdownRendererEmphasis{foo}
\markdownRendererBlockQuoteEnd \markdownRendererDocumentEnd

$ markdown-cli <<< $'>\n> *foo*\n>\t\t'  # Replace trailing spaces with two tabs.
\markdownRendererDocumentBegin
\markdownRendererBlockQuoteBegin
\markdownRendererEmphasis{foo}
\markdownRendererBlockQuoteEnd \markdownRendererInterblockSeparator
{}\markdownRendererBlockQuoteBegin

\markdownRendererBlockQuoteEnd \markdownRendererDocumentEnd

Can you please update the parser so that the tabs make no difference? I understand that this is a corner case, but we want the parser to be at least somewhat resilient to fuzzy input (in the absence of proper fuzz-testing).
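
The requested invariant could be checked with something like the following sketch. The function name `tabs_make_no_difference` is made up for illustration, and the usage line assumes that `markdown-cli` reads markdown text from standard input, as it does in the here-string calls above. Unlike a here-string, `printf` does not append a trailing newline, which should not matter given the trailing-newline experiment earlier in this comment.

```shell
# Hypothetical regression check: succeeds when a converter prints identical
# output for the space-terminated and the tab-terminated variant of the
# input. The converter command is passed as the function's arguments.
tabs_make_no_difference() {
  with_spaces="$(printf '>\n> *foo*\n>  ' | "$@")" || return 2
  with_tabs="$(printf '>\n> *foo*\n>\t\t' | "$@")" || return 2
  [ "$with_spaces" = "$with_tabs" ]
}

# Possible usage inside the Docker image from this thread:
#   tabs_make_no_difference markdown-cli && echo "OK" || echo "tabs change the output"
```

The function returns 0 when both outputs match, 1 when they differ, and 2 when the converter itself fails, so it can be dropped into a CI step as is.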

@Witiko Witiko marked this pull request as ready for review February 13, 2024 22:05
@Witiko Witiko merged commit 204d213 into Witiko:main Feb 13, 2024
10 of 11 checks passed
@Witiko Witiko deleted the fix-separators branch February 13, 2024 23:08
Witiko added a commit that referenced this pull request Mar 21, 2024
Successfully merging this pull request may close these issues.

Correctly produce paragraph separators inside block-level elements