Line-by-line conversion to HTML #59

Closed
krlmlr opened this issue Jan 13, 2019 · 5 comments

@krlmlr krlmlr commented Jan 13, 2019

In the tibble 2.0.1 blog post, converting each line of output separately helped work around how the blackfriday markdown -> HTML converter treats newlines embedded in HTML: tidyverse/tidyverse.org@32d6829. Perhaps conversion to HTML should treat newlines separately? I'm not sure if and where this change should happen, though.

In the reprex below, applying strsplit() just seems to work even if the newline is inside a block. I don't understand why.

options(crayon.enabled = TRUE)
text <- crayon::blue("a\nb")
fansi::sgr_to_html(text)
#> [1] "<span style='color: #0000BB;'>a\nb</span>"

text_split <- unlist(strsplit(text, "\n", fixed = TRUE))
fansi::sgr_to_html(text_split)
#> [1] "<span style='color: #0000BB;'>a</span>"
#> [2] "<span style='color: #0000BB;'>b</span>"

Created on 2019-01-13 by the reprex package (v0.2.1)

@brodieG brodieG commented Jan 13, 2019

Interesting, I'll need to investigate to see what's going on, because clearly there are still newlines at the end of the lines. It may be blackfriday only replaces internal ones, e.g. because it does something equivalent to readLines (in whatever language it's written in) that creates vectors of line strings without the terminating newline, and then does the equivalent of writeLines which puts them back in. Probably the main thing to look at is where the internal newlines were coming from.

I believe the above works because sgr_to_html closes the span elements at the end of each CHARSXP to ensure it produces valid HTML (going from memory here, could be wrong). It was easier to do it this way than track if there were any open HTML tags at the beginning or end of each line.

It seems though this does not need to be resolved immediately. Please let me know if you're looking for some changes in the near term, otherwise this will probably sit here for a while. I also don't think (not sure) that fansi should take too much initiative in splitting by newlines, as in many cases you want to preserve the newlines so things render correctly inside PRE blocks. Not sure. Another possibility is that this could be done as part of the hook scripts as you effectively did (i.e. the built in fansi hook script could do this).

@brodieG brodieG added the question label Jan 13, 2019

@krlmlr krlmlr commented Jan 13, 2019

Thanks. No need to rush here. I'm also not sure where this change belongs. It doesn't seem like the primary concern of sgr_to_html(); on the other hand, it would be great if hooks just worked out of the box. Maybe a simple wrapper that calls strsplit() and then map_chr(..., paste, collapse = "\n")?
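The wrapper idea could be sketched roughly as follows. This is only an illustration in base R: convert_by_line is a hypothetical name, not part of fansi, and the converter is passed in as an argument (defaulting to fansi::sgr_to_html) so the splitting logic can be tried with any per-element function. It uses vapply() instead of map_chr() to avoid a purrr dependency.

```r
# Hypothetical helper, not a fansi function: convert every line of every
# element separately, then rejoin the lines that came from the same element.
convert_by_line <- function(x, convert = fansi::sgr_to_html) {
  stopifnot(is.character(x))
  # One character vector of lines per input element
  lines <- strsplit(x, "\n", fixed = TRUE)
  vapply(
    lines,
    # Each line is its own vector element here, so the converter
    # sees no embedded newlines
    function(l) paste(convert(l), collapse = "\n"),
    character(1)
  )
}

# Usage with the reprex input from the first comment:
# options(crayon.enabled = TRUE)
# convert_by_line(crayon::blue("a\nb"))
```

Note that strsplit() drops a trailing newline, so an element ending in "\n" comes back without it; whether that matters depends on how the hook reassembles the output.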

I agree that it might be worth looking at the origin of the internal newlines too.

@brodieG brodieG added this to the 0.4.1 milestone Jan 19, 2019

@brodieG brodieG commented Jan 4, 2020

Note to self, related to: tidyverse/tidyverse.org#266

@brodieG brodieG commented Jan 4, 2020

> Interesting, I'll need to investigate to see what's going on, because clearly there are still newlines at the end of the lines. It may be blackfriday only replaces internal ones, e.g. because it does something equivalent to readLines (in whatever language it's written in) that creates vectors of line strings without the terminating newline, and then does the equivalent of writeLines which puts them back in. Probably the main thing to look at is where the internal newlines were coming from.

From additional investigation, what seems to be happening is that blackfriday replaces newlines that are inside "SPAN" tags (and maybe others too?). sgr_to_html allows newlines inside SPANs by default: if it is applying a particular SGR style and a newline is encountered, that style remains unchanged, as that is the correct semantic interpretation and produces more compact strings. So outputs such as:

<span ...>line 1\n
line 2</span>

are perfectly okay as far as fansi is concerned, and in fact, better than:

<span ...>line 1</span>\n
<span ...>line 2</span>

If we want to produce the latter output, we can split by \n as @krlmlr notes, since fansi automatically closes and re-opens SPANs across STRSXP elements (but not within a CHARSXP). I think this was done to ensure no vector element produces invalid HTML (or laziness, or some combination of the two). We'll need to ensure this behavior remains.
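One way to guard the per-element behavior described here would be a small check that every element of a result has as many opening as closing span tags. The following is a minimal sketch in base R; spans_balanced is a hypothetical helper, not a fansi function, and counting substrings is only a rough proxy for valid HTML.

```r
# Hypothetical check, not part of fansi: TRUE for each element whose
# number of "<span" openings equals its number of "</span>" closings.
spans_balanced <- function(html) {
  count <- function(pattern) {
    # gregexpr() finds every fixed-string match; regmatches() extracts
    # them, so lengths() gives the number of matches per element
    lengths(regmatches(html, gregexpr(pattern, html, fixed = TRUE)))
  }
  count("<span") == count("</span>")
}

# The split output from the reprex above, plus a deliberately broken element
spans_balanced(c(
  "<span style='color: #0000BB;'>a</span>",
  "<span style='color: #0000BB;'>b</span>",
  "<span style='color: #0000BB;'>a"
))
```

A check like this could be run over the result of sgr_to_html() on split input to confirm that no single vector element is left with an unclosed span.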

@brodieG brodieG closed this in 7ccb892 Jan 9, 2020

@brodieG brodieG commented Jan 9, 2020

Just for completeness, the solution to this particular issue is to set the split.nl parameter of set_knit_hooks to TRUE, which will internally do roughly what @krlmlr's workaround above does.
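For reference, a setup chunk using this option might look like the sketch below. It assumes a fansi version that includes the split.nl argument and an R Markdown document knitted with knitr; the chunk options shown in the comment and the variable name old_hooks are illustrative choices, not requirements stated in this thread.

```r
# Intended for an R Markdown setup chunk, e.g. with chunk options
#   comment = "", results = "asis"
# so that the CSS emitted by set_knit_hooks() reaches the document.

# Make crayon emit SGR sequences even though output is not a terminal
options(crayon.enabled = TRUE)

# Install the fansi output hook; split.nl = TRUE asks the hook to split
# output on newlines before converting it to HTML
old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks, split.nl = TRUE)
```

With this in place, the manual strsplit() step from the first comment should no longer be needed in the document itself.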
