
Conversation

@ypconstante (Contributor) commented Aug 7, 2025

Today LazyHTML.Tree.to_html is really slow on large trees. Replacing the binary accumulator with a list makes calls with large trees significantly faster, and also speeds up calls with smaller trees a little.
This does increase memory usage slightly, but given the performance improvement it seems worth it.
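The two accumulator styles being compared can be sketched roughly like this (a simplified, hypothetical example for illustration, not the actual LazyHTML.Tree code): the binary version appends to a growing binary on every step, while the iodata version conses chunks onto nested lists and concatenates once at the end.

```elixir
defmodule TreeSketch do
  # Binary accumulator: each <> extends the accumulator binary;
  # when the runtime cannot append in place, it copies.
  def to_html_binary(tree), do: node_to_binary(tree, "")

  defp node_to_binary({tag, children}, html) do
    html = html <> "<" <> tag <> ">"
    html = Enum.reduce(children, html, &node_to_binary/2)
    html <> "</" <> tag <> ">"
  end

  defp node_to_binary(text, html) when is_binary(text), do: html <> text

  # Iodata accumulator: build a deep list of chunks,
  # concatenate a single time at the end.
  def to_html_iodata(tree) do
    tree |> node_to_iodata() |> IO.iodata_to_binary()
  end

  defp node_to_iodata({tag, children}) do
    ["<", tag, ">", Enum.map(children, &node_to_iodata/1), "</", tag, ">"]
  end

  defp node_to_iodata(text) when is_binary(text), do: text
end
```

Both produce the same binary; the difference is only in how the intermediate result is accumulated.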

Elixir 1.18.4
Erlang 28.0.1
JIT enabled: true

...

##### With input big #####
Name                                ips        average  deviation         median         99th %
to_html (io data)                 83.02       12.05 ms     ±7.11%       11.88 ms       16.49 ms
to_html (io data reverse)         76.58       13.06 ms    ±14.34%       12.11 ms       17.62 ms
to_html (main)                    14.72       67.95 ms    ±30.57%       75.58 ms       90.30 ms

Comparison: 
to_html (io data)                 83.02
to_html (io data reverse)         76.58 - 1.08x slower +1.01 ms
to_html (main)                    14.72 - 5.64x slower +55.90 ms

Memory usage statistics:

Name                         Memory usage
to_html (io data)                 5.96 MB
to_html (io data reverse)         8.55 MB - 1.43x memory usage +2.58 MB
to_html (main)                    6.73 MB - 1.13x memory usage +0.77 MB

**All measurements for memory usage were the same**

##### With input medium #####
Name                                ips        average  deviation         median         99th %
to_html (io data)                283.62        3.53 ms     ±2.98%        3.50 ms        3.86 ms
to_html (io data reverse)        264.60        3.78 ms    ±15.05%        3.69 ms        7.21 ms
to_html (main)                   232.41        4.30 ms    ±39.28%        4.02 ms       14.17 ms

Comparison: 
to_html (io data)                283.62
to_html (io data reverse)        264.60 - 1.07x slower +0.25 ms
to_html (main)                   232.41 - 1.22x slower +0.78 ms

Memory usage statistics:

Name                         Memory usage
to_html (io data)                 2.05 MB
to_html (io data reverse)         2.63 MB - 1.28x memory usage +0.57 MB
to_html (main)                    2.31 MB - 1.12x memory usage +0.25 MB

**All measurements for memory usage were the same**

##### With input small #####
Name                                ips        average  deviation         median         99th %
to_html (io data reverse)        1.35 K      740.29 μs     ±5.85%      726.38 μs      917.88 μs
to_html (io data)                1.32 K      758.74 μs     ±7.34%      736.79 μs      971.52 μs
to_html (main)                   1.25 K      800.30 μs    ±17.72%      738.42 μs     1414.52 μs

Comparison: 
to_html (io data reverse)        1.35 K
to_html (io data)                1.32 K - 1.02x slower +18.45 μs
to_html (main)                   1.25 K - 1.08x slower +60.01 μs

Memory usage statistics:

Name                         Memory usage
to_html (io data reverse)       524.79 KB
to_html (io data)               456.34 KB - 0.87x memory usage -68.45313 KB
to_html (main)                  517.45 KB - 0.99x memory usage -7.34375 KB
Benchmark script:

tag = "main"

read_file = fn name ->
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
  |> LazyHTML.from_document()
  |> LazyHTML.to_tree()
end

inputs = %{
  "big" => read_file.("big.html"),
  "medium" => read_file.("medium.html"),
  "small" => read_file.("small.html")
}

Benchee.run(
  %{
    "to_html" => &LazyHTML.Tree.to_html/1
  },
  inputs: inputs,
  pre_check: true,
  time: 10,
  memory_time: 2,
  save: [path: "benchs/results/to-html-#{tag}", tag: tag]
)

Benchee.report(load: "benchs/results/to-html-*")

@josevalim (Member)

In theory the binary should be faster and use less memory, since we are reusing the match context:

ERL_COMPILER_OPTIONS=bin_opt_info mix compile --force

Can you please share the benchmark script you are using? You can also try this diff and see if it changes anything.

diff --git a/lib/lazy_html.ex b/lib/lazy_html.ex
index 358b37e..9e5f8c3 100644
--- a/lib/lazy_html.ex
+++ b/lib/lazy_html.ex
@@ -498,7 +498,7 @@ defmodule LazyHTML do
   """
   @spec html_escape(String.t()) :: String.t()
   def html_escape(string) when is_binary(string) do
-    LazyHTML.Tree.append_escaped(string, "")
+    LazyHTML.Tree.append_escaped(string)
   end

   # Access
diff --git a/lib/lazy_html/tree.ex b/lib/lazy_html/tree.ex
index 29edc32..66e9e6a 100644
--- a/lib/lazy_html/tree.ex
+++ b/lib/lazy_html/tree.ex
@@ -134,7 +134,7 @@ defmodule LazyHTML.Tree do
   # [1]: https://github.com/phoenixframework/phoenix_html/blob/v4.2.1/lib/phoenix_html/engine.ex#L29-L35

   @doc false
-  def append_escaped(text, html) do
+  def append_escaped(text, html \\ "") do
     append_escaped(text, text, 0, 0, html)
   end

@josevalim (Member)

Also, if you are using iodata, you don't need Enum.reverse; you can write operations like [html, " ", attrs, "/>"]. It should reduce your memory usage and improve performance.
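The suggestion above can be sketched as follows (the html/attrs chunks are made-up values, not LazyHTML's actual ones): since iodata nests to any depth, chunks can be wrapped in order inside a new list, instead of prepending to an accumulator and reversing at the end.

```elixir
html = ["<img"]
attrs = [["src=\"a.png\""]]

# Prepend-and-reverse style: chunks are consed onto the front of the
# accumulator, so the list must be reversed once at the end.
with_reverse = ["/>", attrs, " ", html] |> Enum.reverse() |> IO.iodata_to_binary()

# Nested style: wrap the existing iodata in a new list, in order.
# No reverse pass, and the old list is shared rather than rebuilt.
nested = [html, " ", attrs, "/>"] |> IO.iodata_to_binary()
```

Both yield the same binary, but the nested form skips the O(n) reverse and the extra intermediate list it allocates.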

@ypconstante (Contributor, Author) commented Aug 7, 2025

The Benchee script is in the PR description; the HTML files are the ones used by the Floki benchmark.

On the main branch:

ERL_COMPILER_OPTIONS=bin_opt_info mix compile --force
Compiling 3 files (.ex)
     warning: OPTIMIZED: match context reused
     │
 111 │        do: append_text(rest, text, whitespace_size + 1, ctx, html)
     │        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:111

     warning: OPTIMIZED: match context reused
     │
 123 │        do: append_escaped(rest, text, 0, whitespace_size, html)
     │        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:123

     warning: OPTIMIZED: match context reused
     │
 164 │       append_escaped(rest, text, offset + size + 1, 0, html)
     │       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:164

     warning: OPTIMIZED: match context reused
     │
 164 │       append_escaped(rest, text, offset + size + 1, 0, html)
     │       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:164

     warning: OPTIMIZED: match context reused
     │
 164 │       append_escaped(rest, text, offset + size + 1, 0, html)
     │       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:164

     warning: OPTIMIZED: match context reused
     │
 164 │       append_escaped(rest, text, offset + size + 1, 0, html)
     │       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:164

     warning: OPTIMIZED: match context reused
     │
 164 │       append_escaped(rest, text, offset + size + 1, 0, html)
     │       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:164

     warning: OPTIMIZED: match context reused
     │
 169 │     append_escaped(rest, text, offset, size + 1, html)
     │     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     │
     └─ lib/lazy_html/tree.ex:169

Same output with the patch

@jonatanklosko (Member) commented Aug 7, 2025

This is actually crazy.

When adding the original implementation I benchmarked a few versions, and the current one was clearly superior, with runtime close or better and five times lower memory usage.

However, this is no longer the case on main (as indicated by @ypconstante's results). I tracked it down and it's a regression from #14. If I make this change, it restores the previous behaviour:

-  def append_escaped(text, html) do
+  defp append_escaped(text, html) do

@jonatanklosko (Member)

In theory the binary should be faster and use less memory since we are reusing the match context

Just to clarify, it's not about the match context (which has to do with the recursive traversal, and that's the same in both implementations); it's about the runtime optimising binary appends. I don't think bin_opt_info will tell us any difference, because in the iodata implementation we are simply not building a binary :D
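The runtime optimisation in question can be sketched outside LazyHTML (an illustrative example, not the library's code): when a binary accumulator is only ever extended with `<<acc::binary, ...>>` in a tail-recursive loop, the VM can grow the binary in place with spare capacity instead of copying it on every iteration.

```elixir
defmodule AppendSketch do
  # Uppercases ASCII letters by appending one byte at a time to a
  # binary accumulator -- the pattern the runtime's binary-append
  # optimisation targets.
  def upcase_ascii(text) when is_binary(text), do: upcase_ascii(text, "")

  defp upcase_ascii(<<>>, acc), do: acc

  defp upcase_ascii(<<c, rest::binary>>, acc) when c in ?a..?z do
    # <<acc::binary, ...>> can extend acc in place when the VM
    # knows no other reference to the binary exists.
    upcase_ascii(rest, <<acc::binary, c - 32>>)
  end

  defp upcase_ascii(<<c, rest::binary>>, acc) do
    upcase_ascii(rest, <<acc::binary, c>>)
  end
end
```

This is distinct from the match-context optimisation that bin_opt_info reports, which concerns how `rest` is traversed without re-creating a sub-binary on each clause.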

@josevalim (Member)

Just to clarify, it's not about match context (which has to do with recursive traversal, and that's the same in both implementations), it's about the runtime optimising binary appends

In my mind, "runtime optimizing binary appends" is the match context handling. If the match context was not optimized, then we would not see this optimization. But I found it weird that nothing warned about a match context not being created, and nothing about the append_attrs or to_html functions.

@josevalim (Member)

Nah, you are right, I am getting two different optimizations confused. Apologies. Ship it.

@jonatanklosko (Member)

Closing in favour of #19.

@ypconstante thank you very much for the PR, otherwise we wouldn't spot the regression!

