html: another shot at security doc

Be clearer about the operation of the tokenizer and the parser (and their
differences), and be explicit about the need for re-serialization when they
are being used in security contexts.

Change-Id: Ieb8f2a9d4806fb7a8849a15671667396e81c53b9
Reviewed-on: https://go-review.googlesource.com/c/net/+/484795
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
btasker · Apr 17, 2023 · eb1572c
1 parent 9001ca7
commit eb1572c
Showing 1 changed file with 14 additions and 8 deletions.
diff --git a/html/doc.go b/html/doc.go
@@ -99,14 +99,20 @@ Care should be taken when parsing and interpreting HTML, whether full documents
 or fragments, within the framework of the HTML specification, especially with
 regard to untrusted inputs.
 
-This package provides both a tokenizer and a parser. Only the parser constructs
-a DOM according to the HTML specification, resolving malformed and misplaced
-tags where appropriate. The tokenizer simply tokenizes the HTML presented to it,
-and as such does not resolve issues that may exist in the processed HTML,
-producing a literal interpretation of the input.
-
-If your use case requires semantically well-formed HTML, as defined by the
-WHATWG specification, the parser should be used rather than the tokenizer.
+This package provides both a tokenizer and a parser, which implement the
+tokenization, and tokenization and tree construction stages of the WHATWG HTML
+parsing specification respectively. While the tokenizer parses and normalizes
+individual HTML tokens, only the parser constructs the DOM tree from the
+tokenized HTML, as described in the tree construction stage of the
+specification, dynamically modifying or extending the document's DOM tree.
+
+If your use case requires semantically well-formed HTML documents, as defined by
+the WHATWG specification, the parser should be used rather than the tokenizer.
+
+In security contexts, if trust decisions are being made using the tokenized or
+parsed content, the input must be re-serialized (for instance by using Render or
+Token.String) in order for those trust decisions to hold, as the process of
+tokenization or parsing may alter the content.
 */
 package html // import "golang.org/x/net/html"