The encoding heuristic should be applied to the first non-ASCII byte sequence.
These tests use undeclared CP1252, since a plausible Latin-1 POD source that can be detected as UTF-8 is much harder to contrive. This also documents our expectation that 'smart quote' symbols in CP1252 will be rendered as control characters if no =encoding is declared.
Thanks to Slaven Rezic for the report (RT#79238).
rather than to any byte sequence in the same chunk as the first non-ASCII byte
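
A minimal sketch of the idea described above, written in Python for illustration only and not taken from the Pod::Simple source. The function name `guess_encoding` and the returned labels are hypothetical. It inspects only the first run of high-bit bytes, so a later sequence in the same chunk cannot change the guess. The ISO-8859-1 fallback is an assumption drawn from the note that undeclared CP1252 smart quotes end up as control characters.

```python
import re

# A run of one or more bytes with the high bit set (non-ASCII).
_NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")


def guess_encoding(data: bytes) -> str:
    """Guess an encoding from the FIRST non-ASCII byte sequence only.

    Hypothetical helper: the name and return values are illustrative.
    """
    match = _NON_ASCII_RUN.search(data)
    if match is None:
        # Nothing but ASCII, so there is nothing to guess from.
        return "ascii"
    try:
        # Only the first high-bit run is examined, never the whole chunk.
        match.group().decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: treat the bytes as Latin-1.
        return "iso-8859-1"
    return "utf-8"


# First high-bit byte is a CP1252 smart quote; valid UTF-8 appears later.
cp1252_first = b"=head1 \x93quoted\x94 and caf\xc3\xa9\n"
# First high-bit sequence is valid UTF-8; a stray CP1252 byte appears later.
utf8_first = b"=head1 caf\xc3\xa9 and \x93quoted\x94\n"

print(guess_encoding(cp1252_first))
print(guess_encoding(utf8_first))
```

Under this sketch, a CP1252 smart quote such as byte 0x93 decoded through the Latin-1 fallback becomes U+0093, a C1 control character, which matches the expectation stated in the test note.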
Fix a seekrut vs htmlversion typo
Suppress spurious warnings
Improved handling of markup nesting for code
Skip decoding on strings with the utf8 flag set and add the 'parse_characters' option.
This option allows the user to supply POD source that has already been decoded to Perl's internal character format.
Thanks to Ben Bullock for the test case. I did not use his solution, though. Instead, cache header text separately from body text, and use the text-only output in the TOC and its IDs. This also eliminates other elements from the TOC, such as the `<b>` elements seen in `t/xhtml10.t`, which I think makes for more consistent TOC output in general. Closes #31.
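
The approach described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Pod::Simple::XHTML implementation: the `Heading` class, `make_id`, `render_heading`, and `render_toc_entry` are all hypothetical names, and the ID scheme shown is invented for the example. The point is only that the heading keeps two buffers, one with markup for the body and one text-only for the TOC entry and the ID.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Heading:
    """Hypothetical container: markup and plain text are cached separately."""
    level: int
    html: str = ""  # markup-bearing text, used in the document body
    text: str = ""  # text-only version, used for the TOC entry and the id

    def add(self, plain: str, open_tag: str = "", close_tag: str = "") -> None:
        # Every piece of text goes into both buffers; tags go only into html.
        self.html += f"{open_tag}{html.escape(plain)}{close_tag}"
        self.text += plain


def make_id(text: str) -> str:
    # Assumed ID scheme for this sketch: collapse non-word runs to hyphens.
    slug = re.sub(r"[^\w]+", "-", text.strip()).strip("-")
    return slug or "section"


def render_heading(h: Heading) -> str:
    return f'<h{h.level} id="{make_id(h.text)}">{h.html}</h{h.level}>'


def render_toc_entry(h: Heading) -> str:
    # The TOC uses the text-only buffer, so no <b> or other elements leak in.
    return f'<li><a href="#{make_id(h.text)}">{html.escape(h.text)}</a></li>'


h = Heading(level=1)
h.add("The ")
h.add("bold", "<b>", "</b>")
h.add(" option")

print(render_heading(h))
print(render_toc_entry(h))
```

Because the ID is derived from the text-only buffer, the body heading and the TOC link agree on the same anchor regardless of what markup the heading contains.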
…tring rather than a byte string
Don't whine about non-ASCII bytes in code/comments
Add handle_code as an overridable method on Pod::Simple::XHTML
Don't turn on codes_in_verbatim by default for the XHTML formatter