TEXT-216: Add HTML5 Entities #312

Open: wants to merge 1 commit into base branch master
src/main/java/org/apache/commons/text/StringEscapeUtils.java: 58 additions, 0 deletions
@@ -328,6 +328,20 @@ public int translate(final CharSequence input, final int index, final Writer wri
new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE)
);
/**
* Translator object for escaping HTML version 5.0.
*
* While {@link #escapeHtml5(String)} is the expected method of use, this
* object allows the HTML escaping functionality to be used
* as the foundation for a custom translator.
*/
public static final CharSequenceTranslator ESCAPE_HTML5 =
new AggregateTranslator(
new LookupTranslator(EntityArrays.HTML50_EXTENDED_ESCAPE),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE),
new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE),
new LookupTranslator(EntityArrays.BASIC_ESCAPE)
);
/**
* Translator object for escaping individual Comma Separated Values.
*
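ESCAPE_HTML5 chains four lookup tables through an AggregateTranslator, so at each input position the first table that matches wins. The sketch below is a simplified, self-contained stand-in for that behavior, not the commons-text classes: the AggregateLookupSketch class, its Lookup record, and the sample entries are all hypothetical. It illustrates two consequences of the chaining: within one table the longest key is tried first, and across tables the order of the constructor arguments decides precedence.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for chaining LookupTranslators inside an
// AggregateTranslator; not the commons-text implementation.
final class AggregateLookupSketch {

    // One lookup table. At a given index it tries the longest candidate first,
    // appends the replacement on a hit, and returns how many chars it consumed.
    record Lookup(Map<String, String> table) {
        int translate(CharSequence in, int index, StringBuilder out) {
            int longest = table.keySet().stream().mapToInt(String::length).max().orElse(0);
            for (int len = Math.min(longest, in.length() - index); len > 0; len--) {
                String replacement = table.get(in.subSequence(index, index + len).toString());
                if (replacement != null) {
                    out.append(replacement);
                    return len;
                }
            }
            return 0; // no key matched at this index
        }
    }

    // Tries each table in order; the first one that consumes input wins.
    // Characters no table matches are copied through unchanged.
    static String translate(CharSequence in, List<Lookup> lookups) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            int consumed = 0;
            for (Lookup lookup : lookups) {
                consumed = lookup.translate(in, i, out);
                if (consumed > 0) {
                    break;
                }
            }
            if (consumed == 0) {
                out.append(in.charAt(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample entries only; the real EntityArrays tables are much larger.
        Lookup html5 = new Lookup(Map.of(
                "\u2242\u0338", "&NotEqualTilde;",
                "\u2242", "&EqualTilde;"));
        Lookup basic = new Lookup(Map.of("<", "&lt;", "&", "&amp;"));
        // prints: a&lt;b &amp; &NotEqualTilde;
        System.out.println(translate("a<b & \u2242\u0338", List.of(html5, basic)));
    }
}
```

Because HTML50_EXTENDED_ESCAPE is listed first in ESCAPE_HTML5 (and HTML40_EXTENDED_ESCAPE before ISO8859_1_ESCAPE, the reverse of the order visible in the preceding translator), any character the HTML 5 table shares with the other tables would be escaped using the HTML 5 table's name. Whether that can happen depends on the table contents, which this diff does not show.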
@@ -445,6 +459,22 @@ public int translate(final CharSequence input, final int index, final Writer wri
new NumericEntityUnescaper()
);

/**
* Translator object for unescaping escaped HTML 5.0.
*
* While {@link #unescapeHtml5(String)} is the expected method of use, this
* object allows the HTML unescaping functionality to be used
* as the foundation for a custom translator.
*/
public static final CharSequenceTranslator UNESCAPE_HTML5 =
new AggregateTranslator(
new LookupTranslator(EntityArrays.HTML50_EXTENDED_UNESCAPE),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE),
new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE),
new LookupTranslator(EntityArrays.BASIC_UNESCAPE),
new NumericEntityUnescaper()
);

/**
* Translator object for unescaping escaped XML.
*
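UNESCAPE_HTML5 follows the same pattern, with NumericEntityUnescaper last as a fallback for references such as &#65; and &#x41;. A minimal sketch of that flow, assuming a tiny hypothetical named-entity table and a hand-rolled numeric parser: UnescapeSketch is illustrative only and does not reproduce NumericEntityUnescaper's configurable options (for example, how a missing trailing semicolon is handled).

```java
import java.util.Map;

// Hypothetical sketch of named-then-numeric unescaping; not the commons-text code.
final class UnescapeSketch {

    // Tiny sample of semicolon-terminated names from the WHATWG list.
    static final Map<String, String> NAMED = Map.of(
            "&amp;", "&",
            "&AMP;", "&",
            "&lt;", "<",
            "&NotEqualTilde;", "\u2242\u0338");

    // Decodes "&#65;" or "&#x41;" starting at index. Returns the index just past
    // the ';', or -1 if no well-formed numeric reference starts there.
    static int numeric(String in, int index, StringBuilder out) {
        if (!in.startsWith("&#", index)) {
            return -1;
        }
        int start = index + 2;
        boolean hex = start < in.length() && (in.charAt(start) == 'x' || in.charAt(start) == 'X');
        if (hex) {
            start++;
        }
        int end = in.indexOf(';', start);
        if (end <= start) {
            return -1;
        }
        try {
            int codePoint = Integer.parseInt(in.substring(start, end), hex ? 16 : 10);
            out.appendCodePoint(codePoint); // rejects values outside the Unicode range
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return -1;
        }
        return end + 1;
    }

    static String unescape(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            if (in.charAt(i) == '&') {
                // Named references first, numeric fallback last, mirroring the
                // order of the translators in UNESCAPE_HTML5.
                int semi = in.indexOf(';', i);
                String named = semi < 0 ? null : NAMED.get(in.substring(i, semi + 1));
                if (named != null) {
                    out.append(named);
                    i = semi + 1;
                    continue;
                }
                int next = numeric(in, i, out);
                if (next > 0) {
                    i = next;
                    continue;
                }
            }
            out.append(in.charAt(i)); // not a recognized reference: copy as-is
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p AB & &bogus;
        System.out.println(unescape("&lt;p &#65;&#x42; &AMP; &bogus;"));
    }
}
```

Unknown references such as &bogus; pass through unchanged rather than raising an error, which matches the lenient behavior one would expect from a translator chain where no table consumes the input.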
@@ -588,6 +618,22 @@ public static final String escapeHtml4(final String input) {
return ESCAPE_HTML4.translate(input);
}

// HTML and XML
//--------------------------------------------------------------------------
/**
* Escapes the characters in a {@code String} using HTML entities.
*
* <p>Supports all known HTML 5.0 entities.</p>
*
* @param input the {@code String} to escape, may be null
* @return a new escaped {@code String}, {@code null} if null string input
*
* @see <a href="https://html.spec.whatwg.org/multipage/named-characters.html">HTML 5.0 Entities</a>
*/
public static final String escapeHtml5(final String input) {
return ESCAPE_HTML5.translate(input);
}

// Java and JavaScript
//--------------------------------------------------------------------------
/**
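The WHATWG named character reference list that escapeHtml5 cites is many-to-one: &amp; and &AMP; both decode to &, and a few names such as &NotEqualTilde; decode to two code points (U+2242 U+0338). An unescape table can keep every name, but an escape table needs exactly one name per character sequence. The sketch below shows one way to derive the escape direction while keeping the first name encountered; the EntityTableSketch class and its escapeTableFrom helper are hypothetical, and this diff does not show how HTML50_EXTENDED_ESCAPE and HTML50_EXTENDED_UNESCAPE were actually built.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the diff does not show how the HTML 5 tables are generated.
final class EntityTableSketch {

    // Builds a sequence-to-name escape table from a many-to-one name-to-sequence
    // unescape table, keeping the first name seen for each character sequence.
    static Map<String, String> escapeTableFrom(Map<String, String> unescape) {
        Map<String, String> escape = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : unescape.entrySet()) {
            escape.putIfAbsent(entry.getValue(), entry.getKey());
        }
        return escape;
    }

    public static void main(String[] args) {
        // Insertion order decides which alias wins, so use a LinkedHashMap.
        Map<String, String> unescape = new LinkedHashMap<>();
        unescape.put("&amp;", "&");
        unescape.put("&AMP;", "&");
        unescape.put("&NotEqualTilde;", "\u2242\u0338");

        Map<String, String> escape = escapeTableFrom(unescape);
        System.out.println(escape.get("&"));            // prints: &amp;
        System.out.println(escape.get("\u2242\u0338")); // prints: &NotEqualTilde;
        System.out.println(escape.size());              // prints: 2
    }
}
```

Going the other way, deriving the unescape table by inverting the escape table, would drop aliases such as &AMP;, so the two tables likely need to be generated separately to honor the "all known HTML 5.0 entities" wording in the unescapeHtml5 Javadoc.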
@@ -792,6 +838,18 @@ public static final String unescapeHtml4(final String input) {
return UNESCAPE_HTML4.translate(input);
}

/**
* Unescapes a string containing entity escapes to a string
* containing the actual Unicode characters corresponding to the
* escapes. Supports all known HTML 5.0 entities.
*
* @param input the {@code String} to unescape, may be null
* @return a new unescaped {@code String}, {@code null} if null string input
*/
public static final String unescapeHtml5(final String input) {
return UNESCAPE_HTML5.translate(input);
}

/**
* Unescapes any Java literals found in the {@code String}.
* For example, it will turn a sequence of {@code '\'} and