When parsing HTML, it is sometimes useful to get an element's text unencoded and un-normalized, keeping any newlines and whitespace it contains. This preserves a near-exact copy of the text within an element, which matters for content such as multi-line chat or comment threads.
Proposed solution
Implement Jsoup's wholeText() function, introduced last year.
Example HTML
<div class="commentthread_comment_text" id="comment_content_2577697791650773248">
Me : Make a 2nd game ?
<br>Dev : Nah man , too much work.
<br>Me : So what's it gonna be ?
<br>Dev : REMASTER !!!!
<br>
</div>
Then apply the new GREL function wholeText():
value.parseHtml().select("div.commentthread_comment_text")[0].wholeText()
The parsed output stays consistent with the original, keeping any newlines and whitespace found:
Me : Make a 2nd game ?
Dev : Nah man , too much work.
Me : So what's it gonna be ?
Dev : REMASTER !!!!
This differs from the current GREL function htmlText(), which internally uses Jsoup's text(): whitespace is normalized and trimmed, and newlines are dropped, so the line boundaries that would help disambiguate the text in certain situations are lost:
value.parseHtml().select("div.commentthread_comment_text")[0].htmlText()
which outputs:
Me : Make a 2nd game ? Dev : Nah man , too much work. Me : So what's it gonna be ? Dev : REMASTER !!!!
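To make the difference concrete outside of OpenRefine, here is a minimal Python sketch that emulates the two behaviors with the standard library only. It is not Jsoup's implementation: the names whole_text and normalized_text are hypothetical, the sample omits the id attribute for brevity, and the normalization rule (collapse each whitespace run to one space, then trim) is a simplifying assumption about what text() does.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the raw text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Text is stored exactly as it appears in the source.
        self.chunks.append(data)


def whole_text(html):
    """Hypothetical stand-in for wholeText(): keep newlines and spaces."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


def normalized_text(html):
    """Hypothetical stand-in for htmlText(): collapse whitespace and trim."""
    return re.sub(r"\s+", " ", whole_text(html)).strip()


SAMPLE = (
    '<div class="commentthread_comment_text">\n'
    "Me : Make a 2nd game ?\n"
    "<br>Dev : Nah man , too much work.\n"
    "<br>Me : So what's it gonna be ?\n"
    "<br>Dev : REMASTER !!!!\n"
    "<br>\n"
    "</div>"
)

if __name__ == "__main__":
    print(repr(whole_text(SAMPLE)))  # line breaks survive
    print(normalized_text(SAMPLE))   # everything on a single line
```

In this emulation the line breaks in the first result come from the newlines in the HTML source itself, which is exactly the information that whitespace normalization discards.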
Alternatives considered
play chess? buy more Tesla stock?
Additional context
Docs: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#wholeText()