-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
better automatic word-break behavior #1949
Conversation
Use of uninitialized value in pattern match (m//) .. cgi-bin/LJ/CleanHTML.pm line 1419. Use of uninitialized value in concatenation (.) or string .. cgi-bin/DW/Controller/Entry.pm line 1496.
Should be treated the same as the br tag. This fixes the reported problem where if someone used <wbr> in an entry, it would get autoclosed with </wbr>, which is invalid markup. Now it will get changed to <wbr /> instead.
In most areas (entries, subjects, comments) the HTML cleaner will insert a wbr tag in any "word" (text unbroken by whitespace) longer than 40 characters, exactly at the 40th character point, and for every 40 characters thereafter, if autoformatting is active. This behavior could be improved, so let's try checking each 40 characters for punctuation characters, and if found, insert the word break at that point instead.
1. The \B match failed if the word ended with punctuation, so just check to see if we're at the end of the word after the match succeeds. 2. Make sure the regex finds the LAST punctuation character in the string. Unexpectedly, it was matching the first one instead. 3. Don't do a breakpoint shift if the last punctuation character in the string is the first character in the string - a common edge case resulting in a premature word break. 4. Refactor printing logic to remove unneeded else case from conditional. Also: more tests! Tests are great.
ooh yay that's been a problem for like 15 years, it just happens rarely enough that nobody ever bothered fixing it...
|
My new motto: no problem too rare, no fix too over-engineered. Actually, that's a terrible motto, never mind 😜 |
yeah but let's face it it's TOTALLY TRUE
<3
|
is( $orig_post, $clean_post, "Choose last punctuation in string" ); | ||
|
||
$orig_post = qq{"This_is_a_test_of_the_emergency_word_break_system."}; | ||
$clean_post = qq{"This_is_a_test_of_the_emergency_word_br<wbr />eak_system."}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to break mid-word as opposed to post-_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the underscore character in this test because it's a special character included in \w, and I was making sure the word break wasn't inserted after the initial quotation mark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Although if you don't want underscore treated differently, that's understandable, but I couldn't think of a simple way to overcome that exception.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope this is fine! I was just curious about the thinking.
Now the HTML cleaner will prefer punctuation marks for word breaks, if available, and has been told that </wbr> is not a thing. Includes many tests for the new behavior.
Fixes #1948.