better automatic word-break behavior #1949

kareila · 2017-01-24T20:54:12Z

Now the HTML cleaner will prefer punctuation marks for word breaks, if available, and has been told that `</wbr>` is not a thing. Includes many tests for the new behavior.

Fixes #1948.

Also fixes these undefined string warnings:

Use of uninitialized value in pattern match (m//) .. cgi-bin/LJ/CleanHTML.pm line 1419.
Use of uninitialized value in concatenation (.) or string .. cgi-bin/DW/Controller/Entry.pm line 1496.

The wbr tag is added to the list of void elements, so it is treated the same as the br tag. This fixes the reported problem where if someone used `<wbr>` in an entry, it would get autoclosed with `</wbr>`, which is invalid markup. Now it will get changed to `<wbr />` instead.
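To illustrate why void-element status matters here, below is a minimal sketch in Python (the real cleaner is Perl, in cgi-bin/LJ/CleanHTML.pm, and is not reproduced here). The `autoclose` function, its `void_elements` parameter, and the attribute-free tag regex are all hypothetical; where exactly the real cleaner emits the closing tag is an assumption.

```python
import re

# Hypothetical, attribute-free tag matcher: <b>, </b>, <wbr>, <wbr />.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*/?>")

def autoclose(html, void_elements):
    """Close any tags left open, writing void elements as <name />."""
    out, open_tags, pos = [], [], 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # text before this tag
        pos = m.end()
        closing, name = m.group(1), m.group(2).lower()
        if name in void_elements:
            # Void elements never go on the stack, so they are never
            # autoclosed; a stray closing tag for one is dropped.
            if not closing:
                out.append("<%s />" % name)
            continue
        if closing:
            if name in open_tags:
                # Close everything up to and including the matching tag.
                while open_tags:
                    top = open_tags.pop()
                    out.append("</%s>" % top)
                    if top == name:
                        break
            continue
        open_tags.append(name)
        out.append("<%s>" % name)
    out.append(html[pos:])
    while open_tags:  # autoclose whatever is still open
        out.append("</%s>" % open_tags.pop())
    return "".join(out)
```

With `void_elements={"br"}`, the input `long<wbr>word` comes back as `long<wbr>word</wbr>`; adding `"wbr"` to the set yields `long<wbr />word` instead.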

In most areas (entries, subjects, comments) the HTML cleaner will insert a wbr tag in any "word" (text unbroken by whitespace) longer than 40 characters, exactly at the 40th character point, and for every 40 characters thereafter, if autoformatting is active. This behavior could be improved, so let's try checking each 40 characters for punctuation characters, and if found, insert the word break at that point instead.
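For reference, the existing fixed-interval behavior described above can be sketched roughly as follows. This is Python for illustration only, not the Perl in CleanHTML.pm; the function name `break_every_limit` and the constants are hypothetical, and the autoformatting check is left out.

```python
import re

LIMIT = 40          # break interval described in the text
WBR = "<wbr />"     # tag form used in the tests

def break_every_limit(text, limit=LIMIT):
    """Insert a word break after every `limit` characters of a long word."""
    def split_word(match):
        word = match.group(0)
        # Cut the word into consecutive pieces of at most `limit` characters.
        chunks = [word[i:i + limit] for i in range(0, len(word), limit)]
        return WBR.join(chunks)
    # Only runs of non-whitespace longer than `limit` are touched.
    return re.sub(r"\S{%d,}" % (limit + 1), split_word, text)
```

A 45-character run gets a single break after its 40th character, regardless of what characters surround that point, which is the behavior the punctuation check is meant to improve on.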

1. The \B match failed if the word ended with punctuation, so just check to see if we're at the end of the word after the match succeeds.
2. Make sure the regex finds the LAST punctuation character in the string. Unexpectedly, it was matching the first one instead.
3. Don't do a breakpoint shift if the last punctuation character in the string is the first character in the string - a common edge case resulting in a premature word break.
4. Refactor printing logic to remove unneeded else case from conditional.

Also: more tests! Tests are great.
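The punctuation-preferring logic and its edge cases can be sketched like this. It is a Python approximation under stated assumptions, not the Perl from the commits: `break_at_punctuation` is a hypothetical name, "punctuation" is assumed to mean any character that is neither `\w` nor whitespace, and the sketch handles a single word rather than a whole entry.

```python
import re

LIMIT = 40
WBR = "<wbr />"

def break_at_punctuation(word, limit=LIMIT):
    """Break a long word, preferring the last punctuation mark in each chunk."""
    pieces = []
    # Looping only while more than `limit` characters remain means a break
    # is never emitted at the very end of the word.
    while len(word) > limit:
        chunk = word[:limit]
        cut = limit  # default: break exactly at the limit
        # The greedy ".*" makes the match end just after the LAST
        # punctuation character in the chunk, not the first.
        m = re.match(r".*[^\w\s]", chunk)
        # Skip the shift when that character is the first one in the chunk,
        # which would otherwise break right after a leading quote.
        if m and m.end() > 1:
            cut = m.end()
        pieces.append(word[:cut])
        word = word[cut:]
    pieces.append(word)
    return WBR.join(pieces)
```

Because underscore is part of `\w`, a word such as `"This_is_a_test_of_the_emergency_word_break_system."` has only its leading quote as punctuation in the first 40 characters, so the break falls back to the 40-character point, after `word_br`.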

rahaeli · 2017-01-24T20:55:59Z

ooh yay that's been a problem for like 15 years, it just happens rarely enough that nobody ever bothered fixing it...

kareila · 2017-01-24T20:58:37Z

My new motto: no problem too rare, no fix too over-engineered.

Actually, that's a terrible motto, never mind 😜

rahaeli · 2017-01-24T20:59:20Z

yeah but let's face it it's TOTALLY TRUE <3

zorkian · 2017-01-25T21:12:49Z

t/clean-event.t

+is( $orig_post, $clean_post, "Choose last punctuation in string" );
+
+$orig_post  = qq{"This_is_a_test_of_the_emergency_word_break_system."};
+$clean_post = qq{"This_is_a_test_of_the_emergency_word_br<wbr />eak_system."};


This is supposed to break mid-word as opposed to post-_?

kareila · 2017-01-25

I used the underscore character in this test because it's a special character included in \w, and I was making sure the word break wasn't inserted after the initial quotation mark.

(Although if you don't want underscore treated differently, that's understandable, but I couldn't think of a simple way to overcome that exception.)

zorkian · 2017-02-01

Nope this is fine! I was just curious about the thinking.

kareila added 4 commits January 24, 2017 08:20

fix undefined string warnings

188786a

[dreamwidth#1948] add wbr tag to list of void elements

378067b

robodw-issues added the status: untriaged label Jan 24, 2017

zorkian reviewed Jan 25, 2017

zorkian merged commit 66e9cc1 into dreamwidth:develop Feb 1, 2017

kareila deleted the 1948-wbr-fix branch February 1, 2017 14:06
