Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better automatic word-break behavior #1949

Merged
merged 4 commits into from
Feb 1, 2017

Conversation

kareila
Copy link
Member

@kareila kareila commented Jan 24, 2017

Now the HTML cleaner will prefer punctuation marks for word breaks, if available, and has been told that </wbr> is not a thing. Includes many tests for the new behavior.

Fixes #1948.

Use of uninitialized value in pattern match (m//) .. cgi-bin/LJ/CleanHTML.pm line 1419.

Use of uninitialized value in concatenation (.) or string .. cgi-bin/DW/Controller/Entry.pm line 1496.
Should be treated the same as the br tag.

This fixes the reported problem where if someone
used <wbr> in an entry, it would get autoclosed
with </wbr>, which is invalid markup.  Now it
will get changed to <wbr /> instead.
In most areas (entries, subjects, comments) the HTML cleaner will
insert a wbr tag in any "word" (text unbroken by whitespace)
longer than 40 characters, exactly at the 40th character point,
and for every 40 characters thereafter, if autoformatting is active.

This behavior could be improved, so let's try checking each
40 characters for punctuation characters, and if found, insert
the word break at that point instead.
1. The \B match failed if the word ended with punctuation, so just check to
   see if we're at the end of the word after the match succeeds.

2. Make sure the regex finds the LAST punctuation character in the string.
   Unexpectedly, it was matching the first one instead.

3. Don't do a breakpoint shift if the last punctuation character in the string
   is the first character in the string - a common edge case resulting in a
   premature word break.

4. Refactor printing logic to remove unneeded else case from conditional.

Also: more tests! Tests are great.
@rahaeli
Copy link
Contributor

rahaeli commented Jan 24, 2017 via email

@kareila
Copy link
Member Author

kareila commented Jan 24, 2017

My new motto: no problem too rare, no fix too over-engineered.

Actually, that's a terrible motto, never mind 😜

@rahaeli
Copy link
Contributor

rahaeli commented Jan 24, 2017 via email

is( $orig_post, $clean_post, "Choose last punctuation in string" );

$orig_post = qq{"This_is_a_test_of_the_emergency_word_break_system."};
$clean_post = qq{"This_is_a_test_of_the_emergency_word_br<wbr />eak_system."};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to break mid-word as opposed to post-_?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the underscore character in this test because it's a special character included in \w, and I was making sure the word break wasn't inserted after the initial quotation mark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Although if you don't want underscore treated differently, that's understandable, but I couldn't think of a simple way to overcome that exception.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope this is fine! I was just curious about the thinking.

@zorkian zorkian merged commit 66e9cc1 into dreamwidth:develop Feb 1, 2017
@kareila kareila deleted the 1948-wbr-fix branch February 1, 2017 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants