Fully strip block XML before TTSifying #16718

Hamms · 2017-07-31T22:46:11Z

No description provided.

ashercodeorg

LGTM, pending comments.

ashercodeorg · 2017-08-01T12:31:14Z

dashboard/app/models/concerns/text_to_speech.rb

@@ -100,6 +100,18 @@ def self.tts_upload_to_s3(text, filename)
    end
  end

+  def self.sanitize(text)


Perhaps sanitize_and_render? Otherwise, I would expect sanitize to accept and return a string, nothing more.

I might be confused; accepting and returning a string is precisely what it does

Nope, I didn't realize what TTSSafeRenderer.render did.

ashercodeorg · 2017-08-01T12:31:17Z

dashboard/test/models/concerns/text_to_speech_test.rb

-  test 'sanitize html' do
-    assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have  no excess formatting \n"
+  test 'sanitize html and xml' do
+    assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have  no excess formatting\n"


The expected and actual values should be flipped (for better error output on failure).

(also elsewhere in the file, feel welcome to do as a separate PR, or leave for me to do)
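To illustrate the argument-order point: Minitest's `assert_equal` takes the expected value first and the actual value second, and labels its failure message accordingly. The snippet below is only a sketch. The `AssertionHost` class and `failure_message` helper are hypothetical scaffolding (not part of this PR) that let the assertion run outside a test runner, and the strings are short stand-ins for the real TTS output.

```ruby
require 'minitest'

# Hypothetical host object so Minitest assertions can be called directly.
class AssertionHost
  include Minitest::Assertions
  attr_accessor :assertions

  def initialize
    @assertions = 0
  end
end

# Runs the block and returns the failure message, or nil if it passed.
def failure_message
  yield
  nil
rescue Minitest::Assertion => e
  e.message
end

host = AssertionHost.new
expected = 'no excess formatting'
actual   = 'no excess formatting ' # stand-in for a buggy result (trailing space)

# Flipped order, as in the diff above: the buggy value is reported as "Expected".
flipped = failure_message { host.assert_equal actual, expected }

# Conventional order: the literal is reported as "Expected".
correct = failure_message { host.assert_equal expected, actual }

puts flipped
puts correct
```

With the arguments flipped, the failure output labels the buggy string as the expectation, which is the confusing output the comment above wants to avoid.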

ashercodeorg · 2017-08-01T12:37:19Z

dashboard/test/models/concerns/text_to_speech_test.rb

+  test 'sanitize html and xml' do
+    assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have  no excess formatting\n"
+    assert_equal @level_with_block_html.tts_markdown_instructions_text, "This block should get stripped:\n"
+    assert_equal @level_with_xml.tts_markdown_instructions_text, "This block should get stripped:\n"


Should I be surprised that there is only one \n rather than two in the expected output?

Depends on how familiar you are with Redcarpet 😄

Two newlines in the input are necessary to separate paragraphs and to treat the xml as its own "block", as is standard for markdown input.

Newlines within the content itself are irrelevant, as is standard for markdown output, but for some reason one is always added to the end.

ashercodeorg · 2017-08-01T12:38:06Z

dashboard/app/models/concerns/text_to_speech.rb

+    #
+    # to avoid this, we simply and aggressively strip out any xml before passing
+    # the rest to redcarpet, despite the risk of invoking the wrath of Zalgo.
+    TTSSafeRenderer.render(text.gsub(/<xml>.*<\/xml>/m, ''))


Is there any concern of there being multiple </xml> strings in text? If so, does the /m invoke the desired behavior?

Oh, good catch. There is indeed, and I should make this regex less greedy.

The /m doesn't influence that side effect, though; it's just there to make sure the regex will deal with newlines
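A minimal sketch of the two separate effects discussed here, using a made-up input string with two `<xml>` blocks (the sample text is an assumption, not taken from a real level). In Ruby, `/m` only makes `.` match newlines; greediness is controlled by `*` versus `*?`.

```ruby
# Hypothetical instructions text containing two separate <xml> blocks.
text = "Intro\n\n<xml>\n<block/>\n</xml>\n\nMiddle\n\n<xml><block/></xml>\n\nEnd\n"

# Greedy with /m: matches from the first <xml> to the LAST </xml>,
# so the "Middle" paragraph between the blocks is removed as well.
greedy = text.gsub(/<xml>.*<\/xml>/m, '')

# Non-greedy with /m: each block is matched and removed on its own.
lazy = text.gsub(/<xml>.*?<\/xml>/m, '')

# Non-greedy without /m: "." stops at newlines, so the multi-line
# first block is left in place and only the one-line block is removed.
no_m = text.gsub(/<xml>.*?<\/xml>/, '')

puts greedy.inspect
puts lazy.inspect
puts no_m.inspect
```

This is only meant to show why the greedy pattern is risky with multiple blocks; the PR ultimately moved away from regex altogether.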

Hamms · 2017-08-02T00:39:51Z

Added a Loofah implementation rather than regex because why on earth was I trying to parse XML using regex.

PTAL

ashercodeorg · 2017-08-02T13:07:21Z

LGTM (Ignore the PTAL [now deleted]).

ashercodeorg approved these changes Aug 1, 2017

Hamms force-pushed the blockly-tts-no-xml branch from a9d088a to 6d016f5 on August 2, 2017 00:36

Hamms added 5 commits August 4, 2017 11:20

add TTS.sanitize method for stripping XML

83f1e67

update TTS.render calls with TTS.sanitize

8474abc

update test

c78ee09

use Loofah rather than regex for XML sanitization to appease Zalgo

30e5033

swap order of assert arguments to semantically match expected/actual order

8a00df2

Hamms force-pushed the blockly-tts-no-xml branch from d541846 to 8a00df2 on August 4, 2017 18:24

Hamms merged commit 345b462 into staging Aug 5, 2017

Hamms deleted the blockly-tts-no-xml branch August 5, 2017 00:57
