Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully strip block XML before TTSifying #16718

Merged
merged 5 commits into from Aug 5, 2017
Merged

Fully strip block XML before TTSifying #16718

merged 5 commits into from Aug 5, 2017

Conversation

Hamms
Copy link
Contributor

@Hamms Hamms commented Jul 31, 2017

No description provided.

Copy link
Contributor

@ashercodeorg ashercodeorg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending comments.

@@ -100,6 +100,18 @@ def self.tts_upload_to_s3(text, filename)
end
end

def self.sanitize(text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps sanitize_and_render? Otherwise, I would expect sanitize to accept and return a string, nothing more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be confused; accepting and returning a string is precisely what it does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I didn't realize what TTSSafeRenderer.render did.

test 'sanitize html' do
assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have no excess formatting \n"
test 'sanitize html and xml' do
assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have no excess formatting\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected and actual values should be flipped (for better error output on failure).

(also elsewhere in the file, feel welcome to do as a separate PR, or leave for me to do)

test 'sanitize html and xml' do
assert_equal @level_with_raw_html.tts_markdown_instructions_text, "This should have no excess formatting\n"
assert_equal @level_with_block_html.tts_markdown_instructions_text, "This block should get stripped:\n"
assert_equal @level_with_xml.tts_markdown_instructions_text, "This block should get stripped:\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be surprised that there is only one \n rather than two in the expected output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how familiar you are with Redcarpet 😄

Two newlines in the input are necessary to separate paragraphs and to treat the xml as its own "block", as is standard for markdown input.

Newlines within the content itself are irrelevant, as is standard for markdown output, but for some reason one is always added to the end.

#
# to avoid this, we simply and aggressively strip out any xml before passing
# the rest to redcarpet, despite the risk of invoking the wrath of Zalgo.
TTSSafeRenderer.render(text.gsub(/<xml>.*<\/xml>/m, ''))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern of there being multiple </xml> strings in text? If so, does the /m invoke the desired behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. There is indeed and I should make this regex less greedy.

The /m doesn't influence that side effect, though; it's just there to make sure the regex with deal with newlines

@Hamms
Copy link
Contributor Author

Hamms commented Aug 2, 2017

Added a Loofah implementation rather than regex because why on earth was I trying to parse XML using regex.

PTAL

@ashercodeorg
Copy link
Contributor

LGTM (Ignore the PTAL [now deleted]).

@Hamms Hamms merged commit 345b462 into staging Aug 5, 2017
@Hamms Hamms deleted the blockly-tts-no-xml branch August 5, 2017 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants