Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relativize_paths silently, incorrectly transforms gcse:search #1319

Closed
sirosen opened this issue Feb 20, 2018 · 5 comments
Closed

relativize_paths silently, incorrectly transforms gcse:search #1319

sirosen opened this issue Feb 20, 2018 · 5 comments

Comments

@sirosen
Copy link

sirosen commented Feb 20, 2018

relativize_paths silently transforms gcse:search

Steps to reproduce

  1. Create an html file with Google's GCSE search snippet
    This includes an element for the Google JS to manipulate, <gcse:search></gcse:search>
  2. Run nanoc with rules to apply relativize_paths on that html file, with type: html
  3. Observe output will read <search></search>

Expected behavior

I think that nanoc should not be transforming that element at all. However, maybe there's some subtlety I'm missing about strict HTML specification.

Either of the below would be also acceptable:

  • nanoc should print a warning that the element is transformed -- it can surprise people, after all
  • nanoc should crashfail the build on the element

Actual behavior

Instead, nanoc succeeds, but changes the element.

Details

I'm surprised to be the first person reporting this -- maybe not that many people are using GCSE, or using relativize_paths, or maybe I'm doing something else wrong.

@denisdefreyne
Copy link
Member

I can confirm this is a problem!

This appears to be problematic behavior with Nokogiri/libxml for parsing HTML… when reassembling the content, the namespaces are somehow left out.

@denisdefreyne
Copy link
Member

This appears to be a common problem without a proper solution (see e.g. jekyll/jemoji#43).

Suggestions for workarounds that I’ve found are along the form of

<div id='gcse'></div>
<script>
  $('#gcse').html("<gcse:search></gcse:search>");
</script>

… which is pretty icky.

This might not be fixable within Nanoc itself, unfortunately, but perhaps Nanoc can offer an automatic workaround.

@denisdefreyne
Copy link
Member

Icky workaround in #1320.

@sirosen
Copy link
Author

sirosen commented Feb 21, 2018

Thanks for the really quick detective work on this.
There's been an available patch to fix libxml sitting unattended for a few years now, so this might get fixed someday... but probably not soon.

Given that GCSE is all JS-based, it will never work when blocked by NoScript, etc.
In that light, the workaround you've noted doesn't have any significant practical faults.

This is all a matter of where the grossness exists, not whether or not it exists.

If you decide that you dislike #1320 very strongly, I think it would be satisfactory to detect gcse:search in content prior to applying filters and print a warning with a link:

Warning!
  Nanoc relies on libxml to parse HTML. This is known to cause issues with GCSE.
  Recommended workaround: https://github.com/nanoc/nanoc/issues/1319#issuecomment-367234453

@denisdefreyne
Copy link
Member

denisdefreyne commented Feb 22, 2018

Sucks to hear about the libxml patch that has been lying around for so long!

My main problem with #1320 is that it only applies to gcse:search, not g:plusone, fb:like, …. The only way to solve this problem in general is to fix the HTML parser, or use a different one.

It might be worth looking into using Oga, for instance.

denisdefreyne added a commit that referenced this issue Feb 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants