Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress invalid html errors #10

Merged
merged 1 commit into from
Oct 11, 2016

Conversation

jeremyhansonfinger
Copy link

@jeremyhansonfinger jeremyhansonfinger commented Oct 11, 2016

Nokogiri is raising false invalid HTML errors. Temporarily suppressing HTML errors so we can continue to test the content linting.

@justinthec @nwtn @stariqmi

@perrupa
Copy link

perrupa commented Oct 11, 2016

Looks good, this solves the unclosed <doctype>and <meta> errors we were getting on the amazon channel. 👍

@nwtn
Copy link

nwtn commented Oct 11, 2016

didn’t 🎩 but LGTM! thanks!

@jeremyhansonfinger jeremyhansonfinger merged commit 3288f8e into master Oct 11, 2016
@jeremyhansonfinger jeremyhansonfinger deleted the suppress-invalid-html-errors branch October 11, 2016 20:22
@justinthec
Copy link

justinthec commented Oct 11, 2016

I'd prefer to have it commented out instead of deleting the code.

Edit: Nvm it got merged; carry on.

@jeremyhansonfinger
Copy link
Author

Ah @justinthec, I checked with Dave before doing it and that was his request. We're going to meet this week with @stariqmi to discuss what else we can do to validate html—obviously your contribution is more than welcome.

@nwtn
Copy link

nwtn commented Oct 11, 2016

ack sorry @justinthec! i think our style guide says not to comment out blocks like that in general; hopefully it won't take long to re-add it back in

@justinthec
Copy link

No worries at all!

Could you give me a brief outline of what cases were throwing false positives?

@jeremyhansonfinger
Copy link
Author

So far looks like meta tags and possibly also doctypes. Maybe any tags that are valid unclosed?

@jeremyhansonfinger
Copy link
Author

Am just trying to test locally to see exactly where it failed

@nwtn
Copy link

nwtn commented Oct 11, 2016

this was an example that got flagged, even though it's perfectly valid:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <style type='text/css'>
    body {
      font-family: sans-serif;
    }
  </style>
  <title>Succesfully Connected to Amazon</title>
</head>
<body>

  <p>Redirecting back to Shopify…</p>

  <script>
    window.onunload = function refreshParent() {
      window.opener.location.reload();
    }
    window.close();
  </script>
</body>
</html>

@jeremyhansonfinger
Copy link
Author

Also got this error:

image

@justinthec
Copy link

image

I'm getting no errors when trying to parse the example you gave @nwtn , weird

@nwtn
Copy link

nwtn commented Oct 13, 2016

@jeremyhansonfinger were you able to reproduce the errors locally?

@jeremyhansonfinger
Copy link
Author

jeremyhansonfinger commented Oct 13, 2016

@nwtn @justinthec My rubies and gems got messed up after the Sierra upgrade so I got stuck dealing with that first. I haven't been able to reproduce the second error yet—I haven't yet figured out where end_marker is being assigned a value of nil , and I'll try to reproduce the first one before our meeting.

@justinthec
Copy link

end_marker will be nil when the search for the End marker returns no results. This means that the End marker got removed somehow, since it is always added in at the file parsing stage here: https://github.com/Shopify/erb-lint/blob/master/lib/erb_lint/parser.rb#L16

What are you going to produce that error @jeremyhansonfinger

@jeremyhansonfinger
Copy link
Author

jeremyhansonfinger commented Oct 13, 2016

@justinthec Run my local version of policial with erb-lint at e10ea84 and commit that code to jeremyhansonfinger/content-lint-demo-2#17

@jeremyhansonfinger
Copy link
Author

jeremyhansonfinger commented Oct 13, 2016

Okay so @justinthec, here's what I've determined.

The error comes from
image

[20] pry(ERBLint::Parser)> file_tree.children.empty?
=> false
[21] pry(ERBLint::Parser)> file_tree.children.last.name != END_MARKER_NAME
=> true
[22] pry(ERBLint::Parser)> p END_MARKER_NAME
"erb_lint_end_marker"
=> "erb_lint_end_marker"
[23] pry(ERBLint::Parser)> file_tree.children.last.name
=> "body"
[24] pry(ERBLint::Parser)> file_tree.children.last
=> #(Element:0x3ff1b9031188 {
  name = "body",
  children = [
    #(Text "\n\n\n  "),
    #(Element:0x3ff1b6b0d0b4 { name = "meta", attributes = [ #(Attr:0x3ff1b6b0cfec { name = "charset", value = "UTF-8" })] }),
    #(Text "\n  "),
    #(Element:0x3ff1b6b0c4c0 {
      name = "style",
      attributes = [ #(Attr:0x3ff1b6b0c3d0 { name = "type", value = "text/css" })],
      children = [ #(CDATA "\n    body {\n      font-family: sans-serif;\n    }\n  ")]
      }),
    #(Text "\n  "),
    #(Element:0x3ff1b6ae58e8 { name = "title", children = [ #(Text "Succesfully Connected to Amazon")] }),
    #(Text "\n\n\n\n  "),
    #(Element:0x3ff1b6ae52f8 { name = "p", children = [ #(Text "Redirecting back to Shopify…")] }),
    #(Text "\n\n  "),
    #(Element:0x3ff1b6ae4c40 {
      name = "script",
      children = [ #(CDATA "\n    window.onunload = function refreshParent() {\n      window.opener.location.reload();\n    }\n    window.close();\n  ")]
      }),
    #(Text "\n\n\n"),
    #(Element:0x3ff1b6ae4574 {
      name = "erb_lint_end_marker",
      children = [ #(Text "\n  This is used to calculate the line number of the last line.\n  This is only necessary until Text#line is fixed in Nokogiri.\n")]
      })]
  })
[25] pry(ERBLint::Parser)> p file_tree
#<Nokogiri::HTML::DocumentFragment:0x3ff1b920d1b4 name="#document-fragment" children=[#<Nokogiri::XML::Element:0x3ff1b9031188 name="body" children=[#<Nokogiri::XML::Text:0x3ff1b6b0d1cc "\n\n\n  ">, #<Nokogiri::XML::Element:0x3ff1b6b0d0b4 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff1b6b0cfec name="charset" value="UTF-8">]>, #<Nokogiri::XML::Text:0x3ff1b6b0c614 "\n  ">, #<Nokogiri::XML::Element:0x3ff1b6b0c4c0 name="style" attributes=[#<Nokogiri::XML::Attr:0x3ff1b6b0c3d0 name="type" value="text/css">] children=[#<Nokogiri::XML::CDATA:0x3ff1b6ae5c58 "\n    body {\n      font-family: sans-serif;\n    }\n  ">]>, #<Nokogiri::XML::Text:0x3ff1b6ae59ec "\n  ">, #<Nokogiri::XML::Element:0x3ff1b6ae58e8 name="title" children=[#<Nokogiri::XML::Text:0x3ff1b6ae5654 "Succesfully Connected to Amazon">]>, #<Nokogiri::XML::Text:0x3ff1b6ae5410 "\n\n\n\n  ">, #<Nokogiri::XML::Element:0x3ff1b6ae52f8 name="p" children=[#<Nokogiri::XML::Text:0x3ff1b6ae4fec "Redirecting back to Shopify…">]>, #<Nokogiri::XML::Text:0x3ff1b6ae4da8 "\n\n  ">, #<Nokogiri::XML::Element:0x3ff1b6ae4c40 name="script" children=[#<Nokogiri::XML::CDATA:0x3ff1b6ae4920 "\n    window.onunload = function refreshParent() {\n      window.opener.location.reload();\n    }\n    window.close();\n  ">]>, #<Nokogiri::XML::Text:0x3ff1b6ae46b4 "\n\n\n">, #<Nokogiri::XML::Element:0x3ff1b6ae4574 name="erb_lint_end_marker" children=[#<Nokogiri::XML::Text:0x3ff1b6ae4268 "\n  This is used to calculate the line number of the last line.\n  This is only necessary until Text#line is fixed in Nokogiri.\n">]>]>]>
=> #(DocumentFragment:0x3ff1b920d1b4 {
  name = "#document-fragment",
  children = [
    #(Element:0x3ff1b9031188 {
      name = "body",
      children = [
        #(Text "\n\n\n  "),
        #(Element:0x3ff1b6b0d0b4 { name = "meta", attributes = [ #(Attr:0x3ff1b6b0cfec { name = "charset", value = "UTF-8" })] }),
        #(Text "\n  "),
        #(Element:0x3ff1b6b0c4c0 {
          name = "style",
          attributes = [ #(Attr:0x3ff1b6b0c3d0 { name = "type", value = "text/css" })],
          children = [ #(CDATA "\n    body {\n      font-family: sans-serif;\n    }\n  ")]
          }),
        #(Text "\n  "),
        #(Element:0x3ff1b6ae58e8 { name = "title", children = [ #(Text "Succesfully Connected to Amazon")] }),
        #(Text "\n\n\n\n  "),
        #(Element:0x3ff1b6ae52f8 { name = "p", children = [ #(Text "Redirecting back to Shopify…")] }),
        #(Text "\n\n  "),
        #(Element:0x3ff1b6ae4c40 {
          name = "script",
          children = [ #(CDATA "\n    window.onunload = function refreshParent() {\n      window.opener.location.reload();\n    }\n    window.close();\n  ")]
          }),
        #(Text "\n\n\n"),
        #(Element:0x3ff1b6ae4574 {
          name = "erb_lint_end_marker",
          children = [ #(Text "\n  This is used to calculate the line number of the last line.\n  This is only necessary until Text#line is fixed in Nokogiri.\n")]
          })]
      })]
  })

@jeremyhansonfinger
Copy link
Author

I'm going to see if the same thing happens when I just run it locally in erb-lint, without policial.

@justinthec
Copy link

policial should have absolutely no effect on the parsing that ERBLint does. The problem here is that erb_lint_end_marker needs to be outside of body. Nokogiri is doing something weird since all of the tags that were inside the head are now inside the body. Also all of the html tags and doctype tags have dissapeared.

@jeremyhansonfinger
Copy link
Author

^^ Yes. It structures the tree the same way, given file_content = the code example Dave provided.

@jeremyhansonfinger
Copy link
Author

Yes, that is exactly what's going on. Ugh.

@justinthec
Copy link

justinthec commented Oct 13, 2016

I can't think of a proper solution to this at the moment. It is very hard to maintain consistency between the parse tree and the source file, if things move around and disappear...

I'm also about to enter midterm season so just let it be for now. Also be aware that the FinalNewline linter may report an inaccurate errors due to this bug since the end marker is not at the end. Disable it on production if it becomes a problem.

@jeremyhansonfinger
Copy link
Author

Long shot, maybe we can try locking nokogiri to an earlier version?

I'll move onto trying to replicate the error from Bugsnag with the end_marker being removed.

@jeremyhansonfinger
Copy link
Author

Thanks, Justin.

Could replicate this error—

image

—in 3288f8e, when the ParserError had been suppressed. It's part of the same issue with the file_tree sometimes not reflecting the source material accurately. It's just not caught in this case until https://github.com/Shopify/erb-lint/blob/master/lib/erb_lint/linters/final_newline.rb#L17.

Could suppress that too but it sounds from what you're saying like it's best to turn off FinalNewline for now if we want to move forward with trying to test ContentStyle.

@justinthec
Copy link

@jeremyhansonfinger any updates? How's ContentStyle going?

@jeremyhansonfinger
Copy link
Author

@justinthec: @nathanmarks is trying to figure out how to best build a tree for erb files without using nokogiri. I'm not sure how up to date this branch is, but: https://github.com/Shopify/erb-lint/tree/erb

My standalone ContentStyle works locally based on html output from hotcop, and one team has used it to audit their content so far. I need to figure out how to speed it up, though.

@nathanmarks
Copy link

nathanmarks commented Dec 23, 2016

@jeremyhansonfinger @justinthec

I'm going to be using that branch immediately after the new year in conjunction with this linter: (removed because public -Jeremy)

Having access to a ruby AST is going to allow us to do some really focused deprecations in the future for our components (properties, values, etc).

It's working rather well so far.

@jeremyhansonfinger
Copy link
Author

I'm stepping through your branch @nathanmarks, I think this will be really useful, and I'll try to work with it in the new year to get content-style working in policial again.

jeremyhansonfinger added a commit that referenced this pull request Oct 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants