URL of img tag is included in the p-name and content.value #53

Closed
aaronpk opened this issue Jan 12, 2018 · 2 comments


aaronpk commented Jan 12, 2018

This issue is part of #52.

The URL of an img tag is included in the parsed name value as well as in content.value. Because both values contain it, XRay can still detect that the name and content are duplicates and correctly identifies this as a note, but it then has no way to remove the img URL from the text content.

HTML

<html>
  <head>
    <title>Test</title>
  </head>
  <body class="h-entry">
    <p class="e-content p-name">This is a photo post with an <code>img</code> tag inside the content. <img class="u-photo" src="http://target.example.com/photo.jpg"></p>
  </body>
</html>

mf2 json

{
    "type": [
        "h-entry"
    ],
    "properties": {
        "name": [
            "This is a photo post with an img tag inside the content. http://target.example.com/photo.jpg"
        ],
        "photo": [
            "http://target.example.com/photo.jpg"
        ],
        "content": [
            {
                "html": "This is a photo post with an <code>img</code> tag inside the content. <img class=\"u-photo\" src=\"http://target.example.com/photo.jpg\">",
                "value": "This is a photo post with an img tag inside the content. http://target.example.com/photo.jpg"
            }
        ]
    }
}

https://pin13.net/mf2/?id=20180112183559529

Even if microformats/microformats2-parsing#16 is resolved, this case would still fail. Should URLs also be left out of plaintext values?
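
To make the question concrete, here is a rough sketch of the plaintext behaviour visible in the mf2 JSON above, where an img is replaced by its src in the text value. It is not the parser's actual code: `text_value` and the `include_img_src` flag are hypothetical names, and the "use alt if present, otherwise src" rule is my reading of the parsing rules, simplified.

```python
from html.parser import HTMLParser


class _TextValue(HTMLParser):
    """Collects plain text, substituting each <img> with its alt or src."""

    def __init__(self, include_img_src):
        super().__init__(convert_charrefs=True)
        self.include_img_src = include_img_src  # hypothetical switch
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("alt"):
            # Assumption: alt text is preferred over the URL when present.
            self.parts.append(attributes["alt"])
        elif self.include_img_src and attributes.get("src"):
            # This branch is what puts the photo URL into name and content.value.
            self.parts.append(f" {attributes['src']} ")

    def handle_data(self, data):
        self.parts.append(data)


def text_value(html, include_img_src=True):
    """Return a whitespace-normalized plaintext value for an HTML fragment."""
    parser = _TextValue(include_img_src)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


FRAGMENT = (
    "This is a photo post with an <code>img</code> tag inside the content. "
    '<img class="u-photo" src="http://target.example.com/photo.jpg">'
)

print(text_value(FRAGMENT))                         # URL kept, as in the mf2 JSON
print(text_value(FRAGMENT, include_img_src=False))  # URL left out
```

With the flag off, the text ends at "inside the content.", which is the value XRay would want for a note.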

aaronpk changed the title from "URL of img tag is included in the p-name" to "URL of img tag is included in the p-name and content.value" on Jan 12, 2018

aaronpk commented Jan 12, 2018

Here is the XRay version of this page:

{
    "type": "entry",
    "photo": [
        "http://target.example.com/photo.jpg"
    ],
    "content": {
        "text": "This is a photo post with an img tag inside the content. http://target.example.com/photo.jpg",
        "html": "This is a photo post with an <code>img</code> tag inside the content."
    }
}


aaronpk commented Jan 12, 2018

I have successfully handled this case, thanks to @tantek's suggestion of running the p-name dedupe check before munging the value of the content HTML.

{
    "type": "entry",
    "photo": [
        "http://target.example.com/photo.jpg"
    ],
    "content": {
        "text": "This is a photo post with an img tag inside the content.",
        "html": "This is a photo post with an <code>img</code> tag inside the content."
    }
}
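
For anyone following along, the ordering that fixed this can be sketched roughly as below. This is not XRay's code (XRay is PHP); `to_entry`, `strip_images`, and `normalize` are hypothetical helpers, and deriving the text from the cleaned HTML is an assumption about how the img URL ends up dropped. The point is only the order: compare name against content.value while both still contain the URL, then munge the content.

```python
from html import escape
from html.parser import HTMLParser


class _ImgStripper(HTMLParser):
    """Rebuilds an HTML fragment without <img> tags and collects its text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.html_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.html_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "img":
            self.html_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.html_parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.html_parts.append(escape(data, quote=False))
        self.text_parts.append(data)


def strip_images(html):
    """Return (html_without_img, plain_text) for an HTML fragment."""
    parser = _ImgStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.html_parts).strip(), "".join(parser.text_parts).strip()


def normalize(text):
    """Collapse whitespace so name and content.value compare reliably."""
    return " ".join(text.split())


def to_entry(properties):
    """Build a simplified entry from mf2 h-entry properties."""
    name = properties.get("name", [None])[0]
    content = properties["content"][0]

    # Step 1: dedupe check on the values as parsed. Both still contain the
    # img URL here, so they compare equal and the post is treated as a note.
    is_note = name is None or normalize(name) == normalize(content["value"])

    # Step 2: only now munge the content (drop <img>, derive text from the result).
    html, text = strip_images(content["html"])

    entry = {"type": "entry", "content": {"text": text, "html": html}}
    if "photo" in properties:
        entry["photo"] = properties["photo"]
    if not is_note:
        entry["name"] = name
    return entry
```

Doing the steps in the opposite order would compare a name that still has the URL against text that no longer does, so the duplicate would go undetected.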
