-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Sitemap image errors #4591
Comments
I think this should potentially use the user cover image, rather than the user's profile picture? |
Ah - yes, totally agree (with a conditional to check that it exists). Incidentally, how are images being generated for posts? Is it only the cover image? Is that correct? Or should it be including all images on the page? I'm not 100% clear on what images in the sitemap is for. If it's for displaying in search results, then I guess it should be 1 image - but if it's for discoverability in Google Image search then it should be probably be all of them. |
Currently, it is using the feature/cover image for the post (which is the one we have in the DB). Reading the spec: https://support.google.com/webmasters/answer/178636 it seems the intention is to make these images available to Google Image search. Therefore perhaps we should provide both the cover and profile picture for a user and doctor the protocol? In terms of grabbing all the images from a post, that will require non-trivial processing of posts. I'd recommend this be raised as a separate issue for the improvement of sitemaps (whereas this issue is a bug affecting their usability). Definitely doable, and will become more doable after image import is done. |
Roger, then for now just cover image imo. I don't think profile photo is at all relevant to search results |
This PR switches to using cover image - but a user may still select an image by url without a protocol. Hence we will now use url validation and only add image if there is a valid protocol - guessing image protocol is likely to result in further errors. |
I'm not sure that's the case. If someone uses a protocol relative URL what they're indicating is that it's okay to access the image via whatever transport (http or https) is currently being used. So there shouldn't be any harm in normalizing the URLs. |
That is assuming the user knows the difference, also what I mean is us guessing the protocol not the user. E.g a user may have a url of //whatever.jpg without a protocol. |
More's the point, any guess we make will be at the point the sitemap is generated, not at the point the sitemap is served. Protocol-relative URLs are meant to defer the decision to the point the url is served, but if the sitemap protocol can't deal with this, then we're forced to make an arbitrary decision at an arbitrary point. Therefore I thought it was more sane to just not include these images. |
I'm not following your example. If a user enters a protocol relative image URL we don't need to guess because by definition both http and https must work. If the user enters a URL as relative but it won't resolve at http, for example, then it's already going to be broken in general usage. Sorry if I'm just not understanding the issue but it should be completely safe to take a protocol relative URL and turn it into http:://... |
First, you're assuming a user knows what they're doing and not just copy and pasting. Yes if the image doesn't resolve it doesn't really matter, but it seemed entirely arbitrary to me to to guess which protocol to use at the point of generation. Do we always guess http:// or always guess https:// or do it based on how the URL is configured for the blog? |
Broken is broken and unless we're going to attempt to resolve and verify all images as part of the sitemap generation, the user providing bad URLs is beyond the scope of sitemaps. As far as guessing goes, that's my point--it's not really guessing because a protocol relative URL means both http and https must work. I think I would probably just choose http because in case it is broken it's probably more common to have a redirect to https than the other way around. |
I still think we are assuming too much, could there be a data-uri instead? I can add http if the url starts with '//' - I don't like it, I still prefer to simply not add the image to sitemap if the protocol isn't stated but if the consensus is that this is what we should then fine. |
That's fine, I'm not trying to hold this up based on including those images in the sitemap. My only point was that adding "http:" to a protocol relative URL is a completely safe operation. |
closes TryGhost#4591 - switches to using author cover image - adds a protocol of http if using a protocol relative url
Google doesn't like images without a protocol. Not sure what the fix for this is. Should gravatar images even be picked up by the sitemap?
The text was updated successfully, but these errors were encountered: