Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Sitemap image errors #4591

Closed
JohnONolan opened this issue Dec 5, 2014 · 13 comments · Fixed by #4636
Closed

Bug: Sitemap image errors #4591

JohnONolan opened this issue Dec 5, 2014 · 13 comments · Fixed by #4636
Assignees
Labels
bug [triage] something behaving unexpectedly

Comments

@JohnONolan
Copy link
Member

image

Google doesn't like images without a protocol. Not sure what the fix for this is. Should gravatar images even be picked up by the sitemap?

@JohnONolan JohnONolan added the bug [triage] something behaving unexpectedly label Dec 5, 2014
@JohnONolan JohnONolan added this to the Current Backlog milestone Dec 5, 2014
@ErisDS
Copy link
Member

ErisDS commented Dec 8, 2014

I think this should potentially use the user cover image, rather than the user's profile picture?

@JohnONolan
Copy link
Member Author

Ah - yes, totally agree (with a conditional to check that it exists).

Incidentally, how are images being generated for posts? Is it only the cover image? Is that correct? Or should it be including all images on the page?

I'm not 100% clear on what images in the sitemap is for. If it's for displaying in search results, then I guess it should be 1 image - but if it's for discoverability in Google Image search then it should be probably be all of them.

@ErisDS
Copy link
Member

ErisDS commented Dec 8, 2014

Currently, it is using the feature/cover image for the post (which is the one we have in the DB). Reading the spec: https://support.google.com/webmasters/answer/178636 it seems the intention is to make these images available to Google Image search.

Therefore perhaps we should provide both the cover and profile picture for a user and doctor the protocol?

In terms of grabbing all the images from a post, that will require non-trivial processing of posts. I'd recommend this be raised as a separate issue for the improvement of sitemaps (whereas this issue is a bug affecting their usability). Definitely doable, and will become more doable after image import is done.

@JohnONolan
Copy link
Member Author

Roger, then for now just cover image imo. I don't think profile photo is at all relevant to search results

@cobbspur
Copy link
Member

This PR switches to using cover image - but a user may still select an image by url without a protocol.

Hence we will now use url validation and only add image if there is a valid protocol - guessing image protocol is likely to result in further errors.

@jaswilli
Copy link
Contributor

guessing image protocol is likely to result in further errors.

I'm not sure that's the case. If someone uses a protocol relative URL what they're indicating is that it's okay to access the image via whatever transport (http or https) is currently being used. So there shouldn't be any harm in normalizing the URLs.

@cobbspur
Copy link
Member

That is assuming the user knows the difference, also what I mean is us guessing the protocol not the user.

E.g a user may have a url of //whatever.jpg without a protocol.
In this instance we do not know if that will fail validation, we could http or https but they could also fail if we guess wrong. Hence we should not add it at all in this instance.

@ErisDS
Copy link
Member

ErisDS commented Dec 13, 2014

More's the point, any guess we make will be at the point the sitemap is generated, not at the point the sitemap is served. Protocol-relative URLs are meant to defer the decision to the point the url is served, but if the sitemap protocol can't deal with this, then we're forced to make an arbitrary decision at an arbitrary point. Therefore I thought it was more sane to just not include these images.

@jaswilli
Copy link
Contributor

I'm not following your example. If a user enters a protocol relative image URL we don't need to guess because by definition both http and https must work.

If the user enters a URL as relative but it won't resolve at http, for example, then it's already going to be broken in general usage.

Sorry if I'm just not understanding the issue but it should be completely safe to take a protocol relative URL and turn it into http:://...

@ErisDS
Copy link
Member

ErisDS commented Dec 13, 2014

First, you're assuming a user knows what they're doing and not just copy and pasting. Yes if the image doesn't resolve it doesn't really matter, but it seemed entirely arbitrary to me to to guess which protocol to use at the point of generation.

Do we always guess http:// or always guess https:// or do it based on how the URL is configured for the blog?

@jaswilli
Copy link
Contributor

Broken is broken and unless we're going to attempt to resolve and verify all images as part of the sitemap generation, the user providing bad URLs is beyond the scope of sitemaps.

As far as guessing goes, that's my point--it's not really guessing because a protocol relative URL means both http and https must work.

I think I would probably just choose http because in case it is broken it's probably more common to have a redirect to https than the other way around.

@cobbspur
Copy link
Member

I still think we are assuming too much, could there be a data-uri instead?

I can add http if the url starts with '//' - I don't like it, I still prefer to simply not add the image to sitemap if the protocol isn't stated but if the consensus is that this is what we should then fine.

@jaswilli
Copy link
Contributor

That's fine, I'm not trying to hold this up based on including those images in the sitemap. My only point was that adding "http:" to a protocol relative URL is a completely safe operation.

cobbspur added a commit to cobbspur/Ghost that referenced this issue Dec 14, 2014
closes TryGhost#4591

- switches to using author cover image
- adds a protocol of http if using a protocol relative url
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug [triage] something behaving unexpectedly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants