This repository has been archived by the owner on Oct 6, 2021. It is now read-only.

Escaping #7

Open
ArmorDarks opened this issue Sep 2, 2016 · 0 comments

Comments

@ArmorDarks
Member

This is just a reminder that everything that goes into the sitemap (URLs, captions, etc.) should be escaped properly. See the "Non-alphanumeric and non-latin characters" section of https://support.google.com/webmasters/answer/183668?hl=en:

Non-alphanumeric and non-latin characters. We require your Sitemap file to be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.  A sitemap can contain only ASCII characters; it can't contain upper ASCII characters or certain control codes or special characters such as * and {}. If your Sitemap URL contains these characters, you'll receive an error when you try to add it.
Character       Escape Code
Ampersand       &    &amp;
Single Quote    '    &apos;
Double Quote    "    &quot;
Greater Than    >    &gt;
Less Than       <    &lt;

In addition, all URLs (including the URL of your Sitemap) must be encoded for readability by the web server on which they are located and URL-escaped. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. If you submit your Sitemap and you receive an error that Google is unable to find some of your URLs, check to make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.

Here is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):
http://www.example.com/ümlat.html&q=name
Here is that same URL, ISO-8859-1 encoded (for hosting on a server that uses that encoding) and URL escaped:
http://www.example.com/%FCmlat.html&q=name
Here is that same URL, UTF-8 encoded (for hosting on a server that uses that encoding) and URL escaped:
http://www.example.com/%C3%BCmlat.html&q=name
Here is that same URL, entity escaped:
http://www.example.com/%C3%BCmlat.html&amp;q=name
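
The two steps quoted above (URL-escape as UTF-8, then entity-escape) could be sketched roughly as follows. This is only an illustration in Python using the standard library, not this project's code. The helper name `escape_sitemap_url` and the choice of which characters to leave unescaped are assumptions made for the example.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Hypothetical helper (not part of this project): prepares a URL for a sitemap <loc> value.
def escape_sitemap_url(url: str) -> str:
    # Step 1: percent-encode non-ASCII characters as UTF-8.
    # Assumption: URL delimiters and existing "%" escapes are left untouched,
    # so an already-encoded URL is not encoded twice.
    percent_encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%~")
    # Step 2: entity-escape the XML special characters.
    # escape() handles &, < and > itself; the quote characters are added explicitly.
    return escape(percent_encoded, {"'": "&apos;", '"': "&quot;"})


print(escape_sitemap_url("http://www.example.com/ümlat.html&q=name"))
# prints http://www.example.com/%C3%BCmlat.html&amp;q=name
```

Plain text values such as captions would only need the second step, since they are not URLs.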