Rules around the URL casing #745

Open
yordis opened this issue May 4, 2021 · 5 comments

Comments

@yordis

yordis commented May 4, 2021

Hey folks, hopefully I didn't miss this in the docs; my apologies if I did.

I am trying to find a rule about URL casing: should my URLs be case-insensitive or not? And should I use dashes or underscores to separate words?

I would appreciate some alignment on this topic.

Thanks in advance,

@shaneqld

Your URLs consist of the hostname (lowercase, case-insensitive), the version (e.g. v1beta1, which is lowercase), the resource name, and optionally a custom method.

AIP-122 details resource names. Specifically:

Collection identifiers must be in camelCase

User-settable resource IDs should conform to RFC 1034, which restricts them to letters, numbers, and hyphens
User-settable resource IDs should restrict letters to lowercase

AIP-136 details custom methods. Specifically:

If word separation is required, camelCase must be used
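A rough way to check the two camelCase rules quoted above. This sketch assumes "camelCase" means lower camelCase over ASCII: a leading lowercase letter followed only by letters and digits. The AIPs may be stricter (for example about acronyms), and `is_camel_case` is a hypothetical helper, not part of any AIP tooling.

```python
import re

# Assumption: lower camelCase = a lowercase ASCII letter, then ASCII
# letters or digits only (no dashes, underscores, or other separators).
CAMEL_CASE = re.compile(r"[a-z][A-Za-z0-9]*")


def is_camel_case(name: str) -> bool:
    """Return True if `name` looks like a lower-camelCase identifier."""
    return CAMEL_CASE.fullmatch(name) is not None


# Collection identifiers (AIP-122) and custom method names (AIP-136).
for name in ["entityTypes", "matchIntent", "entity_types", "EntityTypes", "entity-types"]:
    print(f"{name}: {is_camel_case(name)}")
```

Note that a check like this accepts both `userID` and `userId`; how acronyms are cased is a separate rule it does not decide.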

Here's an excerpt from a Google API showing this in action:

post: "/v3beta1/{parent=projects/*/locations/*/agents/*/sessions/*}/entityTypes"

In the example above, projects, locations, agents, sessions, and entityTypes are collection identifiers (camelCase), whereas the * characters stand for resource IDs (lowercase letters, numbers, and hyphens).
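To illustrate that split, here is a small sketch that labels each segment of the template above as a collection identifier or a resource-ID wildcard. It assumes the simple form shown (one `{var=...}` binding of literal names and single `*` wildcards, optionally followed by more `/collection` segments); `classify_segments` is a hypothetical helper, and the version prefix and custom methods are deliberately ignored.

```python
import re
from typing import List, Tuple


def classify_segments(template: str) -> List[Tuple[str, str]]:
    """Label the segments of an HTTP path template as collection
    identifiers or resource-ID wildcards.

    Assumes one `{var=...}` binding made of literal collection names and
    single `*` wildcards, optionally followed by `/collection` segments.
    `**`, nested bindings, and custom methods (`:verb`) are not handled.
    """
    match = re.search(r"\{\w+=([^}]*)\}(.*)$", template)
    if match is None:
        raise ValueError(f"no {{var=pattern}} binding in {template!r}")
    pattern, suffix = match.groups()
    labels = []
    for seg in pattern.split("/"):
        kind = "resource ID" if seg == "*" else "collection identifier"
        labels.append((seg, kind))
    # Segments after the closing brace (e.g. "/entityTypes") are literal collections.
    for seg in suffix.split("/"):
        if seg:
            labels.append((seg, "collection identifier"))
    return labels


template = "/v3beta1/{parent=projects/*/locations/*/agents/*/sessions/*}/entityTypes"
for seg, kind in classify_segments(template):
    print(f"{seg:12} {kind}")
```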

And another example using a custom method:

post: "/v3beta1/{session=projects/*/locations/*/agents/*/sessions/*}:matchIntent"
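For a concrete request against a template like this, the custom method follows a colon in the last path segment. A minimal sketch of separating the two, assuming resource IDs never contain `:` (true when they are limited to lowercase letters, digits, and hyphens); the function name and the resource IDs in the example path are made up for illustration.

```python
from typing import Optional, Tuple


def split_custom_method(path: str) -> Tuple[str, Optional[str]]:
    """Split a concrete request path into (resource path, custom method).

    Assumes resource IDs never contain ':', so a colon in the last
    segment can only introduce a custom method such as ":matchIntent".
    """
    head, slash, last = path.rpartition("/")
    resource_id, colon, verb = last.partition(":")
    return head + slash + resource_id, (verb if colon else None)


# Hypothetical resource IDs filled in for the template above.
path = "/v3beta1/projects/my-project/locations/global/agents/my-agent/sessions/s1:matchIntent"
print(split_custom_method(path))
# → ('/v3beta1/projects/my-project/locations/global/agents/my-agent/sessions/s1', 'matchIntent')
```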

As for whether the path should be case-sensitive, my guess is yes.

@gibson042
Contributor

That section of AIP-122 doesn't even make sense... RFC 1034 describes DNS, and section 3.5 covers the preferred syntax for host names (note: not arbitrary domain names); it was superseded by RFC 1123 anyway (to allow the first character of any label in an Internet host name to be an ASCII letter or digit). Note also that each DNS domain name is a sequence of labels, and "conform to RFC-1034" fails to convey the apparent intent, which seems to be "be valid as a DNS label in an Internet host name conforming with the preferred syntax of RFC 1034 {as updated by,without the updates of} RFC 1123". That said, I can't imagine why API resource names should have anything to do with DNS domain name labels, and Unicode Standard Annex #31 (IDENTIFIER AND PATTERN SYNTAX) would seem much more applicable.
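To make the RFC 1034 vs. RFC 1123 difference concrete, here is a sketch of the two single-label rules as regular expressions. Both cap a label at 63 characters and require it to end in a letter or digit; RFC 1034 section 3.5 requires a leading letter, while RFC 1123 section 2.1 also permits a leading digit. It checks one label only (not a whole dotted name), and the function names are illustrative.

```python
import re

# RFC 1034 section 3.5 preferred syntax for one label:
#   <letter> [ [ <ldh-str> ] <let-dig> ], at most 63 characters.
# DNS compares labels case-insensitively, so both cases are allowed.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123 section 2.1 relaxes only the first character: letter OR digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1034_label(label: str) -> bool:
    return RFC1034_LABEL.fullmatch(label) is not None


def is_rfc1123_label(label: str) -> bool:
    return RFC1123_LABEL.fullmatch(label) is not None


for label in ["my-agent", "3com", "a--b", "my_agent", "-abc", "a" * 64]:
    print(f"{label[:12]:12} rfc1034={is_rfc1034_label(label)} rfc1123={is_rfc1123_label(label)}")
```

Neither rule allows underscores, which is part of why coupling resource IDs to DNS labels is a real restriction rather than a neutral reference.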

@yordis
Author

yordis commented Jun 1, 2021

How valuable would it be to downcase everything and use either - or _ (only one of them) to avoid bikeshedding?

Or use camelCase, but never allow acronyms and only use uppercase to separate words 🤷🏻

I am getting lost reading the AIP, and I am craving more rules that avoid bikeshedding.

@lukesneeringer
Contributor

@gibson042 Can you send a PR with what you think it ought to say?

@gibson042
Contributor

Can you send a PR with what you think it ought to say?

@lukesneeringer Yes, but I'd really like someone to explain the intent of the current text first. The Resource ID segments section seems clear, albeit with a surprising DNS RFC reference—IIUC, user-settable resource IDs should be matched by /^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$/ (which allows consecutive dashes that would be disallowed by UAX #31, and disallows underscores that would be allowed). But the main Guidance section is unclear. RFC 1123 "Internet host name" updates the "Preferred name syntax" from RFC 1034, but neither of those documents has a concept of "DNS names" as suggested by the AIP. I suspect the AIP intends to reference the host name label production (i.e., the parts of a domain name that are separated by unescaped dots), although again it is not clear why URL path segments should have any connection to that concept.
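For reference, the resource-ID pattern quoted above translates directly to Python. This sketch drops the JavaScript-style `/.../` delimiters and uses `fullmatch` instead of `^...$`, because Python's `$` also matches just before a trailing newline and would accept `"abc\n"`. The helper name is hypothetical; the examples confirm the two observations in the comment (consecutive dashes accepted, underscores rejected).

```python
import re

# The pattern quoted above, anchored via fullmatch() rather than ^...$.
RESOURCE_ID = re.compile(r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_resource_id(rid: str) -> bool:
    """1-63 chars: lowercase letter first, lowercase letter or digit last,
    lowercase letters, digits, and hyphens in between."""
    return RESOURCE_ID.fullmatch(rid) is not None


examples = {
    "my-agent": True,
    "a--b": True,       # consecutive dashes are accepted
    "my_agent": False,  # underscores are rejected
    "My-agent": False,  # uppercase is rejected
    "1agent": False,    # must start with a letter
    "agent-": False,    # must not end with a dash
    "a" * 63: True,
    "a" * 64: False,    # longer than 63 characters
}
for rid, expected in examples.items():
    assert is_valid_resource_id(rid) == expected, rid
```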

For reference, RFC 3986 defines URL path segments as consisting of any combination of unreserved (ASCII alphanumeric/dash/dot/underscore/tilde), pct-encoded, sub-delims (any of !$&'()*+,;=), ":", and "@" characters. It assigns special treatment to empty segments (disallowed after an initial slash for some paths), dot segments (. and .., subject to relative interpretation), and segments containing a colon (disallowed as the initial segment of a relative reference, where it would be confusable with the separator after an initial URI scheme). AIPs can restrict path segments to a smaller set, but is there any reason for that restriction to correspond with the unrelated DNS host name label?
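For comparison with the resource-ID pattern, here is a sketch of the much wider RFC 3986 segment grammar (section 3.3, `segment = *pchar`). It only checks syntax; the special handling of empty, dot, and colon-bearing segments described above is left to the caller, apart from a small dot-segment helper. Both function names are illustrative.

```python
import re

# RFC 3986 section 3.3: segment = *pchar, where
#   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded = "%" HEXDIG HEXDIG
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = re.compile(PCHAR + r"*")


def is_rfc3986_segment(seg: str) -> bool:
    """True if `seg` is syntactically a valid (possibly empty) path segment."""
    return SEGMENT.fullmatch(seg) is not None


def is_dot_segment(seg: str) -> bool:
    """True for '.' and '..', which are valid but resolved relatively."""
    return seg in (".", "..")


for seg in ["entityTypes", "my_agent", "s1:matchIntent", "%7Euser", "a b", "100%", ".."]:
    print(f"{seg!r:18} segment={is_rfc3986_segment(seg)} dot={is_dot_segment(seg)}")
```

Note that `my_agent` is a perfectly valid RFC 3986 segment even though the DNS-derived resource-ID pattern rejects it.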

There's nothing wrong with "up to 63 ASCII characters with an initial letter, terminal alphanumeric, and inner alphanumerics and dashes, preferably all lowercase", it's just strange to couple it to DNS. Should I assume that we want to keep the general concept and drop the coupling, or should it go further and allow punctuation such as underscore (which is an ID_Continue character supported in Unicode identifiers)?
