Case insensitive matching #67

Closed
jamesaoverton opened this issue Nov 18, 2015 · 9 comments

Comments

@jamesaoverton
Member

PURL.org seems to be case insensitive. These two URLs resolve to the same target:

Apache is case sensitive by default, so I think we need to add (?i) to the front of every RedirectMatch.
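
For reference, here is a quick sketch of what the `(?i)` prefix does. Apache's RedirectMatch patterns are PCRE, and Python's `re` module accepts the same leading inline flag, so the effect carries over. The OBI pattern below is a simplified stand-in, not one of our actual rules:

```python
import re

# Simplified, hypothetical term-redirect pattern (not copied from the real config).
SENSITIVE = re.compile(r"^/obo/OBI_(\d+)$")
# A leading "(?i)" makes the whole pattern case-insensitive, as in a PCRE RedirectMatch.
INSENSITIVE = re.compile(r"(?i)^/obo/OBI_(\d+)$")


def matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if the request path matches the redirect pattern."""
    return pattern.match(path) is not None


for path in ["/obo/OBI_0000070", "/obo/obi_0000070"]:
    # Show how each variant treats the correctly and incorrectly cased path.
    print(path, matches(SENSITIVE, path), matches(INSENSITIVE, path))
```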

@jamesaoverton
Member Author

Arg. PURL.org is case sensitive, but http://purl.obolibrary.org/obo/ is not. So I get an Ontobee 404 for http://purl.obolibrary.org/obo/obi_0000070.

What behaviour(s) do we want to enforce? @cmungall @alanruttenberg @kltm

jamesaoverton added this to the Initial release milestone Nov 19, 2015
@cmungall
Contributor

Can you give more context - what is the sequence by which we end up at http://purl.obolibrary.org/obo/obi_0000070? Is it a problem if http://purl.obolibrary.org/obo/obi_0000070 does not resolve?

@jamesaoverton
Member Author

Sure. The current redirect sequence is:

  1. http://purl.obolibrary.org/obo/obi_0000070
  2. 302 http://www.berkeleybop.org/ontologies/obi_0000070
  3. 303 http://purl.obolibrary.org/obo/obi/about/obi_0000070
  4. 302 http://www.ontobee.org/browser/rdf.php?o=OBI&iri=http://purl.obolibrary.org/obo/obi_0000070
  5. 301 http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/obi_0000070
  6. 404 Term Not Found

Ending up with a 404 is OK for me, for "obi" at least.

As of this moment, the new system is case sensitive, so http://url.ontodev.org/obo/obi_0000070 doesn't match the OBI term redirect regex and fails over to the sequence above.

If we make the new system case insensitive for all PURLs (without other changes), then the OBI term redirect rule will normalize the case from "obi" to "OBI" and the response will be 200:

  1. http://purl.obolibrary.org/obo/obi_0000070
  2. 302 http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_0000070
  3. 200 Ontobee term page

The key difference is the case of the term ID passed to Ontobee: "obi_0000070" in item 4 of the current sequence versus "OBI_0000070" in item 2 of the new one.
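
Here is a sketch of why the case gets normalized. It assumes a hypothetical case-insensitive OBI rule whose target hard-codes "OBI" and copies only the numeric part of the request, the way `$1` does in a RedirectMatch target. The target URL is the one from item 2 above:

```python
import re
from typing import Optional

# Hypothetical case-insensitive OBI term rule: only the numeric local ID is captured.
OBI_TERM = re.compile(r"(?i)^/obo/OBI_(\d+)$")
# The IDSPACE in the target is a literal "OBI", so any request case is normalized.
TARGET = "http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_{local_id}"


def resolve(path: str) -> Optional[str]:
    """Return the redirect target for a term path, or None if the rule does not match."""
    m = OBI_TERM.match(path)
    if m is None:
        return None
    return TARGET.format(local_id=m.group(1))


print(resolve("/obo/obi_0000070"))  # lowercase request, uppercase "OBI_0000070" in the target
```

This is the behaviour behind option A: the wrong-case request resolves, and the client never sees the ID it asked for.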

So more PURLs would resolve than currently do, and people might come to rely on IDs with the wrong case. This is actually more general: PURL.org handles everything it can in a case insensitive way, but any PURLs that currently bounce through BBOP are effectively case sensitive. I think our options are:

A. make all rules case insensitive, and allow more PURLs to resolve
B. make specific rules sensitive and others insensitive, to match current behaviour as closely as we can

Maybe it doesn't matter, at the end of the day. But it's a difference in behaviour that I want to make a clear decision about.

This was supposed to be simple...

@alanruttenberg
Member

URLs used in RDF are case sensitive, therefore IDs should be case sensitive. URLs for artifacts should be case insensitive; that's the common expectation for web URLs.

So, B: not so much to match current behaviour as to ensure that the IDs are correct and that the other URLs follow convention.

https://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_nc

@alanruttenberg
Member

To be clear, http://purl.obolibrary.org/obo/obi_0000070 should 404

@alanruttenberg
Member

Note, for http://purl.obolibrary.org/obo/OBI_0000070:

- The redirect to Ontobee should be a 303, not a 302, per httpRange-14.
- Ontobee responding with 200 is correct.

The same goes for all other IDs. Non-ID URLs can use 302.

@alanruttenberg
Member

If we want to be forgiving, we can redirect IDs with the wrong case to a page that tells the user so and asks them to change any such URLs to the correct, case-sensitive version.
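
Here is a rough sketch of the lookup such a forgiving redirect would need. It assumes a hypothetical `KNOWN_IDSPACES` set built from the correctly cased IDSPACEs in our config. It only computes the corrected PURL that a help page could show; serving the page itself is left out:

```python
import re
from typing import Optional

# Hypothetical registry of IDSPACEs in their canonical case (e.g. gathered from the YAML files).
KNOWN_IDSPACES = {"OBI", "GO", "CHEBI"}
# Map lower-cased IDSPACE -> canonical spelling for case-insensitive lookup.
CANONICAL = {s.lower(): s for s in KNOWN_IDSPACES}

# Assumed term path shape: /obo/IDSPACE_LOCALID
TERM_PATH = re.compile(r"^/obo/([A-Za-z]+(?:_[A-Za-z]+)*)_(\d+)$")


def suggest_correction(path: str) -> Optional[str]:
    """Return the correctly cased term PURL if only the IDSPACE case is wrong, else None."""
    m = TERM_PATH.match(path)
    if m is None:
        return None
    idspace, local_id = m.groups()
    canonical = CANONICAL.get(idspace.lower())
    if canonical is None or canonical == idspace:
        # Unknown IDSPACE, or already correct: leave it to the normal case-sensitive rules.
        return None
    return f"http://purl.obolibrary.org/obo/{canonical}_{local_id}"


print(suggest_correction("/obo/obi_0000070"))
```

One caveat: because the lookup is keyed on the lower-cased IDSPACE, it assumes no two registered IDSPACEs differ only by case, which the ID policy does not strictly rule out.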

@alanruttenberg
Member

Note that most of our IDSPACEs are upper case, but the id policy allows for lowercase as well.

(from the ID policy)
PN_CHARS_BASE_OBO ::= [A-Z] | [a-z]
IDSPACE ::= PN_CHARS_BASE_OBO+ ("_" PN_CHARS_BASE_OBO+ )
LOCALID ::= [0-9]+
OBO_IDENTIFIER ::= IDSPACE ":" LOCALID
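
Here is a sketch of the grammar above as a Python regex. One assumption: as written, the ("_" PN_CHARS_BASE_OBO+ ) group has no quantifier, which would make an underscore mandatory and exclude IDSPACEs like OBI, so this treats it as zero or more. Note that the grammar accepts both "OBI" and "obi", so the correct case for a given project has to come from somewhere other than the grammar:

```python
import re
from typing import Optional, Tuple

# PN_CHARS_BASE_OBO ::= [A-Z] | [a-z]
PN_CHARS_BASE_OBO = r"[A-Za-z]"
# Assumption: the ("_" PN_CHARS_BASE_OBO+) group may repeat zero or more times.
IDSPACE = rf"{PN_CHARS_BASE_OBO}+(?:_{PN_CHARS_BASE_OBO}+)*"
# LOCALID ::= [0-9]+
LOCALID = r"[0-9]+"
# OBO_IDENTIFIER ::= IDSPACE ":" LOCALID  (the CURIE form, not the "_" PURL form)
OBO_IDENTIFIER = re.compile(rf"^(?P<idspace>{IDSPACE}):(?P<localid>{LOCALID})$")


def parse_obo_id(curie: str) -> Optional[Tuple[str, str]]:
    """Split a CURIE into (IDSPACE, LOCALID), or return None if it does not fit the grammar."""
    m = OBO_IDENTIFIER.match(curie)
    return (m.group("idspace"), m.group("localid")) if m else None


print(parse_obo_id("OBI:0000070"), parse_obo_id("obi:0000070"))
```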

The YAML has an id field but not an idspace field. Either the id field should be understood as the idspace and its case fixed where needed, or there should be an additional idspace entry with the correct case.

@jamesaoverton
Member Author

I agree with Alan's reasoning. I'll start implementing the following:

  1. rename the YAML id key to idspace, with a link in the README to the ID policy; this was my intent, and I checked for correct case when I created each YAML file
  2. term redirect PURLs are case sensitive and redirect with HTTP status 303
  3. all other PURLs are case insensitive and redirect with HTTP status 302
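
Here is a sketch of how a translate step could emit Apache config for points 2 and 3. It assumes a simplified, hypothetical config shape (`idspace`, `term_browser`, `entries`); the real YAML files and translate scripts differ:

```python
import re
from typing import Dict, List

# Hypothetical, simplified stand-in for one project's YAML after the id -> idspace rename.
CONFIG = {
    "idspace": "OBI",
    "term_browser": "http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/",
    # Hypothetical (path, target) pairs for non-term PURLs.
    "entries": [("/obo/obi.owl", "http://example.org/obi.owl")],
}


def apache_rules(config: Dict) -> List[str]:
    """Translate one config into RedirectMatch lines following points 2 and 3 above."""
    idspace = config["idspace"]  # letters and underscores only, so safe to embed in a regex
    rules = [
        # Term redirect: no (?i), so the IDSPACE case must match exactly; 303 See Other.
        f"RedirectMatch 303 ^/obo/{idspace}_(\\d+)$ {config['term_browser']}{idspace}_$1"
    ]
    for path, target in config["entries"]:
        # Everything else: a leading (?i) makes the match case-insensitive; 302 Found.
        rules.append(f"RedirectMatch 302 (?i)^{re.escape(path)}$ {target}")
    return rules


for rule in apache_rules(CONFIG):
    print(rule)
```

Keeping the term rule free of `(?i)` is what makes http://purl.obolibrary.org/obo/obi_0000070 fall through to a 404, as Alan suggested.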

I don't have time to implement Alan's "forgiving" suggestion. If we want to do that, please file a new issue for it.

jamesaoverton added a commit to jamesaoverton/purl.obolibrary.org that referenced this issue Nov 21, 2015
- makes explicit link to OBO ID policy
- `id` is a Python built-in, so `idspace` is slightly better
- update example files
- update Python scripts
- update all config files
- update README with link to policy
jamesaoverton added a commit to jamesaoverton/purl.obolibrary.org that referenced this issue Nov 21, 2015
jamesaoverton added a commit to jamesaoverton/purl.obolibrary.org that referenced this issue Nov 21, 2015
- term redirects are still case sensitive
- regex entries are still free-form
- update examples
- update translate scripts
- update README examples
This was referenced Nov 23, 2015
3 participants