migrate PURL configuration #4

Closed
jamesaoverton opened this Issue Nov 5, 2015 · 3 comments


@jamesaoverton
Member

To use this new system, we need to migrate our configuration from the current system. We should be able to verify that the migration is complete and correct (see also #3).

Given a copy of the current database table, a semi-automatic, one-time conversion should be pretty easy, and we could be pretty sure that we haven't missed anything. I don't know whether getting a copy of the table is possible.

Without a copy of the table, we can use the search interface: https://purl.org/docs/purl.html#search. The results can be scraped from the HTML, or we can fetch XML like this.

The search interface only returns 100 results at a time. This is enough for all the specific ontologies that I have checked: 44 for /obo/bfo/*, 66 for /obo/obi/*, 63 for /obo/iao/*, 1 for /obo/pro/*, 7 for /obo/go/*, 6 for /obo/chebi/*.
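
A quick way to check whether a query hit the limit is to compare each result count against the cap. This is only a sketch: the counts are the ones listed above, and `possibly_truncated` is a hypothetical helper name, not part of any existing tool.

```python
RESULT_CAP = 100  # the search interface returns at most 100 results per query

# Result counts reported above for each path pattern.
observed_counts = {
    "/obo/bfo/*": 44,
    "/obo/obi/*": 66,
    "/obo/iao/*": 63,
    "/obo/pro/*": 1,
    "/obo/go/*": 7,
    "/obo/chebi/*": 6,
}


def possibly_truncated(counts, cap=RESULT_CAP):
    """Return the patterns whose result count reached the cap, sorted.

    A count at the cap means the search may have dropped further results,
    so that pattern needs narrower queries or another data source.
    """
    return sorted(pattern for pattern, n in counts.items() if n >= cap)
```

Any pattern this returns would need to be split into narrower searches before we can trust the scrape.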

However, for the /obo/* path there are more than 100 results. Most of the results I see belong to PRO. And I think there might be other redirect magic going on for /obo/*. @cmungall?

@cmungall
Contributor
cmungall commented Nov 5, 2015

Isn't there an easier more incremental approach?

Can we use the existing purl server as a fallthrough?

For example, at t0 the apache config can just be something like

/obo/ http://purl.oclc.org/obo/ PARTIAL

Then we point http://purl.obolibrary.org/obo/ at the apache server. It introduces one extra level of redirects, but everything should just work.

Then we incrementally add entries to the apache config, overriding oclc.

When we are comfortable, we can eliminate the fallthrough (after scraping with the API to make sure there are no stragglers), but this can be as incremental as we like.
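
To make the fallthrough idea concrete, here is a minimal Python sketch of the lookup order, not the Apache config itself. It assumes `PARTIAL` means "match the prefix and append the rest of the path to the target"; `resolve` and the `overrides` dictionary are hypothetical names used only for illustration.

```python
FALLBACK_BASE = "http://purl.oclc.org/obo/"  # existing PURL server, used as fallthrough
PREFIX = "/obo/"


def resolve(path, overrides, fallback_base=FALLBACK_BASE):
    """Return the redirect target for a request path, or None if out of scope.

    Entries we have migrated (overrides) win; anything else under /obo/
    falls through to the old server with the remaining path appended.
    """
    if not path.startswith(PREFIX):
        return None  # not an /obo/ path, so this rule does not apply
    if path in overrides:
        return overrides[path]  # migrated entry overrides the old server
    return fallback_base + path[len(PREFIX):]  # partial redirect to the old server
```

At t0 `overrides` is empty and every request falls through; each migrated entry then shadows the old server one path at a time.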

@jamesaoverton
Member

I hadn't thought of that. Good idea!

@jamesaoverton
Member

I have code for XML migration. My experience so far is that most projects need a little manual tweaking, but it's pretty good.

We need to make decisions about the config/obo.yml file:

  1. Global failover to http://purl.oclc.org/obo/
  2. Match term IDs:
    a. one entry for each project in obo.yml
    b. match any term and redirect to a rule in the project-specific file, e.g. ^/obo/(\S+)_(\S+)$ to /obo/lc($1)/about/$1_$2; this requires that every project has a YAML configuration file with that rule
  3. Match other top-level paths: /obo/obi, /obo/obi.owl, /obo/obi.obo, etc.
  4. Determine what other rules are required. Is there a way to get this from the OCLC search interface?
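
For option 2b, the term rule can be sketched in Python as below. This is an illustration of the pattern only: it assumes `lc($1)` means "lowercase the first capture group", and `rewrite_term` is a hypothetical function name.

```python
import re

# Pattern from option 2b: an ID prefix, an underscore, then the local ID.
TERM_RULE = re.compile(r"^/obo/(\S+)_(\S+)$")


def rewrite_term(path):
    """Rewrite /obo/PREFIX_ID to /obo/prefix/about/PREFIX_ID, or return None."""
    match = TERM_RULE.match(path)
    if match is None:
        return None  # not a term ID, e.g. /obo/obi.owl
    prefix, local_id = match.groups()
    # Assumption: lc($1) lowercases the prefix to select the project path.
    return f"/obo/{prefix.lower()}/about/{prefix}_{local_id}"
```

Note that `\S+` is greedy and also matches `/` and `_`, so a path with several underscores splits at the last one; a stricter pattern may be needed in practice.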

@jamesaoverton added a commit that referenced this issue Nov 19, 2015: Add global failovers, see #4 - related to #7 (f6e50c8)

@jamesaoverton added this to the Initial release milestone Nov 19, 2015