migrate PURL configuration #4

Closed
jamesaoverton opened this Issue Nov 5, 2015 · 3 comments

Member

jamesaoverton commented Nov 5, 2015

To use this new system, we need to migrate our configuration from the current system. We should be able to verify that the migration is complete and correct (see also #3).

Given a copy of the current database table, a semi-automatic, one-time conversion should be pretty easy, and we could be pretty sure that we haven't missed anything. I don't know whether getting a copy of the table is possible.

Without a copy of the table, we can use the search interface: https://purl.org/docs/purl.html#search. The results can be scraped from the HTML, or we can fetch XML like this.

The search interface only returns 100 results at a time. This is enough for all the specific ontologies that I have checked: 44 for /obo/bfo/*, 66 for /obo/obi/*, 63 for /obo/iao/*, 1 for /obo/pro/*, 7 for /obo/go/*, 6 for /obo/chebi/*.

However, for the /obo/* path there are more than 100 results. Most of the results I see belong to PRO. And I think there might be other redirect magic going on for /obo/*. @cmungall?
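One way to work around the 100-result cap is to narrow the query whenever a result set comes back full. The sketch below is only an illustration under assumptions: `search` is a hypothetical callable that takes a path prefix and returns the matching PURL paths (truncated at the cap, as the real interface does), and `ALPHABET` is a guess at the characters that can follow a prefix. It does not call the real search interface.

```python
import string
from typing import Callable, List, Set, Tuple

RESULT_CAP = 100  # the search interface returns at most 100 results per query

# Assumed set of characters that may follow a prefix in a PURL path.
ALPHABET = string.ascii_letters + string.digits + "_.-/"


def collect_purls(
    prefix: str,
    search: Callable[[str], List[str]],  # hypothetical: prefix -> matching paths
    alphabet: str = ALPHABET,
    max_depth: int = 12,
) -> Tuple[Set[str], List[str]]:
    """Collect all paths under `prefix`, splitting any query that hits the cap.

    Returns (found, truncated), where `truncated` lists prefixes that were
    still capped when the depth limit was reached and so may be incomplete.
    """
    found: Set[str] = set()
    truncated: List[str] = []

    def visit(current: str, depth: int) -> None:
        results = search(current)
        found.update(results)
        if len(results) < RESULT_CAP:
            return  # below the cap, so this result set is complete
        if depth == 0:
            truncated.append(current)  # give up, but report it
            return
        for ch in alphabet:
            visit(current + ch, depth - 1)

    visit(prefix, max_depth)
    return found, truncated
```

An empty `truncated` list means every query eventually came back under the cap, which gives some confidence that nothing under the prefix was missed.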

Contributor

cmungall commented Nov 5, 2015

Isn't there an easier, more incremental approach?

Can we use the existing purl server as a fallthrough?

For example, at t0 the apache config can just be something like

/obo/ http://purl.oclc.org/obo/ PARTIAL

Then we point http://purl.obolibrary.org/obo/ at the apache server. It introduces one extra level of redirects, but everything should just work.

Then we incrementally add entries to the apache config, overriding oclc.

When we are comfortable, we can eliminate the fallthrough (after scraping with the API to make sure there are no stragglers), but this can be as incremental as we like.
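The fallthrough idea can be illustrated with a small model of the lookup order: explicit entries win, and anything else under /obo/ is passed through to the existing server with the rest of the path preserved. This is a sketch of the resolution logic only, not actual Apache configuration; the `overrides` mapping and the `resolve` function are hypothetical names.

```python
from typing import Dict, Optional

FALLTHROUGH_PREFIX = "/obo/"
FALLTHROUGH_TARGET = "http://purl.oclc.org/obo/"


def resolve(path: str, overrides: Dict[str, str]) -> Optional[str]:
    """Return the redirect target for `path`, or None if nothing matches."""
    # 1. Entries that have already been migrated override the old server.
    if path in overrides:
        return overrides[path]
    # 2. Partial match: keep whatever follows the prefix and pass it on.
    if path.startswith(FALLTHROUGH_PREFIX):
        return FALLTHROUGH_TARGET + path[len(FALLTHROUGH_PREFIX):]
    return None
```

At t0 `overrides` is empty and every request costs one extra redirect. Each migrated entry then shortens the chain for that path without affecting the rest.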

Member

jamesaoverton commented Nov 5, 2015

I hadn't thought of that. Good idea!

Member

jamesaoverton commented Nov 12, 2015

I have code for XML migration. My experience so far is that most projects need a little manual tweaking, but the automated output is pretty good.

We need to make decisions about the config/obo.yml file:

  1. Global failover to http://purl.oclc.org/obo/
  2. Match term IDs:
    a. one entry for each project in obo.yml
    b. match any term and redirect to a rule in the project-specific file, e.g. ^/obo/(\S+)_(\S+)$ to /obo/lc($1)/about/$1_$2; this requires that every project has a YAML configuration file with that rule
  3. Match other top-level paths: /obo/obi, /obo/obi.owl, /obo/obi.obo, etc.
  4. Determine what other rules are required. Is there a way to get this from the OCLC search interface?
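To make option 2b concrete, here is a rough sketch of what that rule would do to a term ID. It assumes `lc($1)` means "lowercase the first capture group", and the function name `rewrite_term` is made up for illustration.

```python
import re
from typing import Optional

# The rule from option 2b: ^/obo/(\S+)_(\S+)$
TERM_RULE = re.compile(r"^/obo/(\S+)_(\S+)$")


def rewrite_term(path: str) -> Optional[str]:
    """Rewrite a term path to /obo/lc($1)/about/$1_$2, or return None."""
    match = TERM_RULE.match(path)
    if match is None:
        return None  # not a term ID; other rules (e.g. /obo/obi.owl) apply
    prefix, local_id = match.group(1), match.group(2)
    # The first group is greedy, so a path with several underscores is
    # split at the last one.
    return f"/obo/{prefix.lower()}/about/{prefix}_{local_id}"


print(rewrite_term("/obo/OBI_0000070"))
print(rewrite_term("/obo/obi.owl"))
```

Because `\S` also matches `/`, the pattern as written would accept deeper paths that contain an underscore, so a stricter character class may be worth considering.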

jamesaoverton added a commit that referenced this issue Nov 19, 2015

@jamesaoverton jamesaoverton added this to the Initial release milestone Nov 19, 2015
