Update Protein Ontology regex specification #959

Closed
nataled opened this issue Oct 12, 2023 · 5 comments · Fixed by #960

@nataled
Contributor

nataled commented Oct 12, 2023

Prefix

pr

Explanation

The given regex in Bioregistry is incomplete. In general there are two types of local identifiers in PR:

  • purely numeric (digits only)
  • UniProtKB identifier (including isoforms)

The full regex is thus:
^(\d+|[OPQ][0-9][A-Z0-9]{3}[0-9](-\d+)?|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-\d+)?)$

Contributor ORCID

0000-0001-5809-9523

@nataled nataled added the Update Used in combination with prefix, metaprefix, or collection for updates to entries label Oct 12, 2023
@cthoyt
Member

cthoyt commented Oct 12, 2023

Awesome, thanks @nataled.

cthoyt added a commit that referenced this issue Oct 12, 2023
Closes #959

Co-Authored-By: Darren A. Natale <13770634+nataled@users.noreply.github.com>
@cthoyt cthoyt mentioned this issue Oct 12, 2023
@nataled
Contributor Author

nataled commented Oct 12, 2023

The regex given above is from the OBO PURL system, but I can see that it is not as precise as the one I use internally:

^(?:\d{9}|[OPQ][0-9][A-Z0-9]{3}[0-9](?:-\d+)?|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}(?:-\d+)?)$

This version specifies the allowed number of digits (nine) and also specifies that the parentheses are not intended for string capture. Perhaps this version will work better?
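
For anyone who wants to try this version locally, here is a minimal Python sketch. It assumes the bracketed character classes that the rendered page turned into links are restored to the usual UniProtKB accession shape. The function name `is_valid_pr_id` and the sample identifiers are made up for illustration and are not part of Bioregistry.

```python
import re

# Assumption: character classes lost in the rendered issue are restored here.
PR_PATTERN = re.compile(
    r"^(?:\d{9}"                                                # nine-digit PR identifier
    r"|[OPQ][0-9][A-Z0-9]{3}[0-9](?:-\d+)?"                     # 6-character UniProtKB accession, optional isoform
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}(?:-\d+)?)$"  # 6- or 10-character accession, optional isoform
)


def is_valid_pr_id(local_id: str) -> bool:
    """Return True if local_id matches the pattern (hypothetical helper)."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return PR_PATTERN.fullmatch(local_id) is not None


# Illustrative identifiers only; they are chosen to exercise each branch.
for candidate in ["000000024", "P12345", "P12345-2", "A0A024RBG1", "12345", "p12345"]:
    print(candidate, is_valid_pr_id(candidate))
```

The nine-digit branch is what makes this version stricter than `\d+`: a shorter digit string such as `12345` is rejected.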

@cthoyt
Member

cthoyt commented Oct 12, 2023

Yes, this one appears to work, but what is going on with the colons?

@nataled
Contributor Author

nataled commented Oct 12, 2023

The ?: at the beginning of each parenthetical expression says "don't save the contents of the parentheses match". Those parenthetical expressions are there only because they represent optional parts. Taught to me and recommended by James Overton.
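
A small Python sketch of the difference, using a shortened single-branch pattern purely for illustration (the variable names are hypothetical). Both patterns accept the same strings; only what the match object stores differs.

```python
import re

# Same matching behavior, with and without capturing groups.
capturing = re.compile(r"^([OPQ][0-9][A-Z0-9]{3}[0-9](-\d+)?)$")
non_capturing = re.compile(r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9](?:-\d+)?)$")

m_cap = capturing.match("P12345-2")
m_non = non_capturing.match("P12345-2")

# Plain parentheses save their contents as numbered groups.
print(m_cap.groups())   # ('P12345-2', '-2')

# With (?:...) nothing is saved; the groups only structure the pattern.
print(m_non.groups())   # ()

# Number of capturing groups each compiled pattern defines.
print(capturing.groups, non_capturing.groups)  # 2 0
```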

@nataled
Contributor Author

nataled commented Oct 12, 2023

I noticed that the first regex appears to have links in it. When I did a copy/paste they showed up.

cthoyt added a commit that referenced this issue Oct 13, 2023
Closes #959

---------

Co-authored-by: Darren A. Natale <13770634+nataled@users.noreply.github.com>
Co-authored-by: David Linke <2648874+dalito@users.noreply.github.com>