SwissProt and TrEMBL vs. UniProt Knowledgebase #17
Comments
@ariutta, @Chris-Evelo, @AlexanderPico the MIRIAM name does not reflect the subset... same applies for the Swissprot subset. Is that OK? Or should the proper names be "UniProt Knowledgebase/TrEMBL" (etc)? |
The way I have always understood it (but I may be wrong) is this: identifiers.org/MIRIAM ONLY do regex pattern matching to see if a URL is valid. There are no regex rules to split TrEMBL-only URLs from TrEMBL URLs for items that are included in the Swissprot subset. So they will never be able to make the distinction. |
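To make the regex point concrete, here is a minimal sketch. It assumes the accession pattern published in the UniProt help pages; the exact pattern registered at identifiers.org may differ, and the function name `is_valid_uniprot_accession` is hypothetical. The two accessions are only used as examples of the 6- and 10-character formats; the check says nothing about whether either entry is reviewed.

```python
import re

# Accession pattern as documented in the UniProt help pages.
# Assumption: identifiers.org validates against this or a very similar pattern.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_uniprot_accession(accession: str) -> bool:
    """Return True if the string is shaped like a UniProtKB accession.

    This is a purely syntactic check. It carries no information about
    whether the entry is in Swiss-Prot or only in TrEMBL.
    """
    return UNIPROT_ACCESSION.fullmatch(accession) is not None


if __name__ == "__main__":
    # A 6-character and a 10-character accession, plus a non-accession string.
    for candidate in ("P0DP24", "A0A024R161", "calm2"):
        print(candidate, is_valid_uniprot_accession(candidate))
```

Both accession formats pass the same pattern, so a validator built this way can only answer "is this a well-formed UniProtKB identifier?" and not "is this entry reviewed?".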
@egonw, you're right -- "UniProt Knowledgebase" refers to both Swissprot and TrEMBL. When I opened this issue I incorrectly thought "http://identifiers.org/uniprot" only referred to TrEMBL, but here's part of an email reply from Nick Juty:
Since "http://identifiers.org/uniprot" refers to both TrEMBL and Swissprot, I don't know how we should indicate the official name and the Miriam URN in datasources.txt. I'm open to opinions on whether "UniProt Knowledgebase/TrEMBL" is best and whether "urn:miriam:uniprot" should be listed for 1) both TrEMBL and Swissprot or 2) just TrEMBL. |
For me the important question is whether we can come up with a method to use URNs that are both technically correct and that allow biologists to actually judge immediately from that URN whether data is evaluated (as it is when from UniProt) or not. We might have to go back to identifiers.org or the UniProt team for that. |
Hi Chris, to the best of my knowledge there is no way to tell, by looking at just the URL/URN (even the UniProt one), whether the object has been added to Swissprot or is just TrEMBL, especially as this could change over time. The biologist would have to look the data up at UniProt. identifiers.org do not keep individual data records. They only store ID regex patterns, passing all ID-level calls down to the underlying data source (in this case UniProt). |
Just thinking out loud now. But things that are in SwissProt actually have a SwissProt ID too right? And the UniProt entry should (and probably will) link to that. Can we use that somehow? |
Yes, as far as I know, things in Swissprot have a second, completely different SwissProt ID. Having a linkset to that would be nice, but it was never done in OPS. |
Another comment from Nick Juty:
|
Currently identifiers.org has "UniProt Knowledgebase" as the preferred name and additionally has these alternative names listed for the http://identifiers.org/uniprot/ data collection:
If you want me to suggest any additions or changes, just let me know. |
(Moved this to issue #25.) |
Just to throw another wrench into this discussion… The datasource names and identifiers used by BridgeDb are not only reliant on our collective best sense of what they should be, nor on what identifiers.org does, but are also critically dependent on what primary resources such as Ensembl decide to do. The BridgeDb database build has been simplified over many years to depend on resources like Ensembl to make these decisions, since they dedicate a ton of time to the problem and represent a widely used community resource. In many ways this relieves us of having to figure this out and make the necessarily compromising decisions. In other words, let’s just do what Ensembl does; it simplifies BOTH the decision making and the build process. If something about the source data from Ensembl is really, really offensive, causes specific data integrity issues, or breaks a critical analysis workflow, then we should take it up with Ensembl and ask if they can change it. The more we drift from them, the more we have to maintain these differences in code and in practice. My perspective on keeping things simple :)
|
@AlexanderPico, keeping things simple sounds good. I took a look at the Ensembl entry for CALM2, and here's how they refer to Uniprot:
They don't appear to distinguish between SwissProt and TrEMBL. |
Ah, not the Ensembl website, but rather their databases. Specifically the data tables we use to make the bridgedb database. This is different from their singular representation. It covers all the alias representations, including uniprot.
|
Closed by 192a18f |
Our datasources.txt official names (last column) don't always match the recommended Miriam names. For example, datasources.txt has "UniProt/Trembl" where Miriam has "UniProt Knowledgebase". (Update: not an exact match. See later comments.)