
SPARQL documentation #153

Open
nataled opened this issue Jul 5, 2019 · 18 comments

@nataled (Collaborator) commented Jul 5, 2019

In looking to craft additional SPARQL queries, I noticed that the examples refer to things that are not documented on the "Properties defined in PRO" page (https://proconsortium.org/pro_doc.shtml). For example, query 13 refers to "?gn_id" and query 3 refers to "?mod", but neither of these is described on that page. I understand that the page refers specifically to properties, but perhaps we can expand it? Even a list of possible fields would be a useful start. Can such a list be generated automatically?

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

When you say "all possible subjects and objects", do you mean the precise strings? Surely the number of fields that we can query on is not huge; it would be the same even if PRO were (for example) one-tenth its current size. Other than by reviewing every possible sample query, how would a user know that they can look for ?glp (gene-level parent)? Even after reviewing them all, the list would likely be incomplete.

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

What you are saying isn't fully correct. Even knowing that "PR:000049129 only_in_taxon NCBITaxon:9606" can be abstracted into a pattern (such as "?PRO_identifier only_in_taxon ?NCBITaxon_identifier"), the user would still need to know the precise variable names used in the subject and object. I know from the sample queries that what I called "PRO_identifier" above is actually called "PRO_term", but I shouldn't have to look at the samples to find this information, especially since the samples don't contain all the possible variables. The fact that these variable names are not documented hampers the ability to create new queries or customize the output. For example, let's say I want to add a column to the output that displays the taxon identifier. I can tell from the sample queries that in UniProt I'd use "?organism" to do it. But that same variable name doesn't work in PRO (nor does "?_organism", "?org", "?_org", "?taxonomy", "?NCBITaxon", and so on). I can also tell that using "?pro_term" instead of "?PRO_term" will not work, so these variable names must be encoded somehow.

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

I think we aren't understanding each other. I know I can name a variable anything I want, and I know that it is just a placeholder. But once that variable name is decided, it cannot be called something else. So if I define variable "$name" in my program/script (using Perl syntax just as an example), and I want to display the contents of that variable, I cannot say "print $label" within that program and expect any result.

What I described in my previous reply was a real case: I took one of the sample queries, and tried to add a column to the result to display the organism. A very simple change, but I couldn't do it even knowing that there is definitely a field for this in PRO. But since I didn't know the name of the field/variable, I failed to get any information in the column. This is why the variable name documentation is needed.

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

> You can't simply add a variable in the select clause, it needs to be somehow related to at least one of the triples in the query.

Okay, good, we're getting closer to understanding each other. This is almost what I'm talking about. I say almost because I'm already able to see all of what you show above using DESCRIBE. Of concern to me are those things I cannot see. Sticking with taxonomy as an example, and using a species-specific term, the following does not show that the indicated protein has any associated taxonomy information, even though it is definitely in the OWL:

DESCRIBE <http://purl.obolibrary.org/obo/PR_000049129>

Now suppose I want to do the query that we have as sample query 1 on our SPARQL page, but I want to add a column for Taxon. As you said earlier, I can make up any variable name for the display simply by adding it to the SELECT command:

SELECT ?PRO_term ?Label ?Category ?Taxon

Of course, this by itself is not enough, because internally there is no such thing as ?Taxon. I'd have to BIND the actual variable name to that ?Taxon:

BIND(str(?_NCBITaxon) as ?Taxon)
         ^^^^^^^^^^^

The problem is that I have no idea what should be in place of ?_NCBITaxon to make the query work.

Now, if you are saying such information isn't in the RDF (which would be exceedingly surprising), that's one thing. So let's take another example mentioned before (?glp) which definitely is something that can be queried. If I'm not the person that designed the endpoint, and there were no sample queries provided, how would I go about finding out that ?glp has meaning?

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

So in order to find out the names of variables to use in a query, I can convert PRO to Turtle and examine the output? Seems very strange, but I can try it. Based on your comments, I would expect to be able to reproduce your finding that (for example) ?_Category means the PRO category, while ?category does not. Is that correct?

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 5, 2019

This is obvious after the fact, but I want to know how you knew that ?_Category is part of the triple pattern. How did you know that you needed to say

?PRO_term rdfs:comment ?_Category .

instead of

?PRO_term rdfs:comment ?category .

Based on your previous response, this is indicated by the Turtle view of the ontology. Presumably, when viewing the Turtle, I would somewhere see the string "_Category" but not the string "category". I just want confirmation that I correctly understand what you're saying.

@chumingc (Collaborator) commented Jul 5, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 8, 2019

After playing around a bit, I now see what you're saying (maybe), in that it really isn't the name of the variable that's important, it's the field name (such as rdfs:comment or rdfs:subClassOf). I determined (for example) that I can get the definition using obo:IAO_0000115. I presume that querying on a field that occurs only once would work okay (but maybe not one, such as xref, that can occur multiple times; not sure). I therefore expected that I could query over taxonomy, since it is seen only once per term and because obo:RO_0002160 (only in taxon) is mentioned on that documentation page. Unfortunately, that wasn't the case. No matter what I try I cannot even display the taxon identifier.
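As an illustration of the point that the predicate ("field name") rather than the variable name is what matters, here is a minimal sketch of a query that lists every predicate attached directly to one term. It assumes the graph name and the term IRI used elsewhere in this thread; the variable names ?predicate and ?value are arbitrary placeholders, and what the endpoint actually returns is not confirmed here.

```sparql
PREFIX obo: <http://purl.obolibrary.org/obo/>

# List each distinct predicate used directly on PR:000049129.
# ?predicate and ?value are made-up names; any other names would work
# as long as they are used consistently within the query.
SELECT DISTINCT ?predicate
FROM <http://purl.obolibrary.org/obo/pr>
WHERE {
  obo:PR_000049129 ?predicate ?value .
}
```

If a relation such as obo:RO_0002160 does not appear in the result, that would suggest it is not stored as a direct triple on the term, which would be consistent with the failure described above.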

@chumingc (Collaborator) commented Jul 8, 2019

This comment has been minimized.

@nataled (Collaborator, Author) commented Jul 8, 2019

My short-term goal is to simply display the taxon identifier (like "NCBITaxon:9606") for retrieved entries. So while the first query below works (to display the definition), the second does not, even though I only changed IAO_0000115 to RO_0002160. Note that all I'm doing is displaying the information, not filtering based on it.

PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?PRO_term ?Category ?TextOfInterest
FROM <http://purl.obolibrary.org/obo/pr>
WHERE
{
?PRO_term rdfs:comment ?_Category .
?PRO_term obo:IAO_0000115 ?_TextOfInterest .
FILTER (regex(?_Category,"Category=.*modification"))
BIND(strafter(strbefore(str(?_Category), "."), "=") as ?Category) .
BIND(str(?_TextOfInterest) as ?TextOfInterest) .
}

=======

PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?PRO_term ?Category ?TextOfInterest
FROM <http://purl.obolibrary.org/obo/pr>
WHERE
{
?PRO_term rdfs:comment ?_Category .
?PRO_term obo:RO_0002160 ?_TextOfInterest .
FILTER (regex(?_Category,"Category=.*modification"))
BIND(strafter(strbefore(str(?_Category), "."), "=") as ?Category) .
BIND(str(?_TextOfInterest) as ?TextOfInterest) .
}
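One possible explanation, offered as an assumption rather than a confirmed fact about this endpoint: in the usual OBO-to-OWL translation, a relationship such as only_in_taxon is not stored as a direct triple on the term but as an owl:Restriction reached through rdfs:subClassOf. If the PRO RDF follows that pattern, a sketch along these lines might display the taxon. The variable names ?restriction, ?_Taxon and ?Taxon are hypothetical, and the sketch accepts either owl:someValuesFrom or owl:allValuesFrom because it is not known here which one is used.

```sparql
PREFIX obo:  <http://purl.obolibrary.org/obo/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?PRO_term ?Category ?Taxon
FROM <http://purl.obolibrary.org/obo/pr>
WHERE {
  ?PRO_term rdfs:comment ?_Category .
  # Assumption: only_in_taxon is encoded as a restriction on a superclass,
  # not as a direct "?PRO_term obo:RO_0002160 ?x" triple.
  ?PRO_term rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:RO_0002160 ;
               owl:someValuesFrom|owl:allValuesFrom ?_Taxon .
  FILTER (regex(?_Category, "Category=.*modification"))
  BIND(strafter(strbefore(str(?_Category), "."), "=") AS ?Category)
  # Rewrite the taxon IRI into the short "NCBITaxon:9606" style.
  BIND(replace(str(?_Taxon), "^.*/NCBITaxon_", "NCBITaxon:") AS ?Taxon)
}
```

Because the restriction pattern is a required part of the WHERE clause, terms without a taxon restriction would drop out of the result; wrapping those two triple patterns in OPTIONAL would keep them.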

@chumingc (Collaborator) commented Jul 9, 2019

This comment has been minimized.
