Need to describe all Bio2RDF entities #7

Open
micheldumontier opened this issue Jun 14, 2014 · 11 comments

@micheldumontier
Member

Currently there is no support for resolving arbitrary entities that appear in Bio2RDF datasets. Consider the statistics page for Affymetrix probesets:
http://download.bio2rdf.org/release/3/html/affymetrix.html

The first type http://bio2rdf.org/wormbase_vocabulary:Resource does not resolve even though WormBase is part of the Bio2RDF distribution.

The second type http://bio2rdf.org/uniprot_vocabulary:Resource does not resolve.

The web application should send out a federated query to the SPARQL endpoints to gather triples that describe these terms.
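A minimal sketch of such a federated query, assuming the endpoints support SPARQL 1.1 Federated Query (`SERVICE`). It builds one `CONSTRUCT` query with a `SERVICE SILENT` block per endpoint, joined by `UNION`, so a single request gathers the outgoing triples of the entity from every listed store. The class name and the `http://<dataset>.bio2rdf.org/sparql` endpoint URLs are hypothetical placeholders, not confirmed Bio2RDF addresses.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Builds one SPARQL 1.1 federated query that gathers triples about a URI from several endpoints. */
public class FederatedDescribeQuery {

    /** Returns a CONSTRUCT query with one SERVICE SILENT block per endpoint, combined by UNION. */
    public static String build(String entityUri, List<String> endpoints) {
        String unions = endpoints.stream()
                // SILENT: an unreachable endpoint yields no bindings instead of failing the whole query
                .map(ep -> "  { SERVICE SILENT <" + ep + "> { <" + entityUri + "> ?p ?o } }")
                .collect(Collectors.joining("\n  UNION\n"));
        return "CONSTRUCT { <" + entityUri + "> ?p ?o }\nWHERE {\n" + unions + "\n}";
    }

    public static void main(String[] args) {
        // Hypothetical endpoint URLs, assumed to follow a <dataset>.bio2rdf.org/sparql pattern.
        List<String> endpoints = List.of(
                "http://wormbase.bio2rdf.org/sparql",
                "http://lsr.bio2rdf.org/sparql");
        System.out.println(build("http://bio2rdf.org/wormbase_vocabulary:Resource", endpoints));
    }
}
```

Incoming triples (`?s ?p <uri>`) could be gathered the same way with a second pattern inside each `SERVICE` block.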

@vemonet

Member

vemonet commented Jun 17, 2014

bio2rdf.org now supports every dataset that is part of Bio2RDF release 2 or 3.

Currently working on making bio2rdf.org resolve any arbitrary entity.

@fbelleau

@micheldumontier @vemonet

I have deployed a first version of a Talend implementation of the QueryAll service, based on the statistics of release 3.

http://queryall.rest.bio2rdf.org/

The project (in beta) is here:

https://github.com/fbelleau/bio2rdf-queryall

Michel, please inform Vincent of the needed modifications; then it will be possible to integrate the resolution of the 400 namespace URIs into the main Bio2RDF REST service.
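Resolving namespace URIs starts with splitting a Bio2RDF URI into its namespace and identifier. The sketch below assumes URIs of the form `http://bio2rdf.org/{namespace}:{identifier}`, treats `{ns}_vocabulary` and `{ns}_resource` as belonging to dataset `{ns}`, and maps each dataset to a hypothetical `http://{dataset}.bio2rdf.org/sparql` endpoint. This naming-convention lookup stands in for the statistics-based mapping the QueryAll service uses; it is not that service's logic.

```java
/** Splits Bio2RDF URIs and guesses an endpoint for them (naming convention is an assumption). */
public class Bio2RdfUri {
    private static final String BASE = "http://bio2rdf.org/";

    /** Returns {namespace, identifier}, or null if the URI is not a Bio2RDF entity URI. */
    public static String[] parse(String uri) {
        if (!uri.startsWith(BASE)) return null;
        String rest = uri.substring(BASE.length());
        int colon = rest.indexOf(':'); // the first colon separates namespace from identifier
        if (colon <= 0) return null;
        return new String[] { rest.substring(0, colon), rest.substring(colon + 1) };
    }

    /** Maps a namespace to its dataset by dropping the _vocabulary / _resource suffix. */
    public static String dataset(String namespace) {
        for (String suffix : new String[] { "_vocabulary", "_resource" }) {
            if (namespace.endsWith(suffix)) {
                return namespace.substring(0, namespace.length() - suffix.length());
            }
        }
        return namespace;
    }

    /** Hypothetical endpoint pattern; the real per-namespace mapping may differ. */
    public static String endpointFor(String uri) {
        String[] parts = parse(uri);
        return parts == null ? null : "http://" + dataset(parts[0]) + ".bio2rdf.org/sparql";
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("http://bio2rdf.org/wormbase_vocabulary:Resource"));
        System.out.println(endpointFor("http://bio2rdf.org/ec:2.7.1.1"));
    }
}
```

Splitting at the first colon keeps identifiers such as `2.7.1.1` intact even if they contained further colons.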

@vemonet
Member

vemonet commented Jul 1, 2014

For UniProt, you were trying to resolve http://bio2rdf.org/uniprot_vocabulary:Resource,
but it seems you are using the UniProt URI http://purl.uniprot.org/core/Resource:
http://uniprot.bio2rdf.org/describe/?url=http%3A%2F%2Fpurl.uniprot.org%2Fcore%2FResource&sid=99

I have begun implementing a "QueryAll"-like service on beta.bio2rdf.org.
For example, try resolving some URIs:
curl -i -H 'Accept: application/rdf+xml' http://beta.bio2rdf.org/genbank:BC149752
curl -i -H 'Accept: application/rdf+xml' http://beta.bio2rdf.org/ec:2.7.1.1
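For clients written in Java, the same content-negotiated request can be sketched with the JDK's `java.net.http` client (Java 11+). The class name is hypothetical; as with `curl -i`, redirects are not followed (the `HttpClient` default), and the status code and headers are printed before the body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Java counterpart of: curl -i -H 'Accept: application/rdf+xml' <uri> */
public class ResolveUri {

    /** Builds a GET request that asks for RDF/XML through the Accept header. */
    public static HttpRequest rdfXmlRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient(); // default policy: redirects are not followed
        HttpResponse<String> response = client.send(
                rdfXmlRequest("http://beta.bio2rdf.org/genbank:BC149752"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.headers().map()); // like curl -i, show the headers too
        System.out.println(response.body());
    }
}
```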

There might be problems with some namespaces; feel free to point them out!

Note that when HTML is requested through content negotiation, the service returns only the Virtuoso fct describe page for the entity (if it exists).
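A minimal sketch of that HTML branch, assuming the Virtuoso faceted browser's `/describe/?url=` pattern seen in the UniProt describe link in this thread. The entity URI is percent-encoded into the query string, and any `Accept` header mentioning `text/html` is treated as an HTML request. Real content negotiation also weighs `q` values; that simplification and the class and method names are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch of the HTML branch of content negotiation (names and host are hypothetical). */
public class HtmlNegotiation {

    /** Builds a Virtuoso fct describe link for an entity URI. */
    public static String describeUrl(String virtuosoBase, String entityUri) {
        return virtuosoBase + "/describe/?url=" + URLEncoder.encode(entityUri, StandardCharsets.UTF_8);
    }

    /** Naive check: any Accept header mentioning text/html counts; q-values are ignored. */
    public static boolean wantsHtml(String acceptHeader) {
        return acceptHeader != null && acceptHeader.toLowerCase().contains("text/html");
    }

    public static void main(String[] args) {
        String accept = "text/html,application/xhtml+xml;q=0.9";
        if (wantsHtml(accept)) {
            // A service could answer with a 303 See Other redirect to this URL.
            System.out.println(describeUrl("http://uniprot.bio2rdf.org", "http://purl.uniprot.org/core/Resource"));
        }
    }
}
```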

@micheldumontier
Member Author

Vincent,
the redirection to the UniProt endpoint happens on the Bio2RDF side (not mine). Still, requesting curl -i -H 'Accept: application/rdf+xml' http://bio2rdf.org/wormbase_vocabulary:Resource does not federate over the lsr endpoint, which holds an additional "owl:Class" type assertion.

@micheldumontier
Member Author

What's the status on this item?

@vemonet
Member

vemonet commented Jul 17, 2014

Update of the REST service: to resolve a Bio2RDF URI, the service now sends a federated query to every Bio2RDF triplestore to describe that URI.
Deployed on bio2rdf.org

However, the service no longer redirects to the Virtuoso page when HTML is requested.
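The fan-out described in this update can be sketched client-side with `CompletableFuture`. Every endpoint is queried concurrently, a failed endpoint contributes nothing, and the returned N-Triples lines are merged into one de-duplicated set. The `fetch` function is a hypothetical hook standing in for the actual SPARQL call. The `main` method uses a canned answer (the `owl:Class` assertion reported for the lsr endpoint) so the sketch runs offline; it is not real endpoint output.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Fan-out/merge sketch: query every endpoint concurrently and union the returned triples. */
public class FanOutDescribe {

    /** fetch: endpoint -> future N-Triples text for the entity (hypothetical hook). */
    public static Set<String> describe(List<String> endpoints,
                                       Function<String, CompletableFuture<String>> fetch) {
        // Start every request before waiting on any of them.
        List<CompletableFuture<String>> futures = endpoints.stream()
                .map(ep -> fetch.apply(ep).exceptionally(err -> "")) // failed endpoint -> no triples
                .collect(Collectors.toList());
        Set<String> triples = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (CompletableFuture<String> f : futures) {
            for (String line : f.join().split("\n")) {
                if (!line.isBlank()) triples.add(line.trim());
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // Canned stand-in fetcher so the sketch runs without network access.
        Function<String, CompletableFuture<String>> fake = ep -> CompletableFuture.supplyAsync(() ->
                ep.contains("lsr")
                        ? "<http://bio2rdf.org/wormbase_vocabulary:Resource> "
                          + "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
                          + "<http://www.w3.org/2002/07/owl#Class> ."
                        : "");
        describe(List.of("http://wormbase.bio2rdf.org/sparql", "http://lsr.bio2rdf.org/sparql"), fake)
                .forEach(System.out::println);
    }
}
```

Unlike a single `SERVICE` query, the merge happens in the application, so the individual endpoints do not need to support SPARQL federation.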

@vemonet
Member

vemonet commented Jul 17, 2014

@vemonet
Member

vemonet commented Jul 18, 2014

http://bio2rdf.org/hgnc:4945

Found a bug caused by:

com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that is required by the scheme is missing.

This error occurs only when Jena tries to write the RDF in RDF/XML format; there is no problem with the N-Triples or Turtle formats.
I think Jena is stricter about URI well-formedness when it writes RDF/XML.

@vemonet
Member

vemonet commented Jul 18, 2014

The Jena error is triggered by this triple:

<http://bio2rdf.org/hgnc:4945> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://hgnc.bio2rdf.org> .

@vemonet
Member

vemonet commented Jul 18, 2014

Found where the problem comes from:
according to Jena, http://hgnc.bio2rdf.org is not a valid URI,
whereas http://hgnc.bio2rdf.org/ is.
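The error message names `<http://>`, not the full type URI. One plausible reading, an assumption not confirmed in this thread, is that the RDF/XML writer derives an XML namespace for each `rdf:type` object by splitting off the longest legal local name. For `http://hgnc.bio2rdf.org` that split leaves the namespace `http://`, which has no host; with a trailing `/` no local name can be split off, so the namespace stays well-formed. The sketch below is a simplified imitation of such a split for illustration, not Jena's own code, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Imitates a namespace/local-name split to show why the bare-host URI can fail (assumed mechanism). */
public class NamespaceSplit {

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_';
    }

    /** Index where the longest NCName-like suffix (must start with a letter or '_') begins. */
    public static int splitIndex(String uri) {
        int i = uri.length();
        while (i > 0 && isNameChar(uri.charAt(i - 1))) i--;
        // A local name may not start with a digit, '.' or '-': advance to the first legal start char.
        while (i < uri.length() && !(Character.isLetter(uri.charAt(i)) || uri.charAt(i) == '_')) i++;
        return i;
    }

    /** True if s parses as an absolute URI with a host component. */
    public static boolean hasHost(String s) {
        try {
            URI u = new URI(s);
            return u.isAbsolute() && u.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String type : new String[] { "http://hgnc.bio2rdf.org", "http://hgnc.bio2rdf.org/" }) {
            String ns = type.substring(0, splitIndex(type));
            System.out.println(type + " -> namespace <" + ns + "> well-formed: " + hasHost(ns));
        }
    }
}
```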

@vemonet
Member

vemonet commented Jul 18, 2014

It comes from the UniProt endpoint.
Note that I was using the latest version of Jena (2.11.2: http://mvnrepository.com/artifact/org.apache.jena/apache-jena-libs/2.11.2).
Here is the code I was using to convert the triple:

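```java
import java.io.*;
import com.hp.hpl.jena.rdf.model.Model;        // Jena 2.x package (Jena 3+ uses org.apache.jena.rdf.model)
import com.hp.hpl.jena.rdf.model.ModelFactory;
public class RdfXmlConversion {
    public static void main(String[] args) {
        String rdfInput = "<http://bio2rdf.org/hgnc:4945> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://hgnc.bio2rdf.org/> .";
        Reader reader = new StringReader(rdfInput);
        Writer writer = new StringWriter();
        Model model = ModelFactory.createDefaultModel();
        model.read(reader, "default", "N-TRIPLE"); // parse the N-Triples statement
        model.write(writer, "RDF/XML");            // RDF/XML output rejects URIs that are not well-formed
        System.out.println(writer.toString());
    }
}
```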
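The thread does not record which fix was adopted. One possible workaround, sketched here as a hypothetical helper, is to append `/` to http(s) URIs that have an authority but an empty path before handing the model to the RDF/XML writer. Note the trade-off: `http://hgnc.bio2rdf.org` and `http://hgnc.bio2rdf.org/` are different RDF terms, so correcting the triple at its source (the UniProt endpoint) would be cleaner.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical pre-serialization fix: give authority-only http(s) URIs an explicit "/" path. */
public class NormalizeUri {

    public static String withRootPath(String uri) {
        try {
            URI u = new URI(uri);
            String scheme = u.getScheme();
            boolean httpLike = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // Only touch URIs like http://host (no path, query or fragment), where appending "/" is safe.
            if (httpLike && u.getRawAuthority() != null && u.getRawPath().isEmpty()
                    && u.getRawQuery() == null && u.getRawFragment() == null) {
                return uri + "/";
            }
        } catch (URISyntaxException e) {
            // Leave unparseable input untouched; the writer will still report it.
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(withRootPath("http://hgnc.bio2rdf.org"));      // gains a trailing slash
        System.out.println(withRootPath("http://bio2rdf.org/hgnc:4945")); // unchanged
    }
}
```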

@vemonet vemonet removed their assignment May 17, 2017
3 participants