Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-zaha-icl #1190

Draft
wants to merge 437 commits into
base: main
Choose a base branch
from
Draft

dev-zaha-icl #1190

wants to merge 437 commits into from

Conversation

picas9dan
Copy link
Contributor

No description provided.

@picas9dan picas9dan requested a review from gpeb2 May 7, 2024 11:54
Copy link
Collaborator

@gpeb2 gpeb2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few comments, feel free to reply here or we can set up a Skype of Teams call if required.

SELECT * WHERE {{
?Carpark a carpark:Carpark; rdfs:label ?Label .
SERVICE <http://sg-ontop:8080/sparql> {{
SELECT (geof:distance(?Coords, "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT({lon} {lat})"^^geo:wktLiteral, <http://www.opengis.net/def/uom/OGC/1.0/metre>) as ?Distance) ?Carpark ?Coords
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you didn't know <http://www.opengis.net/def/crs/OGC/1.3/CRS84> is the default CRS with GeoSPARQL so shouldn't be needed here. Saying that, versions of Ontop before version 5, which the stack now uses, might have needed this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kok-foong, could you help confirm if <http://www.opengis.net/def/crs/OGC/1.3/CRS84> is needed here for the query that finds the carpark nearest to a give set of coordinates, in view of the current setup and future compatibility on your end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kok-foong, please check my recommendation in this comment.

"http://www.theworldavatar.com/kg/",
"https://www.theworldavatar.com/kg/",
]
WKT_LITERAL_PREFIX = "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> "
Copy link
Collaborator

@gpeb2 gpeb2 May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linked to the other comment, but be aware that this is not the only CRS that WKT literals can come in, although I would generally encourage using this one when possible as it is the default, is effectively the same as EPSG:4326, and can be omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specification for WKT literals that I can refer to? So that I can safely parse WKT literals into approprate data objects. Or is there a library that I can use for this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the GeoSPARQL standard doesn't actually support CRSs and officially assumes that all WKT literals are in CRS84. Ontop extends the official standard by allowing an optional CRS IRI to be appended to the start of the WKT literal. An example of some of these can be seen in this OBDA file.

I'd probably recommend forcing developers to reproject their non-EPSG:4326 geometries to EPSG:4326 in the source query of their OBDA mappings using the PostGIS ST_Transform function and not specifying the CRS IRIs in the WKTLiterals. This will save confusion as Ontop is not able to perform operations on geometries in different projections and you won't need to parse out the CRS IRIs.

Copy link
Contributor Author

@picas9dan picas9dan May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the projection to EPSG:4326 and removal of CRS IRI from WKTLiteral a to-do item that you would suggest to be done asap, following which Zaha should not check for CRS IRI and it should accept the WKTLiteral verbatim? Then I'll need to coordinate with Kok Foong to make the change.

Otherwise, the current code works for both scenarios when <http://www.opengis.net/def/crs/OGC/1.3/CRS84> is prepended and when it is not, so no changes are needed (unless the code must also anticipate other CRS IRIs).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the code as it is is fine for now, I just wanted to make sure you were aware of the potential issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kok-foong pointed me to Jena library for parsing WKT literal, specifically the method WKTReader::extract. Referencing the Jena implementation, I wrote a simple method to parse WKT literals.

If the SRS URI is found to be CRS84, then the WKT will be visualised. Otherwise, it will be presented verbatim in a table. The code for this handling is here.

Comment on lines 109 to 111
# TODO: remove custom replacement of VALUES clause with FILTER
# once Karthik fixes the bug that causes ontop to throw error
# on VALUES of facility IRIs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the error you are seeing here?
I know that Ontop version 4 didn't handle VALUES efficiently but the stack now uses version 5 which does.

I also notice that you are working with IRIs of the form <Type:...> and I'm not sure that's a good idea. According to the URI/IRI specifications the first part of a URL/IRI (before the first ":") should be its schema, generally taken from the official list of schemas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the error that I'm seeing. I only encounter it when I use VALUES clause on IRIs of data centres and factories. I have asked @nagarajankarthik about this. I'm not sure what the timeline for fixing this issue is.
Screenshot 2024-05-07 at 8 49 58 PM

For reference, here is the SPARQL query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX om: <http://www.ontology-of-units-of-measure.org/resource/om-2/>
PREFIX ontocompany: <http://www.theworldavatar.com/kg/ontocompany/>

SELECT DISTINCT ?IRI ?GeneratedHeatValue ?GeneratedHeatUnit WHERE {
  VALUES ?IRI { <https://www.theworldavatar.com/kg/infrastructure/datacenter/c84c1d32-69de-508f-a419-83096735aadd> <https://www.theworldavatar.com/kg/infrastructure/datacenter/7f8a8c48-516b-5757-a332-2a264052bdf8> }
  ?IRI ontocompany:hasGeneratedHeat/om:hasValue ?GeneratedHeat .
  ?GeneratedHeat om:hasNumericalValue ?GeneratedHeatValue .
  OPTIONAL { ?GeneratedHeat om:hasUnit/skos:notation ?GeneratedHeatUnit . }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my use of <Type:"surface form">. The purpose of this is to serve as a marker for entities that need to be linked to exact IRIs and it is not meant to be a faithful representation of an URI/IRI as per general specifications.

In literature, I have seen people propose special tokens for this purpose e.g. <entity><type>Facility</type><label>ExxonMobil</label></entity>. However, I find this too verbose.

If you think that it is preferable to use special tokens in this case, I'll make the change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the error that I'm seeing. I only encounter it when I use VALUES clause on IRIs of data centres and factories. I have asked @nagarajankarthik about this. I'm not sure what the timeline for fixing this issue is.

It look like the issue is that the UUIDs are being stored using the uuid type in Postgres but Ontop is trying to compare them to the last component of the IRIs, which are treated as strings. The solution is to cast the uuids to TEXT in the OBDA source query, using something like:
uuid::TEXT

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my use of <Type:"surface form">. The purpose of this is to serve as a marker for entities that need to be linked to exact IRIs and it is not meant to be a faithful representation of an URI/IRI as per general specifications.

I don't fully understand the interplay between your code, Redis and the triples but what you say makes enough sense so its probably ok.


SELECT * WHERE {{
?Carpark a carpark:Carpark; rdfs:label ?Label .
SERVICE <http://sg-ontop:8080/sparql> {{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment, don't think it's a good idea to hardcode this, if we spin up the stack with a different stack name this breaks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the endpoint http://sg-ontop:8080/sparql (which I suppose is referencing the local network of the SG stack) the same as http://174.138.23.221:3838/ontop/ui/sparql? I already configure the latter as an environment variable, so if the 2 endpoints are the same, then I'll interpolate the query string with the latter. Otherwise, I'll define a new environment variable for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are not the same, one is for accessing within the stack and one is from outside

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible solution is to have an access agent within the stack which you send queries to, instead of sending it directly to blazegraph which will figure out the correct ontop endpoint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the query with http://sg-ontop:8080/sparql and http://174.138.23.221:3838/ontop/ui/sparql and it seems that they give the same results/behaviours? Does the difference lie only in speed? In other words, is the use of either endpoint correct?

A possible solution is to have an access agent within the stack which you send queries to, instead of sending it directly to blazegraph which will figure out the correct ontop endpoint

Yes, this will ensure that Zaha doesn't need to know the internal endpoints of the SG stack.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the query with http://sg-ontop:8080/sparql and http://174.138.23.221:3838/ontop/ui/sparql

When you use the IP address Blazegraph will be sending the query from the outside world instead of communicating within the stack

Copy link
Contributor Author

@picas9dan picas9dan May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible solution is to have an access agent within the stack which you send queries to, instead of sending it directly to blazegraph which will figure out the correct ontop endpoint

Will this be something that you plan to implement within next week or so?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible solution is to have an access agent within the stack which you send queries to, instead of sending it directly to blazegraph which will figure out the correct ontop endpoint

Will this be something that you plan to implement within next week or so?

It's not straightforward to implement this for the Sg stack/dataset, can explain more to you in person

Copy link
Contributor Author

@picas9dan picas9dan May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is not in your work plan in the next 1-2 weeks then for immediate action I'll keep the same query and use a environment variable for http://sg-ontop:8080/sparql.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants