This repository has been archived by the owner on Dec 14, 2021. It is now read-only.

Convert ASGS data to use simplified/aligned version of the ontology #13

Closed
dr-shorthair opened this issue Dec 15, 2019 · 21 comments · Fixed by #14 or AGLDWG/asgs-ont#25
Labels
enhancement New feature or request Priority: Medium

@dr-shorthair

dr-shorthair commented Dec 15, 2019

https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Simplifying-the-initial-ontologies describes a simplification of the ASGS datasets to match a more unified Loc-I ontology pattern. The goal is to simplify/harmonize the SPARQL queries.

The transformations required are illustrated by-example as follows.

Original form:

<http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000>
  rdf:type asgs:MeshBlock ;
  rdf:type geo:Feature ;
  asgs:category "Primary Production" ;
  asgs:mbCode2016 "20663970000" ;
  geox:hasAreaM2 [
      data:value 58387600.000000007450580596923828125 ;
      ns21:crs <http://www.opengis.net/def/crs/EPSG/0/3577> ;
    ] ;
  geox:hasAreaM2 [
      data:value 95157257.606680378 ;
      ns21:crs <http://www.opengis.net/def/crs/EPSG/0/3857> ;
    ] ;
  reg:register <http://linked.data.gov.au/dataset/asgs2016/meshblock/> ;
  geo:hasGeometry [
      rdf:type geo:Geometry ;
      geo:asGML """<gml:MultiSurface ..."""^^geo:gmlLiteral ;
    ] ;
.

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801>
  rdf:type asgs:StatisticalAreaLevel1 ;
  rdf:type geo:Feature ;
  asgs:isStatisticalAreaLevel1Of <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> ;
  asgs:sa1Maincode2016 "20503108801" ;
  asgs:statisticalArea1Sa111DigitCode "20503108801" ;
.

Preferred form

  1. asgs:category → dcterms:type whose object is a URI denoting a concept
  2. asgs:mbCode2016 etc → dcterms:identifier with a specific literal datatype
  3. asgs:isStatisticalAreaLevel1Of etc → geo:sfContains and add matching geo:sfWithin for the inverse case
  4. reg:register → loci:isMemberOf and inverse
  5. type of geometry is explicit or geometry is externalized
<http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000>
  rdf:type asgs:MeshBlock ;
  rdf:type asgs:Feature ;
  rdf:type geo:Feature ;
  geox:hasAreaM2 [
      data:value 58387600.000000007450580596923828125 ;
      qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/3577> ;
    ] ;
  geox:hasAreaM2 [
      data:value 95157257.606680378 ;
      qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/3857> ;
    ] ;
  loci:isMemberOf <http://linked.data.gov.au/dataset/asgs2016/meshblock/> ;
  dcterms:identifier "20663970000"^^asgs-id:mbCode2016 ;
  dcterms:type asgs-cat:primary-production ;
  geo:hasGeometry [
      rdf:type sf:MultiSurface ;
      geo:asGML """<gml:MultiSurface ..."""^^geo:gmlLiteral ;
    ] ;
  geo:hasGeometry <http://gds.loci.cat/geometry/asgs16_mb/20663970000> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/stateorterritory/2> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801> ;
.

<http://linked.data.gov.au/dataset/asgs2016/meshblock/>  a loci:Dataset , rdf:Bag ;
    rdfs:member <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> .

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801>
  a asgs:Feature ;
  a asgs:StatisticalAreaLevel1 ;
  a geo:Feature ;
  loci:isMemberOf <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/> ;
  dcterms:identifier "20503108801"^^asgs-id:sa1Maincode2016 ;
  dcterms:identifier "20503108801"^^asgs-id:statisticalArea1Sa111DigitCode ;
  geo:hasGeometry <http://gds.loci.cat/geometry/asgs16_sa1/20503108801> ;
  geo:sfContains <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/stateorterritory/2> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel2/205031088> ;
.

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/>  a loci:Dataset , rdf:Bag ;
    rdfs:member <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801> .
@dr-shorthair
Author

dr-shorthair commented Dec 19, 2019

Item 1.:

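# Step 1: mint a concept IRI from each category label and attach it via dcterms:type.
INSERT { ?m dcterms:type ?concept . }
WHERE {
    ?m a asgs:MeshBlock ; asgs:category ?label .
    BIND( IRI( CONCAT( "http://linked.data.gov.au/def/asgs-cat/" , LCASE( REPLACE( ?label , "[ /]" , "-" )))) AS ?concept )
} ;

# Step 2: remove the original string-valued asgs:category triples.
DELETE { ?m asgs:category ?label . }
WHERE { ?m a asgs:MeshBlock ; asgs:category ?label . }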
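
For anyone checking the mapping outside a triple store, the label-to-IRI step can be sketched in plain Python. This is only an illustration of the string handling in the query above: the function name category_iri is hypothetical, and the namespace is copied from the string literal in the INSERT.

```python
import re

# Namespace copied from the string literal used in the INSERT query above.
ASGS_CAT_NS = "http://linked.data.gov.au/def/asgs-cat/"


def category_iri(label: str) -> str:
    """Mirror the LCASE(REPLACE(..., "[ /]", "-")) expression from the query.

    Hypothetical helper: spaces and slashes become hyphens, then the
    result is lower-cased and appended to the concept namespace.
    """
    slug = re.sub(r"[ /]", "-", label).lower()
    return ASGS_CAT_NS + slug


if __name__ == "__main__":
    print(category_iri("Primary Production"))
```

Under these assumptions, the label "Primary Production" from the original form maps to the asgs-cat:primary-production concept shown in the preferred form.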

@dr-shorthair
Author

dr-shorthair commented Dec 19, 2019

Item 2.:

INSERT { ?m dcterms:identifier ?typedCode .}
WHERE {
    ?m a geo:Feature ; ?p ?code .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "code" ) ) 
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?codeType )
    BIND ( IRI ( CONCAT ( "http://linked.data.gov.au/def/asgs/id#" , ?codeType ) ) AS ?codeDataType )
    BIND ( STRDT ( ?code , ?codeDataType ) AS ?typedCode )
}

DELETE { ?m ?p ?code .}
WHERE { 
    ?m a geo:Feature ; ?p ?code .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "code" ) ) 
 }

INSERT { ?m dcterms:title ?typedName .}
WHERE {
    ?m a geo:Feature ; ?p ?name .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "name" ) ) 
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?nameType )
    BIND ( IRI ( CONCAT ( "http://linked.data.gov.au/def/asgs/id#" , ?nameType ) ) AS ?nameDataType )
    BIND ( STRDT ( ?name , ?nameDataType ) AS ?typedName )
}

DELETE { ?m ?p ?name.}
WHERE { 
    ?m a geo:Feature ; ?p ?name.
    FILTER ( CONTAINS ( lcase( str( ?p )) , "name" ) ) 
 }
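
The datatype derivation in the first INSERT can be sketched in Python as follows. The function name typed_identifier is hypothetical, and the check that the predicate sits in the asgs namespace is an added guard that the queries above do not have (they only test for the substring "code").

```python
# Namespaces copied from the string literals in the queries above.
ASGS_NS = "http://linked.data.gov.au/def/asgs#"
ASGS_ID_NS = "http://linked.data.gov.au/def/asgs/id#"


def typed_identifier(predicate_iri: str, code: str):
    """Return (lexical value, datatype IRI) for a code-bearing predicate.

    Hypothetical helper mirroring the FILTER / STRAFTER / CONCAT steps.
    Returns None when the predicate is not treated as a code property.
    """
    if "code" not in predicate_iri.lower():
        return None
    # Added guard (an assumption, not in the original queries):
    # only rewrite predicates that live in the asgs namespace.
    if not predicate_iri.startswith(ASGS_NS):
        return None
    local_name = predicate_iri[len(ASGS_NS):]
    return (str(code), ASGS_ID_NS + local_name)


if __name__ == "__main__":
    print(typed_identifier(ASGS_NS + "mbCode2016", "20663970000"))
```

The same pattern applies to the name properties, with "name" in place of "code" and dcterms:title as the target predicate.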

@dr-shorthair
Author

dr-shorthair commented Dec 19, 2019

Item 3.:

INSERT { ?gf1 geo:sfContains ?gf2 . ?gf2 geo:sfWithin ?gf1 . }
WHERE {
    ?gf1 ?p ?gf2 .
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?pname )
    FILTER ( REGEX ( ?pname  , "^is[a-zA-Z0-9]+Of" ) ) 
}

DELETE { ?gf1 ?p ?gf2 . }
WHERE {
    ?gf1 ?p ?gf2 .
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?pname )
    FILTER ( REGEX ( ?pname  , "^is[a-zA-Z0-9]+Of" ) ) 
}
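
A small Python sketch of the same rewrite over in-memory triples may help when testing the regular expression. The function name rewrite_containment is hypothetical, triples are plain (subject, predicate, object) string tuples, and the geo: prefix is assumed to be the standard GeoSPARQL namespace.

```python
import re

ASGS_NS = "http://linked.data.gov.au/def/asgs#"   # from the queries above
GEO_NS = "http://www.opengis.net/ont/geosparql#"  # assumed binding of geo:

# Same pattern as the FILTER: local names such as isStatisticalAreaLevel1Of.
IS_X_OF = re.compile(r"^is[a-zA-Z0-9]+Of")


def rewrite_containment(triples):
    """Replace asgs:is...Of triples with geo:sfContains plus its inverse.

    Hypothetical helper; all other triples pass through unchanged.
    """
    result = []
    for s, p, o in triples:
        local_name = p[len(ASGS_NS):] if p.startswith(ASGS_NS) else ""
        if IS_X_OF.search(local_name):
            result.append((s, GEO_NS + "sfContains", o))
            result.append((o, GEO_NS + "sfWithin", s))
        else:
            result.append((s, p, o))
    return result


if __name__ == "__main__":
    sa1 = "http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801"
    mb = "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000"
    for triple in rewrite_containment([(sa1, ASGS_NS + "isStatisticalAreaLevel1Of", mb)]):
        print(triple)
```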

@dr-shorthair
Author

dr-shorthair commented Dec 19, 2019

Item 4.:

INSERT { 
    ?f loci:isMemberOf ?r . 
    ?r a rdf:Bag , loci:Dataset ; rdfs:member ?f . 
}
WHERE { ?f reg:register ?r . }

DELETE { ?f reg:register ?r . }
WHERE { ?f reg:register ?r . }

@dr-shorthair
Author

dr-shorthair commented Dec 20, 2019

Item 0.: (ensure that every individual feature is explicitly typed as a geo:Feature and asgs:Feature)

INSERT { ?gf a geo:Feature . }
WHERE {
	?gf a [ rdfs:subClassOf+ geo:Feature ; ] .
}

INSERT { ?gf a asgs:Feature . }
WHERE {
	?gf a [ rdfs:subClassOf+ asgs:Feature ; ] .
}

@dr-shorthair
Author

Item 5.:

INSERT { ?gf geo:hasGeometry ?gg . }
WHERE { 
     ?gf a asgs:Feature .  
     OPTIONAL { ?gf a asgs:StateOrTerritory . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_ste/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel4 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa4/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel3 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa3/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel2 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa2/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel1 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa1/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:MeshBlock . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_mb/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
}
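
The IRI construction in each OPTIONAL branch follows one pattern, sketched here in Python. The function name geometry_iri and the lookup table are illustrative only; the dataset tokens and base IRI are copied from the query above.

```python
import re

GEOMETRY_BASE = "http://gds.loci.cat/geometry/"  # from the query above

# ASGS class local name -> geometry dataset token, as listed in the query.
DATASET_BY_CLASS = {
    "StateOrTerritory": "asgs16_ste",
    "StatisticalAreaLevel4": "asgs16_sa4",
    "StatisticalAreaLevel3": "asgs16_sa3",
    "StatisticalAreaLevel2": "asgs16_sa2",
    "StatisticalAreaLevel1": "asgs16_sa1",
    "MeshBlock": "asgs16_mb",
}


def geometry_iri(asgs_class: str, feature_iri: str):
    """Build the external geometry IRI for a feature, or None if unmapped.

    Hypothetical helper mirroring REPLACE(str(?gf), '^.*(#|/)', ""):
    everything up to the last '#' or '/' is stripped to get the local id.
    """
    dataset = DATASET_BY_CLASS.get(asgs_class)
    if dataset is None:
        return None
    local_id = re.sub(r"^.*(#|/)", "", feature_iri)
    return f"{GEOMETRY_BASE}{dataset}/{local_id}"


if __name__ == "__main__":
    print(geometry_iri(
        "MeshBlock",
        "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000",
    ))
```

For the mesh block used in the examples, this yields the same geometry IRI that appears in the preferred form.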

@ashleysommer
Contributor

@dr-shorthair @jyucsiro
Same question as I had on the GNAF repository...
Removing the ASGS-ontology predicates such as asgs:category, asgs:mbCode2016, asgs:isStatisticalAreaLevel1Of, etc., and replacing them with simplified/harmonized predicates means the 'asgs' profile/view is no longer aligned with the ASGS ontology. Should these changes be implemented in a different profile? Should we keep the current representation as the 'asgs' profile and build a new 'loci' profile with these changes?

@jyucsiro

jyucsiro commented Jan 7, 2020

@ashleysommer Simon made changes to the ASGS ontology that clarified the subProperty hierarchy for categories and codes. This allows backward compatibility of the current method mapping.

The simplified/harmonized form uses fewer of the (now) specialised asgs: predicates. It is still aligned, but the "preferred" approach uses the parent property rather than the specialised predicates.

"Should we keep the current representation as the 'asgs' profile and build a new 'loci' profile with these changes?"

I think having both asgs and loci profiles would be a sensible approach at this time. We're still evaluating whether the asgs profile would be useful in an ongoing way or whether we just run with loci. Would we be able to have both for now?

@jyucsiro

jyucsiro commented Jan 7, 2020

Crossreferencing #13 (comment)

I can see what you mean now. I think having two profiles, loci and asgs, would be the approach. Otherwise, the alternative is to duplicate the category/code predicates or derive them by reasoning. The intention of the loci profile was to reuse predicates from well-known ontologies as much as possible.

@ashleysommer
Contributor

ashleysommer commented Jan 10, 2020

Another observation:
@dr-shorthair I see you've taken the approach of turning category name into a codelist item.
ie. category label "Primary Production" -> <asgs-cat/primary-production>.
That may not be necessary.

In the raw ASGS data, the Mesh Blocks have a category_name (string) and a category (integer).
category_name is what we currently use for the category label in the ASGS RDF view; the category from the raw data is an integer and is currently not used in any RDF representation.
So when simplifying this representation we should probably use the category integer as the identifier for the codelist item.

@dr-shorthair
Author

In general it is expected that all classifications and code-lists be published as web resources, so that
(a) they can be used more broadly, and
(b) the definitions can be obtained as needed by dereferencing the URI.

I don't care what token is used for the local name, but the general principle is that the category is denoted by a URI.

@dr-shorthair
Author

What is the " 'asgs' profile/view " ?
As @jyucsiro notes, the goal is to have a loc-i view as primary, with more refined views as minor elaborations.

@ashleysommer
Contributor

@dr-shorthair
When I talk about "view", I am referring to choosing a profile using the content-negotiation-by-profile feature in PyLDAPI. It means a single feature can have multiple different representations, depending on the profile you choose. You can see them listed in the 'alternates view' like here.

When resolving the resource URI, if you don't explicitly choose a profile (using the "?_view=" query param) then you will get a default view. In our ASGS pyldapi deployment the default view is called "asgs", in which the WFS feature is mapped to / aligned with the ASGS ontology. It looks like the "Original Form" snippet you have above.
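
As a rough illustration of selecting a profile, the request URL can be built like this in Python. Only the "_view" query parameter and the profile names 'asgs' and 'loci' come from this thread; the helper name with_view is hypothetical, and whether a given deployment serves a particular profile is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_view(resource_iri: str, profile: str) -> str:
    """Return resource_iri with its _view query parameter set to profile.

    Hypothetical helper: any existing _view value is replaced, and other
    query parameters are kept in their original order.
    """
    parts = urlsplit(resource_iri)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_view"
    ]
    query.append(("_view", profile))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    mesh_block = "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000"
    print(with_view(mesh_block, "asgs"))
```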

@dr-shorthair
Author

The goal of the 'simplification' was to make querying across datasets easier by replacing some properties that had been created in new namespaces with properties from standard namespaces. The SPARQL queries should be more portable across datasets. I think I managed to do this without any loss of information. It did involve some additional datatypes and controlled vocabularies, but the similarities between the primary dataset structures are more obvious.

@ashleysommer
Contributor

@dr-shorthair I'm not disputing that, it is a good change.
I'm saying that, in order to implement these changes, I'm introducing a 'loci' view. It will be the default profile and will contain these changes, while the 'asgs' view is left untouched and fully aligned with the ASGS ontology.

@ashleysommer ashleysommer mentioned this issue Jan 13, 2020
@dr-shorthair
Author

dr-shorthair commented Jan 14, 2020

@ashleysommer note that "the ASGS ontology" has been modified and streamlined. It has been refactored into multiple graphs (files), with some of these tagged owl:deprecated true in particular

https://github.com/AGLDWG/asgs-ont/blob/master/asgs-path.ttl is not explicitly deprecated, but is maintained in a separate graph as its capabilities are not currently used in any data that we have access to.

@ashleysommer
Contributor

ashleysommer commented Jan 14, 2020

@dr-shorthair oh, I see what you mean now.

So the original 'asgs' view (with asgs:mbCode2016, asgs:isStatisticalAreaLevel1Of, etc.) is now not needed at all in our pyldapi deployment of the ASGS dataset?

Are the ABS guys (i.e., Laurent) across the ontology changes, and do they approve of them?

I was under the impression there were people in ABS using this pyldapi implementation (either our deployment, or their own instance) and relying on the original ontology predicates.

@dr-shorthair
Author

I contacted Laurent to verify if it was OK to make changes and he concurred.
I'm being pretty careful to document them well, and not to throw anything away, just mark it 'deprecated'. AFAICT we are the only people maintaining an active deployment. The plan would be to hand it over to them, but there is nothing currently depending on it.

@ashleysommer ashleysommer reopened this Jan 14, 2020
loci-kanban automation moved this from Closed to In progress Jan 14, 2020
@jyucsiro

@ashleysommer yep - I don't see a need for an asgs view given changes that @dr-shorthair made to simplify the asgs/loci view, unless there is something I'm missing or a feature others would want.

On the ABS structures vs non-ABS structures: we don't need to tackle the non-ABS structures yet, but if it is simple to do, it is a nice-to-have. The ABS structures are a must-have for our next release. These are:

  • Urban Centres and Localities (UCLs), Section of State Structures (SOS) and Section of State Range (SOSR) Structures
  • Remoteness Areas (RAs)
  • Indigenous Structure - ILOC, IARE, IREG
  • SUA, GCCSA

loci-kanban automation moved this from In progress to Closed Jan 29, 2020
Projects
loci-kanban
  
Done
5 participants