LOV crawler & Vaadin update #93

chile12 · 2018-12-01T18:52:31Z

Updated Vaadin, since it caused multiple issues especially under windows.
Included an 'Ontology' option in the constraint selection, which will not skip owl, rdf, and rdfs constraints tests.
Made the excluded ontologies configurable, (by default it still excludes owl, rdf, rdfs), introducing an 'x' option in the CLI.
Revised the LOV endpoint class, and created basically a crawler, which searches actively for possible resource locations, I also updated the default schemaDecl.csv
I introduced a directoary containing standard vocabularies which are always loaded (based on vann properties (containing rut, shacl, and xsd as ontology with regex restrictions)
SchemaSource now collects all imports transitively, a special function now can also load the source with all its imports.
other minor stuff

How Has This Been Tested?

added a test for the LOV crawler (deactivated by default)
added xsd type tests
added a test for imported ontologies

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

7.7.1 causes this issue under Windows: https://dev.vaadin.com/ticket/20285

…tions), ontology option in UI

# Conflicts: # pom.xml # rdfunit-commons/pom.xml # rdfunit-core/pom.xml # rdfunit-core/src/test/resources/org/aksw/rdfunit/validate/data/owl/ontology.ttl # rdfunit-examples/pom.xml # rdfunit-io/pom.xml # rdfunit-junit/pom.xml # rdfunit-manual-tests/pom.xml # rdfunit-model/pom.xml # rdfunit-validate/pom.xml # rdfunit-w3c-dqv/pom.xml

coveralls · 2018-12-01T19:00:41Z

Coverage decreased (-2.2%) to 55.936% when pulling 9281c12 on fix/updateVaadin into d96c328 on master.

jimkont · 2018-12-03T08:10:34Z

rdfunit-commons/src/main/resources/org/aksw/rdfunit/configuration/autoGeneratorsOWL.ttl

-	                                    ?C1 rdf:type/rdfs:subClassOf* ?C2 .
-	                                    FILTER ( ?C2 IN (rdfs:Literal, rdf:langString, rdfs:Datatype, owl:DatatypeProperty))
+	                                    ?D1 rdf:type/rdfs:subClassOf* ?C3 .
+	                                    FILTER ( ?C3 IN (rdfs:Literal, rdf:langString, rdfs:Datatype, owl:DatatypeProperty))


good catch 👍

jimkont

overall this is a great contribution, thanks a lot @chile12 !!!

This PR covers a lot of features that almost all are good to go except from two remarks:

the test results need to be reverted back to LIst (from Set) for conformance to shacl and
the automatic following of owl:imports probably needs to be moved one lever higher to avoid possible additional test geneartion and execution time

rdfunit-core/src/test/java/org/aksw/rdfunit/validate/integration/OwlIntegrationTest.java

rdfunit-io/src/main/java/org/aksw/rdfunit/io/reader/RdfReaderFactory.java

rdfunit-model/src/main/java/org/aksw/rdfunit/model/impl/results/TestExecutionImpl.java

jimkont · 2018-12-04T05:25:36Z

rdfunit-core/src/main/java/org/aksw/rdfunit/prefix/LOVEndpoint.java

+     * @param redirects - list of established redirects which led to this point
+     * @return - the optional content location
+     */
+    private Optional<String> getContentLocation(String urlStr, SerializationFormat format, ArrayList<String> redirects){


The code here looks very useful, would worth considering reusing it in the rdfunit-io module / RdfReader and we re-use it all sort of dereferencing we do. I'll make an issue for tracking this after this get's merged

jimkont · 2018-12-04T05:36:09Z

rdfunit-core/src/main/java/org/aksw/rdfunit/sources/SchemaSource.java

 /**
 * @author Dimitris Kontokostas
 * @since 9/16/13 1:51 PM
 */
 @ToString
-@EqualsAndHashCode(exclude={"model", "schemaReader"})
+@EqualsAndHashCode(exclude={"model", "schemaReader", "imports"})


This change probably needs some more discussion. Initial versions of RDFUnit had transitive owl:imports closure computation through the Jena ontology model but the effect was kind of similar (here there is more control on what gets loaded though).

The issues I had and decided to revert this was that multiple sources could be re-importing the same imports and since the test generation is done on a SchmaSource level, we generate multiple copies of the same tests, which adds big delay in the test generation phase as well as on the resulting execution.

The other problem I had was big delays in resolving non-resolvable IRIs, but this is not a problem in this case since we reconcile with the predefined schemas

To tackle the first issue I would propose to, instead of merging the imported models here, we move this functionality outside of the class and generate additional SchemaSources that we can easier deduplicate. wdyt?

Yes, sounds good. I will create the subclass.

one approach would be to get all the schema sources that are defined explicitly (or the ones we got automatically).
We iterate over all of them and we get all transitive owl:imports from each schema source instance.
We gather all import IRIs into a set (for deduplication) and then reconcile them with the schema service.

at the end, we return the original schema sources along with the ones we computed as a new collection for input to RDFnit

This is basically already implemented, except that I load each unique import into one model.
So you would like to change the interface like this?

public Model getTransitiveModel() -> public Collection[SchemaSource] getTransitiveSchemaSources()

Where would use it then? Atm I only use it in TagRdfUnitTestGenerator...

I was thinking to use this a bit differently so that people use the owl:import computation on demand.

instead of creating a separate class that is fixed in the code, I was thinking of a (static) function that takes the original
SchemaSource list and returns an augmented list back and this function would be called in the RDFUnitConfiguration.getAllSchemata().

This way we could even have a parameter specifying if we want to compute the owl:imports or not in the future, or maybe allow to get sources even not defined in the SchemaService.

So, that function would look something like the following (not tested):

public static List<SchemaSource> augmentWithOwlImports(List<SchemaSource> originalSources) { ImmutableList.Builder<SchemaSource> augmentedSources = ImmutableList.builder(); augmentedSources.addAll(originalSources); Set<String> schemaIris = originalSources.stream().map(SchemaSource::getSchema).collect(Collectors.toSet()); Set<SchemaSource> currentSources = new HashSet<>(originalSources); while (!currentSources.isEmpty()) { Set<SchemaSource> computedSources = currentSources.stream() .map(SchemaSource::getModel) .flatMap(m -> m.listObjectsOfProperty(OWL.imports).toList().stream()) .filter(RDFNode::isResource) .map(RDFNode::asResource) .map(Resource::getURI) .filter(uri -> !schemaIris.contains(uri)) .map(SchemaService::getSourceFromUri) .filter(Optional::isPresent) .map(Optional::get) .collect(Collectors.toSet()); Set<String> newIris = computedSources.stream().map(SchemaSource::getSchema).collect(Collectors.toSet()); schemaIris.addAll(newIris); augmentedSources.addAll(computedSources); currentSources = computedSources; } return augmentedSources.build(); }

wdyt?

Implemented as stated (worked without changing a line 👍 ), introduced an -i option in the cli

jimkont · 2018-12-05T10:12:21Z

💯 thanks a lot @chile12

Markus Freudenberg added 13 commits November 5, 2018 13:44

Updating tp Vaadin 7.7.2

8c0beaf

7.7.1 causes this issue under Windows: https://dev.vaadin.com/ticket/20285

Providing new LOV crawler.

15adbe1

Providing new LOV crawler.

03dfb7c

Introducing standard vocabularies (including XSD with pattern restric…

e9be168

…tions), ontology option in UI

Implementing XSD typ autogenerated tests for owl/rdfs

94f0575

Intermediate Commit

dfc3a86

Post merge cleanup, minor bug in RDFSRANGENL test generator

2a9bbdb

Included multiple xsd datatype tests

97a9b3c

Some cleanup and tests

b0770f7

Some further cleanup

ab886c8

Further clean up

f15b344

Further clean up

9bbe82b

jimkont reviewed Dec 3, 2018

View reviewed changes

jimkont reviewed Dec 4, 2018

View reviewed changes

Markus Freudenberg added 3 commits December 4, 2018 17:41

Implementing suggestions from @jimkont

dd40f16

Implementing suggestions from @jimkont

5ab39e1

Implementing suggestions from @jimkont

9281c12

jimkont merged commit 05c1c8e into master Dec 5, 2018

jimkont deleted the fix/updateVaadin branch June 8, 2019 08:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LOV crawler & Vaadin update #93

LOV crawler & Vaadin update #93

chile12 commented Dec 1, 2018

coveralls commented Dec 1, 2018 •

edited

Loading

jimkont Dec 3, 2018

jimkont left a comment

jimkont Dec 4, 2018

jimkont Dec 4, 2018

chile12 Dec 4, 2018

jimkont Dec 4, 2018 •

edited

Loading

chile12 Dec 4, 2018 •

edited

Loading

jimkont Dec 5, 2018

chile12 Dec 5, 2018

jimkont commented Dec 5, 2018

LOV crawler & Vaadin update #93

LOV crawler & Vaadin update #93

Conversation

chile12 commented Dec 1, 2018

How Has This Been Tested?

Types of changes

Checklist:

coveralls commented Dec 1, 2018 • edited Loading

jimkont Dec 3, 2018

Choose a reason for hiding this comment

jimkont left a comment

Choose a reason for hiding this comment

jimkont Dec 4, 2018

Choose a reason for hiding this comment

jimkont Dec 4, 2018

Choose a reason for hiding this comment

chile12 Dec 4, 2018

Choose a reason for hiding this comment

jimkont Dec 4, 2018 • edited Loading

Choose a reason for hiding this comment

chile12 Dec 4, 2018 • edited Loading

Choose a reason for hiding this comment

jimkont Dec 5, 2018

Choose a reason for hiding this comment

chile12 Dec 5, 2018

Choose a reason for hiding this comment

jimkont commented Dec 5, 2018

coveralls commented Dec 1, 2018 •

edited

Loading

jimkont Dec 4, 2018 •

edited

Loading

chile12 Dec 4, 2018 •

edited

Loading