LOV crawler & Vaadin update #93
7.7.1 causes this issue under Windows: https://dev.vaadin.com/ticket/20285
…tions), ontology option in UI
# Conflicts: # pom.xml # rdfunit-commons/pom.xml # rdfunit-core/pom.xml # rdfunit-core/src/test/resources/org/aksw/rdfunit/validate/data/owl/ontology.ttl # rdfunit-examples/pom.xml # rdfunit-io/pom.xml # rdfunit-junit/pom.xml # rdfunit-manual-tests/pom.xml # rdfunit-model/pom.xml # rdfunit-validate/pom.xml # rdfunit-w3c-dqv/pom.xml
?C1 rdf:type/rdfs:subClassOf* ?C2 .
FILTER ( ?C2 IN (rdfs:Literal, rdf:langString, rdfs:Datatype, owl:DatatypeProperty))
?D1 rdf:type/rdfs:subClassOf* ?C3 .
FILTER ( ?C3 IN (rdfs:Literal, rdf:langString, rdfs:Datatype, owl:DatatypeProperty))
good catch 👍
Overall this is a great contribution, thanks a lot @chile12!
This PR covers a lot of features, almost all of which are good to go, except for two remarks:
- the test results need to be reverted back to List (from Set) for conformance to SHACL, and
- the automatic following of owl:imports probably needs to be moved one level higher to avoid possible additional test generation and execution time
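To illustrate the first remark in general terms: the two collection types behave differently when equal results are reported more than once. The sketch below is independent of RDFUnit; the class name and the result strings are made up for illustration, and whether this difference is the exact SHACL conformance concern is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ResultCollectionDemo {
    // Hypothetical stand-in for test results; plain strings compare by value.
    static List<String> sampleResults() {
        List<String> results = new ArrayList<>();
        results.add("ex:a wrong datatype");
        results.add("ex:a wrong datatype"); // same violation reported twice
        results.add("ex:b missing label");
        return results;
    }

    public static void main(String[] args) {
        List<String> asList = sampleResults();
        Set<String> asSet = new LinkedHashSet<>(asList);
        System.out.println(asList.size()); // prints 3: a List keeps every entry
        System.out.println(asSet.size());  // prints 2: a Set collapses equal entries
    }
}
```

A List also preserves the order in which results were produced, which a plain HashSet does not guarantee.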
rdfunit-core/src/test/java/org/aksw/rdfunit/validate/integration/OwlIntegrationTest.java
rdfunit-io/src/main/java/org/aksw/rdfunit/io/reader/RdfReaderFactory.java
rdfunit-model/src/main/java/org/aksw/rdfunit/model/impl/results/TestExecutionImpl.java
 * @param redirects - list of established redirects which led to this point
 * @return - the optional content location
 */
private Optional<String> getContentLocation(String urlStr, SerializationFormat format, ArrayList<String> redirects){
The code here looks very useful. It would be worth considering moving it into the rdfunit-io module / RdfReader so that we can reuse it for all the dereferencing we do. I'll open an issue to track this after this gets merged.
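As a rough idea of what a reusable helper could look like, here is a minimal sketch of redirect following that keeps a list of visited locations to detect loops. It is not the PR's implementation: the class name, the `nextHop` lookup function (standing in for an HTTP request that returns a redirect target, if any), and the `maxHops` limit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class RedirectResolver {
    // Follows redirects starting at 'start'. Returns the final location,
    // or empty when a redirect loop is detected or the hop limit is exceeded.
    static Optional<String> resolve(String start,
                                    Function<String, Optional<String>> nextHop,
                                    int maxHops) {
        List<String> redirects = new ArrayList<>(); // locations already left behind
        String current = start;
        while (true) {
            Optional<String> next = nextHop.apply(current);
            if (!next.isPresent()) {
                return Optional.of(current); // no further redirect: content is here
            }
            redirects.add(current);
            if (redirects.contains(next.get()) || redirects.size() > maxHops) {
                return Optional.empty(); // loop or too many hops
            }
            current = next.get();
        }
    }
}
```

Passing the lookup as a function keeps the loop detection testable without network access; a real caller would supply a function that issues the HTTP request.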
 /**
  * @author Dimitris Kontokostas
  * @since 9/16/13 1:51 PM
  */
 @ToString
-@EqualsAndHashCode(exclude={"model", "schemaReader"})
+@EqualsAndHashCode(exclude={"model", "schemaReader", "imports"})
This change probably needs some more discussion. Initial versions of RDFUnit computed the transitive owl:imports closure through the Jena ontology model, and the effect was similar (although here there is more control over what gets loaded).
I reverted that because multiple sources could re-import the same imports, and since test generation is done at the SchemaSource level, we ended up generating multiple copies of the same tests. This adds a big delay in the test generation phase as well as in the resulting execution.
The other problem I had was big delays in resolving non-resolvable IRIs, but that is not a problem in this case since we reconcile with the predefined schemas.
To tackle the first issue, I would propose that instead of merging the imported models here, we move this functionality outside of the class and generate additional SchemaSources that we can deduplicate more easily. wdyt?
Yes, sounds good. I will create the subclass.
One approach would be to get all the schema sources that are defined explicitly (or the ones we got automatically).
We iterate over all of them and get all transitive owl:imports from each schema source instance.
We gather all import IRIs into a set (for deduplication) and then reconcile them with the schema service.
At the end, we return the original schema sources along with the ones we computed as a new collection for input to RDFUnit.
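The steps above can be sketched with plain collections. This is only an illustration of the traversal and deduplication, not RDFUnit code: the `imports` map stands in for reading owl:imports from each source's model, and the `known` set stands in for what the schema service can reconcile; both names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ImportClosure {
    // Returns the original IRIs followed by every transitively imported IRI
    // that is known to the (hypothetical) schema service, each exactly once.
    static List<String> augment(List<String> original,
                                Map<String, List<String>> imports,
                                Set<String> known) {
        Set<String> seen = new LinkedHashSet<>(original); // deduplicates, keeps order
        Deque<String> queue = new ArrayDeque<>(original);
        while (!queue.isEmpty()) {
            String iri = queue.poll();
            for (String imported : imports.getOrDefault(iri, Collections.emptyList())) {
                // Skip IRIs that cannot be reconciled or were already collected.
                if (known.contains(imported) && seen.add(imported)) {
                    queue.add(imported);
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Because every IRI enters the queue at most once, shared or cyclic imports do not cause repeated work or duplicate sources.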
This is basically already implemented, except that I load each unique import into one model.
So you would like to change the interface like this?
public Model getTransitiveModel() -> public Collection<SchemaSource> getTransitiveSchemaSources()
Where would we use it then? Atm I only use it in TagRdfUnitTestGenerator...
I was thinking of using this a bit differently, so that people use the owl:imports computation on demand.
Instead of creating a separate class that is fixed in the code, I was thinking of a (static) function that takes the original SchemaSource list and returns an augmented list. This function would be called in RDFUnitConfiguration.getAllSchemata().
This way we could even add a parameter in the future that specifies whether to compute the owl:imports or not, or maybe allow getting sources that are not defined in the SchemaService.
So, that function would look something like the following (not tested):
public static List<SchemaSource> augmentWithOwlImports(List<SchemaSource> originalSources) {
    ImmutableList.Builder<SchemaSource> augmentedSources = ImmutableList.builder();
    augmentedSources.addAll(originalSources);
    Set<String> schemaIris = originalSources.stream().map(SchemaSource::getSchema).collect(Collectors.toSet());
    Set<SchemaSource> currentSources = new HashSet<>(originalSources);
    while (!currentSources.isEmpty()) {
        Set<SchemaSource> computedSources = currentSources.stream()
                .map(SchemaSource::getModel)
                .flatMap(m -> m.listObjectsOfProperty(OWL.imports).toList().stream())
                .filter(RDFNode::isResource)
                .map(RDFNode::asResource)
                .map(Resource::getURI)
                .filter(uri -> !schemaIris.contains(uri))
                .map(SchemaService::getSourceFromUri)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toSet());
        Set<String> newIris = computedSources.stream().map(SchemaSource::getSchema).collect(Collectors.toSet());
        schemaIris.addAll(newIris);
        augmentedSources.addAll(computedSources);
        currentSources = computedSources;
    }
    return augmentedSources.build();
}
wdyt?
Implemented as stated (worked without changing a line 👍 ), introduced an -i option in the cli
💯 thanks a lot @chile12