-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Federated query optimization #152
Comments
Hi,
The test of existence of join between two properties can be bypassed using the property below :
# source selection with bind (exists {t1 . t2} as ?b_i)
FEDERATE_JOIN = false
put this in a file with name corese.properties
run corese with the -init option :
java corese.jar -init corese.properties
Best
Olivier
… De: "elordis" ***@***.***>
À: "Wimmics" ***@***.***>
Cc: "Subscribed" ***@***.***>
Envoyé: Mercredi 1 Novembre 2023 07:22:04
Objet: [Wimmics/corese] Federated query optimization (Issue #152)
Hello.
We are using Corese to federate two of our SPARQL endpoints. One contains main
graph, other is a graph view over a relational database with operational data.
We want to do a graph search like this:
SELECT ?DeviceRoot_id ?DeviceRoot_DeviceHasNetworkInstance_id
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
WHERE {
?df_7_0 <http://example.net/device.owl#DeviceRoot> ?DeviceRoot_id. # Main graph
?DeviceRoot_id <http://example.net/device.owl#DeviceHasNetworkInstance>
?DeviceRoot_DeviceHasNetworkInstance_id. # Main graph
?DeviceRoot_DeviceHasNetworkInstance_id
<http://example.net/device.owl#NetworkInstanceHasFDB>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB. # Main graph
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable. #
Main graph
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> "01:00:0c:cc:cc:cc"; # RDB graph
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac;
# RDB graph
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface;
# RDB graph
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan;
# RDB graph
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType.
# RDB graph
}
On sample data (few thousand triplets) everything works fine, but on real
datasets performance looks very bad.
From what I've been able to dug up, the main culprit is query that Corese uses
to extract data:
select *
where {
bind (exists {select *
where {
?s <http://example.net/device.owl#DeviceRoot> ?o .
}
limit 1 } as ?b0)
bind (exists {select *
where {
?s <http://example.net/device.owl#DeviceHasNetworkInstance> ?o .
}
limit 1 } as ?b1)
bind (exists {select *
where {
?s <http://example.net/device.owl#NetworkInstanceHasFDB> ?o .
}
limit 1 } as ?b2)
bind (exists {select *
where {
?s <http://example.net/device.owl#FDBHasMacTable> ?o .
}
limit 1 } as ?b3)
bind (exists {select *
where {
?s <http://example.net/device.owl#hasMac> ?o .
}
limit 1 } as ?b4)
bind (exists {select *
where {
?s <http://example.net/device.owl#matchingMac> ?o .
}
limit 1 } as ?b5)
bind (exists {select *
where {
?s <http://example.net/device.owl#matchingInterface> ?o .
}
limit 1 } as ?b6)
bind (exists {select *
where {
?s <http://example.net/device.owl#matchingVlan> ?o .
}
limit 1 } as ?b7)
bind (exists {select *
where {
?s <http://example.net/device.owl#matchingEntryType> ?o .
}
limit 1 } as ?b8)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
}
limit 1 } as ?b9)
bind (exists {select *
where {
?df_7_6 <http://example.net/device.owl#DeviceRoot> ?DeviceRoot_id .
?DeviceRoot_id <http://example.net/device.owl#DeviceHasNetworkInstance>
?DeviceRoot_DeviceHasNetworkInstance_id .
}
limit 1 } as ?b10)
bind (exists {select *
where {
?DeviceRoot_id <http://example.net/device.owl#DeviceHasNetworkInstance>
?DeviceRoot_DeviceHasNetworkInstance_id .
?DeviceRoot_DeviceHasNetworkInstance_id
<http://example.net/device.owl#NetworkInstanceHasFDB>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB .
}
limit 1 } as ?b11)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_id
<http://example.net/device.owl#NetworkInstanceHasFDB>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
}
limit 1 } as ?b12)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
}
limit 1 } as ?b13)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
.
}
limit 1 } as ?b14)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
.
}
limit 1 } as ?b15)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
.
}
limit 1 } as ?b16)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB
<http://example.net/device.owl#FDBHasMacTable>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
.
}
limit 1 } as ?b17)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
.
}
limit 1 } as ?b18)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
.
}
limit 1 } as ?b19)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
.
}
limit 1 } as ?b20)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#hasMac> '01:00:0c:cc:cc:cc' .
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
.
}
limit 1 } as ?b21)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
.
}
limit 1 } as ?b22)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
.
}
limit 1 } as ?b23)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingMac>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingMac
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
.
}
limit 1 } as ?b24)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
.
}
limit 1 } as ?b25)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingInterface>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingInterface
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
.
}
limit 1 } as ?b26)
bind (exists {select *
where {
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingVlan>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingVlan
.
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable
<http://example.net/device.owl#matchingEntryType>
?DeviceRoot_DeviceHasNetworkInstance_NetworkInstanceHasFDB_FDBHasMacTable_matchingEntryType
.
}
limit 1 } as ?b27)
}
limit 1000
It looks like the query leads to combinatorial explosion that just hangs up our
graph storage.
Are there any workarounds to make things more performant?
—
Reply to this email directly, [ #152 |
view it on GitHub ] , or [
https://github.com/notifications/unsubscribe-auth/ABKXNJCVDEMIAPL27WBXD33YCHTAZAVCNFSM6AAAAAA6Y2AJXSVHI2DSMVQWIX3LMV43ASLTON2WKOZRHE3TCNZXHA2DSOA
| unsubscribe ] .
You are receiving this because you are subscribed to this thread. Message ID:
***@***.***>
|
Thank you. Things have improved by a lot. Performance is still not good on heavy queries but at least things should work in most cases.
Is it possible to increase "chunk size" for those |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hello.
We are using Corese to federate two of our SPARQL endpoints. One contains main graph, other is a graph view over a relational database with operational data. We want to do a graph search like this:
On sample data (few thousand triplets) everything works fine, but on real datasets performance looks very bad.
From what I've been able to dug up, the main culprit is query that Corese uses to extract data:
It looks like the query leads to combinatorial explosion that just hangs up our graph storage.
Are there any workarounds to make things more performant?
The text was updated successfully, but these errors were encountered: