4. FedShop Data Generator
The FedShop Data Generator consists of three WatDiv template models located in experiments/bsbm/model, which adhere closely to the BSBM specification. The use of WatDiv models makes it easy to modify the schema through the configuration file experiments/bsbm/config.yaml.
The majority of FedShop's parameters are defined in experiments/bsbm/config.yaml, including the number of products to generate, as well as the number of vendors and rating sites to be included.
The generation proceeds as follows:
- Create the catalogue of products (200000 by default).
- Batch(0) = Create 10 autonomous vendors and 10 autonomous rating sites sharing products from the catalogue (products are replicated with a local URL per vendor and rating site). The distribution law can be controlled with parameters declared in experiments/bsbm/config.yaml.
- Workload = Instantiate the 12 template queries with 10 different random placeholders, such that each query returns results.
- Compute the minimal source assignment of each of the 120 queries of the Workload on Batch(0).
- For i from 1 to 9:
  - Batch(i) = Batch(i-1) + 10 new vendors + 10 new rating sites
  - Compute the minimal source assignment for each query of the Workload over Batch(i)
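The arithmetic behind this schedule can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above (10 batches, 10 new vendors and 10 new rating sites per batch, 12 templates with 10 instances each); the function and constant names are hypothetical and do not come from the FedShop code base.

```python
# Sketch of the batch schedule described above (numbers taken from the text).
N_BATCH = 10           # Batch(0) .. Batch(9)
MEMBERS_PER_STEP = 10  # vendors (and rating sites) added with each batch
N_TEMPLATES = 12       # template queries
N_INSTANCES = 10       # instances per template query


def batch_sizes(n_batch=N_BATCH, step=MEMBERS_PER_STEP):
    """Return a list of (batch_id, n_vendors, n_ratingsites) tuples."""
    return [(i, step * (i + 1), step * (i + 1)) for i in range(n_batch)]


# Every query of the workload is evaluated against every batch.
workload_size = N_TEMPLATES * N_INSTANCES

for batch_id, vendors, ratingsites in batch_sizes():
    members = vendors + ratingsites
    print(f"Batch({batch_id}): {vendors} vendors, {ratingsites} rating sites, "
          f"{members} federation members, {workload_size} queries")
```

Under these numbers the last batch, Batch(9), holds 100 vendors and 100 rating sites, which matches the value computed for n_federation_members in the configuration file.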
In this section, we'll provide you with a step-by-step guide to generating data for the benchmark. By following these instructions, you'll be able to generate the necessary data and obtain the results in experiments/bsbm/model/dataset.
- Write a template configuration file for WatDiv by following their instructions.
Please note that in the examples below, we use {%component_name} to expose components that can be modified downstream by the generation script. All the templates for generating products, vendors, and rating sites are available under experiments/bsbm/model/watdiv.
- #namespace <prefix>=<URI> means that each prefix will be linked to a specific URI, like the PREFIX clause in SPARQL queries. For example, we can have the following prefix: #namespace bsbm=http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/
- #namespace __provenance={%provenance} means that each entity that uses provenance uses it as the base URI.
- #namespace __output_org=fragmented means that each entity will be generated in a separate file. This is particularly helpful to replicate each product to a shop/rating site downstream.
- #namespace __output_dir={%export_output_dir} means that each entity's files will be generated in the export_output_dir directory.
#namespace <prefix>=<URI>
#namespace __provenance={%provenance}
#namespace __output_org=fragmented
#namespace __output_dir={%export_output_dir}
// ===== ENTITIES & LITERAL PROPERTIES ===== //
// ----- <linked_subject> ----- //
<type> <linked_subject_class> {%<linked_subject>_n}
<pgroup> <predicate_probability>
#predicate <predicate> <object>
</pgroup>
</type>
// ----- <main_subject> ----- //
<type> <main_subject_class> {%<main_subject>_n}
<pgroup> <predicate_probability>
#predicate <predicate> <object>
</pgroup>
</type>
#association <main_subject_class> <predicate> <main_subject_type> 2 <number_of_main_subject_type> <association_probability> NORMAL // (Many-To-Many) All <main_subject_class> have <number_of_main_subject_type> <main_subject_type>, following a normal distribution, with the probability of <association_probability>.
#association <main_subject_class> <predicate> <linked_subject_class> 2 1 <association_probability> NORMAL // (Many-To-One) All <main_subject_class> have only 1 <linked_subject_class>, following Normal distribution, with the probability of <association_probability>.
Using the template provided above, we can create a file for the Product entity, such as the one shown below.
#namespace bsbm=http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/
#namespace rdfs=http://www.w3.org/2000/01/rdf-schema#
#namespace rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#
#namespace dc=http://purl.org/dc/elements/1.1/
#namespace __provenance={%provenance}
#namespace __output_org=fragmented
#namespace __output_dir={%export_output_dir}
#namespace __replicated=true
// ===== ENTITIES & LITERAL PROPERTIES ===== //
// ----- Producer ----- //
<type> bsbm:Producer {%producer_n}
<pgroup> 1.0
#predicate rdfs:label string{%label_wc}
</pgroup>
<pgroup> 1.0
#predicate rdfs:comment string{%producer_comment_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:country country
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date 2000-07-20 2005-06-23
</pgroup>
</type>
// ----- Product ----- //
<type> bsbm:Product {%product_n}
<pgroup> 1.0
#predicate rdfs:label string{%label_wc}
</pgroup>
<pgroup> 1.0
#predicate rdfs:comment string{%comment_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyTextual1 string{%textual_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyTextual2 string{%textual_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyTextual3 string{%textual_wc}
</pgroup>
<pgroup> {%productPropertyTextual4_p}
#predicate bsbm:productPropertyTextual4 string{%textual_wc}
</pgroup>
<pgroup> {%productPropertyTextual5_p}
#predicate bsbm:productPropertyTextual5 string{%textual_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyNumeric1 integer 1 2000 normal
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyNumeric2 integer 1 2000 normal
</pgroup>
<pgroup> 1.0
#predicate bsbm:productPropertyNumeric3 integer 1 2000 normal
</pgroup>
<pgroup> {%productPropertyNumeric4_p}
#predicate bsbm:productPropertyNumeric4 integer 1 2000 normal
</pgroup>
<pgroup> {%productPropertyNumeric5_p}
#predicate bsbm:productPropertyNumeric5 integer 1 2000 normal
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date 2000-09-20 2006-12-23
</pgroup>
</type>
// ----- ProductFeature ----- //
<type> bsbm:ProductFeature {%feature_n}
<pgroup> 1.0
#predicate rdfs:label string{%label_wc}
</pgroup>
<pgroup> 1.0
#predicate rdfs:comment string{%feature_comment_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date 2000-05-20 2000-06-23
</pgroup>
</type>
// ----- ProductType ----- //
<type> bsbm:ProductType {%type_n}
<pgroup> 1.0
#predicate rdfs:label string{%label_wc}
</pgroup>
<pgroup> 1.0
#predicate rdfs:comment string{%type_comment_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date 2000-05-20 2000-06-23
</pgroup>
</type>
// Every product has several product types
#association bsbm:Product rdf:type bsbm:ProductType 2 {%type_c} 1.0 NORMAL
// Every product has several product features
#association bsbm:Product bsbm:productFeature bsbm:ProductFeature 2 {%feature_c} 1.0 NORMAL
// Every product has a producer
#association bsbm:Product bsbm:producer bsbm:Producer 2 1 1.0 NORMAL
The main subject of this template file is Product.
The template also includes Producer, ProductFeature, and ProductType as linked subjects. Linked subjects are entities that are connected to the main subject of the file, which in this case is Product.
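Downstream, the generation script replaces each {%component_name} with a concrete value before handing the file to WatDiv. The snippet below is a minimal sketch of such a substitution, not the project's actual implementation: the function name fill_template and the sample values are assumptions made for illustration.

```python
import re

# Matches exposed components such as {%product_n} or {%label_wc}.
COMPONENT = re.compile(r"\{%(\w+)\}")


def fill_template(template: str, params: dict) -> str:
    """Replace every {%name} with params[name]; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value provided for component {name!r}")
        return str(params[name])

    return COMPONENT.sub(substitute, template)


# Hypothetical excerpt of a template and of the values injected into it.
snippet = "<type> bsbm:Product {%product_n}\n<pgroup> {%productPropertyTextual4_p}"
values = {"product_n": 200000, "productPropertyTextual4_p": 0.7}
print(fill_template(snippet, values))
```

Raising an error on a missing component, rather than leaving the placeholder in place, keeps a half-filled template from reaching the generator.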
The general template for generating federation members is similar to the one for the virtual catalog, with a few differences in the #namespace directives, as shown below.
#namespace <prefix>=<URI>
#namespace __provenance={%provenance}
#namespace __output_org=monolithic
#namespace __output_dir={%export_output_dir}
#namespace __output_file={%ratingsite_id}
#namespace __output_dep={%export_dep_output_dir}
#namespace __output_dep_org=fragmented
#namespace __output_dep_rename_exception_predicates=<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/country>;
// ===== ENTITIES & LITERAL PROPERTIES ===== //
// ----- <global_subject> ----- //
<type> <global_subject_class> {%<global_subject>_n}
</type>
// ----- <linked_subject> ----- //
<type> <linked_subject_class> {%<linked_subject>_n}
<pgroup> <predicate_probability>
#predicate <predicate> <object>
</pgroup>
</type>
// ----- <main_subject> ----- //
<type> <main_subject_class> {%<main_subject>_n}
<pgroup> <predicate_probability>
#predicate <predicate> <object>
</pgroup>
</type>
// Every <linked_subject> is related to [1] <main_subject> (drawn with a ZIPFIAN) with the probability 1.0
#association <linked_subject> <predicate> <main_subject> 2 1 NORMAL NORMAL
// Every generated <linked_subject> is related to [1] <global_subject> (drawn with a ZIPFIAN) with the probability 1.0
#association1 <linked_subject> <predicate> <global_subject> 2 1 NORMAL NORMAL
// Every generated existing <linked_subject> are related to [Many] <other_linked_subject> (drawn with a ZIPFIAN) with the probability 1.0
#association1 <linked_subject> <predicate> <other_linked_subject> 2 1 NORMAL NORMAL
We'll now provide a template for RatingSite that replicates products from the Virtual Catalog, building upon the previous template we've discussed. It's worth noting that the same principles apply to Vendor, as well.
- #namespace __output_org=monolithic means that all quads generated for all rating-site entities will be contained in one file. This facilitates the distribution and maintenance of endpoints downstream.
- #namespace __output_dep={%export_dep_output_dir} indicates where to look for dependencies, e.g., Product.
- #namespace __output_dep_org=fragmented means that the dependencies, i.e., Product, are stored in separate files.
- #namespace __output_dep_rename_exception_predicates=<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/country>; indicates a semicolon-separated list of predicate URIs that will not be localized when replicated.
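To make the last directive more concrete, here is a rough sketch of what "localizing" a replicated product could look like. It rests on assumptions that this page does not state: that localization rewrites the base URI of subjects and objects to the member's provenance, and that an exception predicate leaves its object untouched. The function localize, the constant names, and the sample URIs are illustrative only.

```python
# Base URI of the global catalogue (taken from the product provenance in config.yaml).
GLOBAL_BASE = "http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/"

# Predicates listed in __output_dep_rename_exception_predicates.
RENAME_EXCEPTIONS = {GLOBAL_BASE + "country"}


def localize(triple, local_base, exceptions=RENAME_EXCEPTIONS, global_base=GLOBAL_BASE):
    """Rewrite subject and object URIs to local_base (assumed behaviour).

    Objects of exception predicates keep their global URI; predicates are never renamed.
    """
    s, p, o = triple

    def rename(term):
        if term.startswith(global_base):
            return local_base + term[len(global_base):]
        return term  # literals and foreign URIs are kept as-is

    new_o = o if p in exceptions else rename(o)
    return (rename(s), p, new_o)


# Hypothetical triples of one product replicated to a vendor.
vendor_base = "http://www.vendor0.fr/"
producer = (GLOBAL_BASE + "Product1", GLOBAL_BASE + "producer", GLOBAL_BASE + "Producer3")
country = (GLOBAL_BASE + "Product1", GLOBAL_BASE + "country", GLOBAL_BASE + "FR")
print(localize(producer, vendor_base))
print(localize(country, vendor_base))
```

In this sketch the country value stays shared across all federation members, while the product and its producer receive vendor-local URIs.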
#namespace bsbm=http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/
#namespace rdfs=http://www.w3.org/2000/01/rdf-schema#
#namespace rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#
#namespace dc=http://purl.org/dc/elements/1.1/
#namespace rev=http://purl.org/stuff/rev#
#namespace foaf=http://xmlns.com/foaf/0.1/
#namespace __provenance={%provenance}
#namespace __output_org=monolithic
#namespace __output_dir={%export_output_dir}
#namespace __output_file={%ratingsite_id}
#namespace __output_dep={%export_dep_output_dir}
#namespace __output_dep_org=fragmented
#namespace __output_dep_rename_exception_predicates=<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/country>;
// ===== ENTITIES & LITERAL PROPERTIES ===== //
// ----- Product ----- //
<type> bsbm:Product {%product_n}
</type>
// ----- RatingSite ----- //
<type> bsbm:RatingSite 1
<pgroup> 1.0
#predicate rdfs:label string{%label_wc}
</pgroup>
<pgroup> 1.0
#predicate bsbm:country country
</pgroup>
</type>
// ----- Person ----- //
<type> bsbm:Person {%person_n}
<pgroup> 1.0
#predicate foaf:name name{%person_name_wc}
</pgroup>
<pgroup> 1.0
#predicate foaf:mbox_sha1sum integer
</pgroup>
<pgroup> 1.0
#predicate bsbm:country country
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date 2008-5-20 2008-8-23
</pgroup>
</type>
// ----- Review ----- //
<type> bsbm:Review {%review_n}
<pgroup> 1.0
#predicate dc:title string{%title_wc}
</pgroup>
<pgroup> 1.0
#predicate rev:text string{%text_wc}
</pgroup>
<pgroup> {%rating1_p}
#predicate bsbm:rating1 integer 1 10 normal
</pgroup>
<pgroup> {%rating2_p}
#predicate bsbm:rating2 integer 1 10 normal
</pgroup>
<pgroup> {%rating3_p}
#predicate bsbm:rating3 integer 1 10 normal
</pgroup>
<pgroup> {%rating4_p}
#predicate bsbm:rating4 integer 1 10 normal
</pgroup>
<pgroup> 1.0
#predicate bsbm:publishDate date
</pgroup>
<pgroup> 1.0
#predicate bsbm:reviewDate date 2007-01-01 2007-12-31
</pgroup>
</type>
// Every bsbm:Review is related to [1] bsbm:RatingSite (drawn with a ZIPFIAN) with the probability 1.0
#association bsbm:Review dc:publisher bsbm:RatingSite 2 1 NORMAL NORMAL
// Every generated bsbm:Review is related to [1] bsbm:Product (drawn with a ZIPFIAN) with the probability 1.0
#association1 bsbm:Review bsbm:reviewFor bsbm:Product 2 1 NORMAL NORMAL
// Every generated existing bsbm:Review are related to [Many] bsbm:Person (drawn with a ZIPFIAN) with the probability 1.0
#association1 bsbm:Review rev:reviewer bsbm:Person 2 1 NORMAL NORMAL
Below is an explanation of our configuration file config.yaml. The aim is to customize the exposed components from the WatDiv configuration template mentioned in the previous section. We use a YAML-based hierarchical configuration system called OmegaConf to achieve this. Custom resolvers such as normal_truncated and get_docker_endpoints are defined in rsfb/utils.py.
- generation
  - workdir: Where you want to generate all your data and queries
  - n_batch: Number of generation steps for data and queries
  - n_query_instances: Number of different instances of each query
  - n_federation_members: Number of different sources you want (e.g., "${sum: ${generation.schema.vendor.params.vendor_n}, ${generation.schema.ratingsite.params.ratingsite_n}}" corresponds to the sum of the number of vendors and the number of rating sites)
  - verbose: Whether to print the workflow log information while it is running
  - stats
    - confidence_level: Accuracy of the resulting data
  - generator
    - dir: Where WatDiv is located
    - exec: Where the executable is located
  - virtuoso
    - compose_file: Where the docker compose file is located (e.g., we have one Virtuoso endpoint per batch, and one Docker container per batch)
    - service_name: Generic name of the Docker service
    - endpoints: List of all Virtuoso endpoints corresponding to the running Docker containers
    - container_names: List of all running Docker containers
  - schema
    - <subject>
      - is_source: Whether the subject is a federation member
      - provenance: Template of the URI (for a global subject, we just put a base URI; for a federation-member subject, we put a template URI that looks like http://www.{%<subject>_id}.fr/)
      - template: Where the template file is located
      - scale_factor: Scaling percentage applied for a new batch (1 corresponds to 100%)
      - export_output_dir: Where all the .nq files will be generated
      - params: Values for the components exposed in the template: the number of generated entities, and the probabilities and distribution laws that predicates and some objects follow (for objects, the law is used to generate the value; for predicates, it only determines whether the predicate is present)
Following the explanation provided above, the configuration file config.yaml is shown below.
generation:
  workdir: "experiments/bsbm"
  n_batch: 10
  n_query_instances: 10
  n_federation_members: "${sum: ${generation.schema.vendor.params.vendor_n}, ${generation.schema.ratingsite.params.ratingsite_n}}"
  verbose: true
  stats:
    confidence_level: 0.95
  generator:
    dir: "generators/watdiv"
    exec: "${generation.generator.dir}/bin/Release/watdiv"
  virtuoso:
    compose_file: "${generation.workdir}/docker/virtuoso.yml"
    service_name: "bsbm-virtuoso"
    endpoints: "${get_docker_endpoints: ${generation.virtuoso.compose_file}, ${generation.virtuoso.service_name}}"
    container_names: "${get_virtuoso_containers: ${generation.virtuoso.compose_file}, ${generation.virtuoso.service_name}}"
  schema:
    # Configuration for ONE batch
    product:
      is_source: false
      provenance: http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/ # Prefix URI
      # Products are generated once, independent from vendor and person
      template: "${generation.workdir}/model/watdiv/bsbm-product.template"
      scale_factor: 1
      export_output_dir: "${generation.workdir}/model/tmp/product"
      params:
        # type
        product_n: 200000 # Number of distinct products to generate
        producer_n: "${get_product_producer_n: ${generation.schema.product.params.product_n}}" # One producer per product
        feature_n: "${get_product_feature_n: ${generation.schema.product.params.product_n}}" # One feature per product
        feature_c: 9
        type_n: "${get_product_type_n: ${generation.schema.product.params.product_n}}" # One type per product
        type_c: 9
        # pgroup
        productPropertyTextual4_p: 0.7 # Probability to have the predicate productPropertyTextual4
        productPropertyTextual5_p: 0.8 # Probability to have the predicate productPropertyTextual5
        productPropertyNumeric4_p: 0.7 # Probability to have the predicate productPropertyNumeric4
        productPropertyNumeric5_p: 0.8 # Probability to have the predicate productPropertyNumeric5
        textual_wc: "${normal_truncated: 9, 3, 3, 15}"
        label_wc: "${normal_truncated: 2, 1, 1, 3}"
        comment_wc: "${normal_truncated: 100, 20, 50, 150}"
        type_comment_wc: "${normal_truncated: 35, 10, 20, 50}"
        feature_comment_wc: "${normal_truncated: 35, 10, 20, 50}"
        producer_comment_wc: "${normal_truncated: 35, 10, 20, 50}"
    vendor:
      is_source: true
      provenance: http://www.{%vendor_id}.fr/ # Template URI
      template: "${generation.workdir}/model/watdiv/bsbm-vendor.template"
      export_output_dir: "${generation.workdir}/model/dataset"
      export_dep_output_dir: "${generation.schema.product.export_output_dir}"
      scale_factor: 1
      params:
        vendor_n: "${multiply: 10, ${generation.n_batch}}" # Increase the number of vendors with a step of 10 per batch
        offer_n: "${normal_dist: 3, 1, 2000}" # Specs: 100 productsVendorsRatio * 20 avgOffersPerProduct, ref: bsbmtools
        product_n: "${generation.schema.product.params.product_n}" # All generated products will be sold by a vendor
        label_wc: "${normal_truncated: 2, 1, 1, 3}"
        comment_wc: "${normal_truncated: 35, 10, 20, 50}"
    ratingsite:
      is_source: true
      provenance: http://www.{%ratingsite_id}.fr/
      template: "${generation.workdir}/model/watdiv/bsbm-ratingsite.template"
      export_output_dir: "${generation.workdir}/model/dataset"
      export_dep_output_dir: "${generation.schema.product.export_output_dir}"
      scale_factor: 1
      params:
        # type
        ratingsite_n: "${multiply: 10, ${generation.n_batch}}" # Increase the number of rating sites with a step of 10 per batch
        product_n: "${generation.schema.product.params.product_n}" # All generated products will have a rating
        review_n: "${normal_dist: 3, 1, 10000}" # Specs: 10000
        person_n: "${divide: ${generation.schema.ratingsite.params.review_n}, 20}" # Number of people who rate a product
        person_name_wc: "${normal_truncated: 3, 1, 2, 4}"
        label_wc: "${normal_truncated: 2, 1, 1, 3}"
        text_wc: "${normal_truncated: 125, 20, 50, 200}"
        title_wc: "${normal_truncated: 9, 3, 4, 15}"
        # pgroup
        rating1_p: 0.7 # Probability to have the predicate rating1
        rating2_p: 0.7 # Probability to have the predicate rating2
        rating3_p: 0.7 # Probability to have the predicate rating3
        rating4_p: 0.7 # Probability to have the predicate rating4
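Several word-count parameters above rely on the normal_truncated resolver. The real resolver lives in rsfb/utils.py and may behave differently; the sketch below only illustrates the idea of a truncated normal draw. The argument order (mean, standard deviation, lower bound, upper bound) is inferred from the values in the configuration, and the use of rejection sampling and of integer rounding are assumptions.

```python
import random


def normal_truncated(mu, sigma, lower, upper, rng=random):
    """Draw an integer from N(mu, sigma) restricted to [lower, upper].

    Uses simple rejection sampling: redraw until the rounded value falls
    inside the bounds. Assumed semantics, not the project's implementation.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    while True:
        value = round(rng.gauss(mu, sigma))
        if lower <= value <= upper:
            return value


# Example: word counts comparable to label_wc and comment_wc in the configuration.
rng = random.Random(42)  # seeded generator for reproducible draws
label_wc = normal_truncated(2, 1, 1, 3, rng)
comment_wc = normal_truncated(100, 20, 50, 150, rng)
print(label_wc, comment_wc)
```

Rejection sampling is adequate here because the bounds in the configuration sit close to the mean, so few draws are discarded.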
Once config.yaml is properly set, you can launch the generation of the FedShop benchmark with the following command:
python rsfb/benchmark.py generate data experiments/bsbm/config.yaml [OPTIONS]
OPTIONS:
--clean [benchmark|metrics|instances][+db]: clean the benchmark|metrics|instances then (optional) destroy all database containers
--touch : mark a phase as "terminated" so snakemake would not rerun it.
Generating data for the benchmark is a complex and lengthy process that creates numerous artefacts under the experiments/bsbm directory. The datasets are generated under experiments/bsbm/model/dataset.
To generate queries for the benchmark, we adapted the queries from the BSBM Explore Use Case. An example with q04 is provided in this section, while all the adapted queries are available in experiments/bsbm/queries. At the end of the process, the instantiated queries, along with their reference source selection and results, are available in experiments/bsbm/evaluation/benchmark/generation/.
Below is the original q04 from BSBM:
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?product ?label ?propertyTextual
WHERE {
{
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature2% .
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric1 ?p1 .
FILTER ( ?p1 > %x% )
} UNION {
?product rdfs:label ?label .
?product rdf:type %ProductType% .
?product bsbm:productFeature %ProductFeature1% .
?product bsbm:productFeature %ProductFeature3% .
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric2 ?p2 .
FILTER ( ?p2> %y% )
}
}
ORDER BY ?label
OFFSET 5
LIMIT 10
with the following placeholders:
Parameter | Description |
---|---|
%ProductType% | A randomly selected Class URI from the class hierarchy (leaf level). |
%ProductFeature1% %ProductFeature2% %ProductFeature3% | Three different, randomly selected product feature URIs that correspond to the chosen product type. |
%x% %y% | Two random numbers between 1 and 500 |
The injection engine works as follows:
1. First iteration: build the `value_selection` query. This is done by projecting all placeholders and disabling any `FILTER`, `ORDER BY`, `LIMIT`, etc.
2. For the next iterations, try the 3 options below in order, then inject. Only try the next option if the current option doesn't work:
    2.1 Option 1: Exclude the partially injected query to refill
    2.2 Option 2: Extract the needed values for placeholders from value_selection.csv
    2.3 Option 3: Relax the query knowing there is NO solution mapping for a given combination of placeholders
    2.4 Inject placeholder values:
        - For every uninjected constant, ordered by priority:
            - If this is the first injection, or the operator is unary, inject with the `instance_id`-th row of `value_selection`
            - Else, each operator has its own rule to inject missing constants:
                - Comparison op, e.g., ?a > ?b: first try to select a random value constrained by the operator
                - Dependent-difference ($!) or independent-difference (!=): randomly choose a value that is different from the injected constant
                - Containment ("in"): randomly choose one of the 10 least common words
3. Restore the original `SELECT`, `FILTER`, `ORDER BY`, `LIMIT`, etc.
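The per-operator rules of step 2.4 can be pictured with the following sketch. It is a simplified illustration rather than the injection engine itself: it covers only the comparison and difference operators, and the function pick_constant, its arguments, and the None fallback are hypothetical.

```python
import random


def pick_constant(op, candidates, injected, rng=random):
    """Pick a value for a missing constant, given the operator that links it
    to an already injected value. Returns None when no candidate qualifies."""
    if op == "<":      # the new constant must be smaller than the injected value
        pool = [c for c in candidates if c < injected]
    elif op == ">":    # the new constant must be larger than the injected value
        pool = [c for c in candidates if c > injected]
    elif op == "!=":   # difference: anything but the injected constant
        pool = [c for c in candidates if c != injected]
    else:
        raise ValueError(f"unsupported operator: {op}")
    if not pool:
        return None  # the caller may fall back to another option, e.g. relaxing the query
    return rng.choice(pool)


rng = random.Random(7)
# A threshold below an observed numeric value of 320, as in FILTER (?p1 > ?x).
print(pick_constant("<", [50, 120, 400, 480], 320, rng))
# A second feature that differs from the one already injected.
print(pick_constant("!=", ["feature1", "feature2", "feature3"], "feature1", rng))
```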
- With this in mind, the right strategy for q04 should be, in plain English:
1. Inject all placeholders tied to `sameAs` exclusively first.
2. Inject the constrained and filter placeholders of the BGP left of UNION.
3. Inject the constrained and filter placeholders of the BGP right of UNION.
- We will annotate this query for the benchmark. First, all placeholders %placeholder% will be replaced with a variable ?placeholder and then annotated with a const directive in the comment just above the triple pattern. The syntax is as follows:
const[EXCLUSIVE][PRIORITY] VARIABLE
EXCLUSIVE = "!": evaluate the template query with marked triple patterns exclusively first
PRIORITY = "*": triple patterns with more "*" are evaluated first
For example:
const ?placeholder # Replace placeholder
- The result is as follows:
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?product ?label ?propertyTextual
WHERE {
{
?product rdfs:label ?label .
# const!* ?ProductType
?product rdf:type ?localProductType .
?localProductType owl:sameAs ?ProductType .
# const!* ?ProductFeature1
?product bsbm:productFeature ?localProductFeature1 .
?localProductFeature1 owl:sameAs ?ProductFeature1.
# const** ?ProductFeature2 != ?ProductFeature1
?product bsbm:productFeature ?localProductFeature2 .
?localProductFeature2 owl:sameAs ?ProductFeature2.
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric1 ?p1 .
# const** ?x < ?p1
FILTER ( ?p1 > ?x )
} UNION {
?product rdfs:label ?label .
# const!* ?ProductType
?product rdf:type ?localProductType .
?localProductType owl:sameAs ?ProductType .
# const!* ?ProductFeature1
?product bsbm:productFeature ?localProductFeature1 .
?localProductFeature1 owl:sameAs ?ProductFeature1 .
# const* ?ProductFeature3 != ?ProductFeature2, ?ProductFeature1
?product bsbm:productFeature ?localProductFeature3 .
?localProductFeature3 owl:sameAs ?ProductFeature3 .
?product bsbm:productPropertyTextual1 ?propertyTextual .
?product bsbm:productPropertyNumeric2 ?p2 .
# const ?y < ?p2
FILTER ( ?p2 > ?y )
}
}
ORDER BY ?label
OFFSET 5
LIMIT 10
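The two annotation steps, turning %placeholder% into ?placeholder and reading the const directives, can be sketched as follows. This is a hedged illustration based only on the syntax and examples shown above; the regular expression, the ConstDirective type, and the helper names are assumptions and do not reproduce the project's parser.

```python
import re
from typing import NamedTuple, Optional, Tuple

# "# const[!][*...] ?var [operator operand[, operand...]]"
CONST = re.compile(
    r"#\s*const(?P<exclusive>!?)(?P<priority>\**)\s+(?P<var>\?\w+)"
    r"(?:\s+(?P<op>\S+)\s+(?P<args>.+))?\s*$"
)


class ConstDirective(NamedTuple):
    variable: str
    exclusive: bool
    priority: int
    operator: Optional[str]
    operands: Tuple[str, ...]


def to_variables(query: str) -> str:
    """Replace BSBM placeholders such as %ProductType% with ?ProductType."""
    return re.sub(r"%(\w+)%", r"?\1", query)


def parse_const(line: str) -> Optional[ConstDirective]:
    """Parse one comment line; return None if it carries no const directive."""
    m = CONST.search(line)
    if m is None:
        return None
    args = m.group("args")
    operands = tuple(a.strip() for a in args.split(",")) if args else ()
    return ConstDirective(m.group("var"), m.group("exclusive") == "!",
                          len(m.group("priority")), m.group("op"), operands)


lines = ["# const!* ?ProductType",
         "# const** ?x < ?p1",
         "# const* ?ProductFeature3 != ?ProductFeature2, ?ProductFeature1"]
directives = [parse_const(line) for line in lines]
# Exclusive directives first, then by decreasing number of "*".
ordered = sorted(directives, key=lambda d: (not d.exclusive, -d.priority))
print([d.variable for d in ordered])
```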
Note that triple patterns marked with @skip will be deactivated (commented out) by the execution engine.
To generate the queries, run the following command:
python rsfb/benchmark.py generate queries experiments/bsbm/config.yaml [OPTIONS]
OPTIONS:
--clean [benchmark|metrics|instances][+db]: clean the benchmark|metrics|instances then (optional) destroy all database containers
--touch : mark a phase as "terminated" so snakemake would not rerun it.
The queries are generated under experiments/bsbm/benchmark/generation, as illustrated below:
- injected.sparql: given the template query in experiments/bsbm/queries, this is the instantiated query with all placeholders replaced with real values.
- provenance[.opt].sparql: source selection query built by wrapping each triple pattern in injected.sparql with GRAPH. The .opt variant is an optimized version with better decomposition.
- provenance[.opt].csv: the results obtained by executing provenance[.opt].sparql.
- results.csv: the results obtained by executing injected.sparql.
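The idea of wrapping each triple pattern with GRAPH can be sketched as below. The sketch assumes that the triple patterns have already been extracted from injected.sparql as strings; the graph variable names ?tp1, ?tp2, ... and the projection of those variables are illustrative choices, not necessarily what FedShop emits.

```python
def wrap_with_graph(triple_patterns):
    """Wrap each triple pattern in its own GRAPH clause with a fresh graph variable.

    The bindings of the graph variables tell which source contributes to
    which triple pattern.
    """
    graph_vars = [f"?tp{i}" for i in range(1, len(triple_patterns) + 1)]
    body = "\n".join(
        f"  GRAPH {g} {{ {tp} }}" for g, tp in zip(graph_vars, triple_patterns)
    )
    return f"SELECT DISTINCT {' '.join(graph_vars)} WHERE {{\n{body}\n}}"


# Two triple patterns borrowed from q04; prefixes are omitted for brevity.
patterns = [
    "?product rdfs:label ?label .",
    "?product bsbm:productPropertyNumeric1 ?p1 .",
]
print(wrap_with_graph(patterns))
```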