Approximate Aggregate Queries on KGs

This repository contains the code used for the experiments described in the paper titled

Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling .

Requirements

The experiments have been run on an CentOS Linux release 7.3.1611 computer with a 2.1 GHz Xeon processor and 64GB RAM. All programs are written with java1.8 and make use of two external libraries Apache Commons Math, EJML.

Datasets and Query Workloads

Datasets: DBpedia dataset, Freebase dataset, Yago dataset, each one consists of an entity file and an edge file.

The experiment uses QALD-4, WebQuestions, and Synthetic queries as benchmarks for DBpedia, Freebase, and Yago datasets, respectively.

Example of the dataset (DBpedia)

edge file

entity1_id	entity2_id	entity1_name	predicate	entity2_name
3471751	1712537	Porsche_944	assembly	Germany

entity file

entity_id	entity_name	type
3471751	Porsche_944	automobile
1712537	Germany	country

Usage

Effectiveness and Efficiency Evaluation

Our Method

Our proposed method, with a specific entity, a predicate, a type of target entity and a aggregate function, can be run using the following command:

java -jar Approximate-Aggregation-Queries-jar-with-dependencies.jar <specific-entity> <predicate> <target-entity-type> <aggregate-function>

For the query example, "How many software has been developed by organizations founded in California?", we can run with the following command:

java -jar Approximate-Aggregation-Queries-jar-with-dependencies.jar California foundationPlace software COUNT

Output

For our method, we output the following statistical results and running time for each query round by round. Here is the output of the above query example:

Ground-truth : 124
-------------------Round 1-------------------
Approximate result : 115.23
Margin of error (MoE) : 1.52
Relative error (%) : 7.07%
-------------------Round 2-------------------
Approximate result : 124.58
Margin of error (MoE) : 0.71
Relative error (%) : 0.46%
---------------------------------------------
Response Time (ms) : 235 ms

Comparing Methods

We compare our method with different methods, that are, EAQ, GraB,QGA and SGQ. To run these algorithms, we need to run the following commands separately:

SGQ: java -jar BaseLineSGQ-jar-with-dependencies.jar <specific-entity> <predicate> <target-entities-type> <aggregate-function>
GraB: java -jar BaseLineGraB-jar-with-dependencies.jar <specific-entity> <predicate> <target-entities-type> <aggregate-function>
QGA: java -jar BaseLineQGA-jar-with-dependencies.jar <specific-entity> <predicate> <target-entities-type> <aggregate-function>
EAQ: python BaseLineEAQ.py <specific-entity> <target-entities-type> <aggregate-function>

For example:

java -jar BaseLineSGQ-jar-with-dependencies.jar California foundationPlace software COUNT

Output

For each method, we output the following statistical results and running time respectively. Here is the output of SGQ for the above query example:

Ground-truth : 124
---------------------------------------------
Approximate result : 110
Relative error (%) : 11.29%
---------------------------------------------
Response Time (ms) : 405 ms

Effect of Each Step on the Performance

The effect of each step on the performance (Section 7.3 of our paper) can be demonstrated using the command below by specifying a certain step (we use S1, S2 and S3 to represent each step):

java -jar Approximate-Aggregation-Queries-jar-with-dependencies.jar <specific-entity> <predicate> <target-entities-type> <aggregate-function> <which-phase>

For example, we can specific <which-phase> as S1 to see the effect of S1 on our method for each query.

java -jar Approximate-Aggregation-Queries-jar-with-dependencies.jar California foundationPlace software COUNT S1

Output

For each step, we output the following statistical results and running time. Here is the output when we change S1 from our semantic-aware random walk smapling to CNARW:

Ground-truth : 124
-------------------Round 1-------------------
Approximate result : 108.21
Margin of error (MoE) : 2.21
Relative error (%) : 12.73%
-------------------Round 2-------------------
Approximate result : 113.51
Margin of error (MoE) : 0.98
Relative error (%) : 8.46%
---------------------------------------------
Response Time (ms) : 501 ms

Interactive Performance

We study the interactive performance (Section 7.4) of our method by reducing the user-specific error bound e during the query processing, we need to run it using the following command:

java -jar InteractivePerformance-jar-with-dependencies.jar <specific-entity> <predicate> <target-entities-type> <aggregate-function>

For example:

java -jar InteractivePerformance-jar-with-dependencies.jar California foundationPlace software COUNT

Output

For each reduction of e, we output the additional runtime required for our method. Here is the output of the above query example:

-------------------user-specific error bound reduce from 0.05 to 0.04-------------------
add time : 32 ms
-------------------user-specific error bound reduce from 0.04 to 0.03-------------------
add time : 56 ms
-------------------user-specific error bound reduce from 0.03 to 0.02-------------------
add time : 24 ms
-------------------user-specific error bound reduce from 0.02 to 0.01-------------------
add time : 152 ms

Parameter Sensitivity

As mentioned in Section 7.5 of our paper, we study the sensitivity of our approach on some important parameters, including the user-specific error bound e, confidence level 1-α, repeat factor r, and desired sample ratio λ. We run this experiment through the following command by varying the values of parameters:

java -jar ParameterSensitivity-jar-with-dependencies.jar <a-specific-entity> <a-predicate> <target-entities-type> <aggregate-function> <error-bound> <confidence-level> <repeat-factor> <desired-sample-ratio>

For example:

java -jar ParameterSensitivity-jar-with-dependencies.jar California foundationPlace software COUNT 0.01 0.95 3 0.3

Output

For each parameter, we output the following statistical results and running time. Here is the output for user-specific error bound e=0.01, confidence level 1-α=0.95, repeat factor r=3, and desired sample ratio λ=0.3:

Ground-truth : 124
-------------------Round 1-------------------
Approximate result : 115.23
Margin of error (MoE) : 1.52
Relative error (%) : 7.07%
-------------------Round 2-------------------
Approximate result : 124.58
Margin of error (MoE) : 0.71
Relative error (%) : 0.46%
---------------------------------------------
Response Time (ms) : 235 ms

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Approximate-Aggregation-Queries-jar-with-dependencies.jar

Approximate-Aggregation-Queries-jar-with-dependencies.jar

BaseLineGraB-jar-with-dependencies.jar

BaseLineGraB-jar-with-dependencies.jar

BaseLineQGA-jar-with-dependencies.jar

BaseLineQGA-jar-with-dependencies.jar

BaseLineSGQ-jar-with-dependencies.jar

BaseLineSGQ-jar-with-dependencies.jar

InteractivePerformance-jar-with-dependencies.jar

InteractivePerformance-jar-with-dependencies.jar

ParameterSensitivity-jar-with-dependencies.jar

ParameterSensitivity-jar-with-dependencies.jar

README.md

README.md

Repository files navigation

Approximate Aggregate Queries on KGs

Requirements

Datasets and Query Workloads

Usage

Effectiveness and Efficiency Evaluation

Our Method

Comparing Methods

Effect of Each Step on the Performance

Interactive Performance

Parameter Sensitivity

About

Releases

Packages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commits
Approximate-Aggregation-Queries-jar-with-dependencies.jar		Approximate-Aggregation-Queries-jar-with-dependencies.jar
BaseLineGraB-jar-with-dependencies.jar		BaseLineGraB-jar-with-dependencies.jar
BaseLineQGA-jar-with-dependencies.jar		BaseLineQGA-jar-with-dependencies.jar
BaseLineSGQ-jar-with-dependencies.jar		BaseLineSGQ-jar-with-dependencies.jar
InteractivePerformance-jar-with-dependencies.jar		InteractivePerformance-jar-with-dependencies.jar
ParameterSensitivity-jar-with-dependencies.jar		ParameterSensitivity-jar-with-dependencies.jar
README.md		README.md

KGLab-HDU/ICDE-2022-Approximate-Aggregation-Queries

Folders and files

Latest commit

History

Repository files navigation

Approximate Aggregate Queries on KGs

Requirements

Datasets and Query Workloads

Usage

Effectiveness and Efficiency Evaluation

Our Method

Comparing Methods

Effect of Each Step on the Performance

Interactive Performance

Parameter Sensitivity

About

Resources

Stars

Watchers

Forks