This is a CMake project that allows one to perform directional (and linear) top-k queries using an R-Tree as index.
This project depends on the NLopt library for solving non-linear problems required to execute the directional queries.
- Perform sequential linear and directional top-k queries given a CSV file containing the dataset.
- Perform linear and directional top-k queries using an R-Tree index given a CSV file containing the dataset, a file containing the values of k to take into account and a file contaning the set of query weights of the queries to perform.
- Perform directional queries using the TIGHT method (I/O optimality of accesses).
- Perform directional queries using the LOOSE method (no non-linear problems to solve at the cost of higher I/O accesses).
- Track the performances of the queries (execution time, memory accesses, problems solved...).
In the main function, edit the following variables:
datasetPath
: path of the CSV file containing the dataset to process.kValuesFile
: path of the txt file containing the set of k values for the queries to perform.queriesFile
: path of the txt file containing the set of query weights for the queries to perform.
Edit the following macros:
DIM
: number of dimensions of the dataset to consider.BETA
: value of beta to use for the directional query.TREE METHOD
: set it to TIGHT or LOOSE .
For the sake of simplicity, we provide some examples of files used and the formatting required in order to properly run the project:
datasetPath
: CSV file containing the dataset to process. NOTE: values must be normalized in range [0,1] - Sample dataset.kValuesFile
: TXT file containing the set of k values for the queries to perform - Sample k.queriesFile
: TXT file containing the set of query weights for the queries to perform - Sample queries.
The output consists of a CSV file that will be located in the project root folder. This file, stats.csv
, contains statistics about the dataset used, the corresponding R-Tree and the mean of the results of the (linear and directional) queries performed.
This section presents the explanation of the columns of the output file.
n
: Cardinality of the dataset.d
: Dimensionality of the dataset.k
: Top-k value considered.Rtree boxes
: Number of boxes (or nodes) of the built Rtree.Rtree leaves
: Number of leaf nodes of the built Rtree.Method
: Method used to perform the directional queries (Either TIGHT or LOOSE).Linear RTree time [us]
: Time taken, on average, to perform the linear top-k queries with the value of k and the queries specified.Linear Rtree numPoint
: Number of points accessed, on average, over the linear top-k queries performed.Linear Rtree numLeaves
: Number of leaf nodes accessed of the R-tree, on average, over the linear top-k queries performed.Linear Rtree numBoxes
: Number of nodes (or boxes) accessed of the R-tree, on average, over the linear top-k queries performed.Linear Rtree Bounds Computed
: Number of bounds computed, on average, over the linear top-k queries performed to decide if we can stop visiting a subtree.Linear Rtree Bounds Computation Time [us]
: Time taken to solve the previously described problems, on average, over the linear top-k queries performed.Linear Rtree Scores
: Number of scores computed (it should coincide withLinear Rtree numPoint
).Linear Rtree Scores Computation Time [us]
: Time taken to solve the previously described scores, on average, over the linear top-k queries performed.Directional RTree time [us]
: Time taken, on average, to perform the directional top-k queries with the value of k and the queries specified.Directional Rtree numPoint
: Number of points accessed, on average, over the directional top-k queries performed.Directional Rtree numLeaves
: Number of leaf nodes accessed of the R-tree, on average, over the directional top-k queries performed.Directional Rtree numBoxes
: Number of nodes (or boxes) accessed of the R-tree, on average, over the directional top-k queries performed.Directional Rtree Problems
: Number of non-linear problems computed, on average, over the directional top-k queries performed to decide if we can stop visiting a subtree.Directional Rtree ProblemTime [us]
: Time taken to solve the previously described problems, on average, over the directional top-k queries performed.Directional Rtree Scores
: Number of directional query scores computed (it should coincide withDirectional Rtree numPoint
).Directional Rtree Scores Computation Time [us]
: Time taken to solve the previously described scores, on average, over the directional top-k queries performed.Linear RTtree Bounds Computed For Directional
: Number of bounds computed for the directional query with the LOOSE method.Linear RTtree Bounds Computed For Directional Computation Time [us]
: Time taken to solve the previously described bounds, on average, over the directional top-k queries performed with the LOOSE method.