PEDS is a framework that lets users estimate the price of data shared between buyers and sellers and generate explanations of the estimated price. It is an extension of GProM (https://github.com/IITDBGroup/gprom), a system that adds provenance support for complex queries on relational database systems. Provenance is information about how a query's result was produced by a sequence of database operations. That is, for each row returned by a query, we capture which rows of the input tables it was derived from and by which operations. PEDS builds on GProM's ability to rewrite input queries into queries that perform more complex actions. PEDS captures where- and how-provenance through annotations and their respective columns, and calculates a distance metric between two tuples during data integration. PEDS also provides meaningful top-k patterns as explanations; these patterns are extracted based on various metrics that determine each pattern's contribution to the estimated price.
To run a simple PEDS scenario, write a command in the following format:
- to estimate the price
- ./scripts/eig_run.sh ${log_level} "IG OF (${query});"
- Example: ./scripts/eig_run.sh 3 "IG OF (select * from owned o FULL OUTER JOIN shared s ON(o.county = s.county AND o.year = s.year));"
- to compute explanations
- ./scripts/eig_run.sh ${log_level} "IGEXPL TOP ${k} OF (${query});"
- Example: ./scripts/eig_run.sh 3 "IGEXPL TOP 10 OF (select * from owned o FULL OUTER JOIN shared s ON(o.county = s.county AND o.year = s.year));"
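The two command formats above can be assembled from shell variables, which helps avoid quoting mistakes when the query changes. This is a minimal sketch, not part of PEDS itself: it assumes it is run from the root of the checkout, reuses the placeholder names `log_level`, `k`, and `query` from the formats above, and takes its values from the examples. The existence check around `./scripts/eig_run.sh` is an added safeguard.

```shell
#!/bin/sh
# Values taken from the examples above; adjust them to your setup.
log_level=3
k=10
query='select * from owned o FULL OUTER JOIN shared s ON(o.county = s.county AND o.year = s.year)'

# Build the two statements in the documented formats.
ig_cmd="IG OF (${query});"
expl_cmd="IGEXPL TOP ${k} OF (${query});"

# Print the statements so they can be inspected before running.
printf '%s\n' "$ig_cmd"
printf '%s\n' "$expl_cmd"

# Run only if the script is present and executable (assumes the current
# directory is the repository root).
if [ -x ./scripts/eig_run.sh ]; then
    ./scripts/eig_run.sh "$log_level" "$ig_cmd"
    ./scripts/eig_run.sh "$log_level" "$expl_cmd"
fi
```

Each statement is passed as a single double-quoted argument, so the parentheses and semicolon reach the script unchanged.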
Below, we show sample data from a real-world Air Quality Index (AQI) dataset for the example queries above. This demo walks through a simple scenario to familiarize users with two PEDS functionalities:
- (i) computing the degree of new information, and
- (ii) showing meaningful patterns found after the integration step.
Sample data for owned:
year | county | dayswaqi | maqi |
-------------------------------------
2021 | Colbert | 274 | 200 |
2021 | Jackson | 366 | 200 |
2022 | Jefferson | 348 | 271 |
2022 | Autauga | 179 | 177 |
Sample data for shared:
year | county | gdays | maqi |
----------------------------------
2021 | Jackson | 85 | 156 |
2022 | Colbert | 66 | 200 |
2022 | Jefferson | 66 | 221 |
2021 | Colbert | 66 | 168 |
2022 | Autauga | 122 | 177 |
Output of the first command, which shows IG only:
year | county | dayswaqi | maqi | gdays | IG_year | IG_county | IG_dayswaqi | IG_maqi | IG_gdays | Total_IG |
-----------------------------------------------------------------------------------------------------------------
2021 | Colbert | 274 | 168 | 66 | 0 | 0 | 0 | 2 | 2 | 4 |
2021 | Jackson | 366 | 156 | 85 | 0 | 0 | 0 | 3 | 4 | 7 |
2022 | Jefferson | 348 | 221 | 66 | 0 | 0 | 0 | 5 | 2 | 7 |
2022 | Autauga | 179 | 177 | 122 | 0 | 0 | 0 | 0 | 5 | 5 |
2022 | Colbert | null | 200 | 66 | 0 | 0 | 0 | 3 | 2 | 5 |
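In the sample output above, the Total_IG value of each row equals the sum of its five IG_ columns. The sketch below checks that observation with awk on the rows copied from the table. It is only a consistency check on this sample, not a description of how PEDS computes the values internally; the variable names `rows` and `mismatches` are illustrative.

```shell
#!/bin/sh
# Rows copied verbatim from the sample output above.
rows='2021 | Colbert | 274 | 168 | 66 | 0 | 0 | 0 | 2 | 2 | 4 |
2021 | Jackson | 366 | 156 | 85 | 0 | 0 | 0 | 3 | 4 | 7 |
2022 | Jefferson | 348 | 221 | 66 | 0 | 0 | 0 | 5 | 2 | 7 |
2022 | Autauga | 179 | 177 | 122 | 0 | 0 | 0 | 0 | 5 | 5 |
2022 | Colbert | null | 200 | 66 | 0 | 0 | 0 | 3 | 2 | 5 |'

# Fields 6..10 are the IG_ columns and field 11 is Total_IG.
# Count the rows where the sum of the IG_ columns differs from Total_IG.
mismatches=$(printf '%s\n' "$rows" | awk -F'|' '
    { s = 0; for (i = 6; i <= 10; i++) s += $i; if (s != $11 + 0) n++ }
    END { print n + 0 }')

echo "rows where Total_IG differs from the column sum: $mismatches"
```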
Output of the second command, which shows the best patterns and the f_score by which they are ranked:
year | county | dayswaqi | maqi | gdays | imp | info | cov | f_score |
--------------------------------------------------------------------------
2022 | * | * | * | 66 | 12 | 2 | 2 | 11.29 |
* | Colbert | * | * | 66 | 9 | 2 | 2 | 10.29 |
* | * | * | * | 66 | 16 | 1 | 3 | 9.14 |
2021 | * | * | * | * | 17 | 1 | 3 | 5.87 |
2022 | * | * | * | * | 11 | 1 | 2 | 4.25 |
PEDS installation follows the installation of GProM. The wiki has detailed installation instructions. The installation follows the standard procedure using GNU build tools. Check out the git repository, install all dependencies, and run:
./autogen.sh
./configure
make
sudo make install