Skip to content

A paper to introduce a Shape Index in structured descentralized environment for Link Traversal Query Processing optimization

License

Notifications You must be signed in to change notification settings

constraintAutomaton/AWM-shape-index-short-paper

Repository files navigation

Opportunities for Shape-based Optimization of Traversal Queries over Decentralized Environments

Short paper submitted for the "Alberto Mendelzon International Workshop on Foundations of Data Management". The experiment repository is available via this hypermedia link.

Building a PDF

The authors compiled the PDF version using pdflatex (you can use your favorite latex compiler). We created a makefile to facilitate the building of the PDF version. One can simply execute make main.pdf or make to produce the PDF version if pdflatex and the other dependencies of the TeX Live suite are installed on the machine of the user.

Abstract

Data on the web is naturally unindexed and decentralized. Centralizing web data, particularly personal data, for real-world applications raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and understood. The two main difficulties of LTQP are the lack of apriori information about data sources and the high number of HTTP requests. Exploring decentralized-friendly ways to document unindexed networks of data sources could lead to solutions to alleviate those difficulties. RDF data shape is the state-of-the-art for data validation of linked data documents, thus it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP using RDF data shape mappings with linked data documents and measured its performance in a realistic setup. In this article, we present our algorithm and early results thus opening opportunities for further research for shape-based optimization of link traversal queries. Our initial experiments show that with little maintenance and work from the server, our method can reduce up to 80\% the execution time and 97\% the number of links traversed during realistic queries. Given our early results and the descriptive power of RDF data shapes it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.

Results

Figure presenting the main result.

figure displaying the main results The execution time with shape indexes is consistently lower (up to 80% with D1V3 and S1V3) or equal to with the type indexes (except for D3V3 and D3V4), and always uses fewer HTTP requests.

Conclusion

The shape index approach shows that more precise environment-aware source selection in LTQP can significantly reduce query execution time. It is still an early effort, but we think that a solution inspired by our approach could be beneficial for the query and publication of fragmented document-based linked data. The solution does not require a lot of computational power from the data publisher during queries and updates (considering no change in data model) of the data source. Additionally, we believe that using a shape index could help improve the interoperability of applications' data quality. There are still multiple questions left to be answered such as how to handle private data, what is the overhead and complexity of the method (given the expressiveness of RDF data shapes language~\cite{Delva2021, staworko_et_al:LIPIcs:2015:4985, 10.1007/978-3-319-68288-4_7} and practice in shape definitions~\cite{lieber_iswc_poster_2020, staworko_et_al:LIPIcs:2015:4985, Staworko2018ContainmentOS} ), can the shape index alone or with data summarisation structure be used to improve query planning without sacrificing query execution times?

About

A paper to introduce a Shape Index in structured descentralized environment for Link Traversal Query Processing optimization

Resources

License

Stars

Watchers

Forks