pkumod/gStore

English | Chinese | Website | Website (Chinese)

gStore is an open-source graph database engine (or "triple store") built for managing large RDF datasets with the SPARQL query language. It runs on Linux systems with amd64, arm64, and loongarch processors. gStore is a collaborative effort between the Data Management Lab of Peking University, the University of Waterloo, and awesome contributors from the open-source community.

🔑 gStore is released under the BSD 3-Clause License, with several third-party libraries under their own licenses. Check LICENSE for details.

🐛 Check out FAQ for frequently asked questions. Known bugs and limitations are listed in BUGS and LIMIT. If you find any bugs, please feel free to open an issue.

🎤 If you have any questions or suggestions, please open a thread in GitHub Discussions.

📖 For recommendations, the project roadmap, and more, check the online documentation.

The formal help document is available in English (EN) and Chinese (ZH).

The formal experiment results are in Experiment.

We have set up an IRC channel named #gStore on freenode, and you can also visit the homepage of gStore.

Get gStore

gStore is also available on gitee (code cloud), which is recommended for faster downloads for users in mainland China. The website is https://gitee.com/PKUMOD/gStore.

You can also open https://github.com/pkumod/gStore, download gStore.zip, then decompress the zip package.

From Docker

$ docker pull pkumodlab/gstore-docker:latest

Complete instructions are in the Docker Deployment Instructions.

From Source

To compile gStore, first clone the repository:

git clone https://github.com/pkumod/gStore.git

Complete instructions are in the Installation Instructions.

Quick Start

N-Triple Data format introduction

RDF data should be provided in N-Triple format (XML is not currently supported), and queries must be provided in SPARQL 1.1 syntax. The following is an example of an N-Triple file:

_:a  <http://xmlns.com/foaf/0.1/name>   "Johnny Lee Outlaw" .
_:a  <http://xmlns.com/foaf/0.1/mbox>   <mailto:jlow@example.com> .
_:b  <http://xmlns.com/foaf/0.1/name>   "Peter Goodguy" .
_:b  <http://xmlns.com/foaf/0.1/mbox>   <mailto:peter@example.org> .
_:c  <http://xmlns.com/foaf/0.1/mbox>   <mailto:carol@example.org> .

Triples are typically stored in the W3C-defined NT file format, one triple per line. Values wrapped in < and > are IRIs of entities, and values wrapped in double quotes are literals representing the value of an attribute of the entity, optionally followed by ^^ and a datatype IRI to indicate the type of the value. The following three triples describe John: the first two give two attributes, gender and age, with values of male and 28 respectively, and the last one indicates that John and Li have a friend relationship.

<John> <gender> "male"^^<http://www.w3.org/2001/XMLSchema#string>.
<John> <age> "28"^^<http://www.w3.org/2001/XMLSchema#int>.
<John> <friend> <Li>.
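
As a rough illustration of the term syntax described above, here is a minimal Python sketch that splits one N-Triple line into subject, predicate, and object and classifies each term. It is not part of gStore: the names `parse_triple` and `describe_term` are made up for this example, and the regular expressions cover only the simple cases shown in this section, not the full W3C grammar.

```python
import re

# Simplified patterns (an assumption of this sketch, not the full W3C grammar):
# an IRI is <...>, a blank node is _:label, and a literal is a double-quoted
# string with an optional ^^<datatype> or @lang suffix.
TERM = r'<[^>]*>|_:\w+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z-]+)?'
TRIPLE_RE = re.compile(rf'^\s*({TERM})\s+(<[^>]*>)\s+({TERM}|{LITERAL})\s*\.\s*$')
LITERAL_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:\^\^<([^>]*)>|@([A-Za-z-]+))?$')


def parse_triple(line):
    """Split one triple line into (subject, predicate, object) strings."""
    match = TRIPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognizable triple: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def describe_term(term):
    """Classify a term as an IRI, a blank node, or a literal."""
    if term.startswith("<"):
        return {"kind": "iri", "value": term[1:-1]}
    if term.startswith("_:"):
        return {"kind": "blank", "value": term[2:]}
    match = LITERAL_RE.match(term)
    if match is None:
        raise ValueError(f"not a recognizable term: {term!r}")
    value, datatype, lang = match.groups()
    return {"kind": "literal", "value": value, "datatype": datatype, "lang": lang}


if __name__ == "__main__":
    sample = '<John> <age> "28"^^<http://www.w3.org/2001/XMLSchema#int> .'
    for term in parse_triple(sample):
        print(describe_term(term))
```

A line that does not match the simplified pattern raises `ValueError`, which makes the sketch usable as a quick sanity check on small files before handing them to `bin/gbuild`.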

For more details about the N-Triple format, please check N-Triple. Not all SPARQL 1.1 syntax is parsed and answered by gStore; for example, property paths are beyond the capabilities of the gStore system.

Initialize the system database

bin/ginit

Create database

bin/gbuild -db lubm -f data/lubm/lubm.nt 

Database list

bin/gshow

Database query

bin/gquery -db lubm -q data/lubm/lubm_q0.sql 

Complete instructions are in the Quick Start.
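
For scripting, the four commands above can be wrapped in a small Python helper. This is a sketch, not a gStore tool: the function names and the `dry_run` parameter are hypothetical, and only the flags shown above (`-db`, `-f`, `-q`) are used. It assumes the script runs from the gStore root directory so that `bin/` resolves.

```python
import subprocess


def quickstart_commands(db_name, data_file, query_file, bin_dir="bin"):
    """Return the Quick Start commands as argument lists, in the order shown above."""
    return [
        [f"{bin_dir}/ginit"],                                   # initialize the system database
        [f"{bin_dir}/gbuild", "-db", db_name, "-f", data_file],  # create the database
        [f"{bin_dir}/gshow"],                                   # list databases
        [f"{bin_dir}/gquery", "-db", db_name, "-q", query_file],  # run a query file
    ]


def run_quickstart(db_name, data_file, query_file, bin_dir="bin", dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    commands = quickstart_commands(db_name, data_file, query_file, bin_dir)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a command exits non-zero.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_quickstart("lubm", "data/lubm/lubm.nt", "data/lubm/lubm_q0.sql")
```

With the default `dry_run=True` the helper only prints the commands, so it is safe to run on a machine without gStore installed. Passing `dry_run=False` executes them in order and stops at the first failure.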

Cite gStore

If you use gStore in your research, please cite the following paper:

@article{zou2014gstore,
  title={gStore: a graph-based SPARQL query engine},
  author={Zou, Lei and {\"O}zsu, M Tamer and Chen, Lei and Shen, Xuchuan and Huang, Ruizhe and Zhao, Dongyan},
  journal={The VLDB Journal},
  volume={23},
  pages={565--590},
  year={2014},
  publisher={Springer}
}

Or cite this repository:

@misc{gStore,
  author = {gStore Authors},
  title = {gStore},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/pkumod/gStore}},
}

Change log

1.2(stable):2023-11-11

New features in gStore 1.2 are listed as follows:

  • Optimized ORDER BY statements: streamlined the execution logic of ORDER BY and removed unnecessary type checks and conversions, significantly improving execution efficiency.
  • Optimized build module: supports building empty databases.
  • Optimized triple parser: supports purely numeric IRIs, IRIs consisting only of numbers and letters, and IRIs starting with numbers.
  • New API interfaces: the ghttp and gRPC services in gStore 1.2 add five interfaces for uploading files, downloading files, counting system resources, renaming, and obtaining backup paths.
  • New built-in advanced functions: gStore 1.2 adds seven advanced functions, namely single-source shortest path (SSSP, SSSPLen), label propagation (labelProp), weakly connected components (WCC), global/local clustering coefficient (clusteringCoeff), the Louvain algorithm (louvain), K-hop count (kHopCount), and K-hop neighbor (kHopNeighbor).
  • Added support for calling CONCAT functions in SELECT statements.
  • Optimized some local commands and API interfaces: optimized the local command gconsole and the interfaces for building, loading, and collecting statistics on graph databases, and fixed potential bugs that could lead to memory leaks.
  • Support for multiple data formats: added support for formats such as Turtle, TriG, RDF/XML, RDFa, and JSON-LD.
  • Optimized the custom graph analysis algorithm editing function: redesigned its interface, optimized the dynamic compilation of algorithms, and improved compilation efficiency.
  • Bug fixes: fixed a series of bugs.

1.0:2022-10-01

New features in gStore 1.0 are listed as follows:

  • Support of user-defined graph analysis functions: users can manage their own graph analysis functions through the API interfaces or the visual management platform gStore-workbench. Users can obtain the number of nodes and edges of the graph and neighbors of any given node, etc. through interface functions and use them as basic units to implement their own graph analysis functions. Dynamic compilation and execution of user-defined graph analysis functions are supported.
  • The gRPC network interface service: gRPC is a high-performance network interface service based on the HTTP protocol and implemented on top of the open-source library workflow, which further improves the efficiency and stability of the interface service. Experiments show that gRPC greatly improves concurrent access performance compared with ghttp, the previous network interface; for example, at 2000 QPS, the rate of denied access is 0%.
  • gConsole module: in gStore 1.0, we launched the gConsole module, which enables the long-session operation of gStore with contextual information.
  • Decoupling of the optimizer and executor: gStore 1.0 decouples the optimizer and executor, converting from the original deeply coupled greedy strategy to a query optimizer based on dynamic programming and a query executor based on breadth-first traversal.
  • Optimization of Top-K queries: We implemented a Top-K SPARQL processing framework based on the DP-B algorithm in gStore, including query segmentation and sub-result aggregation.
  • Support of ACID transactions: by introducing the multi-version management mechanism, gStore 1.0 can start ACID transactions for insert and delete operations, which users can open, commit, and roll back. Currently gStore 1.0 supports four isolation levels: read-uncommitted, read-committed, repeatable read and serializable.
  • Reconstruction of database kernel and optimization of the plan tree generation logic: in gStore 1.0, two types of join operations (worst-case-optimal joins and binary joins) are introduced to optimize query execution and further improve query efficiency.
  • Optimized logging module: based on the log4cplus library, the system logs can be output in a unified format. Users can configure the log output mode (console output or file output), output format, and output level.
  • New built-in advanced functions: gStore 1.0 supports four new advanced functions, namely triangleCounting, closenessCentrality, bfsCount and kHopEnumeratePath.
  • Extended support for BIND statements: gStore 1.0 supports assigning values to variables using algebraic or logical expressions in BIND statements.
  • Optimization of some local commands and API interfaces (e.g., the shutdown command), and fixing a series of bugs (e.g., more accurate gmonitor statistics).

0.9.1:2021-11-25

New features in gStore 0.9.1 are listed as follows:

  • Decoupling the parsing and execution of queries in the kernel, and further improvements in query performance through optimized join ordering and other techniques. On complex queries, performance is improved by over 40%.
  • Rewriting of the HTTP service component, ghttp, with improved robustness and the addition of functions such as user permission, heartbeat detection, batch import, and batch deletion; API documents are added.
  • Implementation of the Personalized PageRank (PPR) extension function, which can be invoked in the SELECT clause to calculate the correlation between entities.
  • Support for arithmetic operations (e.g., ?x + ?y = 5) in the FILTER clause.
  • Support for transactional operations, such as begin, tquery (transactional query), commit, and rollback.
  • A new executable component, gserver, which implements two-way communication via the socket API, is added to provide another pathway for remote access to gStore aside from the ghttp component.
  • Unification of the format of command line arguments of executable components. The --help option is uniformly introduced (e.g., $ bin/gbuild --help or $ bin/gbuild -h), by which users can view the command manual including the meaning of each option.
  • A number of bug fixes.

0.9:2021-02-10

New features in version 0.9 include:

  • Upgrade of the SPARQL parser generator from ANTLR v3 to the newest, well-documented and well-maintained v4;
  • Support for writing numeric literals without datatype suffixes in SPARQL queries;
  • Support for arithmetic and logical operators in the SELECT clause;
  • Support for the aggregates SUM, AVG, MIN and MAX in the SELECT clause;
  • Additional support for built-in functions in FILTERs, including datatype, contains, ucase, lcase, strstarts, now, year, month, day, and abs;
  • Support for path-related functions as an extension of SPARQL 1.1, including cycle detection, shortest paths and K-hop reachability;
  • Support for full & incremental backup and recovery of databases, and automatic full backup can be enabled upon admin configuration;
  • Support for log-based rollback operations;
  • Support for transactions with three levels of isolation: read committed, snapshot isolation and serializable;
  • Expanding data structures to hold large-scale graphs of up to five billion triples.
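
To make the SPARQL-related items in this list concrete, the sketch below writes three small query files that could be passed to bin/gquery with -q. The queries use standard SPARQL 1.1 syntax; the predicates `<age>` and `<name>`, the file names, and the helper `write_query_files` are invented for illustration, and the exact forms gStore accepts may differ.

```python
from pathlib import Path

# Hypothetical queries over made-up predicates <age> and <name>.
QUERIES = {
    # Numeric literal (18) written without a datatype suffix.
    "adults": "SELECT ?p WHERE { ?p <age> ?a . FILTER(?a >= 18) }",
    # Aggregates in the SELECT clause.
    "age_stats": "SELECT (AVG(?a) AS ?avg) (MAX(?a) AS ?max) WHERE { ?p <age> ?a . }",
    # Built-in string function inside a FILTER.
    "names_with_j": 'SELECT ?p WHERE { ?p <name> ?n . FILTER(STRSTARTS(?n, "J")) }',
}


def write_query_files(out_dir):
    """Write each query to <out_dir>/<name>.sql and return the created paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, query in sorted(QUERIES.items()):
        path = out_dir / f"{name}.sql"  # .sql mirrors data/lubm/lubm_q0.sql
        path.write_text(query + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_query_files("example_queries"):
        print(path)
```

Each generated file holds a single query, matching the one-query-per-file form used by data/lubm/lubm_q0.sql in the Quick Start.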

Advanced Help

If you want to understand the details of the gStore system, or try some advanced operations (for example, using the API or server/client), please see the chapters below.


Other Business

Bugs are recorded in BUG REPORT. If you discover a bug that is not listed in that file, you are welcome to submit it by asking on the Community Web.

We have written a series of short essays addressing recurring challenges in using gStore to realize applications, which are placed in Recipe Book.

You are welcome to report any advice or errors in the GitHub Issues section of this repository if you do not need a timely reply. If your report is urgent, please email your suggestions and bug reports to gstore@pku.edu.cn. A full list of our team is in Mailing List.

There are some restrictions when using the current gStore project; you can see them in Limit Description.

If you run into strange behavior that is not necessarily an error, or something that is hard to understand or solve, do not hesitate to visit the Frequently Asked Questions page.

Graph database engines are a new area, and we are still trying to go further. The things we plan to do next are in the Future Plan chapter, and we hope more and more people will support or even join us. You can support us in many ways:

  • watch/star our project

  • fork this repository and submit pull requests to us

  • download and use this system, report bugs or suggestions

  • ...

People who inspire us or contribute to this project will be listed in the Thanks List chapter.