GRIDS Driver #221

Merged
kellrott merged 73 commits into master from grids
Dec 23, 2019

@kellrott (Member) commented Nov 16, 2019

New alpha graph database driver that uses both a key-value store and a hash store in an attempt to optimize performance.

This patch incorporates new bulk load mechanisms as well as a new GRID (GRaph Information Database) driver.

The basic improvements in the GRIDS driver are:

  • It uses an internal vertex and edge key system (rather than user-provided GIDs). These internal keys are based on uint64. User GIDs are translated on the fly to the internal keys, which are easier to parse and manipulate.
  • Vertex and edge data are now stored in a hash table based on github.com/akrylysov/pogreb. Edge indices, vertex lists, and index data are stored in the Badger KV store.
  • Data attached to edges and vertices is only loaded when needed.
  • It has a query pipeline analysis step that identifies where multiple sequential traversal steps occur, so that it can switch to a separate processor that loads no data and works entirely in the uint64 keyspace.
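
As a rough illustration of the GID translation described in the first item above, the sketch below maps user GIDs to sequential uint64 keys and back. The `KeyTable` type and its methods are hypothetical names invented for this example, and everything is kept in memory, whereas the driver persists its data in pogreb and Badger. It is a sketch of the idea, not the driver's code.

```go
package main

import "fmt"

// KeyTable is a hypothetical, in-memory stand-in for the GID <-> uint64
// translation. The real driver persists this mapping on disk.
type KeyTable struct {
	next  uint64
	toKey map[string]uint64
	toGID map[uint64]string
}

// NewKeyTable returns an empty table.
func NewKeyTable() *KeyTable {
	return &KeyTable{toKey: map[string]uint64{}, toGID: map[uint64]string{}}
}

// GetOrCreate returns the internal key for gid, assigning the next free
// uint64 the first time a GID is seen.
func (kt *KeyTable) GetOrCreate(gid string) uint64 {
	if k, ok := kt.toKey[gid]; ok {
		return k
	}
	kt.next++
	kt.toKey[gid] = kt.next
	kt.toGID[kt.next] = gid
	return kt.next
}

// GID translates an internal key back to the user-provided GID.
func (kt *KeyTable) GID(key uint64) (string, bool) {
	gid, ok := kt.toGID[key]
	return gid, ok
}

func main() {
	kt := NewKeyTable()
	a := kt.GetOrCreate("vertex:A")
	b := kt.GetOrCreate("vertex:B")
	again := kt.GetOrCreate("vertex:A") // same key as the first call
	gid, _ := kt.GID(b)
	fmt.Println(a, b, again, gid)
}
```

Fixed-width integer keys compare and encode more cheaply than arbitrary strings, which is presumably why traversals that never need the attached data can stay in this keyspace.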

Changes include:

  • Added a BulkAdd method to the GDB graph interface, which allows a stream of graph elements to be added. Most drivers defer to the new util.StreamBatch method to push streamed changes in as chunked transactions.
  • Added a Badger-specialized BulkAdd that uses the Badger batch write method.
  • Added rate counter output to the kvload tool.
  • Reworked the pipeline compiler so that it is more modular and easier to augment.
  • Added engine/inspect package that provides pipeline introspection tools for identifying areas of optimization
  • Updated several dependencies
  • Added several benchmarking tests under grids/benchmark to understand how much different data access operations cost
  • Added the concept of 'paths', which are several sequential steps in a traversal, e.g. out-out-out or out-hasLabel(x)-out-out-hasLabel(y). These are mostly used to identify multiple steps where loading the data attached to the vertices and edges is not required.
  • Added Python 3.7 support to the conformance test.
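
To make the "chunked transactions" idea behind BulkAdd more concrete, here is a minimal sketch of stream batching. The `Element` type, the `streamBatch` function, and its signature are assumptions made for this example. They are not the actual `util.StreamBatch` API, which is not shown in this PR description.

```go
package main

import "fmt"

// Element is a hypothetical stand-in for a graph element (vertex or edge).
type Element struct {
	GID string
}

// streamBatch reads elements from in, groups them into chunks of at most
// batchSize, and passes each chunk to write (one transaction per chunk).
// It returns the first write error. The caller must ensure the producer
// does not block forever if streamBatch returns early.
func streamBatch(in <-chan Element, batchSize int, write func([]Element) error) error {
	if batchSize < 1 {
		batchSize = 1
	}
	batch := make([]Element, 0, batchSize)
	for e := range in {
		batch = append(batch, e)
		if len(batch) == batchSize {
			if err := write(batch); err != nil {
				return err
			}
			// Start a fresh slice so write may keep the old chunk.
			batch = make([]Element, 0, batchSize)
		}
	}
	// Flush the final, possibly short, chunk.
	if len(batch) > 0 {
		return write(batch)
	}
	return nil
}

func main() {
	in := make(chan Element, 5)
	for i := 0; i < 5; i++ {
		in <- Element{GID: fmt.Sprintf("v%d", i)}
	}
	close(in)

	err := streamBatch(in, 2, func(chunk []Element) error {
		fmt.Println("writing chunk of", len(chunk))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With five elements and a batch size of two, the callback is invoked with chunks of 2, 2, and 1 elements.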
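
The 'paths' item above can be illustrated with a small sketch that splits a pipeline into runs of steps that need no attached data. The `Step` and `Segment` types, the `splitPaths` function, and the choice of which steps count as data-free (`out`, `in`, `hasLabel`) are all assumptions for illustration. The actual logic in the pipeline compiler and engine/inspect may differ.

```go
package main

import "fmt"

// Step is a hypothetical, simplified traversal step.
type Step struct {
	Name string // e.g. "V", "out", "hasLabel", "has"
	Arg  string
}

// needsNoData reports whether a step can run on keys and labels alone.
// Which steps qualify is an assumption made for this sketch.
func needsNoData(s Step) bool {
	switch s.Name {
	case "out", "in", "hasLabel":
		return true
	}
	return false
}

// Segment is a run of consecutive steps. KeyOnly marks a "path" that
// could be handled without loading vertex or edge data.
type Segment struct {
	Steps   []Step
	KeyOnly bool
}

// splitPaths groups a pipeline into segments. A run of two or more
// data-free steps becomes a key-only segment; all other steps are
// collected into ordinary segments.
func splitPaths(steps []Step) []Segment {
	var out []Segment
	i := 0
	for i < len(steps) {
		j := i
		for j < len(steps) && needsNoData(steps[j]) {
			j++
		}
		if j-i >= 2 {
			out = append(out, Segment{Steps: steps[i:j], KeyOnly: true})
			i = j
			continue
		}
		// Not part of a path: extend or start an ordinary segment.
		if n := len(out); n > 0 && !out[n-1].KeyOnly {
			out[n-1].Steps = append(out[n-1].Steps, steps[i])
		} else {
			out = append(out, Segment{Steps: []Step{steps[i]}})
		}
		i++
	}
	return out
}

func main() {
	pipeline := []Step{
		{"V", ""}, {"out", ""}, {"hasLabel", "x"}, {"out", ""},
		{"out", ""}, {"hasLabel", "y"}, {"has", "name"},
	}
	for _, seg := range splitPaths(pipeline) {
		fmt.Println(seg.KeyOnly, len(seg.Steps))
	}
}
```

For the example pipeline this yields three segments: the initial `V` step, a five-step key-only path matching out-hasLabel(x)-out-out-hasLabel(y), and the trailing `has` step.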

@adamstruck (Contributor) commented Dec 19, 2019

The following files need to be modified so that they use grip's logger rather than logrus:

  • grids/benchmark/insert_test.go
  • grids/benchmark/edge_test.go
  • grids/index.go
  • grids/compiler.go
  • grids/new.go
  • grids/graph.go
  • grids/schema.go
  • engine/inspect/inspect.go

@kellrott kellrott merged commit 79779fd into master Dec 23, 2019
@kellrott kellrott deleted the grids branch October 6, 2020 05:56