
Update in-memory backend documentation #1934

Merged

dk-github merged 1 commit into JanusGraph:master from dk-github:issue_1929_update_inmemory_backend_docs

Jan 27, 2020

Contributor

dk-github commented Jan 20, 2020 (edited)

Update in-memory backend documentation to explain possible production use cases, limitations and alternatives (issue #1929) [doc only]

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there an issue associated with this PR? Is it referenced in the commit message?
Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For documentation related changes:

Have you ensured that the format looks appropriate for the output in which it is rendered?
If this PR is a documentation-only change, have you added a [doc only]
tag to the first line of your commit message to avoid spending CPU cycles in
Travis CI when no code, tests, or build configuration are modified?

Note:

Please ensure that once the PR is submitted, you check Travis CI for build issues and submit an update to your PR as soon as possible.

janusgraph-bot added the cla: yes label

dk-github requested a review from porunov

January 20, 2020 20:37

dk-github force-pushed the issue_1929_update_inmemory_backend_docs branch from fe9a875 to a2bb322

January 20, 2020 20:40

dk-github mentioned this pull request

Clarification on InMemory Storage Backend Documentation #1836

Closed

porunov approved these changes


Member

porunov left a comment

LGTM. Thank you @dk-github !

porunov added this to the Release v0.5.0 milestone

li-boxuan reviewed


Member

li-boxuan left a comment

Nice documentation! Helps me understand JanusGraph better. Just a few points:

docs/storage-backend/inmemorybackend.md Outdated

               Shutting down the graph or terminating the process that hosts the
               JanusGraph graph will irrevocably delete all data from the graph. This
               backend is local to a particular JanusGraph graph instance and cannot be
               shared across multiple JanusGraph graphs.
               Ideal Use Case
               --------------
+              Rapid testing:

Member

li-boxuan Jan 21, 2020

Can you maybe use markdown ### syntax so that the generated doc website can show the table of contents correctly like https://docs.janusgraph.org/storage-backend/cassandra/#setup-cassandra-cluster?

Contributor Author

dk-github Jan 24, 2020

done

docs/storage-backend/inmemorybackend.md

+              - Loss of data due to _unexpected_ death of host process is acceptable (the backend provides a simple mechanism for making fast snapshots to handle _expected_ restarts)
+              - Size of the graph data makes it possible to host it in a single JVM process (i.e. a few tens of Gigabytes max, unless you use a specialized JVM and hardware)
+              - Higher performance is required, but no expertise/resources available to tune more complex backends. Due to its memory-only nature, in-memory backend typically performs faster than disk-based ones, in queries using simple indices
+              and in graph modifications. However it is not specifically optimized for performance, and does not support advanced indexing functionality.

Member

li-boxuan Jan 21, 2020

It is not very clear to me whether 'advanced indexing functionality' refers to any index, or only to mixed indices (which require an index backend and thus can be considered 'advanced').

Contributor Author

dk-github Jan 24, 2020

It vaguely refers both to the fact that there is no in-memory index backend (so there are no mixed indices if you use only the in-memory backend), and to the fact that simple indices just take advantage of the ordering of the row and column keys and essentially do a binary search. There are no additional data structures specifically for indices, and there is no underlying database whose native indexing can be utilised.
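To make the "binary search over sorted keys" point concrete, here is a minimal Java sketch. It is hypothetical and not the backend's actual code: the class `SortedColumns` and its methods are invented for illustration, and `String` keys stand in for the binary keys the backend really works with. A slice query over the half-open range [start, end) needs only two binary searches on the sorted key array, with no extra index structure.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration: a row whose column keys are kept sorted,
 * so a range (slice) query is answered by binary search, not a full scan.
 */
final class SortedColumns {
    private final String[] keys; // column keys, kept in sorted order

    SortedColumns(String... keys) {
        this.keys = keys.clone();
        Arrays.sort(this.keys);
    }

    /** Index of the first key >= target (lower-bound binary search). */
    private int lowerBound(String target) {
        int lo = 0;
        int hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid].compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns all keys in the half-open range [start, end). */
    List<String> slice(String start, String end) {
        int from = lowerBound(start);
        int to = Math.max(from, lowerBound(end)); // empty result if end <= start
        return Arrays.asList(Arrays.copyOfRange(keys, from, to));
    }
}
```

For example, `slice("name:", "name;")` returns every key with the `name:` prefix, because `;` sorts immediately after `:`.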

docs/storage-backend/inmemorybackend.md

+              Limitations:
+              - Obviously the scalability is limited to the heap size of a single JVM, and no transparent resilience to failures is offered
+              - The backend offers store-level locking only, whereas a Janusgraph transaction typically changes multiple stores (e.g. vertex store and index store).

Member

li-boxuan Jan 21, 2020

Can you elaborate on what 'store-level locking' means? Does it mean 'JanusGraphManagement.setConsistency(element, ConsistencyModifier.LOCK)' won't work for this backend?

Contributor Author

dk-github Jan 24, 2020

It means that the backend uses a Java read-write lock to lock the individual store it modifies for the duration of the modification, which prevents concurrent modifications of the same store. However, it does not attempt to lock all the stores involved in one JanusGraph/backend transaction, so individual store updates from transactions being committed in parallel can interleave.

So say tx1 updates the index store, but before it can update the vertex store accordingly, tx2 updates the index store with its own data. As long as this data is completely different from tx1's, this is actually fine and significantly reduces lock contention. But if they modify the same data, you can get "ghost vertices", "missing vertices", etc.
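The interleaving described above can be replayed deterministically in a short Java sketch. `LockedStore` and `InterleavingDemo` are hypothetical names, not backend classes: each store is guarded by its own `ReentrantReadWriteLock`, and the two concurrent commits are replaced by a fixed sequential order of calls so the outcome is reproducible. Every individual `put` is atomic, yet the index store and the vertex store end up disagreeing about `v1`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical store guarded by its own read-write lock (store-level locking). */
final class LockedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        lock.writeLock().lock(); // excludes other accessors of THIS store only
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}

final class InterleavingDemo {
    /** Replays one possible interleaving of two commits that touch the same key. */
    static String[] run() {
        LockedStore index = new LockedStore();
        LockedStore vertex = new LockedStore();
        index.put("v1", "tx1");  // tx1 updates the index store...
        index.put("v1", "tx2");  // ...but tx2 updates the index store first
        vertex.put("v1", "tx2"); // tx2 finishes its commit
        vertex.put("v1", "tx1"); // tx1 finally updates the vertex store
        // Each put was atomic, but the two stores now disagree about v1.
        return new String[] {index.get("v1"), vertex.get("v1")};
    }
}
```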

Note that a naive implementation that locks all the stores at the beginning of a transaction can deadlock unless the stores are always locked in the same order, and can cause high contention with a large number of parallel transactions, because half of the stores would be locked all the time. Basically, implementing this correctly and efficiently would amount to implementing a robust in-memory database engine, which is not the intent of this backend at all.
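For contrast, the "lock all stores in the same order" approach mentioned above might look roughly like the following hypothetical Java sketch (`OrderedStoreLocker` is an invented name; this is not something the in-memory backend does). Sorting the stores by name gives every transaction the same acquisition order, which rules out lock cycles between transactions. It does nothing about the contention problem, since every store a transaction touches stays write-locked for the whole commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class OrderedStoreLocker {
    /** Hypothetical store handle: a name plus its lock. */
    static final class Store {
        final String name;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        Store(String name) {
            this.name = name;
        }
    }

    /** Runs body while holding the write locks of all given stores, acquired in name order. */
    static void withAllLocked(List<Store> stores, Runnable body) {
        List<Store> ordered = new ArrayList<>(stores);
        // One global order for every transaction => no lock cycles, hence no deadlock.
        ordered.sort(Comparator.comparing((Store s) -> s.name));
        int acquired = 0;
        try {
            for (Store s : ordered) {
                s.lock.writeLock().lock();
                acquired++;
            }
            body.run();
        } finally {
            // Release only what was acquired, in reverse order.
            for (int i = acquired - 1; i >= 0; i--) {
                ordered.get(i).lock.writeLock().unlock();
            }
        }
    }
}
```

Two transactions that list the stores as `[index, vertex]` and `[vertex, index]` both still acquire `index` first, so they cannot deadlock each other.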

docs/storage-backend/inmemorybackend.md

+              however this can happen - e.g. when a large heap nears saturation and the GC pause exceeds configured backend timeout.
+              - The data layout used by the backend can theoretically be susceptible to fragmentation in certain scenarios
+               (with a lot of add/delete operations), thus reducing the amount of useful data that can be stored in a heap
+                of specified size. The backend provides simple mechanisms to report fragmentation and defragment the storage if required.

Member

li-boxuan Jan 21, 2020

Are those mechanisms documented somewhere? It would be nice to have a short description on how these mechanisms can be set up/activated.

Contributor Author

dk-github Jan 24, 2020

There is currently no support for activating these via configuration: one would have to dig into the code a little to see how they can be invoked when needed. If and when common patterns for applying them to generic use cases emerge, these patterns can be made configurable. For now, they can simply be ignored unless there is a suspicion that fragmentation might be a problem for a specific use case.

docs/storage-backend/inmemorybackend.md Outdated


          [doc only] Updated in-memory backend documentation

f869a5e

to explain possible production use cases, limitations and alternatives (issue JanusGraph#1929)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

dk-github force-pushed the issue_1929_update_inmemory_backend_docs branch from 3f2561e to f869a5e

January 24, 2020 14:08

Contributor Author

dk-github commented Jan 24, 2020

Thank you @porunov, I will merge this early next week if there are no objections or further comments.

dk-github merged commit 8b339df into JanusGraph:master

dk-github deleted the issue_1929_update_inmemory_backend_docs branch

January 27, 2020 09:47

porunov mentioned this pull request

Document production in memory storage #1929

Closed

LinhaoZhu added a commit to LinhaoZhu/janusgraph that referenced this pull request


          Update the latest code (#1)

058a4a9

* Issue JanusGraph#1871: Close graph instance at end of mapper run.

Signed-off-by: Ted Wilmes <ted.wilmes@experoinc.com>

* Fixed broken dist docker-compose (updated to supported ES version)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Minor code cleanup (NPEs etc)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Spelling fixes

* actually
* amend
* assumed
* backend
* cassandra
* centric
* check
* cohabitors
* configured
* conjunction
* connections
* containing
* control
* currently
* default
* disabled-the
* exhaust
* existing
* explicitly
* generation
* geoshape
* graph-class
* graph
* gremlin
* implement
* increment
* information
* initial
* instance
* interfaces
* it's
* janus
* labels
* levenshtein
* logies
* message
* nonexistent
* overridden
* overriding
* parameterized
* params
* parsable
* partitioner
* password
* payload
* persistent
* preceded
* propagate
* pseudo
* recommended
* requires
* sequence
* submission
* temporary
* truststore
* unknown
* upgrading
* version
* writing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* Add cell ttl support to BerkeleyDB

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Improve added relations containers JanusGraph#1700

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Refactor in ES module

Signed-off-by: Michał Podstawski <mpodstawski@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add log4j.properties to inmemory

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* JANUSGRAPH-1866 Filter out only system vertices in Hadoop Vertex deserializer

Remove erroneously added unused import.
Test that schema vertices are skipped.
Hadoop vertices deserialization should skip schema vertices that are created implicitly when defining schema elements like labels.
Correct tests for HBase Snapshot input format.
Snapshot should be taken before reading the graph in order to have anything to read from.

Signed-off-by: Evgeniy Ignatiev <yevgeniy.ignatyev@gmail.com>

* Update Copyright year in documentation CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Extract JanusGraph Gremlin driver requirements

* Predicates
* Geoshape
* RelationIdentifier

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Improve the CQLIterator performance by using getPagingStateUnsafe (this should avoid md5sum calculation of resultset)

Signed-off-by: Ganesh Guttikonda <gguttikonda@snapfish-llc.com>

* Update to TinkerPop 3.4.4

Fixes JanusGraph#1617

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* upgrading inmemory backend storage layout to reduce memory footprint (JanusGraph#1483)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Add testcontainers support for cassandra [full build]

Fixes JanusGraph#1475

* Update jacoco
* Cleanup pom.xml
* Introduce profiles for Cassandra
* Update TESTING.md

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Add 'Getting Started' guide to documentation [doc only]

Signed-off-by: Florian Grieskamp <florian.grieskamp@gdata.de>

* Fix installation docs missing hadoop-2 in examples CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.3.3 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.4.1 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* [doc only] Updated in-memory backend documentation (JanusGraph#1934)

to explain possible production use cases, limitations and alternatives (issue JanusGraph#1929)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Split up hadoop implementations [full build]

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Fix inmemory docs format CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump jackson2.version from 2.6.6 to 2.10.2

Fixes JanusGraph#1307

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Bump v0.3 branch to 0.3.4-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump v0.4 branch to 0.4.2-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

Co-authored-by: Ted Wilmes <twilmes@gmail.com>
Co-authored-by: micpod <57301006+micpod@users.noreply.github.com>
Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
Co-authored-by: Pavel <owner.mad.epa@gmail.com>
Co-authored-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
Co-authored-by: Jan Jansen <farodin91@users.noreply.github.com>
Co-authored-by: Evgeniy Ignatiev <YevIgn@users.noreply.github.com>
Co-authored-by: gani8780 <gguttikonda@snapfish-llc.com>
Co-authored-by: Dmitry Kovalev <dk.global@gmail.com>
Co-authored-by: rngcntr <7890887+rngcntr@users.noreply.github.com>

dk-github mentioned this pull request

PermanentLockingException while committing one of the concurrent transactions #2065

Closed
