Update in-memory backend documentation #1934

Merged

Conversation

dk-github
Contributor

@dk-github dk-github commented Jan 20, 2020

to explain possible production use cases, limitations and alternatives (issue #1929) [doc only]

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?
  • If this PR is a documentation-only change, have you added a [doc only]
    tag to the first line of your commit message to avoid spending CPU cycles in
    Travis CI when no code, tests, or build configuration are modified?

Note:

Please ensure that once the PR is submitted, you check Travis CI for build issues and submit an update to your PR as soon as possible.

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Jan 20, 2020
@dk-github dk-github force-pushed the issue_1929_update_inmemory_backend_docs branch from fe9a875 to a2bb322 Compare January 20, 2020 20:40
Member

@porunov porunov left a comment

LGTM. Thank you @dk-github !

@porunov porunov added this to the Release v0.5.0 milestone Jan 20, 2020
Member

@li-boxuan li-boxuan left a comment

Nice documentation! Helps me understand JanusGraph better. Just a few points:

Shutting down the graph or terminating the process that hosts the
JanusGraph graph will irrevocably delete all data from the graph. This
backend is local to a particular JanusGraph graph instance and cannot be
shared across multiple JanusGraph graphs.

Ideal Use Case
--------------
Rapid testing:
Member

Can you maybe use markdown ### syntax so that the generated doc website can show the table of contents correctly like https://docs.janusgraph.org/storage-backend/cassandra/#setup-cassandra-cluster?

Contributor Author

done

- Loss of data due to _unexpected_ death of host process is acceptable (the backend provides a simple mechanism for making fast snapshots to handle _expected_ restarts)
- Size of the graph data makes it possible to host it in a single JVM process (i.e. a few tens of Gigabytes max, unless you use a specialized JVM and hardware)
- Higher performance is required, but no expertise/resources available to tune more complex backends. Due to its memory-only nature, in-memory backend typically performs faster than disk-based ones, in queries using simple indices
and in graph modifications. However it is not specifically optimized for performance, and does not support advanced indexing functionality.
Member

It is not very clear to me whether 'advanced indexing functionality' refers to any index, or only to mixed indices (which require an index backend and can thus be considered 'advanced').

Contributor Author

It loosely refers to two things: there is no in-memory index backend (so no mixed indices when using only the in-memory backend), and simple indices merely take advantage of the ordering of the column and row keys and essentially do a binary search. There are no additional data structures dedicated to indices, and there is no underlying database whose native indexing could be utilised.
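To make the "sorted keys plus binary search" point concrete, here is a minimal sketch of a store that keeps its keys in order and answers both point lookups and range scans without any dedicated index structure. The class and method names (`SortedColumnSketch`, `put`, `get`, `slice`) are hypothetical and chosen for illustration; this is not the backend's actual code, only an assumption about what such a layout might look like.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not JanusGraph's actual storage classes. */
final class SortedColumnSketch {
    // Keys are kept in ascending order, so the ordering itself acts as the index.
    private final List<String> keys = new ArrayList<>();
    private final List<String> values = new ArrayList<>();

    /** Inserts or overwrites a key while keeping the key list sorted. */
    void put(String key, String value) {
        int pos = Collections.binarySearch(keys, key);
        if (pos >= 0) {
            values.set(pos, value);           // key already present: overwrite
        } else {
            int insertAt = -(pos + 1);        // decode the insertion point
            keys.add(insertAt, key);
            values.add(insertAt, value);
        }
    }

    /** Point lookup by binary search; returns null when the key is absent. */
    String get(String key) {
        int pos = Collections.binarySearch(keys, key);
        return pos >= 0 ? values.get(pos) : null;
    }

    /** Range scan over [from, to): binary search finds the start, then walk forward. */
    List<String> slice(String from, String to) {
        int start = Collections.binarySearch(keys, from);
        if (start < 0) {
            start = -(start + 1);
        }
        List<String> out = new ArrayList<>();
        for (int i = start; i < keys.size() && keys.get(i).compareTo(to) < 0; i++) {
            out.add(values.get(i));
        }
        return out;
    }
}
```

Anything beyond what key ordering can answer (full-text or geo predicates, for example) has no counterpart in a layout like this, which is roughly what "no advanced indexing" means here.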

Limitations:

- Obviously the scalability is limited to the heap size of a single JVM, and no transparent resilience to failures is offered
- The backend offers store-level locking only, whereas a Janusgraph transaction typically changes multiple stores (e.g. vertex store and index store).
Member

Can you elaborate on what 'store-level locking' is? Does it mean 'JanusGraphManagement.setConsistency(element, ConsistencyModifier.LOCK)' won't work for this backend?

Contributor Author

It means that the backend can use a Java read-write lock to lock the individual store it modifies, for the duration of the modification, to prevent concurrent modifications of the same store. But it does not attempt to lock all the stores involved in one JanusGraph/backend transaction, so individual store updates from parallel transactions being committed can interleave.

Say tx1 updates the index store, but before it can update the vertex store accordingly, tx2 updates the index store with its own data. As long as this data is completely different from tx1's, this is actually fine and reduces lock contention significantly. But if they modify the same data, you can get "ghost vertices", "missing vertices", etc.
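The interleaving described above can be sketched as follows. Each store guards its own map with a `ReentrantReadWriteLock`, so every single write is safe, but nothing ties the two writes of one transaction together. `LockedStoreSketch` and `InterleavingDemo` are hypothetical names, and the tx1/tx2 schedule is replayed sequentially so the outcome is deterministic; this is an illustration of the idea, not the backend's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical store with store-level locking only. */
final class LockedStoreSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        lock.writeLock().lock();              // held only for this one mutation
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}

final class InterleavingDemo {
    /** Replays one possible tx1/tx2 schedule and returns what a reader would then see. */
    static String[] run() {
        LockedStoreSketch indexStore = new LockedStoreSketch();
        LockedStoreSketch vertexStore = new LockedStoreSketch();

        indexStore.put("name=alice", "v1");   // tx1, step 1: index entry
        indexStore.put("name=alice", "v2");   // tx2, step 1: overwrites the same entry
        vertexStore.put("v1", "alice");       // tx1, step 2: vertex data
        // tx2, step 2 has not happened yet (or never happens if tx2 fails).

        // The index now points at v2, which the vertex store does not contain.
        return new String[] { indexStore.get("name=alice"), vertexStore.get("v2") };
    }
}
```

Calling `InterleavingDemo.run()` yields an index entry of `v2` alongside a missing `v2` vertex, which is one way an inconsistency of the "ghost" kind could arise when two transactions touch the same data.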

Note that naively locking all the stores at the beginning of the transaction can lead to deadlocks unless they are always locked in the same order, and can lead to high contention with a large number of parallel transactions, because half of the stores are going to be locked all the time. Basically, implementing this correctly and efficiently would be akin to implementing a robust in-memory database engine, which is not the intent of this backend at all.
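For reference, the "same order" requirement is the classic lock-ordering rule. A minimal sketch, assuming stores are identified by name and that sorting those names defines the global order (both are assumptions made for illustration, as is the `OrderedLockingSketch` class itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical multi-store locking that always acquires locks in sorted name order. */
final class OrderedLockingSketch {
    // TreeMap iterates keys in sorted order, which gives one global acquisition order.
    private final Map<String, ReentrantLock> locks = new TreeMap<>();

    OrderedLockingSketch(String... storeNames) {
        for (String name : storeNames) {
            locks.put(name, new ReentrantLock());
        }
    }

    /**
     * Runs the action while holding the locks of all requested stores.
     * Returns the order in which the locks were taken, for inspection.
     */
    List<String> withStores(List<String> wanted, Runnable action) {
        List<String> order = new ArrayList<>();
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (Map.Entry<String, ReentrantLock> e : locks.entrySet()) {
                if (wanted.contains(e.getKey())) {
                    e.getValue().lock();
                    held.add(e.getValue());
                    order.add(e.getKey());
                }
            }
            action.run();
        } finally {
            // Release in reverse order of acquisition.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
        return order;
    }
}
```

This avoids the deadlock but not the contention: every transaction touching the same stores is now fully serialized for its whole duration, which is the cost the comment above warns about.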

however this can happen - e.g. when a large heap nears saturation and the GC pause exceeds configured backend timeout.
- The data layout used by the backend can theoretically be susceptible to fragmentation in certain scenarios
(with a lot of add/delete operations), thus reducing the amount of useful data that can be stored in a heap
of specified size. The backend provides simple mechanisms to report fragmentation and defragment the storage if required.
Member

Are those mechanisms documented somewhere? It would be nice to have a short description on how these mechanisms can be set up/activated.

Contributor Author

There is currently no support for activating these via configuration - one would have to dig a little in the code to see how they can be invoked when needed. If and when common patterns for applying them to generic use cases emerge, those patterns can be made configurable. For now, this can simply be ignored, unless there is a suspicion that fragmentation might be a problem for a specific use case.

docs/storage-backend/inmemorybackend.md (outdated; resolved)
to explain possible production use cases, limitations and alternatives (issue JanusGraph#1929)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>
@dk-github dk-github force-pushed the issue_1929_update_inmemory_backend_docs branch from 3f2561e to f869a5e Compare January 24, 2020 14:08
@dk-github
Contributor Author

Thank you @porunov, I will merge this early next week if there are no objections or further comments.

@dk-github dk-github merged commit 8b339df into JanusGraph:master Jan 27, 2020
@dk-github dk-github deleted the issue_1929_update_inmemory_backend_docs branch January 27, 2020 09:47
LinhaoZhu added a commit to LinhaoZhu/janusgraph that referenced this pull request Feb 5, 2020
* Issue JanusGraph#1871: Close graph instance at end of mapper run.

Signed-off-by: Ted Wilmes <ted.wilmes@experoinc.com>

* Fixed broken dist docker-compose (updated to supported ES version)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Minor code cleanup (NPEs etc)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Spelling fixes

* actually
* amend
* assumed
* backend
* cassandra
* centric
* check
* cohabitors
* configured
* conjunction
* connections
* containing
* control
* currently
* default
* disabled-the
* exhaust
* existing
* explicitly
* generation
* geoshape
* graph-class
* graph
* gremlin
* implement
* increment
* information
* initial
* instance
* interfaces
* it's
* janus
* labels
* levenshtein
* logies
* message
* nonexistent
* overridden
* overriding
* parameterized
* params
* parsable
* partitioner
* password
* payload
* persistent
* preceded
* propagate
* pseudo
* recommended
* requires
* sequence
* submission
* temporary
* truststore
* unknown
* upgrading
* version
* writing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* Add cell ttl support to BerkeleyDB

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Improve added relations containers JanusGraph#1700

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Refactor in ES module

Signed-off-by: Michał Podstawski <mpodstawski@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add log4j.properties to inmemory

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* JANUSGRAPH-1866 Filter out only system vertices in Hadoop Vertex deserializer

Remove erroneously added unused import.
Test that schema vertices are skipped.
Hadoop vertices deserialization should skip schema vertices that are created implicitly when defining schema elements like labels.
Correct tests for HBase Snapshot input format.
Snapshot should be taken before reading the graph in order to have anything to read from.

Signed-off-by: Evgeniy Ignatiev <yevgeniy.ignatyev@gmail.com>

* Update Copyright year in documentation CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Extract JanusGraph Gremlin driver requirements

* Predicates
* Geoshape
* RelationIdenitifier

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* * Improve the CQLIterator performance by using getPagingStateUnsafe (
this should avoid md5sum calculation of resultset)

Signed-off-by: Ganesh Guttikonda <gguttikonda@snapfish-llc.com>

* Update to TinkerPop 3.4.4

Fixes JanusGraph#1617

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* upgrading inmemory backend storage layout to reduce memory footprint (JanusGraph#1483)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Add testcontainers support for cassandra [full build]

Fixes JanusGraph#1475

* Update jacoco
* Cleanup pom.xml
* Introduce profiles for Cassandra
* Update TESTING.md

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Add 'Getting Started' guide to documentation [doc only]

Signed-off-by: Florian Grieskamp <florian.grieskamp@gdata.de>

* Fix installation docs missing hadoop-2 in examples CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.3.3 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.4.1 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* [doc only] Updated in-memory backend documentation (JanusGraph#1934)

to explain possible production use cases, limitations and alternatives (issue JanusGraph#1929)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Split up hadoop implementations [full build]

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Fix inmemory docs format CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump jackson2.version from 2.6.6 to 2.10.2

Fixes JanusGraph#1307

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Bump v0.3 branch to 0.3.4-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump v0.4 branch to 0.4.2-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

Co-authored-by: Ted Wilmes <twilmes@gmail.com>
Co-authored-by: micpod <57301006+micpod@users.noreply.github.com>
Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
Co-authored-by: Pavel <owner.mad.epa@gmail.com>
Co-authored-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
Co-authored-by: Jan Jansen <farodin91@users.noreply.github.com>
Co-authored-by: Evgeniy Ignatiev <YevIgn@users.noreply.github.com>
Co-authored-by: gani8780 <gguttikonda@snapfish-llc.com>
Co-authored-by: Dmitry Kovalev <dk.global@gmail.com>
Co-authored-by: rngcntr <7890887+rngcntr@users.noreply.github.com>
Labels
cla: yes This PR is compliant with the CLA

4 participants