Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #1482 initial contribution of production in-memory backend #1483

Merged

Conversation

dk-github
Copy link
Contributor

@dk-github dk-github commented Mar 7, 2019

We would like to contribute an improved implementation of in-memory backend for JanusGraph.

Please see issue #1482 for rationale and some comparative analysis vs existing "test-only" in-memory backend.

NOTE: This PR currently suggests adding the new backend rather than modifying the existing one, as it seemed cleaner and easier to compare performance within one codebase branch.
However, having read recent discussions about adding more backends into main repository vs maintenance costs, I am starting to think that actually modifying the existing in-memory backend could be a better idea.
This is because:

  • the modifications are fairly straightforward and do not change the general structure of existing backend
  • don’t add too much extra code or any external dependencies
  • don’t require any additional documentation or setup instructions, at least initially
  • all existing (and a few extra) integrity tests pass and using it in place of current version is unlikely to create any issues for current (test-only) uses
  • the same backend will simply be fit for production use, no new backends to maintain

Thoughts?

@janusgraph-bot janusgraph-bot added the cla: no This PR is not compliant with the CLA label Mar 7, 2019
@janusgraph-bot
Copy link

Committer of one or more commits is not listed as a CLA signer, either individual or as a member of an organization.

@dk-github
Copy link
Contributor Author

Committer of one or more commits is not listed as a CLA signer, either individual or as a member of an organization.

In fact I should be listed as per this: JanusGraph/legal#128 - so not sure what it complains about...

@pluradj
Copy link
Member

pluradj commented Mar 7, 2019

The problem is likely that the Author email on your commit does not match the Signed-off-by email. You can follow these instructions to amend it.

Author: Dmitry Kovalev <Dmitry.Kovalev@gs.com>
Date:   Fri Mar 1 13:56:37 2019 +0000

    initial contribution of CompactInMemoryStoreManager - an improved in-memory backend for JanusGraph, which offers significantly reduced memory footprint compared to "classic" one, and is suitable for production use
    
    Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

@mbrukman mbrukman self-assigned this Mar 7, 2019
@mbrukman
Copy link
Member

mbrukman commented Mar 7, 2019

Yes, there's a mismatch between Author and Signed-off-by, but interestingly enough, the DCO check passed, but it is the CLA check from @janusgraph-bot that failed, and it currently does not look at DCO (there's an outstanding feature request to add it, but since it's not yet there, we're using an external service).

There's potentially an issue because the Author and Committer are different here (though that should not invalidate the CLA check); I'm looking into this right now.

@janusgraph-bot janusgraph-bot added cla: yes This PR is compliant with the CLA and removed cla: no This PR is not compliant with the CLA labels Mar 7, 2019
@mbrukman
Copy link
Member

mbrukman commented Mar 7, 2019

I think I see what the problem is; the CLA and the commit are not using the same capitalization for the email address. I anticipated this would happen when I saw your CLA submitted with upper-cased first-name/last-name in your emails, as I suspected some folks would actually lowercase their emails (and some folks on that thread already did).

In fact, I anticipated it so well that I already implemented case-insensitive email matching for CLA authors and committers (see google/code-review-bot#16); the issue here was that I hadn't actually rebuilt the binary used by @janusgraph-bot to include this latest change.

I just pushed a new build, so if my analysis is correct, the CLA issue should be resolved shortly.

@mbrukman mbrukman removed their assignment Mar 7, 2019
@dk-github
Copy link
Contributor Author

It worked indeed, thanks @mbrukman! There were internal reasons for capitals in the emails - apologies for causing trouble. But as per e-mail standard the addresses should be case-insensitive so IMO it was a generally useful fix anyway.

@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@dk-github dk-github force-pushed the issue1482-compact-inmemory-backend branch from 3ead653 to a7554ea Compare March 15, 2019 00:11
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@JanusGraph JanusGraph deleted a comment Mar 15, 2019
@dk-github
Copy link
Contributor Author

just a gentle weekly reminder - would be grateful if someone could review/approve

Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dk-github Thank you for the contribution! It is a valuable improvement.

I just did a first very short glance on the code. There are places where Codacy commented. I agree with them. Also, there are some small suggestions from me. I didn't check the whole code and didn't check the logic. I think it may take some time.

@dk-github dk-github force-pushed the issue1482-compact-inmemory-backend branch 2 times, most recently from 86cfaa1 to 59a03da Compare January 7, 2020 00:03
@dk-github
Copy link
Contributor Author

many thanks @porunov for taking time to revisit this PR

I have only a small one (possibly nitpick) question here #1483 (comment)

replied to that comment, hopefully it answers the question

Also, could you please squash you commits into a single one?

done. it has woken up Codacy again though. the same code which it deemed to be "ok" when it was in the form of 2 commits, it now again thinks is "not up to standard" :)

please let me know if you want me to address any of its comments specifically - preferably which ones exactly...

Other then that, I don't have any more concerns.

great, does that mean that you are ready to approve?

Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dk-github There are some Codacy complains which are really easy to resolve. Like remove Logger dependencies where it isn't used. Also, I see it complains on Assertion dependencies (even so assertions are used). I think, it might expect import static org.junit.jupiter.api.Assertions.assertEquals; instead of import static org.junit.jupiter.api.Assertions.*; (not sure about it. You can try it, but I am OK with both implementations .* and .assertEquals ).
Complains about long methods are fair enough, but sometimes they are hard to be re-factored. If you can split those methods on smaller methods, that would be good. If it is quite complicated, I am OK with current implementation.
A point about raw exceptions is valid also, but we use a lot of them right now. Because we don't have much style right now, I think it is optional as of committers decision (you can use raw exceptions or create custom exceptions).

So, overall it LGTM. Those tiny flows mentioned by Codacy are optional as for me. You may fix them or leave them as per your opinion.

@porunov
Copy link
Member

porunov commented Jan 8, 2020

@amcp @farodin91 @FlorianHockmann will you be able to re-review this PR?

@FlorianHockmann
Copy link
Member

@amcp @farodin91 @FlorianHockmann will you be able to re-review this PR?

I think @farodin91 already made a first pass through this and if he and you both accept this PR, then I'd say that we just merge it. I probably won't have the time to review this massive PR adequately. So, I don't want to block this.
Given that we have only recommended the in-memory backend for testing purposes until now, this can't break any production usages. Since all tests also still pass with this, the in-memory backend seems to still fulfil its original purpose. Therefore this PR isn't as critical as it may seem at first. (With critical I don't mean that it's not useful, only that it shouldn't break anything.)

Copy link
Contributor

@farodin91 farodin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dk-github Thank you for this huge contribution.

nit: It would good to remove all unnecessary imports.

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>
@dk-github dk-github force-pushed the issue1482-compact-inmemory-backend branch from 59a03da to 988c272 Compare January 10, 2020 22:31
@FlorianHockmann
Copy link
Member

Codacy Here is an overview of what got changed by this pull request:

Issues
======
- Added 11
           

Complexity increasing per file
==============================
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/BufferPageTest.java  2
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryStoreManager.java  2
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/BufferPage.java  14
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryKeyColumnValueStore.java  9
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/MultiPageEntryBufferTest.java  1
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/SinglePageEntryBuffer.java  7
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/SharedEntryBufferFragmentationReport.java  1
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/InMemoryColumnValueStoreTest.java  9
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/MemoryEntryList.java  2
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryKeyColumnValueStoreFragmentationReport.java  4
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/InMemoryColumnValueStore.java  9
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/BufferPageUtils.java  5
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/TestPagedBufferColumnValueStore.java  1
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/InMemoryStoreManagerTest.java  2
- janusgraph-inmemory/src/main/java/org/janusgraph/diskstorage/inmemory/MultiPageEntryBuffer.java  22
         

Clones added
============
- janusgraph-inmemory/src/test/java/org/janusgraph/diskstorage/inmemory/InMemoryColumnValueStoreTest.java  7
         

See the complete overview on Codacy

@JanusGraph JanusGraph deleted a comment from farodin91 Jan 10, 2020
return newPages;
}

public List<BufferPage> merge(Entry[] add, int iaddp, int addLimit, Entry[] del, int idelp, int delLimit, int maxPageSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.debug("finished writing chunk " + filePath + " " + Thread.currentThread().getName());
}
} catch (IOException ex) {
throw new RuntimeException(ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Override
public void mutate(Entry[] add, Entry[] del, int maxPageSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static final String VERY_END = "zz";

@Test
public void testRoundtrip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Override
public EntryList getSlice(KeySliceQuery query) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public class InMemoryStoreManagerTest
{
@Test
public void testStoreCycle() throws Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try {
newStores.put(storePath.getFileName().toString(), InMemoryKeyColumnValueStore.readFrom(storePath, storePath.getFileName().toString(), parallelOperationsExecutor));
} catch (IOException ex) {
throw new RuntimeException(ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Override
public void mutate(Entry[] add, Entry[] del, int maxPageSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -104,17 +135,225 @@ public void close() throws BackendException {
kcv.clear();
}

public InMemoryKeyColumnValueStoreFragmentationReport createFragmentationReport(StoreTransaction txh) throws BackendException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Test
public void testPagingAndFragmentation() throws TemporaryLockingException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


return numKcvs;
} catch (Exception ex) {
throw new RuntimeException("Problem while reading chunk " + filePath + " of store " + store.getName(), ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porunov porunov dismissed amcp’s stale review January 10, 2020 22:38

I believe your previous comments were resolved / answered, so I am dismissing request changes

@dk-github
Copy link
Contributor Author

dk-github commented Jan 10, 2020

Thank you @porunov @farodin91 . I have removed the unused imports/changed wildcards to specific imports - this seems to have pacified codacy a little bit.

Please let me know if I need to address anything else, or it is good to be merged.

@porunov
Copy link
Member

porunov commented Jan 10, 2020

@dk-github I think you can merge this PR after Travis tests are finished

@dk-github dk-github merged commit a5a403f into JanusGraph:master Jan 11, 2020
@dk-github
Copy link
Contributor Author

Ok, the build passed and I have merged it now.

Many thanks to everyone who took time to review it, hope it will add some value to the project!

I will let it rest for a while, and then raise a small PR to update documentation about in-memory backend production usage and limitations.

Also later this year may raise a number of small PRs with minor refactorings/improvements as/when we get to work on upgrading our system to janusgraph 0.5.0 and switching from (slightly more feature-rich) custom in-house backend to the contributed version.

@dk-github dk-github deleted the issue1482-compact-inmemory-backend branch January 11, 2020 18:17
LinhaoZhu added a commit to LinhaoZhu/janusgraph that referenced this pull request Feb 5, 2020
* Issue JanusGraph#1871: Close graph instance at end of mapper run.

Signed-off-by: Ted Wilmes <ted.wilmes@experoinc.com>

* Fixed broken dist docker-compose (updated to supported ES version)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Minor code cleanup (NPEs etc)

Signed-off-by: Michal Podstawski <mpodstawski@gmail.com>

* Spelling fixes

* actually
* amend
* assumed
* backend
* cassandra
* centric
* check
* cohabitors
* configured
* conjunction
* connections
* containing
* control
* currently
* default
* disabled-the
* exhaust
* existing
* explicitly
* generation
* geoshape
* graph-class
* graph
* gremlin
* implement
* increment
* information
* initial
* instance
* interfaces
* it's
* janus
* labels
* levenshtein
* logies
* message
* nonexistent
* overridden
* overriding
* parameterized
* params
* parsable
* partitioner
* password
* payload
* persistent
* preceded
* propagate
* pseudo
* recommended
* requires
* sequence
* submission
* temporary
* truststore
* unknown
* upgrading
* version
* writing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* Add cell ttl support to BerkeleyDB

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Improve added relations containers JanusGraph#1700

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Refactor in ES module

Signed-off-by: Michał Podstawski <mpodstawski@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add fixes for TP tests on berkeley backend

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>

* Add log4j.properties to inmemory

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* JANUSGRAPH-1866 Filter out only system vertices in Hadoop Vertex deserializer

Remove erroneously added unused import.
Test that schema vertices are skipped.
Hadoop vertices deserialization should skip schema vertices that are created implicitly when defining schema elements like labels.
Correct tests for HBase Snapshot input format.
Snapshot should be taken before reading the graph in order to have anything to read from.

Signed-off-by: Evgeniy Ignatiev <yevgeniy.ignatyev@gmail.com>

* Update Copyright year in documentation CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Extract JanusGraph Gremlin driver requirements

* Predicates
* Geoshape
* RelationIdenitifier

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* * Improve the CQLIterator performance by using getPagingStateUnsafe (
this should avoid md5sum calculation of resultset)

Signed-off-by: Ganesh Guttikonda <gguttikonda@snapfish-llc.com>

* Update to TinkerPop 3.4.4

Fixes JanusGraph#1617

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* upgrading inmemory backend storage layout to reduce memory footprint (JanusGraph#1483)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Add testcontainers support for cassandra [full build]

Fixes JanusGraph#1475

* Update jacoco
* Cleanup pom.xml
* Introduce profiles for Cassandra
* Update TESTING.md

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Add 'Getting Started' guide to documentation [doc only]

Signed-off-by: Florian Grieskamp <florian.grieskamp@gdata.de>

* Fix installation docs missing hadoop-2 in examples CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.3.3 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* JanusGraph release 0.4.1 [full build]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* [doc only] Updated in-memory backend documentation (JanusGraph#1934)

to explain possible production use cases, limitations and alternatives (issue JanusGraph#1929)

Signed-off-by: Dmitry Kovalev <dk.global@gmail.com>

* Split up hadoop implementations [full build]

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Fix inmemory docs format CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump jackson2.version from 2.6.6 to 2.10.2

Fixes JanusGraph#1307

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>

* Bump v0.3 branch to 0.3.4-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

* Bump v0.4 branch to 0.4.2-SNAPSHOT CTR [doc only]

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

Co-authored-by: Ted Wilmes <twilmes@gmail.com>
Co-authored-by: micpod <57301006+micpod@users.noreply.github.com>
Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
Co-authored-by: Pavel <owner.mad.epa@gmail.com>
Co-authored-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
Co-authored-by: Jan Jansen <farodin91@users.noreply.github.com>
Co-authored-by: Evgeniy Ignatiev <YevIgn@users.noreply.github.com>
Co-authored-by: gani8780 <gguttikonda@snapfish-llc.com>
Co-authored-by: Dmitry Kovalev <dk.global@gmail.com>
Co-authored-by: rngcntr <7890887+rngcntr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This PR is compliant with the CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants