Send vertex label to user log processor for all mutations #3264

Member

@porunov porunov commented Oct 24, 2022

Fixes #3263
Related to #3155

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@porunov porunov added this to the Release v1.0.0 milestone Oct 24, 2022
@porunov porunov force-pushed the feature/log-processors-mutation-vertex-label branch from dff3e2c to 1976b7e Compare October 24, 2022 12:58
@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Oct 24, 2022
@porunov porunov force-pushed the feature/log-processors-mutation-vertex-label branch 2 times, most recently from 41855fb to 875cad6 Compare October 24, 2022 16:38
Comment on lines +5062 to +5065
String vertexLabel = changes.getVertices(Change.ADDED).iterator().next().label();
assertEquals(testVertexLabel, vertexLabel);
Member Author

These two lines would fail without the business logic change because vertexLabel here would be equal to vertex even though the real label of the vertex is testVertex. With this PR, the vertex now returns the correct vertex label.

Comment on lines +5084 to +5087
String vertexLabel = changes.getVertices(Change.REMOVED).iterator().next().label();
assertEquals(testVertexLabel, vertexLabel);
Member Author

These two lines would fail without the business logic change because vertexLabel here would be equal to vertex even though the real label of the vertex is testVertex. With this PR, the vertex now returns the correct vertex label.

@porunov porunov requested a review from a team October 29, 2022 11:05
Member Author

porunov commented Oct 29, 2022

@JanusGraph/committers I intend to merge this breaking change on Monday using lazy consensus if there are no reviews. If you need more time, please let me know.
This is a blocker for #3155

Member

@li-boxuan li-boxuan left a comment

I like this change 👍 I just have some quick questions

@@ -171,6 +171,24 @@ A new optimization has been added to compute aggregations (min, max, sum and avg
If the index backend is Elasticsearch, a `double` value is used to hold the result. As a result, aggregations on long numbers greater than 2^53 are approximate.
In this case, if the accurate result is essential, the optimization can be disabled by removing the strategy `JanusGraphMixedIndexAggStrategy`: `g.traversal().withoutStrategies(JanusGraphMixedIndexAggStrategy.class)`.

##### Breaking change for transaction logs processing

[Transaction Log](advanced-topics/transaction-log.md) processing has a breaking change.
Member

This only affects transaction logs started by users, right? I think so but just wanted to make sure.

Member Author

It actually changes the mutations of two transaction logs:

  1. JanusGraph's write-ahead transaction log, which can be enabled with tx.log-tx = true.
  2. User-configured transaction logs.

InternalVertex vertex = rel.getVertex(0);
VariableLong.writePositive(out,vertex.longId());
VertexLabel vertexLabel = vertex.vertexLabel();
VariableLong.writePositive(out, vertexLabel.hasId() ? vertexLabel.longId() : 0L);
Member

I think it's fine, and I believe we should cache vertex labels aggressively. But just to understand the intention here better: can't the receiver execute a backend query to fetch the vertex label?

Member Author

It's not guaranteed that the vertex still exists in the graph, because it might already have been removed.

  1. It's guaranteed to be removed if we receive the change for a removed vertex.
  2. Even if we receive a change for an updated / added vertex, that doesn't guarantee the vertex still exists, because someone could remove it before the change is received by the processing JanusGraph instance.
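
A rough sketch of the second case above, using an in-memory map as a stand-in for the storage backend. Nothing here is JanusGraph API: `VertexChange`, `resolveLabel`, and the map are hypothetical names made up for illustration. It only shows that a consumer which looks the label up at processing time can come back empty-handed, while a change that carries the label does not depend on the vertex still existing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-ins; none of these types are JanusGraph classes.
class LateLabelLookupSketch {

    /** A change as a log consumer might see it; labelInMessage is null when only the id is logged. */
    static final class VertexChange {
        final long vertexId;
        final String labelInMessage;

        VertexChange(long vertexId, String labelInMessage) {
            this.vertexId = vertexId;
            this.labelInMessage = labelInMessage;
        }
    }

    /** Prefers the label carried in the message, otherwise falls back to a backend lookup. */
    static Optional<String> resolveLabel(VertexChange change, Map<Long, String> backend) {
        if (change.labelInMessage != null) {
            return Optional.of(change.labelInMessage);
        }
        // The lookup only succeeds if the vertex still exists when the change is processed.
        return Optional.ofNullable(backend.get(change.vertexId));
    }

    public static void main(String[] args) {
        Map<Long, String> backend = new HashMap<>();
        backend.put(42L, "testVertex");

        // Another transaction removes the vertex before the consumer handles the "added" change.
        backend.remove(42L);

        VertexChange idOnly = new VertexChange(42L, null);
        VertexChange withLabel = new VertexChange(42L, "testVertex");

        System.out.println(resolveLabel(idOnly, backend).isPresent()); // false
        System.out.println(resolveLabel(withLabel, backend).get());    // testVertex
    }
}
```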

Member

If the vertex is already removed, why do we need the vertex label on the receiver side? I guess my real question is, can you elaborate on the following statement?

> Without knowing vertex label of the mutated vertex we won't be able to properly invalidate indices which are bound to specific vertex label.

Member Author

To invalidate an index that is constrained to a specific label, we need to know the label of the removed / updated / added vertex.

Member Author

If db-cache is enabled, we have an index created with indexOnly(vertexLabel), and some query results are cached in indexStore, then we need the vertex label, the vertex id, and the vertex properties to invalidate related cached queries such as: g.V().hasLabel("someVertexLabel").has("myProperty", "propValue")
We can't invalidate the above query knowing only the vertex id.

Member

> Those which use 'indexOnly' features won't be invalidated using vertex id + properties only. We need to have vertex label as well for that invalidation.

Gotcha. Do you have a code snippet showing how we could invalidate normal index entries in DB cache? I thought index entries with indexOnly constraints are no different from normal index entries (from cache perspective) but from what you said it's not the case.

Member Author

@li-boxuan added testIndexWithIndexOnlyConstraintForceInvalidationFromDBCache() to show why it's not enough to have vertex id and properties to invalidate indexStore cache.
https://github.com/JanusGraph/janusgraph/compare/875cad608d82cd5e825a0fa6f96e4f79e8f261fd..48cd7aeb3cf776db04b34b80c69bd92988a79e2b

Member

I am not 100% sure but I believe vertex label is not needed here.

In invalidateUpdatedVertexProperty, you called

Collection<IndexUpdate> indexUpdates = indexSerializer.getIndexUpdates(cacheVertex, Arrays.asList(propertyPreviousVal, propertyNewVal));

to fetch a collection of index records.

In IndexSerializer::getIndexUpdates, it skips an index entry candidate if it does not conform to the index only constraint:

This makes sense when we want to generate a new index entry, but it is not necessary when we try to remove an existing index entry. If you temporarily comment out this line, I believe your test case would pass even if the LogEntry does not contain vertex label.

Member Author

@li-boxuan you are right. In that case it's better to add another method to IndexSerializer that doesn't filter indices by the indexOnly constraint.
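
The distinction discussed in this thread (apply the indexOnly constraint when generating new index entries, skip it when looking for entries to invalidate) could be sketched roughly as follows. This is a simplified in-memory model, not JanusGraph code: `IndexDef`, `indexKeys`, the `applyConstraint` flag, and the key format are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model; IndexDef and indexKeys are illustrative names, not JanusGraph API.
class IndexOnlyFilterSketch {

    static final class IndexDef {
        final String name;
        final String property;
        final String onlyLabel; // null means the index has no label constraint

        IndexDef(String name, String property, String onlyLabel) {
            this.name = name;
            this.property = property;
            this.onlyLabel = onlyLabel;
        }
    }

    /**
     * Returns the keys of index entries touched by a property value.
     * applyConstraint = true: label-constrained indexes are skipped unless the label matches,
     * which is what you want when writing new entries.
     * applyConstraint = false: every index on the property is returned, a safe
     * over-approximation for cache invalidation that works even when label is null.
     */
    static List<String> indexKeys(List<IndexDef> indexes, String label, String property,
                                  Object value, boolean applyConstraint) {
        List<String> keys = new ArrayList<>();
        for (IndexDef index : indexes) {
            if (!index.property.equals(property)) {
                continue;
            }
            boolean constrained = index.onlyLabel != null;
            if (applyConstraint && constrained && !index.onlyLabel.equals(label)) {
                continue;
            }
            keys.add(index.name + ":" + value);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<IndexDef> indexes = Arrays.asList(
            new IndexDef("byProp", "myProperty", null),
            new IndexDef("byPropOnlyLabel", "myProperty", "someVertexLabel"));

        // Label unknown (null): the filtered variant misses the constrained index.
        System.out.println(indexKeys(indexes, null, "myProperty", "propValue", true));
        // prints [byProp:propValue]

        // The unfiltered variant returns both keys, so both cached entries can be dropped.
        System.out.println(indexKeys(indexes, null, "myProperty", "propValue", false));
        // prints [byProp:propValue, byPropOnlyLabel:propValue]
    }
}
```

The unfiltered variant may invalidate an entry that was never cached, which costs a little extra work but does not require the log message to carry the vertex label.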

Member Author

@li-boxuan I opened a PR which adds the ability to get index updates with constraints disabled: #3279

@farodin91
Contributor

Just one question: what happens during an upgrade? How do old JanusGraph instances handle these messages?

Member Author

porunov commented Nov 1, 2022

> Just one question: what happens during an upgrade? How do old JanusGraph instances handle these messages?

After the upgrade you will need to consume new messages only. Thus, you must stop all mutations during the upgrade, consume all old logs, and only then upgrade the JanusGraph nodes. After that you can enable consumption again on the new logs.
If old JanusGraph instances consume these logs, the result is unpredictable.
I wish we used a more extensible structure for logs (protobuf, maybe) so that new fields could be added without introducing a breaking change, but it looks like the preference was better compaction for logs over better extensibility.
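
To illustrate why a positional, untagged log layout makes this a breaking change, here is a minimal sketch. It assumes a LEB128-style variable-length encoding and a made-up record with a trailing `payload` field; neither is claimed to match JanusGraph's actual `VariableLong` encoding or log layout. An old reader that does not know about the inserted label id silently takes it for the next field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative only: the encoding and record layout are assumptions, not JanusGraph's format.
class PositionalLogFormatSketch {

    /** Writes a non-negative long as a LEB128-style varint (7 data bits per byte). */
    static void writePositive(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Reads a varint written by writePositive. */
    static long readPositive(ByteArrayInputStream in) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of input");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** New layout: vertex id, label id (0 when the label has no id), hypothetical payload. */
    static byte[] writeNew(long vertexId, long labelId, long payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePositive(out, vertexId);
        writePositive(out, labelId);
        writePositive(out, payload);
        return out.toByteArray();
    }

    /** Reader for the new layout: returns {vertexId, labelId, payload}. */
    static long[] readNew(byte[] bytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        return new long[] {readPositive(in), readPositive(in), readPositive(in)};
    }

    /** Reader for the old layout, which expects {vertexId, payload} with no label id. */
    static long[] readOld(byte[] bytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        return new long[] {readPositive(in), readPositive(in)};
    }

    public static void main(String[] args) {
        byte[] record = writeNew(4096L, 7L, 123L);
        System.out.println(Arrays.toString(readNew(record))); // [4096, 7, 123]
        // The old reader raises no error; it just treats the label id as the payload.
        System.out.println(Arrays.toString(readOld(record))); // [4096, 7]
    }
}
```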

@farodin91
Contributor

> > Just one question: what happens during an upgrade? How do old JanusGraph instances handle these messages?
>
> After the upgrade you will need to consume new messages only. Thus, you must stop all mutations during the upgrade, consume all old logs, and only then upgrade the JanusGraph nodes. After that you can enable consumption again on the new logs.
> If old JanusGraph instances consume these logs, the result is unpredictable.
> I wish we used a more extensible structure for logs (protobuf, maybe) so that new fields could be added without introducing a breaking change, but it looks like the preference was better compaction for logs over better extensibility.

We should add a note that mutations should be stopped during the upgrade.

Fixes JanusGraph#3263

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
@porunov porunov force-pushed the feature/log-processors-mutation-vertex-label branch from 875cad6 to 48cd7ae Compare November 4, 2022 15:21
porunov added a commit to porunov/janusgraph that referenced this pull request Nov 7, 2022
Member Author

porunov commented Nov 7, 2022

The PR is obsolete because we are able to invalidate cached indexes without knowing vertex label as stated here: #3264 (comment)

Thus, closing this PR without merging it.

@porunov porunov closed this Nov 7, 2022
@porunov porunov removed this from the Release v1.0.0 milestone Nov 7, 2022
porunov added a commit that referenced this pull request Nov 7, 2022
Related #3155 #3263 #3264

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>