Skip to content
This repository has been archived by the owner on Apr 16, 2022. It is now read-only.

142-feature/make-ids-optional #155

Merged
merged 27 commits into from
Oct 18, 2018
Merged

Conversation

fabiencoppens
Copy link
Collaborator

Quick PR to allow ids to be optional when mutating the graph.

@@ -86,11 +87,14 @@ private Object getFieldValue(@NonNull Object domain, @NonNull String fieldName)
final ConvertingPropertyAccessor accessor = this.getPropertyAccessor(domain);
final GremlinPersistentEntity<?> persistentEntity = this.getPersistentEntity(domain.getClass());
final PersistentProperty property = persistentEntity.getPersistentProperty(fieldName);
Assert.notNull(property, "persistence property should not be null");
if (!fieldName.equals(Constants.PROPERTY_ID)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not workaround the PROPERTY_ID, as the field which has @Id without id name. And also if you want to support Id optional, this may not the right place from the method name.

Assert.isTrue(type == GremlinEntityType.EDGE || type == GremlinEntityType.VERTEX, "should be edge/vertex type");

final String prefix = (type == GremlinEntityType.VERTEX) ? "V" : "E";
if (id != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ID optional should not change the requiredId generation, should take care of id optional outside the caller.

@fabiencoppens
Copy link
Collaborator Author

fabiencoppens commented Sep 18, 2018

@Incarnation-p-lee Thanks for the feedback. This was indeed a bit quick-and-dirty, and I see now that a much better approach would be to add an optional @GeneratedValue annotation that can be combined with the @Id annotation, such as it is commonly done in JPA as well as in neo4j's OGM.
It is complex to implement, since, if vertex ids are optional, then we can't write the EdgeFrom and EdgeTo ids in the Edge object until the vertices have been created in the underlying datastore.
In particular, the following lines of code from GremlinSourceEdgeWriter.write() become problematic:

            } else if (field.getAnnotation(EdgeFrom.class) != null) {
                sourceEdge.setVertexIdFrom(this.getIdValue(object, converter));
            } else if (field.getAnnotation(EdgeTo.class) != null) {
                sourceEdge.setVertexIdTo(this.getIdValue(object, converter));

Basically we'd have to throw exceptions if trying to write an edge for which the in and out vertices have not already been created in the datastore (and therefore still have null ids)

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Yes, edge side will be big problem there. @Id optional from my opinion, means the id field should be involved in object instance without setting any value optionally. Then the server will generated one for the instance.
Based on below, we can save one instance without id value, and then return the same instance with id specified. Then the @EdgeFrom can obtain the id value when save operation.
That may be a little better, but not elegant enough here. I may need some time to figure out later.
Any suggestion or advice is welcome.

@fabiencoppens
Copy link
Collaborator Author

fabiencoppens commented Sep 19, 2018

@Incarnation-p-lee Here's what I suggest. As a first step, we could add the @GeneratedValue annotation that would indicate that the field is not user-defined, and can be left null in the entity until it is created in the datastore. As well as add the few code changes to cater for that. This doesn't solve the problem of edge writes failing in graph creation, but as a first step, we could make the edge writer throw an exception if the vertex ids are null, which would imply that, if users want to use DB-generated ids, they need to create graphs in 2 steps: first create the vertices, get their ids back from the datastore, then create the edges. As you mentioned, it's not elegant or ideal but it could be a first step to make this project compatible with JanusGraph (and any other Tinkerpop-compliant DBs that use generated ids).
As a 2nd step, I suggest taking a look at how spring-data-neo4j solves this problem, and seeing if something similar can be used here.

EDIT: We did a little bit of digging in neo4j's OGM, and we found the code where they handle this use case (vertex and edge creation in one transaction, where edges need the vertex ids to be created first). Here's the relevant class, line 63: https://github.com/neo4j/neo4j-ogm/blob/master/core/src/main/java/org/neo4j/ogm/session/request/RequestExecutor.java
If you follow the logic, you'll see that they check if the payload to be saved contains edges that have vertex requirements, and they create the vertices first, get the ids back, then proceed.

@codecov-io
Copy link

codecov-io commented Sep 20, 2018

Codecov Report

Merging #155 into master will decrease coverage by 2.03%.
The diff coverage is 70.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   90.22%   88.18%   -2.04%     
==========================================
  Files          57       59       +2     
  Lines        1166     1261      +95     
  Branches      184      210      +26     
==========================================
+ Hits         1052     1112      +60     
- Misses         39       54      +15     
- Partials       75       95      +20
Impacted Files Coverage Δ
...ta/gremlin/conversion/MappingGremlinConverter.java 88% <0%> (-4.86%) ⬇️
.../gremlin/conversion/source/GremlinSourceGraph.java 100% <100%> (ø) ⬆️
...in/repository/support/SimpleGremlinRepository.java 100% <100%> (ø) ⬆️
...n/repository/support/GremlinEntityInformation.java 81.81% <40%> (-7.84%) ⬇️
...lin/conversion/source/GremlinSourceEdgeWriter.java 71.87% <50%> (-8.9%) ⬇️
.../conversion/script/GremlinScriptLiteralVertex.java 84.31% <50%> (-1.69%) ⬇️
...in/conversion/script/GremlinScriptLiteralEdge.java 84.74% <50%> (-1.47%) ⬇️
...osoft/spring/data/gremlin/common/GremlinUtils.java 90.19% <60%> (-7.59%) ⬇️
...n/conversion/result/GremlinResultsGraphReader.java 72.41% <72.41%> (ø)
...oft/spring/data/gremlin/query/GremlinTemplate.java 89.13% <72.72%> (-3.62%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ddcabc8...b9ed593. Read the comment docs.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Is there any strong reason we need to add @GenerateValue for id field ? If just let the server side generate the id works well, I think addition Annotation here may make the things complicated.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
We think it is very careful when add new Annotation exported to user.
Yes, there is one Annotation in JPA, may NEO4j. I think it doesn't means we should add that Annotation because JPA or NEO4j does have. For opening discuss here, if attempt to add new Annotation, there must be something benefit for this or the feature must take it. From my opinion, If we can handle that elegantly, the Annotation may not be necessary.

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee You do realize that spring data projects are initially inspired by JPA, right? The annotation is harmless as it is just an indication. Without an annotation, how do you indicate that the id field can be nullable at entity creation time? If we don't annotate, then we are back to removing the non-null constraint on the id field, which is what I had originally done.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Good point, to clear the concept, the annotation you want to add is for indicating the field will be generated by server, right ?
If so, I think this can be the reason for that annotation. Then there comes other question, do we need to handle some unexpected behavior in code, like

  • Does this annotation should be combine with @Id, or with field name id.
  • What happens when I Annotated @GeneratedId and specific the id field when creating instance.
  • How about more than one field with @GeneratedId

We may need to talk about the detail implementation.

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee Those are excellent questions. In existing ORM/OGM frameworks like JPA and neo4j OGM, the @GeneratedValue annotation has to be used in conjunction with the @Id annotation, and I believe those frameworks enforce that only one field can be annotated with @Id. Is it a problem if we follow the same pattern here? As a side question, are you enforcing that only one field bears the @Id annotation?
To answer your 2nd question, the annotation is only an indication that the field can be nullable at entity creation time. If someone annotates but also specifies the value, we need to figure out what the behavior is. I don't recall what JPA does in such a case but we can research it. I'm guessing the server-generated value takes precedence in that case.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Agree that @GeneratedValue should be used conjunction with @Id or act on field named id. Current implementation only allows one and only one @Id or id named field.

For 2nd question, I prefer the generated Id when create entity.

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee This is still a work-in-progress. I'll be pushing a couple more pieces shortly.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Thanks for you effect and contribution. Take you time.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Please don't make the PR too large, 200+ changes including additions and deletions should be the limitation for one PR.

@Incarnation-p-lee
Copy link
Contributor

Incarnation-p-lee commented Sep 26, 2018

@fabiencoppens
In case I may forget to review this PR, please info me when you ready. Thanks !

@jespersm
Copy link
Collaborator

jespersm commented Oct 10, 2018

@jespersm Like @Incarnation-p-lee pointed out, this current implementation does not support creating vertices and edges in one atomic call if they have null (i.e. generated) ids, as the edges need the vertex ids to be populated. You would have to create vertices first, and once you get their generated ids from the server, create the corresponding edges.

Strictly speaking, that's not really required, is it, if aliases are introduced, like:

g.addV('Person').property('name', 'Alice').as('a').addV('Person').property('name', 'Bob').as('b').addE('Relation').from('a').to('b').as('r').select('a','b','r').by(id)

Never mind me if you are already working on this...

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee There is one small change that was missed in this PR, it's on the save method of the SimpleGremlinRepository. Shall I add the commit to this PR, or just create a new one once you've merged this one (since you already approved it)? Let me know.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
As you may need to add test for this feature, I think you can send another PR for this.
Travis Ci network is not very stable recently, the PR may need some time. I hope it can retrieve before the end of this week.

@Incarnation-p-lee
Copy link
Contributor

Incarnation-p-lee commented Oct 11, 2018

@fabiencoppens
It looks the IT test in GremlinTemplateIT need adjust for this PR.

ERROR] Tests run: 26, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 2.955 s <<< FAILURE! - in com.microsoft.spring.data.gremlin.query.GremlinTemplateIT
[ERROR] testUpdateGraph(com.microsoft.spring.data.gremlin.query.GremlinTemplateIT)  Time elapsed: 0.15 s  <<< ERROR!
com.microsoft.spring.data.gremlin.exception.GremlinInvalidEntityIdFieldException: no field named id in class
	at com.microsoft.spring.data.gremlin.query.GremlinTemplateIT.buildTestGraph(GremlinTemplateIT.java:113)
	at com.microsoft.spring.data.gremlin.query.GremlinTemplateIT.testUpdateGraph(GremlinTemplateIT.java:312

@fabiencoppens
Copy link
Collaborator Author

fabiencoppens commented Oct 11, 2018

@Incarnation-p-lee I think I know what the problem is. In GremlinSourceGraphReader.read(), we use reflection to determine the generic types of the VertexSet and EdgeSet collections, and then we loop through the results to unmarshall the result sources to that type. However, this can only work if the VertexSet and EdgeSet are explicit collections of one domain type only (e.g. List<Person> vertexList).
In the ITs, however, you use a heterogeneous list of domain types in the Network graph's vertex set, it contains a Person and a Project, and the Network graph class has lists of Objects: private List<Object> vertexList; and private List<Object> edgeList;.
So, reflection unmarshalls the results to Objects and of course GremlinUtils can't find an id field.
This points to something I already knew but should have remembered: this current solution only works for @Graph classes that have homogeneous types of vertices and edges, e.g. List<Person> vertexList.
For heterogeneous Object lists, it is a really hard problem, since the result sources returned by the DB don't have any binding to Java domain classes, of course.
This makes me think that using reflection might not be the optimal approach. Maybe we should just extract the ids of the vertices and edges in the result source and set them explicitly in the original graph collections, without attempting to recover the whole graph. The problem with that is that we have to assume that the list of vertices and edges are returned by the DB in the same order that they were sent in the write method...
This problem is not a simple one. I don't think this PR can go as is... I think this requires discussing and maybe re-thinking the usage of the @Graph domain object.

@jespersm
Copy link
Collaborator

jespersm commented Oct 11, 2018

The problem with that is that we have to assume that the list of vertices and edges are returned by the DB in the same order that they were sent in the write method...

As I demonstrated above, you can pick "as"-aliases for each added vertex or edge without ID, ask the server to return both alias and assigned id, and just process them in whichever order they come.

For this to work, you'd keep a Map of alias to each "id-less" instance as you write out each insert.

I'd love to contribute, but I can't seem to make any of the integration test cases work against my local server, so I'm a little blocked...

@fabiencoppens
Copy link
Collaborator Author

fabiencoppens commented Oct 11, 2018

The problem with that is that we have to assume that the list of vertices and edges are returned by the DB in the same order that they were sent in the write method...

As I demonstrated above, you can pick "as"-aliases for each added vertex or edge without ID, ask the server to return both alias ans assigned id, and just process them in whichever order they come.

For this to work, you'd keep a Map of alias to instance as you write out each insert.

I'd love to contribute, but I can't seem to make anything of the integration tests work against my local, so I'm a little blocked...

@jespersm What is blocking you with the ITs? What DB are you running them against? They rely on src/test/resources/application.properties for the config.
As for your suggestion of using the as pseudo-step is interesting. Isn't that alias only used locally by the client (e.g. gremlin console or driver lib..)? Does the alias actually get propagated to the DB side? Umm, I guess it must be since it is used to alias traversal steps...

@jespersm
Copy link
Collaborator

@jespersm What is blocking you with the ITs? What DB are you running them against? They rely on src/test/resources/application.properties for the config.

Ok, I got some green now, running against a fresh Gremlin Server with just the TinkerGraph (but with TLS on). But - I have 28 test errors in GremlinTemplateIT where you seem to have 5...

I'm assuming you are running against Cosmos DB - I'll try that now.

As for your suggestion of using the as pseudo-step is interesting. Isn't that alias only used locally by the client (e.g. gremlin console or driver lib..)? Does the alias actually get propagated to the DB side?

I'm quite sure it does (at least with the serialization I tried, on Gremlin Server 3.3.3). I'm pretty sure it's also quite important for several of the more complex queries -- a bit like column aliases in SQL, I guess.

@fabiencoppens
Copy link
Collaborator Author

I'm assuming you are running against Cosmos DB - I'll try that now.

The CI build runs the ITs against Tinkergraph, I believe.

@jespersm
Copy link
Collaborator

I'm assuming you are running against Cosmos DB - I'll try that now.

The CI build runs the ITs against Tinkergraph, I believe.

Oh, I see it now in the Travis files. I'll see if I get the same results if I use the same config files. Thanks.

@jespersm
Copy link
Collaborator

However, I see that you are seemingly hardcoding the textual representation of graph traversal and some of the GraphSON syntax into the library (and GraphSON 1.0 at that?), rather than using the API of the Client.
I'm not entirely sure what the design rationales are, and perhaps I should just leave you to it -- I'll check in later.

@fabiencoppens
Copy link
Collaborator Author

However, I see that you are seemingly hardcoding the textual representation of graph traversal and some of the GraphSON syntax into the library (and GraphSON 1.0 at that?), rather than using the API of the Client.
I'm not entirely sure what the design rationales are, and perhaps I should just leave you to it -- I'll check in later.

That's a question for @Incarnation-p-lee . I'm just trying to contribute but have not participated in the project development so far.

@Incarnation-p-lee
Copy link
Contributor

@jespersm
I added you as the contributor of this repo, and please help to check the invite link.
In fact, we use tinkerpop-gremlin-server as IT test env, because it support all features, syntax of gremlin. Neither cosmosdb GraphAPI nor JenusGraph only implement the subset of Gremlin syntax.

@Incarnation-p-lee
Copy link
Contributor

@jespersm
As you can see in GremlinFactory, hard code the GraphJson for first release of spring-data-gremlin, something like default value. We can add the configuration of this from GremlinConfig, it simple here. But I need to take care of 3 - 4 repos, and no effect for this.

It is perfect if you can contribute this with PR, or I can do that later when I free, but it may take times. I file one issue for this.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
I will try to reproduce the IT failure in the weekend for figure out details, looks not simple ones. If no elegant solution as we considered. We can workaround this and discuss a better one in another PR.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Seems not easy to figure out one perfect solution for this. The core problem is that when we retrieve Result from gremlinServer, we loss the type information for vertexes, as well as edges.
Consider one DB with existed two vertexes with the same label, when read from Result, we cannot tell the real type from result, right ?

@Vertex("label-A")
Class A {
    String name;
    Long id;
};
@Vertex("label-A")
Class B {
    String name;
    Long id;
};

For GraphRead, we may need to think about the type information about vertexes and edges when retrieve, is there any way to disable feature of no-required ID from this PR ? then we can merge this and working on the solution.

@fabiencoppens
Copy link
Collaborator Author

fabiencoppens commented Oct 15, 2018

@Incarnation-p-lee Yes this is a type erasure problem. When unmarshalling the results of a graph returned by the DB, we can't use reflection to determine the Java type of the vertices or edges, as the vertices or edges might have heterogeneous types, like in your ITs, where you have
@VertexSet List<Object> vertices
that contains 2 different types of objects, Person and Project.
As far as I can tell, the only clean solution is the one that @jespersm suggested, which is to add the as step label in the DSL for graph creation, so that we can match each object in the results to their corresponding Java type via that as alias. This would also solve creating graphs atomically when ids are generated, since the edges can rely on the aliases of the vertices, like in the example he posted above in the thread: g.addV('Person').property('name', 'Alice').as('a').addV('Person').property('name', 'Bob').as('b').addE('Relation').from('a').to('b').as('r').select('a','b','r').by(id)

As far as this PR, it is really meant to make ids optional, which is very important for some of the Tinkerpop DBs such as JanusGraph. Otherwise, this PR is not very useful.
We need to figure out workarounds, or just wait on the solution above to be implemented.

One short-term workaround would be to rely on the labels, by making them mandatory (and unique, i.e. a one-to-one between label and Java type). At graph creation time, we would build a map of label to class, and use that to unmarshall the vertex and edge collections from the results. Thoughts?

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
About as step in query, as I learned, it is just one alias without keep any data in db, right ? When the vertex/edge existed in db already, I think we may loss the alias when query like find later. Please correct me if I am wrong.

For aligning with gremlin syntax, configurable label is recommended I think. My idea for this problem is that add one pre-defined property like _classname for each vertex/edge. And them we can use reflect to obtain the class type. See PR #168 .

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee
Yes, the step alias is volatile, it is used only during the gremlin query processing, so I see your point if you want repeatability in the GraphReader, even outside of the graph creation case. However, using that step alias for graph creation is in my opinion the only way to allow the atomic creation of a graph that contains both vertices and edges, when the ids are generated, since the edges can refer to the vertices by their alias for their from and to vertices. It would be a big win, because the current implementation in this PR doesn't let you do that.

I Like your idea of an internal property that will contain the Vertex or Edge class's name. I think this needn't be exposed in the actual domain Java classes, but should be added to the entity's properties when it is saved to the DB, right? So it's a modification in the vertex and edge writers, which would add that internal property before the entity gets saved to the DB. And when we unmarshall a Graph, we'll get that property back for each instance in the vertices and edges, and de-serialize to that class. I like that idea. If we're ok on this, I'll get it done ASAP.

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Yes, you got my point. It is internal property transparent to domainClass, and will create when GremlinSource initialization. In this way, we can solve the problem here, I think.
You can also review this PR with basic support for pre-defined property _classname -> #168 .
After merge, you can leverage the code I think,
Thanks again for you passion!

@fabiencoppens
Copy link
Collaborator Author

@Incarnation-p-lee Changes made to incorporate the latest _classname internal property, and ITs are now green. Please review again and let me know.

Copy link
Contributor

@Incarnation-p-lee Incarnation-p-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome for the PR, could you please help to fix the style ? and I can merge this one then.

GREMLIN_PRIMITIVE_GRAPH, // g
GREMLIN_PRIMITIVE_EDGE_ALL, // E()
GREMLIN_PRIMITIVE_GRAPH, // g
GREMLIN_PRIMITIVE_EDGE_ALL, // E()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please align the comments

GREMLIN_PRIMITIVE_GRAPH, // g
GREMLIN_PRIMITIVE_VERTEX_ALL, // V()
GREMLIN_PRIMITIVE_GRAPH, // g
GREMLIN_PRIMITIVE_VERTEX_ALL, // V()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as above.

@@ -62,7 +65,8 @@ public void write(@NonNull Object domain, @NonNull MappingGremlinConverter conve
} else if (field.getName().equals(GREMLIN_PROPERTY_CLASSNAME)) {
throw new GremlinEntityInformationException("Domain Cannot use pre-defined field name: "
+ GREMLIN_PROPERTY_CLASSNAME);
} else if (field.getAnnotation(EdgeFrom.class) != null) {
}
else if (field.getAnnotation(EdgeFrom.class) != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put else after }

try {
domainClass = Class.forName((String) source.getProperties().get(Constants.GREMLIN_PROPERTY_CLASSNAME));
}
catch (ClassNotFoundException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch after }

@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
Cool, after test passed, I will merge this PR.

@Incarnation-p-lee Incarnation-p-lee merged commit b0bba43 into master Oct 18, 2018
@Incarnation-p-lee Incarnation-p-lee deleted the feature/make-ids-optional branch October 18, 2018 05:46
@Incarnation-p-lee
Copy link
Contributor

@fabiencoppens
And then we can continue the work on #164 and #165.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants