-
Notifications
You must be signed in to change notification settings - Fork 803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TINKERPOP-1996 io() #893
TINKERPOP-1996 io() #893
Conversation
This is still fairly skeletal at this point. Just trying to make sure things work properly before building read()/write() out fully.
No more weird Map status return for read() and write(). Both just work like a terminator and self iterate to return nothing.
Without this approach the with() operator couldn't be used because the traversal would already be iterated on the call to read() and write(). In this way read() and write() are both terminators and modulators at the same time.
Included GraphReader and GraphWriter detection and added tests
Not sure why this was there in the first place. Removing it not allows Hadoop integration tests to pass, but seems to have no real effect on existing operations.
Not necessary because existing checks ignore these. For read() you can't write to a HadoopGraph directly (i.e. create vertices/edges) and for write() (and technically read()) it is ignored as it requires a GraphComputer to work.
Introduced new Graph.Features to provider better separation between graph mutation features and graph loading features - they are two different things as demonstrated by io(). Fixed HadoopIoStep/Strategy so that they properly handle the different input/output format types expected.
Killed all the old IO documentation that utilized the GraphReader/Writer classes directly as well as the Graph.io() method that is now deprecated.
These steps really aren't quite sideEffects and not quite map steps either but they seem to fit better as sideEffect. meh
Had to revert to using iterate() and stop read/write() from terminating the traversal. Kinda stinks, but we rely on iterate() quite heavily and for remoting allowing read()/write() to terminate means that the traversal will execute during traversal construction in the translator (which is early and potentially bad).
IORegistry instances are important because they feed serializer information to the Reader/Writer instances. Of all the configuration options that one seemed like the most important to make possible using with().
This will allow users to override or add to the Hadoop/Spark/OLAP configuration as needed
The GraphComputer was not being set properly in the HadoopIoStep and therefore executions of OLAP runs would not work even if withComputer(SparkGraphComputer) was set. It only worked if the gremlin.hadoop.defaultGraphComputer property was set which was weird.
These classes still have use as part of IoRegistry which is still in use and I don't see a clear way to get rid of that easily. We'd have to change the whole system for serialization configuration to accomplish that so I guess this stuff stays for now.
If this isn't there then GraphReader/Writer will blow up as it tries to mutate the graph. IoStep is an OLTP only step. For OLAP each graph implementation will need to add its own GraphComputer-ready step.
|
||
The `IO` class is a helper for the `io()` step that provides expressions that can be used to help configure it | ||
and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to | ||
a `GraphReader` implementation and the `writer" refers to a `GraphWriter` implementation. The implementations of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo here, a quote at the end of writer"
instead of a tick.
g.io(someOutputFile).write().iterate() | ||
---- | ||
|
||
While `io()` step is still single-threaded for OLTP style loading, it can be utilized in conjunction with OLAP which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, in spots I think it's referenced as io()
step in others io()
-step. Might be good to stick with one or another. It looks like the -
style is prevalent in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real complaints, just comments, hence...
VOTE: +1
} | ||
|
||
private Traverser.Admin<S> read(final File file) { | ||
try (final InputStream stream = new FileInputStream(file)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment here, unless you're gonna be like "oh, right, that's easy, let me do that real quick...". It would be nice if this would not be hard-coded to use FileInputStream
and instead be flexible to use any file system (or protocol). I know Java NIO has some methods to determine the file system from a URI, which can then be used to open an InputStream
/ OutputStream
. However, according to the docs it seems to require a little bit more work (FileSystemProvider
registration) and I have no idea whether that's easy or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did do that at first and then i changed it because writing to such any OutputStream
didn't seem trivial and I didn't want two different things going on there, but I'm having second thoughts now. It would be easier for GLVs to deal with URLs as they are dealing with remote sources.....I will think on that and look again on Monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm - i started looking at this some more. i've already got a gang of work on this. if we decide we want to take another step to be able to read from a URL maybe we add that later on a separate ticket.
@@ -1704,11 +1713,11 @@ public GraphTraversal(ICollection<ITraversalStrategy> traversalStrategies, Bytec | |||
} | |||
|
|||
/// <summary> | |||
/// Adds the with step to this <see cref="GraphTraversal{SType, EType}" />. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing. Why did With
disappear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's above in the diff - expand the section above.
https://issues.apache.org/jira/browse/TINKERPOP-1996
Introduces
io()
with full support for GLVs and both OLAP and OLTP. There's a bit too much to type here to fully describe this - please take a look at the documentation in the PR as it goes into the details.All tests pass with
docker/build.sh -t -n -i
VOTE +1