Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TINKERPOP-1996 io() #893

Merged
merged 37 commits into from
Jul 31, 2018
Merged

TINKERPOP-1996 io() #893

merged 37 commits into from
Jul 31, 2018

Conversation

spmallette
Copy link
Contributor

@spmallette spmallette commented Jul 20, 2018

https://issues.apache.org/jira/browse/TINKERPOP-1996

Introduces io() with full support for GLVs and both OLAP and OLTP. There's a bit too much to type here to fully describe this - please take a look at the documentation in the PR as it goes into the details.

All tests pass with docker/build.sh -t -n -i

VOTE +1

This is still fairly skeletal at this point. Just trying to make sure things work properly before building read()/write() out fully.
No more weird Map status return for read() and write(). Both just work like a terminator and self iterate to return nothing.
Without this approach the with() operator couldn't be used because the traversal would already be iterated on the call to read() and write(). In this way read() and write() are both terminators and modulators at the same time.
Included GraphReader and GraphWriter detection and added tests
Not sure why this was there in the first place. Removing it not allows Hadoop integration tests to pass, but seems to have no real effect on existing operations.
Not necessary because existing checks ignore these. For read() you can't write to a HadoopGraph directly (i.e. create vertices/edges) and for write() (and technically read()) it is ignored as it requires a GraphComputer to work.
Introduced new Graph.Features to provider better separation between graph mutation features and graph loading features - they are two different things as demonstrated by io(). Fixed HadoopIoStep/Strategy so that they properly handle the different input/output format types expected.
Killed all the old IO documentation that utilized the GraphReader/Writer classes directly as well as the Graph.io() method that is now deprecated.
These steps really aren't quite sideEffects and not quite map steps either but they seem to fit better as sideEffect. meh
Had to revert to using iterate() and stop read/write() from terminating the traversal. Kinda stinks, but we rely on iterate() quite heavily and for remoting allowing read()/write() to terminate means that the traversal will execute during traversal construction in the translator (which is early and potentially bad).
IORegistry instances are important because they feed serializer information to the Reader/Writer instances. Of all the configuration options that one seemed like the most important to make possible using with().
This will allow users to override or add to the Hadoop/Spark/OLAP configuration as needed
The GraphComputer was not being set properly in the HadoopIoStep and therefore executions of OLAP runs would not work even if withComputer(SparkGraphComputer) was set. It only worked if the gremlin.hadoop.defaultGraphComputer property was set which was weird.
These classes still have use as part of IoRegistry which is still in use and I don't see a clear way to get rid of that easily. We'd have to change the whole system for serialization configuration to accomplish that so I guess this stuff stays for now.

The `IO` class is a helper for the `io()` step that provides expressions that can be used to help configure it
and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to
a `GraphReader` implementation and the `writer" refers to a `GraphWriter` implementation. The implementations of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here, a quote at the end of writer" instead of a tick.

g.io(someOutputFile).write().iterate()
----

While `io()` step is still single-threaded for OLTP style loading, it can be utilized in conjunction with OLAP which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, in spots I think it's referenced as io() step in others io()-step. Might be good to stick with one or another. It looks like the - style is prevalent in the docs.

@spmallette
Copy link
Contributor Author

Thanks @twilmes - cleaned up

@dkuppitz not sure if you've taken a moment to look at this PR or not, but could you maybe give a look at the steps/strategies for me?

Copy link
Contributor

@dkuppitz dkuppitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real complaints, just comments, hence...

VOTE: +1

}

private Traverser.Admin<S> read(final File file) {
try (final InputStream stream = new FileInputStream(file)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment here, unless you're gonna be like "oh, right, that's easy, let me do that real quick...". It would be nice if this would not be hard-coded to use FileInputStream and instead be flexible to use any file system (or protocol). I know Java NIO has some methods to determine the file system from a URI, which can then be used to open an InputStream / OutputStream. However, according to the docs it seems to require a little bit more work (FileSystemProvider registration) and I have no idea whether that's easy or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did do that at first and then i changed it because writing to such any OutputStream didn't seem trivial and I didn't want two different things going on there, but I'm having second thoughts now. It would be easier for GLVs to deal with URLs as they are dealing with remote sources.....I will think on that and look again on Monday.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm - i started looking at this some more. i've already got a gang of work on this. if we decide we want to take another step to be able to read from a URL maybe we add that later on a separate ticket.

@@ -1704,11 +1713,11 @@ public GraphTraversal(ICollection<ITraversalStrategy> traversalStrategies, Bytec
}

/// <summary>
/// Adds the with step to this <see cref="GraphTraversal{SType, EType}" />.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing. Why did With disappear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's above in the diff - expand the section above.

@asfgit asfgit merged commit 10478be into master Jul 31, 2018
@asfgit asfgit deleted the TINKERPOP-1996 branch October 24, 2018 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants