TINKERPOP-1996 io() #893

spmallette · 2018-07-20T15:58:31Z

https://issues.apache.org/jira/browse/TINKERPOP-1996

Introduces io() with full support for GLVs and both OLAP and OLTP. There's a bit too much to type here to fully describe this - please take a look at the documentation in the PR as it goes into the details.

All tests pass with docker/build.sh -t -n -i

VOTE +1

This is still fairly skeletal at this point. Just trying to make sure things work properly before building read()/write() out fully.

No more weird Map status return for read() and write(). Both just work like a terminator and self iterate to return nothing.

Without this approach the with() operator couldn't be used because the traversal would already be iterated on the call to read() and write(). In this way read() and write() are both terminators and modulators at the same time.

Included GraphReader and GraphWriter detection and added tests

Not sure why this was there in the first place. Removing it now allows Hadoop integration tests to pass, but seems to have no real effect on existing operations.

Not necessary because existing checks ignore these. For read() you can't write to a HadoopGraph directly (i.e. create vertices/edges) and for write() (and technically read()) it is ignored as it requires a GraphComputer to work.

Introduced new Graph.Features to provide better separation between graph mutation features and graph loading features - they are two different things as demonstrated by io(). Fixed HadoopIoStep/Strategy so that they properly handle the different input/output format types expected.

Killed all the old IO documentation that utilized the GraphReader/Writer classes directly as well as the Graph.io() method that is now deprecated.

These steps really aren't quite sideEffects and not quite map steps either but they seem to fit better as sideEffect. meh

Had to revert to using iterate() and stop read/write() from terminating the traversal. Kinda stinks, but we rely on iterate() quite heavily and for remoting allowing read()/write() to terminate means that the traversal will execute during traversal construction in the translator (which is early and potentially bad).
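
To make the deferred-execution point concrete, here is a toy sketch in plain Java. It is not TinkerPop code: `IoSketch`, its methods, and the string option keys are hypothetical stand-ins for `io()`, `with()`, `read()`/`write()` and `iterate()`. It only shows the shape of the design, where `read()` records intent and nothing runs until `iterate()` is called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are not TinkerPop classes.
final class IoSketch {
    enum Mode { READ, WRITE }

    private final String file;
    private final List<String> log;                        // records "executions" for the demo
    private final Map<String, Object> options = new LinkedHashMap<>();
    private Mode mode;

    IoSketch(final String file, final List<String> log) {
        this.file = file;
        this.log = log;
    }

    // Modulator: only stores configuration.
    IoSketch with(final String key, final Object value) {
        options.put(key, value);
        return this;
    }

    // read()/write() only record what should happen; they do not execute.
    IoSketch read() { mode = Mode.READ; return this; }
    IoSketch write() { mode = Mode.WRITE; return this; }

    // The only place where work actually happens.
    void iterate() {
        if (mode == null) {
            throw new IllegalStateException("call read() or write() before iterate()");
        }
        log.add(mode + " " + file + " " + options);
    }

    public static void main(final String[] args) {
        final List<String> log = new ArrayList<>();
        final IoSketch io = new IoSketch("graph.json", log).with("reader", "graphson").read();
        System.out.println(log);   // still empty: construction has no side effects
        io.iterate();
        System.out.println(log);   // now contains one entry
    }
}
```

Because construction has no side effects in this model, something like a translator could walk the whole chain without triggering any I/O, which is the concern described above for remoting.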

IoRegistry instances are important because they feed serializer information to the Reader/Writer instances. Of all the configuration options, that one seemed like the most important to make possible using with().

This will allow users to override or add to the Hadoop/Spark/OLAP configuration as needed

The GraphComputer was not being set properly in the HadoopIoStep and therefore executions of OLAP runs would not work even if withComputer(SparkGraphComputer) was set. It only worked if the gremlin.hadoop.defaultGraphComputer property was set which was weird.

These classes still have use as part of IoRegistry which is still in use and I don't see a clear way to get rid of that easily. We'd have to change the whole system for serialization configuration to accomplish that so I guess this stuff stays for now.

If this isn't there then GraphReader/Writer will blow up as it tries to mutate the graph. IoStep is an OLTP only step. For OLAP each graph implementation will need to add its own GraphComputer-ready step.

twilmes · 2018-07-27T01:51:34Z

docs/src/reference/the-traversal.asciidoc

+
+The `IO` class is a helper for the `io()` step that provides expressions that can be used to help configure it
+and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to
+a `GraphReader` implementation and the `writer" refers to a `GraphWriter` implementation. The implementations of


Small typo here, a quote at the end of writer" instead of a tick.

twilmes · 2018-07-27T01:57:33Z

docs/src/upgrade/release-3.4.x.asciidoc

+g.io(someOutputFile).write().iterate()
+----
+
+While `io()` step is still single-threaded for OLTP style loading, it can be utilized in conjunction with OLAP which


Small nit: in some spots I think it's referenced as io() step, in others as io()-step. Might be good to stick with one or the other. It looks like the - style is prevalent in the docs.

spmallette · 2018-07-27T11:13:03Z

Thanks @twilmes - cleaned up

@dkuppitz not sure if you've taken a moment to look at this PR or not, but could you maybe give a look at the steps/strategies for me?

dkuppitz

No real complaints, just comments, hence...

VOTE: +1

dkuppitz · 2018-07-28T02:53:08Z

...ore/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/IoStep.java

+    }
+
+    private Traverser.Admin<S> read(final File file) {
+        try (final InputStream stream = new FileInputStream(file)) {


Just a comment here, unless you're gonna be like "oh, right, that's easy, let me do that real quick...". It would be nice if this would not be hard-coded to use FileInputStream and instead be flexible to use any file system (or protocol). I know Java NIO has some methods to determine the file system from a URI, which can then be used to open an InputStream / OutputStream. However, according to the docs it seems to require a little bit more work (FileSystemProvider registration) and I have no idea whether that's easy or not.

I did do that at first and then I changed it because writing to any such OutputStream didn't seem trivial and I didn't want two different things going on there, but I'm having second thoughts now. It would be easier for GLVs to deal with URLs as they are dealing with remote sources... I will think on that and look again on Monday.

hmm - i started looking at this some more. i've already got a gang of work on this. if we decide we want to take another step to be able to read from a URL maybe we add that later on a separate ticket.
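
For reference, the NIO approach mentioned at the top of this thread could look roughly like the sketch below. It is not what the PR does (`IoStep` stays on `FileInputStream`), and `UriStreams` is a hypothetical helper name. It relies only on `Paths.get(URI)`, `Files.newInputStream` and `Files.newOutputStream` from the JDK, which resolve the path through whichever `FileSystemProvider` is installed for the URI scheme. Out of the box that covers `file:` URIs; other schemes would need a provider to be registered, which is the extra work noted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of TinkerPop.
final class UriStreams {
    private UriStreams() {}

    // Paths.get(URI) throws FileSystemNotFoundException when no installed
    // provider handles the URI scheme.
    static InputStream openInput(final URI uri) throws IOException {
        return Files.newInputStream(Paths.get(uri));
    }

    // Default options create the file if needed and truncate existing content.
    static OutputStream openOutput(final URI uri) throws IOException {
        return Files.newOutputStream(Paths.get(uri));
    }

    public static void main(final String[] args) throws IOException {
        final Path tmp = Files.createTempFile("io-sketch", ".json");
        final URI uri = tmp.toUri();   // a file: URI
        try (OutputStream out = openOutput(uri)) {
            out.write("{}".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = openInput(uri)) {
            // readAllBytes() requires Java 9 or later
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        Files.delete(tmp);
    }
}
```

Reading and writing go through the same `Path` resolution here, so there would not be two different mechanisms in play, but whether this works for remote sources depends entirely on the providers available at runtime.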

dkuppitz · 2018-07-28T02:59:20Z

gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs

@@ -1704,11 +1713,11 @@ public GraphTraversal(ICollection<ITraversalStrategy> traversalStrategies, Bytec
        }

        /// <summary>
-        ///     Adds the with step to this <see cref="GraphTraversal{SType, EType}" />.


This is a bit confusing. Why did With disappear?

It's above in the diff - expand the section above.

spmallette added 30 commits July 19, 2018 13:39

TINKERPOP-1996 Introduce read() and write() steps

ec1d05f

TINKERPOP-1996 Have the basics of OLAP read()/write() steps working

0785090

This is still fairly skeletal at this point. Just trying to make sure things work properly before building read()/write() out fully.

TINKERPOP-1996 Minor refactoring of Reading/Writing and javadoc

ff2773a

TINKERPOP-1996 read()/write() api changes for return type

d99909c

No more weird Map status return for read() and write(). Both just work like a terminator and self iterate to return nothing.

TINKERPOP-1996 Made read()/write() terminator steps

767d65b

Without this approach the with() operator couldn't be used because the traversal would already be iterated on the call to read() and write(). In this way read() and write() are both terminators and modulators at the same time.

TINKERPOP-1996 Added some javadoc and code formatting

d181563

TINKERPOP-1996 Added with() options for io()

13e552b

Included GraphReader and GraphWriter detection and added tests

TINKERPOP-1996 none() doesn't need to be removed in HadoopIoStrategy

be9db8d

Not sure why this was there in the first place. Removing it now allows Hadoop integration tests to pass, but seems to have no real effect on existing operations.

TINKERPOP-1996 Added IO to imports and javadoc fixes

328737a

TINKERPOP-1996 Deprecated Graph.io() and related infrastructure.

ae79637

TINKERPOP-1996 Fixed a bad method call for Configuring steps

9e4da01

TINKERPOP-1996 Removed OptOuts for read()/write() tests

bd275a7

Not necessary because existing checks ignore these. For read() you can't write to a HadoopGraph directly (i.e. create vertices/edges) and for write() (and technically read()) it is ignored as it requires a GraphComputer to work.

TINKERPOP-1996 Updated changelog

576649f

TINKERPOP-1996 Added docs for io()

62175c2

Killed all the old IO documentation that utilized the GraphReader/Writer classes directly as well as the Graph.io() method that is now deprecated.

TINKERPOP-1996 Moved IoStep implementations to sideEffect package

6d05805

These steps really aren't quite sideEffects and not quite map steps either but they seem to fit better as sideEffect. meh

TINKERPOP-1996 Removed use of graph.io() in docs

5bf19e2

TINKERPOP-1996 Used g.io() in tests by default

8187016

TINKERPOP-1996 Removed unnecessary enum

f8e3b8a

TINKERPOP-1996 Enabled feature coverage checks for GLV tests on read()/write()

a580b6f

TINKERPOP-1996 Added iterate() to read()/write() steps in docs

ae3b149

TINKERPOP-1996 Added support for setting IoRegistries using with()

ff71c6a

IoRegistry instances are important because they feed serializer information to the Reader/Writer instances. Of all the configuration options, that one seemed like the most important to make possible using with().

TINKERPOP-1996 Fixed bad test assertions after last body of changes.

9423397

TINKERPOP-1996 Added some docs around IO.registry

51dc821

TINKERPOP-1996 Pass configurations from with() through to Hadoop

4d979cf

This will allow users to override or add to the Hadoop/Spark/OLAP configuration as needed

TINKERPOP-1996 Added upgrade docs

ded7c18

TINKERPOP-1996 Verification strategy to prevent io() from misuse

e6e4413

spmallette added 5 commits July 20, 2018 11:52

TINKERPOP-1996 No need to assert io() against VertexProgramStrategy

7f1bf17

TINKERPOP-1996 Testing for GraphSON and IoRegistry configuration

ae3f685

TINKERPOP-1996 Fixed verification on io()

23c71b6

TINKERPOP-1996 Added IoStep to list of unsupported steps

e9ebacf

If this isn't there then GraphReader/Writer will blow up as it tries to mutate the graph. IoStep is an OLTP only step. For OLAP each graph implementation will need to add its own GraphComputer-ready step.

TINKERPOP-1996 Prevent OLTP style execution in Hadoop of io()

fdb35c6

twilmes reviewed Jul 27, 2018

spmallette added 2 commits July 27, 2018 07:07

TINKERPOP-1996 Fixed up typos in docs

38dc70d

TINKERPOP-1996 Fixed up typos in docs

10478be

dkuppitz approved these changes Jul 28, 2018

spmallette mentioned this pull request Jul 30, 2018

TINKERPOP-1967 connectedComponent() #897

Merged

asfgit merged commit 10478be into master Jul 31, 2018

asfgit deleted the TINKERPOP-1996 branch October 24, 2018 20:07
