Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python glv graphson update #448

Merged
merged 8 commits into from Oct 12, 2016

Conversation

aholmberg
Copy link
Contributor

I have a series of gremlinpython graphson serdes simplifications, followed by a refactor to make type mappings an instance variable, rather than a module-level mapping. I did not attempt to preserve the module API since gremlinpython has not entered GA release status.

I welcome any feedback.

@okram
Copy link
Contributor

okram commented Oct 6, 2016

This looks good. However, can you please rebase with master/ as there have been some changes to graphson.py. Also, if there is anything that needs to be added to gremlin-variants.asciidoc about registering serializers/deserializers, please add. Thanks.

VOTE +1.

@aholmberg
Copy link
Contributor Author

Thanks Marko.
Rebased again. Also removed print statement from one test and updated Python tests to use unittest.TestCase assertions.

@davebshow
Copy link
Contributor

To be honest, I think that we should move away from the unittest framework in the future. I know that pytest allows for unittest integration, but in general this is to facilitate the transition to pytest for large projects that already use unittest. The main disadvantage of using both frameworks is that many of the great features in pytest require more boilerplate if we inherit from unittest.TestCase. The other problem is that unittest requires that you learn the assertion API, whereas with pytest you just use good old fashioned Python. I was going to open a JIRA about this, but I haven't gotten around to it yet.

@aholmberg
Copy link
Contributor Author

I'm ambivalent on that. I just noticed unittest was present, and the bare assertions were not very informative. I can take that commit out if you think it's not worth it now.

@davebshow
Copy link
Contributor

davebshow commented Oct 7, 2016

I like this. Much simpler, well tested, in general a nice PR. Pretty slick using the metaclass to create the de-/serializer maps. One thing I wonder:

Presumably using the GraphSONIO instance methods writeObject/toObject instead of the previous class methods provides an advantage because we can now pass a custom serializer/deserializer map to the GraphSONIO instance, thus overriding/adding to the default serializers. However, the GraphSONIO instance is created inside the DriverRemoteConnection without the optional kwargs. I wonder if we should add graphson_io=None to the __init__ method signature. In the case of None, we can instantiate the object as is:

def __init__(self, url, traversal_source, username="", password="", loop=None, graphson_io=None):
    if not graphson_io:
        graphson_io = GraphSONIO()
    self._graphson_io = graphson_io

That way the user can take advantage of the new functionality provided by the kwargs.

Other than that and my previous comment. This is LGTM.

VOTE + 1

@aholmberg
Copy link
Contributor Author

graphson_io=None to the init method signature

I think that is a fine idea. I was mostly focused on the GraphSON API and not concerned with adding it to the token RemoteConnection implementation (figuring it would get pulled in if someone saw cause to do it).
If you think it's worth doing now (and it won't invalidate this vote), I can push the update I have here.

@aholmberg
Copy link
Contributor Author

argh. needs a third rebase, too :-/

@okram
Copy link
Contributor

okram commented Oct 7, 2016

@aholmberg I know. Sorry. I'm not graceful.

@davebshow
Copy link
Contributor

Yeah makes sense. I'm hoping to do some major refactors of the Driver over the next couple months anyway, but I'm juggling too many things right now. Either way this seems good to go.

@okram
Copy link
Contributor

okram commented Oct 7, 2016

I like these changes. With a solid rebase of master/ given my recent TraversalStrategy serializer work, I'm good with merging this PR.

VOTE +1.

@spmallette
Copy link
Contributor

VOTE +1

@okram
Copy link
Contributor

okram commented Oct 8, 2016

Hi -- one thing to add. I notice you changed GraphSONWriter and GraphSONWriter to GraphSONIO ... The writer/reader names are identical to Gremlin-Java. If its possible, can we leave the original class names? .. unless there is a strong pattern in Python that says that is bad. Thanks.

@aholmberg
Copy link
Contributor Author

you changed GraphSONWriter and GraphSONWriter to GraphSONIO

This was not Python-specific, but felt like a natural design to me. I unified the I/O per type, and tied those together at the top level in GraphSONIO. It seems strange to me to have the two sides of the same encoding split between classes. I thought I saw something in the java impl called GraphSONIO, but maybe I was mistaken.

I can split them up, but then it means carrying around two things to encode/decode everywhere, instead of one:

connection = DriverConnection(..., graphson_reader=GraphSONReader(...), graphson_writer=GraphSONWriter(...))
vs.
connection = DriverConnection(..., graphson_io=GraphSONIO(...))

I'll study the Java impl. a little closer to see if I can understand the benefit. Please let me know if you have any further thoughts.

@aholmberg
Copy link
Contributor Author

(also, ACK on rebase -- I should be getting to it today)

@aholmberg
Copy link
Contributor Author

  • rebased
  • dropped unittest.TestCase assertions
  • added custom graphson_io to DriverRemoteConnection

I haven't had a chance to get back to the GraphSONIO vs Reader/Writer question.

@okram
Copy link
Contributor

okram commented Oct 11, 2016

Cool @aholmberg ... Here are some more pointers.

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/GraphReader.java

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/GraphWriter.java

Where readObject/writeObject are the genero methods.

Also, you might need to update init-code-blocks.awk so that the docs build.

https://github.com/apache/tinkerpop/blob/master/docs/preprocessor/awk/init-code-blocks.awk#L83

To test it -- bin/process-docs.sh -f docs/src/reference/gremlin-variants.asciidoc. If you can't test it easily, no worries. I will fiddle the awk and testing of doc building for you.

@aholmberg
Copy link
Contributor Author

Thanks. I see the code there. What is not clear to me is why these two things are divided, and why that API should be replicated here. I'm having a hard time doing this change because it makes it more cumbersome to use (at least for the integration I'm considering).

I can vaguely imagine using readers and writers in a migration scenario where input and output are different. Can you help me understand the use case in the context of this GLV, where client IO uses a single format?

@okram
Copy link
Contributor

okram commented Oct 11, 2016

@aholmberg We have had a distinction between reader/writer since the beginning of TinkerPop. I believe (@spmallette knows more) that there is a GraphSONMapper class that has all the serializers registered. Thus, both reader and writer have reference to a GraphSONMapper.

https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/GraphSONMapper.java

@spmallette
Copy link
Contributor

I don't think there's a use case in the context of the python GLV, but we have that use case in GraphSON in legacy support for TinkerPop 2.x based GraphSON (the migration scenario mentioned). I don't know that we'd extend that to Python. I guess the point here is to try to maintain API consistency across all the languages, even if we don't quite have all the support for all the features from Java applicable to Python just yet.

@aholmberg
Copy link
Contributor Author

Maybe we should clarify what APIs really need to be preserved. I have been thinking about the process package, and structure module as the Gremlin API, and the driver and io as distinct integrations.

I'm working on splitting reader/writer now, but I'm not convinced this API is worth replicating here. We can chalk it up to my ignorance of the greater ecosystem machinery.

not an ideal design by itself, but follows the Java API precedence, which has
other requirements
@spmallette
Copy link
Contributor

Maybe we should clarify what APIs really need to be preserved.

imo i think that could be debated. at this point, i think being sorta cautious and following what's been working for a java though seems prudent. thanks for re-working.

@aholmberg
Copy link
Contributor Author

Pushed the split names. I didn't see issues related to these changes running process-docs.sh.
May have been related to the previous removal of GraphSONWriter/Reader.

@asfgit asfgit merged commit 0e211d6 into apache:master Oct 12, 2016
@al3xandru
Copy link

A bit late to the conversation as I've noticed this hasn't been merged yet.

I don't think there's a use case in the context of the python GLV, but we have that use case in GraphSON in legacy support for TinkerPop 2.x based GraphSON (the migration scenario mentioned). I don't know that we'd extend that to Python.

Considering that the GraphSONIO provides both a simpler API and easier integration as noted by @davebshow, plus it doesn't need to account for the legacy support, there are only benefits with this approach.

I guess the point here is to try to maintain API consistency across all the languages, even if we don't quite have all the support for all the features from Java applicable to Python just yet.

That's indeed a very good point. Assuming the main users of this API are the GLV implementors, then I'm wondering if making it idiomatic for the target platform shouldn't be the top goal.

@okram
Copy link
Contributor

okram commented Oct 18, 2016

Hi @al3xandru .... This work has already been merged and will be released in TinkerPop 3.2.3. To answer your concerns though:

  1. In order to make life easier for TinkerPop, its important that we keep the same conventions throughout all the Gremlin language variants. If there is GraphSONReader/Writer in Java, GraphSONIO in Python, and GraphSONifier in JavaScript, it will become increasingly difficult for us to know what does what. Hence, we made a strong stance to say -- "keep the naming the same." With that, I don't see why a GraphSONIO utility class couldn't be created that simply references the GraphSONReader/Writer methods.
  2. We want idiomatic in some spots and not so in others. Backend utility classes -- non-idomatic (lets do our best to be "the same" across all GLVs). Front end user classes --- idiomatic. For example, Gremlin-Ruby would be g.V().out_e(:knows).weight ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants