Skip to content

TINKERPOP-2436 The gremlin server starts even if all graphs instantiation has failed#1342

Merged
spmallette merged 1 commit intoapache:3.4-devfrom
mmadoo:TINKERPOP-2436
Oct 27, 2020
Merged

TINKERPOP-2436 The gremlin server starts even if all graphs instantiation has failed#1342
spmallette merged 1 commit intoapache:3.4-devfrom
mmadoo:TINKERPOP-2436

Conversation

@mmadoo
Copy link
Contributor

@mmadoo mmadoo commented Oct 8, 2020

https://issues.apache.org/jira/browse/TINKERPOP-2436

I could create a junit test if needed but I do not know where and how add such test

@spmallette
Copy link
Contributor

We had it this way purposely. because you might want to configure the server with no graphs at all. I'd admit it an odd use case. I suppose we could think about changing it. Perhaps users would be better served to know that their graph configurations failed. If they wanted the corner use case to start with no graphs at all I suppose they could provide their own GraphManager to allow that (or perhaps we could provide that implementation ourselves as a utility). Let's see if anyone else comments on this change and I'll put some more thought to it. Thanks!

@spmallette
Copy link
Contributor

spmallette commented Oct 14, 2020

No comments from anyone so I suppose there are no high-priority concerns. I suppose this isn't a bad change and might actually help new users who might get their configurations wrong and then wonder why they are getting errors. I suppose that has happened before. Here's some things that I think need to be done to get this one ready for proper review:

  1. I would prefer that DefaultGraphManager be left alone and instead make it a basis for extension. I would recommend creating an extension of DefaultGraphManager called CheckedGraphManager (open to another name if you have a suggestion) that basically do the functionality you are suggesting. I would think this new class would just call super() in the constructor then do the validation. You will need to alter DefaultGraphManager scopes to make this work.
  2. I think we should continue to target master (3.5.0) with this change and therefore it is safe to change the default here to your new class.
  3. A unit test would go in this package here
  4. A more robust error message here would be nice and I think we'd prefer IllegalStateException
  5. For situations where there are no graphs because of exceptions here I'm not sure we need to rethrow the exceptions as we all ready WARN above of such problems. I think a simple well-formed IllegalStateException that no graphs are present and the server stopping should be sufficient.
  6. I presume throwing an exception will stop Gremlin Server from starting. I don't think that breaks any of our integration tests because they all start with configured graphs. I suppose you will find that out. We might want integration tests for this change to ensure the server shuts down nicely/cleanly, but let's see what the rest of these changes look like first.
  7. This is a breaking change in behavior so I think we will need Upgrade Documentation for both users and providers on this PR.

I'm not sure if that's everything or not, but I can't see much further on this one without some of these changes in place. It's always interesting how software works - take something though to be "simple" and it quickly unravels into something quite the opposite. Please let me know if you have any questions @mmadoo

@mmadoo
Copy link
Contributor Author

mmadoo commented Oct 14, 2020

I do not have change the Upgrade Documentation as this GraphManager is not the default, so it will not break anything. An option is to set like it in 3.4 branch and to make it default in master.

Also, why does DefaultGraphManager was final (class and methods), this make harder to do a extension on this class. I have only removed final on class (I like the Jetty approach that allows override at nearly any level)

@spmallette
Copy link
Contributor

spmallette commented Oct 14, 2020

I do not have change the Upgrade Documentation as this GraphManager is not the default, so it will not break anything. An option is to set like it in 3.4 branch and to make it default in master.

See item 2 above. I was offering you the option to change to your new one. If you want to point this PR at 3.4-dev then true, you don't need Upgrade documentation.

Also, why does DefaultGraphManager was final (class and methods), this make harder to do a extension on this class. I have only removed final on class (I like the Jetty approach that allows override at nearly any level)

I made mention of this in item 1 - "You will need to alter DefaultGraphManager scopes to make this work". Yes, you will need to make DefaultGraphManager more extensible by removing final, changing member variable scopes to protected, etc. There were reasons for doing it that way initially, but I don't think they apply so much anymore.....plus, you are now adding this change, so time to open it up for extension.

I think I would also add another item:

  1. Reference documentation should probably refer to the two options we have after your change. You could just update the graphManager key in the Gremlin Server Configuration section here.

@spmallette
Copy link
Contributor

This PR is looking pretty good. It looks like the only two issues from my first review that are still outstanding are:

  1. I presume throwing an exception will stop Gremlin Server from starting. I don't think that breaks any of our integration tests because they all start with configured graphs. I suppose you will find that out. We might want integration tests for this change to ensure the server shuts down nicely/cleanly, but let's see what the rest of these changes look like first.

I think an integration test would be good. We didn't have any "shutdown" tests for you to use as a model so I quickly added one, which would probably a good idea anyway, so you can add yours to that: 339551d

  1. This is a breaking change in behavior so I think we will need Upgrade Documentation for both users and providers on this PR.

We don't have a breaking change anymore since we kept the DefaultGraphManager functionality untouched. If you like you can have this functionality in 3.4.9 by re-targeting your PR to the 3.4-dev branch otherwise, feel free to leave it here on master for 3.5.0. In either case, I think it would be nice to introduce this functionality in the Upgrade Documentation. Adding that bit of extra documentation would nicely tidy this PR up, but I'm happy to write that up on merge if you like.

As a final point of tidying, after you choose the branch you are aimed at, please clean up the git history a bit and squash your commits down to one as we prepare for a final merge.

@mmadoo mmadoo changed the base branch from master to 3.4-dev October 22, 2020 12:46
@spmallette
Copy link
Contributor

I was a bit skeptical of the travis errors at first, but your pull request doesn't build for me and i'm not sure what's wrong. Were you able to get a basic mvn clean install to work on your end?

@mmadoo
Copy link
Contributor Author

mmadoo commented Oct 27, 2020

I was a bit skeptical of the travis errors at first, but your pull request doesn't build for me and i'm not sure what's wrong. Were you able to get a basic mvn clean install to work on your end?

I am able to run successfully the same commands as travis job:

mvn clean install -q -DskipTests -Dci
mvn verify -pl :gremlin-server -DskipTests -DskipIntegrationTests=false -DincludeNeo4j

In travis build, I can see that openjdk 11 is used. May be, the issue is due to the PR was created to master and not 3.4-dev. When I latter changed the target branch, may be travis is still use the master travis configuration file.

@spmallette
Copy link
Contributor

I noticed the travis issue with the jdk (which is why i was skeptical...i should have wrote that), but I'm building locally with:

$ mvn -version
Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 1.8.0_242, vendor: AdoptOpenJDK, runtime: /home/smallette/.sdkman/candidates/java/8.0.242.hs-adpt/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-52-generic", arch: "amd64", family: "unix"

My errors are in Gremlin Server:

[ERROR] Errors: 
[ERROR]   RemoteGraphGroovyTranslatorProcessComputerTest.initializationError ? Runtime j...
[ERROR]   RemoteGraphGroovyTranslatorProcessStandardTest.initializationError ? Runtime j...
[ERROR]   GraphBinaryRemoteGraphProcessComputerTest.initializationError ? Runtime java.u...
[ERROR]   GraphBinaryRemoteGraphProcessStandardTest.initializationError ? Runtime java.u...
[ERROR]   GraphSONRemoteGraphProcessComputerTest.initializationError ? Runtime java.util...
[ERROR]   GraphSONRemoteGraphProcessStandardTest.initializationError ? Runtime java.util...
[ERROR]   GryoRemoteGraphProcessComputerTest.initializationError ? Runtime java.util.con...
[ERROR]   GryoRemoteGraphProcessStandardTest.initializationError ? Runtime java.util.con...

They seem related to this stacktrace perhaps:

[ERROR] initializationError(org.apache.tinkerpop.gremlin.process.remote.GraphSONRemoteGraphProcessStandardTest)  Time elapsed: 0.001 s  <<< ERROR!
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: Could not bind to localhost and 45940 - perhaps something else is bound to that address.
	at org.apache.tinkerpop.gremlin.driver.remote.AbstractRemoteGraphProvider.<init>(AbstractRemoteGraphProvider.java:233)
	at org.apache.tinkerpop.gremlin.driver.remote.AbstractRemoteGraphProvider.<init>(AbstractRemoteGraphProvider.java:223)
	at org.apache.tinkerpop.gremlin.driver.remote.GraphSONRemoteGraphProvider.<init>(GraphSONRemoteGraphProvider.java:28)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at org.apache.tinkerpop.gremlin.AbstractGremlinSuite.<init>(AbstractGremlinSuite.java:99)
	at org.apache.tinkerpop.gremlin.process.ProcessStandardSuite.<init>(ProcessStandardSuite.java:277)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:107)
	at org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:86)
	at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
	at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
	at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
	at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:33)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:362)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not bind to localhost and 45940 - perhaps something else is bound to that address.
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
	at org.apache.tinkerpop.gremlin.driver.remote.AbstractRemoteGraphProvider.startServer(AbstractRemoteGraphProvider.java:331)
	at org.apache.tinkerpop.gremlin.driver.remote.AbstractRemoteGraphProvider.<init>(AbstractRemoteGraphProvider.java:231)
	... 27 more
Caused by: java.io.IOException: Could not bind to localhost and 45940 - perhaps something else is bound to that address.
	at org.apache.tinkerpop.gremlin.server.GremlinServer$1.operationComplete(GremlinServer.java:186)
	at org.apache.tinkerpop.gremlin.server.GremlinServer$1.operationComplete(GremlinServer.java:173)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:998)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:552)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:748)

I've tested to check with lsof -i :45940 to see if anything else was holding that port and i don't find anything. It makes me think that somehow the changes have stopped Gremlin Server from shutting down cleanly in integration tests?? I don't see any indication of that in the code but perhaps that is what is happening. I don't get these issue on 3.4-dev which indicates that the problem is on this branch.

I think that the reason you don't see error is because you -DskipTests in both commands which ignores unit tests which is precisely where the build is failing. Please try to build with mvn clean install and see if you hit the same problem as I have.

@mmadoo
Copy link
Contributor Author

mmadoo commented Oct 27, 2020

It should be fixed now, the server was not stopped in GremlinServerShutdownIntegrationTest

@spmallette
Copy link
Contributor

ah - nice. sorry i suppose that was my mistake. i got too used to relying on the test infrastructure to cleanup Gremlin Server for me and that test didn't extend from that framework. Retesting on my end now - assuming it all passes I'll get this merged.

VOTE +1

spmallette added a commit that referenced this pull request Oct 27, 2020
@spmallette spmallette merged commit b3aeea8 into apache:3.4-dev Oct 27, 2020
@spmallette
Copy link
Contributor

Merged - thanks for your work on this. Take care!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants