New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes bindOnLocalhost=boolean. Adds bindAddress and advertisedAddress. #26

Merged
merged 20 commits into from Sep 26, 2016

Conversation

Projects
None yet
4 participants
@radekg
Contributor

radekg commented Sep 20, 2016

Motivation

The motivation has been documented and discussed in #23.

Modifications

Adds bindAddress and advertisedAddress configuration settings.
Removes bindOnLocalhost configuration setting.

Result

The user can how set the address the service will bind on, for example: 0.0.0.0. The user can also advertise a different address via ZooKeeper. The latter setting makes it much easier to setup Pulsar in environments such as Amazon EC2.

…Address.

@yahoocla

This comment has been minimized.

Show comment
Hide comment
@yahoocla

yahoocla Sep 20, 2016

CLA is valid!

yahoocla commented Sep 20, 2016

CLA is valid!

Show outdated Hide outdated conf/broker.conf Outdated
Show outdated Hide outdated conf/broker.conf Outdated

@radekg radekg changed the title from Changes bindOnLocalhost=boolean to bindAddress. Introduces advertised… to Removes bindOnLocalhost=boolean. Adds bindAddress and advertisedAddress Sep 20, 2016

@radekg radekg changed the title from Removes bindOnLocalhost=boolean. Adds bindAddress and advertisedAddress to Removes bindOnLocalhost=boolean. Adds bindAddress and advertisedAddress. Sep 20, 2016

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 20, 2016

Contributor

It might be worth adding advertised ports settings for services. 100% in the short term someone is going to ask for it when setting Pulsar in Mesos with Docker.

Contributor

radekg commented Sep 20, 2016

It might be worth adding advertised ports settings for services. 100% in the short term someone is going to ask for it when setting Pulsar in Mesos with Docker.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 20, 2016

Contributor

It isn't clear to me how the ServiceConfiguration is populated from file. That might be the only missing link.

Contributor

radekg commented Sep 20, 2016

It isn't clear to me how the ServiceConfiguration is populated from file. That might be the only missing link.

@merlimat

The change looks good to me. I think the only part missing is to enforce the bindAddress settings by have sockets effectively binding on the desired IP/Host. Before, the bindOnLocalhost was just a trick used for unit tests to advertise localhost instead of the real hostname, but that was not really affecting the "bind" IP address.

For example, https://github.com/yahoo/pulsar/blob/master/pulsar-broker/src/main/java/com/yahoo/pulsar/broker/web/WebService.java#L87 we'd have to add a connector.setHost() to use it.

Same thing with the Netty server socket, at https://github.com/yahoo/pulsar/blob/master/pulsar-broker/src/main/java/com/yahoo/pulsar/broker/service/BrokerService.java#L223 we need to specify the bindAddress.

Show outdated Hide outdated conf/broker.conf Outdated
Show outdated Hide outdated conf/broker.conf Outdated
Show outdated Hide outdated conf/websocket.conf Outdated
@@ -291,12 +301,32 @@ public void setWebServicePortTls(int webServicePortTls) {
this.webServicePortTls = webServicePortTls;
}
public boolean isBindOnLocalhost() {
return bindOnLocalhost;
public String getBindAddress() {

This comment has been minimized.

@merlimat

merlimat Sep 20, 2016

Contributor

Preferably, the ServiceConfiguration should be a simple POJO. This initialization of the bind address is potentially at risk if multiple threads are initializing. It'd be better to return null (or even better Optional<String>) and have the caller cope with the missing value.

@merlimat

merlimat Sep 20, 2016

Contributor

Preferably, the ServiceConfiguration should be a simple POJO. This initialization of the bind address is potentially at risk if multiple threads are initializing. It'd be better to return null (or even better Optional<String>) and have the caller cope with the missing value.

This comment has been minimized.

@radekg

radekg Sep 20, 2016

Contributor

I was trying to avoid throws on this method as the caller would have to require try / catch everywhere where this is used. Unless we can settle on RuntimeException or IllegalArgumentException?

@radekg

radekg Sep 20, 2016

Contributor

I was trying to avoid throws on this method as the caller would have to require try / catch everywhere where this is used. Unless we can settle on RuntimeException or IllegalArgumentException?

This comment has been minimized.

@radekg

radekg Sep 20, 2016

Contributor

Right, scratch that, that's a silly idea. I'll just re-throw the exception and handle where used.

@radekg

radekg Sep 20, 2016

Contributor

Right, scratch that, that's a silly idea. I'll just re-throw the exception and handle where used.

This comment has been minimized.

@radekg

radekg Sep 20, 2016

Contributor

Addressed in 3745033

@radekg

radekg Sep 20, 2016

Contributor

Addressed in 3745033

This comment has been minimized.

@radekg

radekg Sep 21, 2016

Contributor

@merlimat Why would multiple threads try to initialize a single instance of ServiceConfiguration? Is the service configuration stored somewhere as static?

@radekg

radekg Sep 21, 2016

Contributor

@merlimat Why would multiple threads try to initialize a single instance of ServiceConfiguration? Is the service configuration stored somewhere as static?

Show outdated Hide outdated ...r-common/src/main/java/com/yahoo/pulsar/broker/ServiceConfiguration.java Outdated
this.bindAddress = bindAddress;
}
public String getAdvertisedAddress() {

This comment has been minimized.

@merlimat

merlimat Sep 20, 2016

Contributor

Same as above, we should just return what was configured, and in any case, the fallback here should be on hostname rather than the bind address.

@merlimat

merlimat Sep 20, 2016

Contributor

Same as above, we should just return what was configured, and in any case, the fallback here should be on hostname rather than the bind address.

Show outdated Hide outdated pulsar-broker/src/main/java/com/yahoo/pulsar/broker/PulsarService.java Outdated

@merlimat merlimat added this to the 1.15 milestone Sep 20, 2016

@merlimat

This comment has been minimized.

Show comment
Hide comment
@merlimat

merlimat Sep 20, 2016

Contributor

It might be worth adding advertised ports settings for services. 100% in the short term someone is going to ask for it when setting Pulsar in Mesos with Docker.

If you're in the mood, I'm all for it! 😄

It isn't clear to me how the ServiceConfiguration is populated from file. That might be the only missing link.

Ok, that's a bit mysterious, but there's a ServiceConfigurationLoader class that uses reflections to associate the ServiceConfiguration fields to variables in the config file.

Contributor

merlimat commented Sep 20, 2016

It might be worth adding advertised ports settings for services. 100% in the short term someone is going to ask for it when setting Pulsar in Mesos with Docker.

If you're in the mood, I'm all for it! 😄

It isn't clear to me how the ServiceConfiguration is populated from file. That might be the only missing link.

Ok, that's a bit mysterious, but there's a ServiceConfigurationLoader class that uses reflections to associate the ServiceConfiguration fields to variables in the config file.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 20, 2016

Contributor

@merlimat Addressed the comments related to bindAddress / advertisedAddress configuration setup. Web socket and broker now bind on the bindAddress.

Contributor

radekg commented Sep 20, 2016

@merlimat Addressed the comments related to bindAddress / advertisedAddress configuration setup. Web socket and broker now bind on the bindAddress.

@merlimat

This comment has been minimized.

Show comment
Hide comment
@merlimat

merlimat Sep 20, 2016

Contributor

@rdhabalia Can you take a look as well?

Contributor

merlimat commented Sep 20, 2016

@rdhabalia Can you take a look as well?

Show outdated Hide outdated conf/standalone.conf Outdated
@merlimat

Very nice, thank you!

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 20, 2016

Contributor

@merlimat I'm looking at travis, here: https://travis-ci.org/yahoo/pulsar/builds/161469038, I've been noticing similar situation on my machine:

Running TestSuite
2016-09-20 22:04:02,984 - INFO  - [main:MockedBookKeeperTestCase@65] - >>>>>> starting void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
2016-09-20 22:04:05,505 - INFO  - [main:MockedBookKeeperTestCase@81] - @@@@@@@@@ stopping void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
2016-09-20 22:04:05,543 - INFO  - [main:MockedBookKeeperTestCase@88] - --------- stopped void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
Exception in thread "Thread-7" java.lang.AssertionError: expected [false] but found [true]
    at org.testng.Assert.fail(Assert.java:94)
    at org.testng.Assert.failNotEquals(Assert.java:494)
    at org.testng.Assert.assertFalse(Assert.java:63)
    at org.testng.Assert.assertFalse(Assert.java:73)
    at com.yahoo.pulsar.broker.service.PersistentTopicTest$5.run(PersistentTopicTest.java:483)
Sep 20, 2016 10:04:09 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
Sep 20, 2016 10:04:10 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
Sep 20, 2016 10:04:11 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
2016-09-20 22:06:18,260 - ERROR - [main:Slf4JLogger@171] - LEAK: HashedWheelTimer.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:241)
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:192)
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:174)
    com.yahoo.pulsar.client.impl.PulsarClientImpl.<init>(PulsarClientImpl.java:109)
    com.yahoo.pulsar.client.impl.PulsarClientImpl.<init>(PulsarClientImpl.java:82)
    com.yahoo.pulsar.client.api.PulsarClient.create(PulsarClient.java:55)
    com.yahoo.pulsar.broker.service.BacklogQuotaManagerTest.testConcurrentAckAndEviction(BacklogQuotaManagerTest.java:205)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:483)
    org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
    org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
    org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
    org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
    org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
    org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
    org.testng.TestRunner.privateRun(TestRunner.java:767)
    org.testng.TestRunner.run(TestRunner.java:617)
    org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
    org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
    org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
    org.testng.SuiteRunner.run(SuiteRunner.java:240)
    org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
    org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
    org.testng.TestNG.run(TestNG.java:1057)
    org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:132)
    org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:193)
    org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:94)
    org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:147)
    org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
    org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
    org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)
No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
The build has been terminated

I believe none of my changes are responsible for this, obviously I could be wrong. I do not remember having such earlier today, when I started working on this PR. Do you know what happened in the meantime?

Contributor

radekg commented Sep 20, 2016

@merlimat I'm looking at travis, here: https://travis-ci.org/yahoo/pulsar/builds/161469038, I've been noticing similar situation on my machine:

Running TestSuite
2016-09-20 22:04:02,984 - INFO  - [main:MockedBookKeeperTestCase@65] - >>>>>> starting void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
2016-09-20 22:04:05,505 - INFO  - [main:MockedBookKeeperTestCase@81] - @@@@@@@@@ stopping void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
2016-09-20 22:04:05,543 - INFO  - [main:MockedBookKeeperTestCase@88] - --------- stopped void com.yahoo.pulsar.broker.service.PersistentMessageFinderTest.testPersistentMessageFinder() throws java.lang.Exception
Exception in thread "Thread-7" java.lang.AssertionError: expected [false] but found [true]
    at org.testng.Assert.fail(Assert.java:94)
    at org.testng.Assert.failNotEquals(Assert.java:494)
    at org.testng.Assert.assertFalse(Assert.java:63)
    at org.testng.Assert.assertFalse(Assert.java:73)
    at com.yahoo.pulsar.broker.service.PersistentTopicTest$5.run(PersistentTopicTest.java:483)
Sep 20, 2016 10:04:09 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
Sep 20, 2016 10:04:10 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
Sep 20, 2016 10:04:11 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
2016-09-20 22:06:18,260 - ERROR - [main:Slf4JLogger@171] - LEAK: HashedWheelTimer.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:241)
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:192)
    io.netty.util.HashedWheelTimer.<init>(HashedWheelTimer.java:174)
    com.yahoo.pulsar.client.impl.PulsarClientImpl.<init>(PulsarClientImpl.java:109)
    com.yahoo.pulsar.client.impl.PulsarClientImpl.<init>(PulsarClientImpl.java:82)
    com.yahoo.pulsar.client.api.PulsarClient.create(PulsarClient.java:55)
    com.yahoo.pulsar.broker.service.BacklogQuotaManagerTest.testConcurrentAckAndEviction(BacklogQuotaManagerTest.java:205)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:483)
    org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
    org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
    org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
    org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
    org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
    org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
    org.testng.TestRunner.privateRun(TestRunner.java:767)
    org.testng.TestRunner.run(TestRunner.java:617)
    org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
    org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
    org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
    org.testng.SuiteRunner.run(SuiteRunner.java:240)
    org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
    org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
    org.testng.TestNG.run(TestNG.java:1057)
    org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:132)
    org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:193)
    org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:94)
    org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:147)
    org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
    org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
    org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)
No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
The build has been terminated

I believe none of my changes are responsible for this, obviously I could be wrong. I do not remember having such earlier today, when I started working on this PR. Do you know what happened in the meantime?

@rdhabalia

minor thing. but looks good to me.

@merlimat

This comment has been minimized.

Show comment
Hide comment
@merlimat

merlimat Sep 20, 2016

Contributor

@radekg I think the test issue might be related to the fact that without specifying the IP at bind time, we were always binding on 0.0.0.0. Now, we're specifying the bind address, by passing the hostname, so 127.0.0.1 will be excluded. There might be few unit tests that weren't passing the bindOnLocalhost=true, that are affected.

Contributor

merlimat commented Sep 20, 2016

@radekg I think the test issue might be related to the fact that without specifying the IP at bind time, we were always binding on 0.0.0.0. Now, we're specifying the bind address, by passing the hostname, so 127.0.0.1 will be excluded. There might be few unit tests that weren't passing the bindOnLocalhost=true, that are affected.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 20, 2016

Contributor

The default binding is still InetAddress.getLocalHost().getHostName() (https://github.com/yahoo/pulsar/pull/26/files#diff-5ff5807c8da7704d2729f62605e53e88R571) which should by default go to localhost. What I see locally, where there is no 10 minute timeout, is that the test continues just fine but it takes about 20 minutes just in that step.

When running these tests locally, the progress can be seen with lsof -i -n -P | grep TCP | grep java | wc -l. The number changes, the test is much, much slower though. I'll give it a shot after a few hours of sleep.

Contributor

radekg commented Sep 20, 2016

The default binding is still InetAddress.getLocalHost().getHostName() (https://github.com/yahoo/pulsar/pull/26/files#diff-5ff5807c8da7704d2729f62605e53e88R571) which should by default go to localhost. What I see locally, where there is no 10 minute timeout, is that the test continues just fine but it takes about 20 minutes just in that step.

When running these tests locally, the progress can be seen with lsof -i -n -P | grep TCP | grep java | wc -l. The number changes, the test is much, much slower though. I'll give it a shot after a few hours of sleep.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg
Contributor

radekg commented Sep 23, 2016

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 23, 2016

Contributor

@merlimat with your find, I get:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Pulsar ............................................. SUCCESS [  0.546 s]
[INFO] Pulsar Checksum .................................... SUCCESS [  6.607 s]
[INFO] Pulsar Common ...................................... SUCCESS [  6.283 s]
[INFO] Managed Ledger ..................................... SUCCESS [01:16 min]
[INFO] Pulsar Client Java ................................. SUCCESS [  2.281 s]
[INFO] Pulsar ZooKeeper Utils ............................. SUCCESS [  5.427 s]
[INFO] pulsar-broker-common ............................... SUCCESS [  0.557 s]
[INFO] Pulsar Discovery Service WAR ....................... SUCCESS [  3.497 s]
[INFO] Pulsar WebSocket ................................... SUCCESS [  1.982 s]
[INFO] Pulsar Client Admin ................................ SUCCESS [  0.781 s]
[INFO] Pulsar Broker ...................................... SUCCESS [11:42 min]
[INFO] Pulsar Client Tools ................................ SUCCESS [  7.853 s]
[INFO] Pulsar Storm adapter ............................... SUCCESS [ 26.670 s]
[INFO] Pulsar Test Client ................................. SUCCESS [  0.862 s]
[INFO] distribution ....................................... SUCCESS [  0.014 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:03 min
[INFO] Finished at: 2016-09-23T22:23:01+02:00
[INFO] Final Memory: 202M/755M
[INFO] ------------------------------------------------------------------------

Pushing the commit now, going to try with some sensible defaults in the ServiceConfguration. Please do not merge immediately. I'd consider this a WIP. IMO it is ugly having to declare bindAddress and advertisedAddress explicitly everywhere.

Contributor

radekg commented Sep 23, 2016

@merlimat with your find, I get:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Pulsar ............................................. SUCCESS [  0.546 s]
[INFO] Pulsar Checksum .................................... SUCCESS [  6.607 s]
[INFO] Pulsar Common ...................................... SUCCESS [  6.283 s]
[INFO] Managed Ledger ..................................... SUCCESS [01:16 min]
[INFO] Pulsar Client Java ................................. SUCCESS [  2.281 s]
[INFO] Pulsar ZooKeeper Utils ............................. SUCCESS [  5.427 s]
[INFO] pulsar-broker-common ............................... SUCCESS [  0.557 s]
[INFO] Pulsar Discovery Service WAR ....................... SUCCESS [  3.497 s]
[INFO] Pulsar WebSocket ................................... SUCCESS [  1.982 s]
[INFO] Pulsar Client Admin ................................ SUCCESS [  0.781 s]
[INFO] Pulsar Broker ...................................... SUCCESS [11:42 min]
[INFO] Pulsar Client Tools ................................ SUCCESS [  7.853 s]
[INFO] Pulsar Storm adapter ............................... SUCCESS [ 26.670 s]
[INFO] Pulsar Test Client ................................. SUCCESS [  0.862 s]
[INFO] distribution ....................................... SUCCESS [  0.014 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:03 min
[INFO] Finished at: 2016-09-23T22:23:01+02:00
[INFO] Final Memory: 202M/755M
[INFO] ------------------------------------------------------------------------

Pushing the commit now, going to try with some sensible defaults in the ServiceConfguration. Please do not merge immediately. I'd consider this a WIP. IMO it is ugly having to declare bindAddress and advertisedAddress explicitly everywhere.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 23, 2016

Contributor

The last commit causes the unit tests to fail. The failures are:

Tests run: 362, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 711.046 sec <<< FAILURE! - in TestSuite
testUnloadIfBrokerCrashes(com.yahoo.pulsar.broker.SLAMonitoringTest)  Time elapsed: 0.128 sec  <<< FAILURE!
java.lang.AssertionError: expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15238]
    at com.yahoo.pulsar.broker.SLAMonitoringTest.testUnloadIfBrokerCrashes(SLAMonitoringTest.java:232)

testBrokerStatsMetrics(com.yahoo.pulsar.broker.service.BrokerServiceTest)  Time elapsed: 0.458 sec  <<< FAILURE!
java.lang.AssertionError: expected [true] but found [false]
    at com.yahoo.pulsar.broker.service.BrokerServiceTest.testBrokerStatsMetrics(BrokerServiceTest.java:220)

testTlsEnabled(com.yahoo.pulsar.broker.service.BrokerServiceTest)  Time elapsed: 0.273 sec  <<< FAILURE!
java.lang.AssertionError: should not fail
    at com.yahoo.pulsar.broker.service.BrokerServiceTest.testTlsEnabled(BrokerServiceTest.java:403)

testTlsAuthAllowInsecure(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.144 sec  <<< FAILURE!
com.yahoo.pulsar.client.admin.PulsarAdminException: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)
Caused by: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)
Caused by: java.lang.IllegalStateException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)

testTlsAuthDisallowInsecure(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.135 sec  <<< FAILURE!
com.yahoo.pulsar.client.admin.PulsarAdminException: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)
Caused by: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)
Caused by: java.lang.IllegalStateException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)

testTlsEnabled(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.14 sec  <<< FAILURE!
java.lang.AssertionError: HTTPS request shouldn't fail
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:157)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching moan.fttx-di01-b07.german-fiber.net found
    at com.yahoo.pulsar.broker.web.WebServiceTest.makeHttpRequest(WebServiceTest.java:258)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:155)
Caused by: java.security.cert.CertificateException: No name matching moan.fttx-di01-b07.german-fiber.net found
    at com.yahoo.pulsar.broker.web.WebServiceTest.makeHttpRequest(WebServiceTest.java:258)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:155)


Results :

Failed tests:
  SLAMonitoringTest.testUnloadIfBrokerCrashes:232 expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15238]
  BrokerServiceTest.testBrokerStatsMetrics:220 expected [true] but found [false]
  BrokerServiceTest.testTlsEnabled:403 should not fail
  WebServiceTest.testTlsAuthAllowInsecure:192->setupEnv:325 » PulsarAdmin javax....
  WebServiceTest.testTlsAuthDisallowInsecure:215->setupEnv:325 » PulsarAdmin jav...
  WebServiceTest.testTlsEnabled:157 HTTPS request shouldn't fail
Contributor

radekg commented Sep 23, 2016

The last commit causes the unit tests to fail. The failures are:

Tests run: 362, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 711.046 sec <<< FAILURE! - in TestSuite
testUnloadIfBrokerCrashes(com.yahoo.pulsar.broker.SLAMonitoringTest)  Time elapsed: 0.128 sec  <<< FAILURE!
java.lang.AssertionError: expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15238]
    at com.yahoo.pulsar.broker.SLAMonitoringTest.testUnloadIfBrokerCrashes(SLAMonitoringTest.java:232)

testBrokerStatsMetrics(com.yahoo.pulsar.broker.service.BrokerServiceTest)  Time elapsed: 0.458 sec  <<< FAILURE!
java.lang.AssertionError: expected [true] but found [false]
    at com.yahoo.pulsar.broker.service.BrokerServiceTest.testBrokerStatsMetrics(BrokerServiceTest.java:220)

testTlsEnabled(com.yahoo.pulsar.broker.service.BrokerServiceTest)  Time elapsed: 0.273 sec  <<< FAILURE!
java.lang.AssertionError: should not fail
    at com.yahoo.pulsar.broker.service.BrokerServiceTest.testTlsEnabled(BrokerServiceTest.java:403)

testTlsAuthAllowInsecure(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.144 sec  <<< FAILURE!
com.yahoo.pulsar.client.admin.PulsarAdminException: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)
Caused by: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)
Caused by: java.lang.IllegalStateException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthAllowInsecure(WebServiceTest.java:192)

testTlsAuthDisallowInsecure(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.135 sec  <<< FAILURE!
com.yahoo.pulsar.client.admin.PulsarAdminException: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)
Caused by: javax.ws.rs.ProcessingException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)
Caused by: java.lang.IllegalStateException: Already connected
    at com.yahoo.pulsar.broker.web.WebServiceTest.setupEnv(WebServiceTest.java:325)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsAuthDisallowInsecure(WebServiceTest.java:215)

testTlsEnabled(com.yahoo.pulsar.broker.web.WebServiceTest)  Time elapsed: 0.14 sec  <<< FAILURE!
java.lang.AssertionError: HTTPS request shouldn't fail
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:157)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching moan.fttx-di01-b07.german-fiber.net found
    at com.yahoo.pulsar.broker.web.WebServiceTest.makeHttpRequest(WebServiceTest.java:258)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:155)
Caused by: java.security.cert.CertificateException: No name matching moan.fttx-di01-b07.german-fiber.net found
    at com.yahoo.pulsar.broker.web.WebServiceTest.makeHttpRequest(WebServiceTest.java:258)
    at com.yahoo.pulsar.broker.web.WebServiceTest.testTlsEnabled(WebServiceTest.java:155)


Results :

Failed tests:
  SLAMonitoringTest.testUnloadIfBrokerCrashes:232 expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15238]
  BrokerServiceTest.testBrokerStatsMetrics:220 expected [true] but found [false]
  BrokerServiceTest.testTlsEnabled:403 should not fail
  WebServiceTest.testTlsAuthAllowInsecure:192->setupEnv:325 » PulsarAdmin javax....
  WebServiceTest.testTlsAuthDisallowInsecure:215->setupEnv:325 » PulsarAdmin jav...
  WebServiceTest.testTlsEnabled:157 HTTPS request shouldn't fail
@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 23, 2016

Contributor

Right, all tls tests need to use localhost as advertisedAddress because of the certificate.

Contributor

radekg commented Sep 23, 2016

Right, all tls tests need to use localhost as advertisedAddress because of the certificate.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 24, 2016

Contributor

@merlimat @rdhabalia the final failing test is the following:
SLAMonitoringTest.testUnloadIfBrokerCrashes:230 expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15244]

I wonder, should the 0 in https://github.com/yahoo/pulsar/blob/master/pulsar-broker/src/test/java/com/yahoo/pulsar/broker/SLAMonitoringTest.java#L230 be crashIndex? If not, any helper why is this test failing?

Contributor

radekg commented Sep 24, 2016

@merlimat @rdhabalia the final failing test is the following:
SLAMonitoringTest.testUnloadIfBrokerCrashes:230 expected [pulsar://moan.fttx-di01-b07.german-fiber.net:15240] but found [pulsar://moan.fttx-di01-b07.german-fiber.net:15244]

I wonder, should the 0 in https://github.com/yahoo/pulsar/blob/master/pulsar-broker/src/test/java/com/yahoo/pulsar/broker/SLAMonitoringTest.java#L230 be crashIndex? If not, any helper why is this test failing?

@@ -5,6 +5,8 @@ test-results
dependency-reduced-pom.xml
logs
/data
pulsar-broker/tmp.*
pulsar-broker/src/test/resources/log4j.properties

This comment has been minimized.

@radekg

radekg Sep 24, 2016

Contributor

Naughty, I know, might be a good idea to go through logging.

@radekg

radekg Sep 24, 2016

Contributor

Naughty, I know, might be a good idea to go through logging.

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 24, 2016

Contributor

I do not understand that single test. I my own head, it should be the following: the destination is owned by the broker which is about to be stopped. The broker is stopped, the ownership changes. Once the original broker comes back, the ownership should be transferred to to the original owner. If that is what should be happening, something does not work?

I have changed this test (no commit anywhere) to this:

    @Test
    public void testUnloadIfBrokerCrashes() {
        int crashIndex = BROKER_COUNT / 2;
        log.info("Trying to close the broker at index = {}", crashIndex);

        String destination = String.format("persistent://%s/%s/%s:%s/%s", NamespaceService.SLA_NAMESPACE_PROPERTY,
                "my-cluster", pulsarServices[crashIndex].getAdvertisedAddress(), brokerWebServicePorts[crashIndex], "my-topic");

        String originalOwner = null;
        try {
            originalOwner = pulsarAdmins[0].lookups().lookupDestination(destination);
        } catch (PulsarAdminException e) {
            fail("Should be a able to read the original onwer.Exception: " + e);
        }

        try {
            pulsarServices[crashIndex].close();
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("Should be a able to close the broker index " + crashIndex + " Exception: " + e);
        }

        log.info("Lookup for namespace {}", destination);

        String ownerAfterStop = null;
        try {
            ownerAfterStop = pulsarAdmins[BROKER_COUNT - 1].lookups().lookupDestination(destination);
            log.info("{} Namespace is owned by {}", destination, ownerAfterStop);
            assertNotEquals(ownerAfterStop,
                    "pulsar://" + pulsarServices[crashIndex].getAdvertisedAddress() + ":" + brokerNativeBrokerPorts[crashIndex]);
        } catch (PulsarAdminException e) {
            e.printStackTrace();
            fail("The SLA Monitor namespace should be owned by some other broker");
        }

        // Check if the namespace is properly unloaded and reowned by the broker
        try {
            pulsarServices[crashIndex] = new PulsarService(configurations[crashIndex]);
            pulsarServices[crashIndex].start();
            assertEquals(pulsarServices[crashIndex].getConfiguration().getBrokerServicePort(),
                    brokerNativeBrokerPorts[crashIndex]);
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("The broker should be able to start without exception");
        }

        String ownerAfterRestart = null;
        try {
            Thread.sleep(5000);
            ownerAfterRestart = pulsarAdmins[0].lookups().lookupDestination(destination);
            log.info("{} Namespace is re-owned by {}", destination, ownerAfterRestart);
            System.out.println(" -----------------> SLA 1.1: " + originalOwner);
            System.out.println(" -----------------> SLA 1.2: " + ownerAfterStop);
            System.out.println(" -----------------> SLA 1.3: " + ownerAfterRestart);
            for (int i=0; i<BROKER_COUNT; i++) {
                System.out.println(" -----------------> SLA 2." + i + ": pulsar://" + pulsarServices[i].getAdvertisedAddress() + ":" + brokerNativeBrokerPorts[i]);
            }
            //assertEquals(ownerAfterStop, ownerAfterRestart);
        } catch (PulsarAdminException e) {
            e.printStackTrace();
            fail("The SLA Monitor namespace should be reowned by the original broker " + originalOwner + ".");
        } catch (InterruptedException ex) {
            fail("Could not wait!");
        }

        try {
            pulsarServices[crashIndex].close();
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("The broker should be able to stop without exception");
        }
    }

The output from the test is:

2016-09-24 19:43:46,083 - INFO  - [pulsar-web-9939-8:PulsarService@423] - Loading all topics on service unit: sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,083 - INFO  - [main:SLAMonitoringTest@215] - persistent://sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/my-topic Namespace is owned by pulsar://moan.fttx-di01-b07.german-fiber.net:15238
2016-09-24 19:43:46,084 - INFO  - [main-EventThread:ZookeeperClientFactoryImpl@67] - ZooKeeper session established: State:CONNECTED Timeout:30000 sessionid:0x1575d4bb54f0011 local:/127.0.0.1:54973 remoteserver:127.0.0.1/127.0.0.1:15017 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2016-09-24 19:43:46,085 - INFO  - [main:PulsarService@361] - starting configuration cache service
2016-09-24 19:43:46,085 - INFO  - [main-EventThread:ZookeeperClientFactoryImpl@67] - ZooKeeper session established: State:CONNECTED Timeout:30000 sessionid:0x1575d4bb54f0012 local:/127.0.0.1:54974 remoteserver:127.0.0.1/127.0.0.1:15017 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2016-09-24 19:43:46,086 - WARN  - [main:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,089 - INFO  - [main:BrokerService@161] - Using 16 threads for broker service IO
2016-09-24 19:43:46,090 - INFO  - [main:AuthenticationService@61] - Authentication is disabled
2016-09-24 19:43:46,090 - INFO  - [main:PulsarService@389] - Starting load management service ...
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9849-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-10037-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9990-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9896-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pool-1415-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,091 - INFO  - [pool-1427-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,091 - INFO  - [pool-1419-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [pool-1431-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [main:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [main:SimpleLoadManagerImpl@267] - Created broker ephemeral node on /loadbalance/brokers/moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,093 - INFO  - [main:PulsarService@379] - starting name space service, bootstrap namespaces=[]
2016-09-24 19:43:46,097 - INFO  - [main:NamespaceService@127] - namespace service is ready ...
2016-09-24 19:43:46,097 - INFO  - [main:PulsarService@246] - Starting Pulsar Broker service
2016-09-24 19:43:46,098 - INFO  - [main:DistributedIdGenerator@76] - Created sequential node at /counters/producer-name/-0000000005 -- Generator Id is my-cluster-5
2016-09-24 19:43:46,098 - INFO  - [main:BrokerService@225] - Started Pulsar Broker service on port 15240
2016-09-24 19:43:46,099 - INFO  - [main:BrokerService@265] - Scheduling a thread to check backlog quota after [60] seconds in background
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/' -- Enable client version check: false -- shouldCheckApiVersionOnPath: false
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/admin' -- Enable client version check: false -- shouldCheckApiVersionOnPath: false
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/lookup' -- Enable client version check: false -- shouldCheckApiVersionOnPath: true
2016-09-24 19:43:46,101 - INFO  - [main:OwnershipCache$OwnedServiceUnitCacheLoader@105] - Acquiring zk lock on namespace /namespace/pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,102 - INFO  - [main:OwnershipCache$OwnedServiceUnitCacheLoader@120] - Acquired zk lock on namespace /namespace/pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,103 - INFO  - [main:PulsarService@423] - Loading all topics on service unit: pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,103 - INFO  - [main:NamespaceService@206] - added heartbeat namespace name in local cache: ns=pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,103 - INFO  - [main:LeaderElectionService@171] - LeaderElectionService started
2016-09-24 19:43:46,104 - INFO  - [main:LeaderElectionService@123] - Broker [http://moan.fttx-di01-b07.german-fiber.net:15239] is the follower now. Waiting for the watch to trigger...
2016-09-24 19:43:46,186 - INFO  - [main:WebService@199] - Web Service started at http://moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,187 - ERROR - [pulsar-10081-1:JvmMetrics@132] - Failed to collect GC stats: java.lang:type=GarbageCollector,name=G1 Young Generation
2016-09-24 19:43:46,187 - INFO  - [main:PulsarService@311] - messaging service is ready, bootstrap service on port=15239, broker url=pulsar://moan.fttx-di01-b07.german-fiber.net:15240, cluster=my-cluster, configs=com.yahoo.pulsar.broker.ServiceConfiguration@3f855e48
2016-09-24 19:43:51,190 - INFO  - [main:SLAMonitoringTest@238] - persistent://sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/my-topic Namespace is re-owned by pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 1.1: pulsar://moan.fttx-di01-b07.german-fiber.net:15240
 -----------------> SLA 1.2: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 1.3: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 2.0: pulsar://moan.fttx-di01-b07.german-fiber.net:15236
 -----------------> SLA 2.1: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 2.2: pulsar://moan.fttx-di01-b07.german-fiber.net:15240
 -----------------> SLA 2.3: pulsar://moan.fttx-di01-b07.german-fiber.net:15242
 -----------------> SLA 2.4: pulsar://moan.fttx-di01-b07.german-fiber.net:15244

So the new owner is always crashIndex - 1, even when the original owner comes back. Is that what it should be or am I totally missing the purpose of this test.

Contributor

radekg commented Sep 24, 2016

I do not understand that single test. I my own head, it should be the following: the destination is owned by the broker which is about to be stopped. The broker is stopped, the ownership changes. Once the original broker comes back, the ownership should be transferred to to the original owner. If that is what should be happening, something does not work?

I have changed this test (no commit anywhere) to this:

    @Test
    public void testUnloadIfBrokerCrashes() {
        int crashIndex = BROKER_COUNT / 2;
        log.info("Trying to close the broker at index = {}", crashIndex);

        String destination = String.format("persistent://%s/%s/%s:%s/%s", NamespaceService.SLA_NAMESPACE_PROPERTY,
                "my-cluster", pulsarServices[crashIndex].getAdvertisedAddress(), brokerWebServicePorts[crashIndex], "my-topic");

        String originalOwner = null;
        try {
            originalOwner = pulsarAdmins[0].lookups().lookupDestination(destination);
        } catch (PulsarAdminException e) {
            fail("Should be a able to read the original onwer.Exception: " + e);
        }

        try {
            pulsarServices[crashIndex].close();
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("Should be a able to close the broker index " + crashIndex + " Exception: " + e);
        }

        log.info("Lookup for namespace {}", destination);

        String ownerAfterStop = null;
        try {
            ownerAfterStop = pulsarAdmins[BROKER_COUNT - 1].lookups().lookupDestination(destination);
            log.info("{} Namespace is owned by {}", destination, ownerAfterStop);
            assertNotEquals(ownerAfterStop,
                    "pulsar://" + pulsarServices[crashIndex].getAdvertisedAddress() + ":" + brokerNativeBrokerPorts[crashIndex]);
        } catch (PulsarAdminException e) {
            e.printStackTrace();
            fail("The SLA Monitor namespace should be owned by some other broker");
        }

        // Check if the namespace is properly unloaded and reowned by the broker
        try {
            pulsarServices[crashIndex] = new PulsarService(configurations[crashIndex]);
            pulsarServices[crashIndex].start();
            assertEquals(pulsarServices[crashIndex].getConfiguration().getBrokerServicePort(),
                    brokerNativeBrokerPorts[crashIndex]);
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("The broker should be able to start without exception");
        }

        String ownerAfterRestart = null;
        try {
            Thread.sleep(5000);
            ownerAfterRestart = pulsarAdmins[0].lookups().lookupDestination(destination);
            log.info("{} Namespace is re-owned by {}", destination, ownerAfterRestart);
            System.out.println(" -----------------> SLA 1.1: " + originalOwner);
            System.out.println(" -----------------> SLA 1.2: " + ownerAfterStop);
            System.out.println(" -----------------> SLA 1.3: " + ownerAfterRestart);
            for (int i=0; i<BROKER_COUNT; i++) {
                System.out.println(" -----------------> SLA 2." + i + ": pulsar://" + pulsarServices[i].getAdvertisedAddress() + ":" + brokerNativeBrokerPorts[i]);
            }
            //assertEquals(ownerAfterStop, ownerAfterRestart);
        } catch (PulsarAdminException e) {
            e.printStackTrace();
            fail("The SLA Monitor namespace should be reowned by the original broker " + originalOwner + ".");
        } catch (InterruptedException ex) {
            fail("Could not wait!");
        }

        try {
            pulsarServices[crashIndex].close();
        } catch (PulsarServerException e) {
            e.printStackTrace();
            fail("The broker should be able to stop without exception");
        }
    }

The output from the test is:

2016-09-24 19:43:46,083 - INFO  - [pulsar-web-9939-8:PulsarService@423] - Loading all topics on service unit: sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,083 - INFO  - [main:SLAMonitoringTest@215] - persistent://sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/my-topic Namespace is owned by pulsar://moan.fttx-di01-b07.german-fiber.net:15238
2016-09-24 19:43:46,084 - INFO  - [main-EventThread:ZookeeperClientFactoryImpl@67] - ZooKeeper session established: State:CONNECTED Timeout:30000 sessionid:0x1575d4bb54f0011 local:/127.0.0.1:54973 remoteserver:127.0.0.1/127.0.0.1:15017 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2016-09-24 19:43:46,085 - INFO  - [main:PulsarService@361] - starting configuration cache service
2016-09-24 19:43:46,085 - INFO  - [main-EventThread:ZookeeperClientFactoryImpl@67] - ZooKeeper session established: State:CONNECTED Timeout:30000 sessionid:0x1575d4bb54f0012 local:/127.0.0.1:54974 remoteserver:127.0.0.1/127.0.0.1:15017 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2016-09-24 19:43:46,086 - WARN  - [main:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,088 - WARN  - [main-EventThread:ZkBookieRackAffinityMapping@140] - Error getting bookie info from zk, using default rack node /default-rack: KeeperErrorCode = NoNode for /bookies
2016-09-24 19:43:46,089 - INFO  - [main:BrokerService@161] - Using 16 threads for broker service IO
2016-09-24 19:43:46,090 - INFO  - [main:AuthenticationService@61] - Authentication is disabled
2016-09-24 19:43:46,090 - INFO  - [main:PulsarService@389] - Starting load management service ...
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9849-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-10037-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9990-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pulsar-ordered-9896-1:ZooKeeperChildrenCache@67] - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers
2016-09-24 19:43:46,091 - INFO  - [pool-1415-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,091 - INFO  - [pool-1427-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,091 - INFO  - [pool-1419-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [pool-1431-thread-1:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [main:SimpleLoadManagerImpl@642] - doLoadRanking - load balancing strategy: weightedRandomSelection
2016-09-24 19:43:46,092 - INFO  - [main:SimpleLoadManagerImpl@267] - Created broker ephemeral node on /loadbalance/brokers/moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,093 - INFO  - [main:PulsarService@379] - starting name space service, bootstrap namespaces=[]
2016-09-24 19:43:46,097 - INFO  - [main:NamespaceService@127] - namespace service is ready ...
2016-09-24 19:43:46,097 - INFO  - [main:PulsarService@246] - Starting Pulsar Broker service
2016-09-24 19:43:46,098 - INFO  - [main:DistributedIdGenerator@76] - Created sequential node at /counters/producer-name/-0000000005 -- Generator Id is my-cluster-5
2016-09-24 19:43:46,098 - INFO  - [main:BrokerService@225] - Started Pulsar Broker service on port 15240
2016-09-24 19:43:46,099 - INFO  - [main:BrokerService@265] - Scheduling a thread to check backlog quota after [60] seconds in background
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/' -- Enable client version check: false -- shouldCheckApiVersionOnPath: false
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/admin' -- Enable client version check: false -- shouldCheckApiVersionOnPath: false
2016-09-24 19:43:46,100 - INFO  - [main:WebService@137] - Servlet path: '/lookup' -- Enable client version check: false -- shouldCheckApiVersionOnPath: true
2016-09-24 19:43:46,101 - INFO  - [main:OwnershipCache$OwnedServiceUnitCacheLoader@105] - Acquiring zk lock on namespace /namespace/pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,102 - INFO  - [main:OwnershipCache$OwnedServiceUnitCacheLoader@120] - Acquired zk lock on namespace /namespace/pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,103 - INFO  - [main:PulsarService@423] - Loading all topics on service unit: pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/0x00000000_0xffffffff
2016-09-24 19:43:46,103 - INFO  - [main:NamespaceService@206] - added heartbeat namespace name in local cache: ns=pulsar/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,103 - INFO  - [main:LeaderElectionService@171] - LeaderElectionService started
2016-09-24 19:43:46,104 - INFO  - [main:LeaderElectionService@123] - Broker [http://moan.fttx-di01-b07.german-fiber.net:15239] is the follower now. Waiting for the watch to trigger...
2016-09-24 19:43:46,186 - INFO  - [main:WebService@199] - Web Service started at http://moan.fttx-di01-b07.german-fiber.net:15239
2016-09-24 19:43:46,187 - ERROR - [pulsar-10081-1:JvmMetrics@132] - Failed to collect GC stats: java.lang:type=GarbageCollector,name=G1 Young Generation
2016-09-24 19:43:46,187 - INFO  - [main:PulsarService@311] - messaging service is ready, bootstrap service on port=15239, broker url=pulsar://moan.fttx-di01-b07.german-fiber.net:15240, cluster=my-cluster, configs=com.yahoo.pulsar.broker.ServiceConfiguration@3f855e48
2016-09-24 19:43:51,190 - INFO  - [main:SLAMonitoringTest@238] - persistent://sla-monitor/my-cluster/moan.fttx-di01-b07.german-fiber.net:15239/my-topic Namespace is re-owned by pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 1.1: pulsar://moan.fttx-di01-b07.german-fiber.net:15240
 -----------------> SLA 1.2: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 1.3: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 2.0: pulsar://moan.fttx-di01-b07.german-fiber.net:15236
 -----------------> SLA 2.1: pulsar://moan.fttx-di01-b07.german-fiber.net:15238
 -----------------> SLA 2.2: pulsar://moan.fttx-di01-b07.german-fiber.net:15240
 -----------------> SLA 2.3: pulsar://moan.fttx-di01-b07.german-fiber.net:15242
 -----------------> SLA 2.4: pulsar://moan.fttx-di01-b07.german-fiber.net:15244

So the new owner is always crashIndex - 1, even when the original owner comes back. Is that what it should be or am I totally missing the purpose of this test.

@rdhabalia

This comment has been minimized.

Show comment
Hide comment
@rdhabalia

rdhabalia Sep 26, 2016

Contributor

@radekg
Can you try: replacing host with getAdvertisedAddress() at PulsarService

As Pulsar has default SLAMonitoring-namespace which looks for broker-address registered at zk-loadbalancer node and it is registered with getAdvertisedAddress() but right now it is getting host and therefore it is failing.

Change:

AdminResource.path("policies") + "/" + NamespaceService.getSLAMonitorNamespace(getAdvertisedAddress(), config)))
Contributor

rdhabalia commented Sep 26, 2016

@radekg
Can you try: replacing host with getAdvertisedAddress() at PulsarService

As Pulsar has default SLAMonitoring-namespace which looks for broker-address registered at zk-loadbalancer node and it is registered with getAdvertisedAddress() but right now it is getting host and therefore it is failing.

Change:

AdminResource.path("policies") + "/" + NamespaceService.getSLAMonitorNamespace(getAdvertisedAddress(), config)))
@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 26, 2016

Contributor

@rdhabalia that does the job. This is now passing the tests. Thank you! Can we review again?
I was also wondering if it isn't worth changing the PulsarService.getHost() to a little bit more descriptive PulsarService.getBindAddress(). What do you think?

Contributor

radekg commented Sep 26, 2016

@rdhabalia that does the job. This is now passing the tests. Thank you! Can we review again?
I was also wondering if it isn't worth changing the PulsarService.getHost() to a little bit more descriptive PulsarService.getBindAddress(). What do you think?

@radekg

This comment has been minimized.

Show comment
Hide comment
Contributor

radekg commented Sep 26, 2016

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 26, 2016

Contributor

The failure seems to be one of those intermittent issues. It successfully ran here: https://travis-ci.org/radekg/pulsar/builds/162817512. I guess the build needs to be restarted?

Contributor

radekg commented Sep 26, 2016

The failure seems to be one of those intermittent issues. It successfully ran here: https://travis-ci.org/radekg/pulsar/builds/162817512. I guess the build needs to be restarted?

@merlimat

Change looks good, thanks for going after all test cases. Just one very minor comment

Address the final review point, adjust the metrics test, rename Pulsa…
…rService.getHost() to PulsarService.getBindAddress().
@merlimat

This comment has been minimized.

Show comment
Hide comment
@merlimat

merlimat Sep 26, 2016

Contributor

Nice work!

Contributor

merlimat commented Sep 26, 2016

Nice work!

@merlimat merlimat merged commit 8df6b74 into apache:master Sep 26, 2016

@radekg

This comment has been minimized.

Show comment
Hide comment
@radekg

radekg Sep 26, 2016

Contributor

Thank you!

Contributor

radekg commented Sep 26, 2016

Thank you!

lucperkins pushed a commit to lucperkins/incubator-pulsar that referenced this pull request Dec 26, 2017

System exception (apache#26)
* Create pulsar-functions module (#1)

* Create pulsar-functions module

* rename `sdk` package to `api`

* Added the first cut of the Java interface for Pulsar functions (apache#2)

* Introduced the concept of systemException to differentiate from userException

sijie added a commit to sijie/incubator-pulsar that referenced this pull request Mar 1, 2018

System exception (apache#26)
* Create pulsar-functions module (#1)

* Create pulsar-functions module

* rename `sdk` package to `api`

* Added the first cut of the Java interface for Pulsar functions (#2)

* Introduced the concept of systemException to differentiate from userException

sijie added a commit to sijie/incubator-pulsar that referenced this pull request Mar 4, 2018

System exception (apache#26)
* Create pulsar-functions module (#1)

* Create pulsar-functions module

* rename `sdk` package to `api`

* Added the first cut of the Java interface for Pulsar functions (#2)

* Introduced the concept of systemException to differentiate from userException
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment