
cluster not able to start #904

Closed
cookingkode opened this issue Oct 30, 2018 · 7 comments
Labels
archived (Archived issues from the legacy Java implementation of Atomix), legacy (Issues from the legacy Java implementation of Atomix)

Comments

@cookingkode

I am trying to start a 2-node cluster on the same machine. I'm running the same Spring Boot jar twice, giving different values to myPort and otherPort (code below):

AtomixBuilder builder = Atomix.builder()
    .withMemberId(myName)
    .withAddress("localhost:" + myPort)
    .withMembershipProvider(BootstrapDiscoveryProvider.builder()
        .withNodes(
            Node.builder()
                .withId(otherName)
                .withAddress("localhost:" + otherPort)
                .build())
        .build());

builder.addProfile(Profile.dataGrid());
this.ref = builder.build();
this.ref.start().join();

The two instances keep communicating with each other, but start() never returns. I enabled debug logs but I don't see any errors. What am I missing?
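For comparison, here is a sketch of the same two-node bootstrap with both members (including the local node) listed in the discovery provider, which is how the Atomix 3.x bootstrap examples are usually written. This is an illustration only, not a confirmed fix: the member ids (member1/member2), the ports, and the choice of the consensus profile are all assumptions, and it requires the Atomix 3.x jars on the classpath.

```java
// Sketch only: assumes Atomix 3.x (io.atomix:atomix). Member ids and ports
// are illustrative placeholders, not values from the issue above.
import io.atomix.cluster.Node;
import io.atomix.cluster.discovery.BootstrapDiscoveryProvider;
import io.atomix.core.Atomix;
import io.atomix.core.profile.Profile;

public class TwoNodeSketch {
  static Atomix build(String myName, int myPort) {
    return Atomix.builder()
        .withMemberId(myName)
        .withAddress("localhost:" + myPort)
        // List every cluster member (including this node) so both processes
        // bootstrap against the same view of the cluster.
        .withMembershipProvider(BootstrapDiscoveryProvider.builder()
            .withNodes(
                Node.builder().withId("member1").withAddress("localhost:5001").build(),
                Node.builder().withId("member2").withAddress("localhost:5002").build())
            .build())
        // consensus(...) names the members that form the Raft management group,
        // so the system partition can elect a leader and let start() complete.
        .withProfiles(Profile.consensus("member1", "member2"))
        .build();
  }
}
```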

@kuujo
Member

kuujo commented Oct 30, 2018 via email

@cookingkode
Author

Looks like HashBasedPrimaryElectionService is the last service started, and later on there is a Started message for the Raft partition group as well. Will try to debug further.

2018-10-31 09:45:21.863 INFO 33092 --- [nt-nio-server-0] i.a.c.m.i.NettyMessagingService : Started
2018-10-31 09:45:21.875 INFO 33092 --- [tomix-cluster-0] i.a.c.d.BootstrapDiscoveryProvider : Joined
2018-10-31 09:45:21.875 INFO 33092 --- [tomix-cluster-0] i.a.c.i.DefaultClusterMembershipService : member2 - Member activated: Member{id=member2, address=localhost:2222, properties={}}
2018-10-31 09:45:21.877 INFO 33092 --- [tomix-cluster-0] i.a.c.i.DefaultClusterMembershipService : Started
2018-10-31 09:45:21.877 INFO 33092 --- [tomix-cluster-0] c.m.i.DefaultClusterCommunicationService : Started
2018-10-31 09:45:21.878 INFO 33092 --- [tomix-cluster-0] i.a.c.m.i.DefaultClusterEventService : Started
2018-10-31 09:45:21.884 INFO 33092 --- [ atomix-0] i.DefaultPartitionGroupMembershipService : Started

2018-10-31 09:45:21.913 INFO 33092 --- [ atomix-0] .a.p.p.i.HashBasedPrimaryElectionService :
i.a.p.r.i.DefaultRaftServer : RaftServer{system-partition-1} - Server started successfully!

2018-10-31 09:45:26.358 DEBUG 33092 --- [tem-partition-1] i.a.p.r.p.i.RaftPartitionServer : Successfully started server for partition PartitionId{id=1, group=system}
2018-10-31 09:45:26.372 DEBUG 33092 --- [tem-partition-1] i.a.p.r.p.i.RaftPartitionClient : Successfully started client for partition PartitionId{id=1, group=system}
2018-10-31 09:45:26.372 INFO 33092 --- [tem-partition-1] i.a.p.r.p.RaftPartitionGroup : Started

@lukasz-antoniak

While running Atomix 3.1.0, I encountered the same issue. I am able to start three Atomix nodes from my IDE, all on different ports of localhost, and the system works as expected. After moving to Docker Compose and running three containers, none of the nodes can join the cluster.

Last message in the log:

[2019-01-08 22:33:58,358] INFO [io.atomix.protocols.raft.partition.RaftPartitionGroup] Started

From what I can tell, io.atomix.primitive.partition.impl.DefaultPartitionService does not complete its start-up procedure (its Started entry is the next message in the log when running in the IDE). All members are able to see each other, as cluster membership events of type MEMBER_ADDED have been triggered.

Configuration:

cluster {
  clusterId: "atomix"
  node {
    id: "member-0"
    host: "node1"
    port: 5001
  }
  discovery {
    type: bootstrap
    nodes.1 {
      id: "member-0"
      host: "node1"
      port: 5001
    }
    nodes.2 {
      id: "member-1"
      host: "node2"
      port: 5001
    }
    nodes.3 {
      id: "member-2"
      host: "node3"
      port: 5001
    }
  }
}

managementGroup {
  type: raft
  name: system
  partitions: 1
  members: ["member-0", "member-1", "member-2"]
  storage {
    directory: "/volume/data/atomix/mgmt"
    level: disk
  }
}

partitionGroups.raft {
  type: raft
  partitions: 3
  members: ["member-0", "member-1", "member-2"]
  storage {
    directory: "/volume/data/atomix/pg"
    level: disk
  }
}

Thoughts?
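One thing worth checking with a setup like the one above: the host names node1/node2/node3 in the Atomix config must resolve between the containers. With Docker Compose that normally means the service names match those hosts, since Compose's built-in DNS resolves service names on the shared network. A minimal sketch, in which the image name and volume paths are assumptions:

```yaml
# Sketch: service names match the "host" fields in the Atomix config above,
# so each container can resolve the others by name via Compose DNS.
version: "3"
services:
  node1:
    image: my-atomix-app          # assumed image name
    volumes:
      - ./data/node1:/volume/data/atomix
  node2:
    image: my-atomix-app
    volumes:
      - ./data/node2:/volume/data/atomix
  node3:
    image: my-atomix-app
    volumes:
      - ./data/node3:/volume/data/atomix
```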

@kuujo
Member

kuujo commented Jan 8, 2019 via email

@lukasz-antoniak

Thank you for the quick reply. I have been looking at those few configuration lines for hours, so I decided to follow up with some debugging. It looks to me that DefaultPartitionService hangs on this line: https://github.com/atomix/atomix/blob/master/primitive/src/main/java/io/atomix/primitive/partition/impl/DefaultPartitionService.java#L167. I have enabled debug-level logs and observe:

DEBUG PrimitiveService{2}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - Opening session 23 (io.atomix.protocols.raft.service.RaftServiceContext)
DEBUG Session{23}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - State changed: OPEN (io.atomix.protocols.raft.session.RaftSession)
DEBUG PrimitiveService{2}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - Session expired in 56860 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector, id=2}, session=12, timestamp=2019-01-09 06:15:34,329} (io.atomix.protocols.raft.service.RaftServiceContext)

Do you have any idea where to look next? What could cause the issue? Docker Compose is nothing more than running several Docker containers in a convenient way. Port 5001 TCP/UDP is open, and I assume that my configuration does not need multicast.

@lukasz-antoniak

Finally fixed it. In my case, two Guava JARs were present on the classpath (a transitive dependency). Atomix did not work with Guava 20.0.
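For anyone hitting the same conflict: the duplicate can usually be located with mvn dependency:tree -Dincludes=com.google.guava, and the older Guava excluded from whichever dependency drags it in. A sketch of the Maven exclusion; the groupId/artifactId of the offending dependency are placeholders:

```xml
<dependency>
  <!-- placeholder: the dependency that transitively pulls in Guava 20.0 -->
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```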

@johnou
Member

johnou commented May 1, 2020

#1009

@johnou johnou closed this as completed May 1, 2020
@kuujo kuujo added archived Archived issues from the legacy Java implementation of Atomix legacy Issues from the legacy Java implementation of Atomix labels Jan 13, 2023