The remote system has quarantined this system. #2

Open
Francesko90 opened this issue Jan 27, 2020 · 0 comments

Francesko90 commented Jan 27, 2020

Hi

I am trying to use your example to resolve the quarantine status in my cluster. Sometimes the removed member starts up and rejoins fine; other times it fails with "The remote system has quarantined this system."

gw_1                  | 2020-01-27T16:35:19,391 INFO  --- [.default-dispatcher-19] ications.sbr.RoleBasedSplitBrainResolver : Starting Split-Brain-Resolver
gw_1                  | 2020-01-27T16:35:19,482 INFO  --- [                  main] BOOTSPLASH                               : [JOINING  ] Waiting to join cluster with roles [gw]
gw_1                  | 2020-01-27T16:35:19,559 INFO  --- [.default-dispatcher-23] kka.cluster.Cluster(akka://my-cluster) : Cluster Node [akka.tcp://my-cluster@gw:20001] - Metrics collection has started successfully
gw_1                  | 2020-01-27T16:35:21,321 ERROR --- [.default-dispatcher-18] akka.remote.EndpointWriter               : AssociationError [akka.tcp://my-cluster@gw:20001] <- [akka.tcp://my-cluster@rmm:20001]: Error [Invalid address: akka.tcp://my-cluster@rmm:20001] [
gw_1                  | akka.remote.InvalidAssociation: Invalid address: akka.tcp://my-cluster@rmm:20001
gw_1                  | Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
gw_1                  | ]
gw_1                  | 2020-01-27T16:35:21,387 WARN  --- [.default-dispatcher-18] akka.remote.Remoting                     : Tried to associate with unreachable remote address [akka.tcp://my-cluster@rmm:20001]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
gw_1                  | 2020-01-27T16:35:21,458 INFO  --- [.default-dispatcher-19] akka.actor.LocalActorRef                 : Message [akka.remote.EndpointWriter$AckIdleCheckTimer$] from Actor[akka://my-cluster/system/endpointManager/endpointWriter-akka.tcp%3A%2F%2Fcluster%40rmm%3A20001-9#-1969467600] to Actor[akka://my-cluster/system/endpointManager/endpointWriter-akka.tcp%3A%2F%2Fcluster%40rmm%3A20001-9#-1969467600] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
gw_1                  | 2020-01-27T16:35:22,039 ERROR --- [.default-dispatcher-19] akka.remote.EndpointWriter               : AssociationError [akka.tcp://my-cluster@gw:20001] <- [akka.tcp://my-cluster@rmm:20001]: Error [Invalid address: akka.tcp://my-cluster@rmm:20001] [
gw_1                  | akka.remote.InvalidAssociation: Invalid address: akka.tcp://my-cluster@rmm:20001
gw_1                  | Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
gw_1                  | ]
gw_1                  | 2020-01-27T16:35:22,043 WARN  --- [.default-dispatcher-19] akka.remote.Remoting                     : Tried to associate with unreachable remote address [akka.tcp://my-cluster@rmm:20001]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
gw_1                  | 2020-01-27T16:35:22,051 INFO  --- [.default-dispatcher-22] akka.actor.LocalActorRef                 : Message [akka.remote.transport.ProtocolStateActor$HandleListenerRegistered] from Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49438-11#2098609777] to Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49438-11#2098609777] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
gw_1                  | 2020-01-27T16:35:22,201 ERROR --- [.default-dispatcher-18] akka.remote.EndpointWriter               : AssociationError [akka.tcp://my-cluster@gw:20001] <- [akka.tcp://my-cluster@rmm:20001]: Error [Invalid address: akka.tcp://my-cluster@rmm:20001] [
gw_1                  | akka.remote.InvalidAssociation: Invalid address: akka.tcp://my-cluster@rmm:20001
gw_1                  | Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
gw_1                  | ]
gw_1                  | 2020-01-27T16:35:22,230 INFO  --- [.default-dispatcher-18] akka.actor.LocalActorRef                 : Message [akka.remote.transport.ProtocolStateActor$HandleListenerRegistered] from Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49442-13#-1197697586] to Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49442-13#-1197697586] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
gw_1                  | 2020-01-27T16:35:22,264 WARN  --- [.default-dispatcher-18] akka.remote.Remoting                     : Tried to associate with unreachable remote address [akka.tcp://my-cluster@rmm:20001]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
gw_1                  | 2020-01-27T16:35:23,210 INFO  --- [.default-dispatcher-22] akka.actor.LocalActorRef                 : Message [akka.remote.transport.ProtocolStateActor$HandleListenerRegistered] from Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49446-15#1977578206] to Actor[akka://my-cluster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fcluster%40172.20.0.11%3A49446-15#1977578206] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
gw_1                  | 2020-01-27T16:35:23,211 ERROR --- [.default-dispatcher-22] akka.remote.EndpointWriter               : AssociationError [akka.tcp://my-cluster@gw:20001] <- [akka.tcp://my-cluster@rmm:20001]: Error [Invalid address: akka.tcp://my-cluster@rmm:20001] [
gw_1                  | akka.remote.InvalidAssociation: Invalid address: akka.tcp://my-cluster@rmm:20001
gw_1                  | Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
gw_1                  | ]


CM Actor (seed node)

cm_1    | 2020-01-27T17:04:53,057 WARN  --- [.default-dispatcher-27] akka.cluster.ClusterCoreDaemon           : Cluster Node [akka.tcp://my-cluster@cm:20001] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://my-cluster@gw:20001, status = Up)]. Node roles [CM]
cm_1    | 2020-01-27T17:04:54,555 WARN  --- [.default-dispatcher-19] ka.remote.transport.netty.NettyTransport : Remote connection to [null] failed with java.net.ConnectException: Connection refused: gw/172.20.0.11:20001
cm_1    | 2020-01-27T17:04:54,563 WARN  --- [.default-dispatcher-19] akka.remote.ReliableDeliverySupervisor   : Association with remote system [akka.tcp://my-cluster@gw:20001] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://my-cluster@gw:20001]] Caused by: [Connection refused: gw/172.20.0.11:20001]
cm_1    | 2020-01-27T17:04:57,666 WARN  --- [r.default-dispatcher-3] akka.remote.Remoting                     : Association to [akka.tcp://my-cluster@gw:20001] having UID [1975507447] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.
cm_1    | 2020-01-27T17:04:57,705 WARN  --- [.default-dispatcher-23] ka.remote.transport.netty.NettyTransport : Remote connection to [null] failed with java.net.ConnectException: Connection refused: gw/172.20.0.11:20001
cm_1    | 2020-01-27T17:04:57,708 WARN  --- [.default-dispatcher-23] akka.remote.ReliableDeliverySupervisor   : Association with remote system [akka.tcp://my-cluster@gw:20001] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://my-cluster@gw:20001]] Caused by: [Connection refused: gw/172.20.0.11:20001]
cm_1    | 2020-01-27T17:05:01,314 INFO  --- [.default-dispatcher-23] kka.cluster.Cluster(akka://my-cluster) : Cluster Node [akka.tcp://my-cluster@cm:20001] - Node [akka.tcp://my-cluster@gw:20001] is JOINING, roles [gw]
cm_1    | 2020-01-27T17:05:07,655 WARN  --- [.default-dispatcher-20] akka.remote.Remoting                     : Association to [akka.tcp://my-cluster@gw:20001] having UID [-1963598268] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

My configuration file

my {
  cluster {
    ip = "127.0.0.1"
    ip = ${?CLUSTER_IP} # this value will be set from the docker file entry point
    hostname = "cm"
    port = 20001
    name = "my-cluster"
    seed-nodes = ["akka.tcp://"${my.cluster.name}"@"${my.cluster.hostname}":"${my.cluster.port}"","akka.tcp://"${my.cluster.name}"@wm:20001"]
  }
}

akka.cluster.use-dispatcher = cluster-dispatcher

cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 4
  }
}
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"

  actor {
    provider = "cluster"
    debug = {
       lifecycle = on
    }
  }

  remote {
    
    log-sent-messages = off
    log-received-messages = off
    log-frame-size-exceeding = 10000KiB
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      send-buffer-size =  200MiB
      receive-buffer-size =  200MiB
      maximum-frame-size = 100MiB
      hostname = ${my.cluster.hostname}
      bind-host = ${my.cluster.ip}
      port =  ${my.cluster.port}
    }
  }

  cluster {
    seed-nodes = ${my.cluster.seed-nodes}
    downing-provider-class = "------------.RoleBasedSplitBrainResolverProvider"
    split-brain-resolver {
      stable-after = 10 seconds
      essential-roles = []
    }
  }

}

# Disable legacy metrics in akka-cluster.
akka.cluster.metrics.enabled=off

# Enable metrics extension in akka-cluster-metrics.
akka.extensions=["akka.cluster.metrics.ClusterMetricsExtension","akka.cluster.pubsub.DistributedPubSub"]

# Sigar native library extract location during tests.
# Note: use per-jvm-instance folder when running multiple jvm on one host.
akka.cluster.metrics.native-library-extract-folder=${user.dir}/target/native

# Settings for the DistributedPubSub extension
akka.cluster.pub-sub {
  # Actor name of the mediator actor, /system/distributedPubSubMediator
  name = distributedPubSubMediator

  # Start the mediator on members tagged with this role.
  # All members are used if undefined or empty.
  role = ""

  # The routing logic to use for 'Send'
  # Possible values: random, round-robin, broadcast
  routing-logic = random

  # How often the DistributedPubSubMediator should send out gossip information
  gossip-interval = 1s

  # Removed entries are pruned after this duration
  removed-time-to-live = 120s

  # Maximum number of elements to transfer in one message when synchronizing the registries.
  # Next chunk will be transferred in next round of gossip.
  max-delta-elements = 3000

  # The id of the dispatcher to use for DistributedPubSubMediator actors.
  # If not specified default dispatcher is used.
  # If specified you need to define the settings of the actual dispatcher.
  use-dispatcher = ""

}
