The cluster doesn't start correctly #21483

@rkudryashov

Description

My Environment

  • ArangoDB Version: 3.11.11
  • Deployment Mode: Cluster
  • Deployment Strategy: Docker Compose
  • Configuration: see compose.yaml
  • Infrastructure: own
  • Operating System: Windows 11
  • Total RAM in your machine: 64 GB
  • Disks in use: SSD
  • Used Package: Docker image

Steps to reproduce

  1. Clone the reproduction project
  2. Run docker compose up
  3. See logs

Problem:

With ArangoDB 3.11.11, the cluster does not appear to start correctly: the coordinators print multiple errors:

coordinator1  | 2024-12-03T13:04:42Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator2  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator2  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator3  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator3  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator1  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'
coordinator1  | 2024-12-03T13:04:43Z [1] ERROR [f29ef] {communication} did not find endpoint of server 'PRMR-083fcb5f-4fbf-4952-b8ae-b6e8ac9a3e1b'

In addition, one of the database servers (db3) prints errors:

db3           | 2024-12-03T13:04:52Z [1] ERROR [77a84] {communication} did not understand destination ''
db3           | 2024-12-03T13:04:52Z [1] WARNING [1254a] {maintenance} SynchronizeShard failed to get a count on leader _system/s22: A cluster backend which was required for the operation could not be reached
db3           | 2024-12-03T13:04:52Z [1] ERROR [77a84] {communication} did not understand destination ''
db3           | 2024-12-03T13:04:52Z [1] WARNING [1254a] {maintenance} SynchronizeShard failed to get a count on leader _system/s24: A cluster backend which was required for the operation could not be reached
db3           | 2024-12-03T13:04:52Z [1] ERROR [77a84] {communication} did not understand destination ''
db3           | 2024-12-03T13:04:52Z [1] WARNING [1254a] {maintenance} SynchronizeShard failed to get a count on leader _system/s26: A cluster backend which was required for the operation could not be reached
db3           | 2024-12-03T13:04:52Z [1] ERROR [77a84] {communication} did not understand destination ''
db3           | 2024-12-03T13:04:52Z [1] WARNING [1254a] {maintenance} SynchronizeShard failed to get a count on leader _system/s25: A cluster backend which was required for the operation could not be reached

Expected result:

No errors in the logs.
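To check the expected result mechanically, the log output can be scanned for ERROR and WARNING entries. The following is a minimal Python sketch that is not part of the original report; it assumes the "<service> | <timestamp> [<pid>] <LEVEL> [<log id>] {<topic>} <message>" layout seen in the excerpts above, and the name count_problems is hypothetical. It groups entries by service, level and log ID, so repeated messages such as [f29ef] collapse into a single count per container.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpts in this report):
# "<service> | <timestamp> [<pid>] <LEVEL> [<log id>] {<topic>} <message>"
LINE_RE = re.compile(
    r"^(?P<service>\S+)\s+\|\s+"
    r"(?P<ts>\S+)\s+\[\d+\]\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logid>\w+)\]\s+"
    r"\{(?P<topic>[^}]*)\}\s+(?P<message>.*)$"
)


def count_problems(lines, levels=("ERROR", "WARNING")):
    """Hypothetical helper: count log entries per (service, level, log id).

    Lines that do not match the assumed layout, or whose level is not in
    `levels`, are ignored. An empty Counter means no matching entries.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            key = (match.group("service"), match.group("level"), match.group("logid"))
            counts[key] += 1
    return counts
```

For example, the output of docker compose logs --no-color can be split into lines and passed to count_problems; an empty result would correspond to the expected result. The --no-color flag keeps ANSI colour codes out of the service prefix, which the pattern above would otherwise not match.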

Additional context:

There is not a single error in the logs if I downgrade to 3.11.10. There are also no errors if I keep 3.11.11 but remove the coordinator definitions from the Docker Compose file.