[bitnami/clickhouse] Connection to Clickhouse-Keeper broken/unresponsive #15935

Open
marcleibold opened this issue Apr 3, 2023 · 25 comments
@marcleibold
Contributor

marcleibold commented Apr 3, 2023

Name and Version

bitnami/clickhouse 3.1.5

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. In a GKE (Google Kubernetes Engine) Cluster
  2. With the attached values.yaml
  3. Apply with Terraform (shouldn't be any different from a standard helm install)

Result: the pods are running without any suspicious logs, but when you exec into them, or run a command from the web UI that is executed "ON CLUSTER", the progress indicator never goes past 49%. This was tried with a CREATE TABLE statement creating a ReplicatedMergeTree table on the cluster.
The Clickhouse cluster consists of 2 shards and 2 replicas.

Are you using any custom parameters or values?

Our values.yaml:

fullnameOverride: clickhouse-replicated

# ClickHouse Parameters

image:
  registry: docker.io
  repository: bitnami/clickhouse
  tag: "23-debian-11"
  pullPolicy: IfNotPresent

shards: ${CLICKHOUSE_SHARDS_COUNT}
replicaCount: ${CLICKHOUSE_REPLICAS_COUNT}

containerPorts:
  http: 8123
  https: 8443
  tcp: 9000
  tcpSecure: 9440
  keeper: 2181
  keeperSecure: 3181
  keeperInter: 9444
  mysql: 9004
  postgresql: 9005
  interserver: 9009
  metrics: 8001

auth:
  username: clickhouse_operator
  password: "${CLICKHOUSE_PASSWORD}"

logLevel: trace

keeper:
  enabled: true

zookeeper:
  enabled: false

defaultConfigurationOverrides: |
  <clickhouse>
    <!-- Macros -->
    <macros>
      <shard from_env="CLICKHOUSE_SHARD_ID"></shard>
      <replica from_env="CLICKHOUSE_REPLICA_ID"></replica>
      <layer>{{ include "common.names.fullname" . }}</layer>
    </macros>
    <!-- Log Level -->
    <logger>
      <level>{{ .Values.logLevel }}</level>
    </logger>
    {{- if or (ne (int .Values.shards) 1) (ne (int .Values.replicaCount) 1)}}
    <!-- Cluster configuration - Any update of the shards and replicas requires helm upgrade -->
    <remote_servers>
      <default>
        {{- $shards := $.Values.shards | int }}
        {{- range $shard, $e := until $shards }}
        <shard>
            <internal_replication>true</internal_replication>
            {{- $replicas := $.Values.replicaCount | int }}
            {{- range $i, $_e := until $replicas }}
            <replica>
                <host>{{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $shard $i (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}</host>
                <port>{{ $.Values.service.ports.tcp }}</port>
            </replica>
            {{- end }}
        </shard>
        {{- end }}
      </default>
    </remote_servers>
    {{- end }}
    {{- if .Values.keeper.enabled }}
    <!-- keeper configuration -->
    <keeper_server>
      {{/*ClickHouse keeper configuration using the helm chart */}}
      <tcp_port>{{ $.Values.containerPorts.keeper }}</tcp_port>
      {{- if .Values.tls.enabled }}
      <tcp_port_secure>{{ $.Values.containerPorts.keeperSecure }}</tcp_port_secure>
      {{- end }}
      <server_id from_env="KEEPER_SERVER_ID"></server_id>
      <log_storage_path>/bitnami/clickhouse/keeper/coordination/log</log_storage_path>
      <snapshot_storage_path>/bitnami/clickhouse/keeper/coordination/snapshots</snapshot_storage_path>
      <coordination_settings>
          <operation_timeout_ms>10000</operation_timeout_ms>
          <session_timeout_ms>30000</session_timeout_ms>
          <raft_logs_level>trace</raft_logs_level>
      </coordination_settings>
      <raft_configuration>
      {{- $nodes := .Values.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <server>
        <id>{{ $node | int }}</id>
        <hostname from_env="{{ printf "KEEPER_NODE_%d" $node }}"></hostname>
        <port>{{ $.Values.service.ports.keeperInter }}</port>
      </server>
      {{- end }}
      </raft_configuration>
    </keeper_server>
    {{- end }}
    {{- if or .Values.keeper.enabled .Values.zookeeper.enabled .Values.externalZookeeper.servers }}
    <!-- Zookeeper configuration -->
    <zookeeper>
      {{- if or .Values.keeper.enabled }}
      {{- $nodes := .Values.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <node>
        <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
        <port>{{ $.Values.service.ports.keeper }}</port>
      </node>
      {{- end }}
      {{- else if .Values.zookeeper.enabled }}
      {{/* Zookeeper configuration using the helm chart */}}
      {{- $nodes := .Values.zookeeper.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <node>
        <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
        <port>{{ $.Values.zookeeper.service.ports.client }}</port>
      </node>
      {{- end }}
      {{- else if .Values.externalZookeeper.servers }}
      {{/* Zookeeper configuration using an external instance */}}
      {{- range $node :=.Values.externalZookeeper.servers }}
      <node>
        <host>{{ $node }}</host>
        <port>{{ $.Values.externalZookeeper.port }}</port>
      </node>
      {{- end }}
      {{- end }}
    </zookeeper>
    {{- end }}
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    {{- if .Values.tls.enabled }}
    <!-- TLS configuration -->
    <tcp_port_secure from_env="CLICKHOUSE_TCP_SECURE_PORT"></tcp_port_secure>
    <https_port from_env="CLICKHOUSE_HTTPS_PORT"></https_port>
    <openSSL>
        <server>
            {{- $certFileName := default "tls.crt" .Values.tls.certFilename }}
            {{- $keyFileName := default "tls.key" .Values.tls.certKeyFilename }}
            <certificateFile>/bitnami/clickhouse/certs/{{$certFileName}}</certificateFile>
            <privateKeyFile>/bitnami/clickhouse/certs/{{$keyFileName}}</privateKeyFile>
            <verificationMode>none</verificationMode>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            {{- if or .Values.tls.autoGenerated .Values.tls.certCAFilename }}
            {{- $caFileName := default "ca.crt" .Values.tls.certCAFilename }}
            <caConfig>/bitnami/clickhouse/certs/{{$caFileName}}</caConfig>
            {{- else }}
            <loadDefaultCAFile>true</loadDefaultCAFile>
            {{- end }}
        </server>
        <client>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <verificationMode>none</verificationMode>
            <invalidCertificateHandler>
                <name>AcceptCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>
    {{- end }}
    {{- if .Values.metrics.enabled }}
     <!-- Prometheus metrics -->
     <prometheus>
        <endpoint>/metrics</endpoint>
        <port from_env="CLICKHOUSE_METRICS_PORT"></port>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </prometheus>
    {{- end }}
    <profiles>
      <default>
        <distributed_ddl_task_timeout>900</distributed_ddl_task_timeout>
      </default>
    </profiles>
  </clickhouse>

extraVolumes:
  - name: clickhouse-client-config
    configMap:
      name: clickhouse-client-config

extraVolumeMounts:
  - name: clickhouse-client-config
    mountPath: /etc/clickhouse-client/

initdbScripts:
  create_bigtable.sh: |
    <init script (not working)>

# TLS configuration

tls:
  enabled: true
  autoGenerated: false
  certificatesSecret: clickhouse-tls-secret
  certFilename: tls.crt
  certKeyFilename: tls.key
  certCAFilename: ca.crt

# Traffic Exposure Parameters

## ClickHouse service parameters

## http: ClickHouse service HTTP port
## https: ClickHouse service HTTPS port
## tcp: ClickHouse service TCP port
## tcpSecure: ClickHouse service TCP (secure) port
## keeper: ClickHouse keeper TCP container port
## keeperSecure: ClickHouse keeper TCP (secure) container port
## keeperInter: ClickHouse keeper interserver TCP container port
## mysql: ClickHouse service MySQL port
## postgresql: ClickHouse service PostgreSQL port
## interserver: ClickHouse service Interserver port
## metrics: ClickHouse service metrics port

service:
  type: LoadBalancer  
  ports:
    https: 443
  loadBalancerIP: "${LOAD_BALANCER_IP}"

## Persistence Parameters

persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: ${CLICKHOUSE_DATA_VOLUME_SIZE}

## Prometheus metrics

metrics:
  enabled: true
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "{{ .Values.containerPorts.metrics }}"

serviceAccount:
  create: true

What is the expected behavior?

The expected behaviour is that the tables are created normally within the distributed_ddl_task_timeout.

What do you see instead?

The table creation (tested with the clickhouse-client command after exec-ing into the pod) is stuck at 49% progress.

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 2abb72cc-3a48-4416-8e46-f9edbf219463

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                1 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
← Progress: 2.00 rows, 262.00 B (0.07 rows/s., 9.19 B/s.)  49%

When aborted, the table seems to have been created.

SHOW TABLES

Query id: 8be31ab0-5c0c-40a2-a733-8c5bcb57f35f

┌─name────────────┐
│ logs_replicated │
└─────────────────┘

1 row in set. Elapsed: 0.002 sec.

When trying to drop the tables, the same problem occurs:

DROP TABLE logs_replicated ON CLUSTER default

Query id: b0112235-fdca-4334-b116-b68a451e8dba

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↗ Progress: 2.00 rows, 262.00 B (0.23 rows/s., 30.60 B/s.)  49%

The tables seem to have been created, but the command doesn't finish, so I believe ClickHouse Keeper executes the command but never answers it.

When I try to create the table again (since I assumed Keeper had executed the last command), the error tells me that the replica already exists, not the table itself. So the problem seems to lie somewhere with the replicas.

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: c83fb364-73fa-4dfb-b419-8581628c97fb

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/shard1/default/logs_replicated/replicas/clickhouse-replicated-shard1-0 already exists. (REPLICA_ALREADY_EXISTS) (version 23.3.1.2823 (official build)) │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/shard1/default/logs_replicated/replicas/clickhouse-replicated-shard1-1 already exists. (REPLICA_ALREADY_EXISTS) (version 23.3.1.2823 (official build)) │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
↓ Progress: 2.00 rows, 668.00 B (0.09 rows/s., 31.49 B/s.)  49%
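
For what it's worth, if stale replica metadata is left behind in Keeper after an aborted ON CLUSTER DDL like the one above, ClickHouse's SYSTEM DROP REPLICA statement can usually clean it up. This is only a hedged sketch using the table, replica, and pod names from the output above: it has to be run from a replica other than the one being removed, only works once that replica is no longer active, and may need --user/--password depending on the deployment.

# Hedged cleanup sketch: remove the stale Keeper metadata for the shard1-0 replica
# of default.logs_replicated, executed from the other replica in the same shard.
kubectl exec clickhouse-replicated-shard1-1 -- clickhouse-client \
  --query "SYSTEM DROP REPLICA 'clickhouse-replicated-shard1-0' FROM TABLE default.logs_replicated"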

Additional information

No response

@javsalgar
Contributor

Hi,

Does the issue happen when using the ZooKeeper included in the chart? Just to pinpoint where the issue could be.

@marcleibold
Contributor Author

Hi,

I have it configured like this now

keeper:
   enabled: false
zookeeper:
   enabled: true
   replicaCount: 3

And now the command just completes normally

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 0c7dd092-a396-4fe6-9ca9-0001a867c370

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard0-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard0-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   0 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.321 sec.

@fmulero
Collaborator

fmulero commented Apr 10, 2023

Thanks @marcleibold for letting us know. Have you faced the issue with the default defaultConfigurationOverrides? Did you change that value when you moved to ZooKeeper?

@marcleibold
Contributor Author

Hi @fmulero ,
I did not change anything when I tried it out with zookeeper, so the defaultConfigurationOverrides were still the same as described above.
And when I now remove the defaultConfigurationOverrides from the values.yaml completely and try the CREATE TABLE command again, it is again stuck at 49%:

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 3cbb9139-279a-4853-9038-a2208a08444a

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↖ Progress: 2.00 rows, 262.00 B (0.21 rows/s., 27.98 B/s.)  49%

@fmulero
Collaborator

fmulero commented Apr 12, 2023

Hi @marcleibold

I've reproduced the same issue in a simpler scenario, just enabling keeper:

helm install myrelease bitnami/clickhouse --set keeper.enabled=true --set zookeeper.enabled=false

I've checked the keeper status and it seems there are no active clients (10.42.1.26 is the IP of my pod).

$ echo stat | nc localhost 2181
ClickHouse Keeper version: v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
Clients:
 10.42.1.26:45740(recved=0,sent=0)
 10.42.1.26:49358(recved=5005,sent=5006)

Latency min/avg/max: 0/0/6
Received: 5005
Sent: 5006
Connections: 1
Outstanding: 0
Zxid: 961
Mode: follower
Node count: 80

It seems something is misconfigured in keeper. This needs further investigation; please bear with us.
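
In case it is useful for debugging, ClickHouse Keeper also answers the other ZooKeeper four-letter-word commands (ruok and mntr are in the default four_letter_word_white_list), so the quorum state can be checked directly from inside a pod. A small sketch, assuming the keeper port 2181 from the values above and that nc is available as in the check above:

# ruok should answer "imok" if the keeper instance is alive.
echo ruok | nc localhost 2181; echo

# mntr reports the Raft role and, on the leader, how many followers are in sync.
echo mntr | nc localhost 2181 | grep -E 'zk_server_state|zk_synced_followers|zk_followers'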

@roberthorn

I think the issue may be here; I don't think KEEPER_SERVER_ID is actually set anywhere.

@marcleibold
Contributor Author

It seems like that is the issue. I also do not see KEEPER_SERVER_ID when I run set in one of the containers:

I have no name!@clickhouse-replicated-shard1-0:/$ set
APP_VERSION=23.3.1
BASH=/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="1" [2]="4" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='5.1.4(1)-release'
BITNAMI_APP_NAME=clickhouse
BITNAMI_DEBUG=false
CLICKHOUSE_ADMIN_PASSWORD=<redacted>
CLICKHOUSE_ADMIN_USER=<redacted>
CLICKHOUSE_HTTPS_PORT=8443
CLICKHOUSE_HTTP_PORT=8123
CLICKHOUSE_INTERSERVER_HTTP_PORT=9009
CLICKHOUSE_KEEPER_INTER_PORT=9444
CLICKHOUSE_KEEPER_PORT=2181
CLICKHOUSE_KEEPER_SECURE_PORT=3181
CLICKHOUSE_METRICS_PORT=8001
CLICKHOUSE_MYSQL_PORT=9004
CLICKHOUSE_POSTGRESQL_PORT=9005
CLICKHOUSE_REPLICATED_PORT=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_2181_TCP=tcp://10.0.46.111:2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PORT=2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_3181_TCP=tcp://10.0.46.111:3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PORT=3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_443_TCP=tcp://10.0.46.111:443
CLICKHOUSE_REPLICATED_PORT_443_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_443_TCP_PORT=443
CLICKHOUSE_REPLICATED_PORT_443_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8001_TCP=tcp://10.0.46.111:8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PORT=8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8123_TCP=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PORT=8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9000_TCP=tcp://10.0.46.111:9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PORT=9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9004_TCP=tcp://10.0.46.111:9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PORT=9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9005_TCP=tcp://10.0.46.111:9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PORT=9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9009_TCP=tcp://10.0.46.111:9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PORT=9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9440_TCP=tcp://10.0.46.111:9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PORT=9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9444_TCP=tcp://10.0.46.111:9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PORT=9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_SERVICE_HOST=10.0.46.111
CLICKHOUSE_REPLICATED_SERVICE_PORT=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTPS=443
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_INTERSRV=9009
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_METRICS=8001
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP=9000
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPER=2181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERINTER=9444
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERTLS=3181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_MYSQL=9004
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_POSTGRESQL=9005
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_SECURE=9440
CLICKHOUSE_REPLICA_ID=clickhouse-replicated-shard1-0
CLICKHOUSE_SHARD_ID=shard1
CLICKHOUSE_TCP_PORT=9000
CLICKHOUSE_TCP_SECURE_PORT=9440
CLICKHOUSE_TLS_CA_FILE=/opt/bitnami/clickhouse/certs/ca.crt
CLICKHOUSE_TLS_CERT_FILE=/opt/bitnami/clickhouse/certs/tls.crt
CLICKHOUSE_TLS_KEY_FILE=/opt/bitnami/clickhouse/certs/tls.key
COLUMNS=155
DIRSTACK=()
EUID=1001
GROUPS=()
HISTFILE=//.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/
HOSTNAME=clickhouse-replicated-shard1-0
HOSTTYPE=x86_64
IFS=$' \t\n'
KEEPER_NODE_0=clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local
KEEPER_NODE_1=clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local
KUBERNETES_PORT=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.0.32.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.0.32.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
LINES=17
MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
OS_ARCH=amd64
OS_FLAVOUR=debian-11
OS_NAME=linux
PATH=/opt/bitnami/common/bin:/opt/bitnami/clickhouse/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="1")
PPID=0
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
PS2='> '
PS4='+ '
PWD=/
SHELL=/bin/sh
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
TERM=xterm
UID=1001
_=']'
clickhouseCTL_API=3
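
For anyone checking the same thing, a quicker way than dumping the whole environment with set is to grep for the keeper-related variables; a small sketch, assuming the pod name used in this thread:

# KEEPER_NODE_0 and KEEPER_NODE_1 show up, but KEEPER_SERVER_ID is missing.
kubectl exec clickhouse-replicated-shard1-0 -- env | grep -i keeper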

@marcleibold
Contributor Author

Although the variable should be set in this script.

The logic also works completely fine, as I just tested inside my container:

I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID

I have no name!@clickhouse-replicated-shard1-0:/$ if [[ -f "/bitnami/clickhouse/keeper/data/myid" ]]; then
    export KEEPER_SERVER_ID="$(cat /bitnami/clickhouse/keeper/data/myid)"
else
    HOSTNAME="$(hostname -s)"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        export KEEPER_SERVER_ID=${BASH_REMATCH[2]}
    else
        echo "Failed to get index from hostname $HOST"
        exit 1
    fi
fi
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
0
I have no name!@clickhouse-replicated-shard1-0:/$

The script is also present in the configmap, but it is apparently just not executed for some reason.

@marcleibold
Contributor Author

Another thing I checked:
since the last line in the script is the following:
exec /opt/bitnami/scripts/clickhouse/entrypoint.sh /opt/bitnami/scripts/clickhouse/run.sh -- --listen_host=0.0.0.0

There should be a process called setup.sh running after the script is run (which is also the case when it is run manually).
This process is not there when I run top, so the issue is almost certainly at the point where the script is supposed to be executed.
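
A way to confirm that without relying on top is to look at what the container spec actually starts and what PID 1 is running; a hedged sketch, assuming the StatefulSet and pod names used in this thread:

# Which command/args does the chart wire into the container spec?
kubectl get statefulset clickhouse-replicated-shard1 \
  -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'

# What is PID 1 actually running inside a replica?
kubectl exec clickhouse-replicated-shard1-0 -- cat /proc/1/cmdline | tr '\0' ' '; echo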

@fmulero
Collaborator

fmulero commented May 2, 2023

Thanks a lot for all the clues! I made some changes and ran some tests, but it is taking longer than expected and I also hit some issues with shards. I've just opened an internal task to address it. We will keep you posted on any news.

@Jojoooo1

Were you able to fix it?

@fmulero
Collaborator

fmulero commented Sep 1, 2023

Sorry, there are no updates on this 😞

@exfly

exfly commented Sep 20, 2023

Any workaround here?

@marcleibold
Contributor Author

Any workaround here?

Not as far as I know; just use the built-in ZooKeeper.
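
For reference, the workaround boils down to the values change quoted earlier in the thread; as a helm command it would look roughly like this (a sketch, adjust the release name to your deployment):

# Hedged workaround sketch: disable the embedded keeper and use the ZooKeeper subchart instead.
helm upgrade --install clickhouse-replicated bitnami/clickhouse \
  --set keeper.enabled=false \
  --set zookeeper.enabled=true \
  --set zookeeper.replicaCount=3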

@brendavarguez

Is there any update?

@fmulero
Collaborator

fmulero commented Dec 26, 2023

Sorry, there are no updates on this. I'll try to bump the priority, but we are a small team and can't give you any ETA, sorry.

@mike-fischer1

Hi, this issue is affecting us since we can't completely switch over to clickhouse-keeper, and ZooKeeper isn't officially supported by ClickHouse anymore.

@nikitamikhaylov

ZooKeeper isn't officially supported by ClickHouse anymore.

This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper proved to be much better and we've implemented several extensions which allow us to get better performance in certain scenarios.

@mike-fischer1

mike-fischer1 commented Feb 29, 2024

ZooKeeper isn't officially supported by ClickHouse anymore.

This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper proved to be much better and we've implemented several extensions which allow us to get better performance in certain scenarios.

We have a support contract with ClickHouse and they really want us to use clickhouse-keeper.

@mike-fischer1

Any updates?

@simonfelding

would like this to be fixed.

@fmulero
Collaborator

fmulero commented Apr 29, 2024

I've just bumped the priority

@mleklund

I have been messing with the chart and I am pretty sure the issue is that a separate set of keeper replicas is created for every shard. Looking over the documentation for shards and for replicas, I believe that all nodes should share a single set of keepers. Whether the right thing to do is to create a separate StatefulSet of keepers (which would probably be easiest) or to point all servers only to the keepers on shard 0, I will leave up to the maintainers.
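
One way to see this on a running release (a hedged sketch using the pod names from this thread, and assuming nc is available in the image as in the earlier check) is to compare the Raft role each pod's keeper reports; with a single shared ensemble there would be exactly one leader overall, while one leader per shard points to the per-shard ensembles described above:

for shard in 0 1; do
  for replica in 0 1; do
    pod="clickhouse-replicated-shard${shard}-${replica}"
    echo "--- ${pod} ---"
    # "Mode: leader" / "Mode: follower" comes from the stat four-letter-word command.
    kubectl exec "${pod}" -- bash -c 'echo stat | nc localhost 2181' | grep -E '^Mode|^Connections'
  done
done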
