Otel collector loses connection to ClickHouse and stops reporting trace data after running for a certain time #5023

Open
sysudengle opened this issue May 17, 2024 · 3 comments

Comments

@sysudengle

sysudengle commented May 17, 2024

Bug description

I deployed SigNoz to my k8s cluster via the Helm chart, with multiple apps reporting trace data to the SigNoz Otel Collector. The collector works for a certain time, but after a while it stops writing data to ClickHouse. From that point on, the collector keeps logging warnings like the one below:

2024-05-17T06:46:37.663Z warn batchprocessor@v0.88.0/batch_processor.go:258 Sender failed {"kind": "processor", "name": "batch", "pipeline": "traces", "error": "read: read tcp 172.20.9.171:38464->10.68.134.252:9000: use of closed network connection", "errorVerbose": "read:\n github.com/ClickHouse/ch-go/proto.(*Reader).ReadFull\n /home/runner/go/pkg/mod/github.com/!sig!noz/ch-go@v0.61.2-dd/proto/reader.go:62\n - read tcp 172.20.9.171:38464->10.68.134.252:9000: use of closed network connection"}

I don't have this issue with older versions of SigNoz such as v0.26.

Version information

  • Signoz version: 0.45.0
  • Otel collector version: 0.88.22
  • Helm chart version: 0.41.0
  • ClickHouse version: bitnami/clickhouse:24.1.5-debian-12-r3
  • Your OS and version: k8s version v1.26.1
  • Your CPU Architecture(ARM/Intel): Intel

Additional context

My helm chart configuration is here:

global:
  imageRegistry: &GLOBAL_IMAGE_REGISTRY xx.com:9065
  imagePullSecrets: []
  storageClass: nfs-arch-75
  cloud: gcp/autogke

clickhouse:
  enabled: false

externalClickhouse:
  host: clickhouse.euler-clickhouse-pro
  cluster: cluster
  database: signoz_metrics
  traceDatabase: signoz_trace
  user: default
  password: arch@2023


queryService:
  name: "query-service"
  replicaCount: 1
  image:
    registry: xx.com:9065
    repository: signoz/query-service
    tag: 0.45.0
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: '1'
      memory: 2Gi
  persistence:
    enabled: true
    storageClass: nfs-arch-75
    accessModes:
      - ReadWriteOnce

    size: 5Gi


frontend:
  name: "frontend"
  replicaCount: 1

  image:
    registry: xx.com:9065
    repository: signoz/frontend
    tag: 0.45.0
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: '1'
      memory: 1Gi


alertmanager:
  name: "alertmanager"
  replicaCount: 1

  image:
    registry: xx.com:9065
    repository: signoz/alertmanager
    pullPolicy: IfNotPresent
    tag: 0.23.4
  initContainers:
    init:
      enabled: true
      image:
        registry: xx.com:9065
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: '1'
      memory: 1Gi
  persistence:
    enabled: true
    storageClass: nfs-arch-75
    size: 1000Mi

otelCollector:
  name: "otel-collector"
  image:
    registry: nf-regstiry.com:9065
    repository: signoz/signoz-otel-collector
    tag: 0.88.22
    pullPolicy: Always

  initContainers:
    init:
      enabled: true
      image:
        registry: xx.com:9065
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: 100m
      memory: 200Mi
    limits:
      cpu: "1"
      memory: 2Gi


  config:
    exporters:
      clickhousetraces:
        datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
        low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
        sending_queue:
          enabled: false
        retry_on_failure:
          enabled: false
      clickhouselogsexporter:
        dsn: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_LOG_DATABASE}
        timeout: 10s
        sending_queue:
          queue_size: 100
        retry_on_failure:
          enabled: false
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [signozspanmetrics/cumulative, batch]
          exporters: [clickhousetraces]

otelCollectorMetrics:
  name: "otel-collector-metrics"
  image:
    registry: xx.com:9065
    repository: signoz/signoz-otel-collector
    tag: 0.88.22
  initContainers:
    init:
      enabled: true
      image:
        registry: xx.com:9065
        repository: busybox
        tag: 1.35
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: "1"
      memory: 2Gi

schemaMigrator:
  enabled: false

k8s-infra:
  enabled: false

@nityanandagohain
Member

What is the status of ClickHouse? Is it running properly? Please also share the ClickHouse logs.

The message you pasted is a warn-level log. Try checking for error-level logs as well.

@srikanthccv
Member

Please try adding timeout: 10s to the clickhousetraces exporter and check whether that resolves the issue.

override-values.yaml

otelCollector:
    config:
        exporters:
            clickhousetraces:
                timeout: 10s
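
As a possible extension of the override above, here is a minimal sketch that combines the suggested timeout with re-enabling the queue and retry settings, which the reported configuration turns off for clickhousetraces. It assumes the clickhousetraces exporter honors the same sending_queue and retry_on_failure keys that appear in the reporter's values file. The interval values are copied from the reporter's clickhouselogsexporter block and are not tuned recommendations. Whether this helps with the closed-connection warning is untested.

```yaml
# override-values.yaml (sketch only, not verified against this deployment)
otelCollector:
  config:
    exporters:
      clickhousetraces:
        # Suggested in this thread: bound each write so a stalled
        # connection fails instead of hanging.
        timeout: 10s
        # Assumption: these keys behave as in the reporter's own config,
        # where both are currently set to enabled: false.
        sending_queue:
          enabled: true
        retry_on_failure:
          enabled: true
          # Values copied from the reporter's clickhouselogsexporter block.
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
```

With retries disabled, as in the reported configuration, a batch that hits a failed send is not retried, so enabling them may at least make it easier to tell a transient disconnect from a persistent one.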
