High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414

Thylossus · 2020-09-02T13:27:11Z

Hi,

I'm using the Camel Rabbit MQ Connector as a source connector for my Kafka cluster. Kafka connect is running in a Docker container built with the following Dockerfile

FROM openjdk:8-jre-alpine

ARG VERSION_APACHE_KAFKA=2.2.1
ARG VERSION_SCALA=2.12
ARG VERSION_APACHE_CAMEL=0.4.0

RUN mkdir -p /kafka/connect/plugins && \
    mkdir -p /kafka/connect/app && \
    mkdir -p /kafka/connect/config && \
    wget https://archive.apache.org/dist/kafka/${VERSION_APACHE_KAFKA}/kafka_${VERSION_SCALA}-${VERSION_APACHE_KAFKA}.tgz -q -O kafka.tgz && \
    tar -xzf kafka.tgz -C /kafka/connect/app --strip 1 && \
    rm -f kafka.tgz && \
    wget https://repo1.maven.org/maven2/org/apache/camel/kafkaconnector/camel-rabbitmq-kafka-connector/${VERSION_APACHE_CAMEL}/camel-rabbitmq-kafka-connector-${VERSION_APACHE_CAMEL}-package.zip -q -O rabbitmq-connector.zip && \
    unzip -qq rabbitmq-connector.zip -d /kafka/connect/plugins && \
    rm -f rabbitmq-connector.zip && \
    apk add -q --no-cache bash

ENTRYPOINT [ "/kafka/connect/entrypoint.sh" ]

When running this configuration and adding a RabbitMQ connector with the following configuration

{
  "name": "{{ config.connector_name }}",
  "config": {
    "connector.class": "org.apache.camel.kafkaconnector.rabbitmq.CamelRabbitmqSourceConnector",
    "topics": "{{ config.topic }}",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "tasks.max": 1,
    "camel.component.rabbitmq.hostname": "{{ kafka_connect_rabbitmq_host }}",
    "camel.component.rabbitmq.portNumber": {{ kafka_connect_rabbitmq_port }},
    "camel.component.rabbitmq.username": "{{ kafka_connect_rabbitmq_user }}",
    "camel.component.rabbitmq.password": "{{ kafka_connect_rabbitmq_password }}",
    "camel.source.path.exchangeName": "{{ config.exchange }}",
    "camel.source.endpoint.exchangeType": "topic",
    "camel.source.endpoint.autoDelete": false,
    "camel.source.endpoint.queue": "{{ config.queue }}",
    "camel.source.endpoint.routingKey": "{{ config.routing_key }}"
  }
}

the connector task created by this configuration consumes all resources of its assigned CPU core.

I've looked at the thread dumps and activated TRACE logging and came up with the following conclusion:
The CamelSourceTask does not seem to properly implement the poll method of the abstract SourceTask class.

The JavaDoc for the poll method is as follows (source):

Poll this source task for new records. If no data is currently available, this method
should block but return control to the caller regularly (by returning {@code null}) in
order for the task to transition to the {@code PAUSED} state if requested to do so.

The task will be {@link #stop() stopped} on a separate thread, and when that happens
this method is expected to unblock, quickly finish up any remaining processing, and
return.

Looking at the implementation I cannot see how the contract w.r.t. "should block, but return control to the caller regularly" is fulfilled. Since consumer.receiveNoWait(); is called, exchanges are not read in a blocking manner and the subsequent code is not blocking either.

This causes the execute method of WorkerSourceTask to continuously call the poll method which immediately returns if no exchange is available (due to the break statement in the while loop) effectively creating an infinite loop which consumes all CPU resources.

I would be grateful if someone could take a look at my analysis. Maybe there is some configuration option that solves my issue, but I was not able to find it. Looking forward to hearing from anyone.

Best regards,

Tobias

P.S.: when TRACE logging is activated, the logs immediately show that there is some kind of resource-intensive loop:

2020-09-02T10:57:16.381+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} Nothing to send to Kafka. Polling source for additional records
2020-09-02T10:57:16.381+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} About to send 0 records to Kafka
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} Nothing to send to Kafka. Polling source for additional records
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} About to send 0 records to Kafka
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} Nothing to send to Kafka. Polling source for additional records
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} About to send 0 records to Kafka
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} Nothing to send to Kafka. Polling source for additional records
2020-09-02T10:57:16.382+0000 [task-thread-deribw-events-0] TRACE WorkerSourceTask - WorkerSourceTask{id=deribw-events-0} About to send 0 records to Kafka

The corresponding log statements can be found in the execute method of WorkerSourceTask

The text was updated successfully, but these errors were encountered:

oscerd · 2020-09-02T13:36:21Z

This is because we created a tiny layer based on camel. That's how a camel consumer should work in plain camel. There is space for improvements but I don't think it's implemented incorrectly.. the aim of this project is leveraging the camel components and the engine we already have.

oscerd · 2020-09-02T13:45:33Z

By the way I'll try to reproduce and check.

oscerd · 2020-09-02T13:55:20Z

Thanks for reporting.

oscerd · 2020-09-03T04:39:54Z

I tested the connector yesterday for about 40 minutes with this configuration

name=CamelRabbitmqSourceConnector
connector.class=org.apache.camel.kafkaconnector.rabbitmq.CamelRabbitmqSourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

camel.source.maxPollDuration=10000

topics=test26

camel.component.rabbitmq.hostname=172.17.0.2
camel.component.rabbitmq.portnumber=5672
camel.source.path.exchangeName=queue
camel.source.endpoint.exchangeType=topic
camel.source.endpoint.autoDelete=false
camel.source.endpoint.queue=queue
camel.source.endpoint.routingKey=key

The connector didn't consume because nothing was in RabbitMQ. This is the result with Java VisualVM

And the monitoring part

As you can see the CPU is constantly at 12%.

How are you checking the used resources? Through top? Are you using the top irix mode?

…che/camel-kafka-connector#414

Thylossus · 2020-09-03T07:43:42Z

Hi,

thank you very much for looking into the issue. I've prepared a docker-compose file to reproduce my issue: https://github.com/Thylossus/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario. I did not have the time yesterday to do so.

When I start all containers and setup the RabbitMQ connector, the CPU usage almost immediately increases significantly. I use top in Irix mode to check the resources:

Mem: 4911524K used, 8137016K free, 1252K shrd, 43904K buff, 1593012K cached
CPU:  84% usr   0% sys   0% nic  15% idle   0% io   0% irq   0% sirq
Load average: 10.14 6.92 3.24 12/1265 473
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    6     1 root     S    2688m  21%   2  79% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/kafka/connect/app/bin/../logs -Dlog4j.configuration=file:/ka
    1     0 root     S     2180   0%   5   0% bash /kafka/connect/entrypoint.sh
  462     0 root     S     1532   0%   2   0% top
  467     0 root     R     1528   0%   6   0% top -b -n 1

However, on our test systems we also have New Relic Monitoring which reports high CPU usage as well.

With the docker-compose setup on my local machine, you can also see the heavy load on the host system:

Maybe the cause lies within my connector configuration (I've also tried adding camel.source.maxPollDuration=10000 as in your configuration, but it did not change the behavior) or running Kafka Connect in distributed mode. I've spent quite some time investigating this issue and cannot explain what's the cause of the high CPU usage except for my assumption in the initial issue description.

Best regards,

Tobias

Thylossus · 2020-09-03T08:25:27Z

I've updated the example with an exposed JMX port. Now one can also connect to the Kafka Connect process via VisualVM or other JMX tools.

In the idle state (no RabbitMQ events), Visual VM reports ~80% cpu usage:

oscerd · 2020-09-03T08:32:43Z

Thanks for this. I forgot to say that I was testing on kafka 2.4.0

oscerd · 2020-09-03T08:33:05Z

We'll a look. @orpiske do you want to try to reproduce?

orpiske · 2020-09-03T08:38:42Z

We'll a look. @orpiske do you want to try to reproduce?

Yes I do! I will take a look at that.

Thylossus · 2020-09-03T08:47:16Z

Thanks for this. I forgot to say that I was testing on kafka 2.4.0

I've just tested Kafka version 2.4.0 in the Connect Docker container and still got the same results.

Thank you both for investigating this 👍

oscerd · 2020-09-03T10:30:52Z

After ten minutes of running I'm still

oscerd@ghost:~/workspace/miscellanea/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario$ bash setup.sh 
Topic rabbitmqtest does not exist yet. Creating it.
Created topic rabbitmqtest.
Creating exchange X.test
Connector rabbitmqtest does not exist yet. Creating it.
{"name":"rabbitmqtest","config":{"connector.class":"org.apache.camel.kafkaconnector.rabbitmq.CamelRabbitmqSourceConnector","topics":"rabbitmqtest","value.converter":"org.apache.kafka.connect.converters.ByteArrayConverter","tasks.max":"10","camel.component.rabbitmq.hostname":"rabbitmq","camel.component.rabbitmq.portNumber":"5672","camel.component.rabbitmq.username":"rabbitmq","camel.component.rabbitmq.password":"rabbitmq","camel.source.path.exchangeName":"X.test","camel.source.endpoint.exchangeType":"topic","camel.source.endpoint.autoDelete":"false","camel.source.endpoint.queue":"Q.test.kafka.import","camel.source.endpoint.routingKey":"events","name":"rabbitmqtest"},"tasks":[],"type":"source"}CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 20483064K used, 12140688K free, 1273088K shrd, 2280K buff, 8323776K cached
CPU:  11% usr   1% sys   0% nic  75% idle  12% io   0% irq   0% sirq
Load average: 2.47 1.59 1.59 5/2347 317
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2556m   7%   3   0% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  313     0 root     R     1516   0%   4   0% top -b -n 1

CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 20475304K used, 12148448K free, 1268776K shrd, 2280K buff, 8320176K cached
CPU:   4% usr   2% sys   0% nic  81% idle  11% io   0% irq   0% sirq
Load average: 2.59 1.63 1.60 3/2343 350
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2584m   8%   3   5% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  346     0 root     R     1516   0%   7   0% top -b -n 1

^Coscerd@ghost:~/workspace/miscellanea/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario$ bash setup.sh 
Topic rabbitmqtest already exists.
Creating exchange X.test
Connector rabbitmqtest already exists.
CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 20709508K used, 11914244K free, 1331772K shrd, 2280K buff, 8453220K cached
CPU:   7% usr   6% sys   0% nic  74% idle  12% io   0% irq   0% sirq
Load average: 1.85 1.80 1.68 5/2398 395
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2642m   8%   3   4% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  391     0 root     R     1516   0%   0   0% top -b -n 1

CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 20713436K used, 11910316K free, 1332128K shrd, 2280K buff, 8453600K cached
CPU:   6% usr   1% sys   0% nic  87% idle   4% io   0% irq   0% sirq
Load average: 1.72 1.77 1.67 14/2392 401
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2641m   8%   3   3% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  396     0 root     R     1516   0%   0   0% top -b -n 1

oscerd · 2020-09-03T10:32:13Z

I just modified the cpu_quota here in kafka_connect service:

version: "2.1"

# Based on https://github.com/simplesteph/kafka-stack-docker-compose/blob/master/zk-single-kafka-multiple.yml
services:
  zoo1:
    image: zookeeper:3.4.9
    hostname: zoo1
    container_name: camel_demo_zoo1
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zoo1:2888:3888
    volumes:
      - ./zk-single-kafka-multiple/zoo1/data:/data
      - ./zk-single-kafka-multiple/zoo1/datalog:/datalog

  kafka1:
    image: confluentinc/cp-kafka:5.5.1
    hostname: kafka1
    container_name: camel_demo_kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zoo1:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    volumes:
      - ./zk-single-kafka-multiple/kafka1/data:/var/lib/kafka/data
    depends_on:
      - zoo1

  kafka2:
    image: confluentinc/cp-kafka:5.5.1
    hostname: kafka2
    container_name: camel_demo_kafka2
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zoo1:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    volumes:
      - ./zk-single-kafka-multiple/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zoo1

  kafka3:
    image: confluentinc/cp-kafka:5.5.1
    hostname: kafka3
    container_name: camel_demo_kafka3
    ports:
      - "9094:9094"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19094,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9094
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zoo1:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
    volumes:
      - ./zk-single-kafka-multiple/kafka3/data:/var/lib/kafka/data
    depends_on:
      - zoo1

  rabbitmq:
    image: rabbitmq:3-management
    hostname: rabbitmq
    container_name: camel_demo_rabbitmq
    ports:
      - "15672:15672"
      - "5672:5672"
    environment:
      RABBITMQ_DEFAULT_USER: "rabbitmq"
      RABBITMQ_DEFAULT_PASS: "rabbitmq"
      RABBITMQ_DEFAULT_VHOST: "/"

  kafka_connect:
    build: ./kafka-connect
    hostname: kafka_connect
    container_name: camel_demo_kafka_connect
    cpu_quota: 25000
    ports:
      - "8083:8083"
      - "9010:9010"
    environment:
      KAFKA_HEAP_OPTS: "-Xms256M -Xmx1G"
      KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0.0.0 -Dcom.sun.management.jmxremote.rmi.port=9010 "
      JMX_PORT: 9010
      KAFKA_LOG4J_OPTS: "-Dlog4j.configuration=file:/kafka/connect/config/log4j.properties"
    depends_on:
      - kafka1
      - kafka2
      - kafka3
      - rabbitmq

orpiske · 2020-09-03T10:32:34Z

@oscerd I managed to reproduce it.

oscerd · 2020-09-03T10:33:36Z

Me too. But with a little modification to conf it seems better

oscerd · 2020-09-03T11:28:18Z

After one hour

oscerd@ghost:~/workspace/miscellanea/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario$ bash setup.sh 
Topic rabbitmqtest already exists.
Creating exchange X.test
Connector rabbitmqtest already exists.
CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 21581456K used, 11042296K free, 1323052K shrd, 2280K buff, 8477644K cached
CPU:   4% usr   0% sys   0% nic  82% idle  11% io   0% irq   0% sirq
Load average: 1.91 2.77 1.87 7/2374 417
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2640m   8%   3   2% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  412     0 root     R     1516   0%   2   0% top -b -n 1

CPU usage as reported by top (Irix mode) in the Kafka Connect container
Mem: 21564480K used, 11059272K free, 1311232K shrd, 2280K buff, 8465832K cached
CPU:   3% usr   1% sys   0% nic  83% idle  12% io   0% irq   0% sirq
Load average: 1.92 2.75 1.87 10/2367 422
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 root     S    2640m   8%   3   4% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
    1     0 root     S     2160   0%   6   0% {entrypoint.sh} /bin/bash /kafka/connect/entrypoint.sh
  418     0 root     R     1516   0%   5   0% top -b -n 1

Thylossus · 2020-09-03T12:08:21Z

@oscerd Did you get your results with or without the cpu_quota setting? I just ran the setup on a Linux machine and, again, got high CPU usage:

Mem: 7912780K used, 138900K free, 339276K shrd, 507192K buff, 2710832K cached
CPU:  96% usr   3% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 5.59 2.29 1.94 13/1688 427
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    6     1 root     S    2672m  34%   3  89% /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xms256M -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=0.0
  422     0 root     R     1528   0%   4   3% top -b -n 1
    1     0 root     S     2180   0%   0   0% bash /kafka/connect/entrypoint.sh

orpiske · 2020-09-03T12:09:44Z

As soon as I add the connector - from running the setup script - there's a big spike in both the GC activity as well as the CPU usage. I'm checking if the same thing happens under different settings / connectors / etc.

Thylossus · 2020-09-03T12:11:24Z

When I set cpu_quota to 25000 as @oscerd did, the CPU usage is, indeed, not at > 80% anymore, but this is rather a workaround, isn't it? What's you opinion?

oscerd · 2020-09-03T12:18:28Z

I don't think it's a workaround at all, because basically this happens only in a container context.. I mean if you run the same configuration in standalone kafka connect you won't get the same resource consumption. It's something we need to investigate for sure by the way.

orpiske · 2020-09-03T12:32:37Z

I agree that we need to investigate this. I am not sure if it's a bug either. As usual for performance issues, there's a lot of things involved. I did try w/ a different connector to see if there would be any similarity and there was none with regards to CPU, but there were other strange things nonetheless. So, yeah, I guess we need to take a closer look.

orpiske · 2020-09-03T12:46:46Z

Another thing I noticed: the reproducer has the tasks.max set to 10. This is not yet supported. After I decreased it to 1, the CPU usage on my case decreased significantly.

oscerd · 2020-09-03T12:47:32Z

Yes, those connector actually doesn't work in distributed mode, we need to support it in some way. Still WIP.

Thylossus · 2020-09-03T13:04:09Z

Another thing I noticed: the reproducer has the tasks.max set to 10. This is not yet supported. After I decreased it to 1, the CPU usage on my case decreased significantly.

Have you had a look at the load on the individual cores of your system? Setting tasks.max to 1 will, indeed, reduce the overall system load as only one worker respectively one thread is spawned, but this worker will utilize "its core" to 100%. So if you have a system with two CPU cores, the CPU usage will be at 50% (or 100% in top with Irix mode)

orpiske · 2020-09-03T13:07:42Z

Another thing I noticed: the reproducer has the tasks.max set to 10. This is not yet supported. After I decreased it to 1, the CPU usage on my case decreased significantly.

Have you had a look at the load on the individual cores of your system? Setting tasks.max to 1 will, indeed, reduce the overall system load as only one worker respectively one thread is spawned, but this worker will utilize "its core" to 100%. So if you have a system with two CPU cores, the CPU usage will be at 50% (or 100% in top with Irix mode)

I did not look individually to a specific core, but the load did decrease by a factor of 10 (ie.: from ~31% previously to ~3% after that). Good catch.

Thylossus · 2020-09-03T13:17:59Z

Thank you both for spending so much time with investigating this issue 👍 I've configured our deployment so that the docker container's quota is fixed to 0.75 CPU cores and it works fine:

CONTAINER ID        NAME                    CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
1c24d113ae09        dericon-kafka-connect   75.27%              598.1MiB / 3.851GiB   15.17%              1.25MB / 452kB      0B / 4.1kB          46

I run the connector with task.max = 1 and while this still uses the quota completely, we can handle this load on our server without impacting other applications too much. This is a great progress compared to yesterday. Your help is highly appreciated.

I've also updated the sample with the latest settings.

While we can work with the current solution, I'm not sure if this issue can be closed yet, but I'll leave the decision with you.

oscerd · 2020-09-03T13:37:12Z

Let's leave this open

oscerd · 2020-09-03T13:37:54Z

I would focus on understanding of this specific to rabbitmq connector/component

orpiske · 2020-09-04T18:54:01Z

Today I did play a bit with this and I think we may have some room to improve the CamelSourceTask.

I adjusted the code a little bit and managed to reduce both the CPU usage (from ~9% to ~0.4% on this computer), GC allocations and heap usage (from ~350Mb to ~290Mb) while idle.

Before:

After:

With the change, the code spends far less time in the CamelTask.poll() than the current one we have.

Of course, this is just an initial proof of concept. Although it passes through all the tests, we need to do more serious testing, reviews, cleanups, etc ... but it looks there's something we can do here.

oscerd · 2020-09-04T19:05:58Z

Really well done @orpiske . Can you open a PR with your updates?

oscerd · 2020-09-04T19:06:08Z

And thanks a lot!

orpiske · 2020-09-04T19:11:03Z

Really well done @orpiske . Can you open a PR with your updates?

Thanks @oscerd! That was a fun one to play with.

Yes I can. I will cleanup the patches a little bit and I'll send one for review on Monday ... Tuesday at worst. I also put some code in place to make it easier for us to debug problems like this the future.

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue apache#414.

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue #414.

Thylossus · 2020-09-10T05:27:49Z

Thank you for looking into it! The PR looks good. Looking forward to version 0.5.0 :)

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue #414.

oscerd · 2020-10-02T04:45:05Z

I think we can close this one.

Thylossus pushed a commit to Thylossus/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario that referenced this issue Sep 3, 2020

Setup docker compose file to reproduce high CPU consumption issue apa…

85a52cc

…che/camel-kafka-connector#414

orpiske mentioned this issue Sep 7, 2020

Proposed fix for issue #414 #428

Merged

orpiske added a commit that referenced this issue Sep 8, 2020

Added performance test infrastructure for RabbitMQ

e76a792

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue #414.

oscerd pushed a commit that referenced this issue Sep 27, 2020

Added performance test infrastructure for RabbitMQ

57b3ad6

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue #414.

oscerd pushed a commit that referenced this issue Sep 27, 2020

Added performance test infrastructure for RabbitMQ

dc6fe59

This adds a reference implementation for checking the resource usage of the RabbitMQ component while iddle. The motivation for this is related to the github issue #414.

oscerd closed this as completed Oct 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414

High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414

Thylossus commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 3, 2020 •

edited

Loading

Thylossus commented Sep 3, 2020 •

edited

Loading

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 3, 2020

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020 •

edited

Loading

oscerd commented Sep 3, 2020 •

edited

Loading

orpiske commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

Thylossus commented Sep 3, 2020

orpiske commented Sep 3, 2020

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 3, 2020

orpiske commented Sep 3, 2020

oscerd commented Sep 3, 2020

Thylossus commented Sep 3, 2020

orpiske commented Sep 3, 2020 •

edited

Loading

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 4, 2020

oscerd commented Sep 4, 2020

oscerd commented Sep 4, 2020

orpiske commented Sep 4, 2020

Thylossus commented Sep 10, 2020

oscerd commented Oct 2, 2020

High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414

High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414

Comments

Thylossus commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 2, 2020

oscerd commented Sep 3, 2020 • edited Loading

Thylossus commented Sep 3, 2020 • edited Loading

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 3, 2020

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020 • edited Loading

oscerd commented Sep 3, 2020 • edited Loading

orpiske commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

Thylossus commented Sep 3, 2020

orpiske commented Sep 3, 2020

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 3, 2020

orpiske commented Sep 3, 2020

oscerd commented Sep 3, 2020

Thylossus commented Sep 3, 2020

orpiske commented Sep 3, 2020 • edited Loading

Thylossus commented Sep 3, 2020

oscerd commented Sep 3, 2020

oscerd commented Sep 3, 2020

orpiske commented Sep 4, 2020

oscerd commented Sep 4, 2020

oscerd commented Sep 4, 2020

orpiske commented Sep 4, 2020

Thylossus commented Sep 10, 2020

oscerd commented Oct 2, 2020

oscerd commented Sep 3, 2020 •

edited

Loading

Thylossus commented Sep 3, 2020 •

edited

Loading

oscerd commented Sep 3, 2020 •

edited

Loading

oscerd commented Sep 3, 2020 •

edited

Loading

orpiske commented Sep 3, 2020 •

edited

Loading