In [2]:
from IPython.display import Image

# Kafka Tutorial 2
[Official Kafka site](http://kafka.apache.org/)

## Index
1. Core parameters
    1. Zookeeper
    2. Broker
    3. Topic
    4. Producer
    5. Consumer
2. Exactly Once
3. Configuring partition backups to S3
4. Appendix: Stress-testing the idempotent producer with UDN
5. Appendix: Replication factor and acks
6. Appendix: Configuration reference
---

### 1. Core parameters

#### 1.1. Zookeeper
[Official ZooKeeper configuration docs](https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_configuration)

##### Minimal configuration
##### Cluster configuration
##### Authorization/authentication configuration
---

#### 1.2. Broker
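[Official broker configuration docs](https://kafka.apache.org/documentation/#brokerconfigs)

#### 1.3. Topic
[Official topic configuration docs](https://kafka.apache.org/documentation/#topicconfigs)

#### 1.4. Producer
[Official producer configuration docs](https://kafka.apache.org/documentation/#producerconfigs)  
Parameters closely tied to availability and performance:

`offsets.topic.replication.factor`: sets the number of replicas (for the internal offsets topic; this is a broker-side setting)  
`acks`: sets the acknowledgement mode

#### 1.5. Consumer
[Official consumer configuration docs](https://kafka.apache.org/documentation/#consumerconfigs)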

---

### 2. Exactly Once


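#### Problem
- **For securities trading transactions, messages must be neither duplicated nor lost.**
- By default Kafka provides at-least-once delivery (each message is guaranteed to be delivered at least once). As a consequence, duplicate data can occur.
> Background
> 1. at-most-once: a message may be dropped when a failure or timeout occurs. 
> 2. at-least-once: every message is delivered to the receiver at least once.  
> **Because the messaging system can fail, data may be delivered more than once.** When the sender decides that a send failed, for example because the response was delayed, it sends the message again; but the message it considered failed may in fact have succeeded and arrive later. For the same reason, message ordering cannot be guaranteed either.
> 3. exactly-once: each message is delivered exactly once. Neither duplication nor loss is allowed.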
  

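#### Causes of duplication and loss  
- Fundamentally, every node in a distributed system can fail at any time.
    1. `Failure between producer and broker`: **Kafka's robustness relies on the producer receiving an ACK from the broker. When the producer resends a message for whatever reason, the message can be duplicated.**  
    The problem is that a missing ACK does not prove that the request failed. The message may have been written to the topic successfully while the ACK was lost because of a broker or producer crash or a network problem. The producer cannot know the exact cause, so it assumes failure and sends the message again. The consumer then receives two copies of the same message.
    
    ![kafka-eos](./assets/kafka_eos1.png)
    
    2. `Consumer failure`: **When a consumer goes down, there is no way to know how far into the partition it had read and processed.**  
    When a new client starts, it has to recover the most recent state of the failed instance and resume from a safe point. **In other words, processed offsets must always be synchronized (committed).**
    
    3. `Broker failure`: **A broker can always fail because of hardware or software issues.**  
    However, Kafka is a highly available, persistent, and durable system: every message is retained (for a configured time) and replicated to n nodes. As a result, Kafka tolerates up to n-1 broker failures.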

#### 해결책
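- In most cases, duplicates arise from `failure between producer and broker`, when the producer resends a message. 
> The Kafka producer API automatically retries failed sends, so message loss is unlikely.
- Since 2017 (the 0.11 release) Kafka has provided an `idempotence API` and a `transaction API`, which support `exactly-once` delivery without duplication or loss.  
> [The Python Kafka producer and consumer API maintained by Confluent](https://github.com/confluentinc/confluent-kafka-python) **currently supports only the idempotence feature** (as of the May 2019 release).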


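1. Idempotence
    - Idempotence is the property that applying an operation several times does not change the result.
    - An `idempotent producer` built on this property is one building block of `exactly-once` delivery.
    - With `enable.idempotence=true` set on the producer, the broker reads each message's metadata (PID, SN) and eliminates duplicates caused by `failure between producer and broker` **within a single partition**. 
    ##### How duplicates are handled when idempotence is enabled
        1. Send the message together with its metadata
            - PID: producer ID
            - sequence number: position of the message in the producer's sequence
            - The broker tracks sequence numbers per PID and per partition, so a (PID, partition, sequence number) combination identifies a message uniquely.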
        
        ![idempotence1](./assets/idempotence1.png)
        
        ![idempotence2](./assets/idempotence2.png)
        
        2. The ACK for the produce request is not received 
        
        ![idempotence3](./assets/idempotence3.png)
        
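        3. The producer treats the send as failed and resends the message
            - The resent message must carry exactly the same metadata as the original.
            - The broker detects the duplicate automatically and does not write it again.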
            
        ![idempotence4](./assets/idempotence4.png)
        
        ![idempotence5](./assets/idempotence5.png)
        
    ##### Example
    M1 (PID: 1, SN: 1) - written to partition. For PID 1, Max SN=1  
    M2 (PID: 1, SN: 2) - written to partition. For PID 1, Max SN=2  
    M3 (PID: 1, SN: 3) - written to partition. For PID 1, Max SN=3  
    M4 (PID: 1, SN: 4) - written to partition. For PID 1, Max SN=4  
    M5 (PID: 1, SN: 5) - written to partition. For PID 1, Max SN=5  
    M6 (PID: 1, SN: 6) - written to partition. For PID 1, Max SN=6  
    
    **If M4, M5, and M6 are sent again, their SN (sequence number) is less than or equal to Max SN, so the broker treats them as duplicates.**     
    
    M4 (PID: 1, SN: 4) - rejected, SN <= Max SN  
    M5 (PID: 1, SN: 5) - rejected, SN <= Max SN  
    M6 (PID: 1, SN: 6) - rejected, SN <= Max SN  
    
    M7 (PID: 1, SN: 7) - written to partition. For PID 1, Max SN=7  
    M8 (PID: 1, SN: 8) - written to partition. For PID 1, Max SN=8  
    M9 (PID: 1, SN: 9) - written to partition. For PID 1, Max SN=9  
    M10 (PID: 1, SN: 10) - written to partition. For PID 1, Max SN=10  
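
The trace above can be reproduced with a few lines of Python. This is only a toy model under stated assumptions: `PartitionLog` and its `append` method are hypothetical names, not part of Kafka or confluent-kafka, and the model keeps just the "reject if SN <= Max SN" rule from the example. It does not model the other checks a real broker may apply to sequence numbers.

```python
class PartitionLog:
    """Toy model of one partition that deduplicates by (PID, sequence number)."""

    def __init__(self):
        self.records = []   # payloads that were actually written
        self.max_sn = {}    # PID -> highest sequence number written so far

    def append(self, pid, sn, payload):
        # A sequence number at or below Max SN was already written: duplicate
        if sn <= self.max_sn.get(pid, 0):
            return "rejected"
        self.records.append(payload)
        self.max_sn[pid] = sn
        return "written"


log = PartitionLog()
first = [log.append(1, sn, f"M{sn}") for sn in range(1, 7)]   # M1..M6
retry = [log.append(1, sn, f"M{sn}") for sn in (4, 5, 6)]     # resent M4..M6
rest = [log.append(1, sn, f"M{sn}") for sn in range(7, 11)]   # M7..M10

print(retry)             # ['rejected', 'rejected', 'rejected']
print(len(log.records))  # 10
```

Even though thirteen sends were attempted, the partition ends up with exactly ten records, one per distinct sequence number.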


#### Idempotent sample code
##### Installing confluent-kafka
This example uses the [confluent-kafka-python API](https://github.com/confluentinc/confluent-kafka-python), which is maintained by Confluent Inc.

```Bash
(base) $ conda activate vopt
(vopt) $ conda install -c conda-forge python-confluent-kafka 
(vopt) $ cd kafka-docker-compose
(vopt) $ docker-compose up
```

In [6]:
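# Enable the idempotent producer
from confluent_kafka import Producer

p = Producer(
    {
        'bootstrap.servers': '127.0.0.1:29092',
        'enable.idempotence': True  # turn on the idempotent producer
    }
)

def msg_callback(err, msg):
    # Invoked from poll()/flush() with the delivery result of each message
    if err is not None:
        print('failed: ', err)
    else:
        print('success: ', msg.value())

data_list = ['a','b','c','d','e', 'EOM']
for data in data_list:
    p.poll(0)  # serve delivery callbacks of earlier messages
    p.produce('mytopic0_0', data.encode('utf-8'), callback=msg_callback)

p.flush()  # block until every queued message has been delivered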

success:  b'a'
success:  b'b'
success:  b'c'
success:  b'd'
success:  b'e'
success:  b'EOM'


0

##### Constraints when using the idempotent producer
1. `acks=all`  
`acks=0` and `acks=1` cannot be used; only `acks=all` is allowed.
> Forcing `acks=0` produces the following error.  
> ```Python  
> KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create producer: `acks` must be set to `all` when `enable.idempotence` is true"}
> ```

2. `max.in.flight.requests.per.connection <= 5`  
`max.in.flight.requests.per.connection` can only be set to 5 or less. 
> Forcing a value above 5 produces the following error.  
> ```Python  
> KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create producer: `max.in.flight` must be set <= 5 when `enable.idempotence` is true"}  
> ``` 

3. Per-partition scope  
Deduplication is guaranteed only within a single partition; atomic writes that span several partitions need the transaction API.
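
The two configuration constraints above can be written as a small pre-flight check that runs before the producer is created. This is a sketch, not client code: `check_idempotent_config` is a hypothetical helper, and it assumes that properties left unset are adjusted by the client itself when idempotence is enabled, so only explicitly conflicting values are reported.

```python
def check_idempotent_config(conf):
    """Return a list of settings that conflict with enable.idempotence=true."""
    problems = []
    if not conf.get("enable.idempotence", False):
        return problems  # the constraints only apply when idempotence is on

    # Unset values are assumed to be adjusted by the client, so default to valid ones
    acks = str(conf.get("acks", "all")).lower()
    if acks not in ("all", "-1"):
        problems.append("acks must be 'all'")

    in_flight = conf.get("max.in.flight.requests.per.connection", 5)
    if in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems


good = {"bootstrap.servers": "127.0.0.1:29092", "enable.idempotence": True}
bad = dict(good, acks=0)

print(check_idempotent_config(good))  # []
print(check_idempotent_config(bad))   # ["acks must be 'all'"]
```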

In [4]:
from confluent_kafka import Consumer

c = Consumer({
    'bootstrap.servers': '127.0.0.1:29092',
    'group.id': 'mygroup',
    'auto.offset.reset': 'earliest'
})

# Subscribe to the topic that the producer above writes to
c.subscribe(['mytopic0_0'])

while True:
    msg = c.poll(1.0)  # wait up to 1 second instead of busy-looping

    if msg is None:
        continue
    # Check for errors before touching the payload
    if msg.error():
        print("Consumer error: {}".format(msg.error()))
        continue
    if msg.value().decode('utf-8') == "EOM":
        break

    print('Received message: {}'.format(msg.value().decode('utf-8')))
    c.commit()  # commit the offsets consumed so far

c.close()


### Stress Test
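- Verifying that messages are really neither lost nor duplicated cannot be done from code alone; the network environment itself has to be manipulated.
- With Docker, a UDN (user-defined network) allows direct manipulation, such as slowing down the whole network or restricting TCP connections.
- A detailed example is given in the [appendix](#4.-Appendix%3A-UDN을-이용한-Idempotent-producer-스트레스-테스트).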
---

2. Transaction API  
    - The transaction API provides atomic writes, so messages can be delivered exactly once. 
    
    > **(Note!)** [The Python Kafka producer and consumer API maintained by Confluent](https://github.com/confluentinc/confluent-kafka-python) **currently supports only the idempotence feature** (as of the May 2019 release).

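#### Transaction sample code (Java)
```Java
// Register this producer's transactional.id with the broker (call once)
producer.initTransactions();
try {
    producer.beginTransaction();
    // Both records become visible together, or not at all
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (KafkaException e) {
    // Roll back so that neither record is exposed as committed
    producer.abortTransaction();
}
```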
1. Begin the transaction
    - The `transactional.id` (tid) must always be a unique value.

![kafka-eos-61-1024](./assets/kafka-eos-61-1024.jpg)

2. Send within the transaction
    - Metadata about the messages the producer sent to specific partitions (in the figure, t is a partition and n is a message) is stored in the transaction log.

![kafka-eos-62-1024](./assets/kafka-eos-62-1024.jpg)

3. The messages are written to the partitions.

![kafka-eos-63-1024](./assets/kafka-eos-63-1024.jpg)

4. The sent messages are committed.

![kafka-eos-64-1024](./assets/kafka-eos-64-1024.jpg)

5. A `marker` is written.
    - The metadata created in step 2 is used to write the `marker` (a `marker` denotes the range of messages that are free of duplication and loss).
    
![kafka-eos-65-1024](./assets/kafka-eos-65-1024.jpg)

6. The commit completes.

![kafka-eos-66-1024](assets/kafka-eos-66-1024.jpg)

7. An ACK is sent to the producer.
    - This completes the whole exchange between producer and broker.

![kafka-eos-67-1024](./assets/kafka-eos-67-1024.jpg)

8. The consumer reads only committed messages.
    - Committed messages are guaranteed to be free of duplication and loss.

![kafka-eos-68-1024](./assets/kafka-eos-68-1024.jpg)
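
The effect of steps 5 to 8 on a reader can be illustrated with a small in-memory model. Everything here is hypothetical: `ToyTransactionLog` is not a Kafka API, markers are reduced to two sets of transaction ids, and the reader simply filters by them. A real consumer behaves differently in detail, so treat this as a mental model only.

```python
class ToyTransactionLog:
    """In-memory stand-in for a partition plus its commit/abort markers."""

    def __init__(self):
        self.entries = []       # (tid, payload) in arrival order
        self.committed = set()  # tids for which a commit marker was written
        self.aborted = set()    # tids for which an abort marker was written

    def send(self, tid, payload):
        self.entries.append((tid, payload))

    def commit(self, tid):
        self.committed.add(tid)

    def abort(self, tid):
        self.aborted.add(tid)

    def read_committed(self):
        # Only payloads whose transaction has a commit marker are returned
        return [payload for tid, payload in self.entries if tid in self.committed]


log = ToyTransactionLog()
log.send("tx-1", "record0")
log.send("tx-1", "record1")
log.commit("tx-1")
log.send("tx-2", "record2")
log.abort("tx-2")            # rolled back, never visible
log.send("tx-3", "record3")  # still open, not visible yet

print(log.read_committed())  # ['record0', 'record1']
```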

---

### 3. Configuring partition backups to S3  

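Tick data (spot, futures, options) is produced in large volumes every day. Leaving it inside the brokers as log segments causes management and cost problems.  

Kafka, however, provides the `Connector API`, which makes it easy to store raw data in a data lake (`s3`, `redshift`, etc.).

![kafka-connect](./assets/kafka-connect.png)

Through the `Connector API`, Kafka can read data from a `source` and store it in a `sink`.  
> A `source` plays the producer role and a `sink` plays the consumer role.  
> There is no need to write `source` or `sink` code; a few simple settings are enough to pull data from the `source` side and store it in the `sink`.  

A specific topic (e.g. the topic that stores tick data) can therefore be saved to `s3`.

More concretely, the following is done to store and manage tick data efficiently.

1. Specific topics (e.g. the tick data topic, the order transaction topic) are uploaded to S3 in real time.
2. The log files of those topics are deleted on a daily basis.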
#### Example
##### 1. Create an access key
Create the access key used in this example on the AWS security credentials page.
> At minimum, S3 `Object Write` permission is required.

```
AWS Access Key ID [None]: A12345789123A
AWS Secret Access Key [None]: U123456789123456789123456789t
```
##### 2. Create an AWS EC2 server
- AMI: Ubuntu Server 18.04 LTS(HVM)
- Instance type: c5.xlarge(4 vCPU, 8GiB EBS only)

##### 3. Install basic packages
```Bash
$ sudo -s
(root) $ apt-get update
(root) $ apt-get install unzip -y
(root) $ apt-get install vim -y
(root) $ apt-get install python-pip -y
(root) $ apt-get install awscli -y
```

##### 4. Configure AWS
```Bash
(root) $ aws configure
AWS Access Key ID [None]: A12345789123A
AWS Secret Access Key [None]: U123456789123456789123456789t
Default region name [None]: ap-northeast-2
Default output format [None]:

(root) $ export AWS_ACCESS_KEY_ID=A12345789123A
(root) $ export AWS_SECRET_ACCESS_KEY=U123456789123456789123456789t
(root) $ export AWS_DEFAULT_REGION=ap-northeast-2
```
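> When using `kafka-s3-connector`, the credentials must be registered as environment variables, as in `export AWS_ACCESS_KEY_ID=`.  
> **Setting AWS_ACCESS_KEY_ID only through the `aws configure` command causes a `provider chain invalid` error.**

##### 5. Install Kafka - install the JDK
Kafka and ZooKeeper depend on the JVM.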
```Bash
(root) $ add-apt-repository ppa:openjdk-r/ppa
(root) $ apt-get update 
(root) $ apt-get install openjdk-8-jdk -y
(root) $ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
```

##### 5. Install Kafka - download Kafka
Pick a mirror on the [download](https://www.apache.org/dyn/closer.cgi?path=/kafka/2.2.0/kafka_2.12-2.2.0.tgz) page and download the archive.
> This example downloads and installs under `/home/ubuntu`; use a different path if security is a concern.

```Bash
(root) $ cd ~
(root) $ pwd
/home/ubuntu
(root) $ wget http://mirror.navercorp.com/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz 
(root) $ tar -zxf kafka_2.12-2.2.0.tgz 
(root) $ cd kafka_2.12-2.2.0
```

##### 6. Configure ZooKeeper
```Bash
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0
(root) $ cd config
(root) $ nano zookeeper.properties
```

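ZooKeeper's configuration options can be changed in zookeeper.properties.  
This example runs in a standalone environment, so no options need to be changed.  
> If an existing server is already listening on port 2181, change the port below.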

```properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production
maxClientCnxns=0
```

##### 7. Configure Kafka
```Bash
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0/config
(root) $ nano server.properties
```

Kafka's configuration options can be changed in server.properties.
This example changes `log.roll.hours=24` and `log.retention.hours=24`.

1. `log.roll.hours=24`: a new log segment is created 24 hours after the current segment was created.
2. `log.retention.hours=24`: a log segment is kept for 24 hours, measured from its last modification time (mtime). 

As a result, the Kafka broker holds a log segment for at most 48 hours.   
> The segment accumulates messages for the first 24 hours and is then retained for another 24 hours.  
> `log.segment.bytes` rolls log segments by size in bytes instead. See [here](https://kafka.apache.org/documentation/#topicconfigs).

```properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################
...
############################# Socket Server Settings #############################
...
############################# Internal Topic Settings  #############################
...
############################# Log Flush Policy #############################
...
############################# Log Retention Policy #############################
log.retention.hours=24
log.roll.hours=24

############################# Zookeeper #############################
...
############################# Group Coordinator Settings #############################
```
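
The 48-hour figure follows from adding the two settings, which a short calculation makes explicit. This is a sketch under one assumption: the segment receives its last write just before it is rolled, which is the worst case. `segment_timeline` is a hypothetical helper, not a Kafka API, and the broker only deletes segments when it next checks retention, so the real deletion time can be slightly later.

```python
from datetime import datetime, timedelta

LOG_ROLL_HOURS = 24       # log.roll.hours
LOG_RETENTION_HOURS = 24  # log.retention.hours


def segment_timeline(created_at):
    """Return (time the segment is rolled, earliest time it may be deleted)."""
    rolled_at = created_at + timedelta(hours=LOG_ROLL_HOURS)
    # Retention is counted from the last modification, assumed to be the roll time
    deletable_at = rolled_at + timedelta(hours=LOG_RETENTION_HOURS)
    return rolled_at, deletable_at


created = datetime(2019, 6, 26, 0, 0)
rolled, deletable = segment_timeline(created)
print(rolled)               # 2019-06-27 00:00:00
print(deletable)            # 2019-06-28 00:00:00
print(deletable - created)  # 2 days, 0:00:00
```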

##### 8. Start ZooKeeper
```Bash
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0/config
(root) $ ../bin/zookeeper-server-start.sh zookeeper.properties
...
[2019-06-26 02:20:01,417] INFO minSessionTimeout set to -1 (org.apache.zookeeper.server.ZooKeeperServer)
[2019-06-26 02:20:01,417] INFO maxSessionTimeout set to -1 (org.apache.zookeeper.server.ZooKeeperServer)
[2019-06-26 02:20:01,423] INFO Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory)
[2019-06-26 02:20:01,425] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
```

If no errors appear, press `Ctrl+C` to stop it and then run ZooKeeper as a daemon.
```Bash
(root) $ ../bin/zookeeper-server-start.sh -daemon zookeeper.properties
```

##### 9. Start Kafka
```Bash
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0/config
(root) $ ../bin/kafka-server-start.sh server.properties
...
[2019-06-26 02:23:01,809] INFO [SocketServer brokerId=0] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2019-06-26 02:23:01,814] INFO Kafka version: 2.2.0 (org.apache.kafka.common.utils.AppInfoParser)
[2019-06-26 02:23:01,814] INFO Kafka commitId: 05fcfde8f69b0349 (org.apache.kafka.common.utils.AppInfoParser)
[2019-06-26 02:23:01,816] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
```

If no errors appear, press `Ctrl+C` to stop it and then run Kafka as a daemon.
```Bash
(root) $ ../bin/kafka-server-start.sh -daemon server.properties
```

##### 10. Verify that ZooKeeper and Kafka are running
```Bash
(root) $ netstat -lntp

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      589/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      20665/sshd
tcp6       0      0 :::44323                :::*                    LISTEN      29509/java
tcp6       0      0 :::9092                 :::*                    LISTEN      29509/java
tcp6       0      0 :::2181                 :::*                    LISTEN      28539/java
tcp6       0      0 :::37515                :::*                    LISTEN      28539/java
tcp6       0      0 :::22                   :::*                    LISTEN      20665/sshd
```

`ZooKeeper (port 2181)` and `Kafka (port 9092)` are both listening as expected.

##### 11. Test publishing and subscribing
```Bash
(root) $ cd ..
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0
(root) $ sh bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic
>a
>b
>c
>d ^C
(root) $ sh bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning
a
b
c
d
^CProcessed a total of 4 messages
```

##### 12. Install the s3-connector

Assume that `mytopic` is the topic where tick data accumulates. The goal is to upload every message produced to this topic to the s3 bucket (`kafkamytopictest`).

1. To install the `s3-connector`, check the `stable` version [here](https://github.com/confluentinc/kafka-connect-storage-cloud/releases).
2. In the command `wget https://api.hub.confluent.io/api/plugins/confluentinc/kafka-connect-s3/versions/4.1.1/archive`, change the version in the URL to the `stable` version found in step 1. (At the time of writing, the `stable` version is 5.2.2.)
3. `wget https://api.hub.confluent.io/api/plugins/confluentinc/kafka-connect-s3/versions/5.2.2/archive`

```Bash
(root) $ cd ~
(root) $ wget https://api.hub.confluent.io/api/plugins/confluentinc/kafka-connect-s3/versions/5.2.2/archive
(root) $ unzip archive
(root) $ cd kafka_2.12-2.2.0
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0

```

4. Inside `kafka_2.12-2.2.0`, create a `plugins/kafka-connect-s3` folder and copy the s3-connector lib files into it.

```Bash
(root) $ mkdir -p plugins/kafka-connect-s3
(root) $ cd plugins/kafka-connect-s3
(root) $ cp ~/confluentinc-kafka-connect-s3-5.2.2/lib/* .
(root) $ pwd
/home/ubuntu/kafka_2.12-2.2.0/plugins/kafka-connect-s3
```

##### 13. Configure the s3-connector
Configure the Kafka broker that `kafka-s3-connector` connects to. 

1. Create a file named `connect-s3.properties` in `~/kafka_2.12-2.2.0/config` and enter the contents below.
2. For `plugin.path=`, enter `/home/ubuntu/kafka_2.12-2.2.0/plugins`, the directory created in step 12.
> Kafka Connect scans the `kafka-connect-s3` folder inside `plugins` on its own and loads the libraries it needs.

```Bash
(root) $ cd ~/kafka_2.12-2.2.0/config
(root) $ nano connect-s3.properties

# Kafka broker IP addresses to connect to
bootstrap.servers=localhost:9092

# Path to directory containing the connector jar and dependencies
plugin.path=/home/ubuntu/kafka_2.12-2.2.0/plugins

# Converters to use to convert keys and values
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# The internal converters Kafka Connect uses for storing offset and configuration data
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
```

3. Create the bucket where the data will be stored. The bucket name is `kafkamytopictest`.
```Bash
(root) $ aws s3api create-bucket --bucket kafkamytopictest --region ap-northeast-2 --create-bucket-configuration LocationConstraint=ap-northeast-2
{
    "Location": "http://kafkamytopictest.s3.amazonaws.com/"
}
```


4. Create a file named `connect-s3-sink.properties`, which configures how `kafka-s3-connector` writes to the s3 bucket, and enter the contents below.
```Bash
(root) $ nano connect-s3-sink.properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=mytopic
s3.region=ap-northeast-2
s3.bucket.name=kafkamytopictest
s3.compression.type=gzip
s3.part.size=5242880
flush.size=3
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
partition.duration.ms=3600000
path.format=YYYY-MM-dd
locale=KR
timezone=UTC
schema.compatibility=NONE
```
> `topics=`: name of the topic that kafka-s3-connector reads from.  
> `s3.bucket.name=`: name of the bucket that stores the data.  
> `s3.part.size=5242880`: size of each part in an s3 multipart upload (5 MB here).  
> `flush.size`: number of records per partition that are written to the bucket in one file.  
> `path.format`: format of the path created in the bucket; data is stored under `(your s3 bucket)/mytopic/2019-06-28`.  
> See [here](https://docs.confluent.io/current/connect/kafka-connect-s3/index.html) for details.

##### 14. Run kafka-s3-connector
Running the command below stores the messages (logs) of the `mytopic` topic in the `kafkamytopictest` bucket.
```Bash
(root) $ ../bin/connect-standalone.sh connect-s3.properties connect-s3-sink.properties
```

When it runs correctly, log lines like the following are printed; then check the files in the AWS S3 console.
```Bash
[2019-06-26 04:14:30,371] INFO Files committed to S3. Target commit offset for mytopic-0 is 3 (io.confluent.connect.s3.TopicPartitionWriter:492)
[2019-06-26 04:15:29,574] INFO WorkerSinkTask{id=s3-sink-0} Committing offsets asynchronously using sequence number 1: {mytopic-0=OffsetAndMetadata{offset=3, leaderEpoch=null, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask:344)
```

![s3_connector](./assets/s3_connector.png)  

Decompressing and opening the file shows that, of the "a", "b", "c", "d" entered in step 11, "a", "b", and "c" were stored.  

![s3_connector_data](./assets/s3_connector_data.png)
> Only "d" is missing from the file: `flush.size=3` was set in step 13.4, so messages are written to the bucket three at a time.  
> Sending two more messages ("e", "f") stores "d", "e", "f" in the bucket as `mytopic+0+0000000003.json.gz`.  
> Adjust `flush.size=` to suit your environment.
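
The batching behaviour described in the note can be sketched in a few lines. This is a simplified model, not connector code: `plan_s3_objects` is a hypothetical helper, it assumes offsets start at 0, and it assumes every file is named after its first offset in the same pattern as `mytopic+0+0000000003.json.gz`. Rotation caused by the time-based partitioner is ignored.

```python
def plan_s3_objects(topic, partition, records, flush_size):
    """Group records into files of `flush_size`; leftovers stay uncommitted."""
    files = {}
    full = len(records) - len(records) % flush_size  # records that fill whole files
    for start in range(0, full, flush_size):
        # File name pattern assumed from the example: topic+partition+startOffset
        name = f"{topic}+{partition}+{start:010d}.json.gz"
        files[name] = records[start:start + flush_size]
    pending = records[full:]  # waits until flush_size records have accumulated
    return files, pending


files, pending = plan_s3_objects("mytopic", 0, ["a", "b", "c", "d"], flush_size=3)
print(files)    # {'mytopic+0+0000000000.json.gz': ['a', 'b', 'c']}
print(pending)  # ['d']
```

Calling the helper again with "e" and "f" appended yields a second file, `mytopic+0+0000000003.json.gz`, holding "d", "e", "f", which matches the behaviour described in the note.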



---

### 4. Appendix: Stress-testing the idempotent producer with UDN

The idempotent producer test is based on [this post](https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar).

To confirm that there is really no message duplication or loss, the network has to be made slow (`slow`) or unstable (`flaky`, dropping packets) in order to provoke timeouts and force the producer and consumer to retry.

The tool that makes this possible is [Blockade](https://github.com/worstcase/blockade), which works on Docker virtual networks (UDN, user-defined network).
> `Blockade`  
> Blockade is a utility for testing network failures and partitions in distributed applications. Blockade uses Docker containers to run application processes and manages the network from the host system to create various failure scenarios.

1. Clone the code that builds the test environment
```Bash
$ conda create -n "kafka_test" python=3.6
$ git clone https://github.com/bohblue2/ChaosTestingCode
$ cd ChaosTestingCode/KafkaUdn
$ conda activate kafka_test
$ pip install -r requirements.txt
```

2. Test
    - Cluster: 1 ZooKeeper and 3 Kafka brokers, running in Docker
    - Network: user-defined network (docker bridge)
    - Result variables  
    
        `DedupEnabled`  - when `True`, the idempotent producer is used  
        `TestRun` - the test run number  
        `SendCount` - the number of messages sent by the producer  
        `AckCount` - the number of messages acknowledged positively or negatively   
        `PosAckCount` - the number of messages positively acknowledged (send succeeded)  
        `NegAckCount` - the number of messages negatively acknowledged (send failed)  
        `Received` - the number of messages consumed  
        `NotReceived`  - the number of messages lost  
        `ReceivedNoAck`- the number of unacknowledged messages consumed   
        `MsgsWithDups` - the number of messages with duplicates  
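
To make the result variables concrete, the sketch below derives them from lists of message ids. The definitions are assumptions for illustration and may differ from the test script: here a message counts as lost when it was positively acknowledged but never consumed, and `Received` counts distinct messages. `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(sent, pos_acked, neg_acked, received):
    """Derive the result variables from the message ids seen on each side."""
    pos_set = set(pos_acked)
    counts = Counter(received)  # message id -> number of times it was consumed
    consumed = set(counts)
    return {
        "SendCount": len(sent),
        "AckCount": len(pos_acked) + len(neg_acked),
        "PosAckCount": len(pos_acked),
        "NegAckCount": len(neg_acked),
        "Received": len(consumed),
        "NotReceived": len(pos_set - consumed),    # acknowledged but never consumed
        "ReceivedNoAck": len(consumed - pos_set),  # consumed without a positive ack
        "MsgsWithDups": sum(1 for n in counts.values() if n > 1),
    }


result = summarize(
    sent=[1, 2, 3, 4, 5],
    pos_acked=[1, 2, 3, 4],
    neg_acked=[5],
    received=[1, 2, 2, 3, 5],  # message 2 arrived twice, message 4 never arrived
)
print(result["NotReceived"], result["ReceivedNoAck"], result["MsgsWithDups"])  # 1 1 1
```

With a working idempotent producer, the run would be expected to end with `MsgsWithDups` at 0.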


2.1. Kill TCP Connection Kafka Test
```Bash
# docker images download jackvanlightly/cloudkarafka-manager:latest
python test_idempotent.py --tests 1 --run-minutes 2 --topic mytopic --idempotence True --msg-count 1000 --in-flight-max 100000 --new-cluster true
```

---

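### 5. Appendix: Replication factor and acks

##### Replicas
- Replication is one of the most important parts of Kafka and plays a key role in system availability.  
- A replica is a copy of a partition; several brokers each hold a copy of the same partition.  
- Adjusting `default.replication.factor` in the broker configuration sets the replica count that is applied automatically when a topic is created.  

![replication.001-600x450](./assets/replication.001-600x450.jpg)  
> With the replication factor set to 1

![replication.002-600x450](./assets/replication.003-600x450.jpg)
> With per-topic replication factors of 2 (Topic01) and 3 (Topic 02)

- The replica count can be set per topic; giving important topics more replicas secures the availability of their data.
> However, a 1 GB partition with a replication factor of 5 occupies 5 GB in total. The trade-off between hardware and availability has to be considered.

- When the replication factor is 2 or more, one replica is called the `leader` and the others are called `follower`s.  
- To keep availability high, each `follower` periodically pulls data from the `leader` to stay in sync.  
![replication.004-600x450](./assets/replication.004-600x450.jpg)
> All reads and writes that producers and consumers perform on a topic partition go through the broker that holds the `leader` partition.  
> **Producers and consumers keep broker metadata up to date, so they know which broker holds the `leader` partition.**
- If the broker holding the `leader` partition dies, one of the `follower`s becomes the `leader`. 
> Because a `follower` is a copy of the leader, message loss is minimized.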

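##### Ack
- After sending a message to the broker, the producer receives an `Ack`.  
![ack_process](./assets/ack.png)  
- The broker sends the `Ack` once it has successfully written the message received from the producer to the partition. 
- If the `Ack` does not arrive within the timeout (the `timeout.ms` setting), the producer considers the send failed and resends the message.
- The producer's `acks` setting accepts three values, which trade availability against performance.
    1. `acks=0`: the producer does not wait for any confirmation from the broker.
    2. `acks=1`: the producer waits until the partition leader has received the message. 
    > Followers are not checked. If the leader fails after acknowledging but before the followers have replicated the message, the message can be lost.
    
    3. `acks=all` or `acks=-1`: the producer waits until the leader and its in-sync followers have all received the message. 
    > Because replication to the followers is confirmed, message loss is very unlikely, but the producer may time out when latency between leader and followers is high.
    
    ![producing-to-partitions](./assets/producing-to-partitions.png)

- Performance under each `acks` setting is as follows.  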
![producer-performance-tuning-for-apache-kafka-25-638](./assets/producer-performance-tuning-for-apache-kafka-25-638.jpg)
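
The three acknowledgement modes can be written as producer configuration dictionaries in the style used for the sample producer earlier. This is a minimal sketch: the broker address is the one assumed in that sample, the variable names are illustrative, and the `acks` property name is the one that appears in the client error message quoted earlier. Each dictionary would be passed to `confluent_kafka.Producer(...)`.

```python
# Settings shared by all three variants (broker address assumed from the sample)
base = {"bootstrap.servers": "127.0.0.1:29092"}

# acks=0: do not wait for any acknowledgement (fastest, messages may be lost)
fire_and_forget = dict(base, acks=0)

# acks=1: wait for the partition leader only
leader_only = dict(base, acks=1)

# acks=all: wait for the leader and its followers (slowest, most durable)
all_replicas = dict(base, acks="all")

for name, conf in [("acks=0", fire_and_forget),
                   ("acks=1", leader_only),
                   ("acks=all", all_replicas)]:
    print(name, conf)
```

Only `all_replicas` is compatible with `enable.idempotence`, as noted in the constraints of the idempotent producer.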


### 6. Appendix: Configuration reference
C/P: C is consumer, P is producer, and `*` applies to both consumer and producer

Property                                 | C/P | Range           |       Default | Importance | Description              
-----------------------------------------|-----|-----------------|--------------:|------------| --------------------------
builtin.features                         |  *  |                 | gzip, snappy, ssl, sasl, regex, lz4, sasl_gssapi, sasl_plain, sasl_scram, plugins, zstd, sasl_oauthbearer | low        | Indicates the builtin features for this build of librdkafka. An application can either query this value or attempt to set it with its list of required features to check for library support. <br>*Type: CSV flags*
client.id                                |  *  |                 |       rdkafka | low        | Client identifier. <br>*Type: string*
metadata.broker.list                     |  *  |                 |               | high       | Initial list of brokers as a CSV list of broker host or host:port. The application may also use `rd_kafka_brokers_add()` to add brokers during runtime. <br>*Type: string*
bootstrap.servers                        |  *  |                 |               | high       | Alias for `metadata.broker.list`: Initial list of brokers as a CSV list of broker host or host:port. The application may also use `rd_kafka_brokers_add()` to add brokers during runtime. <br>*Type: string*
message.max.bytes                        |  *  | 1000 .. 1000000000 |       1000000 | medium     | Maximum Kafka protocol request message size. <br>*Type: integer*
message.copy.max.bytes                   |  *  | 0 .. 1000000000 |         65535 | low        | Maximum size for message to be copied to buffer. Messages larger than this will be passed by reference (zero-copy) at the expense of larger iovecs. <br>*Type: integer*
receive.message.max.bytes                |  *  | 1000 .. 2147483647 |     100000000 | medium     | Maximum Kafka protocol response message size. This serves as a safety precaution to avoid memory exhaustion in case of protocol hickups. This value must be at least `fetch.max.bytes`  + 512 to allow for protocol overhead; the value is adjusted automatically unless the configuration property is explicitly set. <br>*Type: integer*
max.in.flight.requests.per.connection    |  *  | 1 .. 1000000    |       1000000 | low        | Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. <br>*Type: integer*
max.in.flight                            |  *  | 1 .. 1000000    |       1000000 | low        | Alias for `max.in.flight.requests.per.connection`: Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. <br>*Type: integer*
metadata.request.timeout.ms              |  *  | 10 .. 900000    |         60000 | low        | Non-topic request timeout in milliseconds. This is for metadata requests, etc. <br>*Type: integer*
topic.metadata.refresh.interval.ms       |  *  | -1 .. 3600000   |        300000 | low        | Topic metadata refresh interval in milliseconds. The metadata is automatically refreshed on error and connect. Use -1 to disable the intervalled refresh. <br>*Type: integer*
metadata.max.age.ms                      |  *  | 1 .. 86400000   |        900000 | low        | Metadata cache max age. Defaults to topic.metadata.refresh.interval.ms * 3 <br>*Type: integer*
topic.metadata.refresh.fast.interval.ms  |  *  | 1 .. 60000      |           250 | low        | When a topic loses its leader a new metadata request will be enqueued with this initial interval, exponentially increasing until the topic metadata has been refreshed. This is used to recover quickly from transitioning leader brokers. <br>*Type: integer*
topic.metadata.refresh.fast.cnt          |  *  | 0 .. 1000       |            10 | low        | **DEPRECATED** No longer used. <br>*Type: integer*
topic.metadata.refresh.sparse            |  *  | true, false     |          true | low        | Sparse metadata requests (consumes less network bandwidth) <br>*Type: boolean*
topic.blacklist                          |  *  |                 |               | low        | Topic blacklist, a comma-separated list of regular expressions for matching topic names that should be ignored in broker metadata information as if the topics did not exist. <br>*Type: pattern list*
debug                                    |  *  | generic, broker, topic, metadata, feature, queue, msg, protocol, cgrp, security, fetch, interceptor, plugin, consumer, admin, eos, all |               | medium     | A comma-separated list of debug contexts to enable. Detailed Producer debugging: broker,topic,msg. Consumer: consumer,cgrp,topic,fetch <br>*Type: CSV flags*
socket.timeout.ms                        |  *  | 10 .. 300000    |         60000 | low        | Default timeout for network requests. Producer: ProduceRequests will use the lesser value of `socket.timeout.ms` and remaining `message.timeout.ms` for the first message in the batch. Consumer: FetchRequests will use `fetch.wait.max.ms` + `socket.timeout.ms`. Admin: Admin requests will use `socket.timeout.ms` or explicitly set `rd_kafka_AdminOptions_set_operation_timeout()` value. <br>*Type: integer*
socket.blocking.max.ms                   |  *  | 1 .. 60000      |          1000 | low        | **DEPRECATED** No longer used. <br>*Type: integer*
socket.send.buffer.bytes                 |  *  | 0 .. 100000000  |             0 | low        | Broker socket send buffer size. System default is used if 0. <br>*Type: integer*
socket.receive.buffer.bytes              |  *  | 0 .. 100000000  |             0 | low        | Broker socket receive buffer size. System default is used if 0. <br>*Type: integer*
socket.keepalive.enable                  |  *  | true, false     |         false | low        | Enable TCP keep-alives (SO_KEEPALIVE) on broker sockets <br>*Type: boolean*
socket.nagle.disable                     |  *  | true, false     |         false | low        | Disable the Nagle algorithm (TCP_NODELAY) on broker sockets. <br>*Type: boolean*
socket.max.fails                         |  *  | 0 .. 1000000    |             1 | low        | Disconnect from broker when this number of send failures (e.g., timed out requests) is reached. Disable with 0. WARNING: It is highly recommended to leave this setting at its default value of 1 to prevent the client and broker from becoming desynchronized in case of request timeouts. NOTE: The connection is automatically re-established. <br>*Type: integer*
broker.address.ttl                       |  *  | 0 .. 86400000   |          1000 | low        | How long to cache the broker address resolving results (milliseconds). <br>*Type: integer*
broker.address.family                    |  *  | any, v4, v6     |           any | low        | Allowed broker IP address families: any, v4, v6 <br>*Type: enum value*
reconnect.backoff.jitter.ms              |  *  | 0 .. 3600000    |             0 | low        | **DEPRECATED** No longer used. See `reconnect.backoff.ms` and `reconnect.backoff.max.ms`. <br>*Type: integer*
reconnect.backoff.ms                     |  *  | 0 .. 3600000    |           100 | medium     | The initial time to wait before reconnecting to a broker after the connection has been closed. The time is increased exponentially until `reconnect.backoff.max.ms` is reached. -25% to +50% jitter is applied to each reconnect backoff. A value of 0 disables the backoff and reconnects immediately. <br>*Type: integer*
reconnect.backoff.max.ms                 |  *  | 0 .. 3600000    |         10000 | medium     | The maximum time to wait before reconnecting to a broker after the connection has been closed. <br>*Type: integer*
statistics.interval.ms                   |  *  | 0 .. 86400000   |             0 | high       | librdkafka statistics emit interval. The application also needs to register a stats callback using `rd_kafka_conf_set_stats_cb()`. The granularity is 1000ms. A value of 0 disables statistics. <br>*Type: integer*
enabled_events                           |  *  | 0 .. 2147483647 |             0 | low        | See `rd_kafka_conf_set_events()` <br>*Type: integer*
error_cb                                 |  *  |                 |               | low        | Error callback (set with rd_kafka_conf_set_error_cb()) <br>*Type: pointer*
throttle_cb                              |  *  |                 |               | low        | Throttle callback (set with rd_kafka_conf_set_throttle_cb()) <br>*Type: pointer*
stats_cb                                 |  *  |                 |               | low        | Statistics callback (set with rd_kafka_conf_set_stats_cb()) <br>*Type: pointer*
log_cb                                   |  *  |                 |               | low        | Log callback (set with rd_kafka_conf_set_log_cb()) <br>*Type: pointer*
log_level                                |  *  | 0 .. 7          |             6 | low        | Logging level (syslog(3) levels) <br>*Type: integer*
log.queue                                |  *  | true, false     |         false | low        | Disable spontaneous log_cb from internal librdkafka threads, instead enqueue log messages on queue set with `rd_kafka_set_log_queue()` and serve log callbacks or events through the standard poll APIs. **NOTE**: Log messages will linger in a temporary queue until the log queue has been set. <br>*Type: boolean*
log.thread.name                          |  *  | true, false     |          true | low        | Print internal thread name in log messages (useful for debugging librdkafka internals) <br>*Type: boolean*
log.connection.close                     |  *  | true, false     |          true | low        | Log broker disconnects. It might be useful to turn this off when interacting with 0.9 brokers with an aggressive `connection.max.idle.ms` value. <br>*Type: boolean*
background_event_cb                      |  *  |                 |               | low        | Background queue event callback (set with rd_kafka_conf_set_background_event_cb()) <br>*Type: pointer*
socket_cb                                |  *  |                 |               | low        | Socket creation callback to provide race-free CLOEXEC <br>*Type: pointer*
connect_cb                               |  *  |                 |               | low        | Socket connect callback <br>*Type: pointer*
closesocket_cb                           |  *  |                 |               | low        | Socket close callback <br>*Type: pointer*
open_cb                                  |  *  |                 |               | low        | File open callback to provide race-free CLOEXEC <br>*Type: pointer*
opaque                                   |  *  |                 |               | low        | Application opaque (set with rd_kafka_conf_set_opaque()) <br>*Type: pointer*
default_topic_conf                       |  *  |                 |               | low        | Default topic configuration for automatically subscribed topics <br>*Type: pointer*
internal.termination.signal              |  *  | 0 .. 128        |             0 | low        | Signal that librdkafka will use to quickly terminate on rd_kafka_destroy(). If this signal is not set then there will be a delay before rd_kafka_wait_destroyed() returns true as internal threads are timing out their system calls. If this signal is set however the delay will be minimal. The application should mask this signal as an internal signal handler is installed. <br>*Type: integer*
api.version.request                      |  *  | true, false     |          true | high       | Request broker's supported API versions to adjust functionality to available protocol features. If set to false, or the ApiVersionRequest fails, the fallback version `broker.version.fallback` will be used. **NOTE**: Depends on broker version >=0.10.0. If the request is not supported by (an older) broker the `broker.version.fallback` fallback is used. <br>*Type: boolean*
api.version.request.timeout.ms           |  *  | 1 .. 300000     |         10000 | low        | Timeout for broker API version requests. <br>*Type: integer*
api.version.fallback.ms                  |  *  | 0 .. 604800000  |             0 | medium     | Dictates how long the `broker.version.fallback` fallback is used in the case the ApiVersionRequest fails. **NOTE**: The ApiVersionRequest is only issued when a new connection to the broker is made (such as after an upgrade). <br>*Type: integer*
broker.version.fallback                  |  *  |                 |        0.10.0 | medium     | Older broker versions (before 0.10.0) provide no way for a client to query for supported protocol features (ApiVersionRequest, see `api.version.request`) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for `api.version.fallback.ms`. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0. Any other value >= 0.10, such as 0.10.2.1, enables ApiVersionRequests. <br>*Type: string*
security.protocol                        |  *  | plaintext, ssl, sasl_plaintext, sasl_ssl |     plaintext | high       | Protocol used to communicate with brokers. <br>*Type: enum value*
ssl.cipher.suites                        |  *  |                 |               | low        | A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. See manual page for `ciphers(1)` and `SSL_CTX_set_cipher_list(3)`. <br>*Type: string*
ssl.curves.list                          |  *  |                 |               | low        | The supported-curves extension in the TLS ClientHello message specifies the curves (standard/named, or 'explicit' GF(2^k) or GF(p)) the client is willing to have the server use. See manual page for `SSL_CTX_set1_curves_list(3)`. OpenSSL >= 1.0.2 required. <br>*Type: string*
ssl.sigalgs.list                         |  *  |                 |               | low        | The client uses the TLS ClientHello signature_algorithms extension to indicate to the server which signature/hash algorithm pairs may be used in digital signatures. See manual page for `SSL_CTX_set1_sigalgs_list(3)`. OpenSSL >= 1.0.2 required. <br>*Type: string*
ssl.key.location                         |  *  |                 |               | low        | Path to client's private key (PEM) used for authentication. <br>*Type: string*
ssl.key.password                         |  *  |                 |               | low        | Private key passphrase (for use with `ssl.key.location` and `set_ssl_cert()`) <br>*Type: string*
ssl.key.pem                              |  *  |                 |               | low        | Client's private key string (PEM format) used for authentication. <br>*Type: string*
ssl_key                                  |  *  |                 |               | low        | Client's private key as set by rd_kafka_conf_set_ssl_cert() <br>*Type: *
ssl.certificate.location                 |  *  |                 |               | low        | Path to client's public key (PEM) used for authentication. <br>*Type: string*
ssl.certificate.pem                      |  *  |                 |               | low        | Client's public key string (PEM format) used for authentication. <br>*Type: string*
ssl_certificate                          |  *  |                 |               | low        | Client's public key as set by rd_kafka_conf_set_ssl_cert() <br>*Type: *
ssl.ca.location                          |  *  |                 |               | low        | File or directory path to CA certificate(s) for verifying the broker's key. <br>*Type: string*
ssl_ca                                   |  *  |                 |               | low        | CA certificate as set by rd_kafka_conf_set_ssl_cert() <br>*Type: *
ssl.crl.location                         |  *  |                 |               | low        | Path to CRL for verifying broker's certificate validity. <br>*Type: string*
ssl.keystore.location                    |  *  |                 |               | low        | Path to client's keystore (PKCS#12) used for authentication. <br>*Type: string*
ssl.keystore.password                    |  *  |                 |               | low        | Client's keystore (PKCS#12) password. <br>*Type: string*
enable.ssl.certificate.verification      |  *  | true, false     |          true | low        | Enable OpenSSL's builtin broker (server) certificate verification. This verification can be extended by the application by implementing a certificate_verify_cb. <br>*Type: boolean*
ssl.endpoint.identification.algorithm    |  *  | none, https     |          none | low        | Endpoint identification algorithm to validate broker hostname using broker certificate. https - Server (broker) hostname verification as specified in RFC2818. none - No endpoint verification. OpenSSL >= 1.0.2 required. <br>*Type: enum value*
ssl.certificate.verify_cb                |  *  |                 |               | low        | Callback to verify the broker certificate chain. <br>*Type: pointer*
sasl.mechanisms                          |  *  |                 |        GSSAPI | high       | SASL mechanism to use for authentication. Supported: GSSAPI, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, OAUTHBEARER. **NOTE**: Despite the name only one mechanism must be configured. <br>*Type: string*
sasl.mechanism                           |  *  |                 |        GSSAPI | high       | Alias for `sasl.mechanisms`: SASL mechanism to use for authentication. Supported: GSSAPI, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, OAUTHBEARER. **NOTE**: Despite the name only one mechanism must be configured. <br>*Type: string*
sasl.kerberos.service.name               |  *  |                 |         kafka | low        | Kerberos principal name that Kafka runs as, not including /hostname@REALM <br>*Type: string*
sasl.kerberos.principal                  |  *  |                 |   kafkaclient | low        | This client's Kerberos principal name. (Not supported on Windows, will use the logon user's principal). <br>*Type: string*
sasl.kerberos.kinit.cmd                  |  *  |                 | kinit -R -t "%{sasl.kerberos.keytab}" -k %{sasl.kerberos.principal} || kinit -t "%{sasl.kerberos.keytab}" -k %{sasl.kerberos.principal} | low        | Shell command to refresh or acquire the client's Kerberos ticket. This command is executed on client creation and every sasl.kerberos.min.time.before.relogin (0=disable). %{config.prop.name} is replaced by corresponding config object value. <br>*Type: string*
sasl.kerberos.keytab                     |  *  |                 |               | low        | Path to Kerberos keytab file. This configuration property is only used as a variable in `sasl.kerberos.kinit.cmd` as ` ... -t "%{sasl.kerberos.keytab}"`. <br>*Type: string*
sasl.kerberos.min.time.before.relogin    |  *  | 0 .. 86400000   |         60000 | low        | Minimum time in milliseconds between key refresh attempts. Disable automatic key refresh by setting this property to 0. <br>*Type: integer*
sasl.username                            |  *  |                 |               | high       | SASL username for use with the PLAIN and SASL-SCRAM-.. mechanisms <br>*Type: string*
sasl.password                            |  *  |                 |               | high       | SASL password for use with the PLAIN and SASL-SCRAM-.. mechanism <br>*Type: string*
sasl.oauthbearer.config                  |  *  |                 |               | low        | SASL/OAUTHBEARER configuration. The format is implementation-dependent and must be parsed accordingly. The default unsecured token implementation (see https://tools.ietf.org/html/rfc7515#appendix-A.5) recognizes space-separated name=value pairs with valid names including principalClaimName, principal, scopeClaimName, scope, and lifeSeconds. The default value for principalClaimName is "sub", the default value for scopeClaimName is "scope", and the default value for lifeSeconds is 3600. The scope value is CSV format with the default value being no/empty scope. For example: `principalClaimName=azp principal=admin scopeClaimName=roles scope=role1,role2 lifeSeconds=600`. In addition, SASL extensions can be communicated to the broker via `extension_<extensionname>=value`. For example: `principal=admin extension_traceId=123` <br>*Type: string*
enable.sasl.oauthbearer.unsecure.jwt     |  *  | true, false     |         false | low        | Enable the builtin unsecure JWT OAUTHBEARER token handler if no oauthbearer_refresh_cb has been set. This builtin handler should only be used for development or testing, and not in production. <br>*Type: boolean*
oauthbearer_token_refresh_cb             |  *  |                 |               | low        | SASL/OAUTHBEARER token refresh callback (set with rd_kafka_conf_set_oauthbearer_token_refresh_cb(), triggered by rd_kafka_poll(), et al.). This callback will be triggered when it is time to refresh the client's OAUTHBEARER token. <br>*Type: pointer*
plugin.library.paths                     |  *  |                 |               | low        | List of plugin libraries to load (; separated). The library search path is platform dependent (see dlopen(3) for Unix and LoadLibrary() for Windows). If no filename extension is specified the platform-specific extension (such as .dll or .so) will be appended automatically. <br>*Type: string*
interceptors                             |  *  |                 |               | low        | Interceptors added through rd_kafka_conf_interceptor_add_..() and any configuration handled by interceptors. <br>*Type: *
group.id                                 |  C  |                 |               | high       | Client group id string. All clients sharing the same group.id belong to the same group. <br>*Type: string*
partition.assignment.strategy            |  C  |                 | range,roundrobin | medium     | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. <br>*Type: string*
session.timeout.ms                       |  C  | 1 .. 3600000    |         10000 | high       | Client group session and failure detection timeout. The consumer sends periodic heartbeats (heartbeat.interval.ms) to indicate its liveness to the broker. If no heartbeats are received by the broker for a group member within the session timeout, the broker will remove the consumer from the group and trigger a rebalance. The allowed range is configured with the **broker** configuration properties `group.min.session.timeout.ms` and `group.max.session.timeout.ms`. Also see `max.poll.interval.ms`. <br>*Type: integer*
heartbeat.interval.ms                    |  C  | 1 .. 3600000    |          3000 | low        | Group session keepalive heartbeat interval. <br>*Type: integer*
group.protocol.type                      |  C  |                 |      consumer | low        | Group protocol type <br>*Type: string*
coordinator.query.interval.ms            |  C  | 1 .. 3600000    |        600000 | low        | How often to query for the current client group coordinator. If the currently assigned coordinator is down the configured query interval will be divided by ten to more quickly recover in case of coordinator reassignment. <br>*Type: integer*
max.poll.interval.ms                     |  C  | 1 .. 86400000   |        300000 | high       | Maximum allowed time between calls to consume messages (e.g., rd_kafka_consumer_poll()) for high-level consumers. If this interval is exceeded the consumer is considered failed and the group will rebalance in order to reassign the partitions to another consumer group member. Warning: Offset commits may not be possible at this point. Note: It is recommended to set `enable.auto.offset.store=false` for long-time processing applications and then explicitly store offsets (using offsets_store()) *after* message processing, to make sure offsets are not auto-committed before processing has finished. The interval is checked two times per second. See KIP-62 for more information. <br>*Type: integer*
enable.auto.commit                       |  C  | true, false     |          true | high       | Automatically and periodically commit offsets in the background. Note: setting this to false does not prevent the consumer from fetching previously committed start offsets. To circumvent this behaviour set specific start offsets per partition in the call to assign(). <br>*Type: boolean*
auto.commit.interval.ms                  |  C  | 0 .. 86400000   |          5000 | medium     | The frequency in milliseconds that the consumer offsets are committed (written) to offset storage. (0 = disable). This setting is used by the high-level consumer. <br>*Type: integer*
enable.auto.offset.store                 |  C  | true, false     |          true | high       | Automatically store offset of last message provided to application. The offset store is an in-memory store of the next offset to (auto-)commit for each partition. <br>*Type: boolean*
queued.min.messages                      |  C  | 1 .. 10000000   |        100000 | medium     | Minimum number of messages per topic+partition librdkafka tries to maintain in the local consumer queue. <br>*Type: integer*
queued.max.messages.kbytes               |  C  | 1 .. 2097151    |       1048576 | medium     | Maximum number of kilobytes per topic+partition in the local consumer queue. This value may be overshot by fetch.message.max.bytes. This property has higher priority than queued.min.messages. <br>*Type: integer*
fetch.wait.max.ms                        |  C  | 0 .. 300000     |           100 | low        | Maximum time the broker may wait to fill the response with fetch.min.bytes. <br>*Type: integer*
fetch.message.max.bytes                  |  C  | 1 .. 1000000000 |       1048576 | medium     | Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. <br>*Type: integer*
max.partition.fetch.bytes                |  C  | 1 .. 1000000000 |       1048576 | medium     | Alias for `fetch.message.max.bytes`: Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. <br>*Type: integer*
fetch.max.bytes                          |  C  | 0 .. 2147483135 |      52428800 | medium     | Maximum amount of data the broker shall return for a Fetch request. Messages are fetched in batches by the consumer and if the first message batch in the first non-empty partition of the Fetch request is larger than this value, then the message batch will still be returned to ensure the consumer can make progress. The maximum message batch size accepted by the broker is defined via `message.max.bytes` (broker config) or `max.message.bytes` (broker topic config). `fetch.max.bytes` is automatically adjusted upwards to be at least `message.max.bytes` (consumer config). <br>*Type: integer*
fetch.min.bytes                          |  C  | 1 .. 100000000  |             1 | low        | Minimum number of bytes the broker responds with. If fetch.wait.max.ms expires the accumulated data will be sent to the client regardless of this setting. <br>*Type: integer*
fetch.error.backoff.ms                   |  C  | 0 .. 300000     |           500 | medium     | How long to postpone the next fetch request for a topic+partition in case of a fetch error. <br>*Type: integer*
offset.store.method                      |  C  | none, file, broker |        broker | low        | **DEPRECATED** Offset commit store method: 'file' - DEPRECATED: local file store (offset.store.path, et.al), 'broker' - broker commit store (requires Apache Kafka 0.8.2 or later on the broker). <br>*Type: enum value*
consume_cb                               |  C  |                 |               | low        | Message consume callback (set with rd_kafka_conf_set_consume_cb()) <br>*Type: pointer*
rebalance_cb                             |  C  |                 |               | low        | Called after consumer group has been rebalanced (set with rd_kafka_conf_set_rebalance_cb()) <br>*Type: pointer*
offset_commit_cb                         |  C  |                 |               | low        | Offset commit result propagation callback. (set with rd_kafka_conf_set_offset_commit_cb()) <br>*Type: pointer*
enable.partition.eof                     |  C  | true, false     |         false | low        | Emit RD_KAFKA_RESP_ERR__PARTITION_EOF event whenever the consumer reaches the end of a partition. <br>*Type: boolean*
check.crcs                               |  C  | true, false     |         false | medium     | Verify CRC32 of consumed messages, ensuring no on-the-wire or on-disk corruption to the messages occurred. This check comes at slightly increased CPU usage. <br>*Type: boolean*
enable.idempotence                       |  P  | true, false     |         false | high       | When set to `true`, the producer will ensure that messages are successfully produced exactly once and in the original produce order. The following configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled: `max.in.flight.requests.per.connection=5` (must be less than or equal to 5), `retries=INT32_MAX` (must be greater than 0), `acks=all`, `queuing.strategy=fifo`. Producer instantiation will fail if user-supplied configuration is incompatible. <br>*Type: boolean*
enable.gapless.guarantee                 |  P  | true, false     |         false | low        | **EXPERIMENTAL**: subject to change or removal. When set to `true`, any error that could result in a gap in the produced message series when a batch of messages fails, will raise a fatal error (ERR__GAPLESS_GUARANTEE) and stop the producer. Messages failing due to `message.timeout.ms` are not covered by this guarantee. Requires `enable.idempotence=true`. <br>*Type: boolean*
queue.buffering.max.messages             |  P  | 1 .. 10000000   |        100000 | high       | Maximum number of messages allowed on the producer queue. This queue is shared by all topics and partitions. <br>*Type: integer*
queue.buffering.max.kbytes               |  P  | 1 .. 2097151    |       1048576 | high       | Maximum total message size sum allowed on the producer queue. This queue is shared by all topics and partitions. This property has higher priority than queue.buffering.max.messages. <br>*Type: integer*
queue.buffering.max.ms                   |  P  | 0 .. 900000     |             0 | high       | Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. <br>*Type: integer*
linger.ms                                |  P  | 0 .. 900000     |             0 | high       | Alias for `queue.buffering.max.ms`: Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. <br>*Type: integer*
message.send.max.retries                 |  P  | 0 .. 10000000   |             2 | high       | How many times to retry sending a failing Message. **Note:** retrying may cause reordering unless `enable.idempotence` is set to true. <br>*Type: integer*
retries                                  |  P  | 0 .. 10000000   |             2 | high       | Alias for `message.send.max.retries`: How many times to retry sending a failing Message. **Note:** retrying may cause reordering unless `enable.idempotence` is set to true. <br>*Type: integer*
retry.backoff.ms                         |  P  | 1 .. 300000     |           100 | medium     | The backoff time in milliseconds before retrying a protocol request. <br>*Type: integer*
queue.buffering.backpressure.threshold   |  P  | 1 .. 1000000    |             1 | low        | The threshold of outstanding not yet transmitted broker requests needed to backpressure the producer's message accumulator. If the number of not yet transmitted requests equals or exceeds this number, produce request creation that would have otherwise been triggered (for example, in accordance with linger.ms) will be delayed. A lower number yields larger and more effective batches. A higher value can improve latency when using compression on slow machines. <br>*Type: integer*
compression.codec                        |  P  | none, gzip, snappy, lz4, zstd |          none | medium     | compression codec to use for compressing message sets. This is the default value for all topics, may be overridden by the topic configuration property `compression.codec`.  <br>*Type: enum value*
compression.type                         |  P  | none, gzip, snappy, lz4, zstd |          none | medium     | Alias for `compression.codec`: compression codec to use for compressing message sets. This is the default value for all topics, may be overridden by the topic configuration property `compression.codec`.  <br>*Type: enum value*
batch.num.messages                       |  P  | 1 .. 1000000    |         10000 | medium     | Maximum number of messages batched in one MessageSet. The total MessageSet size is also limited by message.max.bytes. <br>*Type: integer*
delivery.report.only.error               |  P  | true, false     |         false | low        | Only provide delivery reports for failed messages. <br>*Type: boolean*
dr_cb                                    |  P  |                 |               | low        | Delivery report callback (set with rd_kafka_conf_set_dr_cb()) <br>*Type: pointer*
dr_msg_cb                                |  P  |                 |               | low        | Delivery report callback (set with rd_kafka_conf_set_dr_msg_cb()) <br>*Type: pointer*

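The `enable.idempotence` row above describes two behaviours: defaults that are filled in only when the user has not set them, and a failure when a user-supplied value is incompatible. The following is a minimal Python sketch of that rule, not librdkafka's actual validation code. The names `resolve_idempotent_config` and `IDEMPOTENCE_DEFAULTS` are hypothetical, aliases such as `message.send.max.retries` and `request.required.acks` are ignored, and treating any `acks` other than `all`/`-1` or any `queuing.strategy` other than `fifo` as incompatible is an assumption drawn from the table text.

```python
INT32_MAX = 2**31 - 1

# Values the table says are applied automatically when idempotence is enabled.
IDEMPOTENCE_DEFAULTS = {
    "max.in.flight.requests.per.connection": 5,
    "retries": INT32_MAX,
    "acks": "all",
    "queuing.strategy": "fifo",
}


def resolve_idempotent_config(user_conf):
    """Return a copy of user_conf with the documented idempotence rules applied.

    Hypothetical helper: it mirrors the rules in the table only and does not
    talk to a broker or to librdkafka.
    """
    conf = dict(user_conf)
    if not conf.get("enable.idempotence", False):
        return conf  # nothing to adjust for a non-idempotent producer

    # Fill in only the properties the user did not set explicitly.
    for key, value in IDEMPOTENCE_DEFAULTS.items():
        conf.setdefault(key, value)

    # Reject user-supplied values that conflict with the documented limits.
    if conf["max.in.flight.requests.per.connection"] > 5:
        raise ValueError("max.in.flight.requests.per.connection must be <= 5")
    if conf["retries"] <= 0:
        raise ValueError("retries must be greater than 0")
    # Assumption: anything other than acks=all (-1) counts as incompatible.
    if str(conf["acks"]) not in ("all", "-1"):
        raise ValueError("acks must be 'all' (-1)")
    # Assumption: LIFO queuing counts as incompatible.
    if conf["queuing.strategy"] != "fifo":
        raise ValueError("queuing.strategy must be 'fifo'")
    return conf


if __name__ == "__main__":
    for key, value in resolve_idempotent_config({"enable.idempotence": True}).items():
        print(key, "=", value)
```

Passing `{"enable.idempotence": True, "retries": 0}` raises `ValueError`, which corresponds to the "producer instantiation will fail" case in the table.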

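The `reconnect.backoff.ms` and `reconnect.backoff.max.ms` rows in the global table describe an exponentially growing delay with -25% to +50% jitter. The sketch below shows what such a schedule could look like. It is an illustration under stated assumptions, not librdkafka's implementation: the doubling factor, applying the jitter to the capped base, and the function name `reconnect_backoffs` are all assumptions.

```python
import random


def reconnect_backoffs(initial_ms=100, max_ms=10000, attempts=8, rng=random):
    """Return a list of jittered reconnect delays in milliseconds.

    Hypothetical helper. Assumes the base delay doubles per attempt and is
    capped at max_ms before the jitter is applied.
    """
    if initial_ms == 0:
        # The table says a value of 0 disables the backoff entirely.
        return [0] * attempts

    delays = []
    base = initial_ms
    for _ in range(attempts):
        # -25% to +50% jitter, as described for reconnect.backoff.ms.
        jitter = rng.uniform(-0.25, 0.50)
        delays.append(int(base * (1 + jitter)))
        # Assumed growth: double the base, never exceeding the maximum.
        base = min(base * 2, max_ms)
    return delays


if __name__ == "__main__":
    print(reconnect_backoffs())
```

With the default values from the table (100 ms initial, 10000 ms maximum), the un-jittered base under these assumptions reaches the maximum on the eighth attempt.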
## Topic-related configuration options

Property                                 | C/P | Range           |       Default | Importance | Description              
-----------------------------------------|-----|-----------------|--------------:|------------| --------------------------
request.required.acks                    |  P  | -1 .. 1000      |            -1 | high       | This field indicates the number of acknowledgements the leader broker must receive from ISR brokers before responding to the request: *0*=Broker does not send any response/ack to client, *-1* or *all*=Broker will block until message is committed by all in sync replicas (ISRs). If there are less than `min.insync.replicas` (broker configuration) in the ISR set the produce request will fail. <br>*Type: integer*
acks                                     |  P  | -1 .. 1000      |            -1 | high       | Alias for `request.required.acks`: This field indicates the number of acknowledgements the leader broker must receive from ISR brokers before responding to the request: *0*=Broker does not send any response/ack to client, *-1* or *all*=Broker will block until message is committed by all in sync replicas (ISRs). If there are less than `min.insync.replicas` (broker configuration) in the ISR set the produce request will fail. <br>*Type: integer*
request.timeout.ms                       |  P  | 1 .. 900000     |          5000 | medium     | The ack timeout of the producer request in milliseconds. This value is only enforced by the broker and relies on `request.required.acks` being != 0. <br>*Type: integer*
message.timeout.ms                       |  P  | 0 .. 2147483647 |        300000 | high       | Local message timeout. This value is only enforced locally and limits the time a produced message waits for successful delivery. A time of 0 is infinite. This is the maximum time librdkafka may use to deliver a message (including retries). Delivery error occurs when either the retry count or the message timeout are exceeded. <br>*Type: integer*
delivery.timeout.ms                      |  P  | 0 .. 2147483647 |        300000 | high       | Alias for `message.timeout.ms`: Local message timeout. This value is only enforced locally and limits the time a produced message waits for successful delivery. A time of 0 is infinite. This is the maximum time librdkafka may use to deliver a message (including retries). Delivery error occurs when either the retry count or the message timeout are exceeded. <br>*Type: integer*
queuing.strategy                         |  P  | fifo, lifo      |          fifo | low        | **EXPERIMENTAL**: subject to change or removal. **DEPRECATED** Producer queuing strategy. FIFO preserves produce ordering, while LIFO prioritizes new messages. <br>*Type: enum value*
produce.offset.report                    |  P  | true, false     |         false | low        | **DEPRECATED** No longer used. <br>*Type: boolean*
partitioner                              |  P  |                 | consistent_random | high       | Partitioner: `random` - random distribution, `consistent` - CRC32 hash of key (Empty and NULL keys are mapped to single partition), `consistent_random` - CRC32 hash of key (Empty and NULL keys are randomly partitioned), `murmur2` - Java Producer compatible Murmur2 hash of key (NULL keys are mapped to single partition), `murmur2_random` - Java Producer compatible Murmur2 hash of key (NULL keys are randomly partitioned. This is functionally equivalent to the default partitioner in the Java Producer.). <br>*Type: string*
partitioner_cb                           |  P  |                 |               | low        | Custom partitioner callback (set with rd_kafka_topic_conf_set_partitioner_cb()) <br>*Type: pointer*
msg_order_cmp                            |  P  |                 |               | low        | **EXPERIMENTAL**: subject to change or removal. **DEPRECATED** Message queue ordering comparator (set with rd_kafka_topic_conf_set_msg_order_cmp()). Also see `queuing.strategy`. <br>*Type: pointer*
opaque                                   |  *  |                 |               | low        | Application opaque (set with rd_kafka_topic_conf_set_opaque()) <br>*Type: pointer*
compression.codec                        |  P  | none, gzip, snappy, lz4, zstd, inherit |       inherit | high       | Compression codec to use for compressing message sets. inherit = inherit global compression.codec configuration. <br>*Type: enum value*
compression.type                         |  P  | none, gzip, snappy, lz4, zstd |          none | medium     | Alias for `compression.codec`: compression codec to use for compressing message sets. This is the default value for all topics, may be overridden by the topic configuration property `compression.codec`.  <br>*Type: enum value*
compression.level                        |  P  | -1 .. 12        |            -1 | medium     | Compression level parameter for algorithm selected by configuration property `compression.codec`. Higher values will result in better compression at the cost of more CPU usage. Usable range is algorithm-dependent: [0-9] for gzip; [0-12] for lz4; only 0 for snappy; -1 = codec-dependent default compression level. <br>*Type: integer*
auto.commit.enable                       |  C  | true, false     |          true | low        | **DEPRECATED** [**LEGACY PROPERTY:** This property is used by the simple legacy consumer only. When using the high-level KafkaConsumer, the global `enable.auto.commit` property must be used instead]. If true, periodically commit offset of the last message handed to the application. This committed offset will be used when the process restarts to pick up where it left off. If false, the application will have to call `rd_kafka_offset_store()` to store an offset (optional). **NOTE:** There is currently no zookeeper integration, offsets will be written to broker or local file according to offset.store.method. <br>*Type: boolean*
enable.auto.commit                       |  C  | true, false     |          true | low        | **DEPRECATED** Alias for `auto.commit.enable`: [**LEGACY PROPERTY:** This property is used by the simple legacy consumer only. When using the high-level KafkaConsumer, the global `enable.auto.commit` property must be used instead]. If true, periodically commit offset of the last message handed to the application. This committed offset will be used when the process restarts to pick up where it left off. If false, the application will have to call `rd_kafka_offset_store()` to store an offset (optional). **NOTE:** There is currently no zookeeper integration, offsets will be written to broker or local file according to offset.store.method. <br>*Type: boolean*
auto.commit.interval.ms                  |  C  | 10 .. 86400000  |         60000 | high       | [**LEGACY PROPERTY:** This setting is used by the simple legacy consumer only. When using the high-level KafkaConsumer, the global `auto.commit.interval.ms` property must be used instead]. The frequency in milliseconds that the consumer offsets are committed (written) to offset storage. <br>*Type: integer*
auto.offset.reset                        |  C  | smallest, earliest, beginning, largest, latest, end, error |       largest | high       | Action to take when there is no initial offset in offset store or the desired offset is out of range: 'smallest','earliest' - automatically reset the offset to the smallest offset, 'largest','latest' - automatically reset the offset to the largest offset, 'error' - trigger an error which is retrieved by consuming messages and checking 'message->err'. <br>*Type: enum value*
offset.store.path                        |  C  |                 |             . | low        | **DEPRECATED** Path to local file for storing offsets. If the path is a directory a filename will be automatically generated in that directory based on the topic and partition. File-based offset storage will be removed in a future version. <br>*Type: string*
offset.store.sync.interval.ms            |  C  | -1 .. 86400000  |            -1 | low        | **DEPRECATED** fsync() interval for the offset file, in milliseconds. Use -1 to disable syncing, and 0 for immediate sync after each write. File-based offset storage will be removed in a future version. <br>*Type: integer*
offset.store.method                      |  C  | file, broker    |        broker | low        | **DEPRECATED** Offset commit store method: 'file' - DEPRECATED: local file store (offset.store.path, et.al), 'broker' - broker commit store (requires "group.id" to be configured and Apache Kafka 0.8.2 or later on the broker.). <br>*Type: enum value*
consume.callback.max.messages            |  C  | 0 .. 1000000    |             0 | low        | Maximum number of messages to dispatch in one `rd_kafka_consume_callback*()` call (0 = unlimited) <br>*Type: integer*

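The ranges in the table above can be checked before a configuration is handed to a client. The following is a minimal sketch, not part of librdkafka or any client library: `TOPIC_CONF_RANGES` and `validate_topic_conf` are hypothetical names introduced here, and the allowed values are copied from a handful of rows in the table (it does not cover every property).

```python
# Allowed values copied from a few rows of the table above (hypothetical helper table).
# Integer ranges are (min, max) tuples; enum properties are sets of strings.
TOPIC_CONF_RANGES = {
    "request.timeout.ms": (1, 900000),
    "message.timeout.ms": (0, 2147483647),
    "compression.level": (-1, 12),
    "compression.codec": {"none", "gzip", "snappy", "lz4", "zstd", "inherit"},
    "auto.offset.reset": {"smallest", "earliest", "beginning",
                          "largest", "latest", "end", "error"},
}


def validate_topic_conf(conf):
    """Return a list of problems found in conf; an empty list means no problems."""
    problems = []
    for key, value in conf.items():
        allowed = TOPIC_CONF_RANGES.get(key)
        if allowed is None:
            # Property is not in the small table this sketch knows about.
            problems.append(f"{key}: not covered by this sketch")
        elif isinstance(allowed, tuple):
            low, high = allowed
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                problems.append(f"{key}: {value!r} is outside {low} .. {high}")
        elif value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    return problems


producer_conf = {
    "request.timeout.ms": 5000,    # default listed in the table
    "message.timeout.ms": 300000,  # default listed in the table; 0 means no limit
    "compression.codec": "lz4",
    "compression.level": -1,       # codec-dependent default level
}

print(validate_topic_conf(producer_conf))
print(validate_topic_conf({"request.timeout.ms": 0}))
```

The first call reports no problems, and the second flags `request.timeout.ms` because 0 is below the documented minimum of 1. With a librdkafka-based client such as confluent-kafka-python, a dict like `producer_conf` would typically be merged into the client configuration together with `bootstrap.servers`; that step is omitted here so the sketch runs without a broker.
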
### 5. Reference
https://www.slideshare.net/Naveen1914/kafka-eos  

https://www.cloudkarafka.com/blog/2019-04-10-apache-kafka-idempotent-producer-avoiding-message-duplication.html

https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/  

https://www.confluent.io/blog/transactions-apache-kafka/  

https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html  

https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html  

[Topic replication as explained by a Kafka operator](https://www.popit.kr/kafka-%EC%9A%B4%EC%98%81%EC%9E%90%EA%B0%80-%EB%A7%90%ED%95%98%EB%8A%94-topic-replication/)

[Kafka performance tuning]

https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600

https://kafka.apache.org/documentation/#design_filesystem

https://www.slideshare.net/baniuyao/kafka-24299168

https://epicdevs.com/17

[Testing]

https://github.com/Vanlightly/ChaosTestingCode  

https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar  

https://github.com/dpkp/kafka-python/tree/master/test  

https://github.com/confluentinc/confluent-kafka-python/tree/master/tests

---