# How to Install Confluent Apache Kafka?
- https://nijanthanravi.wordpress.com/2019/07/14/setup-confluent-kafka-on-windows/
- https://medium.com/@praveenkumarsingh/confluent-kafka-on-windows-how-to-fix-classpath-is-empty-cf7c31d9c787

### Python Dependencies:
- pip install confluent-kafka
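- pip install avro-python3
- pip install kafka-python (provides the ```kafka``` module imported by the consumer code at the end)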

### MySQL Script:
```
CREATE TABLE `testdb`.`products` (
  `product_id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) NOT NULL,
  `price` DECIMAL(6,2) NULL,
  `expiration_date` DATE NULL,
  PRIMARY KEY (`product_id`)
);

INSERT INTO testdb.products (name, price, expiration_date)
VALUES ('product name 1', 5.5, '2019-02-10');

UPDATE testdb.products SET price=0;

DELETE FROM testdb.products;
```

### How to Start Confluent Apache Kafka?
1. Open a command prompt in the Confluent installation folder


2. Start Zookeeper<br>
   2.1 Execute ***bin\windows\zookeeper-server-start.bat etc\kafka\zookeeper.properties***<br>
   2.2 The default port is ```2181```; change it in ***etc\kafka\zookeeper.properties*** if necessary<br>
   2.3 If you encounter a ```Classpath``` error:
   - Open ***bin\windows\kafka-run-class.bat***
   - Search for ```rem Classpath addition for core``` in the bat file
   - Add the following lines above that line:
   ```
   rem classpath addition for LSB style path
   if exist %BASE_DIR%\share\java\kafka\* (
     call:concat %BASE_DIR%\share\java\kafka\*
   )
   ```
   - Re-run ZooKeeper


3. Start Kafka Broker<br>
   3.1 Execute ***bin\windows\kafka-server-start.bat etc\kafka\server.properties***<br>
   3.2 If you encounter ```ERROR Shutdown broker because all log dirs in tmp\kafka-logs have failed (kafka.log.LogManager)```, delete the ***tmp\kafka-logs*** and ***tmp\zookeeper*** folders, then restart ZooKeeper and the broker
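
Before moving on, it can help to confirm that both services are accepting connections. The following is a minimal sketch using only the Python standard library; it assumes ZooKeeper is on the default port ```2181``` and the broker on ```9092``` (the port used by the commands later in this guide). The helper name ```port_is_open``` is made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    # Ports are assumptions: adjust them if the .properties files were changed
    for name, port in [("ZooKeeper", 2181), ("Kafka broker", 9092)]:
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening there; the service logs remain the place to look for startup errors.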


4. Debezium MySQL CDC Connector<br>
   4.1 Download:
   - https://www.confluent.io/hub/debezium/debezium-connector-mysql
   4.2 Create the ***share\java\kafka\plugins*** folder and move the downloaded jar files into it
   - Check the official Debezium website for a newer plugin version first
   4.3 Ensure ```plugin.path=share/java``` is set in the ***etc\kafka\connect-distributed.properties*** file
   4.4 Configure MySQL binlog:
   - Reference:
     - https://documentation.commvault.com/commvault/v11/article?p=34667.htm
     - https://debezium.io/documentation/reference/0.9/connectors/mysql.html
   - Comment out the default ```log-bin``` value in the ***my.ini*** file and add the following:
   ```
    log_bin=mysql-bin
    binlog_format=row
    binlog_row_image=full
    expire_logs_days=1
    gtid_mode=on
    enforce_gtid_consistency=on
    binlog_rows_query_log_events=on
   ```
   - Check MySQL variables:
   ```
   SHOW VARIABLES WHERE variable_name IN ('server_id','log_bin','binlog_format','binlog_row_image', 'expire_logs_days');
   SHOW VARIABLES WHERE variable_name IN ('gtid_mode', 'enforce_gtid_consistency', 'binlog_rows_query_log_events');
   ```
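
The output of the two ```SHOW VARIABLES``` statements can also be compared against the intended settings in code. This is a sketch under assumptions: the rows are ```(Variable_name, Value)``` pairs as a MySQL client library would return them (fetching them is not shown), and the expected values mirror the ***my.ini*** lines above. ```EXPECTED``` and ```find_mismatches``` are illustrative names, not part of any library.

```python
# Expected values, mirroring the my.ini settings above.
# MySQL reports an enabled log_bin as "ON", not as the file base name.
EXPECTED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
    "expire_logs_days": "1",
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
    "binlog_rows_query_log_events": "ON",
}


def find_mismatches(rows):
    """Compare (Variable_name, Value) rows against EXPECTED.

    Returns {name: (expected, actual)} for every variable that differs
    or is missing (actual is None when the variable was not returned).
    """
    actual = {name.lower(): str(value) for name, value in rows}
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED.items()
        if (actual.get(name) or "").upper() != expected.upper()
    }


if __name__ == "__main__":
    # Hypothetical rows for illustration only
    rows = [("log_bin", "ON"), ("binlog_format", "MIXED")]
    for name, (want, got) in find_mismatches(rows).items():
        print(f"{name}: expected {want}, got {got}")
```

An empty result means every listed variable matches. ```server_id``` is left out because its value is installation-specific.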


5. Start Connector<br>
   5.1 Execute ***bin\windows\connect-distributed.bat etc\kafka\connect-distributed.properties***<br>
   5.2 If you encounter a ```FileNotFoundException```:
   - Open ***bin\windows\connect-distributed.bat***
   - Search for ```rem Log4j settings``` in the bat file
   - Replace ```config/tools-log4j.properties``` with ```etc/kafka/tools-log4j.properties```
   - Re-run the connector
   5.3 Check that the Connect REST API at http://localhost:8083/ responds
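
Once the worker responds, its ```/connector-plugins``` endpoint lists the connector classes it found under ```plugin.path```, which is a quick way to confirm that the Debezium jars from step 4 were picked up. A minimal sketch with the standard library, assuming the worker runs at ```http://localhost:8083```; ```fetch_json``` and ```has_plugin``` are names invented for this example.

```python
import json
from urllib.request import urlopen

CONNECT_URL = "http://localhost:8083"  # assumed default Connect REST address
DEBEZIUM_CLASS = "io.debezium.connector.mysql.MySqlConnector"


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def has_plugin(plugins, class_name):
    """Return True if the /connector-plugins listing contains class_name."""
    return any(p.get("class") == class_name for p in plugins)


# Usage (requires the Connect worker from step 5 to be running):
#   plugins = fetch_json(f"{CONNECT_URL}/connector-plugins")
#   print(has_plugin(plugins, DEBEZIUM_CLASS))
```

If the class is missing, re-check the plugin folder and the ```plugin.path``` value, then restart the worker.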


6. Start Debezium Source Connector (Producer)<br>
   6.1 Run the following in a bash prompt, or send the same POST request with Postman:
   ```
    curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
        "name": "debezium-mysql-source-connector",
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": 1,
            "database.hostname": "localhost",
            "database.port": "3306",
            "database.user": "root",
            "database.password": "root",
            "database.server.id": "184054",
            "database.server.name": "test_server",
            "database.whitelist": "testdb",
            "table.whitelist": "testdb.products",
            "database.history.kafka.bootstrap.servers": "localhost:9092",
            "database.history.kafka.topic": "TEST_TOPIC",
            "snapshot.mode": "schema_only"
        }
    }'
   ```
   6.2 If you encounter an unrecognized server time zone error:
   - Download the SQL script (POSIX standard) that populates the time zone data from: https://dev.mysql.com/downloads/timezones.html
   - Add ```USE mysql;``` at the beginning of the script, then run it
   - Once it completes, set the time zone: ```SET GLOBAL time_zone='Asia/Kuala_Lumpur'```
   - Re-submit the POST request
   6.3 Ensure the connector and its tasks are in the "RUNNING" state:
   - http://localhost:8083/connectors/debezium-mysql-source-connector/status
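
The registration and the status check can also be scripted from Python instead of curl or Postman. The sketch below uses only the standard library and assumes the same worker address as above. ```build_register_request``` and ```all_running``` are hypothetical helpers, and treating an empty task list as "not running" is a choice made for this example.

```python
import json
from urllib.request import Request, urlopen

CONNECT_URL = "http://localhost:8083"  # assumed default Connect REST address


def build_register_request(name, config):
    """Build the POST /connectors request (the equivalent of the curl call)."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return Request(
        f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def all_running(status):
    """Return True if the connector and at least one task exist and all are RUNNING."""
    tasks = status.get("tasks", [])
    states = [status.get("connector", {}).get("state")]
    states += [task.get("state") for task in tasks]
    return bool(tasks) and all(state == "RUNNING" for state in states)


# Usage (requires the Connect worker to be running):
#   req = build_register_request("debezium-mysql-source-connector", config)
#   urlopen(req)
#   with urlopen(f"{CONNECT_URL}/connectors/debezium-mysql-source-connector/status") as resp:
#       print(all_running(json.load(resp)))
```

Here ```config``` stands for the same dictionary that appears under ```"config"``` in the curl payload.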


7. List topics (Optional)<br>
   7.1 Execute ***bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092***<br>
   7.2 <font color='red'>**NOTE**</font>: The topic to listen on is named ```<serverName>.<databaseName>.<tableName>```, where ```serverName``` is the connector's ```database.server.name```
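
The naming convention can be written as a small helper, together with a filter that picks the per-table topics out of a topic listing. This is only an illustration: both function names are invented here, and the filter assumes that database and table names contain no dots.

```python
def debezium_topic(server_name, database, table):
    """Build the change-event topic name <serverName>.<databaseName>.<tableName>."""
    return f"{server_name}.{database}.{table}"


def table_topics(topics, server_name):
    """Return the per-table topics for server_name from a list of topic names."""
    prefix = server_name + "."
    return sorted(t for t in topics if t.startswith(prefix) and t.count(".") == 2)


if __name__ == "__main__":
    # Values taken from the connector configuration in step 6
    print(debezium_topic("test_server", "testdb", "products"))
```

With the configuration from step 6 this gives ```test_server.testdb.products```, the topic the consumer code subscribes to.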

In [None]:
from kafka import KafkaConsumer
import json

In [None]:
consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'], auto_offset_reset='earliest')
consumer.topics()

In [None]:
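consumer.subscribe(['test_server.testdb.products'])
for message in consumer:
    # Some records carry no value (e.g. tombstones after a delete); treat them as empty
    value = json.loads('{}' if message.value is None else message.value)
    # Keep only the row images and the operation type from the Debezium payload
    value = {k: v for k, v in value.get('payload', {}).items() if k in ['before', 'after', 'op']}

    print('[CONSUMED]:')
    print(json.dumps(value, indent=2))
    print()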
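
The ```op``` field in each event is a one-letter code. The sketch below shows one way to turn a raw message value into a readable summary without needing a running broker. It assumes the JSON converter with schemas enabled, so the event sits under ```payload``` as in the loop above. ```summarize``` and the sample event are made up for illustration, and the sample is trimmed to two columns.

```python
import json

# Debezium operation codes
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def summarize(raw_value):
    """Reduce a raw Debezium message value to op/before/after.

    Returns None for records without a value (e.g. tombstones).
    """
    if raw_value is None:
        return None
    payload = json.loads(raw_value).get("payload") or {}
    op = payload.get("op")
    return {
        "op": OP_NAMES.get(op, op),  # fall back to the raw code if unknown
        "before": payload.get("before"),
        "after": payload.get("after"),
    }


if __name__ == "__main__":
    # Hypothetical, trimmed event; real events contain more fields
    sample = json.dumps({
        "payload": {
            "before": {"product_id": 1, "name": "product name 1"},
            "after": {"product_id": 1, "name": "product name 2"},
            "op": "u",
        }
    }).encode("utf-8")
    print(json.dumps(summarize(sample), indent=2))
```

Inside the consumer loop, ```summarize(message.value)``` could replace the two ```value = ...``` lines.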