Couchbase, Confluent Kafka, and HBase

This repository covers the following items.

Set up Couchbase Kafka Connector
Set up Confluent Platform
Set up Confluent HBase Connector

Overview Diagram

Confluent System Requirements

https://docs.confluent.io/current/installation/system-requirements.html

OS

https://docs.confluent.io/current/installation/system-requirements.html#operating-systems

RHEL/CentOS 7.x
RHEL/CentOS 8.x

I verified this setup on 7.x.

Environment

I verified the setup in the following environment:

Cloud: AWS

Confluent

OS: Red Hat Enterprise Linux (RHEL) 7 (HVM) (RHEL-7.7_HVM-20191119-x86_64-2-Hourly2-GP2)
Instance Type: t2.xlarge

Please note that Confluent Platform is resource-intensive. Without enough memory, registering and starting the HBase Connector will fail.
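As a rough pre-flight check for the memory note above, the following sketch compares the host's total RAM against a threshold. The 8 GiB default is an assumption chosen only because it sits between the t2.medium (4 GiB) and t2.xlarge (16 GiB) instance types; it is not an official Confluent minimum. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight memory check (Linux only, reads /proc/meminfo).

# Return 0 if total_kib (KiB) is at least min_gib GiB, 1 otherwise.
has_enough_memory() {
    total_kib="$1"
    min_gib="$2"
    [ "$total_kib" -ge $((min_gib * 1024 * 1024)) ]
}

# Print the MemTotal value (in KiB) reported by the kernel.
total_memory_kib() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Warn if the host has less RAM than the threshold.
# The 8 GiB default is an assumption, not a documented requirement.
check_memory() {
    threshold_gib="${1:-8}"
    if has_enough_memory "$(total_memory_kib)" "$threshold_gib"; then
        echo "OK: at least ${threshold_gib} GiB of RAM"
    else
        echo "WARNING: less than ${threshold_gib} GiB of RAM" >&2
        return 1
    fi
}
```

Calling `check_memory` with no argument uses the assumed 8 GiB threshold; `check_memory 12` would use 12 GiB instead.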

Couchbase

OS: Red Hat Enterprise Linux (RHEL) 7 (HVM) (RHEL-7.7_HVM-20191119-x86_64-2-Hourly2-GP2)
Instance Type: t2.medium

Cloudera

You can prepare your Cloudera cluster using the script maintained in my repository, Single Node CDH Cluster.

Couchbase Setup

In this repository, we assume that you already have a Couchbase cluster for which you will set up the Kafka Connector.

Optionally, you may refer to my shell scripts below for preparing a Couchbase cluster on AWS.

couchbase-aws-scripts

Instance for Confluent Platform

If you can use shell scripts and the AWS CLI, you can use the scripts here to create a new EC2 instance.

Otherwise, please create an instance yourself that matches the AMI and instance type listed above.
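If you create the instance by hand with the AWS CLI, the command might look roughly like the one this sketch prints. It is not the repository's own script. The AMI ID and key pair name are placeholders you must supply, and the helper name is hypothetical; the function only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Print (do not run) an `aws ec2 run-instances` command.
# Arguments: AMI ID, key pair name, optional instance type.
# The AMI ID and key pair name are placeholders, not values from this repository.
print_run_instances_cmd() {
    ami_id="$1"
    key_name="$2"
    instance_type="${3:-t2.xlarge}"   # instance type used for Confluent above
    printf 'aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1\n' \
        "$ami_id" "$instance_type" "$key_name"
}
```

For example, `print_run_instances_cmd ami-EXAMPLE my-key` prints a command for a t2.xlarge instance. Security groups, subnets, and storage are left out and depend on your AWS account.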

Confluent Platform Setup

1_1_confluent_kafka_setup.md

Kafka Connector Setup

Install

2_1_couchbase_kafka-connector_install.md

Configuration

2_2_couchbase_kafka-connector_config.md

Run Kafka Connector

Each time you restart your host, you need to start the Kafka Connector again.

cd $KAFKA_CONNECT_COUCHBASE_HOME
env CLASSPATH=lib/* \
    connect-standalone.sh $KAFKA_CONNECT_COUCHBASE_HOME/connect-confluent.properties \
                       etc/car-source.properties
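Because this command has to be run again after each restart, it may be convenient to wrap it in a small function that first checks its prerequisites. The sketch below assumes the same layout as the command above (`connect-confluent.properties` in `$KAFKA_CONNECT_COUCHBASE_HOME` and `etc/car-source.properties` beneath it); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the connect-standalone.sh command shown above.
# It fails early with a message if the environment is not set up.
start_couchbase_connector() {
    home="${KAFKA_CONNECT_COUCHBASE_HOME:-}"
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "KAFKA_CONNECT_COUCHBASE_HOME is not set or is not a directory" >&2
        return 1
    fi
    # Both properties files are required by the original command.
    for f in "$home/connect-confluent.properties" "$home/etc/car-source.properties"; do
        if [ ! -f "$f" ]; then
            echo "Missing properties file: $f" >&2
            return 1
        fi
    done
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$home" && env CLASSPATH='lib/*' connect-standalone.sh \
          "$home/connect-confluent.properties" etc/car-source.properties )
}
```

Quoting `'lib/*'` passes the wildcard to the JVM unchanged, which matches the effect of the unquoted form in the original command.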

HBase Connector Setup

Configuration

1_2_conflluent_hbase_connector_setup.md

Run Confluent Platform

Each time you restart your host, you need to start Confluent Platform again. You do not need to register the HBase Connector again.

$ confluent local start
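To confirm that the HBase Connector is still registered after a restart, you can query the Kafka Connect REST API, which listens on port 8083 by default. The sketch below assumes that default URL and uses `hbase-sink` as a hypothetical connector name; substitute the name you used when registering the connector.

```shell
#!/bin/sh
# Base URL of the Kafka Connect REST API (8083 is the default port).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Print the JSON array of registered connector names.
list_connectors() {
    curl -s "$CONNECT_URL/connectors"
}

# Return 0 if the connector name appears in the given JSON array.
# This is a plain text match on the quoted name, not a full JSON parse.
connector_registered() {
    json="$1"
    name="$2"
    printf '%s' "$json" | grep -q "\"$name\""
}

# Example usage ("hbase-sink" is a hypothetical connector name):
#   connector_registered "$(list_connectors)" hbase-sink && echo "registered"
```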

Check Data Stored in HBase

Log in to your Cloudera CDH cluster and check the data in HBase.

$ hbase shell
...
hbase> scan 'cars'

The count command returns the number of rows in a table. It is quite fast when configured with the right CACHE value.

hbase> count 'cars', CACHE => 1000

The above count fetches 1000 rows at a time. Set CACHE lower if your rows are big. Default is to fetch one row at a time.
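The same count can also be run without an interactive session by piping the command into `hbase shell -n` (non-interactive mode). The following is a minimal sketch; the helper names are hypothetical, and `cars` is the table used above.

```shell
#!/bin/sh
# Build the HBase shell count command for a table and CACHE size.
hbase_count_cmd() {
    table="$1"
    cache="${2:-1000}"
    printf "count '%s', CACHE => %s\n" "$table" "$cache"
}

# Run the count non-interactively; requires the hbase CLI on the PATH.
count_rows() {
    hbase_count_cmd "$@" | hbase shell -n
}

# Example usage:
#   count_rows cars 1000
```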

For further details about hbase shell, please refer to Apache HBase ™ Reference Guide: https://hbase.apache.org/book.html#shell

Other unofficial resources on the HBase shell are also easy to find online.

Import data to Couchbase

When you import data into Couchbase, the same data appears in HBase in real time.

$ cbimport json -c couchbase://127.0.0.1 -u Administrator -p couchbase -b cars -d file://ten_cars.json -f lines -g '#UUID#'
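With `-f lines`, cbimport expects one JSON document per line. If you do not have a data file at hand, a sketch like the following can generate one for testing. The field names `name` and `year` are hypothetical; the repository's own `ten_cars.json` may use a different schema.

```shell
#!/bin/sh
# Print n JSON documents, one per line, in the format expected by `-f lines`.
# The "name" and "year" fields are made up for illustration.
generate_cars() {
    n="${1:-10}"
    i=1
    while [ "$i" -le "$n" ]; do
        printf '{"name":"car-%d","year":%d}\n' "$i" $((2010 + i))
        i=$((i + 1))
    done
}

# Example usage (writes to a separate file so ten_cars.json is not overwritten):
#   generate_cars 10 > sample_cars.json
```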


YoshiyukiKono/couchbase_confluent-kafka_hbase
