Flume NG Apache Cassandra Sink
flume-cassandra-sink

A Flume sink using Apache Cassandra

The Cassandra Sink persists Flume events to a Cassandra cluster. It is configured through the normal Flume config file (see the sample below). Available config parameters (defaults shown in brackets):

  • hosts - comma separated list of Cassandra hosts the sink should connect to
  • port - [9160] Cassandra RPC port for client connections
  • cluster-name - [Logging] name of Cassandra Cluster (if changed, may have trouble with hector client stats)
  • keyspace-name - [logs] name of keyspace to use
  • records-colfam - [records] name of column family for storing log data
  • socket-timeout-millis - [5000] Hector client socket timeout
  • max-conns-per-host - [2] Hector client number of connections per Cassandra host
  • max-exhausted-wait-millis - [5000] maximum time (in milliseconds) to wait for a free connection when the Hector client pool is exhausted
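Putting the parameters together, a fully spelled-out sink configuration might look like the following. All values except hosts are the defaults listed above; the agent and sink names, and the host names, are placeholders:

```
agent.sinks.cassandraSink.type = com.btoddb.flume.sinks.cassandra.CassandraSink
agent.sinks.cassandraSink.hosts = cass01,cass02,cass03
agent.sinks.cassandraSink.port = 9160
agent.sinks.cassandraSink.cluster-name = Logging
agent.sinks.cassandraSink.keyspace-name = logs
agent.sinks.cassandraSink.records-colfam = records
agent.sinks.cassandraSink.socket-timeout-millis = 5000
agent.sinks.cassandraSink.max-conns-per-host = 2
agent.sinks.cassandraSink.max-exhausted-wait-millis = 5000
```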

The Sink expects several flume event headers to be present:

  • key - used (combined with src) to create the Cassandra row key. It should be generated by the application doing the logging
  • timestamp - timestamp of when the log occurred, not necessarily when the flume event is created
  • src - a logical source of the flume event. It could be the host, but a single source will usually span many hosts; a more likely candidate for source is the name of the application
  • host - the name of the host where the message was generated

The records column family is keyed by 'src' + 'key' and will contain all the log data. It has these columns:

  • ts - timestamp from flume event header
  • src - source from flume event header
  • host - host name from flume event header
  • data - body of flume event
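As a sketch of how the headers above come together into a row key (assuming plain concatenation of the 'src' and 'key' headers, per the description above; the actual key-building logic inside CassandraSink may differ in detail):

```java
import java.util.HashMap;
import java.util.Map;

public class RowKeyExample {
    // Hypothetical helper: combines the 'src' and 'key' Flume event headers
    // into a single Cassandra row key ('src' + 'key'), as described above.
    static String rowKey(Map<String, String> headers) {
        return headers.get("src") + headers.get("key");
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("key", "a1b2c3");              // generated by the logging application
        headers.put("timestamp", "1362441600000"); // when the log event occurred
        headers.put("src", "billing-app");         // logical source (application name)
        headers.put("host", "web01.example.com");  // host that produced the message

        System.out.println(rowKey(headers)); // prints "billing-appa1b2c3"
    }
}
```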

Cassandra Schema

The following is an example schema. Of course, the problem is not how to get data into Cassandra, it is how to get data out! Unless you will only retrieve logs by Cassandra row key, you will probably want to add some secondary indexes or use DataStax Enterprise Search with its Solr capabilities.

create keyspace logs with
   strategy_options = {datacenter1:1}
;

use logs;

create column family records with
   comparator = UTF8Type
   and gc_grace = 86400
;
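For example, retrieving records by source rather than by row key could be supported with a secondary index on the 'src' column, using the same cassandra-cli syntax as the schema above (the validation class and index type here are assumptions, not part of the project's schema):

```
update column family records with
   comparator = UTF8Type
   and column_metadata = [
      {column_name: src, validation_class: UTF8Type, index_type: KEYS}
   ]
;
```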

Sample Flume config

agent.sources = avrosource
agent.channels = channel1
agent.sinks = cassandraSink

agent.sources.avrosource.type = avro
agent.sources.avrosource.channels = channel1
agent.sources.avrosource.bind = 0.0.0.0
agent.sources.avrosource.port = 4141

agent.sources.avrosource.interceptors = addHost addTimestamp
agent.sources.avrosource.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent.sources.avrosource.interceptors.addHost.preserveExisting = false
agent.sources.avrosource.interceptors.addHost.useIP = false
agent.sources.avrosource.interceptors.addHost.hostHeader = host

agent.sources.avrosource.interceptors.addTimestamp.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Cassandra flow
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 1000
###agent.channels.channel1.type = FILE
###agent.channels.channel1.checkpointDir = file-channel1/check
###agent.channels.channel1.dataDirs = file-channel1/data


agent.sinks.cassandraSink.channel = channel1

agent.sinks.cassandraSink.type = com.btoddb.flume.sinks.cassandra.CassandraSink
agent.sinks.cassandraSink.hosts = localhost
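With a config like the above saved to, say, conf/flume.conf (the path is a placeholder), the agent can be started with the standard Flume NG launcher; the --name value must match the agent name used in the config:

```
flume-ng agent --conf conf --conf-file conf/flume.conf --name agent
```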

Building Cassandra Sink

The sink is built with Maven:

mvn clean package -P assemble-artifacts

This runs all the JUnit tests and produces flume-ng-cassandra-sink-1.0.0-SNAPSHOT.jar and flume-ng-cassandra-sink-1.0.0-SNAPSHOT-dist.tar.gz.

The tarball contains all the needed dependencies, and then some. The list below shows what is actually required to use the sink in the Flume environment.

Required Dependencies

  • hector-core*
  • guava*
  • speed4j*
  • uuid*
  • libthrift*
  • cassandra-thrift*
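One way to deploy is to unpack the dist tarball and copy just the sink jar plus the dependency jars above into Flume's lib directory. The paths and the $FLUME_HOME variable below are examples; the exact layout inside the tarball may differ:

```
tar xzf flume-ng-cassandra-sink-1.0.0-SNAPSHOT-dist.tar.gz
cp flume-ng-cassandra-sink-1.0.0-SNAPSHOT.jar $FLUME_HOME/lib/
cp hector-core*.jar guava*.jar speed4j*.jar uuid*.jar \
   libthrift*.jar cassandra-thrift*.jar $FLUME_HOME/lib/
```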