Log processing system using Flume and Cassandra

CDR Logprocessing plugin for Flume
==================================

Source organization

flume-plugin - source of the Flume plugin that writes CDR logs to Cassandra.
scripts - simple Perl script that generates sample CDR logs for testing.



Getting Flume & Thrift
======================


https://github.com/cloudera/flume (master was used to test sample)
http://incubator.apache.org/thrift/download/

Thrift was compiled with the following option:

./configure --enable-gen-java=yes --enable-gen-cpp=yes --enable-gen-erlang=no --enable-gen-perl=no --enable-gen-py=no --enable-gen-php=no --with-boost=no; make

Assuming Flume is installed under $HOME/flume, create a symlink to thrift-0.5.0/compiler/cpp/thrift under $HOME/flume.

Then run ant under $HOME/flume.


flume-plugin
============

This plugin allows you to use Cassandra as a Flume sink for CDR logs. 


Getting Started
---------------

1) This plugin was built using flume-0.9.3-core.jar, which is delivered as part of the package.

2) cd cassandra; ant release;

3) Copy cdr_logprocessing-0.1.tar.gz to the $HOME/flume directory and uncompress it.

4) Add the following to your .bashrc file:

export FLUME_HOME=$HOME/flume
export FLUME_LOG_DIR=/tmp
export FLUME_PID_DIR=/tmp
export FLUME_CONF_DIR=$HOME/flume/conf
export FLUME_CLASSPATH=$HOME/flume/cdrplugin/lib/apache-cassandra-0.7.0.jar:$HOME/flume/cdrplugin/lib/avro-1.4.0-rc4.jar:$HOME/flume/cdrplugin/lib/cdr_logprocessing-0.1.jar:$HOME/flume/cdrplugin/lib/commons-lang-2.4.jar:$HOME/flume/cdrplugin/lib/hector-core-0.7.0-22.jar:$HOME/flume/cdrplugin/lib/high-scale-lib-1.1.1.jar:$HOME/flume/cdrplugin/lib/jug-asl-2.0.0.jar:$HOME/flume/cdrplugin/lib/log4j-1.2.14.jar:$HOME/flume/cdrplugin/lib/perf4j-0.9.13.jar:$HOME/flume/cdrplugin/lib/slf4j-api-1.5.11.jar:$HOME/flume/cdrplugin/lib/slf4j-log4j12-1.5.8.jar

5) Modify flume-site.xml (you may start out by copying
flume-site.xml.template and removing the body of the file) to include:


    <configuration>
      <property>
        <name>flume.plugin.classes</name>
        <value>com.gemini.logprocessing.cassandra.CDRCassandraSink</value>
        <description>Comma separated list of plugin classes</description>
      </property>
    </configuration>
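
The FLUME_CLASSPATH in step 4 is long enough that a typo or a missing jar is easy to overlook. The following is a minimal sketch, not part of the plugin, that rebuilds the classpath from the jar names listed above and reports entries that do not exist on disk. The function names build_classpath and missing_entries are hypothetical helpers introduced only for this illustration.

```python
import os

# Jar names copied from the FLUME_CLASSPATH export in step 4.
CDR_PLUGIN_JARS = [
    "apache-cassandra-0.7.0.jar",
    "avro-1.4.0-rc4.jar",
    "cdr_logprocessing-0.1.jar",
    "commons-lang-2.4.jar",
    "hector-core-0.7.0-22.jar",
    "high-scale-lib-1.1.1.jar",
    "jug-asl-2.0.0.jar",
    "log4j-1.2.14.jar",
    "perf4j-0.9.13.jar",
    "slf4j-api-1.5.11.jar",
    "slf4j-log4j12-1.5.8.jar",
]


def build_classpath(flume_home, jars=CDR_PLUGIN_JARS):
    """Join the plugin jars under <flume_home>/cdrplugin/lib with ':'."""
    return ":".join(
        "{}/cdrplugin/lib/{}".format(flume_home, jar) for jar in jars
    )


def missing_entries(classpath):
    """Return the classpath entries that do not exist on disk."""
    return [
        entry
        for entry in classpath.split(":")
        if entry and not os.path.exists(entry)
    ]


if __name__ == "__main__":
    # Fall back to the $HOME/flume layout assumed throughout this README.
    home = os.environ.get("FLUME_HOME", os.path.expanduser("~/flume"))
    classpath = os.environ.get("FLUME_CLASSPATH") or build_classpath(home)
    for entry in missing_entries(classpath):
        print("missing:", entry)
```

Running it after sourcing .bashrc prints one line per missing jar and nothing when every entry is present.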

scripts
=======

loggen.pl will write sample CDR entries to /tmp/cdr.log. We can use this script for testing our setup. 
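
If Perl is not available, a rough stand-in for the generator could look like the sketch below. It is not a port of loggen.pl: it only follows the field order of the CDR format listed under "Issues", every field value is an invented placeholder, and the epoch-seconds timestamp is an assumption. Unlike loggen.pl it writes a fixed number of lines and then stops. The names sample_cdr_line and write_samples are hypothetical.

```python
import random
import time

# Field order taken from the CDR format listed under "Issues" in this README.
CDR_FIELDS = [
    "operatorId", "operatorMarket", "transactionId", "cdrType",
    "messageTimestamp", "moIMSI", "moIP", "mtIP", "PTN",
    "msgType", "moDomain", "mtDomain",
]


def sample_cdr_line(rng, now=None):
    """Build one comma-separated line with a placeholder value per field."""
    # Assumption: the timestamp is written as epoch seconds.
    timestamp = int(time.time() if now is None else now)
    values = {
        "operatorId": "op%d" % rng.randint(1, 3),
        "operatorMarket": "market%d" % rng.randint(1, 5),
        "transactionId": "%012d" % rng.randint(0, 10**12 - 1),
        "cdrType": "cdr%d" % rng.randint(1, 2),
        "messageTimestamp": timestamp,
        "moIMSI": "%015d" % rng.randint(0, 10**15 - 1),
        "moIP": "10.0.0.%d" % rng.randint(1, 254),
        "mtIP": "10.0.1.%d" % rng.randint(1, 254),
        "PTN": "%010d" % rng.randint(0, 10**10 - 1),
        "msgType": "msg%d" % rng.randint(1, 2),
        "moDomain": "mo.example.com",
        "mtDomain": "mt.example.com",
    }
    return ",".join(str(values[name]) for name in CDR_FIELDS)


def write_samples(path, count, seed=None):
    """Append `count` placeholder CDR lines to the file at `path`."""
    rng = random.Random(seed)
    with open(path, "a") as log:
        for _ in range(count):
            log.write(sample_cdr_line(rng) + "\n")
```

For example, write_samples("/tmp/cdr.log", 100) appends 100 lines to the file that the agent tails in the setup described below.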

Usage
-----

This plugin primarily targets CDR log storage right now.

1) Create the following keyspace and column families in Cassandra using cassandra-cli:

connect <hostname>/9160;
create keyspace CDRLogs with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use CDRLogs;
create column family MSISDNTimeline with column_type = 'Standard' and comparator = 'BytesType';
create column family CDREntry with column_type = 'Standard' and comparator = 'BytesType';
create column family HourlyTimeline with column_type = 'Standard' and comparator = 'BytesType';

2) In the Flume config, call this sink as

CDRCassandraSink("cassandra_host:cassandra_port", "ColumnFamilyForRawCDR");

where

cassandra_host:cassandra_port - Cassandra host/port combination
ColumnFamilyForRawCDR - CF where raw CDR entries for this market are to be stored.


3) Our test environment had three nodes: NodeM running the Flume master, NodeA running the Flume agent, and NodeC running the Flume collector and cassandra-0.7.2.
    3.1) On NodeM
	3.1.1) Export all environment variables. 
	3.1.2) cd $FLUME_HOME; bin/flume master
	3.1.3) http://NodeM:35871/flumemaster.jsp will show all active nodes and their configuration.
    3.2) On NodeA
	3.2.1) Edit flume-site.xml and add NodeM as master
	3.2.2) cd $FLUME_HOME; bin/flume node_nowatch
	3.2.3) http://NodeA:35862/flumeagent.jsp will display statistics.
    3.3) On NodeC
	3.3.1) Edit flume-site.xml and add NodeM as master
	3.3.2) cd $FLUME_HOME; bin/flume node_nowatch -n collector
	3.3.3) http://NodeC:35862/flumeagent.jsp will display statistics.

4) Go to http://NodeM:35871/flumeconfig.jsp and configure the nodes. 
    4.1) For NodeA - Source is tail("/tmp/cdr.log") and Sink is agentSink("NodeC",35853)
    4.2) For NodeC - Source is collectorSource(35853) and Sink is CDRCassandraSink("NodeC:9160", "CDRRaw_market1")
    
5) Go to http://NodeM:35871/flumemaster.jsp. If the nodes were configured correctly, all of them should show up as 'ACTIVE'.
6) On NodeA, run the script: perl loggen.pl (NOTE: this script writes to the log file in an endless for(;;) loop)
7) Verify the data in Cassandra using cassandra-cli.
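
The source and sink strings entered in step 4 have to agree on the collector host and port, which is easy to get wrong when typing them by hand. The sketch below is one way to derive all four strings from a single set of parameters. It assumes, as in the test environment above, that Cassandra runs on the collector node. The function node_configs is a hypothetical helper, not part of Flume or of this plugin.

```python
def node_configs(agent, collector, log_path="/tmp/cdr.log",
                 collector_port=35853, cassandra_port=9160,
                 raw_cf="CDRRaw_market1"):
    """Return {node_name: (source, sink)} for the agent and the collector.

    Assumption: Cassandra listens on the collector host, as in step 3.
    """
    agent_source = 'tail("{}")'.format(log_path)
    agent_sink = 'agentSink("{}",{})'.format(collector, collector_port)
    collector_source = "collectorSource({})".format(collector_port)
    collector_sink = 'CDRCassandraSink("{}:{}", "{}")'.format(
        collector, cassandra_port, raw_cf
    )
    return {
        agent: (agent_source, agent_sink),
        collector: (collector_source, collector_sink),
    }


if __name__ == "__main__":
    # Print the values to paste into the flumeconfig.jsp form.
    for node, (source, sink) in node_configs("NodeA", "NodeC").items():
        print(node)
        print("  source:", source)
        print("  sink:  ", sink)
```

With the default arguments this reproduces the strings from steps 4.1 and 4.2.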


Issues
------

1) The CDR format currently supported is of the form:

operatorId,operatorMarket,transactionId,cdrType,messageTimestamp,moIMSI,moIP,mtIP,PTN,msgType,moDomain,mtDomain
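
To make the field order concrete, here is a minimal sketch that splits one line in this format into named fields. It is an illustration only and not the parsing code used by the plugin. It assumes that values never contain commas and that no quoting or escaping is in use. CDRRecord and parse_cdr are hypothetical names.

```python
from collections import namedtuple

# Field names and order copied from the format line above.
CDR_FIELDS = (
    "operatorId", "operatorMarket", "transactionId", "cdrType",
    "messageTimestamp", "moIMSI", "moIP", "mtIP", "PTN",
    "msgType", "moDomain", "mtDomain",
)

CDRRecord = namedtuple("CDRRecord", CDR_FIELDS)


def parse_cdr(line):
    """Split one CDR log line into a CDRRecord of string fields.

    Assumption: fields are plain comma-separated values with no quoting.
    Raises ValueError when the field count does not match the format.
    """
    parts = line.rstrip("\r\n").split(",")
    if len(parts) != len(CDR_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(CDR_FIELDS), len(parts))
        )
    return CDRRecord(*parts)
```

A line with more or fewer than twelve fields raises ValueError, which is a simple way to spot logs in a different CDR layout before they reach the sink.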
