Skip to content

data-harmonization/Ingestion

 
 

Repository files navigation

FLUME-INGESTION

Build Status

Flume Ingestion started as a fork of Apache Flume (1.6), where you can find:

Several bug fixes

  • Some of them really important, such as unicode support

Several enhancements of Flume's sources & sinks

  • ElasticSearch mapper, for example

Custom sources and sinks, developed by Stratio

  • SNMP (v1, v2c and 3)
  • redis, Kafka (0.8.1.1)
  • MongoDB, JDBC, Cassandra and Druid
  • Stratio Streaming (Complex Event Processing engine)
  • REST client, Flume agents stats

You can find more documentation about us here

Flume Ingestion components

  • Data transporter and collector: Apache Flume
  • Data extractor and transformer: Morphlines
  • Custom sources to read data from:
    • REST
    • FlumeStats
    • SNMPTraps
    • IRC
  • Custom sinks to write the data to:

What is Apache Flume?

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

Its use is not only designed for logs, in fact you can find a myriad of sources, sinks and transformations.

In addition, a sink could be a big data storage but also another real-time system (Apache Kafka, Spark Streaming).

Compile & Package

$ git submodule init
$ git submodule update
$ mvn install
$ cd stratio-ingestion-dist
$ mvn clean compile package -Ppackage

Distribution will be available at stratio-ingestion-dist/target/stratio-ingestion-0.4.0-SNAPSHOT-bin.tar.gz

Interesting facts about Flume-Ingestion

  • Flume Ingestion is Apache Flume "on steroids" :)

  • We are extensively using Kite SDK (morphlines) in order to do a better T from ETL, and so we have also developed a bunch of custom transformations.

  • �Stratio ingestion is fully open source and we work very close to the Flume community.

Flume Ingestion FAQ

Can I use Flume Ingestion for aggregating data (time-based rollups, for example)?

This is not a good idea from our experience, we use to combine Flume + Spark Streaming in order to do that (custom development)

Is Flume Ingestion multipersistence?

Yes, you can write data to JDBC sources, mongoDB, Apache Cassandra, ElasticSearch, Apache Kafka, among others.

Can I send data to streaming-cep-engine?

Of course, we have developed a sink in order to send events from Flume to an existing stream in our CEP engine. The sink will create the stream if it does not exist in the engine.

Changelog

See the changelog for changes.

About

Flume - Ingestion, an Apache Flume distribution

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 93.1%
  • Shell 3.7%
  • PowerShell 3.0%
  • Batchfile 0.2%