Flume Ingestion started as a fork of Apache Flume (1.6). On top of the upstream project it provides:
- Several bug fixes
  - Some of them really important, such as Unicode support
- Several enhancements of Flume's sources and sinks
  - ElasticSearch mapper, for example
- Custom sources and sinks, developed by Stratio
  - SNMP (v1, v2c and v3)
  - Redis, Kafka (0.8.1.1)
  - MongoDB, JDBC, Cassandra and Druid
  - Stratio Streaming (Complex Event Processing engine)
  - REST client, Flume agent stats
You can find more documentation about us here
- Data transporter and collector: Apache Flume
- Data extractor and transformer: Morphlines
- Custom sources to read data from:
  - REST
  - FlumeStats
  - SNMPTraps
  - IRC
- Custom sinks to write the data to:
  - Cassandra
  - MongoDB
  - Stratio Streaming
  - JDBC
  - Kafka
  - Druid
Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.
It is not limited to logs: a wide range of sources, sinks and transformations is available.
In addition, a sink can be a big data store or another real-time system (Apache Kafka, Spark Streaming).
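As a rough illustration of how a source, a channel and a sink fit together, here is a minimal agent configuration in the style of the upstream Apache Flume user guide. The agent name `a1`, the component names `r1`, `c1` and `k1`, and the port are placeholders chosen for this sketch. It uses only stock Flume components (netcat source, memory channel, logger sink), not the Stratio ones.

```properties
# Hypothetical agent "a1" with one source, one channel and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-terminated text on a TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffers events in memory between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs each event; replace the type to write to a data store instead.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

With upstream Flume, a file like this is normally passed to the `flume-ng agent` launcher through `--conf-file`, with the agent selected by `--name a1`. Whether this distribution ships the same launcher layout is an assumption here.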
$ git submodule init
$ git submodule update
$ mvn install
$ cd stratio-ingestion-dist
$ mvn clean compile package -Ppackage
Distribution will be available at stratio-ingestion-dist/target/stratio-ingestion-0.4.0-SNAPSHOT-bin.tar.gz
- Flume Ingestion is Apache Flume "on steroids" :)
- We use Kite SDK (Morphlines) extensively to do a better T in ETL, and we have also developed a number of custom transformations.
- Stratio Ingestion is fully open source and we work very closely with the Flume community.
Can I use Flume Ingestion for aggregating data (time-based rollups, for example)?
In our experience this is not a good idea. We usually combine Flume with Spark Streaming to do that (custom development).
Is Flume Ingestion multipersistence?
Yes, you can write data to JDBC data sources, MongoDB, Apache Cassandra, ElasticSearch and Apache Kafka, among others.
Can I send data to streaming-cep-engine?
Of course. We have developed a sink that sends events from Flume to a stream in our CEP engine. The sink will create the stream if it does not already exist in the engine.
See the changelog for changes.