Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and supports intelligent dynamic management. It uses a simple, extensible data model that supports online analytic applications.
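To make the streaming data-flow model and its failover idea more concrete, here is a minimal Java sketch of events moving from a source to a sink, with a failover sink that reroutes events when the primary destination is unavailable. All of the types here (`EventSource`, `EventSink`, `ListSource`, `CollectingSink`, `FailoverSink`, `Flow`) are hypothetical illustrations, not Flume's actual classes, and the in-memory sinks merely stand in for a real store such as HDFS.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical types illustrating a source -> sink flow; NOT Flume's real API.
interface EventSource {
    /** Returns the next event, or null when no events remain. */
    String next();
}

interface EventSink {
    /** Delivers one event; throws IllegalStateException if delivery fails. */
    void append(String event);
}

/** In-memory source standing in for an application's log output. */
class ListSource implements EventSource {
    private final Iterator<String> it;
    ListSource(List<String> events) { this.it = events.iterator(); }
    public String next() { return it.hasNext() ? it.next() : null; }
}

/** In-memory sink standing in for a store such as HDFS; can simulate an outage. */
class CollectingSink implements EventSink {
    final List<String> received = new ArrayList<>();
    private final boolean down;
    CollectingSink(boolean down) { this.down = down; }
    public void append(String event) {
        if (down) throw new IllegalStateException("sink unavailable");
        received.add(event);
    }
}

/** Sends each event to the primary sink, falling back to the backup if it fails. */
class FailoverSink implements EventSink {
    private final EventSink primary;
    private final EventSink backup;
    FailoverSink(EventSink primary, EventSink backup) {
        this.primary = primary;
        this.backup = backup;
    }
    public void append(String event) {
        try {
            primary.append(event);
        } catch (IllegalStateException e) {
            backup.append(event); // if the backup also fails, the exception propagates
        }
    }
}

class Flow {
    /** Streams every event from source to sink and returns how many were delivered. */
    static int run(EventSource source, EventSink sink) {
        int delivered = 0;
        for (String e = source.next(); e != null; e = source.next()) {
            sink.append(e);
            delivered++;
        }
        return delivered;
    }
}

class FlowDemo {
    public static void main(String[] args) {
        CollectingSink primary = new CollectingSink(true);  // simulated outage
        CollectingSink backup = new CollectingSink(false);
        int n = Flow.run(new ListSource(List.of("a", "b")), new FailoverSink(primary, backup));
        System.out.println(n + " " + backup.received); // prints "2 [a, b]"
    }
}
```

In the demo the primary sink is simulated as down, so both events are rerouted to the backup. Modeling failover as just another sink keeps the source unaware of recovery logic, which is one simple way such mechanisms can be composed.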
Flume is open source, released under the Apache License, Version 2.0.
It is written primarily in Java and has been tested on Unix-like systems:
Some high-level online resources:
Learn how to set up, configure, and integrate Flume with your existing services:
Here are some blog posts on getting Flume to write to S3:
Flume is extensible and designed to deliver data to many data storage and management systems. At its core, it is designed to deliver data reliably to Hadoop’s HDFS. It also has a plugin interface that allows contributors to add new sources and sinks for their data. Here is a list of different systems Flume can or will deliver data to (NOTE: these are not supported by Cloudera):
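As a rough illustration of how a plugin interface can let contributors add new destinations, the Java sketch below registers sink factories under names that a configuration could refer to. `Sink`, `SinkRegistry`, `ConsoleSink`, and `PluginDemo` are hypothetical names chosen for this example and do not reflect Flume's real plugin API; sources could be registered in the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plugin types; NOT Flume's real plugin API.
interface Sink {
    /** Delivers one event to the sink's destination. */
    void append(String event);
}

/** Maps plugin names (as a config file might use them) to sink factories. */
class SinkRegistry {
    private final Map<String, Supplier<Sink>> factories = new HashMap<>();

    /** Registers a factory; rejects a name that is already taken. */
    void register(String name, Supplier<Sink> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("duplicate sink: " + name);
        }
    }

    /** Builds a fresh sink instance for the given plugin name. */
    Sink create(String name) {
        Supplier<Sink> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown sink: " + name);
        }
        return factory.get();
    }
}

/** A contributor-supplied sink that writes events to standard output. */
class ConsoleSink implements Sink {
    public void append(String event) {
        System.out.println("console: " + event);
    }
}

class PluginDemo {
    public static void main(String[] args) {
        SinkRegistry registry = new SinkRegistry();
        registry.register("console", ConsoleSink::new);
        registry.create("console").append("hello"); // prints "console: hello"
    }
}
```

Registering factories rather than instances means each configured flow can get its own sink object, so a new destination only needs a class implementing the interface plus one registration call.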
YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit’s leading software products: YourKit Java Profiler and YourKit .NET Profiler.