naishe edited this page Oct 16, 2012 · 44 revisions

Welcome to Flume!

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
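As a rough picture of the streaming data flow model, a source can be seen as producing events that are moved to a sink. The sketch below is illustrative only: `EventSource`, `EventSink`, `pump`, and `listSource` are hypothetical names invented for this example, not Flume's actual API, and an in-memory list stands in for both the application log and the destination.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FlowSketch {
    /** Hypothetical source: yields the next event, or null when drained. */
    interface EventSource {
        String next();
    }

    /** Hypothetical sink: accepts one event at a time. */
    interface EventSink {
        void append(String event);
    }

    /** Moves every event from the source to the sink; returns the count moved. */
    static int pump(EventSource source, EventSink sink) {
        int moved = 0;
        for (String event = source.next(); event != null; event = source.next()) {
            sink.append(event);
            moved++;
        }
        return moved;
    }

    /** Wraps an in-memory list as a source, standing in for an application log. */
    static EventSource listSource(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        // The list sink stands in for a real destination such as HDFS.
        List<String> collected = new ArrayList<>();
        int moved = pump(listSource(List.of("GET /index", "GET /about")), collected::add);
        System.out.println(moved + " events delivered: " + collected);
    }
}
```

Keeping the source and sink behind small interfaces is what would let either end be swapped independently in a design like this; the real system adds reliability and failover on top, which this sketch does not attempt to model.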

Flume is open source, released under the Apache License, Version 2.0.

It is written primarily in Java and has been tested on Unix-like systems:

  • Ubuntu 9.04+ (DEB compatible)
  • CentOS 5.3+ (RPM compatible)
  • Mac OS X

Some high-level online resources:

System Documentation

Learn how to set up, configure, and integrate Flume with your existing services:

Flume how-tos and recipes

Here are some blog posts on getting Flume to write to S3.

Configuration

Flume plugins

Flume is extensible and designed to deliver data to many data storage and management systems. At its core, it is designed to deliver data reliably to Hadoop’s HDFS. It also has a plugin interface that allows contributors to add different sources and sinks for their data. Here is a list of systems Flume can or will deliver data to (NOTE: these are not supported by Cloudera):

Contact us!

User resources

Developer resources

Thanks!

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit’s leading software products: YourKit Java Profiler and YourKit .NET Profiler.