Home

Welcome to Flume!

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

Flume is open source, released under the Apache License, Version 2.0.

It is written primarily in Java and has been tested on Unix-like systems:

Ubuntu 9.04+ (DEB compatible)
CentOS 5.3+ (RPM compatible)
Mac OS X

Some high-level online resources:

Video “Chicago Data Summit 2011: Flume an Introduction” – Jonathan Hsieh
Video Hadoop World 2010: Flume: Reliable Distributed Streaming Log Collection – Jonathan Hsieh
Slides Inside Flume – Henry Robinson

System Documentation

Learn how to set up, configure, and integrate Flume with your existing services:

Flume how-tos and recipes

Here are some blog posts on getting Flume to write to S3.

Configuration

Justin Workman has several blog posts, some of which address problems users have encountered.

Flume plugins

Flume is extensible and designed to deliver data to many data storage and management systems. At its core, it is designed to deliver data reliably to Hadoop’s HDFS. It also has a plugin interface that allows contributors to add new sources and sinks for their data. Here is a list of systems that Flume can or will deliver data to (NOTE: these are not supported by Cloudera):

Cassandra Sink
Elastic Search Sink
Voldemort
AMQP via RabbitMQ
Hive (in progress) :: Video Hadoop World 2010: Scale in Collecting and Querying Log Data in Near Real-time – Anurag Phadke, Mozilla
HBase (in progress) :: Video Hadoop World 2010: Search Analytics with Flume and HBase – Otis Gospodnetic, Sematext International. Further contributions from Alex Baranau and Dani Rayan.
MongoDB (in progress)
JRuby Plugins :: Chris Howe, Infochimps
FlumeBase :: Streaming SQL queries for Flume. Aaron Kimball
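
To illustrate the source/sink plugin idea described above, here is a minimal conceptual sketch. It does not use Flume's actual plugin API (real Flume plugins are written in Java against Flume's own interfaces); the names `Event`, `Source`, `Sink`, `ListSource`, `MemorySink`, and `run_flow` are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Protocol


@dataclass
class Event:
    """Hypothetical event: a body plus free-form metadata."""
    body: str
    attrs: Dict[str, str] = field(default_factory=dict)


class Source(Protocol):
    """Hypothetical source contract: anything that can produce events."""
    def events(self) -> Iterator[Event]: ...


class Sink(Protocol):
    """Hypothetical sink contract: anything that can accept an event."""
    def append(self, event: Event) -> None: ...


class ListSource:
    """Hypothetical source that emits one event per input line."""
    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def events(self) -> Iterator[Event]:
        for line in self._lines:
            yield Event(body=line)


class MemorySink:
    """Hypothetical sink that keeps events in memory, standing in for an external store."""
    def __init__(self) -> None:
        self.received: List[Event] = []

    def append(self, event: Event) -> None:
        self.received.append(event)


def run_flow(source: Source, sink: Sink) -> int:
    """Push every event from the source into the sink and return the number delivered."""
    count = 0
    for event in source.events():
        sink.append(event)
        count += 1
    return count


if __name__ == "__main__":
    sink = MemorySink()
    delivered = run_flow(ListSource(["GET /index.html 200", "GET /missing 404"]), sink)
    print(delivered, [e.body for e in sink.received])
```

The point of the sketch is the separation of concerns: because the flow only depends on the two small contracts, a new storage system can be supported by writing a new sink without touching any source, and vice versa.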

Contact us!

User resources

User mailing list
Cloudera.org issue tracker
Flume Bug Tracker
IRC channel #flume on irc.freenode.net

Developer resources

Thanks!

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit’s leading software products: YourKit Java Profiler and YourKit .NET Profiler.
