Welcome to Flume!
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple, extensible data model that allows for online analytic applications.
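The streaming-flow and failover ideas above can be illustrated with a small Java sketch. It is a minimal sketch under stated assumptions: the `Event`, `EventSink`, `MemorySink`, `DownSink`, and `FailoverSink` types are hypothetical names invented for illustration, not Flume's actual classes or configuration. A source streams events into a sink, and the failover sink reroutes each event to a backup sink when the primary one fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a Flume-style streaming data flow.
// None of these types are Flume's real API; they only illustrate the idea.
public class DataFlowSketch {

    /** A log event: the originating host plus the raw log line (assumed fields). */
    record Event(String host, String body) {}

    /** Anything that can accept a stream of events. Failures are unchecked here. */
    interface EventSink {
        void append(Event e);
    }

    /** Buffers events in memory, standing in for a real store such as HDFS. */
    static class MemorySink implements EventSink {
        final List<Event> received = new ArrayList<>();

        public void append(Event e) {
            received.add(e);
        }
    }

    /** Always fails, simulating an unreachable downstream collector. */
    static class DownSink implements EventSink {
        public void append(Event e) {
            throw new IllegalStateException("collector unreachable");
        }
    }

    /** Sends each event to the primary sink and falls back to the backup on failure. */
    static class FailoverSink implements EventSink {
        private final EventSink primary;
        private final EventSink backup;

        FailoverSink(EventSink primary, EventSink backup) {
            this.primary = primary;
            this.backup = backup;
        }

        public void append(Event e) {
            try {
                primary.append(e);
            } catch (RuntimeException ex) {
                backup.append(e); // recovery path: reroute this event to the backup
            }
        }
    }

    /** Acts as a trivial "source": streams each line into the sink as an event. */
    static void run(List<String> lines, String host, EventSink sink) {
        for (String line : lines) {
            sink.append(new Event(host, line));
        }
    }

    public static void main(String[] args) {
        MemorySink backup = new MemorySink();
        EventSink sink = new FailoverSink(new DownSink(), backup);
        run(List.of("GET /index.html 200", "GET /missing 404"), "web01", sink);
        System.out.println(backup.received.size() + " events delivered via backup");
    }
}
```

Running `main` prints `2 events delivered via backup`, because the hypothetical primary sink rejects every event. Keeping failover as just another sink is one way to let reliability be tuned per flow without changing sources.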
Flume is open source, released under the Apache License, Version 2.0.
It is written primarily in Java and has been tested on Unix-like systems:
- Ubuntu 9.04+ (DEB compatible)
- CentOS 5.3+ (RPM compatible)
- Mac OS X
Some high-level online resources:
- Video “Chicago Data Summit 2011: Flume an Introduction” – Jonathan Hsieh
- Video Hadoop World 2010: Flume: Reliable Distributed Streaming Log Collection – Jonathan Hsieh
- Slides Inside Flume – Henry Robinson
Learn how to set up, configure, and integrate Flume with your existing services:
- Flume Installation Instructions
- Flume Project Documentation
- Flume User Guide
- Flume Cookbook
- Flume Developer Guide
Flume how-tos and recipes
Here are some blog posts on getting Flume to write to S3:
- Justin Workman has several blog posts, some of which address problems users have encountered.
Flume is extensible and designed to deliver data to many data storage and management systems. At its core, it is designed to deliver data reliably to Hadoop’s HDFS. It also has a plugin interface that allows contributors to add new sources and sinks. Here is a list of systems Flume can or will deliver data to (NOTE: these are not supported by Cloudera):
- Cassandra Sink
- Elastic Search Sink
- AMQP via RabbitMQ
- Hive (in progress) :: Video Hadoop World 2010: Scale in Collecting and Querying Log Data in Near Real-time – Anurag Phadke, Mozilla
- HBase (in progress) :: Video Hadoop World 2010: Search Analytics with Flume and HBase – Otis Gospodnetic, Sematext International. Further contributions from Alex Baranua and Dani Rayan.
- MongoDB (in progress)
- JRuby Plugins :: Chris Howe, Infochimps
- FlumeBase :: Streaming SQL queries for Flume – Aaron Kimball
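The plugin interface described above lets contributors add sinks like these without changing Flume's core. The Java sketch below illustrates that general pattern as a name-to-factory registry. It is a minimal sketch under stated assumptions: `SinkRegistry`, `EventSink`, and `TableSink` are hypothetical names invented for illustration, and Flume's real plugin API and registration mechanism differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plugin registry: sinks are looked up by name and built from a
// single string argument. Illustrative only; not Flume's real plugin API.
public class SinkPluginSketch {

    /** Minimal sink contract (hypothetical). */
    interface EventSink {
        void append(String event);
    }

    /** Maps a sink name, as a user might write it in a config, to a factory. */
    static class SinkRegistry {
        private final Map<String, Function<String, EventSink>> builders = new HashMap<>();

        /** Called by a plugin to contribute a new sink type. */
        void register(String name, Function<String, EventSink> builder) {
            if (builders.putIfAbsent(name, builder) != null) {
                throw new IllegalArgumentException("sink already registered: " + name);
            }
        }

        /** Builds a sink instance by name, failing clearly for unknown names. */
        EventSink build(String name, String arg) {
            Function<String, EventSink> builder = builders.get(name);
            if (builder == null) {
                throw new IllegalArgumentException("unknown sink: " + name);
            }
            return builder.apply(arg);
        }
    }

    /** An example contributed sink: records each event tagged with a table name. */
    static class TableSink implements EventSink {
        final String table;
        final List<String> rows = new ArrayList<>();

        TableSink(String table) {
            this.table = table;
        }

        public void append(String event) {
            rows.add(table + ":" + event);
        }
    }

    public static void main(String[] args) {
        SinkRegistry registry = new SinkRegistry();
        // The "plugin" registers its sink; the core only knows the name.
        registry.register("table", TableSink::new);

        EventSink sink = registry.build("table", "weblogs");
        sink.append("GET /index.html 200");
        System.out.println(((TableSink) sink).rows); // prints [weblogs:GET /index.html 200]
    }
}
```

The registry rejects duplicate and unknown names so that a misconfigured plugin fails loudly at setup time rather than silently dropping events.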
Getting help and reporting bugs:
- User mailing list
- Cloudera.org issue tracker
- Flume Bug Tracker
- IRC channel #flume on irc.freenode.net
YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit’s leading software products: YourKit Java Profiler and YourKit .NET Profiler.