RecordBreaker - Automatically learn Avro structures from your data
========================================================================================

Introduction
----------------------------------------------------------------------------------------
RecordBreaker is a project that automatically turns your text-formatted data (server logs, sensor readings, etc.) into structured Avro data, without any need to write parsers or extractors. Its goal is to dramatically reduce the time spent preparing data for analysis, leaving more time for the analysis itself.

You can (and should!) read the full RecordBreaker tutorial here: [http://cloudera.github.com/RecordBreaker/](http://cloudera.github.com/RecordBreaker/)

The RecordBreaker repository is hosted on GitHub, here:
[https://github.com/cloudera/RecordBreaker](https://github.com/cloudera/RecordBreaker)

One interesting part of RecordBreaker is the FishEye system. It's a
web-based tool for examining and managing the diverse datasets likely
to be found in a typical HDFS installation. It draws features from
both filesystem management and database administration tools. Most
interestingly, it uses RecordBreaker techniques to automatically
figure out the structure of files it finds. You can run it by typing:

    bin/learnstructure fisheye -run <portnum> <localstoragedir>

Here __portnum__ is the HTTP port on which FishEye will serve data to the
user, and __localstoragedir__ is the directory where it will maintain
information about a target filesystem.
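
As a concrete illustration of the invocation above, the following sketch fills in the two arguments. The port `8080` and the directory `/tmp/fisheye-store` are assumptions chosen for the example, not defaults documented by the project, and `fisheye_cmd` is a hypothetical helper that only assembles the command line. The script assumes it is run from the root of a built RecordBreaker checkout.

```shell
#!/bin/sh
# Hypothetical example values; substitute your own port and directory.
PORT=8080
STORE_DIR="/tmp/fisheye-store"

# Hypothetical helper: assemble the FishEye command line from a port
# and a local storage directory, in the form given in this README.
fisheye_cmd() {
    printf 'bin/learnstructure fisheye -run %s %s\n' "$1" "$2"
}

# Print the command so it can be inspected before running it.
fisheye_cmd "$PORT" "$STORE_DIR"

# To actually start FishEye, run the printed command from the
# RecordBreaker checkout, for example:
# bin/learnstructure fisheye -run "$PORT" "$STORE_DIR"
```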


Installation
----------------------------------------------------------------------------------------
    $ ant dist


Dependencies
----------------------------------------------------------------------------------------
-- Java JDK 1.6
-- Apache Ant 1.8.2
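
Before building, it may help to confirm that both tools are on your `PATH`. The sketch below is one possible way to check, not part of the project's build; `have_tool` is a hypothetical helper, and the script only reports what it finds rather than enforcing the versions listed above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each dependency and, when present, the first line of its
# version banner for comparison with the list above.
for tool in java ant; do
    if have_tool "$tool"; then
        echo "$tool: found"
        "$tool" -version 2>&1 | head -n 1
    else
        echo "$tool: missing"
    fi
done
```

If both tools are reported as found, `ant dist` can then be run from the repository root.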