This repository has been archived by the owner on May 12, 2021. It is now read-only.

Merge branch 'master' of https://github.com/chrismattmann/drat
chrismattmann committed Jan 8, 2014
2 parents eb278e3 + 8aab0b7 commit 4b613b2
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions README.md
@@ -1,4 +1,13 @@
drat
Distributed Release Audit Tool (DRAT)
====

A distributed, parallelized (Map Reduce) wrapper around Apache RAT to allow it to complete on large code repositories of multiple file types where Apache RAT hangs forever.
A distributed, parallelized (Map Reduce) wrapper around Apache™ RAT to allow it to complete on large code repositories of multiple file types where Apache™ RAT hangs forever.

The tool leverages Apache™ OODT to parallelize and workflow together the following components:

1. Apache™ SOLR based exploration of a CM repository (e.g., Git, SVN, etc.) and classification of that repository based on MIME type using Apache™ Tika.
2. A MIME partitioner that uses Apache™ Tika to automatically deduce and classify by file type and then partition Apache™ RAT jobs based on sets of 100 files per type (configurable) -- the M/R "partitioner"
3. A throttle wrapper for RAT that targets each Apache™ RAT run by MIME type -- the M/R "mapper"
4. A reducer to "combine" the produced RAT logs together into a global RAT report that can be used for stats generation. -- the M/R "reducer"
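
The partition / map / reduce flow in the list above can be sketched roughly as follows. This is only an illustration, not DRAT's implementation: Python's standard `mimetypes` module stands in for Apache™ Tika (it guesses from file extensions only), and the names `partition_by_mime`, `audit_chunk`, and `combine_logs` are hypothetical. The mapper is a stub that counts files rather than invoking Apache™ RAT.

```python
import mimetypes
from collections import Counter, defaultdict


def partition_by_mime(paths, chunk_size=100):
    """Partitioner: group paths by guessed MIME type, then split into chunks."""
    groups = defaultdict(list)
    for path in paths:
        # Assumption: extension-based guess; Tika would inspect file content.
        mime, _ = mimetypes.guess_type(path)
        groups[mime or "application/octet-stream"].append(path)
    for mime, files in sorted(groups.items()):
        for start in range(0, len(files), chunk_size):
            yield mime, files[start:start + chunk_size]


def audit_chunk(mime, files):
    """Mapper (hypothetical stub): stands in for one RAT run on one chunk."""
    return {"mime": mime, "files_audited": len(files)}


def combine_logs(logs):
    """Reducer: merge per-chunk results into one per-MIME-type summary."""
    totals = Counter()
    for log in logs:
        totals[log["mime"]] += log["files_audited"]
    return dict(totals)


if __name__ == "__main__":
    paths = [f"src/module{i}.py" for i in range(250)] + ["README.txt"]
    logs = [audit_chunk(mime, files) for mime, files in partition_by_mime(paths)]
    print(combine_logs(logs))
```

Because each chunk is independent, the mapper calls could be dispatched to separate workers; the reducer only needs the per-chunk results, not the files themselves.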

