A Data Parsing/Data Manipulation Tool Supporting Digitization Projects and Other Data Analysis Projects


Project Page:

The File Analyzer and Metadata Harvester is a general-purpose desktop (and command-line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform.

The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions.

File Analyzer Wiki:

Demonstration Videos



This code has been derived from the NARA File Analyzer and Metadata Harvester which is available at


  • JDK 1.7 or higher (for build)
  • JRE 1.7 or higher (for runtime)
  • (If you need to run with Java 6, see Releases for an older version)
  • Maven (or you will need to compile the modules manually)
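A minimal sketch of how the Java 1.7 runtime requirement above could be checked programmatically. The class `JavaVersionCheck` and its methods are hypothetical names introduced for illustration; they are not part of the File Analyzer code base. The sketch assumes the standard `java.specification.version` system property, which reports `1.7` or `1.8` on older runtimes and a plain major number such as `9` or `17` on newer ones.

```java
// Hypothetical helper for illustration only; not part of the File Analyzer.
final class JavaVersionCheck {

    /** Extracts the major version from a value such as "1.7", "1.8", "9" or "17". */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the value has the form "1.x", so the major version is x.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true when the given specification version is at least requiredMajor. */
    static boolean meetsMinimum(String specVersion, int requiredMajor) {
        return majorVersion(specVersion) >= requiredMajor;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (meetsMinimum(spec, 7)) {
            System.out.println("Java " + spec + " satisfies the 1.7 minimum.");
        } else {
            System.out.println("Java " + spec + " is older than 1.7.");
        }
    }
}
```

Running `main` on the target machine reports whether the installed runtime meets the minimum listed above.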



This code will build three flavors of the File Analyzer.

Core File Analyzer

  • All code runs from a self-extracting jar file

DSpace File Analyzer

  • This version of the File Analyzer is a self-extracting jar file that references the core File Analyzer jar file.
  • It contains tools for automating the creation of DSpace ingestion folders

Demo File Analyzer

  • This version contains extensions illustrating various capabilities of the File Analyzer.
  • This version of the File Analyzer is a self-extracting jar file that references both the core and DSpace File Analyzer jar files.
  • This version of the application uses features of Apache Tika, BagIt, and Marc4j
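The descriptions above say that the DSpace and Demo jar files reference other jar files, but they do not say how. One common mechanism in Java is the `Class-Path` attribute of a jar's manifest. The sketch below assumes that mechanism and shows how such references can be listed. The class `ManifestClassPath` and the jar names in `main` are hypothetical placeholders, not the names used by this project.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only; not part of the File Analyzer.
final class ManifestClassPath {

    /** Returns the space-separated entries of the manifest's Class-Path attribute. */
    static List<String> classPathEntries(Manifest manifest) {
        List<String> entries = new ArrayList<String>();
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value != null) {
            for (String entry : value.trim().split("\\s+")) {
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    /** Parses manifest text held in memory, so the sketch needs no jar on disk. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalArgumentException("Invalid manifest text", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder jar names; the real file names are an assumption.
        String text = "Manifest-Version: 1.0\n"
                + "Class-Path: core-example.jar dspace-example.jar\n\n";
        System.out.println(classPathEntries(parse(text)));
    }
}
```

To inspect a real jar, a `java.util.jar.JarFile` opened on the file provides the same `Manifest` object through `getManifest()`.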

Georgetown University Library IT Code Repositories
