LibraryOfCongress/bagit-java

BagIt Library (BIL)

Description

The BagIt Library is a software library that supports the creation, manipulation, and validation of bags. It currently targets version 0.97 of the BagIt specification and is version aware, with 0.93 being the earliest supported version.

Requirements

  • Java 8
  • Gradle (for development only)

Support

  1. The Digital Curation Google Group (https://groups.google.com/d/forum/digital-curation) is an open discussion list that reaches many of the contributors to and users of this open-source project.
  2. If you have found a bug, please create a new issue on the issues page.
  3. If you would like to contribute, please submit a pull request.

Major differences between version 5 and 4.*

Command Line Interface

The 5.x versions do not include a command-line interface. Users who need a command-line utility can continue to use the latest 4.x release (download 4.12.3) or switch to an alternative implementation such as bagit-python or BagIt for Ruby.

Serialization

Starting with the 5.x versions, bagit-java no longer supports serializing a bag directly to an archive file. The examples show how to implement a custom serializer for the zip and tar formats.
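
For orientation, a zip serializer needs little more than the JDK. The following is a minimal sketch and not the code from the project's examples. The class name ZipBagSerializer is hypothetical, and it assumes the bag has already been written to a directory on disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of bagit-java): zips a bag directory using only the JDK. */
final class ZipBagSerializer {
  private ZipBagSerializer() {}

  /**
   * Writes every regular file under bagRoot into zipFile. Entry names start with
   * the bag directory's own name, so the archive unpacks into one top-level folder.
   */
  static void zip(Path bagRoot, Path zipFile) {
    Path root = bagRoot.toAbsolutePath().normalize();
    if (root.getFileName() == null) {
      throw new IllegalArgumentException("Cannot zip a filesystem root: " + bagRoot);
    }
    String topLevel = root.getFileName().toString();

    // Collect the files first so a zip written inside the bag is not picked up mid-walk.
    List<Path> files;
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }

    try (OutputStream out = Files.newOutputStream(zipFile);
         ZipOutputStream zip = new ZipOutputStream(out)) {
      for (Path file : files) {
        // Zip entry names always use forward slashes, whatever the platform separator is.
        String relative = root.relativize(file).toString().replace(File.separatorChar, '/');
        zip.putNextEntry(new ZipEntry(topLevel + "/" + relative));
        Files.copy(file, zip);
        zip.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling ZipBagSerializer.zip(Paths.get("mybag"), Paths.get("mybag.zip")) would produce entries such as mybag/bagit.txt and mybag/data/... for a bag laid out that way. A tar variant needs a third-party library, because the JDK has no tar writer.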

Fetching

The 5.x versions do not include a core fetch.txt implementation. If you need this functionality, the FetchHttpFileExample example demonstrates how you can implement it to suit your own application and workflow requirements.
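
As a rough illustration of what such an implementation involves, each line of fetch.txt has the form "URL LENGTH FILENAME", where LENGTH may be "-" when the size is unknown. The sketch below is not FetchHttpFileExample. The FetchEntry class and its methods are hypothetical names, and it uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical value type for one fetch.txt line; not a bagit-java class. */
final class FetchEntry {
  final String url;
  final long length; // -1 when the LENGTH column is "-" (unknown)
  final String path; // file path relative to the bag root, e.g. data/foo.txt

  FetchEntry(String url, long length, String path) {
    this.url = url;
    this.length = length;
    this.path = path;
  }

  /** Parses "URL LENGTH FILENAME"; the three columns are separated by whitespace. */
  static FetchEntry parse(String line) {
    String[] parts = line.trim().split("\\s+", 3);
    if (parts.length != 3) {
      throw new IllegalArgumentException("Malformed fetch.txt line: " + line);
    }
    long length = "-".equals(parts[1]) ? -1L : Long.parseLong(parts[1]);
    return new FetchEntry(parts[0], length, parts[2]);
  }

  /** Downloads this entry into the bag, refusing paths that would escape bagRoot. */
  void fetchInto(Path bagRoot) throws IOException {
    Path root = bagRoot.toAbsolutePath().normalize();
    Path target = root.resolve(path).normalize();
    if (!target.startsWith(root) || target.equals(root)) {
      throw new IOException("Refusing to write outside the bag: " + path);
    }
    Files.createDirectories(target.getParent());
    try (InputStream in = URI.create(url).toURL().openStream()) {
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
    // When a length was declared, compare it with what actually arrived.
    if (length >= 0 && Files.size(target) != length) {
      throw new IOException("Expected " + length + " bytes for " + path);
    }
  }
}
```

A real workflow would likely add timeouts, retries, and authentication, which is one reason the library leaves fetching to the application.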

Internationalization

All logging and error messages have been put into a ResourceBundle. This allows the messages to be translated into multiple languages and selected automatically at runtime. If you would like to contribute translations, please visit https://www.transifex.com/acdha/bagit-java/dashboard/ or https://crowdin.com/project/bagit-java.
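
For readers unfamiliar with the mechanism, the sketch below shows the general ResourceBundle pattern of looking up a message by key and filling in its arguments. It is illustrative only: the key found_version and the message texts are made up, and bagit-java's actual bundle name, keys, and placeholder syntax may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical demo of key-based message lookup; not part of bagit-java. */
final class MessageDemo {
  private MessageDemo() {}

  /** Builds a bundle from in-memory properties text so the demo needs no files. */
  static ResourceBundle bundleFrom(String properties) {
    try {
      return new PropertyResourceBundle(new StringReader(properties));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Looks up the pattern stored under key and substitutes {0}, {1}, ... */
  static String render(ResourceBundle bundle, String key, Object... args) {
    return MessageFormat.format(bundle.getString(key), args);
  }
}
```

With an English bundle containing "found_version=Found BagIt version {0}" and a German one containing "found_version=BagIt-Version {0} gefunden", the same call render(bundle, "found_version", "0.97") yields the text for whichever bundle was selected. In an application the bundle would normally come from ResourceBundle.getBundle(baseName, locale), which picks the properties file matching the locale.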

New Interfaces

The 5.x version is a complete rewrite of the bagit-java library that attempts to follow modern Java practices. Existing code will require some changes.

Examples of using the new bagit-java library

Create a bag from a folder using version 0.97

Path folder = Paths.get("FolderYouWantToBag");
StandardSupportedAlgorithms algorithm = StandardSupportedAlgorithms.MD5;
boolean includeHiddenFiles = false;
Bag bag = BagCreator.bagInPlace(folder, Arrays.asList(algorithm), includeHiddenFiles);

Read an existing bag (version 0.93 and higher)

Path rootDir = Paths.get("RootDirectoryOfExistingBag");
BagReader reader = new BagReader();
Bag bag = reader.read(rootDir);

Write a Bag object to disk

Path outputDir = Paths.get("WhereYouWantToWriteTheBagTo");
BagWriter.write(bag, outputDir); // where bag is a Bag object

Verify Complete

boolean ignoreHiddenFiles = true;
BagVerifier verifier = new BagVerifier();
verifier.isComplete(bag, ignoreHiddenFiles);

Verify Valid

boolean ignoreHiddenFiles = true;
BagVerifier verifier = new BagVerifier();
verifier.isValid(bag, ignoreHiddenFiles);

Quickly verify by payload-oxum

boolean ignoreHiddenFiles = true;

if(BagVerifier.canQuickVerify(bag)){
  BagVerifier.quicklyVerify(bag, ignoreHiddenFiles);
}
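
To make the quick check less opaque: Payload-Oxum is a bag-info.txt value of the form OctetCount.StreamCount, that is, the total number of bytes and the number of files in the payload directory. The sketch below recomputes that value with the JDK alone. PayloadOxum is a hypothetical class name, and the sketch counts every regular file, so it does not reproduce how bagit-java treats hidden files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of bagit-java) that recomputes a Payload-Oxum value. */
final class PayloadOxum {
  private PayloadOxum() {}

  /** Returns "OctetCount.StreamCount" for all regular files under dataDir. */
  static String compute(Path dataDir) {
    long octets = 0;
    long streams = 0;
    try (Stream<Path> walk = Files.walk(dataDir)) {
      List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
      for (Path file : files) {
        octets += Files.size(file); // bytes in this payload file
        streams++;                  // one stream per payload file
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return octets + "." + streams;
  }

  /** True when the value declared in bag-info.txt matches the files on disk. */
  static boolean matches(Path dataDir, String declared) {
    return compute(dataDir).equals(declared.trim());
  }
}
```

For a data directory holding one 5-byte file and one 3-byte file, compute returns "8.2". The check is fast because no file content is read, and for the same reason it cannot detect corruption that leaves the sizes unchanged.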
Add other checksum algorithms

You only need to implement two interfaces:

public class MyNewSupportedAlgorithm implements SupportedAlgorithm {
  @Override
  public String getMessageDigestName() {
    return "SHA3-256";
  }
  @Override
  public String getBagitName() {
    return "sha3256";
  }
}

public class MyNewNameMapping implements BagitAlgorithmNameToSupportedAlgorithmMapping {
  @Override
  public SupportedAlgorithm getMessageDigestName(String bagitAlgorithmName) {
    if("sha3256".equals(bagitAlgorithmName)){
      return new MyNewSupportedAlgorithm();
    }

    return StandardSupportedAlgorithms.valueOf(bagitAlgorithmName.toUpperCase());
  }
}

Then add the implemented BagitAlgorithmNameToSupportedAlgorithmMapping class to your BagReader or BagVerifier object before calling its methods.
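
The two names serve different purposes: the message digest name is what the JDK's MessageDigest.getInstance expects, while the BagIt name is the one used in manifest file names such as manifest-md5.txt. The sketch below shows the JDK side only. DigestDemo is a hypothetical class, not a library type. Note that "SHA3-256" is only expected to resolve on Java 9 or later, or with an additional security provider, so the example sticks to MD5 and SHA-256.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of bagit-java): hex checksum via a JDK digest name. */
final class DigestDemo {
  private DigestDemo() {}

  /** Computes the lowercase hex digest of data using a name such as "MD5" or "SHA-256". */
  static String hex(String messageDigestName, byte[] data) {
    try {
      MessageDigest digest = MessageDigest.getInstance(messageDigestName);
      byte[] raw = digest.digest(data);
      StringBuilder sb = new StringBuilder(raw.length * 2);
      for (byte b : raw) {
        sb.append(String.format("%02x", b)); // two hex digits per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // The running JVM has no provider for this algorithm name.
      throw new IllegalArgumentException("Unsupported digest: " + messageDigestName, e);
    }
  }

  /** Formats one manifest line: the checksum, whitespace, then the payload path. */
  static String manifestLine(String messageDigestName, byte[] data, String path) {
    return hex(messageDigestName, data) + "  " + path;
  }
}
```

For example, DigestDemo.hex("MD5", "hello".getBytes(StandardCharsets.UTF_8)) returns 5d41402abc4b2a76b9719d911017c592. If the name returned by your SupportedAlgorithm is not known to the running JVM, the lookup fails, so it is worth testing a custom algorithm on the Java version you deploy.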

Check for potential problems

The BagIt format is extremely flexible and allows for some conditions which are technically allowed but should be avoided to minimize confusion and maximize portability. The BagLinter class allows you to easily check a bag for warnings:

Path rootDir = Paths.get("RootDirectoryOfExistingBag");
BagLinter linter = new BagLinter();
List<BagitWarning> warnings = linter.lintBag(rootDir, Collections.emptyList());

You can provide a list of specific warnings to ignore:

Path rootDir = Paths.get("RootDirectoryOfExistingBag");
BagLinter linter = new BagLinter();
List<BagitWarning> warnings = linter.lintBag(rootDir, Arrays.asList(BagitWarning.OLD_BAGIT_VERSION));

Developing Bagit-Java

Bagit-Java uses Gradle for its build system. Check out the Gradle documentation to learn more.

Running tests and code quality checks

Inside the bagit-java root directory, run ./gradlew check.

Uploading to Maven Central

  1. Follow their guides:
       • http://central.sonatype.org/pages/releasing-the-deployment.html
       • https://issues.sonatype.org/secure/Dashboard.jspa
  2. Once you have access, create an official release and upload it, specifying the version, by running ./gradlew -Pversion=<VERSION> uploadArchives
  3. Don't forget to tag the repository!

Uploading to jcenter

  1. Follow their guide: https://github.com/bintray/bintray-examples/tree/master/gradle-bintray-plugin-examples
  2. Once you have access, create an official release and upload it, specifying the version, by running ./gradlew -Pversion=<VERSION> bintrayUpload
  3. Don't forget to tag the repository!

Note if using with Eclipse

Simply run ./gradlew eclipse and it will automatically create an Eclipse project that you can import.

Roadmap for this library

  • Fix bugs/issues reported with the new library (ongoing)
  • Translate to various languages (ongoing)