Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

CSV to BagIt

csv2bag is a Ruby script that parses a CSV file for a collection, maps data fields to predicates, optionally performs cleanup and linked data lookups, and outputs Bags containing RDF metadata and any associated full resolution media files. This is primarily developed for bulk ingest into Oregon Digital and was originally created as CONTENTdm to BagIt


Requires Ruby 2.1.2 (set by .ruby-version)

git clone
cd csv2bag
mkdir bags
mkdir metadata
bundle install
bundle exec ./csv2bag -h

CSV Setup

csv2bag expects the .csv file to have:

  • a header row as the first row
  • a mapping row as the second row, where each column contains one of the following:
    • the keyword SKIP to indicate that column won't be processed
    • the term for that column
    • the method to be used to generate the term for that column in the format method:METHOD_NAME

The CSV file should be named name_of_my_collection.csv and located in the /metadata/name_of_my_collection folder, along with the files that are to be bagged.

Example CSV snippet
Identifer,Article Title,Rights Statement,Primary author or editor,Publisher,Place of Publication,Subject(s),Countries
1,Hassan - Israel Water Policy Pressurizes Occupied Arabs,Rights Restricted - Free Access,"Sorman, Unal; Balkan, Guven",Jordan Newspaper Co.,Amman,Politics and government; Armed Forces; Agriculture; Settlements,Jordan; Israel
2,"Seawater vs. Brackish Water Desalting-- Technology, Operating Problems and Overall Economics",Rights Reserved - Restricted Access,"Glueckstern, P.; Kantor, Y.",Elsevier,Amsterdam,Technology; Economics; Saline water conversion,Israel
3,Desalination at Inland Sites,,"Gendel, A.",Elsevier,Amsterdam,Technology; Economics; Mediterranean Sea,Israel


  • Specify a predicate to place the field's text. For fields that don't need any cleanup or lookups done. (Examples: title, identifier, description, etc.)
  • Use Dublin Core as a base element set
  • Can also use any additional Linked Open Data vocabularies in rdf-vocab
  • Follow the appropriate schema. (Oregon Digital 1, ScholarsArchive@OSU)


  • Use a method for cleaning up known data errors or mapping strings to URIs
  • View List of Methods
  • Define all methods in a comment (so that programmer knows intent of method)

Optional Parameters

  • Source image files can be stored in a location other than the metadata/COLLECTION folder, and the new path can be referenced with the command line parameter --image-file-path
  • Source image files can be mapped to a different file name using a CSV file specified in the command line parameter --image-file. The CSV file must have the columns in the format of old_file,new_file and have no heading. The file is read in and a hash of old->new can then be used in the cleanup task to convert from the old filename to the new one.
  • Different log levels for console output can be specified in the command line parameter --console-level-log. Default is 'warn'. Logfile output is not affected.


  • Use Oregon Digital Git best practices and make changes / additions on a branch, commit with helpful commit message, then submit a Pull Request.
  • Validate syntax before commit.


Ruby script that parses a collection CSV file, maps data fields, can perform linked data lookups, and outputs Bags with RDF metadata and media files.






No releases published


No packages published