An interface for collecting and parsing Federal Register documents


Clone the code from GitHub.

The module requires a config file in the root project directory. The config file must define a variable named dataDir, which points to the root directory where the data will be saved.
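A minimal config sketch might look like the following (the path is a placeholder, and the required filename is not specified here, so check the module itself):

```python
import os

# Hypothetical config values; replace the path with wherever you want the
# dataset stored. Only dataDir is required by the module.
dataDir = os.path.expanduser("~/federal_register_data")  # root directory for all saved data
```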

Data collection

The module collects data from two sources:

  1. dataCollection/ downloads metadata describing Federal Register documents from the API. Raw metadata is saved in annual zipped JSON files in dataDir/meta.

  2. dataCollection/ downloads the text of daily Federal Register documents. Raw XML files are saved in dataDir/xml.
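As a rough illustration of what reading one of the annual zipped metadata files could look like, here is a self-contained sketch (the internal layout of the archives in dataDir/meta, the field names, and the helper name are all assumptions):

```python
import json
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical helper; the real archives in dataDir/meta may be laid out differently.
def load_year_meta(zip_path):
    """Read every JSON file inside an annual metadata archive into one DataFrame."""
    records = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                records.extend(json.loads(zf.read(name)))
    return pd.DataFrame(records)

# Demo: build a tiny archive shaped like dataDir/meta/2020.zip might be.
meta_dir = tempfile.mkdtemp()
zip_path = os.path.join(meta_dir, "2020.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("2020.json", json.dumps([{"document_number": "2020-00001", "type": "Rule"}]))

meta_df = load_year_meta(zip_path)
```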

dataCollection/ builds parsed versions of the documents, where the XML is converted into Pandas data tables. These files are saved as pickled dataframes in dataDir/parsed. Files are named by document number, which must be extracted from the XML itself (and occasionally contains errors). The XML files sometimes contain duplicate printings of the same document, but each document only appears once in the parsed directory.
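The parsing step described above can be sketched as follows. This is not the module's actual implementation; the XML fragment, tag names, and table layout are illustrative assumptions (real Federal Register XML is far richer), but it shows the general shape: extract the document number from the XML, flatten the elements into a pandas table, and pickle it under the document number.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical minimal document; real Federal Register XML is far richer.
SAMPLE_XML = """<RULE>
  <FRDOC>FR Doc. 2020-12345</FRDOC>
  <P>First paragraph of the rule.</P>
  <P>Second paragraph of the rule.</P>
</RULE>"""

def parse_document(xml_text):
    """Flatten one document's XML elements into a pandas table of (tag, text) rows."""
    root = ET.fromstring(xml_text)
    rows = [{"tag": el.tag, "text": (el.text or "").strip()}
            for el in root.iter() if el is not root]
    return pd.DataFrame(rows)

def doc_number(xml_text):
    """Extract the document number from the XML (in real data this occasionally contains errors)."""
    root = ET.fromstring(xml_text)
    frdoc = root.findtext("FRDOC", default="")
    return frdoc.replace("FR Doc.", "").strip()

df = parse_document(SAMPLE_XML)
num = doc_number(SAMPLE_XML)
df.to_pickle(f"{num}.pkl")  # parsed files are named by document number
```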

The complete dataset can be downloaded from scratch, or updated to the latest available data, by running the data collection scripts described above.

The complete dataset is approximately 20GB in size.

Loading data

Cleaned and processed data can be loaded through the module. The most important functions are:

  • loadInfoDF loads all document metadata as a single dataframe
  • iterParsed iteratively loads available parsed documents
  • loadParsed loads a single parsed document
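As a sketch of how iterParsed-style iteration over the parsed directory might work (the function body, the .pkl naming convention, and the flat directory layout are my assumptions, not the module's documented API):

```python
import os
import tempfile

import pandas as pd

# Hypothetical re-implementation for illustration; the real iterParsed may differ.
def iter_parsed(parsed_dir):
    """Yield (document_number, DataFrame) for each pickled parsed document."""
    for name in sorted(os.listdir(parsed_dir)):
        if name.endswith(".pkl"):
            yield name[:-len(".pkl")], pd.read_pickle(os.path.join(parsed_dir, name))

# Demo on a throwaway directory containing one fake parsed document.
parsed_dir = tempfile.mkdtemp()
pd.DataFrame({"tag": ["P"], "text": ["example"]}).to_pickle(
    os.path.join(parsed_dir, "2020-00001.pkl"))

docs = dict(iter_parsed(parsed_dir))
```

Iterating rather than loading everything at once matters here: at roughly 20GB, the full dataset will not fit comfortably in memory on most machines.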
