A global metadata vault



A global metadata vault for public domain datasets. A Dat Project initiative. Named after the Svalbard Global Seed Vault.


This repository is meant to house and track metadata about where important public datasets exist online. It is not meant to describe individual datasets (e.g. the NASA IceSat2 Satellite Mission Data) but rather entire repositories that house multiple datasets, such as data.nasa.gov, data.gov, and others.

The target users for this information are other archivists who want to coordinate on what they crawl and store. We hope to contribute to data backup efforts by collecting a "dataset of datasets" in one place.

CLI Tool

We have a CLI utility you can run to automatically seed Svalbard datasets using a DataSilo. It's experimental and is a work in progress!

Check out the cli/ folder for details.


Status

  • Metadata: Currently in the initial collection phase.
  • CLI: Under heavy development!

See metadata/ for known US federal data servers. Open an issue or PR to report additional metadata sources.

In-progress sources are tracked in the issue tracker.

Current Sources

  • metadata/ftpgov.json - Known US Government FTP server list extracted from the ArchiveTeam FTP-GOV Crawl
  • metadata/datagov-ftp - FTP servers referenced from data.gov metadata
  • metadata/datagov-http - HTTP servers referenced from data.gov metadata (a little messy due to invalid metadata)
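
If you want to compare the sources above against what you are already crawling, a small script can reduce a server list to unique hostnames. The sketch below is only an illustration: it assumes the file is a JSON array of URL or hostname strings, which may not match the actual layout of the files in metadata/, and `extract_hosts` and `load_hosts` are hypothetical helper names, not part of the CLI.

```python
import json
from urllib.parse import urlparse


def extract_hosts(entries):
    """Return the sorted, de-duplicated hostnames found in a list of strings."""
    hosts = set()
    for entry in entries:
        # urlparse only recognises a network location after "//",
        # so bare hostnames get a leading "//" before parsing.
        url = entry if "://" in entry else "//" + entry
        host = urlparse(url).hostname  # lowercased by urlparse, None if absent
        if host:
            hosts.add(host)
    return sorted(hosts)


def load_hosts(path):
    """Load a file ASSUMED to hold a JSON array of URL/hostname strings."""
    with open(path, encoding="utf-8") as f:
        return extract_hosts(json.load(f))
```

For example, `load_hosts("metadata/ftpgov.json")` would return one entry per server if the file follows the assumed layout; adjust the parsing if the real schema differs.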