Svalbard

More info on active projects and modules at dat-ecosystem.org

A global metadata vault for public domain datasets. A Dat Project initiative. Named after the Svalbard Global Seed Vault.

The target users for this information are other archivists who want to coordinate on what they are crawling and storing. With this repository we hope to contribute to data backup efforts by collecting a "dataset of datasets" in one place.

Status

The Svalbard V1 release is out! You can download it with Dat here: https://datproject.org/de8cb55dcf2bee13b6cf86a6c4619f2368a66ffe0a0b270784bc386fcfa6ee70.

In-progress sources are being tracked in the issue tracker.

Current Sources

data.gov

children-meta.json - from https://catalog.data.gov/api/action/package_search?fq=collection_package_id:*
parent-meta.json - from http://catalog.data.gov/api/3/action/package_search
children-headers.json - HTTP GET response headers for resources.*
parent-headers.json - HTTP GET response headers for resources.*
downloaded.json - download results for the initial ~40TB download, with the SHA256 hash of each downloaded file stored as the 'file' property
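
As a minimal sketch of how the hashes in downloaded.json might be used, the Python below computes the SHA256 digest of a local file and compares it with a record's 'file' property. The helper names (sha256_of, matches_record) are hypothetical, and the assumption that the hash is stored as a hex string is ours, not something the repository confirms.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, record):
    """Compare a local file against one downloaded.json record.

    Assumption: record["file"] holds the SHA256 hash as a hex string.
    """
    return sha256_of(path) == str(record.get("file", "")).lower()
```

Reading in chunks matters here because the files behind a ~40TB download can be far larger than available memory.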

Internet Archive

eotcdx.json - CDX files converted to JSON Lines for all files inside the WARCs in https://archive.org/details/EndOfTerm2016WebCrawls
ftpservers.txt - 750+ federal FTP servers being mirrored by Archive Team

Using the data

You can use any tool that supports JSON Lines to analyze the data. Here is a tutorial.
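
For readers without a dedicated JSON Lines tool, here is a minimal Python sketch using only the standard library. It streams a file one record per line and tallies the values of a chosen key. The function names and any key you pass in are hypothetical; inspect a few records first to see which fields a given file actually contains.

```python
import json
from collections import Counter


def read_json_lines(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_key(records, key):
    """Count how often each value of `key` appears across dict records."""
    return Counter(
        str(record.get(key)) for record in records if isinstance(record, dict)
    )
```

Because read_json_lines is a generator, it can be passed straight to count_by_key without loading the whole file into memory.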
