The Dat in the Lab project
Welcome to the Dat in the Lab repo! This project is a collaboration between Dat and the California Digital Library (CDL) UC Curation Center (UC3). Funded by the Gordon and Betty Moore Foundation, Dat in the Lab will enable us to pilot Dat-based workflows for research data management, working closely with two research groups in the University of California system.
We're excited to be working with The Center for Watershed Sciences at UC Davis and The Dawson Lab at UC Merced. We are working with these groups to identify areas where Dat may be able to help them manage, sync, version, and publish datasets. This repository will hold (or link out to) code, curriculum, workflows, and other outputs of the project. Welcome!
First things first: What's Dat?
Dat is a protocol designed for syncing folders of data. It works much like source control systems such as Git, but it’s built for large datasets and designed for complex use cases, such as large datasets that update live. When Dat is incorporated into researcher workflows, data versioning and sharing with colleagues can be automatic, facilitating collaboration and making data publication seamless. To read more about how Dat works, check out the Dat white paper.
The California Digital Library used Dat to mirror a copy of Data.gov in early 2017. That project copied over 40 TB of public federal research data. Read more at Data Mirror.
More about this project
Want to try Dat?
Give Dat a shot at try-dat.com. We created this tutorial for our visits to the labs at UC Davis and UC Merced, as well as for other workshops, including Mozfest 2017. If you're interested in how the tutorial is made, check out this repo.
The mkcontainer Containerfile that builds the virtual Hoffman2 is here.