dangit

DAta Nudged into GIT - File-based datasets that use git for version control of individual records

Overview

DANGIT is an experimental way to do version control on datasets. By storing each row/record of a dataset as a file in a GitHub repository, it becomes easy to track changes and to let anyone (yes, anyone... the feedback loop is open!) submit changes using the same workflows as open source software development. The files for the individual rows/records are then built into a single dataset file. Here's the gist with a braindump for the idea.

In the future, a simple UI with GitHub single sign-on could let non-technical users perform the entire fork/edit/build/pull request workflow without using the command line or editing text files.

How to Use

1. Clone this repo.
2. Install dependencies: npm install
3. Clone the sample dataset, nyc-pizzashops.
4. Edit or add data to the sample dataset by editing files in /rows.
5. Use DANGIT to build the dataset with your new changes: node dangit build ../nyc-pizzashops
6. Create a pull request to submit your changes to the source repo.

How it works

Dataset storage

A dataset is maintained in its own GitHub repository with a file structure like this:

/build - the build directory, where dangit writes the built dataset file (a geojson FeatureCollection, a CSV, or a JSON array of objects). The build filename should be the same as the dataset's name, with the appropriate file extension.
/rows - the rows directory, where individual rows are stored as geojson features or 2D json objects
dangit.json - the dangit configuration file, which includes name, type, uid field, etc.
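
As a rough illustration of the layout above, here is a minimal sketch that checks a dataset directory and works out where the built file should go. The helper names (missingLayoutParts, expectedBuildPath) are hypothetical and not part of dangit. Only the geojson type is named in this README, so the csv and json type values and all three file extensions are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed mapping from the config's "type" to a file extension.
// Only "geojson" is named in this README; "csv" and "json" are guesses.
const EXTENSIONS = { geojson: '.geojson', csv: '.csv', json: '.json' };

// Return the parts of the expected layout that are missing from datasetDir.
function missingLayoutParts(datasetDir) {
  const expected = ['build', 'rows', 'dangit.json'];
  return expected.filter((name) => !fs.existsSync(path.join(datasetDir, name)));
}

// Compute where the built file should live: build/<name><extension>.
function expectedBuildPath(datasetDir, config) {
  const ext = EXTENSIONS[config.type];
  if (!ext) throw new Error('Unknown dataset type: ' + config.type);
  return path.join(datasetDir, 'build', config.name + ext);
}
```

A fresh dataset repo would typically report build as missing until the first build has been run.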

Editing data

Edits are made to the files in /rows. New data are added by creating new files. For now, increment the uid manually; someday the build process should validate unique ids, data types, etc.
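
Until the build validates ids, something like the following could help when adding rows by hand. This is only a sketch under assumptions: the uidField argument stands in for whatever dangit.json calls the uid field, uids are assumed to be integers, and geojson rows are assumed to keep the uid under properties. None of these helpers exist in dangit today.

```javascript
// Read the uid from a row. GeoJSON features keep attributes under
// "properties"; flat JSON rows keep them at the top level (assumption).
function getUid(row, uidField) {
  const source = row.type === 'Feature' ? row.properties || {} : row;
  return source[uidField];
}

// Return every uid that appears more than once, in the order repeats are found.
function findDuplicateUids(rows, uidField) {
  const seen = new Set();
  const duplicates = new Set();
  for (const row of rows) {
    const uid = getUid(row, uidField);
    if (seen.has(uid)) duplicates.add(uid);
    seen.add(uid);
  }
  return [...duplicates];
}

// Suggest the next uid, assuming uids are integers (also an assumption).
function nextUid(rows, uidField) {
  const uids = rows
    .map((row) => Number(getUid(row, uidField)))
    .filter(Number.isInteger);
  return uids.length === 0 ? 1 : Math.max(...uids) + 1;
}
```

The rows passed in would be the parsed contents of the files in /rows.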

Building data

Run the DANGIT build using node, passing in the path of the dataset you would like to build:

node dangit build ../nyc-pizzashops

DANGIT looks for a dangit.json file in the root of the directory you pass in, and starts the build based on type. For type geojson, it expects each file in /rows to be a valid geojson feature, and writes a geojson FeatureCollection into /build.
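
The geojson build described above can be sketched roughly as follows. This is an illustration of the idea, not the code in Build.js: the function name buildGeojson is hypothetical, the config is assumed to expose name and type keys, and the .geojson extension and sorted file order are choices made for this sketch.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch of a geojson build: read dangit.json, collect every feature in
// rows/, and write a FeatureCollection into build/. Not the real Build.js.
function buildGeojson(datasetDir) {
  const configPath = path.join(datasetDir, 'dangit.json');
  const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  if (config.type !== 'geojson') {
    throw new Error('This sketch only handles type "geojson"');
  }

  const rowsDir = path.join(datasetDir, 'rows');
  const features = fs.readdirSync(rowsDir)
    .sort() // a stable order keeps diffs of the built file small
    .map((file) => {
      const feature = JSON.parse(fs.readFileSync(path.join(rowsDir, file), 'utf8'));
      if (feature.type !== 'Feature') {
        throw new Error(file + ' is not a GeoJSON Feature');
      }
      return feature;
    });

  const buildDir = path.join(datasetDir, 'build');
  fs.mkdirSync(buildDir, { recursive: true });
  const outPath = path.join(buildDir, config.name + '.geojson');
  const collection = { type: 'FeatureCollection', features };
  fs.writeFileSync(outPath, JSON.stringify(collection, null, 2) + '\n');
  return outPath;
}
```

Pretty-printing the output is a deliberate choice here: a one-row change then shows up as a small diff in the built file instead of a single rewritten line.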

Sample Dataset

You can participate in our early experimentation by adding, editing, or deleting rows in the dataset nyc-pizzashops. Fork the repo, make your changes to the rows, build the distribution file, and open a pull request back to the source repo.

Commit Messages

Commit messages should include as much info as possible about the rows that were edited/added/removed.

Pull Requests

Pull requests on dataset repos should include a successful build of the data. (How should we validate this?)
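
One possible way to approach the validation question, offered only as a sketch: compare the parsed row files against the committed build and reject the pull request if they disagree. The function name buildMatchesRows is hypothetical, the check covers only the geojson type, and nothing like it exists in dangit yet.

```javascript
const { isDeepStrictEqual } = require('util');

// Return true when the built FeatureCollection contains exactly the
// features found in rows/, regardless of order.
function buildMatchesRows(rowFeatures, builtCollection) {
  if (!builtCollection || builtCollection.type !== 'FeatureCollection') {
    return false;
  }
  const remaining = (builtCollection.features || []).slice();
  if (remaining.length !== rowFeatures.length) return false;

  for (const row of rowFeatures) {
    // Find a built feature with the same content that is not yet matched.
    const index = remaining.findIndex((feature) => isDeepStrictEqual(feature, row));
    if (index === -1) return false;
    remaining.splice(index, 1);
  }
  return true;
}
```

A check like this could run in a CI job on each pull request. It only confirms that the build is in sync with the rows, not that the rows themselves are valid.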
