Showing 5 changed files with 449 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<title>Caltech Library's Digital Library Development Sandbox</title> | ||
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'> | ||
<link rel="stylesheet" href="/css/site.css"> | ||
</head> | ||
<body> | ||
<header> | ||
<a href="https://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a> | ||
</header> | ||
<nav> | ||
<ul> | ||
<li><a href="/">Home</a></li> | ||
<li><a href="../">README</a></li> | ||
<li><a href="license.html">LICENSE</a></li> | ||
<li><a href="install.html">INSTALL</a></li> | ||
<li><a href="docs/">Documentation</a></li> | ||
<li><a href="how-to/">How To</a></li> | ||
<li><a href="https://github.com/caltechlibrary/dataset">GitHub</a></li> | ||
</ul> | ||
|
||
</nav> | ||
|
||
<section> | ||
<h1>dataset <a href="https://data.caltech.edu/badge/latestdoi/79394591"><img src="https://data.caltech.edu/badge/79394591.svg" alt="DOI" /></a></h1> | ||
|
||
<p><em>dataset</em> is a command line tool for working with JSON (object) documents stored as | ||
collections. <a href="docs/dataset/">The tool</a> supports basic storage actions (e.g. CRUD operations, filtering | ||
and extraction) as well as <a href="docs/dataset/indexer.html">indexing</a> and <a href="docs/dataset/find.html">searching</a>. | ||
A project goal of <em>dataset</em> is to “play nice” with shell scripts and other | ||
Unix tools (e.g. it respects standard in, out and error with minimal side effects). This means it is | ||
easily scriptable via Bash, POSIX shell or interpreted languages like R.</p> | ||
|
||
<p><em>dataset</em> also includes an implementation as a Python3 module. The functionality of the command line tool is | ||
replicated for Python3. The module requires Python 3.6 or later.</p> | ||
|
||
<p>Finally, <em>dataset</em> is a Go package for managing JSON documents and their attachments on disk or in cloud storage | ||
(e.g. Amazon S3, Google Cloud Storage). The command line utilities exercise this package extensively.</p> | ||
|
||
<p>The inspiration for creating <em>dataset</em> was the desire to process metadata as JSON document collections using | ||
Unix shell utilities and pipelines. While it has grown in capabilities, that remains a core use case.</p> | ||
|
||
<p><em>dataset</em> organizes JSON documents by unique names in collections. Collections are represented | ||
as an index into a series of buckets. The buckets are subdirectories (or paths under cloud storage services). | ||
Buckets hold individual JSON documents and their attachments. A JSON document is assigned automatically to a | ||
bucket (and the bucket is created if necessary) when it is added to a collection. | ||
Assigning documents to buckets avoids having too many documents under a single path (e.g. some Unix | ||
file systems limit how many files a single directory can hold). In addition to using the <em>dataset</em> | ||
command you can list and manipulate the JSON documents directly with common Unix commands like ls, find, grep or | ||
their cloud counterparts.</p> | ||
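<p>The bucket layout described above can be pictured with a short sketch. This is plain Python, not the <em>dataset</em> implementation: the two-letter bucket names, the round-robin assignment and the helper names <code>bucket_names</code> and <code>assign_buckets</code> are assumptions chosen for illustration. It only shows how spreading keys across subdirectories keeps any single directory small.</p>

```python
from itertools import islice, product
from string import ascii_lowercase


def bucket_names(count):
    """Return `count` two-letter bucket names: aa, ab, ac, ... (hypothetical naming)."""
    names = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    return list(islice(names, count))


def assign_buckets(keys, buckets):
    """Map each key to a path like 'aa/freda.json', cycling through the buckets.

    Round-robin assignment is an assumption made for this sketch only.
    """
    keymap = {}
    for i, key in enumerate(keys):
        bucket = buckets[i % len(buckets)]
        keymap[key] = f"{bucket}/{key}.json"
    return keymap


if __name__ == "__main__":
    keymap = assign_buckets(["freda", "mojo", "little-frieda"], bucket_names(2))
    for key, path in keymap.items():
        print(key, "->", path)
```

<p>In a layout like this, the index is what lets a tool find a document by key without scanning every bucket, while the files themselves stay reachable with ordinary directory tools.</p>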
|
||
<p>See <a href="docs/getting-started-with-dataset.html">getting-started-with-dataset.md</a> for a tour of functionality.</p> | ||
|
||
<h3>Limitations of <em>dataset</em></h3> | ||
|
||
<p><em>dataset</em> has many limitations; some are listed below:</p> | ||
|
||
<ul> | ||
<li>it is not a multi-process, multi-user data store (it’s just files on disk)</li> | ||
<li>it is not a repository management system</li> | ||
<li>it is not a general purpose multiuser database system</li> | ||
</ul> | ||
|
||
<h2>Operations</h2> | ||
|
||
<p>The basic operations supported by <em>dataset</em> are listed below, organized by collection and JSON document level.</p> | ||
|
||
<h3>Collection Level</h3> | ||
|
||
<ul> | ||
<li><a href="docs/dataset/init.html">init</a> creates a collection</li> | ||
<li><a href="docs/dataset/import-csv.html">import-csv</a> JSON documents from rows of a CSV file</li> | ||
<li><a href="docs/dataset/import-gsheet.html">import-gsheet</a> JSON documents from rows of a Google Sheet</li> | ||
<li><a href="docs/dataset/export-csv.html">export-csv</a> JSON documents from a collection into a CSV file</li> | ||
<li><a href="docs/dataset/export-gsheet.html">export-gsheet</a> JSON documents from a collection into a Google Sheet</li> | ||
<li><a href="docs/dataset/keys.html">keys</a> list keys of JSON documents in a collection, supports filtering and sorting</li> | ||
<li><a href="docs/dataset/haskey.html">haskey</a> returns true if key is found in collection, false otherwise</li> | ||
<li><a href="docs/dataset/count.html">count</a> returns the number of documents in a collection, supports filtering for subsets</li> | ||
<li><a href="docs/dataset/extract.html">extract</a> unique JSON attribute values from a collection</li> | ||
</ul> | ||
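<p>As a rough picture of what turning CSV rows into JSON documents involves, here is a minimal sketch in plain Python. It does not use or reproduce the <em>dataset</em> code: the function name <code>rows_to_documents</code>, the use of the first row as attribute names and the 1-based key column are assumptions made for this example.</p>

```python
import csv
import io


def rows_to_documents(csv_text, key_column=1):
    """Turn CSV rows into {key: document}, using a 1-based key column.

    Assumes the first row holds the attribute names (an assumption of
    this sketch, not a documented behavior of the dataset tool).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    documents = {}
    for row in reader:
        key = row[key_column - 1]
        documents[key] = dict(zip(header, row))
    return documents


if __name__ == "__main__":
    sample = "id,name,email\nfreda,Freda,freda@inverness.example.org\n"
    for key, doc in rows_to_documents(sample).items():
        print(key, doc)
```

<p>Every value stays a string here; a real importer may well convert numbers or booleans, which this sketch leaves out.</p>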
|
||
<h3>JSON Document level</h3> | ||
|
||
<ul> | ||
<li><a href="docs/dataset/create.html">create</a> a JSON document in a collection</li> | ||
<li><a href="docs/dataset/read.html">read</a> back a JSON document in a collection</li> | ||
<li><a href="docs/dataset/update.html">update</a> a JSON document in a collection</li> | ||
<li><a href="docs/dataset/delete.html">delete</a> a JSON document in a collection</li> | ||
<li><a href="docs/dataset/join.html">join</a> a JSON document with a document in a collection</li> | ||
<li><a href="docs/dataset/list.html">list</a> returns JSON records as an array for the supplied keys</li> | ||
<li><a href="docs/dataset/path.html">path</a> lists the file path for a JSON document in a collection</li> | ||
</ul> | ||
|
||
<h3>JSON Document Attachments</h3> | ||
|
||
<ul> | ||
<li><a href="docs/dataset/attach.html">attach</a> a file to a JSON document in a collection</li> | ||
<li><a href="docs/dataset/attachments.html">attachments</a> lists the files attached to a JSON document in a collection</li> | ||
<li><a href="docs/dataset/detach.html">detach</a> retrieves an attached file associated with a JSON document in a collection</li> | ||
<li><a href="docs/dataset/prune.html">prune</a> delete one or more attached files of a JSON document in a collection</li> | ||
</ul> | ||
|
||
<h3>Search</h3> | ||
|
||
<ul> | ||
<li><a href="docs/dataset/indexer.html">indexer</a> indexes JSON documents in a collection for searching with <em>find</em></li> | ||
<li><a href="docs/dataset/deindexer.html">deindexer</a> de-indexes (removes) JSON documents from an index</li> | ||
<li><a href="docs/dataset/find.html">find</a> provides an index-based full-text search interface for collections</li> | ||
</ul> | ||
|
||
<h2>Example</h2> | ||
|
||
<p>Common operations using the <em>dataset</em> command line tool:</p> | ||
|
||
<ul> | ||
<li>create a collection</li> | ||
<li>create a JSON document in a collection</li> | ||
<li>read a JSON document</li> | ||
<li>update a JSON document</li> | ||
<li>delete a JSON document</li> | ||
</ul> | ||
|
||
<pre><code class="language-shell">    # Create a collection "mystuff.ds"; the ".ds" suffix lets the bin/dataset command know that's the collection to use. | ||
    bin/dataset mystuff.ds init | ||
    # If successful then you should see an OK, otherwise an error message | ||
|
||
# Create a JSON document | ||
bin/dataset mystuff.ds create freda '{"name":"freda","email":"freda@inverness.example.org"}' | ||
    # If successful then you should see an OK, otherwise an error message | ||
|
||
# Read a JSON document | ||
bin/dataset mystuff.ds read freda | ||
|
||
# Path to JSON document | ||
bin/dataset mystuff.ds path freda | ||
|
||
# Update a JSON document | ||
bin/dataset mystuff.ds update freda '{"name":"freda","email":"freda@zbs.example.org", "count": 2}' | ||
    # If successful then you should see an OK, otherwise an error message | ||
|
||
# List the keys in the collection | ||
bin/dataset mystuff.ds keys | ||
|
||
# Get keys filtered for the name "freda" | ||
bin/dataset mystuff.ds keys '(eq .name "freda")' | ||
|
||
# Join freda-profile.json with "freda" adding unique key/value pairs | ||
bin/dataset mystuff.ds join append freda freda-profile.json | ||
|
||
    # Join freda-profile.json, overwriting key/values in common and adding unique key/value pairs | ||
    # from freda-profile.json | ||
bin/dataset mystuff.ds join overwrite freda freda-profile.json | ||
|
||
# Delete a JSON document | ||
bin/dataset mystuff.ds delete freda | ||
|
||
# Import data from a CSV file using column 1 as key | ||
bin/dataset -quiet -nl=false mystuff.ds import-csv my-data.csv 1 | ||
|
||
# To remove the collection just use the Unix shell command | ||
rm -fR mystuff.ds | ||
</code></pre> | ||
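<p>The difference between the two join modes in the example can be illustrated with ordinary dictionaries. The sketch below is plain Python that follows the wording of the comments above; it is not the <em>dataset</em> implementation. The function names, the sample contents of <code>freda-profile.json</code> and the choice to merge only top-level keys are assumptions.</p>

```python
def join_append(document, other):
    """Add only the key/value pairs from `other` whose keys are missing in `document`."""
    merged = dict(document)
    for key, value in other.items():
        merged.setdefault(key, value)
    return merged


def join_overwrite(document, other):
    """Add every key/value pair from `other`, replacing values for keys in common."""
    merged = dict(document)
    merged.update(other)
    return merged


if __name__ == "__main__":
    freda = {"name": "freda", "email": "freda@inverness.example.org"}
    # Hypothetical contents standing in for freda-profile.json
    profile = {"email": "freda@zbs.example.org", "office": "4B"}
    print("append:   ", join_append(freda, profile))
    print("overwrite:", join_overwrite(freda, profile))
```

<p>With these inputs, append keeps the stored email and only gains the new attribute, while overwrite also replaces the email with the value from the profile.</p>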
|
||
<h2>Releases</h2> | ||
|
||
<p>Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). | ||
See <a href="https://github.com/caltechlibrary/dataset/releases">https://github.com/caltechlibrary/dataset/releases</a>.</p> | ||
|
||
</section> | ||
|
||
<footer> | ||
<span><h1><a href="https://caltech.edu">Caltech</a></h1></span> | ||
<span>© 2017 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span> | ||
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address> | ||
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span> | ||
<span><a href="mailto:library@caltech.edu">Email Us</a></span> | ||
<a class="cl-hide" href="sitemap.xml">Site Map</a> | ||
</footer> | ||
</body> | ||
</html> |