prep for release
rsdoiel committed Mar 16, 2018
1 parent b00ad22 commit 15e66fd
Showing 5 changed files with 449 additions and 3 deletions.
4 changes: 2 additions & 2 deletions codemeta.json
@@ -6,7 +6,7 @@
"codeRepository": "https://github.com/caltechlibrary/dataset",
"issueTracker": "https://github.com/caltechlibrary/dataset/issues",
"license": "https://data.caltech.edu/license",
-"version": "0.0.36",
+"version": "0.0.37",
"author": [
{
"@type": "Person",
@@ -26,7 +26,7 @@
}
],
"developmentStatus": "active",
-"downloadUrl": "https://github.com/caltechlibrary/dataset/archive/v0.0.36.zip",
+"downloadUrl": "https://github.com/caltechlibrary/dataset/archive/v0.0.37.zip",
"keywords": [
"GitHub",
"metadata",
2 changes: 1 addition & 1 deletion dataset.go
@@ -43,7 +43,7 @@ import (

const (
// Version of the dataset package
-Version = `v0.0.36`
+Version = `v0.0.37`

// License is a formatted from for dataset package based command line tools
License = `
181 changes: 181 additions & 0 deletions index.html
@@ -0,0 +1,181 @@
<!DOCTYPE html>
<html>
<head>
<title>Caltech Library's Digital Library Development Sandbox</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="https://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="../">README</a></li>
<li><a href="license.html">LICENSE</a></li>
<li><a href="install.html">INSTALL</a></li>
<li><a href="docs/">Documentation</a></li>
<li><a href="how-to/">How To</a></li>
<li><a href="https://github.com/caltechlibrary/dataset">Github</a></li>
</ul>

</nav>

<section>
<h1>dataset <a href="https://data.caltech.edu/badge/latestdoi/79394591"><img src="https://data.caltech.edu/badge/79394591.svg" alt="DOI" /></a></h1>

<p><em>dataset</em> is a command line tool for working with JSON (object) documents stored as
collections. <a href="docs/dataset/">The command</a> supports basic storage actions (e.g. CRUD operations, filtering
and extraction) as well as <a href="docs/dataset/indexer.html">indexing</a> and <a href="docs/dataset/find.html">searching</a>.
A project goal of <em>dataset</em> is to &ldquo;play nice&rdquo; with shell scripts and other
Unix tools (e.g. it respects standard in, out and error with minimal side effects). This means it is
easily scriptable from Bash, a POSIX shell or interpreted languages like R.</p>

<p><em>dataset</em> is also implemented as a Python3 module that replicates the functionality of the command line tool
(the module requires Python 3.6 or later).</p>

<p>Finally, <em>dataset</em> is a Go package for managing JSON documents and their attachments on disc or in cloud storage
(e.g. Amazon S3, Google Cloud Storage). The command line utilities exercise this package extensively.</p>

<p>The inspiration for creating <em>dataset</em> was the desire to process metadata as JSON document collections using
Unix shell utilities and pipelines. While its capabilities have grown, that remains a core use case.</p>

<p><em>dataset</em> organizes JSON documents by unique names (keys) in collections. Collections are represented
as an index into a series of buckets. The buckets are subdirectories (or paths under cloud storage services).
Buckets hold individual JSON documents and their attachments. When a JSON document is added to a collection it is
assigned to a bucket automatically, and the bucket is created if necessary.
Assigning documents to buckets avoids placing too many documents under a single path (e.g. on some Unix systems
there is a limit to how many files a single directory can hold). In addition to using the <em>dataset</em>
command, you can list and manipulate the JSON documents directly with common Unix commands like ls, find and grep, or
their cloud counterparts.</p>

<p>See <a href="docs/getting-started-with-dataset.html">getting-started-with-dataset.md</a> for a tour of functionality.</p>

<h3>Limitations of <em>dataset</em></h3>

<p><em>dataset</em> has many limitations, some of which are listed below:</p>

<ul>
<li>it is not a multi-process, multi-user data store (it&rsquo;s just files on disc)</li>
<li>it is not a repository management system</li>
<li>it is not a general purpose multiuser database system</li>
</ul>

<h2>Operations</h2>

<p>The basic operations supported by <em>dataset</em> are listed below, grouped into collection level, JSON document level, attachment and search operations.</p>

<h3>Collection Level</h3>

<ul>
<li><a href="docs/dataset/init.html">init</a> creates a collection</li>
<li><a href="docs/dataset/import-csv.html">import-csv</a> JSON documents from rows of a CSV file</li>
<li><a href="docs/dataset/import-gsheet.html">import-gsheet</a> JSON documents from rows of a Google Sheet</li>
<li><a href="docs/dataset/export-csv.html">export-csv</a> JSON documents from a collection into a CSV file</li>
<li><a href="docs/dataset/export-gsheet.html">export-gsheet</a> JSON documents from a collection into a Google Sheet</li>
<li><a href="docs/dataset/keys.html">keys</a> lists the keys of JSON documents in a collection and supports filtering and sorting</li>
<li><a href="docs/dataset/haskey.html">haskey</a> returns true if key is found in collection, false otherwise</li>
<li><a href="docs/dataset/count.html">count</a> returns the number of documents in a collection and supports filtering for subsets</li>
<li><a href="docs/dataset/extract.html">extract</a> unique JSON attribute values from a collection</li>
</ul>
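<p>The behavior of <em>keys</em> with a filter, and of <em>extract</em>, can be illustrated with an in-memory Go sketch. The types <code>Doc</code> and <code>Collection</code> and the functions <code>filterKeys</code> and <code>extractUnique</code> are hypothetical; a Go predicate stands in for dataset's filter expressions such as <code>(eq .name "freda")</code>, whose syntax this sketch does not implement.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a decoded JSON object; Collection maps keys to documents.
// HYPOTHETICAL in-memory stand-ins, not dataset's Go types.
type Doc map[string]interface{}
type Collection map[string]Doc

// filterKeys returns the sorted keys whose documents satisfy keep.
func filterKeys(c Collection, keep func(Doc) bool) []string {
	keys := []string{}
	for k, d := range c {
		if keep(d) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// extractUnique returns the sorted, de-duplicated string values of attr.
// Documents lacking attr, or holding a non-string value, are skipped.
func extractUnique(c Collection, attr string) []string {
	seen := map[string]bool{}
	for _, d := range c {
		if s, ok := d[attr].(string); ok {
			seen[s] = true
		}
	}
	vals := make([]string, 0, len(seen))
	for s := range seen {
		vals = append(vals, s)
	}
	sort.Strings(vals)
	return vals
}

func main() {
	c := Collection{
		"freda": {"name": "freda", "email": "freda@zbs.example.org"},
		"mojo":  {"name": "mojo", "email": "mojo@zbs.example.org"},
		"jack":  {"name": "jack"},
	}
	isFreda := func(d Doc) bool { return d["name"] == "freda" }
	fmt.Println(filterKeys(c, isFreda))   // [freda]
	fmt.Println(extractUnique(c, "name")) // [freda jack mojo]
}
```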

<h3>JSON Document level</h3>

<ul>
<li><a href="docs/dataset/create.html">create</a> a JSON document in a collection</li>
<li><a href="docs/dataset/read.html">read</a> back a JSON document in a collection</li>
<li><a href="docs/dataset/update.html">update</a> a JSON document in a collection</li>
<li><a href="docs/dataset/delete.html">delete</a> a JSON document in a collection</li>
<li><a href="docs/dataset/join.html">join</a> a JSON document with a document in a collection</li>
<li><a href="docs/dataset/list.html">list</a> returns the JSON records for the supplied keys as an array</li>
<li><a href="docs/dataset/path.html">path</a> returns the file path for a JSON document in a collection</li>
</ul>
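<p>The two <em>join</em> modes used in the example further down (<code>append</code> and <code>overwrite</code>) differ only in how keys present in both documents are handled. A minimal Go sketch of those merge rules, using a hypothetical <code>joinDocs</code> helper and a made-up profile document, might be:</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// joinDocs merges src into a copy of dst. With overwrite=false ("append")
// only keys that dst lacks are added; with overwrite=true values for shared
// keys are replaced as well. HYPOTHETICAL helper, not dataset's code.
func joinDocs(dst, src map[string]interface{}, overwrite bool) map[string]interface{} {
	out := make(map[string]interface{}, len(dst)+len(src))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		if _, exists := out[k]; !exists || overwrite {
			out[k] = v
		}
	}
	return out
}

func main() {
	freda := map[string]interface{}{"name": "freda", "email": "freda@zbs.example.org", "count": 2}
	profile := map[string]interface{}{"email": "freda@inverness.example.org", "city": "Inverness"}
	// Marshaling maps of strings and ints cannot fail; keys are emitted sorted.
	a, _ := json.Marshal(joinDocs(freda, profile, false))
	o, _ := json.Marshal(joinDocs(freda, profile, true))
	fmt.Println(string(a)) // {"city":"Inverness","count":2,"email":"freda@zbs.example.org","name":"freda"}
	fmt.Println(string(o)) // {"city":"Inverness","count":2,"email":"freda@inverness.example.org","name":"freda"}
}
```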

<h3>JSON Document Attachments</h3>

<ul>
<li><a href="docs/dataset/attach.html">attach</a> a file to a JSON document in a collection</li>
<li><a href="docs/dataset/attachments.html">attachments</a> lists the files attached to a JSON document in a collection</li>
<li><a href="docs/dataset/detach.html">detach</a> retrieves an attached file associated with a JSON document in a collection</li>
<li><a href="docs/dataset/prune.html">prune</a> deletes one or more attached files of a JSON document in a collection</li>
</ul>

<h3>Search</h3>

<ul>
<li><a href="docs/dataset/indexer.html">indexer</a> indexes JSON documents in a collection for searching with <em>find</em></li>
<li><a href="docs/dataset/deindexer.html">deindexer</a> de-indexes (removes) JSON documents from an index</li>
<li><a href="docs/dataset/find.html">find</a> provides an index-based full text search interface for collections</li>
</ul>
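<p>How <em>indexer</em>, <em>deindexer</em> and <em>find</em> are implemented is not described here. To illustrate the general idea of index-based full text search, the Go sketch below builds a toy inverted index mapping each term to a set of document keys; the <code>Index</code> type and its methods are hypothetical, and the whitespace tokenizer and the rule that every query term must match are simplifying assumptions.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Index is a toy inverted index: term -> set of document keys.
// HYPOTHETICAL illustration only; not how dataset builds its indexes.
type Index map[string]map[string]bool

// add splits text on whitespace and records key under each lowercased term.
func (ix Index) add(key, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if ix[term] == nil {
			ix[term] = map[string]bool{}
		}
		ix[term][key] = true
	}
}

// remove drops key from every posting list (a rough analogue of deindexing).
func (ix Index) remove(key string) {
	for term, keys := range ix {
		delete(keys, key)
		if len(keys) == 0 {
			delete(ix, term) // deleting during range is safe in Go
		}
	}
}

// find returns the sorted keys whose text contains every query term.
func (ix Index) find(query string) []string {
	terms := strings.Fields(strings.ToLower(query))
	hits := []string{}
	if len(terms) == 0 {
		return hits
	}
	for key := range ix[terms[0]] {
		all := true
		for _, t := range terms[1:] {
			if !ix[t][key] { // a missing term yields a nil map, which reads as false
				all = false
				break
			}
		}
		if all {
			hits = append(hits, key)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	ix := Index{}
	ix.add("freda", "Freda writes radio plays")
	ix.add("mojo", "Mojo writes code")
	fmt.Println(ix.find("writes"))       // [freda mojo]
	fmt.Println(ix.find("radio writes")) // [freda]
	ix.remove("freda")
	fmt.Println(ix.find("writes")) // [mojo]
}
```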

<h2>Example</h2>

<p>Common operations using the <em>dataset</em> command line tool:</p>

<ul>
<li>create collection</li>
<li>create a JSON document in a collection</li>
<li>read a JSON document</li>
<li>update a JSON document</li>
<li>delete a JSON document</li>
</ul>

<pre><code class="language-shell"># Create a collection &quot;mystuff.ds&quot;; the &quot;.ds&quot; extension lets the bin/dataset command know that is the collection to use.
bin/dataset mystuff.ds init
# If successful you should see OK, otherwise an error message

# Create a JSON document
bin/dataset mystuff.ds create freda '{&quot;name&quot;:&quot;freda&quot;,&quot;email&quot;:&quot;freda@inverness.example.org&quot;}'
# If successful you should see OK, otherwise an error message

# Read a JSON document
bin/dataset mystuff.ds read freda

# Path to JSON document
bin/dataset mystuff.ds path freda

# Update a JSON document
bin/dataset mystuff.ds update freda '{&quot;name&quot;:&quot;freda&quot;,&quot;email&quot;:&quot;freda@zbs.example.org&quot;, &quot;count&quot;: 2}'
# If successful you should see OK, otherwise an error message

# List the keys in the collection
bin/dataset mystuff.ds keys

# Get keys filtered for the name &quot;freda&quot;
bin/dataset mystuff.ds keys '(eq .name &quot;freda&quot;)'

# Join freda-profile.json with &quot;freda&quot;, adding only the key/value pairs &quot;freda&quot; does not already have
bin/dataset mystuff.ds join append freda freda-profile.json

# Join freda-profile.json with &quot;freda&quot;, overwriting values for keys they share and adding
# the remaining key/value pairs from freda-profile.json
bin/dataset mystuff.ds join overwrite freda freda-profile.json

# Delete a JSON document
bin/dataset mystuff.ds delete freda

# Import data from a CSV file using column 1 as key
bin/dataset -quiet -nl=false mystuff.ds import-csv my-data.csv 1

# To remove the collection just use the Unix shell command
rm -fR mystuff.ds
</code></pre>
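<p>The <em>import-csv</em> step above turns each CSV row into a JSON document keyed by a chosen column. A hypothetical Go sketch of that conversion follows; it assumes the first row supplies the attribute names and that every value is stored as a string, neither of which is confirmed here, and <code>rowsToDocs</code> is not part of dataset's API.</p>

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// rowsToDocs converts CSV text into JSON documents keyed by the value in
// column keyCol (1-based, like "column 1" in the import-csv example).
// ASSUMPTION: the first row supplies attribute names. Hypothetical sketch,
// not dataset's import-csv implementation.
func rowsToDocs(data string, keyCol int) (map[string]string, error) {
	// ReadAll rejects rows whose field count differs from the first row's,
	// so every row below has exactly len(header) fields.
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	docs := map[string]string{}
	if len(rows) == 0 {
		return docs, nil
	}
	header := rows[0]
	if keyCol <= 0 || keyCol > len(header) {
		return nil, fmt.Errorf("key column %d out of range 1..%d", keyCol, len(header))
	}
	for _, row := range rows[1:] {
		obj := map[string]string{}
		for i, name := range header {
			obj[name] = row[i]
		}
		src, err := json.Marshal(obj)
		if err != nil {
			return nil, err
		}
		docs[row[keyCol-1]] = string(src) // a later duplicate key replaces an earlier one
	}
	return docs, nil
}

func main() {
	data := "id,name,email\nfreda,Freda,freda@zbs.example.org\nmojo,Mojo,mojo@zbs.example.org\n"
	docs, err := rowsToDocs(data, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(docs["freda"]) // {"email":"freda@zbs.example.org","id":"freda","name":"Freda"}
}
```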

<h2>Releases</h2>

<p>Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7).
See <a href="https://github.com/caltechlibrary/dataset/releases">https://github.com/caltechlibrary/dataset/releases</a>.</p>

</section>

<footer>
<span><h1><a href="https://caltech.edu">Caltech</a></h1></span>
<span>&copy; 2017 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
<span><a href="mailto:library@caltech.edu">Email Us</a></span>
<a class="cl-hide" href="sitemap.xml">Site Map</a>
</footer>
</body>
</html>
