prep for release

caltechlibrary · Mar 16, 2018 · 15e66fd
1 parent b00ad22

Showing 5 changed files with 449 additions and 3 deletions.
diff --git a/codemeta.json b/codemeta.json
@@ -6,7 +6,7 @@
     "codeRepository": "https://github.com/caltechlibrary/dataset",
     "issueTracker": "https://github.com/caltechlibrary/dataset/issues",
     "license": "https://data.caltech.edu/license",
-    "version": "0.0.36",
+    "version": "0.0.37",
     "author": [
         {
             "@type": "Person",
@@ -26,7 +26,7 @@
         }
     ],
     "developmentStatus": "active",
-    "downloadUrl": "https://github.com/caltechlibrary/dataset/archive/v0.0.36.zip",
+    "downloadUrl": "https://github.com/caltechlibrary/dataset/archive/v0.0.37.zip",
     "keywords": [
         "GitHub",
         "metadata",

diff --git a/dataset.go b/dataset.go
@@ -43,7 +43,7 @@ import (
 
 const (
 	// Version of the dataset package
-	Version = `v0.0.36`
+	Version = `v0.0.37`
 
 	// License is a formatted form for dataset package based command line tools
 	License = `

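This commit bumps the release version in places that must stay in sync: `version` and `downloadUrl` in codemeta.json, and the `Version` constant in dataset.go. Below is a minimal sketch of a pre-release consistency check. The `checkRelease` helper and `codeMeta` struct are hypothetical (not part of the dataset package), and the sketch assumes, as in this diff, that the Go constant carries a leading `v` while codemeta.json does not.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// codeMeta holds only the codemeta.json fields this check reads.
type codeMeta struct {
	Version     string `json:"version"`
	DownloadURL string `json:"downloadUrl"`
}

// checkRelease (hypothetical helper) returns an error if the codemeta.json
// content disagrees with goVersion, a Go constant of the form "vX.Y.Z".
func checkRelease(codemetaJSON []byte, goVersion string) error {
	var meta codeMeta
	if err := json.Unmarshal(codemetaJSON, &meta); err != nil {
		return fmt.Errorf("parsing codemeta.json: %w", err)
	}
	// codemeta.json stores "0.0.37" while the Go constant stores "v0.0.37".
	if "v"+meta.Version != goVersion {
		return fmt.Errorf("version mismatch: codemeta.json has %q, Version is %q", meta.Version, goVersion)
	}
	// The release archive URL must point at the matching tag.
	wantSuffix := "/archive/" + goVersion + ".zip"
	if !strings.HasSuffix(meta.DownloadURL, wantSuffix) {
		return fmt.Errorf("downloadUrl %q does not end with %q", meta.DownloadURL, wantSuffix)
	}
	return nil
}

func main() {
	// Values from the "+" side of the codemeta.json diff in this commit.
	sample := []byte(`{
	"version": "0.0.37",
	"downloadUrl": "https://github.com/caltechlibrary/dataset/archive/v0.0.37.zip"
}`)
	if err := checkRelease(sample, "v0.0.37"); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("OK") // prints "OK" for the sample above
}
```

In a real release script the JSON would be read from codemeta.json (for example with `os.ReadFile`) and the version taken from the package's `Version` constant; the sample inlines both values so it runs standalone.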
diff --git a/index.html b/index.html
@@ -0,0 +1,181 @@
+<!DOCTYPE html>
+<html>
+<head>
+    <title>Caltech Library's Digital Library Development Sandbox</title>
+    <link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
+    <link rel="stylesheet" href="/css/site.css">
+</head>
+<body>
+<header>
+<a href="https://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
+</header>
+<nav>
+<ul>
+<li><a href="/">Home</a></li>
+<li><a href="../">README</a></li>
+<li><a href="license.html">LICENSE</a></li>
+<li><a href="install.html">INSTALL</a></li>
+<li><a href="docs/">Documentation</a></li>
+<li><a href="how-to/">How To</a></li>
+<li><a href="https://github.com/caltechlibrary/dataset">GitHub</a></li>
+</ul>
+
+</nav>
+
+<section>
+<h1>dataset   <a href="https://data.caltech.edu/badge/latestdoi/79394591"><img src="https://data.caltech.edu/badge/79394591.svg" alt="DOI" /></a></h1>
+
+<p><em>dataset</em> is a command line tool for working with JSON (object) documents stored as
+collections. <a href="docs/dataset/">The tool</a> supports basic storage actions (e.g. CRUD operations, filtering
+and extraction) as well as <a href="docs/dataset/indexer.html">indexing</a> and <a href="docs/dataset/find.html">searching</a>.
+A project goal of <em>dataset</em> is to &ldquo;play nice&rdquo; with shell scripts and other
+Unix tools (e.g. it respects standard input, output and error with minimal side effects). This means it is
+easily scriptable via Bash, POSIX shell or interpreted languages like R.</p>
+
+<p><em>dataset</em> includes an implementation as a Python3 module. The same functionality as in the command line tool is
+replicated for Python3 (the module requires Python 3.6 or later).</p>
+
+<p>Finally, <em>dataset</em> is a Go package for managing JSON documents and their attachments on disc or in cloud storage
+(e.g. Amazon S3, Google Cloud Storage). The command line utilities exercise this package extensively.</p>
+
+<p>The inspiration for creating <em>dataset</em> was the desire to process metadata as JSON document collections using
+Unix shell utilities and pipelines. While it has grown in capabilities, that remains a core use case.</p>
+
+<p><em>dataset</em> organizes JSON documents by unique names (keys) in collections. Collections are represented
+as an index into a series of buckets. The buckets are subdirectories (or paths under cloud storage services).
+Buckets hold individual JSON documents and their attachments. When a JSON document is added to a collection it is
+assigned automatically to a bucket (and the bucket is created if necessary).
+Assigning documents to buckets avoids having too many documents under a single path (e.g. some Unix
+systems limit how many files a single directory can hold). In addition to using the <em>dataset</em>
+command you can list and manipulate the JSON documents directly with common Unix commands like ls, find or grep, or
+their cloud counterparts.</p>
+
+<p>See <a href="docs/getting-started-with-dataset.html">getting-started-with-dataset.md</a> for a tour of functionality.</p>
+
+<h3>Limitations of <em>dataset</em></h3>
+
+<p><em>dataset</em> has many limitations; some are listed below:</p>
+
+<ul>
+<li>it is not a multi-process, multi-user data store (it&rsquo;s just files on disc)</li>
+<li>it is not a repository management system</li>
+<li>it is not a general purpose multiuser database system</li>
+</ul>
+
+<h2>Operations</h2>
+
+<p>The basic operations supported by <em>dataset</em> are listed below, organized by collection and JSON document level.</p>
+
+<h3>Collection Level</h3>
+
+<ul>
+<li><a href="docs/dataset/init.html">init</a> creates a collection</li>
+<li><a href="docs/dataset/import-csv.html">import-csv</a> imports JSON documents from rows of a CSV file</li>
+<li><a href="docs/dataset/import-gsheet.html">import-gsheet</a> imports JSON documents from rows of a Google Sheet</li>
+<li><a href="docs/dataset/export-csv.html">export-csv</a> exports JSON documents from a collection into a CSV file</li>
+<li><a href="docs/dataset/export-gsheet.html">export-gsheet</a> exports JSON documents from a collection into a Google Sheet</li>
+<li><a href="docs/dataset/keys.html">keys</a> lists keys of JSON documents in a collection, supports filtering and sorting</li>
+<li><a href="docs/dataset/haskey.html">haskey</a> returns true if key is found in collection, false otherwise</li>
+<li><a href="docs/dataset/count.html">count</a> returns the number of documents in a collection, supports filtering for subsets</li>
+<li><a href="docs/dataset/extract.html">extract</a> unique JSON attribute values from a collection</li>
+</ul>
+
+<h3>JSON Document level</h3>
+
+<ul>
+<li><a href="docs/dataset/create.html">create</a> a JSON document in a collection</li>
+<li><a href="docs/dataset/read.html">read</a> back a JSON document in a collection</li>
+<li><a href="docs/dataset/update.html">update</a> a JSON document in a collection</li>
+<li><a href="docs/dataset/delete.html">delete</a> a JSON document in a collection</li>
+<li><a href="docs/dataset/join.html">join</a> a JSON document with a document in a collection</li>
+<li><a href="docs/dataset/list.html">list</a> lists JSON records as an array for the supplied keys</li>
+<li><a href="docs/dataset/path.html">path</a> returns the file path for a JSON document in a collection</li>
+</ul>
+
+<h3>JSON Document Attachments</h3>
+
+<ul>
+<li><a href="docs/dataset/attach.html">attach</a> a file to a JSON document in a collection</li>
+<li><a href="docs/dataset/attachments.html">attachments</a> lists the files attached to a JSON document in a collection</li>
+<li><a href="docs/dataset/detach.html">detach</a> retrieves an attached file associated with a JSON document in a collection</li>
+<li><a href="docs/dataset/prune.html">prune</a> deletes one or more attached files of a JSON document in a collection</li>
+</ul>
+
+<h3>Search</h3>
+
+<ul>
+<li><a href="docs/dataset/indexer.html">indexer</a> indexes JSON documents in a collection for searching with <em>find</em></li>
+<li><a href="docs/dataset/deindexer.html">deindexer</a> de-indexes (removes) JSON documents from an index</li>
+<li><a href="docs/dataset/find.html">find</a> provides an index-based full text search interface for collections</li>
+</ul>
+
+<h2>Example</h2>
+
+<p>Common operations using the <em>dataset</em> command line tool</p>
+
+<ul>
+<li>create collection</li>
+<li>create a JSON document in a collection</li>
+<li>read a JSON document</li>
+<li>update a JSON document</li>
+<li>delete a JSON document</li>
+</ul>
+
+<pre><code class="language-shell">    # Create a collection &quot;mystuff.ds&quot;; the &quot;.ds&quot; extension lets the bin/dataset command know that's the collection to use.
+    bin/dataset mystuff.ds init
+    # If successful then you should see an OK, otherwise an error message
+
+    # Create a JSON document
+    bin/dataset mystuff.ds create freda '{&quot;name&quot;:&quot;freda&quot;,&quot;email&quot;:&quot;freda@inverness.example.org&quot;}'
+    # If successful then you should see an OK, otherwise an error message
+
+    # Read a JSON document
+    bin/dataset mystuff.ds read freda
+
+    # Path to JSON document
+    bin/dataset mystuff.ds path freda
+
+    # Update a JSON document
+    bin/dataset mystuff.ds update freda '{&quot;name&quot;:&quot;freda&quot;,&quot;email&quot;:&quot;freda@zbs.example.org&quot;, &quot;count&quot;: 2}'
+    # If successful then you should see an OK, otherwise an error message
+
+    # List the keys in the collection
+    bin/dataset mystuff.ds keys
+
+    # Get keys filtered for the name &quot;freda&quot;
+    bin/dataset mystuff.ds keys '(eq .name &quot;freda&quot;)'
+
+    # Join freda-profile.json with &quot;freda&quot; adding unique key/value pairs
+    bin/dataset mystuff.ds join append freda freda-profile.json
+
+    # Join freda-profile.json with &quot;freda&quot;, overwriting common key/value pairs
+    # and adding unique key/value pairs from freda-profile.json
+    bin/dataset mystuff.ds join overwrite freda freda-profile.json
+
+    # Delete a JSON document
+    bin/dataset mystuff.ds delete freda
+
+    # Import data from a CSV file using column 1 as key
+    bin/dataset -quiet -nl=false mystuff.ds import-csv my-data.csv 1
+
+    # To remove the collection just use the Unix shell command
+    rm -fR mystuff.ds
+</code></pre>
+
+<h2>Releases</h2>
+
+<p>Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7).
+See <a href="https://github.com/caltechlibrary/dataset/releases">https://github.com/caltechlibrary/dataset/releases</a>.</p>
+
+</section>
+
+<footer>
+<span><h1><a href="https://caltech.edu">Caltech</a></h1></span>
+<span>&copy; 2017 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
+<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
+<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
+<span><a href="mailto:library@caltech.edu">Email Us</a></span>
+<a class="cl-hide" href="sitemap.xml">Site Map</a>
+</footer>
+</body>
+</html>