Improvements to docs
jfischer committed Mar 27, 2019
1 parent 78f79e7 commit 4bfa410
Showing 11 changed files with 75 additions and 24 deletions.
5 changes: 3 additions & 2 deletions dataworkspaces/kits/jupyter.py
@@ -1,5 +1,6 @@
"""
Integration with Jupyter notebooks
Integration with Jupyter notebooks. This module provides a
:class:`~LineageBuilder` subclass to simplify Lineage for Notebooks.
"""

import ipykernel
@@ -57,7 +58,7 @@ def is_notebook():

class NotebookLineageBuilder(LineageBuilder):
"""Notebooks are the final step in a pipeline
(and potentially the only step). We customizer
(and potentially the only step). We customize
the standard lineage builder to get the step
name from the notebook's name and to always have
a results directory.
1 change: 1 addition & 0 deletions dataworkspaces/lineage.py
@@ -33,6 +33,7 @@
=>| |=>
------------
To do this, we need to use the following classes:
* :class:`~ResourceRef` - A reference to a resource for use as a step input or output.
Binary file modified docs/_picture_src/pictures.pptx
Binary file not shown.
Binary file added docs/_static/collaboration-workflow.png
Binary file added docs/_static/initial-workflow.png
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -46,8 +46,9 @@ Data Workspaces lets you:
intro
tutorial
commands
resources
lineage
kits
resources
internals


2 changes: 1 addition & 1 deletion docs/internals.rst
@@ -1,6 +1,6 @@
.. _internals:

6. Internals: Developer's Guide
7. Internals: Developer's Guide
===============================
This section is a guide for people working on the development of Data Workspaces
or people who wish to extend it (e.g. through their own resource types or
55 changes: 37 additions & 18 deletions docs/intro.rst
@@ -211,13 +211,13 @@ To run the command line interface, you use the ``dws`` command,
which should have been installed into your environment by ``pip install``.
``dws`` operations have the form::

dws [GLOBAL_OPTIONS] SUBCOMMAND [SUBCOMMAND_OPTIONS] [SUBCOMMAND_ARGS]
dws [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] [COMMAND_ARGS]

Just run ``dws --help`` for a list of global options and subcommands.
Just run ``dws --help`` for a list of global options and commands.

Subcommands
~~~~~~~~~~~
Here is a summary of the key subcommands:
Commands
~~~~~~~~
Here is a summary of the key commands:

* ``init`` - initialize a new workspace in the current directory
* ``add`` - add a *resource* (a git repo, a directory, an s3 bucket, etc.)
@@ -230,30 +230,49 @@ Here is a summary of the key subcommands:
* ``run`` - run a command and capture the lineage. This information is saved in a file for
future calls to the same command. *(not yet implemented)*

See the :ref:`Command Reference <commands>` section for a full description of
all commands and their options.

Workflow
~~~~~~~~
To put these subcommands in context, here is a typical workflow for a project:
To put these commands in context, here is a typical workflow for the
initial data scientist on a project:

.. image:: _static/dws-workflow.png
.. image:: _static/initial-workflow.png

The person starting the project creates a new workspace on their local machine
using the ``init`` command. If they picked a standard project template, they may
already have all the resources they need defined Otherwise, they use the ``add``
command to tell the data workspace about their code, data sets, and places where
they will store intermediate data and results. They can now start running their
experiments. Once they have finished a complete experiment, then can use the
using the ``init`` command. Next, they need to tell the data workspace about
their code, data sets, and places where they will store intermediate data and
results. If subdirectories of the main workspace are sufficient, they
can do this as a part of the ``init`` command, using the ``--create-resources``
option. Otherwise, they use the ``add``
command to define each *resource* associated with their project.

The data scientist can now run their experiments. This is typically an
iterative process, represented in the picture by the dashed box labeled
"Experiment Workflow". Once they have finished a complete experiment, they can use the
``snapshot`` command to capture the state of their workspace.

They can go back and run further experiments, taking a snapshot each time they
have something interesting. They can also go back to a prior state using the
``restore`` command.

At some point, the original developer will want to copy their project to a remote
service for sharing (and backup). To do this, they create an empty git repository
on the remote origin (e.g. GitHub) and then run the ``push`` command to update
Collaboration
.............
At some point, the data scientist will want to copy their project to a remote
service for sharing (and backup). Data Workspaces can use any Git hosting
service for this (e.g. GitHub, GitLab, or BitBucket) and does not need any
special setup. Here is an overview of collaborations
facilitated by Data Workspaces:

.. image:: _static/collaboration-workflow.png

First, the data scientist creates an empty git repository
on the remote origin (e.g. GitHub, GitLab, or BitBucket) and then runs the ``push`` command to update
the origin with the full history of the workspace.

A new collaborator can use the ``clone`` command to copy the workspace down to
their local machine. They can then run experiments and take snapshots, just
like the original developer. Then can download changes from the origin via
the ``pull`` comand and add upload their changes via the ``push`` command.
like the original data scientist. When ready, they can upload their changes to the origin via the ``push`` command.
Others can then use the ``pull`` command to download these changes to their workspace.
This process can be repeated as many times as necessary, and multiple collaborators can overlap
their work.
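
The command form and workflow order described in ``docs/intro.rst`` can be sketched as a small helper that only composes argument lists and never runs ``dws``. This is an illustrative sketch, not part of the commit: the helper name ``build_dws_argv``, the value passed to ``--create-resources``, and the ``<REPOSITORY_URL>`` argument to ``clone`` are assumptions, and real invocations may need further options or arguments (check ``dws --help``).

```python
from typing import List, Sequence


def build_dws_argv(
    command: str,
    global_options: Sequence[str] = (),
    command_options: Sequence[str] = (),
    command_args: Sequence[str] = (),
) -> List[str]:
    """Compose: dws [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] [COMMAND_ARGS]."""
    return ["dws", *global_options, command, *command_options, *command_args]


# Initial data scientist: create the workspace, snapshot an experiment, share it.
# The "code,results" value is a hypothetical placeholder for --create-resources.
initial_workflow = [
    build_dws_argv("init", command_options=["--create-resources", "code,results"]),
    build_dws_argv("snapshot"),
    build_dws_argv("push"),
]

# Collaborator: copy the workspace down, snapshot their own work, upload it.
# "<REPOSITORY_URL>" is a placeholder; the exact clone arguments are assumed.
collaborator_workflow = [
    build_dws_argv("clone", command_args=["<REPOSITORY_URL>"]),
    build_dws_argv("snapshot"),
    build_dws_argv("push"),
]

for argv in initial_workflow + collaborator_workflow:
    print(" ".join(argv))
```

Keeping each command as a list rather than a single string makes it straightforward to hand the result to ``subprocess.run`` later without shell quoting concerns.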
29 changes: 29 additions & 0 deletions docs/kits.rst
@@ -0,0 +1,29 @@
.. _kits:

5. Kits Reference
=================
In this section, we cover *kits*, integrations with various data science
libraries and infrastructure provided by Data Workspaces.

Jupyter
-------
.. automodule:: dataworkspaces.kits.jupyter
:no-undoc-members:

.. autoclass:: NotebookLineageBuilder
:members:

Scikit-learn
------------

.. automodule:: dataworkspaces.kits.sklearn
:no-undoc-members:

.. autoclass:: Metrics
:members:

.. autoclass:: BinaryClassificationMetrics
:members:

.. autoclass:: MulticlassClassificationMetrics
:members:
2 changes: 1 addition & 1 deletion docs/lineage.rst
@@ -1,6 +1,6 @@
.. _lineage:

5. Lineage API
4. Lineage API
==============
The Lineage API is provided by the module ``dataworkspaces.lineage``.

2 changes: 1 addition & 1 deletion docs/resources.rst
@@ -1,6 +1,6 @@
.. _resources:

4. Resource Reference
6. Resource Reference
=====================
This section provides a little detail on how to use specific
resource types.
