Merge pull request #214 from datmo/docs-additions
Docs: static content additions
asampat3090 committed Jun 28, 2018
2 parents 8086e27 + 5ee3bd1 commit 844a198
Showing 7 changed files with 244 additions and 60 deletions.
47 changes: 47 additions & 0 deletions docs/concepts.rst
@@ -0,0 +1,47 @@
Datmo Concepts
===================================

Environments
-------------

**Environments** contain the hardware and software necessary for running code. They cover everything from programming languages, language-level packages/libraries, and operating systems to GPU drivers. Users can store multiple environments and choose which one to use at the time of a task run.


Workspaces
------------

**Workspaces** are interactive programming environments/IDEs. Depending on which environment is chosen during setup, a handful of workspaces are available out of the box, including:

- Jupyter Notebook via ``$ datmo notebook``
- RStudio via ``$ datmo rstudio``
- JupyterLab *(coming soon)*


Runs
--------------

A **run** consists of *tasks* and *snapshots*. Each run contains the initial state (a snapshot), followed by the action that was performed on it (a task), and the final state of the repository (another snapshot).


Tasks
---------

**Tasks** are loggable command line actions a user takes within a project. For example, ``python train.py`` and ``python predict.py`` are both tasks.
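
As a rough illustration of what "loggable" could mean, the sketch below runs a command and records its command line, exit code, and captured output. It uses only the Python standard library; ``TaskRecord`` and ``run_task`` are hypothetical names made up for this example and are not part of Datmo's API.

.. code:: python

    import subprocess
    import sys
    from dataclasses import dataclass
    from typing import List


    @dataclass
    class TaskRecord:
        """Hypothetical record of one command-line action (not a Datmo class)."""
        command: List[str]
        return_code: int
        output: str


    def run_task(command: List[str]) -> TaskRecord:
        """Run a command and keep the details one would want to log."""
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            universal_newlines=True,   # decode the output to str
        )
        return TaskRecord(command, completed.returncode, completed.stdout)


    if __name__ == "__main__":
        # Stand-in for something like ``python train.py``
        record = run_task([sys.executable, "-c", "print('training done')"])
        print(record.return_code, record.output.strip())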


Snapshots
-------------

For recording state, we have our own fundamental unit called a **Snapshot**. This gives the user a single point of reference for the model version, rather than having to track each component individually. Snapshots contain five components, all of which are logged together at the time of Snapshot creation.

- **Source code** is managed between snapshot versions automagically inside of a hidden ``.datmo`` folder that the user never has to interact with. Users can


- **Environment** information (dependencies, packages, libraries, system env) is stored in environment files (typically Dockerfiles) for containerized task running and reproducibility on other systems. Datmo also currently autogenerates a ``requirements.txt`` file based on the packages imported by Python scripts in the repository.


- **Files** include visualizations, model weights files, datasets, and any other files present at the time of snapshot creation. When versioning models, we recommend storing large datasets or weights files as pointers to external sources in the *config* property.


- **Configurations** are properties that alter your experiments (such as variable hyperparameters). Configurations are user-defined and can include (but are not limited to) algorithm type, framework, hyperparameters, external file locations, database queries, and more.

- **Metrics** are the values that help you assess your model (e.g. validation accuracy, training time, loss function score). These can be passed in from a memory-level variable/object in the Python SDK, or manually as a file or value via the CLI for all other languages.
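
To make the five components concrete, here is a minimal sketch of how they might be grouped into one record. ``SnapshotSketch`` and its field names are assumptions made for illustration and do not reflect Datmo's internal classes; all values are placeholders.

.. code:: python

    from dataclasses import dataclass, field
    from typing import Dict, List


    @dataclass(frozen=True)
    class SnapshotSketch:
        """Hypothetical grouping of the five components (not Datmo's own class)."""
        code_version: str             # identifier for the source code state
        environment_files: List[str]  # e.g. Dockerfile, requirements.txt
        files: List[str]              # weights, visualizations, small datasets
        config: Dict[str, object] = field(default_factory=dict)  # hyperparameters, pointers
        metrics: Dict[str, float] = field(default_factory=dict)  # evaluation results


    # Placeholder values only; a large dataset is referenced by pointer in config.
    snapshot = SnapshotSketch(
        code_version="abc123",
        environment_files=["Dockerfile", "requirements.txt"],
        files=["model.pkl"],
        config={"algorithm": "svm", "data_url": "s3://example-bucket/train.csv"},
        metrics={"validation_accuracy": 0.91},
    )
    print(snapshot.config["algorithm"], snapshot.metrics["validation_accuracy"])

Keeping all five fields on one record mirrors the idea of a single point of reference for a model version.
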
37 changes: 0 additions & 37 deletions docs/examples.rst

This file was deleted.

30 changes: 30 additions & 0 deletions docs/faq.rst
@@ -0,0 +1,30 @@
Frequently Asked Questions
===================================

Q: What is the role of the datmo open source tool?

A: The open source project acts as a user-controlled project manager (available as both a CLI and Python SDK) that enables users to create, run, manage, and record all aspects of their experiments.

-----

Q: Do I have to know how to use Docker to use Datmo?

A: Not at all! If you do know Docker, however, it will help you better understand how the environments are created and set up.

------

Q: How can I add my own environments to be used with Datmo?

A: The ``environment setup`` command adds a default environment provided by Datmo to the ``datmo_environment`` directory. You can add your own environment by modifying these files or by adding your own files to the ``datmo_environment`` directory (e.g., Dockerfile, requirements.txt, package.json). You can then run ``datmo environment create`` and pass the resulting environment ID when you run a task or workspace. Alternatively, you can run a task or workspace directly: Datmo will create a new environment from ``datmo_environment`` and set the most recently set up environment as the default for running tasks.

------

Q: How does Datmo handle all of my different environments?

A: The default environment used for running tasks at any given time is determined by the Dockerfile present in the ``datmo_environment`` directory. The other environments available locally for your project can be listed with ``$ datmo environment ls`` and selected by passing the environment ID as a parameter at the time of a task run or workspace creation.

-----

Q: I've made changes to the Dockerfile in my project, but the container environment isn't changing. Why is this?

A: When running a task, Datmo always looks first inside the ``datmo_environment`` directory. If an environment is not present there, it will then use a Dockerfile from the project's root directory (if present). However, after the first run, Datmo creates an environment entity and Dockerfile that are replicas of the one used at the time of the initial run. Because of the priority of environment directories, Datmo will utilize the Dockerfile from the ``datmo_environment`` for subsequent runs, which means that changes to the original Dockerfile outside of ``datmo_environment`` will not appear in the environment Datmo has created/tracked. If you would like to change the environment, you can change it in the ``datmo_environment`` folder.
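
The lookup order described in this answer can be sketched in a few lines of Python. This is only an illustration of the priority rule, not Datmo's actual implementation; ``resolve_dockerfile`` is a hypothetical helper name.

.. code:: python

    from pathlib import Path
    from typing import Optional


    def resolve_dockerfile(project_root: Path) -> Optional[Path]:
        """Return the Dockerfile that wins under the priority order above."""
        candidates = [
            project_root / "datmo_environment" / "Dockerfile",  # checked first
            project_root / "Dockerfile",                        # fallback
        ]
        for candidate in candidates:
            if candidate.is_file():
                return candidate
        return None  # no Dockerfile in either location

Once a Dockerfile exists inside ``datmo_environment``, the first candidate always matches, which is why later edits to the root-level Dockerfile have no effect.
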
37 changes: 14 additions & 23 deletions docs/index.rst
@@ -1,38 +1,29 @@
Welcome to Datmo's documentation!
=================================
===================================

Datmo is an open source model tracking tool for developers
Datmo is an open source model tracking and reproducibility tool for developers.

Why we built this
-----------------

As data scientists, machine learning engineers, and deep learning engineers, we faced a number of issues keeping track of our work and maintaining versions that could be put into production more quickly.

To solve this challenge, we identified a few components that are critical:

1) Source code should be managed with current source control management tools (of which git is the most popular currently)
2) Dependencies should be encoded in one place for your source code (e.g. requirements.txt in python and pre-built containers)
3) Large files that cannot be stored in source code like weights files, data files, etc should be stored separately
4) Configurations and hyperparameters that define your experiments (e.g. data split, alpha, beta, etc)
5) Performance metrics that evaluate your model (e.g. validation accuracy)

We've encapsulated these concepts in an object called a *snapshot*. A snapshot is a combination of all 5 of the above components
and is the way that Datmo versions models for reproducibility and deployability. Our open source tool is an interface for
developers to transform their current model projects into trackable models that can be used for transportability throughout the
model building process.

We have used this internally to speed up our own iteration processes and are excited to share it with the community to continue
improving. If you're interested in contributing check out `the guidelines <https://github.com/datmo/datmo/blob/master/CONTRIBUTING.md>`_.
Features
-------------
- One command environment setup (languages, frameworks, packages, etc)
- Tracking and logging for model config and results
- Project versioning (model state tracking)
- Experiment reproducibility (re-run tasks)
- Visualize + export experiment history


Table of contents
-----------------
.. toctree::
   :maxdepth: 3

   quickstart
   workflows
   tutorials
   concepts
   cli
   python_sdk
   examples
   faq


Indices and tables
59 changes: 59 additions & 0 deletions docs/quickstart.rst
@@ -0,0 +1,59 @@
Quickstart
===================================

Spinning up a TensorFlow Jupyter Notebook
--------------------------------------------

1. Install datmo using pip:

``$ pip install datmo``

2. Navigate to a new folder for your project and run:

``$ datmo init``

3. Create a name and description. When prompted for a desired environment, type:

``tensorflow:cpu``

4. Open a Jupyter notebook automatically with:

``$ datmo notebook``

Congrats, you now have a functional Jupyter notebook with TensorFlow!


--------

Testing it out
------------------------

1. Navigate to the notebook by typing the following into your browser:

``localhost:8888/?token=UNIQUE_TOKEN_FROM_TERMINAL``

2. Click

``New --> Notebook: Python2``

3. In the first cell, paste in and run:

.. code:: python

    import tensorflow as tf

4. In the second cell, paste in and run:

.. code:: python

    # Define a constant
    hello = tf.constant('Hello, TensorFlow!')
    # Start a TensorFlow session (tf.Session is the TensorFlow 1.x API)
    sess = tf.Session()
    # Run the op
    print(sess.run(hello))

If your output is ``Hello, TensorFlow!``, you're good to go!
21 changes: 21 additions & 0 deletions docs/tutorials.rst
@@ -0,0 +1,21 @@
Tutorials
=================================

We keep a curated list of formally supported tutorials in a secondary repository located `here <https://github.com/datmo/datmo-tutorials>`_.

For ease of access, we've included them here in the documentation as well.


+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
| Python Tutorials |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
| Project | Tags | Datmo Features Used |
+==============================================================================================================================================================+=============================+===========================================================================+
| Kaggle Titanic Survivor Prediction (`CLI `_ / `SDK in Jupyter Notebook <https://github.com/datmo/datmo-tutorials/tree/master/kaggle-titanic/sdk>`_) | AutoML, TPOT, SVM | ``notebook``, ``snapshot create``, ``snapshot ls`` |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
| Face Recognition (`CLI in Jupyter Notebook <https://github.com/datmo/datmo-tutorials/tree/master/face-recognition>`_) | CV, dlib, face_recognition | ``notebook``, ``snapshot create``, ``snapshot ls`` |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
| Keras Fashion MNIST (`CLI in Jupyter Notebook <https://github.com/datmo/datmo-tutorials/tree/master/keras-fashion-mnist>`_) | CV, keras, tensorflow | ``notebook``, ``snapshot create``, ``snapshot ls`` |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
| Kaggle Jigsaw Toxic Comment Identification (`CLI in Jupyter Notebook <https://github.com/datmo/datmo-tutorials/tree/master/toxic-comment-identification>`_) | NLP, capsule net, Keras | ``notebook``, ``snapshot create``, ``snapshot ls``, ``environment setup`` |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------------+
