Merge pull request #120 from datmo/docs-update
Docs update for milestone 0.1.0
asampat3090 committed May 11, 2018
2 parents e20ddb7 + ebfc440 commit c2d61c3
Showing 27 changed files with 721 additions and 254 deletions.
38 changes: 23 additions & 15 deletions CONTRIBUTING.md
@@ -1,6 +1,23 @@
# Datmo

Guidelines for contributors coming soon.
## Contributing to Datmo
We encourage community contributions to Datmo. The best way to get started is to run the examples
locally. Once you have them working, a good next step is improving our documentation. While
our documentation is updated alongside the code, there is always room for improvement.
Documentation can be found in the `docs/` directory, in the docstrings for functions in the code, and
in the `README` file.

## Pull Request Process
Every contribution must be a pull request and must have adequate time for review by other committers.

The goal of every pull request is to merge it into the main master code branch. New pull requests
are reviewed by the main committers / maintainers of the repository. Here are a few ways
a pull request might progress:

1) The PR is not mergeable. Either the maintainer or the person who created the branch should mention that they are addressing it and then work on it. If this is not the case, move on to 2.
2) The PR is mergeable but requires changes that need to be made by the person who created the branch / feature. We comment and keep the thread going; the PR may then progress to either 3 or 4.
3) The PR is mergeable and is good to go.
4) The PR is mergeable and almost good to go, but requires quick changes that can be made by the maintainer or by other contributors to the main repo to make it consistent with existing code. In this case we close the current PR, bring the branch into the main repo, make the changes there, and then merge.
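
The four flows above can be read as a small decision procedure. The sketch below is only an illustration of that reading; the function name `next_step` and the labels `"none"`, `"author"`, and `"quick"` are hypothetical and not part of any Datmo tooling.

```python
def next_step(mergeable, changes_needed="none"):
    """Map the state of a pull request to one of the four flows.

    mergeable      -- True if the PR can be merged without conflicts
    changes_needed -- hypothetical label: "none", "author" (changes only the
                      branch author can make), or "quick" (small changes a
                      maintainer can make)
    """
    if not mergeable:
        return "1: someone announces they are addressing it and fixes the PR"
    if changes_needed == "author":
        return "2: comment on the PR and keep the thread going"
    if changes_needed == "none":
        return "3: merge"
    if changes_needed == "quick":
        return "4: close the PR, bring the branch into the main repo, fix, merge"
    raise ValueError("unknown changes_needed label: %r" % (changes_needed,))


print(next_step(mergeable=True, changes_needed="quick"))
```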

## Code Style Guidelines
Datmo uses [yapf](https://github.com/google/yapf) to autoformat code.
@@ -11,20 +28,11 @@ cd <git_root>
yapf -i <python_files changed>
```

If you're using Visual Studio Code and want to run `yapf -i [filename]` on save, install the [Run on Save](https://marketplace.visualstudio.com/items?itemName=emeraldwalk.RunOnSave) extension and add the following to your settings.

```json
"emeraldwalk.runonsave": {
    "commands": [
        {
            "match": "\\.py$",
            "cmd": "yapf -i ${file}"
        }
    ]
}
```

Our integration tests will fail if code is not formatted correctly.
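
To catch formatting problems before the integration tests do, one option is to ask yapf whether it would change any files. The sketch below is illustrative and not part of Datmo's tooling. It assumes `yapf` is on your `PATH` and relies on yapf's documented behavior of exiting with a non-zero status when `--diff` finds changes. The helper name `needs_formatting` and its injectable `run` parameter are hypothetical.

```python
import subprocess


def needs_formatting(files, run=subprocess.run):
    """Return True if `yapf --diff` reports that any file would change.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    files = list(files)
    if not files:
        return False  # nothing to check
    # With --diff, yapf prints the proposed changes instead of rewriting the
    # files, and exits non-zero when changes are needed (or on error).
    result = run(["yapf", "--diff"] + files,
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE)
    return result.returncode != 0
```

Calling `needs_formatting(["train.py"])` from the repository root would then tell you whether `yapf -i train.py` still needs to be run.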

## Documentation Style Guidelines
Datmo uses [NumPy style documentation](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). Please follow these conventions when documenting code, since we use [Sphinx+Napoleon](http://www.sphinx-doc.org/en/stable/ext/napoleon.html) to automatically generate docs on [our Docs page](http://datmo.readthedocs.io/en/latest/).
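
As a minimal illustration of the NumPy docstring layout, here is a small function documented in that style. The function itself (`accuracy`) is a made-up example and does not come from the Datmo codebase.

```python
def accuracy(y_true, y_pred):
    """Compute the fraction of predictions that match the true labels.

    Parameters
    ----------
    y_true : list of int
        Ground-truth labels.
    y_pred : list of int
        Predicted labels, in the same order as `y_true`.

    Returns
    -------
    float
        Fraction of positions where the two lists agree.

    Raises
    ------
    ValueError
        If the inputs are empty or differ in length.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / float(len(y_true))
```

Napoleon parses the underlined section headers (`Parameters`, `Returns`, `Raises`) when Sphinx builds the docs, so keeping them spelled and underlined exactly as shown matters.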

## Developer Information
You can find more developer information in the [`devtools/` directory](/devtools), including
code snippets and commands you might find helpful during development.
99 changes: 73 additions & 26 deletions README.md
@@ -1,34 +1,27 @@
# Datmo
# ![Datmo Logo](images/datmo-logo.png)
[![PyPI version](https://badge.fury.io/py/datmo.svg)](https://badge.fury.io/py/datmo)
[![Build Status](https://travis-ci.org/datmo/datmo.svg?branch=master)](https://travis-ci.org/datmo/datmo)
[![Build status](https://ci.appveyor.com/api/projects/status/5302d8a23qr4ui4y/branch/master?svg=true)](https://ci.appveyor.com/project/asampat3090/datmo/branch/master)
[![Coverage Status](https://coveralls.io/repos/github/datmo/datmo/badge.svg?branch=master)](https://coveralls.io/github/datmo/datmo?branch=master)
[![Documentation Status](https://readthedocs.org/projects/datmo/badge/?version=latest)](http://datmo.readthedocs.io/en/latest/?badge=latest)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/853b3d01b4424ac9aa72f9d5fead83b3)](https://www.codacy.com/app/datmo/datmo)

Open source model tracking tool for developers. Use `datmo init` to turn any repository into a supercharged experiment tracking
powerhouse.
**Datmo** is an open source model tracking and reproducibility tool for developers. Use `datmo init` to turn any repository into a trackable task record with reusable environments and metrics logging.

### Table of Contents
* [Introduction](#introduction)
* [Requirements](#requirements)
* [Installation](#installation)
* [Examples](#examples)
* [Documentation](#documentation)
* [Transform a Current Project](#transform-a-current-project)
* [Sharing](#sharing-workaround)
* [Contributing to Datmo](/CONTRIBUTING.md)
* [Testing](#testing)

## Introduction
Tracking experiments in a unified manner for data science, machine learning, and artificial intelligence projects is difficult for many reasons, with one of the largest being the lack of interoperability between frameworks, languages, environments, and best practices.

As data scientists, machine learning engineers, and deep learning engineers, we faced a number of issues keeping track of our work and maintaining versions that could be put into production more quickly.

To solve this challenge, we identified a few components that need to come together:

1) Source code should be managed with current source control management tools (of which git is currently the most popular)
2) Dependencies should be encoded in one place for your source code (e.g. requirements.txt in Python and pre-built containers)
3) Large files that cannot be stored in source code, like weights files, data files, etc., should be stored separately
4) Configurations and hyperparameters that define your experiments (e.g. data split, alpha, beta, etc.) should be recorded
5) Performance metrics that evaluate your model (e.g. validation accuracy) should be recorded

We realized that we likely won't come up with the best solution on our own and thought it would make most sense to gather feedback from a community of like-minded individuals facing the same issue and develop an open protocol everyone can benefit from.
Datmo's open source tool helps to alleviate some of the largest pain points of dealing with model-based projects by leveraging strong foundational technologies and enforcing a set of conventions in a framework, language, and platform-agnostic CLI, with additional SDKs for more granular control and workflow integration.

## Requirements

@@ -41,16 +34,7 @@ We realized that we likely won't come up with the best solution on our own and t
pip install datmo
```

## Project Structure
Datmo adds a `.datmo` directory to a repository, which keeps track of all of the various entities and makes the repository datmo-enabled.

## Project Templates
In the `/templates` folder we have templates for those who will be starting their projects from scratch.

Each folder includes a set of files that are not required by datmo but that augment your project and may be useful
as you start new projects.

## Project Examples
## Examples
In the `/examples` folder we have a few scripts you can run to get a feel for datmo. You can
navigate to [Examples](/examples/README.md) to learn more about how you can run the examples
and get started with your own projects.
@@ -65,6 +49,8 @@ Here's a comparison of a typical logistic regression model with one leveraging D
<tr>
<td class="tg-us36">
<pre lang="python">
# train.py
#
from sklearn import datasets
from sklearn import linear_model as lm
from sklearn import model_selection as ms
@@ -102,6 +88,8 @@ print(test_acc)
</pre></td>
<td class="tg-us36">
<pre lang="python">
# train.py
#
from sklearn import datasets
from sklearn import linear_model as lm
from sklearn import model_selection as ms
@@ -159,6 +147,65 @@ In order to run the above code you can do the following.
5. Congrats! You just created your first snapshot :) Now run an ls command for snapshots to see your first snapshot.

$ datmo snapshot ls

## How it works
### Project Structure
When running `datmo init`, Datmo adds a hidden `.datmo` directory which keeps track of all of the various entities at play. This is necessary to render a repository datmo-enabled.

### Snapshots

<p align="center">
The fundamental unit of record in the Datmo ecosystem is a <b>Snapshot</b>, which contains 5 first-class components.
<br><br>
<img size="250px" src="https://raw.githubusercontent.com/datmo/datmo/docs-update/images/snapshot-badge-readme.png">
</p>


#### Code
Source code should be managed with current source control management tools. Datmo currently is built on top of git, but could theoretically be ported to work with any similar SCM protocol. While datmo will track all of your local changes and experiments on your machine, you will still need to push changes to a remote repository for them to be continually synced with a manager of choice (like GitHub).

For sharing Datmo entities directly with others (beta), see [this section](#sharing-workaround) of the README below.

#### Environment
Dependencies should be encoded using standard best practices for your source code. Python packages should be enumerated in a `requirements.txt` file, while system level dependencies (typically found during GPU workflows) should be written into a `Dockerfile`.

#### Configuration
Configuration covers the variables used in your experiment that are necessary for reproducibility. These typically include algorithm hyperparameter values, the train/test data split, etc.
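
One way to keep such variables reproducible is to collect them in a single dictionary instead of scattering literals through a training script. The sketch below is not Datmo's API; the key names, values, and the helpers `save_config` and `load_config` are hypothetical and only show the idea of recording the configuration next to the run.

```python
import json

# Hypothetical experiment configuration, for illustration only.
config = {
    "solver": "liblinear",
    "test_size": 0.25,
    "random_state": 0,
}


def save_config(config, path):
    """Write the configuration to a JSON file with a stable key order."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)


def load_config(path):
    """Read a configuration previously written by save_config."""
    with open(path) as f:
        return json.load(f)
```

Reading the same file back before a re-run means the data split and hyperparameters match the original experiment.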

#### Files
Large files that cannot be stored in source code (i.e., untrackable in git due to size) should be stored separately. For data sources that are not discretizable into files (or are stored elsewhere), it is advised to write out the location/directory of these data sources/files as an entry in the `stats` property.

#### Stats
Model metrics are written to the `stats` property of a snapshot. Datmo does not enforce any type of formal metric definition; the user is free to pass any key-value dictionary during snapshot creation. This enables users to abide by their own metric logging convention while having the flexibility of being able to natively compare metrics across algorithms or frameworks.
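
To make the five components concrete, here is a minimal sketch of a record that groups them, plus a comparison over free-form `stats` dictionaries. This is not Datmo's internal data model or SDK; the class `SnapshotRecord`, the helper `best_by`, and all field values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SnapshotRecord:
    """Hypothetical stand-in for the five components of a snapshot."""
    code: str                     # e.g. a source-control revision identifier
    environment: str              # e.g. path to a Dockerfile or requirements.txt
    config: Dict[str, object] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)
    stats: Dict[str, float] = field(default_factory=dict)


def best_by(snapshots, metric) -> Optional[SnapshotRecord]:
    """Return the snapshot with the highest value for `metric`.

    Snapshots that do not report the metric are skipped, since stats are
    free-form key-value pairs with no enforced schema.
    """
    scored = [s for s in snapshots if metric in s.stats]
    if not scored:
        return None
    return max(scored, key=lambda s: s.stats[metric])


# Placeholder values, for illustration only.
a = SnapshotRecord("rev-a", "requirements.txt", {"solver": "liblinear"},
                   ["model.pkl"], {"test accuracy": 0.91})
b = SnapshotRecord("rev-b", "Dockerfile", {"solver": "lbfgs"},
                   ["model.pkl"], {"test accuracy": 0.94, "data_location": 0.0})
best = best_by([a, b], "test accuracy")
```

Because the metric name is just a dictionary key, the same comparison works regardless of which algorithm or framework produced each snapshot.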


## Documentation
The full docs are hosted [here](https://datmo.readthedocs.io/en/latest/index.html). If you wish to contribute to the docs (source code located here in `/docs`), follow the procedure outlined in `CONTRIBUTING.md`.

## Transform a Current Project
You can transform your existing repository into a datmo-enabled repository with the following command:
```
$ datmo init
```
If at any point you would like to remove datmo, you can just remove the `.datmo` directory from your repository
or run the following command:
```
$ datmo cleanup
```
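
For scripted setups, the manual route (deleting the `.datmo` directory) can be sketched in a few lines. The helper names `is_datmo_enabled` and `remove_datmo_dir` are hypothetical, and this sketch only covers the directory itself; whether `datmo cleanup` does additional work is not stated here.

```python
import shutil
from pathlib import Path


def is_datmo_enabled(repo_root):
    """Return True if the repository contains a `.datmo` directory."""
    return (Path(repo_root) / ".datmo").is_dir()


def remove_datmo_dir(repo_root):
    """Delete `<repo_root>/.datmo` if present.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    if not is_datmo_enabled(repo_root):
        return False
    shutil.rmtree(str(Path(repo_root) / ".datmo"))
    return True
```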
### Optional: Mark your GitHub repository as a Datmo project
Once you initialize your project, you can denote your repository as a datmo project by adding the following badge to your README file.
This helps someone pulling the code to know how to set up and run Datmo commands, as the badge will link them to usage instructions here.

#### Markdown
```markdown
[![Datmo Model](https://github.com/datmo/datmo/blob/master/images/badge.svg)](https://github.com/datmo/datmo)
```
#### ReStructuredText
```
.. image:: https://github.com/datmo/datmo/blob/master/images/badge.svg
   :target: https://github.com/datmo/datmo
```

## Sharing (Workaround)
**DISCLAIMER:** This is not currently an officially supported option and only works for
@@ -194,4 +241,4 @@ $ git commit -m "removed .datmo from tracking" # clean up your scm so datmo can
$ git pull origin +refs/datmo/*:refs/datmo/* # pull datmo refs from remote
$ datmo init # This enables datmo in the new location. If you enter blanks, no project information will be updated
```
If you are interested in sharing using the datmo protocol, you can visit [Datmo's website](https://datmo.com/product).
