Finish Python functional API and revamp docs
csala committed May 28, 2019
1 parent 238ef3c commit a0d7b41
Showing 48 changed files with 644 additions and 877 deletions.
13 changes: 3 additions & 10 deletions AUTHORS.rst
@@ -1,20 +1,13 @@
Credits
=======

Development Lead
----------------

* Kalyan Veeramachaneni <kalyan@mit.edu>
* Carles Sala <csala@csail.mit.edu>

Contributors
------------

* Bennett Cyphers <bcyphers@mit.edu>
* Thomas Swearingen <swearin3@msu.edu>
* Laura Gustafson <lgustaf@mit.edu>
* Carles Sala <csala@csail.mit.edu>
* Plamen Valentinov <plamen@pythiac.com>
* Kalyan Veeramachaneni <kalyan@mit.edu>
* Micah Smith <micahjsmith@gmail.com>
* Laura Gustafson <lgustaf@mit.edu>
* Kiran Karra <kiran.karra@gmail.com>
* Max Kanter <kmax12@gmail.com>
* Alfredo Cuesta-Infante <alfredo.cuesta@urjc.es>
107 changes: 107 additions & 0 deletions CLI.md
@@ -0,0 +1,107 @@
# Command Line Interface

**ATM** provides a simple command line client that allows you to run ATM directly
from your terminal by simply passing it the path to a CSV file.

In this example, we will use the default values provided in the code, which rely on the
`pollution.csv` file that ATM generates along with the other demo datasets.
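
As a quick preview, the whole walkthrough below boils down to the following three commands, each
of which is explained step by step in the next sections:

```bash
atm get_demos    # 1. Generate the demo datasets
atm enter_data   # 2. Create a dataset and its dataruns in the database
atm worker       # 3. Start a worker to process the pending dataruns
```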

## 1. Generate the demo data

The **ATM** command line allows you to generate the demo data that we will be using
throughout these steps by running the following command:

```bash
atm get_demos
```

A list of the generated demo datasets will be printed on your console:

```bash
Generating file demos/iris.csv
Generating file demos/pollution.csv
Generating file demos/pitchfork_genres.csv
```

## 2. Create a dataset and generate its dataruns

Once you have generated the demo datasets, it's time to create a `dataset` object inside the
database. The command line also triggers the generation of `datarun` objects for this dataset in
order to automate the process as much as possible:

```bash
atm enter_data
```

Running this command will create a dataset with the default values, which uses the
`pollution_1.csv` file from the demo datasets.

Output similar to this should be printed:

```bash
method logreg has 6 hyperpartitions
method dt has 2 hyperpartitions
method knn has 24 hyperpartitions
Dataruns created. Summary:
Dataset ID: 1
Training data: demos/pollution_1.csv
Test data: None
Datarun ID: 1
Hyperpartition selection strategy: uniform
Parameter tuning strategy: uniform
Budget: 100 (classifier)
```

For more information about the arguments that this command accepts, please run:

```bash
atm enter_data --help
```

## 3. Start a worker

**ATM** requires a worker to process the dataruns that are stored inside the database and not
yet completed. This worker process will keep running until there are no `pending` dataruns left.

In order to launch such a process, execute:

```bash
atm worker
```

This will start a process that builds classifiers, tests them, and saves them to the `./models/`
directory. The output should show which hyperparameters are being tested and the performance of
each classifier (the "judgment metric"), plus the best overall performance so far.

Output similar to this will appear repeatedly on your console while the `worker` is processing
the datarun:

```bash
Classifier type: classify_logreg
Params chosen:
C = 8904.06127554
_scale = True
fit_intercept = False
penalty = l2
tol = 4.60893080631
dual = True
class_weight = auto

Judgment metric (f1): 0.536 +- 0.067
Best so far (classifier 21): 0.716 +- 0.035
```

Occasionally, a worker will encounter an error in the process of building and testing a
classifier. When this happens, the worker will print error data to the console, log the error in
the database, and move on to the next classifier.

You can break out of the worker with <kbd>Ctrl</kbd>+<kbd>c</kbd> and restart it with the same
command; it will pick up right where it left off. You can also run the command simultaneously in
different terminals to parallelize the work -- all workers will refer to the same ModelHub
database. When all 100 classifiers in your budget have been built, all workers will exit gracefully.
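
For instance, a minimal sketch of how you could launch several workers in parallel from a single
terminal, assuming a POSIX shell with job control, would be:

```bash
# Launch three workers in the background; all of them will use the same
# ModelHub database and will exit once no pending dataruns remain.
for i in 1 2 3; do
    atm worker &
done

# Wait for all the background workers to finish.
wait
```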

As before, you can get more information about the arguments that this command accepts by running:

```bash
atm worker --help
```
197 changes: 197 additions & 0 deletions CONTRIBUTING.rst
@@ -0,0 +1,197 @@
.. highlight:: shell

============
Contributing
============

Contributions are welcome, and they are greatly appreciated! Every little bit
helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions
----------------------

Report Bugs
~~~~~~~~~~~

Report bugs at https://github.com/HDI-Project/ATM/issues.

If you are reporting a bug, please include:

* Your operating system name and version.
* Any details about your local setup that might be helpful in troubleshooting.
* Detailed steps to reproduce the bug.

Fix Bugs
~~~~~~~~

Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
wanted" is open to whoever wants to implement it.

Implement Features
~~~~~~~~~~~~~~~~~~

Look through the GitHub issues for features. Anything tagged with "enhancement"
and "help wanted" is open to whoever wants to implement it.

Write Documentation
~~~~~~~~~~~~~~~~~~~

ATM could always use more documentation, whether as part of the
official ATM docs, in docstrings, or even on the web in blog posts,
articles, and such.

Submit Feedback
~~~~~~~~~~~~~~~

The best way to send feedback is to file an issue at https://github.com/HDI-Project/ATM/issues.

If you are proposing a feature:

* Explain in detail how it would work.
* Keep the scope as narrow as possible, to make it easier to implement.
* Remember that this is a volunteer-driven project, and that contributions
are welcome :)

Get Started!
------------

Ready to contribute? Here's how to set up `ATM` for local development.

1. Fork the `ATM` repo on GitHub.
2. Clone your fork locally::

$ git clone git@github.com:your_name_here/ATM.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed,
this is how you set up your fork for local development::

$ mkvirtualenv ATM
$ cd ATM/
$ make install-develop

4. Create a branch for local development::

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. While hacking your changes, make sure to cover all your developments with the required
unit tests, and that none of the old tests fail as a consequence of your changes.
For this, make sure to run the test suite and check the code coverage::

$ make test # Run the tests
$ make coverage # Get the coverage report

6. When you're done making changes, check that your changes pass flake8 and the
tests, including testing other Python versions with tox::

$ make lint # Check code styling
$ make test-all # Execute tests on all python versions

7. Also make sure to include the necessary documentation in the code as docstrings following
the `google docstring`_ style.
If you want to see how your documentation will look when it is published, you can
generate and view the docs with this command::

$ make view-docs

8. Commit your changes and push your branch to GitHub::

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

9. Submit a pull request through the GitHub website.

.. _google docstring: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

Pull Request Guidelines
-----------------------

Before you submit a pull request, check that it meets these guidelines:

1. It resolves an open GitHub Issue and contains its reference in the title or
the comment. If there is no associated issue, feel free to create one.
2. Whenever possible, it resolves only **one** issue. If your PR resolves more than
one issue, try to split it into more than one pull request.
3. The pull request should include unit tests that cover all the changed code.
4. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
5. The pull request should work for Python 2.7, 3.4, 3.5 and 3.6. Check
https://travis-ci.org/HDI-Project/ATM/pull_requests
and make sure that all the checks pass.

Unit Testing Guidelines
-----------------------

All the Unit Tests should comply with the following requirements:

1. Unit Tests should be based only on the ``unittest`` and ``pytest`` modules.

2. The tests that cover a module called ``atm/path/to/a_module.py`` should be
implemented in a separate module called ``tests/atm/path/to/test_a_module.py``.
Note that the module name has the ``test_`` prefix and is located in a path that
mirrors the one of the tested module, just inside the ``tests`` folder.

3. Each method of the tested module should have at least one associated test method, and
each test method should cover only **one** use case or scenario.

4. Test case methods should start with the ``test_`` prefix and have descriptive names
that indicate which scenario they cover.
Names such as ``test_some_method_input_none``, ``test_some_method_value_error`` or
``test_some_method_timeout`` are good, but names like ``test_some_method_1``,
``some_method`` or ``test_error`` are not.

5. Each test should validate only what the code of the method being tested does, and not
cover the behavior of any third party package or tool being used, which is assumed to
work properly as long as it is passed the right values.

6. Any third party tool that may have any kind of random behavior, such as some Machine
Learning models, databases or Web APIs, will be mocked using the ``mock`` library, and
the only thing that will be tested is that our code passes the right values to them.

7. Unit tests should not use anything from outside the test and the code being tested. This
includes not reading from or writing to any filesystem or database, which will be properly
mocked, as shown in the sketch after this list.
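
For illustration, here is a minimal sketch of a test that follows these guidelines. Both the
module ``atm/path/to/a_module.py`` (reused from point 2) and the third party class
``SomeAPIClient`` are hypothetical names used only for this example, and ``some_method`` is
assumed to call ``SomeAPIClient().fetch`` and return its output::

    from unittest import TestCase

    from mock import patch

    # Hypothetical module, used only to illustrate the guidelines above.
    from atm.path.to import a_module


    class TestAModule(TestCase):

        @patch('atm.path.to.a_module.SomeAPIClient')
        def test_some_method_success(self, client_mock):
            # The third party client is mocked, so the test never touches
            # any real network, filesystem or database.
            client_mock.return_value.fetch.return_value = {'status': 'ok'}

            result = a_module.some_method('a_value')

            # Validate only what our code does: it passes the right values
            # to the third party tool and returns its output unchanged.
            client_mock.return_value.fetch.assert_called_once_with('a_value')
            assert result == {'status': 'ok'}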

Tips
----

To run a subset of tests::

$ pytest tests/test_atm.py

Release Workflow
----------------

The process of releasing a new version involves several steps combining both ``git`` and
``bumpversion``, which, briefly, are the following:

1. Merge what is in ``master`` branch into ``stable`` branch.
2. Update the version in ``setup.cfg``, ``atm/__init__.py`` and ``HISTORY.md`` files.
3. Create a new git tag pointing at the corresponding commit in ``stable`` branch.
4. Merge the new commit from ``stable`` into ``master``.
5. Update the version in ``setup.cfg`` and ``atm/__init__.py``
to open the next development iteration.

.. note:: Before starting the process, make sure that ``HISTORY.md`` has been updated with a new
entry that explains the changes that will be included in the new version.
Normally this is just a list of the Pull Requests that have been merged to master
since the last release.

Once this is done, run one of the following commands:

1. If you are releasing a patch version::

make release

2. If you are releasing a minor version::

make release-minor

3. If you are releasing a major version::

make release-major
2 changes: 1 addition & 1 deletion Makefile
@@ -124,7 +124,7 @@ coverage: ## check code coverage quickly with the default Python

.PHONY: docs
docs: clean-docs ## generate Sphinx HTML documentation, including API docs
sphinx-apidoc --module-first --separate -o docs/api/ atm
sphinx-apidoc --separate -T -o docs/api/ atm
$(MAKE) -C docs html

.PHONY: view-docs
