Finish Python functional API and revamp docs
Showing 48 changed files with 644 additions and 877 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
# Command Line Interface | ||
|
||
**ATM** provides a simple command line client that will allow you to run ATM directly | ||
from your terminal by simply passing it the path to a CSV file. | ||
|
||
This example uses the default values provided in the code, which point to | ||
the `pollution.csv` file generated with the ATM demo datasets. | ||
|
||
## 1. Generate the demo data | ||
|
||
The **ATM** command line lets you generate the demo data used throughout these steps | ||
by running the following command: | ||
|
||
```bash | ||
atm get_demos | ||
``` | ||
|
||
The console will list the generated demo datasets: | ||
|
||
```bash | ||
Generating file demos/iris.csv | ||
Generating file demos/pollution.csv | ||
Generating file demos/pitchfork_genres.csv | ||
``` | ||
|
||
## 2. Create a dataset and generate its dataruns | ||
|
||
Once you have generated the demo datasets, create a `dataset` object inside the | ||
database. The same command also generates the `datarun` objects for this dataset, to | ||
automate the process as much as possible: | ||
|
||
```bash | ||
atm enter_data | ||
``` | ||
|
||
Running this command creates a dataset with the default values, which use the | ||
`pollution_1.csv` dataset from the demo datasets. | ||
|
||
Output similar to the following should be printed: | ||
|
||
```bash | ||
method logreg has 6 hyperpartitions | ||
method dt has 2 hyperpartitions | ||
method knn has 24 hyperpartitions | ||
Dataruns created. Summary: | ||
Dataset ID: 1 | ||
Training data: demos/pollution_1.csv | ||
Test data: None | ||
Datarun ID: 1 | ||
Hyperpartition selection strategy: uniform | ||
Parameter tuning strategy: uniform | ||
Budget: 100 (classifier) | ||
``` | ||
|
||
For more information about the arguments that this command accepts, run: | ||
|
||
```bash | ||
atm enter_data --help | ||
``` | ||
|
||
## 3. Start a worker | ||
|
||
**ATM** requires a worker to process the dataruns stored in the database that are not yet | ||
completed. The worker process keeps running until no `pending` dataruns remain. | ||
|
||
In order to launch such a process, execute: | ||
|
||
```bash | ||
atm worker | ||
``` | ||
|
||
This will start a process that builds classifiers, tests them, and saves them to the `./models/` | ||
directory. The output should show which hyperparameters are being tested and the performance of | ||
each classifier (the "judgment metric"), plus the best overall performance so far. | ||
|
||
Output similar to the following will appear repeatedly on your console while the `worker` is | ||
processing the datarun: | ||
|
||
```bash | ||
Classifier type: classify_logreg | ||
Params chosen: | ||
C = 8904.06127554 | ||
_scale = True | ||
fit_intercept = False | ||
penalty = l2 | ||
tol = 4.60893080631 | ||
dual = True | ||
class_weight = auto | ||
|
||
Judgment metric (f1): 0.536 +- 0.067 | ||
Best so far (classifier 21): 0.716 +- 0.035 | ||
``` | ||
|
||
Occasionally, a worker will encounter an error in the process of building and testing a | ||
classifier. When this happens, the worker will print error data to the console, log the error in | ||
the database, and move on to the next classifier. | ||
|
||
You can break out of the worker with <kbd>Ctrl</kbd>+<kbd>c</kbd> and restart it with the same | ||
command; it will pick up right where it left off. You can also run the command simultaneously in | ||
different terminals to parallelize the work -- all workers will refer to the same ModelHub | ||
database. When all 100 classifiers in your budget have been built, all workers will exit gracefully. | ||
|
||
This command also provides more information about the arguments it accepts: | ||
|
||
```bash | ||
atm worker --help | ||
``` |
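
When several workers run in parallel, it can help to extract the scores from the console output programmatically. The following is a minimal sketch that parses the two score lines shown in the sample worker output above. The regular expressions and the `parse_worker_block` helper are assumptions based only on that sample, not part of ATM, and the real output format may differ between versions.

```python
import re

# Patterns derived from the sample worker output in this guide.
# They are assumptions: the exact format may vary between ATM versions.
METRIC_RE = re.compile(
    r"Judgment metric \((?P<name>\w+)\): (?P<mean>[\d.]+) \+- (?P<std>[\d.]+)"
)
BEST_RE = re.compile(
    r"Best so far \(classifier (?P<id>\d+)\): (?P<mean>[\d.]+) \+- (?P<std>[\d.]+)"
)


def parse_worker_block(text):
    """Extract the judgment metric and best-so-far score from one printed block.

    Hypothetical helper: returns a dict with whichever fields were found.
    """
    result = {}
    metric = METRIC_RE.search(text)
    if metric:
        result["metric"] = metric.group("name")
        result["score"] = float(metric.group("mean"))
        result["score_std"] = float(metric.group("std"))
    best = BEST_RE.search(text)
    if best:
        result["best_classifier"] = int(best.group("id"))
        result["best_score"] = float(best.group("mean"))
        result["best_std"] = float(best.group("std"))
    return result


sample = (
    "Judgment metric (f1): 0.536 +- 0.067\n"
    "Best so far (classifier 21): 0.716 +- 0.035\n"
)
parsed = parse_worker_block(sample)
print(parsed)
```

Lines that do not match either pattern, such as the hyperparameter listing, are simply ignored, so the helper can be fed a whole block of output at once.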
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
.. highlight:: shell | ||
|
||
============ | ||
Contributing | ||
============ | ||
|
||
Contributions are welcome, and they are greatly appreciated! Every little bit | ||
helps, and credit will always be given. | ||
|
||
You can contribute in many ways: | ||
|
||
Types of Contributions | ||
---------------------- | ||
|
||
Report Bugs | ||
~~~~~~~~~~~ | ||
|
||
Report bugs at https://github.com/HDI-Project/ATM/issues. | ||
|
||
If you are reporting a bug, please include: | ||
|
||
* Your operating system name and version. | ||
* Any details about your local setup that might be helpful in troubleshooting. | ||
* Detailed steps to reproduce the bug. | ||
|
||
Fix Bugs | ||
~~~~~~~~ | ||
|
||
Look through the GitHub issues for bugs. Anything tagged with "bug" and "help | ||
wanted" is open to whoever wants to implement it. | ||
|
||
Implement Features | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
Look through the GitHub issues for features. Anything tagged with "enhancement" | ||
and "help wanted" is open to whoever wants to implement it. | ||
|
||
Write Documentation | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
ATM could always use more documentation, whether as part of the | ||
official ATM docs, in docstrings, or even on the web in blog posts, | ||
articles, and such. | ||
|
||
Submit Feedback | ||
~~~~~~~~~~~~~~~ | ||
|
||
The best way to send feedback is to file an issue at https://github.com/HDI-Project/ATM/issues. | ||
|
||
If you are proposing a feature: | ||
|
||
* Explain in detail how it would work. | ||
* Keep the scope as narrow as possible, to make it easier to implement. | ||
* Remember that this is a volunteer-driven project, and that contributions | ||
are welcome :) | ||
|
||
Get Started! | ||
------------ | ||
|
||
Ready to contribute? Here's how to set up `ATM` for local development. | ||
|
||
1. Fork the `ATM` repo on GitHub. | ||
2. Clone your fork locally:: | ||
|
||
$ git clone git@github.com:your_name_here/ATM.git | ||
|
||
3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, | ||
this is how you set up your fork for local development:: | ||
|
||
$ mkvirtualenv ATM | ||
$ cd ATM/ | ||
$ make install-develop | ||
|
||
4. Create a branch for local development:: | ||
|
||
$ git checkout -b name-of-your-bugfix-or-feature | ||
|
||
Now you can make your changes locally. | ||
|
||
5. While making your changes, make sure to cover all your developments with the required | ||
unit tests, and that none of the old tests fail as a consequence of your changes. | ||
For this, make sure to run the test suite and check the code coverage:: | ||
|
||
$ make test # Run the tests | ||
$ make coverage # Get the coverage report | ||
|
||
6. When you're done making changes, check that your changes pass flake8 and the | ||
tests, including testing other Python versions with tox:: | ||
|
||
$ make lint # Check code styling | ||
$ make test-all # Execute tests on all python versions | ||
|
||
7. Also make sure to include the necessary documentation in the code as docstrings following | ||
the `google docstring`_ style. | ||
If you want to see how your documentation will look when it is published, you can | ||
generate and view the docs with this command:: | ||
|
||
$ make viewdocs | ||
|
||
8. Commit your changes and push your branch to GitHub:: | ||
|
||
$ git add . | ||
$ git commit -m "Your detailed description of your changes." | ||
$ git push origin name-of-your-bugfix-or-feature | ||
|
||
9. Submit a pull request through the GitHub website. | ||
|
||
.. _google docstring: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html | ||
|
||
Pull Request Guidelines | ||
----------------------- | ||
|
||
Before you submit a pull request, check that it meets these guidelines: | ||
|
||
1. It resolves an open GitHub Issue and contains its reference in the title or | ||
the comment. If there is no associated issue, feel free to create one. | ||
2. Whenever possible, it resolves only **one** issue. If your PR resolves more than | ||
one issue, try to split it into more than one pull request. | ||
3. The pull request should include unit tests that cover all the changed code. | ||
4. If the pull request adds functionality, the docs should be updated. Put | ||
your new functionality into a function with a docstring, and add the | ||
feature to the list in README.rst. | ||
5. The pull request should work for Python 2.7, 3.4, 3.5 and 3.6. Check | ||
https://travis-ci.org/HDI-Project/ATM/pull_requests | ||
and make sure that all the checks pass. | ||
|
||
Unit Testing Guidelines | ||
----------------------- | ||
|
||
All the Unit Tests should comply with the following requirements: | ||
|
||
1. Unit Tests should be based only on the unittest and pytest modules. | ||
|
||
2. The tests that cover a module called ``atm/path/to/a_module.py`` should be | ||
implemented in a separate module called ``tests/atm/path/to/test_a_module.py``. | ||
Note that the module name has the ``test_`` prefix and is located in a path similar | ||
to the one of the tested module, just inside the ``tests`` folder. | ||
|
||
3. Each method of the tested module should have at least one associated test method, and | ||
each test method should cover only **one** use case or scenario. | ||
|
||
4. Test case methods should start with the ``test_`` prefix and have descriptive names | ||
that indicate which scenario they cover. | ||
Names such as ``test_some_method_input_none``, ``test_some_method_value_error`` or | ||
``test_some_method_timeout`` are right, but names like ``test_some_method_1``, | ||
``some_method`` or ``test_error`` are not. | ||
|
||
5. Each test should validate only what the code of the method being tested does, and not | ||
cover the behavior of any third party package or tool being used, which is assumed to | ||
work properly as long as it is passed the right values. | ||
|
||
6. Any third party tool that may have any kind of random behavior, such as some Machine | ||
Learning models, databases or Web APIs, will be mocked using the ``mock`` library, and | ||
the only thing that will be tested is that our code passes the right values to them. | ||
|
||
7. Unit tests should not use anything from outside the test and the code being tested. This | ||
includes not reading from or writing to any filesystem or database, which will be properly | ||
mocked. | ||
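
The guidelines above can be illustrated with a minimal sketch. The ``save_score`` function and its ``db`` argument are hypothetical and exist only for this example; they are not part of ATM. The sketch uses ``unittest.mock`` from the Python 3 standard library, which offers the same ``Mock`` class as the ``mock`` package mentioned above.

```python
import unittest
from unittest.mock import Mock


# Hypothetical function under test; it is NOT part of ATM.
def save_score(db, classifier_id, score):
    """Round the score and hand it over to the database layer."""
    if score is None:
        raise ValueError("score must not be None")
    db.save(classifier_id, round(score, 3))


class TestSaveScore(unittest.TestCase):
    """Each test covers one scenario and has a descriptive name."""

    def test_save_score_valid_score(self):
        # The database is mocked, so nothing is read or written.
        db = Mock()

        save_score(db, 21, 0.71649)

        # Only check that our code passes the right values on.
        db.save.assert_called_once_with(21, 0.716)

    def test_save_score_value_error(self):
        db = Mock()

        with self.assertRaises(ValueError):
            save_score(db, 21, None)

        db.save.assert_not_called()


# Run the two tests programmatically so the sketch is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSaveScore)
result = unittest.TextTestRunner().run(suite)
```

In a real test module the last two lines would be dropped, since ``make test`` or ``pytest`` discovers the ``test_`` methods on its own.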
|
||
Tips | ||
---- | ||
|
||
To run a subset of tests:: | ||
|
||
$ pytest tests/test_atm.py | ||
|
||
Release Workflow | ||
---------------- | ||
|
||
The process of releasing a new version involves several steps combining both ``git`` and | ||
``bumpversion`` which, briefly, are: | ||
|
||
1. Merge what is in ``master`` branch into ``stable`` branch. | ||
2. Update the version in ``setup.cfg``, ``atm/__init__.py`` and ``HISTORY.md`` files. | ||
3. Create a new git tag pointing at the corresponding commit in ``stable`` branch. | ||
4. Merge the new commit from ``stable`` into ``master``. | ||
5. Update the version in ``setup.cfg`` and ``atm/__init__.py`` | ||
to open the next development iteration. | ||
|
||
.. note:: Before starting the process, make sure that ``HISTORY.md`` has been updated with a new | ||
entry that explains the changes that will be included in the new version. | ||
Normally this is just a list of the Pull Requests that have been merged to master | ||
since the last release. | ||
|
||
Once this is done, run one of the following commands: | ||
|
||
1. If you are releasing a patch version:: | ||
|
||
make release | ||
|
||
2. If you are releasing a minor version:: | ||
|
||
make release-minor | ||
|
||
3. If you are releasing a major version:: | ||
|
||
make release-major |
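
As a rough illustration of what the three release types mean for the version number, the following sketch applies the usual ``MAJOR.MINOR.PATCH`` convention. The ``bump_version`` function is hypothetical and only assumes plain three-part versions; it does not reproduce how ``bumpversion`` is configured in this project, and it ignores any development suffix used between releases.

```python
def bump_version(version, part):
    """Return the next version for a patch, minor or major release.

    Hypothetical helper assuming a plain MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))

    if part == "patch":
        # Patch release: only the last number grows.
        return "{}.{}.{}".format(major, minor, patch + 1)
    if part == "minor":
        # Minor release: the patch number is reset to zero.
        return "{}.{}.0".format(major, minor + 1)
    if part == "major":
        # Major release: minor and patch are both reset to zero.
        return "{}.0.0".format(major + 1)

    raise ValueError("part must be 'patch', 'minor' or 'major'")


for release_type in ("patch", "minor", "major"):
    print(release_type, bump_version("0.1.2", release_type))
```

Starting from ``0.1.2``, this prints ``0.1.3`` for a patch release, ``0.2.0`` for a minor release and ``1.0.0`` for a major release.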