Finish Python functional API and revamp docs
csala committed May 28, 2019
1 parent 238ef3c commit a0d7b41
Showing 48 changed files with 644 additions and 877 deletions.
13 changes: 3 additions & 10 deletions AUTHORS.rst
@@ -1,20 +1,13 @@
Credits
=======

Development Lead
----------------

* Kalyan Veeramachaneni <kalyan@mit.edu>
* Carles Sala <csala@csail.mit.edu>

Contributors
------------

* Bennett Cyphers <bcyphers@mit.edu>
* Thomas Swearingen <swearin3@msu.edu>
* Laura Gustafson <lgustaf@mit.edu>
* Carles Sala <csala@csail.mit.edu>
* Plamen Valentinov <plamen@pythiac.com>
* Kalyan Veeramachaneni <kalyan@mit.edu>
* Micah Smith <micahjsmith@gmail.com>
* Laura Gustafson <lgustaf@mit.edu>
* Kiran Karra <kiran.karra@gmail.com>
* Max Kanter <kmax12@gmail.com>
* Alfredo Cuesta-Infante <alfredo.cuesta@urjc.es>
107 changes: 107 additions & 0 deletions CLI.md
@@ -0,0 +1,107 @@
# Command Line Interface

**ATM** provides a simple command line client that allows you to run ATM directly
from your terminal by simply passing it the path to a CSV file.

In this example, we will use the default values provided in the code, which rely on the
`pollution.csv` file that ATM generates along with the other demo datasets.
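
As a quick preview, the whole walkthrough below boils down to the following three commands, each
of which is explained step by step in the next sections:

```bash
atm get_demos    # 1. Generate the demo datasets
atm enter_data   # 2. Create a dataset and its dataruns in the database
atm worker       # 3. Start a worker to process the pending dataruns
```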

## 1. Generate the demo data

The **ATM** command line allows you to generate the demo data that we will be using
throughout these steps by running the following command:

```bash
atm get_demos
```

A list of the generated demo datasets will be printed on your console:

```bash
Generating file demos/iris.csv
Generating file demos/pollution.csv
Generating file demos/pitchfork_genres.csv
```

## 2. Create a dataset and generate its dataruns

Once you have generated the demo datasets, it's time to create a `dataset` object inside the
database. The command line also triggers the generation of `datarun` objects for this dataset in
order to automate the process as much as possible:

```bash
atm enter_data
```

Running this command will create a dataset with the default values, which uses the
`pollution_1.csv` file from the demo datasets.

Output similar to this should be printed:

```bash
method logreg has 6 hyperpartitions
method dt has 2 hyperpartitions
method knn has 24 hyperpartitions
Dataruns created. Summary:
Dataset ID: 1
Training data: demos/pollution_1.csv
Test data: None
Datarun ID: 1
Hyperpartition selection strategy: uniform
Parameter tuning strategy: uniform
Budget: 100 (classifier)
```

For more information about the arguments that this command accepts, please run:

```bash
atm enter_data --help
```

## 3. Start a worker

**ATM** requires a worker to process the dataruns that are stored inside the database and not
yet completed. This worker process will keep running until there are no `pending` dataruns left.

In order to launch such a process, execute:

```bash
atm worker
```

This will start a process that builds classifiers, tests them, and saves them to the `./models/`
directory. The output should show which hyperparameters are being tested and the performance of
each classifier (the "judgment metric"), plus the best overall performance so far.

Output similar to this will appear repeatedly on your console while the `worker` is processing
the datarun:

```bash
Classifier type: classify_logreg
Params chosen:
C = 8904.06127554
_scale = True
fit_intercept = False
penalty = l2
tol = 4.60893080631
dual = True
class_weight = auto

Judgment metric (f1): 0.536 +- 0.067
Best so far (classifier 21): 0.716 +- 0.035
```

Occasionally, a worker will encounter an error in the process of building and testing a
classifier. When this happens, the worker will print error data to the console, log the error in
the database, and move on to the next classifier.

You can break out of the worker with <kbd>Ctrl</kbd>+<kbd>c</kbd> and restart it with the same
command; it will pick up right where it left off. You can also run the command simultaneously in
different terminals to parallelize the work -- all workers will refer to the same ModelHub
database. When all 100 classifiers in your budget have been built, all workers will exit gracefully.
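
For instance, a minimal sketch of how you could launch several workers in parallel from a single
terminal, assuming a POSIX shell with job control, would be:

```bash
# Launch three workers in the background; all of them will use the same
# ModelHub database and will exit once no pending dataruns remain.
for i in 1 2 3; do
    atm worker &
done

# Wait for all the background workers to finish.
wait
```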

As before, you can get more information about the arguments that this command accepts by running:

```bash
atm worker --help
```
197 changes: 197 additions & 0 deletions CONTRIBUTING.rst
@@ -0,0 +1,197 @@
.. highlight:: shell

============
Contributing
============

Contributions are welcome, and they are greatly appreciated! Every little bit
helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions
----------------------

Report Bugs
~~~~~~~~~~~

Report bugs at https://github.com/HDI-Project/ATM/issues.

If you are reporting a bug, please include:

* Your operating system name and version.
* Any details about your local setup that might be helpful in troubleshooting.
* Detailed steps to reproduce the bug.

Fix Bugs
~~~~~~~~

Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
wanted" is open to whoever wants to implement it.

Implement Features
~~~~~~~~~~~~~~~~~~

Look through the GitHub issues for features. Anything tagged with "enhancement"
and "help wanted" is open to whoever wants to implement it.

Write Documentation
~~~~~~~~~~~~~~~~~~~

ATM could always use more documentation, whether as part of the
official ATM docs, in docstrings, or even on the web in blog posts,
articles, and such.

Submit Feedback
~~~~~~~~~~~~~~~

The best way to send feedback is to file an issue at https://github.com/HDI-Project/ATM/issues.

If you are proposing a feature:

* Explain in detail how it would work.
* Keep the scope as narrow as possible, to make it easier to implement.
* Remember that this is a volunteer-driven project, and that contributions
are welcome :)

Get Started!
------------

Ready to contribute? Here's how to set up `ATM` for local development.

1. Fork the `ATM` repo on GitHub.
2. Clone your fork locally::

$ git clone git@github.com:your_name_here/ATM.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed,
this is how you set up your fork for local development::

$ mkvirtualenv ATM
$ cd ATM/
$ make install-develop

4. Create a branch for local development::

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. While hacking your changes, make sure to cover all your developments with the required
unit tests, and that none of the old tests fail as a consequence of your changes.
For this, make sure to run the test suite and check the code coverage::

$ make test # Run the tests
$ make coverage # Get the coverage report

6. When you're done making changes, check that your changes pass flake8 and the
tests, including testing other Python versions with tox::

$ make lint # Check code styling
$ make test-all # Execute tests on all python versions

7. Also make sure to include the necessary documentation in the code as docstrings following
the `google docstring`_ style.
If you want to see how your documentation will look when it is published, you can
generate and view the docs with this command::

$ make view-docs

8. Commit your changes and push your branch to GitHub::

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

9. Submit a pull request through the GitHub website.

.. _google docstring: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

Pull Request Guidelines
-----------------------

Before you submit a pull request, check that it meets these guidelines:

1. It resolves an open GitHub Issue and contains its reference in the title or
the comment. If there is no associated issue, feel free to create one.
2. Whenever possible, it resolves only **one** issue. If your PR resolves more than
one issue, try to split it into more than one pull request.
3. The pull request should include unit tests that cover all the changed code.
4. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
5. The pull request should work for Python 2.7, 3.4, 3.5 and 3.6. Check
https://travis-ci.org/HDI-Project/ATM/pull_requests
and make sure that all the checks pass.

Unit Testing Guidelines
-----------------------

All the Unit Tests should comply with the following requirements:

1. Unit Tests should be based only on the ``unittest`` and ``pytest`` modules.

2. The tests that cover a module called ``atm/path/to/a_module.py`` should be
implemented in a separate module called ``tests/atm/path/to/test_a_module.py``.
Note that the module name has the ``test_`` prefix and is located in a path that
mirrors the one of the tested module, just inside the ``tests`` folder.

3. Each method of the tested module should have at least one associated test method, and
each test method should cover only **one** use case or scenario.

4. Test case methods should start with the ``test_`` prefix and have descriptive names
that indicate which scenario they cover.
Names such as ``test_some_method_input_none``, ``test_some_method_value_error`` or
``test_some_method_timeout`` are good, but names like ``test_some_method_1``,
``some_method`` or ``test_error`` are not.

5. Each test should validate only what the code of the method being tested does, and not
cover the behavior of any third party package or tool being used, which is assumed to
work properly as long as it is passed the right values.

6. Any third party tool that may have any kind of random behavior, such as some Machine
Learning models, databases or Web APIs, will be mocked using the ``mock`` library, and
the only thing that will be tested is that our code passes the right values to them.

7. Unit tests should not use anything from outside the test and the code being tested. This
includes not reading from or writing to any filesystem or database, which will be properly
mocked, as shown in the sketch after this list.
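
For illustration, here is a minimal sketch of a test that follows these guidelines. Both the
module ``atm/path/to/a_module.py`` (reused from point 2) and the third party class
``SomeAPIClient`` are hypothetical names used only for this example, and ``some_method`` is
assumed to call ``SomeAPIClient().fetch`` and return its output::

    from unittest import TestCase

    from mock import patch

    # Hypothetical module, used only to illustrate the guidelines above.
    from atm.path.to import a_module


    class TestAModule(TestCase):

        @patch('atm.path.to.a_module.SomeAPIClient')
        def test_some_method_success(self, client_mock):
            # The third party client is mocked, so the test never touches
            # any real network, filesystem or database.
            client_mock.return_value.fetch.return_value = {'status': 'ok'}

            result = a_module.some_method('a_value')

            # Validate only what our code does: it passes the right values
            # to the third party tool and returns its output unchanged.
            client_mock.return_value.fetch.assert_called_once_with('a_value')
            assert result == {'status': 'ok'}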

Tips
----

To run a subset of tests::

$ pytest tests/test_atm.py

Release Workflow
----------------

The process of releasing a new version involves several steps combining both ``git`` and
``bumpversion``, which, briefly, are the following:

1. Merge what is in ``master`` branch into ``stable`` branch.
2. Update the version in ``setup.cfg``, ``atm/__init__.py`` and ``HISTORY.md`` files.
3. Create a new git tag pointing at the corresponding commit in ``stable`` branch.
4. Merge the new commit from ``stable`` into ``master``.
5. Update the version in ``setup.cfg`` and ``atm/__init__.py``
to open the next development iteration.

.. note:: Before starting the process, make sure that ``HISTORY.md`` has been updated with a new
entry that explains the changes that will be included in the new version.
Normally this is just a list of the Pull Requests that have been merged to master
since the last release.

Once this is done, run one of the following commands:

1. If you are releasing a patch version::

make release

2. If you are releasing a minor version::

make release-minor

3. If you are releasing a major version::

make release-major
2 changes: 1 addition & 1 deletion Makefile
@@ -124,7 +124,7 @@ coverage: ## check code coverage quickly with the default Python

.PHONY: docs
docs: clean-docs ## generate Sphinx HTML documentation, including API docs
sphinx-apidoc --module-first --separate -o docs/api/ atm
sphinx-apidoc --separate -T -o docs/api/ atm
$(MAKE) -C docs html

.PHONY: view-docs
