Final changes for v0.1.0 (#341)
* [enhance] Increase the coverage (#336)

* [feat] Support statistics print by adding results manager object (#334)

* [feat] Support statistics print by adding results manager object

* [refactor] Make SearchResults extract run_history at __init__

Since the search results should not be kept around indefinitely,
this class now takes run_history in __init__ so that
the extraction is called implicitly inside the constructor.
With this change, calling the extraction from outside is no longer recommended.
It can still be called externally, but self.clear() will be invoked first
to prevent mixing up the internal state.
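
A minimal sketch of the pattern described above, using hypothetical names and a SMAC-style run_history (an object whose `data` maps run keys to run values); the real SearchResults class has more fields and logic:

```py
from typing import Any, Dict, List


class SearchResultsSketch:
    """Simplified, hypothetical sketch -- not the actual SearchResults class."""

    def __init__(self, run_history: Any) -> None:
        self._results: List[Dict[str, Any]] = []
        # Extraction happens implicitly in the constructor, so callers
        # never need to invoke it themselves.
        self._extract_results_from_run_history(run_history)

    def clear(self) -> None:
        self._results = []

    def _extract_results_from_run_history(self, run_history: Any) -> None:
        # Calling this from outside is discouraged; clear first so a second
        # extraction cannot mix old and new state.
        self.clear()
        for run_key, run_value in run_history.data.items():
            self._results.append({"config_id": run_key.config_id,
                                  "cost": run_value.cost})
```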

* [fix] Separate those changes into PR#336

* [fix] Fix so that test_loss includes all the metrics

* [enhance] Strengthen the test for sprint and SearchResults

* [fix] Fix an issue in documentation

* [enhance] Increase the coverage

* [refactor] Separate the test for results_manager to organize the structure

* [test] Add the test for get_incumbent_Result

* [test] Remove the previous test_get_incumbent and see the coverage

* [fix] [test] Fix reversion of metric and strengthen the test cases

* [fix] Fix flake8 issues and increase coverage

* [fix] Address Ravin's comments

* [enhance] Increase the coverage

* [fix] Fix a flake8 issue

* Update for release (#335)

* Create release workflow and CITATION.cff and update README, setup.py

* fix bug in PyPI token

* fix documentation formatting

* TODO for docker image

* accept suggestions from shuhei

* add further options for disable_file_output documentation

* remove  from release.yml

* [feat] Add templates for issue and PR with Ravin's suggestions (#136)

* [doc] Add the workflow of the Auto-Pytorch (#285)

* [doc] Add workflow of the AutoPytorch

* [doc] Address Ravin's comment

* [FIX] Silence catboost (#338)

* set verbose=False in catboost

* fix flake

* change worst possible result of r2 (#340)

* Update README.md with link for master branch

* [FIX] Formatting in docs (#342)

* fix formatting in docs

* Update examples/40_advanced/example_resampling_strategy.py

* Update README.md, remove cat requirements.txt

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
ravinkohli and nabenabe0928 committed Nov 23, 2021
1 parent a1512d5 commit e4863fe
Showing 28 changed files with 3,018 additions and 259 deletions.
48 changes: 48 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -0,0 +1,48 @@
NOTE: ISSUES ARE NOT FOR CODE HELP - Ask for Help at https://stackoverflow.com

Your issue may already be reported!
Also, please search on the [issue tracker](../) before creating one.

* **I'm submitting a ...**
- [ ] bug report
- [ ] feature request
- [ ] support request => Please do not submit support requests here; see the note at the top of this template.

# Issue Description
* When Issue Happens
* Steps To Reproduce
1.
1.
1.

## Expected Behavior
<!--- If you're describing a bug, tell us what should happen -->
<!--- If you're suggesting a change/improvement, tell us how it should work -->

## Current Behavior
<!--- If describing a bug, tell us what happens instead of the expected behavior -->
<!--- If suggesting a change/improvement, explain the difference from current behavior -->

## Possible Solution
<!--- Not obligatory, but suggest a fix/reason for the bug, -->
<!--- or ideas how to implement the addition or change -->

## Your Code

```
If relevant, paste all of your code here
```

## Error message

```
If relevant, paste all of your error messages here
```

## Your Local environment
* Operating System, version
* Python, version
* Outputs of `pip freeze` or `conda list`

Make sure to add **all the information needed to understand the bug** so that someone can help.
If the info is missing, we'll add the 'Needs more information' label and close the issue until there is enough information.
38 changes: 38 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,38 @@
<!--- Provide a general summary of your changes in the Title above -->

## Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)

Note that a Pull Request should only contain one kind of change: refactoring, new features, or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to [The anatomy of a perfect pull request](https://medium.com/@hugooodias/the-anatomy-of-a-perfect-pull-request-567382bb6067).

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
* [ ] Have you checked to ensure there aren't other open [Pull Requests](../../../pulls) for the same update/change?
* [ ] Have you added an explanation of what your changes do and why you'd like us to include them?
* [ ] Have you written new tests for your core changes, as applicable?
* [ ] Have you successfully run tests with your changes locally?
<!--
* [ ] Have you followed the guidelines in our Contributing document?
-->


## Description
<!--- Describe your changes in detail -->

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How has this been tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, tests ran to see how -->
<!--- your change affects other areas of the code, etc. -->
33 changes: 33 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,33 @@
name: Push to PyPi

on:
  push:
    branches:
      - master

jobs:
  test:
    runs-on: "ubuntu-latest"

    steps:
      - name: Checkout source
        uses: actions/checkout@v2

      - name: Set up Python 3.8
        uses: actions/setup-python@v1
        with:
          python-version: 3.8

      - name: Install build dependencies
        run: python -m pip install build wheel

      - name: Build distributions
        shell: bash -l {0}
        run: python setup.py sdist bdist_wheel

      - name: Publish package to PyPI
        if: github.repository == 'automl/Auto-PyTorch' && github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
        uses: pypa/gh-action-pypi-publish@master
        with:
          user: __token__
          password: ${{ secrets.pypi_token }}
19 changes: 19 additions & 0 deletions CITATION.cff
@@ -0,0 +1,19 @@
preferred-citation:
  type: article
  authors:
    - family-names: "Zimmer"
      given-names: "Lucas"
      affiliation: "University of Freiburg, Germany"
    - family-names: "Lindauer"
      given-names: "Marius"
      affiliation: "University of Freiburg, Germany"
    - family-names: "Hutter"
      given-names: "Frank"
      affiliation: "University of Freiburg, Germany"
  doi: "10.1109/TPAMI.2021.3067763"
  journal-title: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
  title: "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"
  year: 2021
  note: "also available under https://arxiv.org/abs/2006.13799"
  start: 3079
  end: 3090
91 changes: 81 additions & 10 deletions README.md
@@ -1,14 +1,42 @@
# Auto-PyTorch

Copyright (C) 2019 [AutoML Group Freiburg](http://www.automl.org/)
Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

This is an alpha version of Auto-PyTorch with an improved API.
So far, Auto-PyTorch supports tabular data (classification, regression).
We plan to enable image data and time-series data.
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)
***From v0.1.0, AutoPyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package as well as changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility.
In case you would like to use the old API, you can find it at [`master_old`](https://github.com/automl/Auto-PyTorch/tree/master-old).***

## Workflow

A rough description of the Auto-PyTorch workflow is shown in the following figure.

<img src="figs/apt_workflow.png" width="500">

In the figure, **Data** is provided by the user and
**Portfolio** is a set of neural network configurations that work well on diverse datasets.
The current version only supports the *greedy portfolio* as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
This portfolio is used to warm-start the optimization of SMAC.
In other words, the portfolio configurations are evaluated on the provided data as initial configurations.
Then the API runs the following procedure:
1. **Validate input data**: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross-validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from `sklearn.dummy` that represents the worst possible performance (see the sketch after the footnotes below).
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
c. Update the observations with the obtained results\
d. Repeat a. -- c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observations and [model selection of the ensemble](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machine, used to solve either a regression or a classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, and their corresponding hyperparameters.
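
As a hedged illustration of step 3, the snippet below shows how a model from `sklearn.dummy` gives a worst-case reference score that real baselines should beat; the estimators are standard scikit-learn, but the snippet itself is only a sketch, not Auto-PyTorch internals.

```py
# Sketch only: how a dummy model provides a "worst possible" reference score.
import sklearn.datasets
import sklearn.model_selection
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1)

# Dummy baseline: predicts the most frequent class, ignoring the features.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# One example of a "real" baseline from a predefined pool.
baseline = RandomForestClassifier(random_state=1).fit(X_train, y_train)

print("dummy accuracy   :", dummy.score(X_test, y_test))
print("baseline accuracy:", baseline.score(X_test, y_test))
```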

## Installation

@@ -25,14 +53,57 @@ We recommend using Anaconda for developing as follows:
git submodule update --init --recursive

# Create the environment
conda create -n autopytorch python=3.8
conda activate autopytorch
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

```

## Examples

In a nutshell:

```py
from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
X_train=X_train,
y_train=y_train,
X_test=X_test,
y_test=y_test,
optimize_metric='accuracy',
total_walltime_limit=300,
func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
```
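
Related to the results manager added in this release, the search run can also be summarised after `search()` returns; `sprint_statistics()` and `show_models()` are assumed to be available on the fitted task object in this version, so treat this as a sketch rather than a guaranteed interface:

```py
# Continuing from the snippet above (assumes these helpers exist on the
# fitted TabularClassificationTask object in this version).
print(api.sprint_statistics())  # summary of the search run
print(api.show_models())        # models in the final ensemble
```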

For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder:

```sh
$ cd examples/
```


Code for the [paper](https://arxiv.org/abs/2006.13799) is available under `examples/ensemble` in the [TPAMI.2021.3067763](https://github.com/automl/Auto-PyTorch/tree/TPAMI.2021.3067763) branch.

## Contributing

If you want to contribute to Auto-PyTorch, clone the repository and checkout our current development branch
@@ -63,8 +134,8 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT
title = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2021},
note = {IEEE early access; also available under https://arxiv.org/abs/2006.13799},
pages = {1-12}
note = {also available under https://arxiv.org/abs/2006.13799},
pages = {3079 - 3090}
}
```
