Ravin Kohli: [FIX] Remove redundant categorical imputation (#375)

automl · Feb 9, 2022 · cddf25e
1 parent 9fac462
commit cddf25e

Showing 31 changed files with 341 additions and 361 deletions.
diff --git a/development/_downloads/bc82bea3a5dd7bdba60b65220891d9e5/examples_python.zip b/development/_downloads/bc82bea3a5dd7bdba60b65220891d9e5/examples_python.zip
diff --git a/development/_downloads/fb625db3c50d423b1b7881136ffdeec8/examples_jupyter.zip b/development/_downloads/fb625db3c50d423b1b7881136ffdeec8/examples_jupyter.zip
diff --git a/development/_images/sphx_glr_example_plot_over_time_001.png b/development/_images/sphx_glr_example_plot_over_time_001.png
diff --git a/development/_images/sphx_glr_example_plot_over_time_thumb.png b/development/_images/sphx_glr_example_plot_over_time_thumb.png
diff --git a/development/_images/sphx_glr_example_visualization_001.png b/development/_images/sphx_glr_example_visualization_001.png
diff --git a/development/_images/sphx_glr_example_visualization_thumb.png b/development/_images/sphx_glr_example_visualization_thumb.png
diff --git a/development/_sources/examples/20_basics/example_image_classification.rst.txt b/development/_sources/examples/20_basics/example_image_classification.rst.txt
@@ -87,14 +87,14 @@ Image Classification
     Configuration(values={
       'image_augmenter:GaussianBlur:use_augmenter': False,
       'image_augmenter:GaussianNoise:use_augmenter': False,
-      'image_augmenter:RandomAffine:rotate': 171,
-      'image_augmenter:RandomAffine:scale_offset': 0.16968859827986923,
-      'image_augmenter:RandomAffine:shear': 42,
-      'image_augmenter:RandomAffine:translate_percent_offset': 0.0006590163048096454,
+      'image_augmenter:RandomAffine:rotate': 341,
+      'image_augmenter:RandomAffine:scale_offset': 0.395298372732557,
+      'image_augmenter:RandomAffine:shear': 23,
+      'image_augmenter:RandomAffine:translate_percent_offset': 0.30691858738922473,
       'image_augmenter:RandomAffine:use_augmenter': True,
       'image_augmenter:RandomCutout:use_augmenter': False,
       'image_augmenter:Resize:use_augmenter': True,
-      'image_augmenter:ZeroPadAndCrop:percent': 0.058124084899998096,
+      'image_augmenter:ZeroPadAndCrop:percent': 0.1233844931832313,
       'normalizer:__choice__': 'NoNormalizer',
     })
 
@@ -175,7 +175,7 @@ Image Classification
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 0 minutes  5.506 seconds)
+   **Total running time of the script:** ( 0 minutes  5.581 seconds)
 
 
 .. _sphx_glr_download_examples_20_basics_example_image_classification.py:

diff --git a/development/_sources/examples/20_basics/example_tabular_classification.rst.txt b/development/_sources/examples/20_basics/example_tabular_classification.rst.txt
@@ -134,7 +134,7 @@ Search for an ensemble of machine learning algorithms
  .. code-block:: none
 
 
-    <autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f5f46714070>
+    <autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f6d20d85100>
 
 
 
@@ -166,33 +166,29 @@ Print the final ensemble performance
  .. code-block:: none
 
     {'accuracy': 0.8497109826589595}
-    |    | Preprocessing                                                                         | Estimator                                                          |   Weight |
-    |---:|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------|---------:|
-    |  0 | SimpleImputer,Variance Threshold,NoEncoder,PowerTransformer,NoFeaturePreprocessing    | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.14 |
-    |  1 | SimpleImputer,Variance Threshold,NoEncoder,MinMaxScaler,KitchenSink                   | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.12 |
-    |  2 | None                                                                                  | CBLearner                                                          |     0.12 |
-    |  3 | None                                                                                  | SVMLearner                                                         |     0.12 |
-    |  4 | None                                                                                  | RFLearner                                                          |     0.08 |
-    |  5 | SimpleImputer,Variance Threshold,NoEncoder,MinMaxScaler,NoFeaturePreprocessing        | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential          |     0.06 |
-    |  6 | None                                                                                  | KNNLearner                                                         |     0.06 |
-    |  7 | SimpleImputer,Variance Threshold,OneHotEncoder,QuantileTransformer,PolynomialFeatures | embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential    |     0.04 |
-    |  8 | SimpleImputer,Variance Threshold,OneHotEncoder,MinMaxScaler,PolynomialFeatures        | embedding,MLPBackbone,FullyConnectedHead,nn.Sequential             |     0.04 |
-    |  9 | SimpleImputer,Variance Threshold,OneHotEncoder,NoScaler,PolynomialFeatures            | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.04 |
-    | 10 | None                                                                                  | LGBMLearner                                                        |     0.04 |
-    | 11 | None                                                                                  | ETLearner                                                          |     0.04 |
-    | 12 | SimpleImputer,Variance Threshold,OneHotEncoder,NoScaler,PolynomialFeatures            | embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential       |     0.02 |
-    | 13 | SimpleImputer,Variance Threshold,OneHotEncoder,QuantileTransformer,PolynomialFeatures | no embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential       |     0.02 |
-    | 14 | SimpleImputer,Variance Threshold,NoEncoder,PowerTransformer,NoFeaturePreprocessing    | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.02 |
-    | 15 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing  | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.02 |
-    | 16 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing  | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.02 |
+    |    | Preprocessing                                                                        | Estimator                                                          |   Weight |
+    |---:|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------|---------:|
+    |  0 | SimpleImputer,Variance Threshold,NoEncoder,MinMaxScaler,Nystroem                     | no embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential       |     0.18 |
+    |  1 | SimpleImputer,Variance Threshold,NoEncoder,NoScaler,KitchenSink                      | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential          |     0.16 |
+    |  2 | SimpleImputer,Variance Threshold,NoEncoder,NoScaler,KitchenSink                      | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential          |     0.12 |
+    |  3 | None                                                                                 | CBLearner                                                          |     0.12 |
+    |  4 | None                                                                                 | SVMLearner                                                         |     0.1  |
+    |  5 | None                                                                                 | RFLearner                                                          |     0.06 |
+    |  6 | None                                                                                 | KNNLearner                                                         |     0.06 |
+    |  7 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.06 |
+    |  8 | SimpleImputer,Variance Threshold,NoEncoder,StandardScaler,PolynomialFeatures         | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.04 |
+    |  9 | None                                                                                 | LGBMLearner                                                        |     0.04 |
+    | 10 | SimpleImputer,Variance Threshold,OneHotEncoder,RobustScaler,KernelPCA                | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.02 |
+    | 11 | SimpleImputer,Variance Threshold,OneHotEncoder,QuantileTransformer,KitchenSink       | embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential       |     0.02 |
+    | 12 | None                                                                                 | ETLearner                                                          |     0.02 |
     autoPyTorch results:
             Dataset name: Australian
             Optimisation Metric: accuracy
             Best validation score: 0.8713450292397661
-            Number of target algorithm runs: 26
-            Number of successful target algorithm runs: 24
-            Number of crashed target algorithm runs: 1
-            Number of target algorithms that exceeded the time limit: 1
+            Number of target algorithm runs: 24
+            Number of successful target algorithm runs: 22
+            Number of crashed target algorithm runs: 0
+            Number of target algorithms that exceeded the time limit: 2
             Number of target algorithms that exceeded the memory limit: 0
 
 
@@ -202,7 +198,7 @@ Print the final ensemble performance
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 5 minutes  22.134 seconds)
+   **Total running time of the script:** ( 5 minutes  26.257 seconds)
 
 
 .. _sphx_glr_download_examples_20_basics_example_tabular_classification.py:

diff --git a/development/_sources/examples/20_basics/example_tabular_regression.rst.txt b/development/_sources/examples/20_basics/example_tabular_regression.rst.txt
@@ -125,7 +125,7 @@ Search for an ensemble of machine learning algorithms
  .. code-block:: none
 
 
-    <autoPyTorch.api.tabular_regression.TabularRegressionTask object at 0x7f5fe1484d90>
+    <autoPyTorch.api.tabular_regression.TabularRegressionTask object at 0x7f6dbb9bcd90>
 
 
 
@@ -159,22 +159,21 @@ Print the final ensemble performance
 
  .. code-block:: none
 
-    {'r2': 0.9408102126984811}
-    |    | Preprocessing                                                                        | Estimator                                                          |   Weight |
-    |---:|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------|---------:|
-    |  0 | None                                                                                 | CBLearner                                                          |     0.44 |
-    |  1 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.44 |
-    |  2 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.06 |
-    |  3 | SimpleImputer,Variance Threshold,NoEncoder,RobustScaler,NoFeaturePreprocessing       | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.04 |
-    |  4 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.02 |
+    {'r2': 0.9407884171054208}
+    |    | Preprocessing                                                                        | Estimator                                                       |   Weight |
+    |---:|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
+    |  0 | None                                                                                 | CBLearner                                                       |     0.44 |
+    |  1 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.42 |
+    |  2 | SimpleImputer,Variance Threshold,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.1  |
+    |  3 | None                                                                                 | LGBMLearner                                                     |     0.04 |
     autoPyTorch results:
-            Dataset name: 41e1340b-8997-11ec-881e-d58abe4e22c6
+            Dataset name: 9c684c0d-89ce-11ec-8818-a1cc4bcb5e23
             Optimisation Metric: r2
             Best validation score: 0.8670098636440993
-            Number of target algorithm runs: 24
-            Number of successful target algorithm runs: 23
+            Number of target algorithm runs: 29
+            Number of successful target algorithm runs: 29
             Number of crashed target algorithm runs: 0
-            Number of target algorithms that exceeded the time limit: 1
+            Number of target algorithms that exceeded the time limit: 0
             Number of target algorithms that exceeded the memory limit: 0
 
 
@@ -184,7 +183,7 @@ Print the final ensemble performance
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 5 minutes  30.224 seconds)
+   **Total running time of the script:** ( 5 minutes  30.478 seconds)
 
 
 .. _sphx_glr_download_examples_20_basics_example_tabular_regression.py:

diff --git a/development/_sources/examples/20_basics/sg_execution_times.rst.txt b/development/_sources/examples/20_basics/sg_execution_times.rst.txt
@@ -5,12 +5,12 @@
 
 Computation times
 =================
-**10:57.864** total execution time for **examples_20_basics** files:
+**11:02.317** total execution time for **examples_20_basics** files:
 
 +--------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_examples_20_basics_example_tabular_regression.py` (``example_tabular_regression.py``)         | 05:30.224 | 0.0 MB |
+| :ref:`sphx_glr_examples_20_basics_example_tabular_regression.py` (``example_tabular_regression.py``)         | 05:30.478 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_examples_20_basics_example_tabular_classification.py` (``example_tabular_classification.py``) | 05:22.134 | 0.0 MB |
+| :ref:`sphx_glr_examples_20_basics_example_tabular_classification.py` (``example_tabular_classification.py``) | 05:26.257 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_examples_20_basics_example_image_classification.py` (``example_image_classification.py``)     | 00:05.506 | 0.0 MB |
+| :ref:`sphx_glr_examples_20_basics_example_image_classification.py` (``example_image_classification.py``)     | 00:05.581 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------------------+-----------+--------+