Merge branch 'dfsg' into debian
* dfsg:
  DOC: some documentation fixes.
  whatsnew: gave myself some credit
  COSMIT typo
  ENHanced the multilabel example aspect
  MISC: species distribution example plotted
yarikoptic committed Jan 13, 2012
2 parents 05bd43e + 81599bb commit d38004d
Showing 5 changed files with 56 additions and 47 deletions.
77 changes: 42 additions & 35 deletions doc/modules/outlier_detection.rst
@@ -4,6 +2,8 @@
Novelty and Outlier Detection
===================================================

.. currentmodule:: sklearn

Many applications require being able to decide whether a new observation
belongs to the same distribution as existing observations (it is an
`inlier`), or should be considered as different (it is an outlier).
@@ -52,19 +54,20 @@ observations. Otherwise, if they lay outside the frontier, we can say
that they are abnormal with a given confidence in our assessment.

The One-Class SVM has been introduced in [1] for that purpose and
implemented in the `sklearn.svm` package in the :class:`OneClassSVM`
object. It requires the choice of a kernel and a scalar parameter to
define a frontier. The RBF kernel is usually chosen although there
exists no exact formula or algorithm to set its bandwidth
parameter. This is the default in the scikit-learn implementation. The
:math:`\nu` parameter, also known as the margin of the One-Class SVM,
corresponds to the probability of finding a new, but regular,
observation outside the frontier.
implemented in the :ref:`svm` module in the
:class:`svm.OneClassSVM` object. It requires the choice of a
kernel and a scalar parameter to define a frontier. The RBF kernel is
usually chosen although there exists no exact formula or algorithm to
set its bandwidth parameter. This is the default in the scikit-learn
implementation. The :math:`\nu` parameter, also known as the margin of
the One-Class SVM, corresponds to the probability of finding a new,
but regular, observation outside the frontier.
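
A minimal usage sketch, with synthetic training data and illustrative
parameter values (the exact values are assumptions, not recommendations)::

    import numpy as np
    from sklearn import svm

    # assume 100 two-dimensional observations, most of them regular
    X_train = np.random.randn(100, 2)

    # nu bounds the fraction of regular observations left outside the frontier
    clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
    clf.fit(X_train)

    # predict returns +1 for points inside the learned frontier, -1 outside
    labels = clf.predict(X_train)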

.. topic:: Examples:

* See :ref:`example_svm_plot_oneclass.py` for visualizing the frontier
learned around some data by a :class:`OneClassSVM` object.
* See :ref:`example_svm_plot_oneclass.py` for visualizing the
frontier learned around some data by a
:class:`svm.OneClassSVM` object.

.. figure:: ../auto_examples/svm/images/plot_oneclass_1.png
:target: ../auto_examples/svm/plot_oneclasse.html
Expand All @@ -84,17 +87,17 @@ of regular observations that can be used to train any tool.
Fitting an elliptic envelope
-----------------------------

.. currentmodule:: sklearn.covariance

One common way of performing outlier detection is to assume that the
regular data come from a known distribution (e.g. data are Gaussian
distributed). From this assumption, we generally try to define the
"shape" of the data, and can define outlying observations as
observations which stand far enough from the fitted shape.

scikit-learn provides an object :class:`EllipticEnvelop` that fits a
robust covariance estimate to the data, and thus fits an ellipse to the
central data points, ignoring points outside the central mode.
scikit-learn provides an object
:class:`covariance.EllipticEnvelop` that fits a robust covariance
estimate to the data, and thus fits an ellipse to the central data
points, ignoring points outside the central mode.
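
A minimal sketch of fitting this estimator on synthetic, mostly
Gaussian data (default parameters, shown for illustration only; later
scikit-learn releases spell the class ``EllipticEnvelope``)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelop

    # assume inliers drawn from a single Gaussian mode
    X = np.random.randn(200, 2)

    envelope = EllipticEnvelop().fit(X)

    # +1 for observations inside the fitted ellipse, -1 for outliers
    is_inlier = envelope.predict(X)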

For instance, assuming that the inlier data are Gaussian distributed, it
will estimate the inlier location and covariance in a robust way (i.e.
@@ -111,9 +114,9 @@ This strategy is illustrated below.

* See :ref:`example_covariance_plot_mahalanobis_distances.py` for
an illustration of the difference between using a standard
(:class:`EmpiricalCovariance`) or a robust estimate (:class:`MinCovDet`)
of location and covariance to assess the degree of outlyingness of an
observation.
(:class:`covariance.EmpiricalCovariance`) or a robust estimate
(:class:`covariance.MinCovDet`) of location and covariance to
assess the degree of outlyingness of an observation.
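
A short sketch of that comparison on contaminated synthetic data
(default parameters, for illustration only)::

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, MinCovDet

    # mostly Gaussian inliers plus a few gross outliers
    X = np.r_[np.random.randn(95, 2), 5. + 2. * np.random.randn(5, 2)]

    emp_cov = EmpiricalCovariance().fit(X)
    robust_cov = MinCovDet().fit(X)

    # squared Mahalanobis distances of each observation to the fitted
    # location; the robust fit is much less influenced by the outliers
    d_emp = emp_cov.mahalanobis(X)
    d_robust = robust_cov.mahalanobis(X)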

One-class SVM versus elliptic envelope
---------------------------------------
@@ -126,8 +129,9 @@ inlying data is very challenging, and a One-class SVM gives useful
results in these situations.

The examples below illustrate how the performance of the
:class:`EllipticEnvelop` degrades as the data is less and less unimodal.
:class:`OneClassSVM` works better on data with multiple modes.
:class:`covariance.EllipticEnvelop` degrades as the data is less and
less unimodal. :class:`svm.OneClassSVM` works better on data with
multiple modes.
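
A rough sketch of such a comparison on bimodal synthetic data (the data
and parameter values are illustrative assumptions)::

    import numpy as np
    from sklearn import svm
    from sklearn.covariance import EllipticEnvelop

    # two well-separated inlier modes
    X = np.r_[np.random.randn(100, 2), 4. + np.random.randn(100, 2)]

    for estimator in (svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1),
                      EllipticEnvelop()):
        estimator.fit(X)
        # count the observations each estimator predicts as outliers (-1)
        n_outliers = (estimator.predict(X) == -1).sum()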

.. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_1.png
:target: ../auto_examples/covariance/plot_outlier_detection.html
@@ -146,32 +150,35 @@ The examples below illustrate how the performance of the

*
- For an inlier mode well-centered and elliptic, the
:class:`OneClassSVM` is not able to benefit from the rotational
symmetry of the inlier population. In addition, it fits the
outliers present in the training set to some extent. By contrast,
the decision rule based on fitting an :class:`EllipticEnvelop`
learns an ellipse, which fits the inlier distribution well.
:class:`svm.OneClassSVM` is not able to benefit from the
rotational symmetry of the inlier population. In addition, it
fits the outliers present in the training set to some extent. By
contrast, the decision rule based on fitting a
:class:`covariance.EllipticEnvelop` learns an ellipse, which
fits the inlier distribution well.
- |outlier1|

*
- As the inlier distribution becomes bimodal, the
:class:`EllipticEnvelop` does not fit the inliers well. However,
we can see that the :class:`OneClassSVM` tends to overfit:
because it has no model of the inliers, it interprets a region
where some outliers happen to be clustered as part of the inlier
distribution.
- As the inlier distribution becomes bimodal, the
:class:`covariance.EllipticEnvelop` does not fit the
inliers well. However, we can see that the :class:`svm.OneClassSVM`
tends to overfit: because it has no model of the inliers, it
interprets a region where some outliers happen to be clustered
as part of the inlier distribution.
- |outlier2|

*
- If the inlier distribution is strongly non-Gaussian, the
:class:`OneClassSVM` is able to recover a reasonable
approximation, whereas the :class:`EllipticEnvelop` completely
fails.
:class:`svm.OneClassSVM` is able to recover a reasonable
approximation, whereas the :class:`covariance.EllipticEnvelop`
completely fails.
- |outlier3|

.. topic:: Examples:

* See :ref:`example_covariance_plot_outlier_detection.py` for a comparison
of the :class:`OneClassSVM` (tuned to perform like an outlier detection
method) and a covariance-based outlier detection with :class:`MinCovDet`.
* See :ref:`example_covariance_plot_outlier_detection.py` for a
comparison of the :class:`svm.OneClassSVM` (tuned to perform like
an outlier detection method) and a covariance-based outlier
detection with :class:`covariance.MinCovDet`.


2 changes: 1 addition & 1 deletion doc/themes/scikit-learn/layout.html
@@ -135,7 +135,7 @@
{% else %}
<h3>News</h3>

<p>scikit-learn 0.9 is available
<p>scikit-learn 0.10 is available
for <a href="https://sourceforge.net/projects/scikit-learn/files/">download</a>.
See <a href="{{pathto('whats_new')}}">what's new</a> and tips
on <a href="{{pathto('install')}}">installing</a>.</p>
4 changes: 3 additions & 1 deletion doc/whats_new.rst
@@ -49,7 +49,7 @@ Changelog
- :ref:`outlier_detection`: outlier and novelty detection, by
`Virgile Fritsch`_.

- :ref:`kernel_approximation`: a transform implement kernel
- :ref:`kernel_approximation`: a transform implementing kernel
approximation for fast SGD on non-linear kernels by
`Andreas Müller`_.

@@ -86,6 +86,8 @@ Changelog
- :class:`sklearn.cross_validation.ShuffleSplit` can subsample the train
sets as well as the test sets by `Olivier Grisel`_.

- Errors in the build of the documentation fixed by `Andreas Müller`_.


API changes summary
-------------------
5 changes: 2 additions & 3 deletions examples/applications/plot_species_distribution_modeling.py
@@ -206,6 +206,5 @@ def plot_species_distribution(species=["bradypus_variegatus_0",
print "\ntime elapsed: %.2fs" % (time() - t0)


if __name__ == '__main__':
plot_species_distribution()
pl.show()
plot_species_distribution()
pl.show()
15 changes: 8 additions & 7 deletions examples/plot_multilabel.py
@@ -41,7 +41,7 @@ def plot_hyperplane(clf, min_x, max_x, linestyle, label):
# get the separating hyperplane
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(min_x, max_x)
xx = np.linspace(min_x - 5, max_x + 5) # make sure the line is long enough
yy = a * xx - (clf.intercept_[0]) / w[1]
pl.plot(xx, yy, linestyle, label=label)
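
For reference, the slope used in ``plot_hyperplane`` above follows from the
separating hyperplane equation: points on the boundary satisfy
``w[0] * x + w[1] * y + intercept = 0``, which rearranges to
``y = -(w[0] / w[1]) * x - intercept / w[1]``, i.e. the slope ``a`` and the
offset term subtracted in ``yy``.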

@@ -81,27 +81,28 @@ def plot_subfigure(X, Y, subplot, title, transform):
pl.xticks(())
pl.yticks(())

if subplot == 1:
if subplot == 2:
pl.xlim(min_x - 5, max_x)
pl.xlabel('First principal component')
pl.ylabel('Second principal component')
pl.legend(loc="upper right")
pl.legend(loc="upper left")


pl.figure(figsize=(13, 6))
pl.figure(figsize=(8, 6))

X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
allow_unlabeled=True,
random_state=0)
random_state=1)

plot_subfigure(X, Y, 1, "With unlabeled samples + CCA", "cca")
plot_subfigure(X, Y, 2, "With unlabeled samples + PCA", "pca")

X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
allow_unlabeled=False,
random_state=0)
random_state=1)

plot_subfigure(X, Y, 3, "Without unlabeled samples + CCA", "cca")
plot_subfigure(X, Y, 4, "Without unlabeled samples + PCA", "pca")

pl.subplots_adjust(.04, .07, .97, .90, .09, .2)
pl.subplots_adjust(.04, .02, .97, .94, .09, .2)
pl.show()
