
Feature/local outlier factor #164

Merged
merged 12 commits on Sep 23, 2019
1 change: 1 addition & 0 deletions docs/modules/datasets.rst
@@ -17,6 +17,7 @@ The following functions are used to retrieve specific functional datasets:
skfda.datasets.fetch_medflies
skfda.datasets.fetch_weather
skfda.datasets.fetch_aemet
skfda.datasets.fetch_octane

Those functions return a dictionary with at least a "data" field containing the
instance data, and a "target" field containing the class labels or regression values,
15 changes: 11 additions & 4 deletions docs/modules/exploratory/outliers.rst
@@ -4,12 +4,15 @@ Outlier detection
Functional outlier detection is the identification of functions that do not seem to behave like the others in the
dataset. There are several ways in which a function may be different from the others. For example, a function may
have a different shape than the others, or its values could be more extreme. Thus, outlyingness is difficult to
categorize exactly as each outlier detection method looks at different features of the functions in order to
identify the outliers.

Each of the outlier detection methods in scikit-fda has the same API as the outlier detection methods of
`scikit-learn <https://scikit-learn.org/stable/modules/outlier_detection.html>`_.
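
For instance, the following minimal sketch uses :class:`IQROutlierDetector` (documented below) and assumes it exposes the usual ``fit_predict`` method of that API, which returns 1 for inliers and -1 for outliers:

from skfda.datasets import make_sinusoidal_process
from skfda.exploratory.outliers import IQROutlierDetector

# Toy functional dataset of smooth sinusoidal trajectories.
fd = make_sinusoidal_process(n_samples=25, random_state=0)

detector = IQROutlierDetector()
labels = detector.fit_predict(fd)  # array of 1 (inlier) / -1 (outlier)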

Interquartile Range Outlier Detector
------------------------------------

One of the most common ways of outlier detection is given by the functional data boxplot. An observation is marked
as an outlier if it has points more than :math:`1.5 \cdot IQR` outside the region containing the deepest 50% of the
curves (the central region), where :math:`IQR` is the interquartile range.
@@ -18,19 +21,23 @@
:toctree: autosummary

skfda.exploratory.outliers.IQROutlierDetector
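
A sketch of tuning the rule above; the ``factor`` argument is an assumption here, standing for the 1.5 multiplier in the fence:

from skfda.datasets import make_sinusoidal_process
from skfda.exploratory.outliers import IQROutlierDetector

fd = make_sinusoidal_process(n_samples=25, random_state=0)

# factor=1.5 reproduces the classical boxplot fence; a larger factor
# flags fewer curves as outliers (hypothetical parameter name).
detector = IQROutlierDetector(factor=1.5)
print(detector.fit_predict(fd))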



DirectionalOutlierDetector
--------------------------

Another, more recent approach to outlier detection takes into account both the magnitude and the shape of the
curves. Curves whose shape or magnitude differs greatly from the rest are considered outliers.

.. autosummary::
:toctree: autosummary

skfda.exploratory.outliers.DirectionalOutlierDetector
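
A minimal usage sketch, assuming the detector follows the shared ``fit_predict`` API described at the top of this page:

from skfda.datasets import make_sinusoidal_process
from skfda.exploratory.outliers import DirectionalOutlierDetector

# Noisy sinusoidal curves; shape and magnitude both vary.
fd = make_sinusoidal_process(n_samples=25, error_std=.3, random_state=0)

detector = DirectionalOutlierDetector()
labels = detector.fit_predict(fd)  # -1 marks magnitude/shape outliers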

For this method, it is necessary to compute the mean and variation of the directional outlyingness, which can be done
with the following function.

.. autosummary::
:toctree: autosummary

skfda.exploratory.outliers.directional_outlyingness_stats
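
A sketch of computing these statistics directly. The attribute names on the returned object are assumptions for illustration, chosen to mirror the description above:

from skfda.datasets import make_sinusoidal_process
from skfda.exploratory.outliers import directional_outlyingness_stats

fd = make_sinusoidal_process(n_samples=25, random_state=0)

stats = directional_outlyingness_stats(fd)
# Hypothetical field names mirroring "mean and variation of the
# directional outlyingness":
print(stats.mean_directional_outlyingness)
print(stats.variation_directional_outlyingness)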
9 changes: 5 additions & 4 deletions skfda/_neighbors/base.py
@@ -97,11 +97,11 @@ def multivariate_metric(x, y, _check=False, **kwargs):
class NeighborsBase(ABC, BaseEstimator):
"""Base class for nearest neighbors estimators."""

@abstractmethod
def __init__(self, n_neighbors=None, radius=None,
weights='uniform', algorithm='auto',
leaf_size=30, metric='l2', metric_params=None,
n_jobs=None, multivariate_metric=False):
"""Initializes the nearest neighbors estimator"""

self.n_neighbors = n_neighbors
self.radius = radius
@@ -166,6 +166,7 @@ def fit(self, X, y=None):
metric = lp_distance
else:
metric = self.metric

sklearn_metric = _to_multivariate_metric(metric,
self._sample_points)
else:
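
The rest of ``fit`` is folded by the diff. Conceptually, ``_to_multivariate_metric`` wraps a functional metric so that sklearn, which works on flat vectors, can call it. A rough sketch of that idea (not the library's actual private implementation, and assuming a one-dimensional domain):

import numpy as np
from skfda import FDataGrid

def to_multivariate_metric(metric, sample_points):
    # Rebuild FDataGrid objects from the flattened arrays sklearn
    # passes in, then delegate to the functional metric.
    def multivariate_metric(x, y, _check=False, **kwargs):
        fd_x = FDataGrid(x[np.newaxis, :], sample_points)
        fd_y = FDataGrid(y[np.newaxis, :], sample_points)
        return metric(fd_x, fd_y, **kwargs)
    return multivariate_metric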
@@ -203,7 +204,7 @@ def kneighbors(self, X=None, n_neighbors=None, return_distance=True):
Indices of the nearest points in the population matrix.

Examples:
Firstly, we will create a toy dataset.

>>> from skfda.datasets import make_sinusoidal_process
>>> fd1 = make_sinusoidal_process(phase_std=.25, random_state=0)
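
The doctest above is cut off by the diff fold. As a rough, self-contained sketch of a full call (the ``NearestNeighbors`` class and ``n_neighbors`` argument are taken from the See also references elsewhere in this PR):

from skfda.datasets import make_sinusoidal_process
from skfda.ml.clustering import NearestNeighbors

fd = make_sinusoidal_process(n_samples=15, random_state=0)

neigh = NearestNeighbors(n_neighbors=3)
neigh.fit(fd)

# distances[i, j]: distance from sample i to its j-th nearest neighbor;
# indices[i, j]: position of that neighbor in the training set.
distances, indices = neigh.kneighbors(fd)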
@@ -260,7 +261,7 @@ def kneighbors_graph(self, X=None, n_neighbors=None, mode='connectivity'):
A[i, j] is assigned the weight of the edge that connects i to j.

Examples:
Firstly, we will create a toy dataset.

>>> from skfda.datasets import make_sinusoidal_process
>>> fd1 = make_sinusoidal_process(phase_std=.25, random_state=0)
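
Again the doctest is folded; a hedged sketch of building the graph (same assumed class as above):

from skfda.datasets import make_sinusoidal_process
from skfda.ml.clustering import NearestNeighbors

fd = make_sinusoidal_process(n_samples=15, random_state=0)

neigh = NearestNeighbors(n_neighbors=3)
neigh.fit(fd)

# Sparse adjacency matrix: with mode='connectivity' entries are 0/1;
# with mode='distance' they hold the edge weights described above.
graph = neigh.kneighbors_graph(fd, mode='connectivity')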
@@ -329,7 +330,7 @@ def radius_neighbors(self, X=None, radius=None, return_distance=True):
within a ball of size ``radius`` around the query points.

Examples:
Firstly, we will create a toy dataset.

>>> from skfda.datasets import make_sinusoidal_process
>>> fd1 = make_sinusoidal_process(phase_std=.25, random_state=0)
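
A corresponding sketch for radius queries (the ``radius`` constructor argument is assumed from the base class shown earlier in this diff):

from skfda.datasets import make_sinusoidal_process
from skfda.ml.clustering import NearestNeighbors

fd = make_sinusoidal_process(n_samples=15, random_state=0)

neigh = NearestNeighbors(radius=.5)
neigh.fit(fd)

# One entry per query sample, each listing the neighbors found inside
# the ball of radius 0.5, so rows may have different lengths.
distances, indices = neigh.radius_neighbors(fd)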
8 changes: 6 additions & 2 deletions skfda/_neighbors/classification.py
@@ -59,8 +59,9 @@ class KNeighborsClassifier(NeighborsBase, NeighborsMixin, KNeighborsMixin,
Doesn't affect :meth:`fit` method.
multivariate_metric : boolean, optional (default = False)
Indicates whether the metric used is a sklearn distance between vectors (see
:class:`~sklearn.neighbors.DistanceMetric`); if ``False``, a functional metric
of the module :mod:`skfda.misc.metrics` is used instead.

Examples
--------
Firstly, we will create a toy dataset with 2 classes
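
The example is folded here; a hedged sketch of the classifier with the ``multivariate_metric`` flag documented above (``concatenate`` and the dataset parameters are assumptions modeled on the doctests elsewhere in this diff):

from skfda.datasets import make_sinusoidal_process
from skfda.ml.classification import KNeighborsClassifier

# Two classes: sinusoidal processes with different phase means.
fd1 = make_sinusoidal_process(phase_std=.25, random_state=0)
fd2 = make_sinusoidal_process(phase_mean=1.8, phase_std=.25, random_state=1)
X = fd1.concatenate(fd2)
y = 15 * [0] + 15 * [1]  # make_sinusoidal_process yields 15 samples by default

# multivariate_metric=False (the default) selects a functional metric.
knn = KNeighborsClassifier(n_neighbors=3, multivariate_metric=False)
knn.fit(X, y)
print(knn.predict(X))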
@@ -96,6 +97,7 @@ class KNeighborsClassifier(NeighborsBase, NeighborsMixin, KNeighborsMixin,
:class:`~skfda.ml.regression.KNeighborsRegressor`
:class:`~skfda.ml.regression.RadiusNeighborsRegressor`
:class:`~skfda.ml.clustering.NearestNeighbors`


Notes
-----
@@ -254,6 +256,7 @@ class RadiusNeighborsClassifier(NeighborsBase, NeighborsMixin,
:class:`~skfda.ml.regression.RadiusNeighborsRegressor`
:class:`~skfda.ml.clustering.NearestNeighbors`


Notes
-----
See Nearest Neighbors in the sklearn online documentation for a discussion
@@ -358,6 +361,7 @@ class and return a :class:`FData` object with only one sample
:class:`~skfda.ml.regression.RadiusNeighborsRegressor`
:class:`~skfda.ml.clustering.NearestNeighbors`


"""

def __init__(self, metric='l2', mean='mean'):