# 20 Sklearn Features You Didn't Know About | P(Guarantee) = 0.75
## 1. EllipticEnvelope, 3. ExtraTrees, 7. Perceptron, 11. IsolationForest, 13. RobustScaler
![](images/unsplash.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@alevisionco?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>alevision.co</a>
        on 
        <a href='https://unsplash.com/s/photos/secret?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash</a>
    </strong>
</figcaption>

### Setup

In [15]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import rcParams

rcParams["figure.figsize"] = [12, 9]
rcParams["figure.autolayout"] = True
rcParams["xtick.labelsize"] = 15
rcParams["ytick.labelsize"] = 15
rcParams["legend.fontsize"] = "small"

warnings.filterwarnings("ignore")

np.random.seed(1121218)

### Introduction

### 1. [covariance.EllipticEnvelope](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html#sklearn.covariance.EllipticEnvelope)

Outliers are common in real-world distributions. Many algorithms can detect them, and `EllipticEnvelope` is one that is built directly into Sklearn. Its advantage is that it performs exceptionally well at detecting outliers in normally distributed (Gaussian) features:

In [18]:
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Create a sample normal distribution
X = np.random.normal(loc=5, scale=2, size=50).reshape(-1, 1)

# Fit the estimator
ee = EllipticEnvelope(random_state=0)
_ = ee.fit(X)

# Test
test = np.array([6, 8, 20, 4, 5, 6, 10, 13]).reshape(-1, 1)

# predict returns 1 for an inlier and -1 for an outlier
ee.predict(test)

array([ 1,  1, -1,  1,  1,  1, -1, -1])

To test the estimator, we create a normal distribution with a mean of 5 and a standard deviation of 2. After training, we pass a few hand-picked numbers to its `predict` method, which returns -1 for the outliers in `test`: 20, 10, and 13.
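How strict the estimator is can be tuned. The sketch below is a minimal illustration, not part of the original notebook: the larger training sample, the `contamination=0.05` value, and the variable names are all assumptions chosen for the example. It shows the `contamination` parameter, which sets the share of training points treated as outliers, and `decision_function`, which returns a signed score instead of a hard label.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Hypothetical training data: same mean and spread as above, but more samples
rng = np.random.RandomState(0)
X_train = rng.normal(loc=5, scale=2, size=500).reshape(-1, 1)

# contamination is the assumed share of outliers in the training data
ee = EllipticEnvelope(contamination=0.05, random_state=0).fit(X_train)

candidates = np.array([5.0, 20.0]).reshape(-1, 1)
labels = ee.predict(candidates)            # 1 = inlier, -1 = outlier
scores = ee.decision_function(candidates)  # negative score = outlier

# Share of training points flagged; should sit close to contamination
flagged_share = (ee.predict(X_train) == -1).mean()
print(labels, scores, flagged_share)
```

A higher `contamination` moves the decision threshold inward, so more borderline points are flagged.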

### 2. feature_selection.RFECV

Feature selection is an invaluable skill in machine learning. Selecting the features that help predictions the most is a necessary step to combat overfitting and reduce model complexity. There are various feature selection techniques that work based on the properties of each feature or the relationships between them. However, Sklearn offers a more robust algorithm: Recursive Feature Elimination with cross-validation (RFECV). It repeatedly fits a model, drops the least important features, and uses cross-validation to decide how many features to keep.

An advantage of this estimator is that it is a wrapper: it can be used around any Sklearn estimator that exposes feature importances or coefficients. Here is an example on a synthetic dataset:

In [19]:
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import BayesianRidge

# Build a synthetic dataset
X, y = make_regression(n_samples=10000, n_features=15, n_informative=10)

# Init/fit the selector
rfecv = RFECV(estimator=BayesianRidge(), cv=5)
_ = rfecv.fit(X, y)
# Transform the feature array
rfecv.transform(X).shape

(10000, 10)

The synthetic dataset has 15 features, 10 of which are informative; the remaining 5 carry no information about the target. We fit a 5-fold RFECV with `BayesianRidge` as the estimator. After training, the `transform` method discards the features that were not selected. Calling `.shape` shows that the selector dropped all 5 unnecessary features.
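To see which features were kept rather than only how many, the fitted selector can be inspected. The following is a small sketch on a separate, smaller synthetic dataset; the `noise` and `random_state` values and the variable names are assumptions for illustration. It reads the `n_features_`, `support_`, and `ranking_` attributes of the fitted `RFECV`.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import BayesianRidge

# Hypothetical smaller dataset so the example runs quickly
X_demo, y_demo = make_regression(
    n_samples=1000, n_features=15, n_informative=10, noise=10.0, random_state=0
)

selector = RFECV(estimator=BayesianRidge(), cv=5).fit(X_demo, y_demo)

n_kept = selector.n_features_  # number of features chosen by cross-validation
mask = selector.support_       # boolean mask, True for selected features
ranks = selector.ranking_      # selected features have rank 1

# transform keeps exactly the columns where the mask is True
X_reduced = selector.transform(X_demo)
print(n_kept, mask, ranks, X_reduced.shape)
```

The mask is handy for mapping the selection back to column names when the features come from a DataFrame.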

I have written an [entire article](https://towardsdatascience.com/powerful-feature-selection-with-recursive-feature-elimination-rfe-of-sklearn-23efb2cdb54e?source=your_stories_page-------------------------------------) on this algorithm that covers the nitty-gritty details of how it works with a real-world dataset.

### 3. ensemble.ExtraTrees

Random Forest is one of the most powerful algorithms in classic machine learning. It trains multiple decision trees and reduces overfitting by averaging their predictions. Each tree is trained on a random subset of the samples, and during the construction of each tree, a random subset of features is considered for the node splits. With these two sources of randomness, RF aims to decrease the variance of individual decision trees, which tend to overfit.

Even so, RF can still overfit. Sklearn therefore offers a drop-in alternative to RF called ExtraTrees (both classifier and regressor). The word 'extra' does not mean more trees but more randomness: it is short for extremely randomized trees. The algorithm uses another type of tree that closely resembles a decision tree. The difference is that instead of searching for the best split threshold for each candidate feature, thresholds are drawn at random for each candidate feature, and the best of these random thresholds is chosen as the splitting rule. This reduces the variance a bit at the cost of a slight increase in bias:

In [23]:
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=10000, n_features=20)

# Decision trees
clf = DecisionTreeRegressor(max_depth=None, min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
scores.mean()

0.6376080094392635

In [21]:
# Random Forest
clf = RandomForestRegressor(
    n_estimators=10, max_depth=None, min_samples_split=2, random_state=0
)
scores = cross_val_score(clf, X, y, cv=5)
scores.mean()

0.8446103607404536

In [22]:
# ExtraTrees
clf = ExtraTreesRegressor(
    n_estimators=10, max_depth=None, min_samples_split=2, random_state=0
)
scores = cross_val_score(clf, X, y, cv=5)
scores.mean()

0.8737373931608834
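The difference in how the two ensembles build their trees can be checked on the fitted models. The sketch below is only an illustration under assumed settings (a smaller dataset, fixed `random_state`, hypothetical variable names). It inspects the first tree of each ensemble and the `bootstrap` setting, which defaults to `True` for Random Forest and `False` for ExtraTrees.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.tree import ExtraTreeRegressor

# Hypothetical small dataset, used only to fit the two ensembles
X_demo, y_demo = make_regression(n_samples=500, n_features=20, random_state=0)

rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)
et = ExtraTreesRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

rf_tree = rf.estimators_[0]  # a DecisionTreeRegressor
et_tree = et.estimators_[0]  # an ExtraTreeRegressor

# "best" searches for the optimal threshold, "random" draws thresholds at random
print(type(rf_tree).__name__, rf_tree.splitter, rf.bootstrap)
print(type(et_tree).__name__, et_tree.splitter, et.bootstrap)
```

Because `bootstrap` is off by default in ExtraTrees, each of its trees sees the full training set unless you enable it explicitly.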

### 4. impute.IterativeImputer and KNNImputer

### 5. linear_model.HuberRegressor

### 6. tree.plot_tree

### 7. linear_model.Perceptron

### 8. feature_selection.SelectFromModel

### 9. model_selection.TimeSeriesSplit

### 10. metrics.ConfusionMatrixDisplay

### 11. ensemble.IsolationForest

### 12. OVR and OVO

### 13. preprocessing.RobustScaler

### 14. Generalized Linear Models

### 15. compose.make_column_selector

### 16. compose.make_column_transformer

### 17. preprocessing.OrdinalEncoder

### 18. metrics.get_scorer

### 19. utils.class_weight.compute_class_weight

### 20. model_selection.HalvingGridSearchCV and HalvingRandomSearchCV

### 21. preprocessing.PowerTransformer

### 22. sklearn.utils