Given a set of features, you want to reduce the number of features while
retaining the variance in the data.

Use principal component analysis with scikit-learn's `PCA`:



```python
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets
```

```python
# Load the data
digits = datasets.load_digits()
# Standardize the feature matrix
features = StandardScaler().fit_transform(digits.data)
# Create a PCA that will retain 99% of variance
pca = PCA(n_components=0.99, whiten=True)
# Conduct PCA
features_pca = pca.fit_transform(features)
# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_pca.shape[1])
```

```
Original number of features: 64
Reduced number of features: 54
```


Principal component analysis (PCA) is a popular linear dimensionality reduction
technique. PCA projects observations onto the (hopefully fewer) principal
components of the feature matrix that retain the most variance. PCA is an
unsupervised technique, meaning that it does not use the information from the
target vector and instead only considers the feature matrix.
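
To make "projects observations onto the principal components" concrete, here is a minimal sketch, assuming the standardized digits features from the solution. It centers the feature matrix, takes its singular value decomposition with NumPy, and projects onto the top five directions. The helper `pca_project` and the choice of five components are illustrative assumptions, not part of scikit-learn. Only the feature matrix is passed in; no target vector is involved. We pin `svd_solver="full"` so that scikit-learn also uses an exact SVD for the comparison.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardized digits features, as in the solution; digits.target is never used
features = StandardScaler().fit_transform(datasets.load_digits().data)

def pca_project(X, k):
    """Illustrative helper: project the rows of X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T

manual = pca_project(features, k=5)
library = PCA(n_components=5, svd_solver="full").fit_transform(features)

print(manual.shape)
# Each component is only defined up to its sign, so compare magnitudes
print(np.allclose(np.abs(manual), np.abs(library), atol=1e-6))
```

Because a principal direction and its negation describe the same axis, the two results can differ column by column in sign only, which is why the check compares absolute values.
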

For a mathematical description of how PCA works, see the external resources
listed at the end of this recipe. However, we can understand the intuition behind
PCA using a simple example. In the following figure, our data contains two
features, $x_1$ and $x_2$. Looking at the visualization, it should be clear that
observations are spread out like a cigar, with a lot of length and very little height.
More specifically, we can say that the variance of the "length" is significantly
greater than that of the "height." Instead of length and height, we refer to the
"direction" with the most variance as the first principal component and the
"direction" with the second-most variance as the second principal component
(and so on).
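
The cigar intuition can be reproduced with synthetic data. The sketch below rests on assumptions of our own: 500 hypothetical points whose long axis runs along the diagonal of the $x_1$, $x_2$ plane, with a 10:1 ratio of standard deviations between length and height. If PCA behaves as described, the first row of `components_` should point along the long axis, and `explained_variance_ratio_` should assign almost all of the variance to the first component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical "cigar": long along one diagonal direction, thin across it
length = rng.normal(scale=5.0, size=500)
height = rng.normal(scale=0.5, size=500)
direction = np.array([1.0, 1.0]) / np.sqrt(2)   # unit vector along the long axis
normal = np.array([-1.0, 1.0]) / np.sqrt(2)     # unit vector across the long axis
X = np.outer(length, direction) + np.outer(height, normal)  # columns are x1, x2

pca = PCA(n_components=2).fit(X)
print(pca.components_)                # rows: first and second principal component
print(pca.explained_variance_ratio_)  # share of total variance per component
```
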

If we wanted to reduce our features, one strategy would be to project all
observations in our 2D space onto the first principal component, turning each
observation into a single (1D) value. We would lose the information captured in
the second principal component, but in some situations that would be an
acceptable trade-off. This is PCA.
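
Here is a minimal sketch of that trade-off on hypothetical 2D data (the noise scale and sample size are arbitrary assumptions). `PCA(n_components=1)` keeps one coordinate per observation, and `inverse_transform` maps those coordinates back onto the line of the first principal component. The squared reconstruction error, divided by n - 1, should equal the variance along the second principal component, which is exactly the information given up.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical cigar-shaped data: x2 roughly follows x1
x1 = rng.normal(scale=5.0, size=500)
x2 = x1 + rng.normal(scale=0.7, size=500)
X = np.column_stack([x1, x2])

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)            # one coordinate per observation
X_back = pca.inverse_transform(X_1d)   # those coordinates mapped back into 2D

# Variance lost by dropping the second component (sample variance, n - 1)
lost = np.sum((X - X_back) ** 2) / (len(X) - 1)
full = PCA(n_components=2).fit(X)

print(X_1d.shape)
print(lost, full.explained_variance_[1])
```

The residual `X - X_back` is the part of each centered observation that lies along the second principal direction, so its variance is that component's variance.
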

PCA is implemented in scikit-learn using the `PCA` class. The `n_components`
parameter can be used in two ways, depending on the argument provided. If the
argument is an integer, `PCA` keeps that many components (features). This raises
the question of how to choose the optimal number of features. Fortunately for us,
if the argument to `n_components` is a float between 0 and 1, `PCA` keeps the
minimum number of components needed to retain that proportion of the variance.
It is common to use values of 0.95 and 0.99, meaning that 95% and 99% of the
variance of the original features is retained, respectively. `whiten=True`
transforms the values of each principal component so that they have zero mean
and unit variance. Another useful parameter and argument is
`svd_solver="randomized"`, which implements a stochastic algorithm to find the
first principal components, often in significantly less time; note that this
solver requires an integer `n_components`.
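
The parameters discussed above can be tried side by side. This sketch reuses the standardized digits features from the solution; the values 10 and 0.95 are arbitrary choices for illustration. It prints the number of components chosen in each case, checks that whitened outputs have (numerically) zero mean and unit variance (using `ddof=1`, on the assumption that scikit-learn scales by the sample variance), and compares the randomized solver with an exact one. Rather than assume that a float `n_components` is rejected by the randomized solver, the `try`/`except` records what actually happens.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = StandardScaler().fit_transform(datasets.load_digits().data)

# Integer n_components: keep exactly that many (exact SVD, reused for comparison below)
pca_int = PCA(n_components=10, svd_solver="full").fit(features)
print("Integer 10 ->", pca_int.n_components_, "components")

# Float n_components in (0, 1): fewest components that retain that share of variance
pca_float = PCA(n_components=0.95).fit(features)
print("Float 0.95 ->", pca_float.n_components_, "components,",
      "variance kept:", pca_float.explained_variance_ratio_.sum())

# whiten=True: each output column should have zero mean and unit sample variance
whitened = PCA(n_components=0.99, whiten=True).fit_transform(features)
print("Zero means:", np.allclose(whitened.mean(axis=0), 0, atol=1e-8))
print("Unit variances:", np.allclose(whitened.var(axis=0, ddof=1), 1))

# Randomized solver with an integer n_components: an approximation of the same components
pca_rand = PCA(n_components=10, svd_solver="randomized", random_state=0).fit(features)
print("Variance kept, exact vs randomized:",
      pca_int.explained_variance_ratio_.sum(), pca_rand.explained_variance_ratio_.sum())

# Record whether the randomized solver accepts a float n_components
rejected = False
try:
    PCA(n_components=0.99, svd_solver="randomized").fit(features)
except ValueError:
    rejected = True
print("Randomized solver rejected n_components=0.99:", rejected)
```

Since the randomized solver approximates the same top components, the variance it retains should be very close to, and (beyond rounding) not above, what the exact solver retains.
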

The output of our solution shows that PCA let us reduce our dimensionality by
10 features (from 64 to 54) while still retaining 99% of the information
(variance) in the feature matrix.
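
To see where the 54 comes from, one can fit all components and inspect the running total of `explained_variance_ratio_`. This sketch assumes the same standardized digits features as the solution. The `np.searchsorted(..., side="right") + 1` rule is our reconstruction of how a float `n_components` becomes a component count, so the block prints the `PCA(n_components=0.99)` result alongside it for checking.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = StandardScaler().fit_transform(datasets.load_digits().data)

# Fit all components and accumulate the share of variance each one explains
cumulative = np.cumsum(PCA().fit(features).explained_variance_ratio_)

# Fewest components whose running total exceeds 0.99 (our reconstruction of the float rule)
n_needed = int(np.searchsorted(cumulative, 0.99, side="right")) + 1
pca_99 = PCA(n_components=0.99).fit(features)

print("Components needed (running total):", n_needed)
print("Components chosen by PCA(0.99):", pca_99.n_components_)
print("Variance kept with one component fewer:", cumulative[n_needed - 2])
print("Variance kept with n_needed components:", cumulative[n_needed - 1])
print("Features dropped:", features.shape[1] - n_needed)
```
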

![Observations with two features, x1 and x2, spread out like a cigar](./pics/observations.jpg)


