# Dimensionality Reduction in Python

High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d visually explore a new dataset first, but when you have too many dimensions the classical approaches will seem insufficient. Fortunately, there are visualization techniques designed specifically for high-dimensional data. After exploring the data, you’ll often find that many features hold little information because they don’t show any variance or because they are duplicates of other features. You’ll learn to detect these features and drop them from the dataset so that you can focus on the informative ones. As a next step, you might want to build a model on these features, and it may turn out that some don’t have any effect on the thing you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, in order to reduce dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you through the calculation of uncorrelated principal components.

### Exploring High Dimensional Data
Learn the difference between feature selection and feature extraction, and apply both techniques for data exploration.
* **Dimensionality:** the number of columns in your dataset (assuming that you have a tidy dataset)
* **Tidy dataset:** Every column represents a variable (feature) and every row represents an observation (instance).
* **High-dimensional:** When you have many columns, or features, in your dataset; high dimensionality indicates complexity.
* **Note:** by default, `.describe()` ignores the non-numeric columns in a dataset. To get the opposite, pass the argument `exclude='number'`, as in `df.describe(exclude='number')`; the summary statistics are then adapted to non-numeric data.

* Becoming familiar with the shape of your dataset and the properties of its features is a crucial step to take before you decide to reduce dimensionality.
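
A first look at a dataset's shape and feature properties, including the `.describe()` note above, can be sketched as follows. The DataFrame, its column names, and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy dataset; the columns and values are assumptions for this sketch.
df = pd.DataFrame({
    "height_cm": [170.0, 182.5, 165.2, 177.8],
    "weight_kg": [68.0, 85.3, 59.1, 74.6],
    "gender": ["F", "M", "F", "M"],
    "unit": ["metric", "metric", "metric", "metric"],
})

# shape is (number of observations, number of features);
# the second value is the dimensionality of a tidy dataset.
n_rows, n_cols = df.shape
print(f"{n_rows} observations, {n_cols} features")

# By default, describe() summarizes only the numeric columns.
numeric_summary = df.describe()
print(numeric_summary)

# exclude='number' flips this: only the non-numeric columns are summarized,
# with statistics such as count, unique, top, and freq.
non_numeric_summary = df.describe(exclude="number")
print(non_numeric_summary)
```

In this toy example the non-numeric summary would report a single unique value for `unit`, which is the kind of property worth spotting before deciding what to drop.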

#### Methods for reducing dimensionality:
* Drop columns with little to no variance (these contribute little when you are looking to determine differences among observations in a dataset)
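
Dropping low-variance columns could look roughly like the sketch below, which uses plain pandas. The toy data, the column names, and the `threshold` value are assumptions chosen for illustration, not values taken from these notes.

```python
import pandas as pd

# Hypothetical toy dataset; "n_legs" is constant and so has zero variance.
df = pd.DataFrame({
    "height_cm": [170.0, 182.5, 165.2, 177.8],
    "weight_kg": [68.0, 85.3, 59.1, 74.6],
    "n_legs": [2, 2, 2, 2],
})

# Assumed cutoff: only columns with no variance at all are dropped here.
threshold = 0.0

# Variance of each numeric column.
variances = df.var()

# Names of the columns whose variance does not exceed the cutoff.
low_variance_cols = variances[variances <= threshold].index.tolist()

# Keep everything else.
reduced_df = df.drop(columns=low_variance_cols)

print(low_variance_cols)
print(reduced_df.shape)
```

Variance depends on the scale of each feature, so a cutoff above zero is usually only meaningful once the features have been brought to comparable scales.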