# __Features and Feature Engineering__

__Features__ are the inputs derived from raw data. A feature (also called a dimension) is an input variable used to generate model predictions.

__Feature Engineering__ preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.

#### __Dimensionality Reduction__

Dimensionality reduction can be described as a combination of feature selection and feature extraction. Either technique can be used to lower the dimensionality of the data.

- When applying feature selection, we work with a subset of the original features, so the result has fewer features than the original dataset.
- When applying feature extraction, the number of newly constructed features might equal the number of originals, but we usually keep only the most significant ones.
- In particular, evaluating the variance retained by each feature helps us select the features that preserve the most information about the data.
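The idea of retained variance can be illustrated with a minimal sketch for the two-feature case. It assumes a PCA-style view (variance along the principal directions of the covariance matrix), which the notes above do not name explicitly; the function name and the sample data are hypothetical.

```python
import math
from statistics import mean, variance

def retained_variance_2d(xs, ys):
    """Share of total variance along each principal direction (largest first)."""
    a = variance(xs)  # sample variance of feature 1
    c = variance(ys)  # sample variance of feature 2
    mx, my = mean(xs), mean(ys)
    # Sample covariance between the two features
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

    # Closed-form eigenvalues of the 2x2 covariance matrix [[a, b], [b, c]]
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap
    total = lam1 + lam2  # assumes at least one feature is not constant
    return lam1 / total, lam2 / total

# Two strongly correlated, made-up features
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
first, second = retained_variance_2d(xs, ys)
print(f"first direction: {first:.4f}, second direction: {second:.4f}")
```

Because the two features move almost in lockstep, nearly all of the variance lies along the first direction, so a single constructed feature would preserve most of the information.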

## __Feature engineering techniques__

### __Feature transformation__

Feature transformation is the process of converting one feature type into another, more readable form for a particular model. This includes transforming continuous data into categorical data, or vice versa.

#### __Binning__

This technique essentially transforms continuous, numerical values into categorical features. Specifically, binning compares each value to the neighborhood of values surrounding it and then sorts data points into a number of bins. A rudimentary example of binning is age demographics, in which continuous ages are divided into age groups.

- Once values have been placed into bins, one can further smooth the bins by means, medians or boundaries.
- Smoothing bins replaces a bin’s contained values with bin-derived values.
- Binning creates categorical values from continuous ones.
- Smoothing bins is a form of local smoothing meant to reduce noise in input data.
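The binning and smoothing steps above can be sketched as follows. This is an illustration under stated assumptions: it uses equal-width bins and smoothing by bin means (one of several options mentioned above), and the function names and ages are made up.

```python
from statistics import mean

def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: everything falls into one bin
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def smooth_by_bin_means(values, bin_ids):
    """Replace every value with the mean of the bin it belongs to."""
    groups = {}
    for v, b in zip(values, bin_ids):
        groups.setdefault(b, []).append(v)
    bin_mean = {b: mean(vs) for b, vs in groups.items()}
    return [bin_mean[b] for b in bin_ids]

ages = [3, 7, 18, 22, 25, 41, 45, 63]  # made-up continuous ages
bin_ids = equal_width_bins(ages, 3)    # categorical feature: age group index
smoothed = smooth_by_bin_means(ages, bin_ids)
print(bin_ids)   # → [0, 0, 0, 0, 1, 1, 2, 2]
print(smoothed)  # → [12.5, 12.5, 12.5, 12.5, 33, 33, 54, 54]
```

The bin indices are the categorical feature; the smoothed list shows how each original value is replaced by a bin-derived value.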

#### __One-hot encoding__

This is the inverse of binning; it creates numerical features from categorical variables. One-hot encoding maps categorical features to binary representations, which are used to map the feature in a matrix or vector space. 

- Bag of words models are an example of one-hot encoding frequently used in natural language processing tasks.
- Another example of one-hot encoding is spam filtering classification in which the categories spam and not spam are converted to 1 and 0 respectively.
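A minimal sketch of one-hot encoding, written without any library so the mapping is visible. The function name and the color values are hypothetical; sorting the categories is just one way to fix a reproducible column order.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector containing a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column of this value's category
        encoded.append(row)
    return categories, encoded

colors = ["red", "green", "blue", "green"]  # made-up categorical feature
categories, encoded = one_hot_encode(colors)
print(categories)  # → ['blue', 'green', 'red']
print(encoded)     # → [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row has exactly one 1, so the encoding introduces no artificial ordering between categories.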

#### __Feature Selection__

Feature selection is the process of reducing the number of features used in a machine learning model in a way that preserves the predictive ability of the data. It is the process of selecting a subset of variables to build a new model, with the aim of reducing multicollinearity and thereby improving model generalizability and optimization.

- When dealing with real-world data, we can encounter hundreds or thousands of features, sometimes more than the number of observations. This leads to a high-dimensional dataset.
- A dataset with many features spans a large volume of feature space, so the available data points tend to be few and far apart (sparse).
- This can degrade the performance of the machine learning algorithm, making it unreliable.
- In such cases, we can try to remove the irrelevant features without losing essential information from the dataset.
- Reducing the number of features can therefore improve how well the model generalizes, and hence its predictive power.
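One simple way to reduce multicollinearity is a correlation filter. The sketch below is an assumption for illustration, not a method prescribed by these notes: it keeps a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The function names, the 0.95 threshold, and the data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither column is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.95):
    """Greedy filter: keep a feature unless it nearly duplicates a kept one."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

features = {  # made-up columns; height_in is height_cm in other units
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40.0, 36.0, 43.0, 38.0, 42.0],
}
selected = drop_correlated(features)
print(selected)  # → ['height_cm', 'shoe_size']
```

The result is a subset of the original columns; nothing is transformed. The filter is order-dependent (earlier features win ties), which is a design shortcut of this sketch.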

#### __Feature Extraction__

Feature extraction is the process of transforming existing features into new ones. Rather than using a subset of the original features, we construct new ones, often as linear combinations of the originals. Here too, the main goal is to capture the same information with fewer features.

The difference between __feature selection__ and __feature extraction__ is that the first technique retains a subset of the original features while feature extraction generates new features.
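The contrast can be made concrete with a small sketch. The data, the function names, and the weights are all hypothetical; in practice the weights for extraction would usually be learned from the data rather than chosen by hand.

```python
rows = [
    # [math_score, physics_score, attendance]  (made-up data)
    [80.0, 90.0, 0.9],
    [60.0, 70.0, 0.8],
    [95.0, 85.0, 1.0],
]

def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[i] for i in keep] for row in rows]

def extract_features(rows, weights):
    """Feature extraction: each new feature is a weighted sum of the originals."""
    return [[sum(w * x for w, x in zip(ws, row)) for ws in weights] for row in rows]

# Selection keeps columns 0 and 2 exactly as they were.
selected = select_features(rows, keep=[0, 2])

# Extraction builds one new feature: the average of the two scores
# (hand-picked weights, for illustration only).
weights = [[0.5, 0.5, 0.0]]
extracted = extract_features(rows, weights)

print(selected)   # → [[80.0, 0.9], [60.0, 0.8], [95.0, 1.0]]
print(extracted)  # → [[85.0], [65.0], [90.0]]
```

Every value in `selected` already appears in `rows`, while the values in `extracted` are new.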

### __Feature scaling__

Certain features have intrinsic upper and lower bounds that limit their possible values, such as time-series data or age. In many cases, however, model features have no limit on possible values, and the resulting large feature scales (the difference between a feature's lowest and highest values) can negatively affect certain models.

- Feature scaling (sometimes called feature normalization) is a standardization technique to rescale features and limit the impact of large scales on models.
- While feature transformation transforms data from one type to another, feature scaling transforms data in terms of range and distribution, maintaining its original data type.

#### __Min-max scaling__

Min-max scaling rescales all values for a given feature so that they fall between specified minimum and maximum values, often 0 and 1. Each data point's value for the selected feature (represented by $x$) is computed against the decided minimum and maximum feature values, $\min(x)$ and $\max(x)$ respectively, which produces the new feature value for that data point (represented by $\tilde{x}$).

<img src="https://www.ibm.com/content/dam/connectedassets-adobe-cms/worldwide-content/creative-assets/s-migr/ul/g/e4/b2/screen2.jpg" width=250 />
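As a minimal sketch, the usual form of this rescaling to the range 0 to 1 is $\tilde{x} = \frac{x - \min(x)}{\max(x) - \min(x)}$. The code below generalizes it to an arbitrary target range; the function name, the `new_min`/`new_max` parameters, and the income values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [new_min for _ in values]
    return [
        new_min + (v - lo) / (hi - lo) * (new_max - new_min)
        for v in values
    ]

incomes = [20_000, 35_000, 50_000, 120_000]  # made-up feature values
scaled = min_max_scale(incomes)
print([round(s, 2) for s in scaled])  # → [0.0, 0.15, 0.3, 1.0]
```

Note that a single extreme value (here 120,000) compresses the remaining values toward the lower end of the range.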

#### __Z-score scaling__

Z-score scaling is also known as standardization or variance scaling. Whereas min-max scaling rescales feature values to fit within designated minimum and maximum values, z-score scaling rescales features so that they share a mean of 0 and a standard deviation of 1.

<img src="https://www.ibm.com/content/dam/connectedassets-adobe-cms/worldwide-content/creative-assets/s-migr/ul/g/b4/c0/screen1.jpg" width=250 />
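A minimal sketch of z-score scaling, assuming the usual form $\tilde{x} = \frac{x - \mu}{\sigma}$ with $\mu$ the feature mean and $\sigma$ its standard deviation. The function name and the heights are hypothetical, and using the population standard deviation (rather than the sample one) is a choice made for this example.

```python
from statistics import mean, pstdev

def z_score_scale(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population std dev; an assumption of this sketch
    if sigma == 0:          # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up feature values
z = z_score_scale(heights)
print([round(v, 3) for v in z])  # → [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Unlike min-max scaling, the result is not confined to a fixed interval; values further from the mean simply get larger absolute z-scores.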