## Intro
- Feature engineering is often one of the most valuable tasks a data scientist can do to improve model performance, for three big reasons:
    - You can isolate and highlight key information, which helps your algorithms "focus" on what’s important.
    - You can bring in your own domain expertise.
    - Most importantly, once you understand the "vocabulary" of feature engineering, you can bring in other people’s domain expertise!

## Interaction features
- The first of these heuristics is checking to see if you can create any interaction features that make sense. These are combinations of two or more features.
- By the way, in some contexts, "interaction terms" must be products between two variables. In our context, interaction features can be products, sums, or differences between two features.
- **EXAMPLE**
    - Let's say we already had a feature called 'num_schools', i.e. the number of schools within 5 miles of a property.
    - Let's say we also had the feature 'median_school', i.e. the median quality score of those schools.
    - However, we might suspect that what's really important is having many school options, but only if they are good.
    - We could simply create a new feature: 'school_score' = 'num_schools' x 'median_school'.
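
The school example above can be sketched in pandas. This is a minimal illustration, assuming the data lives in a DataFrame; the toy values are made up and only the column names come from the example.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the example above.
df = pd.DataFrame({
    "num_schools": [3, 0, 5],
    "median_school": [8.0, 0.0, 4.0],
})

# Interaction feature: the product of school count and median quality,
# so a property scores high only when it has many schools AND they are good.
df["school_score"] = df["num_schools"] * df["median_school"]

print(df)
```

A sum or difference of two columns would be built the same way, by swapping the `*` operator.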

## Sparse classes
- Sparse classes (in categorical features) are those that have very few total observations. They can be problematic for certain machine learning algorithms, causing models to overfit.
    - There's no formal rule for how many observations each class needs.
    - It also depends on the size of your dataset and the number of other features you have.
    - As a rule of thumb, we recommend combining classes until each one has at least ~50 observations. As with any "rule" of thumb, use this as a guideline (not actually as a rule).
- **EXAMPLE**
    - We might want to group 'Wood Siding', 'Wood Shingle', and 'Wood' into a single class. In fact, let's just label all of them as 'Wood'.
    - We'd group 'Concrete Block', 'Stucco', 'Masonry', 'Other', and 'Asbestos shingle' into just 'Other'.
- After combining sparse classes, we have fewer unique classes, but each one has more observations.
- Often, an eyeball test is enough to decide if you want to group certain classes together.   
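
One way to do the grouping described above is with pandas. This is only a sketch: the column name `exterior_walls` and the toy rows are assumptions, while the class labels are the ones from the example.

```python
import pandas as pd

# Hypothetical toy column; the name 'exterior_walls' is an assumption.
df = pd.DataFrame({
    "exterior_walls": [
        "Brick", "Wood Siding", "Wood Shingle", "Wood",
        "Concrete Block", "Stucco", "Masonry", "Other", "Asbestos shingle",
    ]
})

# The "eyeball test": look at how many observations each class has.
counts_before = df["exterior_walls"].value_counts()

# Classes to merge, taken from the example above.
wood_like = ["Wood Siding", "Wood Shingle", "Wood"]
other_like = ["Concrete Block", "Stucco", "Masonry", "Other", "Asbestos shingle"]

# Relabel every class in each list with a single shared label.
df["exterior_walls"] = df["exterior_walls"].replace(wood_like, "Wood")
df["exterior_walls"] = df["exterior_walls"].replace(other_like, "Other")

counts_after = df["exterior_walls"].value_counts()
print(counts_after)
```

Comparing `counts_before` with `counts_after` shows the trade-off directly: fewer unique classes, each with more observations.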

## Dummy variables
- Most machine learning algorithms cannot directly handle categorical features. Specifically, they cannot handle text values.
- We need to create dummy variables for our categorical features.
- Dummy variables are a set of binary (0 or 1) variables that each represent a single class from a categorical feature.
- The information you represent is exactly the same, but this numeric representation lets you meet the technical requirements of the algorithms.
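
As a rough sketch, dummy variables can be created with `pandas.get_dummies`. The column names and values below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: one categorical (text) feature and one numeric feature.
df = pd.DataFrame({
    "exterior_walls": ["Brick", "Wood", "Other", "Wood"],
    "sqft": [1500, 2000, 1200, 1800],
})

# Replace the categorical column with one binary (0 or 1) column per class.
# dtype=int gives 0/1 integers instead of booleans.
df_encoded = pd.get_dummies(df, columns=["exterior_walls"], dtype=int)

print(df_encoded)
```

Each row has exactly one dummy column set to 1, so the encoded table carries the same information as the original text column.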

## Remove unused features
- Remove unused or redundant features from the dataset.
- Unused features are those that don’t make sense to pass into our machine learning algorithms. 
- **EXAMPLE**
    - ID columns
    - Features that wouldn't be available at the time of prediction
    - Other text descriptions
- Redundant features would typically be those that have been replaced by other features that you’ve added during feature engineering.
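
A minimal sketch of this cleanup step in pandas follows. All column names here are hypothetical, and treating 'num_schools' and 'median_school' as redundant once 'school_score' exists is an assumption made for illustration; in practice you may prefer to keep the originals.

```python
import pandas as pd

# Hypothetical toy data with an ID column, a free-text column,
# two original features, and one engineered feature.
df = pd.DataFrame({
    "property_id": [101, 102],
    "listing_description": ["Sunny corner lot", "Needs work"],
    "num_schools": [3, 5],
    "median_school": [8.0, 4.0],
    "school_score": [24.0, 20.0],
})

# Unused: columns that don't make sense to pass into the algorithm.
unused = ["property_id", "listing_description"]

# Redundant (assumed here): features replaced by an engineered feature.
redundant = ["num_schools", "median_school"]

# drop() returns a new DataFrame, so the original df is left untouched.
df_model = df.drop(columns=unused + redundant)

print(df_model.columns.tolist())
```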