# 📌 What is Feature Engineering?

- Feature Engineering is the process of **creating, modifying, or selecting variables (features) from raw data** to make machine learning models more effective. Instead of feeding raw data directly into models, we transform it into features that help the model learn better patterns.

# 🤔 Why Feature Engineering Matters:

🚀 **Improves model performance**: more meaningful inputs lead to better predictions.

🧩 **Highlights patterns** that raw data might hide.

📉 **Reduces noise & overfitting** by removing irrelevant data.

🔍 **Improves interpretability**: model outputs become easier to explain.

**What is a Feature?**

- A feature is a **measurable property** or characteristic of the data used as input to an ML model (e.g., age, income, text tokens).

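# Classification of Feature Engineering

1. **Feature Transformation**: transforming existing features into a more suitable form (imputation, encoding, outlier handling, scaling)
2. **Feature Construction**: creating new features from existing ones to improve performance
3. **Feature Selection**: selecting the most important features for the model
4. **Feature Extraction**: deriving a new, more informative set of features from raw data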


# 1. 🔄 **Feature Transformation**

- It includes:

    1. Missing value imputation
    2. Handling categorical values
    3. Outlier detection
    4. Feature scaling:
        1. Standardization
        2. Normalization

#### 1. Missing value imputation

- There are many ways to handle missing values.

- A common one is imputation: filling each gap with a substitute value, such as the mean or median for numerical data or the most frequent value for categorical data, chosen according to the data and the model.
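The three strategies above can be sketched in plain Python. This is a minimal illustration, assuming missing entries are stored as `None`; the function name `impute` and the sample values are hypothetical.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Return a copy of `values` with every None replaced by a fill value.

    strategy: "mean" or "median" for numerical data,
              "most_frequent" for categorical data.
    """
    observed = [v for v in values if v is not None]  # ignore the missing entries
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = mode(observed)  # most common value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

ages = [25, None, 35, 40, None]
colors = ["red", "blue", None, "red"]

print(impute(ages, "mean"))             # gaps filled with the average age
print(impute(ages, "median"))           # → [25, 35, 35, 40, 35]
print(impute(colors, "most_frequent"))  # → ['red', 'blue', 'red', 'red']
```

For whole tables, scikit-learn's `SimpleImputer` offers the same `"mean"`, `"median"` and `"most_frequent"` strategies.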


#### 2. Handling categorical values

- A common approach is one-hot encoding.

- Each category gets its own column: a row has a 1 in the column of its category and 0 in all the other category columns.
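A minimal sketch of one-hot encoding in plain Python; the function name `one_hot_encode` and the sorted column order are illustrative choices, not a fixed convention.

```python
def one_hot_encode(values):
    """Map each categorical value to a row of 0/1 flags, one column per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

columns, rows = one_hot_encode(["red", "blue", "green", "blue"])
print(columns)  # → ['blue', 'green', 'red']
print(rows)     # → [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

Each row contains exactly one 1. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same for DataFrames.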

#### 3. Outlier detection

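- Outliers can be spotted by plotting the data.

- A box plot is a common choice: points lying beyond its whiskers are drawn individually as potential outliers.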
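Box plots usually place the whiskers at 1.5 × IQR beyond the quartiles (matplotlib's default), so the same rule can flag outliers without drawing anything. A minimal sketch, assuming that 1.5 factor; quartile conventions differ between tools, so values right at the boundary may be classified differently elsewhere. The sample data is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # first and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

values = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(values))  # → [102]
```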

#### 4. Feature scaling

- It is a **preprocessing technique used in machine learning to transform the values of numerical features to a common scale.**

- Two common methods are standardization and normalization; we will cover them in detail tomorrow.
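As a short preview, both methods can be written in a few lines. This sketch assumes the standard textbook formulas (z-score for standardization, min-max for normalization) and that the values are not all identical; the sample areas are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standardization (z-score): z = (x - mean) / std, giving mean 0 and std 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population std (ddof=0), as scikit-learn's StandardScaler uses
    return [(x - mu) / sigma for x in values]

def normalize(values):
    """Min-max normalization: x' = (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

areas = [1000, 1500, 2000]
print(standardize(areas))  # approximately [-1.2247, 0.0, 1.2247]
print(normalize(areas))    # → [0.0, 0.5, 1.0]
```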

# 2. 🆕 **Feature Construction**

- It is the process of **creating new input features (variables) from existing data** to improve the performance of machine learning models.

- Feature construction involves:

    - Combining existing features (e.g., creating interaction terms).

    - Transforming features (e.g., log, square, or polynomial transformations).

    - Creating new features from domain knowledge (e.g., extracting date parts from a timestamp).

    - Encoding categorical variables in meaningful ways.
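Three of the techniques above (an interaction term, a log transform, and date parts from a timestamp) can be sketched on a single record. The `sold_at` timestamp field and the feature names are hypothetical, chosen only for illustration.

```python
import math
from datetime import datetime

def construct_features(row):
    """Derive new features from one raw record (a dict); the keys are hypothetical."""
    ts = datetime.fromisoformat(row["sold_at"])  # parse the raw timestamp string
    return {
        # interaction term: product of two existing features
        "area_x_bedrooms": row["area"] * row["bedrooms"],
        # log transform compresses large values of a right-skewed feature
        "log_area": math.log(row["area"]),
        # date parts extracted from the timestamp (domain knowledge)
        "sale_year": ts.year,
        "sale_month": ts.month,
        "sale_weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }

raw = {"area": 1500, "bedrooms": 3, "sold_at": "2024-03-15T10:30:00"}
print(construct_features(raw))
```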


**Example**: Feature Construction in a Dataset

- Suppose you have a dataset of **house sales** with the following columns:

    - `bedrooms` (number of bedrooms)
    - `bathrooms` (number of bathrooms)
    - `area` (total area in square feet)
    - `year_built` (year the house was built)


- **Original Features**:

    - `bedrooms`
    - `bathrooms`
    - `area`
    - `year_built`


- **Constructed Features**:

    - Total rooms: `bedrooms + bathrooms` (captures the overall room count).

    - Area per room: `area / (bedrooms + bathrooms)` (captures the space per room).

    - House age: `2026 - year_built` (captures how old the house is, taking 2026 as the current year).

    - Is recent: `1 if (2026 - year_built) <= 10 else 0` (flags houses built within the last 10 years).

These new features may help the model better understand the relationship between house characteristics and price, potentially improving predictions.
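The four constructed features can be computed for one house as follows. This is a minimal sketch that keeps 2026 as the reference year from the example; the function name and sample house are hypothetical.

```python
CURRENT_YEAR = 2026  # reference year used in the example

def add_constructed_features(house):
    """Return a copy of one house record with the constructed features added."""
    rooms = house["bedrooms"] + house["bathrooms"]
    age = CURRENT_YEAR - house["year_built"]
    return {
        **house,
        "total_rooms": rooms,
        "area_per_room": house["area"] / rooms if rooms else None,  # guard against dividing by zero
        "house_age": age,
        "is_recent": 1 if age <= 10 else 0,
    }

house = {"bedrooms": 3, "bathrooms": 2, "area": 1500, "year_built": 2018}
f = add_constructed_features(house)
print(f["total_rooms"], f["area_per_room"], f["house_age"], f["is_recent"])  # → 5 300.0 8 1
```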

# 3. 🧹 **Feature Selection**

- Feature selection means **keeping the most useful features and removing redundant or irrelevant ones** to reduce model complexity.

**Example**: Feature Selection in a Dataset

- Suppose you have a dataset for predicting house prices with the following features:

    - `bedrooms`
    - `bathrooms`
    - `area`
    - `age`
    - `garage`
    - `distance_to_city`
    - `color` (e.g., red, blue, green)


- Feature Selection Process:

    - **Irrelevant Feature**: `color` is unlikely to affect house price, so it can be removed.

    - **Redundant Feature**: If `garage` and `area` are highly correlated (e.g., all houses with a garage have a large area), they carry largely the same information, so you might keep only one.

    - **Relevant Features**: `bedrooms`, `bathrooms`, `area`, `age`, and `distance_to_city` are likely to be important for predicting price.

- After selection, your model uses only the most relevant features: `bedrooms`, `bathrooms`, `area`, `age`, and `distance_to_city`.
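Both steps of the process above can be sketched on a toy table with a few of these columns: drop columns judged irrelevant by domain knowledge, then skip any column that is highly correlated with one already kept. The 0.9 threshold, the 0/1 encoding of `garage`, and the data are all hypothetical; real projects often use scikit-learn's `sklearn.feature_selection` tools instead.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_features(data, irrelevant, threshold=0.9):
    """Drop irrelevant columns, then skip any column highly correlated
    with a column already selected (the first of each pair is kept)."""
    selected = []
    for name, values in data.items():
        if name in irrelevant:
            continue  # removed by domain knowledge, no statistics needed
        if all(abs(pearson(values, data[other])) < threshold for other in selected):
            selected.append(name)
    return selected

# Hypothetical toy data: 5 houses, garage encoded as 1 = has garage, 0 = none
houses = {
    "area": [800, 900, 1800, 2000, 2100],
    "garage": [0, 0, 1, 1, 1],
    "age": [30, 5, 20, 10, 15],
    "distance_to_city": [12, 3, 8, 15, 6],
    "color": ["red", "blue", "green", "red", "blue"],
}
print(select_features(houses, irrelevant={"color"}))
# → ['area', 'age', 'distance_to_city']  (garage dropped: correlation with area ≈ 0.98)
```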

- **Why Is Feature Selection Important?**

    - Can improve model accuracy and generalization.

    - Reduces computational cost and training time.

    - Helps avoid overfitting by removing noise.

    - Makes models simpler and easier to understand.

# 4. 🔎 **Feature Extraction**

- Feature extraction is the process of **transforming raw data into a set of new, more informative features** that can be used by machine learning models.
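Raw text is a typical case: a model cannot use a sentence directly, but it can use word counts. The sketch below shows bag-of-words counting, one simple extraction technique chosen here for illustration; the tokenizer (lowercase letters only) and the sample reviews are assumptions. scikit-learn's `CountVectorizer` implements a fuller version.

```python
import re
from collections import Counter

def bag_of_words(documents):
    """Turn raw text documents into fixed-length word-count vectors."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]  # simple tokenizer
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])  # Counter gives 0 for absent words
    return vocabulary, vectors

vocab, vectors = bag_of_words(["Great house, great location", "Small house"])
print(vocab)    # → ['great', 'house', 'location', 'small']
print(vectors)  # → [[2, 1, 1, 0], [0, 1, 0, 1]]
```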