### Feature Engineering
- It is the process of creating, transforming, or selecting the most relevant variables (features) from raw data to improve model performance.
- Effective features help a model capture important patterns and relationships in the data.
- It directly contributes to model building in the following ways:
  1. Reduces noise and irrelevant information, improving prediction accuracy.
  2. Helps reduce overfitting by emphasizing meaningful data signals.
  3. Well-designed features allow models to learn complex patterns more effectively.
#### Types of scaling techniques
#### 1. Absolute Maximum Scaling
- It rescales each feature by dividing all values by the maximum absolute value of that feature.
- This ensures that the scaled values range from -1 to 1.
- It is highly sensitive to outliers, which can skew the maximum absolute value and degrade scaling quality.
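
The points above can be illustrated with a minimal NumPy sketch. The helper name `max_abs_scale` and the toy values are hypothetical and not part of the housing dataset; the last row is a deliberate outlier in the first column.

```python
import numpy as np

def max_abs_scale(x):
    """Divide each column by its maximum absolute value (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x), axis=0)
    # Assumption: leave all-zero columns unchanged instead of dividing by zero.
    max_abs = np.where(max_abs == 0, 1.0, max_abs)
    return x / max_abs

# Hypothetical toy data; 100.0 is an outlier in the first column.
toy = np.array([[-2.0, 10.0],
                [4.0, 20.0],
                [1.0, 40.0],
                [100.0, 30.0]])

scaled = max_abs_scale(toy)
print(scaled)
```

In the first column the outlier maps to 1.0 while the remaining values are squeezed to within 0.04 of zero, which is the outlier sensitivity described above. The second column, which has no outlier, spreads more evenly across the range.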
#### Performing Absolute Maximum Scaling
- np.max(np.abs(df), axis=0) - Computes the maximum absolute value per column.
- Divide each value by the maximum absolute value of its column.
- Finally, display the first scaled rows using scaled_df.head().

In [2]:
import numpy as np
import pandas as pd
# Load the housing dataset and preview the first five rows
df = pd.read_csv('Housing.csv')
df.head()

|   | LotArea | MSSubClass |
|---|---------|------------|
| 0 | 8450 | 60 |
| 1 | 9600 | 20 |
| 2 | 11250 | 60 |
| 3 | 9550 | 70 |
| 4 | 14260 | 60 |


In [3]:
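# Maximum absolute value of each column
max_abs = np.max(np.abs(df), axis=0)
# Divide every value by the maximum absolute value of its column
scaled_df = df / max_abs
scaled_df.head()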

|   | LotArea | MSSubClass |
|---|---------|------------|
| 0 | 0.039258 | 0.315789 |
| 1 | 0.0446 | 0.105263 |
| 2 | 0.052266 | 0.315789 |
| 3 | 0.044368 | 0.368421 |
| 4 | 0.06625 | 0.315789 |


#### 2. Min-Max Scaling
- It transforms each feature by subtracting the minimum value and dividing by the difference between the maximum and minimum.
- This method maps features to a specific range, commonly 0 to 1.
- It preserves the shape of the original distribution but is still affected by outliers because it relies on the extreme values.
- Formula : $x' = \dfrac{x - \min(x)}{\max(x) - \min(x)}$
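
As a rough sketch of the formula above, the same computation can be written directly in NumPy. The helper name `min_max_scale` and the toy values are hypothetical, chosen only so the result is easy to check by hand.

```python
import numpy as np

def min_max_scale(x):
    """Apply (x - min) / (max - min) to each column (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Assumption: constant columns are mapped to 0 instead of dividing by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range

# Hypothetical toy data with two columns on very different scales.
toy = np.array([[1.0, 200.0],
                [3.0, 400.0],
                [5.0, 1000.0]])

manual = min_max_scale(toy)
print(manual)
```

Each column's minimum maps to 0 and its maximum to 1. In the second column, 400 maps to (400 - 200) / (1000 - 200) = 0.25.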
#### Performing Min-Max Scaling
- MinMaxScaler() - Creates a MinMaxScaler object that scales features to a given range (0 to 1 by default).
- scaler.fit_transform(df) - Fits the scaler to the data and transforms it.
- pd.DataFrame(scaled_data, columns = df.columns) - Converts the result back to a DataFrame, keeping the column names.
- scaled_df.head() - Displays the first 5 rows of the scaled DataFrame.

In [6]:
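from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Learn each column's min and max, then rescale the values to [0, 1]
scaled_data = scaler.fit_transform(df)
# fit_transform returns a NumPy array, so restore the column names
scaled_df = pd.DataFrame(scaled_data, columns = df.columns)
scaled_df.head()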

|   | LotArea | MSSubClass |
|---|---------|------------|
| 0 | 0.03342 | 0.235294 |
| 1 | 0.038795 | 0.0 |
| 2 | 0.046507 | 0.235294 |
| 3 | 0.038561 | 0.294118 |
| 4 | 0.060576 | 0.235294 |


#### 3. Normalization (Vector Normalization)
- It refers to the process of transforming vectors so that they have unit length (magnitude 1) while preserving their direction.
- It focuses on the direction of data points rather than their magnitude, making it useful for algorithms where the angle or cosine similarity is relevant, such as text classification or clustering.
- Formula : $v_{\text{norm}} = \dfrac{v_i}{\lVert v \rVert}$, where $v_i$ is each individual value of the vector and $\lVert v \rVert$ is the magnitude of the vector.
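
The formula above can be sketched with np.linalg.norm, treating each row as one vector. The helper name `l2_normalize_rows` is hypothetical, and the Euclidean (L2) norm is an assumption about which norm is meant. The two sample rows are the first two rows shown by df.head() earlier.

```python
import numpy as np

def l2_normalize_rows(x):
    """Scale each row to unit Euclidean length (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Assumption: all-zero rows are left unchanged instead of dividing by zero.
    norms = np.where(norms == 0, 1.0, norms)
    return x / norms

# First two rows of the housing preview: (LotArea, MSSubClass).
rows = np.array([[8450.0, 60.0],
                 [9600.0, 20.0]])

unit_rows = l2_normalize_rows(rows)
print(unit_rows)
print(np.linalg.norm(unit_rows, axis=1))  # each row now has length 1
```

Because LotArea is much larger than MSSubClass in every row, the normalized LotArea value is close to 1 and the MSSubClass value is close to 0. Unlike the other methods in this section, this operation works per row, not per column.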
#### Performing Normalization
- Normalizer / normalize() - In scikit-learn; both normalize each row and can handle multiple vectors simultaneously.
- np.linalg.norm() - The NumPy function that computes the norm (length) of a vector.

In [7]:
from sklearn.preprocessing import Normalizer
scaler = Normalizer()
# Normalizer works row-wise: each sample is rescaled to unit length (L2 norm by default)
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
scaled_df.head()

|   | LotArea | MSSubClass |
|---|---------|------------|
| 0 | 0.999975 | 0.0071 |
| 1 | 0.999998 | 0.002083 |
| 2 | 0.999986 | 0.005333 |
| 3 | 0.999973 | 0.00733 |
| 4 | 0.999991 | 0.004208 |


#### 4. Standardization
- It centers each feature by subtracting the mean and scales it by dividing by the standard deviation.
- It works best when features are roughly normally distributed, and it often benefits models such as logistic regression, linear regression, and neural networks by improving convergence speed and stability.
- The transformed data has zero mean and unit variance (standard deviation 1).
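
A minimal NumPy sketch of this transformation is shown here. The helper name `standardize` and the toy values are hypothetical. It uses the population standard deviation (ddof=0), which matches what scikit-learn's StandardScaler computes, whereas pandas' .std() defaults to ddof=1.

```python
import numpy as np

def standardize(x):
    """Subtract the column mean and divide by the column standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)
    # Assumption: constant columns are mapped to 0 instead of dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std

# Hypothetical toy data with two columns on different scales.
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])

z = standardize(toy)
print(z.mean(axis=0))  # approximately 0 for every column
print(z.std(axis=0))   # approximately 1 for every column
```

Standardized values are not bounded to a fixed range, and values below the column mean become negative.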
#### Performing Standardization

In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the data and rescale each column to zero mean and unit variance
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns = df.columns)
scaled_df.head()


#### 5. Robust Scaling
- It is highly suitable when the dataset contains extreme values or noise.
- It uses the median and interquartile range (IQR) instead of the mean and standard deviation, making the transformation robust to outliers and skewed distributions.
- Formula : $X_{\text{robust}} = \dfrac{X_i - Q_{0.5}}{Q_{0.75} - Q_{0.25}}$
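
The formula above can be sketched with np.percentile. The helper name `robust_scale_manual` and the toy column are hypothetical; 1000.0 is a deliberate outlier. Quartiles are computed with NumPy's default linear interpolation.

```python
import numpy as np

def robust_scale_manual(x):
    """Subtract the column median and divide by the column IQR."""
    x = np.asarray(x, dtype=float)
    q25, q50, q75 = np.percentile(x, [25, 50, 75], axis=0)
    iqr = q75 - q25
    # Assumption: columns with zero IQR are only centered, not rescaled.
    iqr = np.where(iqr == 0, 1.0, iqr)
    return (x - q50) / iqr

# Hypothetical single-column data; 1000.0 is an outlier.
toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

scaled = robust_scale_manual(toy)
print(scaled.ravel())
```

Here the median is 3 and the IQR is 4 - 2 = 2, so the four typical values map to -1, -0.5, 0 and 0.5. The outlier stays extreme at 498.5, but it does not compress the other values the way it would with absolute maximum or min-max scaling.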
#### Performing Robust Scaling


In [9]:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
# Center each column on its median and scale it by its interquartile range
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns = df.columns)
scaled_df.head()
