
# 1. Handling Missing Values

### Definition
Missing values must be handled during data preprocessing because many machine learning estimators cannot accept NaN inputs. Common approaches are imputation (replacing each missing value with the column mean, median, mode, or a fixed constant) and deletion (removing the rows or columns that contain missing values).

### Example Table Data

| Feature1 | Feature2 | Feature3 |
|----------|----------|----------|
| 1.0      | 2.0      | NaN      |
| 2.0      | NaN      | 3.0      |
| NaN      | 4.0      | 3.5      |
| 4.0      | 5.0      | 4.0      |
| 5.0      | NaN      | 4.5      |

In this table, every column contains at least one missing value (NaN) that needs to be handled.

In [None]:
import pandas as pd
from sklearn.impute import SimpleImputer

# Sample data
data = {
    'Feature1': [1.0, 2.0, None, 4.0, 5.0],
    'Feature2': [2.0, None, 4.0, 5.0, None],
    'Feature3': [None, 3.0, 3.5, 4.0, 4.5]
}
df = pd.DataFrame(data)

# Handling missing values: replace each NaN with the mean of its column
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print("After Imputation:\n", df_imputed)
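
Mean imputation is only one of the options named in the definition. As a minimal sketch, assuming the same sample data as the mean-imputation example, the cell below shows median imputation and row removal. The names `median_imputer`, `df_median`, and `df_dropped` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Same sample data as the mean-imputation example
df = pd.DataFrame({
    'Feature1': [1.0, 2.0, None, 4.0, 5.0],
    'Feature2': [2.0, None, 4.0, 5.0, None],
    'Feature3': [None, 3.0, 3.5, 4.0, 4.5]
})

# Median imputation: less sensitive to outliers than the mean
median_imputer = SimpleImputer(strategy='median')
df_median = pd.DataFrame(median_imputer.fit_transform(df), columns=df.columns)
print("After Median Imputation:\n", df_median)

# Removal: drop every row that contains at least one NaN
df_dropped = df.dropna()
print("After Dropping Rows:\n", df_dropped)
```

On this small table, row removal keeps only one of the five rows, which illustrates why imputation is often preferred when missing values are spread across many rows.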

# 2. Encoding Categorical Variables

### Definition
Encoding categorical variables converts categorical data into a numerical format that machine learning algorithms can use. Common methods include one-hot encoding (one binary column per category) and label encoding (one integer per category).

### Example Table Data

| Color |
|-------|
| Red   |
| Blue  |
| Green |
| Blue  |
| Red   |

In this table, 'Color' is a categorical variable with three categories (Red, Blue, and Green) that needs to be encoded.

In [None]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']
}
df = pd.DataFrame(data)

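# Encoding categorical variables
# sparse_output=False returns a dense array; this parameter was named
# `sparse` before scikit-learn 1.2, and the old name was removed in 1.4
encoder = OneHotEncoder(sparse_output=False)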
encoded_categories = encoder.fit_transform(df[['Color']])
df_encoded = pd.DataFrame(encoded_categories, columns=encoder.get_feature_names_out(['Color']))
df = pd.concat([df, df_encoded], axis=1).drop('Color', axis=1)
print("After One-Hot Encoding:\n", df)
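
The definition also mentions label encoding. The cell below is a minimal sketch of it on the same 'Color' data, using scikit-learn's `LabelEncoder`; the column name `Color_Label` is an illustrative choice, not something from the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same sample data as the one-hot example
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})

# Label encoding: map each category to an integer
# (classes are sorted, so Blue=0, Green=1, Red=2)
label_encoder = LabelEncoder()
df['Color_Label'] = label_encoder.fit_transform(df['Color'])
print("After Label Encoding:\n", df)
print("Classes:", list(label_encoder.classes_))
```

The integers imply an order (Blue < Green < Red) that the colors do not actually have, so one-hot encoding is usually the safer choice for unordered categories. The scikit-learn documentation describes `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for input features.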

# 3. Feature Scaling

### Definition
Feature scaling normalizes or standardizes features so that they share a similar scale. Common methods include min-max scaling, which maps each feature to the range [0, 1], and standardization (z-score normalization), which rescales each feature to zero mean and unit variance.

### Example Table Data

| Feature1 | Feature2 |
|----------|----------|
| 10       | 100      |
| 20       | 200      |
| 30       | 300      |
| 40       | 400      |
| 50       | 500      |

In this table, the values of 'Feature2' are ten times larger than those of 'Feature1', so the two features need to be brought onto a common scale.

In [None]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data
data = {
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [100, 200, 300, 400, 500]
}
df = pd.DataFrame(data)

# Feature scaling
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("After Min-Max Scaling:\n", df_scaled)
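
Standardization, the other method named in the definition, can be sketched the same way. The cell below assumes the same sample data and uses scikit-learn's `StandardScaler`, which computes z = (x - mean) / std for each column; the names `standardizer` and `df_standardized` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample data as the min-max example
df = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [100, 200, 300, 400, 500]
})

# Standardization: subtract the column mean, divide by the column
# standard deviation (StandardScaler uses the population std, ddof=0)
standardizer = StandardScaler()
df_standardized = pd.DataFrame(standardizer.fit_transform(df), columns=df.columns)
print("After Standardization:\n", df_standardized)
```

Because 'Feature2' is exactly ten times 'Feature1', the two standardized columns come out identical, which shows that the difference in scale has been removed.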

# 4. Feature Creation

### Definition
Feature creation involves generating new features from existing ones to improve the predictive power of machine learning models. Common methods include polynomial features and interaction terms.

### Example Table Data

| Feature1 | Feature2 |
|----------|----------|
| 1        | 2        |
| 2        | 3        |
| 3        | 4        |
| 4        | 5        |
| 5        | 6        |

In this table, new features such as squares and products can be created from 'Feature1' and 'Feature2'.

In [None]:
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = {
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)

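# Feature creation: generate all terms up to degree 2
# (the original features, their squares, and their product; no bias column)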
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df)
df_poly = pd.DataFrame(poly_features, columns=poly.get_feature_names_out(['Feature1', 'Feature2']))
print("After Creating Polynomial Features:\n", df_poly)