## **Feature Engineering**

**Q1. What is a parameter?**

* A parameter is a numerical value that describes a characteristic of a population in statistics.

**Key Points:**
* A population includes all possible observations (e.g., all employees in a company).

* A parameter is a fixed value that summarizes something about this population.

**Q2. What is correlation?**

*  Correlation measures the strength and direction of the linear relationship between two quantitative variables.

* A negative correlation means that as one variable increases, the other decreases — they move in opposite directions.

**Key characteristics of a negative correlation:**
* The correlation coefficient (r) can range from –1 to +1; for a negative correlation it lies between –1 and 0.
* r = –1: perfect negative linear relationship
* r = 0: no linear relationship
* The closer r is to –1, the stronger the inverse relationship.

**Examples:**
**1. Temperature and heating bills**

* As temperature goes up, heating bills go down → negative correlation.

**2. Speed and travel time (for fixed distance)**

* As speed increases, time to reach destination decreases → negative correlation.

**3. Absences and test scores**

* More absences from class often lead to lower test scores.

**Q3. Define Machine Learning. What are the main components in Machine Learning?**

*  Machine Learning (ML) is a field of artificial intelligence (AI) that focuses on building algorithms and statistical models that enable computers to learn from data and make predictions or decisions without being explicitly programmed.

**Main Components of Machine Learning**

**1. Data**

* The raw input used to train and test models.

* Can be structured (tables) or unstructured (images, text, audio).

* Quality and quantity of data heavily affect model performance.

**2. Model**
* The mathematical representation or function that maps inputs to outputs.

* Examples: decision trees, neural networks, support vector machines.

* The model learns patterns in the data during training.

**3. Algorithm**
* The procedure or method used to train the model.

* It adjusts the model's parameters to minimize error.

* Examples: gradient descent, k-means, backpropagation.

**4. Training**
* The process of feeding data into the algorithm to let the model learn.

* Involves updating model parameters to reduce prediction errors.

**5. Evaluation (Testing)**
* Assessing how well the trained model performs on unseen data.

* Metrics used include accuracy, precision, recall, F1 score, RMSE, etc.

**6. Features**
* Individual measurable properties or characteristics used as input.

* Good feature selection and engineering improve model effectiveness.

**7. Labels (for supervised learning)**
* The correct outputs or answers provided during training.

* Used to guide the learning process in supervised learning.

**8. Loss Function / Objective Function**
* Measures the difference between predicted output and actual output.

* Guides the model on how to adjust its parameters.



**Q4. How does loss value help in determining whether the model is good or not?**

**How the Loss Value Helps Evaluate a Model**

* The loss value (or loss function output) is a key indicator of how well a machine learning model is performing during training or evaluation. It measures the difference between the model's predicted output and the true (actual) output.

**What is a Loss Function?**
* A loss function quantifies the error for a single prediction (or a batch of predictions). The goal of training is to minimize the loss value, meaning the model's predictions are getting closer to the correct answers.

**Common examples:**

* Mean Squared Error (MSE) for regression

* Cross-Entropy Loss for classification

**Why the Loss Value Matters**

**1. Indicates model accuracy (indirectly):**
* A low loss means predictions are close to the target values → model is learning well.

* A high loss suggests large errors → model is underperforming.

**2. Guides learning during training:**
* Optimizers (like Gradient Descent) use the loss gradient to update model weights.

* If loss decreases steadily → the model is improving.

**3. Helps compare models:**
* You can use loss to compare multiple models or configurations.

* Lower loss on validation/test data typically indicates a better model.
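
To make the two common losses above concrete, here is a minimal NumPy sketch. The helper names `mse` and `binary_cross_entropy` and the toy numbers are illustrative assumptions, not part of any library; the point is only that predictions closer to the targets produce a lower loss.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Regression: predictions close to the targets give a lower loss
y = [3.0, 5.0, 7.0]
good_mse = mse(y, [2.9, 5.1, 7.2])
bad_mse = mse(y, [1.0, 8.0, 4.0])

# Classification: confident, correct probabilities give a lower loss
labels = [1, 0, 1]
good_bce = binary_cross_entropy(labels, [0.9, 0.1, 0.8])
bad_bce = binary_cross_entropy(labels, [0.4, 0.6, 0.3])

print("MSE  good vs bad:", good_mse, bad_mse)
print("BCE  good vs bad:", good_bce, bad_bce)
```

A loss value is only meaningful relative to something else: the same model at an earlier epoch, another model on the same data, or a simple baseline.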

**Q5. What are continuous and categorical variables?**

**1. Continuous Variables**
* A continuous variable can take any value within a given range, including fractions and decimals. These variables are typically measured rather than counted.

**Key Features:**
* Infinite or very large number of possible values

* Can be divided meaningfully (e.g., 3.5 kg, 172.2 cm)

**Examples:**
* Height (e.g., 170.5 cm)

* Temperature (e.g., 36.7°C)

* Weight (e.g., 68.2 kg)

* Time (e.g., 2.45 seconds)

* Age (when measured precisely, e.g., 23.75 years)

**2. Categorical Variables**
* A categorical variable (also called a qualitative variable) takes on a limited, fixed number of categories or groups. These values represent labels rather than quantities.

**Key Features:**
* Data falls into distinct groups

* Categories may be nominal (no order) or ordinal (with a logical order)

**Examples:**
* **Nominal (no natural order):**

* Gender: Male, Female

* Color: Red, Blue, Green

* Marital status: Single, Married, Divorced

* **Ordinal (with order):**

* Education level: High School, Bachelor’s, Master’s, PhD

* Satisfaction rating: Low, Medium, High
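
As a rough illustration of how these variable types can be represented in pandas, the sketch below builds a small made-up table (the column names and values are assumptions) and marks one column as nominal and one as ordinal.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [170.5, 162.0, 181.3],          # continuous
    "color": ["Red", "Blue", "Green"],           # nominal
    "satisfaction": ["Low", "High", "Medium"],   # ordinal
})

# Nominal: categories without any order
df["color"] = pd.Categorical(df["color"])

# Ordinal: categories with an explicit order
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["Low", "Medium", "High"], ordered=True
)

print(df.dtypes)
print(df["satisfaction"].min())               # comparisons are defined for ordered categories
print(df["satisfaction"].cat.codes.tolist())  # integer codes follow the declared order
```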



**Q6. How do we handle categorical variables in Machine Learning? What are the common techniques?**

*  Handling categorical variables is a crucial step in preparing data for machine learning models, as most models require numerical input. Here are the most common techniques for dealing with categorical variables:

**1. Label Encoding**
* **Description:** Assigns each unique category a different integer.

* **Use:** Useful for ordinal variables (where order matters, e.g., "low", "medium", "high").

**Example:**

    ["red", "blue", "green"] → [0, 1, 2]
    
* **Downside:** For nominal variables, the model may treat the integers as ordered and infer a ranking between categories that does not exist.

**2. One-Hot Encoding**
* **Description:** Converts each category into a new binary column (1 or 0).

* **Use:** For nominal (unordered) categorical variables.

* **Example:**

      ["Red",  "Green",  "Blue"] →
      Red  Green   Blue
      1      0      0
      0      1      0
      0      0      1

* **Downside:** Can lead to high dimensionality (curse of dimensionality) for variables with many categories.

**3. Binary Encoding**
* **Description:** Converts categories to binary code and splits digits into separate columns.

* **Use:** When the number of categories is large.
* **Example:**

      0 → 000, 1 → 001, 2 → 010, etc.
*  **Advantage:** Reduces dimensionality compared to one-hot encoding.
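
A minimal pure-Python sketch of the idea follows. The function `binary_encode` is a hypothetical helper written for this note, and it numbers categories from 0 in sorted order; library implementations may number categories differently.

```python
def binary_encode(values):
    """Map each category to an integer, then spread that integer's bits over columns."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    # Number of bits needed to represent the largest code (at least one column)
    n_bits = max(1, (len(categories) - 1).bit_length())
    rows = []
    for v in values:
        code = index[v]
        # Most significant bit first
        rows.append([(code >> shift) & 1 for shift in range(n_bits - 1, -1, -1)])
    return rows, index

cities = ["Paris", "Tokyo", "Lima", "Cairo", "Oslo", "Tokyo"]
encoded, mapping = binary_encode(cities)

for city, bits in zip(cities, encoded):
    print(city, bits)
```

Five distinct categories fit into three columns here, where one-hot encoding would need five.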

**4. Target Encoding (Mean Encoding)**
* **Description:** Replaces each category with the average of the target variable for that category.

* **Use:** Often used in competitions and for high-cardinality features.

* **Example:** If "City" = "New York" corresponds to 80% default rate, then "New York" → 0.8.

* **Risk:** Can cause data leakage; requires cross-validation techniques to avoid it.
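
The leakage risk can be sketched with pandas. The toy table and the fold assignment below are assumptions made for illustration: the naive encoding uses each row's own target, while the out-of-fold variant computes each row's category mean from the other folds only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city":    ["NY", "NY", "NY", "LA", "LA", "SF", "SF", "SF"],
    "default": [1,    1,    0,    0,    1,    0,    0,    1],
})

# Naive mean encoding: every row sees its own target value (leaks information)
df["city_naive"] = df.groupby("city")["default"].transform("mean")

# Out-of-fold encoding: each row is encoded with means computed on the other folds
n_folds = 4
fold_id = np.arange(len(df)) % n_folds
global_mean = df["default"].mean()
oof = pd.Series(np.nan, index=df.index)

for k in range(n_folds):
    in_fold = fold_id == k
    means = df.loc[~in_fold].groupby("city")["default"].mean()
    oof.loc[in_fold] = df.loc[in_fold, "city"].map(means)

# A category that never appears outside its fold falls back to the global mean
df["city_oof"] = oof.fillna(global_mean)
print(df)
```

In practice the folds would usually be shuffled, and some smoothing toward the global mean is often added for rare categories.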

**5. Frequency / Count Encoding**
* **Description:** Replaces categories with their frequency or count.

* **Example:**

      {"A": 100 times, "B": 50 times, "C": 10 times}
* Becomes:

      A → 100, B → 50, C → 10

* **Use:** When you suspect that frequency correlates with the target.
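
The mapping above can be reproduced with the standard library. This is a small sketch on made-up data; whether to use raw counts or relative frequencies is a modelling choice.

```python
from collections import Counter

categories = ["A"] * 100 + ["B"] * 50 + ["C"] * 10

counts = Counter(categories)   # A -> 100, B -> 50, C -> 10
total = len(categories)

count_encoded = [counts[c] for c in categories]          # raw counts
freq_encoded = [counts[c] / total for c in categories]   # relative frequencies

print(count_encoded[0], count_encoded[100], count_encoded[-1])
print(freq_encoded[0], freq_encoded[100], freq_encoded[-1])
```

Note that two categories with the same count become indistinguishable after this encoding.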

**6. Hashing Encoding (Feature Hashing)**

* **Description:** Hashes categories into a fixed number of columns using a hash function.

* **Use:** Large datasets with high-cardinality features (e.g., log data).

* **Downside:** Potential for collisions (different values map to the same hash).
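
A hand-rolled sketch of the hashing trick is shown below. The helpers `hash_bucket` and `hash_encode` are hypothetical names, and MD5 is used only because it is stable across runs (Python's built-in `hash()` for strings is salted per process). scikit-learn ships its own `FeatureHasher`, which works differently in detail.

```python
import hashlib

def hash_bucket(value, n_buckets=8):
    """Map a string to a bucket index in [0, n_buckets) using a stable hash."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def hash_encode(values, n_buckets=8):
    """Return one row per value with a single 1 in the hashed bucket."""
    rows = []
    for v in values:
        row = [0] * n_buckets
        row[hash_bucket(v, n_buckets)] = 1
        rows.append(row)
    return rows

urls = ["/home", "/cart", "/checkout", "/home"]
encoded = hash_encode(urls, n_buckets=8)

for url, row in zip(urls, encoded):
    print(url, row)
```

The output width is fixed by `n_buckets` no matter how many distinct values appear, which is the reason collisions become possible.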

**Q7. What do you mean by training and testing a dataset?**

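* Training and testing are two separate stages: the model learns from one subset of the data and is then evaluated on a different subset.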

**1. Training a Dataset**
* **Definition:** The process of feeding the model with data it uses to learn patterns, relationships, or rules.

* **Goal:** To build the model by adjusting its internal parameters (like weights in a neural network or splits in a decision tree) based on the input features and the known output (target variable).

* **Example:** If you're predicting house prices, the training data would include house features (size, location, number of rooms) and the actual prices.

**2. Testing a Dataset**
* **Definition:** The process of evaluating the trained model on a separate set of data that it hasn't seen before.

* **Goal:** To assess how well the model generalizes to new, unseen data (i.e., performance in the real world).

* **Example:** After training the house price model, you test it on a new set of houses to see how accurately it predicts their prices.

**Q8. What is sklearn.preprocessing?**

*  sklearn.preprocessing is a module in scikit-learn, a popular Python machine learning library. This module provides tools and utilities to transform and prepare your data before feeding it into a machine learning model.


**Why is preprocessing important?**
* Real-world data can be messy: it might have different scales, missing values, or categorical variables.

* Many machine learning algorithms perform better if the data is scaled, normalized, or encoded properly.

* Preprocessing helps clean, transform, and format the data into a suitable form for ML models.

**Q9. What is a Test set?**

* A test set is a portion of your dataset that is set aside and not used during model training. Instead, it is used after the model has been trained to evaluate how well the model performs on unseen data.


**Key points about the Test Set:**
* Purpose: To estimate the model’s ability to generalize — that is, to make accurate predictions on data it has never seen before.

* Not used during training: The model doesn’t learn from this data; it only uses it to check performance.

* Helps detect overfitting: If a model performs great on training data but poorly on the test set, it’s likely overfitting (memorizing training data rather than learning general patterns).

* Usually a fixed percentage of the original dataset: Common splits are 20–30% for testing and the rest for training.

**Q10. How do we split data for model fitting (training and testing) in Python?**

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Load dataset
    X, y = load_iris(return_X_y=True)

    # Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1
    )

    # Fit model on training data
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    # Evaluate on test data
    accuracy = model.score(X_test, y_test)
    print("Test Accuracy:", accuracy)

**Output:**

    Test Accuracy: 0.9736842105263158


**How do you approach a Machine Learning problem?**

* Approaching a machine learning (ML) problem systematically ensures better results and helps avoid common pitfalls. Here's a structured step-by-step approach to solving an ML problem:

**1. Understand the Problem**
* Clarify the objective: Classification? Regression? Clustering?

* Know the target variable: What are you predicting?

* Business understanding: Why does this problem matter?

**2. Collect and Explore the Data**
* Get the raw data: CSV, database, API, etc.

* Perform Exploratory Data Analysis (EDA):

  * Summary statistics

  * Data types and structure

  * Class imbalance, missing values

  * Visualizations (histograms, scatterplots, boxplots)

**3. Preprocess the Data**
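* Split data into training and testing sets first, so that imputers, encoders, and scalers are fitted on the training data only

* Handle missing values (drop, fill, impute)

* Encode categorical variables (OrdinalEncoder, OneHotEncoder, etc.; LabelEncoder is intended for the target)

* Scale/normalize numerical features (StandardScaler, MinMaxScaler)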

* Feature engineering:

  * Combine, transform, or create new features

  * Remove irrelevant or redundant features

**4. Choose the Right Model**
* Based on problem type and data:

  * Classification: Logistic Regression, Random Forest, SVM, XGBoost

  * Regression: Linear Regression, Decision Trees, Gradient Boosting

  * Clustering: KMeans, DBSCAN

* Start with a simple model, then try more complex ones

**5. Train the Model**
* Use the training data to fit the model:

       model.fit(X_train, y_train)

**6. Evaluate the Model**
* Use the test set (unseen data)

* Choose the right evaluation metric:

  * Classification: Accuracy, Precision, Recall, F1-score, ROC-AUC

  * Regression: MAE, RMSE, R²

* Visualize performance: confusion matrix, residual plots, ROC curves

**7. Tune the Model**
* Use cross-validation (e.g., KFold)

* Tune hyperparameters (e.g., with GridSearchCV, RandomizedSearchCV)

* Try ensemble methods (e.g., Bagging, Boosting)
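
Steps 3 to 7 can be sketched end to end with scikit-learn. The dataset (iris), the model, and the small `C` grid below are illustrative assumptions, not a recommendation; putting the scaler inside a `Pipeline` means it is re-fitted on the training part of every cross-validation fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Preprocessing and model in one object
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500)),
])

# 5-fold cross-validated search over the regularization strength
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best params:", search.best_params_)
print("CV accuracy:", search.best_score_)
print("Test accuracy:", test_accuracy)
```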

**8. Test and Validate Final Model**
* Evaluate on the final test set (if using validation split earlier)

* Watch for overfitting or underfitting

**9. Deploy the Model**
* Package the model (e.g., joblib, pickle)

* Create an API using Flask/FastAPI or deploy via cloud platforms

* Monitor performance in production

**10. Iterate and Improve**
* Collect feedback

* Update data

* Re-train and re-evaluate

**Q11. Why do we have to perform EDA before fitting a model to the data?**

*  Here's why EDA matters:

**1. Understand Your Data**
* Get a sense of data types, range of values, and data distributions.

* Understand what features you're working with and how they relate to the target variable.

* Example: You might discover that the "Age" column has outliers or that the "Gender" column is categorical.

**2. Identify and Handle Data Quality Issues**
* Missing values: Should you fill them, drop them, or flag them?

* Outliers: Do they need treatment, or are they meaningful?

* Incorrect data types: Strings stored as numbers, dates stored as text, etc.

* Example: A column may look numerical but actually represents categories.

**3. Discover Patterns and Relationships**
* Uncover correlations and feature importance.

* Visualize relationships between features and target variables (scatter plots, boxplots, heatmaps, etc.).

* Example: A strong correlation between Experience and Salary could be a valuable predictor.

**4. Detect Data Imbalances**
* In classification, check if classes are imbalanced (e.g., 90% "No", 10% "Yes").

* Imbalanced data can mislead accuracy and affect model performance.

* Example: You may need techniques like SMOTE or stratified sampling.

**5. Inform Preprocessing Decisions**
* EDA helps you decide:

  * Which features to scale or normalize

  * Which features to encode

  * Which ones to drop

* Example: Categorical features need encoding; skewed distributions may need transformation.

**6. Choose the Right Model and Metric**
* Some models assume normality, no multicollinearity, etc.

* EDA guides your choice of algorithms and evaluation metrics.

* Example: If the target is heavily skewed, using Mean Absolute Error (MAE) may be better than RMSE.
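
A few of the checks above can be sketched with pandas. The tiny table is made up for illustration (column names and values are assumptions); on real data the same calls apply to the loaded DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 62, 29],
    "gender": ["F", "M", "M", "F", "F", "M"],
    "income": [50_000, 60_000, 80_000, 90_000, 1_000_000, 55_000],
    "churned": ["No", "No", "No", "No", "Yes", "No"],
})

print(df.dtypes)       # data types: which columns are numeric, which are text
print(df.describe())   # summary statistics for numeric columns

missing = df.isna().sum()                                    # missing values per column
class_share = df["churned"].value_counts(normalize=True)     # class balance
skewness = df["income"].skew()                               # positive => long right tail

print(missing)
print(class_share)
print("Income skewness:", skewness)
```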

**Q12. What is correlation?**

* Correlation is a statistical measure that describes the strength and direction of the relationship between two variables.

**Example**

     Hours Studied	Exam Score
     1	              50
     2	              60
     3	              70
     4	              80
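
The table above shows a perfect positive linear relationship. As a check, the Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), can be computed by hand in NumPy:

```python
import numpy as np

hours = np.array([1, 2, 3, 4], dtype=float)
score = np.array([50, 60, 70, 80], dtype=float)

# Deviations from the mean
dx = hours - hours.mean()
dy = score - score.mean()

# Pearson correlation coefficient
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r)  # 1.0
```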

**Q13. What does negative correlation mean?**

*  A negative correlation means that as one variable increases, the other decreases — they move in opposite directions.

**In Simple Terms:**
* When X goes up, Y goes down — and vice versa.

**Example:**

    Hours Watching TV	Exam Score
          5                50
          4                60
          3                70
          2                80
          1                90

* As time spent watching TV increases, exam score decreases → negative correlation.



**Q14. How can you find correlation between variables in Python?**

*  Here are some ways to find correlation:


**1. Using Pandas .corr() method**
* If you have a DataFrame, this computes the correlation matrix between all numerical columns:

      import pandas as pd

      # Example DataFrame
      df = pd.DataFrame({
          'age': [25, 32, 47, 51, 62],
          'income': [50000, 60000, 80000, 90000, 100000],
          'expenses': [20000, 25000, 30000, 35000, 40000]
      })

      # Correlation matrix
      corr_matrix = df.corr()
      print(corr_matrix)

**2. Correlation between two specific variables**

    corr = df['age'].corr(df['income'])
    print("Correlation between age and income:", corr)

**3. Using NumPy's corrcoef**

    import numpy as np

    corr_coef = np.corrcoef(df['age'], df['income'])[0, 1]
    print("Correlation coefficient:", corr_coef)

**4. Visualizing correlation matrix with Seaborn heatmap**

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
    plt.show()



**Q15. What is causation? Explain difference between correlation and causation with an example.**

* Causation means that one variable directly affects another: a cause-and-effect relationship. In other words, if X causes Y, then changing X will change Y.


**Difference between correlation and causation**

**Correlation**

* **Definition:** A mutual relationship between variables

* **Direction:** 	No direction implied

* **Implied Cause:**  No

* **Testing:**  Statistical measures (e.g., Pearson)

* **Example:** Ice cream sales ↑ and drownings ↑ (both rise in hot weather; neither causes the other)

**Causation**

* **Definition:** A direct cause-effect relationship

* **Direction:** One variable directly affects the other

* **Implied Cause:** Yes

* **Testing:** Requires controlled experiments or domain knowledge

* **Example:** Smoking → Lung cancer
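
The ice cream example can be simulated. In the sketch below all numbers are invented: temperature drives both quantities, so they correlate strongly, but once the effect of temperature is removed from each (a simple partial correlation), the relationship largely disappears.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: temperature is the common cause
temperature = rng.normal(25, 5, n)
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)
drownings = 0.5 * temperature + rng.normal(0, 2, n)

raw_r = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after removing the temperature effect from both variables
partial_r = np.corrcoef(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)[0, 1]

print("Raw correlation:", raw_r)
print("Controlling for temperature:", partial_r)
```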

**Q16. What is an Optimizer? What are different types of optimizers? Explain each with an example.**

*  An optimizer is an algorithm used to adjust the model’s parameters (like weights and biases) to minimize the loss function. In simple terms, it helps your model learn from data by making it better at predicting the correct output.


**Types of Optimizers**

**1. Gradient Descent (Batch Gradient Descent)**
*	Uses the entire dataset to compute the gradient before updating the parameters.
* Update Rule:

    *θ = θ – α ⋅ ∇ J (θ)*

*	Pros: Stable convergence
*	Cons: Slow for large datasets
* Example (Conceptual):

      # No specific library function; this is manually implemented in basic ML models
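
A manual implementation of the update rule above might look like the following NumPy sketch for linear regression with a mean-squared-error loss. The synthetic data, learning rate, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus a little noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, 200)

Xb = np.hstack([X, np.ones((len(X), 1))])  # extra column of ones for the bias
theta = np.zeros(2)                        # [slope, intercept]
alpha = 0.1                                # learning rate

losses = []
for _ in range(500):
    error = Xb @ theta - y
    losses.append(np.mean(error ** 2))     # J(theta): mean squared error
    grad = 2 * Xb.T @ error / len(y)       # gradient over the entire dataset
    theta = theta - alpha * grad           # theta = theta - alpha * grad J(theta)

print("Learned parameters:", theta)
print("First and last loss:", losses[0], losses[-1])
```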

**2. Stochastic Gradient Descent (SGD)**
* Updates weights for each training example (sample).

* Introduces noise, which can help escape local minima.

* Example (PyTorch):

      optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

* Pros: Faster updates, handles large datasets

* Cons: More fluctuation, less stable

**3. Mini-Batch Gradient Descent**
* Uses a subset of the data (batch) for each update.

* A balance between batch GD and SGD.

* Example:
* Usually controlled by setting the batch_size in your DataLoader in PyTorch or Keras.

      # Mini-batch handled by DataLoader or training loop
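
Without a framework, the three variants differ only in how many samples feed each update. The function `minibatch_sgd` below is a hypothetical helper on synthetic data: `batch_size=1` gives stochastic gradient descent, `batch_size=len(y)` gives batch gradient descent, and anything in between is mini-batch.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ X @ w + b with squared error, updating once per batch."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))          # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            error = Xb[idx] @ theta - y[idx]
            grad = 2 * Xb[idx].T @ error / len(idx)
            theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, 200)

theta_sgd = minibatch_sgd(X, y, batch_size=1)         # one sample per update
theta_mini = minibatch_sgd(X, y, batch_size=32)       # mini-batch
theta_batch = minibatch_sgd(X, y, batch_size=len(y))  # full batch

print(theta_sgd, theta_mini, theta_batch)
```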

**4. Momentum**
* Helps accelerate SGD by using a velocity vector to smooth updates.

* Reduces oscillation in high-curvature areas.

* Example:

      optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

**5. Adagrad (Adaptive Gradient Algorithm)**
* Adjusts learning rate for each parameter individually based on past gradients.

* Good for sparse data.

* Example:

      optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
* Cons: Learning rate decays too much over time.

**6. RMSprop (Root Mean Square Propagation)**
* Like Adagrad but with a moving average of squared gradients to prevent decay of learning rate.

* Works well for non-stationary problems like RNNs.

* Example:

      optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)

**7. Adam (Adaptive Moment Estimation)**
* Combines ideas from Momentum and RMSprop.

* Maintains running averages of both gradients and squared gradients.

* Example:

      optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

* Pros: Fast, usually works well with little tuning, and widely used

* A common default choice for many deep learning tasks
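
To show what the two running averages do, here is a minimal NumPy sketch of the Adam update applied to a simple quadratic. The function name `adam_minimize`, the test function, and the learning rate are assumptions for illustration; `torch.optim.Adam` handles many more details.

```python
import numpy as np

def adam_minimize(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a function given its gradient, using the Adam update."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # running average of gradients (momentum part)
    v = np.zeros_like(theta)   # running average of squared gradients (RMSprop part)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

def grad(th):
    """Gradient of f(th) = (th[0] - 3)^2 + 10 * (th[1] + 1)^2."""
    return np.array([2 * (th[0] - 3), 20 * (th[1] + 1)])

theta = adam_minimize(grad, theta0=[0.0, 0.0])
print(theta)  # should end up close to the minimum at (3, -1)
```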

**8. AdamW (Weight Decay Regularization)**
* A variant of Adam that decouples weight decay from the gradient update.

* Better generalization for deep models (especially transformers).

* Example:

      optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)


**Q17. What is sklearn.linear_model?**

* sklearn.linear_model is a module in Scikit-learn that contains classes and functions for building linear models — models that make predictions based on linear relationships between features and the target variable.

**What Does It Do?**
* It provides tools to solve:

* Regression problems (predicting continuous values)

* Classification problems (predicting categories)



**Q18. What does model.fit() do? What arguments must be given?**

*  The .fit() method is used to train a machine learning model on your data. It finds the best parameters (like weights or coefficients) that minimize the loss function and allow the model to make accurate predictions.

**What Happens When You Call .fit()?**
* Takes your input data (features) and target values (labels)

* Computes how well the model is doing (using a loss function)

* Optimizes model parameters to minimize the error

* Stores the trained model parameters internally

**Required Arguments:**


| Argument | Description                          | Required |
|----------|--------------------------------------|----------|
| `X`      | Feature matrix (inputs), shape: `(n_samples, n_features)` | ✅ Yes |
| `y`      | Target labels/values (outputs), shape: `(n_samples,)`     | ✅ Yes for supervised models (unsupervised ones such as `KMeans` ignore it) |

---

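**Example: Using `LinearRegression`**

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative toy data: X = features, y = target
X_train = np.array([[1], [2], [3], [4]])
y_train = np.array([2, 4, 6, 8])

model = LinearRegression()
model.fit(X_train, y_train)
```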


**Q19. What does model.predict() do? What arguments must be given?**

* The model.predict() method is used after training a model (using fit()), and it is used to make predictions on new or unseen input data.

* In simple terms: predict() uses the model's learned patterns to output predictions based on input features.

**Arguments for model.predict()**


| Argument | Description                          | Required |
|----------|--------------------------------------|----------|
| `X`      | Input features for prediction, shape: `(n_samples, n_features)` | ✅ Yes |
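
The shape requirement is a common stumbling block, so here is a small sketch with made-up numbers: `predict()` expects a 2-D array even for a single sample, and returns one prediction per row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data following y = 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X_train, y_train)

# Two new samples with one feature each: shape (2, 1)
X_new = np.array([[5.0], [6.0]])
predictions = model.predict(X_new)          # shape (2,)

# A single sample must still be 2-D: shape (1, 1)
single = np.array([7.0]).reshape(1, -1)
single_prediction = model.predict(single)

print(predictions, single_prediction)
```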

---

**Example:**

```python
# Assuming model is already trained via model.fit()
predictions = model.predict(X_test)
```


**Q20. What are continuous and categorical variables?**

**Continuous Variables**
* **Definition:** Variables that can take any numeric value within a range.

* Usually measured quantities.

* Values are often decimals/floats, not just integers.

**Examples:**
 * Height (e.g., 170.5 cm)

 * Temperature (e.g., 23.8°C)

 * Weight (e.g., 65.2 kg)

 * Time taken (e.g., 12.45 seconds)


**Categorical Variables**
* **Definition:** Variables that represent categories or groups.

* Usually qualitative (non-numeric) or numeric codes representing groups.

* Limited number of possible values (called levels or classes).

**Types of Categorical Variables:**
* Nominal: No natural order (e.g., color: red, blue, green)

* Ordinal: Have a meaningful order (e.g., rating: low, medium, high)

**Examples:**
* Gender (male, female)

* Color (red, blue, green)

* Education level (high school, bachelor's, master's)

* Product category (electronics, clothing, groceries)



**Q21. What is feature scaling? How does it help in Machine Learning?**

* Feature scaling is the process of normalizing or standardizing numerical features so they share a common scale, without distorting differences in the ranges of values.

**How does feature scaling help in Machine Learning?**

**1. Faster convergence in optimization algorithms**
* Algorithms like Gradient Descent perform better when features are on a similar scale, speeding up training.

**2. Improved accuracy**
* Models that rely on distance or magnitude (like KNN, SVM, or K-means) perform better with scaled data.

**3. Prevents bias toward features with large values**
* Without scaling, features with bigger ranges might dominate the learning process.

**4. Makes coefficients more interpretable**
* In linear models, scaling helps to compare feature importances directly.
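
The two most common forms of scaling are short formulas: standardization, z = (x − mean) / std, and min-max normalization, x' = (x − min) / (max − min). A NumPy sketch on made-up heights:

```python
import numpy as np

height = np.array([150.0, 160.0, 170.0, 180.0])

# Standardization: zero mean, unit (population) standard deviation
standardized = (height - height.mean()) / height.std()

# Min-max normalization: values mapped to [0, 1]
min_max = (height - height.min()) / (height.max() - height.min())

print(standardized)
print(min_max)
```

`np.std` uses the population standard deviation by default, which is also what scikit-learn's `StandardScaler` uses.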

**Q22. How do we perform scaling in Python?**

**Step-by-Step: How to Perform Feature Scaling in Python**

**1. Import the necessary scaler**

Scikit-learn provides different scalers for different types of scaling:

* StandardScaler – for standardization (mean = 0, std = 1)

* MinMaxScaler – for normalization (scale to [0, 1])

* RobustScaler – for scaling with outliers

* MaxAbsScaler – scales based on max absolute value

**2. Example with StandardScaler (most common)**

    from sklearn.preprocessing import StandardScaler
    import pandas as pd

    # Example data
    data = {'height': [150, 160, 170, 180],
            'weight': [50, 60, 65, 80]}

    df = pd.DataFrame(data)

    # Create the scaler
    scaler = StandardScaler()

    # Fit and transform the data
    scaled_data = scaler.fit_transform(df)

    # Convert back to DataFrame
    scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

    print(scaled_df)

**3. Using MinMaxScaler (scale to [0, 1])**

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler()
    scaled_data = scaler.fit_transform(df)

    scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
    print(scaled_df)
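
When a test set is involved, one common pattern (sketched here on made-up data) is to fit the scaler on the training split only and reuse its statistics for the test split, so that no information from the test data influences preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(-1, 1)   # illustrative single feature
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # apply the same mean/std to test data

print("Training mean used for scaling:", scaler.mean_[0])
print("Scaled test values:", X_test_scaled.ravel())
```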

**Q23. What is sklearn.preprocessing?**

* sklearn.preprocessing is a module in the Scikit-learn library that provides a variety of data preprocessing tools used to prepare your dataset before feeding it into a machine learning model.


**Q24. How do we split data for model fitting (training and testing) in Python?**

*  In Python, the most common way to split data for model fitting — into training and testing sets — is by using the train_test_split() function from scikit-learn. This function randomly splits arrays or matrices into training and testing subsets.


* Here's a step-by-step example:

**1. Import the necessary modules**

       from sklearn.model_selection import train_test_split

**2. Assume you have your features X and target y**
* For example:

       X = df.drop('target_column', axis=1)  # Features
       y = df['target_column']              # Target

**3. Split the data**

       X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Parameters:**
*  test_size=0.2: 20% of the data will be used for testing.

* random_state=42: Ensures reproducibility by fixing the random seed.

**4. Now you can train your model on X_train, y_train and test it on X_test, y_test.**

**Optional Parameters:**
* shuffle=True: Whether or not to shuffle the data before splitting (default is True).

* stratify=y: Ensures that the proportion of classes is maintained (useful for classification tasks).

**Example with stratification:**


     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)
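
The effect of `stratify` is easiest to see on imbalanced labels. The data below is invented (90% class 0, 10% class 1): with stratification the test set keeps the 10% share, while a plain random split may drift from it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)   # imbalanced labels

# Plain random split
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.3, random_state=1)

# Stratified split keeps class proportions in both subsets
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

print("Positive share, plain split:     ", y_test_plain.mean())
print("Positive share, stratified split:", y_test_strat.mean())
```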


**Q25. Explain data encoding?**

**What is Data Encoding?**
* Data encoding is the process of converting categorical data (non-numeric, like labels or text) into a numeric format so that it can be used in machine learning models, which typically require numerical input.


**Why is Encoding Important?**
* Most machine learning algorithms (like linear regression or neural networks, and decision trees as implemented in scikit-learn) cannot process strings or categorical variables directly. They need numerical representations.

* For example, you can't feed ['red', 'green', 'blue'] into most models. These must be encoded numerically first.

**Common Data Encoding Techniques**

**1. Label Encoding**
* Converts each category into a unique integer.

**Example:**

    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    colors = ['red', 'green', 'blue', 'green']
    encoded = le.fit_transform(colors)
    print(encoded)  # [2 1 0 1]

**2. One-Hot Encoding**
* Creates a binary column for each category and marks the presence with 1 and absence with 0.

**Example:**

    import pandas as pd

    df = pd.DataFrame({'Color': ['Red', 'Green', 'Blue', 'Green']})
    # dtype=int gives 0/1 columns; recent pandas versions default to True/False
    encoded_df = pd.get_dummies(df, columns=['Color'], dtype=int)
    print(encoded_df)

**Output:**

       Color_Blue  Color_Green  Color_Red
    0           0            0          1
    1           0            1          0
    2           1            0          0
    3           0            1          0

**3. Ordinal Encoding**
* Maps each category to an integer based on its order.

**Example:**

    from sklearn.preprocessing import OrdinalEncoder

    oe = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
    data = [['Low'], ['Medium'], ['High']]
    encoded = oe.fit_transform(data)
    print(encoded.ravel())  # [0. 1. 2.]


**4. Binary Encoding / Target Encoding / Hashing Encoding**
* Used for high-cardinality features (e.g., country names, product codes).

* These are more advanced and useful when:

* Too many unique values for one-hot encoding

* Want to reduce dimensionality

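**Example libraries:** category_encoders (a separate package, not part of scikit-learn); scikit-learn itself provides `FeatureHasher` and, from version 1.3, `TargetEncoder`.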