### Q1. What is the Filter method in feature selection, and how does it work?

The **Filter method** is a feature selection technique that selects relevant features for a model based on their statistical characteristics without involving any machine learning algorithm. It evaluates the importance of each feature using various statistical tests and selects the best ones according to a ranking criterion.

#### How it Works:
1. **Ranking Features**: Each feature is evaluated independently, usually based on a correlation measure, chi-square test, mutual information, or some other statistical test.
2. **Selection Criteria**: Features are ranked according to their scores, and only the top-ranked features are selected.
3. **Threshold**: A threshold is set to determine how many features to select. It can be based on a fixed number of top features or a cutoff value for the scores.
4. **Model Agnostic**: This method does not depend on any specific machine learning model, which makes it computationally efficient.
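
The steps above can be sketched with scikit-learn's `SelectKBest`. This is a minimal illustration, not a recommended configuration: the synthetic dataset, the ANOVA F-test (`f_classif`) as the scoring function, and `k=3` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (assumption): 200 samples, 10 features, 3 of them informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Steps 1-2: score every feature independently of the others, then rank by score
# Step 3: keep a fixed number of top-ranked features (k=3 here)
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

ranking = np.argsort(selector.scores_)[::-1]  # feature indices, best first
print("Feature indices ranked by F-score:", ranking)
print("Kept feature indices:", selector.get_support(indices=True))
print("Reduced shape:", X_selected.shape)
```

No model is trained at any point (step 4); only one statistic per feature is computed.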

#### Commonly Used Filter Methods:
- **Correlation Coefficient**: Measures the correlation between each feature and the target variable.
- **Chi-square Test**: Tests whether a categorical feature is statistically independent of the target class; features that are far from independent receive the highest scores.
- **Mutual Information**: Quantifies the amount of information obtained about one variable through another.
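
As a rough sketch, the three scores can be computed side by side on toy data. The data below is synthetic and the variable names (`informative`, `flag_related`, and so on) are hypothetical; note that scikit-learn's `chi2` expects non-negative features such as counts or 0/1 indicators.

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)                    # binary target

# Two numeric features: one tied to the target, one pure noise
informative = y + rng.normal(0, 0.5, size=n)
noise = rng.normal(0, 1, size=n)
X_num = np.column_stack([informative, noise])

# Correlation coefficient: linear association of each feature with the target
corr = [np.corrcoef(X_num[:, j], y)[0, 1] for j in range(X_num.shape[1])]

# Mutual information: general (also non-linear) dependence
mi = mutual_info_classif(X_num, y, random_state=0)

# Two binary features for the chi-square test
flag_related = np.where(rng.random(n) < 0.1, 1 - y, y)   # agrees with y about 90% of the time
flag_random = rng.integers(0, 2, size=n)
chi2_scores, p_values = chi2(np.column_stack([flag_related, flag_random]), y)

print("Correlation:", np.round(corr, 3))
print("Mutual information:", np.round(mi, 3))
print("Chi-square scores:", np.round(chi2_scores, 1), "p-values:", p_values)
```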

#### Advantages:
- **Fast and Computationally Efficient**: Since it only involves statistical tests, it is less resource-intensive compared to other feature selection methods.
- **Model Agnostic**: It works independently of the machine learning model being used.

#### Disadvantages:
- **Ignores Feature Interaction**: Since features are evaluated independently, this method does not account for interactions between features.

In summary, the Filter method is useful for quickly identifying important features but may miss some intricate relationships between features.


### Q2. How does the Wrapper method differ from the Filter method in feature selection?

The **Wrapper method** and **Filter method** are both techniques for feature selection, but they differ in how they select features and their approach to evaluating feature importance.

#### Key Differences:

| **Feature**          | **Filter Method**                                          | **Wrapper Method**                                                                 |
|----------------------|------------------------------------------------------------|------------------------------------------------------------------------------------|
| **Evaluation Approach** | Uses statistical measures to evaluate each feature independently of any model. | Uses a machine learning model to evaluate feature subsets by training and testing the model. |
| **Speed**            | Faster since it does not involve model training.           | Slower because it repeatedly trains the model with different feature subsets.      |
| **Feature Interaction** | Does not consider interactions between features.        | Takes into account interactions between features as it evaluates subsets of features. |
| **Algorithm Dependence** | Model-agnostic (works without needing a specific model).| Model-specific (requires training a model to assess the quality of feature subsets). |
| **Computational Cost** | Low computational cost, as it only relies on simple statistical tests. | High computational cost due to the need for multiple training iterations.           |
| **Performance**      | May not always select the best features for a given model. | Usually provides better results because it is model-specific and considers feature combinations. |

#### How the Wrapper Method Works:
1. **Subset Selection**: Different subsets of features are created using algorithms such as forward selection, backward elimination, or exhaustive search.
2. **Model Training**: A machine learning model is trained on each subset of features.
3. **Performance Evaluation**: The performance of the model is evaluated on a validation set, typically using metrics like accuracy or F1-score.
4. **Best Subset**: The subset that gives the best model performance is selected.
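
One way to sketch this loop is scikit-learn's `SequentialFeatureSelector`, which trains the model on candidate subsets and scores each one with cross-validation. The synthetic dataset, the logistic regression model, and the target of three features are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): 8 features, 3 of them informative
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Forward search: add one feature per round, keeping the addition that gives
# the best 5-fold cross-validated accuracy
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    scoring="accuracy",
    cv=5,
)
sfs.fit(X, y)

selected = sfs.get_support(indices=True)
print("Selected feature indices:", selected)
```

Each round refits the model once per remaining candidate and per fold, which is where the Wrapper method's cost comes from.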

#### Examples of Wrapper Methods:
- **Forward Selection**: Start with no features and add features one by one, evaluating the model each time.
- **Backward Elimination**: Start with all features and remove them one by one, evaluating the model each time.
- **Recursive Feature Elimination (RFE)**: Iteratively remove the least important features based on model coefficients or importance scores.
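
A minimal RFE sketch using scikit-learn's `RFE` class follows. The synthetic data, the logistic regression estimator, and the choice of four features to keep are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): 10 features, 4 of them informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)

# Refit the model repeatedly, dropping the feature with the smallest
# coefficient magnitude each round (step=1) until 4 features remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=4, step=1)
rfe.fit(X, y)

print("Kept features (mask):", rfe.support_)
print("Elimination ranking (1 = kept):", rfe.ranking_)
```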

#### Advantages of Wrapper Method:
- **Better Accuracy**: Since it evaluates feature subsets with the actual machine learning model, it often provides better results for model performance.
- **Feature Interaction**: Accounts for interactions between features, which can lead to better feature selection.

#### Disadvantages of Wrapper Method:
- **Computationally Expensive**: Requires multiple rounds of model training and testing, which makes it slower and resource-intensive.
- **Overfitting Risk**: Because many feature subsets are compared using the same model and the same evaluation data, the chosen subset can end up fitted to noise in that data, especially on small datasets.

In conclusion, while the **Filter method** is faster and simpler, the **Wrapper method** generally provides better results by considering the model's performance but at the cost of higher computational resources.


### Q3. What are some common techniques used in Embedded feature selection methods?

**Embedded feature selection methods** perform feature selection as part of model training. They sit between **Filter** and **Wrapper** methods: like Wrappers, they take the learning algorithm into account, but they typically need only one training run (or a few) instead of a search over many feature subsets, so they are much cheaper.

#### Common Techniques in Embedded Feature Selection:

1. **Regularization Techniques**:
   Regularization methods introduce a penalty term to the loss function that helps shrink or remove less important features.
   
   - **Lasso (L1 Regularization)**: Adds an L1 penalty term to the loss function, which can shrink coefficients of less important features to zero, effectively performing feature selection.
     - **Example**: Lasso regression.
   - **Ridge (L2 Regularization)**: Adds an L2 penalty to reduce the magnitude of feature coefficients but does not eliminate them completely. While this is not a feature selection method, it can help in reducing the influence of less important features.
     - **Example**: Ridge regression.
   - **Elastic Net**: A combination of L1 and L2 regularization that can both shrink and eliminate features. This method provides a balance between Lasso and Ridge regression.
     - **Example**: Elastic Net regression.
   
2. **Decision Trees and Tree-Based Methods**:
   Tree-based models inherently perform feature selection by choosing features that best split the data at each node, ranking them based on their importance.
   
   - **Decision Trees**: Select features based on information gain (in classification) or reduction in variance (in regression) at each node.
   - **Random Forests**: Use feature importance scores that average, across all trees, how much each feature's splits reduce impurity (mean decrease in impurity), rather than simply counting how often a feature is used.
   - **Gradient Boosting Machines (GBM)**: Build trees sequentially, and the features that contribute most to reducing the error in the model are considered more important.
   
3. **Recursive Feature Elimination (RFE)**:
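   This method recursively removes the least important features based on the model's coefficients or feature importance. RFE is usually classified as a Wrapper method (as in Q2) because it refits the model after each round of elimination. It is listed here because each round relies on the model's own embedded importance signal, and it is cheaper than an exhaustive subset search because it follows a single elimination path.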
   
   - **Example**: Recursive feature elimination with support vector machines (SVM), decision trees, or linear models.
   
4. **Feature Importance from Coefficients**:
   Some models like linear regression, support vector machines (SVM), and logistic regression have coefficients that indicate feature importance.
   
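   - **Linear Models**: In linear regression or logistic regression, the magnitude of the coefficients indicates the importance of features, provided the features have been standardized to a comparable scale; otherwise coefficient size mostly reflects each feature's units.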
   - **Support Vector Machines (SVM)**: The magnitude of the coefficients in the linear SVM can be used to rank feature importance.
   
5. **L1-Based SVM**:
   Similar to Lasso regression, L1-regularized SVM performs feature selection by penalizing the absolute value of feature weights, forcing some of them to be zero.
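
The difference between L1 and L2 penalties can be seen in a small sketch. The synthetic regression data and the penalty strength `alpha=5.0` are assumptions chosen only to make the effect visible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 10 features, only 3 carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # penalties assume comparable feature scales

lasso = Lasso(alpha=5.0).fit(X, y)     # L1 penalty: can set coefficients exactly to zero
ridge = Ridge(alpha=5.0).fit(X, y)     # L2 penalty: shrinks coefficients but keeps them non-zero

print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
```

Features whose Lasso coefficient is exactly zero are effectively dropped, which is why L1 regularization doubles as feature selection while Ridge does not.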

#### Advantages of Embedded Methods:
- **Less Computational Cost**: Since the feature selection is done during the model training, it is more efficient compared to Wrapper methods.
- **Model-Specific**: These methods integrate feature selection directly into the model-building process, which often results in better performance.

#### Disadvantages:
- **Model Dependence**: These methods are tied to specific models, so the feature selection may not generalize well to other models.
  
Embedded feature selection methods provide a balanced approach by incorporating the learning process into feature selection, leading to more efficient and effective models.


### Q4. What are some drawbacks of using the Filter method for feature selection?

The **Filter method** is a popular technique for feature selection due to its simplicity and speed, but it has some drawbacks that can limit its effectiveness in certain situations. Below are some of the key drawbacks:

#### 1. **Ignores Feature Interactions**:
   - The Filter method evaluates each feature **independently** of the others. This means it does not consider potential interactions or dependencies between features.
   - In complex datasets, features may have little relevance individually, but in combination, they could significantly impact the model's performance. The Filter method would fail to capture such relationships.
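
A classic toy case is an XOR target: each feature alone carries almost no information about the label, yet the pair determines it completely. The sketch below uses synthetic data and a decision tree purely as an illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
b = rng.integers(0, 2, size=1000)
y = a ^ b                                  # label depends only on the combination of a and b
X = np.column_stack([a, b])

# Univariate filter score for each feature on its own
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# A model that sees both features together
acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

print("Mutual information per feature:", np.round(mi, 4))
print("Cross-validated accuracy using both features:", acc)
```

A univariate filter would rank both features near the bottom and could discard them, even though together they are sufficient to predict the target.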

#### 2. **Model-Agnostic Nature**:
   - Since the Filter method does not involve any machine learning model, it selects features based purely on their intrinsic characteristics, like correlation or variance.
   - The features selected may not necessarily be the best ones for the specific machine learning model being used, as the method is not tailored to any specific learning algorithm.

#### 3. **Risk of Selecting Redundant Features**:
   - Filter methods may select features that are correlated with each other, leading to **redundancy** in the selected feature set.
   - For example, multiple features that are highly correlated with each other could carry similar information, but the Filter method might still select all of them because it does not account for redundancy.
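
The redundancy problem can be reproduced with a near-duplicate column. In this sketch (synthetic data, hypothetical variable names), a univariate filter asked for two features keeps both copies of the same signal and drops a weaker but independent feature.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)

signal = y + rng.normal(0, 0.5, size=n)             # strong feature
near_copy = signal + rng.normal(0, 0.01, size=n)    # almost identical to 'signal'
weak = y + rng.normal(0, 2.0, size=n)               # weaker, but independent information
X = np.column_stack([signal, near_copy, weak])

selector = SelectKBest(f_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True)
corr = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print("Kept feature indices:", kept)
print("Correlation between the two kept features:", round(corr, 4))
```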

#### 4. **Less Accurate Compared to Wrapper or Embedded Methods**:
   - Since it does not evaluate feature subsets with the machine learning model, the **accuracy** of the selected features may be lower compared to **Wrapper** or **Embedded** methods.
   - While it is computationally efficient, the selected features may not yield the best performance for a model, especially in complex tasks.

#### 5. **Depends Heavily on Statistical Tests**:
   - The effectiveness of the Filter method is largely determined by the statistical test used (e.g., correlation, chi-square). These tests may not always capture the most important features, especially when the relationships between the features and the target variable are non-linear or complex.
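
To illustrate how the choice of test matters, the sketch below builds a U-shaped dependence (synthetic, an assumption for the example): Pearson correlation sees almost nothing, while mutual information picks up the relationship.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
y = (np.abs(x) > 0.5).astype(int)   # target is 1 at both extremes of x, 0 in the middle

pearson = np.corrcoef(x, y)[0, 1]
mi = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]

print("Pearson correlation:", round(pearson, 3))
print("Mutual information:", round(mi, 3))
```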
   
#### 6. **May Lead to Overfitting or Underfitting**:
   - If the selected features are not well-suited to the model or the problem at hand, it may lead to **overfitting** (selecting too many irrelevant features) or **underfitting** (discarding important features).
   - This happens because the Filter method doesn’t directly optimize the model’s performance.

#### 7. **Inconsistent with Model's Objective**:
   - The objective of many machine learning models is to minimize loss or maximize accuracy. However, the Filter method ranks features based on statistical metrics like correlation, which may not align well with the model’s objective.

#### Summary of Drawbacks:
- Ignores feature interactions.
- Model-agnostic, not tailored to specific models.
- Can select redundant features.
- Less accurate compared to Wrapper/Embedded methods.
- Heavily dependent on statistical tests.
- May lead to overfitting or underfitting.
- Inconsistent with model objectives.

While the **Filter method** is a quick and efficient approach to feature selection, its limitations make it less suitable for more complex datasets and models that require a deeper understanding of feature interactions.


### Q5. In which situations would you prefer using the Filter method over the Wrapper method for feature selection?

The **Filter method** is preferred over the **Wrapper method** in certain situations where computational efficiency and simplicity are more critical than finding the optimal set of features. Below are some key scenarios where the Filter method is more appropriate:

#### 1. **When Dealing with High-Dimensional Data**:
   - In cases where the dataset has a **large number of features** (high-dimensional data), the Filter method is preferred due to its ability to handle large feature spaces quickly.
   - Example: In fields like **genomics** or **text classification**, datasets may have thousands of features (genes or words), and applying a computationally expensive Wrapper method would be impractical.

#### 2. **When Computational Resources are Limited**:
   - The Filter method is computationally cheaper and faster because it does not require training a machine learning model multiple times.
   - For resource-constrained environments, where **computational power**, **time**, or **memory** is limited, the Filter method is ideal.

#### 3. **When You Need a Fast, Preliminary Feature Selection**:
   - Filter methods are useful when you need to perform a **quick, initial feature selection** to reduce the dimensionality of the dataset before applying more sophisticated techniques.
   - Example: You may use the Filter method as a first step to eliminate irrelevant features and then apply more computationally expensive methods (like Wrapper or Embedded) on the reduced feature set.
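
A two-stage setup of this kind might look like the following sketch, where a cheap filter cuts 200 features to 20 and RFE then refines the survivors. The dataset is synthetic and the feature counts are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic high-dimensional data (assumption): 200 features, 5 informative
X, y = make_classification(n_samples=300, n_features=200, n_informative=5,
                           n_redundant=0, random_state=0)

pipe = Pipeline([
    # Stage 1: fast univariate filter, 200 -> 20 features
    ("filter", SelectKBest(f_classif, k=20)),
    # Stage 2: more expensive model-based elimination, 20 -> 5 features
    ("wrapper", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
])
X_reduced = pipe.fit_transform(X, y)
print("Shape after both stages:", X_reduced.shape)
```

RFE only has to refit the model on 20 features instead of 200, which is the practical benefit of filtering first.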

#### 4. **When the Focus is on Interpretability**:
   - Since the Filter method selects features based on their statistical properties, the selected features are easier to interpret, especially when using simple metrics like correlation.
   - This makes the Filter method preferable when you need to explain feature selection to **non-technical stakeholders** or in areas like **scientific research**, where interpretability is critical.

#### 5. **When Building Simple or Baseline Models**:
   - For **baseline models** or when building a quick prototype, the Filter method provides an efficient way to select relevant features without the need for extensive computation.
   - It allows you to create a reasonably good model without spending too much time on feature selection.

#### 6. **When Overfitting is a Concern**:
   - The Wrapper method can sometimes lead to **overfitting** because it directly optimizes the feature set for the model's performance on the training data.
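   - The Filter method is less prone to this kind of overfitting because it never searches over feature subsets against a model's score. It is not immune, though: the statistical scores are still estimated from data, so they should be computed on the training portion only, not on the full dataset before splitting.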
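
One common way to keep a filter's scores from seeing held-out data is to place the selector inside a pipeline, so it is refit on the training part of every cross-validation fold. The sketch below assumes synthetic data and arbitrary settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption)
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# The filter is part of the pipeline, so in each fold its scores are
# computed from that fold's training rows only
pipe = make_pipeline(SelectKBest(f_classif, k=10),
                     LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print("Accuracy per fold:", scores)
```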

#### 7. **When Model-Agnostic Feature Selection is Required**:
   - Since the Filter method is **model-agnostic**, it is suitable when you want to select features independently of the machine learning algorithm.
   - Example: When you are experimenting with different types of models (e.g., SVM, Random Forest, Logistic Regression), the Filter method provides a consistent set of features that can be used across different models.

#### 8. **When Reducing Noise from the Dataset**:
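   - Univariate Filter scores can help remove noisy or irrelevant features cheaply. Removing redundant features takes an extra step, such as a pairwise correlation filter that drops one feature from each highly **collinear** pair, because univariate scores alone do not detect redundancy (see Q4). Either way, no model training is needed.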
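
A pairwise correlation filter for collinear features can be sketched in a few lines of pandas. The helper name `drop_highly_correlated`, the 0.9 threshold, and the toy columns are all hypothetical.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

rng = np.random.default_rng(0)
minutes = rng.normal(300, 50, size=200)
df = pd.DataFrame({
    "call_minutes": minutes,
    "call_charges": minutes * 0.1 + rng.normal(0, 0.1, size=200),  # nearly a rescaled copy
    "support_tickets": rng.poisson(2, size=200),
})

reduced, dropped = drop_highly_correlated(df)
print("Dropped:", dropped)
print("Remaining:", list(reduced.columns))
```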
   - Example: In sensor data or text data, where many features may be noisy or irrelevant, the Filter method can help reduce noise.

#### Summary of Situations to Prefer Filter Method:
- High-dimensional datasets with many features.
- Limited computational resources or time constraints.
- Need for quick, preliminary feature selection.
- Emphasis on interpretability of selected features.
- Building simple or baseline models.
- Avoiding overfitting due to model-specific optimization.
- Model-agnostic feature selection across different algorithms.
- Reducing noise and irrelevant features in the dataset.

In conclusion, the **Filter method** is best suited for situations where **speed**, **simplicity**, and **scalability** are more important than finding the most optimal feature subset. It is ideal for high-dimensional data, limited resources, and when the focus is on initial feature reduction or model-agnostic selection.


### Q6. In a telecom company, you are working on a project to develop a predictive model for customer churn. You are unsure of which features to include in the model because the dataset contains several different ones. Describe how you would choose the most pertinent attributes for the model using the Filter Method.

When developing a predictive model for **customer churn** in a telecom company, selecting the most relevant features is crucial for building an accurate and efficient model. The **Filter method** provides a quick and model-agnostic way to choose pertinent attributes. Here's a step-by-step approach for using the Filter method in this scenario:

#### 1. **Understand the Problem and Dataset**:
   - **Target Variable**: The target variable is **churn** (whether a customer leaves the telecom service or not), typically represented as a binary outcome (e.g., 0 = no churn, 1 = churn).
   - **Feature Variables**: The dataset might contain various features such as:
     - **Customer Demographics**: Age, gender, income level.
     - **Service Usage**: Number of calls, data usage, SMS usage, etc.
     - **Contract Information**: Contract duration, subscription type (monthly, yearly), number of services subscribed.
     - **Billing and Payment Data**: Monthly bill amount, payment method, number of missed payments.
     - **Customer Support Interactions**: Number of support tickets, time to resolve issues, etc.

#### 2. **Preprocessing the Data**:
   Before applying any feature selection technique, it's essential to preprocess the data:
   - **Handle Missing Values**: Impute missing values or remove rows/columns with excessive missing data.
   - **Encode Categorical Variables**: Convert categorical features (e.g., payment method, subscription type) into numerical form using one-hot encoding or label encoding.
   - **Normalize or Standardize Data**: Depending on the nature of the features, normalize or standardize them to bring them to a comparable scale.
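
A minimal preprocessing sketch with pandas is shown below. The tiny table and its column names (`monthly_bill`, `payment_method`, `contract_type`) are hypothetical stand-ins for the telecom data, and median/mode imputation is just one possible choice.

```python
import pandas as pd

# Hypothetical miniature telecom table with some missing values
data = pd.DataFrame({
    "monthly_bill": [50.0, None, 80.0, 65.0],
    "payment_method": ["card", "cash", None, "card"],
    "contract_type": ["monthly", "yearly", "monthly", "monthly"],
    "churn": [0, 0, 1, 1],
})

# Handle missing values: median for numeric, most frequent value for categorical
data["monthly_bill"] = data["monthly_bill"].fillna(data["monthly_bill"].median())
data["payment_method"] = data["payment_method"].fillna(data["payment_method"].mode()[0])

# Encode categorical variables with one-hot encoding
X = pd.get_dummies(data.drop(columns="churn"),
                   columns=["payment_method", "contract_type"])
y = data["churn"]

print(X.head())
```

In a real project the imputation values would be learned from the training split only and then applied to the test split.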

#### 3. **Select Relevant Statistical Metrics**:
   Choose appropriate statistical measures depending on the types of features (numerical or categorical) and the relationship to the target variable (binary in this case).

   - **For Numerical Features**: Use **correlation coefficients** (such as Pearson correlation) to measure the linear relationship between numerical features and the target variable (churn).
   - **For Categorical Features**: Use the **chi-square test** to assess the association between categorical features (e.g., subscription type, payment method) and the churn variable.
   - **For Both Types of Features**: Use **mutual information** to measure the dependency between each feature (numerical or categorical) and churn. Mutual information works well for both linear and non-linear relationships.

#### 4. **Apply the Filter Method**:
   - **Step 1**: Calculate the selected statistical metric (e.g., correlation, chi-square, or mutual information) for each feature in relation to the target variable (churn).
   - **Step 2**: Rank the features based on their scores. Features with higher scores have a stronger relationship with the target variable and are more likely to be important for predicting churn.
   - **Step 3**: Select the top features based on a predefined threshold. This can be a fixed number of top-ranked features or a score-based cutoff (e.g., selecting features with absolute correlation |r| > 0.3 or chi-square p-value < 0.05).
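
The two thresholding styles in Step 3 can be compared in a short sketch. The synthetic churn table, its column names, and the 0.3 cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
churn = rng.integers(0, 2, size=n)

# Hypothetical numeric features with strong, moderate, and no link to churn
data = pd.DataFrame({
    "missed_payments": churn * 2 + rng.normal(0, 1, size=n),
    "support_tickets": churn + rng.normal(0, 1, size=n),
    "age": rng.normal(40, 10, size=n),
    "churn": churn,
})

# Step 1: correlation of every feature with the target
corr = data.drop(columns="churn").corrwith(data["churn"])

# Step 3, option A: score-based cutoff on the absolute correlation
by_cutoff = corr[corr.abs() > 0.3].index.tolist()

# Step 3, option B: fixed number of top-ranked features
top_2 = corr.abs().nlargest(2).index.tolist()

print(corr.round(3))
print("Selected by cutoff:", by_cutoff)
print("Selected as top 2:", top_2)
```

Using the absolute value matters: a strongly negative correlation is just as informative as a strongly positive one.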

#### 5. **Interpret the Results**:
   Once the top features are selected, interpret the results to ensure they make business sense:
   - **High correlation features**: Features like contract length, payment history, and service usage might have strong correlations with churn. These would be highly relevant to include in the model.
   - **Low correlation features**: Features like customer gender or phone model might have weak correlations with churn and can be discarded.

#### 6. **Validate the Selection**:
   After filtering the features, it's important to validate the feature selection process:
   - **Cross-validation**: Use cross-validation techniques to ensure that the selected features lead to stable model performance across different data splits.
   - **Reassess with Wrapper or Embedded Methods**: After the Filter method reduces the feature set, you can apply Wrapper or Embedded methods to fine-tune the feature selection.

#### 7. **Advantages of Using the Filter Method in This Case**:
   - **Speed**: The Filter method is computationally efficient and well-suited for large telecom datasets with potentially hundreds of features.
   - **Independence from the Model**: It allows you to perform a quick selection without having to train multiple models, making it a good first step in feature selection.

#### Example Workflow Using the Filter Method:
```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Assuming 'data' is the preprocessed telecom dataset (numeric features, no missing values)
# and 'churn' is the target variable
X = data.drop(columns='churn')
y = data['churn']

# Step 1: Calculate mutual information between each feature and churn
mi_scores = mutual_info_classif(X, y, random_state=42)

# Step 2: Rank features based on mutual information scores
mi_scores = pd.Series(mi_scores, index=X.columns).sort_values(ascending=False)

# Step 3: Select top N features based on mutual information score
top_features = mi_scores.head(10)
print("Top 10 Features Selected by Filter Method:")
print(top_features)
```


### Q7. You are working on a project to predict the outcome of a soccer match. You have a large dataset with many features, including player statistics and team rankings. Explain how you would use the Embedded method to select the most relevant features for the model.

In a soccer match prediction project, where you have a large dataset with various features (e.g., player statistics, team rankings, historical performance), **Embedded feature selection methods** offer a powerful approach by integrating feature selection directly into the model training process. Here's a step-by-step guide on how to use the Embedded method for selecting the most relevant features for your model:

#### 1. **Understand the Dataset and Features**:
   - **Target Variable**: The outcome of the soccer match, typically a classification problem (win, loss, draw).
   - **Feature Variables**:
     - **Player Statistics**: Number of goals, assists, tackles, pass accuracy, distance covered, etc.
     - **Team Rankings**: Current rank, average points per match, win/loss ratio.
     - **Historical Performance**: Performance in home/away matches, previous match outcomes.
     - **Other Factors**: Weather conditions, number of injuries, home advantage.

#### 2. **Preprocess the Data**:
   Before applying any feature selection technique, you need to preprocess the data:
   - **Handle Missing Data**: Impute or remove missing values.
   - **Encode Categorical Variables**: Convert categorical features such as player positions, match location (home/away) into numerical form using one-hot encoding or label encoding.
   - **Standardize/Normalize Features**: Apply normalization or standardization to features that have different scales (e.g., number of goals vs. team ranking).

#### 3. **Select the Embedded Method**:
   Choose a machine learning algorithm that has built-in feature selection capabilities or allows for **regularization**. The most common techniques used in Embedded methods are based on **regularization** or **tree-based models**.

   - **L1 Regularization (Lasso)**:
     Lasso adds an L1 penalty to the loss function, shrinking some feature coefficients to zero. Features with coefficients equal to zero are removed from the model, which makes it an automatic feature selection method.
   
   - **Tree-Based Models (e.g., Random Forest, Gradient Boosting)**:
     Decision trees and tree-based models like **Random Forest** and **Gradient Boosting** perform feature selection by selecting the most informative features at each split. Feature importance scores are calculated from how much each feature's splits reduce impurity (or the loss), averaged across the trees.

   - **Regularized Logistic Regression**:
     Logistic regression with **L1 (Lasso)** or **Elastic Net** regularization can help select the most relevant features for predicting match outcomes. It applies directly to a binary target (win/loss) and extends to the three-class win/draw/loss target through multinomial logistic regression.

#### 4. **Train the Model with Feature Selection**:
   Apply the chosen model to the dataset and let the model automatically select important features during the training process.

   ##### Example: Using L1-Regularized (Lasso-Style) Logistic Regression for Feature Selection
   ```python
   from sklearn.linear_model import LogisticRegression
   from sklearn.model_selection import train_test_split
   from sklearn.preprocessing import StandardScaler

   # Assume X is a DataFrame of numeric features (player stats, team rankings, etc.)
   # and y is the target (match outcome)
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

   # Standardize features so the L1 penalty treats them on a comparable scale
   scaler = StandardScaler()
   X_train_scaled = scaler.fit_transform(X_train)
   X_test_scaled = scaler.transform(X_test)

   # Apply Logistic Regression with L1 regularization (smaller C = stronger penalty)
   model = LogisticRegression(penalty='l1', solver='saga', C=1.0, max_iter=1000)
   model.fit(X_train_scaled, y_train)

   # coef_ has one row per class for a multiclass target (win/draw/loss);
   # keep a feature if its coefficient is non-zero for at least one class
   selected_features = X.columns[(model.coef_ != 0).any(axis=0)]
   print("Selected Features:", selected_features)
   ```

#### 5. **Evaluate Feature Importance (Tree-Based Models)**:
   If using a tree-based model like Random Forest or Gradient Boosting, you can directly extract feature importance scores.

   ##### Example: Using Random Forest for Feature Importance
   ```python
   import pandas as pd
   from sklearn.ensemble import RandomForestClassifier

   # Train a Random Forest Classifier (trees do not need scaled features)
   rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
   rf_model.fit(X_train, y_train)

   # Get impurity-based feature importance scores
   feature_importances = rf_model.feature_importances_

   # Rank features by importance
   feature_importance_df = pd.DataFrame({
       'Feature': X.columns,
       'Importance': feature_importances
   }).sort_values(by='Importance', ascending=False)

   print("Top Selected Features:")
   print(feature_importance_df.head(10))
   ```
#### 6. **Select the Most Important Features**:
   Based on the feature importance or regularization results, select the top features that are deemed most relevant by the model.
   - For L1-regularized models, select features with non-zero coefficients.
   - For tree-based models, rank features by their importance scores and select the top ones.

#### 7. **Refine and Iterate**:
   - **Cross-validate**: Use cross-validation to ensure that the selected features lead to consistent model performance.
   - **Hyperparameter Tuning**: Adjust the regularization strength (e.g., `alpha` in Lasso, or `C` in scikit-learn's `LogisticRegression`, where a smaller `C` means a stronger penalty) or, for tree-based models, the importance cutoff used to keep features, to fine-tune the feature selection process.
   - **Remove Less Important Features**: Based on the feature importance or regularization results, remove irrelevant or low-importance features to reduce the complexity of the model.

#### 8. **Advantages of Using the Embedded Method**:
   - **Efficient Feature Selection**: Feature selection happens during model training, which reduces computational cost compared to Wrapper methods.
   - **Model-Specific Feature Selection**: Embedded methods select features that are most relevant to the specific machine learning model, leading to better performance.
   - **Automatic Regularization**: Regularization techniques like Lasso or Elastic Net automatically shrink unimportant feature coefficients, simplifying the feature selection process.
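
The effect of tuning the regularization strength can be sketched by sweeping `C` and counting how many features survive. The three-class synthetic dataset stands in for win/draw/loss data, and the `C` grid is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data (assumption): 20 features, 5 informative
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=0)
X = StandardScaler().fit_transform(X)

kept_counts = {}
for C in [0.01, 0.1, 1.0]:          # smaller C = stronger L1 penalty
    model = LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=5000)
    model.fit(X, y)
    # A feature is kept if any class has a non-zero coefficient for it
    kept_counts[C] = int((model.coef_ != 0).any(axis=0).sum())

print("Features kept per C:", kept_counts)
```

In practice the value of `C` would be chosen by cross-validated performance, not by the feature count alone.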

### Q8. You are working on a project to predict the price of a house based on its features, such as size, location, and age. You have a limited number of features, and you want to ensure that you select the most important ones for the model. Explain how you would use the Wrapper method to select the best set of features for the predictor.

The **Wrapper method** is an iterative feature selection process that evaluates different subsets of features by training and testing a machine learning model on each subset. The goal is to identify the subset of features that leads to the best predictive performance for the target variable—in this case, **house price**.

Here’s a step-by-step explanation of how you would use the Wrapper method to select the best set of features for predicting house prices:

#### 1. **Understand the Dataset and Features**:
   - **Target Variable**: House price.
   - **Feature Variables**:
     - **Size**: Square footage or number of rooms.
     - **Location**: Proximity to the city center, neighborhood rating, or postal code.
     - **Age**: Age of the house.
     - **Other Features**: Number of bathrooms, presence of a garage, garden size, etc.

#### 2. **Preprocess the Data**:
   - **Handle Missing Data**: Impute or remove any missing values in the dataset.
   - **Encode Categorical Variables**: Convert categorical features like location or house type into numerical form using one-hot encoding or label encoding.
   - **Standardize/Normalize Features**: If necessary, scale features to ensure they are on comparable scales (especially for models like linear regression).

#### 3. **Select a Machine Learning Algorithm**:
   Since the Wrapper method requires repeatedly training and testing the model on different subsets of features, choose a model that is appropriate for the regression task. Common choices include:
   - **Linear Regression**: If the relationship between features and the target variable is linear.
   - **Decision Tree Regressor**: If the relationship is more complex and non-linear.
   - **Random Forest Regressor**: For more robust performance, Random Forest can capture non-linear relationships and is less sensitive to outliers.

#### 4. **Apply the Wrapper Method**:
   The most common wrapper methods are **Forward Selection**, **Backward Elimination**, and **Recursive Feature Elimination (RFE)**.

##### 4.1. **Forward Selection**:
   - Start with an empty set of features.
   - Iteratively add one feature at a time to the model, choosing the feature that improves the model’s performance the most.
   - Stop when adding any further features does not improve performance significantly.

   ##### Example: Forward Selection in Python
   ```python
   import numpy as np
   from sklearn.linear_model import LinearRegression
   from sklearn.model_selection import cross_val_score, train_test_split

   # Assuming 'data' is the house dataset (numerically encoded) and 'price' is the target variable
   X = data.drop(columns='price')
   y = data['price']
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

   selected_features = []
   remaining_features = list(X.columns)
   best_score = np.inf  # We aim to minimize the error, so start with a large value

   while remaining_features:
       scores = []
       for feature in remaining_features:
           # Try adding each feature and score it with 5-fold cross-validation
           # on the training data only, so the test set stays unseen
           trial_features = selected_features + [feature]
           cv_mse = -cross_val_score(
               LinearRegression(), X_train[trial_features], y_train,
               scoring='neg_mean_squared_error', cv=5
           ).mean()
           scores.append((cv_mse, feature))

       # Choose the feature that gives the lowest cross-validated MSE
       best_new_score, best_feature = min(scores)
       if best_new_score < best_score:
           selected_features.append(best_feature)
           remaining_features.remove(best_feature)
           best_score = best_new_score
       else:
           break  # No remaining feature improves the score

   print("Selected Features:", selected_features)
   ```

##### 4.2. **Backward Elimination**:
   - Start with all the features in the model.
   - Iteratively remove the least important feature (i.e., the feature whose removal causes the smallest decrease in model performance).
   - Stop when removing further features decreases performance significantly.

   ##### Example: Backward Elimination
   The variant below uses OLS p-values as the removal criterion instead of refitting and scoring every candidate subset.
   ```python
   import statsmodels.api as sm

   # Add an intercept column (statsmodels' OLS does not add one automatically)
   X_be = sm.add_constant(X_train)
   model = sm.OLS(y_train, X_be).fit()

   # Perform backward elimination on the features; the intercept is always kept
   while len(X_be.columns) > 1:
       p_values = model.pvalues.drop('const')
       if p_values.max() > 0.05:  # Remove the feature with the highest p-value above 0.05
           worst_feature = p_values.idxmax()
           X_be = X_be.drop(columns=[worst_feature])
           model = sm.OLS(y_train, X_be).fit()
       else:
           break

   print("Selected Features:", list(X_be.columns.drop('const')))
   ```

#### 5. **Evaluate Model Performance**:
   After selecting a subset of features, evaluate the performance of your model on the test set:
   - **Mean Squared Error (MSE)**: For regression tasks, use MSE to measure the difference between predicted and actual house prices.
   - **Cross-Validation**: Use cross-validation to ensure the selected features lead to consistent performance across different data splits.

#### 6. **Advantages of the Wrapper Method**:
   - **Model-Specific Feature Selection**: Wrapper methods are model-specific, meaning the features are selected based on their impact on the actual prediction task, leading to potentially higher accuracy.
   - **Works Well with Small Feature Sets**: Since you have a limited number of features, the Wrapper method is feasible because the computational cost will not be prohibitively high.
   - **Optimal Feature Subset**: Wrapper methods aim to find the feature subset that provides the best performance for the specific model being used.

#### 7. **Drawbacks of the Wrapper Method**:
   - **Computationally Expensive**: Since the model is trained and tested multiple times for different subsets of features, Wrapper methods can be computationally expensive, especially for large datasets with many features.
   - **Risk of Overfitting**: Wrapper methods optimize the feature subset against the model's measured performance, which can lead to overfitting to the data used for selection if the dataset is not large enough.
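
Recursive Feature Elimination, the third wrapper variant named in step 4, can be sketched with scikit-learn's `RFECV`, which also picks the number of features by cross-validation. The synthetic housing table, its column names, and the price formula are hypothetical; features are standardized first because RFE ranks a linear model's features by coefficient size.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical housing data: three real drivers of price plus two noise columns
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "size_sqft": rng.normal(1500, 300, n),
    "age_years": rng.uniform(0, 50, n),
    "distance_km": rng.uniform(1, 30, n),
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
})
y = (100 * X["size_sqft"] - 500 * X["age_years"]
     - 1000 * X["distance_km"] + rng.normal(0, 5000, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale using training statistics only, then reuse them for the test set
scaler = StandardScaler().fit(X_train)
X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
X_test_s = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

# Selection uses cross-validation on the training data; the test set stays untouched
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="neg_mean_squared_error")
selector.fit(X_train_s, y_train)
selected = list(X.columns[selector.support_])

# Final check on held-out data, using only the selected features
test_mse = mean_squared_error(y_test, selector.predict(X_test_s))
print("Selected features:", selected)
print("Test MSE:", round(test_mse, 1))
```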