## Forward Selection

 * Forward Selection starts with an empty model and adds features one at a time, choosing in each iteration the feature that most improves model performance (e.g., lowest AIC or highest R²).

 * The method is greedy and deterministic: once a feature is added it is never removed, and the same data always yields the same result. It is useful when we want to quickly build a performant model using only the most significant features.

 **How it Works:**

   1. Start with an empty model.

   2. For each feature not already in the model:

      * Train the model by adding that feature.

      * Evaluate performance using a metric (e.g., accuracy, R², AIC).

   3. Add the feature that improves performance the most.

   4. Repeat steps 2–3 until:

      * No significant performance gain is observed, or
      * A pre-defined number of features is reached.
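
As a hedged illustration of the second stopping rule (a pre-defined number of features), the sketch below uses scikit-learn's `SequentialFeatureSelector` with `direction="forward"`. The synthetic data from `make_regression`, the choice of three features, and the names `X_demo`, `y_demo`, and `selector` are assumptions made for this example. Unlike the AIC-based loop in this section, it ranks candidates by cross-validated R².

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): 10 features, 3 of them informative.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Forward selection that stops once a pre-defined number of features is reached.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,
    direction="forward",
    scoring="r2",
    cv=5,
)
selector.fit(X_demo, y_demo)

# get_support() returns a boolean mask over the input columns.
selected_idx = np.flatnonzero(selector.get_support())
print("Selected column indices:", selected_idx)
```

Scoring on held-out folds rather than on the training data is one way to keep the "improves performance" test from rewarding features that only fit noise.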


 **When to Use:**

  * When model simplicity and interpretability are important.

  * When the dataset has many irrelevant features.
  * When computational power is limited and an exhaustive search over all feature subsets is not feasible.


 **Advantages:**

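  * Computationally cheap when the number of candidate features is small, since each round fits only one model per remaining feature.

  * Can reduce overfitting by keeping only the most informative features, provided the selection metric penalizes complexity (e.g., AIC) or is measured on held-out data.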

 **Limitations:**

  * Can miss combinations of features that only work well together.

  * May be time-consuming for high-dimensional datasets, because every round refits one model per remaining feature.
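
The first limitation can be made concrete with a small synthetic case. This is a minimal sketch under stated assumptions: the features `x1` and `x2`, the target `y_demo`, and the helper `r_squared` are all hypothetical and built only for this illustration. The two features are nearly identical and the target is their difference, so each feature alone explains almost nothing while the pair explains the target exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two nearly identical features; the target is their difference.
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
y_demo = x1 - x2


def r_squared(features, target):
    """R² of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(target))] + list(features))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residuals = target - A @ coef
    return 1.0 - residuals.var() / target.var()


r2_x1 = r_squared([x1], y_demo)
r2_x2 = r_squared([x2], y_demo)
r2_both = r_squared([x1, x2], y_demo)

print(f"R² with x1 only:   {r2_x1:.3f}")
print(f"R² with x2 only:   {r2_x2:.3f}")
print(f"R² with x1 and x2: {r2_both:.3f}")
```

Because forward selection judges candidates one at a time, both features look close to useless in the first round. With a strict stopping threshold, or with other moderately useful features competing for the first slot, a greedy search may never reach the pair.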



In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing

# Load California Housing dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="MedHouseValue")

def forward_selection(X, y):
    """Greedy forward selection that minimizes the AIC of an OLS model."""
    selected_features = []
    remaining_features = list(X.columns)
    # Baseline: AIC of the empty (intercept-only) model
    best_score = sm.OLS(y, np.ones(len(y))).fit().aic

    while remaining_features:
        # Fit one model per candidate feature and record its AIC
        scores_with_candidates = []
        for candidate in remaining_features:
            X_model = sm.add_constant(X[selected_features + [candidate]])
            aic = sm.OLS(y, X_model).fit().aic
            scores_with_candidates.append((aic, candidate))

        # Lower AIC is better
        best_new_score, best_candidate = min(scores_with_candidates)

        # Stop as soon as no candidate lowers the AIC
        if best_new_score >= best_score:
            break
        remaining_features.remove(best_candidate)
        selected_features.append(best_candidate)
        best_score = best_new_score

    return selected_features

selected_forward = forward_selection(X, y)
print("Selected Features using Forward Selection:\n", selected_forward)


## Backward Elimination


 * Backward Elimination starts with all features included and removes the least significant one at each step, based on statistical metrics or model performance.

**How it Works:**

  1. Begin with all available features in the model.

  2. Train the model and evaluate the importance or significance of each feature.

  3. Remove the least significant feature (e.g., highest p-value or least effect on performance).
 
  4. Repeat steps 2–3 until:
     * All remaining features are statistically significant, or
     * Removing more features worsens the model.
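
Step 3 also allows removing the feature with the "least effect on performance" rather than the highest p-value. A minimal sketch of that variant follows, assuming cross-validated R² as the performance measure. The function name `backward_elimination_cv`, the synthetic data, and the tie-breaking rule (a removal is accepted when the score does not drop) are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def backward_elimination_cv(X, y, cv=5):
    """Drop features one at a time while cross-validated R² does not get worse."""
    features = list(range(X.shape[1]))
    best_score = cross_val_score(LinearRegression(), X[:, features], y, cv=cv).mean()

    while len(features) > 1:
        # Score every model obtained by removing exactly one remaining feature.
        trials = []
        for f in features:
            reduced = [g for g in features if g != f]
            score = cross_val_score(LinearRegression(), X[:, reduced], y, cv=cv).mean()
            trials.append((score, f))

        # The best trial identifies the feature whose removal hurts the least.
        score, least_useful = max(trials)
        if score < best_score:
            break  # every possible removal worsens the model
        features.remove(least_useful)
        best_score = score

    return features


# Synthetic data (assumption for illustration): 8 features, 3 of them informative.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=5.0, random_state=0
)
kept = backward_elimination_cv(X_demo, y_demo)
print("Remaining column indices:", kept)
```

This version implements the second stopping rule directly: the loop ends as soon as every candidate removal lowers the held-out score.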

**When to Use:**

 * When all features are assumed to be relevant initially.
 
 * When the dataset is moderate in size, since every iteration refits the model on all remaining features.

 * When the goal is to find the most influential subset of features.

**Advantages:**

  * Considers the full model context initially.

  * Evaluates each feature in the presence of all the others, so features that are only useful in combination are less likely to be missed than in forward selection.

**Limitations:**

  * Computationally expensive with large feature sets.

  * Risk of retaining redundant features early on.
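
To put the computational cost in perspective, the following back-of-the-envelope sketch counts model fits for a performance-based stepwise search, where every round tries each remaining feature, and compares that with an exhaustive search over all subsets. The helper names are hypothetical, and the counts are upper bounds that ignore early stopping.

```python
def greedy_fit_count(p):
    """Upper bound on fits for a stepwise search: p + (p - 1) + ... + 1."""
    return p * (p + 1) // 2


def exhaustive_fit_count(p):
    """Number of non-empty feature subsets an exhaustive search would fit."""
    return 2 ** p - 1


for p in (8, 20, 50):
    print(f"p={p}: stepwise <= {greedy_fit_count(p)}, exhaustive = {exhaustive_fit_count(p)}")
```

For the 8 features of the California Housing data this gives at most 36 fits versus 255. The stepwise count grows quadratically with the number of features, and in backward elimination each of those fits starts from a nearly full model, which is why the cost is felt most with large feature sets.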
     
    

In [None]:
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing

# Load California Housing dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="MedHouseValue")


def backward_elimination(X, y, significance_level=0.05):
    """Drop the least significant feature until every p-value is at or below the threshold."""
    features = X.columns.tolist()
    while features:
        # Refit OLS on the features that are still in the model
        X_model = sm.add_constant(X[features])
        model = sm.OLS(y, X_model).fit()
        p_values = model.pvalues.drop("const")  # exclude intercept
        # Remove the feature with the largest p-value if it exceeds the threshold
        if p_values.max() > significance_level:
            features.remove(p_values.idxmax())
        else:
            break
    return features

selected_backward = backward_elimination(X, y)
print("Selected Features using Backward Elimination:\n", selected_backward)
