In [None]:
#18 March Assignment Solution

### Ans 1:
The Filter method in feature selection is a technique used in machine learning to select the most relevant features from a dataset before training a model. This method relies on the statistical properties of the features themselves, independent of any machine learning algorithm. Here's a detailed explanation of how the Filter method works:

### Key Concepts and Steps

1. **Independent Evaluation**:
   - The Filter method evaluates each feature individually based on certain statistical criteria.
   - It does not involve any machine learning algorithm during the feature selection process, making it computationally efficient.

2. **Criteria for Evaluation**:
   - **Correlation Coefficient**: Measures the strength of the relationship between each feature and the target variable. Pearson captures linear relationships, while Spearman and Kendall tau are rank-based and capture monotonic relationships.
   - **Mutual Information**: Quantifies the amount of information obtained about one variable through another variable, capturing non-linear relationships as well.
   - **Chi-Square Test**: Assesses whether there is a significant association between categorical features and the target variable.
   - **Variance Threshold**: Eliminates features with low variance, assuming they have little predictive power.
   - **ANOVA (Analysis of Variance)**: Compares the means of different groups and is typically used for numeric features against categorical target variables.

3. **Process**:
   - **Compute Scores**: Each feature is scored based on the chosen statistical measure.
   - **Rank Features**: Features are ranked according to their scores.
   - **Select Top Features**: A threshold is set (either a fixed number of features or a score cutoff), and only the top-ranking features are selected for model training.
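
The score, rank, and select steps above can be sketched with scikit-learn's `SelectKBest`. This is a minimal illustration, assuming the bundled iris dataset, mutual information as the scoring criterion, and `k=2`; none of these choices come from the original answer.

```python
from functools import partial

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Small built-in classification dataset, used only for illustration
data = load_iris(as_frame=True)
X, y = data.data, data.target

# Compute scores: mutual information between each feature and the target.
# random_state makes the estimate reproducible.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=2).fit(X, y)

# Rank features by score, highest first
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores)

# Select the top k features
selected = X.columns[selector.get_support()].tolist()
X_selected = X[selected]
print(selected)
```

Swapping `score_func` for `f_classif` or `chi2` changes the criterion without changing the rest of the workflow.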

### Advantages

- **Simplicity and Speed**: Filter methods are simple to implement and computationally efficient, making them suitable for high-dimensional datasets.
- **Independence**: They are independent of any machine learning algorithms, ensuring that the feature selection process is not biased by a specific model's performance.
- **Interpretability**: The criteria used (e.g., correlation, mutual information) are straightforward and interpretable.

### Disadvantages

- **Ignore Interactions**: Since each feature is evaluated independently, interactions between features are not considered.
- **Potentially Suboptimal**: The selected features might not be the best combination for a particular model, as the method does not take model performance into account.

### Example Workflow

1. **Calculate Correlation**:
   ```python
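   import pandas as pd
   from sklearn.datasets import load_diabetes

   # Load dataset (load_boston was removed in scikit-learn 1.2;
   # load_diabetes is another bundled regression dataset)
   data = load_diabetes()
   X = pd.DataFrame(data.data, columns=data.feature_names)
   y = pd.Series(data.target)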

   # Calculate Pearson correlation
   correlations = X.apply(lambda x: x.corr(y))
   ```

2. **Rank Features**:
   ```python
   ranked_features = correlations.abs().sort_values(ascending=False)
   ```

3. **Select Top Features**:
   ```python
   top_features = ranked_features.head(5).index.tolist()
   X_selected = X[top_features]
   ```

### Common Use Cases

- **Preprocessing**: As an initial step in the data preprocessing pipeline to reduce dimensionality.
- **Exploratory Data Analysis (EDA)**: To identify and visualize the most significant features related to the target variable.
- **Baseline Model**: As a baseline feature selection method before applying more complex techniques like Wrapper or Embedded methods.

In summary, the Filter method is a fast and efficient way to perform feature selection based on statistical measures. While it may not always yield the best subset of features for a particular model, it is a valuable tool for initial feature reduction and gaining insights into the data.

### Ans 2:
### Key Differences

- **Dependency on Model**: The Wrapper method depends on a specific machine learning model for feature selection, while the Filter method is model-agnostic.
- **Computation**: The Wrapper method is computationally more intensive due to repeated model training and evaluation, whereas the Filter method is faster and more efficient.
- **Feature Interactions**: The Wrapper method can account for interactions between features, while the Filter method evaluates each feature independently.
- **Outcome**: The Wrapper method typically provides a feature subset that is optimized for the specific model and task, whereas the Filter method provides a generally good subset of features based on statistical relevance.
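
As a rough illustration of these differences, the sketch below selects three features from the bundled wine dataset once with a Filter (`SelectKBest`) and once with a Wrapper (`SequentialFeatureSelector`). The dataset, the logistic regression model, and `k = 3` are assumptions made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)
k = 3  # illustrative subset size

# Filter: a single pass of ANOVA F-scores; no model is trained
filter_sel = SelectKBest(f_classif, k=k).fit(X, y)
filter_features = X.columns[filter_sel.get_support()].tolist()

# Wrapper: forward selection. Every candidate feature at every step is
# evaluated by cross-validating the model: (13 + 12 + 11) candidates x 3 folds
# = 108 model fits for this small example.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
wrapper_sel = SequentialFeatureSelector(
    model, n_features_to_select=k, direction="forward", cv=3
).fit(X, y)
wrapper_features = X.columns[wrapper_sel.get_support()].tolist()

print("Filter :", filter_features)
print("Wrapper:", wrapper_features)
```

The two subsets need not agree: the Filter ranks features one at a time, while the Wrapper picks each feature for what it adds to those already chosen.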

### Ans 3:

Embedded feature selection methods integrate feature selection into model training. They are often more efficient than Wrapper methods because the learning algorithm itself determines which features matter while it is being fit, rather than requiring a separate search over feature subsets. Here are some common techniques used in Embedded feature selection methods:

### 1. Regularization Methods
Regularization techniques add a penalty to the model for having large coefficients, which can effectively shrink some coefficients to zero, thus performing feature selection.

- **Lasso (L1 Regularization)**: The Lasso (Least Absolute Shrinkage and Selection Operator) adds a penalty proportional to the sum of the absolute values of the coefficients. This can force some coefficients to be exactly zero, effectively performing feature selection.
  ```python
  from sklearn.linear_model import Lasso

  model = Lasso(alpha=0.1)  # alpha is the regularization strength
  model.fit(X, y)
  selected_features = X.columns[model.coef_ != 0]
  ```

- **Ridge (L2 Regularization)**: Ridge regression adds a penalty proportional to the sum of the squared coefficients. It shrinks coefficients toward zero but does not set them exactly to zero, so it does not perform feature selection on its own. It is combined with the L1 penalty in the Elastic Net method.
  
- **Elastic Net**: This method combines both L1 and L2 regularization, providing a balance between the two.
  ```python
  from sklearn.linear_model import ElasticNet

  model = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio balances L1 and L2 regularization
  model.fit(X, y)
  selected_features = X.columns[model.coef_ != 0]
  ```
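
To make the contrast between the L1 and L2 penalties concrete, the following sketch counts non-zero coefficients after fitting Lasso and Ridge on the same data. The synthetic dataset and `alpha=1.0` are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which influence the target
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count the features each model keeps (non-zero coefficients)
n_lasso = int(np.sum(lasso.coef_ != 0))
n_ridge = int(np.sum(ridge.coef_ != 0))

print(f"Lasso keeps {n_lasso} of {X.shape[1]} features")
print(f"Ridge keeps {n_ridge} of {X.shape[1]} features")
```

On real data the features should usually be standardized first, because both penalties depend on the scale of the coefficients.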

### 2. Decision Tree-Based Methods
Decision tree algorithms and their ensembles (like Random Forests and Gradient Boosting) inherently perform feature selection by evaluating the importance of each feature in making splits in the tree.

- **Decision Trees**: Features whose splits produce larger total reductions in impurity are considered more important.
  ```python
  from sklearn.tree import DecisionTreeClassifier

  # Assumes y holds class labels
  threshold = 0.05  # illustrative importance cutoff
  model = DecisionTreeClassifier()
  model.fit(X, y)
  feature_importances = model.feature_importances_
  selected_features = X.columns[feature_importances > threshold]
  ```

- **Random Forests**: An ensemble of decision trees that averages the feature importance scores across all trees.
  ```python
  from sklearn.ensemble import RandomForestClassifier

  threshold = 0.05  # illustrative importance cutoff
  model = RandomForestClassifier()
  model.fit(X, y)
  feature_importances = model.feature_importances_
  selected_features = X.columns[feature_importances > threshold]
  ```

- **Gradient Boosting Machines (GBM)**: Similar to Random Forests, but builds trees sequentially, with each tree trying to correct the errors of the previous one.
  ```python
  from sklearn.ensemble import GradientBoostingClassifier

  threshold = 0.05  # illustrative importance cutoff
  model = GradientBoostingClassifier()
  model.fit(X, y)
  feature_importances = model.feature_importances_
  selected_features = X.columns[feature_importances > threshold]
  ```
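
Rather than picking an importance cutoff by hand, scikit-learn's `SelectFromModel` can derive one from the fitted importances. The sketch below is one possible setup, assuming synthetic data and the `"median"` threshold; it is not the only reasonable choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic classification data: 10 features, 3 of them informative
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Keep features whose importance is at least the median importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
selector.fit(X, y)

importances = selector.estimator_.feature_importances_
mask = selector.get_support()        # boolean mask of kept features
X_selected = selector.transform(X)   # data restricted to kept features

print(importances.round(3))
print("Kept feature indices:", mask.nonzero()[0].tolist())
```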

### 3. Regularized Linear Models with Embedded Feature Selection
Some linear models have built-in feature selection capabilities due to regularization techniques:

- **Logistic Regression with L1 Regularization**: Similar to Lasso, but for classification problems.
  ```python
  from sklearn.linear_model import LogisticRegression

  model = LogisticRegression(penalty='l1', solver='liblinear')
  model.fit(X, y)
  selected_features = X.columns[model.coef_[0] != 0]
  ```

### 4. Feature Importance from Other Models
Other models can also provide feature importance scores as part of their output:

- **Support Vector Machines (SVM) with L1 Penalty**: Like Lasso, can perform feature selection by shrinking some weights to zero.
  ```python
  from sklearn.svm import LinearSVC

  model = LinearSVC(penalty='l1', dual=False)
  model.fit(X, y)
  selected_features = X.columns[model.coef_[0] != 0]
  ```

### 5. Tree-Based Feature Selection Methods
Some specific methods are designed to interpret the results of tree-based models to select features:

- **Boruta**: A wrapper method around Random Forests designed to find all relevant features, not just the top features.
  ```python
  from boruta import BorutaPy
  from sklearn.ensemble import RandomForestClassifier

  model = RandomForestClassifier()
  boruta = BorutaPy(model, n_estimators='auto', random_state=42)
  boruta.fit(X.values, y.values)
  selected_features = X.columns[boruta.support_]
  ```

### Summary
Embedded feature selection methods leverage the model's training process to perform feature selection, often resulting in more efficient and effective identification of relevant features. Common techniques include regularization methods like Lasso and Elastic Net, decision tree-based methods, and models that inherently provide feature importance scores. These methods can often outperform simple Filter methods by considering feature interactions and the specific needs of the predictive model.

### Ans 4:
### Drawbacks of the Filter Method

- **Ignore Interactions**: Since each feature is evaluated independently, interactions between features are not considered.
- **Potentially Suboptimal**: The selected features might not be the best combination for a particular model, as the method does not take model performance into account.
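
A small constructed example shows how ignoring interactions can mislead a Filter. In the XOR-style data below (hypothetical columns `x1`, `x2`, `y`), each feature alone has zero Pearson correlation with the target, yet the two together determine it completely.

```python
import pandas as pd

# Balanced XOR data: y is 1 exactly when x1 and x2 differ
df = pd.DataFrame({"x1": [0, 0, 1, 1] * 25, "x2": [0, 1, 0, 1] * 25})
df["y"] = df["x1"] ^ df["x2"]

# Univariate view: correlation of each feature with the target
corr = df[["x1", "x2"]].corrwith(df["y"])
print(corr)

# Joint view: every (x1, x2) combination maps to a single value of y
values_per_combo = df.groupby(["x1", "x2"])["y"].nunique()
print(values_per_combo)
```

A correlation-based Filter would rank both features as useless here, while a model that sees them jointly could predict `y` exactly.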

### Ans 5:
There are several situations where you might prefer using the Filter method over the Wrapper method for feature selection. Here are some key scenarios:

### 1. **Large Datasets with High Dimensionality**
- **Efficiency**: The Filter method is computationally efficient because it evaluates features independently of the model training process. This makes it suitable for datasets with a large number of features and instances where running multiple iterations of model training (as required by Wrapper methods) would be computationally expensive and time-consuming.
- **Scalability**: For high-dimensional datasets, the Filter method can quickly reduce the feature space, making subsequent modeling steps more manageable.

### 2. **Preprocessing and Initial Feature Reduction**
- **Initial Screening**: When you need to perform an initial reduction of the feature space before applying more sophisticated and computationally expensive methods, the Filter method serves as a good first step. It can help eliminate irrelevant or redundant features early in the process.
- **Baseline Feature Selection**: Filter methods can provide a baseline feature set that can be refined using other methods later. This can be particularly useful in the early stages of exploratory data analysis (EDA).
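
The "screen first, refine later" idea can be sketched as a two-stage pipeline: a cheap Filter narrows the feature space, and a costlier Wrapper refines what remains. The synthetic data, the step names, and the sizes (200 features down to 20, then to 5) are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# High-dimensional synthetic data: 200 features, 10 informative
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, random_state=0
)

pipe = Pipeline([
    # Stage 1 (Filter): keep the 20 features with the highest ANOVA F-score
    ("screen", SelectKBest(f_classif, k=20)),
    # Stage 2 (Wrapper): recursive feature elimination on the survivors only
    ("refine", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
])

X_reduced = pipe.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)
```

Because RFE only sees 20 features instead of 200, it refits the model far fewer times than it would on the full feature set.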

### 3. **Independence from Learning Algorithms**
- **Algorithm-Agnostic**: The Filter method is independent of any specific learning algorithm. If you are unsure about which model to use or want to keep your feature selection process model-agnostic, the Filter method is a good choice.
- **Consistency Across Models**: Since Filter methods are not tied to a particular model, the selected features are generally useful across different models, providing a versatile feature set.

### 4. **Avoiding Overfitting**
- **Less Risk of Overfitting**: Filter methods rely on statistical properties of the data rather than model performance. This reduces the risk of overfitting that might occur with Wrapper methods, especially in situations with small datasets or when cross-validation is not feasible.

### 5. **Simplicity and Interpretability**
- **Ease of Interpretation**: Filter methods are often easier to understand and interpret because they use straightforward statistical measures (e.g., correlation, mutual information). This can be beneficial when you need to explain the feature selection process to stakeholders or when interpretability is crucial.
- **Transparency**: The criteria used by Filter methods (such as variance, correlation, or chi-square tests) are transparent and easy to justify.

### 6. **Computational Constraints**
- **Limited Resources**: If computational resources are limited, the Filter method is preferable due to its lower computational demands. Wrapper methods require multiple iterations of model training and validation, which can be resource-intensive.

### Example Scenarios

- **Text Mining and Natural Language Processing (NLP)**: When working with text data, you often deal with thousands or even millions of features (e.g., words or n-grams). The Filter method can quickly identify and remove irrelevant or low-frequency terms.
- **Genomic Data Analysis**: In bioinformatics, genomic datasets can have tens of thousands of features (genes). Using Filter methods to reduce dimensionality before applying more complex models is a common practice.
- **Real-Time Systems**: In applications where real-time processing is required, such as online recommendation systems or fraud detection, the Filter method's speed and efficiency are advantageous.

### Summary
While Wrapper methods can provide more tailored feature sets optimized for specific models, they are computationally expensive and risk overfitting. The Filter method is a simpler, faster, and more interpretable approach suitable for initial feature reduction, high-dimensional datasets, and situations with computational constraints. By leveraging statistical properties of the data, Filter methods provide a robust baseline feature selection that can be refined further if necessary.

### Ans 6:
To select the most pertinent attributes for a predictive model using the Filter Method, follow these steps:

### Step-by-Step Process for Feature Selection Using Filter Method:

1. **Understand the Dataset**:
   - Begin with a thorough understanding of the dataset and the business context.
   - Identify the target variable (churn) and the feature set (all other attributes).

2. **Preprocess the Data**:
   - Handle missing values through imputation or removal.
   - Convert categorical variables to numerical formats using encoding techniques such as One-Hot Encoding or Label Encoding.
   - Normalize or standardize the features if necessary.

3. **Univariate Statistical Tests**:
   - Apply statistical tests to score the relationship between each feature and the target variable.
   - For continuous features, use statistical tests such as Pearson’s correlation coefficient.
   - For categorical features, use tests like the Chi-Squared test.

4. **Correlation Analysis**:
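   - Calculate the correlation matrix to identify features that are highly correlated with the target variable.
   - For continuous features, use Pearson’s correlation.
   - For categorical features, use measures suited to categories, such as the Chi-Squared statistic or mutual information. ANOVA applies in the other direction, when a continuous feature is compared across the classes of a categorical target.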

5. **Feature Ranking and Selection**:
   - Rank the features based on the results from the statistical tests and correlation analysis.
   - Select the top-ranked features based on a threshold or a predefined number of features.

6. **Evaluate Multicollinearity**:
   - Check for multicollinearity among the selected features using Variance Inflation Factor (VIF).
   - Remove features with high multicollinearity (high VIF values).

7. **Iterative Validation**:
   - Validate the selected features by training a baseline model and evaluating its performance.
   - Iteratively refine the feature set based on model performance and insights.
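
The steps above might look as follows in code. This is a sketch on synthetic data: the column names (`tenure_months`, `monthly_charges`, `contract`, `total_charges`) and the way churn is simulated are hypothetical stand-ins, not the actual project dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2, f_classif
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a churn table; all column names are hypothetical
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, n),
    "monthly_charges": rng.normal(70, 20, n),
    "contract": rng.choice(["monthly", "yearly"], n),
})
df["total_charges"] = df["tenure_months"] * df["monthly_charges"]
logit = -0.05 * (df["tenure_months"] - 24) - 1.0 * (df["contract"] == "yearly")
df["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

y = df["churn"]
num_cols = ["tenure_months", "monthly_charges", "total_charges"]

# Univariate tests: ANOVA F-test for numeric features,
# chi-square for one-hot encoded categorical features
_, f_pvalues = f_classif(df[num_cols], y)
cat = pd.get_dummies(df[["contract"]], drop_first=True).astype(int)
_, chi_pvalues = chi2(cat, y)

# Ranking: smaller p-value means stronger evidence of association
ranking = pd.Series(
    np.concatenate([f_pvalues, chi_pvalues]),
    index=num_cols + list(cat.columns),
).sort_values()
print(ranking)


def vif(frame):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest."""
    out = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)


# Multicollinearity check among the numeric candidates
vif_scores = vif(df[num_cols])
print(vif_scores)
```

Here `total_charges` is built from the other two numeric columns, so it is a natural candidate for removal if its VIF turns out to be high.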


### Ans 7:
The Embedded method for feature selection integrates the feature selection process within the model training process. It uses algorithms that have built-in mechanisms to identify the most relevant features during the model fitting process. Here’s how you can use the Embedded method to select the most relevant features for predicting the outcome of a soccer match:

Feature Selection Using the Embedded Method

### Example Implementation in Python

Below is an example of using the Embedded method with a Random Forest classifier to select the most relevant features for predicting the outcome of a soccer match.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Load dataset
df = pd.read_csv('soccer_matches.csv')

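# Preprocess data
# Handle missing values in numeric columns
# (numeric_only avoids errors when the frame also has text columns)
df.fillna(df.mean(numeric_only=True), inplace=True)

# Define features and target variable before encoding,
# so that the target column itself is not one-hot encoded
y = df['match_outcome']

# Encode categorical feature columns
X = pd.get_dummies(df.drop(columns=['match_outcome']), drop_first=True)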

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Get feature importances
feature_importances = model.feature_importances_

# Create a DataFrame for feature importances
features_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': feature_importances
})

# Sort features by importance
features_df = features_df.sort_values(by='Importance', ascending=False)

# Select top features
top_features = features_df['Feature'].head(10)

# Create a new dataset with only the top features
X_train_selected = X_train[top_features]
X_test_selected = X_test[top_features]

# Train the model again with selected features
model.fit(X_train_selected, y_train)

# Predict and evaluate the model
y_pred = model.predict(X_test_selected)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy with selected features: {accuracy:.2f}')
print('Selected Features:')
print(features_df.head(10))
```

### Explanation:

1. **Data Preprocessing**:
   - Missing values are handled by filling with the mean.
   - Categorical variables are encoded using One-Hot Encoding.

2. **Model Training**:
   - A Random Forest classifier is trained on the full feature set.
   - Feature importances are extracted from the trained model.

3. **Feature Selection**:
   - Features are ranked by their importance scores.
   - The top features are selected based on their importance scores.

4. **Model Evaluation**:
   - The model is re-trained using only the selected top features.
   - The model's performance is evaluated on the test set using accuracy as the metric.

This approach retains the features that the Random Forest ranks highest by impurity-based importance, using scores the model computes during training. The same pattern applies to other models, such as Lasso regression for linear models or Gradient Boosting Machines for tree-based models.
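
One possible refinement is to place the selection step inside a pipeline, so that feature importances are recomputed on each training fold during cross-validation and never see the held-out fold. The sketch below assumes synthetic data in place of `soccer_matches.csv` and keeps the top 10 features as in the example above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the match data: 30 features, 5 informative
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=5, random_state=0
)

# threshold=-np.inf with max_features=10 keeps exactly the 10 most important features
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    max_features=10,
    threshold=-np.inf,
)
pipe = make_pipeline(selector, RandomForestClassifier(n_estimators=50, random_state=0))

# Selection and training are repeated inside every fold
scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f}")

# Fit once on all data to inspect which features were kept
pipe.fit(X, y)
n_selected = int(pipe.named_steps["selectfrommodel"].get_support().sum())
print("Features kept:", n_selected)
```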

In [None]:
### Ans 8:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

# Load dataset
df = pd.read_csv('house_prices.csv')

# Preprocess data
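# Handle missing values in numeric columns
# (numeric_only avoids errors when the frame also has text columns)
df.fillna(df.mean(numeric_only=True), inplace=True)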

# Encode categorical variables
df = pd.get_dummies(df, drop_first=True)

# Define features and target variable
X = df.drop(columns=['price'])
y = df['price']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
model = LinearRegression()

# Apply Recursive Feature Elimination (RFE)
# Select the number of features to retain (e.g., 5)
rfe = RFE(model, n_features_to_select=5)
rfe.fit(X_train, y_train)

# Get the selected features
selected_features = X_train.columns[rfe.support_]

# Train the model with the selected features
X_train_selected = X_train[selected_features]
X_test_selected = X_test[selected_features]

model.fit(X_train_selected, y_train)

# Predict and evaluate the model
y_pred = model.predict(X_test_selected)
r2_score = model.score(X_test_selected, y_test)

print(f'R^2 Score with selected features: {r2_score:.2f}')
print('Selected Features:')
print(selected_features)
