**Q1. What is the Filter method in feature selection, and how does it work?**

**ANSWER:---**

The Filter method in feature selection is a technique used to select important features (variables) in a dataset based on certain statistical criteria, without involving any machine learning algorithms. The main goal of this method is to remove irrelevant or less important features before the modeling process, thereby improving the performance and reducing the complexity of the model.

### How the Filter Method Works:

1. **Statistical Criteria**: The filter method relies on statistical measures to evaluate the importance of each feature with respect to the target variable. Common statistical criteria include correlation coefficients, chi-square tests, mutual information, variance thresholds, and ANOVA F-tests.

2. **Independence from Models**: Unlike wrapper and embedded methods, filter methods do not involve training models. They are independent of any machine learning algorithm, which makes them computationally efficient and faster.

3. **Ranking and Selection**: Features are ranked by their scores from the statistical tests. A threshold, set manually or chosen automatically, then determines which top-ranked features are kept. For example, a feature whose correlation with the target variable is large in absolute value is a strong candidate for selection.

### Common Statistical Measures Used in Filter Methods:

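- **Correlation Coefficient (Pearson, Spearman)**: Measures the linear (Pearson) or monotonic, rank-based (Spearman) relationship between a feature and the target variable. Features with a high absolute correlation are selected, since a strong negative correlation is as informative as a strong positive one.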
  
- **Chi-Square Test**: Used for categorical features to assess the independence between a feature and the target variable. Features with low p-values are considered important.

- **Mutual Information**: Measures the amount of information shared between a feature and the target variable. Higher mutual information indicates higher relevance.

- **Variance Threshold**: Removes features with low variance, assuming they have little information to contribute to the model.

- **ANOVA F-test**: Used for numerical features with a categorical target. It compares the mean of the feature across the target classes; a large F-statistic (small p-value) means the class means differ, which suggests the feature is relevant.
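
The measures listed above can be computed in a few lines. The following is a minimal sketch using scikit-learn on synthetic data; the column layout (one informative feature, one noise feature, one constant feature) and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import (
    VarianceThreshold, chi2, f_classif, mutual_info_classif
)

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                    # binary target (synthetic)

informative = y + rng.normal(scale=0.5, size=n)   # numeric, shifts with the class
noise = rng.normal(size=n)                        # numeric, unrelated to y
constant = np.ones(n)                             # zero variance
X = np.column_stack([informative, noise, constant])

# Variance threshold: unsupervised, flags the constant column for removal.
vt = VarianceThreshold(threshold=0.0).fit(X)
kept = vt.get_support()

# ANOVA F-test: numeric features against a categorical target.
f_scores, f_pvalues = f_classif(X[:, :2], y)

# Mutual information: also sensitive to non-linear dependence.
mi = mutual_info_classif(X[:, :2], y, random_state=0)

# Chi-square: requires non-negative features such as counts.
counts = np.column_stack([rng.poisson(1 + 4 * y), rng.poisson(3, size=n)])
chi_scores, chi_pvalues = chi2(counts, y)

print("kept by variance threshold:", kept)
print("F-scores:", f_scores)
print("mutual information:", mi)
print("chi-square scores:", chi_scores)
```

In each case the first column should score well above the second, because only the first was generated from the target.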

### Advantages and Disadvantages of Filter Methods:

**Advantages**:
- **Simplicity and Speed**: Since no model training is involved, filter methods are computationally efficient and faster than wrapper and embedded methods.
- **Model Independence**: They do not depend on a specific learning algorithm, so the same selected subset can be reused with different models.
- **Scalability**: Suitable for large datasets as they are less computationally intensive.

**Disadvantages**:
- **Ignoring Feature Interactions**: Filter methods evaluate features individually and may miss interactions between features that could be important.
- **Potential for Overlooking Non-linear Relationships**: Simple statistical measures might not capture complex relationships between features and the target variable.

### Example Workflow of the Filter Method:

1. **Calculate Statistical Measures**: Compute the chosen statistical measure (e.g., correlation) between each feature and the target variable.
2. **Rank Features**: Rank the features based on the computed scores.
3. **Set a Threshold**: Determine a threshold for selection (e.g., the top 10 features, or features with an absolute correlation above 0.5).
4. **Select Features**: Select the top-ranked features that meet the threshold criteria.
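
The four steps above can be sketched with NumPy alone. The data, the feature names `f0` to `f3`, and the threshold of 0.2 are all hypothetical choices for illustration, not recommended values.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
feature_names = ["f0", "f1", "f2", "f3"]          # hypothetical names
X = rng.normal(size=(n, 4))
# Synthetic target: depends on f0 (positively) and f2 (negatively) only.
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

# 1. Calculate the statistical measure: Pearson correlation with the target.
scores = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# 2. Rank features by absolute score, strongest first.
order = np.argsort(-np.abs(scores))

# 3. Set a threshold (assumed value).
threshold = 0.2

# 4. Select the features whose absolute correlation meets the threshold.
selected = [feature_names[j] for j in order if abs(scores[j]) >= threshold]

print(dict(zip(feature_names, scores.round(3))))
print("selected:", selected)
```

Ranking on the absolute value matters here: `f2` is negatively correlated with the target and would be dropped by a cutoff on the signed score.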


**Q2. How does the Wrapper method differ from the Filter method in feature selection?**

**ANSWER:----**

The Wrapper method and the Filter method are both techniques used for feature selection, but they differ significantly in their approaches, processes, and the way they evaluate features. Here’s a detailed comparison of the two methods:

### Wrapper Method:

1. **Model-Based Evaluation**: The Wrapper method evaluates the importance of features by using a predictive model. It assesses different subsets of features by training and testing a model on each subset and selecting the subset that produces the best performance.

2. **Iterative Search**: Wrapper methods typically involve an iterative search process, such as forward selection, backward elimination, or recursive feature elimination. These methods add or remove features one at a time, evaluating the model performance at each step.

3. **Performance Metric**: The selection process is guided by a performance metric (e.g., accuracy, F1-score, AUC) obtained from the model. Features are selected based on how well they improve this performance metric.

4. **Computationally Intensive**: Since wrapper methods involve training and testing models multiple times on different subsets of features, they are computationally more expensive and time-consuming than filter methods.

5. **Capturing Feature Interactions**: Wrapper methods can capture interactions between features because they evaluate the combined effect of features on the model’s performance.
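
As an illustration of the iterative search described above, here is a minimal sketch of recursive feature elimination with scikit-learn's `RFE`. The synthetic regression data and the choice to keep three features are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 8 features, of which 3 actually drive the target.
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# Refit the model repeatedly, dropping the weakest feature each round
# (step=1) until only three remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
rfe.fit(X, y)

print("kept mask:", rfe.support_)
print("ranking (1 = kept):", rfe.ranking_)
```

Each elimination round requires a new model fit, which is where the extra computational cost of wrapper methods comes from.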

### Filter Method:

1. **Statistical Criteria**: The Filter method evaluates the importance of features based on statistical measures or criteria such as correlation coefficients, chi-square tests, mutual information, variance thresholds, or ANOVA F-tests. These measures assess the relationship between each feature and the target variable independently of any model.

2. **Single Step Process**: Filter methods usually involve a single-step process where features are ranked based on the chosen statistical measure, and the top-ranked features are selected.

3. **No Model Training**: Filter methods do not involve training and testing predictive models, making them computationally efficient and faster compared to wrapper methods.

4. **Ignoring Feature Interactions**: Since filter methods evaluate features independently, they may miss important interactions between features.

5. **Simplicity and Scalability**: Filter methods are simpler and more scalable, especially suitable for large datasets due to their lower computational requirements.

### Key Differences:

1. **Evaluation Basis**:
   - **Wrapper Method**: Evaluates feature subsets based on model performance.
   - **Filter Method**: Evaluates individual features based on statistical measures.

2. **Process**:
   - **Wrapper Method**: Iterative and involves model training and testing for each subset of features.
   - **Filter Method**: Single-step process without involving model training.

3. **Computational Complexity**:
   - **Wrapper Method**: More computationally intensive due to repeated model training and testing.
   - **Filter Method**: Less computationally intensive and faster.

4. **Feature Interactions**:
   - **Wrapper Method**: Can capture interactions between features.
   - **Filter Method**: Ignores interactions between features.

5. **Applicability**:
   - **Wrapper Method**: More suitable for smaller datasets where computational cost is not a major concern.
   - **Filter Method**: More suitable for large datasets due to its efficiency and speed.

### Example of Each Method:

- **Wrapper Method**:
  - **Forward Selection**: Start with no features, add features one by one, and select the subset that maximizes the model’s performance.
  - **Backward Elimination**: Start with all features, remove features one by one, and select the subset that maintains or improves model performance.

- **Filter Method**:
  - **Correlation Coefficient**: Calculate the correlation between each feature and the target variable, rank features by correlation, and select the top-ranked features.
  - **Chi-Square Test**: Perform a chi-square test for independence between each feature and the target variable, rank features by p-value (smallest first), and select the most significant features.
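
To make the wrapper side concrete, the following sketch implements greedy forward selection by hand, scoring each candidate subset with cross-validated R². The helper name `forward_selection`, the synthetic data, and the stopping rule are assumptions for illustration; scikit-learn also provides `SequentialFeatureSelector` for the same purpose.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 6))
# Synthetic target: only columns 0 and 3 carry signal.
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

def forward_selection(X, y, max_features):
    """Greedy forward selection guided by mean 5-fold CV score (R^2)."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        # Train and score one model per candidate feature.
        trial = {
            j: cross_val_score(
                LinearRegression(), X[:, selected + [j]], y, cv=5
            ).mean()
            for j in remaining
        }
        j_best = max(trial, key=trial.get)
        if trial[j_best] <= best_score:   # stop when nothing improves
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = trial[j_best]
    return selected, best_score

selected, score = forward_selection(X, y, max_features=3)
print("selected columns:", selected, "CV R^2:", round(score, 3))
```

The first two columns chosen should be 0 and 3. Whether a third, uninformative column is added depends on small chance differences in the cross-validation score.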



**Q3. What are some common techniques used in Embedded feature selection methods?**

**ANSWER:----**

Embedded feature selection methods integrate the process of feature selection with the training process of a predictive model. These methods select features while the model is being built, often leveraging the model's own mechanisms for identifying important features. Here are some common techniques used in embedded feature selection methods:

### Common Embedded Feature Selection Techniques:

1. **Regularization Methods**:
   - **Lasso (L1 Regularization)**: Adds a penalty equal to the absolute value of the magnitude of coefficients to the loss function. This can shrink some coefficients to zero, effectively performing feature selection.
   - **Ridge (L2 Regularization)**: Adds a penalty equal to the square of the magnitude of coefficients. It shrinks coefficients toward zero but does not set them exactly to zero, so it does not perform feature selection on its own; it is usually combined with other methods.
   - **Elastic Net**: Combines L1 and L2 regularization, promoting both sparsity (like Lasso) and grouping of correlated features.

2. **Tree-Based Methods**:
   - **Decision Trees**: Inherently perform feature selection by splitting nodes based on the most informative features. Features used in splits are considered important.
   - **Random Forests**: An ensemble of decision trees, where the importance of a feature is determined by averaging the importance of that feature across all trees in the forest.
   - **Gradient Boosting Machines (GBMs)**: Similar to random forests but builds trees sequentially. Feature importance is determined based on the contribution of each feature to the reduction of the loss function.

3. **Regularized Linear Models**:
   - **Logistic Regression with L1 Regularization**: For classification tasks, logistic regression with L1 regularization can be used to perform feature selection by shrinking some coefficients to zero.
   - **Support Vector Machines (SVM) with L1 Penalty**: For linear SVMs, using an L1 penalty can lead to sparse solutions where some feature weights are zero.

4. **Others**:
   - **Least Absolute Shrinkage and Selection Operator (LASSO)**: The full name of the Lasso method listed under regularization above. It performs feature selection and regularization together, which can improve both the prediction accuracy and the interpretability of the resulting model.
   - **Embedded Methods in Neural Networks**: An L1 penalty on the input-layer weights can push the weights of uninformative inputs toward zero, indirectly performing feature selection. Dropout and L2 regularization reduce overfitting but do not by themselves remove features.
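
A minimal sketch of two of these techniques on synthetic regression data follows: Lasso coefficients that reach exactly zero, and random forest feature importances. The data, `alpha=0.2`, and the number of trees are assumed values for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
# Synthetic target: only columns 0 and 1 matter.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Lasso: the L1 penalty drives the coefficients of weak features to zero.
lasso = Lasso(alpha=0.2).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_)

# Random forest: importances come from impurity reduction during training.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
rf_top2 = np.argsort(rf.feature_importances_)[::-1][:2]

print("Lasso coefficients:", lasso.coef_.round(3))
print("columns with non-zero Lasso coefficient:", lasso_selected)
print("top-2 columns by forest importance:", rf_top2)
```

In both cases the selection falls out of a single training run, which is what distinguishes embedded methods from the repeated fits of a wrapper.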

### Key Characteristics of Embedded Methods:

- **Integrated Process**: Feature selection occurs during the model training process, often resulting in more efficient and effective selection.
- **Algorithm-Specific**: These methods are usually tied to specific types of models or algorithms, such as linear models or tree-based methods.
- **Automatic Selection**: Embedded methods automatically select features based on the criteria defined by the model's learning process.

### Advantages and Disadvantages:

**Advantages**:
- **Efficiency**: Since feature selection is integrated into the training process, it can be computationally efficient compared to wrapper methods.
- **Better Generalization**: By selecting features based on model performance, embedded methods often result in models that generalize better to unseen data.
- **Model Interpretability**: Many embedded methods, especially those involving regularization, produce models that are easier to interpret due to the reduced number of features.

**Disadvantages**:
- **Algorithm Dependency**: These methods are specific to the algorithm used and may not be easily transferable to other types of models.
- **Complexity**: The integration of feature selection into the model training process can sometimes add complexity to the model-building process.

### Examples of Application:

- **Lasso Regression for Sparse Models**: Used in high-dimensional datasets where feature selection and prediction are both needed.
- **Random Forests for Variable Importance**: Often used in problems where understanding the importance of variables is crucial, such as in medical or financial applications.
- **Gradient Boosting for Robust Prediction**: Used in competitive machine learning and scenarios requiring high predictive accuracy, with feature importance providing insights into the model.



**Q4. What are some drawbacks of using the Filter method for feature selection?**

**ANSWER:-----**

The Filter method for feature selection, while simple and computationally efficient, does have several drawbacks:

1. **Ignoring Feature Interactions**: Filter methods evaluate each feature independently without considering the interactions between features. This can lead to the exclusion of features that, while not highly informative on their own, might be important in combination with other features.

2. **Oversimplified Assumptions**: Some statistical measures used in filter methods (e.g., Pearson correlation, the ANOVA F-test) assume a linear or mean-shift relationship between a feature and the target variable. Features with a significant but non-linear relationship to the target can be omitted as a result. Mutual information does not make this assumption, but it is harder to estimate reliably from small samples.

3. **Potential for Overlooking Non-linear Relationships**: Simple statistical criteria may not capture complex, non-linear relationships between features and the target variable, potentially overlooking important features.

4. **Relevance to Specific Models**: Filter methods do not account for the specific learning algorithm to be used. Features selected based on general statistical criteria may not be the most useful for a particular model or algorithm.

5. **Threshold Sensitivity**: The choice of threshold for selecting features can be somewhat arbitrary and may significantly impact the results. Setting the threshold too high may exclude useful features, while setting it too low may include irrelevant features.

6. **Stability Issues**: Filter methods might be unstable in the presence of noisy data or when dealing with high-dimensional datasets. The selected features can vary significantly with slight changes in the data.

7. **Scalability Concerns**: While filter methods are generally more scalable than wrapper or embedded methods, they can still struggle with very large datasets if the chosen statistical test is computationally intensive.

8. **No Consideration of Model Performance**: Filter methods do not directly evaluate the impact of selected features on model performance. As a result, the selected features may not necessarily lead to the best-performing model.

### Example Scenario Highlighting Drawbacks:

Consider a dataset with features that interact in complex ways. A feature might not show a strong individual correlation with the target variable but could be crucial when combined with other features. A filter method that looks at each feature independently would likely miss such interactions, potentially leading to suboptimal feature selection.
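
A small constructed case shows this scenario. In the sketch below (a toy XOR example, not real data), each feature on its own has zero correlation with the target, yet the two features together determine it exactly.

```python
import numpy as np

# Four equally frequent combinations of two binary features, repeated 50 times.
x1 = np.tile([0, 0, 1, 1], 50)
x2 = np.tile([0, 1, 0, 1], 50)
y = x1 ^ x2   # target is 1 only when exactly one of the features is 1

# Univariate filter scores: both correlations are zero.
corr_x1 = np.corrcoef(x1, y)[0, 1]
corr_x2 = np.corrcoef(x2, y)[0, 1]

# Using both features jointly recovers the target perfectly.
joint_accuracy = np.mean((x1 != x2).astype(int) == y)

print("corr(x1, y):", corr_x1)
print("corr(x2, y):", corr_x2)
print("accuracy using both features:", joint_accuracy)
```

A correlation-based filter would discard both features here, while a wrapper or a tree-based model that sees them together could use them.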

### Comparison with Other Methods:

- **Wrapper Methods**: While more computationally intensive, wrapper methods evaluate feature subsets based on model performance, capturing interactions between features and tailoring the selection to the specific algorithm.
- **Embedded Methods**: These integrate feature selection within the model training process, often leveraging regularization techniques to select features based on their contribution to model performance, accounting for feature interactions and algorithm-specific needs.



**Q5. In which situations would you prefer using the Filter method over the Wrapper method for feature
selection?**

**ANSWER:-----**

Choosing between the Filter method and the Wrapper method for feature selection depends on various factors such as dataset size, computational resources, model requirements, and specific goals of the analysis. Here are situations where the Filter method would be preferred over the Wrapper method:

### Situations Favoring the Filter Method:

1. **Large Datasets**:
   - **High Dimensionality**: When dealing with datasets that have a large number of features, the Filter method is computationally more efficient and scalable, making it suitable for high-dimensional data.
   - **Big Data**: In scenarios where the volume of data is very large, the Filter method's computational efficiency is advantageous.

2. **Computational Constraints**:
   - **Limited Resources**: If there are constraints on computational power and time, the Filter method is preferred as it is faster and less resource-intensive compared to the Wrapper method.

3. **Initial Feature Reduction**:
   - **Preprocessing Step**: The Filter method is useful as an initial step to quickly eliminate irrelevant or redundant features before applying more computationally intensive methods like Wrapper or Embedded methods.

4. **Quick Insights and Interpretability**:
   - **Simple and Transparent Criteria**: The Filter method provides a straightforward way to assess feature importance based on clear statistical criteria, making it easier to interpret and explain to stakeholders.

5. **Baseline Models**:
   - **Initial Model Building**: For building baseline models quickly and establishing a benchmark, the Filter method helps in reducing the feature set without involving complex model training.

6. **Irrelevant Feature Removal**:
   - **Obvious Irrelevance**: In cases where certain features are clearly irrelevant or have no meaningful relationship with the target variable, Filter methods can effectively remove such features early in the analysis.

7. **Independent Feature Evaluation**:
   - **No Need for Interaction Analysis**: If the problem does not require considering interactions between features or if interactions are known to be minimal, the Filter method is suitable.

### Example Scenarios:

1. **Text Mining**:
   - When working with text data involving thousands of features (e.g., words or n-grams), Filter methods such as chi-square tests on term counts, or cutoffs on term weights such as TF-IDF (term frequency-inverse document frequency), can reduce dimensionality before more sophisticated models are applied.

2. **Preliminary Data Analysis**:
   - In exploratory data analysis, using Filter methods to gain initial insights into which features might be important based on statistical measures, before committing to more resource-intensive methods.

3. **Genomic Data**:
   - In bioinformatics, where datasets can have tens of thousands of genetic markers, Filter methods can quickly reduce the number of features to a manageable size before applying detailed modeling techniques.

4. **Real-time Applications**:
   - For applications requiring real-time or near-real-time processing, where computational efficiency is critical, the Filter method provides a quick way to reduce the feature set.

### Contrast with Wrapper Methods:

- **Wrapper Methods**: While more thorough in evaluating feature subsets and their interactions, Wrapper methods are computationally intensive as they involve training and testing models repeatedly for different feature combinations.
- **Filter Methods**: They are faster and less resource-intensive, suitable for initial feature reduction, large datasets, and situations with limited computational resources.


**Q6. In a telecom company, you are working on a project to develop a predictive model for customer churn.
You are unsure of which features to include in the model because the dataset contains several different
ones. Describe how you would choose the most pertinent attributes for the model using the Filter Method.**

**ANSWER:-----**

To select the most pertinent features for a predictive model for customer churn using the Filter method, you can follow these steps:

### Step-by-Step Process:

1. **Understand the Data**:
   - Familiarize yourself with the dataset, including the types of features (e.g., numerical, categorical), their distributions, and their potential relevance to customer churn.

2. **Data Preprocessing**:
   - **Handle Missing Values**: Address missing data through imputation or removal.
   - **Encode Categorical Variables**: Convert categorical variables into numerical values using techniques like one-hot encoding or label encoding.
   - **Normalize/Scale Data**: Standardize numerical features to ensure they are on a similar scale.

3. **Select Statistical Criteria**:
   Choose appropriate statistical measures to evaluate the relationship between each feature and the target variable (churn). Common criteria include:

   - **Correlation Coefficient**: For numerical features, calculate the Pearson or Spearman correlation between each feature and the churn variable.
   - **Chi-Square Test**: For categorical features, perform chi-square tests to assess the independence between each feature and churn.
   - **Mutual Information**: Measure the mutual information between each feature and the churn variable to understand the amount of shared information.
   - **Variance Threshold**: Remove features with low variance, as they provide little information.

4. **Compute Statistical Scores**:
   - Calculate the chosen statistical measure for each feature in relation to the churn variable.
   - For numerical features, use correlation coefficients or mutual information.
   - For categorical features, use chi-square tests or mutual information.

5. **Rank Features**:
   - Rank the features based on their computed statistical scores. Higher scores indicate a stronger relationship with the churn variable.

6. **Set a Selection Threshold**:
   - Determine a threshold for selecting features. This could be based on a predefined number of top features, a specific score cutoff, or using cross-validation to find the optimal threshold.
   - For example, you might choose the top 10 features with the highest correlation coefficients or chi-square scores.

7. **Select Features**:
   - Select the features that meet the threshold criteria. These are the features deemed most pertinent for predicting customer churn.

8. **Evaluate and Iterate**:
   - Use the selected features to build an initial predictive model.
   - Evaluate the model’s performance using appropriate metrics (e.g., accuracy, precision, recall, AUC-ROC).
   - If performance is not satisfactory, consider adjusting the threshold, trying different statistical measures, or incorporating additional domain knowledge to refine the feature selection process.

### Example:

Suppose your dataset includes features such as customer age, tenure, monthly charges, contract type, payment method, and usage patterns. Here's how you might apply the Filter method:

1. **Preprocessing**:
   - Impute missing values, encode categorical variables (e.g., contract type, payment method), and scale numerical features (e.g., monthly charges).

2. **Select Statistical Criteria**:
   - For numerical features (age, tenure, monthly charges), use the Pearson correlation coefficient.
   - For categorical features (contract type, payment method), use chi-square tests.

3. **Compute Scores**:
   - Calculate the Pearson correlation between numerical features and churn.
   - Perform chi-square tests for categorical features against the churn variable.

4. **Rank Features**:
   - Rank numerical features based on their correlation coefficients with churn.
   - Rank categorical features based on their chi-square test p-values.

5. **Set a Threshold and Select Features**:
   - Select the top-ranked features, such as those with an absolute correlation coefficient above 0.2 or a chi-square p-value below 0.05.

6. **Evaluate and Iterate**:
   - Build a predictive model using the selected features.
   - Evaluate model performance and iterate on feature selection if necessary.



In [2]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.feature_selection import chi2, mutual_info_classif
from scipy.stats import pearsonr

# Create a sample dataset
data = pd.DataFrame({
    'CustomerID': range(1, 11),
    'Age': [23, 45, 56, 25, 39, 33, 46, 52, 40, 29],
    'Tenure': [1, 2, 8, 1, 5, 3, 9, 10, 4, 2],
    'MonthlyCharges': [29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.75, 49.95, 18.95],
    'Contract': ['Month-to-month', 'One year', 'Two year', 'Month-to-month', 'One year', 'One year', 'Two year', 'Month-to-month', 'One year', 'Month-to-month'],
    'PaymentMethod': ['Electronic check', 'Mailed check', 'Bank transfer', 'Electronic check', 'Bank transfer', 'Credit card', 'Electronic check', 'Mailed check', 'Credit card', 'Electronic check'],
    'Churn': [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
})

# Display the first few rows of the dataset
print(data.head())

# Preprocess data
data.fillna(data.mean(numeric_only=True), inplace=True)  # Example imputation (numeric columns only; the string columns have no mean)
categorical_features = ['Contract', 'PaymentMethod']
numerical_features = ['Age', 'Tenure', 'MonthlyCharges']

# Encode categorical features
label_encoders = {}
for feature in categorical_features:
    le = LabelEncoder()
    data[feature] = le.fit_transform(data[feature])
    label_encoders[feature] = le

# Scale numerical features
scaler = StandardScaler()
data[numerical_features] = scaler.fit_transform(data[numerical_features])

# Calculate statistical scores
correlation_scores = {feature: pearsonr(data[feature], data['Churn'])[0] for feature in numerical_features}
chi2_scores, _ = chi2(data[categorical_features], data['Churn'])
mutual_info_scores = mutual_info_classif(data[numerical_features + categorical_features], data['Churn'])

# Rank features
sorted_correlation_scores = sorted(correlation_scores.items(), key=lambda item: abs(item[1]), reverse=True)
sorted_chi2_scores = sorted(zip(categorical_features, chi2_scores), key=lambda item: item[1], reverse=True)
sorted_mutual_info_scores = sorted(zip(numerical_features + categorical_features, mutual_info_scores), key=lambda item: item[1], reverse=True)

# Select top features based on threshold
top_features = [feature for feature, score in sorted_correlation_scores[:3]] + \
               [feature for feature, score in sorted_chi2_scores[:3]] + \
               [feature for feature, score in sorted_mutual_info_scores[:3]]

# Remove duplicates
top_features = list(set(top_features))

print(f"Selected Features: {top_features}")


   CustomerID  Age  Tenure  MonthlyCharges        Contract     PaymentMethod  \
0           1   23       1           29.85  Month-to-month  Electronic check   
1           2   45       2           56.95        One year      Mailed check   
2           3   56       8           53.85        Two year     Bank transfer   
3           4   25       1           42.30  Month-to-month  Electronic check   
4           5   39       5           70.70        One year     Bank transfer   

   Churn  
0      1  
1      0  
2      0  
3      1  
4      0  
Selected Features: ['MonthlyCharges', 'Contract', 'Age', 'Tenure', 'PaymentMethod']
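
As a complement to the cell above, a chi-square test of independence can be run directly on the raw categories, without label encoding, by building a contingency table. This is a sketch using `scipy.stats.chi2_contingency` on the `Contract` and `Churn` columns of the same sample data. With only ten rows the expected cell counts are small, so the p-value is illustrative rather than reliable.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Same Contract and Churn values as the sample dataset above.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month",
                 "One year", "One year", "Two year", "Month-to-month",
                 "One year", "Month-to-month"],
    "Churn": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
})

# Rows: contract type, columns: churn outcome, cells: customer counts.
table = pd.crosstab(df["Contract"], df["Churn"])

# Test whether contract type and churn are independent.
stat, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```

On this sample every month-to-month customer churned and no one else did, giving a statistic of 10.0 with 2 degrees of freedom (p ≈ 0.0067).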




**Q7. You are working on a project to predict the outcome of a soccer match. You have a large dataset with
many features, including player statistics and team rankings. Explain how you would use the Embedded
method to select the most relevant features for the model.**

**ANSWER:------**

To predict the outcome of a soccer match using an Embedded method for feature selection, you can leverage techniques that incorporate feature selection directly into the model training process. This ensures that the selected features are the most relevant for the predictive task. Here’s a step-by-step guide on how to do this:

### Step-by-Step Process:

1. **Understand the Data**:
   - Familiarize yourself with the dataset, including player statistics (e.g., goals, assists, tackles), team rankings, and other relevant features.
   - Identify the target variable, which in this case could be the match outcome (e.g., win, loss, draw).

2. **Data Preprocessing**:
   - **Handle Missing Values**: Impute or remove missing data.
   - **Encode Categorical Variables**: Convert categorical variables (e.g., team names, positions) to numerical values.
   - **Normalize/Scale Data**: Standardize numerical features to ensure they are on a similar scale.

3. **Choose an Appropriate Model**:
   - Select a model that supports embedded feature selection. Common choices include:
     - **Lasso Regression** (L1 regularization): Encourages sparsity by penalizing the absolute values of the coefficients. Because the match outcome is categorical, L1-regularized logistic regression is the classification counterpart.
     - **Tree-based Methods**: Models like Decision Trees, Random Forests, and Gradient Boosting Trees naturally rank features based on their importance.
     - **Regularized Linear Models**: Elastic Net combines both L1 and L2 regularization.
     - **Support Vector Machines (SVMs)** with an L1 penalty (linear kernel), which drives some feature weights to zero.

4. **Train the Model with Embedded Feature Selection**:
   - Train the chosen model on the dataset, allowing it to perform feature selection during the training process.
   - For Lasso Regression, the regularization parameter (alpha) controls the degree of sparsity. A higher alpha leads to more feature coefficients being shrunk to zero.
   - For tree-based methods, feature importance scores can be derived directly from the trained model.

5. **Extract and Rank Features**:
   - After training, extract the features selected by the model.
   - For Lasso Regression, identify features with non-zero coefficients.
   - For tree-based models, use feature importance scores to rank the features.

6. **Validate and Fine-tune**:
   - Evaluate the model’s performance using cross-validation.
   - Adjust hyperparameters (e.g., regularization strength in Lasso, number of trees in Random Forest) to optimize performance.
   - Ensure that the selected features consistently contribute to good model performance across different validation sets.

7. **Iterate and Refine**:
   - Based on model performance and feature importance, iterate on feature selection and model training.
   - Consider domain knowledge to include or exclude certain features, ensuring that the model remains interpretable and relevant to the soccer match prediction task.


### Key Points:
- **Lasso Regression** shrinks coefficients of less important features to zero, effectively selecting a subset of relevant features.
- **Random Forest** ranks features by importance (in scikit-learn, by default the mean decrease in impurity), allowing selection based on how much each feature contributes to the trees' splits.
- **Cross-validation** ensures that the selected features generalize well to unseen data, avoiding overfitting.

Using the Embedded method, you can leverage model-based feature selection techniques to identify and use the most relevant features for predicting soccer match outcomes, leading to more accurate and interpretable models.

In [4]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Create a sample dataset
np.random.seed(42)
data = pd.DataFrame({
    'Player1_Goals': np.random.randint(0, 10, size=100),
    'Player1_Assists': np.random.randint(0, 10, size=100),
    'Player2_Goals': np.random.randint(0, 10, size=100),
    'Player2_Assists': np.random.randint(0, 10, size=100),
    'Team_Ranking': np.random.randint(1, 21, size=100),
    'Home_Advantage': np.random.choice([0, 1], size=100),
    'MatchOutcome': np.random.choice([0, 1], size=100)  # 0: Loss/Draw, 1: Win
})

# Preprocess data
categorical_features = ['Home_Advantage']
numerical_features = [col for col in data.columns if col not in categorical_features + ['MatchOutcome']]

# Scale numerical features
scaler = StandardScaler()
data[numerical_features] = scaler.fit_transform(data[numerical_features])

# Define features and target
X = data.drop(columns=['MatchOutcome'])
y = data['MatchOutcome']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Lasso Regression for Feature Selection ---
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Extract selected features
selected_features_lasso = X_train.columns[lasso.coef_ != 0]
print(f"Selected Features by Lasso Regression: {selected_features_lasso}")

# Evaluate Lasso model (cross_val_score defaults to R^2 for regressors)
scores_lasso = cross_val_score(lasso, X_train, y_train, cv=5)
print(f"Lasso Cross-Validation Scores: {scores_lasso}")
print(f"Lasso Mean CV Score: {scores_lasso.mean()}")

# --- Random Forest for Feature Selection ---
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Extract feature importances
importances = rf.feature_importances_
feature_names = X_train.columns
feature_importances = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
feature_importances = feature_importances.sort_values(by='Importance', ascending=False)

# Select features whose importance exceeds 0.01
selected_features_rf = feature_importances[feature_importances['Importance'] > 0.01]['Feature']
print(f"Selected Features by Random Forest: {selected_features_rf}")

# Evaluate Random Forest model (cross_val_score defaults to accuracy for classifiers)
scores_rf = cross_val_score(rf, X_train, y_train, cv=5)
print(f"Random Forest Cross-Validation Scores: {scores_rf}")
print(f"Random Forest Mean CV Score: {scores_rf.mean()}")


Selected Features by Lasso Regression: Index([], dtype='object')
Lasso Cross-Validation Scores: [-0.02604167 -0.11009462  0.00209124 -0.10943808 -0.02441406]
Lasso Mean CV Score: -0.05357943697834031
Selected Features by Random Forest: 3    Player2_Assists
4       Team_Ranking
0      Player1_Goals
1    Player1_Assists
2      Player2_Goals
5     Home_Advantage
Name: Feature, dtype: object
Random Forest Cross-Validation Scores: [0.25   0.3125 0.5    0.5    0.375 ]
Random Forest Mean CV Score: 0.3875
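
These results follow from how the sample data was built: `MatchOutcome` is drawn independently of every feature, so there is no signal to find. Lasso with `alpha=0.1` shrinks every coefficient to zero (the empty `Index`), its cross-validation scores are R-squared values where a negative score means worse than predicting the mean, and the Random Forest accuracy of 0.3875 is no better than chance.

For a binary outcome, an L1-penalized logistic regression is arguably a more natural embedded selector than `Lasso`. The following is a minimal sketch under assumptions that are not part of the original notebook: synthetic data from `make_classification`, an illustrative `C=0.1`, and a pipeline so that scaling and selection are refit on each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 6 features, 3 of which carry signal
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)

# The L1 penalty drives the coefficients of weak features to exactly zero
l1_logreg = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

# Scale -> keep features with non-zero L1 coefficients -> fit the final classifier
pipe = make_pipeline(StandardScaler(), SelectFromModel(l1_logreg), LogisticRegression())

# Accuracy is estimated with selection repeated inside every fold
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")

# Refit on all data to inspect which feature indices were kept
pipe.fit(X, y)
mask = pipe.named_steps["selectfrommodel"].get_support()
print(f"Selected feature indices: {np.flatnonzero(mask).tolist()}")
```

A smaller `C` means a stronger penalty and fewer selected features; in practice it would be tuned by cross-validation rather than fixed.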


**Q8. You are working on a project to predict the price of a house based on its features, such as size, location, and age. You have a limited number of features, and you want to ensure that you select the most important ones for the model. Explain how you would use the Wrapper method to select the best set of features for the predictor.**

**ANSWER:-----**

Using the Wrapper method for feature selection involves evaluating different subsets of features based on model performance. This method is computationally intensive but can provide high-quality feature subsets by considering interactions between features. Here’s a step-by-step guide on how to use the Wrapper method to select the best set of features for predicting house prices.

### Step-by-Step Process:

1. **Understand the Data**:
   - Familiarize yourself with the dataset, which includes features such as house size, location, age, and other relevant attributes.
   - Identify the target variable, which in this case is the house price.

2. **Data Preprocessing**:
   - **Handle Missing Values**: Impute or remove missing data.
   - **Encode Categorical Variables**: Convert categorical variables (e.g., location) to numerical values.
   - **Normalize/Scale Data**: Standardize numerical features to ensure they are on a similar scale.

3. **Choose a Model**:
   - Select a machine learning model that you will use to evaluate feature subsets. Common choices include:
     - Linear Regression
     - Decision Trees
     - Support Vector Machines (SVMs)
     - Random Forests

4. **Define a Search Strategy**:
   - **Forward Selection**: Start with no features and add one feature at a time that improves model performance the most.
   - **Backward Elimination**: Start with all features and remove one feature at a time that degrades model performance the least.
   - **Recursive Feature Elimination (RFE)**: Repeatedly train the model and remove the feature with the smallest coefficient or importance until the desired number of features remains.

5. **Evaluate Model Performance**:
   - Use a performance metric appropriate for regression tasks, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared.
   - Use cross-validation to ensure the performance is robust and not overfitted to the training data.

6. **Implement the Wrapper Method**:
   - Here, I will demonstrate using Recursive Feature Elimination (RFE) with a Linear Regression model.
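
As a complement to RFE, the forward and backward strategies from step 4 can be sketched with scikit-learn's `SequentialFeatureSelector`. The data here is hypothetical and not from the original notebook: `Price` is constructed to depend only on `Size` and `Age`, so both search directions should be able to recover those two columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data (assumption): price depends on Size and Age only
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Size": rng.uniform(500, 3500, n),
    "Age": rng.uniform(1, 100, n),
    "Bedrooms": rng.integers(1, 6, n).astype(float),   # noise
    "Bathrooms": rng.integers(1, 4, n).astype(float),  # noise
})
y = 150 * X["Size"] - 1000 * X["Age"] + rng.normal(0, 20000, n)

selected = {}
for direction in ("forward", "backward"):
    # Each candidate subset is scored by 5-fold cross-validated negative MSE
    sfs = SequentialFeatureSelector(
        LinearRegression(),
        n_features_to_select=2,
        direction=direction,
        scoring="neg_mean_squared_error",
        cv=5,
    )
    sfs.fit(X, y)
    selected[direction] = list(X.columns[sfs.get_support()])
    print(f"{direction} selection kept: {selected[direction]}")
```

Unlike RFE, this selector never looks at coefficients; every add or remove decision is made by comparing cross-validated scores, which is why it is slower but works with any estimator.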


### Explanation:

1. **Sample Dataset Creation**:
   - A sample dataset is created with features such as house size, location, age, bedrooms, bathrooms, and the target variable, price.

2. **Data Preprocessing**:
   - **Categorical Variables**: Convert categorical variables (e.g., 'Location') to numerical values using one-hot encoding.
   - **Scaling**: Standardize numerical features to ensure they are on a similar scale.

3. **Feature and Target Definition**:
   - Split the data into features (`X`) and target (`y`).

4. **Train-Test Split**:
   - Split the data into training and test sets.

5. **Recursive Feature Elimination (RFE)**:
   - **Model Initialization**: Use Linear Regression as the base model.
   - **RFE Initialization**: Initialize RFE to select the top 5 features.
   - **RFE Fitting**: Fit RFE to the training data.
   - **Feature Selection**: Extract and print the selected features.
   - **Model Evaluation**: Evaluate a model restricted to the selected features using 5-fold cross-validation (negative MSE) and print the scores.

### Key Points:

- **Wrapper Methods** are computationally expensive because they involve training and evaluating models multiple times for different subsets of features.
- **RFE** is an iterative process that refits the model and removes the least important feature, as ranked by the model's coefficients or feature importances, until the desired number of features is reached.
- **Cross-validation** helps check that the selected features generalize to unseen data; for an unbiased estimate, the selection step should be repeated inside each fold.
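
To make the elimination loop concrete, here is a minimal hand-written sketch of the idea, compared against scikit-learn's `RFE`. The helper `manual_rfe` and the synthetic data are illustrative assumptions, not part of the original notebook; the target depends only on columns 0 and 3.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def manual_rfe(X, y, n_features_to_select):
    """Refit, drop the feature with the smallest |coefficient|, repeat."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        model = LinearRegression().fit(X[:, remaining], y)
        weakest = int(np.argmin(np.abs(model.coef_)))  # position within `remaining`
        remaining.pop(weakest)
    return remaining


# Synthetic data (assumption): only columns 0 and 3 influence the target
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(200, 5))
y = 3.0 * X_raw[:, 0] - 2.0 * X_raw[:, 3] + rng.normal(scale=0.5, size=200)

# Scaling makes coefficient magnitudes comparable across features
X = StandardScaler().fit_transform(X_raw)

kept = manual_rfe(X, y, n_features_to_select=2)
rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
rfe_kept = np.flatnonzero(rfe.support_).tolist()

print(f"Manual loop kept columns: {kept}")
print(f"sklearn RFE kept columns: {rfe_kept}")
```

Because the ranking comes from coefficient magnitudes, the features need to be on a common scale for the comparison to be meaningful.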


In [5]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

# Create a sample dataset
np.random.seed(42)
data = pd.DataFrame({
    'Size': np.random.randint(500, 3500, size=100),
    'Location': np.random.choice(['A', 'B', 'C', 'D'], size=100),
    'Age': np.random.randint(1, 100, size=100),
    'Bedrooms': np.random.randint(1, 6, size=100),
    'Bathrooms': np.random.randint(1, 4, size=100),
    'Price': np.random.randint(100000, 1000000, size=100)
})

# Preprocess data
data = pd.get_dummies(data, columns=['Location'], drop_first=True)

# Define features and target
X = data.drop(columns=['Price'])
y = data['Price']

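# Scale features (this also scales the one-hot columns)
# Note: fitting the scaler before the split lets test-set statistics leak into training;
# in a real project, fit it on the training split only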
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
model = LinearRegression()

# Initialize RFE
rfe = RFE(model, n_features_to_select=5)

# Fit RFE
rfe.fit(X_train, y_train)

# Get the selected features
selected_features = [f for f, s in zip(data.drop(columns=['Price']).columns, rfe.support_) if s]
print(f"Selected Features by RFE: {selected_features}")

# Evaluate model with selected features
# scoring='neg_mean_squared_error' returns negated MSE, so values closer to zero are better
scores = cross_val_score(model, X_train[:, rfe.support_], y_train, cv=5, scoring='neg_mean_squared_error')
print(f"RFE Cross-Validation Scores: {scores}")
print(f"RFE Mean CV Score: {scores.mean()}")


Selected Features by RFE: ['Size', 'Age', 'Location_B', 'Location_C', 'Location_D']
RFE Cross-Validation Scores: [-1.12118143e+11 -6.37291349e+10 -7.59234649e+10 -6.08566543e+10
 -9.43256003e+10]
RFE Mean CV Score: -81390599585.91771
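
Taking the square root of the magnitude of the mean score gives an RMSE of roughly 285,000, which is of the same order as the standard deviation of the randomly generated prices (about 260,000). This is expected, because `Price` in the sample data was drawn independently of the features.

As a possible refinement, `RFECV` can choose the number of features by cross-validation instead of fixing it at 5, and placing it in a `Pipeline` means scaling and selection are refit inside each outer fold. The sketch below rests on assumptions not in the original: synthetic data in which the price depends on size and age only, and the hypothetical step names `scale`, `select`, and `model`.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): price depends on size and age only
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.uniform(500, 3500, n),  # size
    rng.uniform(1, 100, n),     # age
    rng.integers(1, 6, n),      # bedrooms (noise)
    rng.integers(1, 4, n),      # bathrooms (noise)
])
y = 150 * X[:, 0] - 1000 * X[:, 1] + rng.normal(0, 20000, n)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # RFECV picks the feature count with the best inner cross-validated score
    ("select", RFECV(LinearRegression(), cv=5, scoring="neg_mean_squared_error")),
    ("model", LinearRegression()),
])

# Outer CV: the whole pipeline, selection included, is refit on each training fold
scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
rmse = np.sqrt(-scores.mean())
print(f"Cross-validated RMSE: {rmse:.0f}")

# Refit on all data to inspect the chosen subset
pipe.fit(X, y)
support = pipe.named_steps["select"].support_
n_selected = pipe.named_steps["select"].n_features_
print(f"Number of features kept: {n_selected}, mask: {support.tolist()}")
```

Reporting RMSE rather than negated MSE puts the error back in the same units as the price, which makes it easier to judge.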
