# Removing features
Removing features can help reduce noise, improve model performance, and reduce computational complexity.
Always analyze whether removing a feature makes sense. For example, in a dataset of customer transactions, a feature like "discount applied" might have low variance if discounts are rare, but it could still be important. Conversely, a feature like "customer ID" might have high variance but is not useful for modeling.
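
To make the two cases above concrete, here is a minimal sketch on made-up data. The `transactions` DataFrame, its column names, and the 5% discount rate are assumptions chosen for illustration; they do not come from a real dataset.

```python
import pandas as pd

# Hypothetical transactions: 100 customers, a discount applied to only 5 of them.
transactions = pd.DataFrame({
    "customer_id": range(1000, 1100),        # unique identifier, no predictive meaning
    "discount_applied": [1] * 5 + [0] * 95,  # rare binary flag
})

# Population variance (ddof=0), the convention scikit-learn's VarianceThreshold uses.
variances = transactions.var(ddof=0)
print(variances)
```

With these numbers the rare flag has a variance of 0.05 * 0.95 = 0.0475, so a threshold of 0.1 would drop it, while the identifier column has a large variance and would be kept. Variance alone cannot tell which of the two is worth keeping.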

### Erasing low-variance features
Low-variance features are those whose values vary little or not at all, so they usually provide little useful information to the model.

Start with a conservative threshold (e.g., 0.0) and gradually increase it while monitoring model performance.
Use cross-validation to evaluate the impact of removing low-variance features.
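
One way to follow this advice is sketched below. It is an illustration under stated assumptions, not a prescribed recipe: the data is synthetic (`make_classification` plus one constant and one near-constant column), and the candidate thresholds and the `LogisticRegression` model are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): 8 generated features plus two low-variance columns.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
rng = np.random.default_rng(0)
constant = np.ones((300, 1))                                  # zero variance
near_constant = (rng.random((300, 1)) < 0.02).astype(float)   # mostly zeros
X = np.hstack([X, constant, near_constant])

# Evaluate each candidate threshold with 5-fold cross-validation.
cv_scores = {}
for threshold in [0.0, 0.01, 0.05]:
    model = make_pipeline(
        VarianceThreshold(threshold=threshold),
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[threshold] = scores.mean()
    print(f"threshold={threshold}: mean accuracy={scores.mean():.3f}")
```

Placing `VarianceThreshold` inside the pipeline means the selector is refit on the training folds of each split, so the held-out fold does not influence which features are kept.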

In [None]:
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Sample DataFrame with low-variance features
data = {
    'feature1': [1, 1, 1, 1, 1],  # Zero variance (all values are the same)
    'feature2': [1, 2, 3, 4, 5],  # High variance
    'feature3': [0, 0, 0, 0, 0],  # Zero variance
    'feature4': [10, 20, 30, 40, 50]  # High variance
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Step 1: Initialize VarianceThreshold
# Features whose variance does not exceed the threshold are removed.
# The default threshold of 0.0 removes only constant features; 0.1 also removes
# features whose values are clustered closely around the mean.
selector = VarianceThreshold(threshold=0.1)

# Step 2: Fit and transform the data
# fit_transform applies the threshold and returns a NumPy array without the low-variance columns.
df_cleaned = selector.fit_transform(df)

# Step 3: Get the selected feature names
# get_support() returns a boolean mask marking the features that were kept.
selected_features = df.columns[selector.get_support()]

# Create a new DataFrame with the selected features
df_cleaned = pd.DataFrame(df_cleaned, columns=selected_features)

# Display the cleaned DataFrame
print("\nDataFrame after removing low-variance features:")
print(df_cleaned)

Original DataFrame:
   feature1  feature2  feature3  feature4
0         1         1         0        10
1         1         2         0        20
2         1         3         0        30
3         1         4         0        40
4         1         5         0        50

DataFrame after removing low-variance features:
   feature2  feature4
0         1        10
1         2        20
2         3        30
3         4        40
4         5        50
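
Before settling on a threshold, it can help to look at the variances the selector actually computed. The sketch below reuses the same sample data; the `variance_report` name is introduced here for illustration only.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    'feature1': [1, 1, 1, 1, 1],
    'feature2': [1, 2, 3, 4, 5],
    'feature3': [0, 0, 0, 0, 0],
    'feature4': [10, 20, 30, 40, 50],
})

# Fit only; variances_ holds the per-feature variance seen during fit.
selector = VarianceThreshold(threshold=0.1).fit(df)

# Pair each feature with its variance and whether it passed the threshold.
variance_report = pd.DataFrame(
    {"variance": selector.variances_, "kept": selector.get_support()},
    index=df.columns,
)
print(variance_report)
```

Here `feature1` and `feature3` have a variance of 0.0 and are dropped, while `feature2` (2.0) and `feature4` (200.0) are kept.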
