# Feature Selection for a Machine Learning Model

## Problem Statement:
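### In a machine learning project, selecting the right subset of features is crucial for improving model accuracy and reducing computation time. We want to maximize an estimate of model accuracy while limiting the number of selected features. Because each feature is either selected or not, the problem is formulated as a 0-1 integer linear program, a Linear Programming (LP) model whose decision variables are binary.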

## Business Scenario:
### A company is developing a predictive model to classify customer churn. There are 10 available features, but using all of them may lead to overfitting and unnecessary computational costs. We want to select the best features while ensuring the model accuracy remains high.

## Optimization Goal:
### Maximize accuracy while selecting a limited number of features (at most 5).
### Each feature has its own weight, representing its estimated contribution to accuracy. The model assumes these contributions are additive.
### The total cost of the selected features must not exceed a predefined budget.
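
The three goals above can be written as a single optimization problem. The symbols are notation introduced here for illustration: $w_i$ stands for the accuracy weight of feature $i$ (`accuracy_weights`), $c_i$ for its cost (`feature_costs`), $B$ for the budget (`budget`), and $x_i$ is 1 if feature $i$ is selected and 0 otherwise.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{10} w_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{10} x_i \le 5, \\
& \sum_{i=1}^{10} c_i x_i \le B, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, 10.
\end{aligned}
$$

The objective is linear in $x$, so it treats feature contributions as independent. Interactions between features are not captured by this formulation.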

In [2]:
from pulp import LpMaximize, LpProblem, LpVariable, lpSum

# Define feature contributions to accuracy (weights)
accuracy_weights = [0.12, 0.18, 0.15, 0.10, 0.22, 0.30, 0.25, 0.16, 0.14, 0.20]

# Define feature costs (example)
feature_costs = [5, 7, 6, 4, 9, 12, 10, 6, 5, 8]

# Define the budget constraint
budget = 25

# Define the optimization problem (a 0-1 integer program, because the variables below are binary)
problem = LpProblem("Feature_Selection_Optimization", LpMaximize)

# Define binary decision variables for selecting features
x = [LpVariable(f"x{i}", cat="Binary") for i in range(10)]

# Define the objective function (maximize accuracy)
problem += lpSum(accuracy_weights[i] * x[i] for i in range(10)), "Total_Accuracy"

# Constraint: At most 5 features can be selected
problem += lpSum(x) <= 5, "Feature_Limit"

# Constraint: Budget should not be exceeded
problem += lpSum(feature_costs[i] * x[i] for i in range(10)) <= budget, "Budget_Constraint"

# Solve the problem
problem.solve()

# Print selected features (1-based numbering); compare against 0.5 because the solver returns floats
selected_features = [i+1 for i in range(10) if x[i].value() > 0.5]
print(f"Selected Features: {selected_features}")
print(f"Maximum Achievable Accuracy: {problem.objective.value()}")


Selected Features: [3, 8, 9, 10]
Maximum Achievable Accuracy: 0.65
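
With only 10 features there are 1,024 possible subsets, so the solver output above can be sanity-checked by plain enumeration. The sketch below is an illustrative cross-check, not part of the original notebook: `brute_force_best_subsets` and `max_features` are names made up here, and the approach does not scale to large feature counts.

```python
from itertools import combinations

# Same data as the notebook cell above.
accuracy_weights = [0.12, 0.18, 0.15, 0.10, 0.22, 0.30, 0.25, 0.16, 0.14, 0.20]
feature_costs = [5, 7, 6, 4, 9, 12, 10, 6, 5, 8]
budget = 25
max_features = 5  # hypothetical name for the "at most 5 features" limit


def brute_force_best_subsets(weights, costs, budget, max_features):
    """Return (best_score, subsets) where subsets use 1-based feature numbers."""
    best_score, best_subsets = 0.0, [()]
    indices = range(len(weights))
    for k in range(1, max_features + 1):
        for subset in combinations(indices, k):
            # Skip subsets that break the budget constraint.
            if sum(costs[i] for i in subset) > budget:
                continue
            # Round to suppress floating-point noise when comparing sums.
            score = round(sum(weights[i] for i in subset), 10)
            if score > best_score:
                best_score, best_subsets = score, [subset]
            elif score == best_score:
                best_subsets.append(subset)
    return best_score, [tuple(i + 1 for i in s) for s in best_subsets]


best_score, best_subsets = brute_force_best_subsets(
    accuracy_weights, feature_costs, budget, max_features
)
print(f"Best score by enumeration: {best_score}")
print(f"Subsets achieving it: {best_subsets}")
```

If several subsets tie for the best score, all of them are returned, which makes it visible when the solver's answer is one of several equally good selections.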


## Expected Outcome & Insights:
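### The solver selects at most 5 features that maximize the summed accuracy weights while keeping the total cost within the budget. In the run above it picks 4 features (3, 8, 9 and 10), whose costs add up to exactly 25, so the budget rather than the feature limit is the binding constraint.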
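### This optimization can help data scientists automate feature selection and cut unnecessary computational cost. The objective value (0.65 here) is a sum of per-feature weights, so it is a proxy score rather than a measured accuracy.
### The selected features can then be used to train a machine learning model, which may reduce overfitting and improve interpretability. The actual effect should be confirmed on held-out data.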
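
As a minimal sketch of that last step, the snippet below turns the solver's 1-based feature numbers into a reduced dataset. The column names and the toy rows are placeholders invented for illustration (the notebook does not name the 10 features or show the churn data), and `keep_selected_columns` is a hypothetical helper.

```python
# Solver output from the run above (1-based feature numbers).
selected_features = [3, 8, 9, 10]

# Hypothetical column names; the notebook does not name the 10 features.
feature_names = [f"feature_{i}" for i in range(1, 11)]

# Toy rows standing in for the real churn table (one value per feature).
rows = [[0.1 * (r + c) for c in range(10)] for r in range(3)]


def keep_selected_columns(rows, selected, names):
    """Return (kept_names, reduced_rows) for 1-based feature numbers."""
    cols = [i - 1 for i in selected]  # convert to 0-based column indices
    kept_names = [names[c] for c in cols]
    reduced_rows = [[row[c] for c in cols] for row in rows]
    return kept_names, reduced_rows


kept_names, X_reduced = keep_selected_columns(rows, selected_features, feature_names)
print(kept_names)
print(f"{len(X_reduced)} rows x {len(X_reduced[0])} columns")
```

The reduced rows can then be passed to whichever training routine the project uses in place of the full 10-column table.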