
# **Experiment – 8**  
**Date:**  
**Roll No.: 24201013**  
**Title:** *Logistic Regression using Wrapper Selection (Recursive Feature Elimination)*

---

## **Theory**

### **Logistic Regression (Introduction)**  
Logistic Regression is a supervised classification algorithm used to predict **binary outcomes** (0 or 1).  
Instead of predicting continuous values, it predicts the **probability** that an input belongs to a certain class.  

The model uses the **sigmoid function**:

<br>

$$
P(y = 1 \mid x)=\frac{1}{1 + e^{-(b_0+b_1x_1+b_2x_2+\dots +b_nx_n)}}
$$

<br>

### **Decision Rule**
- If probability ≥ 0.5 → **Class 1**  
- If probability < 0.5 → **Class 0**
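
The sigmoid formula and the decision rule above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation; the helper names (`sigmoid`, `predict_proba`, `predict_class`) and the example coefficients are assumptions chosen only for demonstration.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, b0, b):
    """P(y = 1 | x) for intercept b0 and coefficient list b."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return sigmoid(z)

def predict_class(x, b0, b, threshold=0.5):
    """Apply the decision rule: class 1 if probability >= threshold, else 0."""
    return 1 if predict_proba(x, b0, b) >= threshold else 0

# Hypothetical coefficients, chosen so that the linear term is exactly 0
b0, b = -1.0, [2.0, -0.5]
x = [1.0, 2.0]                      # z = -1 + 2*1 - 0.5*2 = 0

print(predict_proba(x, b0, b))      # 0.5
print(predict_class(x, b0, b))      # 1, because 0.5 >= 0.5
```

The example sits exactly on the boundary: a probability of 0.5 is assigned to Class 1 because the rule uses "≥".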

---

## **Feature Selection – Wrapper Method**

Wrapper methods repeatedly train the model and evaluate performance to choose the best subset of features.

### **Steps**
1. Train the model on a candidate subset of features  
2. Evaluate its accuracy on held-out data  
3. Add or remove features to form the next candidate subset  
4. Select the feature subset that gives the highest accuracy  
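
The steps above can be sketched as a greedy forward search. This is one possible wrapper strategy, shown under stated assumptions: `forward_selection` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train the model on this subset and return its validation accuracy".

```python
def forward_selection(n_features, score_fn, max_features):
    """Greedy wrapper search: add one feature at a time while the score improves."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Steps 1-2: score every candidate subset
        scored = [(score_fn(selected + [f]), f) for f in candidates]
        cand_score, cand = max(scored, key=lambda pair: pair[0])
        # Step 3: stop when adding a feature no longer helps
        if cand_score <= best_score:
            break
        selected.append(cand)
        best_score = cand_score
    # Step 4: return the best subset found
    return selected, best_score

# Toy score (assumption): features 0 and 2 are useful, every feature costs 0.01
GAINS = {0: 0.20, 2: 0.15}

def toy_score(subset):
    return 0.5 + sum(GAINS.get(f, 0.0) for f in subset) - 0.01 * len(subset)

selected, best = forward_selection(n_features=4, score_fn=toy_score, max_features=4)
print(selected)  # [0, 2]
```

In a real experiment, `score_fn` would fit the classifier on the chosen columns and score it on a validation split or with cross-validation.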

---

## **Recursive Feature Elimination (RFE)**

RFE is a **backward elimination** technique.

### **How RFE Works**
1. Train the Logistic Regression model on **all features**  
2. Rank the features by importance (the magnitude of their coefficients)  
3. Remove the **least important feature**  
4. Retrain and repeat until only the required number of features remains  
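
The elimination loop can be sketched without any library. This is a simplified illustration, not scikit-learn's `RFE` code: `rfe_sketch` and `importance_fn` are hypothetical names, and the toy coefficients are fixed, whereas a real model would be refit in every round and its coefficients would change.

```python
def rfe_sketch(features, importance_fn, n_to_select):
    """Drop the weakest feature each round until n_to_select remain."""
    remaining = list(features)
    eliminated = []                      # in order of removal
    while len(remaining) > n_to_select:
        importances = importance_fn(remaining)   # stands in for "refit the model"
        weakest = min(remaining, key=lambda f: abs(importances[f]))
        remaining.remove(weakest)
        eliminated.append(weakest)
    # Surviving features get rank 1; the first feature removed gets the largest rank
    ranking = {f: 1 for f in remaining}
    for position, f in enumerate(reversed(eliminated)):
        ranking[f] = position + 2
    return remaining, ranking

# Toy coefficients (assumption): importance is the absolute coefficient value
toy_coefs = {"a": 0.9, "b": -0.05, "c": 0.4, "d": -1.2, "e": 0.1}

kept, ranking = rfe_sketch(
    features=list(toy_coefs),
    importance_fn=lambda remaining: {f: toy_coefs[f] for f in remaining},
    n_to_select=2,
)
print(kept)     # ['a', 'd']
print(ranking)  # {'a': 1, 'd': 1, 'c': 2, 'e': 3, 'b': 4}
```

Note that `d` survives despite its negative coefficient: only the magnitude matters for the ranking.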

---

## **Advantages**
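- Can improve model accuracy by discarding irrelevant or redundant features  
- Can reduce overfitting, since the final model has fewer parameters  
- Works well on small and medium datasets, where repeated retraining is affordable  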

## **Disadvantages**
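- Computationally expensive, because the model is retrained after every elimination round  
- Results depend on data quality and on feature scaling, since coefficient magnitudes are comparable only when features are on a similar scale  
- Less reliable on very noisy data, where coefficient-based rankings can become unstable  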
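
To put the computational cost in perspective, the short calculation below counts elimination rounds for the settings used in this experiment (20 features, 10 to keep, one feature removed per round) and compares them with an exhaustive wrapper search. It is rough arithmetic under those assumptions, not a timing measurement.

```python
import math

n_features = 20      # total features in the dataset
n_to_select = 10     # features to keep
step = 1             # features removed per round

# RFE: one retraining per elimination round
rfe_rounds = math.ceil((n_features - n_to_select) / step)

# Exhaustive wrapper search: one training per candidate subset
subsets_of_size_10 = math.comb(n_features, n_to_select)
all_nonempty_subsets = 2 ** n_features - 1

print(rfe_rounds)            # 10
print(subsets_of_size_10)    # 184756
print(all_nonempty_subsets)  # 1048575
```

RFE is expensive compared with training a single model, but far cheaper than scoring every possible subset.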

---


```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ---------------------------------------
# Step 1: Create synthetic dataset
# ---------------------------------------
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    random_state=42
)

# ---------------------------------------
# Step 2: Train-Test Split
# ---------------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# ---------------------------------------
# Step 3: Standardize the Data
# ---------------------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# ---------------------------------------
# Step 4: Initialize Logistic Regression
# ---------------------------------------
estimator = LogisticRegression(solver='liblinear', random_state=42)

# ---------------------------------------
# Step 5: Apply RFE (Recursive Feature Elimination)
# ---------------------------------------
selector = RFE(estimator=estimator, n_features_to_select=10, step=1)
selector.fit(X_train_scaled, y_train)

# Transform data using selected features
X_train_selected = selector.transform(X_train_scaled)
X_test_selected = selector.transform(X_test_scaled)

# ---------------------------------------
# Step 6: Train Final Model
# ---------------------------------------
final_model = LogisticRegression(solver='liblinear', random_state=42)
final_model.fit(X_train_selected, y_train)

# ---------------------------------------
# Step 7: Evaluate
# ---------------------------------------
accuracy = final_model.score(X_test_selected, y_test)
print(f"Model accuracy with selected features: {accuracy:.4f}")

# Feature selection details
print("Selected Features Mask:\n", selector.support_)
print("Feature Ranking:\n", selector.ranking_)
```


```
Model accuracy with selected features: 0.8367
Selected Features Mask:
 [False  True  True  True False False False  True False False False  True
 False False  True  True  True  True  True False]
Feature Ranking:
 [ 4  1  1  1  6  2 11  1  8 10  7  1  3  9  1  1  1  1  1  5]
```
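
To read the printed mask and ranking, the sketch below copies the two arrays from the output above into plain Python lists and derives the selected column indices and the order in which features were eliminated. The variable names are illustrative; with `step=1`, the feature with the largest rank is the one removed first.

```python
# Values copied from the output above
mask = [False, True, True, True, False, False, False, True, False, False,
        False, True, False, False, True, True, True, True, True, False]
ranking = [4, 1, 1, 1, 6, 2, 11, 1, 8, 10, 7, 1, 3, 9, 1, 1, 1, 1, 1, 5]

# Column indices of the 10 selected features (rank 1)
selected_idx = [i for i, keep in enumerate(mask) if keep]

# Eliminated features, from first removed (largest rank) to last removed (rank 2)
eliminated = [i for i, r in enumerate(ranking) if r > 1]
elimination_order = sorted(eliminated, key=lambda i: ranking[i], reverse=True)

print(selected_idx)       # [1, 2, 3, 7, 11, 14, 15, 16, 17, 18]
print(elimination_order)  # [6, 9, 13, 8, 10, 4, 19, 0, 12, 5]
```

Feature 6 was judged least important and dropped first, while feature 5 was the last one removed before the 10 selected features remained.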
