### Feature Selection With GA
#### Adel Ahmadi

##### In this step, we use a genetic algorithm (GA) for feature selection. First, let's load the data and our most accurate model from the previous steps.

In [1]:
pip install kagglehub

Note: you may need to restart the kernel to use updated packages.


In [4]:
# Loading Dataset
import kagglehub

# Download latest version
path = kagglehub.dataset_download("paperxd/cleaned-life-expectancy-dataset")

print("Path to dataset files:", path)
print("File name: Cleaned-Life-Exp.csv")

Path to dataset files: /Users/adelahmadi/.cache/kagglehub/datasets/paperxd/cleaned-life-expectancy-dataset/versions/1
File name: Cleaned-Life-Exp.csv


In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the dataset downloaded above
filename = path + '/Cleaned-Life-Exp.csv'
data = pd.read_csv(filename)

# Bin the continuous life expectancy into three categories
bins = [-4, -1.5, 0.5, 3]
labels = ['Low', 'Medium', 'High']
data['lifeexp_category'] = pd.cut(data['Life expectancy'], bins=bins, labels=labels)

# Features: everything except the target columns, with Country one-hot encoded
X = data.drop(['Life expectancy', 'lifeexp_category'], axis=1)
column_names = data.columns
X = pd.get_dummies(X, columns=['Country'], drop_first=True)
Y = data['lifeexp_category']

# Hold out 30% of the rows for testing
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=20)

# Standardize using statistics from the training split only
scaler = StandardScaler()
X_Train_Scaled = scaler.fit_transform(X_Train)
X_Test_Scaled = scaler.transform(X_Test)

# Most accurate model from the previous steps: SVM with an RBF kernel
model = SVC(kernel='rbf', C=100, gamma=0.1)
model.fit(X_Train_Scaled, Y_Train)

# Evaluate on the held-out test set
y_predicted = model.predict(X_Test_Scaled)
print(classification_report(Y_Test, y_predicted))
svm_accuracy = accuracy_score(Y_Test, y_predicted)
print('SVM Accuracy:', svm_accuracy)

              precision    recall  f1-score   support

        High       0.93      0.92      0.93       317
         Low       0.84      0.71      0.77        86
      Medium       0.90      0.93      0.91       479

    accuracy                           0.91       882
   macro avg       0.89      0.85      0.87       882
weighted avg       0.91      0.91      0.90       882

SVM Accuracy: 0.9058956916099773
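
The binning step in the cell above relies on `pd.cut`, whose intervals are right-closed by default. With the bins used here that means (-4, -1.5] is Low, (-1.5, 0.5] is Medium, and (0.5, 3] is High, while values outside the range become NaN. The sketch below mirrors that rule in plain Python to make the boundary behaviour explicit. `categorize` is a hypothetical helper written for illustration only; it is not part of the notebook or of pandas.

```python
from bisect import bisect_left

BINS = [-4, -1.5, 0.5, 3]
LABELS = ['Low', 'Medium', 'High']


def categorize(value):
    """Mimic right-closed binning: (-4, -1.5], (-1.5, 0.5], (0.5, 3]."""
    # Values at or below the lowest edge, or above the highest edge,
    # fall outside every bin (pd.cut would return NaN here).
    if not (BINS[0] < value <= BINS[-1]):
        return None
    # bisect_left returns the index of the first edge >= value, so a value
    # equal to an edge is assigned to the bin that ends at that edge.
    return LABELS[bisect_left(BINS, value) - 1]


print([categorize(v) for v in (-2.0, -1.5, 0.0, 0.5, 2.9, 3.5)])
# ['Low', 'Low', 'Medium', 'Medium', 'High', None]
```

Note that a value exactly on an edge, such as -1.5 or 0.5, lands in the lower of the two neighbouring categories.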


Let's move on to feature selection with a GA. First, we install sklearn-genetic.

In [6]:
pip install sklearn-genetic

Note: you may need to restart the kernel to use updated packages.
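
Before running the library, it may help to see in miniature what a genetic algorithm for feature selection does. The sketch below is a toy, standard-library-only illustration and not the implementation used by sklearn-genetic. Each individual is a 0/1 mask over the features, and the population evolves through selection, crossover, and mutation. The parameter names echo those passed to `GeneticSelectionCV` in the next cell. Everything else is an assumption made for illustration: the toy fitness function, the tournament selection, the elitism step, and the -10000 penalty for invalid subset sizes. In the notebook, the fitness of a mask is instead the cross-validated accuracy of the SVM on the selected columns.

```python
import random


def evaluate(mask, informative, max_features):
    """Toy fitness: count informative features, lightly penalize subset size."""
    n_selected = sum(mask)
    if n_selected == 0 or n_selected > max_features:
        return -10000.0  # assumed hard penalty for an invalid subset size
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in informative)
    return hits - 0.1 * n_selected


def tournament(population, fitness, rng, size=3):
    """Return a copy of the fittest of `size` randomly drawn individuals."""
    return list(max(rng.sample(population, size), key=fitness))


def run_ga(n_features=20, informative=frozenset({1, 4, 7}), max_features=10,
           n_population=30, n_generations=25,
           crossover_proba=0.5, mutation_proba=0.2, seed=0):
    rng = random.Random(seed)

    def fitness(mask):
        return evaluate(mask, informative, max_features)

    # Start from random 0/1 masks
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(n_population)]
    history = []
    for _ in range(n_generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [list(elite)]  # elitism: the best mask always survives
        while len(children) < n_population:
            child = tournament(population, fitness, rng)
            if rng.random() < crossover_proba:
                # One-point crossover with a second selected parent
                mate = tournament(population, fitness, rng)
                cut = rng.randrange(1, n_features)
                child = child[:cut] + mate[cut:]
            if rng.random() < mutation_proba:
                # Flip one randomly chosen bit
                j = rng.randrange(n_features)
                child[j] = 1 - child[j]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_mask, best_per_generation = run_ga()
print("Selected feature indices:", [i for i, bit in enumerate(best_mask) if bit])
print("Best fitness per generation:", best_per_generation)
```

Because the best mask is copied unchanged into every new generation, the best fitness in this sketch can never decrease from one generation to the next.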


In [8]:
from genetic_selection import GeneticSelectionCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# Same SVM configuration as the baseline model above
model = SVC(kernel='rbf', C=100, gamma=0.1)

# Genetic algorithm for feature selection
ga_selector = GeneticSelectionCV(
    estimator=model,
    cv=5,  # 5-fold cross-validation
    verbose=1,
    scoring="accuracy",
    max_features=min(10, X_Train_Scaled.shape[1]),  # Select at most 10 features
    n_population=50,  # Number of individuals in the population
    n_generations=50,  # Number of generations
    crossover_proba=0.5,  # Probability of crossover
    mutation_proba=0.2,  # Probability of mutation
    n_jobs=-1  # Use all available cores
)

# Fit the GA selector on the scaled training data
ga_selector = ga_selector.fit(X_Train_Scaled, Y_Train)

# support_ is a boolean mask over the training columns; map it back to column names
selected_features = X_Train.columns[ga_selector.support_]
print("Selected Features Recommended by GA:", selected_features)

# Keep only the selected columns of the unscaled DataFrames
X_Train_Selected = X_Train[selected_features]
X_Test_Selected = X_Test[selected_features]

# Refit the scaler on the selected columns only (this overwrites the full-feature fit)
X_Train_Selected_Scaled = scaler.fit_transform(X_Train_Selected)
X_Test_Selected_Scaled = scaler.transform(X_Test_Selected)

# Train the SVM on the selected features
model.fit(X_Train_Selected_Scaled, Y_Train)
y_predicted_ga = model.predict(X_Test_Selected_Scaled)

# Evaluate on the held-out test set
print(classification_report(Y_Test, y_predicted_ga))
svm_ga_accuracy = accuracy_score(Y_Test, y_predicted_ga)
print("SVM Model Accuracy After GA Feature Selection:", svm_ga_accuracy)

Selecting features with genetic algorithm.
gen	nevals	avg                            	std                            	min                            	max                               
0  	50    	[ 0.565397  5.22      0.008447]	[ 0.051547  2.900276  0.007978]	[ 0.526264  1.        0.00098 ]	[  0.704766  10.         0.031735]
1  	33    	[-2199.524243     9.14      2200.01172 ]	[ 4142.715703     5.051772  4142.456811]	[-10000.            1.            0.002715]	[     0.704766     24.        10000.      ]
2  	23    	[-2199.496588     8.34      2200.009679]	[ 4142.73039      4.777489  4142.457895]	[-10000.            1.            0.002715]	[     0.706715     23.        10000.      ]
3  	29    	[-1599.436004     7.3       1600.007846]	[ 3666.306704     5.85235   3666.057132]	[-10000.           1.           0.00098]   	[     0.720329     27.        10000.      ]
4  	34    	[-1599.415332     5.82      1600.006181]	[ 3666.315726     4.568107  3666.057858]	[-10000.            2.            0.0