# LLM-Based Feature Selection Methods Comparison

This notebook compares LLM-based feature selection methods with traditional approaches.

## Methods Implemented:
1. **Text-based Feature Selection** (Li et al. 2024)
2. **LLM4FS Hybrid Approach** (Li & Xiu 2025) 
3. **Traditional Baselines** (Random Forest, Mutual Information)

## Datasets (All Classification):
- **CMC** (Contraceptive Method Choice) - 3 classes
- **Vehicle** - 4 classes  
- **Electricity** - 2 classes

## Evaluation Metric:
- **Accuracy** for all classification tasks

## Environment Setup & Dependencies

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import json
import os
import time
import warnings
from typing import List, Dict, Any, Optional, Tuple
from pathlib import Path

# API clients
import anthropic
import openai
from dotenv import load_dotenv

# ML libraries
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, roc_auc_score, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')
np.random.seed(42)

print("All imports successful!")

All imports successful!


## API Key Configuration

Create a `.env` file in your thesis directory with your API keys:

```
ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here
```

In [2]:
# Load environment variables
load_dotenv()

# Check for API keys
anthropic_key = os.getenv('ANTHROPIC_API_KEY')
openai_key = os.getenv('OPENAI_API_KEY')

print("API Key Status:")
print(f"Anthropic API Key: {'Found' if anthropic_key else 'Not found'}")
print(f"OpenAI API Key: {'Found' if openai_key else 'Not found'}")

# Initialize API clients
anthropic_client = None
openai_client = None

if anthropic_key:
    anthropic_client = anthropic.Anthropic(api_key=anthropic_key)
    print("Anthropic client initialized")

if openai_key:
    openai_client = openai.OpenAI(api_key=openai_key)
    print("OpenAI client initialized")

if not anthropic_key and not openai_key:
    print("\nNo API keys found. Please add them to your .env file.")
    print("Create a .env file in your thesis directory with:")
    print("ANTHROPIC_API_KEY=your_key_here")
    print("OPENAI_API_KEY=your_key_here")

API Key Status:
Anthropic API Key: Found
OpenAI API Key: Found
Anthropic client initialized
OpenAI client initialized


## Dataset Loading and Preparation

In [3]:
# Dataset paths
DATASET_DIR = Path("datasets_csv")

# Available datasets from your thesis tasks
AVAILABLE_DATASETS = [
    'cmc', 'connect-4', 'electricity', 'eye_movements', 
    'kc1', 'phoneme', 'pol', 'splice', 'vehicle'
]

def load_dataset(dataset_name: str) -> Tuple[pd.DataFrame, Dict]:
    """
    Load dataset and its metadata.
    
    Args:
        dataset_name: Name of the dataset to load
        
    Returns:
        Tuple of (dataframe, metadata_dict)
    """
    csv_path = DATASET_DIR / f"{dataset_name}.csv"
    metadata_path = DATASET_DIR / f"{dataset_name}_metadata.json"
    
    if not csv_path.exists():
        raise FileNotFoundError(f"Dataset {dataset_name} not found at {csv_path}")
    
    # Load dataset
    df = pd.read_csv(csv_path)
    
    # Load metadata if available
    metadata = {}
    if metadata_path.exists():
        with open(metadata_path, 'r') as f:
            metadata = json.load(f)
    
    return df, metadata

def get_dataset_info(dataset_name: str) -> Dict:
    """
    Get comprehensive information about a dataset.
    """
    df, metadata = load_dataset(dataset_name)
    
    info = {
        'name': dataset_name,
        'shape': df.shape,
        'columns': list(df.columns),
        'dtypes': df.dtypes.to_dict(),
        'missing_values': df.isnull().sum().to_dict(),
        'metadata': metadata,
        'sample_data': df.head(3).to_dict('records')
    }
    
    return info

# Test loading a dataset
print("Available datasets:")
for i, dataset in enumerate(AVAILABLE_DATASETS, 1):
    print(f"{i:2d}. {dataset}")

# Load first dataset as example
if AVAILABLE_DATASETS:
    test_dataset = AVAILABLE_DATASETS[0]
    print(f"\nLoading example dataset: {test_dataset}")
    try:
        df_test, metadata_test = load_dataset(test_dataset)
        print(f"Successfully loaded {test_dataset}")
        print(f"Shape: {df_test.shape}")
        print(f"Columns: {list(df_test.columns)}")
    except Exception as e:
        print(f"Error loading {test_dataset}: {e}")

Available datasets:
 1. cmc
 2. connect-4
 3. electricity
 4. eye_movements
 5. kc1
 6. phoneme
 7. pol
 8. splice
 9. vehicle

Loading example dataset: cmc
Successfully loaded cmc
Shape: (1473, 10)
Columns: ['Wifes_age', 'Wifes_education', 'Husbands_education', 'Number_of_children_ever_born', 'Wifes_religion', 'Wifes_now_working%3F', 'Husbands_occupation', 'Standard-of-living_index', 'Media_exposure', 'target']
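
`load_dataset` looks for an optional `<name>_metadata.json` file next to each CSV, and the selectors defined later read the keys `task_type`, `domain`, and `feature_descriptions` from it. The sketch below writes and reloads such a file in a temporary directory. The key names come from the code in this notebook; every value shown is a made-up placeholder, not the contents of the real metadata files.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata for the cmc dataset. The keys match what the selectors
# in this notebook read; the values are illustrative placeholders only.
example_metadata = {
    "task_type": "classification",
    "domain": "demographic survey",
    "feature_descriptions": {
        "Wifes_age": "Age of the wife in years",
        "Wifes_education": "Education level of the wife (ordinal)",
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_path = Path(tmp_dir) / "cmc_metadata.json"
    metadata_path.write_text(json.dumps(example_metadata, indent=2))

    # Same read pattern as load_dataset(): parse the JSON into a dict.
    with open(metadata_path, "r") as f:
        reloaded = json.load(f)

# Features without a description fall back to a generic label in the prompt.
missing = reloaded["feature_descriptions"].get(
    "Media_exposure", "Numerical/Categorical feature"
)
print(reloaded["task_type"], "|", missing)
```

When no metadata file exists, `load_dataset` returns an empty dict and the prompt builders use their generic defaults for every field.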


## LLM API Interface Classes

In [4]:
class LLMInterface:
    """
    Base interface for LLM API calls.
    """
    
    def __init__(self, provider: str = "anthropic", model: Optional[str] = None):
        self.provider = provider
        self.model = model or self._get_default_model()
        self.client = self._initialize_client()
        
    def _get_default_model(self) -> str:
        """Get default model for each provider."""
        defaults = {
            "anthropic": "claude-3-haiku-20240307",
            "openai": "gpt-3.5-turbo"
        }
        return defaults.get(self.provider, "claude-3-haiku-20240307")
    
    def _initialize_client(self):
        """Initialize the appropriate API client."""
        if self.provider == "anthropic" and anthropic_client:
            return anthropic_client
        elif self.provider == "openai" and openai_client:
            return openai_client
        else:
            raise ValueError(f"No API client available for provider: {self.provider}")
    
    def call_llm(self, prompt: str, max_tokens: int = 1000, temperature: float = 0.1) -> Optional[str]:
        """
        Make API call to LLM.
        
        Args:
            prompt: The prompt to send to the LLM
            max_tokens: Maximum tokens to generate
            temperature: Temperature for generation (0.0 to 1.0)
            
        Returns:
            Generated text response, or None if the API call fails
        """
        try:
            if self.provider == "anthropic":
                response = self.client.messages.create(
                    model=self.model,
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=[{"role": "user", "content": prompt}]
                )
                return response.content[0].text
            
            elif self.provider == "openai":
                response = self.client.chat.completions.create(
                    model=self.model,
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=[{"role": "user", "content": prompt}]
                )
                return response.choices[0].message.content
            
        except Exception as e:
            print(f"API call failed: {e}")
            return None
            
    def test_connection(self) -> bool:
        """
        Test if the LLM API connection is working.
        """
        test_prompt = "Hello! Please respond with 'Connection successful' if you can read this."
        response = self.call_llm(test_prompt, max_tokens=50)
        
        if response:
            print(f"{self.provider.title()} API connection successful")
            print(f"Model: {self.model}")
            print(f"Response: {response[:100]}...")
            return True
        else:
            print(f"{self.provider.title()} API connection failed")
            return False

# Test API connections
print("Testing API connections...\n")

# Test Anthropic if available
if anthropic_client:
    try:
        anthropic_llm = LLMInterface(provider="anthropic")
        anthropic_llm.test_connection()
    except Exception as e:
        print(f"Anthropic connection error: {e}")

print()

# Test OpenAI if available  
if openai_client:
    try:
        openai_llm = LLMInterface(provider="openai")
        openai_llm.test_connection()
    except Exception as e:
        print(f"OpenAI connection error: {e}")

Testing API connections...

Anthropic API connection successful
Model: claude-3-haiku-20240307
Response: Connection successful...

Openai API connection successful
Model: gpt-3.5-turbo
Response: Connection successful...


## Method 1: Text-based Feature Selection (Li et al. 2024)

This method uses dataset descriptions and feature metadata to guide LLM-based feature selection without requiring actual data samples.

In [5]:
class TextBasedFeatureSelector:
    """
    Implements text-based feature selection using LLM semantic understanding.
    Based on Li et al. 2024 - "Exploring Large Language Models for Feature Selection"
    """
    
    def __init__(self, llm_interface: LLMInterface):
        self.llm = llm_interface
        self.feature_scores = {}
        
    def create_dataset_description(self, dataset_info: Dict) -> str:
        """
        Create a comprehensive dataset description for the LLM.
        """
        name = dataset_info['name']
        shape = dataset_info['shape']
        columns = dataset_info['columns']
        metadata = dataset_info.get('metadata', {})
        
        description = f"""
Dataset: {name}
Description: This is a tabular dataset with {shape[0]} samples and {shape[1] - 1} features.
Task: {metadata.get('task_type', 'Classification/Regression')}
Target Variable: {columns[-1] if columns else 'Unknown'}
Domain: {metadata.get('domain', 'General')}

Features:
"""
        
        for i, col in enumerate(columns[:-1]):
            description += f"- {col}: {metadata.get('feature_descriptions', {}).get(col, 'Numerical/Categorical feature')}\n"
            
        return description.strip()
    
    def create_feature_selection_prompt(self, dataset_description: str, feature_name: str) -> str:
        """
        Create a prompt for feature importance scoring.
        """
        prompt = f"""
You are an expert data scientist specializing in feature selection for machine learning.

{dataset_description}

Your task is to evaluate the importance of the feature "{feature_name}" for predicting the target variable.

Please provide:
1. An importance score between 0.0 and 1.0 (where 1.0 is most important)
2. A brief reasoning for your score

Format your response as JSON:
{{
    "feature": "{feature_name}",
    "importance_score": 0.XX,
    "reasoning": "Brief explanation of why this feature is important/unimportant"
}}
"""
        return prompt
    
    def score_feature(self, dataset_info: Dict, feature_name: str) -> Dict:
        """
        Score a single feature using LLM.
        """
        dataset_desc = self.create_dataset_description(dataset_info)
        prompt = self.create_feature_selection_prompt(dataset_desc, feature_name)
        
        response = self.llm.call_llm(prompt, max_tokens=300, temperature=0.1)
        
        if not response:
            return {"feature": feature_name, "importance_score": 0.0, "reasoning": "API call failed"}
        
        try:
            # Try to parse JSON response
            result = json.loads(response)
            return result
        except json.JSONDecodeError:
            # Fallback: extract score from text
            import re
            score_match = re.search(r'"importance_score":\s*([0-9.]+)', response)
            score = float(score_match.group(1)) if score_match else 0.5
            
            return {
                "feature": feature_name,
                "importance_score": score,
                "reasoning": "Parsed from unstructured response"
            }
    
    def select_features(self, dataset_info: Dict, top_k: int = None) -> List[Dict]:
        """
        Select top features using text-based LLM approach.
        
        Args:
            dataset_info: Dataset information dictionary
            top_k: Number of top features to select (None for all)
            
        Returns:
            List of feature scores sorted by importance
        """
        features = dataset_info['columns'][:-1]  # Exclude target variable
        results = []
        
        print(f"Scoring {len(features)} features using text-based approach...")
        
        for i, feature in enumerate(features):
            print(f"{i+1}/{len(features)}: {feature}")
            result = self.score_feature(dataset_info, feature)
            results.append(result)
            
            # Small delay to be respectful to API
            time.sleep(0.5)
        
        # Sort by importance score
        results.sort(key=lambda x: x['importance_score'], reverse=True)
        
        if top_k:
            results = results[:top_k]
            
        self.feature_scores = {r['feature']: r['importance_score'] for r in results}
        
        return results

print("Text-based Feature Selector class defined")

Text-based Feature Selector class defined
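
Models often wrap the requested JSON in extra prose, in which case `json.loads(response)` fails and `score_feature` falls back to its regex. A slightly more forgiving parser could pull out the first JSON object before decoding. The sketch below assumes the object is flat (no nested braces), as in the format the prompt requests. `parse_score_response` is a hypothetical helper, and the sample reply is invented.

```python
import json
import re
from typing import Optional

def parse_score_response(response: str) -> Optional[dict]:
    """Return the first flat JSON object in an LLM reply, or None (hypothetical helper)."""
    # Assumes the object has no nested braces, as in the prompt's requested format.
    match = re.search(r"\{[^{}]*\}", response)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    try:
        score = float(parsed.get("importance_score", 0.0))
    except (TypeError, ValueError):
        return None
    # Clamp the score to the [0.0, 1.0] range the prompt asks for.
    parsed["importance_score"] = min(1.0, max(0.0, score))
    return parsed

# Invented reply with text before and after the JSON object.
reply = (
    "Here is my assessment:\n"
    '{"feature": "Wifes_age", "importance_score": 0.8, "reasoning": "Example text"}\n'
    "Let me know if you need anything else."
)
parsed_reply = parse_score_response(reply)
print(parsed_reply["feature"], parsed_reply["importance_score"])
```

Clamping also guards the later sort against out-of-range scores.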


## Method 2: LLM4FS Hybrid Approach (Li & Xiu 2025)

This method combines LLM reasoning with traditional statistical methods by providing data samples to the LLM and instructing it to apply classical feature selection techniques.

In [6]:
class LLM4FS_HybridSelector:
    """
    Implements LLM4FS hybrid feature selection approach.
    Based on Li & Xiu 2025 - "LLM4FS: Leveraging Large Language Models for Feature Selection"
    """
    
    def __init__(self, llm_interface: LLMInterface):
        self.llm = llm_interface
        self.methods = [
            "random_forest", "forward_selection", "backward_selection",
            "recursive_feature_elimination", "mutual_information", "mrmr"
        ]
        
    def prepare_data_sample(self, df: pd.DataFrame, sample_size: int = 200) -> str:
        """
        Prepare a data sample for the LLM (typically 200 samples as per paper).
        """
        # Sample data (or use all if less than sample_size)
        if len(df) > sample_size:
            sample_df = df.sample(n=sample_size, random_state=42)
        else:
            sample_df = df.copy()
            
        # Convert to CSV string format
        return sample_df.to_csv(index=False)
    
    def create_hybrid_prompt(self, data_csv: str, task_type: str = "classification") -> str:
        """
        Create prompt for hybrid LLM4FS approach.
        """
        prompt = f"""
You are an expert data scientist. Please analyze the following dataset and apply traditional feature selection methods to rank features by importance.

Dataset (CSV format):
{data_csv}

Task: This is a {task_type} problem. The last column is the target variable.

Please apply the following feature selection methods and provide importance scores:
1. Random Forest feature importance
2. Mutual Information
3. Recursive Feature Elimination (RFE)
4. Forward/Backward Selection

For each feature (excluding the target), provide an importance score between 0.0 and 1.0.

Format your response EXACTLY as JSON with this structure:
{{
    "method": "hybrid_llm4fs",
    "features": [
        {{"name": "feature_name", "importance_score": 0.XX, "reasoning": "Brief explanation"}},
        {{"name": "feature_name", "importance_score": 0.XX, "reasoning": "Brief explanation"}}
    ]
}}

Base your analysis on statistical relationships in the data, not just semantic understanding.
"""
        return prompt
    
    def select_features(self, df: pd.DataFrame, dataset_info: Dict, top_k: int = None) -> List[Dict]:
        """
        Select features using LLM4FS hybrid approach.
        """
        print("Applying LLM4FS hybrid feature selection...")
        
        # Prepare data sample
        data_csv = self.prepare_data_sample(df)
        task_type = dataset_info.get('metadata', {}).get('task_type', 'classification')
        
        # Create prompt
        prompt = self.create_hybrid_prompt(data_csv, task_type)
        
        # Make LLM call with higher token limit
        response = self.llm.call_llm(prompt, max_tokens=3000, temperature=0.1)
        
        if not response:
            print("API call failed")
            return []
        
        try:
            # Parse JSON response
            result = json.loads(response)
            features = result.get('features', [])
            
            # Sort by importance score
            features.sort(key=lambda x: x.get('importance_score', 0), reverse=True)
            
            if top_k:
                features = features[:top_k]
                
            return features
            
        except json.JSONDecodeError as e:
            print(f"Failed to parse LLM response as complete JSON: {e}")
            print("Attempting to extract partial features from response...")
            
            # Try to extract partial JSON from truncated response
            features = self._extract_partial_features(response)
            
            if features:
                print(f"Successfully extracted {len(features)} features from partial response")
                features.sort(key=lambda x: x.get('importance_score', 0), reverse=True)
                
                if top_k:
                    features = features[:top_k]
                    
                return features
            else:
                print("Could not extract features from response")
                print(f"Response preview: {response[:300]}...")
                return []
    
    def _extract_partial_features(self, response: str) -> List[Dict]:
        """
        Extract features from a potentially truncated JSON response.
        """
        import re
        
        features = []
        
        # Try to find feature objects even if JSON is incomplete
        feature_pattern = r'"name":\s*"([^"]+)",\s*"importance_score":\s*([0-9.]+),\s*"reasoning":\s*"([^"]*)"'
        matches = re.findall(feature_pattern, response)
        
        for name, score, reasoning in matches:
            try:
                features.append({
                    'name': name,
                    'importance_score': float(score),
                    'reasoning': reasoning
                })
            except ValueError:
                continue
                
        return features

print("LLM4FS Hybrid Selector class defined")

LLM4FS Hybrid Selector class defined
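
The feature names returned by the LLM are later used to index the feature matrix (`X[selected_features]`), which raises a `KeyError` if a name does not match a column exactly. One possible safeguard is to filter the LLM output against the real column list first. This is a sketch under that assumption: `keep_known_features` is a hypothetical helper and the LLM output shown is made up.

```python
from typing import Dict, List

def keep_known_features(llm_features: List[Dict], columns: List[str]) -> List[Dict]:
    """Drop entries whose name is not an actual column, keeping first occurrences."""
    known = set(columns)
    seen = set()
    kept = []
    for item in llm_features:
        name = str(item.get("name", "")).strip()
        if name in known and name not in seen:
            seen.add(name)
            kept.append({**item, "name": name})
    return kept

columns = ["Wifes_age", "Wifes_education", "Media_exposure"]
# Made-up LLM output: one misspelled name, one padded name, one duplicate.
llm_features = [
    {"name": "Wifes_age", "importance_score": 0.9},
    {"name": "Wife_age", "importance_score": 0.7},
    {"name": " Wifes_education ", "importance_score": 0.6},
    {"name": "Wifes_age", "importance_score": 0.5},
]
valid = keep_known_features(llm_features, columns)
print([f["name"] for f in valid])
```

Applying the filter before `top_k` truncation keeps an invalid name from taking up one of the selected slots.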


## Method 3: Traditional Baseline Methods

For comparison, we'll also implement traditional feature selection methods as baselines.

In [7]:
class TraditionalFeatureSelector:
    """
    Traditional feature selection methods for comparison.
    """
    
    def __init__(self):
        self.feature_scores = {}
        
    def mutual_information_selection(self, X: pd.DataFrame, y: pd.Series, top_k: int = None) -> List[Dict]:
        """
        Mutual information based feature selection.
        """
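        # String labels are treated as classification. Integer-coded class labels
        # (e.g. CMC's 1/2/3 target) fall through to the regression branch below.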
        if y.dtype == 'object':
            le = LabelEncoder()
            y_encoded = le.fit_transform(y)
            scores = mutual_info_classif(X, y_encoded, random_state=42)
        else:
            from sklearn.feature_selection import mutual_info_regression
            scores = mutual_info_regression(X, y, random_state=42)
            
        results = []
        for i, feature in enumerate(X.columns):
            results.append({
                'name': feature,
                'importance_score': scores[i],
                'reasoning': 'Mutual information score'
            })
            
        results.sort(key=lambda x: x['importance_score'], reverse=True)
        
        if top_k:
            results = results[:top_k]
            
        return results
        
    def random_forest_selection(self, X: pd.DataFrame, y: pd.Series, top_k: int = None) -> List[Dict]:
        """
        Random Forest based feature importance.
        """
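        # String labels are treated as classification. Integer-coded class labels
        # (e.g. CMC's 1/2/3 target) fall through to the regression branch below.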
        if y.dtype == 'object':
            le = LabelEncoder()
            y_encoded = le.fit_transform(y)
            rf = RandomForestClassifier(n_estimators=100, random_state=42)
        else:
            y_encoded = y
            rf = RandomForestRegressor(n_estimators=100, random_state=42)
            
        rf.fit(X, y_encoded)
        importances = rf.feature_importances_
        
        results = []
        for i, feature in enumerate(X.columns):
            results.append({
                'name': feature,
                'importance_score': importances[i],
                'reasoning': 'Random Forest feature importance'
            })
            
        results.sort(key=lambda x: x['importance_score'], reverse=True)
        
        if top_k:
            results = results[:top_k]
            
        return results

print("Traditional Feature Selector class defined")

Traditional Feature Selector class defined
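
For intuition about what the mutual information baseline measures, here is a plug-in estimate for two discrete variables, $I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$, in nats. It is only an illustration: scikit-learn's `mutual_info_classif` uses a nearest-neighbour based estimator for continuous features, so its scores will generally not match this formula. The function name is hypothetical and the toy data is invented.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def discrete_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi

target = [0, 0, 1, 1]
informative = discrete_mutual_information([0, 0, 1, 1], target)    # copies the target
uninformative = discrete_mutual_information([0, 1, 0, 1], target)  # independent of it
print(f"informative={informative:.4f} uninformative={uninformative:.4f}")
```

A feature that determines a balanced binary target reaches the maximum of $\ln 2$ nats, while an independent feature scores zero.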


## Evaluation Framework

Framework for comparing different feature selection methods.

In [8]:
class FeatureSelectionEvaluator:
    """
    Improved evaluator that properly detects classification vs regression tasks.
    """
    
    def __init__(self):
        self.results = {}
        
    def _is_classification_task(self, y: pd.Series) -> bool:
        """
        Determine if this is a classification task based on target variable characteristics.
        """
        # Check if dtype is object (string labels)
        if y.dtype == 'object':
            return True
        
        # Check if it's integer with small number of unique values (likely categorical)
        if pd.api.types.is_integer_dtype(y):
            n_unique = y.nunique()
            n_samples = len(y)
            # At most 20 unique values, OR unique values make up less than 5% of the samples
            if n_unique <= 20 or (n_unique / n_samples) < 0.05:
                return True
        
        # For continuous variables, assume regression
        return False
        
    def evaluate_method(self, method_name: str, selected_features: List[str], 
                       X: pd.DataFrame, y: pd.Series) -> Dict:
        """
        Evaluate a feature selection method by training a model on selected features.
        """
        if not selected_features:
            return {'error': 'No features selected'}
            
        # Select features
        X_selected = X[selected_features]
        
        # Determine task type
        is_classification = self._is_classification_task(y)
        
        if is_classification:
            # Classification task
            if y.dtype == 'object':
                le = LabelEncoder()
                y_encoded = le.fit_transform(y)
            else:
                y_encoded = y  # Already numeric
            model = LogisticRegression(random_state=42, max_iter=1000)
            scoring = 'accuracy'
        else:
            # Regression task
            y_encoded = y
            model = LinearRegression()
            scoring = 'r2'
            
        # Cross-validation
        scores = cross_val_score(model, X_selected, y_encoded, cv=5, scoring=scoring)
        
        result = {
            'method': method_name,
            'num_features': len(selected_features),
            'selected_features': selected_features,
            'cv_scores': scores.tolist(),
            'mean_score': scores.mean(),
            'std_score': scores.std(),
            'metric': scoring,
            'task_type': 'classification' if is_classification else 'regression'
        }
        
        self.results[method_name] = result
        return result
    
    def compare_methods(self) -> pd.DataFrame:
        """
        Create comparison table of all evaluated methods.
        """
        comparison_data = []
        
        for method_name, result in self.results.items():
            if 'error' not in result:
                comparison_data.append({
                    'Method': method_name,
                    'Num Features': result['num_features'],
                    'Mean Score': result['mean_score'],
                    'Std Score': result['std_score'],
                    'Metric': result['metric']
                })
                
        df = pd.DataFrame(comparison_data)
        if not df.empty:
            df = df.sort_values('Mean Score', ascending=False)
            
        return df
    
    def plot_comparison(self):
        """
        Plot comparison of methods.
        """
        df = self.compare_methods()
        
        if df.empty:
            print("No results to plot")
            return
            
        plt.figure(figsize=(12, 6))
        
        # Performance comparison
        plt.subplot(1, 2, 1)
        plt.bar(df['Method'], df['Mean Score'], yerr=df['Std Score'], capsize=5)
        plt.title('Feature Selection Method Performance')
        plt.ylabel(f'Score ({df["Metric"].iloc[0]})')
        plt.xticks(rotation=45)
        
        # Number of features
        plt.subplot(1, 2, 2)
        plt.bar(df['Method'], df['Num Features'])
        plt.title('Number of Selected Features')
        plt.ylabel('Number of Features')
        plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.show()

print("Improved Feature Selection Evaluator class defined")

Improved Feature Selection Evaluator class defined
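
`evaluate_method` summarises the five fold scores with `scores.mean()` and `scores.std()`. NumPy's `std` defaults to the population standard deviation (`ddof=0`), which is slightly smaller than the sample standard deviation for the same five values. The sketch below shows the difference using only the standard library; the fold accuracies are invented and are not results from this notebook.

```python
import statistics

# Made-up fold accuracies, purely for illustration (not results from this notebook).
fold_scores = [0.50, 0.52, 0.48, 0.51, 0.49]

mean_score = statistics.fmean(fold_scores)
population_std = statistics.pstdev(fold_scores)  # matches numpy's default (ddof=0)
sample_std = statistics.stdev(fold_scores)       # corresponds to ddof=1

print(f"mean={mean_score:.4f} pstdev={population_std:.4f} stdev={sample_std:.4f}")
```

Whichever convention is used, it should be stated alongside the error bars so that results remain comparable across methods.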


## Multi-Dataset Classification Experiment

Comparing all methods on three classification datasets:
1. **CMC** (Contraceptive Method Choice) - 3 classes, social/demographic data
2. **Vehicle** - 4 classes, geometric/visual features 
3. **Electricity** - 2 classes, energy/time series data

All datasets evaluated using **accuracy** as the performance metric.

In [9]:
# Multi-dataset experiment setup
datasets_to_test = ["cmc", "vehicle", "electricity"]
all_results = {}

print("="*80)
print("MULTI-DATASET FEATURE SELECTION EXPERIMENT")
print("="*80)

for dataset_name in datasets_to_test:
    print(f"\n{'='*20} DATASET: {dataset_name.upper()} {'='*20}")
    
    # Load dataset
    df, metadata = load_dataset(dataset_name)
    dataset_info = get_dataset_info(dataset_name)
    
    print(f"Shape: {df.shape}")
    print(f"Features: {len(dataset_info['columns'][:-1])}")
    print(f"Target: {dataset_info['columns'][-1]}")
    
    # Prepare data
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    
    print(f"Target distribution: {dict(y.value_counts().head(3))}")
    
    # Store dataset info
    all_results[dataset_name] = {
        'dataset_info': dataset_info,
        'X': X,
        'y': y,
        'df': df
    }

print(f"\nLoaded {len(datasets_to_test)} datasets successfully!")

MULTI-DATASET FEATURE SELECTION EXPERIMENT

Shape: (1473, 10)
Features: 9
Target: target
Target distribution: {1: np.int64(629), 3: np.int64(511), 2: np.int64(333)}

Shape: (846, 19)
Features: 18
Target: target
Target distribution: {'bus': np.int64(218), 'saab': np.int64(217), 'opel': np.int64(212)}

Shape: (45312, 9)
Features: 8
Target: target
Target distribution: {'DOWN': np.int64(26075), 'UP': np.int64(19237)}

Loaded 3 datasets successfully!
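
Since accuracy is the only metric, it helps to know what a classifier that always predicts the most frequent class would score on each dataset. The counts below are read off the output above (largest class count and number of rows per dataset); the helper name is hypothetical.

```python
# (majority-class count, total rows) read off the cell output above
majority_and_total = {
    "cmc": (629, 1473),
    "vehicle": (218, 846),
    "electricity": (26075, 45312),
}

def majority_baseline(majority_count: int, total: int) -> float:
    """Accuracy of always predicting the most frequent class."""
    return majority_count / total

baselines = {
    name: majority_baseline(*counts) for name, counts in majority_and_total.items()
}
for name, acc in baselines.items():
    print(f"{name}: {acc:.3f}")
```

These floors (roughly 0.43, 0.26, and 0.58) give a reference point when reading the accuracy of each feature selection method.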


In [10]:
# Initialize methods and results storage
llm = LLMInterface(provider="anthropic")
text_selector = TextBasedFeatureSelector(llm)
hybrid_selector = LLM4FS_HybridSelector(llm)
traditional_selector = TraditionalFeatureSelector()

# Storage for all experiment results
experiment_results = {}
performance_summary = []

print("All methods initialized for multi-dataset experiment")

All methods initialized for multi-dataset experiment


In [11]:
# Run experiments on all datasets
for dataset_name in datasets_to_test:
    print(f"\n{'='*60}")
    print(f"PROCESSING DATASET: {dataset_name.upper()}")
    print(f"{'='*60}")
    
    # Get dataset components
    dataset_info = all_results[dataset_name]['dataset_info']
    X = all_results[dataset_name]['X']
    y = all_results[dataset_name]['y']
    df = all_results[dataset_name]['df']
    
    # Initialize evaluator for this dataset
    evaluator = FeatureSelectionEvaluator()
    dataset_results = {}
    
    print(f"\n1. TEXT-BASED FEATURE SELECTION")
    print("-" * 40)
    try:
        text_results = text_selector.select_features(dataset_info, top_k=5)
        text_features = [r['feature'] for r in text_results]
        
        if text_features:
            print(f"Selected: {text_features}")
            text_eval = evaluator.evaluate_method("Text-based", text_features, X, y)
            dataset_results['text'] = {
                'features': text_features,
                'performance': text_eval
            }
        else:
            print("Text-based method failed")
            dataset_results['text'] = None
            
    except Exception as e:
        print(f"Text-based method error: {e}")
        dataset_results['text'] = None
    
    print(f"\n2. LLM4FS HYBRID APPROACH")
    print("-" * 40)
    try:
        hybrid_results = hybrid_selector.select_features(df, dataset_info, top_k=5)
        hybrid_features = [r['name'] for r in hybrid_results] if hybrid_results else []
        
        if hybrid_features:
            print(f"Selected: {hybrid_features}")
            hybrid_eval = evaluator.evaluate_method("LLM4FS Hybrid", hybrid_features, X, y)
            dataset_results['hybrid'] = {
                'features': hybrid_features,
                'performance': hybrid_eval
            }
        else:
            print("LLM4FS Hybrid method failed")
            dataset_results['hybrid'] = None
            
    except Exception as e:
        print(f"LLM4FS Hybrid method error: {e}")
        dataset_results['hybrid'] = None
    
    print(f"\n3. TRADITIONAL METHODS")
    print("-" * 40)
    try:
        # Random Forest
        rf_results = traditional_selector.random_forest_selection(X, y, top_k=5)
        rf_features = [r['name'] for r in rf_results]
        print(f"Random Forest selected: {rf_features}")
        rf_eval = evaluator.evaluate_method("Random Forest", rf_features, X, y)
        
        # Mutual Information
        mi_results = traditional_selector.mutual_information_selection(X, y, top_k=5)
        mi_features = [r['name'] for r in mi_results]
        print(f"Mutual Information selected: {mi_features}")
        mi_eval = evaluator.evaluate_method("Mutual Information", mi_features, X, y)
        
        dataset_results['random_forest'] = {
            'features': rf_features,
            'performance': rf_eval
        }
        dataset_results['mutual_info'] = {
            'features': mi_features,
            'performance': mi_eval
        }
        
    except Exception as e:
        print(f"Traditional methods error: {e}")
        dataset_results['random_forest'] = None
        dataset_results['mutual_info'] = None
    
    # Store results for this dataset
    experiment_results[dataset_name] = {
        'evaluator': evaluator,
        'results': dataset_results
    }
    
    # Show performance for this dataset
    comparison_df = evaluator.compare_methods()
    if not comparison_df.empty:
        print(f"\n{dataset_name.upper()} PERFORMANCE:")
        print(comparison_df.to_string(index=False))
        
        # Add to summary
        for _, row in comparison_df.iterrows():
            performance_summary.append({
                'Dataset': dataset_name,
                'Method': row['Method'],
                'Score': row['Mean Score'],
                'Std': row['Std Score'],
                'Metric': row['Metric']
            })
    else:
        print(f"\nNo valid results for {dataset_name}")

print(f"\n{'='*60}")
print("ALL DATASETS PROCESSED")
print(f"{'='*60}")


PROCESSING DATASET: CMC

1. TEXT-BASED FEATURE SELECTION
----------------------------------------
Scoring 9 features using text-based approach...
1/9: Wifes_age
2/9: Wifes_education
3/9: Husbands_education
4/9: Number_of_children_ever_born
5/9: Wifes_religion
6/9: Wifes_now_working%3F
7/9: Husbands_occupation
8/9: Standard-of-living_index
9/9: Media_exposure
Selected: ['Wifes_age', 'Wifes_education', 'Standard-of-living_index', 'Husbands_education', 'Number_of_children_ever_born']

2. LLM4FS HYBRID APPROACH
----------------------------------------
Applying LLM4FS hybrid feature selection...
Selected: ['Standard-of-living_index', 'Number_of_children_ever_born', 'Wifes_education', 'Wifes_age', 'Husbands_education']

3. TRADITIONAL METHODS
----------------------------------------
Random Forest selected: ['Wifes_age', 'Number_of_children_ever_born', 'Wifes_education', 'Standard-of-living_index', 'Husbands_occupation']
Mutual Information selected: ['Number_of_children_ever_born', 'Husbands
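
The selections printed above can also be compared numerically. The following is a minimal sketch, not part of the original pipeline: it computes the pairwise Jaccard similarity between the CMC feature sets. The names `cmc_selections`, `jaccard`, and `overlap` are illustrative assumptions, and the lists are copied by hand from the output above.

```python
from itertools import combinations

# Selections copied from the CMC output above. The Mutual Information
# line is truncated there, so it is left out of this sketch.
cmc_selections = {
    "text": ["Wifes_age", "Wifes_education", "Standard-of-living_index",
             "Husbands_education", "Number_of_children_ever_born"],
    "hybrid": ["Standard-of-living_index", "Number_of_children_ever_born",
               "Wifes_education", "Wifes_age", "Husbands_education"],
    "random_forest": ["Wifes_age", "Number_of_children_ever_born",
                      "Wifes_education", "Standard-of-living_index",
                      "Husbands_occupation"],
}


def jaccard(a, b):
    """Jaccard similarity of two feature lists: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty selections are treated as identical (assumption).
    return len(set_a & set_b) / len(union) if union else 1.0


# Score every unordered pair of methods.
overlap = {
    (m1, m2): jaccard(cmc_selections[m1], cmc_selections[m2])
    for m1, m2 in combinations(cmc_selections, 2)
}

for (m1, m2), score in overlap.items():
    print(f"{m1:14} vs {m2:14}: {score:.2f}")
```

Jaccard similarity ignores ranking, so two methods that pick the same five features in a different order, as the text-based and hybrid selections do here, score 1.0.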

In [12]:
# Feature Selection Patterns Analysis
print("\n" + "="*60)
print("SELECTED FEATURES BY METHOD AND DATASET")
print("="*60)

for dataset_name in datasets_to_test:
    print(f"\n{dataset_name.upper()} DATASET:")
    print("-" * 30)
    
    if dataset_name in experiment_results:
        results = experiment_results[dataset_name]['results']
        
        for method_name, method_data in results.items():
            if method_data and 'features' in method_data:
                features = method_data['features']
                method_display = method_name.replace('_', ' ').title()
                print(f"{method_display:18}: {features}")
        
        # Find consensus features: those selected by at least two methods
        method_features = {
            method_name: set(method_data['features'])
            for method_name, method_data in results.items()
            if method_data and 'features' in method_data
        }
        all_features = set().union(*method_features.values())
        
        if len(method_features) > 1:
            # Count how many methods selected each feature
            feature_counts = {
                feature: sum(feature in selected for selected in method_features.values())
                for feature in all_features
            }
            
            consensus_features = [f for f, count in feature_counts.items() if count >= 2]
            if consensus_features:
                print(f"{'Consensus (2+)':18}: {consensus_features}")
    else:
        print("No results available")

print(f"\n{'='*60}")


SELECTED FEATURES BY METHOD AND DATASET

CMC DATASET:
------------------------------
Text              : ['Wifes_age', 'Wifes_education', 'Standard-of-living_index', 'Husbands_education', 'Number_of_children_ever_born']
Hybrid            : ['Standard-of-living_index', 'Number_of_children_ever_born', 'Wifes_education', 'Wifes_age', 'Husbands_education']
Random Forest     : ['Wifes_age', 'Number_of_children_ever_born', 'Wifes_education', 'Standard-of-living_index', 'Husbands_occupation']
Mutual Info       : ['Number_of_children_ever_born', 'Husbands_education', 'Wifes_education', 'Husbands_occupation', 'Wifes_age']
Consensus (2+)    : ['Wifes_education', 'Husbands_education', 'Number_of_children_ever_born', 'Wifes_age', 'Standard-of-living_index', 'Husbands_occupation']

VEHICLE DATASET:
------------------------------
Text              : ['COMPACTNESS', 'CIRCULARITY', 'DISTANCE_CIRCULARITY', 'RADIUS_RATIO', 'PR.AXIS_ASPECT_RATIO']
Hybrid            : ['COMPACTNESS', 'SCALED_VARIANCE_MAJ