## Ensemble Feature Selection Tutorial
Welcome to the Ensemble Feature Selection Tutorial! This tutorial walks you through selecting features with an ensemble of feature selection methods and a multi-objective optimization approach.

## Example dataset

To illustrate our method, we will use the Wine dataset from scikit-learn.

In [1]:
from sklearn.datasets import load_wine
import pandas as pd

# Load Wine dataset
wine_data = load_wine()
wine_df = pd.DataFrame(data=wine_data.data, columns=wine_data.feature_names)
wine_df['target'] = wine_data.target
print(wine_df.head())

   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  target  
0          

## EFS pipeline

To run the Ensemble Feature Selection Pipeline, you need to define several key parameters. For detailed information on each parameter, refer to the documentation or to the Getting Started tutorial, which explores them in depth.

Below, we set up a basic configuration and run the pipeline on the wine data.

Let's first import the main class and define the necessary parameters:

In [None]:
from moosefs import FeatureSelectionPipeline

fs_methods = [
    "f_statistic_selector",
    "random_forest_selector",
    "mutual_info_selector",
    "xgboost_selector",
    "svm_selector"
]
merging_strategy = "union_of_intersections_merger"
num_repeats = 5
task = "classification"
num_features_to_select = 6
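
To give an intuition for the `"union_of_intersections_merger"` strategy named above, here is a minimal sketch in plain Python. It assumes the merger keeps every feature that appears in the top picks of at least two selectors, that is, the union of all pairwise intersections. The function `union_of_intersections` and the `top_k` picks are hypothetical and purely illustrative; they are not the library's implementation or real selector output.

```python
from itertools import combinations


def union_of_intersections(selections):
    """Return the sorted union of all pairwise intersections of the feature sets.

    Illustrative helper only, not the library's implementation.
    """
    merged = set()
    for first, second in combinations(selections.values(), 2):
        merged |= set(first) & set(second)
    return sorted(merged)


# Hypothetical top-3 picks from three selectors (made up for illustration).
top_k = {
    "f_statistic": ["proline", "flavanoids", "alcohol"],
    "random_forest": ["proline", "color_intensity", "flavanoids"],
    "mutual_info": ["color_intensity", "alcohol", "hue"],
}

merged = union_of_intersections(top_k)
print(merged)  # ['alcohol', 'color_intensity', 'flavanoids', 'proline']
```

Under this assumption, the merged set can contain more features than any single selector was asked for (four here, with three picks per selector), while a feature chosen by only one selector (`hue` in this toy input) is dropped.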


Now that we have defined the necessary parameters, we can proceed to initialize a pipeline object:

In [3]:
pipeline = FeatureSelectionPipeline(
    data=wine_df,
    fs_methods=fs_methods,
    merging_strategy=merging_strategy,
    num_repeats=num_repeats,
    task=task,
    num_features_to_select=num_features_to_select
)

You can execute the pipeline with the `.run()` method or by calling the pipeline object directly. The call returns the list of selected features, the index of the best repeat, and the name of the group associated with those features.

In [4]:
# Example of running the pipeline
selected_features, best_repeat_index, best_group_name = pipeline.run()
print("Results of the Feature Selection Pipeline:\n")
print("Selected Features:")
for index, feature in enumerate(selected_features, start=1):
    print(f"{index}. {feature}")

print("\nBest Group Name:", best_group_name)
print("Best Repeat Index:", best_repeat_index)

Pipeline Progress: 100%|██████████| 5/5 [02:25<00:00, 29.12s/it]

Results of the Feature Selection Pipeline:

Selected Features:
1. hue
2. proline
3. malic_acid
4. od280/od315_of_diluted_wines
5. alcohol
6. flavanoids
7. color_intensity

Best Group Name: ('FStatistic', 'RandomForest', 'MutualInfo', 'XGBoost', 'SVM')
Best Repeat Index: 3
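
The "best" group reported above is the outcome of a multi-objective comparison. As a rough illustration of that idea, the sketch below keeps the candidates that are not dominated on two objectives. The choice of objectives (a performance score and a stability score, both higher-is-better), the `pareto_front` helper, and all numbers are assumptions made for this example; they do not describe the pipeline's internals or its actual results.

```python
def pareto_front(scores):
    """Return the names whose score tuples no other candidate dominates.

    All objectives are treated as higher-is-better. Illustrative helper only.
    """

    def dominates(p, q):
        # p dominates q if it is at least as good everywhere and better somewhere.
        return all(x >= y for x, y in zip(p, q)) and any(
            x > y for x, y in zip(p, q)
        )

    return [
        name
        for name, score in scores.items()
        if not any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical (performance, stability) scores for groups of selectors.
candidates = {
    ("FStatistic", "RandomForest"): (0.95, 0.70),
    ("FStatistic", "SVM"): (0.93, 0.85),
    ("RandomForest", "XGBoost"): (0.92, 0.80),
    ("MutualInfo", "SVM"): (0.90, 0.60),
}

front = pareto_front(candidates)
print(front)  # [('FStatistic', 'RandomForest'), ('FStatistic', 'SVM')]
```

In this toy input, two groups remain on the front because neither beats the other on both objectives; choosing a single winner among them would need an extra rule, which this sketch leaves out.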





## Conclusion
Thank you for following this tutorial on ensemble feature selection. Use these techniques to improve the robustness and performance of your feature selection tasks.