# Practical 4: Particle Swarm Optimisation for Feature Selection

In today's lab, we will learn how to perform feature selection using particle swarm optimisation, utilising the __[pyswarms](https://pyswarms.readthedocs.io/en/latest/index.html)__ toolkit.


The aim of feature selection is to automatically select data features suitable for use in AI models and algorithms. This is an important step before model training (covered in the machine learning and deep learning parts of the course), as too many or redundant features can negatively impact the learning and accuracy of the model.

<hr style="border:1px solid black"> </hr>

## Step 1: Install and import relevant modules

In [1]:
pip install pyswarms

In [2]:
import numpy as np
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

import pyswarms as ps

<hr style="border:1px solid black"> </hr>

## Step 2: Create dataset with redundant features

Use sklearn's `make_classification` function to create a dataset with a large number of redundant features.

In [7]:
# Define the number of features for each type
n_features = 200
n_informative = 20
n_redundant = 100
n_repeated = 50
n_useless = 30

# Create Labels
informative_labels = [f'Informative {ii}' for ii in range(1, n_informative + 1)]
redundant_labels = [f'Redundant {ii}' for ii in range(n_informative + 1, n_informative + n_redundant + 1)]
repeated_labels = [f'Repeated {ii}' for ii in range(n_informative + n_redundant+ 1, n_informative + n_redundant + n_repeated + 1)]
useless_labels = [f'Useless {ii}' for ii in range(n_informative + n_redundant + n_repeated + 1, n_features + 1)]
labels = informative_labels + redundant_labels + repeated_labels + useless_labels

# Get data
X, y = make_classification(n_samples = 5000, n_features = n_features,
                           n_informative = n_informative,
                           n_redundant = n_redundant , n_repeated = n_repeated,
                           n_clusters_per_class = 2, class_sep = 0.5, flip_y = 0.05,
                           random_state = 42, shuffle = False)
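Because `shuffle = False`, the columns come out grouped by type, which is what lets the labels above line up with the data. The following is a minimal sketch on a much smaller, hypothetical dataset (the sizes and the names `X_demo`, `y_demo`, `source` and `repeated` are illustrative and not part of the lab). It checks that every repeated column is a copy of one of the informative or redundant columns.

```python
import numpy as np
from sklearn.datasets import make_classification

# Small illustrative sizes (assumed for this sketch only)
n_inf, n_red, n_rep, n_total = 4, 3, 2, 12

X_demo, y_demo = make_classification(
    n_samples=200, n_features=n_total, n_informative=n_inf,
    n_redundant=n_red, n_repeated=n_rep, random_state=0, shuffle=False)

# With shuffle=False the column order is:
# informative | redundant | repeated | remaining noise features
source = X_demo[:, :n_inf + n_red]
repeated = X_demo[:, n_inf + n_red:n_inf + n_red + n_rep]

# Each repeated column should duplicate one of the source columns
is_copy = [
    any(np.allclose(repeated[:, j], source[:, k]) for k in range(source.shape[1]))
    for j in range(n_rep)
]
print(X_demo.shape, is_copy)
```

With shuffling left on (the default), the same features would be present but in a random column order, so the positional labels would no longer apply.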


The role of the code in the next cell is to standardise the data.

**Question** What does this do to the data, and why might this be important for feature selection?



In [9]:
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
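As a quick numerical check of what standardising does, the sketch below applies a column-wise z-score by hand, which is the same computation `StandardScaler` performs with its default settings. The data and the names `X_demo` and `X_std` are hypothetical and used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical features on very different scales
X_demo = np.column_stack([
    rng.normal(loc=50.0, scale=10.0, size=1000),
    rng.normal(loc=0.0, scale=0.01, size=1000),
])

# Column-wise z-score: subtract the mean, divide by the standard deviation
X_std = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# After the transform, each column has mean close to 0 and standard deviation close to 1
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))
```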

<hr style="border:1px solid black"> </hr>

## Step 3: Set up the objective function

We’ll be using the optimizer `pyswarms.discrete.BinaryPSO` to perform feature subset selection. 

In binary PSO, each component of a particle's position takes one of two values: 1 or 0 (on or off). Mathematically, a position is defined as:

$x=[x_{1},x_{2},x_{3},\dots,x_{d}]$ where $x_{i}\in \{0,1\}$
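For feature selection, a position can be read as a mask over the columns of the data: a 1 keeps the feature and a 0 drops it. A minimal sketch with hypothetical data (`X_demo`, `x` and `X_subset` are illustrative names):

```python
import numpy as np

# Hypothetical data: 3 samples, 5 features
X_demo = np.arange(15).reshape(3, 5)

# One particle position: features 0, 2 and 3 are "on"
x = np.array([1, 0, 1, 1, 0])

# Boolean indexing keeps only the columns where the mask is 1
X_subset = X_demo[:, x == 1]
print(X_subset.shape)  # (3, 3)
print(X_subset)
```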

The objective function we will be using is taken from this paper and is defined as:

$f(X)=\alpha(1-P)+(1-\alpha)\left(1-\dfrac{N_{f}}{N_{t}}\right)$

where $\alpha$ is a hyperparameter that trades off the classifier performance $P$ against the ratio of the feature-subset size $N_{f}$ to the total number of features $N_{t}$.
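To get a feel for the scale of the objective, it can help to plug in some numbers. The sketch below evaluates the formula exactly as written, for made-up values of $P$, $N_{f}$ and $N_{t}$; the helper name `objective` and its arguments are hypothetical.

```python
def objective(P, n_selected, n_total, alpha=0.5):
    """Evaluate f = alpha*(1 - P) + (1 - alpha)*(1 - N_f/N_t)."""
    return alpha * (1.0 - P) + (1.0 - alpha) * (1.0 - n_selected / n_total)

# Hypothetical values: 80% accuracy using 50 of 200 features
value = objective(P=0.8, n_selected=50, n_total=200, alpha=0.5)
print(round(value, 3))  # 0.475
```

Here the first term contributes $0.5 \times 0.2 = 0.1$ and the second $0.5 \times 0.75 = 0.375$. Trying other values of `alpha` shows how the balance between the two terms shifts.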

**First** we need to define a function which calculates the objective function per particle.

It returns the objective (loss) score for a *single* particle.

Do you understand the purpose of each line in the function?

In [None]:
# Create an instance of the classifier
classifier = RandomForestClassifier(max_depth=2, n_estimators=10)

# Define objective function
def f_per_particle(m, alpha):
    """Compute the objective function for a single particle.

    Inputs
    ------
    m : numpy.ndarray
        Binary mask that can be obtained from BinaryPSO, will
        be used to mask features.
    alpha : float
        Constant weight for trading off classifier performance
        and number of features.

    Returns
    -------
    float
        Computed objective function value.
    """
    # Draw a fresh random train/test split for this evaluation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    # Get the subset of the features from the binary mask;
    # an all-zero mask falls back to the full feature set
    if np.count_nonzero(m) == 0:
        X_train_subset = X_train
        X_test_subset = X_test
    else:
        X_train_subset = X_train[:, m == 1]
        X_test_subset = X_test[:, m == 1]

    # Perform classification and store the test accuracy in P
    classifier.fit(X_train_subset, y_train)
    P = (classifier.predict(X_test_subset) == y_test).mean()

    # Compute the objective function
    j = (alpha * (1.0 - P)
         + (1.0 - alpha) * (1 - (X_train_subset.shape[1] / X_train.shape[1])))

    return j

**Next** we define the higher-level objective function to be evaluated by pyswarms' built-in optimiser.

It returns the objective (loss) score for *all* particles.

In [None]:
def f(M, alpha=0.5):
    """Higher-level method to do classification in the
    whole swarm.

    Inputs
    ------
    M : numpy.ndarray of shape (n_particles, dimensions)
        The swarm that will perform the search
    alpha : float (default is 0.5)
        Weight passed on to f_per_particle

    Returns
    -------
    numpy.ndarray of shape (n_particles, )
        The computed loss for each particle
    """
    n_particles = M.shape[0]

    # Evaluate each particle (one row of M) in turn
    j = [f_per_particle(M[i], alpha) for i in range(n_particles)]

    return np.array(j)
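The important contract here is the shape: the optimiser hands over one row per particle and expects one cost per particle back. The sketch below illustrates this with a cheap stand-in cost instead of a classifier; `toy_cost_per_particle`, `toy_f` and `M_demo` are hypothetical names used only for this illustration.

```python
import numpy as np

def toy_cost_per_particle(m):
    # Stand-in cost (assumed for illustration): fraction of features switched off
    return 1.0 - m.mean()

def toy_f(M):
    # One cost per row of the swarm matrix
    return np.array([toy_cost_per_particle(M[i]) for i in range(M.shape[0])])

# Hypothetical swarm: 4 particles, 6 features
M_demo = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
])

costs = toy_f(M_demo)
print(costs.shape)  # (4,)
print(costs)
```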

<hr style="border:1px solid black"> </hr>

## Step 4: Set up the optimisation parameters and run the selection algorithm

In the `options` dictionary below:
- `k` is the number of neighbors considered when determining the best known position in each particle's neighborhood
- `p` selects the distance metric used to find those neighbors (the Minkowski p-norm: 1 for the L1 distance, 2 for the Euclidean distance)

**Question** What is the purpose of the other three parameters (`c1`, `c2` and `w`)?
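As a starting point for thinking about the question, the sketch below shows one step of the textbook binary PSO update, so you can see where `c1`, `c2` and `w` enter. It is an illustrative sketch only, not the pyswarms implementation, which may differ in details such as neighborhood handling and velocity clamping. The function `binary_pso_step` and all variable names are hypothetical, and `gbest` stands in for the best position known to a particle's neighborhood.

```python
import numpy as np

def binary_pso_step(x, v, pbest, gbest, c1, c2, w, rng):
    """One textbook binary PSO update for a whole swarm (illustrative sketch)."""
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    # Velocity: previous velocity scaled by w, plus random pulls towards
    # the personal best (c1) and the neighborhood best (c2)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Position: each bit is set to 1 with probability sigmoid(velocity)
    prob_on = 1.0 / (1.0 + np.exp(-v_new))
    x_new = (rng.random(x.shape) < prob_on).astype(int)
    return x_new, v_new

rng = np.random.default_rng(0)
n_particles, dims = 5, 8

# Random starting swarm (hypothetical sizes)
x = rng.integers(0, 2, size=(n_particles, dims))
v = rng.uniform(-1, 1, size=(n_particles, dims))
pbest = x.copy()
gbest = x[0]

x, v = binary_pso_step(x, v, pbest, gbest, c1=0.5, c2=0.5, w=0.9, rng=rng)
print(x)
```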

In [None]:
# Swarm hyperparameters (values chosen arbitrarily)
options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 2, 'p': 2}

# Create an instance of the binary PSO optimiser
dimensions = X.shape[1]  # dimensions should be the number of features

optimizer = ps.discrete.BinaryPSO(n_particles=60, dimensions=dimensions, options=options)

# Perform optimization and time it
tic = time.perf_counter()
cost, pos = optimizer.optimize(f, iters=20, verbose=True)
toc = time.perf_counter()
print(f"Optimiser ran for {toc - tic:0.4f} seconds")

# Count how many features the best position keeps
selected = np.count_nonzero(pos)
print(f"Optimiser retained {selected} features")
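Since the features were labelled by type in Step 2, the returned mask can also be used to see which kinds of feature were kept. The sketch below uses a short hypothetical label list and mask (`labels_demo`, `pos_demo`) as stand-ins for `labels` and `pos`; the same indexing should carry over to the real variables.

```python
import numpy as np

# Hypothetical labels and mask, standing in for `labels` and `pos`
labels_demo = ['Informative 1', 'Informative 2', 'Redundant 3',
               'Repeated 4', 'Useless 5', 'Useless 6']
pos_demo = np.array([1, 0, 1, 0, 0, 1])

# Keep the labels whose mask entry is 1
selected_labels = np.array(labels_demo)[pos_demo == 1]

# Count selected features by type (the first word of each label)
kinds, counts = np.unique([s.split()[0] for s in selected_labels],
                          return_counts=True)
summary = {str(k): int(c) for k, c in zip(kinds, counts)}
print(summary)
```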

<hr style="border:1px solid black"> </hr>

## Step 5: Evaluate the performance of the selected features

To evaluate the effectiveness of the selected features, we train two Random Forest classifiers, one on the full feature set and the other on the selected feature subset, and compare their accuracy on the same test split.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

c1 = RandomForestClassifier(max_depth=2, n_estimators=10, random_state=42)
c1.fit(X_train, y_train)
full_performance = (c1.predict(X_test) == y_test).mean()
print('Full Feature set performance: %.3f' % (full_performance))

c2 = RandomForestClassifier(max_depth=2, n_estimators=10, random_state=42)
c2.fit(X_train[:,pos==1], y_train)
subset_performance = (c2.predict(X_test[:,pos==1]) == y_test).mean()
print('Subset Feature set performance: %.3f' % (subset_performance))

<hr style="border:1px solid black"> </hr>

## Exercise 1

Observe how altering key parameters changes the performance and running time of the optimiser.

Parameters to consider altering include
- alpha
- Swarm options
- Number of particles
- Number of optimiser iterations

<hr style="border:1px solid black"> </hr>

## Exercise 2

Implement feature selection on the *Breast Cancer Wisconsin Diagnostic Dataset*. The features present in this dataset are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The features are labelled as coming from either a malignant or benign sample.

The data can be imported directly from sklearn:

In [None]:
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data.DESCR) 

X, y = data.data, data.target
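One possible way to prepare this dataset for the optimiser is sketched below, assuming the same standardisation as in Step 2. The names `X_bc`, `y_bc` and `dimensions_bc` are hypothetical and chosen so the sketch does not overwrite anything.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_bc, y_bc = data.data, data.target

# Standardise as in Step 2, then read off the number of features,
# which is the dimensionality the swarm would need
X_bc = StandardScaler().fit_transform(X_bc)
dimensions_bc = X_bc.shape[1]

print(X_bc.shape, list(data.target_names))
```

Note that `f_per_particle` reads the global variables `X` and `y`, so the objective only sees the new data once it has been assigned to those names, as the cell above does.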

<hr style="border:1px solid black"> </hr>