## Random Forest model using the Nevergrad gradient-free optimizer for hyperparameter tuning
This notebook compares random forest (RF) models tuned with scikit-learn's GridSearchCV against RF models tuned with Nevergrad. The experiment below is deliberately lightweight and should only serve as a basis for further experimentation. This notebook is explained in Toky Axel's Medium article:

In [1]:
# Needed packages
#!pip install tqdm
#!pip install nevergrad

In [2]:
import tqdm
import pandas as pd
import nevergrad as ng
from scipy.io.arff import loadarff 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score

### 1. Dataset loading and train/test split
More details about the dataset (QSAR biodegradation, OpenML ID 1494): https://www.openml.org/search?type=data&id=1494&sort=runs&status=active

In [3]:
raw = loadarff('qsar-biodeg.arff')
data = pd.DataFrame(raw[0])
# loadarff reads nominal attributes as bytes: encode class b'2' as 1 and b'1' as 0
data = data.replace({'Class': {b'2': 1, b'1': 0}})

In [7]:
data.shape

(1055, 42)

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    data.drop("Class", axis=1), data.Class, test_size=0.33, random_state=123
)

### 2. Model training with GridSearchCV hyperparameter tuning
More details about the model: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

In [6]:
y_train.shape

(706,)

In [7]:
y_train.value_counts()

Class
0    471
1    235
Name: count, dtype: int64

In [8]:
y_test.value_counts()

Class
0    228
1    121
Name: count, dtype: int64

In [9]:
# range(2, 50, 20) yields 2, 22 and 42
tuned_parameters = {
    'n_estimators': [10, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': range(2, 50, 20),
    'max_features': ['sqrt', 'log2', None]
}
clf = GridSearchCV(RandomForestClassifier(random_state=42), tuned_parameters, cv=5, scoring="roc_auc")
clf.fit(X_train, y_train)
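As a quick check on cost, the grid above has 3 × 3 × 3 × 3 = 81 combinations, so 5-fold cross-validation fits 405 forests, plus one final refit of the best configuration (GridSearchCV refits by default). A minimal stdlib sketch that enumerates the same grid:

```python
from itertools import product
from math import prod

# Same grid as `tuned_parameters` above; range(2, 50, 20) -> 2, 22, 42
tuned_parameters = {
    "n_estimators": [10, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": range(2, 50, 20),
    "max_features": ["sqrt", "log2", None],
}

# Every combination GridSearchCV will try, as a list of parameter dicts
candidates = [dict(zip(tuned_parameters, values))
              for values in product(*tuned_parameters.values())]

n_folds = 5
n_candidates = prod(len(v) for v in tuned_parameters.values())
print(n_candidates, n_candidates * n_folds)  # 81 405
```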

In [10]:
clf.best_params_

{'max_depth': 10,
 'max_features': 'log2',
 'min_samples_split': 2,
 'n_estimators': 200}

In [11]:
clf.best_score_ 

0.9184739462962522

In [17]:
best_clf = RandomForestClassifier(random_state=42,**clf.best_params_)
best_clf.fit(X_train, y_train)
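Because GridSearchCV uses `refit=True` by default, `clf.best_estimator_` is already a forest refit on the whole training set with `clf.best_params_`, so the manual refit above mainly makes this step explicit. A small sketch checking that both give the same probabilities, assuming synthetic data from `make_classification` in place of the QSAR split and a reduced, illustrative grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the QSAR training set (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Reduced, illustrative grid to keep the sketch fast
grid = {"n_estimators": [10, 30], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42), grid, cv=3, scoring="roc_auc")
search.fit(X, y)

# Manual refit, as done in the notebook
manual = RandomForestClassifier(random_state=42, **search.best_params_).fit(X, y)

# refit=True (the default) already trained the same model on all of X, y
same = np.allclose(search.best_estimator_.predict_proba(X), manual.predict_proba(X))
print(same)  # True
```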

#### Evaluate the model on the test set and the train set
The near-perfect train AUC below (0.99998, versus 0.946 on the test set) suggests that the model overfits the training data.

In [18]:
roc_auc_score(y_test, best_clf.predict_proba(X_test)[:,1])

0.9464622299550529

In [19]:
roc_auc_score(y_train, best_clf.predict_proba(X_train)[:,1])

0.9999819307042508

### 3. Model training with Nevergrad
More details about the method: https://facebookresearch.github.io/nevergrad/machinelearning.html

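We build a custom score that penalizes overfitting: the test error is added to a weighted penalty on the gap between train and test AUC, and Nevergrad minimizes the sum (lower is better). Note that this score is computed on the test set, so the test AUCs reported for the Nevergrad models were also used to steer the search and may be optimistically biased.
$$score = (1 - auc\_on\_test) + w \cdot \lvert auc\_on\_train - auc\_on\_test \rvert$$
where $w$ is a fixed weight ($w = 3$ in the code below).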
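A minimal Python version of this score (the helper name `penalized_loss` is hypothetical, not from the notebook), evaluated on the AUCs of the GridSearchCV model from section 2:

```python
def penalized_loss(auc_on_test: float, auc_on_train: float, w: float = 3.0) -> float:
    """Hypothetical helper: test error plus w times the absolute train/test AUC gap (lower is better)."""
    return (1 - auc_on_test) + w * abs(auc_on_train - auc_on_test)


# Test and train AUC of the GridSearchCV model (section 2)
grid_loss = penalized_loss(0.9464622299550529, 0.9999819307042508)
print(round(grid_loss, 4))  # 0.2141
```

For that model the gap term (about 0.0535) is as large as the test error itself, so under this score a model with a slightly lower test AUC but a much smaller gap can come out ahead.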

In [14]:
def train_and_return_test_error(params, X_train, y_train, X_test, y_test):
    """Fit a random forest with `params`; return [1 - test AUC, 1 - train AUC]."""
    rf_clf = RandomForestClassifier(random_state=42, **params)
    rf_clf.fit(X_train, y_train)
    score_on_test = 1 - roc_auc_score(y_test, rf_clf.predict_proba(X_test)[:, 1])
    score_on_train = 1 - roc_auc_score(y_train, rf_clf.predict_proba(X_train)[:, 1])
    return [score_on_test, score_on_train]

# Parametrization is how nevergrad configures the optimizers.
# (https://facebookresearch.github.io/nevergrad/parametrization.html)
parametrization = ng.p.Dict(
    # Integer params bounded by lower and upper bounds
    n_estimators=ng.p.Scalar(init=10, lower=2, upper=1000).set_integer_casting(),
    # init=None lets nevergrad choose the starting value within the bounds;
    # it does NOT mean max_depth=None (unlimited depth), which is never tried here
    max_depth=ng.p.Scalar(init=None, lower=2, upper=1000).set_integer_casting(),
    min_samples_split=ng.p.Scalar(init=2, lower=2, upper=1000).set_integer_casting(),
    # Choice between given values.
    max_features=ng.p.Choice(['sqrt', 'log2', None]),
)

budget = 300  # Number of candidate evaluations (model trainings) per optimizer

# List of some available optimizers
names = ["RandomSearch", "TwoPointsDE", "CMA", "PSO", "ScrHammersleySearch"]

In [15]:
all_recommendation = {}

for name in names:
    # num_workers=3 tells the optimizer that up to 3 candidates may be evaluated
    # at the same time; it does not start any processes by itself.
    optim = ng.optimizers.registry[name](parametrization=parametrization,
                                         budget=budget,
                                         num_workers=3)

    # Each iteration asks for 3 candidates, so budget // 3 iterations use the whole budget
    for _ in tqdm.tqdm(range(budget // 3)):
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x1 = optim.ask()
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following trainings run sequentially but could be parallelized.
        # We could also work fully asynchronously, i.e. ask for a new candidate
        # as soon as any training is over.
        y1 = train_and_return_test_error(x1.value, X_train, y_train, X_test, y_test)
        y2 = train_and_return_test_error(x2.value, X_train, y_train, X_test, y_test)
        y3 = train_and_return_test_error(x3.value, X_train, y_train, X_test, y_test)
        # We want to minimize the test error (1 - AUC on test)
        # but also the gap between train and test AUC,
        # here weighted by a fixed factor of 3
        y1_hat = y1[0] + 3 * abs(y1[1] - y1[0])
        y2_hat = y2[0] + 3 * abs(y2[1] - y2[0])
        y3_hat = y3[0] + 3 * abs(y3[1] - y3[0])
        # Tell the optimizer the loss value of each candidate
        optim.tell(x1, y1_hat)
        optim.tell(x2, y2_hat)
        optim.tell(x3, y3_hat)

    recommendation = optim.recommend()
    score = train_and_return_test_error(recommendation.value, X_train, y_train, X_test, y_test)
    all_recommendation[name] = recommendation.value

    print(name, " provides a vector of parameters",
          recommendation.value, " with AUC on test :",
          1 - score[0], " (AUC on train  ", 1 - score[1], ")")

100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [09:44<00:00,  5.84s/it]


RandomSearch  provides a vector of parameters {'n_estimators': 80, 'max_depth': 701, 'min_samples_split': 92, 'max_features': 'log2'}  with AUC on test : 0.9215963462374945  (AUC on train   0.9326963906581741 )


100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [08:43<00:00,  5.24s/it]


TwoPointsDE  provides a vector of parameters {'n_estimators': 65, 'max_depth': 291, 'min_samples_split': 97, 'max_features': 'log2'}  with AUC on test : 0.9185515441496304  (AUC on train   0.927898992636762 )


100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [08:52<00:00,  5.32s/it]


CMA  provides a vector of parameters {'n_estimators': 218, 'max_depth': 752, 'min_samples_split': 149, 'max_features': 'log2'}  with AUC on test : 0.9033275337103088  (AUC on train   0.906346840131906 )


100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [05:51<00:00,  3.52s/it]


PSO  provides a vector of parameters {'n_estimators': 46, 'max_depth': 1000, 'min_samples_split': 189, 'max_features': 'log2'}  with AUC on test : 0.8912751921125127  (AUC on train   0.891118941139269 )


100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [14:17<00:00,  8.58s/it]

ScrHammersleySearch  provides a vector of parameters {'n_estimators': 21, 'max_depth': 732, 'min_samples_split': 171, 'max_features': 'log2'}  with AUC on test : 0.8914020588661736  (AUC on train   0.8945611419794913 )
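Since all five optimizers minimized the same score, their recommendations can be compared on it directly. Plugging the printed AUCs into the score with $w = 3$ ranks them CMA (0.1057), PSO (0.1092), TwoPointsDE (0.1095), RandomSearch (0.1117), ScrHammersleySearch (0.1181). By this criterion CMA's recommendation scores lowest, while TwoPointsDE, used in the next cell, has the second-highest test AUC with a train/test gap below 0.01. A small stdlib sketch of the computation:

```python
# (test AUC, train AUC) of each recommendation, copied from the printed output above
results = {
    "RandomSearch": (0.9215963462374945, 0.9326963906581741),
    "TwoPointsDE": (0.9185515441496304, 0.927898992636762),
    "CMA": (0.9033275337103088, 0.906346840131906),
    "PSO": (0.8912751921125127, 0.891118941139269),
    "ScrHammersleySearch": (0.8914020588661736, 0.8945611419794913),
}

W = 3  # same fixed weight as in the optimization loop

# score = (1 - AUC on test) + W * |AUC on train - AUC on test|
losses = {name: (1 - test) + W * abs(train - test) for name, (test, train) in results.items()}

# Print the recommendations from lowest (best) to highest score
for name, loss in sorted(losses.items(), key=lambda item: item[1]):
    test, train = results[name]
    print(f"{name:<20} score={loss:.4f} test AUC={test:.4f} gap={abs(train - test):.4f}")
```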


#### Evaluate the TwoPointsDE model on the test set and the train set

In [16]:
best_clf = RandomForestClassifier(random_state=42,**all_recommendation['TwoPointsDE'])
best_clf.fit(X_train, y_train)

In [17]:
roc_auc_score(y_test, best_clf.predict_proba(X_test)[:,1])

0.9185515441496304

In [18]:
roc_auc_score(y_train, best_clf.predict_proba(X_train)[:,1])

0.927898992636762

### 4. Next steps
Add cross-validation to the optimization process (so that the test set is no longer used for tuning), increase the budget, add an early stopping criterion, etc.
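One possible sketch of the first next step, assuming scikit-learn's `cross_validate` with `return_train_score=True`: the score is computed from the mean validation and train AUC across folds of the training set only, so the test set stays untouched until the final evaluation. `cv_penalized_objective` is a hypothetical helper, and the synthetic data stands in for `X_train`, `y_train`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for X_train, y_train (illustrative only)
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)


def cv_penalized_objective(params, X, y, w=3.0, cv=5):
    """Hypothetical helper: the notebook's score with cross-validated AUCs instead of test-set AUCs."""
    cv_results = cross_validate(
        RandomForestClassifier(random_state=42, **params),
        X, y, cv=cv, scoring="roc_auc", return_train_score=True,
    )
    auc_valid = np.mean(cv_results["test_score"])   # mean AUC on the held-out folds
    auc_train = np.mean(cv_results["train_score"])  # mean AUC on the training folds
    return (1 - auc_valid) + w * abs(auc_train - auc_valid)


loss = cv_penalized_objective({"n_estimators": 50, "min_samples_split": 20}, X_train, y_train)
print(loss)
```

It could replace the test-set score in the loop, for example with `optim.tell(x1, cv_penalized_objective(x1.value, X_train, y_train))`, at the cost of `cv` model fits per candidate.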