# Researching best params for new approach
In this notebook, we search for the best params for our proposed approach to the ICQ classifier.

## Setup

### Importing libraries and helper methods

In [1]:
import sys
import os
sys.path.append(os.path.abspath('../../'))
sys.path.append(os.path.abspath('../../models/'))
sys.path.append(os.path.abspath('../../helpers/'))

In [2]:
import warnings
from sklearn.exceptions import UndefinedMetricWarning

# Ignore the UndefinedMetricWarning raised by sklearn.metrics.classification_report
warnings.simplefilter(action='ignore', category=UndefinedMetricWarning)

In [3]:
import numpy as np
from helpers.utils import print_metrics
from sklearn.model_selection import RandomizedSearchCV, train_test_split, GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from helpers.database_helpers import get_stratified_kfold, get_iris
from models.icq_estimators import IcqClassifier
from helpers.icq_executions import execute_classifier_split_input_weight_normal_sigma_q, execute_classifier_split_input_weight_polar_sigma_q

### Setting up database, k-fold and random seed

In [4]:
seed = 40
X, y = get_iris()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
cv = get_stratified_kfold(random_seed=seed)

## Search for best params on sigmaQ = sigmaX + sigmaY + sigmaZ

### Model

In [5]:
icq = OneVsRestClassifier(IcqClassifier(classifier_function=execute_classifier_split_input_weight_normal_sigma_q, max_iter=2000, random_seed=seed, plot_graphs_and_metrics=False))

### Search for Sigma Q Params
First experiment with the new approach (weights on the U operator and inputs on the rho env): varying the Sigma Q params.

#### Integer params
In this phase, we deal only with integer params: every triple [i, j, k] with each entry in 0..14, e.g. [2, 2, 2] or [12, 3, 3], skipping [1, 1, 1].

In [6]:
params = dict()
params["estimator__sigma_q_weights"] = []
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.00]
# Every integer triple [i, j, k] with entries in 0..14, skipping [1, 1, 1]
for i in range(15):
    for j in range(15):
        for k in range(15):
            if not (i == j and j == k and i == 1):
                params["estimator__sigma_q_weights"].append([i, j, k])
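
To get a feel for how much of this grid a randomized search can cover, the candidate list can be rebuilt and counted on its own. This is a minimal sketch using only the standard library; `sigma_q_candidates`, `n_sampled` and `coverage` are illustrative names, and `n_sampled = 50` mirrors the `n_iter=50` used by the randomized search in this section.

```python
from itertools import product

# All integer triples [i, j, k] with entries in 0..14, skipping [1, 1, 1],
# which is the same set the nested loops above produce.
sigma_q_candidates = [
    list(triple) for triple in product(range(15), repeat=3) if triple != (1, 1, 1)
]

n_candidates = len(sigma_q_candidates)  # 15**3 - 1
n_sampled = 50                          # n_iter passed to RandomizedSearchCV
coverage = n_sampled / n_candidates     # fraction of the grid that gets evaluated

print(f"{n_candidates} candidates, {coverage:.2%} sampled")
```

With 3374 candidates and 50 draws, the search evaluates only a small fraction of the grid, so the reported best triple is the best of the sample rather than of the whole grid.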

In [7]:
%%time
busca = RandomizedSearchCV(icq, params, n_iter=50, scoring='accuracy', n_jobs=-1, cv=cv, random_state=seed, verbose=60)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 50 candidates, totalling 500 fits
Search results - Modified ICQ
Best accuracy: 0.9333333333333332
Best hyperparameters: {'estimator__sigma_q_weights': [12, 3, 3], 'estimator__random_seed': 40, 'estimator__max_iter': 2000, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__accuracy_succ': 1.0}
Wall time: 6h 45min 33s


#### Float params
In this phase, we treat the Sigma Q params as floats in [0, 1), stepping by 0.1 (0.0, 0.1, ..., 0.9).

In [8]:
params = dict()
params["estimator__sigma_q_weights"] = []
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.00]
for i in np.arange(0, 1, 0.1):
    for j in np.arange(0, 1, 0.1):
        for k in np.arange(0, 1, 0.1):
            # The values stop at 0.9, so this [1, 1, 1] exclusion never triggers here
            if not (i == j and j == k and i == 1):
                params["estimator__sigma_q_weights"].append([i, j, k])

In [9]:
%%time
busca = RandomizedSearchCV(icq, params, n_iter=50, scoring='accuracy', n_jobs=-1, cv=cv, random_state=seed, verbose=60)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 50 candidates, totalling 500 fits
Search results - Modified ICQ
Best accuracy: 0.9583333333333333
Best hyperparameters: {'estimator__sigma_q_weights': [0.7000000000000001, 0.4, 0.0], 'estimator__random_seed': 40, 'estimator__max_iter': 2000, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__accuracy_succ': 1.0}
Wall time: 5h 38min 36s
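
The `0.7000000000000001` in the best params above is a floating-point artifact of stepping by 0.1, not a distinct value from 0.7. The sketch below reproduces it in plain Python, assuming the grid values behave like multiples of the step, and shows that rounding to one decimal gives clean values. The names `raw_steps` and `clean_steps` are illustrative only.

```python
from itertools import product

# Multiples of 0.1 accumulate binary rounding error for some indices.
raw_steps = [0.1 * i for i in range(10)]

# Rounding to one decimal place gives the nearest float to each intended value.
clean_steps = [round(value, 1) for value in raw_steps]

print(raw_steps[7], clean_steps[7])

# The float grid has 10 values per axis, hence 10**3 candidate triples.
float_candidates = [list(triple) for triple in product(clean_steps, repeat=3)]
print(len(float_candidates))
```

Rounded values also make it easier to retype a result by hand later, as is done with `[0.7, 0.4, 0.0]` further down.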


### Search for Learning Rates
Since varying Sigma Q alone did not give a conclusive result, let's now vary the learning rate param and see what happens.

#### Small subset of learning rates

In [10]:
params = dict()
params["estimator__sigma_q_weights"] = [[1,1,1]]
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.0]
params["estimator__learning_rate"] = [0.1, 0.01, 0.001, 0.2, 0.02, 0.002, 0.0001]

In [11]:
%%time
# Only 7 candidates exist, so scikit-learn runs 7 iterations despite n_iter=10
busca = RandomizedSearchCV(icq, params, n_iter=10, scoring='accuracy', n_jobs=-1, cv=cv, random_state=seed)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Search results - Modified ICQ
Best accuracy: 0.9416666666666667
Best hyperparameters: {'estimator__sigma_q_weights': [1, 1, 1], 'estimator__random_seed': 40, 'estimator__max_iter': 2000, 'estimator__learning_rate': 0.002, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__accuracy_succ': 1.0}
Wall time: 49min 28s


#### Larger set of learning rates
Let's now try a larger set of learning rates, which will take longer but will hopefully yield better results. We'll try the union of:
- [0.1, 0.2, ..., 0.9];
- [0.01, 0.02, ..., 0.09];
- [0.001, 0.002, ..., 0.009];
- [0.0001, 0.0002, ..., 0.0009];
- [0.00001, 0.00002, ..., 0.00009];
- [0.000001, 0.000002, ..., 0.000009].
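
The six ranges listed above can also be generated decade by decade. This is a minimal sketch, not the cell the notebook actually runs; `learning_rates` is an illustrative name, and the rounding is an addition that keeps values such as 0.3 free of floating-point noise.

```python
# Six decades (1e-1 down to 1e-6), nine multiples each.
learning_rates = [
    round(multiple * 10 ** -exponent, exponent)
    for exponent in range(1, 7)
    for multiple in range(1, 10)
]

print(len(learning_rates))                       # number of grid candidates
print(min(learning_rates), max(learning_rates))  # smallest and largest rate
```

The list has 6 x 9 = 54 entries, so an exhaustive grid over it with 10 folds requires 540 fits.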

In [12]:
params = dict()
params["estimator__sigma_q_weights"] = [[1,1,1]]
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.0]
params["estimator__learning_rate"] = []
for i in range(1, 10):
    params["estimator__learning_rate"].append(0.1 * i)
    params["estimator__learning_rate"].append(0.01 * i)
    params["estimator__learning_rate"].append(0.001 * i)
    params["estimator__learning_rate"].append(0.0001 * i)
    params["estimator__learning_rate"].append(0.00001 * i)
    params["estimator__learning_rate"].append(0.000001 * i)

In [13]:
%%time
busca = GridSearchCV(icq, params, scoring='accuracy', n_jobs=-1, cv=cv, verbose=10)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 54 candidates, totalling 540 fits
Search results - Modified ICQ
Best accuracy: 0.9416666666666667
Best hyperparameters: {'estimator__accuracy_succ': 1.0, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__learning_rate': 0.002, 'estimator__max_iter': 2000, 'estimator__random_seed': 40, 'estimator__sigma_q_weights': [1, 1, 1]}
Wall time: 6h 36min 50s


### Search for best learning rate for best Sigma Q param 

In [19]:
params = dict()
params["estimator__sigma_q_weights"] = [[0.7, 0.4, 0.0]]
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.0]
params["estimator__learning_rate"] = []
for i in range(1, 10):
    params["estimator__learning_rate"].append(0.1 * i)
    params["estimator__learning_rate"].append(0.01 * i)
    params["estimator__learning_rate"].append(0.001 * i)
    params["estimator__learning_rate"].append(0.0001 * i)
    params["estimator__learning_rate"].append(0.00001 * i)
    params["estimator__learning_rate"].append(0.000001 * i)

In [20]:
%%time
busca = GridSearchCV(icq, params, scoring='accuracy', n_jobs=-1, cv=cv, verbose=10)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 54 candidates, totalling 540 fits
Search results - Modified ICQ
Best accuracy: 0.9583333333333333
Best hyperparameters: {'estimator__accuracy_succ': 1.0, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__learning_rate': 0.01, 'estimator__max_iter': 2000, 'estimator__random_seed': 40, 'estimator__sigma_q_weights': [0.7, 0.4, 0.0]}
Wall time: 6h 27min 56s


### Search for best Sigma Q using best Learning Rate param

In [21]:
params = dict()
params["estimator__sigma_q_weights"] = []
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_normal_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.00]
params["estimator__learning_rate"] = [0.002]
for i in np.arange(0, 1, 0.1):
    for j in np.arange(0, 1, 0.1):
        for k in np.arange(0, 1, 0.1):
            # The values stop at 0.9, so this [1, 1, 1] exclusion never triggers here
            if not (i == j and j == k and i == 1):
                params["estimator__sigma_q_weights"].append([i, j, k])

In [22]:
%%time
busca = RandomizedSearchCV(icq, params, n_iter=50, scoring='accuracy', n_jobs=-1, cv=cv, random_state=seed, verbose=60)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 50 candidates, totalling 500 fits
Search results - Modified ICQ
Best accuracy: 0.95
Best hyperparameters: {'estimator__sigma_q_weights': [0.0, 0.7000000000000001, 0.9], 'estimator__random_seed': 40, 'estimator__max_iter': 2000, 'estimator__learning_rate': 0.002, 'estimator__classifier_function': <function execute_classifier_split_input_weight_normal_sigma_q at 0x000002A9E61C0040>, 'estimator__accuracy_succ': 1.0}
Wall time: 5h 39min 37s


## Search for best params on sigmaQ = rx\*sigmaX + ry\*sigmaY + rz\*sigmaZ

In [14]:
icq = OneVsRestClassifier(IcqClassifier(classifier_function=execute_classifier_split_input_weight_polar_sigma_q, max_iter=2000, random_seed=seed, plot_graphs_and_metrics=False))

In [15]:
params = dict()
params["estimator__classifier_function"] = [execute_classifier_split_input_weight_polar_sigma_q]
params["estimator__max_iter"] = [2000]
params["estimator__random_seed"] = [seed]
params["estimator__accuracy_succ"] = [1.00]

In [16]:
params["estimator__sigma_q_weights"] = []
for i in np.arange(0, np.pi, 0.1):
    for j in np.arange(0, np.pi, 0.1):
        params["estimator__sigma_q_weights"].append([1, i, j])
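
The heading of this section writes sigmaQ in terms of rx, ry and rz, while the grid above fixes the first weight at 1 and sweeps the other two over [0, pi). One plausible reading, which this notebook does not confirm, is that the three weights are spherical coordinates (r, theta, phi) that get converted to Cartesian components. The sketch below illustrates that reading with a hypothetical helper, `spherical_to_cartesian`; the conversion inside `execute_classifier_split_input_weight_polar_sigma_q` may use a different convention or angle order.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Hypothetical helper: physics convention, theta polar and phi azimuthal."""
    rx = r * math.sin(theta) * math.cos(phi)
    ry = r * math.sin(theta) * math.sin(phi)
    rz = r * math.cos(theta)
    return rx, ry, rz


# Example triple in the same [1, angle, angle] shape as the grid above.
rx, ry, rz = spherical_to_cartesian(1, 1.2, 2.8)
norm = math.sqrt(rx ** 2 + ry ** 2 + rz ** 2)  # stays at r = 1 for any angles
print(rx, ry, rz, norm)

# Size of the angle grid: steps of 0.1 from 0 up to (but excluding) pi.
n_angles = math.ceil(math.pi / 0.1)
n_weight_candidates = n_angles ** 2
print(n_angles, n_weight_candidates)
```

Under this reading, fixing r = 1 restricts the search to unit-norm (rx, ry, rz), so only the direction of the vector changes between candidates.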

In [17]:
params["estimator__learning_rate"] = []
for i in range(1, 10):
    params["estimator__learning_rate"].append(0.1 * i)
    params["estimator__learning_rate"].append(0.01 * i)
    params["estimator__learning_rate"].append(0.001 * i)
    params["estimator__learning_rate"].append(0.0001 * i)
    params["estimator__learning_rate"].append(0.00001 * i)
    params["estimator__learning_rate"].append(0.000001 * i)

In [18]:
%%time
busca = RandomizedSearchCV(icq, params, n_iter=70, scoring='accuracy', n_jobs=-1, cv=cv, random_state=seed, verbose=60)

# Run the search
resultado = busca.fit(X_train, y_train)

# Summarize the results
print('Search results - Modified ICQ')
print('Best accuracy: %s' % resultado.best_score_)
print('Best hyperparameters: %s' % resultado.best_params_)

Fitting 10 folds for each of 70 candidates, totalling 700 fits
Search results - Modified ICQ
Best accuracy: 0.9249999999999998
Best hyperparameters: {'estimator__sigma_q_weights': [1, 1.2000000000000002, 2.8000000000000003], 'estimator__random_seed': 40, 'estimator__max_iter': 2000, 'estimator__learning_rate': 0.04, 'estimator__classifier_function': <function execute_classifier_split_input_weight_polar_sigma_q at 0x000002A9E61C0160>, 'estimator__accuracy_succ': 1.0}
Wall time: 10h 26min 6s
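
To compare the best scores reported in this notebook on a common footing, they can be converted into approximate counts of misclassified validation samples. The sketch below rests on two assumptions not stated in the notebook: that `get_iris()` returns the standard 150-sample Iris dataset, so the 80% training split holds 120 samples, and that the 10 folds have equal size, so the mean fold accuracy equals the pooled accuracy. The labels in `best_scores` and the function `validation_errors` are illustrative.

```python
# Assumption: 150 Iris samples with test_size=0.2 leaves 120 training samples,
# each of which is used for validation exactly once across the 10 folds.
TRAIN_SAMPLES = 120

# Best cross-validated accuracies copied from the outputs in this notebook.
best_scores = {
    "normal sigma_q, integer weights": 0.9333333333333332,
    "normal sigma_q, float weights": 0.9583333333333333,
    "learning rate, [1, 1, 1] weights": 0.9416666666666667,
    "learning rate, [0.7, 0.4, 0.0] weights": 0.9583333333333333,
    "float weights, learning rate 0.002": 0.95,
    "polar sigma_q": 0.9249999999999998,
}


def validation_errors(score, n_samples=TRAIN_SAMPLES):
    """Approximate number of wrong validation predictions behind a mean accuracy."""
    return n_samples - round(score * n_samples)


errors = {name: validation_errors(score) for name, score in best_scores.items()}
for name, count in errors.items():
    print(f"{name}: about {count} of {TRAIN_SAMPLES} wrong")
```

Under these assumptions the best and worst configurations differ by only a handful of samples, which is worth keeping in mind before reading much into the ranking.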
