## **OTIMIZAÇÃO DE HIPERPARÂMETROS**

https://catboost.ai/en/docs/concepts/parameter-tuning

**Atributos previsores**

Cost ctr = centro de custo (setor da linha de produção)

Posting date = data de produção

QTY = quantidade de peças produzidas

**Variável alvo**

Employees QTY = quantidade de funcionários

In [14]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from catboost.core import CatBoostRegressor
from sklearn.model_selection import GridSearchCV

In [2]:
df = pd.read_csv(r"C:\Users\admin\Desktop\Projetos\TG1\zppoutput_processed.csv", delimiter=';')

In [3]:
df.head()

Unnamed: 0,Cost_ctr,Posting_date,QTY,Employees_QTY
0,0,0,1160,11
1,0,1,3330,13
2,0,2,1650,12
3,0,3,1320,18
4,0,4,1490,6


In [4]:
x1 = df.iloc[:,0:3].values
x1

array([[    0,     0,  1160],
       [    0,     1,  3330],
       [    0,     2,  1650],
       ...,
       [    5,  3610, 66944],
       [    5,  3611, 58424],
       [    5,  3612,   300]], dtype=int64)

In [5]:
y = df.iloc[:, 3].values
y

array([ 11,  13,  12, ..., 808, 528,   1], dtype=int64)

## **BASE DE TREINO E TESTE**

In [6]:
x_treino, x_teste, y_treino, y_teste = train_test_split(x1, y, test_size = 0.3, random_state = 10)

In [7]:
x_treino.shape, y_treino.shape

((2529, 3), (2529,))

In [8]:
x_teste.shape, y_teste.shape

((1084, 3), (1084,))

In [15]:
catboost = CatBoostRegressor (random_state = 10)

In [None]:
# Parâmetros que serão utilizados na procura em grid 
param_grid = {'iterations': [100, 150, 200],
        'learning_rate': [0.02, 0.04, 0.06, 0.08, 0.1],
        'depth': [2, 4, 6, 8, 10]}

grid_search = GridSearchCV(catboost, param_grid, scoring='roc_auc', cv=15)
grid_search.fit(x_treino, y_treino)

In [17]:
# Parâmetros que retornaram o melhor resultado
grid_search.best_params_

{'depth': 2, 'iterations': 100, 'learning_rate': 0.02}

In [18]:
# Criando um novo modelo com os parâmetros devolvidos pelo Grid Search
catboost = CatBoostRegressor (iterations=100, learning_rate=0.02, depth=2, random_state = 10)
catboost.fit(x_treino, y_treino)

0:	learn: 222.2290823	total: 620us	remaining: 61.4ms
1:	learn: 218.3965768	total: 1.09ms	remaining: 53.6ms
2:	learn: 214.8101017	total: 1.53ms	remaining: 49.4ms
3:	learn: 211.1354650	total: 1.91ms	remaining: 45.8ms
4:	learn: 207.6710353	total: 2.37ms	remaining: 45ms
5:	learn: 204.1542980	total: 2.84ms	remaining: 44.6ms
6:	learn: 200.8280386	total: 3.29ms	remaining: 43.7ms
7:	learn: 197.4276805	total: 3.71ms	remaining: 42.6ms
8:	learn: 194.2497874	total: 4.13ms	remaining: 41.8ms
9:	learn: 191.0600164	total: 4.58ms	remaining: 41.2ms
10:	learn: 187.8260068	total: 5.07ms	remaining: 41ms
11:	learn: 184.7604022	total: 5.51ms	remaining: 40.4ms
12:	learn: 181.7018523	total: 5.95ms	remaining: 39.8ms
13:	learn: 178.8790035	total: 6.41ms	remaining: 39.4ms
14:	learn: 176.0078438	total: 6.9ms	remaining: 39.1ms
15:	learn: 173.1570842	total: 7.38ms	remaining: 38.8ms
16:	learn: 170.3933690	total: 7.88ms	remaining: 38.5ms
17:	learn: 167.6402548	total: 8.42ms	remaining: 38.4ms
18:	learn: 164.9857773	tot

<catboost.core.CatBoostRegressor at 0x23c9949d810>

In [19]:
# Resultado com dados de treino
catboost.score(x_treino, y_treino)

0.9117250608472487

**TESTE**

In [20]:
# Resultado com dados de teste
catboost.score(x_teste, y_teste)

0.9142231821541681

In [21]:
previsoes_teste = catboost.predict(x_teste)

**Métricas de Desempenho**

In [22]:
# Erro médio Absoluto
mean_absolute_error(y_teste, previsoes_teste)

46.98473320884427

In [23]:
# Raiz do erro quadrático médio (RMSE)
np.sqrt(mean_squared_error(y_teste, previsoes_teste))

66.57464370097469

### **Validação Cruzada**

In [24]:
# Separando os dados em folds
kfold = KFold(n_splits = 15, shuffle=True, random_state = 5)

In [25]:
# Criando o modelo
modelo = CatBoostRegressor (iterations=100, learning_rate=0.02, depth=2, random_state = 10)
resultado = cross_val_score(modelo, x1, y, cv = kfold)
resultado

0:	learn: 222.4655299	total: 621us	remaining: 61.5ms
1:	learn: 218.6246021	total: 1.09ms	remaining: 53.7ms
2:	learn: 214.8545749	total: 1.59ms	remaining: 51.6ms
3:	learn: 211.1727655	total: 2.1ms	remaining: 50.5ms
4:	learn: 207.6702580	total: 2.62ms	remaining: 49.8ms
5:	learn: 204.1411633	total: 3.16ms	remaining: 49.5ms
6:	learn: 200.8098635	total: 3.67ms	remaining: 48.7ms
7:	learn: 197.4022969	total: 4.18ms	remaining: 48.1ms
8:	learn: 194.0580769	total: 4.69ms	remaining: 47.4ms
9:	learn: 190.8416539	total: 5.18ms	remaining: 46.6ms
10:	learn: 187.8567461	total: 5.65ms	remaining: 45.7ms
11:	learn: 184.7153973	total: 6.13ms	remaining: 44.9ms
12:	learn: 181.6647803	total: 6.65ms	remaining: 44.5ms
13:	learn: 178.8130462	total: 7.15ms	remaining: 43.9ms
14:	learn: 175.9417512	total: 7.61ms	remaining: 43.1ms
15:	learn: 173.1524984	total: 8.06ms	remaining: 42.3ms
16:	learn: 170.3825319	total: 8.56ms	remaining: 41.8ms
17:	learn: 167.6758432	total: 9.07ms	remaining: 41.3ms
18:	learn: 165.0221775

array([0.92177441, 0.89902875, 0.92950056, 0.90878412, 0.89279934,
       0.92981746, 0.92522952, 0.92400641, 0.93026921, 0.90135343,
       0.90714232, 0.9138108 , 0.89529444, 0.90156267, 0.91492103])

In [26]:
# Usamos a média e o desvio padrão
print("Coeficiente de Determinação médio: %.2f%%" % (resultado.mean() * 100.0))

Coeficiente de Determinação médio: 91.30%


**REGRESSÃO COM CAT BOOST OTIMIZADO:** R^2 = 0,9117/0,9142; RMSE = 66,57; R^2 Validação Cruzada: 91,30%