## **REGRESSION WITH CATBOOST**

https://catboost.ai/en/docs/concepts/python-reference_catboostregressor 

**Predictor attributes**

Cost ctr = cost center (production line sector)

Posting date = production date

QTY = number of parts produced

**Target variable**

Employees QTY = number of employees

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from catboost import CatBoostRegressor

In [2]:
df = pd.read_csv(r"C:\Users\admin\Desktop\Projetos\TG1\zppoutput_processed.csv", delimiter=';')

In [3]:
df.head()

| | Cost_ctr | Posting_date | QTY | Employees_QTY |
|---|---|---|---|---|
| 0 | 0 | 0 | 1160 | 11 |
| 1 | 0 | 1 | 3330 | 13 |
| 2 | 0 | 2 | 1650 | 12 |
| 3 | 0 | 3 | 1320 | 18 |
| 4 | 0 | 4 | 1490 | 6 |

In [4]:
x1 = df.iloc[:,0:3].values
x1

array([[    0,     0,  1160],
       [    0,     1,  3330],
       [    0,     2,  1650],
       ...,
       [    5,  3610, 66944],
       [    5,  3611, 58424],
       [    5,  3612,   300]], dtype=int64)

In [5]:
y = df.iloc[:, 3].values
y

array([ 11,  13,  12, ..., 808, 528,   1], dtype=int64)
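Positional slicing with `iloc` depends on the column order of the file. A minimal sketch of the same selection by column name, assuming the column names shown in the `df.head()` output above; `df_demo`, `feature_cols`, and `target_col` are illustrative names, and the five rows are copied from that output:

```python
import pandas as pd

# Illustrative frame built from the first five rows shown by df.head()
df_demo = pd.DataFrame({
    "Cost_ctr": [0, 0, 0, 0, 0],
    "Posting_date": [0, 1, 2, 3, 4],
    "QTY": [1160, 3330, 1650, 1320, 1490],
    "Employees_QTY": [11, 13, 12, 18, 6],
})

feature_cols = ["Cost_ctr", "Posting_date", "QTY"]  # predictor attributes
target_col = "Employees_QTY"                        # target variable

# Same arrays as iloc[:, 0:3] and iloc[:, 3], but robust to column reordering
x_demo = df_demo[feature_cols].values
y_demo = df_demo[target_col].values
```

Selecting by name makes it explicit which columns feed the model and which one is predicted.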

## **TRAINING AND TEST SETS**

In [6]:
x_treino, x_teste, y_treino, y_teste = train_test_split(x1, y, test_size = 0.3, random_state = 10)

In [7]:
x_treino.shape, y_treino.shape

((2529, 3), (2529,))

In [8]:
x_teste.shape, y_teste.shape

((1084, 3), (1084,))
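The shapes above follow from a 70/30 split of 3613 rows (2529 + 1084). A small sketch that reproduces only the sizes, using placeholder arrays of zeros rather than the real data; `x_dummy` and `y_dummy` are illustrative names:

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 2529 + 1084  # total rows implied by the shapes above
x_dummy = np.zeros((n_samples, 3))  # placeholder features
y_dummy = np.zeros(n_samples)       # placeholder target

x_tr, x_te, y_tr, y_te = train_test_split(
    x_dummy, y_dummy, test_size=0.3, random_state=10
)
split_shapes = (x_tr.shape, x_te.shape)
```

With `test_size=0.3`, the test set size is rounded up (0.3 × 3613 = 1083.9, so 1084 rows) and the remaining rows go to training.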

In [9]:
catboost = CatBoostRegressor(iterations=100, learning_rate=0.08, depth=5, random_state=10)
catboost.fit(x_treino, y_treino)

0:	learn: 210.1957571	total: 148ms	remaining: 14.7s
1:	learn: 195.2410987	total: 150ms	remaining: 7.34s
2:	learn: 181.5349599	total: 151ms	remaining: 4.88s
3:	learn: 169.0599419	total: 152ms	remaining: 3.65s
4:	learn: 157.5398730	total: 153ms	remaining: 2.91s
5:	learn: 147.2775869	total: 154ms	remaining: 2.42s
6:	learn: 137.5442602	total: 155ms	remaining: 2.06s
7:	learn: 128.9464478	total: 156ms	remaining: 1.8s
8:	learn: 121.1754332	total: 157ms	remaining: 1.59s
9:	learn: 114.0534722	total: 158ms	remaining: 1.42s
10:	learn: 107.3341396	total: 159ms	remaining: 1.29s
11:	learn: 101.3521577	total: 160ms	remaining: 1.17s
12:	learn: 96.0795449	total: 161ms	remaining: 1.07s
13:	learn: 91.1856844	total: 161ms	remaining: 991ms
14:	learn: 86.7826125	total: 162ms	remaining: 919ms
15:	learn: 82.9374336	total: 163ms	remaining: 856ms
16:	learn: 79.4708267	total: 164ms	remaining: 801ms
17:	learn: 76.2675185	total: 165ms	remaining: 752ms
18:	learn: 73.3818954	total: 166ms	remaining: 707ms
...

<catboost.core.CatBoostRegressor at 0x1b7c3835750>

In [10]:
catboost.score(x_treino, y_treino)

0.9482677838858751

**TEST**

In [11]:
catboost.score(x_teste, y_teste)

0.9487600047975867
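The `score` values above are coefficients of determination (R^2), as the closing summary labels them. A plain-Python sketch of that formula, with `r2_score_manual` as an illustrative helper and the first five target values from the data used as toy input:

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true_demo = [11, 13, 12, 18, 6]  # mean is 12
perfect = r2_score_manual(y_true_demo, y_true_demo)    # exact predictions
baseline = r2_score_manual(y_true_demo, [12.0] * 5)    # always predicting the mean
```

Exact predictions give R^2 = 1, and always predicting the mean gives R^2 = 0, so a value near 0.95 means the model explains about 95% of the variance in the target.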

In [12]:
previsoes_teste = catboost.predict(x_teste)

**Performance Metrics**

In [13]:
# Mean absolute error (MAE)
mean_absolute_error(y_teste, previsoes_teste)

30.499062680059218

In [14]:
# Root mean squared error (RMSE)
np.sqrt(mean_squared_error(y_teste, previsoes_teste))

51.45509989473748
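To make the two metrics concrete, here is a plain-Python sketch of their definitions on toy values; `mae`, `rmse`, and the `_demo` lists are illustrative and not part of the notebook:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average of (y - y_hat)^2."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

y_true_demo = [3, 5, 7]
y_pred_demo = [2, 5, 9]  # errors: 1, 0, -2

mae_demo = mae(y_true_demo, y_pred_demo)    # (1 + 0 + 2) / 3
rmse_demo = rmse(y_true_demo, y_pred_demo)  # sqrt((1 + 0 + 4) / 3)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE. The gap between the two values reported above suggests that a minority of predictions miss by a wide margin.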

### **Cross-Validation**

In [15]:
# Split the data into folds
kfold = KFold(n_splits = 15, shuffle=True, random_state = 5)
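With 15 splits, each model is trained on 14 folds and evaluated on the remaining one. A small sketch of how the fold sizes work out, assuming the 3613 rows implied by the train and test shapes earlier; `kfold_sizes` is an illustrative helper that mirrors the way scikit-learn's `KFold` distributes the remainder across the first folds:

```python
def kfold_sizes(n_samples, n_splits):
    """Fold sizes when the remainder is spread over the first folds."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

fold_sizes = kfold_sizes(3613, 15)
# 3613 = 15 * 240 + 13, so 13 folds hold 241 rows and 2 folds hold 240
```

Each validation fold therefore holds roughly 240 rows, which is much smaller than the 1084-row test set, so per-fold scores can be expected to vary more than the single hold-out score.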

In [16]:
# Create the model
modelo = CatBoostRegressor(iterations=100, learning_rate=0.08, depth=5, random_state=10)
resultado = cross_val_score(modelo, x1, y, cv = kfold)
resultado

0:	learn: 210.3096004	total: 1.02ms	remaining: 101ms
1:	learn: 195.5544169	total: 2.12ms	remaining: 104ms
2:	learn: 181.8013008	total: 3.13ms	remaining: 101ms
3:	learn: 169.3209613	total: 3.85ms	remaining: 92.3ms
4:	learn: 157.7360255	total: 4.77ms	remaining: 90.6ms
5:	learn: 147.4470794	total: 5.7ms	remaining: 89.3ms
6:	learn: 137.6721128	total: 6.68ms	remaining: 88.7ms
7:	learn: 129.0181920	total: 7.62ms	remaining: 87.6ms
8:	learn: 121.2325649	total: 8.6ms	remaining: 87ms
9:	learn: 114.0240092	total: 9.58ms	remaining: 86.3ms
10:	learn: 107.3802565	total: 10.5ms	remaining: 84.7ms
11:	learn: 101.3768481	total: 11.5ms	remaining: 84.5ms
12:	learn: 96.0932073	total: 12.3ms	remaining: 82.2ms
13:	learn: 91.0980908	total: 13.4ms	remaining: 82.1ms
14:	learn: 86.8652229	total: 14.3ms	remaining: 80.8ms
15:	learn: 83.0765069	total: 15.2ms	remaining: 79.6ms
16:	learn: 79.5625757	total: 16.2ms	remaining: 78.9ms
17:	learn: 76.3660283	total: 17.2ms	remaining: 78.3ms
18:	learn: 73.4882146
...

array([0.95913032, 0.9222857 , 0.95434923, 0.94500732, 0.93332044,
       0.95694306, 0.95048237, 0.94930354, 0.9631382 , 0.93204191,
       0.9369607 , 0.93593114, 0.94317837, 0.94184193, 0.94908886])

In [17]:
# Report the mean score across folds
print("Mean coefficient of determination: %.2f%%" % (resultado.mean() * 100.0))

Mean coefficient of determination: 94.49%
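The cell above reports only the mean of the fold scores. A sketch that also reports their spread, using the 15 values printed by `cross_val_score` above; `fold_scores`, `mean_r2`, and `std_r2` are illustrative names:

```python
import statistics

# The 15 per-fold R^2 values printed by cross_val_score above
fold_scores = [
    0.95913032, 0.9222857, 0.95434923, 0.94500732, 0.93332044,
    0.95694306, 0.95048237, 0.94930354, 0.9631382, 0.93204191,
    0.9369607, 0.93593114, 0.94317837, 0.94184193, 0.94908886,
]

mean_r2 = statistics.mean(fold_scores)
std_r2 = statistics.pstdev(fold_scores)  # population std, like NumPy's default .std()

summary = "Mean R^2: %.2f%% (+/- %.2f%%)" % (mean_r2 * 100.0, std_r2 * 100.0)
print(summary)
```

Reporting the standard deviation alongside the mean shows how stable the score is from fold to fold.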


**REGRESSION WITH CATBOOST:** R^2 = 0.948/0.949 (train/test); RMSE = 51.46; cross-validated R^2 = 94.49%