# Polynomial Regression Algorithm - LinearRegression and PolynomialFeatures

## Link to the main notebook

[Go to main.ipynb](./main.ipynb)

## Index

- [Models Used](#modelos-utilizados)
- [Importing the Packages and Libraries](#importing-the-packages-and-libraries)
- [Importing the Datasets](#importando-os-datasets)
- [Initial Analysis of the Datasets](#análise-inicial-dos-datasets)
- [Exploratory Data Analysis](#aed)
- [Creating the Models](#criando-os-modelos)
- [Training the Models](#treinando-os-modelos)
- [Model Results](#resultados-dos-modelos)
    - [Running the Tests](#testes)
    - [Model Quality](#qualidade-dos-testes-e-resultados)
- [Discussion](#discussão)

## Importing the packages and libraries

In [1]:
# Library for the train | test split
from sklearn.model_selection import train_test_split

# Libraries for the regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Regression metrics for scoring the models (R² and mean squared error)
from sklearn.metrics import mean_squared_error, r2_score

# Plotting
from matplotlib import pyplot as plt
import seaborn as sns

# Core libraries
import pandas as pd
import numpy as np
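
As a minimal sketch of how the two classes named in the title are usually combined, the block below chains `PolynomialFeatures` and `LinearRegression` with `make_pipeline` and scores the fit with `r2_score`. The data is synthetic and `degree=2` is an illustrative assumption; neither comes from this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (not the EPE dataset): an exact quadratic trend.
x = np.arange(24).reshape(-1, 1)                 # 24 time steps as a column vector
y = 3.0 + 2.0 * x.ravel() + 0.5 * x.ravel() ** 2

# degree=2 is an illustrative choice, not a value taken from the notebook.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

# Predictions on the training inputs and the coefficient of determination.
y_pred = model.predict(x)
r2 = r2_score(y, y_pred)
```

Wrapping both steps in one pipeline means the polynomial expansion is applied in the same way at fit and predict time.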

## Importing the dataset

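In [2]:
# Load the preprocessed consumption-by-state (UF) table.
# Note: the file name refers to commercial consumption (consumoComercial),
# while the variable is named df_residencial.
df_residencial = pd.read_pickle("./databases/processed/classes-consumoComercialPorUF.pkl")

In [4]:
# Drop the "consumo" column, then transpose so that rows are (ano, mes)
# periods and columns are states.
df_residencial = df_residencial.drop(columns = ["consumo"])
df_residencial = df_residencial.T
df_residencial.head(10)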

Column index name: Empresa de Pesquisa Energética - EPE

| ano | mes | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | Maranhão | Piauí | Ceará | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | JAN | 25870 | 7895 | 49832 | 6141 | 78075 | 12164 | 13832 | 40478 | 25398 | 95718 | ... | 84820 | 567235 | 1464892 | 257033 | 170067 | 276902 | 51890 | 64699 | 97266 | 93427 |
| 2004 | FEV | 23367 | 7329 | 50457 | 5822 | 72467 | 3894 | 12160 | 37893 | 20769 | 86036 | ... | 83917 | 594478 | 1421228 | 269481 | 178963 | 285451 | 52518 | 65905 | 90106 | 81618 |
| 2004 | MAR | 24153 | 7420 | 47374 | 5494 | 75857 | 8639 | 13819 | 40878 | 20862 | 90007 | ... | 86051 | 585939 | 1416476 | 268151 | 186488 | 283818 | 51766 | 74538 | 96474 | 84322 |
| 2004 | ABR | 24113 | 7345 | 49875 | 6035 | 78779 | 8461 | 14883 | 40503 | 24450 | 97130 | ... | 83633 | 600339 | 1579356 | 275652 | 170145 | 297774 | 55175 | 77146 | 102078 | 94761 |
| 2004 | MAI | 25789 | 7142 | 50280 | 5548 | 79714 | 7704 | 15465 | 42202 | 22877 | 90758 | ... | 82244 | 564569 | 1386690 | 267628 | 161961 | 266991 | 49652 | 68100 | 95321 | 90165 |
| 2004 | JUN | 24112 | 6955 | 50734 | 5472 | 82470 | 8309 | 16111 | 42065 | 24519 | 93769 | ... | 75631 | 506022 | 1280007 | 239079 | 142886 | 245687 | 44157 | 61587 | 91441 | 87778 |
| 2004 | JUL | 24312 | 7205 | 53678 | 5876 | 79919 | 8733 | 14775 | 41961 | 22937 | 87315 | ... | 75145 | 524985 | 1348602 | 238401 | 146430 | 251824 | 42928 | 66190 | 86788 | 87198 |
| 2004 | AGO | 25604 | 7580 | 52589 | 5820 | 83589 | 7858 | 15815 | 43550 | 22760 | 92398 | ... | 75329 | 503307 | 1324733 | 249298 | 147906 | 248318 | 49557 | 62202 | 89515 | 84814 |
| 2004 | SET | 25980 | 8309 | 55148 | 5652 | 83100 | 8850 | 17041 | 44972 | 25997 | 97622 | ... | 78583 | 537621 | 1430716 | 264561 | 148465 | 251759 | 51954 | 69190 | 97105 | 98068 |
| 2004 | OUT | 27250 | 7910 | 55374 | 5980 | 82032 | 9621 | 16450 | 43339 | 25217 | 93272 | ... | 81687 | 555512 | 1455419 | 255562 | 153485 | 248016 | 53788 | 72326 | 101968 | 97556 |
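
The rows are indexed by a year and a Portuguese month abbreviation, which a regression cannot use directly. The sketch below shows one possible way to turn an `(ano, mes)` index into a numeric month counter and a date column. The abbreviation list (including `DEZ`, which does not appear in the output shown) and the tiny index are assumptions made for illustration.

```python
import pandas as pd

# Portuguese month abbreviations; assumed to match the "mes" index level.
MONTHS = ["JAN", "FEV", "MAR", "ABR", "MAI", "JUN",
          "JUL", "AGO", "SET", "OUT", "NOV", "DEZ"]
MONTH_NUMBER = {abbr: i + 1 for i, abbr in enumerate(MONTHS)}

# Hypothetical three-row index that crosses a year boundary.
index = pd.MultiIndex.from_tuples(
    [(2004, "NOV"), (2004, "DEZ"), (2005, "JAN")],
    names=["ano", "mes"],
)
df = pd.DataFrame(index=index)

years = df.index.get_level_values("ano").to_numpy()
months = df.index.get_level_values("mes").map(MONTH_NUMBER).to_numpy()

# Months elapsed since January of the first year: a numeric regressor.
df["t"] = (years - years.min()) * 12 + (months - 1)

# First day of each month as a datetime column, e.g. for plotting.
dates = pd.to_datetime(pd.DataFrame({"year": years, "month": months, "day": 1}))
df["date"] = dates.to_numpy()  # positional assignment avoids index alignment
```

The counter `t` keeps consecutive months exactly one unit apart, including across the December to January boundary.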


#### Splitting labels / data

In [5]:
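# Columns 8 to 15: Piauí through Bahia (Maranhão, at position 7, is not included)
df_nordeste = df_residencial.iloc[:, 8:16]
# Single-state series for Bahia
df_bahia = df_residencial["Bahia"]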

In [6]:
df_bahia.sample(5)

ano   mes
2011  MAI    242290
2005  AGO    165508
2012  AGO    242372
2016  NOV    320423
2020  MAR    344345
Name: Bahia, dtype: uint32
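
A single-state series like this one can be turned into a feature matrix and a target before splitting. The sketch below uses a synthetic series as a stand-in for `df_bahia` and assumes a month counter as the only feature. It passes `shuffle=False` to `train_test_split` so that the test set is the most recent block of months; whether the notebook splits chronologically or at random is not stated, so treat this as one option.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a single-state series such as df_bahia:
# 36 monthly values with a linear trend (synthetic, not EPE data).
series = pd.Series(100_000 + 1_500 * np.arange(36), name="consumo_sintetico")

X = np.arange(len(series)).reshape(-1, 1)   # month counter as the only feature
y = series.to_numpy()

# shuffle=False keeps time order: the last 25% of months become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
```

A chronological split evaluates the model on months it has not seen and that lie after the training period, which is closer to how a forecast would be used.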

In [7]:
df_nordeste.sample(10)

Column index name: Empresa de Pesquisa Energética - EPE

| ano | mes | Piauí | Ceará | Rio Grande do Norte | Paraíba | Pernambuco | Alagoas | Sergipe | Bahia |
|---|---|---|---|---|---|---|---|---|---|
| 2016 | SET | 67610 | 198138 | 90159 | 75342 | 232268 | 59111 | 47321 | 297955 |
| 2014 | FEV | 51685 | 182918 | 86007 | 78211 | 214210 | 65672 | 53503 | 293592 |
| 2015 | MAI | 58352 | 194434 | 93725 | 79062 | 254959 | 64638 | 50343 | 300189 |
| 2008 | JUL | 28640 | 115983 | 58168 | 41268 | 142335 | 34913 | 30321 | 189364 |
| 2018 | JUL | 61337 | 155822 | 84501 | 72381 | 223278 | 59028 | 47078 | 286842 |
| 2023 | JAN | 68590 | 200843 | 95605 | 85341 | 253438 | 80219 | 56078 | 343461 |
| 2006 | AGO | 29190 | 104915 | 53409 | 38515 | 129606 | 31082 | 25815 | 165985 |
| 2020 | OUT | 70103 | 195597 | 87959 | 74868 | 233029 | 65288 | 49423 | 301453 |
| 2008 | AGO | 30387 | 117259 | 59149 | 43430 | 136408 | 35799 | 30921 | 187704 |
| 2018 | MAI | 64818 | 165384 | 92465 | 78053 | 246375 | 62938 | 52676 | 315490 |


##### Full dataset

##### Grouping each region separately

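In [None]:
# After the transpose in In [4], states are columns, so regions are column slices.
# Positions follow the column order shown in the head() output above.
df_norte = df_residencial.iloc[:, 0:7]           # Rondônia to Tocantins
df_nordeste = df_residencial.iloc[:, 7:16]       # Maranhão to Bahia
# Assumption: every column between Bahia and Paraná belongs to the Southeast
# (position 16 is hidden in the truncated output).
df_sudeste = df_residencial.iloc[:, 16:-7]       # ... to São Paulo
df_sul = df_residencial.iloc[:, -7:-4]           # Paraná to Rio Grande do Sul
df_centro_oeste = df_residencial.iloc[:, -4:]    # Mato Grosso do Sul to Distrito Federal

In [None]:
# "consumo" was dropped in In [4], so it cannot serve as the target.
# Assumption: X is a sequential month counter and y is the region's total
# consumption per month (sum over its states).
X_norte = np.arange(len(df_norte)).reshape(-1, 1)
y_norte = df_norte.sum(axis=1)

X_nordeste = np.arange(len(df_nordeste)).reshape(-1, 1)
y_nordeste = df_nordeste.sum(axis=1)

X_centro_oeste = np.arange(len(df_centro_oeste)).reshape(-1, 1)
y_centro_oeste = df_centro_oeste.sum(axis=1)

X_sudeste = np.arange(len(df_sudeste)).reshape(-1, 1)
y_sudeste = df_sudeste.sum(axis=1)

X_sul = np.arange(len(df_sul)).reshape(-1, 1)
y_sul = df_sul.sum(axis=1)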
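
The per-region steps above repeat the same two lines five times, so they could also be written as a loop over a mapping from region name to column slice. The sketch below uses a hypothetical miniature frame and hypothetical region names. It also assumes that the feature is a month counter and the target is the regional total per month, which this notebook does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: rows are months, columns are states.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]}
)

# Hypothetical region -> column-slice mapping (names and bounds are placeholders).
regions = {"r1": slice(0, 2), "r2": slice(2, 3)}

datasets = {}
for name, cols in regions.items():
    block = df.iloc[:, cols]                    # states of this region
    X = np.arange(len(block)).reshape(-1, 1)    # month counter as the feature
    y = block.sum(axis=1).to_numpy()            # regional total per month
    datasets[name] = (X, y)
```

Keeping the slices in one dictionary puts all region boundaries in a single place, which makes them easier to check against the column order.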