# PETCA - Energy Bill Analysis Project with Machine Learning and Neural Networks

## Table of Contents
- [Models Used](#models-used)
- [Importing Packages and Libraries](#importing-the-packages-and-libraries)
- [Importing the Datasets](#importing-the-datasets)
- [Initial Analysis of the Datasets](#initial-analysis-of-the-datasets)
- [Exploratory Data Analysis](#aed)
- [Creating the Models](#creating-the-models)
- [Training the Models](#training-the-models)
- [Model Results](#model-results)
    - [Running the Tests](#tests)
    - [Model Quality](#quality-of-the-tests-and-results)
- [Discussion](#discussion)

## Models Used
- Decision Tree
- Ensemble
- Random Forest
- Convolutional Neural Networks
- Linear Regression
- Polynomial Regression
- Support Vector Machine (SVM)

## Importing the packages and libraries

In [1]:
```python
# train/test split
from sklearn.model_selection import train_test_split

# classification models:
# - Random Forest
# - Decision Tree
# - Support Vector Machine (SVM)
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# polynomial feature expansion (combined with a linear model for polynomial regression)
from sklearn.preprocessing import PolynomialFeatures

# regression models:
# - Linear Regression
# - Support Vector Regression (SVR, the regression variant of SVM)
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# support libraries -----
# plotting
from matplotlib import pyplot as plt
import seaborn as sns

# base libraries
import pandas as pd
import numpy as np
# ----------------------------

# TensorFlow / Keras
# convolutional neural networks
import tensorflow as tf
from keras import layers, models
```

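The imports above bring in `PolynomialFeatures` but no polynomial model as such: in scikit-learn, polynomial regression is usually obtained by expanding the inputs into polynomial terms and fitting a linear model on them. A minimal sketch of that combination with `make_pipeline`, on synthetic data and with an assumed `degree=2` (neither comes from this project):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# synthetic data that follows y = 2 + 3x - 0.5x^2 exactly (illustrative only, not project data)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 + 3 * x.ravel() - 0.5 * x.ravel() ** 2

# degree=2 is an assumed value: x is expanded into [1, x, x^2] and a linear model is fitted on those terms
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

# at x = 4 the prediction should be (numerically) close to 2 + 12 - 8 = 6
print(poly_model.predict([[4.0]]))
```

Keeping the expansion and the fit in one pipeline means the same transformation is applied automatically at prediction time.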

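## Importing the datasets

Despite the `df_residencial_*` variable names, the file loaded here is EPE's (Empresa de Pesquisa Energética) monthly **commercial** electricity consumption by state (UF), in MWh.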

In [2]:
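```python
# EPE spreadsheet exported to CSV; the first column (state names and notes) becomes the index
df_residencial_raw = pd.read_csv("./databases/raw/CONSUMO MENSAL DE ENERGIA ELÉTRICA POR CLASSE - CONSUMO COMERCIAL POR UF.csv", sep = ",", index_col = 0)
```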

## Initial Analysis of the Datasets

### Commercial Consumption by State (UF)

In [3]:
```python
df_residencial_raw.sample(10)
```

| Empresa de Pesquisa Energética - EPE | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | ... | Unnamed: 243 | Unnamed: 244 | Unnamed: 245 | Unnamed: 246 | Unnamed: 247 | Unnamed: 248 | Unnamed: 249 | Unnamed: 250 | Unnamed: 251 | Unnamed: 252 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pernambuco | 119.008 | 115.908 | 123.234 | 129.722 | 124.960 | 122.253 | 115.604 | 117.648 | 125.223 | 129.659 | ... | 257.635 | 276.728 | | | | | | | | |
| Maranhão | 40.478 | 37.893 | 40.878 | 40.503 | 42.202 | 42.065 | 41.961 | 43.550 | 44.972 | 43.339 | ... | 103.378 | 105.554 | | | | | | | | |
| Bahia | 171.570 | 167.151 | 166.958 | 174.506 | 158.750 | 159.294 | 149.743 | 146.609 | 154.636 | 163.207 | ... | 357.060 | 368.111 | | | | | | | | |
| Paraná | 257.033 | 269.481 | 268.151 | 275.652 | 267.628 | 239.079 | 238.401 | 249.298 | 264.561 | 255.562 | ... | 702.042 | 663.592 | | | | | | | | |
| Piauí | 25.398 | 20.769 | 20.862 | 24.450 | 22.877 | 24.519 | 22.937 | 22.760 | 25.997 | 25.217 | ... | 76.114 | 75.892 | | | | | | | | |
| Espírito Santo | 84.820 | 83.917 | 86.051 | 83.633 | 82.244 | 75.631 | 75.145 | 75.329 | 78.583 | 81.687 | ... | 212.645 | 203.481 | | | | | | | | |
| Nota: atualização defasada para não antecipar informações de distribuidoras que devem obedecer às intruções da CVM sobre publicação de resultados. | | | | | | | | | | | ... | | | | | | | | | | |
| Rondônia | 25.870 | 23.367 | 24.153 | 24.113 | 25.789 | 24.112 | 24.312 | 25.604 | 25.980 | 27.250 | ... | 64.375 | 67.186 | | | | | | | | |
| Alagoas | 31.461 | 30.119 | 30.572 | 32.021 | 30.512 | 31.649 | 27.614 | 27.011 | 29.066 | 29.911 | ... | 89.432 | 86.378 | | | | | | | | |
| São Paulo | 1.464.892 | 1.421.228 | 1.416.476 | 1.579.356 | 1.386.690 | 1.280.007 | 1.348.602 | 1.324.733 | 1.430.716 | 1.455.419 | ... | 2.892.123 | 2.895.258 | | | | | | | | |


#### Transposing the dataset and inspecting null values

In [4]:
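```python
# transpose: the months (originally columns) become rows and the states become columns
df_residencial_transposto = df_residencial_raw.transpose()
df_residencial_transposto.sample(10)
```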

| Empresa de Pesquisa Energética - EPE | Consumo de energia elétrica na rede (MWh) | Sistema SIMPLES | NaN | NaN.1 | NaN.2 | TOTAL POR UF | Rondônia | Acre | Amazonas | Roraima | ... | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal | Nota: atualização defasada para não antecipar informações de distribuidoras que devem obedecer às intruções da CVM sobre publicação de resultados. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 126 | | | | | JUN | 6.862.125 | 52.707 | 17.724 | 106.48 | 13.355 | ... | 824.231 | 2.203.224 | 450.413 | 290.246 | 388.584 | 82.415 | 126.737 | 183.445 | 168.672 | |
| Unnamed: 48 | | | | | DEZ | 5.161.662 | 31.473 | 10.03 | 66.297 | 8.529 | ... | 659.305 | 1.759.015 | 328.203 | 212.892 | 322.732 | 58.207 | 81.445 | 121.354 | 116.875 | |
| Unnamed: 212 | | | | | AGO | 6.843.493 | 57.865 | 21.508 | 155.641 | 19.799 | ... | 678.401 | 2.036.757 | 488.979 | 345.788 | 371.393 | 89.229 | 141.852 | 182.953 | 155.737 | |
| Unnamed: 71 | | | | | NOV | 5.763.756 | 39.584 | 12.149 | 80.592 | 9.962 | ... | 739.456 | 1.918.770 | 377.901 | 239.817 | 342.083 | 68.685 | 90.008 | 161.169 | 133.041 | |
| Unnamed: 176 | | | | | AGO | 6.894.849 | 59.614 | 18.619 | 111.825 | 18.511 | ... | 775.992 | 2.131.668 | 481.417 | 297.915 | 377.803 | 96.363 | 134.56 | 201.652 | 158.345 | |
| Unnamed: 138 | | | | | JUN | 7.072.660 | 54.095 | 19.19 | 107.651 | 15.32 | ... | 856.18 | 2.283.526 | 466.854 | 308.299 | 399.202 | 85.974 | 135.348 | 181.309 | 168.543 | |
| Unnamed: 142 | | | | | OUT | 7.630.771 | 59.172 | 21.257 | 131.61 | 18.868 | ... | 923.789 | 2.467.949 | 498.364 | 302.702 | 396.329 | 107.715 | 146.16 | 214.12 | 186.216 | |
| Unnamed: 178 | | | | | OUT | 7.430.572 | 62.87 | 20.849 | 122.96 | 20.877 | ... | 852.654 | 2.310.430 | 486.18 | 325.382 | 388.101 | 101.193 | 154.772 | 222.022 | 176.991 | |
| Unnamed: 204 | | | | | DEZ | 7.520.377 | 63.636 | 23.629 | 108.025 | 20.08 | ... | 882.708 | 2.381.197 | 531.045 | 399.654 | 378.878 | 108.78 | 154.333 | 208.658 | 159.099 | |
| Unnamed: 93 | | | | | SET | 6.113.102 | 46.396 | 14.487 | 95.488 | 11.53 | ... | 707.78 | 2.042.227 | 414.322 | 247.669 | 351.584 | 77.98 | 112.049 | 170.9 | 157.65 | |

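The transposed sample shows several columns that are empty in every sampled row (the spreadsheet's title, section and note rows, now turned into columns). A minimal sketch, on a hypothetical miniature of that table, of counting the null values per column with `isna().sum()` and listing the columns that contain nothing but nulls:

```python
import numpy as np
import pandas as pd

# hypothetical miniature of the transposed table: an empty label column, the month and one state
df_example = pd.DataFrame(
    {
        "Sistema SIMPLES": [np.nan, np.nan, np.nan],
        "mes": ["JAN", "FEV", "MAR"],
        "Rondônia": ["25.870", "23.367", np.nan],
    }
)

# number of null values in each column
null_counts = df_example.isna().sum()
print(null_counts)

# columns in which every value is null are candidates for removal
all_null_columns = null_counts[null_counts == len(df_example)].index.tolist()
print(all_null_columns)  # ['Sistema SIMPLES']
```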

In [5]:
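```python
# columns that are not states: spreadsheet title and section labels, the footnote
# and "TOTAL POR UF" (the total consumption across all states)
columns_to_be_droped = [
    "Consumo de energia elétrica na rede (MWh)",
    "Sistema SIMPLES",
    "Nota: atualização defasada para não antecipar informações de distribuidoras que devem obedecer às intruções da CVM sobre publicação de resultados.",
    "TOTAL POR UF"
]
```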

In [6]:
```python
df_residencial_transposto.drop(columns = columns_to_be_droped, inplace = True, axis = "columns")
df_residencial_transposto.sample(10)
```

| Empresa de Pesquisa Energética - EPE | NaN | NaN.1 | NaN.2 | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 62 | | | FEV | 34.332 | 10.278 | 60.694 | 8.154 | 92.923 | 12.526 | 18.912 | ... | 115.401 | 719.104 | 1.806.503 | 366.386 | 249.7 | 356.982 | 61.417 | 94.417 | 142.782 | 121.616 |
| Unnamed: 107 | | | NOV | 56.362 | 17.22 | 102.574 | 13.698 | 130.32 | 19.684 | 29.938 | ... | 140.799 | 861.01 | 2.353.543 | 460.512 | 299.496 | 422.218 | 94.866 | 121.708 | 190.563 | 176.733 |
| Unnamed: 10 | | | OUT | 27.25 | 7.91 | 55.374 | 5.98 | 82.032 | 9.621 | 16.45 | ... | 81.687 | 555.512 | 1.455.419 | 255.562 | 153.485 | 248.016 | 53.788 | 72.326 | 101.968 | 97.556 |
| Unnamed: 88 | | | ABR | 42.438 | 12.023 | 78.143 | 10.573 | 104.77 | 14.472 | 24.934 | ... | 128.078 | 783.202 | 2.101.063 | 423.852 | 265.717 | 389.908 | 80.671 | 107.986 | 163.446 | 159.716 |
| Unnamed: 85 | | 2011.0 | JAN | 42.015 | 12.545 | 76.646 | 11.542 | 105.223 | 15.095 | 24.376 | ... | 128.254 | 811.728 | 2.041.243 | 409.5 | 275.025 | 427.341 | 74.47 | 88.317 | 150.184 | 150.104 |
| Unnamed: 48 | | | DEZ | 31.473 | 10.03 | 66.297 | 8.529 | 98.943 | 11.64 | 18.842 | ... | 113.024 | 659.305 | 1.759.015 | 328.203 | 212.892 | 322.732 | 58.207 | 81.445 | 121.354 | 116.875 |
| Unnamed: 23 | | | NOV | 28.313 | 9.141 | 60.379 | 7.566 | 88.431 | 9.725 | 17.108 | ... | 89.532 | 613.28 | 1.526.161 | 281.243 | 169.011 | 272.354 | 52.559 | 76.149 | 109.003 | 107.667 |
| Unnamed: 143 | | | NOV | 57.024 | 21.669 | 118.359 | 17.79 | 167.489 | 28.739 | 36.251 | ... | 153.347 | 951.513 | 2.532.951 | 498.19 | 309.781 | 403.872 | 106.432 | 145.511 | 216.266 | 190.16 |
| Unnamed: 208 | | | ABR | 53.96 | 18.971 | 102.995 | 19.076 | 154.59 | 23.014 | 35.044 | ... | 138.095 | 772.565 | 2.304.667 | 530.641 | 413.392 | 409.144 | 107.61 | 132.349 | 196.448 | 152.407 |
| Unnamed: 140 | | | AGO | 55.228 | 20.575 | 120.819 | 16.195 | 149.47 | 22.435 | 35.883 | ... | 138.691 | 846.19 | 2.255.344 | 484.699 | 298.443 | 397.417 | 95.226 | 134.469 | 191.105 | 166.195 |


## Renaming the columns that had no names

In [7]:
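```python
# give the unnamed (NaN) columns sequential placeholder names: new_name1, new_name2, new_name3
columns_to_rename = pd.Series(df_residencial_transposto.columns)
# cumcount() numbers the labels 0, 1, 2, ... inside the "is null" group; +1 makes the numbering start at 1
columns_to_rename = columns_to_rename.fillna("new_name" + (columns_to_rename.groupby(columns_to_rename.isnull()).cumcount() + 1).astype(str))
df_residencial_transposto.columns = columns_to_rename
df_residencial_transposto.sample(10)
```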

| Empresa de Pesquisa Energética - EPE | new_name1 | new_name2 | new_name3 | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 119 | | | NOV | 53.027 | 16.984 | 109.6 | 13.346 | 140.934 | 22.967 | 37.62 | ... | 147.252 | 860.899 | 2.462.167 | 479.215 | 311.677 | 428.509 | 97.547 | 133.681 | 199.184 | 181.681 |
| Unnamed: 11 | | | NOV | 26.816 | 8.377 | 55.504 | 6.798 | 85.686 | 9.772 | 15.665 | ... | 82.689 | 561.987 | 1.438.769 | 258.33 | 158.351 | 260.87 | 51.412 | 72.518 | 97.629 | 92.111 |
| Unnamed: 115 | | | JUL | 51.477 | 15.569 | 100.756 | 13.029 | 137.379 | 18.819 | 31.324 | ... | 127.321 | 784.956 | 2.071.810 | 422.702 | 280.321 | 377.239 | 79.127 | 128.762 | 175.651 | 156.749 |
| Unnamed: 154 | | | OUT | 55.628 | 21.408 | 108.642 | 17.872 | 161.51 | 22.753 | 37.471 | ... | 137.997 | 822.764 | 2.184.559 | 445.565 | 293.822 | 360.479 | 96.901 | 137.352 | 198.301 | 172.854 |
| Unnamed: 76 | | | ABR | 40.729 | 12.385 | 79.541 | 10.244 | 105.416 | 13.844 | 23.899 | ... | 126.525 | 755.229 | 1.998.557 | 400.511 | 264.603 | 379.992 | 75.654 | 98.902 | 158.353 | 138.753 |
| Unnamed: 215 | | | NOV | 57.849 | 22.217 | 122.169 | 22.387 | 176.108 | 25.315 | 35.944 | ... | 142.271 | 729.056 | 2.302.508 | 526.4 | 377.423 | 405.771 | 98.66 | 145.903 | 215.856 | 173.966 |
| Unnamed: 44 | | | AGO | 30.576 | 9.441 | 62.334 | 7.541 | 95.613 | 10.238 | 19.138 | ... | 94.554 | 575.956 | 1.494.738 | 301.954 | 177.137 | 282.473 | 49.105 | 73.906 | 111.551 | 102.956 |
| Unnamed: 172 | | | ABR | 55.431 | 18.527 | 101.448 | 18.313 | 146.276 | 19.723 | 35.108 | ... | 154.832 | 904.972 | 2.582.438 | 547.065 | 373.908 | 444.78 | 123.793 | 147.867 | 208.064 | 170.258 |
| Unnamed: 16 | | | ABR | 26.397 | 7.587 | 55.001 | 6.593 | 80.3 | 9.605 | 15.862 | ... | 94.214 | 655.122 | 1.598.831 | 303.092 | 195.076 | 292.785 | 59.365 | 85.785 | 108.313 | 99.922 |
| Unnamed: 86 | | | FEV | 39.544 | 12.454 | 73.982 | 9.246 | 97.462 | 15.005 | 22.124 | ... | 135.287 | 832.039 | 2.132.679 | 433.147 | 289.661 | 426.691 | 74.801 | 92.712 | 156.26 | 150.235 |

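The `groupby(...).cumcount()` expression used to rename the unnamed columns is compact but not obvious. A sketch on a hypothetical list of labels shows the intermediate step: the labels are grouped by whether they are null, `cumcount()` numbers the entries inside each group from 0, and `fillna` uses those numbers only where the label is missing:

```python
import numpy as np
import pandas as pd

# hypothetical column labels: three missing names followed by two real ones
labels = pd.Series([np.nan, np.nan, np.nan, "Rondônia", "Acre"])

# running count inside each group (null / not null), starting at 0
position_in_group = labels.groupby(labels.isnull()).cumcount()
print(position_in_group.tolist())  # [0, 1, 2, 0, 1]

# only the null labels are replaced, with new_name1, new_name2, ...
renamed = labels.fillna("new_name" + (position_in_group + 1).astype(str))
print(renamed.tolist())  # ['new_name1', 'new_name2', 'new_name3', 'Rondônia', 'Acre']
```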

### Dropping an uninformative column

In [8]:
```python
df_residencial_transposto.drop(columns = ["new_name1"], inplace = True, axis = "columns")
df_residencial_transposto.sample(10)
```

| Empresa de Pesquisa Energética - EPE | new_name2 | new_name3 | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | Maranhão | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 55 | | JUL | 30.505 | 9.889 | 65.219 | 7.759 | 100.705 | 11.351 | 19.11 | 56.772 | ... | 99.503 | 596.532 | 1.611.166 | 309.173 | 189.171 | 297.558 | 47.529 | 82.702 | 128.568 | 108.544 |
| Unnamed: 66 | | JUN | 34.666 | 10.2 | 69.674 | 8.453 | 99.177 | 12.685 | 20.532 | 56.079 | ... | 107.611 | 617.397 | 1.676.350 | 332.065 | 202.151 | 311.692 | 54.827 | 83.932 | 142.287 | 125.423 |
| Unnamed: 166 | | OUT | 59.752 | 25.05 | 105.666 | 19.052 | 164.753 | 24.081 | 39.681 | 114.752 | ... | 140.469 | 813.67 | 2.365.924 | 491.471 | 315.708 | 387.34 | 105.478 | 147.462 | 200.851 | 160.81 |
| Unnamed: 110 | | FEV | 44.797 | 15.342 | 94.184 | 11.742 | 123.148 | 17.649 | 26.389 | 82.396 | ... | 145.762 | 874.686 | 2.292.030 | 472.142 | 337.61 | 458.756 | 93.852 | 117.444 | 178.412 | 165.196 |
| Unnamed: 157 | 2017.0 | JAN | 51.309 | 21.391 | 97.214 | 16.319 | 141.314 | 20.209 | 32.827 | 95.399 | ... | 168.414 | 981.023 | 2.488.983 | 504.027 | 393.89 | 486.438 | 107.534 | 129.419 | 196.005 | 169.632 |
| Unnamed: 244 | | ABR | 67.186 | 24.94 | 137.675 | 26.73 | 196.144 | 26.448 | 45.082 | 105.554 | ... | 203.481 | 846.565 | 2.895.258 | 663.592 | 543.175 | 506.31 | 109.172 | 161.94 | 256.047 | 176.765 |
| Unnamed: 108 | | DEZ | 50.225 | 16.036 | 100.25 | 13.597 | 128.819 | 19.642 | 28.994 | 85.358 | ... | 147.533 | 874.682 | 2.308.213 | 476.817 | 322.23 | 461.929 | 98.093 | 118.621 | 180.039 | 166.843 |
| Unnamed: 246 | | JUN | | | | | | | | | ... | | | | | | | | | | |
| Unnamed: 105 | | SET | 53.424 | 17.132 | 99.563 | 12.272 | 127.221 | 19.304 | 30.244 | 81.833 | ... | 127.272 | 755.483 | 2.177.204 | 446.969 | 267.117 | 385.231 | 87.854 | 115.325 | 178.99 | 161.132 |
| Unnamed: 25 | 2006.0 | JAN | 28.966 | 8.602 | 56.039 | 6.639 | 83.584 | 9.655 | 15.602 | 45.286 | ... | 101.901 | 618.185 | 1.565.125 | 300.625 | 193.958 | 311.252 | 56.549 | 70.429 | 102.926 | 98.604 |


### Renaming the "ano" (year) and "mes" (month) columns

In [9]:
```python
columns_to_rename = {
    "new_name2" : "ano",
    "new_name3" : "mes"
}

df_residencial_transposto.rename(mapper = columns_to_rename, axis = "columns", inplace = True)
df_residencial_transposto.head(5)
```

| Empresa de Pesquisa Energética - EPE | ano | mes | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | Maranhão | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 1 | 2004.0 | JAN | 25.87 | 7.895 | 49.832 | 6.141 | 78.075 | 12.164 | 13.832 | 40.478 | ... | 84.82 | 567.235 | 1.464.892 | 257.033 | 170.067 | 276.902 | 51.89 | 64.699 | 97.266 | 93.427 |
| Unnamed: 2 | | FEV | 23.367 | 7.329 | 50.457 | 5.822 | 72.467 | 3.894 | 12.16 | 37.893 | ... | 83.917 | 594.478 | 1.421.228 | 269.481 | 178.963 | 285.451 | 52.518 | 65.905 | 90.106 | 81.618 |
| Unnamed: 3 | | MAR | 24.153 | 7.42 | 47.374 | 5.494 | 75.857 | 8.639 | 13.819 | 40.878 | ... | 86.051 | 585.939 | 1.416.476 | 268.151 | 186.488 | 283.818 | 51.766 | 74.538 | 96.474 | 84.322 |
| Unnamed: 4 | | ABR | 24.113 | 7.345 | 49.875 | 6.035 | 78.779 | 8.461 | 14.883 | 40.503 | ... | 83.633 | 600.339 | 1.579.356 | 275.652 | 170.145 | 297.774 | 55.175 | 77.146 | 102.078 | 94.761 |
| Unnamed: 5 | | MAI | 25.789 | 7.142 | 50.28 | 5.548 | 79.714 | 7.704 | 15.465 | 42.202 | ... | 82.244 | 564.569 | 1.386.690 | 267.628 | 161.961 | 266.991 | 49.652 | 68.1 | 95.321 | 90.165 |


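### Filling the "ano" column with the correct year

In the spreadsheet the year is written only on each January row (e.g. `2004.0` next to `JAN`), so the remaining months are empty; a forward fill propagates each year to the months that follow it.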

In [10]:
```python
# copy each January's year down to the following months
df_residencial_transposto["ano"] = df_residencial_transposto["ano"].ffill()
```
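A minimal sketch of the forward fill on a few hypothetical rows laid out like the spreadsheet, where only January carries the year:

```python
import numpy as np
import pandas as pd

# hypothetical rows: the year appears only on January
df_example = pd.DataFrame(
    {
        "ano": ["2004", np.nan, np.nan, "2005", np.nan],
        "mes": ["JAN", "FEV", "MAR", "JAN", "FEV"],
    }
)

# each NaN takes the value of the closest non-null row above it
df_example["ano"] = df_example["ano"].ffill()
print(df_example["ano"].tolist())  # ['2004', '2004', '2004', '2005', '2005']
```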

## Turning the key columns into the index

In [11]:
```python
df_residencial_transposto["ano"].unique()
```

```text
array(['2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011',
       '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019',
       '2020', '2021', '2022', '2023', '2024*'], dtype=object)
```

In [12]:
```python
df_residencial_transposto["mes"].unique()
```

```text
array(['JAN', 'FEV', 'MAR', 'ABR', 'MAI', 'JUN', 'JUL', 'AGO', 'SET',
       'OUT', 'NOV', 'DEZ'], dtype=object)
```

In [13]:
```python
df_residencial_transposto.columns
```

```text
Index(['ano', 'mes', 'Rondônia', 'Acre', 'Amazonas', 'Roraima', 'Pará',
       'Amapá', 'Tocantins', 'Maranhão', 'Piauí', 'Ceará',
       'Rio Grande do Norte', 'Paraíba', 'Pernambuco', 'Alagoas', 'Sergipe',
       'Bahia', 'Minas Gerais', 'Espírito Santo', 'Rio de Janeiro',
       'São Paulo', 'Paraná', 'Santa Catarina', 'Rio Grande do Sul',
       'Mato Grosso do Sul', 'Mato Grosso', 'Goiás', 'Distrito Federal'],
      dtype='object', name='Empresa de Pesquisa Energética - EPE')
```

In [14]:
```python
# make sure both keys are strings, then use (ano, mes) as a MultiIndex;
# set_index replaces the old "Unnamed: N" row labels, so they don't need to be reset and dropped first
df_residencial_transposto["ano"] = df_residencial_transposto["ano"].astype("str")
df_residencial_transposto["mes"] = df_residencial_transposto["mes"].astype("str")
df_residencial_transposto.set_index(["ano", "mes"], inplace = True)
```

In [15]:
```python
df_residencial_transposto.sample(10)
```

| ano | mes | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | Maranhão | Piauí | Ceará | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020 | AGO | 59.693 | 20.87 | 116.087 | 16.913 | 170.062 | 19.701 | 32.718 | 101.869 | 56.578 | 165.374 | ... | 115.493 | 675.058 | 1.954.213 | 444.491 | 293.87 | 318.59 | 110.505 | 124.63 | 176.199 | 141.023 |
| 2011 | FEV | 39.544 | 12.454 | 73.982 | 9.246 | 97.462 | 15.005 | 22.124 | 62.591 | 36.803 | 134.576 | ... | 135.287 | 832.039 | 2.132.679 | 433.147 | 289.661 | 426.691 | 74.801 | 92.712 | 156.26 | 150.235 |
| 2022 | OUT | 63.402 | 23.731 | 132.866 | 24.426 | 196.761 | 26.246 | 43.725 | 101.912 | 78.215 | 189.588 | ... | 150.188 | 707.229 | 2.258.145 | 510.297 | 370.528 | 367.925 | 116.71 | 148.559 | 231.036 | 178.226 |
| 2022 | SET | 63.313 | 22.775 | 134.471 | 23.988 | 196.497 | 25.103 | 42.666 | 100.815 | 77.695 | 190.513 | ... | 136.535 | 693.922 | 2.320.905 | 503.724 | 360.176 | 373.497 | 102.407 | 147.509 | 218.587 | 170.405 |
| 2015 | SET | 59.847 | 20.8 | 126.563 | 18.207 | 156.328 | 25.626 | 38.734 | 109.946 | 62.497 | 184.865 | ... | 139.625 | 851.35 | 2.252.327 | 478.414 | 294.183 | 388.873 | 102.05 | 146.324 | 201.825 | 173.628 |
| 2004 | MAI | 25.789 | 7.142 | 50.28 | 5.548 | 79.714 | 7.704 | 15.465 | 42.202 | 22.877 | 90.758 | ... | 82.244 | 564.569 | 1.386.690 | 267.628 | 161.961 | 266.991 | 49.652 | 68.1 | 95.321 | 90.165 |
| 2023 | JAN | 57.347 | 21.636 | 119.195 | 22.579 | 176.755 | 21.309 | 34.738 | 93.891 | 68.59 | 200.843 | ... | 164.737 | 792.826 | 2.548.374 | 577.695 | 463.513 | 486.49 | 101.008 | 136.436 | 208.186 | 163.882 |
| 2015 | ABR | 53.46 | 18.96 | 109.084 | 16.211 | 127.765 | 20.662 | 34.84 | 96.822 | 54.699 | 180.58 | ... | 160.274 | 970.97 | 2.598.202 | 530.812 | 365.028 | 483.143 | 107.193 | 144.484 | 205.112 | 171.004 |
| 2007 | SET | 33.884 | 9.846 | 65.619 | 7.467 | 98.375 | 12.092 | 19.855 | 51.825 | 30.763 | 107.934 | ... | 95.25 | 621.84 | 1.665.504 | 315.13 | 188.014 | 284.909 | 55.571 | 79.517 | 115.259 | 107.222 |
| 2011 | SET | 46.396 | 14.487 | 95.488 | 11.53 | 124.534 | 17.455 | 29.043 | 78.471 | 44.744 | 150.002 | ... | 117.514 | 707.78 | 2.042.227 | 414.322 | 247.669 | 351.584 | 77.98 | 112.049 | 170.9 | 157.65 |

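With (`ano`, `mes`) as a MultiIndex, rows can be selected by label instead of by the old `Unnamed: N` names. A sketch on a hypothetical excerpt built from the Rondônia and Acre values shown above, already converted to integers (MWh):

```python
import pandas as pd

# hypothetical excerpt: values taken from the sample above, indexed by (ano, mes)
df_example = pd.DataFrame(
    {
        "ano": ["2004", "2011", "2011", "2022"],
        "mes": ["MAI", "FEV", "SET", "SET"],
        "Rondônia": [25789, 39544, 46396, 63313],
        "Acre": [7142, 12454, 14487, 22775],
    }
).set_index(["ano", "mes"])

# one month of one state: a (year, month) tuple plus the column name
print(df_example.loc[("2011", "FEV"), "Rondônia"])  # 39544

# a whole year: only the first index level
print(df_example.loc["2011"])

# the same month across all years: cross-section on the "mes" level
print(df_example.xs("SET", level="mes"))
```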

### Removing the incomplete year `2024*` (the last 12 rows, which contain NaN values)

In [16]:
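```python
# drop the last 12 rows (the months of 2024*); .copy() creates an independent DataFrame,
# which avoids pandas' SettingWithCopyWarning when its columns are modified later
df_residencial_processed = df_residencial_transposto.iloc[:-12, :].copy()
df_residencial_processed.tail(20)
```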

| ano | mes | Rondônia | Acre | Amazonas | Roraima | Pará | Amapá | Tocantins | Maranhão | Piauí | Ceará | ... | Espírito Santo | Rio de Janeiro | São Paulo | Paraná | Santa Catarina | Rio Grande do Sul | Mato Grosso do Sul | Mato Grosso | Goiás | Distrito Federal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022 | MAI | 60.023 | 22.062 | 93.34 | 20.905 | 179.501 | 21.565 | 39.406 | 95.508 | 70.954 | 184.875 | ... | 154.09 | 766.995 | 2.399.989 | 507.105 | 388.556 | 383.942 | 92.063 | 152.498 | 234.778 | 161.466 |
| 2022 | JUN | 55.906 | 20.726 | 110.633 | 20.277 | 182.373 | 22.355 | 39.379 | 93.184 | 69.828 | 191.863 | ... | 137.109 | 693.639 | 2.212.539 | 489.848 | 364.369 | 382.844 | 77.987 | 146.694 | 215.66 | 159.671 |
| 2022 | JUL | 58.845 | 22.736 | 125.392 | 20.642 | 184.979 | 22.587 | 39.051 | 93.368 | 70.545 | 181.182 | ... | 131.802 | 686.866 | 2.216.133 | 514.591 | 364.095 | 387.416 | 90.823 | 146.942 | 210.97 | 148.005 |
| 2022 | AGO | 62.511 | 22.869 | 135.581 | 23.444 | 193.527 | 23.411 | 40.429 | 100.982 | 71.933 | 185.456 | ... | 141.382 | 690.904 | 2.262.362 | 530.176 | 370.186 | 382.543 | 103.458 | 145.534 | 212.732 | 158.129 |
| 2022 | SET | 63.313 | 22.775 | 134.471 | 23.988 | 196.497 | 25.103 | 42.666 | 100.815 | 77.695 | 190.513 | ... | 136.535 | 693.922 | 2.320.905 | 503.724 | 360.176 | 373.497 | 102.407 | 147.509 | 218.587 | 170.405 |
| 2022 | OUT | 63.402 | 23.731 | 132.866 | 24.426 | 196.761 | 26.246 | 43.725 | 101.912 | 78.215 | 189.588 | ... | 150.188 | 707.229 | 2.258.145 | 510.297 | 370.528 | 367.925 | 116.71 | 148.559 | 231.036 | 178.226 |
| 2022 | NOV | 60.625 | 22.916 | 130.245 | 22.875 | 187.033 | 23.196 | 39.28 | 99.693 | 73.635 | 195.777 | ... | 152.476 | 758.398 | 2.413.940 | 524.017 | 393.175 | 394.981 | 116.609 | 143.855 | 228.505 | 173.719 |
| 2022 | DEZ | 63.862 | 22.562 | 125.544 | 22.698 | 186.473 | 23.793 | 37.6 | 100.793 | 74.491 | 195.558 | ... | 156.328 | 800.252 | 2.517.700 | 560.955 | 443.195 | 478.772 | 98.616 | 146.993 | 222.807 | 171.316 |
| 2023 | JAN | 57.347 | 21.636 | 119.195 | 22.579 | 176.755 | 21.309 | 34.738 | 93.891 | 68.59 | 200.843 | ... | 164.737 | 792.826 | 2.548.374 | 577.695 | 463.513 | 486.49 | 101.008 | 136.436 | 208.186 | 163.882 |
| 2023 | FEV | 54.871 | 20.649 | 110.512 | 21.194 | 168.418 | 18.451 | 35.098 | 89.814 | 64.915 | 196.893 | ... | 174.009 | 836.768 | 2.557.049 | 592.226 | 514.24 | 517.115 | 106.67 | 145.345 | 209.09 | 165.747 |

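Cutting the last 12 rows by position assumes the file always ends with exactly one twelve-row placeholder year. A sketch of a label-based alternative, on a hypothetical excerpt with the same (`ano`, `mes`) MultiIndex, that drops every row of `2024*` wherever those rows sit:

```python
import numpy as np
import pandas as pd

# hypothetical excerpt: two complete months of 2023 followed by the incomplete "2024*" year
index = pd.MultiIndex.from_tuples(
    [("2023", "NOV"), ("2023", "DEZ"), ("2024*", "JAN"), ("2024*", "DEZ")],
    names=["ano", "mes"],
)
df_example = pd.DataFrame({"consumo": [1.0, 2.0, 3.0, np.nan]}, index=index)

# drop every row whose "ano" level is "2024*", regardless of its position
df_without_2024 = df_example.drop(index="2024*", level="ano")
print(df_without_2024)
```

Unlike the positional slice, `drop` raises a `KeyError` when the label is not present (unless `errors="ignore"` is passed), which makes a change in the file's layout visible instead of silently removing the wrong rows.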

### Checking the column data types

In [17]:
```python
df_residencial_processed.dtypes
```

```text
Empresa de Pesquisa Energética - EPE
Rondônia               object
Acre                   object
Amazonas               object
Roraima                object
Pará                   object
Amapá                  object
Tocantins              object
Maranhão               object
Piauí                  object
Ceará                  object
Rio Grande do Norte    object
Paraíba                object
Pernambuco             object
Alagoas                object
Sergipe                object
Bahia                  object
Minas Gerais           object
Espírito Santo         object
Rio de Janeiro         object
São Paulo              object
Paraná                 object
Santa Catarina         object
Rio Grande do Sul      object
Mato Grosso do Sul     object
Mato Grosso            object
Goiás                  object
Distrito Federal       object
dtype: object
```

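### Converting the data types for a more compact and easier-to-read dataset

The values were read as text in the Brazilian number format, where "." is the thousands separator (e.g. `1.464.892` is 1,464,892 MWh). The separators are removed first, and the columns are then cast to 32-bit unsigned integers (`uint32`).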

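A minimal sketch of the conversion on three values copied from the São Paulo column, also checking that `uint32` has enough range: its maximum, 4,294,967,295, is far above the largest value shown in the samples above (2.895.258 MWh, São Paulo). The memory comparison at the end is only indicative, since the exact byte counts depend on the Python and pandas versions:

```python
import numpy as np
import pandas as pd

# three values copied from the São Paulo column, stored as text with "." as thousands separator
raw = pd.Series(["1.464.892", "1.421.228", "2.895.258"])

# remove the separators (a literal ".", not a regex) and cast to 32-bit unsigned integers
as_uint32 = raw.str.replace(".", "", regex=False).astype("uint32")
print(as_uint32.tolist())  # [1464892, 1421228, 2895258]

# uint32 covers 0 .. 4,294,967,295, so these values fit with plenty of margin
UINT32_MAX = np.iinfo(np.uint32).max
print(as_uint32.max() <= UINT32_MAX)  # True

# indicative memory use in bytes (deep=True also counts the string objects)
print(raw.memory_usage(deep=True, index=False), as_uint32.memory_usage(index=False))
```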
In [18]:
```python
# strip the "." thousands separators, e.g. "1.464.892" -> "1464892"
for coluna in df_residencial_processed.columns:
    df_residencial_processed[coluna] = df_residencial_processed[coluna].str.replace(".", "", regex=False)
```



In [19]:
```python
# every remaining column is a state, so all of them can be cast to uint32 at once
df_residencial_processed = df_residencial_processed.astype("uint32")
```

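Another option, sketched here under assumptions, is to let `pd.read_csv` drop the separators while parsing, through its `thousands` parameter. The two-state CSV below is hypothetical (its values come from the raw sample above). This only helps for a block that contains nothing but numbers: in the EPE file the month labels and the note row share the same columns, which would still force those columns to be read as text, so the numeric block would have to be isolated first (for example with `skiprows`/`nrows`).

```python
import io

import pandas as pd

# hypothetical two-state excerpt in the raw layout (values in MWh, "." as thousands separator)
csv_text = """estado,JAN,FEV
Rondônia,25.870,23.367
São Paulo,1.464.892,1.421.228
"""

# thousands="." makes the parser skip the separators, so the columns are read as integers
df = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=".")
df = df.astype("uint32")
print(df)
```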

### Checking the type conversion

In [20]:
```python
df_residencial_processed.dtypes
```

```text
Empresa de Pesquisa Energética - EPE
Rondônia               uint32
Acre                   uint32
Amazonas               uint32
Roraima                uint32
Pará                   uint32
Amapá                  uint32
Tocantins              uint32
Maranhão               uint32
Piauí                  uint32
Ceará                  uint32
Rio Grande do Norte    uint32
Paraíba                uint32
Pernambuco             uint32
Alagoas                uint32
Sergipe                uint32
Bahia                  uint32
Minas Gerais           uint32
Espírito Santo         uint32
Rio de Janeiro         uint32
São Paulo              uint32
Paraná                 uint32
Santa Catarina         uint32
Rio Grande do Sul      uint32
Mato Grosso do Sul     uint32
Mato Grosso            uint32
Goiás                  uint32
Distrito Federal       uint32
dtype: object
```

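## Saving the processed datasets
### They are saved as .pkl (pickle) files
Pickle is Python's own serialization format (the standard `pickle` module), which pandas uses in `DataFrame.to_pickle`. Unlike a CSV, it stores the DataFrame together with its metadata, such as the `uint32` dtypes and the (`ano`, `mes`) MultiIndex, so nothing has to be parsed or converted again when the file is loaded with `pd.read_pickle`. It is not compressed by default: `to_pickle` infers the compression from the file extension, so a plain `.pkl` file is written uncompressed, while an extension such as `.pkl.gz` (or `compression="gzip"`) produces a compressed file.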

In [21]:
```python
df_residencial_processed.to_pickle(path = "./databases/processed/classes-consumoComercialPorUF.pkl")
```
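A minimal sketch of the round trip, on a hypothetical two-row frame shaped like the processed dataset, showing that the `uint32` dtypes and the MultiIndex survive and that compression is chosen from the file extension:

```python
import os
import tempfile

import pandas as pd

# hypothetical frame shaped like the processed dataset: (ano, mes) MultiIndex and uint32 columns
df_example = pd.DataFrame(
    {"Rondônia": [25789, 39544], "Acre": [7142, 12454]},
    index=pd.MultiIndex.from_tuples([("2004", "MAI"), ("2011", "FEV")], names=["ano", "mes"]),
).astype("uint32")

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "example.pkl")
    gzip_path = os.path.join(tmp_dir, "example.pkl.gz")

    df_example.to_pickle(plain_path)  # ".pkl": no compression is inferred
    df_example.to_pickle(gzip_path)   # ".gz": gzip compression is inferred

    # read_pickle infers the decompression from the extension in the same way
    restored = pd.read_pickle(gzip_path)

# dtypes and the MultiIndex come back unchanged
print(restored.dtypes)
print(list(restored.index.names))  # ['ano', 'mes']
```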

## AED
### Exploratory Data Analysis (AED, *Análise Exploratória dos Dados*)

### Commercial Consumption by State (UF)

## Creating the Models

## Training the Models

## Model Results

### Tests

### Quality of the Tests and Results

## Discussion

The discussion goes here.