# Modeling the CSV
Initial modifications to the dataset to make it better suited to the model.



**Importing the Python libraries:**

* Drive: to access the Google Drive where the CSV is stored
* Pandas: for data manipulation and analysis
* Matplotlib: for creating plots
* NumPy: for array manipulation




In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from google.colab import drive
drive.mount('/content/drive')
pd.set_option('display.max_columns', None)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Reading the CSV**

We read the CSV stored on Google Drive and remove a column that will not be used.

In [None]:
df_teste = pd.read_csv('drive/MyDrive/Colab Notebooks/df_dados_cliente_totais_novos.csv')
del df_teste['Unnamed: 0']

**Creating time columns**

We create a column holding the hour and minute as a decimal number (hour + minute/60), because the minute on its own carries little weight in the prediction, whereas the minute combined with the hour does. We also create a second column holding True or False depending on whether the row's hour is greater than or equal to 18 (the start of prime time).

In [None]:
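# Hour plus the minute as a fraction of an hour (e.g. 18:30 -> 18.5)
df_teste["Hora e Minuto"] = df_teste['Hora'] + df_teste["Minuto"]/60

# Prime-time flag: True when the hour is 18 or later
df_teste.insert(len(df_teste.columns),'Horário Nobre', False)
df_teste.loc[(df_teste['Hora'] >= 18), 'Horário Nobre'] = True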
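
As a quick illustration of the two derived columns, here is a minimal sketch on a toy dataframe. The three rows and the name `df_demo` are made up for the example and do not come from the real CSV.

```python
import pandas as pd

# Hypothetical rows, for illustration only (not taken from the real dataset)
df_demo = pd.DataFrame({"Hora": [7, 18, 23], "Minuto": [30, 0, 45]})

# Hour plus the minute expressed as a fraction of an hour
df_demo["Hora e Minuto"] = df_demo["Hora"] + df_demo["Minuto"] / 60

# Prime-time flag: True from 18:00 onwards
df_demo["Horário Nobre"] = df_demo["Hora"] >= 18

print(df_demo)
```

The boolean comparison builds the flag in a single step, which should be equivalent to inserting a False column and then overwriting the matching rows.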

**Creating program columns**

We create columns holding True or False to indicate which programs are airing at each time slot. To avoid cluttering the dataframe with unnecessary information, we include only programs that:
1. Have been airing for a long time and are unlikely to go off the air
2. Have a large impact on the result

In [None]:
# The trailing \b keeps a pattern such as "PROGRAMA 3" from also matching "PROGRAMA 30"
df_teste["Programa 1"] = df_teste["Programa"].str.contains(r"PROGRAMA 63\b")
df_teste["Programa 2"] = df_teste["Programa"].str.contains(r"PROGRAMA 3\b")
df_teste["Programa 3"] = df_teste["Programa"].str.contains(r"PROGRAMA 1\b")
df_teste["Programa 4"] = df_teste["Programa"].str.contains(r"PROGRAMA 9\b")
df_teste["Programa 5"] = df_teste["Programa"].str.contains(r"PROGRAMA 214\b")
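
One point worth checking with `str.contains` is that it matches substrings, so a plain pattern like "PROGRAMA 3" would also flag a program named "PROGRAMA 30". The sketch below compares a plain pattern with one that ends in a regex word boundary (`\b`). The program names in `programas_demo` are hypothetical, and whether such collisions exist in the real data is an assumption to verify.

```python
import pandas as pd

# Hypothetical program names, for illustration only
programas_demo = pd.Series(
    ["PROGRAMA 3", "PROGRAMA 30", "PROGRAMA 63", "PROGRAMA 3 REPRISE"]
)

# Plain substring match: also flags "PROGRAMA 30"
substring_match = programas_demo.str.contains("PROGRAMA 3")

# Word boundary after the number: "PROGRAMA 30" is no longer flagged
bounded_match = programas_demo.str.contains(r"PROGRAMA 3\b")

print(substring_match.tolist())
print(bounded_match.tolist())
```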

**Converting data types**

We convert the columns 'Ano', 'Mês', 'Dia', 'Hora', 'Minuto' and 'Hora e Minuto' from numeric values to text values. We do this because the model handles these values more easily when they are one-hot encoded (categorical variables encoded as binary indicator columns) than when they are label encoded (categorical variables encoded as integers), and by default pandas one-hot encodes only non-numeric columns. We also remove the column containing the program information, since we will no longer use it.

In [None]:
df_teste.drop(['Programa'], axis=1, inplace=True)
df_teste['Ano'] = df_teste['Ano'].astype(str)
df_teste['Mês'] = df_teste['Mês'].astype(str)
df_teste['Dia'] = df_teste['Dia'].astype(str)
df_teste['Hora'] = df_teste['Hora'].astype(str)
df_teste['Minuto'] = df_teste['Minuto'].astype(str)
df_teste["Hora e Minuto"] = df_teste["Hora e Minuto"].astype(str)

**Using get_dummies**

Calling get_dummies on the dataframe applies one-hot encoding to its categorical variables: each column with non-numeric values is replaced by one 0/1 indicator column per distinct value, while numeric and boolean columns are left unchanged.

In [None]:
df_teste = pd.get_dummies(df_teste)
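
To make the effect of the string conversion plus `get_dummies` concrete, here is a small sketch on a toy dataframe. The name `df_dummies_demo` and the column 'Audiência' are hypothetical and used only for illustration.

```python
import pandas as pd

# Toy frame: one column to be treated as categorical, one numeric column
df_dummies_demo = pd.DataFrame({
    "Hora": [7, 18, 18],
    "Audiência": [1.2, 3.4, 2.8],  # hypothetical numeric column
})

# Converting to text makes get_dummies treat 'Hora' as categorical
df_dummies_demo["Hora"] = df_dummies_demo["Hora"].astype(str)
encoded_demo = pd.get_dummies(df_dummies_demo)

# The numeric column is kept; 'Hora' becomes one indicator column per value
print(encoded_demo.columns.tolist())
```

Because the categories are now strings, the indicator columns are ordered as text ("18" before "7"), which is why the real output lists Mês_10 right after Mês_1.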

In [None]:
df_teste

Unnamed: 0,Total de Domicílios,Masculino,Feminino,AB,C1,C2,DE,4-11 anos,12-17 anos,18-24 anos,25-34 anos,35-49 anos,50-59 anos,60+ anos,Total de Indivíduos,Masculino FID,Feminino FID,AB FID,C1 FID,C2 FID,DE FID,4-11 anos FID,12-17 anos FID,18-24 anos FID,25-34 anos FID,35-49 anos FID,50-59 anos FID,60+ anos FID,Total de Domicílios SHR,Masculino SHR,Feminino SHR,AB SHR,C1 SHR,C2 SHR,DE SHR,4-11 anos SHR,12-17 anos SHR,18-24 anos SHR,25-34 anos SHR,35-49 anos SHR,50-59 anos SHR,60+ anos SHR,Feriado,Horário Nobre,Programa 1,Programa 2,Programa 3,Programa 4,Programa 5,Ano_2015,Ano_2016,Ano_2017,Ano_2018,Ano_2019,Ano_2020,Ano_2021,Ano_2022,Mês_1,Mês_10,Mês_11,Mês_12,Mês_2,Mês_3,Mês_4,Mês_5,Mês_6,Mês_7,Mês_8,Mês_9,Dia_1,Dia_10,Dia_11,Dia_12,Dia_13,Dia_14,Dia_15,Dia_16,Dia_17,Dia_18,Dia_19,Dia_2,Dia_20,Dia_21,Dia_22,Dia_23,Dia_24,Dia_25,Dia_26,Dia_27,Dia_28,Dia_29,Dia_3,Dia_30,Dia_31,Dia_4,Dia_5,Dia_6,Dia_7,Dia_8,Dia_9,Dia da Semana_Domingo,Dia da Semana_Quarta,Dia da Semana_Quinta,Dia da Semana_Segunda,Dia da Semana_Sexta,Dia da Semana_Sábado,Dia da Semana_Terça,Hora_0,Hora_1,Hora_10,Hora_11,Hora_12,Hora_13,Hora_14,Hora_15,Hora_16,Hora_17,Hora_18,Hora_19,Hora_2,Hora_20,Hora_21,Hora_22,Hora_23,Hora_3,Hora_4,Hora_5,Hora_6,Hora_7,Hora_8,Hora_9,Minuto_0,Minuto_10,Minuto_15,Minuto_20,Minuto_25,Minuto_30,Minuto_35,Minuto_40,Minuto_45,Minuto_5,Minuto_50,Minuto_55,Gênero_ AUDITORIO,Gênero_ CARROS E MOTORES,Gênero_ CULINARIO,Gênero_ DEBATE,Gênero_ DESENHO ADULTO,Gênero_ DOCUMENTARIO,Gênero_ EDUCATIVO,Gênero_ ENTREVISTA,Gênero_ ESPORTE,Gênero_ FEMININO,Gênero_ FILME,Gênero_ FUTEBOL,Gênero_ GAME SHOW,Gênero_ HUMORISTICO,Gênero_ INFANTIL,Gênero_ JORNALISMO,Gênero_ MINISSERIE,Gênero_ MUSICAL,Gênero_ NAO CONSTA,Gênero_ NOVELA,Gênero_ OUTROS,Gênero_ POLITICO,Gênero_ PREMIACAO,Gênero_ REALITY SHOW,Gênero_ RELIGIOSO,Gênero_ REPORTAGEM,Gênero_ RURAL,Gênero_ SAUDE,Gênero_ SERIES,Gênero_ SHOW,Gênero_ SORTEIO,Gênero_ TELE VENDAS,Hora e Minuto_0.0,Hora e Minuto_0.08333333333333333,Hora e Minuto_0.16666666666666666,Hora e Minuto_0.25,Hora e Minuto_0.3333333333333333,Hora e Minuto_0.4166666666666667,Hora e Minuto_0.5,Hora e Minuto_0.5833333333333334,Hora e Minuto_0.6666666666666666,Hora e Minuto_0.75,Hora e Minuto_0.8333333333333334,Hora e Minuto_0.9166666666666666,Hora e Minuto_1.0,Hora e Minuto_1.0833333333333333,Hora e Minuto_1.1666666666666667,Hora e Minuto_1.25,Hora e Minuto_1.3333333333333333,Hora e Minuto_1.4166666666666667,Hora e Minuto_1.5,Hora e Minuto_1.5833333333333335,Hora e Minuto_1.6666666666666665,Hora e Minuto_1.75,Hora e Minuto_1.8333333333333335,Hora e Minuto_1.9166666666666665,Hora e Minuto_10.0,Hora e Minuto_10.083333333333334,Hora e Minuto_10.166666666666666,Hora e Minuto_10.25,Hora e Minuto_10.333333333333334,Hora e Minuto_10.416666666666666,Hora e Minuto_10.5,Hora e Minuto_10.583333333333334,Hora e Minuto_10.666666666666666,Hora e Minuto_10.75,Hora e Minuto_10.833333333333334,Hora e Minuto_10.916666666666666,Hora e Minuto_11.0,Hora e Minuto_11.083333333333334,Hora e Minuto_11.166666666666666,Hora e Minuto_11.25,Hora e Minuto_11.333333333333334,Hora e Minuto_11.416666666666666,Hora e Minuto_11.5,Hora e Minuto_11.583333333333334,Hora e Minuto_11.666666666666666,Hora e Minuto_11.75,Hora e Minuto_11.833333333333334,Hora e Minuto_11.916666666666666,Hora e Minuto_12.0,Hora e Minuto_12.083333333333334,Hora e Minuto_12.166666666666666,Hora e Minuto_12.25,Hora e Minuto_12.333333333333334,Hora e Minuto_12.416666666666666,Hora e Minuto_12.5,Hora e Minuto_12.583333333333334,Hora e Minuto_12.666666666666666,Hora e Minuto_12.75,Hora e Minuto_12.833333333333334,Hora e Minuto_12.916666666666666,Hora e Minuto_13.0,Hora e Minuto_13.083333333333334,Hora e Minuto_13.166666666666666,Hora e Minuto_13.25,Hora e Minuto_13.333333333333334,Hora e Minuto_13.416666666666666,Hora e Minuto_13.5,Hora e Minuto_13.583333333333334,Hora e Minuto_13.666666666666666,Hora e Minuto_13.75,Hora e Minuto_13.833333333333334,Hora e Minuto_13.916666666666666,Hora e Minuto_14.0,Hora e Minuto_14.083333333333334,Hora e Minuto_14.166666666666666,Hora e Minuto_14.25,Hora e Minuto_14.333333333333334,Hora e Minuto_14.416666666666666,Hora e Minuto_14.5,Hora e Minuto_14.583333333333334,Hora e Minuto_14.666666666666666,Hora e Minuto_14.75,Hora e Minuto_14.833333333333334,Hora e Minuto_14.916666666666666,Hora e Minuto_15.0,Hora e Minuto_15.083333333333334,Hora e Minuto_15.166666666666666,Hora e Minuto_15.25,Hora e Minuto_15.333333333333334,Hora e Minuto_15.416666666666666,Hora e Minuto_15.5,Hora e Minuto_15.583333333333334,Hora e Minuto_15.666666666666666,Hora e Minuto_15.75,Hora e Minuto_15.833333333333334,Hora e Minuto_15.916666666666666,Hora e Minuto_16.0,Hora e Minuto_16.083333333333332,Hora e Minuto_16.166666666666668,Hora e Minuto_16.25,Hora e Minuto_16.333333333333332,Hora e Minuto_16.416666666666668,Hora e Minuto_16.5,Hora e Minuto_16.583333333333332,Hora e Minuto_16.666666666666668,Hora e Minuto_16.75,Hora e Minuto_16.833333333333332,Hora e Minuto_16.916666666666668,Hora e Minuto_17.0,Hora e Minuto_17.083333333333332,Hora e Minuto_17.166666666666668,Hora e Minuto_17.25,Hora e Minuto_17.333333333333332,Hora e Minuto_17.416666666666668,Hora e Minuto_17.5,Hora e Minuto_17.583333333333332,Hora e Minuto_17.666666666666668,Hora e Minuto_17.75,Hora e Minuto_17.833333333333332,Hora e Minuto_17.916666666666668,Hora e Minuto_18.0,Hora e Minuto_18.083333333333332,Hora e Minuto_18.166666666666668,Hora e Minuto_18.25,Hora e Minuto_18.333333333333332,Hora e Minuto_18.416666666666668,Hora e Minuto_18.5,Hora e Minuto_18.583333333333332,Hora e Minuto_18.666666666666668,Hora e Minuto_18.75,Hora e Minuto_18.833333333333332,Hora e Minuto_18.916666666666668,Hora e Minuto_19.0,Hora e Minuto_19.083333333333332,Hora e Minuto_19.166666666666668,Hora e Minuto_19.25,Hora e Minuto_19.333333333333332,Hora e Minuto_19.416666666666668,Hora e Minuto_19.5,Hora e Minuto_19.583333333333332,Hora e Minuto_19.666666666666668,Hora e Minuto_19.75,Hora e Minuto_19.833333333333332,Hora e Minuto_19.916666666666668,Hora e Minuto_2.0,Hora e Minuto_2.0833333333333335,Hora e Minuto_2.1666666666666665,Hora e Minuto_2.25,Hora e Minuto_2.3333333333333335,Hora e Minuto_2.4166666666666665,Hora e Minuto_2.5,Hora e Minuto_2.5833333333333335,Hora e Minuto_2.6666666666666665,Hora e Minuto_2.75,Hora e Minuto_2.8333333333333335,Hora e Minuto_2.9166666666666665,Hora e Minuto_20.0,Hora e Minuto_20.083333333333332,Hora e Minuto_20.166666666666668,Hora e Minuto_20.25,Hora e Minuto_20.333333333333332,Hora e Minuto_20.416666666666668,Hora e Minuto_20.5,Hora e Minuto_20.583333333333332,Hora e Minuto_20.666666666666668,Hora e Minuto_20.75,Hora e Minuto_20.833333333333332,Hora e Minuto_20.916666666666668,Hora e Minuto_21.0,Hora e Minuto_21.083333333333332,Hora e Minuto_21.166666666666668,Hora e Minuto_21.25,Hora e Minuto_21.333333333333332,Hora e Minuto_21.416666666666668,Hora e Minuto_21.5,Hora e Minuto_21.583333333333332,Hora e Minuto_21.666666666666668,Hora e Minuto_21.75,Hora e Minuto_21.833333333333332,Hora e Minuto_21.916666666666668,Hora e Minuto_22.0,Hora e Minuto_22.083333333333332,Hora e Minuto_22.166666666666668,Hora e Minuto_22.25,Hora e Minuto_22.333333333333332,Hora e Minuto_22.416666666666668,Hora e Minuto_22.5,Hora e Minuto_22.583333333333332,Hora e Minuto_22.666666666666668,Hora e Minuto_22.75,Hora e Minuto_22.833333333333332,Hora e Minuto_22.916666666666668,Hora e Minuto_23.0,Hora e Minuto_23.083333333333332,Hora e Minuto_23.166666666666668,Hora e Minuto_23.25,Hora e Minuto_23.333333333333332,Hora e Minuto_23.416666666666668,Hora e Minuto_23.5,Hora e Minuto_23.583333333333332,Hora e Minuto_23.666666666666668,Hora e Minuto_23.75,Hora e Minuto_23.833333333333332,Hora e Minuto_23.916666666666668,Hora e Minuto_3.0,Hora e Minuto_3.0833333333333335,Hora e Minuto_3.1666666666666665,Hora e Minuto_3.25,Hora e Minuto_3.3333333333333335,Hora e Minuto_3.4166666666666665,Hora e Minuto_3.5,Hora e Minuto_3.5833333333333335,Hora e Minuto_3.6666666666666665,Hora e Minuto_3.75,Hora e Minuto_3.8333333333333335,Hora e Minuto_3.9166666666666665,Hora e Minuto_4.0,Hora e Minuto_4.083333333333333,Hora e Minuto_4.166666666666667,Hora e Minuto_4.25,Hora e Minuto_4.333333333333333,Hora e Minuto_4.416666666666667,Hora e Minuto_4.5,Hora e Minuto_4.583333333333333,Hora e Minuto_4.666666666666667,Hora e Minuto_4.75,Hora e Minuto_4.833333333333333,Hora e Minuto_4.916666666666667,Hora e Minuto_5.0,Hora e Minuto_5.083333333333333,Hora e Minuto_5.166666666666667,Hora e Minuto_5.25,Hora e Minuto_5.333333333333333,Hora e Minuto_5.416666666666667,Hora e Minuto_5.5,Hora e Minuto_5.583333333333333,Hora e Minuto_5.666666666666667,Hora e Minuto_5.75,Hora e Minuto_5.833333333333333,Hora e Minuto_5.916666666666667,Hora e Minuto_6.0,Hora e Minuto_6.083333333333333,Hora e Minuto_6.166666666666667,Hora e Minuto_6.25,Hora e Minuto_6.333333333333333,Hora e Minuto_6.416666666666667,Hora e Minuto_6.5,Hora e Minuto_6.583333333333333,Hora e Minuto_6.666666666666667,Hora e Minuto_6.75,Hora e Minuto_6.833333333333333,Hora e Minuto_6.916666666666667,Hora e Minuto_7.0,Hora e Minuto_7.083333333333333,Hora e Minuto_7.166666666666667,Hora e Minuto_7.25,Hora e Minuto_7.333333333333333,Hora e Minuto_7.416666666666667,Hora e Minuto_7.5,Hora e Minuto_7.583333333333333,Hora e Minuto_7.666666666666667,Hora e Minuto_7.75,Hora e Minuto_7.833333333333333,Hora e Minuto_7.916666666666667,Hora e Minuto_8.0,Hora e Minuto_8.083333333333334,Hora e Minuto_8.166666666666666,Hora e Minuto_8.25,Hora e Minuto_8.333333333333334,Hora e Minuto_8.416666666666666,Hora e Minuto_8.5,Hora e Minuto_8.583333333333334,Hora e Minuto_8.666666666666666,Hora e Minuto_8.75,Hora e Minuto_8.833333333333334,Hora e Minuto_8.916666666666666,Hora e Minuto_9.0,Hora e Minuto_9.083333333333334,Hora e Minuto_9.166666666666666,Hora e Minuto_9.25,Hora e Minuto_9.333333333333334,Hora e Minuto_9.416666666666666,Hora e Minuto_9.5,Hora e Minuto_9.583333333333334,Hora e Minuto_9.666666666666666,Hora e Minuto_9.75,Hora e Minuto_9.833333333333334,Hora e Minuto_9.916666666666666
0,1.09,0.30,0.40,0.00,1.52,0.00,0.00,0.0,0.00,1.18,0.0,0.00,1.73,0.00,100.00,100.00,100.00,0.00,100.0,0.00,0.0,0.0,0.0,100.0,0.0,0.00,100.0,0.00,34.08,15.37,57.73,0.00,67.40,0.00,0.00,0.0,0.00,100.00,0.0,0.00,100.00,0.00,True,False,False,False,False,False,False,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.09,0.30,0.40,0.00,1.52,0.00,0.00,0.0,0.00,1.18,0.0,0.00,1.73,0.00,100.00,100.00,100.00,0.00,100.0,0.00,0.0,0.0,0.0,100.0,0.0,0.00,100.0,0.00,34.08,15.37,57.73,0.00,67.40,0.00,0.00,0.0,0.00,100.00,0.0,0.00,100.00,0.00,True,False,False,False,False,False,False,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.09,0.30,0.40,0.00,1.52,0.00,0.00,0.0,0.00,1.18,0.0,0.00,1.73,0.00,100.00,100.00,100.00,0.00,100.0,0.00,0.0,0.0,0.0,100.0,0.0,0.00,100.0,0.00,34.08,15.37,57.73,0.00,67.40,0.00,0.00,0.0,0.00,100.00,0.0,0.00,100.00,0.00,True,False,False,False,False,False,False,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.09,0.30,0.40,0.00,1.52,0.00,0.00,0.0,0.00,1.18,0.0,0.00,1.73,0.00,100.00,100.00,100.00,0.00,100.0,0.00,0.0,0.0,0.0,100.0,0.0,0.00,100.0,0.00,34.08,15.37,57.73,0.00,67.40,0.00,0.00,0.0,0.00,100.00,0.0,0.00,100.00,0.00,True,False,False,False,False,False,False,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.09,0.30,0.40,0.00,1.52,0.00,0.00,0.0,0.00,1.18,0.0,0.00,1.73,0.00,100.00,100.00,100.00,0.00,100.0,0.00,0.0,0.0,0.0,100.0,0.0,0.00,100.0,0.00,34.08,15.37,57.73,0.00,67.40,0.00,0.00,0.0,0.00,100.00,0.0,0.00,100.00,0.00,True,False,False,False,False,False,False,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
788539,1.67,0.17,0.92,0.30,0.41,1.03,0.53,0.0,0.96,0.25,0.0,1.41,0.00,0.71,76.00,100.00,72.93,100.00,100.0,70.95,60.0,0.0,100.0,20.0,0.0,100.00,0.0,60.00,24.30,5.70,17.88,12.09,27.88,18.67,6.49,0.0,19.77,21.12,0.0,43.45,0.00,13.73,False,False,False,False,True,False,False,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
788540,1.97,0.17,1.07,0.30,0.41,1.45,0.35,0.0,0.96,1.25,0.0,1.41,0.00,0.47,86.88,100.00,85.20,100.00,100.0,100.00,40.0,0.0,100.0,100.0,0.0,100.00,0.0,40.00,27.12,5.70,20.04,12.09,27.88,24.45,4.33,0.0,19.77,57.25,0.0,43.45,0.00,9.15,False,False,False,False,True,False,False,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
788541,2.22,0.17,1.26,0.30,0.41,1.45,0.88,0.0,0.96,1.25,0.0,1.41,0.00,1.18,100.00,100.00,100.00,100.00,100.0,100.00,100.0,0.0,100.0,100.0,0.0,100.00,0.0,100.00,29.25,5.78,21.89,9.77,27.88,24.45,10.97,0.0,19.77,57.25,0.0,34.61,0.00,22.88,False,False,False,False,True,False,False,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
788542,2.71,0.37,1.42,0.54,0.41,1.81,0.88,0.0,0.96,1.25,0.0,1.74,0.00,1.88,82.97,73.76,85.57,60.25,100.0,88.33,100.0,0.0,100.0,100.0,0.0,77.78,0.0,80.11,32.82,11.08,24.66,16.59,27.88,28.75,10.82,0.0,17.46,57.25,0.0,42.75,0.00,32.10,False,False,False,False,True,False,False,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


# Creating the Support Vector model

**Creating the dataset for prediction**

We begin building the dataset for prediction by creating a dataframe with all the information from the main dataframe, and a variable containing only the information we want as output.

In [None]:
df_both = df_teste  # note: this is an alias, not a copy; changes to df_both also affect df_teste
#y = df_both[['Total de Domicílios','Masculino','Feminino','AB','C1','C2','DE','4-11 anos','12-17 anos','18-24 anos','25-34 anos','35-49 anos','50-59 anos','60+ anos','Total de Indivíduos','Masculino FID','Feminino FID','AB FID','C1 FID','C2 FID','DE FID','4-11 anos FID','12-17 anos FID','18-24 anos FID','25-34 anos FID','35-49 anos FID','50-59 anos FID','60+ anos FID','Total de Domicílios SHR','Masculino SHR','Feminino SHR','AB SHR','C1 SHR','C2 SHR','DE SHR','4-11 anos SHR','12-17 anos SHR','18-24 anos SHR','25-34 anos SHR','35-49 anos SHR','50-59 anos SHR','60+ anos SHR']]
#y = df_both[['Total de Domicílios', 'Total de Domicílios SHR', 'Total de Indivíduos']]
#y = df_both[['Total de Domicílios','Masculino','Feminino','AB','C1','C2','DE','4-11 anos','12-17 anos','18-24 anos','25-34 anos','35-49 anos','50-59 anos','60+ anos']]
y = df_both['Total de Domicílios']

**Removing the output**

We remove the desired output columns from the dataset that will be used as input.

In [None]:
# Every audience metric is a possible target, so all of them are dropped from the inputs
colunas_output = [
    'Total de Domicílios', 'Masculino', 'Feminino', 'AB', 'C1', 'C2', 'DE',
    '4-11 anos', '12-17 anos', '18-24 anos', '25-34 anos', '35-49 anos', '50-59 anos', '60+ anos',
    'Total de Indivíduos', 'Masculino FID', 'Feminino FID', 'AB FID', 'C1 FID', 'C2 FID', 'DE FID',
    '4-11 anos FID', '12-17 anos FID', '18-24 anos FID', '25-34 anos FID', '35-49 anos FID',
    '50-59 anos FID', '60+ anos FID',
    'Total de Domicílios SHR', 'Masculino SHR', 'Feminino SHR', 'AB SHR', 'C1 SHR', 'C2 SHR', 'DE SHR',
    '4-11 anos SHR', '12-17 anos SHR', '18-24 anos SHR', '25-34 anos SHR', '35-49 anos SHR',
    '50-59 anos SHR', '60+ anos SHR',
]
df_both.drop(columns=colunas_output, inplace=True)
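
Since the input dataframe was created by plain assignment, it may help to see how an in-place drop behaves on an alias compared with a copy. The sketch below uses two small made-up dataframes. All names in it are hypothetical.

```python
import pandas as pd

# Plain assignment: both names point to the same dataframe object
original_demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
alias_demo = original_demo
alias_demo.drop(columns=["b"], inplace=True)
print(original_demo.columns.tolist())  # column 'b' is gone here too

# Explicit copy: the source dataframe is left untouched
source_demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
copy_demo = source_demo.copy()
copy_demo.drop(columns=["b"], inplace=True)
print(source_demo.columns.tolist())  # still has 'a' and 'b'
```

Using `.copy()` costs extra memory, which can matter for a dataframe of this size, so keeping the alias is a reasonable choice as long as the original dataframe is not needed afterwards.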

**Train test split**

Using 'train_test_split' (a scikit-learn function), we split the inputs and the desired output into training and test sets. The 'test_size' parameter sets the fraction of the data held out for testing, and 'shuffle' controls whether the rows are shuffled randomly before splitting.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_both, y, test_size=0.15, shuffle=True)
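
The effect of 'test_size' and 'shuffle' can be checked on a small synthetic array. This is only a sketch: the arrays and the `_demo` names are made up, and `random_state=0` is added here solely to make the example reproducible (the notebook's own call does not set it).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 2 features; row i is [2i, 2i + 1] and its target is i
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.15, shuffle=True, random_state=0
)

# 15% of 100 rows go to the test set, the remaining 85 to training
print(X_tr_demo.shape, X_te_demo.shape)

# Shuffling reorders rows but keeps each input aligned with its target
print((X_tr_demo[:, 0] == 2 * y_tr_demo).all())
```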

We import the Support Vector regressor (SVR) from scikit-learn and create the model. We fit it on a limited number of rows of the dataset, since the model takes a long time to train and limiting the number of rows reduces the time required. Because the split above shuffles the rows, the first 20,000 training rows amount to a random sample.

In [None]:
from sklearn.svm import SVR
sv = SVR()
sv.fit(X_train[:20_000], y_train[:20_000])

SVR()

**Test prediction**

To test the model, we use the "predict" method to create a variable with the values predicted by the model from the features set aside for testing.

In [None]:
test_pred = sv.predict(X_test)

**Test metrics**

Using the predicted and actual values, we compute a few metrics to assess how well the model performs:
- R squared
- Mean absolute error
- Mean squared error

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
print(r2_score(y_test, test_pred)) # R squared
print(mean_absolute_error(y_test, test_pred)) # mean absolute error
print(mean_squared_error(y_test, test_pred)) # mean squared error

0.8406374987350538
2.1446737402735336
10.084220970792735
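
For reference, the three metrics can be computed by hand and compared with scikit-learn. The sketch below uses four made-up value pairs (not data from this notebook) and the usual definitions: MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and R squared is 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.0])

errors = y_true_demo - y_pred_demo

mae_manual = np.mean(np.abs(errors))                          # mean absolute error
mse_manual = np.mean(errors ** 2)                             # mean squared error
ss_res = np.sum(errors ** 2)                                  # residual sum of squares
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)      # total sum of squares
r2_manual = 1 - ss_res / ss_tot                               # R squared

print(mae_manual, mean_absolute_error(y_true_demo, y_pred_demo))
print(mse_manual, mean_squared_error(y_true_demo, y_pred_demo))
print(r2_manual, r2_score(y_true_demo, y_pred_demo))
```

Note that MSE is in squared units of the target, so it is not directly comparable with MAE.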


**Training prediction**

As we did with the test set, we create predictions from the training inputs (the first 300,000 rows).

In [None]:
train_pred = sv.predict(X_train[:300_000])

**Training metrics**

We compute the training metrics by comparing the actual values with the predicted values.

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# train_pred covers only the first 300,000 training rows, so y_train is sliced to match
print(r2_score(y_train[:300_000], train_pred)) # R squared
print(mean_absolute_error(y_train[:300_000], train_pred)) # mean absolute error
print(mean_squared_error(y_train[:300_000], train_pred)) # mean squared error

# Plots and comparisons

**Comparison plot**

We create a scatter plot of the test predictions against the actual values.

In [None]:
plt.scatter(y_test,test_pred)

**Dataframe with prediction and actual value**

To view the predictions side by side with the actual values, we create a dataframe containing both.

In [None]:
results = {'Real Value': y_test, 'Prediction': test_pred}
df_results = pd.DataFrame(data=results)
df_results

**Function for creating comparison plots**

In [None]:
def grafico_metricas(valores_x, valores_y, tipo_dado, nome_modelo):

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import linregress

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))

    ax1.set_title("Relação linear entre as séries de dados reais e preditos \n para o conjunto de " + tipo_dado + \
                  " do modelo " + nome_modelo, fontsize=20, weight='bold')
    ax1.set_xlabel("Série de audiências reais para " + tipo_dado, fontsize=20)
    ax1.set_ylabel("Série de audiências preditas para " + tipo_dado, fontsize=20)
    ax1.tick_params(axis='x', labelsize=20)
    ax1.tick_params(axis='y', labelsize=20)

    slope, intercept, r_value, p_value, std_err = linregress(valores_x, valores_y)
    xaxis_regression = np.linspace(min(valores_x), max(valores_x), num=100)

    equation = ("y = " + "{:.5f}".format(slope) + "x" + " + (" + "{:.5f}".format(intercept) + ")")

    ax1.scatter(valores_x, valores_y, lw=3, color = "#2D243F", label = "timestamps")
    ax1.plot(xaxis_regression, intercept + slope * xaxis_regression, ls = "--", color="#E44B4A", lw=4, label = equation)

    ax1.annotate("r²=" + "{:.2f}".format(r_value**2), xy=(0.1, 0.79), xycoords='axes fraction', fontsize=20)
    ax1.annotate("p-value=" + "{:.2g}".format(p_value), xy=(0.1, 0.84), xycoords='axes fraction', fontsize=20)
    ax1.grid()
    ax1.legend(prop={"size": 15})

    difference_results = valores_x - valores_y
    
    ax2.set_title("Distribuição das diferenças entre séries reais e preditas \n para o conjunto de " + tipo_dado + \
              " do modelo " + nome_modelo, fontsize=20, weight='bold')
    
    ax2.set_xlabel("Diferença entre audiências reais e preditas (" + tipo_dado + ") ", fontsize=20)
    ax2.set_ylabel("Distribuição das diferenças", fontsize=20)
    ax2.tick_params(axis='x', labelsize=20)
    ax2.tick_params(axis='y', labelsize=20)

    difference_results_std = difference_results.std()
    difference_results_stde_mean = difference_results.sem()
    difference_results_var = difference_results.var()

    box_differences = ax2.boxplot(difference_results, medianprops=dict(color="#2D243F", linewidth=5), patch_artist = True)
    box_differences["boxes"][0].set_facecolor("#E44B4A")
    ax2.annotate("std σ=" + "{:.2f}".format(difference_results_std), xy=(0.7, 0.8), xycoords='axes fraction', fontsize=20)
    ax2.annotate("SE=" + "{:.2f}".format(difference_results_stde_mean), xy=(0.7, 0.75), xycoords='axes fraction', fontsize=20)
    ax2.annotate("var σ²=" + "{:.2f}".format(difference_results_var), xy=(0.7, 0.70), xycoords='axes fraction', fontsize=20)
    ax2.grid()

    plt.show()

    return "{:.2f}".format(r_value**2), p_value, difference_results_std, difference_results_stde_mean, difference_results_var
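
The regression part of the function relies on `scipy.stats.linregress`. As a minimal sketch of what it returns, the example below fits a line to five made-up pairs of actual and predicted values. The arrays and names are hypothetical.

```python
import numpy as np
from scipy.stats import linregress

# Made-up actual and predicted values, for illustration only
reais_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
preditos_demo = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

resultado = linregress(reais_demo, preditos_demo)

print(resultado.slope)        # slope of the fitted line
print(resultado.intercept)    # intercept of the fitted line
print(resultado.rvalue ** 2)  # coefficient of determination of the linear fit
print(resultado.pvalue)       # p-value for the null hypothesis that the slope is zero
```

A slope near 1 and an intercept near 0 indicate predictions that track the actual values closely. The p-value tests whether the slope differs from zero and is not the same quantity as the correlation coefficient.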

In [None]:
a, b, c, d, e = grafico_metricas(y_test, test_pred, "teste", "Support Vector")

In [None]:
f, g, h, i, j = grafico_metricas(y_train[:300_000], train_pred, "treino", "Support Vector")

**Creating a dataframe with the model's metrics on the test and training sets**

In [None]:
dic_results_teste = {'Modelo': 'Support Vector', 'Tipo_de_dado':['Teste','Treino'], 'r²':[a,f], 'p-value':[b,g] ,'std σ':[c,h],'SE':[d,i], 'var':[e,j]}
results_teste_df = pd.DataFrame(data=dic_results_teste)
results_teste_df

# Hyperparameter tuning

**Importing Grid Search and defining the parameters**

In [None]:
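from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# 'degree' is only used by the 'poly' kernel, so it is searched only for that kernel
param_grid = [
    {'C': [0.1, 1, 100, 1000], 'kernel': ['rbf', 'sigmoid', 'linear']},
    {'C': [0.1, 1, 100, 1000], 'kernel': ['poly'], 'degree': [1, 2, 3, 4, 5, 6]},
]

grid = GridSearchCV(SVR(), param_grid)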

**Fitting on the training values**

In [None]:
grid.fit(X_train[:20_000],y_train[:20_000])

**Printing the best parameters found by Grid Search**

In [None]:
print(grid.best_params_)
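
To show the shape of what Grid Search returns without the cost of the full run, here is a small sketch on synthetic data. The data, the reduced grid, `cv=3` and all `_gs` names are assumptions made for the example and are not the notebook's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic one-feature regression problem: a noisy sine curve
rng = np.random.default_rng(0)
X_gs = rng.uniform(-3, 3, size=(80, 1))
y_gs = np.sin(X_gs[:, 0]) + rng.normal(scale=0.1, size=80)

# Reduced, illustrative grid: every combination is fitted once per CV fold
grid_gs = GridSearchCV(SVR(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}, cv=3)
grid_gs.fit(X_gs, y_gs)

print(grid_gs.best_params_)  # dictionary with the best value for each searched parameter
print(grid_gs.best_score_)   # mean cross-validated score (R squared for SVR) of that combination
```

With the default `refit=True`, the fitted search object retrains the best combination on all the data passed to `fit`, so calling `predict` on it uses that refitted model.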

**Prediction using Grid Search**

In [None]:
grid_predict = grid.predict(X_test)

**Prediction metrics**

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
print(r2_score(y_test, grid_predict)) # R squared
print(mean_absolute_error(y_test, grid_predict)) # mean absolute error
print(mean_squared_error(y_test, grid_predict)) # mean squared error