# Project: Analysis of COVID-19 Data in the State of São Paulo

This project analyzes COVID-19 case data for the state of São Paulo from February 2020 to September 2021.

Data available at:

- https://coronavirus.seade.gov.br
- https://github.com/seade-R/dados-covid-sp/tree/master

## **Data Import**

In [1]:
import numpy as np
import pandas as pd

In [2]:
covid_sp = pd.read_csv('./dataset/dados_covid_sp.csv', sep=';', encoding='utf-8')

In [3]:
covid_sp.head()

Unnamed: 0,nome_munic,codigo_ibge,dia,mes,datahora,casos,casos_novos,casos_pc,casos_mm7d,obitos,...,nome_drs,cod_drs,pop,pop_60,area,map_leg,map_leg_s,latitude,longitude,semana_epidem
0,Adamantina,3500105,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,Marília,5,33894,7398,41199,0,8.0,-216820,-510737,9
1,Adolfo,3500204,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São José do Rio Preto,15,3447,761,21106,0,8.0,-212325,-496451,9
2,Aguaí,3500303,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São João da Boa Vista,14,35608,5245,47455,0,8.0,-220572,-469735,9
3,Águas da Prata,3500402,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São João da Boa Vista,14,7797,1729,14267,0,8.0,-219319,-467176,9
4,Águas de Lindóia,3500501,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,Campinas,3,18374,3275,6013,0,8.0,-224733,-466314,9


In [4]:
covid_sp.shape

(374034, 26)
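
The preview above shows `casos_pc` values such as `0,00000000000000e+00`, which use a comma as the decimal mark, so pandas loads them as strings. As a minimal sketch, assuming a small in-memory sample instead of the real file (the rows below are a hypothetical stand-in), the `decimal` parameter of `pd.read_csv` can parse such values as floats at load time:

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics the file's layout:
# ';' separates the fields and ',' is the decimal mark.
sample = io.StringIO(
    "nome_munic;casos;casos_pc\n"
    "Adamantina;0;0,00000000000000e+00\n"
    "Votorantim;11670;9,73928428361597e+03\n"
)

# decimal=',' tells the parser to treat the comma as the decimal separator.
sample_df = pd.read_csv(sample, sep=';', decimal=',')
print(sample_df.dtypes)
```

The notebook itself keeps the default parsing and converts the column later, so this is only an alternative to consider.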

## **Data Cleaning**

### Renaming variables (columns)

In [5]:
covid_sp.head(5)

Unnamed: 0,nome_munic,codigo_ibge,dia,mes,datahora,casos,casos_novos,casos_pc,casos_mm7d,obitos,...,nome_drs,cod_drs,pop,pop_60,area,map_leg,map_leg_s,latitude,longitude,semana_epidem
0,Adamantina,3500105,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,Marília,5,33894,7398,41199,0,8.0,-216820,-510737,9
1,Adolfo,3500204,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São José do Rio Preto,15,3447,761,21106,0,8.0,-212325,-496451,9
2,Aguaí,3500303,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São João da Boa Vista,14,35608,5245,47455,0,8.0,-220572,-469735,9
3,Águas da Prata,3500402,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,São João da Boa Vista,14,7797,1729,14267,0,8.0,-219319,-467176,9
4,Águas de Lindóia,3500501,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,...,Campinas,3,18374,3275,6013,0,8.0,-224733,-466314,9


In [6]:
covid_sp = covid_sp.rename(columns={'nome_munic' : 'municipio'})

In [7]:
covid_sp.rename(columns={'datahora' : 'data'}, inplace=True)

In [8]:
covid_sp.rename(columns={'map_leg':'rotulo_mapa', 'map_leg_s': 'codigo_mapa'}, inplace=True)
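
The three rename calls above mix the two available styles (reassignment and `inplace=True`). A minimal sketch, using a one-row stand-in frame with made-up values, of how the same renames could be expressed as a single mapping:

```python
import pandas as pd

# Tiny stand-in frame with the original column names (values are made up).
demo = pd.DataFrame({
    'nome_munic': ['Adamantina'],
    'datahora': ['2020-02-25'],
    'map_leg': ['0'],
    'map_leg_s': [8.0],
})

# One mapping, one call; rename returns a new DataFrame by default.
demo = demo.rename(columns={
    'nome_munic': 'municipio',
    'datahora': 'data',
    'map_leg': 'rotulo_mapa',
    'map_leg_s': 'codigo_mapa',
})
print(list(demo.columns))
```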

In [9]:
covid_sp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374034 entries, 0 to 374033
Data columns (total 26 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   municipio      374034 non-null  object 
 1   codigo_ibge    374034 non-null  int64  
 2   dia            374034 non-null  int64  
 3   mes            374034 non-null  int64  
 4   data           374034 non-null  object 
 5   casos          374034 non-null  int64  
 6   casos_novos    374034 non-null  int64  
 7   casos_pc       374034 non-null  object 
 8   casos_mm7d     374034 non-null  object 
 9   obitos         374034 non-null  int64  
 10  obitos_novos   374034 non-null  int64  
 11  obitos_pc      374034 non-null  object 
 12  obitos_mm7d    374034 non-null  object 
 13  letalidade     374034 non-null  object 
 14  nome_ra        373455 non-null  object 
 15  cod_ra         374034 non-null  int64  
 16  nome_drs       373455 non-null  object 
 17  cod_drs        374034 non-nul

In [10]:
covid_sp['area']

0         41199
1         21106
2         47455
3         14267
4          6013
          ...  
374029        0
374030        0
374031        0
374032        0
374033        0
Name: area, Length: 374034, dtype: int64

### Dropping variables (columns)

In [11]:
covid_sp_alterado = covid_sp.drop(columns=['cod_ra'])

In [12]:
# Dropping by column position (position 1 is codigo_ibge)

covid_sp_alterado.drop(covid_sp_alterado.columns[[1]],axis=1,inplace=True)

In [13]:
covid_sp_alterado.drop(columns=['rotulo_mapa', 'codigo_mapa', 'cod_drs'], inplace=True)

In [14]:
covid_sp_alterado.head(2)

Unnamed: 0,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,...,obitos_mm7d,letalidade,nome_ra,nome_drs,pop,pop_60,area,latitude,longitude,semana_epidem
0,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,...,0,0,RA de Presidente Prudente,Marília,33894,7398,41199,-216820,-510737,9
1,Adolfo,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,...,0,0,RA de São José do Rio Preto,São José do Rio Preto,3447,761,21106,-212325,-496451,9


In [15]:
covid_sp_alterado.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374034 entries, 0 to 374033
Data columns (total 21 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   municipio      374034 non-null  object
 1   dia            374034 non-null  int64 
 2   mes            374034 non-null  int64 
 3   data           374034 non-null  object
 4   casos          374034 non-null  int64 
 5   casos_novos    374034 non-null  int64 
 6   casos_pc       374034 non-null  object
 7   casos_mm7d     374034 non-null  object
 8   obitos         374034 non-null  int64 
 9   obitos_novos   374034 non-null  int64 
 10  obitos_pc      374034 non-null  object
 11  obitos_mm7d    374034 non-null  object
 12  letalidade     374034 non-null  object
 13  nome_ra        373455 non-null  object
 14  nome_drs       373455 non-null  object
 15  pop            374034 non-null  int64 
 16  pop_60         374034 non-null  int64 
 17  area           374034 non-null  int64 
 18  lati

In [16]:
# Positions 13, 14, 18 and 19 are nome_ra, nome_drs, latitude and longitude
covid_sp_alterado.drop(covid_sp_alterado.columns[[13,14,18,19]], axis=1, inplace=True)

In [17]:
covid_sp_alterado.shape


(374034, 17)
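
Dropping by position, as in cells 12 and 16, depends on the current column order, which changes after every drop. A hedged sketch of the label-based alternative, on a one-row stand-in frame whose values are placeholders:

```python
import pandas as pd

# Stand-in frame; the values are placeholders, only the column names matter.
demo = pd.DataFrame({
    'municipio': ['Adamantina'],
    'nome_ra': ['RA de Presidente Prudente'],
    'nome_drs': ['Marília'],
    'pop': [33894],
    'latitude': [0.0],
    'longitude': [0.0],
})

# Dropping by label does not depend on the current column order.
unwanted = ['nome_ra', 'nome_drs', 'latitude', 'longitude']
trimmed = demo.drop(columns=unwanted)
print(trimmed.columns.tolist())
```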

### Creating and changing values

In [18]:
covid_sp_alterado.head(3)

Unnamed: 0,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem
0,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,33894,7398,41199,9
1,Adolfo,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,3447,761,21106,9
2,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,35608,5245,47455,9


In [19]:
# Convert area to km²

covid_sp_alterado['area'] = covid_sp_alterado['area']/100
# or, equivalently:
# covid_sp_alterado['area'] = covid_sp_alterado.area/100

In [20]:
# Create a column with the population density (inhabitants/km²)
covid_sp_alterado['densidade_demo'] = covid_sp_alterado['pop'] / covid_sp_alterado['area']

In [21]:
covid_sp_alterado.shape

(374034, 18)
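
Rows with a zero area (the `Ignorado` entries, which have `pop` and `area` equal to 0) end up with a missing density. A minimal sketch, on a two-row stand-in frame, of one way to make that explicit: mask the zero areas before dividing, so the result is always `NaN` rather than `inf` when the population is nonzero.

```python
import numpy as np
import pandas as pd

# Stand-in frame: one regular municipality and one zero-area entry.
demo = pd.DataFrame({
    'municipio': ['Adamantina', 'Ignorado'],
    'pop': [33894, 0],
    'area': [411.99, 0.0],
})

# Replace a zero area with NaN so the division never yields inf.
safe_area = demo['area'].replace(0, np.nan)
demo['densidade_demo'] = demo['pop'] / safe_area
print(demo)
```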

In [22]:
# Create a column with 1-based indices
# range() excludes its upper bound, hence 374034 + 1
lista = list(range(1,374034+1))
print(lista[0])
print(lista[-1])

1
374034


In [23]:
df = pd.DataFrame(lista,columns=['indice'])
df

Unnamed: 0,indice
0,1
1,2
2,3
3,4
4,5
...,...
374029,374030
374030,374031
374031,374032
374032,374033


In [24]:
# Concatenate the DataFrames side by side

covid_sp_alterado = pd.concat([covid_sp_alterado, df], axis=1)
covid_sp_alterado


Unnamed: 0,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo,indice
0,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987,1
1,Adolfo,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3447,761,211.06,9,16.331849,2
2,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297,3
3,Águas da Prata,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,7797,1729,142.67,9,54.650592,4
4,Águas de Lindóia,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,18374,3275,60.13,9,305.571262,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374029,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.00,38,,374030
374030,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.00,38,,374031
374031,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.00,38,,374032
374032,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.00,38,,374033


### Counter and selection by index

In [25]:
covid_sp_alterado = covid_sp_alterado.reindex(columns=['indice'] + list(covid_sp_alterado.columns[:-1]))
covid_sp_alterado

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
1,2,Adolfo,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3447,761,211.06,9,16.331849
2,3,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
3,4,Águas da Prata,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,7797,1729,142.67,9,54.650592
4,5,Águas de Lindóia,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,18374,3275,60.13,9,305.571262
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374029,374030,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.00,38,
374030,374031,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.00,38,
374031,374032,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.00,38,
374032,374033,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.00,38,
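
The index column above is built in three steps: a list, a `concat`, and a `reindex` to move it to the front. As a sketch of a shorter route, assuming a three-row stand-in frame, `DataFrame.insert` can create the column directly at position 0:

```python
import pandas as pd

demo = pd.DataFrame({'municipio': ['Adamantina', 'Adolfo', 'Aguaí']})

# DataFrame.insert places the new column at position 0 in place,
# so no separate concat or column reordering is needed.
demo.insert(0, 'indice', range(1, len(demo) + 1))
print(demo)
```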


In [26]:
covid_sp_alterado['semana_epidem'].value_counts()

semana_epidem
10    9044
12    9044
11    9044
13    9044
14    9044
22    9044
15    9044
16    9044
17    9044
18    9044
19    9044
20    9044
21    9044
26    9044
23    9044
24    9044
25    9044
28    9044
27    9044
29    9044
30    9044
38    9044
31    9044
32    9044
33    9044
34    9044
35    9044
36    9044
37    9044
9     7752
39    4522
40    4522
41    4522
42    4522
43    4522
44    4522
45    4522
46    4522
47    4522
48    4522
49    4522
50    4522
51    4522
52    4522
53    4522
1     4522
2     4522
3     4522
4     4522
5     4522
6     4522
7     4522
8     4522
Name: count, dtype: int64

In [27]:
covid_sp_alterado['semana_epidem'].value_counts().sort_index()

semana_epidem
1     4522
2     4522
3     4522
4     4522
5     4522
6     4522
7     4522
8     4522
9     7752
10    9044
11    9044
12    9044
13    9044
14    9044
15    9044
16    9044
17    9044
18    9044
19    9044
20    9044
21    9044
22    9044
23    9044
24    9044
25    9044
26    9044
27    9044
28    9044
29    9044
30    9044
31    9044
32    9044
33    9044
34    9044
35    9044
36    9044
37    9044
38    9044
39    4522
40    4522
41    4522
42    4522
43    4522
44    4522
45    4522
46    4522
47    4522
48    4522
49    4522
50    4522
51    4522
52    4522
53    4522
Name: count, dtype: int64

In [28]:
from collections import Counter

In [29]:
Counter(covid_sp_alterado.semana_epidem)

Counter({10: 9044,
         11: 9044,
         12: 9044,
         13: 9044,
         14: 9044,
         15: 9044,
         16: 9044,
         17: 9044,
         18: 9044,
         19: 9044,
         20: 9044,
         21: 9044,
         22: 9044,
         23: 9044,
         24: 9044,
         25: 9044,
         26: 9044,
         27: 9044,
         28: 9044,
         29: 9044,
         30: 9044,
         31: 9044,
         32: 9044,
         33: 9044,
         34: 9044,
         35: 9044,
         36: 9044,
         37: 9044,
         38: 9044,
         9: 7752,
         39: 4522,
         40: 4522,
         41: 4522,
         42: 4522,
         43: 4522,
         44: 4522,
         45: 4522,
         46: 4522,
         47: 4522,
         48: 4522,
         49: 4522,
         50: 4522,
         51: 4522,
         52: 4522,
         53: 4522,
         1: 4522,
         2: 4522,
         3: 4522,
         4: 4522,
         5: 4522,
         6: 4522,
         7: 4522,
         8: 4522})
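
`Series.value_counts()` and `collections.Counter` produce the same tallies; they differ in return type (a pandas Series versus a dict subclass) and in ordering. A small sketch on made-up week numbers illustrates the equivalence:

```python
from collections import Counter
import pandas as pd

# Made-up epidemiological week numbers.
weeks = pd.Series([9, 10, 10, 11, 11, 11], name='semana_epidem')

counts_pandas = weeks.value_counts().sort_index()
counts_counter = Counter(weeks)

# Both approaches agree on the tallies.
print(counts_pandas.to_dict())
print(dict(counts_counter))
```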

In [30]:
covid_sp_alterado.query('obitos_novos > 100')['municipio'].value_counts()

municipio
São Paulo    139
Name: count, dtype: int64
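
`query` takes the filter as a string expression; the same rows can be selected with a boolean mask. A minimal sketch on a stand-in frame with made-up numbers:

```python
import pandas as pd

# Stand-in frame; the death counts are made up.
demo = pd.DataFrame({
    'municipio': ['São Paulo', 'São Paulo', 'Guarulhos'],
    'obitos_novos': [150, 90, 12],
})

via_query = demo.query('obitos_novos > 100')
via_mask = demo.loc[demo['obitos_novos'] > 100]
print(via_query.equals(via_mask))
```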

#### Selecting variables by index

In [31]:
covid_sp_alterado.head(1)

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,33894,7398,411.99,9,82.268987


In [32]:
x = covid_sp_alterado.iloc[: , 5:13 ]
x

Unnamed: 0,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d
0,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000
1,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000
2,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000
3,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000
4,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000
...,...,...,...,...,...,...,...,...
374029,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286
374030,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571
374031,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714
374032,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428


In [33]:
type(x)

pandas.core.frame.DataFrame

In [34]:
y = covid_sp_alterado.iloc[:,1].values

In [35]:
y

array(['Adamantina', 'Adolfo', 'Aguaí', ..., 'Ignorado', 'Ignorado',
       'Ignorado'], shape=(374034,), dtype=object)

In [36]:
lista_y = list(y.flatten())
lista_y

['Adamantina',
 'Adolfo',
 'Aguaí',
 'Águas da Prata',
 'Águas de Lindóia',
 'Águas de Santa Bárbara',
 'Águas de São Pedro',
 'Agudos',
 'Alambari',
 'Alfredo Marcondes',
 'Altair',
 'Altinópolis',
 'Alto Alegre',
 'Alumínio',
 'Álvares Florence',
 'Álvares Machado',
 'Álvaro de Carvalho',
 'Alvinlândia',
 'Americana',
 'Américo Brasiliense',
 'Américo de Campos',
 'Amparo',
 'Analândia',
 'Andradina',
 'Angatuba',
 'Anhembi',
 'Anhumas',
 'Aparecida',
 "Aparecida d'Oeste",
 'Apiaí',
 'Araçariguama',
 'Araçatuba',
 'Araçoiaba da Serra',
 'Aramina',
 'Arandu',
 'Arapeí',
 'Araraquara',
 'Araras',
 'Arco-Íris',
 'Arealva',
 'Areias',
 'Areiópolis',
 'Ariranha',
 'Artur Nogueira',
 'Arujá',
 'Aspásia',
 'Assis',
 'Atibaia',
 'Auriflama',
 'Avaí',
 'Avanhandava',
 'Avaré',
 'Bady Bassitt',
 'Balbinos',
 'Bálsamo',
 'Bananal',
 'Barão de Antonina',
 'Barbosa',
 'Bariri',
 'Barra Bonita',
 'Barra do Chapéu',
 'Barra do Turvo',
 'Barretos',
 'Barrinha',
 'Barueri',
 'Bastos',
 'Batatais',
 '

In [37]:
type(lista_y)

list

In [38]:
df = pd.DataFrame(lista_y, columns=['municipios'])
df

Unnamed: 0,municipios
0,Adamantina
1,Adolfo
2,Aguaí
3,Águas da Prata
4,Águas de Lindóia
...,...
374029,Ignorado
374030,Ignorado
374031,Ignorado
374032,Ignorado


In [39]:
type(df)

pandas.core.frame.DataFrame
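
The cells above go from a positional selection to a NumPy array, then to a list, then back to a DataFrame. As a sketch on a two-row stand-in frame, the same conversions can be done on the Series directly, and the column can be picked by name instead of position:

```python
import pandas as pd

demo = pd.DataFrame({
    'indice': [1, 2],
    'municipio': ['Adamantina', 'Adolfo'],
    'casos': [0, 0],
})

by_position = demo.iloc[:, 1]        # second column, whatever its name
by_label = demo.loc[:, 'municipio']  # same column, selected by name

names = by_label.tolist()                        # Series -> plain Python list
names_df = by_label.to_frame(name='municipios')  # Series -> one-column DataFrame
print(names)
```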

### Dropping, filtering and replacing

#### Examples of deletions

In [40]:
# Specific rows
covid_sp_alterado2 = covid_sp_alterado.drop(covid_sp_alterado.index[[1,3]])
covid_sp_alterado2

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
2,3,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
4,5,Águas de Lindóia,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,18374,3275,60.13,9,305.571262
5,6,Águas de Santa Bárbara,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,5931,1106,404.46,9,14.663996
6,7,Águas de São Pedro,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3122,764,3.61,9,864.819945
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374029,374030,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.00,38,
374030,374031,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.00,38,
374031,374032,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.00,38,
374032,374033,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.00,38,


In [41]:
# Slice (range): drop works on labels, so this removes the rows labelled 4, 5 and 6
covid_sp_alterado2 = covid_sp_alterado2.drop(covid_sp_alterado.index[4:7])

covid_sp_alterado2

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
2,3,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
7,8,Agudos,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,36134,5524,966.71,9,37.378324
8,9,Alambari,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,5779,830,15.96,9,362.092732
9,10,Alfredo Marcondes,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3927,907,118.92,9,33.022200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374029,374030,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.00,38,
374030,374031,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.00,38,
374031,374032,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.00,38,
374032,374033,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.00,38,


In [42]:
covid_sp_alterado2.reset_index(drop=True, inplace=True)
covid_sp_alterado2

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
1,3,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
2,8,Agudos,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,36134,5524,966.71,9,37.378324
3,9,Alambari,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,5779,830,15.96,9,362.092732
4,10,Alfredo Marcondes,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3927,907,118.92,9,33.022200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374024,374030,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.00,38,
374025,374031,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.00,38,
374026,374032,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.00,38,
374027,374033,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.00,38,
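
Cell 41 takes the labels to drop from the original frame (`covid_sp_alterado.index[4:7]`), not from the frame being modified. Because `drop` works on labels, the choice matters once rows have been removed. A small sketch on a hypothetical ten-row frame shows the difference:

```python
import pandas as pd

# Hypothetical ten-row frame with the default 0..9 index.
demo = pd.DataFrame({'municipio': list('ABCDEFGHIJ')})

step1 = demo.drop(demo.index[[1, 3]])   # removes labels 1 and 3

# Labels taken from the ORIGINAL frame: removes labels 4, 5, 6.
by_original = step1.drop(demo.index[4:7])

# Labels taken from the ALREADY FILTERED frame: positions 4..6 now
# hold labels 6, 7, 8, so different rows are removed.
by_current = step1.drop(step1.index[4:7])

print(by_original.index.tolist())
print(by_current.index.tolist())
```

Calling `reset_index(drop=True)` afterwards, as in cell 42, renumbers the remaining rows from 0.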


#### Filtering and dropping

In [43]:
ignorado = covid_sp_alterado.loc[covid_sp_alterado.municipio == 'Ignorado']
ignorado

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
373455,373456,Ignorado,25,2,2020-02-25,-1,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,0,0,0.0,9,
373456,373457,Ignorado,26,2,2020-02-26,0,1,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,0,0,0.0,9,
373457,373458,Ignorado,27,2,2020-02-27,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,0,0,0.0,9,
373458,373459,Ignorado,28,2,2020-02-28,-1,-1,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,0,0,0.0,9,
373459,373460,Ignorado,29,2,2020-02-29,0,1,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,0,0,0.0,9,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374029,374030,Ignorado,21,9,2021-09-21,1956,-477,"0,00000000000000e+00",-931428571428571445,7,-1,"0,00000000000000e+00",-0285714285714286,0003578732106339468,0,0,0.0,38,
374030,374031,Ignorado,22,9,2021-09-22,1414,-542,"0,00000000000000e+00",-1038714285714285779,7,0,"0,00000000000000e+00",-0571428571428571,0004950495049504951,0,0,0.0,38,
374031,374032,Ignorado,23,9,2021-09-23,962,-452,"0,00000000000000e+00",-348000000000000000,6,-1,"0,00000000000000e+00",-1714285714285714,0006237006237006237,0,0,0.0,38,
374032,374033,Ignorado,24,9,2021-09-24,557,-405,"0,00000000000000e+00",-774428571428571445,1,-5,"0,00000000000000e+00",-2428571428571428,0001795332136445242,0,0,0.0,38,


In [44]:
ignorado.shape

(579, 19)

In [45]:
# Filter out the rows whose municipality is 'Ignorado' (unknown)
covid_sp_alterado = covid_sp_alterado.loc[covid_sp_alterado.municipio != 'Ignorado']
covid_sp_alterado

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
1,2,Adolfo,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3447,761,211.06,9,16.331849
2,3,Aguaí,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
3,4,Águas da Prata,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,7797,1729,142.67,9,54.650592
4,5,Águas de Lindóia,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,18374,3275,60.13,9,305.571262
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
373450,373451,Votorantim,25,9,2021-09-25,11670,1,"9,73928428361597e+03",2571428571428572,515,0,"4,29797035652290e+02",0142857142857143,0044130248500428446,119824,16378,183.52,38,652.920663
373451,373452,Votuporanga,25,9,2021-09-25,16147,9,"1,75969921534438e+04",7571428571428571,453,0,"4,93679163034002e+02",0142857142857143,0028054747011828824,91760,17203,42.07,38,2181.126694
373452,373453,Zacarias,25,9,2021-09-25,268,0,"1,04687500000000e+04",0000000000000000,10,0,"3,90625000000000e+02",0000000000000000,0037313432835820892,2560,481,319.06,38,8.023569
373453,373454,Chavantes,25,9,2021-09-25,1388,0,"1,13556410046633e+04",0142857142857143,48,0,"3,92702282582018e+02",0000000000000000,0034582132564841501,12223,2098,188.73,38,64.764478


In [46]:
guarulhos = covid_sp_alterado.loc[covid_sp_alterado.municipio == 'Guarulhos']
guarulhos

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
212,213,Guarulhos,25,2,2020-02-25,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,9,4240.225304
857,858,Guarulhos,26,2,2020-02-26,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,9,4240.225304
1502,1503,Guarulhos,27,2,2020-02-27,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,9,4240.225304
2147,2148,Guarulhos,28,2,2020-02-28,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,9,4240.225304
2792,2793,Guarulhos,29,2,2020-02-29,0,0,"0,00000000000000e+00",0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,9,4240.225304
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
370442,370443,Guarulhos,21,9,2021-09-21,63334,28,"4,68698081441602e+03",102571428571428569,4857,3,"3,59438308264417e+02",2571428571428572,0076688666435090161,1351275,162662,318.68,38,4240.225304
371087,371088,Guarulhos,22,9,2021-09-22,63351,17,"4,68823888549703e+03",105000000000000000,4861,4,"3,59734324989362e+02",2428571428571428,0076731227604931257,1351275,162662,318.68,38,4240.225304
371732,371733,Guarulhos,23,9,2021-09-23,63368,17,"4,68949695657805e+03",107285714285714292,4862,1,"3,59808329170598e+02",2000000000000000,0076726423431384930,1351275,162662,318.68,38,4240.225304
372377,372378,Guarulhos,24,9,2021-09-24,63387,19,"4,69090303602154e+03",61285714285714285,4863,1,"3,59882333351834e+02",2142857142857143,0076719201098016943,1351275,162662,318.68,38,4240.225304


In [47]:
guarulhos.drop(columns=['data', 'municipio'], inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos.drop(columns=['data', 'municipio'], inplace=True)
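
The warning above appears because `guarulhos` was produced by filtering another DataFrame, so pandas cannot tell whether an in-place change is meant for the subset or the original. A minimal sketch, on a three-row stand-in frame, of the usual remedy: take an explicit `.copy()` when creating the subset.

```python
import pandas as pd

# Stand-in frame with made-up rows.
demo = pd.DataFrame({
    'municipio': ['Guarulhos', 'Adolfo', 'Guarulhos'],
    'data': ['2020-02-25', '2020-02-25', '2020-02-26'],
    'casos': [0, 0, 0],
})

# .copy() makes the subset an independent DataFrame, so later in-place
# edits are unambiguous and do not trigger the warning.
subset = demo.loc[demo['municipio'] == 'Guarulhos'].copy()
subset.drop(columns=['data', 'municipio'], inplace=True)
print(subset)
```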


In [48]:
guarulhos.head(10)

Unnamed: 0,indice,dia,mes,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
212,213,25,2,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,9,4240.225304
857,858,26,2,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,9,4240.225304
1502,1503,27,2,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,9,4240.225304
2147,2148,28,2,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,9,4240.225304
2792,2793,29,2,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,9,4240.225304
3437,3438,1,3,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,10,4240.225304
4082,4083,2,3,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,10,4240.225304
4727,4728,3,3,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,10,4240.225304
5372,5373,4,3,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,10,4240.225304
6017,6018,5,3,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,10,4240.225304


In [49]:
guarulhos['semana_epidem'] = guarulhos['semana_epidem'].replace({9:'nove', 10:'dez'})

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos['semana_epidem'] = guarulhos['semana_epidem'].replace({9:'nove', 10:'dez'})


In [50]:
guarulhos['casos_pc'] = guarulhos['casos_pc'].apply(lambda x: x.replace(',','.'))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos['casos_pc'] = guarulhos['casos_pc'].apply(lambda x: x.replace(',','.'))
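
Replacing the comma with a dot, as in cell 50, leaves `casos_pc` as a column of strings. As a sketch under the assumption that a numeric column is the goal, the vectorised `.str.replace` accessor followed by `astype(float)` does both steps (the two values are taken from the tables shown in this notebook):

```python
import pandas as pd

raw = pd.Series(['0,00000000000000e+00', '4,68698081441602e+03'], name='casos_pc')

# Vectorised string replace, then cast the result to float.
as_float = raw.str.replace(',', '.', regex=False).astype(float)
print(as_float)
```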


#### Creating a column with dates

In [51]:
import datetime

In [52]:
# Pass the dtype as a string with an explicit day unit rather than a datetime64 instance
data = np.array('2020-02-25', dtype='datetime64[D]')
data

array('2020-02-25', dtype='datetime64[D]')

In [53]:
data = data + np.arange(579)
data

array(['2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28',
       '2020-02-29', '2020-03-01', '2020-03-02', '2020-03-03',
       '2020-03-04', '2020-03-05', '2020-03-06', '2020-03-07',
       '2020-03-08', '2020-03-09', '2020-03-10', '2020-03-11',
       '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15',
       '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19',
       '2020-03-20', '2020-03-21', '2020-03-22', '2020-03-23',
       '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27',
       '2020-03-28', '2020-03-29', '2020-03-30', '2020-03-31',
       '2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
       '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
       '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12',
       '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16',
       '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20',
       '2020-04-21', '2020-04-22', '2020-04-23', '2020-04-24',
       '2020-04-25', '2020-04-26', '2020-04-27', '2020-

In [54]:
data = pd.DataFrame(data)
data

Unnamed: 0,0
0,2020-02-25
1,2020-02-26
2,2020-02-27
3,2020-02-28
4,2020-02-29
...,...
574,2021-09-21
575,2021-09-22
576,2021-09-23
577,2021-09-24


In [55]:
data.columns = ['data']
data.head()

Unnamed: 0,data
0,2020-02-25
1,2020-02-26
2,2020-02-27
3,2020-02-28
4,2020-02-29


In [56]:
guarulhos2 = pd.concat([data,guarulhos], axis=1)
guarulhos2.head(5)

Unnamed: 0,data,indice,dia,mes,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,2020-02-25,,,,,,,,,,,,,,,,,
1,2020-02-26,,,,,,,,,,,,,,,,,
2,2020-02-27,,,,,,,,,,,,,,,,,
3,2020-02-28,,,,,,,,,,,,,,,,,
4,2020-02-29,,,,,,,,,,,,,,,,,


In [57]:
# Reset the index so the rows line up with `data` (0, 1, 2, ...)
guarulhos = guarulhos.reset_index(drop=True)
guarulhos

Unnamed: 0,indice,dia,mes,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,213,25,2,0,0,0.00000000000000e+00,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,nove,4240.225304
1,858,26,2,0,0,0.00000000000000e+00,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,nove,4240.225304
2,1503,27,2,0,0,0.00000000000000e+00,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,nove,4240.225304
3,2148,28,2,0,0,0.00000000000000e+00,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,nove,4240.225304
4,2793,29,2,0,0,0.00000000000000e+00,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,1351275,162662,318.68,nove,4240.225304
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
574,370443,21,9,63334,28,4.68698081441602e+03,102571428571428569,4857,3,"3,59438308264417e+02",2571428571428572,0076688666435090161,1351275,162662,318.68,38,4240.225304
575,371088,22,9,63351,17,4.68823888549703e+03,105000000000000000,4861,4,"3,59734324989362e+02",2428571428571428,0076731227604931257,1351275,162662,318.68,38,4240.225304
576,371733,23,9,63368,17,4.68949695657805e+03,107285714285714292,4862,1,"3,59808329170598e+02",2000000000000000,0076726423431384930,1351275,162662,318.68,38,4240.225304
577,372378,24,9,63387,19,4.69090303602154e+03,61285714285714285,4863,1,"3,59882333351834e+02",2142857142857143,0076719201098016943,1351275,162662,318.68,38,4240.225304


In [58]:
guarulhos2 = pd.concat([data,guarulhos], axis=1)
guarulhos2.head(5)

Unnamed: 0,data,indice,dia,mes,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,2020-02-25,213,25,2,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,nove,4240.225304
1,2020-02-26,858,26,2,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,nove,4240.225304
2,2020-02-27,1503,27,2,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,nove,4240.225304
3,2020-02-28,2148,28,2,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,nove,4240.225304
4,2020-02-29,2793,29,2,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,1351275,162662,318.68,nove,4240.225304


In [59]:
guarulhos2.shape

(579, 18)
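Because `pd.concat(..., axis=1)` aligns rows by index label, the first attempt above left the `guarulhos` columns as NaN until the index was reset. The sketch below is an alternative, not the notebook's method. It runs on made-up data with a hypothetical frame called `toy`, builds the dates with `pd.date_range`, and inserts them positionally, so the row labels never come into play.

```python
import pandas as pd

# Toy frame whose index mimics a filtered slice (non-consecutive labels).
toy = pd.DataFrame({'casos': [0, 0, 1]}, index=[212, 857, 1502])

# One date per row, starting on the first day covered by the dataset.
dates = pd.date_range(start='2020-02-25', periods=len(toy), freq='D')

# A DatetimeIndex is treated as a plain array here, so values are placed by
# position and the labels of `toy` are irrelevant.
toy.insert(0, 'data', dates)

print(toy)
```

This only gives correct dates if the rows are already in chronological order with exactly one row per day, which is the same assumption the `np.arange` approach makes.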

#### Missing values

In [60]:
covid_sp_alterado.isnull().sum()

indice            0
municipio         0
dia               0
mes               0
data              0
casos             0
casos_novos       0
casos_pc          0
casos_mm7d        0
obitos            0
obitos_novos      0
obitos_pc         0
obitos_mm7d       0
letalidade        0
pop               0
pop_60            0
area              0
semana_epidem     0
densidade_demo    0
dtype: int64

In [61]:
covid_sp_alterado['casos'].isnull().sum()

np.int64(0)

In [62]:
# Original DataFrame, used as an example
covid_sp.isnull().sum()

municipio          0
codigo_ibge        0
dia                0
mes                0
data               0
casos              0
casos_novos        0
casos_pc           0
casos_mm7d         0
obitos             0
obitos_novos       0
obitos_pc          0
obitos_mm7d        0
letalidade         0
nome_ra          579
cod_ra             0
nome_drs         579
cod_drs            0
pop                0
pop_60             0
area               0
rotulo_mapa      579
codigo_mapa      579
latitude           0
longitude          0
semana_epidem      0
dtype: int64

In [63]:
# Columns with missing values: nome_ra, nome_drs, rotulo_mapa, codigo_mapa

# Drop every row that has at least one missing value
covid_sp2 = covid_sp.dropna()

In [66]:
covid_sp2.isnull().sum()

municipio        0
codigo_ibge      0
dia              0
mes              0
data             0
casos            0
casos_novos      0
casos_pc         0
casos_mm7d       0
obitos           0
obitos_novos     0
obitos_pc        0
obitos_mm7d      0
letalidade       0
nome_ra          0
cod_ra           0
nome_drs         0
cod_drs          0
pop              0
pop_60           0
area             0
rotulo_mapa      0
codigo_mapa      0
latitude         0
longitude        0
semana_epidem    0
dtype: int64
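With no arguments, `dropna()` removes a row as soon as any column is missing. The sketch below uses made-up data and a hypothetical frame called `toy` to contrast that default with the `subset` parameter, which restricts the check to chosen columns.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'municipio': ['A', 'B', 'C'],
    'nome_ra': ['RA 1', np.nan, 'RA 3'],
    'rotulo_mapa': [np.nan, np.nan, '0'],
})

# Default behaviour: drop a row if ANY column is missing.
drop_any = toy.dropna()

# Drop a row only when nome_ra is missing; gaps in other columns are kept.
drop_subset = toy.dropna(subset=['nome_ra'])

print(len(toy), len(drop_any), len(drop_subset))
```

Comparing the row counts before and after is a quick way to check how much data a given choice discards.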

In [67]:
# Fill missing values with the median
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(covid_sp['obitos_novos'].median())


In [69]:
# Fill missing values with the mean
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(covid_sp['obitos_novos'].mean())


In [70]:
# Fill missing values with an arbitrary constant
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(5)
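Filling a single column is safest when the result is assigned back, or when `fillna` is called on the DataFrame itself. Calling `df[col].fillna(..., inplace=True)` acts on an intermediate object, and recent pandas versions warn about it. The sketch below shows both alternatives on made-up data; the names `toy`, `by_assignment` and `by_mapping` are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'obitos_novos': [0.0, 2.0, np.nan, 10.0]})

# Option 1: assign the filled column back to the DataFrame.
by_assignment = toy.copy()
by_assignment['obitos_novos'] = by_assignment['obitos_novos'].fillna(
    by_assignment['obitos_novos'].median()
)

# Option 2: call fillna on the DataFrame with a {column: value} mapping.
by_mapping = toy.fillna({'obitos_novos': toy['obitos_novos'].mean()})

print(by_assignment['obitos_novos'].tolist())  # gap filled with the median
print(by_mapping['obitos_novos'].tolist())     # gap filled with the mean
```

For skewed daily counts the median is usually less sensitive to a few extreme days than the mean. Which one to use depends on the analysis.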


#### Inspecting and converting data types

In [76]:
covid_sp_alterado.dtypes

indice              int64
municipio          object
dia                 int64
mes                 int64
data               object
casos               int64
casos_novos         int64
casos_pc          float64
casos_mm7d         object
obitos              int64
obitos_novos        int64
obitos_pc          object
obitos_mm7d        object
letalidade         object
pop                 int64
pop_60              int64
area              float64
semana_epidem       int64
densidade_demo    float64
dtype: object

In [74]:
covid_sp_alterado['casos_pc'] = covid_sp_alterado['casos_pc'].apply(lambda x: x.replace(',','.'))
covid_sp_alterado.head(3)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  covid_sp_alterado['casos_pc'] = covid_sp_alterado['casos_pc'].apply(lambda x: x.replace(',','.'))


Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,33894,7398,411.99,9,82.268987
1,2,Adolfo,25,2,2020-02-25,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,3447,761,211.06,9,16.331849
2,3,Aguaí,25,2,2020-02-25,0,0,0.0,0,0,0,"0,00000000000000e+00",0,0,35608,5245,474.55,9,75.035297


In [75]:
covid_sp_alterado['casos_pc'] = covid_sp_alterado['casos_pc'].astype(float)
covid_sp_alterado

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  covid_sp_alterado['casos_pc'] = covid_sp_alterado['casos_pc'].astype(float)


Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,0.000000,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,33894,7398,411.99,9,82.268987
1,2,Adolfo,25,2,2020-02-25,0,0,0.000000,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,3447,761,211.06,9,16.331849
2,3,Aguaí,25,2,2020-02-25,0,0,0.000000,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,35608,5245,474.55,9,75.035297
3,4,Águas da Prata,25,2,2020-02-25,0,0,0.000000,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,7797,1729,142.67,9,54.650592
4,5,Águas de Lindóia,25,2,2020-02-25,0,0,0.000000,0000000000000000,0,0,"0,00000000000000e+00",0000000000000000,0000000000000000000,18374,3275,60.13,9,305.571262
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
373450,373451,Votorantim,25,9,2021-09-25,11670,1,9739.284284,2571428571428572,515,0,"4,29797035652290e+02",0142857142857143,0044130248500428446,119824,16378,183.52,38,652.920663
373451,373452,Votuporanga,25,9,2021-09-25,16147,9,17596.992153,7571428571428571,453,0,"4,93679163034002e+02",0142857142857143,0028054747011828824,91760,17203,42.07,38,2181.126694
373452,373453,Zacarias,25,9,2021-09-25,268,0,10468.750000,0000000000000000,10,0,"3,90625000000000e+02",0000000000000000,0037313432835820892,2560,481,319.06,38,8.023569
373453,373454,Chavantes,25,9,2021-09-25,1388,0,11355.641005,0142857142857143,48,0,"3,92702282582018e+02",0000000000000000,0034582132564841501,12223,2098,188.73,38,64.764478


In [77]:
covid_sp_alterado['casos_mm7d'] = covid_sp_alterado['casos_mm7d'].apply(lambda x: x.replace(',','.'))
covid_sp_alterado['obitos_pc'] = covid_sp_alterado['obitos_pc'].apply(lambda x: x.replace(',','.'))
covid_sp_alterado['obitos_mm7d'] = covid_sp_alterado['obitos_mm7d'].apply(lambda x: x.replace(',','.'))
covid_sp_alterado['letalidade'] = covid_sp_alterado['letalidade'].apply(lambda x: x.replace(',','.'))

covid_sp_alterado.head(5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  covid_sp_alterado['casos_mm7d'] = covid_sp_alterado['casos_mm7d'].apply(lambda x: x.replace(',','.'))

Unnamed: 0,indice,municipio,dia,mes,data,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,pop,pop_60,area,semana_epidem,densidade_demo
0,1,Adamantina,25,2,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,33894,7398,411.99,9,82.268987
1,2,Adolfo,25,2,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,3447,761,211.06,9,16.331849
2,3,Aguaí,25,2,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,35608,5245,474.55,9,75.035297
3,4,Águas da Prata,25,2,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,7797,1729,142.67,9,54.650592
4,5,Águas de Lindóia,25,2,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,18374,3275,60.13,9,305.571262


In [78]:
covid_sp_alterado['casos_mm7d'] = covid_sp_alterado['casos_mm7d'].astype(float)
covid_sp_alterado['obitos_pc'] = covid_sp_alterado['obitos_pc'].astype(float)
covid_sp_alterado['obitos_mm7d'] = covid_sp_alterado['obitos_mm7d'].astype(float)
covid_sp_alterado['letalidade'] = covid_sp_alterado['letalidade'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  covid_sp_alterado['casos_mm7d'] = covid_sp_alterado['casos_mm7d'].astype(float)
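The comma-to-point replacement and the cast to `float` above are repeated for each column. The sketch below does both steps in one loop. It is not the notebook's own code: the sample strings are illustrative values in the dataset's decimal-comma format, and `toy` and `decimal_comma_cols` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'casos_mm7d': ['0,142857142857143', '7,571428571428571'],
    'obitos_pc': ['3,92702282582018e+02', '4,93679163034002e+02'],
    'letalidade': ['0,034582132564841501', '0,028054747011828824'],
})

decimal_comma_cols = ['casos_mm7d', 'obitos_pc', 'letalidade']

for col in decimal_comma_cols:
    # Swap the decimal comma for a point, then parse the text as a number.
    toy[col] = pd.to_numeric(toy[col].str.replace(',', '.', regex=False))

print(toy.dtypes)
```

If the file can be re-read, passing `decimal=','` to `pd.read_csv` together with `sep=';'` may avoid the manual conversion altogether. It is worth checking the resulting dtypes afterwards.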

In [84]:
covid_sp_alterado.dtypes

indice                     int64
municipio                 object
dia                        int64
mes                        int64
data              datetime64[ns]
casos                      int64
casos_novos                int64
casos_pc                 float64
casos_mm7d               float64
obitos                     int64
obitos_novos               int64
obitos_pc                float64
obitos_mm7d              float64
letalidade               float64
pop                        int64
pop_60                     int64
area                     float64
semana_epidem              int64
densidade_demo           float64
dtype: object

In [83]:
covid_sp_alterado['data'] = covid_sp_alterado['data'].astype('datetime64[ns]')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  covid_sp_alterado['data'] = covid_sp_alterado['data'].astype('datetime64[ns]')
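`astype('datetime64[ns]')` works for ISO-formatted strings such as those in this dataset. The sketch below shows `pd.to_datetime` with an explicit `format`, which is an alternative and not what the notebook uses. It runs on made-up rows in a hypothetical frame called `toy`.

```python
import pandas as pd

toy = pd.DataFrame({'data': ['2020-02-25', '2020-02-26', '2021-09-25']})

# With an explicit format, a string that does not match raises an error
# instead of being silently interpreted some other way.
toy['data'] = pd.to_datetime(toy['data'], format='%Y-%m-%d')

print(toy['data'].dtype)
print(toy['data'].dt.month.tolist())
```

Once the column is a datetime type, the `.dt` accessor exposes parts such as month or weekday. That could replace separate `dia` and `mes` columns if desired.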


## **Saving and exporting the data**

In [86]:
covid_sp_alterado.to_csv('dataset/covid_sp_tratado.csv', sep=';', encoding='utf-8', index=False)
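A CSV file does not store column types, so the datetime conversion done earlier is lost on export and has to be repeated when the file is read back. The sketch below shows the round trip on made-up rows, writing to an in-memory buffer instead of `dataset/covid_sp_tratado.csv`. The names `toy`, `buffer` and `restored` are illustrative.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    'municipio': ['Adamantina', 'Águas da Prata'],
    'data': pd.to_datetime(['2020-02-25', '2020-02-25']),
    'casos_pc': [0.0, 9739.284284],
})

# Write to an in-memory text buffer with the same separator as the notebook.
buffer = io.StringIO()
toy.to_csv(buffer, sep=';', index=False)
buffer.seek(0)

# Dates come back as plain text unless parse_dates is given.
restored = pd.read_csv(buffer, sep=';', parse_dates=['data'])

print(restored.dtypes)
```

When reading the real file, the same `sep=';'` and `encoding='utf-8'` used for writing should be passed to `pd.read_csv`.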