# **PROJECT: ANALYSIS OF COVID-19 DATA IN THE STATE OF SÃO PAULO**

This project analyzes COVID-19 case data for the state of São Paulo from February 2020 to September 2021.

The data are available from the following sites:

https://www.seade.gov.br/coronavirus/#

https://github.com/seade-R/dados-covid-sp

https://www.seade.gov.br/


## **Importing the Data**

### **Importing a CSV file**

In [2]:
# importing the libraries
# numpy - numerical arrays; pandas - library used to manipulate the data
import numpy as np
import pandas as pd

In [None]:
# Google Colab command that mounts Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# loading the contents of the CSV file
covid_sp = pd.read_csv('/content/drive/MyDrive/AULAS/dados/dados_covid_sp.csv',
                       sep=';', encoding='utf-8')
# encoding: the character encoding of the file; common choices are
# iso-8859-1 (also known as latin-1) and utf-8

In [None]:
# viewing the data
covid_sp.head(10)

In [None]:
# counting the rows and columns
covid_sp.shape

(614992, 16)
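
The rate columns in this file (`casos_pc`, for example) use a comma as the decimal mark, which is why they later show up with dtype `object`. As a hedged sketch, assuming the layout seen in the outputs of this notebook, `read_csv` can parse such values as floats at load time through its `decimal` parameter. The two rows below are made up for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Two made-up rows in the same layout as the file:
# ';' as the field separator and ',' as the decimal mark.
sample = io.StringIO(
    "nome_munic;casos;casos_pc\n"
    "Adamantina;10;2,95037469758659e+01\n"
    "Adolfo;3;8,70322018000000e+01\n"
)

# decimal=',' tells the parser to treat the comma as the decimal mark.
demo = pd.read_csv(sample, sep=';', decimal=',')
print(demo.dtypes)
```

Parsing at load time avoids a later string replacement, at the cost of assuming every numeric column follows the same convention.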

### **Importing an Excel file**

In [None]:
# loading data from an xlsx file
covid_sp2 = pd.read_excel('/content/drive/MyDrive/AULAS/dados/dados_covid_sp2.xlsx')

In [None]:
# viewing the data
covid_sp2.head()

Unnamed: 0,nome_munic,codigo_ibge,datahora,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,nome_drs,cod_drs,pop,semana_epidem
0,Adamantina,3500105,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,Marília,5,33894,9
1,Adolfo,3500204,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,São José do Rio Preto,15,3447,9
2,Aguaí,3500303,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,São João da Boa Vista,14,35608,9
3,Águas da Prata,3500402,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,São João da Boa Vista,14,7797,9
4,Águas de Lindóia,3500501,2020-02-25,0,0,0.0,0.0,0,0,0.0,0.0,0.0,Campinas,3,18374,9


In [None]:
# counting the rows and columns
covid_sp2.shape

(614992, 16)

### **Importing from a URL**

In [None]:
# loading data from a URL
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [None]:
# creating a list that defines the column names
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

In [None]:
iris = pd.read_csv(url, names=colnames)

In [None]:
iris.shape

(150, 5)

In [None]:
iris.head()

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [None]:
# showing all the columns of the DataFrame
iris.columns

Index(['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class'], dtype='object')

In [None]:
iris.head(10)

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


In [None]:
# checking the type of the object in use
type(iris)

pandas.core.frame.DataFrame

In [None]:
iris.shape

(150, 5)

In [None]:
# counting the number of rows
len(iris.Class)

150

In [None]:
# accessing a specific value
iris['sepal-width'][2]

3.2
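
The lookup above works because the default index labels happen to be 0, 1, 2, and so on. As a small sketch on a made-up frame (the index labels 10, 20, 30 are an assumption chosen to make the difference visible), `.loc` selects by label and `.iloc` by position:

```python
import pandas as pd

# Made-up frame whose index labels differ from the row positions.
demo = pd.DataFrame({'sepal-width': [3.5, 3.0, 3.2]}, index=[10, 20, 30])

by_label = demo.loc[30, 'sepal-width']   # row whose label is 30
by_position = demo.iloc[2, 0]            # third row, first column

print(by_label, by_position)
```

Both calls return the same cell here; they diverge as soon as rows are filtered or reordered, which happens later in this notebook.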

### **Datasets bundled with Python libraries**

https://www.statsmodels.org/stable/datasets/index.html

In [None]:
# library that ships practice datasets for Python
import statsmodels.api as sm

In [None]:
cancer = sm.datasets.cancer.load_pandas().data
cancer.head()

Unnamed: 0,cancer,population
0,1.0,445.0
1,0.0,559.0
2,3.0,677.0
3,4.0,681.0
4,3.0,746.0


In [None]:
type(cancer)

pandas.core.frame.DataFrame

In [None]:
cancer.shape

(301, 2)

https://scikit-learn.org/stable/datasets/toy_dataset.html

In [None]:
import sklearn

In [None]:
from sklearn import datasets

In [None]:
iris = datasets.load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [None]:
iris.data

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

## **Organizing the Data**

### **Renaming variables (columns)**

In [None]:
covid_sp.head(3)

Unnamed: 0,nome_munic,codigo_ibge,datahora,casos,casos_novos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,nome_drs,cod_drs,pop,semana_epidem
0,Adamantina,3500105,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,Marília,5,33894,9
1,Adolfo,3500204,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,São José do Rio Preto,15,3447,9
2,Aguaí,3500303,2020-02-25,0,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,São João da Boa Vista,14,35608,9


In [None]:
covid_sp = covid_sp.rename(columns={'nome_munic': 'municipio'})

In [None]:
covid_sp.head(3)

In [None]:
covid_sp.rename(columns={'datahora': 'data'}, inplace=True)

In [None]:
covid_sp.head(3)

In [None]:
covid_sp.columns

Index(['municipio', 'codigo_ibge', 'data', 'casos', 'casos_novos', 'casos_pc',
       'casos_mm7d', 'obitos', 'obitos_novos', 'obitos_pc', 'obitos_mm7d',
       'letalidade', 'nome_drs', 'cod_drs', 'pop', 'semana_epidem'],
      dtype='object')
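
The two renames above can also be done in a single call, since `rename` accepts a mapping with several entries. A minimal sketch on a one-row frame made up for illustration:

```python
import pandas as pd

# Made-up one-row frame using the original column names.
demo = pd.DataFrame({'nome_munic': ['Adamantina'],
                     'datahora': ['2020-02-25'],
                     'casos': [0]})

# rename returns a new DataFrame; demo itself is left unchanged.
renamed = demo.rename(columns={'nome_munic': 'municipio', 'datahora': 'data'})
print(list(renamed.columns))
```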

### **Dropping variables (columns)**

In [None]:
# Drop by name
covid_sp_alterado = covid_sp.drop(columns=['casos_novos'])

In [None]:
covid_sp_alterado.head(2)

Unnamed: 0,municipio,codigo_ibge,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_pc,obitos_mm7d,letalidade,nome_drs,cod_drs,pop,semana_epidem
0,Adamantina,3500105,2020-02-25,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,Marília,5,33894,9
1,Adolfo,3500204,2020-02-25,0,"0,00000000000000e+00",0,0,0,"0,00000000000000e+00",0,0,São José do Rio Preto,15,3447,9


In [None]:
covid_sp_alterado.shape

(614992, 15)

In [None]:
# Drop by position (axis=1 drops a column)
covid_sp_alterado = covid_sp_alterado.drop(covid_sp_alterado.columns[[1]],axis=1)

In [None]:
# Drop more than one column (inplace=True modifies the DataFrame directly, so no reassignment is needed)
covid_sp_alterado.drop(columns=['obitos_pc','cod_drs'],inplace=True)

In [None]:
covid_sp_alterado.head()

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,nome_drs,pop,semana_epidem
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,Marília,33894,9
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,São José do Rio Preto,3447,9
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,São João da Boa Vista,35608,9
3,Águas da Prata,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,São João da Boa Vista,7797,9
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,Campinas,18374,9


In [None]:
covid_sp_alterado.drop(covid_sp_alterado.columns[[9,10]],axis=1, inplace=True)


In [None]:
covid_sp_alterado.shape

(614992, 10)

### **Creating and changing column (variable) values**

In [None]:
covid_sp_alterado.head(3)

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9


In [None]:
covid_sp_alterado['total'] = covid_sp_alterado['casos']+1

In [None]:
covid_sp_alterado.head()

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1
3,Águas da Prata,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1


In [None]:
covid_sp_alterado.shape

(614992, 11)

In [None]:
# Creating a column of index values
lista = list(range(1,374035))
lista

In [None]:
df = pd.DataFrame(lista,columns=['indice'])
df

In [None]:
# Joining two DataFrames side by side
covid_sp_alterado = pd.concat([covid_sp_alterado,df],axis=1)
covid_sp_alterado.head()

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1,1.0
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1,2.0
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1,3.0
3,Águas da Prata,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1,4.0
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0,0,0,0,0,9,1,5.0
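
The helper list above has fewer entries than the DataFrame has rows, so rows past its end receive `NaN` in `indice` after the `concat`. That is also why the column is shown as float. A hedged sketch on made-up data shows the effect, together with one possible alternative that sizes the sequence from the frame itself:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'municipio': ['Adamantina', 'Adolfo', 'Aguaí']})

# A helper frame shorter than demo: the unmatched row gets NaN after concat.
short = pd.DataFrame({'indice': [1, 2]})
joined = pd.concat([demo, short], axis=1)
print(joined)

# Sizing the sequence from the frame itself leaves no gap.
demo['indice'] = np.arange(1, len(demo) + 1)
print(demo)
```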


### **Counting records per variable (column)**

In [None]:
covid_sp_alterado['municipio'].values

array(['Adamantina', 'Adolfo', 'Aguaí', ..., 'Ignorado', 'Ignorado',
       'Ignorado'], dtype=object)

In [None]:
covid_sp_alterado['semana_epidem'].value_counts().sort_index()

1      9044
2      9044
3      9044
4      9044
5      9044
6      9044
7      9044
8      9044
9     12274
10    13566
11    13566
12    13566
13    13566
14    13566
15    13566
16    13566
17    13566
18    13566
19    13566
20    13566
21    13566
22    13566
23    13566
24    13566
25    13566
26    13566
27    13566
28    13566
29    13566
30    13566
31    13566
32    13566
33    13566
34    13566
35    13566
36    13566
37    13566
38    13566
39    13566
40    10336
41     9044
42     9044
43     9044
44     9044
45     9044
46     9044
47     9044
48     9044
49     9044
50     9044
51     9044
52     9044
53     4522
Name: semana_epidem, dtype: int64

In [None]:
from collections import Counter
Counter(covid_sp_alterado.semana_epidem)

Counter({9: 12274,
         10: 13566,
         11: 13566,
         12: 13566,
         13: 13566,
         14: 13566,
         15: 13566,
         16: 13566,
         17: 13566,
         18: 13566,
         19: 13566,
         20: 13566,
         21: 13566,
         22: 13566,
         23: 13566,
         24: 13566,
         25: 13566,
         26: 13566,
         27: 13566,
         28: 13566,
         29: 13566,
         30: 13566,
         31: 13566,
         32: 13566,
         33: 13566,
         34: 13566,
         35: 13566,
         36: 13566,
         37: 13566,
         38: 13566,
         39: 13566,
         40: 10336,
         41: 9044,
         42: 9044,
         43: 9044,
         44: 9044,
         45: 9044,
         46: 9044,
         47: 9044,
         48: 9044,
         49: 9044,
         50: 9044,
         51: 9044,
         52: 9044,
         53: 4522,
         1: 9044,
         2: 9044,
         3: 9044,
         4: 9044,
         5: 9044,
         6: 9044,
      

In [None]:
covid_sp_alterado.query('obitos_novos > 50')['municipio'].value_counts()

São Paulo                314
Guarulhos                  9
São Bernardo do Campo      3
Taubaté                    3
Itapetininga               2
Campinas                   1
Sorocaba                   1
Santos                     1
Atibaia                    1
Name: municipio, dtype: int64
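
The `query` call above filters rows with a string expression and then counts how often each municipality remains. As a sketch on made-up numbers, the same result can be obtained with a boolean mask, which some readers find easier to debug:

```python
import pandas as pd

# Made-up daily death counts for three hypothetical municipalities.
demo = pd.DataFrame({'municipio': ['A', 'A', 'B', 'C'],
                     'obitos_novos': [60, 70, 55, 10]})

# Filter with a query string, then count occurrences per municipality.
counts = demo.query('obitos_novos > 50')['municipio'].value_counts()

# Equivalent filter written as a boolean mask.
same = demo.loc[demo['obitos_novos'] > 50, 'municipio'].value_counts()

print(counts)
```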

### **Selecting variables (columns) by position**

In [None]:
covid_sp_alterado

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1.0
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,2.0
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,3.0
3,Águas da Prata,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,4.0
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...
614987,Ignorado,2022-09-29,399,"0,00000000000000e+00",-28000000000000000,5,0,-1000000000000000,0012531328320802004,39,400,
614988,Ignorado,2022-09-30,330,"0,00000000000000e+00",-24714285714285715,5,0,-0285714285714286,0015151515151515152,39,331,
614989,Ignorado,2022-10-01,204,"0,00000000000000e+00",-63428571428571431,0,-5,-1000000000000000,0000000000000000000,39,205,
614990,Ignorado,2022-10-02,199,"0,00000000000000e+00",-50714285714285715,0,0,-0714285714285714,0000000000000000000,40,200,


In [None]:
x = covid_sp_alterado.iloc[: , 5:13]
x

Unnamed: 0,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,0,0,0000000000000000,0000000000000000000,9,1,1.0
1,0,0,0000000000000000,0000000000000000000,9,1,2.0
2,0,0,0000000000000000,0000000000000000000,9,1,3.0
3,0,0,0000000000000000,0000000000000000000,9,1,4.0
4,0,0,0000000000000000,0000000000000000000,9,1,5.0
...,...,...,...,...,...,...,...
614987,5,0,-1000000000000000,0012531328320802004,39,400,
614988,5,0,-0285714285714286,0015151515151515152,39,331,
614989,0,-5,-1000000000000000,0000000000000000000,39,205,
614990,0,0,-0714285714285714,0000000000000000000,40,200,


In [None]:
type(x)

pandas.core.frame.DataFrame
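
The slice `5:13` above asks for more columns than the frame has, yet it does not fail: like Python list slices, `iloc` slices are clipped to the available range. A small sketch on a made-up three-column frame:

```python
import pandas as pd

demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# The slice end (10) is past the last column; iloc simply stops at the end.
tail = demo.iloc[:, 1:10]
print(list(tail.columns))
```

A single out-of-range integer, such as `demo.iloc[:, 10]`, raises an `IndexError` instead.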

### **Dropping, filtering and replacing records (rows)**

In [None]:
# Dropping rows by index (individual positions)
covid_sp_alterado2 = covid_sp_alterado.drop(covid_sp_alterado.index[[1,3]])
covid_sp_alterado2

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1.0
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,3.0
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,5.0
5,Águas de Santa Bárbara,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,6.0
6,Águas de São Pedro,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,7.0
...,...,...,...,...,...,...,...,...,...,...,...,...
614987,Ignorado,2022-09-29,399,"0,00000000000000e+00",-28000000000000000,5,0,-1000000000000000,0012531328320802004,39,400,
614988,Ignorado,2022-09-30,330,"0,00000000000000e+00",-24714285714285715,5,0,-0285714285714286,0015151515151515152,39,331,
614989,Ignorado,2022-10-01,204,"0,00000000000000e+00",-63428571428571431,0,-5,-1000000000000000,0000000000000000000,39,205,
614990,Ignorado,2022-10-02,199,"0,00000000000000e+00",-50714285714285715,0,0,-0714285714285714,0000000000000000000,40,200,


In [None]:
# Dropping rows by index (a range of positions)
covid_sp_alterado2 = covid_sp_alterado2.drop(covid_sp_alterado.index[4:7])
covid_sp_alterado2

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1.0
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,3.0
7,Agudos,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,8.0
8,Alambari,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,9.0
9,Alfredo Marcondes,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...
614987,Ignorado,2022-09-29,399,"0,00000000000000e+00",-28000000000000000,5,0,-1000000000000000,0012531328320802004,39,400,
614988,Ignorado,2022-09-30,330,"0,00000000000000e+00",-24714285714285715,5,0,-0285714285714286,0015151515151515152,39,331,
614989,Ignorado,2022-10-01,204,"0,00000000000000e+00",-63428571428571431,0,-5,-1000000000000000,0000000000000000000,39,205,
614990,Ignorado,2022-10-02,199,"0,00000000000000e+00",-50714285714285715,0,0,-0714285714285714,0000000000000000000,40,200,


In [None]:
# Reset the index
covid_sp_alterado2 = covid_sp_alterado2.reset_index(drop=True) # drop=True discards the old index
covid_sp_alterado2

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1.0
1,Aguaí,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,3.0
2,Agudos,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,8.0
3,Alambari,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,9.0
4,Alfredo Marcondes,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...
614982,Ignorado,2022-09-29,399,"0,00000000000000e+00",-28000000000000000,5,0,-1000000000000000,0012531328320802004,39,400,
614983,Ignorado,2022-09-30,330,"0,00000000000000e+00",-24714285714285715,5,0,-0285714285714286,0015151515151515152,39,331,
614984,Ignorado,2022-10-01,204,"0,00000000000000e+00",-63428571428571431,0,-5,-1000000000000000,0000000000000000000,39,205,
614985,Ignorado,2022-10-02,199,"0,00000000000000e+00",-50714285714285715,0,0,-0714285714285714,0000000000000000000,40,200,
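
The three steps above (drop by position, drop a range, reset the index) can be followed more easily on a tiny frame. A minimal sketch with made-up rows:

```python
import pandas as pd

demo = pd.DataFrame({'municipio': ['Adamantina', 'Adolfo', 'Aguaí', 'Agudos']})

# demo.index[[1, 3]] turns positions 1 and 3 into the labels that drop expects.
kept = demo.drop(demo.index[[1, 3]])
print(kept.index.tolist())        # the labels keep their gaps

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())
```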


In [None]:
ignorado = covid_sp_alterado.loc[covid_sp_alterado.municipio == 'Ignorado']
ignorado

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
614040,Ignorado,2020-02-25,-1,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,0,
614041,Ignorado,2020-02-26,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,
614042,Ignorado,2020-02-27,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,
614043,Ignorado,2020-02-28,-1,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,0,
614044,Ignorado,2020-02-29,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,
...,...,...,...,...,...,...,...,...,...,...,...,...
614987,Ignorado,2022-09-29,399,"0,00000000000000e+00",-28000000000000000,5,0,-1000000000000000,0012531328320802004,39,400,
614988,Ignorado,2022-09-30,330,"0,00000000000000e+00",-24714285714285715,5,0,-0285714285714286,0015151515151515152,39,331,
614989,Ignorado,2022-10-01,204,"0,00000000000000e+00",-63428571428571431,0,-5,-1000000000000000,0000000000000000000,39,205,
614990,Ignorado,2022-10-02,199,"0,00000000000000e+00",-50714285714285715,0,0,-0714285714285714,0000000000000000000,40,200,


In [None]:
ignorado.shape

(952, 12)

In [None]:
covid_sp_alterado = covid_sp_alterado.loc[covid_sp_alterado.municipio != 'Ignorado']
covid_sp_alterado

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,Adamantina,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1.0
1,Adolfo,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,2.0
2,Aguaí,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,3.0
3,Águas da Prata,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,4.0
4,Águas de Lindóia,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...
614035,Votorantim,2022-10-03,12539,"1,04645146214448e+04",0000000000000000,572,0,0000000000000000,0045617672860674696,40,12540,
614036,Votuporanga,2022-10-03,21262,"2,31713164777681e+04",27857142857142858,508,0,0000000000000000,0023892390179663249,40,21263,
614037,Zacarias,2022-10-03,279,"1,08984375000000e+04",0000000000000000,12,0,0000000000000000,0043010752688172046,40,280,
614038,Chavantes,2022-10-03,2697,"2,20649595025771e+04",0142857142857143,60,0,0000000000000000,0022246941045606230,40,2698,


In [None]:
guarulhos = covid_sp_alterado.loc[covid_sp_alterado.municipio == 'Guarulhos']
guarulhos

Unnamed: 0,municipio,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
212,Guarulhos,2020-02-25,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,213.0
857,Guarulhos,2020-02-26,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,858.0
1502,Guarulhos,2020-02-27,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,1503.0
2147,Guarulhos,2020-02-28,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,2148.0
2792,Guarulhos,2020-02-29,0,"0,00000000000000e+00",0000000000000000,0,0,0000000000000000,0000000000000000000,9,1,2793.0
...,...,...,...,...,...,...,...,...,...,...,...,...
611027,Guarulhos,2022-09-29,84405,"6,24632291724482e+03",4000000000000000,5520,0,0428571428571429,0065398969255375861,39,84406,
611672,Guarulhos,2022-09-30,84408,"6,24654492978853e+03",3428571428571428,5520,0,0000000000000000,0065396644867785042,39,84409,
612317,Guarulhos,2022-10-01,84413,"6,24691495069471e+03",4142857142857143,5521,1,0142857142857143,0065404617772144097,39,84414,
612962,Guarulhos,2022-10-02,84413,"6,24691495069471e+03",3857142857142857,5524,3,0571428571428571,0065440157321739537,40,84414,


In [None]:
guarulhos.drop(columns=['data','municipio'],inplace=True)
guarulhos.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos.drop(columns=['data','municipio'],inplace=True)


Unnamed: 0,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
212,0,"0,00000000000000e+00",0,0,0,0,0,9,1,213.0
857,0,"0,00000000000000e+00",0,0,0,0,0,9,1,858.0
1502,0,"0,00000000000000e+00",0,0,0,0,0,9,1,1503.0
2147,0,"0,00000000000000e+00",0,0,0,0,0,9,1,2148.0
2792,0,"0,00000000000000e+00",0,0,0,0,0,9,1,2793.0
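
The message printed above is pandas warning that `guarulhos` was obtained by filtering another DataFrame, so it is unclear whether the in-place change is meant for the subset or for the original. One common way to make the intent explicit, sketched here on made-up data, is to take an explicit copy when filtering:

```python
import pandas as pd

demo = pd.DataFrame({'municipio': ['Guarulhos', 'Adolfo', 'Guarulhos'],
                     'casos': [1, 2, 3]})

# .copy() makes the subset an independent DataFrame.
subset = demo.loc[demo['municipio'] == 'Guarulhos'].copy()

# Changes to the copy do not raise the warning and do not touch demo.
subset['casos'] = subset['casos'] * 10
print(subset)
```

Applied to this notebook, that would mean appending `.copy()` to the line that creates `guarulhos`; this is a suggestion, not what the original cells do.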


In [None]:
guarulhos['semana_epidem']=guarulhos['semana_epidem'].replace({9:'nove', 10: 'dez'})
guarulhos.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos['semana_epidem']=guarulhos['semana_epidem'].replace({9:'nove', 10: 'dez'})


Unnamed: 0,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
212,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,213.0
857,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,858.0
1502,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,1503.0
2147,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,2148.0
2792,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,2793.0
3437,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,3438.0
4082,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,4083.0
4727,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,4728.0
5372,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,5373.0
6017,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,6018.0


In [None]:
guarulhos['semana_epidem']=guarulhos['semana_epidem'].replace([11,12,13],['onze','doze','treze'])
guarulhos.head(30)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos['semana_epidem']=guarulhos['semana_epidem'].replace([11,12,13],['onze','doze','treze'])


Unnamed: 0,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
212,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,213.0
857,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,858.0
1502,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,1503.0
2147,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,2148.0
2792,0,"0,00000000000000e+00",0,0,0,0,0,nove,1,2793.0
3437,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,3438.0
4082,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,4083.0
4727,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,4728.0
5372,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,5373.0
6017,0,"0,00000000000000e+00",0,0,0,0,0,dez,1,6018.0


In [None]:
guarulhos['casos_pc'] = guarulhos['casos_pc'].apply(lambda x: x.replace(',','.'))
guarulhos

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  guarulhos['casos_pc'] = guarulhos['casos_pc'].apply(lambda x: x.replace(',','.'))


Unnamed: 0,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
212,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,213.0
857,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,858.0
1502,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,1503.0
2147,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,2148.0
2792,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,2793.0
...,...,...,...,...,...,...,...,...,...,...
611027,84405,6.24632291724482e+03,4000000000000000,5520,0,0428571428571429,0065398969255375861,39,84406,
611672,84408,6.24654492978853e+03,3428571428571428,5520,0,0000000000000000,0065396644867785042,39,84409,
612317,84413,6.24691495069471e+03,4142857142857143,5521,1,0142857142857143,0065404617772144097,39,84414,
612962,84413,6.24691495069471e+03,3857142857142857,5524,3,0571428571428571,0065440157321739537,40,84414,
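
Replacing the comma with a dot, as above, fixes the text but leaves the column as strings. As a hedged sketch (the two values are copied from the outputs shown in this notebook; the conversion step is an addition, not part of the original cells), the vectorized `.str.replace` followed by `pd.to_numeric` yields real floats:

```python
import pandas as pd

demo = pd.DataFrame({'casos_pc': ['0,00000000000000e+00', '6,24632291724482e+03']})

# Swap the decimal mark with a vectorized string method, then convert to float.
demo['casos_pc'] = pd.to_numeric(demo['casos_pc'].str.replace(',', '.', regex=False))
print(demo.dtypes)
```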


In [None]:
guarulhos.shape

(952, 10)

In [None]:
# Create a column of dates
import datetime

In [None]:
data = np.array('2020-02-25', dtype='datetime64[D]')
data

array('2020-02-25', dtype='datetime64[D]')

In [None]:
data = data + np.arange(579)
data

array(['2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28',
       '2020-02-29', '2020-03-01', '2020-03-02', '2020-03-03',
       '2020-03-04', '2020-03-05', '2020-03-06', '2020-03-07',
       '2020-03-08', '2020-03-09', '2020-03-10', '2020-03-11',
       '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15',
       '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19',
       '2020-03-20', '2020-03-21', '2020-03-22', '2020-03-23',
       '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27',
       '2020-03-28', '2020-03-29', '2020-03-30', '2020-03-31',
       '2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
       '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
       '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12',
       '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16',
       '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20',
       '2020-04-21', '2020-04-22', '2020-04-23', '2020-04-24',
       '2020-04-25', '2020-04-26', '2020-04-27', '2020-

In [None]:
data = pd.DataFrame(data)
data

Unnamed: 0,0
0,2020-02-25
1,2020-02-26
2,2020-02-27
3,2020-02-28
4,2020-02-29
...,...
574,2021-09-21
575,2021-09-22
576,2021-09-23
577,2021-09-24


In [None]:
data.columns = ['data']
data.head()

Unnamed: 0,data
0,2020-02-25
1,2020-02-26
2,2020-02-27
3,2020-02-28
4,2020-02-29
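
The NumPy steps above (a start date plus `np.arange`, then a DataFrame, then a column name) build a run of consecutive days. As an alternative sketch, `pd.date_range` produces the same 579 daily dates in one call:

```python
import pandas as pd

# 579 consecutive days starting on 2020-02-25, already named 'data'.
dates = pd.DataFrame({'data': pd.date_range('2020-02-25', periods=579, freq='D')})
print(dates.head())
print(dates.tail(1))
```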


In [None]:
data = data.append(['2020-02-24','2023-02-25'])
data



Unnamed: 0,data,0
0,2020-02-25,
1,2020-02-26,
2,2020-02-27,
3,2020-02-28,
4,2020-02-29,
...,...,...
576,2021-09-23,
577,2021-09-24,
578,2021-09-25,
0,NaT,2020-02-24
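
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell above fails on current versions. The output also shows that appending a bare list created a new column `0` instead of extending `data`. A hedged sketch of the replacement with `pd.concat`, using a made-up two-row frame and wrapping the new dates in a DataFrame with the same column name:

```python
import pandas as pd

dates = pd.DataFrame({'data': pd.to_datetime(['2020-02-25', '2020-02-26'])})

# New rows go into a frame with the same column name so the values line up.
extra = pd.DataFrame({'data': pd.to_datetime(['2020-02-24', '2023-02-25'])})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating labels.
combined = pd.concat([dates, extra], ignore_index=True)
print(combined)
```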


In [None]:
guarulhos2 = pd.concat([data,guarulhos],axis=1)
guarulhos2.head()

Unnamed: 0,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,2020-02-25,,,,,,,,,,
1,2020-02-26,,,,,,,,,,
2,2020-02-27,,,,,,,,,,
3,2020-02-28,,,,,,,,,,
4,2020-02-29,,,,,,,,,,


In [None]:
# Reset the index
guarulhos = guarulhos.reset_index(drop=True) # drop=True discards the old index
guarulhos

Unnamed: 0,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,213.0
1,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,858.0
2,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,1503.0
3,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,2148.0
4,0,0.00000000000000e+00,0000000000000000,0,0,0000000000000000,0000000000000000000,nove,1,2793.0
...,...,...,...,...,...,...,...,...,...,...
947,84405,6.24632291724482e+03,4000000000000000,5520,0,0428571428571429,0065398969255375861,39,84406,
948,84408,6.24654492978853e+03,3428571428571428,5520,0,0000000000000000,0065396644867785042,39,84409,
949,84413,6.24691495069471e+03,4142857142857143,5521,1,0142857142857143,0065404617772144097,39,84414,
950,84413,6.24691495069471e+03,3857142857142857,5524,3,0571428571428571,0065440157321739537,40,84414,


In [None]:
guarulhos2 = pd.concat([data,guarulhos],axis=1)
guarulhos2.head()

Unnamed: 0,data,casos,casos_pc,casos_mm7d,obitos,obitos_novos,obitos_mm7d,letalidade,semana_epidem,total,indice
0,2020-02-25,0,0.0,0,0,0,0,0,nove,1,213.0
1,2020-02-26,0,0.0,0,0,0,0,0,nove,1,858.0
2,2020-02-27,0,0.0,0,0,0,0,0,nove,1,1503.0
3,2020-02-28,0,0.0,0,0,0,0,0,nove,1,2148.0
4,2020-02-29,0,0.0,0,0,0,0,0,nove,1,2793.0


In [None]:
guarulhos2.shape

(952, 11)
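
The first `concat` attempt produced empty columns because `pd.concat(..., axis=1)` aligns rows by index label, not by position: `data` was labelled 0, 1, 2, ... while `guarulhos` still carried its original labels (212, 857, ...). A minimal sketch on made-up frames reproduces both outcomes:

```python
import pandas as pd

left = pd.DataFrame({'data': ['2020-02-25', '2020-02-26']})    # labels 0, 1
right = pd.DataFrame({'casos': [0, 5]}, index=[212, 857])      # original labels

# Labels do not match, so every row is padded with NaN on one side.
misaligned = pd.concat([left, right], axis=1)
print(misaligned)

# After resetting the index the labels match and the rows pair up.
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(aligned)
```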

### **Missing Values (NaN)**

In [None]:
# NUMBER OF MISSING VALUES PER COLUMN
covid_sp_alterado.isnull().sum()

municipio             0
data                  0
casos                 0
casos_pc              0
casos_mm7d            0
obitos                0
obitos_novos          0
obitos_mm7d           0
letalidade            0
semana_epidem         0
total                 0
indice           240006
dtype: int64

In [None]:
covid_sp_alterado['casos'].isnull().sum()

0

In [None]:
# NUMBER OF MISSING VALUES PER COLUMN
covid_sp.isnull().sum()

municipio          0
codigo_ibge        0
data               0
casos              0
casos_novos        0
casos_pc           0
casos_mm7d         0
obitos             0
obitos_novos       0
obitos_pc          0
obitos_mm7d        0
letalidade         0
nome_drs         952
cod_drs            0
pop                0
semana_epidem      0
dtype: int64

In [None]:
# DROP EVERY ROW THAT CONTAINS A MISSING VALUE
covid_sp2 = covid_sp.dropna()

In [None]:
covid_sp2.isnull().sum()

municipio        0
codigo_ibge      0
data             0
casos            0
casos_novos      0
casos_pc         0
casos_mm7d       0
obitos           0
obitos_novos     0
obitos_pc        0
obitos_mm7d      0
letalidade       0
nome_drs         0
cod_drs          0
pop              0
semana_epidem    0
dtype: int64

In [None]:
# REPLACE MISSING VALUES WITH THE MEDIAN
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(covid_sp['obitos_novos'].median())

In [None]:
# REPLACE MISSING VALUES WITH THE MEAN
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(covid_sp['obitos_novos'].mean())

In [None]:
# REPLACE MISSING VALUES WITH ANY OTHER VALUE
covid_sp['obitos_novos'] = covid_sp['obitos_novos'].fillna(10)
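
The three cells above are alternatives rather than steps: once the first fill has run, no missing values remain for the later ones to replace (and `obitos_novos` has none in this dataset to begin with). A sketch on a made-up column with one gap shows what each strategy would produce when applied on its own:

```python
import numpy as np
import pandas as pd

# Made-up column with a single missing value.
demo = pd.DataFrame({'obitos_novos': [1.0, np.nan, 3.0, 10.0]})

# Each fill starts from the same original column and returns a new Series.
median_filled = demo['obitos_novos'].fillna(demo['obitos_novos'].median())
mean_filled = demo['obitos_novos'].fillna(demo['obitos_novos'].mean())
constant_filled = demo['obitos_novos'].fillna(10)

print(median_filled.tolist())
print(mean_filled.tolist())
print(constant_filled.tolist())
```

The median is less sensitive than the mean to a few very large daily counts, which may matter for skewed data such as this.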

### **Classifying and changing attribute types**

Attribute types:

object: strings

int64: integers

float64: real numbers (floating point)

complex: complex numbers


In [None]:
covid_sp_alterado.dtypes

municipio         object
data              object
casos              int64
casos_pc          object
casos_mm7d        object
obitos             int64
obitos_novos       int64
obitos_mm7d       object
letalidade        object
semana_epidem      int64
total              int64
indice           float64
dtype: object
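
The listing above shows `data` stored as `object` (plain strings). The section title mentions changing types, so here is a hedged sketch of two common conversions on a made-up frame; the choice of `int32` is only an illustration, not something the original notebook does:

```python
import pandas as pd

demo = pd.DataFrame({'data': ['2020-02-25', '2020-02-26'],
                     'semana_epidem': [9, 9]})

# Parse the date strings into a datetime column.
demo['data'] = pd.to_datetime(demo['data'], format='%Y-%m-%d')

# Change an integer column to a narrower integer type.
demo['semana_epidem'] = demo['semana_epidem'].astype('int32')

print(demo.dtypes)
```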

## **Saving (Exporting) the Cleaned DataFrame**

In [None]:
covid_sp_alterado.to_csv('covid_sp_tratado.csv', sep=';', encoding='utf-8', index = False)
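
To check that an export like the one above can be read back, a round trip through an in-memory buffer is a quick test. This sketch uses a made-up one-row frame and `io.StringIO` instead of a file on disk:

```python
import io
import pandas as pd

demo = pd.DataFrame({'municipio': ['Aguaí'], 'casos_pc': [1.5]})

# Write with the same separator as above; index=False omits the row labels.
buffer = io.StringIO()
demo.to_csv(buffer, sep=';', index=False)

# Rewind the buffer and read it back with the matching separator.
buffer.seek(0)
restored = pd.read_csv(buffer, sep=';')
print(restored)
```

Without `index=False`, the row labels would be written as an extra first column and come back as `Unnamed: 0` on reading.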