# COVID-19 Project
## Exploratory analysis of COVID-19 data with Python and Machine Learning
### Prof. Dr. Neylson Crepalde
### Breno Souza
First, let's import the libraries the project needs.

In [1]:
import pandas as pd

In [2]:
import numpy as np

In [3]:
from datetime import datetime

In [4]:
import plotly.express as px

In [5]:
import plotly.graph_objects as go

### Let's import the data for the project

In [7]:
url = 'https://github.com/BrenoVitoi/projeto_covid/blob/master/covid_19_data.csv?raw=true'

In [8]:
df = pd.read_csv(url, parse_dates=['ObservationDate', 'Last Update'])

In [9]:
df

Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,1,2020-01-22,Anhui,Mainland China,2020-01-22 17:00:00,1.0,0.0,0.0
1,2,2020-01-22,Beijing,Mainland China,2020-01-22 17:00:00,14.0,0.0,0.0
2,3,2020-01-22,Chongqing,Mainland China,2020-01-22 17:00:00,6.0,0.0,0.0
3,4,2020-01-22,Fujian,Mainland China,2020-01-22 17:00:00,1.0,0.0,0.0
4,5,2020-01-22,Gansu,Mainland China,2020-01-22 17:00:00,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
83406,83407,2020-08-09,Zaporizhia Oblast,Ukraine,2020-08-10 04:34:55,937.0,22.0,652.0
83407,83408,2020-08-09,Zeeland,Netherlands,2020-08-10 04:34:55,918.0,69.0,0.0
83408,83409,2020-08-09,Zhejiang,Mainland China,2020-08-10 04:34:55,1273.0,1.0,1268.0
83409,83410,2020-08-09,Zhytomyr Oblast,Ukraine,2020-08-10 04:34:55,2131.0,41.0,1524.0


### Now, let's check the variable types to verify that the columns were imported correctly.

In [10]:
# Check the type of each column
df.dtypes

SNo                         int64
ObservationDate    datetime64[ns]
Province/State             object
Country/Region             object
Last Update        datetime64[ns]
Confirmed                 float64
Deaths                    float64
Recovered                 float64
dtype: object
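
The same check can be automated instead of read by eye. The following is a minimal sketch, assuming a tiny in-memory CSV: the `sample_csv` string is made up for illustration and only mirrors the column layout printed above, and the names `sample`, `dates_ok` and `counts_ok` are hypothetical. It parses the two date columns with `parse_dates`, as in the import step, and confirms the resulting dtypes.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype

# Hypothetical two-row sample that mirrors the layout of covid_19_data.csv
sample_csv = io.StringIO(
    "SNo,ObservationDate,Province/State,Country/Region,Last Update,"
    "Confirmed,Deaths,Recovered\n"
    "1,2020-01-22,Anhui,Mainland China,2020-01-22 17:00:00,1.0,0.0,0.0\n"
    "2,2020-01-22,Beijing,Mainland China,2020-01-22 17:00:00,14.0,0.0,0.0\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["ObservationDate", "Last Update"])

date_cols = ["ObservationDate", "Last Update"]
count_cols = ["Confirmed", "Deaths", "Recovered"]

# True only if every date column was parsed as a datetime
dates_ok = all(is_datetime64_any_dtype(sample[c]) for c in date_cols)
# True only if every count column was read as a float
counts_ok = all(is_float_dtype(sample[c]) for c in count_cols)
print(dates_ok, counts_ok)
```

If a date column had been left out of `parse_dates`, it would come back as plain text and `dates_ok` would be `False`.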

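By convention, column names should not contain uppercase letters or special characters: names such as Province/State and Last Update are easy to mistype and cannot be used with attribute access on the DataFrame. Let's use a function to fix the column names.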

In [11]:
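import re
def corrige_colunas(col_name):
    # Remove "/" and spaces, then lowercase the name.
    # Inside [...] a "|" is a literal pipe, not "or", so it is not needed here.
    return re.sub(r"[/ ]", "", col_name).lower()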
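
Before applying the function to the DataFrame, it can be tried on plain strings. This is a minimal sketch: the function is restated so the snippet runs on its own, and the list names `originais` and `corrigidas` are made up for illustration. The column names are the ones shown in the dtypes output.

```python
import re


def corrige_colunas(col_name):
    # Same rule as the notebook function: drop "/" and spaces, then lowercase
    return re.sub(r"[/ ]", "", col_name).lower()


# Column names as they appear in the raw CSV
originais = ["SNo", "ObservationDate", "Province/State",
             "Country/Region", "Last Update", "Confirmed"]
corrigidas = [corrige_colunas(c) for c in originais]
print(corrigidas)
# ['sno', 'observationdate', 'provincestate', 'countryregion', 'lastupdate', 'confirmed']

# The cleaned names must stay unique, otherwise two columns would collide
assert len(set(corrigidas)) == len(corrigidas)
```

The uniqueness check matters because removing characters can, in principle, map two different names to the same result.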

In [14]:
# Fix all the column names of df
df.columns = [corrige_colunas(col) for col in df.columns]

In [13]:
df

Unnamed: 0,sno,observationdate,provincestate,countryregion,lastupdate,confirmed,deaths,recovered
0,1,2020-01-22,Anhui,Mainland China,2020-01-22 17:00:00,1.0,0.0,0.0
1,2,2020-01-22,Beijing,Mainland China,2020-01-22 17:00:00,14.0,0.0,0.0
2,3,2020-01-22,Chongqing,Mainland China,2020-01-22 17:00:00,6.0,0.0,0.0
3,4,2020-01-22,Fujian,Mainland China,2020-01-22 17:00:00,1.0,0.0,0.0
4,5,2020-01-22,Gansu,Mainland China,2020-01-22 17:00:00,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
83406,83407,2020-08-09,Zaporizhia Oblast,Ukraine,2020-08-10 04:34:55,937.0,22.0,652.0
83407,83408,2020-08-09,Zeeland,Netherlands,2020-08-10 04:34:55,918.0,69.0,0.0
83408,83409,2020-08-09,Zhejiang,Mainland China,2020-08-10 04:34:55,1273.0,1.0,1268.0
83409,83410,2020-08-09,Zhytomyr Oblast,Ukraine,2020-08-10 04:34:55,2131.0,41.0,1524.0
