# COVID-19 Data Analysis

The goal of this notebook is to extract information about how COVID-19 (coronavirus) has developed over the course of the pandemic.

The dataset was taken directly from the *Our World in Data* repository, available on GitHub at https://github.com/owid/covid-19-data/tree/master/public/data.

Full descriptions of every variable (dataset column) are available at the link above; we describe what each variable represents as the study progresses.

## Some conclusions to be identified

We want to show graphically how COVID-19 has progressed, based on the dataset provided. As a starting point, we will extract information on the following items:

* **Number of cases per time period per country**
* **Number of cases per time period per vaccinated country**
* **Transmission rate versus number of vaccinations per country**
* **Number of vaccinations administered per time period per country** (vaccination rate per country)
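
As a minimal sketch of the first item, the snippet below aggregates daily new cases into monthly totals per country. The small DataFrame is made-up illustrative data, not values from the dataset; only the column names `location`, `date`, and `new_cases` follow the OWID schema, and `toy_df` and `monthly_cases` are hypothetical names.

```python
import pandas as pd

# Made-up illustrative rows; only the column names follow the OWID dataset.
toy_df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-30", "2021-01-31", "2021-02-01",
                            "2021-01-30", "2021-01-31", "2021-02-01"]),
    "new_cases": [10.0, 20.0, 5.0, 1.0, 2.0, 3.0],
})

# Sum the new cases per country and calendar month.
monthly_cases = (
    toy_df
    .groupby(["location", toy_df["date"].dt.to_period("M")])["new_cases"]
    .sum()
)
print(monthly_cases)
```

Other periods (weeks, quarters) should only require a different frequency string in `to_period`.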

## 01 - Importing the dataset and inspecting some information

In [1]:
import pandas as pd

In [2]:
covid_df = pd.read_csv(r"owid-covid-data.csv")

In [3]:
covid_df.head(5)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-02-24,5.0,5.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,2020-02-25,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,2020-02-26,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,2020-02-27,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,2020-02-28,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,


In [4]:
covid_df.tail(5)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
154678,ZWE,Africa,Zimbabwe,2022-01-10,223000.0,1082.0,987.571,5180.0,32.0,19.0,...,1.6,30.7,36.791,1.7,61.49,0.571,,,,
154679,ZWE,Africa,Zimbabwe,2022-01-11,223000.0,0.0,760.286,5180.0,0.0,14.571,...,1.6,30.7,36.791,1.7,61.49,0.571,,,,
154680,ZWE,Africa,Zimbabwe,2022-01-12,224433.0,1433.0,768.0,5215.0,35.0,17.571,...,1.6,30.7,36.791,1.7,61.49,0.571,,,,
154681,ZWE,Africa,Zimbabwe,2022-01-13,225084.0,651.0,700.857,5222.0,7.0,16.286,...,1.6,30.7,36.791,1.7,61.49,0.571,,,,
154682,ZWE,Africa,Zimbabwe,2022-01-14,225637.0,553.0,622.143,5238.0,16.0,14.571,...,1.6,30.7,36.791,1.7,61.49,0.571,,,,


In [5]:
covid_df.sample(5)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
109113,PAN,North America,Panama,2021-01-16,296269.0,2677.0,2785.286,4738.0,49.0,47.429,...,2.4,9.9,,2.3,78.51,0.815,,,,
133853,SWE,Europe,Sweden,2020-02-18,1.0,0.0,0.0,,,,...,18.8,18.9,,2.22,82.8,0.945,,,,
153358,ZMB,Africa,Zambia,2020-03-27,22.0,6.0,2.857,,,,...,3.1,24.7,13.938,2.0,63.89,0.584,,,,
48336,FLK,South America,Falkland Islands,2021-02-09,53.0,8.0,1.714,,,,...,,,,,81.44,,,,,
57855,GTM,North America,Guatemala,2020-09-26,89702.0,824.0,650.0,3213.0,27.0,15.429,...,,,76.665,0.6,74.3,0.663,,,,


In [6]:
# Number of rows and columns in the dataset: (rows, columns)
covid_df.shape

(154683, 67)

In [7]:
# Information about the dataset
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154683 entries, 0 to 154682
Data columns (total 67 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   iso_code                                    154683 non-null  object 
 1   continent                                   145377 non-null  object 
 2   location                                    154683 non-null  object 
 3   date                                        154683 non-null  object 
 4   total_cases                                 151890 non-null  float64
 5   new_cases                                   151844 non-null  float64
 6   new_cases_smoothed                          150693 non-null  float64
 7   total_deaths                                134569 non-null  float64
 8   new_deaths                                  134742 non-null  float64
 9   new_deaths_smoothed                         134610 non-null  float64
 

In [8]:
covid_df.columns.to_list()

['iso_code',
 'continent',
 'location',
 'date',
 'total_cases',
 'new_cases',
 'new_cases_smoothed',
 'total_deaths',
 'new_deaths',
 'new_deaths_smoothed',
 'total_cases_per_million',
 'new_cases_per_million',
 'new_cases_smoothed_per_million',
 'total_deaths_per_million',
 'new_deaths_per_million',
 'new_deaths_smoothed_per_million',
 'reproduction_rate',
 'icu_patients',
 'icu_patients_per_million',
 'hosp_patients',
 'hosp_patients_per_million',
 'weekly_icu_admissions',
 'weekly_icu_admissions_per_million',
 'weekly_hosp_admissions',
 'weekly_hosp_admissions_per_million',
 'new_tests',
 'total_tests',
 'total_tests_per_thousand',
 'new_tests_per_thousand',
 'new_tests_smoothed',
 'new_tests_smoothed_per_thousand',
 'positive_rate',
 'tests_per_case',
 'tests_units',
 'total_vaccinations',
 'people_vaccinated',
 'people_fully_vaccinated',
 'total_boosters',
 'new_vaccinations',
 'new_vaccinations_smoothed',
 'total_vaccinations_per_hundred',
 'people_vaccinated_per_hundred',
 'peo

In [41]:
# Dropping the rows with new_cases < 0

# Indices with new_cases < 0
covid_df.loc[covid_df["new_cases"] < 0].index

Int64Index([    25,    679,   4893,   5203,  10980,  15214,  19948,  21820,
             29486,  29902,  33252,  36747,  36953,  38137,  39726,  40518,
             40519,  40522,  40641,  42029,  48587,  48594,  49560,  49561,
             50180,  50183,  50199,  50205,  50230,  50239,  50240,  50265,
             51422,  52336,  54177,  54226,  54995,  55340,  55371,  56596,
             60030,  62144,  63346,  63467,  64510,  68833,  68897,  68910,
             69718,  70120,  70340,  70533,  72949,  73990,  79779,  80332,
             81236,  81237,  84096,  85377,  88873,  89398,  90548,  93142,
             96778,  98736, 100051, 100760, 100768, 101038, 101140, 102185,
            109623, 112258, 113112, 118574, 123471, 123595, 123774, 127609,
            129111, 131141, 131172, 131453, 131737, 133033, 136104, 136110,
            138128, 142617, 143477, 143510, 145948, 145987, 147707, 152827,
            154060],
           dtype='int64')

In [43]:
covid_df.drop(labels=covid_df.loc[covid_df["new_cases"] < 0].index, axis=0, inplace=True)
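
The same cleaning step can also be written as a boolean mask, which avoids `inplace=True` and, like the `drop` call above, keeps the rows where `new_cases` is missing. This is a minimal sketch on made-up data; `toy_df` and `cleaned_df` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up values: one negative count and one missing value.
toy_df = pd.DataFrame({"new_cases": [5.0, -3.0, np.nan, 0.0]})

# Keep every row that is NOT negative. Rows with NaN survive because
# the comparison NaN < 0 evaluates to False.
cleaned_df = toy_df.loc[~(toy_df["new_cases"] < 0)]
print(cleaned_df)
```

Assigning the result to a new name leaves the original DataFrame untouched, which makes it easier to rerun the cell.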

## 01.1 - Selecting the variables (dataset columns) used in the analysis

In [45]:
covid_df2 = covid_df[['iso_code',
                     'location',
                     'date',
                     'total_cases',
                     'new_cases',
                     'total_deaths',
                     'new_deaths',
                     'reproduction_rate',
                     'icu_patients',
                     'hosp_patients',
                     'positive_rate',
                     'total_vaccinations',
                     'people_vaccinated',
                     'people_fully_vaccinated',
                     'stringency_index',
                     'population',
                     'median_age',
                     'aged_65_older',
                     'aged_70_older',
                     'gdp_per_capita',
                     'female_smokers',
                     'male_smokers',
                     'human_development_index']]

In [46]:
covid_df2.sample(5)

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
137638,TZA,Tanzania,2021-03-31,509.0,0.0,21.0,0.0,0.0,,,...,,8.33,61498438.0,17.7,3.108,1.874,2683.304,3.3,26.7,0.529
151261,VNM,Vietnam,2021-04-24,2833.0,3.0,35.0,0.0,1.3,,,...,,51.39,98168829.0,32.6,7.15,4.718,6171.884,1.0,45.9,0.704
111109,PER,Peru,2020-10-14,853974.0,0.0,84411.0,148.0,0.93,,,...,,82.41,33359415.0,29.1,7.151,4.455,12236.706,4.8,,0.777
129880,KOR,South Korea,2020-08-05,14499.0,43.0,302.0,0.0,0.94,14.0,,...,,53.24,51305184.0,43.4,13.914,8.622,35938.374,6.2,40.9,0.916
3819,AGO,Angola,2021-03-17,21489.0,43.0,522.0,0.0,1.14,,,...,,58.33,33933611.0,16.8,2.405,1.362,5819.495,,,0.581


### 01.1.1 - Description of the data selected for analysis

* iso_code: ISO 3166-1 alpha-3 – three-letter country codes
* location: Geographical location
* date: Date of observation
* total_cases: Total confirmed cases of COVID-19
* new_cases: New confirmed cases of COVID-19
* total_deaths: Total deaths attributed to COVID-19
* new_deaths: New deaths attributed to COVID-19
* reproduction_rate: Real-time estimate of the effective reproduction rate (R) of COVID-19
* icu_patients: Number of COVID-19 patients in intensive care units (ICUs) on a given day
* hosp_patients: Number of COVID-19 patients in hospital on a given day
* positive_rate: The share of COVID-19 tests that are positive, given as a rolling 7-day average (this is the inverse of tests_per_case)
* total_vaccinations: Total number of COVID-19 vaccination doses administered
* people_fully_vaccinated: Total number of people who received all doses prescribed by the vaccination protocol
* population: Total population of the location
* stringency_index: Government Response Stringency Index: composite measure based on 9 response indicators including school closures, workplace closures, and travel bans, rescaled to a value from 0 to 100 (100 = strictest response)
* median_age: Median age of the population, UN projection for 2020
* aged_65_older: Share of the population that is 65 years and older, most recent year available
* aged_70_older: Share of the population that is 70 years and older in 2015



## 02 - Updating the variable types

We convert our variables (columns) to the appropriate types to improve performance and avoid problems during the analysis.
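
To illustrate why the `category` type can help, the sketch below compares the memory used by a text column before and after conversion. The data is made up to resemble a column such as `location` (few distinct values repeated many times); the exact byte counts depend on the pandas version, so none are stated here.

```python
import pandas as pd

# Made-up column with few distinct values repeated many times,
# similar in shape to `location` or `iso_code`.
labels = pd.Series(["Brazil", "United States", "Peru"] * 10_000)

original_bytes = labels.memory_usage(deep=True)
category_bytes = labels.astype("category").memory_usage(deep=True)

print(f"original dtype: {original_bytes} bytes")
print(f"category dtype: {category_bytes} bytes")
```

The saving comes from storing each distinct label once and replacing the repeated strings with small integer codes.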

In [47]:
covid_df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 154586 entries, 0 to 154682
Data columns (total 23 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   iso_code                 154586 non-null  object 
 1   location                 154586 non-null  object 
 2   date                     154586 non-null  object 
 3   total_cases              151793 non-null  float64
 4   new_cases                151747 non-null  float64
 5   total_deaths             134490 non-null  float64
 6   new_deaths               134664 non-null  float64
 7   reproduction_rate        116378 non-null  float64
 8   icu_patients             20367 non-null   float64
 9   hosp_patients            21536 non-null   float64
 10  positive_rate            69490 non-null   float64
 11  total_vaccinations       40062 non-null   float64
 12  people_vaccinated        38332 non-null   float64
 13  people_fully_vaccinated  35501 non-null   float64
 14  stri

In [48]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
covid_df2 = covid_df2.copy()
covid_df2["iso_code"] = covid_df2["iso_code"].astype("category")
covid_df2["location"] = covid_df2["location"].astype("category")
covid_df2["date"] = pd.to_datetime(covid_df2["date"])

In [49]:
covid_df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 154586 entries, 0 to 154682
Data columns (total 23 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   iso_code                 154586 non-null  category      
 1   location                 154586 non-null  category      
 2   date                     154586 non-null  datetime64[ns]
 3   total_cases              151793 non-null  float64       
 4   new_cases                151747 non-null  float64       
 5   total_deaths             134490 non-null  float64       
 6   new_deaths               134664 non-null  float64       
 7   reproduction_rate        116378 non-null  float64       
 8   icu_patients             20367 non-null   float64       
 9   hosp_patients            21536 non-null   float64       
 10  positive_rate            69490 non-null   float64       
 11  total_vaccinations       40062 non-null   float64       
 12  people_vaccinate

## 03 - Exploring the data

In [50]:
covid_df2.head(20)

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
0,AFG,Afghanistan,2020-02-24,5.0,5.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
1,AFG,Afghanistan,2020-02-25,5.0,0.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
2,AFG,Afghanistan,2020-02-26,5.0,0.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
3,AFG,Afghanistan,2020-02-27,5.0,0.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
4,AFG,Afghanistan,2020-02-28,5.0,0.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
5,AFG,Afghanistan,2020-02-29,5.0,0.0,,,,,,...,,8.33,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
6,AFG,Afghanistan,2020-03-01,5.0,0.0,,,,,,...,,27.78,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
7,AFG,Afghanistan,2020-03-02,5.0,0.0,,,,,,...,,27.78,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
8,AFG,Afghanistan,2020-03-03,5.0,0.0,,,,,,...,,27.78,39835428.0,18.6,2.581,1.337,1803.987,,,0.511
9,AFG,Afghanistan,2020-03-04,5.0,0.0,,,,,,...,,27.78,39835428.0,18.6,2.581,1.337,1803.987,,,0.511


In [51]:
covid_df2.tail(20)

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
154663,ZWE,Zimbabwe,2021-12-26,204351.0,605.0,4891.0,6.0,0.79,,,...,3111352.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154664,ZWE,Zimbabwe,2021-12-27,205449.0,1098.0,4908.0,17.0,0.78,,,...,3113142.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154665,ZWE,Zimbabwe,2021-12-28,207548.0,2099.0,4940.0,32.0,0.77,,,...,3115190.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154666,ZWE,Zimbabwe,2021-12-29,207548.0,0.0,4940.0,0.0,0.76,,,...,3121776.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154667,ZWE,Zimbabwe,2021-12-30,211728.0,4180.0,4997.0,57.0,0.8,,,...,,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154668,ZWE,Zimbabwe,2021-12-31,213258.0,1530.0,5004.0,7.0,0.79,,,...,3135168.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154669,ZWE,Zimbabwe,2022-01-01,214214.0,956.0,5017.0,13.0,0.77,,,...,3140338.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154670,ZWE,Zimbabwe,2022-01-02,214214.0,0.0,5017.0,0.0,0.75,,,...,3144021.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154671,ZWE,Zimbabwe,2022-01-03,216087.0,1873.0,5047.0,30.0,0.74,,,...,3147652.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571
154672,ZWE,Zimbabwe,2022-01-04,217678.0,1591.0,5078.0,31.0,0.71,,,...,3153051.0,52.78,15092171.0,19.6,2.822,1.882,1899.775,1.6,30.7,0.571


In [52]:
# Counting the locations recorded (the list also contains aggregates such as "World" and income groups)
countries_list = covid_df2["location"].value_counts().index.to_list()
print(f"Number of locations analyzed: {len(countries_list)}")

Number of locations analyzed: 238
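
Because `location` mixes countries with aggregates (for example "World" or "High income"), a filter is needed when only countries are wanted. The sketch below assumes that aggregate rows have no `continent` value, which is consistent with `continent` having fewer non-null entries than `location` in the `info()` output above, but should be checked against the OWID codebook. The data is made up, and since `continent` was not selected into `covid_df2`, the filter would have to run on `covid_df`.

```python
import pandas as pd

# Made-up rows; the missing continent marks the assumed aggregates.
toy_df = pd.DataFrame({
    "location": ["Brazil", "World", "Peru", "High income"],
    "continent": ["South America", None, "South America", None],
})

# Assumption: aggregates are exactly the rows without a continent.
countries_only = toy_df.loc[toy_df["continent"].notna(), "location"].to_list()
print(countries_only)
```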


In [53]:
countries_list

['Peru',
 'Argentina',
 'Mexico',
 'Thailand',
 'Taiwan',
 'High income',
 'Macao',
 'Upper middle income',
 'North America',
 'World',
 'United States',
 'Asia',
 'Japan',
 'South Korea',
 'Lower middle income',
 'Vietnam',
 'Singapore',
 'Europe',
 'European Union',
 'Malaysia',
 'China',
 'Oceania',
 'Hong Kong',
 'Nepal',
 'Australia',
 'Canada',
 'Cambodia',
 'Sri Lanka',
 'Germany',
 'United Arab Emirates',
 'Fiji',
 'India',
 'Finland',
 'Russia',
 'Philippines',
 'France',
 'Sweden',
 'Italy',
 'United Kingdom',
 'Slovenia',
 'Estonia',
 'Denmark',
 'Belgium',
 'Spain',
 'Morocco',
 'South Africa',
 'International',
 'Africa',
 'Egypt',
 'Guatemala',
 'Iran',
 'Lebanon',
 'Chile',
 'Low income',
 'Israel',
 'South America',
 'Greece',
 'Kuwait',
 'Bahrain',
 'Oman',
 'Iraq',
 'Albania',
 'Algeria',
 'Austria',
 'Luxembourg',
 'Pakistan',
 'Switzerland',
 'Croatia',
 'Romania',
 'Georgia',
 'Afghanistan',
 'Latvia',
 'Serbia',
 'Norway',
 'North Macedonia',
 'Netherlands',
 'Bra

## 03.1 - Exploring the data for Brazil

In [54]:
# Querying the data for Brazil
covid_df2.loc[covid_df2["location"] == "Brazil"].head(30)

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
19832,BRA,Brazil,2020-02-26,1.0,1.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19833,BRA,Brazil,2020-02-27,1.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19834,BRA,Brazil,2020-02-28,1.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19835,BRA,Brazil,2020-02-29,2.0,1.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19836,BRA,Brazil,2020-03-01,2.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19837,BRA,Brazil,2020-03-02,2.0,0.0,,,,,,...,,11.11,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19838,BRA,Brazil,2020-03-03,2.0,0.0,,,,,,...,,11.11,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19839,BRA,Brazil,2020-03-04,4.0,2.0,,,,,,...,,11.11,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19840,BRA,Brazil,2020-03-05,4.0,0.0,,,,,,...,,11.11,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
19841,BRA,Brazil,2020-03-06,13.0,9.0,,,,,,...,,11.11,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765


In [55]:
covid_df2.loc[covid_df2["location"] == "Brazil"].tail(30)

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
20491,BRA,Brazil,2021-12-16,22209368.0,3661.0,617675.0,123.0,0.76,,,...,141175069.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20492,BRA,Brazil,2021-12-17,22213731.0,4363.0,617873.0,198.0,0.78,,,...,141411665.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20493,BRA,Brazil,2021-12-18,22216112.0,2381.0,618011.0,138.0,0.8,,,...,141474540.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20494,BRA,Brazil,2021-12-19,22217570.0,1458.0,618066.0,55.0,0.82,,,...,141488098.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20495,BRA,Brazil,2021-12-20,22220209.0,2639.0,618149.0,83.0,0.84,,,...,141702263.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20496,BRA,Brazil,2021-12-21,22223943.0,3734.0,618220.0,71.0,0.87,,,...,141915573.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20497,BRA,Brazil,2021-12-22,22227169.0,3226.0,618366.0,146.0,0.91,,,...,142378728.0,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20498,BRA,Brazil,2021-12-23,22230943.0,3774.0,618492.0,126.0,0.96,,,...,,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20499,BRA,Brazil,2021-12-24,22234623.0,3680.0,618655.0,163.0,1.02,,,...,,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
20500,BRA,Brazil,2021-12-25,22238297.0,3674.0,618686.0,31.0,1.08,,,...,,,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765


In [89]:
covid_BRA_df = covid_df2.loc[covid_df2["location"] == "Brazil"].reset_index(drop=True)

In [90]:
covid_BRA_df.head()

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
0,BRA,Brazil,2020-02-26,1.0,1.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
1,BRA,Brazil,2020-02-27,1.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
2,BRA,Brazil,2020-02-28,1.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
3,BRA,Brazil,2020-02-29,2.0,1.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765
4,BRA,Brazil,2020-03-01,2.0,0.0,,,,,,...,,5.56,213993441.0,33.5,8.552,5.06,14103.452,10.1,17.9,0.765


In [91]:
covid_BRA_df.shape

(688, 23)

In [92]:
covid_BRA_df.loc[covid_BRA_df["new_cases"] < 0].index

Int64Index([], dtype='int64')

In [21]:
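import plotly
import plotly.graph_objs as go
import plotly.offline as py
import chart_studio.plotly as pyo  # used by the pyo.plot(...) calls below; credentials are set in cell In [148]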

In [22]:
plotly.offline.init_notebook_mode(connected=True)

In [150]:
# Cases are divided by 10 so that both curves fit on the same axis
trace1 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["total_cases"] / 10,
                    name="Total COVID-19 cases (divided by 10)",
                    line=dict(color="#4169E1"))

trace2 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["total_deaths"],
                    name="Total COVID-19 deaths",
                    line=dict(color="#FF0000"))

data = [trace1, trace2]

layout = go.Layout(title="COVID-19 in Brazil - Cumulative totals over the period",
                   xaxis={"title" : "Date"})

fig = go.Figure(data=data, layout=layout)

# To upload the figure, we switched py -> pyo and iplot -> plot
pyo.plot(fig, filename="Evolução da COVID-19 no Brasil - Evolução total no período")

'https://plotly.com/~Leonardo.Cesar/1/'

In [151]:
# Cases are divided by 10 so that both curves fit on the same axis
trace1 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["new_cases"] / 10,
                    name="Daily new cases (divided by 10)",
                    line=dict(color="#4169E1"))

trace2 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["new_deaths"],
                    name="Daily new deaths",
                    line=dict(color="#FF0000"))

data = [trace1, trace2]

layout = go.Layout(title="COVID-19 in Brazil - Daily figures over the period",
                   xaxis={"title" : "Date"})

fig = go.Figure(data=data, layout=layout)

# To upload the figure, we switched py -> pyo and iplot -> plot
pyo.plot(fig, filename="Número de casos de COVID-19 por dia no Brasil - Evolução diária no período")

'https://plotly.com/~Leonardo.Cesar/3/'

## 03.2 - Exploring the data for the United States

### 03.2.1 - COVID-19 over time

In [95]:
covid_USA_df = covid_df2.loc[covid_df2["location"] == "United States"].reset_index(drop=True)

In [96]:
covid_USA_df.head()

Unnamed: 0,iso_code,location,date,total_cases,new_cases,total_deaths,new_deaths,reproduction_rate,icu_patients,hosp_patients,...,people_fully_vaccinated,stringency_index,population,median_age,aged_65_older,aged_70_older,gdp_per_capita,female_smokers,male_smokers,human_development_index
0,USA,United States,2020-01-22,1.0,,,,,,,...,,0.0,332915074.0,38.3,15.413,9.732,54225.446,19.1,24.6,0.926
1,USA,United States,2020-01-23,1.0,0.0,,,,,,...,,0.0,332915074.0,38.3,15.413,9.732,54225.446,19.1,24.6,0.926
2,USA,United States,2020-01-24,2.0,1.0,,,,,,...,,0.0,332915074.0,38.3,15.413,9.732,54225.446,19.1,24.6,0.926
3,USA,United States,2020-01-25,2.0,0.0,,,,,,...,,0.0,332915074.0,38.3,15.413,9.732,54225.446,19.1,24.6,0.926
4,USA,United States,2020-01-26,5.0,3.0,,,,,,...,,0.0,332915074.0,38.3,15.413,9.732,54225.446,19.1,24.6,0.926


In [97]:
covid_USA_df.shape

(724, 23)

In [152]:
# Cases are divided by 10 so that both curves fit on the same axis
trace1 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["total_cases"] / 10,
                    name="Total COVID-19 cases (divided by 10)",
                    line=dict(color="#4169E1"))

trace2 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["total_deaths"],
                    name="Total COVID-19 deaths",
                    line=dict(color="#FF0000"))

data = [trace1, trace2]

layout = go.Layout(title="COVID-19 in the United States - Cumulative totals over the period",
                   xaxis={"title" : "Date"})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename="Evolução da COVID-19 no Estados Unidos - Evolução total no período")

'https://plotly.com/~Leonardo.Cesar/5/'

In [153]:
# Cases are divided by 100 so that both curves fit on the same axis
trace1 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["new_cases"] / 100,
                    name="Daily new cases (divided by 100)",
                    line=dict(color="#4169E1"))

trace2 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["new_deaths"],
                    name="Daily new deaths",
                    line=dict(color="#FF0000"))

data = [trace1, trace2]

layout = go.Layout(title="COVID-19 in the United States - Daily figures over the period",
                   xaxis={"title" : "Date"})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename="Número de casos de COVID-19 por dia no Estados Unidos - Evolução diária no período")

'https://plotly.com/~Leonardo.Cesar/7/'

## 03.3 - Exploring the data for Brazil and the United States together

In [154]:
# Cumulative COVID-19 cases in Brazil
trace1 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["total_cases"],
                    name="Total COVID-19 cases in Brazil",
                    line=dict(color="#008000"))

# Cumulative COVID-19 cases in the United States
trace2 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["total_cases"],
                    name="Total COVID-19 cases in the United States",
                    line=dict(color="#4682B4"))

data = [trace1, trace2]

layout = go.Layout(title="COVID-19 in Brazil and the United States - Cumulative totals over the period",
                   xaxis={"title" : "Date"},
                   yaxis={"title" : "Total number of cases"})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename="Evolução da COVID-19 no Brasil e Estados Unidos - Evolução total no período")

'https://plotly.com/~Leonardo.Cesar/9/'

In [155]:
# Daily new COVID-19 cases in Brazil
trace1 = go.Scatter(x=covid_BRA_df["date"],
                    y=covid_BRA_df["new_cases"],
                    name="Daily new COVID-19 cases in Brazil",
                    line=dict(color="#008000"))

# Daily new COVID-19 cases in the United States
trace2 = go.Scatter(x=covid_USA_df["date"],
                    y=covid_USA_df["new_cases"],
                    name="Daily new COVID-19 cases in the United States",
                    line=dict(color="#4682B4"))

data = [trace1, trace2]

# Range-selector buttons and a range slider let the reader zoom into a period
layout = go.Layout(title="COVID-19 in Brazil and the United States - Number of cases per day",
                   yaxis={"title" : "Number of cases"},
                   xaxis=dict(title="Date",
                              rangeselector=dict(
                                  buttons=[
                                      dict(count=1, label="1 month", step="month", stepmode="backward"),
                                      dict(count=6, label="6 months", step="month", stepmode="backward"),
                                      dict(count=1, label="1 year", step="year", stepmode="backward"),
                                      dict(step="all"),
                                  ]),
                              rangeslider=dict(visible=True),
                              type="date"))

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename="Evolução da COVID-19 no Brasil e Estados Unidos - Evolução diária no período")

'https://plotly.com/~Leonardo.Cesar/11/'
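
The comparisons above use absolute counts, so the larger population of the United States inflates its curve. A fairer comparison divides by `population`. The sketch below uses made-up numbers and hypothetical locations "A" and "B"; the full dataset already provides a `total_cases_per_million` column, which was not selected into `covid_df2`.

```python
import pandas as pd

# Made-up numbers; the column names follow the notebook's DataFrames.
toy_df = pd.DataFrame({
    "location": ["A", "B"],
    "total_cases": [2_000_000.0, 3_000_000.0],
    "population": [200_000_000.0, 600_000_000.0],
})

# Cases per million inhabitants: B has more cases in absolute terms,
# but fewer relative to its population.
toy_df["total_cases_per_million"] = (
    toy_df["total_cases"] / toy_df["population"] * 1_000_000
)
print(toy_df[["location", "total_cases_per_million"]])
```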

### 03.3.1 - Vaccinated versus unvaccinated population in Brazil and the United States

In [111]:
# Data for Brazil
covid_BRA_df.columns

Index(['iso_code', 'location', 'date', 'total_cases', 'new_cases',
       'total_deaths', 'new_deaths', 'reproduction_rate', 'icu_patients',
       'hosp_patients', 'positive_rate', 'total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated', 'stringency_index',
       'population', 'median_age', 'aged_65_older', 'aged_70_older',
       'gdp_per_capita', 'female_smokers', 'male_smokers',
       'human_development_index'],
      dtype='object')

In [116]:
vaccinations_BRA = covid_BRA_df[["date", "total_vaccinations", "people_vaccinated"]].copy()

In [117]:
vaccinations_BRA.head()

Unnamed: 0,date,total_vaccinations,people_vaccinated
0,2020-02-26,,
1,2020-02-27,,
2,2020-02-28,,
3,2020-02-29,,
4,2020-03-01,,


In [129]:
# True for every row that has a vaccination record
vaccination_list_BRA = vaccinations_BRA["people_vaccinated"].notna().to_list()

# List of row positions that have a vaccination record
index_of_vaccination_BRA = [index for index, vac in enumerate(vaccination_list_BRA) if vac]

In [134]:
# Number of days with a record
len(index_of_vaccination_BRA)

353

In [137]:
vac_BRA = vaccinations_BRA.iloc[index_of_vaccination_BRA].copy()
vac_BRA

Unnamed: 0,date,total_vaccinations,people_vaccinated
325,2021-01-17,112.0,112.0
326,2021-01-18,1109.0,1109.0
327,2021-01-19,13527.0,13527.0
328,2021-01-20,105824.0,105824.0
329,2021-01-21,193699.0,193699.0
...,...,...,...
683,2022-01-10,336825920.0,166713275.0
684,2022-01-11,337655171.0,166769657.0
685,2022-01-12,338386118.0,166772188.0
686,2022-01-13,339257036.0,166833338.0
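
The mask, index list, and `iloc` steps above can be collapsed into a single `dropna` call. The sketch below shows this on a small illustrative frame; `toy_vaccinations` and `toy_vac` are hypothetical names, and the approach is an alternative rather than the notebook's method.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first two days have no vaccination record.
toy_vaccinations = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-15", "2021-01-16", "2021-01-17", "2021-01-18"]),
    "people_vaccinated": [np.nan, np.nan, 112.0, 1109.0],
})

# Keep only the rows that have a value in people_vaccinated.
toy_vac = toy_vaccinations.dropna(subset=["people_vaccinated"])
print(toy_vac)
```

As with the `iloc` version, the original row labels are preserved in the result.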


In [131]:
# Data for the United States
vaccinations_USA = covid_USA_df[["date", "total_vaccinations", "people_vaccinated"]].copy()

In [132]:
# True for every row that has a vaccination record
vaccination_list_USA = vaccinations_USA["people_vaccinated"].notna().to_list()

# List of row positions that have a vaccination record
index_of_vaccination_USA = [index for index, vac in enumerate(vaccination_list_USA) if vac]

In [135]:
# Number of days with a record
len(index_of_vaccination_USA)

398

In [138]:
vac_USA = vaccinations_USA.iloc[index_of_vaccination_USA].copy()
vac_USA

Unnamed: 0,date,total_vaccinations,people_vaccinated
326,2020-12-13,25556.0,21549.0
327,2020-12-14,29989.0,25842.0
328,2020-12-15,79166.0,72791.0
329,2020-12-16,238663.0,226807.0
330,2020-12-17,510610.0,491606.0
...,...,...,...
719,2022-01-10,523624949.0,247798779.0
720,2022-01-11,524740160.0,248101414.0
721,2022-01-12,525499067.0,248306493.0
722,2022-01-13,525604448.0,248338447.0


In [144]:
# Total population of Brazil and of the United States (taken from the last row of each DataFrame)
population_BRA = covid_BRA_df["population"].iloc[-1]
population_USA = covid_USA_df["population"].iloc[-1]

In [156]:
# COVID-19 vaccination over time in Brazil
trace1 = go.Scatter(x=vac_BRA["date"],
                    y=vac_BRA["people_vaccinated"],
                    name="People vaccinated against COVID-19 in Brazil",
                    line=dict(color="#008000"))

# COVID-19 vaccination over time in the United States
trace2 = go.Scatter(x=vac_USA["date"],
                    y=vac_USA["people_vaccinated"],
                    name="People vaccinated against COVID-19 in the United States",
                    line=dict(color="#4682B4"))

data = [trace1, trace2]

# Range-selector buttons and a range slider let the reader zoom into a period
layout = go.Layout(title="Vaccination progress in Brazil and the United States",
                   yaxis={"title" : "Number of people vaccinated"},
                   xaxis=dict(title="Date",
                              rangeselector=dict(
                                  buttons=[
                                      dict(count=1, label="1 month", step="month", stepmode="backward"),
                                      dict(count=6, label="6 months", step="month", stepmode="backward"),
                                      dict(count=1, label="1 year", step="year", stepmode="backward"),
                                      dict(step="all"),
                                  ]),
                              rangeslider=dict(visible=True),
                              type="date"))

fig = go.Figure(data=data, layout=layout)

# Horizontal line at the total population of Brazil
fig.add_hline(y=population_BRA, line_color="#008000",
              line_dash="dot",
              annotation_text="Total population of Brazil")

# Horizontal line at the total population of the United States
fig.add_hline(y=population_USA, line_color="#4682B4",
              line_dash="dot",
              annotation_text="Total population of the USA")

pyo.plot(fig, filename="Evolução da vacinação no Brasil e no Estados Unidos")

'https://plotly.com/~Leonardo.Cesar/13/'
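
The gap between each curve and its population line can also be expressed as a percentage. The sketch below hard-codes the population and the `people_vaccinated` values for 2022-01-13 that appear in the outputs earlier in this notebook; the lowercase variable names are hypothetical. The full dataset also carries a `people_vaccinated_per_hundred` column that should give comparable figures.

```python
# Values copied from the outputs shown earlier in this notebook.
population_bra = 213_993_441.0
population_usa = 332_915_074.0
people_vaccinated_bra = 166_833_338.0  # Brazil, 2022-01-13
people_vaccinated_usa = 248_338_447.0  # United States, 2022-01-13

# Share of the population with at least one dose, in percent.
share_bra = people_vaccinated_bra / population_bra * 100
share_usa = people_vaccinated_usa / population_usa * 100

print(f"Brazil: {share_bra:.1f}% | United States: {share_usa:.1f}%")
```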

In [148]:
import chart_studio
chart_studio.tools.set_credentials_file(username="Leonardo.Cesar", api_key="---------------------")

In [149]:
import chart_studio.plotly as pyo