<h1>Group 38 Project Proposal: Determining the Connection Between Country Wealth and Tuberculosis Mortality</h1>
    
<img src="images/TB_img.jpg" alt="Tuberculosis under EM Microscope" width = "1000"/>
    
<font size="2"> <i>Image attribution</i>: NIAID, <i>Mycobacterium tuberculosis</i> Bacteria, the Cause of TB, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0/>, via Flickr at <https://www.flickr.com/photos/niaid/51637606937/in/photostream/> </font>

<h3>Introduction</h3>
<hr>

*Total Word Count:* ---

<h3>Preliminary Results</h3>
<hr>

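```r
# Packages used throughout the analysis
library(tidyverse)   # data wrangling (dplyr) and plotting (ggplot2)
library(broom)       # tidy summaries of statistical models
library(repr)        # display options for notebook output
library(digest)
library(infer)       # tidy statistical inference
library(gridExtra)   # arranging multiple plots side by side

# Limit how many rows and columns of a data frame are printed
options(repr.matrix.max.rows = 6)
options(repr.matrix.max.cols = 30)
```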

For this investigation, we use two datasets: the **World Health Organization (WHO)** tuberculosis mortality dataset for the South-East Asia region, and the **OECD** GDP per capita (US dollars) dataset by country.

We begin with the **WHO** dataset, wrangling it into tidy format before incorporating the **OECD** GDP data.

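```r
# URL of the WHO dataset (CSV). It requests two indicators, MDG_0000000017 (deaths per
# 100 000 population) and TB_e_mort_exc_tbhiv_num (number of deaths), for the
# South-East Asia region only (REGION:SEAR)
tb_url <- "https://apps.who.int/gho/athena/data/data-verbose.csv?target=GHO/MDG_0000000017,TB_e_mort_exc_tbhiv_num&profile=verbose&filter=COUNTRY:*;REGION:SEAR;&ead="

# Read the CSV file into a data frame
tb_df <- read.csv(tb_url)

tb_df
```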

| GHO..CODE. | GHO..DISPLAY. | GHO..URL. | PUBLISHSTATE..CODE. | PUBLISHSTATE..DISPLAY. | PUBLISHSTATE..URL. | YEAR..CODE. | YEAR..DISPLAY. | YEAR..URL. | REGION..CODE. | REGION..DISPLAY. | REGION..URL. | COUNTRY..CODE. | COUNTRY..DISPLAY. | COUNTRY..URL. | Display.Value | Numeric | Low | High | StdErr | StdDev | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<int>` | `<int>` | `<lgl>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<lgl>` | `<lgl>` | `<lgl>` |
| TB_e_mort_exc_tbhiv_num | Number of deaths due to tuberculosis, excluding HIV | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1425 | PUBLISHED | Published | | 2015 | 2015 | | SEAR | South-East Asia | | BGD | Bangladesh | | 66 000 [43 000-95 000] | 66000 | 43000 | 95000 | | | |
| TB_e_mort_exc_tbhiv_num | Number of deaths due to tuberculosis, excluding HIV | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1425 | PUBLISHED | Published | | 2015 | 2015 | | SEAR | South-East Asia | | BTN | Bhutan | | 130 [86-190] | 130 | 86 | 190 | | | |
| TB_e_mort_exc_tbhiv_num | Number of deaths due to tuberculosis, excluding HIV | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1425 | PUBLISHED | Published | | 2015 | 2015 | | SEAR | South-East Asia | | IDN | Indonesia | | 97 000 [91 000-102 000] | 97000 | 91000 | 102000 | | | |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| MDG_0000000017 | Deaths due to tuberculosis among HIV-negative people (per 100 000 population) | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/17 | PUBLISHED | Published | | 2012 | 2012 | | SEAR | South-East Asia | | TLS | Timor-Leste | | 78 [50-111] | 78 | 50 | 111 | | | |
| MDG_0000000017 | Deaths due to tuberculosis among HIV-negative people (per 100 000 population) | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/17 | PUBLISHED | Published | | 2013 | 2013 | | SEAR | South-East Asia | | TLS | Timor-Leste | | 84 [53-122] | 84 | 53 | 122 | | | |
| MDG_0000000017 | Deaths due to tuberculosis among HIV-negative people (per 100 000 population) | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/17 | PUBLISHED | Published | | 2014 | 2014 | | SEAR | South-East Asia | | TLS | Timor-Leste | | 90 [53-135] | 90 | 53 | 135 | | | |
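
The raw column names contain runs of dots (for example `GHO..CODE.`) because `read.csv()` runs `make.names()` on the header by default (`check.names = TRUE`), replacing spaces and brackets with dots. The small sketch below reproduces this on an in-line CSV; the header `GHO (CODE)` is an assumption about how the WHO file labels its columns, since the original header is not shown in the output.

```r
# Hypothetical two-column CSV whose header mimics a WHO-style label "GHO (CODE)"
csv_text <- "GHO (CODE),Numeric\nTB_e_mort_exc_tbhiv_num,66000"

# Default behaviour: check.names = TRUE, so make.names() turns " ", "(" and ")" into "."
with_fix <- read.csv(text = csv_text)
names(with_fix)
# [1] "GHO..CODE." "Numeric"

# With check.names = FALSE the header is kept verbatim (and needs backticks to reference)
raw_names <- read.csv(text = csv_text, check.names = FALSE)
names(raw_names)
# [1] "GHO (CODE)" "Numeric"
```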


In its raw form, however, this data frame is unsorted and contains many metadata columns we do not need. As a first step, we keep only three columns and give them readable names: *Year*, *Country*, and *Number of Deaths due to TB*.

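```r
# read.csv() already makes the headers syntactically valid (check.names = TRUE by default),
# which is why they contain dots; calling make.names() again is a harmless safeguard
colnames(tb_df) <- make.names(colnames(tb_df))

# Keep the year, country name and numeric value, and give them readable names.
# Note: Numeric holds two different indicators (absolute deaths and deaths per
# 100 000 population), which are distinguished only by GHO..CODE.
tb_df <- tb_df %>%
    select(YEAR..CODE., COUNTRY..DISPLAY., Numeric) %>%
    rename(Year = YEAR..CODE., Country = COUNTRY..DISPLAY., Number_of_Deaths_due_to_TB = Numeric)

tb_df
```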

| Year | Country | Number_of_Deaths_due_to_TB |
|---|---|---|
| `<int>` | `<chr>` | `<dbl>` |
| 2015 | Bangladesh | 66000 |
| 2015 | Bhutan | 130 |
| 2015 | Indonesia | 97000 |
| ⋮ | ⋮ | ⋮ |
| 2012 | Timor-Leste | 78 |
| 2013 | Timor-Leste | 84 |
| 2014 | Timor-Leste | 90 |
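
The preview mixes the two indicators requested by the WHO URL: `TB_e_mort_exc_tbhiv_num` (absolute number of deaths, e.g. 66000 for Bangladesh in 2015) and `MDG_0000000017` (deaths per 100 000 population, e.g. 78 for Timor-Leste in 2012), so the *Number_of_Deaths_due_to_TB* column currently holds values on two different scales. One way to keep them apart is to filter on the indicator code before selecting columns. This is a minimal sketch, assuming the raw column names shown in the first preview; the small `raw` data frame is a hand-copied excerpt of those preview rows, not the full dataset, and `tb_counts`, `tb_rates` and `Deaths_per_100k` are hypothetical names.

```r
library(dplyr)

# Hand-copied excerpt of the raw WHO preview rows (illustrative only)
raw <- data.frame(
  GHO..CODE. = c("TB_e_mort_exc_tbhiv_num", "TB_e_mort_exc_tbhiv_num", "MDG_0000000017"),
  YEAR..CODE. = c(2015L, 2015L, 2012L),
  COUNTRY..DISPLAY. = c("Bangladesh", "Bhutan", "Timor-Leste"),
  Numeric = c(66000, 130, 78)
)

# Keep only the absolute death counts, then select and rename in one step
tb_counts <- raw %>%
  filter(GHO..CODE. == "TB_e_mort_exc_tbhiv_num") %>%
  select(Year = YEAR..CODE., Country = COUNTRY..DISPLAY.,
         Number_of_Deaths_due_to_TB = Numeric)

# The per-100 000 rate can be kept in its own data frame the same way
tb_rates <- raw %>%
  filter(GHO..CODE. == "MDG_0000000017") %>%
  select(Year = YEAR..CODE., Country = COUNTRY..DISPLAY.,
         Deaths_per_100k = Numeric)

tb_counts
tb_rates
```

Which indicator to carry forward is a methods decision: the per-100 000 rate is already adjusted for population size, while absolute counts grow with a country's population.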


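Next, we read in the **OECD** GDP data.

```r
# URL of the OECD dataset (CSV): annual GDP per capita in US dollars, 2000-2021
gdp_url <- "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.GDP.TOT.USD_CAP.A/OECD?contentType=csv&detail=code&separator=comma&csv-lang=en&startPeriod=2000&endPeriod=2021"

# Read the CSV file into a data frame
gdp_df <- read.csv(gdp_url)

gdp_df
```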

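| LOCATION | INDICATOR | SUBJECT | MEASURE | FREQUENCY | TIME | Value | Flag.Codes |
|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<dbl>` | `<chr>` |
| AUS | GDP | TOT | USD_CAP | A | 2000 | 28312.87 | |
| AUS | GDP | TOT | USD_CAP | A | 2001 | 29546.38 | |
| AUS | GDP | TOT | USD_CAP | A | 2002 | 30807.51 | |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| SEN | GDP | TOT | USD_CAP | A | 2018 | 3416.171 | |
| SEN | GDP | TOT | USD_CAP | A | 2019 | 3530.096 | |
| SEN | GDP | TOT | USD_CAP | A | 2020 | 3513.150 | |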
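
To incorporate the OECD data, the two data frames need a shared key. The OECD data identifies countries by three-letter codes in `LOCATION` (e.g. `AUS`, `SEN`), whereas the tidy WHO data frame keeps only country names; the raw WHO data does include three-letter codes in `COUNTRY..CODE.` (e.g. `BGD`). Below is a minimal sketch of one way to join them, assuming both sources use the same ISO 3166-1 alpha-3 codes and that the WHO wrangling step also keeps `COUNTRY..CODE.`. The names `Country_Code` and `GDP_per_capita_USD` are hypothetical, and the inputs are hand-copied excerpts of the two previews. Because those excerpt rows share no country, the joined GDP column comes out `NA` here.

```r
library(dplyr)

# Hand-copied excerpts of the WHO and OECD previews (illustrative, not the full data)
who_raw <- data.frame(
  YEAR..CODE. = c(2015L, 2015L),
  COUNTRY..CODE. = c("BGD", "BTN"),
  COUNTRY..DISPLAY. = c("Bangladesh", "Bhutan"),
  Numeric = c(66000, 130)
)
gdp_raw <- data.frame(
  LOCATION = c("AUS", "AUS", "SEN"),
  TIME = c(2000L, 2001L, 2018L),
  Value = c(28312.87, 29546.38, 3416.171)
)

# Tidy WHO data, keeping the three-letter country code as a join key
tb_tidy <- who_raw %>%
  select(Year = YEAR..CODE., Country_Code = COUNTRY..CODE.,
         Country = COUNTRY..DISPLAY., Number_of_Deaths_due_to_TB = Numeric)

# Tidy OECD data: one row per country-year with GDP per capita (US dollars)
gdp_tidy <- gdp_raw %>%
  select(Country_Code = LOCATION, Year = TIME, GDP_per_capita_USD = Value)

# left_join keeps every WHO row; GDP is NA where no OECD country-year matches
tb_gdp <- left_join(tb_tidy, gdp_tidy, by = c("Country_Code", "Year"))

# anti_join lists the WHO country-years that found no GDP match
unmatched <- anti_join(tb_tidy, gdp_tidy, by = c("Country_Code", "Year"))

tb_gdp
unmatched
```

`left_join()` makes missing GDP values visible instead of silently dropping rows, and `anti_join()` shows exactly which country-years fail to match. That check matters on the full data, since the WHO query covers only the South-East Asia region while the OECD data covers a different set of countries.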


<h3>Methods: Plan</h3>
<hr>

<h3>References</h3>
<hr>