In [1]:
# Packages
library(cowplot)
library(digest)
library(infer)
library(repr)
library(tidyverse)
library(dplyr)

-- Attaching packages ----------------------------------- tidyverse 1.3.1 --

v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.6     v dplyr   1.0.7
v tidyr   1.1.4     v stringr 1.4.0
v readr   2.1.1     v forcats 0.5.1

-- Conflicts -------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()


# What is the difference in life expectancy between developed and developing countries?

# Introduction

A country's developed or developing status is based mainly on its GDP and living standards. As an example of how a country's status may affect its life expectancy, consider Ethiopia (a developing country) and the United States (a developed country). Comparing living standards in Ethiopia and the US, Freeman et al. (2020) find that the latter has a longer life expectancy owing to factors such as community-based health strategies, improved access to safe water, female education, and gender empowerment. However, another view holds that, with improvements in modern technology, the impact of such factors on mortality has been overstated for earlier periods (Preston, 1975). This raises the question: is there a difference in life expectancy between developed and developing countries, and if so, is it significant?

The random variable is life expectancy, the location parameter is the mean, and the scale parameter is the standard deviation. Our population is all countries. Our dataset, provided by the WHO, contains annual observations from 2000 to 2015 for 193 countries. Each observation records the country and year, as well as socioeconomic factors (government spending, GDP, schooling, alcohol use) and health factors (BMI, immunizations, diseases).


# Preliminary Results

In [None]:
life_expectancy_df <- read.csv(file = 'https://raw.githubusercontent.com/eahn01/stat201-group25/main/data/life-expectancy.csv')
head(life_expectancy_df)

First, we split the data frame into two: one containing developing countries and the other containing developed countries. We kept only the columns we need (country, year, status, and life expectancy) and filtered out rows where life expectancy is N/A.

In [None]:
# Developing countries with a recorded life expectancy
developing_le <- life_expectancy_df |>
    filter(!is.na(Life.expectancy)) |>
    filter(Status == "Developing") |>
    select(Country, Year, Status, Life.expectancy)

# Developed countries with a recorded life expectancy
developed_le <- life_expectancy_df |>
    filter(!is.na(Life.expectancy)) |>
    filter(Status == "Developed") |>
    select(Country, Year, Status, Life.expectancy)

head(developing_le)
head(developed_le)

In [None]:
# Mean life expectancy over all country-year observations in each group
developing_mean <- developing_le |>
    summarize(mean_le = mean(Life.expectancy)) |>
    pull(mean_le)

developed_mean <- developed_le |>
    summarize(mean_le = mean(Life.expectancy)) |>
    pull(mean_le)

developing_mean
developed_mean
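The introduction names the standard deviation as the scale parameter, but only the means are computed above. A minimal sketch of a per-group summary (count, mean, standard deviation) and the observed difference in means follows. It is written in base R so it runs without extra packages, and `toy_le` is a hypothetical stand-in with made-up values, not the WHO data.

```r
# Hypothetical toy data standing in for life_expectancy_df (values are made up).
toy_le <- data.frame(
  Status = c("Developed", "Developed", "Developed",
             "Developing", "Developing", "Developing"),
  Life.expectancy = c(80, 82, NA, 60, 65, 70)
)

# Drop rows with a missing life expectancy, as in the cleaning step above.
complete <- toy_le[!is.na(toy_le$Life.expectancy), ]

# Per-group count, mean (location) and standard deviation (scale).
group_n    <- tapply(complete$Life.expectancy, complete$Status, length)
group_mean <- tapply(complete$Life.expectancy, complete$Status, mean)
group_sd   <- tapply(complete$Life.expectancy, complete$Status, sd)

status_summary <- data.frame(
  Status  = names(group_mean),
  n       = as.vector(group_n),
  mean_le = as.vector(group_mean),
  sd_le   = as.vector(group_sd)
)

# Observed difference in means: developed minus developing.
obs_diff <- unname(group_mean["Developed"] - group_mean["Developing"])

status_summary
obs_diff
```

On the real data, the same steps would be applied to `life_expectancy_df` in place of `toy_le`.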

In [None]:

# Histogram of each developing country's mean life expectancy;
# the red line marks the overall developing-country mean
developing_sampling_dist <- developing_le %>%
    group_by(Country) %>%
    summarize(mean_le = mean(Life.expectancy)) %>%
    ggplot(aes(x = mean_le)) +
    geom_histogram(binwidth = 3, colour = "white") +
    xlab("Life Expectancy in years") +
    ggtitle("Sample Mean of Developing Countries' Life Expectancy") +
    geom_vline(xintercept = developing_mean, colour = "red", size = 1)

# Same plot for developed countries, with a narrower bin width
developed_sampling_dist <- developed_le %>%
    group_by(Country) %>%
    summarize(mean_le = mean(Life.expectancy)) %>%
    ggplot(aes(x = mean_le)) +
    geom_histogram(binwidth = 1, colour = "white") +
    xlab("Life Expectancy in years") +
    ggtitle("Sample Mean of Developed Countries' Life Expectancy") +
    geom_vline(xintercept = developed_mean, colour = "red", size = 1)

developing_sampling_dist
developed_sampling_dist

# Methods: Plan

We expect to find that life expectancy in developed countries is greater than in developing countries, which we will assess by applying a hypothesis test and bootstrapping. Our findings would indicate a level of correlation between the condition of the national economy and lifespan. A later study could look into which differences between developing and developed countries lead to the difference in life expectancy, and how these factors affect it.
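As a rough illustration of the planned analysis, the sketch below runs a one-sided permutation test of the difference in means and then builds a percentile bootstrap confidence interval for that difference. It is only a sketch under stated assumptions: the two vectors are simulated stand-ins for per-country mean life expectancies (not the WHO data), the group sizes, seed, and number of replicates are arbitrary, and base R is used so that each resampling step is visible.

```r
set.seed(2022)  # arbitrary seed so the sketch is reproducible

# Hypothetical per-country mean life expectancies (simulated, not the WHO data).
developed  <- rnorm(30,  mean = 79, sd = 4)
developing <- rnorm(150, mean = 67, sd = 9)

# Observed test statistic: developed mean minus developing mean.
obs_diff <- mean(developed) - mean(developing)

# Permutation test.
# H0: mu_developed - mu_developing = 0
# HA: mu_developed - mu_developing > 0
pooled <- c(developed, developing)
n_dev  <- length(developed)
reps   <- 2000

null_diffs <- replicate(reps, {
  shuffled <- sample(pooled)  # shuffle the group labels under H0
  mean(shuffled[seq_len(n_dev)]) - mean(shuffled[-seq_len(n_dev)])
})

# One-sided p-value: share of shuffled differences at least as large as observed.
p_value <- mean(null_diffs >= obs_diff)

# Bootstrap: resample each group with replacement and recompute the difference.
boot_diffs <- replicate(reps, {
  mean(sample(developed, replace = TRUE)) -
    mean(sample(developing, replace = TRUE))
})

# 95% percentile confidence interval for the difference in means.
ci_95 <- quantile(boot_diffs, probs = c(0.025, 0.975))

obs_diff
p_value
ci_95
```

Working with one mean per country, rather than every country-year row, is a choice made here because yearly observations from the same country are not independent of each other.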

# References

- Freeman, T., Gesesew, H. A., Bambra, C., Giugliani, E. R. J., Popay, J., Sanders, D., ... & Baum, F. (2020). Why do some countries do better or worse in life expectancy relative to income? An analysis of Brazil, Ethiopia, and the United States of America. International Journal for Equity in Health, 19(1), 1-19.

- Preston, S. H. (1975). The changing relation between mortality and level of economic development. Population Studies, 29(2), 231-248.
