# Case Study on Countries Based on Their HDI (Human Development Index)

HDI is an index that summarises the quality of life of the people of a country along three dimensions:
1. Health, measured by life expectancy at birth
2. Knowledge, measured by mean years of schooling (of adults) and expected years of schooling (of children entering school)
3. Standard of living, measured by gross national income (GNI) per capita
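The three dimensions are combined into a single number. The sketch below recomputes the index from the dataset's own columns. It assumes the UNDP method used since 2010, which takes the geometric mean of a health index, an education index and an income index, and it assumes the goalposts written in the comments. `hdi_from_components` is a hypothetical helper name, and the goalposts should be checked against the UNDP technical notes for the report year.

```r
# Sketch of the HDI formula, ASSUMING the post-2010 UNDP method and goalposts:
# life expectancy 20-85 years, expected schooling 0-18, mean schooling 0-15,
# GNI per capita 100-75,000 (PPP $, log scale).
hdi_from_components <- function(life_exp, expected_school, mean_school, gni_pc) {
  # Clamp x into [lo, hi] so every dimension index stays within [0, 1]
  clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)

  # Health dimension index
  health <- (clamp(life_exp, 20, 85) - 20) / (85 - 20)

  # Education dimension index: average of the two schooling indices
  eys <- clamp(expected_school, 0, 18) / 18
  mys <- clamp(mean_school, 0, 15) / 15
  education <- (eys + mys) / 2

  # Income dimension index on a log scale
  gni <- clamp(gni_pc, 100, 75000)
  income <- (log(gni) - log(100)) / (log(75000) - log(100))

  # HDI is the geometric mean of the three dimension indices
  (health * education * income)^(1 / 3)
}

# Afghanistan's 2022 values from the first row of the dataset
print(round(hdi_from_components(62.9, 10.7, 2.5, 1335), 3))
```

For the Afghanistan row (life expectancy 62.9, expected schooling 10.7, mean schooling 2.5, GNI per capita 1335), this sketch gives roughly 0.46, close to the dataset's HDI_2022 value of 0.462. The function is vectorised, so it can also be applied to whole columns of `data`.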

Countries are compared on the following parameters:
1. Continent on which the country is located
2. Country's HDI value in 2022 (latest year's data)
3. Country's HDI value in 2021
4. Change in country's HDI value in 2022 w.r.t. year 2021
5. Mean years of schooling
6. Gross national income (GNI) per capita

The dataset also includes the following parameters for reference:
1. Life expectancy at birth
2. Expected years of schooling


# Caution

Please change the Colab runtime type to **R** before running the code in this report; otherwise you may get syntax errors when running it.



In [None]:
filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

head(data, 5)
# just to verify whether the program is able to fetch the file correctly or not

|   | X | Country | HDI_Tier | HDI_2022 | HDI_2021 | HDI_1990 | Continent | HDI_Rank | Life_Expectancy_At_Birth_Years_2022 | Expected_Years_Of_Schooling_Years_2022 | Mean_Years_Of_Schooling_Years_2022 | Gross_National_Income_Per_Capital_2022 | Change_HDI_2022_21 | Result_HDI_2022_21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<chr>` |
| 1 | 1 | Afghanistan | Low | 0.462 | 0.473 | 0.284 | Asia | 182 | 62.9 | 10.7 | 2.5 | 1335 | -0.011 | Negative / Deterioration |
| 2 | 2 | Albania | High | 0.789 | 0.785 | 0.649 | Europe | 74 | 76.8 | 14.5 | 10.1 | 15293 | 0.004 | Positive / Improvement |
| 3 | 3 | Algeria | High | 0.745 | 0.74 | 0.593 | Africa | 93 | 77.1 | 15.5 | 7.0 | 10978 | 0.005 | Positive / Improvement |
| 4 | 4 | Argentina | Very High | 0.849 | 0.844 | 0.724 | South America | 48 | 76.1 | 19.0 | 11.1 | 22048 | 0.005 | Positive / Improvement |
| 5 | 5 | Armenia | High | 0.786 | 0.774 | 0.658 | Asia | 76 | 73.4 | 14.4 | 11.3 | 15388 | 0.012 | Positive / Improvement |


In [None]:
# Country with the highest HDI on each continent

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

fac_cont <- factor(data$Continent)
continent_list <- c(levels(fac_cont))

# Initialize an empty list
temp <- list()

for(i in 1: length(continent_list)){
  filtered_data <- subset(data, Continent == continent_list[i], select = c(Country, Continent, HDI_2022))
  # Append the result to the list
  temp[[i]] <- subset(filtered_data, HDI_2022 == max(HDI_2022))
}

# Print the stored result for each continent
for(i in 1: length(continent_list)){
  cat("Continent:", continent_list[i], "\n")
  cat("Country with highest HDI:", temp[[i]]$Country, "\n")
  cat("HDI Value:", temp[[i]]$HDI_2022, "\n\n")
}

# temp is a list rather than an atomic vector because each iteration stores a data frame
# (the subset for one continent). Atomic vectors can only hold elements of a single basic
# type, whereas a list can hold arbitrary objects, including data frames.


Continent: Africa 
Country with highest HDI: Mauritius 
HDI Value: 0.796 

Continent: Asia 
Country with highest HDI: Singapore 
HDI Value: 0.949 

Continent: Europe 
Country with highest HDI: Switzerland 
HDI Value: 0.967 

Continent: North America 
Country with highest HDI: Canada 
HDI Value: 0.935 

Continent: Oceania 
Country with highest HDI: Australia 
HDI Value: 0.946 

Continent: South America 
Country with highest HDI: Chile 
HDI Value: 0.86 

Continent: Transcontinental 
Country with highest HDI: Cyprus 
HDI Value: 0.907 
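The loop above stores one data frame per continent in a list and then prints each one. The same result can be produced more compactly: `split()` divides the rows by continent, `lapply()` keeps the maximum-HDI rows of each group, and `do.call(rbind, ...)` stacks them into a single data frame. The sketch runs on a small hypothetical data frame (made-up names and values) so that it does not depend on the download. On the real data you would pass `subset(data, select = c(Country, Continent, HDI_2022))` instead.

```r
# Small hypothetical data frame with the same column names as the dataset
df <- data.frame(
  Country   = c("A1", "A2", "B1", "B2", "B3"),
  Continent = c("Alpha", "Alpha", "Beta", "Beta", "Beta"),
  HDI_2022  = c(0.70, 0.80, 0.60, 0.90, 0.90)
)

# split() yields one data frame per continent; for each, keep every row tied
# for the maximum (the same rule as subset(..., HDI_2022 == max(HDI_2022)))
per_continent <- lapply(split(df, df$Continent), function(g) {
  g[g$HDI_2022 == max(g$HDI_2022), ]
})

# Stack the per-continent results back into a single data frame
top_by_continent <- do.call(rbind, per_continent)
print(top_by_continent)
```

The equality test keeps ties, so both B2 and B3 are returned for "Beta". Using `which.max()` instead would keep only the first of the tied rows.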



In [None]:
# Countries with the largest change in HDI in each direction, i.e. the country that saw the
# greatest improvement and the country that saw the greatest deterioration of HDI in 2022 w.r.t. 2021.

# Steps
# 1. Filter the data into countries whose HDI improved and countries whose HDI deteriorated.
# 2. Apply subset() to each group separately to pick the extreme value.

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)
# print(head(data))

data_improvement <- subset(data, Result_HDI_2022_21 == "Positive / Improvement", select = c(Country, Change_HDI_2022_21, Result_HDI_2022_21, HDI_2022, HDI_2021))
temp_imp <- subset(data_improvement, Change_HDI_2022_21 == max(Change_HDI_2022_21))

data_deterioration <- subset(data, Result_HDI_2022_21 == "Negative / Deterioration", select = c(Country, Change_HDI_2022_21, Result_HDI_2022_21, HDI_2022, HDI_2021))
temp_det <- subset(data_deterioration, Change_HDI_2022_21 == min(Change_HDI_2022_21))
# min() is used instead of max() because a deterioration is recorded as a negative change,
# so the most negative value corresponds to the largest change in the negative direction.

print("Highest Improvement of HDI in 2022 w.r.t. 2021.")
temp_imp

print('______________________________________________________________')
print("Highest Deterioration of HDI in 2022 w.r.t. 2021.")
temp_det



[1] "Highest Improvement of HDI in 2022 w.r.t. 2021."


|   | Country | Change_HDI_2022_21 | Result_HDI_2022_21 | HDI_2022 | HDI_2021 |
|---|---|---|---|---|---|
|   | `<chr>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` |
| 15 | Botswana | 0.028 | Positive / Improvement | 0.708 | 0.68 |


[1] "______________________________________________________________"
[1] "Highest Deterioration of HDI in 2022 w.r.t. 2021."


|   | Country | Change_HDI_2022_21 | Result_HDI_2022_21 | HDI_2022 | HDI_2021 |
|---|---|---|---|---|---|
|   | `<chr>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` |
| 117 | Ukraine | -0.021 | Negative / Deterioration | 0.734 | 0.755 |
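The sign of `Change_HDI_2022_21` already encodes the direction of change. Sorting that column with `order()` therefore gives the top *n* countries in each direction in one step, and `n = 1` reproduces the single-country answers above. The sketch uses a few (country, change) pairs that appear in the outputs of this report. `n` is an illustrative parameter.

```r
# A few (Country, change) pairs taken from the outputs in this report
df <- data.frame(
  Country = c("Botswana", "Ukraine", "Algeria", "Namibia", "Armenia"),
  Change_HDI_2022_21 = c(0.028, -0.021, 0.005, -0.006, 0.012)
)

n <- 2  # how many countries to report in each direction

# Sort by change, largest increase first, and keep the top n
top_improvers <- head(df[order(-df$Change_HDI_2022_21), ], n)

# Sort by change, most negative first, and keep the top n
top_decliners <- head(df[order(df$Change_HDI_2022_21), ], n)

print(top_improvers)
print(top_decliners)
```

If `n` exceeds the number of countries that moved in one direction, the sorted list will run into countries from the other direction. Filtering on the sign first, as the cell above does with `Result_HDI_2022_21`, avoids that.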


In [None]:
# Countries from Africa and Asia whose HDI improved in 2022 w.r.t. 2021.

# Steps
# Either shortlist all countries whose HDI has improved and then filter that list by continent,
# or shortlist all the countries from Asia & Africa and then keep the ones with a positive result.

# method 1:

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

data_improvement <- subset(data, Result_HDI_2022_21 == "Positive / Improvement", select = c(Country, Continent, Change_HDI_2022_21, Result_HDI_2022_21, HDI_2022, HDI_2021))
temp_imp <- subset(data_improvement, Continent %in% c("Asia", "Africa"))
temp_imp

# temp_imp <- subset(data_improvement, !Continent %in% c("Asia", "Africa"))
# fetches the improved countries from all continents except Asia & Africa


|   | Country | Continent | Change_HDI_2022_21 | Result_HDI_2022_21 | HDI_2022 | HDI_2021 |
|---|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` |
| 3 | Algeria | Africa | 0.005 | Positive / Improvement | 0.745 | 0.74 |
| 5 | Armenia | Asia | 0.012 | Positive / Improvement | 0.786 | 0.774 |
| 9 | Bahrain | Asia | 0.004 | Positive / Improvement | 0.888 | 0.884 |
| 10 | Bangladesh | Asia | 0.008 | Positive / Improvement | 0.67 | 0.662 |
| 14 | Benin | Africa | 0.002 | Positive / Improvement | 0.504 | 0.502 |
| 15 | Botswana | Africa | 0.028 | Positive / Improvement | 0.708 | 0.68 |
| 18 | Burundi | Africa | 0.001 | Positive / Improvement | 0.42 | 0.419 |
| 19 | Cambodia | Asia | 0.004 | Positive / Improvement | 0.6 | 0.596 |
| 20 | Cameroon | Africa | 0.006 | Positive / Improvement | 0.587 | 0.581 |
| 24 | China | Asia | 0.003 | Positive / Improvement | 0.788 | 0.785 |


In [None]:
# Country with highest HDI improvement in Asia in 2022 w.r.t. 2021.

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)
# print(head(data))

filtered_data <- subset(data, Continent == "Asia" & Result_HDI_2022_21 == "Positive / Improvement")
# print(filtered_data)

final_data <- subset(filtered_data, Change_HDI_2022_21 == max(Change_HDI_2022_21), select = c(Country, Continent, Change_HDI_2022_21, HDI_2022, HDI_2021))
final_data


|   | Country | Continent | Change_HDI_2022_21 | HDI_2022 | HDI_2021 |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 91 | Philippines | Asia | 0.018 | 0.71 | 0.692 |


In [None]:
# Country that has witnessed highest / maximum deterioration of HDI in 2022 w.r.t. 2021 on the
# continent of Africa.

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, Continent == 'Africa' & Result_HDI_2022_21 == 'Negative / Deterioration')
# print(filtered_data)

final_data <- subset(filtered_data, Change_HDI_2022_21 == min(Change_HDI_2022_21), select = c(Country, Continent, Change_HDI_2022_21, HDI_2022, HDI_2021))
final_data
# min() has been used and not max() because:
# if a country's HDI has deteriorated, Change_HDI_2022_21 is negative. To find the country with the
# largest deterioration, we need the minimum value: the more negative the change, the greater its
# magnitude.


|   | Country | Continent | Change_HDI_2022_21 | HDI_2022 | HDI_2021 |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 79 | Namibia | Africa | -0.006 | 0.61 | 0.616 |


In [None]:
# Find the countries whose HDI values remained unchanged in 2022 w.r.t. 2021.

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, Result_HDI_2022_21 == 'No change / Constant', select = c(Country, Continent, HDI_2022, HDI_2021, Result_HDI_2022_21))
filtered_data


|   | Country | Continent | HDI_2022 | HDI_2021 | Result_HDI_2022_21 |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<chr>` |
| 22 | Central African Republic | Africa | 0.387 | 0.387 | No change / Constant |
| 57 | Japan | Asia | 0.92 | 0.92 | No change / Constant |
| 58 | Jordan | Asia | 0.736 | 0.736 | No change / Constant |
| 65 | Libya | Africa | 0.746 | 0.746 | No change / Constant |
| 67 | Luxembourg | Europe | 0.927 | 0.927 | No change / Constant |
| 106 | Sudan | Africa | 0.516 | 0.516 | No change / Constant |
| 115 | Tuvalu | Oceania | 0.653 | 0.653 | No change / Constant |
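`Result_HDI_2022_21` arrives precomputed in the cleaned CSV, and this report does not show how it was derived. If you derive it yourself from `HDI_2022` and `HDI_2021`, comparing a floating-point difference exactly with zero can misclassify values that should count as equal. The sketch below assumes that scenario and uses a small tolerance. `classify_change` and `tol` are hypothetical names.

```r
# Hypothetical helper (not part of the original analysis): label each change,
# treating differences within `tol` of zero as "no change"
classify_change <- function(hdi_2022, hdi_2021, tol = 1e-9) {
  delta <- hdi_2022 - hdi_2021
  ifelse(abs(delta) <= tol, "No change / Constant",
         ifelse(delta > 0, "Positive / Improvement", "Negative / Deterioration"))
}

# Exact comparisons can fail on decimal fractions: this prints FALSE
print(0.3 - 0.1 - 0.2 == 0)

# Botswana, Ukraine and Japan, using the HDI values shown in this report
print(classify_change(c(0.708, 0.734, 0.92), c(0.68, 0.755, 0.92)))
```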


In [None]:
# Find the countries from Europe & North America whose HDI deteriorated

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, Continent %in% c('Europe', 'North America') & Result_HDI_2022_21 == 'Negative / Deterioration', select = c(Country, Continent, HDI_2022, HDI_2021, Result_HDI_2022_21))
filtered_data

|   | Country | Continent | HDI_2022 | HDI_2021 | Result_HDI_2022_21 |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<chr>` |
| 43 | Guatemala | North America | 0.629 | 0.63 | Negative / Deterioration |
| 117 | Ukraine | Europe | 0.734 | 0.755 | Negative / Deterioration |


In [None]:
# Find the country that has the 3rd highest HDI on the continent of Asia in 2022

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

# Filter for Asia and select relevant columns
filtered_data <- subset(data, Continent == "Asia", select = c(Country, Continent, HDI_2022))

# Order by HDI_2022 in descending order
ordered_data <- filtered_data[order(-filtered_data$HDI_2022), ]

# Check if there are at least 3 countries in Asia
if (nrow(ordered_data) >= 3) {
  # Get the 3rd highest HDI country details
  third_highest_country <- ordered_data[3, "Country"]
  third_highest_hdi <- ordered_data[3, "HDI_2022"]

  cat("Country with 3rd highest HDI in Asia:", third_highest_country, "\n")
  cat("HDI Value:", third_highest_hdi, "\n\n")
} else {
  cat("There are fewer than 3 countries in Asia in the dataset.\n")
}

Country with 3rd highest HDI in Asia: Japan 
HDI Value: 0.92 
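The cell above takes the third row after sorting. When HDI values are tied, "the 3rd row" depends on the input order and can differ from "the 3rd-highest HDI value". The sketch below ranks by distinct values and returns every tied country. `nth_highest` is a hypothetical helper, and the sample rows are made up to show a tie.

```r
# Hypothetical helper: all countries whose HDI equals the n-th highest *distinct*
# HDI value; returns an empty data frame when there are fewer than n distinct values
nth_highest <- function(df, n) {
  values <- sort(unique(df$HDI_2022), decreasing = TRUE)
  if (length(values) < n) {
    return(df[0, ])
  }
  df[df$HDI_2022 == values[n], ]
}

# Hypothetical sample with a tie for second place
sample_df <- data.frame(
  Country  = c("W", "X", "Y", "Z"),
  HDI_2022 = c(0.949, 0.920, 0.920, 0.900)
)

print(nth_highest(sample_df, 2))  # both tied countries, X and Y
print(nth_highest(sample_df, 3))  # Z
```

With the row-position approach, the 3rd row of this sample would be Y rather than Z. Which definition is appropriate depends on how ties should be treated in the question being asked.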



In [None]:
# Compare details such as gross national income per capita and mean years of schooling of the
# European countries in the 'Very High' HDI tier with those of the North American countries in the same tier

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, HDI_Tier == 'Very High' & Continent %in% c('Europe', 'North America'), select = c(Country, Continent, Mean_Years_Of_Schooling_Years_2022, Gross_National_Income_Per_Capital_2022))

ans_set_1 <- subset(filtered_data, Continent == 'Europe')
ans_set_2 <- subset(filtered_data, Continent == 'North America')

ans_set_1
print("__________________________________________________________________")
ans_set_2


|   | Country | Continent | Mean_Years_Of_Schooling_Years_2022 | Gross_National_Income_Per_Capital_2022 |
|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<int>` |
| 7 | Austria | Europe | 12.3 | 56530 |
| 12 | Belgium | Europe | 12.5 | 53644 |
| 29 | Denmark | Europe | 13.0 | 62019 |
| 34 | Estonia | Europe | 13.5 | 37152 |
| 36 | Finland | Europe | 12.9 | 49522 |
| 37 | France | Europe | 11.7 | 47379 |
| 40 | Germany | Europe | 14.3 | 55340 |
| 42 | Greece | Europe | 11.4 | 31382 |
| 48 | Hungary | Europe | 12.2 | 34196 |
| 49 | Iceland | Europe | 13.8 | 54688 |


[1] "__________________________________________________________________"


|   | Country | Continent | Mean_Years_Of_Schooling_Years_2022 | Gross_National_Income_Per_Capital_2022 |
|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<int>` |
| 8 | Bahamas | North America | 12.7 | 32535 |
| 11 | Barbados | North America | 9.9 | 14810 |
| 21 | Canada | North America | 13.9 | 48444 |
| 26 | Costa Rica | North America | 8.8 | 20248 |
| 120 | United States | North America | 13.6 | 65565 |


In [None]:
# Compare details such as gross national income per capita and mean years of schooling of the Asian
# countries in the 'Low' HDI tier with those of the African countries in the same tier

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, HDI_Tier == 'Low' & Continent %in% c('Asia', 'Africa'), select = c(Country, Continent, Mean_Years_Of_Schooling_Years_2022, Gross_National_Income_Per_Capital_2022))

ans_set_1 <- subset(filtered_data, Continent == 'Asia')

ans_set_2 <- subset(filtered_data, Continent == 'Africa')

ans_set_1
print("______________________________________________________________________")
ans_set_2

|   | Country | Continent | Mean_Years_Of_Schooling_Years_2022 | Gross_National_Income_Per_Capital_2022 |
|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<int>` |
| 1 | Afghanistan | Asia | 2.5 | 1335 |
| 86 | Pakistan | Asia | 4.4 | 5374 |
| 122 | Yemen | Asia | 2.8 | 1106 |


[1] "______________________________________________________________________"


|   | Country | Continent | Mean_Years_Of_Schooling_Years_2022 | Gross_National_Income_Per_Capital_2022 |
|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<dbl>` | `<int>` |
| 14 | Benin | Africa | 3.1 | 3406 |
| 18 | Burundi | Africa | 3.3 | 712 |
| 22 | Central African Republic | Africa | 4.0 | 869 |
| 39 | Gambia | Africa | 4.5 | 2090 |
| 44 | Guinea | Africa | 2.4 | 2404 |
| 64 | Lesotho | Africa | 7.5 | 2709 |
| 68 | Malawi | Africa | 5.2 | 1432 |
| 70 | Mali | Africa | 1.6 | 2044 |
| 72 | Mauritania | Africa | 4.8 | 5344 |
| 77 | Mozambique | Africa | 3.9 | 1219 |
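The comparison cells in this report repeat the same `subset()` plus `mean()` pattern for each pair of groups. Base R's `aggregate()` with a formula can compute the group means for every (continent, tier) combination in a single call. The sketch uses the North American 'Very High' rows and the Asian 'Low' rows printed above. On the full data you would pass `data` instead of `df`.

```r
# Rows copied from the two tier tables printed in this report (subset of columns)
df <- data.frame(
  Country = c("Bahamas", "Barbados", "Canada", "Costa Rica", "United States",
              "Afghanistan", "Pakistan", "Yemen"),
  Continent = c(rep("North America", 5), rep("Asia", 3)),
  HDI_Tier = c(rep("Very High", 5), rep("Low", 3)),
  Mean_Years_Of_Schooling_Years_2022 = c(12.7, 9.9, 13.9, 8.8, 13.6, 2.5, 4.4, 2.8),
  Gross_National_Income_Per_Capital_2022 = c(32535, 14810, 48444, 20248, 65565,
                                             1335, 5374, 1106)
)

# One call computes the means for every (Continent, HDI_Tier) pair present in df
res <- aggregate(
  cbind(Mean_Years_Of_Schooling_Years_2022, Gross_National_Income_Per_Capital_2022) ~
    Continent + HDI_Tier,
  data = df,
  FUN = mean
)
print(res)
```

For these rows, the group means agree with the values printed by the later comparison cells: 36320.4 and 11.78 for North America, and 2605 (GNI) and about 3.23 years (schooling) for Asia.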


In [None]:
# Find the HDI of the African nation having least Mean_Years_Of_Schooling_Years_2022.
# Compare the details of this country with the details of the African country that has the
# lowest Gross_National_Income_Per_Capital_2022 amongst the African nations.

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data <- subset(data, Continent == "Africa")
# print(filtered_data)

final_data <- subset(filtered_data, Mean_Years_Of_Schooling_Years_2022 == min(Mean_Years_Of_Schooling_Years_2022), select = c(Country, HDI_2022, Mean_Years_Of_Schooling_Years_2022, Gross_National_Income_Per_Capital_2022))

cat("African nation with least Mean_Years_Of_Schooling_Years_2022\n")
print(paste0("Country: ", final_data$Country))
print(paste0("HDI in 2022: ", final_data$HDI_2022))
print(paste0("Mean years of schooling in 2022: ", final_data$Mean_Years_Of_Schooling_Years_2022))
print(paste0("Gross National Income Per Capital in 2022: ", final_data$Gross_National_Income_Per_Capital_2022))

cat('\n')
print("_______________________________________________________________________")
cat("\n")

final_data2 <- subset(filtered_data, Gross_National_Income_Per_Capital_2022 == min(Gross_National_Income_Per_Capital_2022), select = c(Country, HDI_2022, Mean_Years_Of_Schooling_Years_2022, Gross_National_Income_Per_Capital_2022))

cat("African nation with least Gross_National_Income_Per_Capital_2022", "\n")
print(paste0("Country: ", final_data2$Country))
print(paste0("HDI in 2022: ", final_data2$HDI_2022))
print(paste0("Mean years of schooling in 2022: ", final_data2$Mean_Years_Of_Schooling_Years_2022))
print(paste0("Gross National Income Per Capital in 2022: ", final_data2$Gross_National_Income_Per_Capital_2022))


African nation with least Mean_Years_Of_Schooling_Years_2022
[1] "Country: Niger"
[1] "HDI in 2022: 0.394"
[1] "Mean years of schooling in 2022: 1.3"
[1] "Gross National Income Per Capital in 2022: 1283"

[1] "_______________________________________________________________________"

African nation with least Gross_National_Income_Per_Capital_2022 
[1] "Country: Burundi"
[1] "HDI in 2022: 0.42"
[1] "Mean years of schooling in 2022: 3.3"
[1] "Gross National Income Per Capital in 2022: 712"
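`which.min()` gives a shorter route to the rows above, but it handles ties differently from the `x == min(x)` rule used with `subset()`. The sketch runs on three African rows copied from the outputs in this report. The note on missing values is a general caution: this report does not say whether these columns contain `NA`s.

```r
# Three African rows copied from the outputs in this report
df <- data.frame(
  Country = c("Niger", "Burundi", "Mali"),
  Mean_Years_Of_Schooling_Years_2022 = c(1.3, 3.3, 1.6),
  Gross_National_Income_Per_Capital_2022 = c(1283, 712, 2044)
)

# which.min() returns the position of the (first) minimum
print(df$Country[which.min(df$Mean_Years_Of_Schooling_Years_2022)])      # "Niger"
print(df$Country[which.min(df$Gross_National_Income_Per_Capital_2022)])  # "Burundi"

# With ties, which.min() keeps only the first match, whereas x == min(x)
# (the rule used with subset() above) keeps all of them
x <- c(2, 1, 1)
print(which.min(x))        # 2
print(which(x == min(x)))  # 2 3

# min() returns NA if any value is missing, unless na.rm = TRUE
print(min(c(1.3, NA), na.rm = TRUE))  # 1.3
```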


In [None]:
# Compare the Gross_National_Income_Per_Capital_2022 of the North American countries in the 'Very High'
# HDI tier with that of the European countries in the same tier, and calculate the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High')
filtered_data_EU <- subset(data, Continent == 'Europe' & HDI_Tier == 'Very High')

avg_gross_income_NA <- mean(filtered_data_NA$Gross_National_Income_Per_Capital_2022)
avg_gross_income_EU <- mean(filtered_data_EU$Gross_National_Income_Per_Capital_2022)

print(paste0("Mean gross national income for the North American Countries with very high HDI: ", round(avg_gross_income_NA, 2)))
print(paste0("Mean gross national income for the European Countries with very high HDI: ", round(avg_gross_income_EU, 2)))

PerDiff(avg_gross_income_NA, avg_gross_income_EU)

# Conclusion: the mean GNI per capita is higher for the European countries.

[1] "Mean gross national income for the North American Countries with very high HDI: 36320.4"
[1] "Mean gross national income for the European Countries with very high HDI: 50238.23"
[1] "The percentage difference between the 2 values: 32.16%."
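`PerDiff()` computes the symmetric percentage difference: |a - b| divided by the mean of a and b, times 100. It prints the result instead of returning it, and it is redefined in every cell. The sketch below is a variant that returns the number, so it can be defined once and reused or tested. It is shown next to the percentage change relative to a chosen baseline, which is not symmetric and answers a different question ("how much higher is Europe than North America?"). `per_diff` and `per_change` are hypothetical names, and the inputs are the rounded means printed above.

```r
# Symmetric percentage difference, as computed by PerDiff() above:
# |a - b| divided by the mean of a and b, times 100
per_diff <- function(a, b) {
  abs(a - b) / ((a + b) / 2) * 100
}

# Percentage change of `new` relative to `base` (not symmetric)
per_change <- function(new, base) {
  (new - base) / base * 100
}

na_gni <- 36320.4   # mean GNI per capita, North America, 'Very High' tier
eu_gni <- 50238.23  # mean GNI per capita, Europe, 'Very High' tier

print(round(per_diff(na_gni, eu_gni), 2))    # 32.16, matching PerDiff() above
print(round(per_change(eu_gni, na_gni), 2))  # Europe's mean relative to North America's
```

`per_diff(a, b)` equals `per_diff(b, a)`. `per_change` depends on which value is treated as the baseline, so the baseline should be stated whenever it is reported.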


In [None]:
# Compare the Gross_National_Income_Per_Capital_2022 of the Asian countries in the 'Low' HDI tier
# with that of the African countries in the same tier, and calculate the percentage difference
# between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_asia <- subset(data, Continent == 'Asia' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))
filtered_data_africa <- subset(data, Continent == 'Africa' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))

# print(filtered_data_asia)
# print(filtered_data_africa)

avg_gross_income_asia <- mean(filtered_data_asia$Gross_National_Income_Per_Capital_2022)
avg_gross_income_africa <- mean(filtered_data_africa$Gross_National_Income_Per_Capital_2022)

print(paste0("Mean gross national income for the Asian Countries with low HDI: ", round(avg_gross_income_asia, 2)))
print(paste0("Mean gross national income for the African Countries with low HDI: ", round(avg_gross_income_africa, 2)))

PerDiff(avg_gross_income_asia, avg_gross_income_africa)

# Conclusion: the mean GNI per capita is higher for the Asian countries.

[1] "Mean gross national income for the Asian Countries with low HDI: 2605"
[1] "Mean gross national income for the African Countries with low HDI: 2287.87"
[1] "The percentage difference between the 2 values: 12.96%."


In [None]:
# Compare the Gross_National_Income_Per_Capital_2022 of the North American countries in the 'Very High'
# HDI tier with that of the African countries in the 'Low' tier, and compute the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values is: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))
filtered_data_africa <- subset(data, Continent == 'Africa' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))

avg_gross_income_NA <- mean(filtered_data_NA$Gross_National_Income_Per_Capital_2022)
avg_gross_income_africa <- mean(filtered_data_africa$Gross_National_Income_Per_Capital_2022)

print(paste0("Mean gross national income for the North American Countries with very high HDI: ", round(avg_gross_income_NA, 2)))
print(paste0("Mean gross national income for the African Countries with low HDI: ", round(avg_gross_income_africa, 2)))

PerDiff(avg_gross_income_NA, avg_gross_income_africa)

# Conclusion: the mean GNI per capita is far higher for the North American countries.

[1] "Mean gross national income for the North American Countries with very high HDI: 36320.4"
[1] "Mean gross national income for the African Countries with low HDI: 2287.87"
[1] "The percentage difference between the 2 values is: 176.3%."


In [None]:
# Compare the Gross_National_Income_Per_Capital_2022 of the North American countries in the 'Very High'
# HDI tier with that of the Asian countries in the same tier, and calculate the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))
filtered_data_Asia <- subset(data, Continent == 'Asia' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Gross_National_Income_Per_Capital_2022))

# print(filtered_data_NA)
# print(filtered_data_Asia)

avg_gross_income_NA <- mean(filtered_data_NA$Gross_National_Income_Per_Capital_2022)
avg_gross_income_Asia <- mean(filtered_data_Asia$Gross_National_Income_Per_Capital_2022)

print(paste0("Mean gross national income for the North American Countries with very high HDI: ", round(avg_gross_income_NA, 2)))
print(paste0("Mean gross national income for the Asian Countries with very high HDI: ", round(avg_gross_income_Asia, 2)))

PerDiff(avg_gross_income_NA, avg_gross_income_Asia)

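# Conclusion: surprisingly, the mean is higher for Asia. Possible reasons:
# 1. The Middle Eastern countries in this HDI tier have much higher GNI per capita than most other
#    countries of Asia & North America in this tier, which pulls the Asian mean up.
# 2. Asia has more countries in this HDI tier than North America, so the two means summarise
#    groups of quite different size and composition.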

[1] "Mean gross national income for the North American Countries with very high HDI: 36320.4"
[1] "Mean gross national income for the Asian Countries with very high HDI: 51717.27"
[1] "The percentage difference between the 2 values: 34.98%."
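The first reason given above is that a few high-income countries pull the Asian mean up. One way to check this is to compare each group's mean with its median, which is much less sensitive to a few large values. The sketch uses purely hypothetical numbers (not from the dataset) to show the effect. On the real data you would pass the `Gross_National_Income_Per_Capital_2022` column of each filtered group.

```r
# Purely hypothetical GNI-per-capita values; group_b adds two very high earners
group_a <- c(30000, 35000, 40000, 45000, 50000)
group_b <- c(group_a, 120000, 130000)

# The two high values raise the mean of group_b by about 24,000 ...
print(c(mean_a = mean(group_a), mean_b = mean(group_b)))

# ... but move its median by only 5,000
print(c(median_a = median(group_a), median_b = median(group_b)))
```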


In [None]:
# Compare the Mean_Years_Of_Schooling_Years_2022 of the North American countries in the 'Very High'
# HDI tier with that of the European countries in the same tier, and calculate the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High')
filtered_data_EU <- subset(data, Continent == 'Europe' & HDI_Tier == 'Very High')

mean_schooling_years_NA <- mean(filtered_data_NA$Mean_Years_Of_Schooling_Years_2022)
mean_schooling_years_EU <- mean(filtered_data_EU$Mean_Years_Of_Schooling_Years_2022)

print(paste0("Mean years of schooling for North American countries having their HDI in very high tier: ", round(mean_schooling_years_NA, 2)))
print(paste0("Mean years of schooling for European countries having their HDI in very high tier: ", round(mean_schooling_years_EU, 2)))

PerDiff(mean_schooling_years_NA, mean_schooling_years_EU)

# Conclusion: there is only a slight/marginal difference.


[1] "Mean years of schooling for North American countries having their HDI in very high tier: 11.78"
[1] "Mean years of schooling for European countries having their HDI in very high tier: 12.42"
[1] "The percentage difference between the 2 values: 5.31%."


In [None]:
# Compare the Mean_Years_Of_Schooling_Years_2022 of the Asian countries in the 'Low' HDI tier
# with that of the African countries in the same tier, and calculate the percentage difference
# between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_asia <- subset(data, Continent == 'Asia' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))
filtered_data_africa <- subset(data, Continent == 'Africa' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))

mean_schooling_years_asia <- mean(filtered_data_asia$Mean_Years_Of_Schooling_Years_2022)
mean_schooling_years_africa <- mean(filtered_data_africa$Mean_Years_Of_Schooling_Years_2022)

print(paste0("Mean schooling years for Asian countries having their HDI in the Low tier: ", round(mean_schooling_years_asia, 2)))
print(paste0("Mean schooling years for African countries having their HDI in the Low tier: ", round(mean_schooling_years_africa, 2)))

PerDiff(mean_schooling_years_asia, mean_schooling_years_africa)

# Surprisingly, mean years of schooling turned out to be lower for the Asian countries than for the
# African countries in this tier, contrary to what many people might expect based on limited
# knowledge of the two continents.


[1] "Mean schooling years for Asian countries having their HDI in the Low tier: 3.23"
[1] "Mean schooling years for African countries having their HDI in the Low tier: 3.83"
[1] "The percentage difference between the 2 values: 16.98%."


In [None]:
# Compare the Mean_Years_Of_Schooling_Years_2022 of the North American countries in the 'Very High'
# HDI tier with that of the African countries in the 'Low' tier, and calculate the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values is: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))
filtered_data_africa <- subset(data, Continent == 'Africa' & HDI_Tier == 'Low', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))

mean_schooling_years_NA <- mean(filtered_data_NA$Mean_Years_Of_Schooling_Years_2022)
mean_schooling_years_africa <- mean(filtered_data_africa$Mean_Years_Of_Schooling_Years_2022)

print(paste0("Mean years of schooling for North American countries with very high HDI: ", round(mean_schooling_years_NA, 2)))
print(paste0("Mean years of schooling for African countries with low HDI: ", round(mean_schooling_years_africa, 2)))

PerDiff(mean_schooling_years_NA, mean_schooling_years_africa)

# Conclusion: the difference between the two groups of countries is very large.


[1] "Mean years of schooling for North American countries with very high HDI: 11.78"
[1] "Mean years of schooling for African countries with low HDI: 3.83"
[1] "The percentage difference between the 2 values is: 101.79%."


In [None]:
# Compare the Mean_Years_Of_Schooling_Years_2022 of the North American countries in the 'Very High'
# HDI tier with that of the Asian countries in the same tier, and calculate the percentage
# difference between the two group means.

PerDiff <- function(v1, v2){

  v <- c(v1, v2)

  num <- abs(v1 - v2)
  den <- mean(v)
  per_diff <- (num/den)*100
  print(paste0("The percentage difference between the 2 values is: ", round(per_diff, 2), "%."))
}

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

filtered_data_NA <- subset(data, Continent == 'North America' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))
filtered_data_asia <- subset(data, Continent == 'Asia' & HDI_Tier == 'Very High', select = c(Country, Continent, HDI_Tier, Mean_Years_Of_Schooling_Years_2022))

mean_schooling_years_NA <- mean(filtered_data_NA$Mean_Years_Of_Schooling_Years_2022)
mean_schooling_years_asia <- mean(filtered_data_asia$Mean_Years_Of_Schooling_Years_2022)

print(paste0("Mean schooling years of the North American countries with very high HDI: ", round(mean_schooling_years_NA, 2)))
print(paste0("Mean schooling years of the Asian countries with very high HDI: ", round(mean_schooling_years_asia, 2)))

PerDiff(mean_schooling_years_NA, mean_schooling_years_asia)

# Conclusion: only a slight difference was observed.

[1] "Mean schooling years of the North American countries with very high HDI: 11.78"
[1] "Mean schooling years of the Asian countries with very high HDI: 11.14"
[1] "The percentage difference between the 2 values is: 5.62%."
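The two cells above filter one continent at a time. A more general approach is sketched below: aggregate() computes the mean schooling years of every continent within a tier in a single call. The small data frame is illustrative only. Its values are made up and not taken from the dataset. With the real data, `data` would be passed in its place.

```r
# Illustrative toy rows using the dataset's column names (values are made up)
toy <- data.frame(
  Country   = c("A", "B", "C", "D", "E"),
  Continent = c("Asia", "Asia", "Europe", "Europe", "Africa"),
  HDI_Tier  = c("Very High", "Very High", "Very High", "High", "Low"),
  Mean_Years_Of_Schooling_Years_2022 = c(11, 12, 13, 10, 4)
)

# Keep only the very high tier, then average schooling years per continent
very_high <- subset(toy, HDI_Tier == "Very High")
means_by_continent <- aggregate(Mean_Years_Of_Schooling_Years_2022 ~ Continent,
                                data = very_high, FUN = mean)
print(means_by_continent)   # Asia 11.5, Europe 13
```

Any pair of rows in means_by_continent could then be passed to PerDiff(). This avoids writing a separate subset() call for each continent.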


In [None]:
# Create a separate CSV file for each continent that appears in the dataset file. Each generated
# CSV file should contain the following statistics about its continent:
#
# 1. Total no. of countries on that continent (whose data is present in the dataset file)
# 2. No. of countries on that continent whose HDI improved in the year 2022 w.r.t. the year 2021
# 3. No. of countries on that continent whose HDI deteriorated in the year 2022 w.r.t. the year 2021
# 4. No. of countries on that continent whose HDI remained constant in the year 2022 w.r.t. the year 2021


filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)
# str(data)

# find out all the continents present in the dataset file using factor() and levels()

fac_cont <- factor(data$Continent)
cont_list <- c(levels(fac_cont))
# print(cont_list)
# print(length(cont_list))  # 7

# creating a vector to store the count values i.e. the no. of countries of each
# continent whose HDI has improved/deteriorated/remained constant

count <- c(0, 0, 0)   # creating a counter i.e. a vector that will act as a counter

# filtering data
for(i in 1:length(cont_list)){
  filtered_data <- subset(data, data$Continent == cont_list[i])
  count <- c(0, 0, 0)   # reinitializing counter to 0

# counting statistics
  for(j in 1:nrow(filtered_data)){

    total_countries_on_continent <- nrow(filtered_data)

    if(filtered_data$Change_HDI_2022_21[j] > 0){
      count[1] <- count[1] + 1
    }
    if(filtered_data$Change_HDI_2022_21[j] < 0){
      count[2] <- count[2] + 1
    }
    if(filtered_data$Change_HDI_2022_21[j] == 0){
      count[3] <- count[3] + 1
    }
  }
# transferring the data to a separate CSV file
# print(count)

col_names = c("Total Countries on the Continent (which are present in the dataset)", "Countries with improved HDI in 2022", "Countries with deteriorated HDI in 2022", "Countries whose HDI remained constant in 2022")
file_data <- c(total_countries_on_continent, count[1], count[2], count[3])

df <- data.frame(col_names, file_data)
write.csv(df, paste0("Continent_", cont_list[i], ".csv"))
}
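The per-country inner loop above can also be replaced by vectorized counts. The sketch below groups the changes by continent with split() and summarises each group with a hypothetical helper, count_hdi_changes(), which is not part of the original report. The toy rows are illustrative only. With the real data, data$Change_HDI_2022_21 and data$Continent would be used instead.

```r
# Hypothetical helper: summarises one continent's HDI changes (2022 vs 2021)
count_hdi_changes <- function(change) {
  c(total        = length(change),
    improved     = sum(change > 0, na.rm = TRUE),
    deteriorated = sum(change < 0, na.rm = TRUE),
    constant     = sum(change == 0, na.rm = TRUE))
}

# Illustrative toy rows (not taken from the dataset)
toy <- data.frame(
  Continent = c("Asia", "Asia", "Europe", "Africa", "Africa"),
  Change_HDI_2022_21 = c(-0.011, 0.005, 0.004, 0.005, 0)
)

# split() groups the changes by continent, sapply() applies the helper to each group,
# and t() turns the result into one row per continent
summary_table <- t(sapply(split(toy$Change_HDI_2022_21, toy$Continent), count_hdi_changes))
print(summary_table)
```

Using na.rm = TRUE skips any missing changes. The loop version would instead stop with an error on a missing value. Each row of summary_table could then be written to its own file with write.csv(), as the loop above does.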

In [None]:
# install.packages("caTools")   # uncomment if caTools is not already installed in the R runtime
library(caTools)

filepath = 'https://raw.githubusercontent.com/AnmolDixitB13/R/main/Case%20study%202/cleaned_hdi_with_calculations.csv'
# the place (on github) where the dataset file is stored

data <- read.csv(filepath)

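set.seed(123)   # fixes the random seed so that the split is reproducible
# sample.split() returns a logical vector with one entry per row: TRUE marks rows of the first
# group (e.g. a training set), FALSE marks the rest; SplitRatio = 0.5 puts about half the rows in each
split <- sample.split(data$HDI_2022, SplitRatio = 0.5)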
# training_set <- subset(data, split == TRUE)
# testing_set <- subset(data, split == FALSE)

print(split)

  [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE
 [13] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
 [37] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [49]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE
 [61] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
 [73] FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE
 [85]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
 [97] FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[109]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[121] FALSE  TRUE  TRUE  TRUE
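A similar split can be sketched without caTools by using base R's sample(). sample.split() tries to preserve the relative frequencies of the values in its first argument across the two groups. This version does not. It simply picks half of the row indices at random, so its TRUE/FALSE pattern will not match the one printed above, even with the same seed. The row count of 124 is taken from the length of the printed vector.

```r
# Base-R sketch of a random 50/50 row split (no stratification by HDI_2022)
set.seed(123)
n <- 124   # number of rows, i.e. the length of the logical vector printed above
first_half_rows <- sample(seq_len(n), size = floor(n * 0.5))
split_base <- seq_len(n) %in% first_half_rows   # TRUE for rows in the first half

# training_set <- data[split_base, ]
# testing_set  <- data[!split_base, ]
sum(split_base)   # 62
```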


#Conclusion

Here is a list of conclusions that could be drawn from this case study:

1. A positive change observed in this case study is that the majority of countries on every continent (except Antarctica, which has no countries) showed an improvement in their HDI in 2022 compared with their HDI in 2021.

2. On every continent, the no. of countries whose HDI either deteriorated or remained constant in the year 2022 w.r.t. the year 2021 is comparatively small.

3. The average HDI in 2022 of the developed countries was found to be higher than that of the developing countries, as most people would expect.

4. Countries with a high HDI usually performed better on parameters such as Mean Years of Schooling, Gross National Income per capita, Life Expectancy at Birth, and the other parameters present in the dataset file, while countries with a low HDI usually performed worse on them. However, there are exceptions. For example, on the African continent, Mauritius had a slightly higher HDI in 2022 than Algeria, yet Algeria had a slightly higher life expectancy at birth than Mauritius in the same year.


[Note: The parameters of a country mentioned in these conclusions refer only to the parameters present in the dataset CSV file, not to any parameters absent from it.]
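The tendency described in point 4, along with its exceptions, can be quantified with a rank correlation. The minimal sketch below uses only the four rows shown in the data preview at the start of this report. On the full dataset, the same cor() call would be applied to complete columns such as data$HDI_2022 and data$Mean_Years_Of_Schooling_Years_2022. Four rows are far too few to support conclusions; they only illustrate the calculation.

```r
# First four rows from the data preview at the start of this report
# (Afghanistan, Albania, Algeria, Argentina)
hdi_2022        <- c(0.462, 0.789, 0.745, 0.849)
mean_schooling  <- c(2.5, 10.1, 7.0, 11.1)
life_expectancy <- c(62.9, 76.8, 77.1, 76.1)

# Spearman's rho compares rankings: 1 means both indicators order the countries identically
rho_school <- cor(hdi_2022, mean_schooling, method = "spearman")   # 1
rho_life   <- cor(hdi_2022, life_expectancy, method = "spearman")  # 0.2
```

In this small sample, mean schooling ranks the countries exactly as HDI does, but life expectancy does not. Algeria has a lower HDI than both Albania and Argentina but a higher life expectancy than either. This is the same kind of exception described above for Mauritius and Algeria.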