<a href="https://colab.research.google.com/github/DanB1421/world_development_explorer_final/blob/main/wdx_analysis_part_B_final_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **The effect of worldwide energy consumption on disease mortality and life expectancy from 2000 to 2015** 

*Daniel Brilliant*

*Data Science Master's Student, UMBC*

*April 5, 2022*




Human civilization is defined by decisions that produce both beneficial and harmful outcomes. One of the most consequential developments that falls into both categories is industrialization. Industrialization has made processes such as manufacturing and transportation much easier than before, but not without a cost. The processes developed during the Industrial Revolution are often fueled by non-renewable resources such as coal, oil, and natural gas, collectively known as fossil fuels. Burning these fuels has increased emissions of carbon dioxide (CO<sub>2</sub>) and has been accompanied by changes to air quality, water purity, and the ozone layer.

CO<sub>2</sub> emissions also have a direct impact on quality of life, whether through breathing polluted air, prolonged exposure to carcinogens, or reduced access to drinkable water. The most direct way to combat rising CO<sub>2</sub> emissions is to increase the use of renewable energy. Renewable sources are less harmful to the environment because they generate energy without emitting harmful gases. Examples include solar, wind, and hydro (water) power.

The results of environmental changes from energy usage can be measured with a variety of statistics, but two with clear implications for human life are disease mortality rates and life expectancy. Two major questions can be asked about these relationships:
  1. How do CO<sub>2</sub> emissions relate to both mortality rates from disease and life expectancy at birth? 
  2. How does renewable energy consumption relate to both mortality rates from disease and life expectancy at birth? 

These questions can be answered by analyzing the statistical significance and strength of the correlation between each energy measure (CO<sub>2</sub> emissions and renewable energy consumption) and each health measure (disease mortality and life expectancy at birth) worldwide in the years 2000 and 2015.
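
As a brief reference for the two statistics used throughout, the standard definitions are given here (these are textbook formulas, not taken from the notebook's code). For $n$ countries with paired values $(x_i, y_i)$, the Pearson correlation coefficient $r$ and the test statistic used to judge its significance are:

$$
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
t = r\sqrt{\frac{n-2}{1-r^2}}
$$

Under the null hypothesis of no linear relationship, $t$ follows a Student's t distribution with $n-2$ degrees of freedom, and the two-sided p-value is the probability of a value at least as extreme as the observed $|t|$. For an ordinary least squares fit with a single predictor, the p-value of the slope equals this p-value, and the reported $R^2$ equals $r^2$.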

## **1. Analysis Strategy and Approach**

- **Data Source:** World Development Explorer ([worlddev.xyz](https://))
- **Countries Analyzed:** All eligible countries during the timespan studied
- **Timespan of Data:** 2000-2015
- **Topics & Indicators:**
  - **Environment- Renewable energy consumption (% of total final energy consumption):** the share of renewable energy in the total final energy consumption.
  - **Environment- CO<sub>2</sub> emissions (kt):** emissions of carbon dioxide stemming from the burning of fossil fuels and the manufacture of cement. These can include CO<sub>2</sub> produced during consumption of solid, liquid, and gas fuels and gas flaring.
  - **Health- Cause of death, by non-communicable diseases (% of total):** Cause of death is the share of all deaths at all ages due to underlying causes. Non-communicable diseases include cancer, diabetes mellitus, cardiovascular diseases, digestive diseases, skin diseases, musculoskeletal diseases, and congenital anomalies.
  - **Health- Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total):** Cause of death is the share of all deaths at all ages due to underlying causes. Communicable diseases and maternal, prenatal, and nutrition conditions include infectious and parasitic diseases, respiratory infections, and nutritional deficiencies such as underweight and stunting.
  - **Health- Life expectancy at birth, total (years):** The number of years a newborn infant would live if prevailing mortality patterns at time of birth remain the same throughout life.
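
The data set stores these indicators under short codes (for example `EN.ATM.CO2E.KT`), which the plotting cells later map to readable axis labels one dictionary at a time. A minimal sketch of keeping that mapping in one place is shown here; `INDICATOR_LABELS` and `year_labels` are hypothetical names introduced for illustration and do not appear in the notebook.

```python
# Hypothetical lookup from indicator code to the label used on plot axes.
INDICATOR_LABELS = {
    "EG.FEC.RNEW.ZS": "Renewable energy consumption (% of total final energy consumption)",
    "EN.ATM.CO2E.KT": "CO2 emissions (kt)",
    "SH.DTH.NCOM.ZS": "Cause of death, by non-communicable diseases (% of total)",
    "SH.DTH.COMM.ZS": "Cause of death, by communicable diseases and maternal, "
                      "prenatal, and nutrition conditions (% of total)",
    "SP.DYN.LE00.IN": "Life expectancy at birth, total (years)",
}


def year_labels(year):
    """Return a labels dict with the year prefixed to every indicator label."""
    return {code: f"{year} {label}" for code, label in INDICATOR_LABELS.items()}


labels_2000 = year_labels(2000)
print(labels_2000["EN.ATM.CO2E.KT"])
```

The resulting dictionary could be passed as the `labels=` argument of a Plotly Express call, which would also keep the year in the label consistent with the data frame being plotted.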

## **2. How do CO<sub>2</sub> emissions relate to life expectancy and mortality rates from communicable and non-communicable disease?**

In [None]:
import pandas as pd
import plotly
import plotly.express as px

DATA_URL = "https://raw.githubusercontent.com/DanB1421/world_development_explorer_final/main/wdi_data_DB_4_1.csv"  # Imports data downloaded from World Development Explorer and stored on GitHub

In [None]:
df = pd.read_csv(DATA_URL, index_col=0)

df.sample(5)  # Creates a data frame from World Development Explorer data and displays a sample of 5 rows in the set

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
8560,2004,6.5619,EG.FEC.RNEW.ZS,LBN,Lebanon,Middle East & North Africa,Upper middle income,IBRD
2771,2012,81.704878,SP.DYN.LE00.IN,SWE,Sweden,Europe & Central Asia,High income,Not classified
1223,2012,55.645,SP.DYN.LE00.IN,GNB,Guinea-Bissau,Sub-Saharan Africa,Low income,IDA
4440,2001,301830.0,EN.ATM.CO2E.KT,IDN,Indonesia,East Asia & Pacific,Lower middle income,IBRD
3313,2010,168140.0,EN.ATM.CO2E.KT,ARG,Argentina,Latin America & Caribbean,Upper middle income,IBRD


In [None]:
df_pivot = df.pivot_table(
    index=["Year", "Country Code", "Country Name", "Region", "Income Group", "Lending Type"],
    columns="indicator", 
    values="value"
)

df_pivot.head()  # Creates pivot table that separates each of the indicators into a separate column

Year,Country Code,Country Name,Region,Income Group,Lending Type,EG.FEC.RNEW.ZS,EN.ATM.CO2E.KT,SH.DTH.COMM.ZS,SH.DTH.NCOM.ZS,SP.DYN.LE00.IN
2000,ABW,Aruba,Latin America & Caribbean,High income,Not classified,0.1753,,,,73.787
2000,AFG,Afghanistan,South Asia,Low income,IDA,54.243198,770.0,59.81465,32.009763,55.841
2000,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,73.441101,12370.0,72.839091,20.208854,46.522
2000,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,41.445,3170.0,8.109352,84.142346,73.955
2000,AND,Andorra,Europe & Central Asia,High income,Not classified,14.5082,520.0,,,


In [None]:
df_pivot.reset_index(inplace=True)

df_pivot.head()  # Resets the index of the dataset

indicator,Year,Country Code,Country Name,Region,Income Group,Lending Type,EG.FEC.RNEW.ZS,EN.ATM.CO2E.KT,SH.DTH.COMM.ZS,SH.DTH.NCOM.ZS,SP.DYN.LE00.IN
0,2000,ABW,Aruba,Latin America & Caribbean,High income,Not classified,0.1753,,,,73.787
1,2000,AFG,Afghanistan,South Asia,Low income,IDA,54.243198,770.0,59.81465,32.009763,55.841
2,2000,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,73.441101,12370.0,72.839091,20.208854,46.522
3,2000,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,41.445,3170.0,8.109352,84.142346,73.955
4,2000,AND,Andorra,Europe & Central Asia,High income,Not classified,14.5082,520.0,,,
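
The long-to-wide reshaping above can be illustrated on a tiny table. This is a sketch on invented values (the country codes `AAA` and `BBB` are placeholders), not the notebook's data. One point worth checking is that `pivot_table` aggregates with the mean by default, so duplicate rows for the same year, country, and indicator would be averaged silently.

```python
import pandas as pd

# Toy long-format data: one row per (year, country, indicator). Values are invented.
long_df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000],
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "indicator": ["EN.ATM.CO2E.KT", "SP.DYN.LE00.IN",
                  "EN.ATM.CO2E.KT", "SP.DYN.LE00.IN"],
    "value": [770.0, 55.8, 3170.0, 74.0],
})

# Count duplicate keys before pivoting; any duplicates would be averaged.
n_duplicates = int(
    long_df.duplicated(subset=["Year", "Country Code", "indicator"]).sum()
)

# Reshape so each indicator becomes its own column, then flatten the index.
wide = long_df.pivot_table(
    index=["Year", "Country Code"],
    columns="indicator",
    values="value",
).reset_index()
wide.columns.name = None  # drop the leftover "indicator" axis name

print(n_duplicates)
print(wide)
```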


### **Worldwide CO<sub>2</sub> emission distribution in 2000 and 2015**

In [None]:
df_2000 = df_pivot.query("Year == 2000")

df_2000.head()  # Queries the dataset to provide data exclusively from the year 2000

indicator,Year,Country Code,Country Name,Region,Income Group,Lending Type,EG.FEC.RNEW.ZS,EN.ATM.CO2E.KT,SH.DTH.COMM.ZS,SH.DTH.NCOM.ZS,SP.DYN.LE00.IN
0,2000,ABW,Aruba,Latin America & Caribbean,High income,Not classified,0.1753,,,,73.787
1,2000,AFG,Afghanistan,South Asia,Low income,IDA,54.243198,770.0,59.81465,32.009763,55.841
2,2000,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,73.441101,12370.0,72.839091,20.208854,46.522
3,2000,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,41.445,3170.0,8.109352,84.142346,73.955
4,2000,AND,Andorra,Europe & Central Asia,High income,Not classified,14.5082,520.0,,,


In [None]:
fig = px.histogram(
    df_2000, 
    x="EN.ATM.CO2E.KT",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of CO2 emissions, 2000',
    template="plotly_dark"
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of CO2 emissions worldwide in the year 2000

In [None]:
df_2015 = df_pivot.query("Year == 2015")

df_2015.head()  # Queries the dataset to provide data exclusively from the year 2015

indicator,Year,Country Code,Country Name,Region,Income Group,Lending Type,EG.FEC.RNEW.ZS,EN.ATM.CO2E.KT,SH.DTH.COMM.ZS,SH.DTH.NCOM.ZS,SP.DYN.LE00.IN
3226,2015,ABW,Aruba,Latin America & Caribbean,High income,Not classified,6.728,,,,75.725
3227,2015,AFG,Afghanistan,South Asia,Low income,IDA,20.2738,7990.0,39.390981,44.219053,63.377
3228,2015,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,47.815601,35160.0,61.575382,29.938442,59.398
3229,2015,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,38.625599,5070.0,2.955382,92.917806,78.025
3230,2015,AND,Andorra,Europe & Central Asia,High income,Not classified,19.2777,470.0,,,


In [None]:
fig = px.histogram(
    df_2015, 
    x="EN.ATM.CO2E.KT",
    nbins=20,
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of CO2 emissions, 2015',
    template="plotly_dark"
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of CO2 emissions worldwide in the year 2015

### **Non-communicable disease mortality vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.NCOM.ZS",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SH.DTH.NCOM.ZS":"2000 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs CO2 emissions, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and non-communicable disease mortality data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.NCOM.ZS",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SH.DTH.NCOM.ZS":"2015 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs CO2 emissions, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and non-communicable disease mortality data from the year 2015

### **Communicable disease mortality vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.COMM.ZS",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SH.DTH.COMM.ZS":"2000 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs CO2 emissions, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and communicable disease mortality data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.COMM.ZS",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SH.DTH.COMM.ZS":"2015 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs CO2 emissions, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and communicable disease mortality data from the year 2015

### **Life expectancy vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EN.ATM.CO2E.KT",
    y = "SP.DYN.LE00.IN",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SP.DYN.LE00.IN":"2000 Life expectancy at birth, total (years)"},
    height=800,
    title='Life expectancy vs CO2 emissions, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and life expectancy data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EN.ATM.CO2E.KT",
    y = "SP.DYN.LE00.IN",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SP.DYN.LE00.IN":"2015 Life expectancy at birth, total (years)"},
    height=800,
    title='Life expectancy vs CO2 emissions, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and life expectancy data from the year 2015

- Comparing the worldwide distributions of CO<sub>2</sub> emissions in 2000 and 2015, the right skew became more pronounced in 2015, mainly because of the increase in China's emissions. Aside from China, however, the spread among high-emitting countries narrowed, and fewer countries were emitting at a high level than in 2000.
- The p-values and correlation coefficients for these comparisons varied by year.
  - For the non-communicable disease mortality comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.016) but not in 2015 (p = 0.063). The correlation was weak in both years (r = 0.179 in 2000 and r = 0.138 in 2015).
  - For the communicable disease mortality comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.020) but not in 2015 (p = 0.068). The correlation was weak in both years (r = -0.173 in 2000 and r = -0.134 in 2015).
  - For the life expectancy comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.024) but not in 2015 (p = 0.101). The correlation was weak in both years (r = 0.164 in 2000 and r = 0.122 in 2015).
- Overall, the data showed little correlation between CO<sub>2</sub> emissions and disease mortality or life expectancy in either 2000 or 2015. Although weak, the relationship was statistically significant for all three comparisons in the 2000 data, but for none of them in the 2015 data.
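
The notebook does not show how the r and p values were obtained. A minimal sketch of one way to compute them, assuming SciPy is available, is given here. `correlation_table` is a hypothetical helper and the toy values are invented; on the real data it could be called with `df_2000` or `df_2015` in place of `toy`.

```python
import pandas as pd
from scipy import stats


def correlation_table(frame, predictor, outcomes):
    """Pearson r, two-sided p-value, and paired sample size for each outcome."""
    rows = []
    for outcome in outcomes:
        # Keep only rows where both values are present, as a regression would.
        pair = frame[[predictor, outcome]].dropna()
        r, p = stats.pearsonr(pair[predictor], pair[outcome])
        rows.append({"outcome": outcome, "n": len(pair), "r": r, "p": p})
    return pd.DataFrame(rows)


# Invented values with a few gaps, mimicking missing country data.
toy = pd.DataFrame({
    "EN.ATM.CO2E.KT": [100.0, 200.0, 300.0, 400.0, None, 600.0],
    "SP.DYN.LE00.IN": [60.0, 62.0, 63.0, 66.0, 70.0, 70.0],
    "SH.DTH.NCOM.ZS": [80.0, 70.0, None, 55.0, 40.0, 30.0],
})

result = correlation_table(
    toy, "EN.ATM.CO2E.KT", ["SP.DYN.LE00.IN", "SH.DTH.NCOM.ZS"]
)
print(result)
```

Reporting `n` alongside r and p makes it clear how many countries each comparison actually rests on, since countries with a missing value in either column are dropped.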

## **3. How does renewable energy consumption relate to life expectancy and mortality rates from communicable and non-communicable disease?**

### **Worldwide renewable energy consumption distribution in 2000 and 2015**

In [None]:
fig = px.histogram(
    df_2000, 
    x="EG.FEC.RNEW.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of renewable energy consumption, 2000',
    template="plotly_dark"
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of renewable energy consumption worldwide in the year 2000

In [None]:
fig = px.histogram(
    df_2015, 
    x="EG.FEC.RNEW.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of renewable energy consumption, 2015',
    template="plotly_dark"
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of renewable energy consumption worldwide in the year 2015

### **Non-communicable disease mortality vs Renewable energy consumption in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.NCOM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.NCOM.ZS":"2000 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs Renewable energy consumption, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and non-communicable disease mortality data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.NCOM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.NCOM.ZS":"2015 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs Renewable energy consumption, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and non-communicable disease mortality data from the year 2015

### **Communicable disease mortality vs Renewable energy consumption in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.COMM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.COMM.ZS":"2000 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs Renewable energy consumption, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and communicable disease mortality data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.COMM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.COMM.ZS":"2015 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs Renewable energy consumption, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and communicable disease mortality data from the year 2015

### **Life expectancy vs Renewable energy consumption in 2000 and 2015, with regression line**

In [None]:
fig = px.scatter(
    df_2000,
    x = "EG.FEC.RNEW.ZS",
    y = "SP.DYN.LE00.IN",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SP.DYN.LE00.IN":"2000 Life expectancy at birth, total (years)"},
    height=800,
    title='Life expectancy vs Renewable energy consumption, 2000',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and life expectancy data from the year 2000

In [None]:
fig = px.scatter(
    df_2015,
    x = "EG.FEC.RNEW.ZS",
    y = "SP.DYN.LE00.IN",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SP.DYN.LE00.IN":"2015 Life expectancy at birth, total (years)"},
    height=800,
    title='Life expectancy vs Renewable energy consumption, 2015',
    trendline="ols",
    template="plotly_dark"
)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and life expectancy data from the year 2015

- Comparing the worldwide distributions of renewable energy consumption in 2000 and 2015, a number of countries increased their renewable share over time, producing a more even but still right-skewed distribution. Most of the shift occurred within the 0%-30% bins rather than the 40%-100% bins, which were more evenly populated apart from the 70%-80% and 100% bins.
- The p-values and correlation coefficients for these comparisons showed strong connections.
  - For the non-communicable disease mortality comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderately strong negative correlation (r = -0.757 in 2000 and r = -0.666 in 2015).
  - For the communicable disease mortality comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderately strong positive correlation (r = 0.771 in 2000 and r = 0.713 in 2015).
  - For the life expectancy comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderately strong negative correlation (r = -0.748 in 2000 and r = -0.636 in 2015).
- Overall, every comparison between renewable energy consumption and disease mortality or life expectancy was statistically significant. Countries with higher renewable energy consumption tended to have lower non-communicable disease mortality and lower life expectancies, while the opposite was true for communicable disease mortality.

## **4. Conclusions**
- The worldwide distributions of CO<sub>2</sub> emissions and renewable energy consumption both changed noticeably between 2000 and 2015.
- For CO<sub>2</sub> emissions, the disease mortality and life expectancy comparisons were statistically significant in 2000 but not in 2015, a change that coincided with the shift in the emissions distribution. For renewable energy consumption, the change in distribution over time did not affect statistical significance: every comparison was significant in both years.
- The correlation between CO<sub>2</sub> emissions and disease mortality or life expectancy was not strong in any of the data sets examined.
- The correlation with renewable energy consumption was moderately strong in all data sets examined, and its direction was somewhat surprising: countries with higher renewable energy consumption had lower non-communicable disease mortality and lower life expectancy, and higher communicable disease mortality.
- Some tentative observations can be drawn from the overall results.
  - Total CO<sub>2</sub> emissions do not appear to be an adequate measure of the harmful effects of non-renewable resource consumption in relation to disease mortality and life expectancy. Better statistics for this analysis would likely be fossil fuel consumption or non-renewable resource consumption.
  - Renewable energy consumption can be informative when analyzing the disease mortality and life expectancy of a country. However, the correlation is not strong enough to support conclusions on its own, so further research on other environmental and economic factors that determine disease mortality and life expectancy is necessary.
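
One possible starting point for that further research is to recompute the correlation within each income group, since the data set already carries an `Income Group` column. The sketch below uses invented values and a hypothetical helper, `r_by_group`. It only illustrates how a pooled correlation can differ from the within-group ones and makes no claim about what the real data would show.

```python
import pandas as pd


def r_by_group(frame, x, y, group_col="Income Group"):
    """Pearson r between x and y within each group, with paired sample sizes."""
    paired = frame[[group_col, x, y]].dropna()
    rows = []
    for name, grp in paired.groupby(group_col):
        rows.append({group_col: name, "n": len(grp), "r": grp[x].corr(grp[y])})
    return pd.DataFrame(rows)


# Invented values: within each group the relationship is positive,
# but pooling the groups produces a negative correlation.
toy = pd.DataFrame({
    "Income Group": ["Low income"] * 3 + ["High income"] * 3,
    "EG.FEC.RNEW.ZS": [70.0, 80.0, 90.0, 10.0, 20.0, 30.0],
    "SP.DYN.LE00.IN": [50.0, 52.0, 54.0, 76.0, 78.0, 80.0],
})

pooled_r = toy["EG.FEC.RNEW.ZS"].corr(toy["SP.DYN.LE00.IN"])
by_group = r_by_group(toy, "EG.FEC.RNEW.ZS", "SP.DYN.LE00.IN")

print(round(pooled_r, 3))
print(by_group)
```

If the within-group correlations on the real data were much weaker than the pooled ones, that would suggest income level accounts for part of the relationship seen in the scatterplots.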