<a href="https://colab.research.google.com/github/DanB1421/world_development_explorer_final/blob/main/wdx_analysis_part_B_draft.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **The effect of worldwide energy consumption on disease mortality and life expectancy from 2000 to 2015** 

*Daniel Brilliant*

*Data Science Master's Student, UMBC*

*April 1, 2022*




Human civilization is defined by decisions that produce both beneficial and harmful outcomes. Industrialization is one of the most consequential developments that falls into both categories. It has made processes such as manufacturing and transportation much easier than before, but not without a cost. The processes introduced during the Industrial Revolution are often fueled by non-renewable resources such as coal, oil, and natural gas, collectively known as fossil fuels. The aftereffects have included changes to air quality, water purity, and the ozone layer, associated with increased emissions of carbon dioxide (CO<sub>2</sub>).

CO<sub>2</sub> emissions also have a direct impact on quality of life, whether through breathing polluted air, prolonged exposure to carcinogens, or reduced access to drinkable water. The most direct way to combat rising CO<sub>2</sub> emissions is to increase the use of renewable energy. Renewable sources have fewer harmful environmental aftereffects because they produce energy without emitting harmful gases. Examples include solar power, wind power, and hydro (water) power.

The direct results of environmental changes from energy usage can be measured with a variety of statistics, but two with clear implications for human life are mortality rates from disease and life expectancy. There are two major questions we can ask about these relationships:
  1. How do CO<sub>2</sub> emissions relate to both mortality rates from disease and life expectancy at birth? 
  2. How does renewable energy consumption relate to both mortality rates from disease and life expectancy at birth? 

These questions can be addressed by analyzing the statistical significance and the strength of the correlation between each energy measure (CO<sub>2</sub> emissions and renewable energy consumption) and each health measure (disease mortality and life expectancy at birth) worldwide in the years 2000 and 2015.

## **1. Analysis Strategy and Approach**

- **Data Source:** World Development Explorer ([worlddev.xyz](https://))
- **Countries Analyzed:** Comoros, Denmark, Finland, Hungary, Iceland, Moldova, North Korea (referred to here as Democratic People's Republic of Korea), Sweden, Uruguay, and Zimbabwe
- **Timespan of Data:** 2000-2015
- **Topics & Indicators:**
  - **Environment- Renewable energy consumption (% of total final energy consumption):** the share of renewable energy in the total final energy consumption.
  - **Environment- CO<sub>2</sub> emissions (kt):** emissions of carbon dioxide stemming from the burning of fossil fuels and the manufacture of cement. These can include CO<sub>2</sub> produced during consumption of solid, liquid, and gas fuels and gas flaring.
  - **Health- Cause of death, by non-communicable diseases (% of total):** Cause of death is the share of all deaths at all ages due to underlying causes. Non-communicable diseases include cancer, diabetes mellitus, cardiovascular diseases, digestive diseases, skin diseases, musculoskeletal diseases, and congenital anomalies.
  - **Health- Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total):** Cause of death is the share of all deaths at all ages due to underlying causes. Communicable diseases and maternal, prenatal, and nutrition conditions include infectious and parasitic diseases, respiratory infections, and nutritional deficiencies such as underweight and stunting.
  - **Health- Life expectancy at birth, total (years):** The number of years a newborn infant would live if prevailing mortality patterns at time of birth remain the same throughout life.

## **2. How do CO<sub>2</sub> emissions relate to life expectancy and mortality rates from communicable and non-communicable disease?**

In [124]:
import pandas as pd
import plotly
import plotly.express as px

DATA_URL = "https://raw.githubusercontent.com/DanB1421/world_development_explorer_final/main/wdi_data_DB_4_1.csv"  # URL of the data downloaded from World Development Explorer and stored on GitHub

In [125]:
df = pd.read_csv(DATA_URL, index_col=0)

df.sample(5)  # Creates a data frame from World Development Explorer data and displays a sample of 5 rows in the set

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
7912,2012,7.2652,EG.FEC.RNEW.ZS,PYF,French Polynesia,East Asia & Pacific,High income,Not classified
4025,2002,127740.0,EN.ATM.CO2E.KT,EGY,"Egypt, Arab Rep.",Middle East & North Africa,Lower middle income,IBRD
6595,2015,8.228408,SH.DTH.COMM.ZS,MNG,Mongolia,East Asia & Pacific,Lower middle income,IBRD
1027,2008,75.092,SP.DYN.LE00.IN,PYF,French Polynesia,East Asia & Pacific,High income,Not classified
10540,2015,87.990296,SH.DTH.NCOM.ZS,MUS,Mauritius,Sub-Saharan Africa,Upper middle income,IBRD
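
The sample above shows that the data is stored in long format, with one row per country, year, and indicator. As a minimal sketch of how that layout could be reshaped so that each indicator becomes its own column, the block below pivots a small toy frame. The country codes `AAA` and `BBB` and all of the values are made up for illustration; only the column names and indicator codes come from the output above.

```python
import pandas as pd

# Toy long-format rows that mimic the columns shown above (codes and values are made up).
long_df = pd.DataFrame(
    {
        "Year": [2000, 2000, 2000, 2000],
        "indicator": ["EN.ATM.CO2E.KT", "SP.DYN.LE00.IN", "EN.ATM.CO2E.KT", "SP.DYN.LE00.IN"],
        "Country Code": ["AAA", "AAA", "BBB", "BBB"],
        "value": [100.0, 70.0, 250.0, 65.0],
    }
)

# One row per country and year, one column per indicator code.
wide_df = long_df.pivot_table(
    index=["Country Code", "Year"],
    columns="indicator",
    values="value",
).reset_index()
wide_df.columns.name = None  # drop the leftover "indicator" axis label

print(wide_df)
```

In a wide frame like this, countries that lack a value for an indicator show up as `NaN` in that column, which makes missing data easy to spot before plotting.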


### **Worldwide CO<sub>2</sub> emission distribution in 2000 and 2015**

In [292]:
df_2000_co2 = df.query("Year == 2000").query("indicator == 'EN.ATM.CO2E.KT'")
df_2000_co2.rename(columns = {"value":"EN.ATM.CO2E.KT"}, inplace=True)
df_2000_co2.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_co2.reset_index(drop=True,inplace=True)

df_2000_co2.tail()  # Cleanses CO2 emission data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
186,2000,134390.0,VEN,"Venezuela, RB"
187,2000,50310.0,VNM,Vietnam
188,2000,13890.0,YEM,"Yemen, Rep."
189,2000,1810.0,ZMB,Zambia
190,2000,13700.0,ZWE,Zimbabwe
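
The same four cleansing steps (filter, rename, drop, reset index) are repeated for every indicator and year in this notebook. One possible way to factor them out is a small helper, sketched below. The function name `indicator_for_year` is hypothetical and not part of the original notebook, and the `Region`, `Income Group`, and `Lending Type` values in the toy frame are placeholders. Taking an explicit `.copy()` of the filtered rows is a common way to avoid pandas' chained-assignment warnings.

```python
import pandas as pd

def indicator_for_year(df, indicator, year):
    """Return one indicator for one year, with the value column named after the indicator.

    Hypothetical helper: it mirrors the filter / rename / drop / reset steps used above.
    """
    # Work on an explicit copy so later changes never touch the source frame.
    subset = df[(df["Year"] == year) & (df["indicator"] == indicator)].copy()
    subset = subset.rename(columns={"value": indicator})
    subset = subset.drop(columns=["indicator", "Region", "Income Group", "Lending Type"])
    return subset.reset_index(drop=True)

# Toy frame with the same columns as the real data (metadata values are placeholders).
raw = pd.DataFrame(
    {
        "Year": [2000, 2000, 2015],
        "value": [13700.0, 44.649, 12400.0],
        "indicator": ["EN.ATM.CO2E.KT", "SP.DYN.LE00.IN", "EN.ATM.CO2E.KT"],
        "Country Code": ["ZWE", "ZWE", "ZWE"],
        "Country Name": ["Zimbabwe", "Zimbabwe", "Zimbabwe"],
        "Region": ["placeholder"] * 3,
        "Income Group": ["placeholder"] * 3,
        "Lending Type": ["placeholder"] * 3,
    }
)

co2_2000 = indicator_for_year(raw, "EN.ATM.CO2E.KT", 2000)
print(co2_2000)
```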


In [160]:
fig = px.histogram(
    df_2000_co2, 
    x="EN.ATM.CO2E.KT",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of CO2 emissions, 2000',
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of CO2 emissions worldwide in the year 2000

In [271]:
df_2015_co2 = df.query("Year == 2015").query("indicator == 'EN.ATM.CO2E.KT'")
df_2015_co2.rename(columns = {"value":"EN.ATM.CO2E.KT"}, inplace=True)
df_2015_co2.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_co2.reset_index(drop=True,inplace=True)

df_2015_co2.tail()  # Cleanses CO2 emission data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
186,2015,173880.0,VEN,"Venezuela, RB"
187,2015,209200.0,VNM,Vietnam
188,2015,14210.0,YEM,"Yemen, Rep."
189,2015,5070.0,ZMB,Zambia
190,2015,12400.0,ZWE,Zimbabwe


In [313]:
fig = px.histogram(
    df_2015_co2, 
    x="EN.ATM.CO2E.KT",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of CO2 emissions, 2015',
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of CO2 emissions worldwide in the year 2015
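
The two histograms are easier to compare with a number attached to their shape. As a rough sketch, the sample skewness of the emissions column could be computed with `Series.skew()`, where a positive value indicates a long right tail. The values below are made up to resemble a few small emitters and one very large one; they are not taken from the dataset.

```python
import pandas as pd

# Made-up emissions (kt): several small emitters and one very large one.
emissions = pd.Series([10.0, 12.0, 15.0, 20.0, 30.0, 5000.0])

skewness = emissions.skew()  # sample skewness; > 0 means right-skewed
print(f"skewness = {skewness:.2f}")
print(f"mean = {emissions.mean():.1f}, median = {emissions.median():.1f}")
```

On the real data, the same call on `df_2000_co2["EN.ATM.CO2E.KT"]` and `df_2015_co2["EN.ATM.CO2E.KT"]` would give one figure per year to compare.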

### **Non-communicable disease mortality vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [272]:
df_2000_ncd = df.query("Year == 2000").query("indicator == 'SH.DTH.NCOM.ZS'")
df_2000_ncd.rename(columns = {"value":"SH.DTH.NCOM.ZS"}, inplace=True)
df_2000_ncd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_ncd.reset_index(drop=True,inplace=True)
df_2000_ncd.tail()  # Cleanses non-communicable disease mortality data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.NCOM.ZS,Country Code,Country Name
178,2000,63.629964,VEN,"Venezuela, RB"
179,2000,72.908371,VNM,Vietnam
180,2000,41.257248,YEM,"Yemen, Rep."
181,2000,20.394113,ZMB,Zambia
182,2000,16.009386,ZWE,Zimbabwe


In [275]:
df_2000_co2.drop(labels=[3,48,99,108,120,131,159,177], axis=0, inplace=True)
df_2000_co2.reset_index(drop=True,inplace=True)
df_2000_co2.tail()  # Cleanses CO2 emission data from the year 2000 to remove rows that are not present in the non-communicable disease mortality data and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
178,2000,134390.0,VEN,"Venezuela, RB"
179,2000,50310.0,VNM,Vietnam
180,2000,13890.0,YEM,"Yemen, Rep."
181,2000,1810.0,ZMB,Zambia
182,2000,13700.0,ZWE,Zimbabwe


In [276]:
ncd_2000 = df_2000_ncd["SH.DTH.NCOM.ZS"]
df_2000_ncd_vs_co2 = df_2000_co2.join(ncd_2000)
df_2000_ncd_vs_co2.tail()  # Adds non-communicable disease mortality data column to the CO2 emission dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SH.DTH.NCOM.ZS
178,2000,134390.0,VEN,"Venezuela, RB",63.629964
179,2000,50310.0,VNM,Vietnam,72.908371
180,2000,13890.0,YEM,"Yemen, Rep.",41.257248
181,2000,1810.0,ZMB,Zambia,20.394113
182,2000,13700.0,ZWE,Zimbabwe,16.009386
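
The join above relies on both frames having exactly the same countries in exactly the same row order after rows are dropped by position. A more defensive alternative, sketched here under the assumption that `Country Code` uniquely identifies a row within one indicator and year, is to merge on that key. The three real rows reuse values from the outputs above; the `XXX` row is a made-up country with no mortality value.

```python
import pandas as pd

# CO2 emissions for 2000; "XXX" is a made-up country with no mortality data.
co2 = pd.DataFrame(
    {
        "Country Code": ["VEN", "VNM", "XXX", "YEM"],
        "EN.ATM.CO2E.KT": [134390.0, 50310.0, 999.0, 13890.0],
    }
)

# Mortality data, deliberately in a different row order.
ncd = pd.DataFrame(
    {
        "Country Code": ["YEM", "VNM", "VEN"],
        "SH.DTH.NCOM.ZS": [41.257248, 72.908371, 63.629964],
    }
)

# The inner merge keeps only countries present in both frames and aligns them by key.
# validate="one_to_one" raises if a country code is duplicated on either side.
merged = co2.merge(ncd, on="Country Code", how="inner", validate="one_to_one")
print(merged)
```

Because rows are matched by key, no hard-coded row positions are needed, and a change in the source data cannot silently pair one country's emissions with another country's mortality rate.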


In [182]:
fig = px.scatter(
    df_2000_ncd_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.NCOM.ZS",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SH.DTH.NCOM.ZS":"2000 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs CO2 emissions, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and non-communicable disease mortality data from the year 2000





In [277]:
df_2015_ncd = df.query("Year == 2015").query("indicator == 'SH.DTH.NCOM.ZS'")
df_2015_ncd.rename(columns = {"value":"SH.DTH.NCOM.ZS"}, inplace=True)
df_2015_ncd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_ncd.reset_index(drop=True,inplace=True)
df_2015_ncd.tail()  # Cleanses non-communicable disease mortality data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.NCOM.ZS,Country Code,Country Name
178,2015,66.362497,VEN,"Venezuela, RB"
179,2015,80.407434,VNM,Vietnam
180,2015,50.036798,YEM,"Yemen, Rep."
181,2015,33.197525,ZMB,Zambia
182,2015,36.946549,ZWE,Zimbabwe


In [278]:
df_2015_co2.drop(labels=[3,48,99,108,120,131,159,177], axis=0, inplace=True)
df_2015_co2.reset_index(drop=True,inplace=True)
df_2015_co2.tail()  # Cleanses CO2 emission data from the year 2015 to remove rows that are not present in the non-communicable disease mortality data and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
178,2015,173880.0,VEN,"Venezuela, RB"
179,2015,209200.0,VNM,Vietnam
180,2015,14210.0,YEM,"Yemen, Rep."
181,2015,5070.0,ZMB,Zambia
182,2015,12400.0,ZWE,Zimbabwe


In [279]:
ncd_2015 = df_2015_ncd["SH.DTH.NCOM.ZS"]
df_2015_ncd_vs_co2 = df_2015_co2.join(ncd_2015)
df_2015_ncd_vs_co2.tail()  # Adds non-communicable disease mortality data column to the CO2 emission dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SH.DTH.NCOM.ZS
178,2015,173880.0,VEN,"Venezuela, RB",66.362497
179,2015,209200.0,VNM,Vietnam,80.407434
180,2015,14210.0,YEM,"Yemen, Rep.",50.036798
181,2015,5070.0,ZMB,Zambia,33.197525
182,2015,12400.0,ZWE,Zimbabwe,36.946549


In [198]:
fig = px.scatter(
    df_2015_ncd_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.NCOM.ZS",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SH.DTH.NCOM.ZS":"2015 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs CO2 emissions, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and non-communicable disease mortality data from the year 2015

### **Communicable disease mortality vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [280]:
df_2000_cd = df.query("Year == 2000").query("indicator == 'SH.DTH.COMM.ZS'")
df_2000_cd.rename(columns = {"value":"SH.DTH.COMM.ZS"}, inplace=True)
df_2000_cd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_cd.reset_index(drop=True,inplace=True)
df_2000_cd.tail()  # Cleanses communicable disease mortality data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.COMM.ZS,Country Code,Country Name
178,2000,16.024431,VEN,"Venezuela, RB"
179,2000,18.126715,VNM,Vietnam
180,2000,49.973732,YEM,"Yemen, Rep."
181,2000,74.262997,ZMB,Zambia
182,2000,78.637109,ZWE,Zimbabwe


In [281]:
df_2000_co2.tail()  # Displays previously cleansed CO2 emission data from the year 2000, as the data is indexed the same for comparison with communicable disease data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
178,2000,134390.0,VEN,"Venezuela, RB"
179,2000,50310.0,VNM,Vietnam
180,2000,13890.0,YEM,"Yemen, Rep."
181,2000,1810.0,ZMB,Zambia
182,2000,13700.0,ZWE,Zimbabwe


In [282]:
cd_2000 = df_2000_cd["SH.DTH.COMM.ZS"]
df_2000_cd_vs_co2 = df_2000_co2.join(cd_2000)
df_2000_cd_vs_co2.tail()  # Adds communicable disease mortality data column to the CO2 emission dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SH.DTH.COMM.ZS
178,2000,134390.0,VEN,"Venezuela, RB",16.024431
179,2000,50310.0,VNM,Vietnam,18.126715
180,2000,13890.0,YEM,"Yemen, Rep.",49.973732
181,2000,1810.0,ZMB,Zambia,74.262997
182,2000,13700.0,ZWE,Zimbabwe,78.637109


In [208]:
fig = px.scatter(
    df_2000_cd_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.COMM.ZS",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SH.DTH.COMM.ZS":"2000 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs CO2 emissions, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and communicable disease mortality data from the year 2000

In [283]:
df_2015_cd = df.query("Year == 2015").query("indicator == 'SH.DTH.COMM.ZS'")
df_2015_cd.rename(columns = {"value":"SH.DTH.COMM.ZS"}, inplace=True)
df_2015_cd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_cd.reset_index(drop=True,inplace=True)
df_2015_cd.tail()  # Cleanses communicable disease mortality data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.COMM.ZS,Country Code,Country Name
178,2015,12.972305,VEN,"Venezuela, RB"
179,2015,10.755655,VNM,Vietnam
180,2015,32.104167,YEM,"Yemen, Rep."
181,2015,58.314607,ZMB,Zambia
182,2015,51.247909,ZWE,Zimbabwe


In [284]:
df_2015_co2.tail()  # Displays previously cleansed CO2 emission data from the year 2015, as the data is indexed the same for comparison with communicable disease data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
178,2015,173880.0,VEN,"Venezuela, RB"
179,2015,209200.0,VNM,Vietnam
180,2015,14210.0,YEM,"Yemen, Rep."
181,2015,5070.0,ZMB,Zambia
182,2015,12400.0,ZWE,Zimbabwe


In [285]:
cd_2015 = df_2015_cd["SH.DTH.COMM.ZS"]
df_2015_cd_vs_co2 = df_2015_co2.join(cd_2015)
df_2015_cd_vs_co2.tail()  # Adds communicable disease mortality data column to the CO2 emission dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SH.DTH.COMM.ZS
178,2015,173880.0,VEN,"Venezuela, RB",12.972305
179,2015,209200.0,VNM,Vietnam,10.755655
180,2015,14210.0,YEM,"Yemen, Rep.",32.104167
181,2015,5070.0,ZMB,Zambia,58.314607
182,2015,12400.0,ZWE,Zimbabwe,51.247909


In [212]:
fig = px.scatter(
    df_2015_cd_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SH.DTH.COMM.ZS",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SH.DTH.COMM.ZS":"2015 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs CO2 emissions, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and communicable disease mortality data from the year 2015

### **Life expectancy vs CO<sub>2</sub> emissions in 2000 and 2015, with regression line**

In [286]:
df_2000_le = df.query("Year == 2000").query("indicator == 'SP.DYN.LE00.IN'")
df_2000_le.rename(columns = {"value":"SP.DYN.LE00.IN"}, inplace=True)
df_2000_le.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_le.reset_index(drop=True,inplace=True)
df_2000_le.drop(labels=[7,19,35,59,63,70,79,97,109,130,147,169,196,197], axis=0, inplace=True)
df_2000_le.reset_index(drop=True,inplace=True)

df_2000_le.tail()  # Cleanses life expectancy data from the year 2000 to remove unnecessary columns and rows not present in the CO2 emission data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,SP.DYN.LE00.IN,Country Code,Country Name
182,2000,72.112,VEN,"Venezuela, RB"
183,2000,73.025,VNM,Vietnam
184,2000,60.683,YEM,"Yemen, Rep."
185,2000,44.0,ZMB,Zambia
186,2000,44.649,ZWE,Zimbabwe


In [287]:
df_2000_co2 = df.query("Year == 2000").query("indicator == 'EN.ATM.CO2E.KT'")
df_2000_co2.rename(columns = {"value":"EN.ATM.CO2E.KT"}, inplace=True)
df_2000_co2.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_co2.reset_index(drop=True,inplace=True)
df_2000_co2.drop(labels=[3,48,120,159], axis=0, inplace=True)
df_2000_co2.reset_index(drop=True,inplace=True)

df_2000_co2.tail()  # Cleanses CO2 emission data from the year 2000 to remove unnecessary columns and rows that are not present in the life expectancy data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
182,2000,134390.0,VEN,"Venezuela, RB"
183,2000,50310.0,VNM,Vietnam
184,2000,13890.0,YEM,"Yemen, Rep."
185,2000,1810.0,ZMB,Zambia
186,2000,13700.0,ZWE,Zimbabwe


In [288]:
le_2000 = df_2000_le["SP.DYN.LE00.IN"]
df_2000_le_vs_co2 = df_2000_co2.join(le_2000)
df_2000_le_vs_co2.tail()  # Adds life expectancy data column to the CO2 emission dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SP.DYN.LE00.IN
182,2000,134390.0,VEN,"Venezuela, RB",72.112
183,2000,50310.0,VNM,Vietnam,73.025
184,2000,13890.0,YEM,"Yemen, Rep.",60.683
185,2000,1810.0,ZMB,Zambia,44.0
186,2000,13700.0,ZWE,Zimbabwe,44.649
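
In this comparison rows had to be removed from both frames, because each one contains countries that the other lacks. One way to audit which countries fall out on each side, sketched below with toy data, is an outer merge with `indicator=True`, which adds a `_merge` column marking every row as `left_only`, `right_only`, or `both`. The `ZMB` and `ZWE` values come from the outputs above; `XXX` and `YYY` are made-up codes.

```python
import pandas as pd

co2 = pd.DataFrame(
    {
        "Country Code": ["XXX", "ZMB", "ZWE"],  # "XXX" is made up: emissions only
        "EN.ATM.CO2E.KT": [500.0, 1810.0, 13700.0],
    }
)
le = pd.DataFrame(
    {
        "Country Code": ["YYY", "ZMB", "ZWE"],  # "YYY" is made up: life expectancy only
        "SP.DYN.LE00.IN": [70.0, 44.0, 44.649],
    }
)

# The outer merge keeps every country; the _merge column records where each row came from.
audit = co2.merge(le, on="Country Code", how="outer", indicator=True)

only_co2 = audit.loc[audit["_merge"] == "left_only", "Country Code"].tolist()
only_le = audit.loc[audit["_merge"] == "right_only", "Country Code"].tolist()
matched = audit[audit["_merge"] == "both"].drop(columns="_merge").reset_index(drop=True)

print("CO2 only:", only_co2)
print("Life expectancy only:", only_le)
print(matched)
```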


In [224]:
fig = px.scatter(
    df_2000_le_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SP.DYN.LE00.IN",
    labels={"EN.ATM.CO2E.KT":"2000 CO2 emissions (kt)", "SP.DYN.LE00.IN":"2000 Life expectancy at birth, total (years)"},
    height=800,  # differs somewhat from original chart
    title='Life expectancy vs CO2 emissions, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and life expectancy data from the year 2000

In [289]:
df_2015_le = df.query("Year == 2015").query("indicator == 'SP.DYN.LE00.IN'")
df_2015_le.rename(columns = {"value":"SP.DYN.LE00.IN"}, inplace=True)
df_2015_le.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_le.reset_index(drop=True,inplace=True)
df_2015_le.drop(labels=[7,19,35,59,63,70,79,97,109,130,147,169,196,197], axis=0, inplace=True)
df_2015_le.reset_index(drop=True,inplace=True)

df_2015_le.tail()  # Cleanses life expectancy data from the year 2015 to remove unnecessary columns and rows not present in the CO2 emission data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,SP.DYN.LE00.IN,Country Code,Country Name
182,2015,72.584,VEN,"Venezuela, RB"
183,2015,75.11,VNM,Vietnam
184,2015,66.085,YEM,"Yemen, Rep."
185,2015,61.737,ZMB,Zambia
186,2015,59.534,ZWE,Zimbabwe


In [290]:
df_2015_co2 = df.query("Year == 2015").query("indicator == 'EN.ATM.CO2E.KT'")
df_2015_co2.rename(columns = {"value":"EN.ATM.CO2E.KT"}, inplace=True)
df_2015_co2.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_co2.reset_index(drop=True,inplace=True)
df_2015_co2.drop(labels=[3,48,120,159], axis=0, inplace=True)
df_2015_co2.reset_index(drop=True,inplace=True)

df_2015_co2.tail()  # Cleanses CO2 emission data from the year 2015 to remove unnecessary columns and rows that are not present in the life expectancy data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name
182,2015,173880.0,VEN,"Venezuela, RB"
183,2015,209200.0,VNM,Vietnam
184,2015,14210.0,YEM,"Yemen, Rep."
185,2015,5070.0,ZMB,Zambia
186,2015,12400.0,ZWE,Zimbabwe


In [291]:
le_2015 = df_2015_le["SP.DYN.LE00.IN"]
df_2015_le_vs_co2 = df_2015_co2.join(le_2015)
df_2015_le_vs_co2.tail()  # Adds life expectancy data column to the CO2 emission dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EN.ATM.CO2E.KT,Country Code,Country Name,SP.DYN.LE00.IN
182,2015,173880.0,VEN,"Venezuela, RB",72.584
183,2015,209200.0,VNM,Vietnam,75.11
184,2015,14210.0,YEM,"Yemen, Rep.",66.085
185,2015,5070.0,ZMB,Zambia,61.737
186,2015,12400.0,ZWE,Zimbabwe,59.534


In [228]:
fig = px.scatter(
    df_2015_le_vs_co2,
    x = "EN.ATM.CO2E.KT",
    y = "SP.DYN.LE00.IN",
    labels={"EN.ATM.CO2E.KT":"2015 CO2 emissions (kt)", "SP.DYN.LE00.IN":"2015 Life expectancy at birth, total (years)"},
    height=800,  # differs somewhat from original chart
    title='Life expectancy vs CO2 emissions, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using CO2 emission and life expectancy data from the year 2015

- When looking at the worldwide distribution of CO<sub>2</sub> emissions in 2000 and 2015, the right skew increased in 2015, mainly because of the rise in China's emission level. However, aside from China, the spread of high-emitting countries decreased in 2015, and fewer countries were emitting at a high level than in 2000.
- The p-values and correlation coefficients for these comparisons varied by year.
  - For the non-communicable disease mortality comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.016) but not in 2015 (p = 0.063). Neither showed a high level of correlation (r = 0.179 in 2000 and r = 0.138 in 2015).
  - For the communicable disease mortality comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.020) but not in 2015 (p = 0.068). Neither showed a high level of correlation (r = -0.173 in 2000 and r = -0.134 in 2015).
  - For the life expectancy comparisons, the link to CO<sub>2</sub> emissions was statistically significant in 2000 (p = 0.024) but not in 2015 (p = 0.101). Neither showed a high level of correlation (r = 0.164 in 2000 and r = 0.122 in 2015).
- Overall, the data showed only weak correlation between CO<sub>2</sub> emissions and disease mortality or life expectancy in either 2000 or 2015. Despite the weak correlations, every comparison was statistically significant in the 2000 data, while none was in the 2015 data.
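
A weak correlation can still be statistically significant when the sample is large, because the p-value depends on both r and the number of observations n. As a sketch, the block below uses the standard t-test for a Pearson correlation, t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom, which for a single predictor matches the slope test of an OLS fit. The function name `p_value_from_r` is hypothetical, and n = 183 is an assumption inferred from the 0 to 182 index of the joined tables above, not a figure stated in the analysis.

```python
from math import sqrt

from scipy import stats

def p_value_from_r(r, n):
    """Two-sided p-value for a Pearson correlation r computed from n paired observations."""
    t_stat = r * sqrt((n - 2) / (1 - r**2))
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)

# r values reported above for non-communicable disease mortality vs CO2 emissions.
# n = 183 is an assumption based on the row index of the joined tables.
p_2000 = p_value_from_r(0.179, 183)
p_2015 = p_value_from_r(0.138, 183)

print(f"2000: p = {p_2000:.3f}")
print(f"2015: p = {p_2015:.3f}")
```

Under that assumption, both results should land close to the p-values listed above, on either side of a 0.05 threshold. This illustrates how a small drop in r is enough to move a comparison from significant to not significant when the correlation is weak to begin with.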

## **3. Does an increase in renewable energy consumption cause significant changes to mortality rates from disease and life expectancy?**

### **Worldwide renewable energy consumption distribution in 2000 and 2015**

In [293]:
df_2000_re = df.query("Year == 2000").query("indicator == 'EG.FEC.RNEW.ZS'")
df_2000_re.rename(columns = {"value":"EG.FEC.RNEW.ZS"}, inplace=True)
df_2000_re.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_re.reset_index(drop=True,inplace=True)

df_2000_re.tail()  # Cleanses renewable energy consumption data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
208,2000,0.0,VIR,Virgin Islands (U.S.)
209,2000,17.5249,PSE,West Bank and Gaza
210,2000,1.1513,YEM,"Yemen, Rep."
211,2000,89.919403,ZMB,Zambia
212,2000,69.259804,ZWE,Zimbabwe


In [230]:
fig = px.histogram(
    df_2000_re, 
    x="EG.FEC.RNEW.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of renewable energy consumption, 2000',
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of renewable energy consumption worldwide in the year 2000

In [294]:
df_2015_re = df.query("Year == 2015").query("indicator == 'EG.FEC.RNEW.ZS'")
df_2015_re.rename(columns = {"value":"EG.FEC.RNEW.ZS"}, inplace=True)
df_2015_re.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_re.reset_index(drop=True,inplace=True)

df_2015_re.tail()  # Cleanses renewable energy consumption data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
208,2015,4.1266,VIR,Virgin Islands (U.S.)
209,2015,10.9941,PSE,West Bank and Gaza
210,2015,2.4195,YEM,"Yemen, Rep."
211,2015,85.487,ZMB,Zambia
212,2015,81.4272,ZWE,Zimbabwe


In [232]:
fig = px.histogram(
    df_2015_re, 
    x="EG.FEC.RNEW.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)"},
    color="Country Name",
    height=800,
    title='Worldwide distribution of renewable energy consumption, 2015',
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates histogram to show distribution of renewable energy consumption worldwide in the year 2015

### **Non-communicable disease mortality vs Renewable energy consumption in 2000 and 2015, with regression line**

In [295]:
df_2000_ncd = df.query("Year == 2000").query("indicator == 'SH.DTH.NCOM.ZS'")
df_2000_ncd.rename(columns = {"value":"SH.DTH.NCOM.ZS"}, inplace=True)
df_2000_ncd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_ncd.reset_index(drop=True,inplace=True)
df_2000_ncd.tail()  # Cleanses non-communicable disease mortality data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.NCOM.ZS,Country Code,Country Name
178,2000,63.629964,VEN,"Venezuela, RB"
179,2000,72.908371,VNM,Vietnam
180,2000,41.257248,YEM,"Yemen, Rep."
181,2000,20.394113,ZMB,Zambia
182,2000,16.009386,ZWE,Zimbabwe


In [296]:
df_2000_re.drop(labels=[3,4,9,21,27,36,49,54,64,68,74,76,78,85,93,104,113,116,123,135,138,144,148,156,169,178,196,197,208,209], axis=0, inplace=True)
df_2000_re.reset_index(drop=True,inplace=True)
df_2000_re.tail()  # Cleanses renewable energy consumption data from the year 2000 to remove rows that are not present in the non-communicable disease mortality data and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
178,2000,15.2895,VEN,"Venezuela, RB"
179,2000,57.980301,VNM,Vietnam
180,2000,1.1513,YEM,"Yemen, Rep."
181,2000,89.919403,ZMB,Zambia
182,2000,69.259804,ZWE,Zimbabwe


In [297]:
ncd_2000 = df_2000_ncd["SH.DTH.NCOM.ZS"]
df_2000_ncd_vs_re = df_2000_re.join(ncd_2000)
df_2000_ncd_vs_re.tail()  # Adds non-communicable disease mortality data column to the renewable energy consumption dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SH.DTH.NCOM.ZS
178,2000,15.2895,VEN,"Venezuela, RB",63.629964
179,2000,57.980301,VNM,Vietnam,72.908371
180,2000,1.1513,YEM,"Yemen, Rep.",41.257248
181,2000,89.919403,ZMB,Zambia,20.394113
182,2000,69.259804,ZWE,Zimbabwe,16.009386


In [314]:
fig = px.scatter(
    df_2000_ncd_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.NCOM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.NCOM.ZS":"2000 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs Renewable energy consumption, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and non-communicable disease mortality data from the year 2000

In [298]:
df_2015_ncd = df.query("Year == 2015").query("indicator == 'SH.DTH.NCOM.ZS'")
df_2015_ncd.rename(columns = {"value":"SH.DTH.NCOM.ZS"}, inplace=True)
df_2015_ncd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_ncd.reset_index(drop=True,inplace=True)
df_2015_ncd.tail()  # Cleanses non-communicable disease mortality data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.NCOM.ZS,Country Code,Country Name
178,2015,66.362497,VEN,"Venezuela, RB"
179,2015,80.407434,VNM,Vietnam
180,2015,50.036798,YEM,"Yemen, Rep."
181,2015,33.197525,ZMB,Zambia
182,2015,36.946549,ZWE,Zimbabwe


In [299]:
df_2015_re.drop(labels=[3,4,9,21,27,36,49,54,64,68,74,76,78,85,93,104,113,116,123,135,138,144,148,156,169,178,196,197,208,209], axis=0, inplace=True)
df_2015_re.reset_index(drop=True,inplace=True)
df_2015_re.tail()  # Cleanses renewable energy consumption data from the year 2015 to remove rows that are not present in the non-communicable disease mortality data and reset the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
178,2015,15.3294,VEN,"Venezuela, RB"
179,2015,30.7199,VNM,Vietnam
180,2015,2.4195,YEM,"Yemen, Rep."
181,2015,85.487,ZMB,Zambia
182,2015,81.4272,ZWE,Zimbabwe


In [300]:
ncd_2015 = df_2015_ncd["SH.DTH.NCOM.ZS"]
df_2015_ncd_vs_re = df_2015_re.join(ncd_2015)
df_2015_ncd_vs_re.tail()  # Adds non-communicable disease mortality data column to the renewable energy consumption dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SH.DTH.NCOM.ZS
178,2015,15.3294,VEN,"Venezuela, RB",66.362497
179,2015,30.7199,VNM,Vietnam,80.407434
180,2015,2.4195,YEM,"Yemen, Rep.",50.036798
181,2015,85.487,ZMB,Zambia,33.197525
182,2015,81.4272,ZWE,Zimbabwe,36.946549


In [315]:
fig = px.scatter(
    df_2015_ncd_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.NCOM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.NCOM.ZS":"2015 Cause of death, by non-communicable diseases (% of total)"},
    height=800,
    title='Non-communicable disease mortality vs Renewable energy consumption, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and non-communicable disease mortality data from the year 2015

### **Communicable disease mortality vs Renewable energy consumption in 2000 and 2015, with regression line**

In [301]:
df_2000_cd = df.query("Year == 2000").query("indicator == 'SH.DTH.COMM.ZS'")
df_2000_cd.rename(columns = {"value":"SH.DTH.COMM.ZS"}, inplace=True)
df_2000_cd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_cd.reset_index(drop=True,inplace=True)
df_2000_cd.tail()  # Cleanses communicable disease mortality data from the year 2000 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.COMM.ZS,Country Code,Country Name
178,2000,16.024431,VEN,"Venezuela, RB"
179,2000,18.126715,VNM,Vietnam
180,2000,49.973732,YEM,"Yemen, Rep."
181,2000,74.262997,ZMB,Zambia
182,2000,78.637109,ZWE,Zimbabwe


In [302]:
df_2000_re.tail()  # Displays previously cleansed renewable energy consumption data from the year 2000, as the data is indexed the same for comparison with communicable disease data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
178,2000,15.2895,VEN,"Venezuela, RB"
179,2000,57.980301,VNM,Vietnam
180,2000,1.1513,YEM,"Yemen, Rep."
181,2000,89.919403,ZMB,Zambia
182,2000,69.259804,ZWE,Zimbabwe


In [303]:
cd_2000 = df_2000_cd["SH.DTH.COMM.ZS"]
df_2000_cd_vs_re = df_2000_re.join(cd_2000)
df_2000_cd_vs_re.tail()  # Adds communicable disease mortality data column to the renewable energy consumption dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SH.DTH.COMM.ZS
178,2000,15.2895,VEN,"Venezuela, RB",16.024431
179,2000,57.980301,VNM,Vietnam,18.126715
180,2000,1.1513,YEM,"Yemen, Rep.",49.973732
181,2000,89.919403,ZMB,Zambia,74.262997
182,2000,69.259804,ZWE,Zimbabwe,78.637109


In [316]:
fig = px.scatter(
    df_2000_cd_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.COMM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.COMM.ZS":"2000 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs Renewable energy consumption, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and communicable disease mortality data from the year 2000

In [304]:
df_2015_cd = df.query("Year == 2015").query("indicator == 'SH.DTH.COMM.ZS'")
df_2015_cd.rename(columns = {"value":"SH.DTH.COMM.ZS"}, inplace=True)
df_2015_cd.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_cd.reset_index(drop=True,inplace=True)
df_2015_cd.tail()  # Cleanses communicable disease mortality data from the year 2015 to remove unnecessary columns and reset the index, displays final 5 rows of data

Unnamed: 0,Year,SH.DTH.COMM.ZS,Country Code,Country Name
178,2015,12.972305,VEN,"Venezuela, RB"
179,2015,10.755655,VNM,Vietnam
180,2015,32.104167,YEM,"Yemen, Rep."
181,2015,58.314607,ZMB,Zambia
182,2015,51.247909,ZWE,Zimbabwe


In [305]:
df_2015_re.tail()  # Displays previously cleansed renewable energy consumption data from the year 2015, as the data is indexed the same for comparison with communicable disease data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
178,2015,15.3294,VEN,"Venezuela, RB"
179,2015,30.7199,VNM,Vietnam
180,2015,2.4195,YEM,"Yemen, Rep."
181,2015,85.487,ZMB,Zambia
182,2015,81.4272,ZWE,Zimbabwe


In [306]:
cd_2015 = df_2015_cd["SH.DTH.COMM.ZS"]
df_2015_cd_vs_re = df_2015_re.join(cd_2015)
df_2015_cd_vs_re.tail()  # Adds communicable disease mortality data column to the renewable energy consumption dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SH.DTH.COMM.ZS
178,2015,15.3294,VEN,"Venezuela, RB",12.972305
179,2015,30.7199,VNM,Vietnam,10.755655
180,2015,2.4195,YEM,"Yemen, Rep.",32.104167
181,2015,85.487,ZMB,Zambia,58.314607
182,2015,81.4272,ZWE,Zimbabwe,51.247909


In [317]:
fig = px.scatter(
    df_2015_cd_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SH.DTH.COMM.ZS",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SH.DTH.COMM.ZS":"2015 Cause of death, by communicable diseases and maternal, prenatal, and nutrition conditions (% of total)"},
    height=800,
    title='Communicable disease mortality vs Renewable energy consumption, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and communicable disease mortality data from the year 2015

### **Life expectancy vs Renewable energy consumption in 2000 and 2015, with regression line**

In [307]:
df_2000_le = df.query("Year == 2000").query("indicator == 'SP.DYN.LE00.IN'")
df_2000_le.rename(columns = {"value":"SP.DYN.LE00.IN"}, inplace=True)
df_2000_le.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_le.reset_index(drop=True,inplace=True)
df_2000_le.drop(labels=[35,169], axis=0, inplace=True)
df_2000_le.reset_index(drop=True,inplace=True)

df_2000_le.tail()  # Cleanses life expectancy data from the year 2000 to remove unnecessary columns and rows not present in the renewable energy consumption data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,SP.DYN.LE00.IN,Country Code,Country Name
194,2000,76.619512,VIR,Virgin Islands (U.S.)
195,2000,71.022,PSE,West Bank and Gaza
196,2000,60.683,YEM,"Yemen, Rep."
197,2000,44.0,ZMB,Zambia
198,2000,44.649,ZWE,Zimbabwe


In [308]:
df_2000_re = df.query("Year == 2000").query("indicator == 'EG.FEC.RNEW.ZS'")
df_2000_re.rename(columns = {"value":"EG.FEC.RNEW.ZS"}, inplace=True)
df_2000_re.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2000_re.reset_index(drop=True,inplace=True)
df_2000_re.drop(labels=[3,4,27,36,49,54,74,93,135,144,169,178,196,197], axis=0, inplace=True)
df_2000_re.reset_index(drop=True,inplace=True)

df_2000_re.tail()  # Cleanses renewable energy consumption data from the year 2000 to remove unnecessary columns and rows that are not present in the life expectancy data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
194,2000,0.0,VIR,Virgin Islands (U.S.)
195,2000,17.5249,PSE,West Bank and Gaza
196,2000,1.1513,YEM,"Yemen, Rep."
197,2000,89.919403,ZMB,Zambia
198,2000,69.259804,ZWE,Zimbabwe


In [309]:
le_2000 = df_2000_le["SP.DYN.LE00.IN"]
df_2000_le_vs_re = df_2000_re.join(le_2000)
df_2000_le_vs_re.tail()  # Adds life expectancy data column to the renewable energy consumption dataset from 2000 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SP.DYN.LE00.IN
194,2000,0.0,VIR,Virgin Islands (U.S.),76.619512
195,2000,17.5249,PSE,West Bank and Gaza,71.022
196,2000,1.1513,YEM,"Yemen, Rep.",60.683
197,2000,89.919403,ZMB,Zambia,44.0
198,2000,69.259804,ZWE,Zimbabwe,44.649


In [318]:
fig = px.scatter(
    df_2000_le_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SP.DYN.LE00.IN",
    labels={"EG.FEC.RNEW.ZS":"2000 Renewable energy consumption (% of total final energy consumption)", "SP.DYN.LE00.IN":"2000 Life expectancy at birth, total (years)"},
    height=800,
    title='Life expectancy vs Renewable energy consumption, 2000',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and life expectancy data from the year 2000

In [310]:
df_2015_le = df.query("Year == 2015").query("indicator == 'SP.DYN.LE00.IN'")
df_2015_le.rename(columns = {"value":"SP.DYN.LE00.IN"}, inplace=True)
df_2015_le.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_le.reset_index(drop=True,inplace=True)
df_2015_le.drop(labels=[35,169], axis=0, inplace=True)
df_2015_le.reset_index(drop=True,inplace=True)

df_2015_le.tail()  # Cleanses life expectancy data from the year 2015 to remove unnecessary columns and rows not present in the renewable energy consumption data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,SP.DYN.LE00.IN,Country Code,Country Name
194,2015,79.017073,VIR,Virgin Islands (U.S.)
195,2015,73.442,PSE,West Bank and Gaza
196,2015,66.085,YEM,"Yemen, Rep."
197,2015,61.737,ZMB,Zambia
198,2015,59.534,ZWE,Zimbabwe


In [311]:
df_2015_re = df.query("Year == 2015").query("indicator == 'EG.FEC.RNEW.ZS'")
df_2015_re.rename(columns = {"value":"EG.FEC.RNEW.ZS"}, inplace=True)
df_2015_re.drop(['indicator','Region', 'Income Group', 'Lending Type'], axis = 1, inplace=True)
df_2015_re.reset_index(drop=True,inplace=True)
df_2015_re.drop(labels=[3,4,27,36,49,54,74,93,135,144,169,178,196,197], axis=0, inplace=True)
df_2015_re.reset_index(drop=True,inplace=True)

df_2015_re.tail()  # Cleanses renewable energy consumption data from the year 2015 to remove unnecessary columns and rows that are not present in the life expectancy data, resets the index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name
194,2015,4.1266,VIR,Virgin Islands (U.S.)
195,2015,10.9941,PSE,West Bank and Gaza
196,2015,2.4195,YEM,"Yemen, Rep."
197,2015,85.487,ZMB,Zambia
198,2015,81.4272,ZWE,Zimbabwe


In [312]:
le_2015 = df_2015_le["SP.DYN.LE00.IN"]
df_2015_le_vs_re = df_2015_re.join(le_2015)
df_2015_le_vs_re.tail()  # Adds life expectancy data column to the renewable energy consumption dataset from 2015 organized in the same index, displays final 5 rows of data

Unnamed: 0,Year,EG.FEC.RNEW.ZS,Country Code,Country Name,SP.DYN.LE00.IN
194,2015,4.1266,VIR,Virgin Islands (U.S.),79.017073
195,2015,10.9941,PSE,West Bank and Gaza,73.442
196,2015,2.4195,YEM,"Yemen, Rep.",66.085
197,2015,85.487,ZMB,Zambia,61.737
198,2015,81.4272,ZWE,Zimbabwe,59.534


In [319]:
fig = px.scatter(
    df_2015_le_vs_re,
    x = "EG.FEC.RNEW.ZS",
    y = "SP.DYN.LE00.IN",
    labels={"EG.FEC.RNEW.ZS":"2015 Renewable energy consumption (% of total final energy consumption)", "SP.DYN.LE00.IN":"2015 Life expectancy at birth, total (years)"},
    height=800,  # differs somewhat from original chart
    title='Life expectancy vs Renewable energy consumption, 2015',
    trendline="ols",
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=False)

fig.show()  # Creates scatterplot with ordinary least squares (OLS) trendline using renewable energy consumption and life expectancy data from the year 2015

- Comparing the worldwide distributions of renewable energy consumption in 2000 and 2015, a number of countries increased their renewable share over time. The result is a more even but still right-skewed distribution. Most of the change appeared as a shift within the 0%-30% bins rather than the 40%-100% bins, which were more evenly populated apart from the 70%-80% and 100% bins.
- The p-values and correlation coefficients for these comparisons showed clear relationships.
  - For the non-communicable disease mortality comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderate negative correlation (r = -0.757 in 2000 and r = -0.666 in 2015).
  - For the communicable disease mortality comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderate positive correlation (r = 0.771 in 2000 and r = 0.713 in 2015).
  - For the life expectancy comparisons, the link to renewable energy consumption was statistically significant in both 2000 and 2015 (p < 0.001 for both). Both showed a moderate negative correlation (r = -0.748 in 2000 and r = -0.636 in 2015).
- Overall, every comparison between renewable energy consumption and disease mortality or life expectancy was statistically significant. Countries with higher renewable energy consumption tended to have lower non-communicable disease mortality rates and lower life expectancies, while the opposite was true for communicable disease mortality rates.
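
A p-value displayed as 0.000 has been rounded; it is very small rather than exactly zero. As a hedged sketch, the correlation coefficient and p-value for one comparison could be computed directly with `scipy.stats.pearsonr` and reported as an inequality when the p-value is tiny. The helper `format_p` is an assumption for illustration, and the five data points are only the final rows shown in the 2015 tables above, so the printed values will not match the full-dataset figures in this section.

```python
from scipy.stats import pearsonr

def format_p(p):
    """Report very small p-values as an inequality instead of a rounded 0.000."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"

# Five 2015 rows from the tables above (illustrative subset, not the full dataset)
renewable = [15.3294, 30.7199, 2.4195, 85.487, 81.4272]
ncd_mortality = [66.362497, 80.407434, 50.036798, 33.197525, 36.946549]

# pearsonr returns the correlation coefficient and its two-sided p-value
r, p = pearsonr(renewable, ncd_mortality)
print(f"r = {r:.3f}, {format_p(p)}")
```

Running the same call on the full merged columns for each year would give the coefficients to compare against the trendline results from the scatterplots.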

## 4. **Conclusions**
- The changes in the worldwide distributions of CO<sub>2</sub> emissions and renewable energy consumption between 2000 and 2015 led to clearly different results in the analysis.
- The shift in the CO<sub>2</sub> emissions distribution was accompanied by a change in significance: the disease mortality and life expectancy comparisons were statistically significant in 2000 but not in 2015. This was not the case for renewable energy consumption, where the change over time did not affect the statistical significance of the disease mortality and life expectancy comparisons.
- The correlation between CO<sub>2</sub> emissions and disease mortality or life expectancy was not strong in any of the data sets examined.
- The correlation with renewable energy consumption was moderately strong in all data sets examined. The direction of the results was somewhat surprising: non-communicable disease mortality and life expectancy were lower in countries with higher renewable energy consumption, while communicable disease mortality was higher.
- Some tentative observations can be made from the overall results.
  - CO<sub>2</sub> emissions are not an adequate measure of the harmful effects of non-renewable resource consumption in relation to disease mortality and life expectancy. Fossil fuel consumption or non-renewable resource consumption would likely be better statistics for this analysis.
  - Renewable energy consumption can be instructive in analyzing the disease mortality and life expectancy of a country. However, the correlation is not strong enough on its own to support conclusions from these comparisons, so further research on other environmental and economic factors that determine disease mortality and life expectancy is necessary.