<a href="https://colab.research.google.com/github/abdulSalamKagaji97/world_development_explorer/blob/main/Part_B/wdx_analysis_Part_B_final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **How do population density, fossil fuel consumption, and renewable energy consumption impact greenhouse gas emissions?**

*Abdul Salam Kagaji*

*Data Science Master's Student, UMBC*

*April 1, 2022*

Global population growth leads to overpopulation, which causes scarcity of the Earth's natural resources. Population growth is influenced by multiple factors, such as improved health care and better, safer urban infrastructure. It makes life unsustainable in densely populated areas.
Advances in science and health care have extended life expectancy. Despite these achievements, they have also contributed to overpopulation, which in turn affects the consumption of natural resources for energy generation. Modern innovations ranging from automobiles to semiconductor devices demand energy in the form of electricity or fuels.

Fossil fuels are the most prominent natural resources for energy production. Growing populations require more energy for everyday activities, and this demands the consumption of fossil fuels like coal and crude oil. Despite the production of hydroelectricity, the majority of energy generated comes from fossil fuels.
Continued use of energy and electronic devices has increased carbon emissions, contributing to the greenhouse effect on Earth. Greenhouse gases cause global warming, which disrupts environmental conditions and could lead to devastating situations.

Let's analyze the effects of overpopulation, fossil fuel consumption, and renewable energy consumption on greenhouse gas emissions using data from 2010 to 2021.

## Approach

**a.	Source of data & graphs:** THE WORLD DEVELOPMENT EXPLORER (https://www.worlddev.xyz/) 

**b.	Regions Compared:** East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa.

**c.	Timeline :** 2010 to 2021

**d.	Topics and Indicators:**

-	**Urban Development** : Population density (people per sq. km of land area - EN.POP.DNST) 
- **Energy & Mining** : Renewable energy consumption (% of total final energy consumption - EG.FEC.RNEW.ZS)
- **Energy & Mining** : Fossil fuel energy consumption (% of total - EG.USE.COMM.FO.ZS)
- **Climate Change** : Total greenhouse gas emissions (kt of CO2 equivalent - EN.ATM.GHGT.KT.CE)
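
The indicator codes listed above can be kept in a small lookup so that filters and plot labels use readable names. This is a minimal sketch, assuming a long-format frame with `indicator` and `value` columns; the name `INDICATOR_LABELS` is hypothetical, and the two rows are copied from the sample output of the data set for illustration.

```python
import pandas as pd

# Hypothetical lookup from World Bank indicator code to a readable label.
INDICATOR_LABELS = {
    "EN.POP.DNST": "Population density (people per sq. km of land area)",
    "EG.FEC.RNEW.ZS": "Renewable energy consumption (% of total final energy consumption)",
    "EG.USE.COMM.FO.ZS": "Fossil fuel energy consumption (% of total)",
    "EN.ATM.GHGT.KT.CE": "Total greenhouse gas emissions (kt of CO2 equivalent)",
}

# Two illustrative rows in the same long format as the data set.
sample = pd.DataFrame({
    "Year": [2010, 2015],
    "value": [61800.0, 58.133301],
    "indicator": ["EN.ATM.GHGT.KT.CE", "EG.FEC.RNEW.ZS"],
    "Country Name": ["Ecuador", "Norway"],
})

# Attach the readable label to every row by mapping the indicator code.
sample["indicator_label"] = sample["indicator"].map(INDICATOR_LABELS)
print(sample[["Country Name", "indicator_label", "value"]])
```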

## Visual Analysis:

In [None]:
import pandas as pd
import plotly.express as px

In [None]:
URL = "https://raw.githubusercontent.com/abdulSalamKagaji97/world_development_explorer/main/Part_B/wdi_data_part_b.csv"

In [None]:
df = pd.read_csv(URL) # reading data
df.sample(10)

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
450,450,2010,61800.0,EN.ATM.GHGT.KT.CE,ECU,Ecuador,Latin America & Caribbean,Upper middle income,IBRD
5016,5016,2018,46.665455,EN.POP.DNST,GNQ,Equatorial Guinea,Sub-Saharan Africa,Upper middle income,IBRD
5793,5793,2017,1.999536,EN.POP.DNST,MNG,Mongolia,East Asia & Pacific,Lower middle income,IBRD
3741,3741,2015,58.133301,EG.FEC.RNEW.ZS,NOR,Norway,Europe & Central Asia,High income,Not classified
2160,2160,2012,8.269956,EG.USE.COMM.FO.ZS,MOZ,Mozambique,Sub-Saharan Africa,Low income,IDA
6619,6619,2019,35.893176,EN.POP.DNST,USA,United States,North America,High income,Not classified
4394,4394,2012,175.376596,EN.POP.DNST,AND,Andorra,Europe & Central Asia,High income,Not classified
171,171,2010,1570.0,EN.ATM.GHGT.KT.CE,BTN,Bhutan,South Asia,Lower middle income,IDA
77,77,2015,594580.0,EN.ATM.GHGT.KT.CE,AUS,Australia,East Asia & Pacific,High income,Not classified
2428,2428,2011,28.492743,EG.USE.COMM.FO.ZS,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend
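
Before aggregating, it can help to check which years each indicator actually covers, since the stated timeline is 2010 to 2021 and not every indicator is necessarily reported for every year. The sketch below uses a few rows from the sample above purely for illustration; on the real data the same `groupby` would be applied to `df`.

```python
import pandas as pd

# Illustrative rows only (a subset of the sample output above).
toy = pd.DataFrame({
    "Year": [2010, 2018, 2017, 2015, 2012],
    "indicator": ["EN.ATM.GHGT.KT.CE", "EN.POP.DNST", "EN.POP.DNST",
                  "EG.FEC.RNEW.ZS", "EG.USE.COMM.FO.ZS"],
    "value": [61800.0, 46.665455, 1.999536, 58.133301, 8.269956],
})

# First year, last year, and number of observations per indicator.
coverage = toy.groupby("indicator")["Year"].agg(["min", "max", "count"])
print(coverage)
```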


In [None]:
# df.indicator.unique()

### Population distribution among regions:

In [None]:
df_population_density = df.query("indicator == 'EN.POP.DNST'") # creating a data frame specific to population density
df_population_density.sample(5)

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
5678,5678,2012,1324.103333,EN.POP.DNST,MDV,Maldives,South Asia,Upper middle income,IDA
6433,6433,2020,95.30391,EN.POP.DNST,SYR,Syrian Arab Republic,Middle East & North Africa,Low income,IDA
4890,4890,2013,108.258703,EN.POP.DNST,CUB,Cuba,Latin America & Caribbean,Upper middle income,Not classified
6635,6635,2013,71.093559,EN.POP.DNST,UZB,Uzbekistan,Europe & Central Asia,Lower middle income,Blend
5806,5806,2019,46.247435,EN.POP.DNST,MNE,Montenegro,Europe & Central Asia,Upper middle income,IBRD


In [None]:
df_population_region_group = df_population_density.groupby("Region").value.sum() # grouping by region and adding up the value column
df_population_region_group = df_population_region_group.reset_index()
df_population_region_group

Unnamed: 0,Region,value
0,East Asia & Pacific,431804.961155
1,Europe & Central Asia,319456.010751
2,Latin America & Caribbean,92435.011488
3,Middle East & North Africa,68586.107779
4,North America,13577.721557
5,South Asia,44202.455145
6,Sub-Saharan Africa,53924.04833


In [None]:
df_pop_sorted = df_population_region_group.sort_values(by="value",ascending=False)
fig = px.bar(
    data_frame=df_pop_sorted,
    x = "Region", 
    y = "value",
    color = "Region",
    height = 600,
    labels={'value':'population density (group sum)'},
    template = "plotly_white"
    )
fig.update_layout(showlegend = True)
fig.show()

- The bar chart shows that East Asia & Pacific has the highest summed population density, followed by Europe & Central Asia with the second highest. Both are far above regions such as North America and Middle East & North Africa.
- North America is the region with the lowest summed population density, followed by South Asia and Sub-Saharan Africa.
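
Because the values are summed over every country and year in a region, the totals also depend on how many countries each region contains. A minimal sketch, using made-up numbers and hypothetical regions `A` and `B`, shows how the sum and the mean can rank regions differently:

```python
import pandas as pd

# Hypothetical data: region A has three sparsely populated countries,
# region B has a single densely populated one.
toy = pd.DataFrame({
    "Region": ["A", "A", "A", "B"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

# Compare the group sum (used in this notebook) with the group mean.
summary = toy.groupby("Region")["value"].agg(["sum", "mean", "count"])
print(summary)
```

Here region A has the larger sum while region B has the larger mean, so the choice of aggregation is worth keeping in mind when reading the charts.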

### Population Change between 2010 and 2021:

In [None]:
df_population_region_group_trend = df_population_density.groupby(["Year","Region"]).value.sum()
df_population_region_group_trend = df_population_region_group_trend.reset_index()
df_population_region_group_trend
fig = px.line(
    df_population_region_group_trend, 
    x="Year", 
    labels={"value":"Population Density (group sum)"},
    y="value",
    color="Region",
    height=600,
    title="Population density change"
)

fig.update_layout(showlegend=True)

fig.show()

- The time series graph shows a gradual increase in population density in all regions over time. East Asia & Pacific shows the largest change in population density.
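
The change per region can also be read off numerically instead of from the plot. The following is a sketch on made-up values with hypothetical regions `A` and `B`; it assumes a frame shaped like `df_population_region_group_trend`, with `Year`, `Region`, and `value` columns.

```python
import pandas as pd

# Hypothetical yearly totals for two regions.
trend = pd.DataFrame({
    "Year": [2010, 2011, 2012, 2010, 2011, 2012],
    "Region": ["A", "A", "A", "B", "B", "B"],
    "value": [100.0, 105.0, 110.0, 40.0, 42.0, 50.0],
})

# One column per region, one row per year, sorted chronologically.
wide = trend.pivot(index="Year", columns="Region", values="value").sort_index()

# Absolute and relative change between the first and the last year.
change = pd.DataFrame({
    "absolute": wide.iloc[-1] - wide.iloc[0],
    "percent": (wide.iloc[-1] / wide.iloc[0] - 1) * 100,
})
print(change)
```

Absolute and percentage change can disagree on which region changed the most, so it helps to state which one is meant.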

### Fossil Fuel Consumption by region:

In [None]:
df_fossil_fuel_consumption = df.query("indicator == 'EG.USE.COMM.FO.ZS'") # creating a data frame specific to fossil fuel consumption
df_fossil_fuel_consumption.sample(5)

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
2277,2277,2013,52.172708,EG.USE.COMM.FO.ZS,SEN,Senegal,Sub-Saharan Africa,Lower middle income,IDA
1796,1796,2012,65.789993,EG.USE.COMM.FO.ZS,BWA,Botswana,Sub-Saharan Africa,Upper middle income,IBRD
2269,2269,2010,99.996776,EG.USE.COMM.FO.ZS,SAU,Saudi Arabia,Middle East & North Africa,High income,Not classified
1856,1856,2011,48.26481,EG.USE.COMM.FO.ZS,CRI,Costa Rica,Latin America & Caribbean,Upper middle income,IBRD
2226,2226,2011,82.635606,EG.USE.COMM.FO.ZS,PAN,Panama,Latin America & Caribbean,Upper middle income,IBRD


In [None]:
df_fossil_fuel_consumption = df_fossil_fuel_consumption.groupby(["Region"]).value.sum()
df_fossil_fuel_consumption = df_fossil_fuel_consumption.reset_index()
df_fossil_fuel_consumption

Unnamed: 0,Region,value
0,East Asia & Pacific,6620.593711
1,Europe & Central Asia,19345.555666
2,Latin America & Caribbean,7896.366823
3,Middle East & North Africa,8107.266415
4,North America,944.73127
5,South Asia,1328.998062
6,Sub-Saharan Africa,4254.209111


In [None]:
fig = px.pie(df_fossil_fuel_consumption, values='value', names='Region', title = "Fossil fuel consumption by region")
fig.show()

- The pie chart above shows that Europe & Central Asia has the highest summed fossil fuel consumption, with 39.9% of the total, followed by Middle East & North Africa.
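
The percentage shown in the pie chart can be reproduced directly from the regional sums in the table above. This sketch hard-codes those sums. Since the indicator itself is a percentage, each slice is a share of the summed percentages rather than a share of the energy actually consumed.

```python
import pandas as pd

# Regional sums copied from the grouped output above.
fossil = pd.DataFrame({
    "Region": [
        "East Asia & Pacific", "Europe & Central Asia",
        "Latin America & Caribbean", "Middle East & North Africa",
        "North America", "South Asia", "Sub-Saharan Africa",
    ],
    "value": [
        6620.593711, 19345.555666, 7896.366823, 8107.266415,
        944.73127, 1328.998062, 4254.209111,
    ],
})

# Share of each region in the overall total, in percent.
fossil["share_pct"] = (fossil["value"] / fossil["value"].sum() * 100).round(1)
print(fossil.sort_values("share_pct", ascending=False))
```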

### Renewable Energy Consumption by region:

In [None]:
df_renewable_energy_consumption = df.query("indicator == 'EG.FEC.RNEW.ZS'") # creating a data frame specific to renewable energy consumption
df_renewable_energy_consumption.sample(5)

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
2699,2699,2017,17.08,EG.FEC.RNEW.ZS,BGR,Bulgaria,Europe & Central Asia,Upper middle income,IBRD
3231,3231,2018,31.689199,EG.FEC.RNEW.ZS,IND,India,South Asia,Lower middle income,IBRD
2573,2573,2017,31.603201,EG.FEC.RNEW.ZS,BGD,Bangladesh,South Asia,Lower middle income,IDA
2478,2478,2012,49.443298,EG.FEC.RNEW.ZS,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD
3499,3499,2016,78.673897,EG.FEC.RNEW.ZS,MWI,Malawi,Sub-Saharan Africa,Low income,IDA


In [None]:
df_renewable_energy_consumption = df_renewable_energy_consumption.groupby(["Region"]).value.sum()
df_renewable_energy_consumption = df_renewable_energy_consumption.reset_index()
df_renewable_energy_consumption

Unnamed: 0,Region,value
0,East Asia & Pacific,6426.95789
1,Europe & Central Asia,10159.76558
2,Latin America & Caribbean,7438.931716
3,Middle East & North Africa,886.370204
4,North America,285.972101
5,South Asia,3268.956908
6,Sub-Saharan Africa,27506.435389


In [None]:
fig = px.pie(df_renewable_energy_consumption, values='value', names='Region', title = "Renewable Energy consumption by region")
fig.show()

- In contrast to the fossil fuel consumption chart, Sub-Saharan Africa has the highest summed renewable energy consumption of all regions, with 49.1% of the total, and North America has the lowest.

### Greenhouse Gas Emission by region:

In [None]:
df_greenhouse_gas_emission = df.query("indicator == 'EN.ATM.GHGT.KT.CE'") # creating a data frame specific to greenhouse gas emission
df_greenhouse_gas_emission.sample(5)

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
1487,1487,2012,51250.0,EN.ATM.GHGT.KT.CE,CHE,Switzerland,Europe & Central Asia,High income,Not classified
1106,1106,2018,178640.0,EN.ATM.GHGT.KT.CE,NLD,Netherlands,Europe & Central Asia,High income,Not classified
429,429,2016,1470.0,EN.ATM.GHGT.KT.CE,DJI,Djibouti,Middle East & North Africa,Lower middle income,IDA
205,205,2017,17460.0,EN.ATM.GHGT.KT.CE,BWA,Botswana,Sub-Saharan Africa,Upper middle income,IBRD
1400,1400,2015,501090.0,EN.ATM.GHGT.KT.CE,ZAF,South Africa,Sub-Saharan Africa,Upper middle income,IBRD


In [None]:
df_greenhouse_gas_emission = df_greenhouse_gas_emission.groupby(["Region"]).value.sum()
df_greenhouse_gas_emission = df_greenhouse_gas_emission.reset_index()
df_greenhouse_gas_emission

Unnamed: 0,Region,value
0,East Asia & Pacific,149743170.0
1,Europe & Central Asia,77151910.0
2,Latin America & Caribbean,28464090.0
3,Middle East & North Africa,27901280.0
4,North America,60954130.0
5,South Asia,33008140.0
6,Sub-Saharan Africa,19835920.0


In [None]:
fig = px.pie(df_greenhouse_gas_emission, values='value', names='Region', title = "Greenhouse gas emission by region")
fig.show()

- East Asia & Pacific makes a surprisingly high contribution to greenhouse gas emissions, with 37.7% of the total.

### Relation between population density, fossil fuel consumption and greenhouse gas emission:

In [None]:

df_relations = df.groupby(["indicator","Region"]).value.sum()
df_relations = df_relations.reset_index().query("indicator != 'EG.FEC.RNEW.ZS'") # not considering renewable energy consumption indicator
df_relations


# creating a data frame with the following structure
"""
region | indicator1_value | indicator2_value | indicator3_value |
=================================================================
xxx    | xxxxx            | xxxxxx           | xxxxxxxxxx       |
"""

data = []
columns = df_relations.indicator.unique()
for region in df.Region.unique():
    li = []
    li.append(region)
    for indicator in df_relations.indicator.unique():
        dff = df_relations.query(f"Region == '{region}'").query(f"indicator == '{indicator}'")
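        try:
            li.append(float(dff.value.iloc[0]))  # the single summed value for this region and indicator
        except IndexError:  # no row exists for this region/indicator combination
            pass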
    data.append(li)
df_population_fossil_fuel_greenhouse_gas_relation = pd.DataFrame(data)
df_population_fossil_fuel_greenhouse_gas_relation.columns = ["Region", *columns]
df_population_fossil_fuel_greenhouse_gas_relation
        

Unnamed: 0,Region,EG.USE.COMM.FO.ZS,EN.ATM.GHGT.KT.CE,EN.POP.DNST
0,South Asia,1328.998062,33008140.0,44202.455145
1,Europe & Central Asia,19345.555666,77151910.0,319456.010751
2,Middle East & North Africa,8107.266415,27901280.0,68586.107779
3,Sub-Saharan Africa,4254.209111,19835920.0,53924.04833
4,Latin America & Caribbean,7896.366823,28464090.0,92435.011488
5,East Asia & Pacific,6620.593711,149743170.0,431804.961155
6,North America,944.73127,60954130.0,13577.721557


In [None]:
df_population_fossil_fuel_greenhouse_gas_relation.columns = ["Region", "fossil_fuel_consumption", 'greenhouse_gas_emission','populaiton_density'] # renaming columns
df_population_fossil_fuel_greenhouse_gas_relation

Unnamed: 0,Region,fossil_fuel_consumption,greenhouse_gas_emission,populaiton_density
0,South Asia,1328.998062,33008140.0,44202.455145
1,Europe & Central Asia,19345.555666,77151910.0,319456.010751
2,Middle East & North Africa,8107.266415,27901280.0,68586.107779
3,Sub-Saharan Africa,4254.209111,19835920.0,53924.04833
4,Latin America & Caribbean,7896.366823,28464090.0,92435.011488
5,East Asia & Pacific,6620.593711,149743170.0,431804.961155
6,North America,944.73127,60954130.0,13577.721557


In [None]:
fig = px.scatter(df_population_fossil_fuel_greenhouse_gas_relation, 
                 x="populaiton_density",
                 y="fossil_fuel_consumption", 
                 color="Region",
                 size='greenhouse_gas_emission',
                 labels={"populaiton_density":"Population Density (group sum)","fossil_fuel_consumption":"Fossil Fuel Energy Consumption"},
                 height=600,
                 title="Relation between population density, fossil fuel consumption and greenhouse gas emission")
fig.show()

- The graph relates fossil fuel consumption (y-axis) and population density (x-axis) to greenhouse gas emissions (bubble size).
- North America, despite its low population density, has a very high volume of greenhouse gas emissions. East Asia & Pacific has the highest population density and also the largest emissions, while its summed fossil fuel consumption is lower than that of Europe & Central Asia.
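
The relationships in the bubble chart can be summarized with a correlation matrix. The sketch below hard-codes the regional sums from the table above and uses illustrative column names. With only seven regions, the coefficients are indicative at best.

```python
import pandas as pd

# Regional sums copied from the table above, one row per region.
regions = pd.DataFrame({
    "Region": [
        "South Asia", "Europe & Central Asia", "Middle East & North Africa",
        "Sub-Saharan Africa", "Latin America & Caribbean",
        "East Asia & Pacific", "North America",
    ],
    "fossil_fuel_consumption": [
        1328.998062, 19345.555666, 8107.266415, 4254.209111,
        7896.366823, 6620.593711, 944.73127,
    ],
    "greenhouse_gas_emission": [
        33008140.0, 77151910.0, 27901280.0, 19835920.0,
        28464090.0, 149743170.0, 60954130.0,
    ],
    "population_density": [
        44202.455145, 319456.010751, 68586.107779, 53924.04833,
        92435.011488, 431804.961155, 13577.721557,
    ],
}).set_index("Region")

# Pairwise Pearson correlation between the three summed indicators.
corr = regions.corr()
print(corr.round(2))
```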

### Relation between population density, renewable energy consumption and greenhouse gas emission

In [None]:

df_relations1 = df.groupby(["indicator","Region"]).value.sum()
df_relations1 = df_relations1.reset_index().query("indicator != 'EG.USE.COMM.FO.ZS'") # not considering fossil fuel consumption indicator
df_relations1


# creating a data frame with the following structure
"""
region | indicator1_value | indicator2_value | indicator3_value |
=================================================================
xxx    | xxxxx            | xxxxxx           | xxxxxxxxxx       |
"""

data = []
columns = df_relations1.indicator.unique()
for region in df.Region.unique():
    li = []
    li.append(region)
    for indicator in df_relations1.indicator.unique():
        dff = df_relations1.query(f"Region == '{region}'").query(f"indicator == '{indicator}'")
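        try:
            li.append(float(dff.value.iloc[0]))  # the single summed value for this region and indicator
        except IndexError:  # no row exists for this region/indicator combination
            pass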
    data.append(li)
df_population_renewable_energy_greenhouse_gas_relation = pd.DataFrame(data)
df_population_renewable_energy_greenhouse_gas_relation.columns = ["Region", *columns]
df_population_renewable_energy_greenhouse_gas_relation
        

Unnamed: 0,Region,EG.FEC.RNEW.ZS,EN.ATM.GHGT.KT.CE,EN.POP.DNST
0,South Asia,3268.956908,33008140.0,44202.455145
1,Europe & Central Asia,10159.76558,77151910.0,319456.010751
2,Middle East & North Africa,886.370204,27901280.0,68586.107779
3,Sub-Saharan Africa,27506.435389,19835920.0,53924.04833
4,Latin America & Caribbean,7438.931716,28464090.0,92435.011488
5,East Asia & Pacific,6426.95789,149743170.0,431804.961155
6,North America,285.972101,60954130.0,13577.721557


In [None]:
df_population_renewable_energy_greenhouse_gas_relation.columns = ["Region", "Renewable_energy_consumption", 'greenhouse_gas_emission','populaiton_density']
df_population_renewable_energy_greenhouse_gas_relation

Unnamed: 0,Region,Renewable_energy_consumption,greenhouse_gas_emission,populaiton_density
0,South Asia,3268.956908,33008140.0,44202.455145
1,Europe & Central Asia,10159.76558,77151910.0,319456.010751
2,Middle East & North Africa,886.370204,27901280.0,68586.107779
3,Sub-Saharan Africa,27506.435389,19835920.0,53924.04833
4,Latin America & Caribbean,7438.931716,28464090.0,92435.011488
5,East Asia & Pacific,6426.95789,149743170.0,431804.961155
6,North America,285.972101,60954130.0,13577.721557


In [None]:
fig = px.scatter(df_population_renewable_energy_greenhouse_gas_relation, 
                 x="populaiton_density",
                 y="Renewable_energy_consumption", 
                 color="Region",
                 size='greenhouse_gas_emission',
                 labels={"populaiton_density":"Population Density (group sum)","Renewable_energy_consumption":"Renewable Energy Consumption"},
                 height=600,
                 title="Relation between population density, renewable energy consumption and greenhouse gas emission")
fig.show()

- The graph suggests that renewable energy consumption is related to greenhouse gas emissions regardless of population density.
- North America, despite its low population density, has a very high volume of greenhouse gas emissions alongside the lowest consumption of renewable energy. Sub-Saharan Africa has the lowest greenhouse gas emissions alongside the highest use of renewable resources.

### Change in Fossil fuel consumption, renewable energy consumption and greenhouse gas emission with time:

In [None]:
# change in fossil fuel consumption by region

df_fossil_fule_consumption = df.query("indicator == 'EG.USE.COMM.FO.ZS'")

df_fossil_fuel_region_group_trend = df_fossil_fule_consumption.groupby(["Year","Region"]).value.sum()
df_fossil_fuel_region_group_trend = df_fossil_fuel_region_group_trend.reset_index()
df_fossil_fuel_region_group_trend
fig = px.line(
    df_fossil_fuel_region_group_trend, 
    x="Year", 
    labels={"value":"Fossil Fuel consumption (group sum)"},
    y="value",
    color="Region",
    height=600,
    title="Fossil Fuel Consumption"
)

fig.update_layout(showlegend=True)

fig.show()

In [None]:
# change in renewable energy consumption by region

df_renewable_energy_consumption = df.query("indicator == 'EG.FEC.RNEW.ZS'")

df_renewable_energy_region_group_trend = df_renewable_energy_consumption.groupby(["Year","Region"]).value.sum()
df_renewable_energy_region_group_trend = df_renewable_energy_region_group_trend.reset_index()
df_renewable_energy_region_group_trend
fig = px.line(
    df_renewable_energy_region_group_trend, 
    x="Year", 
    labels={"value":"Renewable Energy consumption (group sum)"},
    y="value",
    color="Region",
    height=600,
    title="Renewable Energy Consumption"
)

fig.update_layout(showlegend=True)

fig.show()

- The time series graphs above show the trends in fossil fuel and renewable energy consumption between 2010 and 2021.

In [None]:
# change in greenhouse gas emission by region

df_ghg_emission = df.query("indicator == 'EN.ATM.GHGT.KT.CE'")

df_ghg_emission_region_group_trend = df_ghg_emission.groupby(["Year","Region"]).value.sum()
df_ghg_emission_region_group_trend = df_ghg_emission_region_group_trend.reset_index()
df_ghg_emission_region_group_trend
fig = px.line(
    df_ghg_emission_region_group_trend, 
    x="Year", 
    labels={"value":"Greenhouse gas emission (group sum)"},
    y="value",
    color="Region",
    height=600,
    title="Greenhouse Gas Emission"
)

fig.update_layout(showlegend=True)

fig.show()

- The time series graph above shows the trends in greenhouse gas emissions between 2010 and 2021. East Asia & Pacific is at the top of the plot with the highest emissions.
In a nutshell, the graph shows an increase in greenhouse gas emissions over time.


## Regression Analysis

### Impact of fossil fuel consumption on greenhouse gas emission:

In [None]:
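df_ffc = df.query("indicator == 'EG.USE.COMM.FO.ZS'") # fossil fuel consumption
df_ghge = df.query("indicator == 'EN.ATM.GHGT.KT.CE'") # greenhouse gas emission
# Note: merging on Year and Region pairs every fossil fuel row with every emission row
# of the same region and year (many-to-many), not only rows of the same country.
df_merged = df_ffc.merge(df_ghge,on=["Year","Region"])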

fig = px.scatter(df_merged, x="value_x", y="value_y", trendline="ols",trendline_color_override = "red",
                 title = "Impact of fossil fuel consumption on greenhouse gas emission",
                 labels={"value_x":"fossil fuel consumption","value_y":"greenhouse gas emission"})
fig.show()

- The regression graph shows an upward trend, indicating that higher fossil fuel use is associated with higher greenhouse gas emissions.
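
For a like-for-like comparison, the two indicators could also be paired per country and year before fitting a line. The following is a sketch on made-up values with hypothetical country codes `AAA`, `BBB`, and `CCC`, not the notebook's result; it uses `numpy.polyfit` for the least-squares line so that the slope can be inspected directly. (Plotly Express relies on the `statsmodels` package for `trendline="ols"`.)

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data for three countries in one year.
toy = pd.DataFrame({
    "Year": [2010] * 6,
    "Country Code": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "indicator": ["EG.USE.COMM.FO.ZS"] * 3 + ["EN.ATM.GHGT.KT.CE"] * 3,
    "value": [20.0, 50.0, 80.0, 1000.0, 2500.0, 4000.0],
})

fossil = toy.query("indicator == 'EG.USE.COMM.FO.ZS'")
ghg = toy.query("indicator == 'EN.ATM.GHGT.KT.CE'")

# Pair each country's fossil fuel share with its own emissions for the same year.
# validate="one_to_one" raises an error if a key appears more than once on either side.
merged = fossil.merge(
    ghg,
    on=["Year", "Country Code"],
    suffixes=("_fossil", "_ghg"),
    validate="one_to_one",
)

# Ordinary least-squares line: emissions = slope * fossil_share + intercept.
slope, intercept = np.polyfit(merged["value_fossil"], merged["value_ghg"], deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```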

### Impact of renewable energy consumption on greenhouse gas emission:

In [None]:
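df_rec = df.query("indicator == 'EG.FEC.RNEW.ZS'") # renewable energy consumption
df_ghge = df.query("indicator == 'EN.ATM.GHGT.KT.CE'") # greenhouse gas emission
# Note: merging on Year and Region pairs every renewable energy row with every emission row
# of the same region and year (many-to-many), not only rows of the same country.
df_merged = df_rec.merge(df_ghge,on=["Year","Region"])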

fig = px.scatter(df_merged, x="value_x", y="value_y", trendline="ols",trendline_color_override = "red",
                 title = "Impact of renewable energy consumption on greenhouse gas emission",
                 labels={"value_x":"renewable energy consumption","value_y":"greenhouse gas emission"})
fig.show()

- The regression graph shows a downward trend, indicating that higher use of renewable energy resources is associated with lower greenhouse gas emissions.

## Observations 

- Population density increases over time.
- As population density increases, greenhouse gas emissions increase.
- Higher consumption of fossil fuels is associated with higher greenhouse gas emissions.
- Higher consumption of renewable energy is associated with lower greenhouse gas emissions.
- Greenhouse gas emissions increase with a nation's land size and are only slightly affected by population density.


## Conclusion

A few indicators such as population density, fossil fuel consumption, and renewable energy consumption cannot establish with certainty that they are the sole contributors to the increase in greenhouse gases. Many other factors, such as current environmental situations and climatic conditions, could also play a role.

