<div style="float:left">
    <h1 style="width:600px">Which renewable energy factors significantly affect GDP per capita worldwide?</h1>
</div>
<div style="float:right"><img width="100" src="https://github.com/jreades/i2p/raw/master/img/casa_logo.jpg" /></div>

In [73]:
import datetime
now = datetime.datetime.now()
print("Last executed: " + now.strftime("%Y-%m-%d %H:%M:%S"))

Last executed: 2024-04-22 21:33:42


# 1. Introduction

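As times change, countries around the world are working to make progress. To assess a country's progress, some experts analyse its citizens' happiness index, and some governments adjust their policies in response to it. Drawing on national levels of well-being, the authors of the World Happiness Report 2019 found that a country's living environment, including its social environment and political institutions, is an important source of happiness. The large international differences in how people evaluate their lives arise from differences in how people interact with one another and in their shared institutions and social norms. The happiness index can also be used to explore how changes in technology, social norms, conflict, and government policy affect people's well-being. The aim of this work is to analyse the factors that most strongly influence happiness scores and to propose corresponding adjustments. We import `download_plotlyjs`, `init_notebook_mode`, `plot`, and `iplot` from `plotly.offline` to visualise each country's happiness index and the independent variables on a world map.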

# 2. EDA (Exploratory data analysis)

In [74]:
import pandas as pd
from sklearn.linear_model import LinearRegression
import statsmodels
import statsmodels.api as sm

import numpy as np

import sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

import matplotlib.pyplot as plt
import matplotlib
import seaborn as sn

from time import time

In [75]:
sustainable_energy_G = pd.read_csv('global_data_on_sustainable_energy.csv')

In [76]:
sustainable_energy_G.shape

(3649, 21)

In [77]:
sustainable_energy_G.head(30)

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Financial flows to developing countries (US $),Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),...,Primary energy consumption per capita (kWh/person),Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,Renewables (% equivalent primary energy),gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude
0,Afghanistan,2000,1.613591,6.2,9.22,20000.0,44.99,0.16,0.0,0.31,...,302.59482,1.64,760.0,,,,60,652230.0,33.93911,67.709953
1,Afghanistan,2001,4.074574,7.2,8.86,130000.0,45.6,0.09,0.0,0.5,...,236.89185,1.74,730.0,,,,60,652230.0,33.93911,67.709953
2,Afghanistan,2002,9.409158,8.2,8.47,3950000.0,37.83,0.13,0.0,0.56,...,210.86215,1.4,1029.999971,,,179.426579,60,652230.0,33.93911,67.709953
3,Afghanistan,2003,14.738506,9.5,8.09,25970000.0,36.66,0.31,0.0,0.63,...,229.96822,1.4,1220.000029,,8.832278,190.683814,60,652230.0,33.93911,67.709953
4,Afghanistan,2004,20.064968,10.9,7.75,,44.24,0.33,0.0,0.56,...,204.23125,1.2,1029.999971,,1.414118,211.382074,60,652230.0,33.93911,67.709953
5,Afghanistan,2005,25.390894,12.2,7.51,9830000.0,33.88,0.34,0.0,0.59,...,252.06912,1.41,1549.999952,,11.229715,242.031313,60,652230.0,33.93911,67.709953
6,Afghanistan,2006,30.71869,13.85,7.4,10620000.0,31.89,0.2,0.0,0.64,...,304.4209,1.5,1759.99999,,5.357403,263.733602,60,652230.0,33.93911,67.709953
7,Afghanistan,2007,36.05101,15.3,7.25,15750000.0,28.78,0.2,0.0,0.75,...,354.2799,1.53,1769.999981,,13.82632,359.693158,60,652230.0,33.93911,67.709953
8,Afghanistan,2008,42.4,16.7,7.49,16170000.0,21.17,0.19,0.0,0.54,...,607.8335,1.94,3559.999943,,3.924984,364.663542,60,652230.0,33.93911,67.709953
9,Afghanistan,2009,46.74005,18.4,7.5,9960000.0,16.53,0.16,0.0,0.78,...,975.04816,2.25,4880.000114,,21.390528,437.26874,60,652230.0,33.93911,67.709953


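The CSV file contains the following columns:

1. `Entity` - the name of the country or region.
2. `Year` - the year the record refers to.
3. `Access to electricity (% of population)` - the percentage of the population with access to electricity.
4. `Access to clean fuels for cooking` - the extent to which people have access to clean fuels for cooking.
5. `Renewable-electricity-generating-capacity-per-capita` - installed renewable electricity generating capacity per person.
6. `Financial flows to developing countries (US $)` - total financial flows to developing countries, in US dollars.
7. `Renewable energy share in the total final energy consumption (%)` - the percentage of total final energy consumption that comes from renewable sources.
8. `Electricity from fossil fuels (TWh)` - electricity generated from fossil fuels, in terawatt-hours.
9. `Electricity from nuclear (TWh)` - electricity generated from nuclear power, in terawatt-hours.
10. `Electricity from renewables (TWh)` - electricity generated from renewable sources, in terawatt-hours.
11. `Low-carbon electricity (% electricity)` - the percentage of all electricity that comes from low-carbon sources (such as renewables and nuclear).
12. `Primary energy consumption per capita (kWh/person)` - average primary energy consumption per person, in kilowatt-hours.
13. `Energy intensity level of primary energy (MJ/$2017 PPP GDP)` - energy consumed per unit of economic output (megajoules per 2017 PPP dollar of GDP), which reflects the energy efficiency of economic activity.
14. `Value_co2_emissions_kt_by_country` - total carbon dioxide emitted by the country or region in the given year, in kilotonnes.
15. `Renewables (% equivalent primary energy)` - renewables as a percentage of equivalent primary energy.
16. `gdp_growth` - the growth rate of gross domestic product.
17. `gdp_per_capita` - gross domestic product per person.
18. `Density\n(P/Km2)` - population density, in people per square kilometre.
19. `Land Area(Km2)` - total land area of the country or region, in square kilometres.
20. `Latitude` - the latitude of the geographic coordinate.
21. `Longitude` - the longitude of the geographic coordinate.

Note that some column names contain the line-break sequence `\n` or abbreviations; for example, `P/Km2` in `Density\n(P/Km2)` means people per square kilometre.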
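Column names with an embedded line break, such as `Density\n(P/Km2)`, are awkward to type and easy to get wrong. The sketch below shows one possible way to normalise them. It is not part of the original notebook: the toy frame, the `Land\nArea(Km2)` variant, and the choice to handle both a literal backslash-n and a real newline are assumptions made for illustration, since the export does not show which form the file uses.

```python
import pandas as pd

# Toy frame standing in for the real data; only the column names matter here.
# One name carries a literal backslash + "n", the other a real newline (both hypothetical).
toy = pd.DataFrame({
    "Entity": ["A"],
    "Density\\n(P/Km2)": ["60"],
    "Land\nArea(Km2)": [1.0],
})

# Replace any run of literal "\n" sequences or whitespace with a single space.
toy.columns = toy.columns.str.replace(r"(?:\\n|\s)+", " ", regex=True).str.strip()

print(list(toy.columns))
```

After this step the density column can be selected as `toy["Density (P/Km2)"]` without escaping.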

In [78]:
# NA value counts
na_counts = sustainable_energy_G.isna().sum()
na_counts

Entity                                                                 0
Year                                                                   0
Access to electricity (% of population)                               10
Access to clean fuels for cooking                                    169
Renewable-electricity-generating-capacity-per-capita                 931
Financial flows to developing countries (US $)                      2089
Renewable energy share in the total final energy consumption (%)     194
Electricity from fossil fuels (TWh)                                   21
Electricity from nuclear (TWh)                                       126
Electricity from renewables (TWh)                                     21
Low-carbon electricity (% electricity)                                42
Primary energy consumption per capita (kWh/person)                     0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          207
Value_co2_emissions_kt_by_country                  
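Raw counts are easier to judge as a share of rows: 2089 missing values out of 3649 rows means that `Financial flows to developing countries (US $)` is empty in roughly 57% of records. A minimal sketch of that calculation on a toy frame follows. The 0.5 threshold and the toy column names are assumptions for illustration, not a rule used in the original analysis.

```python
import numpy as np
import pandas as pd

# Toy frame with different amounts of missing data per column.
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# The mean of a boolean mask is the fraction of missing values per column.
na_share = toy.isna().mean()

# Hypothetical cut-off: flag columns that are more than half empty.
threshold = 0.5
sparse_columns = na_share[na_share > threshold].index.tolist()

print(na_share.sort_values(ascending=False))
print(sparse_columns)
```

Applied to `sustainable_energy_G`, the same two lines would give a list of candidate columns to review before dropping any of them.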

In [79]:
# Columns to remove from the dataset
columns_to_drop = ['Financial flows to developing countries (US $)', 'Renewables (% equivalent primary energy)']

# Drop the columns
sustainable_energy_G = sustainable_energy_G.drop(columns=columns_to_drop)

# Exclude French Guiana
sustainable_energy_G = sustainable_energy_G[sustainable_energy_G['Entity'] != 'French Guiana']

sustainable_energy_G.shape

(3648, 19)

In [80]:
sustainable_energy_G.head(5)

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),Low-carbon electricity (% electricity),Primary energy consumption per capita (kWh/person),Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude
0,Afghanistan,2000,1.613591,6.2,9.22,44.99,0.16,0.0,0.31,65.95744,302.59482,1.64,760.0,,,60,652230.0,33.93911,67.709953
1,Afghanistan,2001,4.074574,7.2,8.86,45.6,0.09,0.0,0.5,84.745766,236.89185,1.74,730.0,,,60,652230.0,33.93911,67.709953
2,Afghanistan,2002,9.409158,8.2,8.47,37.83,0.13,0.0,0.56,81.159424,210.86215,1.4,1029.999971,,179.426579,60,652230.0,33.93911,67.709953
3,Afghanistan,2003,14.738506,9.5,8.09,36.66,0.31,0.0,0.63,67.02128,229.96822,1.4,1220.000029,8.832278,190.683814,60,652230.0,33.93911,67.709953
4,Afghanistan,2004,20.064968,10.9,7.75,44.24,0.33,0.0,0.56,62.92135,204.23125,1.2,1029.999971,1.414118,211.382074,60,652230.0,33.93911,67.709953


### Second dataset

In [84]:
Renewable_E_P = pd.read_csv('share-electricity-renewables.csv')
Renewable_E_P.head(5)

Unnamed: 0,Entity,Code,Year,Renewables - % electricity
0,ASEAN (Ember),,2000,19.770159
1,ASEAN (Ember),,2001,19.301565
2,ASEAN (Ember),,2002,17.929144
3,ASEAN (Ember),,2003,16.870672
4,ASEAN (Ember),,2004,15.841829


In [85]:
Renewable_E_P = Renewable_E_P.drop(columns= "Code")
Renewable_E_P = Renewable_E_P.rename(columns= {"Renewables - % electricity" : "Renewable_E_P"})
Renewable_E_P.head(5)

Unnamed: 0,Entity,Year,Renewable_E_P
0,ASEAN (Ember),2000,19.770159
1,ASEAN (Ember),2001,19.301565
2,ASEAN (Ember),2002,17.929144
3,ASEAN (Ember),2003,16.870672
4,ASEAN (Ember),2004,15.841829


In [86]:
Renewable_E_P.shape

(6834, 3)

In [96]:
# Keep only rows whose Year falls within 2000-2020 (inclusive)
Renewable_E_P = Renewable_E_P[(Renewable_E_P['Year'] >= 2000) & (Renewable_E_P['Year'] <= 2020)]

Renewable_E_P.shape

(5119, 3)

In [97]:
sustainable_energy_G.shape

(3606, 19)

In [98]:
# Rows of Renewable_E_P whose Entity does not appear in sustainable_energy_G
diff_entities_df2 = Renewable_E_P[~Renewable_E_P['Entity'].isin(sustainable_energy_G['Entity'])]
diff_entities_df2.shape

(1503, 3)

In [90]:
diff_entities_df2["Entity"].unique()

array(['ASEAN (Ember)', 'Africa', 'Africa (EI)', 'Africa (Ember)',
       'American Samoa', 'Asia', 'Asia (Ember)', 'Asia Pacific (EI)',
       'Bolivia', 'British Virgin Islands', 'Brunei', 'CIS (EI)',
       'Cape Verde', 'Central America (EI)', 'Cook Islands',
       "Cote d'Ivoire", 'Democratic Republic of Congo', 'East Timor',
       'Eastern Africa (EI)', 'Europe', 'Europe (EI)', 'Europe (Ember)',
       'European Union (27)', 'Falkland Islands', 'Faroe Islands',
       'French Guiana', 'French Polynesia', 'G20 (Ember)', 'G7 (Ember)',
       'Greenland', 'Guadeloupe', 'Guam', 'High-income countries',
       'Hong Kong', 'Iran', 'Kosovo', 'Laos',
       'Latin America and Caribbean (Ember)', 'Low-income countries',
       'Lower-middle-income countries', 'Macao', 'Martinique',
       'Middle Africa (EI)', 'Middle East (EI)', 'Middle East (Ember)',
       'Moldova', 'Montserrat', 'Non-OECD (EI)', 'North America',
       'North America (EI)', 'North America (Ember)', 'North Korea',


In [91]:
diff_entities_df1 = sustainable_energy_G[~sustainable_energy_G['Entity'].isin(Renewable_E_P['Entity'])]
diff_entities_df1.shape

(42, 19)

In [92]:
# Entity names in sustainable_energy_G that are missing from Renewable_E_P
diff_entities_df1["Entity"].unique()

array(['Bermuda', 'Tuvalu'], dtype=object)

In [93]:
na_counts_Bermuda = sustainable_energy_G[sustainable_energy_G["Entity"] == "Bermuda"].isna().sum()
na_counts_Bermuda

Entity                                                               0
Year                                                                 0
Access to electricity (% of population)                              0
Access to clean fuels for cooking                                   21
Renewable-electricity-generating-capacity-per-capita                21
Renewable energy share in the total final energy consumption (%)     1
Electricity from fossil fuels (TWh)                                  0
Electricity from nuclear (TWh)                                       0
Electricity from renewables (TWh)                                    0
Low-carbon electricity (% electricity)                              21
Primary energy consumption per capita (kWh/person)                   0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          1
Value_co2_emissions_kt_by_country                                    4
gdp_growth                                                           0
gdp_pe

In [94]:
na_counts_Tuvalu = sustainable_energy_G[sustainable_energy_G["Entity"] == "Tuvalu"].isna().sum()
na_counts_Tuvalu

Entity                                                               0
Year                                                                 0
Access to electricity (% of population)                              0
Access to clean fuels for cooking                                    0
Renewable-electricity-generating-capacity-per-capita                 0
Renewable energy share in the total final energy consumption (%)     1
Electricity from fossil fuels (TWh)                                 21
Electricity from nuclear (TWh)                                      21
Electricity from renewables (TWh)                                   21
Low-carbon electricity (% electricity)                              21
Primary energy consumption per capita (kWh/person)                   0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          1
Value_co2_emissions_kt_by_country                                    1
gdp_growth                                                           0
gdp_pe

In [95]:
sustainable_energy_G = sustainable_energy_G[(sustainable_energy_G['Entity'] != 'Bermuda') & (sustainable_energy_G['Entity'] != 'Tuvalu')]
sustainable_energy_G.shape

(3606, 19)

In [100]:
df_merged = pd.merge(sustainable_energy_G, Renewable_E_P[['Entity', 'Year', 'Renewable_E_P']], on=['Entity', 'Year'], how='left')

# Check the shape of the merged frame
df_merged.shape


(3606, 20)
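An unchanged row count shows that the left merge did not duplicate rows, but it does not show how many rows actually found a match. One way to check this, sketched here on toy frames rather than the real data, is to use the `validate` and `indicator` arguments of `pd.merge`. The toy frames and the column `x` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two datasets, keyed on Entity and Year.
left = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "x": [1, 2, 3],
})
right = pd.DataFrame({
    "Entity": ["A", "A"],
    "Year": [2000, 2001],
    "Renewable_E_P": [10.0, 11.0],
})

# validate raises MergeError if (Entity, Year) is not unique on both sides;
# indicator adds a "_merge" column recording where each row came from.
checked = pd.merge(
    left, right,
    on=["Entity", "Year"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

print(checked["_merge"].value_counts())

# Rows present only in the left frame received NaN for Renewable_E_P.
unmatched = checked.loc[checked["_merge"] == "left_only", ["Entity", "Year"]]
print(unmatched)
```

Running the same check on the real frames would list the country-year pairs for which `Renewable_E_P` is missing after the merge.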

In [102]:
df_merged.head(5)

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),Low-carbon electricity (% electricity),Primary energy consumption per capita (kWh/person),Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude,Renewable_E_P
0,Afghanistan,2000,1.613591,6.2,9.22,44.99,0.16,0.0,0.31,65.95744,302.59482,1.64,760.0,,,60,652230.0,33.93911,67.709953,65.95744
1,Afghanistan,2001,4.074574,7.2,8.86,45.6,0.09,0.0,0.5,84.745766,236.89185,1.74,730.0,,,60,652230.0,33.93911,67.709953,84.745766
2,Afghanistan,2002,9.409158,8.2,8.47,37.83,0.13,0.0,0.56,81.159424,210.86215,1.4,1029.999971,,179.426579,60,652230.0,33.93911,67.709953,81.159424
3,Afghanistan,2003,14.738506,9.5,8.09,36.66,0.31,0.0,0.63,67.02128,229.96822,1.4,1220.000029,8.832278,190.683814,60,652230.0,33.93911,67.709953,67.02128
4,Afghanistan,2004,20.064968,10.9,7.75,44.24,0.33,0.0,0.56,62.92135,204.23125,1.2,1029.999971,1.414118,211.382074,60,652230.0,33.93911,67.709953,62.92135


# Conclusion

In this workshop, we have practiced PCA, kernel PCA, and LLE on the Boston housing data. 

You can use these methods on your own dataset, especially when the data have a high dimension and are difficult to understand and analyse.

If you are interested, you can learn other methods for dimensionality reduction, such as t-SNE and UMAP. The resources below would be useful.

# Resources

- Rethinking 'distance' in New York City *Medium* [URL](https://medium.com/topos-ai/rethinking-distance-in-new-york-city-d17212d24919)
- Five Boroughs for the 21st Century *Medium* [URL](https://medium.com/topos-ai/five-boroughs-for-the-21st-century-8da941f53618)
- [Curse of Dimensionality on Wikipedia](https://en.wikipedia.org/wiki/Curse_of_dimensionality)
- [The Curse of Dimensionality](https://towardsdatascience.com/the-curse-of-dimensionality-50dc6e49aa1e)
- [Importance of Feature Scaling](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html)
- [Understanding PCA](https://towardsdatascience.com/understanding-pca-fae3e243731d)
- [Introduction to t-SNE in Python](https://www.datacamp.com/community/tutorials/introduction-t-sne)
- [Visualising Data Using Embeddings - a lecture on t-SNE](https://www.youtube.com/watch?v=EMD106bB2vY) (Video)
- [StatQuest: t-SNE, Clearly Explained](https://www.youtube.com/watch?v=NEaUSP4YerM) (Video)
- [How to Use t-SNE Effectively](https://distill.pub/2016/misread-tsne/)
- [How to tune the Hyperparameters of t-SNE](https://towardsdatascience.com/how-to-tune-hyperparameters-of-tsne-7c0596a18868)
- [Understanding UMAP](https://pair-code.github.io/understanding-umap/) (Compares to t-SNE)
- [How UMAP Works](https://towardsdatascience.com/how-exactly-umap-works-13e3040e1668)
- [3 New Techniques for Data-Dimensionality Reduction in ML](https://thenewstack.io/3-new-techniques-for-data-dimensionality-reduction-in-machine-learning/)
- [UMAP for Dimensionality Reduction](https://www.youtube.com/watch?v=nq6iPZVUxZU) (Video)
- [A Bluffer's Guide to Dimensionality Reduction](https://www.youtube.com/watch?v=9iol3Lk6kyU) (Video)

# Acknowledgements

This workshop is partly based on a [notebook](https://github.com/jreades/i2p/blob/master/lectures/9.3-Dimensionality.md) developed by [Jon Reades](https://github.com/jreades), which is part of the CASA00013 module.