# **Plotting Maps Using World Happiness Index**

**Please upvote if you find this helpful and comment any improvements or questions! Feel free to check out some of my other notebooks!**

Plotting graphs is a common part of exploratory data analysis (EDA) in machine learning and data science, and plotting data on a world map takes this one step further. I will use GeoPandas to show how to plot maps from data, how to depict values as variations in color, and what these maps can teach us about the world. In this notebook you will find:

* Detailed Explanations On How To Create Map Graphs
* Sample Code and Walkthroughs Using the World Happiness Data

![](https://miro.medium.com/max/3720/1*HevTonUoRkTNolFPO2P8Kw.png)



# **Importing Libraries and Preparing Datasets**

First, let's import several libraries that will help us. Most importantly, we have to import the GeoPandas library, which lets us load "map data" into a GeoDataFrame, a pandas DataFrame with an added geometry column.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")
import math
import geopandas as gpd

We will also have to load in the data. I will be using the World Happiness Index Reports as my data, so let's load those in first.

In [None]:
data_2015 = pd.read_csv('../input/world-happiness/2015.csv')
data_2016 = pd.read_csv('../input/world-happiness/2016.csv')
data_2017 = pd.read_csv('../input/world-happiness/2017.csv')
data_2018 = pd.read_csv('../input/world-happiness/2018.csv')
data_2019 = pd.read_csv('../input/world-happiness/2019.csv')

data_2019.head(5)

For the "map", we need a shapefile ('.shp'), which can be downloaded from various sources online. I found the world map shapefile [here](https://hub.arcgis.com/). Just upload the zip file into the Kaggle input section, and then use the GeoPandas `read_file` function to import the .shp file, similar to how we loaded the .csv files.

In [None]:
map_df = gpd.read_file('../input/shapefile/World_Countries__Generalized_.shp')

map_df.plot()

There's our map! But first off, we need to standardize a few names across the two DataFrames. There are several discrepancies in how certain countries are named, such as "Russian Federation" instead of "Russia", or "Trinidad and Tobago" versus "Trinidad & Tobago". Let's handle this problem first.

In [None]:
map_df = map_df.replace({'Russian Federation':'Russia',
                        'Trinidad and Tobago': 'Trinidad & Tobago',
                        "Côte d'Ivoire": 'Ivory Coast',
                        'Congo': 'Congo (Brazzaville)',
                        'Congo DRC':'Congo (Kinshasa)',
                        'Palestinian Territory':'Palestinian Territories'})
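
One way to find which names need standardizing is to compare the two sets of country names directly. Here is a minimal sketch of that idea. The helper name `unmatched_names` is hypothetical, and the short lists are hand-written stand-ins for `map_df['COUNTRY']` and `data_2019['Country or region']`.

```python
def unmatched_names(map_names, report_names):
    """Return the report names that have no exact match among the map names."""
    return sorted(set(report_names) - set(map_names))

# Stand-in lists; in the notebook these would come from the two DataFrames
map_names = ['Russian Federation', 'Finland', 'Congo DRC']
report_names = ['Russia', 'Finland', 'Congo (Kinshasa)']

missing = unmatched_names(map_names, report_names)
print(missing)
```

Every name this prints is a candidate for the replacement dictionary above, or a country that simply has no shape in the map file.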

Also, to simplify the data a little, I will create a new Dataframe with only the country names and their happiness score, as we will be visualizing that first. I will also be renaming one of the columns to make it easier to call on later on.

In [None]:
data_2019 = data_2019.rename(index = str, columns = {'Country or region':"Country"})
df = data_2019[['Country','Score']]

df.head(5)

Great! Now let's merge the map_df dataframe with the world happiness score dataframe. 

In [None]:
merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

merged.head(5)
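
If you want to see exactly which map rows found no match, a left merge with `indicator=True` is one option. This is only a sketch: plain pandas DataFrames stand in for `map_df` and `df`, and the scores are illustrative values.

```python
import pandas as pd

# Stand-ins for map_df and df (illustrative values)
map_table = pd.DataFrame({'COUNTRY': ['Finland', 'American Samoa', 'Russia']})
scores = pd.DataFrame({'Country': ['Finland', 'Russia'], 'Score': [7.769, 5.648]})

# indicator=True adds a '_merge' column describing where each row came from
checked = map_table.merge(scores, how='left', left_on='COUNTRY',
                          right_on='Country', indicator=True)

# Rows marked 'left_only' exist on the map but have no score
no_score = checked.loc[checked['_merge'] == 'left_only', 'COUNTRY'].tolist()
print(no_score)
```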

Whoa, that's a lot of NaN values! Let's handle that quickly. As you can see, many territories and dependencies are listed on the world map but don't have a happiness score of their own, because they are counted under another country. American Samoa is one example. Luckily, map_df provides a country affiliation column (`COUNTRYAFF`), which helps us fill in many of these missing values.

In [None]:
print(merged.Score.isnull().sum())

First, I want to create a dictionary whose keys are the country names and whose values are the happiness scores. I will also add several extra countries to the dictionary, since they were not included in the 2019 dataset; their scores were taken from previous happiness reports. These countries are:

* Maldives (2019)
* Oman (2016)
* Sudan (2017)
* Djibouti (2014)
* Angola (2017)
* Belize (2017)

In [None]:
df = df.set_index('Country').T.to_dict('list')
updates = {'Maldives':[5.20],
          'Oman':[6.853],
          'Sudan':[4.14],
          'Djibouti':[4.37],
          'Angola':[3.80],
          'Belize':[5.95599985122681]
          }
df.update(updates)

df
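
To make the shape of this dictionary concrete, here is a small sketch on a two-row stand-in DataFrame (illustrative values). It also shows an alternative call that maps each country straight to a float, which is an assumption about what you might prefer and not what the rest of this notebook uses.

```python
import pandas as pd

# Small stand-in for the two-column happiness DataFrame (illustrative values)
sample = pd.DataFrame({'Country': ['Finland', 'Denmark'], 'Score': [7.769, 7.6]})

# Same call as above: each country maps to a one-element list
as_lists = sample.set_index('Country').T.to_dict('list')

# Alternative: map each country directly to a float
as_floats = sample.set_index('Country')['Score'].to_dict()

print(as_lists)
print(as_floats)
```

Because the first form stores one-element lists, any lookup needs a trailing `[0]` to get the number itself.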

Finally, I will create a simple for loop that checks whether each score is NaN and, if the country affiliation is in the dictionary, replaces it with the affiliated country's score.

In [None]:
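# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc('Score')
for i in range(len(merged)):
    if math.isnan(merged['Score'].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])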

merged.head(5)

Looks like most of our NaN values have been taken care of! Of course, there are still a good number of countries that weren't included in the 2019 World Happiness Report or in previous reports, so they will remain as NaN in our dataset.

In [None]:
print(merged.Score.isnull().sum())
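
The count alone does not say which countries are affected. As a sketch, the names can be listed by filtering the index on the missing values. The small DataFrame below is a stand-in for `merged` with illustrative values.

```python
import numpy as np
import pandas as pd

# Stand-in for the merged table, indexed by country name (illustrative values)
sample = pd.DataFrame({'Score': [7.769, np.nan, np.nan]},
                      index=['Finland', 'Cuba', 'Fiji'])

# Index labels of the rows whose score is still missing
still_missing = sample.index[sample['Score'].isnull()].tolist()
print(still_missing)
```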

This is the full list of countries that are still not included:
* Samoa
* Tonga
* Fiji
* Antigua and Barbuda
* Aruba
* Bahamas
* Barbados
* Cuba 
* Dominica
* Grenada
* Guyana
* Saint Kitts and Nevis
* Saint Lucia
* Saint Vincent and the Grenadines
* Suriname
* Cabo Verde
* Guinea-Bissau
* Sao Tome and Principe
* Eswatini
* Seychelles
* Equatorial Guinea
* Kiribati
* Eritrea
* Andorra
* Liechtenstein
* Monaco
* San Marino
* Vatican City
* Timor-Leste
* Nauru
* Papua New Guinea
* Solomon Islands
* Tuvalu
* Vanuatu
* Brunei Darussalam
* North Korea
* Marshall Islands
* Micronesia
* Palau

# **Plotting The World Map**

Now for the fun part ... the graphing. First, let us define a couple of variables that will be helpful later on. Our "variable" variable names the column we want to represent on the world map using different colors, and the vmin and vmax variables hold the minimum and maximum values found in that column.

In [None]:
variable = 'Score'

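# Take the range from the data so the color scale matches the plotted values
vmin, vmax = merged[variable].min(), merged[variable].max()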
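
These two values control how a score becomes a color: the score is rescaled linearly to a fraction between 0 and 1, and the colormap turns that fraction into a color. Here is a minimal sketch of that rescaling in plain Python. The function name `normalize` is hypothetical and the numbers are made up; matplotlib's `Normalize` does the equivalent job in the plots.

```python
def normalize(value, vmin, vmax):
    """Linearly rescale value so that vmin maps to 0.0 and vmax maps to 1.0."""
    return (value - vmin) / (vmax - vmin)

low = normalize(2.0, 2.0, 8.0)
mid = normalize(5.0, 2.0, 8.0)
high = normalize(8.0, 2.0, 8.0)
print(low, mid, high)
```

With the "viridis" colormap, a fraction near 0 is drawn in purple and a fraction near 1 in yellow.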

Next, we will define our subplots and figure size, and plot our merged dataframe. We set the "column" parameter to our "variable" variable, and you can adjust the edge color and line width to your liking. I chose the cmap "viridis" because it produces colors from yellow to purple, with yellow representing happier countries and purple representing less happy countries.

In [None]:
fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

The map looks great already! But there are a couple more things we can add. First off, I will remove the axes and grid, as they don't provide any important information in this graph. I will also add a quick title, as well as a color scale on the side of the map. This scale helps viewers tell the colors apart and makes the map easier to read.

In [None]:
fig, ax = plt.subplots(1, figsize=(40, 24))

# Pass vmin/vmax so the map uses the same color range as the colorbar
merged.plot(column=variable, cmap='viridis', linewidth=0.3, ax=ax,
            edgecolor='0.5', vmin=vmin, vmax=vmax)

ax.axis('off')

ax.set_title('World Happiness Score', fontdict={'fontsize': 40})

# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

Beautiful! Our world map is now complete. Notice that countries where we have no data, such as Antarctica, North Korea, and Cuba, are depicted as white, as the NaN values are not taken into consideration in the cmap scale.

# **What Does This Tell Us**

Analyzing this map can show us several things. First off, we can see that there is a lot of missing data, specifically around Oceania and the Pacific Ocean region, which could be a focus for the World Happiness Report in future studies. We also notice that the bulk of the happiest countries are located in the Northern Hemisphere, centered around North America and Northern Europe. Maybe colder weather and a northern environment are a hint as to how to improve happiness among countries as a whole. On the other hand, we notice that the unhappiest countries are mostly located farther south, concentrated in Africa, which could be a focus for humanitarian efforts in the future.

There are also some anomalies that can be noted here. First off, we notice that regions tend to share similar happiness scores, as bordering countries typically have very similar colors. Australia, though, is an exception, as it is by far the happiest country out of Oceania, Asia, and Africa. Likewise, we can notice heavy distinctions in the Middle East, with Saudi Arabia and Oman being a vastly lighter color than their neighbors, such as Yemen, Jordan, and Iraq.

All in all, though, this map can provide helpful insight on how geography and environment may affect happiness around the world, as we continue to study it.

# **Some More Plots**

Now that you know how to plot heatmaps on maps, I will do a few other visualizations using the World Happiness data. I will include my analysis of each graph below it. (Disclaimer: I will not be including the scores that were added in manually.)
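
The cells below repeat the same join-and-fill steps for each column. As a sketch, those steps could be wrapped in one helper. The name `prepare_variable` is hypothetical, the tiny DataFrames are stand-ins for `map_df` and `data_2019` with illustrative values, and the helper swaps the for loop for `Series.map` plus `fillna`, which is a different technique from the one used in the cells.

```python
import pandas as pd

def prepare_variable(map_table, data, variable):
    """Join one data column onto the map table and fill gaps from affiliations."""
    column = data.set_index('Country')[variable]
    merged = map_table.set_index('COUNTRY').join(column)
    # Look up each row's affiliated country in the same column
    from_affiliation = merged['COUNTRYAFF'].map(column)
    merged[variable] = merged[variable].fillna(from_affiliation)
    return merged, merged[variable].min(), merged[variable].max()

# Stand-ins for map_df and data_2019 (illustrative values)
map_table = pd.DataFrame({
    'COUNTRY': ['United States', 'American Samoa', 'Cuba', 'Finland'],
    'COUNTRYAFF': ['United States', 'United States', 'Cuba', 'Finland'],
})
data = pd.DataFrame({'Country': ['United States', 'Finland'],
                     'Generosity': [0.28, 0.15]})

merged_sample, vmin_sample, vmax_sample = prepare_variable(map_table, data, 'Generosity')
print(merged_sample)
```

In this sketch American Samoa inherits the value of its affiliated country, while Cuba stays NaN because neither it nor its affiliation appears in the data.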

In [None]:
variable = 'Perceptions of corruption'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

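# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])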

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

Wow! Looks like most of the world has a strong sense of corruption! It is interesting to see how, from the United States to South America, Africa, and even Russia, the colors are very similar, and only select countries (Canada, Greenland, New Zealand) have low levels of perceived corruption. It seems that corruption is not a good factor for determining happiness, as nearly all countries are struggling with it. It is also interesting to note that the Nordic countries maintain the lowest levels of corruption, similar to how they have extremely high happiness scores. Again, the northern countries seem to excel in low levels of corruption, as they did in high levels of happiness.

In [None]:
variable = 'Generosity'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

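# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])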

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

We see an interesting trend here: even the Nordic countries are mediocre in terms of generosity compared to other countries. Interestingly enough, the Southeast Asian countries excel in generosity despite scoring much lower on the happiness index, so generosity does not correlate directly with happiness. It seems the old saying that happiness isn't dictated by material needs may not hold up, as the world's happiest countries show only mediocre generosity.

In [None]:
variable = 'Freedom to make life choices'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

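# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])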

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

We see the opposite effect of the generosity graph here, as most of the world is above average in terms of freedom to make life choices. This is an interesting trend again, as freedom does not directly correlate with happiness either. We also see fewer similarities between bordering countries, as is evident in Africa, where one country could be purple and the next light green. This contrast suggests that freedom in life choices depends very much on each country's laws and government rather than on geographic location, as hypothesized earlier for happiness. But again, following the theme from the last map, it is interesting that a non-material value such as freedom is not a great indicator of happiness.

In [None]:
variable = 'Healthy life expectancy'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

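# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])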

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

In [None]:
variable = 'Social support'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

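# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])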

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

So far, this is the closest match to the World Happiness map shown above, although many countries still excel, or are at least above average. Again, this is not a clean separator of happiness, because countries in Africa and South Asia, which tend to be less developed, score poorly in terms of life expectancy. Either way, life expectancy definitely seems to be an important consideration for the happiness score, though further research into each region's healthcare and infrastructure could shed more light on the issue.

In [None]:
variable = 'GDP per capita'

df = data_2019[['Country',variable]]

print(df.head(5))

merged = map_df.set_index('COUNTRY').join(df.set_index('Country'))

df = df.set_index('Country').T.to_dict('list')

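# Position of the column, so values can be assigned by position with .iloc
col = merged.columns.get_loc(variable)
for i in range(len(merged)):
    if math.isnan(merged[variable].iloc[i]):
        affiliation = str(merged['COUNTRYAFF'].iloc[i])
        if affiliation in df:
            merged.iloc[i, col] = float(df[affiliation][0])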

vmin, vmax = merged[variable].min(),merged[variable].max()

fig,ax = plt.subplots(1,figsize = (40,24))

merged.plot(column = variable, cmap = 'viridis', linewidth = 0.3, ax=ax,edgecolor = '0.5')

ax.axis('off')

ax.set_title(variable,fontdict = {'fontsize':'40'})


# Colorbar matching the map's colormap and value range
sm = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

Surprisingly, this map is the most similar to the happiness score map. This suggests that GDP is the strongest indicator of happiness within a country, which is an interesting finding. A greater GDP could also simply indicate better infrastructure and development in the country, which could themselves be heavy indicators of happiness. Either way, GDP seems like a good broad generalizer, but on a closer level other factors may come into play more, as the United States ranks higher here than the Nordic countries, even though the latter clearly rank higher than the US in happiness.
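
Comparing maps by eye only goes so far. A rough way to check which factor tracks the happiness score most closely is to compute the correlation of each column with `Score`. The sketch below uses made-up placeholder numbers, not values from the report, so it only demonstrates the mechanics.

```python
import pandas as pd

# Made-up placeholder values, not taken from the report
sample = pd.DataFrame({
    'Score': [7.5, 6.0, 4.5, 3.0],
    'GDP per capita': [1.4, 1.1, 0.7, 0.3],
    'Generosity': [0.2, 0.3, 0.1, 0.25],
})

# Pearson correlation of every other column with the happiness score
correlations = sample.corr()['Score'].drop('Score').sort_values(ascending=False)
print(correlations)
```

On the real data, something like `data_2019.select_dtypes('number').corr()['Score']` should give the same kind of ranking, assuming the column names used in this notebook.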

**That's all for this kernel. Thank you for viewing!!**