# Visualization of correlation (Global)

This notebook should be read alongside the following document (url), and is complemented by the following notebooks (url).

In [1]:
```python
import os
import warnings

import numpy as np
import pandas as pd
import plotly.express as px
import ipywidgets as widgets
from ipywidgets import Layout

# Silence warnings (e.g. numpy's RuntimeWarning from np.log on non-positive values)
warnings.filterwarnings("ignore")
```

In [2]:
```python
# Load the combined indicator dataset from the local Data folder
df = pd.read_csv(os.path.join(os.getcwd(), 'Data', 'GoldenDataFrame.csv'))
```

In [3]:
```python
# Columns used in the analysis: identifiers first, then the indicators
columns = ['Country', 'Year', 'Exports-Commercial services', 'Renewable electricity',
           'Employment-agriculture', 'Employment-industry', 'Employment-services',
           'Exports-G&S', 'Fertility rate', 'Foreign investment', 'GDP', 'Education GExp',
           'Workers high education', 'Literacy rate', 'Net migration', 'Mortality-infants',
           'Health services use', 'R&D GExp', 'Ninis', 'Suicide', 'International taxes',
           'Alcohol per capita']
clist = df['Country'].unique()

# Country codes grouped by the regions used in this study
countries_by_region = {
    'Europe': ('DEU', 'FRA', 'SWE', 'GBR', 'ESP', 'HRV', 'POL', 'GRC', 'AUT', 'NLD'),
    'Persian Gulf': ('IRQ', 'QAT', 'ARE', 'SAU', 'AZE', 'YEM', 'YDR', 'OMN'),
    'North Africa': ('DZA', 'EGY', 'LBY', 'ISR', 'TUR', 'MAR'),
    'South Africa': ('SEN', 'ZAF', 'LBR', 'MOZ', 'CMR', 'NGA', 'GHA'),
    'Asia': ('BGD', 'IND', 'VNM', 'THA', 'IDN', 'PHL', 'KOR'),
    'Latam': ('MEX', 'BRA', 'ARG', 'PER', 'VEN', 'COL', 'CHL', 'PCZ', 'CRI'),
    'Pair': ('USA', 'CHN'),
}

# Invert the mapping: country code -> region
all_countries = {country: region
                 for region, countries in countries_by_region.items()
                 for country in countries}
```

In the following cell, we define a function that adds squared, cubed and logarithmic versions of the indicators, so that we can test quadratic, cubic and logarithmic relations with GDP in addition to linear ones.

In [4]:
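```python
def multcolumn(frame):
    """Add squared, cubed and log-transformed copies of the indicator columns (in place).

    New columns are named '<indicator>.^2', '<indicator>.^3' and '<indicator>.log',
    so the suffix after the dot identifies the type of relation later on.
    The loop starts at index 3 of `columns`, so 'Country', 'Year' and
    'Exports-Commercial services' are not transformed.
    """
    for col in columns[3:]:
        frame[col + '.^2'] = frame[col] ** 2
        frame[col + '.^3'] = frame[col] ** 3
        # np.log yields -inf for zeros and NaN for negative values
        frame[col + '.log'] = np.log(frame[col])
```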
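To illustrate why the transformed columns are useful, here is a minimal plain-Python sketch (standard library only, with made-up toy data rather than values from `GoldenDataFrame.csv`). It assumes, as the notebook does, that R^2 is the squared Pearson correlation between GDP and each version of an indicator; the version with the highest value hints at the type of relation. The helpers `pearson_r` and `r2_by_transform` are hypothetical; in the notebook itself, pandas' `corr()` does this work.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r2_by_transform(x, gdp):
    """R^2 (squared Pearson r) between GDP and each transformation of x."""
    transforms = {
        "linear": lambda v: v,
        "^2": lambda v: v ** 2,
        "^3": lambda v: v ** 3,
    }
    # The logarithm is only defined for strictly positive values
    if all(v > 0 for v in x):
        transforms["log"] = math.log
    return {name: pearson_r([f(v) for v in x], gdp) ** 2
            for name, f in transforms.items()}


# Toy data (hypothetical): GDP grows exactly with the square of the indicator
x = [1.0, 2.0, 3.0, 4.0, 5.0]
gdp = [v ** 2 for v in x]

scores = r2_by_transform(x, gdp)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because the toy GDP series is exactly the square of the indicator, the `^2` version reaches an R^2 of about 1 while the linear and cubic versions stay slightly lower; the log version is only computed when every value is positive.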

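Next, we want to know how strongly each indicator is related to GDP in each country. The following loop computes, for every country, the correlation between GDP and all the original and transformed indicators, keeps the relations with an R^2 of at least 0.75, and selects the best-fitting type of relation for each indicator. The result is a new dataframe, `demo2`, containing the indicator, the type of relation, the value of R^2, its behaviour (positive or negative correlation), the country and the continent.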

In [5]:
```python
multcolumn(df)

frames = []
for country in clist:
    dat = df.loc[df['Country'] == country]
    # Correlation matrix of the numeric columns only ('Country' is text)
    cor = dat.corr(numeric_only=True)
    cor['GDP-R^2'] = cor['GDP'] ** 2
    # Split names such as 'Suicide.log' into indicator ('Suicide') and type ('log')
    cor['Indicator'] = cor.index
    cor[['Indicator', 'Type']] = cor['Indicator'].str.split('.', expand=True)
    corcolumn = cor.loc[cor['GDP-R^2'] >= 0.75, ['Indicator', 'Type', 'GDP-R^2', 'GDP']]
    # For each indicator, keep the transformation(s) with the highest R^2
    is_best = corcolumn.groupby('Indicator')['GDP-R^2'].transform('max') == corcolumn['GDP-R^2']
    max_df = corcolumn[is_best].copy()
    max_df['Behaviour'] = np.where(max_df['GDP'] > 0, 'Positive', 'Negative')
    max_df['Type'] = max_df['Type'].replace(['^2', '^3', 'log'], ['Quadratic', 'Cubic', 'Logarithmic'])
    max_df['Country'] = country
    max_df = max_df.drop(columns='GDP').reset_index(drop=True)
    # Drop the year, the index column saved in the CSV and GDP itself
    max_df = max_df[~max_df['Indicator'].isin(['Year', 'GDP', 'Unnamed: 0'])]
    max_df = max_df.sort_values(by='GDP-R^2', ascending=False)
    frames.append(max_df)

demo2 = pd.concat(frames, axis=0)
demo2['Continent'] = demo2['Country'].map(all_countries)
demo2
```

|     | Indicator | Type | GDP-R^2 | Behaviour | Country | Continent |
|----:|:----------|:-----|--------:|:----------|:--------|:----------|
| 5 | Exports-G&S |  | 0.954928 | Positive | DEU | Europe |
| 7 | Health services use |  | 0.916594 | Positive | DEU | Europe |
| 4 | Exports-Commercial services |  | 0.911725 | Positive | DEU | Europe |
| 10 | Employment-services | Quadratic | 0.883525 | Positive | DEU | Europe |
| 12 | Alcohol per capita | Cubic | 0.875820 | Negative | DEU | Europe |
| ... | ... | ... | ... | ... | ... | ... |
| 6 | Suicide |  | 0.922759 | Negative | CHN | Pair |
| 5 | Renewable electricity |  | 0.912560 | Positive | CHN | Pair |
| 2 | Exports-Commercial services |  | 0.879177 | Positive | CHN | Pair |
| 14 | Alcohol per capita | Quadratic | 0.867314 | Positive | CHN | Pair |
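The selection rule applied inside the loop can be summarised without pandas. The sketch below is a hypothetical, standard-library version for a single country: it takes (indicator, transformation, r) triples with illustrative numbers (not results from the dataset), keeps only relations with an R^2 of at least 0.75, picks the best transformation per indicator and labels the behaviour from the sign of r. Unlike the `groupby(...).transform('max')` step, which keeps every tied row, this sketch keeps only the first of any ties.

```python
THRESHOLD = 0.75
EXCLUDED = {"Year", "GDP", "Unnamed: 0"}
TYPE_LABELS = {None: None, "^2": "Quadratic", "^3": "Cubic", "log": "Logarithmic"}


def best_relations(correlations, threshold=THRESHOLD):
    """Best transformation per indicator, among relations with R^2 >= threshold."""
    best = {}
    for indicator, transform, r in correlations:
        r2 = r * r
        # `not r2 >= threshold` also discards NaN correlations, like the pandas filter
        if indicator in EXCLUDED or not r2 >= threshold:
            continue
        if indicator not in best or r2 > best[indicator]["GDP-R^2"]:
            best[indicator] = {
                "Indicator": indicator,
                "Type": TYPE_LABELS[transform],
                "GDP-R^2": r2,
                "Behaviour": "Positive" if r > 0 else "Negative",
            }
    # Strongest relations first, as in the notebook's sort_values call
    return sorted(best.values(), key=lambda row: row["GDP-R^2"], reverse=True)


# Hypothetical correlations with GDP for one country (transform None = linear)
correlations = [
    ("Exports-G&S", None, 0.98),
    ("Exports-G&S", "^2", 0.95),
    ("Suicide", "log", -0.90),
    ("Suicide", None, -0.85),
    ("Literacy rate", None, 0.40),
    ("Net migration", "log", float("nan")),
    ("GDP", None, 1.0),
]

rows = best_relations(correlations)
for row in rows:
    print(row)
```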


Now that the data is prepared, we can create the widgets that make our visualizations interactive. We use two widgets, both of them multiple-selection widgets, built with the `ipywidgets` library for Jupyter Notebook.

The first widget selects indicators, and we create it with the `SelectMultiple` class from `ipywidgets`. With this widget we can display the R^2 for a particular selection of indicators instead of all of them.

The first argument to specify is `options`, the list of available choices (in our case, the different indicators). Next comes `value`, the option or options selected by default, and `description`, the label shown next to the widget. The remaining arguments, `disabled` and `layout`, only control whether the widget can be used and how it is sized.

In [6]:
```python
unique_tri = demo2['Indicator'].unique()
tri = widgets.SelectMultiple(
    options=unique_tri.tolist(),
    value=['Exports-G&S'],
    description='Indicator',
    disabled=False,
    layout=Layout(width='50%', height='80px')
)

def graf1(tri):
    """Map the GDP R^2 of the selected indicators for every country."""
    dat = demo2.loc[demo2['Indicator'].isin(tri)]
    # List every selected indicator; avoids an IndexError when nothing is selected
    title = ', '.join(tri) if tri else 'No indicator selected'
    a = px.choropleth(dat, locations='Country', locationmode='ISO-3',
                      color='GDP-R^2', hover_name='Country', hover_data=['Type', 'Behaviour'],
                      projection='natural earth', color_continuous_scale='Reds',
                      width=700, height=500, title=title)
    a.show()

widgets.interactive(graf1, tri=tri)
```
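The callback receives the widget's current `value`, which for `SelectMultiple` is a tuple of the selected options and can be empty when the user deselects everything. The standard-library sketch below, using hypothetical helpers `select_rows` and `make_title` and made-up rows, mirrors the filtering (`isin`) and titling steps. It shows why joining the selection is safer than taking the first unique value: indexing `[0]` on an empty result raises an `IndexError`, while the join simply falls back to a placeholder.

```python
def select_rows(rows, selected, column):
    """Keep rows whose `column` value is among the selected options (like Series.isin)."""
    chosen = set(selected)
    return [row for row in rows if row[column] in chosen]


def make_title(selected, empty_text="Nothing selected"):
    """Title listing every selected option, with a fallback for an empty selection."""
    return ", ".join(selected) if selected else empty_text


# Made-up rows in the shape of demo2 (not real results)
rows = [
    {"Indicator": "Exports-G&S", "Country": "DEU"},
    {"Indicator": "Suicide", "Country": "CHN"},
    {"Indicator": "Literacy rate", "Country": "IND"},
]

selection = ("Exports-G&S", "Suicide")  # what SelectMultiple.value might hold
print(make_title(selection), len(select_rows(rows, selection, "Indicator")))
print(make_title(()), len(select_rows(rows, (), "Indicator")))
```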



Finally, we create a second multiple-selection widget that works exactly like the first one. This one lets us choose which continents to visualize, and it shows the R^2 of each indicator in a scatter plot instead of a map.

In [7]:
```python
unique_tric = demo2['Continent'].unique()
tric = widgets.SelectMultiple(
    options=unique_tric.tolist(),
    value=['North Africa'],
    description='Continent',
    disabled=False,
    layout=Layout(width='50%', height='80px')
)

def graf2(tric):
    """Scatter the GDP R^2 of every indicator for the selected continents."""
    dat = demo2.loc[demo2['Continent'].isin(tric)]
    # List every selected continent; avoids an IndexError when nothing is selected
    title = ', '.join(tric) if tric else 'No continent selected'
    a = px.scatter(dat, x='GDP-R^2', y='Indicator',
                   color='GDP-R^2', hover_name='Country', hover_data=['Type', 'Behaviour'],
                   color_continuous_scale='Blues', width=700, height=500, title=title)
    a.show()

widgets.interactive(graf2, tric=tric)
```
