## Set up notebook

The first step is to import standard Python packages and the `DataQueryInterface` class from the `macrosynergy.management.dq` module.

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import yaml

from timeit import default_timer as timer
from datetime import timedelta, date

from macrosynergy.management.dq import DataQueryInterface

import warnings
warnings.simplefilter('ignore')

Next we define cross-sectional identifiers for potentially relevant countries or currency areas. These are the basis for forming data panels (sets of comparable time series) across markets.

Most cross-section identifiers refer to currencies, currency areas or - in the case of euro area countries - economic areas. The currency names are in alphabetical order: AUD (Australian dollar), BRL (Brazilian real), CAD (Canadian dollar), CHF (Swiss franc), CLP (Chilean peso), CNY (Chinese yuan renminbi), COP (Colombian peso), CZK (Czech Republic koruna), DEM (German mark), ESP (Spanish peseta), EUR (euro), FRF (French franc), GBP (British pound), HKD (Hong Kong dollar), HUF (Hungarian forint), IDR (Indonesian rupiah), ILS (Israeli shekel), INR (Indian rupee), ITL (Italian lira), JPY (Japanese yen), KRW (Korean won), MXN (Mexican peso), MYR (Malaysian ringgit), NLG (Dutch guilder), NOK (Norwegian krone), NZD (New Zealand dollar), PEN (Peruvian sol), PHP (Philippine peso), PLN (Polish zloty), RON (Romanian leu), RUB (Russian ruble), SEK (Swedish krona), SGD (Singaporean dollar), THB (Thai baht), TRY (Turkish lira), TWD (Taiwanese dollar), USD (U.S. dollar), ZAR (South African rand).

In [2]:
cids_dmca = ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK', 'USD']  # DM currency areas
cids_dmec = ['DEM', 'ESP', 'FRF', 'ITL', 'NLG']  # DM euro area countries
cids_latm = ['BRL', 'COP', 'CLP', 'MXN', 'PEN']  # Latam countries
cids_emea = ['HUF', 'ILS', 'PLN', 'RON', 'RUB', 'TRY', 'ZAR']  # EMEA countries
cids_emas = ['CNY', 'HKD', 'IDR', 'INR', 'KRW', 'MYR', 'PHP', 'SGD', 'THB', 'TWD']  # EM Asia countries
cids_dm = cids_dmca + cids_dmec
cids_em = cids_latm + cids_emea + cids_emas
cids = sorted(cids_dm + cids_em)

In [3]:
path = ''
with open(f"{path}config.yml", 'r') as f:
    cf = yaml.load(f, Loader=yaml.FullLoader)

dq_username = cf["dq"]["username"]  # replace by your username
dq_password = cf["dq"]["password"]  # replace by your password
pki_crt = f"{path}api_macrosynergy_com.crt"  # replace by your public key certificate 
pki_key = f"{path}api_macrosynergy_com.key"   # replace by your PKI key 

In [4]:
ecos = ['CPIXFE_SJA_P6M6ML6AR', 'IP_SA_P6M6ML6AR']  # example economic data
mkts = ['RIR_NSA', 'EQCRR_NSA', 'FXCRR_NSA']  # example market data
rets = ['EQXR_NSA', 'FXXR_NSA', 'DU05YXR_NSA', 'DU05YXR_VT10']  # example returns data
xcats = ecos + mkts + rets  # list of categories to be downloaded

dq = DataQueryInterface(username=dq_username, password=dq_password, crt=pki_crt, key=pki_key)  #  instantiate DQ interface

start = timer()
dfd = dq.download(xcats = xcats, metrics=['value'], start_date='2000-01-01', suppress_warning=True)  # import via API
end = timer()

print("Download time from DQ: "+ str(timedelta(seconds=end - start)))
print("Last updated:", date.today())

dfd['ticker'] = dfd['cid'] + '_' + dfd['xcat']  # add ticker composite column for convenience

Download time from DQ: 0:12:20.652751
Last updated: 2021-11-17


A quick check of size and shape of the downloaded dataframe:

In [5]:
print(dfd.info())  # summarize available categories
dfd.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1421043 entries, 0 to 1421042
Data columns (total 5 columns):
 #   Column     Non-Null Count    Dtype         
---  ------     --------------    -----         
 0   cid        1421043 non-null  object        
 1   xcat       1421043 non-null  object        
 2   real_date  1421043 non-null  datetime64[ns]
 3   value      1229951 non-null  float64       
 4   ticker     1421043 non-null  object        
dtypes: datetime64[ns](1), float64(1), object(3)
memory usage: 54.2+ MB
None


|   | cid | xcat | real_date | value | ticker |
|---|-----|------|-----------|-------|--------|
| 0 | AUD | CPIXFE_SJA_P6M6ML6AR | 2000-01-03 | 0.61972 | AUD_CPIXFE_SJA_P6M6ML6AR |
| 1 | AUD | CPIXFE_SJA_P6M6ML6AR | 2000-01-04 | 0.61972 | AUD_CPIXFE_SJA_P6M6ML6AR |
| 2 | AUD | CPIXFE_SJA_P6M6ML6AR | 2000-01-05 | 0.61972 | AUD_CPIXFE_SJA_P6M6ML6AR |
| 3 | AUD | CPIXFE_SJA_P6M6ML6AR | 2000-01-06 | 0.61972 | AUD_CPIXFE_SJA_P6M6ML6AR |
| 4 | AUD | CPIXFE_SJA_P6M6ML6AR | 2000-01-07 | 0.61972 | AUD_CPIXFE_SJA_P6M6ML6AR |


## Historical distributions of indicators

### Histograms for single indicators

A histogram provides a simplified visualization of the empirical distribution of past values. Rather than showing individual values, it shows the number of observations that fall into each of a set of value ranges (bins). In Seaborn, `sns.histplot()` has replaced the older `sns.distplot()` method.
Conveniently, it comes with a kernel density estimate overlay option, which is activated by setting the `kde` argument to `True`.

In [None]:
dfx = dfd[dfd['real_date'] >= pd.to_datetime('2000-01-01')]  # set start date
dfw = dfx.pivot(index='real_date', columns='ticker', values='value').replace(0, np.nan)  # bring df to wide format
var = 'TRY_RIR_NSA'  # specified indicator to analyze

col='teal'
sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.histplot(x = var, data=dfw, bins=25, kde=True, color=col)  # histogram with custom bin number and kde overlay
plt.axvline(x= np.mean(dfw[var]), color=col, linestyle='--')  # add vertical line for mean

plt.title('Turkey real interest rate: mean and distribution', fontsize=13)  # add chart title
plt.xlabel('% annualized', fontsize=11)  # overwrite standard x-axis label
plt.ylabel('days observed', fontsize=11)  # overwrite standard y-axis label
plt.show()
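
The cell above first reshapes the long dataframe (one row per ticker and date) into a wide dataframe with one column per ticker. A minimal sketch of that step on made-up numbers follows; the dates and values are illustrative assumptions, not downloaded data.

```python
import pandas as pd

# Hypothetical long-format frame that mimics the relevant columns of `dfd`
long_df = pd.DataFrame({
    'real_date': pd.to_datetime(['2021-01-04', '2021-01-05'] * 2),
    'ticker': ['TRY_RIR_NSA'] * 2 + ['USD_RIR_NSA'] * 2,
    'value': [3.1, 3.2, -1.0, -0.9],
})

# One row per date, one column per ticker
wide_df = long_df.pivot(index='real_date', columns='ticker', values='value')
print(wide_df)
```

The wide format lets `sns.histplot()` pick a single series by column name through its `x` argument.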

The `sns.histplot()` method can customize the width of the bins with the `binwidth` argument. One can also change the units of the y-axis with the `stat` argument from 'count' to 'frequency' (number of observations divided by the bin width), 'density' (normalizes counts so that the area of the histogram is 1) or 'probability' (normalizes counts so that the sum of the bar heights is 1).

In [None]:
dfx = dfd[dfd['real_date'] >= pd.to_datetime('2000-01-01')]  # set start date
dfw = dfx.pivot(index='real_date', columns='ticker', values='value').replace(0, np.nan)  # bring df to wide format
var = 'USD_CPIXFE_SJA_P6M6ML6AR'  # specified indicator to analyze

col='royalblue'
sns.set_theme(style='darkgrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.histplot(x = var, data=dfw, binwidth=0.2, stat = 'probability')  # histogram pre-set bin-width and probability bars
plt.axvline(x= np.mean(dfw[var]), color=col, linestyle='--')  # add vertical line for mean
plt.axvline(x= dfw[var].dropna().iloc[-1], color='red', linestyle='--')  # add line for latest

plt.title('U.S. core CPI trend, daily observed (red=latest)', fontsize=13)  # add chart title
plt.xlabel('% annualized', fontsize=11)  # overwrite standard x-axis label
plt.ylabel('historic probability (since 2000)', fontsize=11)  # overwrite standard y-axis label
plt.show()
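
To make the `stat` options concrete, the normalizations can be reproduced with NumPy alone. This is a sketch on a synthetic sample with an assumed bin width of 0.2; it mirrors the definitions given above rather than Seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)  # synthetic sample, not downloaded data

binwidth = 0.2
edges = np.arange(x.min(), x.max() + binwidth, binwidth)  # equal-width bin edges
counts, edges = np.histogram(x, bins=edges)  # stat='count'

frequency = counts / binwidth                 # observations per unit of x
probability = counts / counts.sum()           # bar heights sum to 1
density = counts / (counts.sum() * binwidth)  # bar areas sum to 1

print(probability.sum(), (density * binwidth).sum())
```

All four variants have the same shape; only the scale of the y-axis differs.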

### Histograms for multiple indicators

The `hue` argument allows displaying multiple counts or probabilities in one plot. This can serve two principal purposes.
1. We can compare the distributions of values across cross sections. For this purpose we choose the setting `multiple='layer'`, which plots overlapping histograms.
2. We can visualize the contribution of various series or cross sections to a joint histogram by setting `multiple='stack'`.

In [None]:
cids_sel = ['TWD', 'MXN', 'TRY']  # select small group of cross sections
filt1 = dfd['xcat'] == 'FXCRR_NSA'  # choose (filter out) category
filt2 = dfd['cid'].isin(cids_sel)  # choose cross sections
filt3 = dfd['real_date'] >= pd.to_datetime('2010-01-01')  # set start date
dfx = dfd[filt1 & filt2 & filt3][['value', 'cid']].replace(0, np.nan)  # dataframe in appropriate format

colors = 'pastel'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(8, 4)})  #  choose appearance
ax = sns.histplot(x='value', data=dfx,  
             hue='cid', element='poly', multiple='layer',  # use hue and polygons for overlapping cross sections
             binrange=(-10, 20), binwidth = 1, stat='density', palette=colors)
plt.title('Real FX forward carry distributions in comparison', fontsize=13)  # set title
plt.xlabel('% annualized', fontsize=11)  # set x-axis label
plt.ylabel('historic density', fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box to plot to identify cross sections
leg.set_title('Currencies')  # give title to legend box
plt.show()

In [None]:
cids_sel = ['MXN', 'TRY', 'TWD']  # select small group of cross sections
filt1 = dfd['cid'].isin(cids_sel)  # filter out cross sections
filt2 = dfd['xcat'] == 'FXCRR_NSA'  # filter out category
filt3 = dfd['real_date'] >= pd.to_datetime('2010-01-01')  # set start date
dfx = dfd[filt1 & filt2 & filt3][['value', 'cid']].sort_values('cid')  # dataframe in appropriate format

colors = 'bone'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(8, 4)})  #  choose appearance
ax = sns.histplot(x='value', data=dfx,  
             hue='cid', element='bars', multiple='stack',  # use hue and bars/stack for stacked visualization
             binrange=(-10, 20), binwidth = 0.5, stat='count', palette=colors)

plt.title('Real FX forward carry distribution and latest values', fontsize=13)  # set title
plt.xlabel('% annualized', fontsize=11)  # set x-axis label
plt.ylabel('days observed', fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box to plot to identify cross sections
leg.set_title('Currencies')  # give title to legend box
plt.show()
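
The difference between layered and stacked histograms can be checked numerically. The sketch below uses three synthetic groups (hypothetical labels `A`, `B`, `C`) and common bins: layered bars all start at zero, while stacked bars start where the previous group ended, so the top of the stack equals the joint histogram.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic carry-like samples for three hypothetical groups
groups = {
    'A': rng.normal(0.0, 2.0, 500),
    'B': rng.normal(3.0, 2.0, 500),
    'C': rng.normal(8.0, 3.0, 500),
}

edges = np.arange(-10, 21, 1)  # common bins, analogous to binrange=(-10, 20), binwidth=1
per_group = {k: np.histogram(v, bins=edges)[0] for k, v in groups.items()}

# 'layer': each group's counts are drawn from the zero line and overlap
# 'stack': cumulative sums give the top of each group's bar segment
stacked_tops = np.cumsum(np.vstack(list(per_group.values())), axis=0)

# Histogram of the pooled data with the same bins
joint = np.histogram(np.concatenate(list(groups.values())), bins=edges)[0]
print(np.array_equal(stacked_tops[-1], joint))
```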

### Multi-indicator distribution graphs

The boxplot is a **condensed categorical distribution plot**, which means it is particularly suitable for visualizing a few selected key distribution features across categories. In Seaborn this type of plot is managed by the `sns.boxplot()` method. The distributional features can be applied for one or multiple categories across a full range of cross sections.

The **boxes** in these plots mark the <u>thresholds of the 25th and 75th percentiles</u> (interquartile range) on the outer edges and the median in the centre. By default, the **whiskers** extend to the largest and smallest data points that lie within a range obtained by <u>adding 1.5 times the interquartile range above the 75th percentile and below the 25th percentile</u>. Points outside this range are considered outliers.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK', 'USD']  # select cross sections
filt1 = dfd['cid'].isin(cids_sel)  #  filter out cross sections
filt2 = dfd['xcat'] == 'CPIXFE_SJA_P6M6ML6AR'  #  filter out category

dfx = dfd[filt1 & filt2][['value', 'cid', 'xcat']].sort_values('cid')  # dataframe in appropriate format 

color='darkgoldenrod'
sns.set_theme(style='dark', rc={'figure.figsize':(7, 4)})  #  choose appearance
ax = sns.boxplot(data=dfx, x='cid', y='value', color=color, width=0.75, fliersize=2)  # single category box-whiskers

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal line at zero
plt.title('Core inflation trends', fontsize=13)  # set title
plt.xlabel('')  # set x-axis label
plt.ylabel('% annualized, days observed', fontsize=11)  # set y-axis label
plt.show()
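
The statistics that a box-and-whisker plot displays can be computed directly. The following sketch uses a made-up sample with one extreme value and the conventional 1.5 multiple of the interquartile range.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0])  # made-up sample

q1, med, q3 = np.percentile(x, [25, 50, 75])  # box edges and median
iqr = q3 - q1  # interquartile range
lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whisker_lo = x[x >= lo_fence].min()
whisker_hi = x[x <= hi_fence].max()
outliers = x[(x < lo_fence) | (x > hi_fence)]  # plotted as individual points

print(q1, med, q3, whisker_lo, whisker_hi, outliers)
```

Here the box spans 2.5 to 4.5, the whiskers reach 1.0 and 5.0, and the value 12.0 is flagged as an outlier.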

Multiple categories for each cross section can be plotted by using the `hue` argument and setting it to the column name that is used for categorization.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter out cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  # select category
dfx = dfd[filt1 & filt2][['value', 'cid', 'xcat']].sort_values('cid')  # dataframe in appropriate format 

colors='hls'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(8, 4)})  #  choose appearance
ax = sns.boxplot(data=dfx, x='cid', y='value', hue='xcat',  # hue allows subcategories
                 palette=colors, width=0.6, fliersize=2)

plt.title('Real interest rates and real FX forward carry (vs dominant cross)', fontsize=13)  # set title
plt.axhline(y=0, color='black', linestyle='-', lw=1.5)  # horizontal line at zero
plt.xlabel('')  # set x-axis label
plt.ylabel('% annualized, days observed', fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title('Categories')  # set title of legend box
plt.show()

A **violin plot** is a categorical distribution plot that combines a boxplot with a (mostly) symmetric KDE plot. Like the boxplot, it displays medians and interquartile ranges. However, unlike a boxplot, it does not focus on outliers but rather on the shape of the probability density function. The outer shape is a kernel density estimate of the observed values. With Seaborn's default settings, a miniature boxplot inside the violin marks the median and the interquartile range.

In Seaborn violin plots are managed through the `sns.violinplot()` method.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['EQXR_NSA', 'FXXR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter out cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  # select category
dfx = dfd[filt1 & filt2][['value', 'cid', 'xcat']].sort_values('cid')  # dataframe in appropriate format 

colors='hls'  # choose color palette
sns.set_theme(style='darkgrid', rc={'figure.figsize':(6, 6)})  #  choose appearance
ax = sns.violinplot(data=dfx, y='cid', x='value', hue='xcat')  # hue visualizes multiple categories

plt.title('Distribution of daily equity and FX forward returns', fontsize=13)  # set title
plt.ylabel('')  # set y-axis label
plt.xlabel('% annualized, days observed', fontsize=11)  # set x-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title('Categories')  # set title of legend box
plt.show()
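
The outline of a violin is a kernel density estimate. As an illustration of the idea, a Gaussian KDE can be written in a few lines of NumPy. The sample is synthetic and the bandwidth follows Scott's rule of thumb; this is a sketch of the technique and not a reproduction of Seaborn's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 400)  # synthetic daily returns, for illustration only

# Scott's rule-of-thumb bandwidth for one-dimensional data
h = x.std(ddof=1) * len(x) ** (-1 / 5)

grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 200)
# Average of Gaussian bumps centred on each observation
z = (grid[:, None] - x[None, :]) / h
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

area = (density * (grid[1] - grid[0])).sum()  # Riemann sum, should be close to 1
print(round(area, 3))
```

Mirroring `density` around a central axis gives the characteristic violin shape.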

## Timelines of indicators

### Lineplots

The purpose of a lineplot is to illustrate a continuous relationship between two variables, where time is typically one of these variables. In Seaborn the method to manage lineplots is `sns.lineplot()`. Its simplest application is to pass to it a wide dataframe with time as the row index and individual series as columns.

In [None]:
cids_sel = ['GBP', 'SEK']  # select cross sections
filt1 = dfd['cid'].isin(cids_sel)  # filter out cross sections
filt2 = dfd['xcat'] == 'IP_SA_P6M6ML6AR'  # filter out category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # set start date
dfx = dfd[filt1 & filt2 & filt3]
dfw = dfx.pivot(index=['real_date'], columns='cid', values='value')  # pivot data frame to common time scale

colors='Paired'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
ax = sns.lineplot(data=dfw, estimator=None, palette=colors)  # simply pass data frame with time scale to method

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal line at zero
plt.title('Industrial production trends (observed daily values)', fontsize=13)  # set title
plt.xlabel('')  # set x-axis label
plt.ylabel('% 6 months over 6 months, annualized', fontsize=11)  # set y-axis label

leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title('Currency areas')  # set title of legend box

plt.show()

Importantly, the Seaborn lineplot method does not simply plot a line, but can also estimate a confidence interval when there are multiple observations for each point on the x-axis. Seaborn uses **bootstrapping** to estimate confidence intervals. The method creates many samples by drawing uniformly and with replacement from the values that were actually observed at a given point in time. The default is to create 1000 samples and to compute an aggregate, typically the mean, from each of these samples. The 95% confidence interval is given by the lower and upper boundaries of the central 95% of the aggregate values thus created.

In [None]:
cids_sel = cids_em  # select cross sections
xcat_sel = 'FXCRR_NSA'  # select category
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dfm = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date')['value'].mean().reset_index()  # convert to monthly averages
dfw = dfm.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to appropriate index

colors='Paired'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.lineplot(data=dfw, x='real_date', y=xcat_sel, estimator='mean', ci=95)  # plot mean and its 95% confidence interval

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal line at zero
plt.title('Real FX carry across EM: monthly mean and 95% confidence', fontsize=13)  # set title
plt.xlabel('')  # set x-axis label
plt.ylabel('% annualized', fontsize=11)  # set y-axis label

plt.show()
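
The bootstrap procedure described above can be sketched for a single point in time. The 20 observations are synthetic stand-ins for the cross-sectional values on one date; the number of resamples matches the default of 1000 mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical carry values of 20 cross sections at one point in time
obs = rng.normal(loc=2.0, scale=3.0, size=20)

n_boot = 1000
# Resample with replacement and take the mean of each resample
idx = rng.integers(0, len(obs), size=(n_boot, len(obs)))
boot_means = obs[idx].mean(axis=1)

# The 95% interval spans the 2.5th to the 97.5th percentile of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={obs.mean():.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Repeating this for every date on the x-axis yields the shaded band around the mean line.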

The Seaborn lineplot can not only display values chronologically, but also aggregate information over time units, such as months. This may reveal seasonal patterns. The confidence level can be set with the `ci` argument (deprecated since Seaborn 0.12 in favor of `errorbar`, e.g. `errorbar=('ci', 95)`). If intervals at a high confidence level, based on many underlying observations, do not overlap across months and reveal a clear pattern, seasonality is likely.

With the `hue` argument one can also compare confidence intervals across categories.

In [None]:
cids_sel = cids_em  # select cross sections
xcat_sel = 'FXXR_NSA'  # select category
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame

dfm = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date')['value'].sum().reset_index()  # monthly sums of daily returns
dfw = dfm.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()
dfw['month'] = dfw['real_date'].dt.month
dfw['period'] = 'before 2010'
dfw.loc[dfw['real_date'].dt.year >= 2010, 'period'] = 'from 2010'  # label observations from 2010 onwards

colors='Set2'  # choose color palette
sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
ax = sns.lineplot(data=dfw, x='month', y=xcat_sel, hue='period',  # draw different lines for classes of period category
                  estimator='mean', ci=95, palette=colors)  # plot mean and its 95% confidence interval

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal line at zero
plt.title('EM FX returns across months: mean and 95% confidence', fontsize=13)  # set title
plt.xlabel('')  # set x-axis label
plt.ylabel('%', fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title('Periods')  # set title of legend box

plt.show()
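
The aggregation behind such a seasonal chart can be sketched with pandas alone. The daily returns below are synthetic, so no seasonal pattern should be expected; the point is the two-step grouping, first into calendar months and then across years.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
dates = pd.bdate_range('2015-01-01', '2020-12-31')
# Synthetic daily returns in %, not downloaded data
daily = pd.Series(rng.normal(0.0, 0.5, len(dates)), index=dates, name='ret')

# Sum daily returns within each calendar month of each year
monthly = daily.groupby([daily.index.year, daily.index.month]).sum()
monthly.index.names = ['year', 'month']

# Average monthly return per calendar month across years
seasonal = monthly.groupby(level='month').agg(['mean', 'std', 'count'])
print(seasonal)
```

The `mean` column corresponds to the plotted line, while `std` and `count` indicate how wide a confidence interval around it is likely to be.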

### Line facets

The purpose of facet grids is the creation of small multiples. The class `sns.FacetGrid()` maps a dataset onto multiple axes arrayed in a grid of rows and columns that correspond to levels of variables in the dataset.

The `map()` method of the `FacetGrid` object applies a plotting function to each facet’s subset of the data. The `map_dataframe()` method of the `FacetGrid` is similar to `map()`, but gives more flexibility because it passes each facet's subset to the plotting function as a dataframe through the `data` keyword argument, so that variables can be referenced by column name.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcat_sel = 'CPIXFE_SJA_P6M6ML6AR'  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2005-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame

color='r'  # choose color
sns.set_theme(style='darkgrid')  #  choose appearance
fg = sns.FacetGrid(dfx, col='cid', col_wrap=3,  # set number of columns of the grid
                   height=3, aspect=1.5,  # set height and aspect ratio of each chart
                   sharey=True)  # gives same y axis to all grid plots
fg.map_dataframe(sns.lineplot, x='real_date', y='value', ci=None, lw=1, color=color)  # map lineplot to the grid
fg.map(plt.axhline, y=0, c=".5", lw=1, linestyle='--')  # map horizontal zero line to each chart in grid

fg.set_axis_labels('', '% 6m/6m, ar')  # set axes labels of individual charts
fg.set_titles(col_template='{col_name}')  # set individual charts' title
fg.fig.suptitle('Consistent core inflation trend', y=1.02)  # set facet grid title 
plt.show()

To display multiple categories in a facet grid of lineplots, one sets the `hue` argument to the column that contains the categories. As in the example below, this is typically done at the level of the `FacetGrid` rather than in the lineplot method.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame

colors='bone'  # choose color palette
sns.set_theme(style='whitegrid', palette=colors)  #  choose appearance
fg = sns.FacetGrid(dfx, col='cid', col_wrap=3,  # set number of columns of the grid
                   palette=colors, hue='xcat',  # hue is typically defined at the level of the facet grid
                   height=3, aspect=1.5,  # set height and aspect ratio of each chart
                   sharey=False)  # gives individual y axes to grid plots
fg.map_dataframe(sns.lineplot, x='real_date', y='value', ci=None, lw=1)  # map lineplot to the grid
fg.map(plt.axhline, y=0, c=".5", lw=0.75)  # map horizontal zero line to each chart in grid

fg.set_axis_labels('', '% ar')  # set axes labels of individual charts
fg.set_titles(col_template='{col_name}')  # set individual charts' title
fg.fig.suptitle('Real interest rates and FX forward carry', y=1.02)  # set facet grid title 
handles = fg._legend_data.values()  # get handles for legend box
labels = ['Real short-term interest rate', 'Real FX forward carry'] # series labels for legend box
fg.fig.legend(handles=handles, labels=labels, loc='lower center', ncol=3)  # add legend to bottom of figure
fg.fig.subplots_adjust(bottom=0.15) # lift bottom so it does not conflict with legend
plt.show()

## Bivariate relations

### Scatterplots

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dfax = dfx.groupby(['cid', 'xcat']).resample('A', on='real_date')['value'].mean().reset_index()  # annual averages
dfaw = dfax.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

colors='deep'  # choose color palette
sns.set_theme(style='darkgrid', palette=colors, rc={'figure.figsize':(6, 4)})  #  choose appearance
ax = sns.scatterplot(x=xcats_sel[0], y=xcats_sel[1], data=dfaw,  # column names used for scatter
                     hue='cid', style='cid',  # distinguishes cids by color and marker
                     s=100)  # controls size of dots

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal zero line
plt.axvline(x=0, color='black', linestyle='--', lw=1)  # vertical zero line

plt.title('Real interest rates and real FX forward carry (annual averages)', fontsize=13)  # set title
plt.xlabel('Real interest rates, % ar', fontsize=11)  # set x-axis label
plt.ylabel('Real forward carry, % ar', fontsize=11)  # set y-axis label
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)  # place legend outside box

plt.show()

If the scatter has many points the `alpha` argument allows visualizing density through color intensity.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dfw = dfx.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()

sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
ax = sns.scatterplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, color='steelblue',
                     marker='o', alpha=0.2, s=20)  # combination of marker style, alpha and size is important

plt.axhline(y=0, color='red', linestyle='--', lw=1)  # horizontal zero line
plt.axvline(x=0, color='red', linestyle='--', lw=1)  # vertical zero line

plt.title('Real interest rates and real FX forward carry (daily values)', fontsize=13)  # set title
plt.xlabel('Real interest rates, % ar', fontsize=11)  # set x-axis label
plt.ylabel('Real forward carry, % ar', fontsize=11)  # set y-axis label

plt.show()

### Regression plots

The `sns.regplot()` method can plot scatters and a fitted regression line at the same time. Various regression estimators are available.

The shaded bands around the regression line are confidence intervals that were created by bootstrapping. The option `robust=True` stipulates robust regression. This will de-weight outliers, but takes significantly more computation time. Using this option makes it advisable to reduce the number of bootstrap samples with `n_boot`.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date')['value'].mean().reset_index()  # monthly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.regplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, ci=98, order=1, 
            robust=False,  #  can use statsmodels' robust regression method, but takes more time
            scatter_kws={'s': 20, 'alpha': 0.3, 'color':'lightgray'},  # customize appearance of scatter
            line_kws={'lw' : 2, 'linestyle': '-.', 'color': 'salmon'})  # customize appearance of line

plt.axhline(y=0, color='black', linestyle='--', lw=1)  # horizontal zero line
plt.axvline(x=0, color='black', linestyle='--', lw=1)  # vertical zero line

plt.title('Real interest rates and real FX forward carry (monthly averages)', fontsize=13)  # set title
plt.xlabel('Real interest rates, % ar', fontsize=11)  # set x-axis label
plt.ylabel('Real forward carry, % ar', fontsize=11)  # set y-axis label

plt.show()
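
The `order` argument in the cell above sets the degree of the fitted polynomial, with `order=1` giving a straight line. The fit itself can be illustrated with `np.polyfit` on synthetic data; the coefficients of the data-generating process are assumptions chosen for the example, and `sse` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-3, 3, 200)  # synthetic explanatory variable
y = 0.5 + 1.2 * x + 0.4 * x ** 2 + rng.normal(0, 0.5, 200)  # synthetic dependent variable

lin_coefs = np.polyfit(x, y, deg=1)   # [slope, intercept]
quad_coefs = np.polyfit(x, y, deg=2)  # [x^2 coefficient, x coefficient, intercept]

def sse(coefs):
    """Sum of squared residuals of a polynomial fit (helper for this sketch)."""
    return float(((y - np.polyval(coefs, x)) ** 2).sum())

print(lin_coefs, quad_coefs, sse(quad_coefs) < sse(lin_coefs))
```

Because the synthetic relation is curved, the second-order fit leaves smaller residuals than the straight line.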

It is possible to visualize **polynomial regression** curves with the `order` argument.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['FXCRR_NSA', 'FXXR_NSA']  # select explanatory/dependent categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('Q', on='real_date')['value'].mean().reset_index()  # quarterly averages
filt4 = dff['xcat']==xcats_sel[0]  # filter for explanatory data in frequency-transformed dataframe
dff.loc[filt4, 'value'] = dff[filt4].groupby(['cid', 'xcat'])['value'].shift(1)  # lag explanatory values by 1 time period
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='darkgrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.regplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, ci=95,  
            order=2,  #  2nd-order polynomial fit
            scatter_kws={'s': 20, 'alpha': 0.3, 'color':'goldenrod'},  # customize appearance of scatter
            line_kws={'lw' : 1, 'linestyle': '-', 'color': 'tab:blue'})  # customize appearance of line

plt.axhline(y=0, color='tab:blue', linestyle='--', lw=1)  # horizontal zero line
plt.axvline(x=0, color='tab:blue', linestyle='--', lw=1)  # vertical zero line

plt.title('FX forward carry and subsequent returns (quarterly averages)', fontsize=13)  # set title
plt.xlabel('Real forward carry, % ar', fontsize=11)  # set x-axis label
plt.ylabel('FX forward returns, % ar', fontsize=11)  # set y-axis label

plt.show()
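
The cell above lags the explanatory category by one period within each cross section before relating it to subsequent returns. A minimal sketch of that grouped shift on a made-up quarterly panel (the column names `carry`, `ret` and `carry_lag` are hypothetical):

```python
import pandas as pd

# Made-up quarterly panel with two cross sections
df = pd.DataFrame({
    'cid': ['AUD'] * 3 + ['CAD'] * 3,
    'real_date': list(pd.to_datetime(['2020-03-31', '2020-06-30', '2020-09-30'])) * 2,
    'carry': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'ret': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Shift within each cross section so that no value leaks from AUD into CAD
df['carry_lag'] = df.groupby('cid')['carry'].shift(1)
print(df)
```

The first observation of each cross section becomes missing, which is why a plain `shift(1)` on the whole column would be wrong for panel data.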

Other options include `logistic` (for a binary dependent variable) and `lowess` (for locally weighted linear regression).

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date')['value'].mean().reset_index()  # monthly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
sns.regplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw,  # pass the data
            lowess=True,  #  uses statsmodels to estimate a nonparametric locally weighted linear regression
            marker='d',  # choose diamond marker
            scatter_kws={'s': 50, 'alpha': 0.2, 'color':'gray'},  # customize appearance of scatter
            line_kws={'lw' : 1.5, 'color': 'black'})  # customize appearance of line

plt.axhline(y=0, color='red', linestyle='--', lw=1)  # horizontal zero line
plt.axvline(x=0, color='red', linestyle='--', lw=1)  # vertical zero line

plt.title('Real interest rates and real FX forward carry (monthly averages)', fontsize=13)  # set title
plt.xlabel('Real interest rates, % ar', fontsize=11)  # set x-axis label
plt.ylabel('Real forward carry, % ar', fontsize=11)  # set y-axis label

plt.show()

### Jointplots

The jointplot shows simultaneously the relation between two variables and their individual distributions. It consists of three separate plots: a relational plot and two marginal histograms. Technically, it is a grid of axes (a `JointGrid`). It is managed through the `sns.jointplot()` function.

Importantly, one can choose from a range of relational plots through the `kind` argument.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('W', on='real_date')['value'].mean().reset_index()  # weekly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='white', rc={'figure.figsize':(6, 4)})  #  choose appearance
fg = sns.jointplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, color='steelblue',
                   kind='hex', alpha=0.5)  # display density in hexagons
fg.fig.suptitle('Real interest rates and real FX forward carry (weekly averages)', y=1.02, fontsize=13)  # set grid title
fg.set_axis_labels('Real interest rates, % ar', 'Real forward carry, % ar', fontsize=11)  # set x/y axis labels

plt.show()

A regression line can be added by applying the `plot_joint()` method to the grid returned by `sns.jointplot()`.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date').mean()['value'].reset_index()  # monthly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='whitegrid', rc={'figure.figsize':(6, 4)})  #  choose appearance
fg = sns.jointplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, kind='hist',  #  choose 2-dimension histogram
                   color='red')
fg.plot_joint(sns.regplot, scatter=False, ci=None, color='black')  # overlay regression line without confidence band
fg.fig.suptitle('Real interest rates and real FX forward carry (monthly averages)', y=1.02, fontsize=13)  # set grid title
fg.set_axis_labels('Real interest rates, % ar', 'Real forward carry, % ar', fontsize=11)  # set x/y axis labels

plt.show()

The kernel density estimator (`kind='kde'`) gives a very stylized visualization of the relations.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date').mean()['value'].reset_index()  # monthly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

sns.set_theme(style='dark')  #  choose appearance
fg = sns.jointplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw, kind='kde', 
                   color='gray', height=6)  # color and size parameters
fg.plot_joint(sns.regplot, scatter=False, ci=None, color='red')  # overlay regression line without confidence band
fg.fig.suptitle('Real interest rates and real FX forward carry (monthly averages)', y=1.02, fontsize=13)  # set grid title
fg.set_axis_labels('Real interest rates, % ar', 'Real forward carry, % ar', fontsize=11)  # set x/y axis labels

plt.show()

Information on categorical variables can be integrated through the `hue` argument.

Additional arguments can be passed to the central relational plot and the marginal distribution plots through the `joint_kws` and `marginal_kws` keyword dictionaries respectively.

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['RIR_NSA', 'FXCRR_NSA']  # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('M', on='real_date').mean()['value'].reset_index()  # monthly averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

dfw['Period'] = 'before 2010'  # create custom categorical variable
dfw.loc[dfw['real_date'].dt.year >= 2010, 'Period'] = 'from 2010'  # 2010 itself belongs to the later period

colors='Set1'  # choose color palette
sns.set_theme(style='dark')  #  choose appearance
fg = sns.jointplot(x=xcats_sel[0], y=xcats_sel[1], data=dfw,  # pass appropriate data
                   kind='scatter', palette=colors, height=6,  # parameters for appearance
                   hue='Period',  # classes of period category will be visualized by hue
                   joint_kws={'marker':'+'},  # keyword dictionary specific to relational plot
                   marginal_kws={'lw': 1})  # keyword dictionary specific to distribution plot
fg.fig.suptitle('Real interest rates and real FX forward carry (monthly averages)', y=1.02, fontsize=13)  # set grid title
fg.set_axis_labels('Real interest rates, % ar', 'Real forward carry, % ar', fontsize=11)  # set x/y axis labels

plt.show()
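
The custom categorical variable passed to `hue` is an ordinary string column. Below is a minimal sketch of one way to build it in a single step, assuming the intended split assigns the year 2010 itself to the later class. The frame `df_demo` and its dates are invented for illustration.

```python
import numpy as np
import pandas as pd

# three invented month-end dates around the cutoff year
df_demo = pd.DataFrame({'real_date': pd.to_datetime(['2009-12-31', '2010-01-29', '2011-06-30'])})

# label each row by comparing the calendar year with the cutoff
df_demo['Period'] = np.where(df_demo['real_date'].dt.year >= 2010, 'from 2010', 'before 2010')
print(df_demo)
```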

### Pairplots

The `sns.pairplot()` function manages the display of multiple joint distributions. For example, it can be applied to visualize the joint density of a category across pairs of countries. Specifically, the pairplot is a joint visualization grid of univariate distributions on the diagonal and bivariate distributions on the off-diagonals. <u>It collects a lot of information in one place and is therefore an instance of comprehensive exploratory data analysis</u>.
The `sns.pairplot()` output is a `PairGrid` instance, similar to a facet grid, rather than a single axes object.

Many arguments of `sns.pairplot` apply either to all diagonals or all off-diagonals:
* `kind` governs the type of off-diagonal relational plot to use. It must be one of `scatter`, `kde`, `hist`, or `reg`.
* `plot_kws` takes a dictionary of further arguments that apply to the chosen off-diagonal (main) plots.
* `diag_kind` governs the type of diagonal plot to use and is usually `hist` or `kde`.
* `diag_kws` takes a dictionary of arguments that apply to the chosen diagonal plots.

In [None]:
cids_sel = ['EUR', 'GBP', 'SEK', 'CHF']  # select cross sections
xcat_sel = 'RIR_NSA' # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid']).resample('M', on='real_date').mean()['value'].reset_index()  # monthly averages
dfw = dff.pivot(index='real_date', columns='cid', values='value').reset_index()  # pivot to wide dataframe
dfw = dfw[(dfw[cids_sel] != 0).any(axis=1)]  # drop rows in which all selected cross sections are zero

color = 'teal'  # choose color
sns.set_theme(style='darkgrid')  #  choose appearance
fg=sns.pairplot(data=dfw, vars=cids_sel, 
                height=2, aspect=1.2,  # height and aspect ratio of each facet in the plot
                corner=True,  # removes redundant bivariate plots in symmetric matrix
                kind='scatter',  # choose type of bivariate plot
                plot_kws={'s':20, 'alpha':0.3, 'color':color},  # set parameters for off-diagonal plots
                diag_kind='hist',  # choose type of univariate distribution plot
                diag_kws={'bins':20, 'color':color})  # set parameters for diagonal plots
fg.fig.suptitle('Distributions of real interest rates in Europe (monthly averages)', y=1.02, fontsize=14)  # set grid title

plt.show()

In order to apply the `sns.pairplot()` function to cross sections, one has to pivot the selected dataframe with cross sections ('cid') as the basis for new columns. In order to apply the `sns.pairplot()` function to categories, one simply needs to pivot the selected dataframe with categories ('xcat') as the basis for new columns.
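
The two pivots can be compared on a tiny long-format frame. The sketch below reuses the notebook's column names (`cid`, `xcat`, `real_date`, `value`), but the values and the names `df_long`, `wide_by_cid` and `wide_by_xcat` are invented for illustration.

```python
import pandas as pd

# Tiny long-format frame: 2 cross sections x 2 categories x 2 dates
df_long = pd.DataFrame({
    'cid': ['AUD', 'AUD', 'CAD', 'CAD'] * 2,
    'xcat': ['RIR_NSA'] * 4 + ['FXCRR_NSA'] * 4,
    'real_date': pd.to_datetime(['2020-01-31', '2020-02-29'] * 4),
    'value': [1.0, 1.1, 0.5, 0.6, 2.0, 2.1, 1.5, 1.6],
})

# One column per cross section: filter to a single category, then pivot on 'cid'
one_cat = df_long[df_long['xcat'] == 'RIR_NSA']
wide_by_cid = one_cat.pivot(index='real_date', columns='cid', values='value').reset_index()

# One column per category: keep 'cid' in the index and pivot on 'xcat'
wide_by_xcat = df_long.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()

print(wide_by_cid)
print(wide_by_xcat)
```

In the first frame each row is a date and `vars` would be the list of cross sections. In the second each row is a cross section and date pair and `vars` would be the list of categories.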

In [None]:
cids_sel = ['AUD', 'CAD', 'CHF', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK']  # select cross sections
xcats_sel = ['FXXR_NSA', 'RIR_NSA', 'FXCRR_NSA', 'IP_SA_P6M6ML6AR'] # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'].isin(xcats_sel)  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid', 'xcat']).resample('A', on='real_date').mean()['value'].reset_index()  # annual averages
dfw = dff.pivot(index=['cid', 'real_date'], columns='xcat', values='value').reset_index()  # pivot to wide dataframe

color = 'gray'  # choose color
sns.set_theme(style='whitegrid')  #  choose appearance
fg=sns.pairplot(data=dfw, vars=xcats_sel, 
                height=2, aspect=1.2,  # height and aspect ratio of each facet in the plot
                corner=True,  # removes redundant bivariate plots in symmetric matrix
                plot_kws={'color': color, 'bins':15},  # set parameters for off-diagonal plots
                kind='hist',  # choose type of bivariate plot
                diag_kind='kde',  # choose type of univariate distribution plot
                diag_kws={'color':color})  # set parameters for diagonal plots
fg.fig.suptitle('Individual and pairwise distribution of FX-related indicators (annual)', 
                y=1.02, fontsize=14)  # set grid title
plt.show()

Adding even more information, the pairplot can show distributions and relations for separate values of a categorical variable, using the `hue` argument and a related `palette` choice.

In [None]:
cids_sel = ['AUD', 'CAD', 'MXN', 'ZAR', 'CHF']  # select cross sections
xcat_sel = 'FXXR_NSA' # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = dfx.groupby(['cid']).resample('M', on='real_date').sum()['value'].reset_index()  # monthly sums
dfw = dff.pivot(index='real_date', columns='cid', values='value').reset_index()  # pivot to wide dataframe

dfw['Period'] = 'before 2010'  # create custom categorical variable
dfw.loc[dfw['real_date'].dt.year >= 2010, 'Period'] = 'from 2010'  # 2010 itself belongs to the later period

colors = 'hls'  # choose palette
sns.set_theme(style='whitegrid')  #  choose appearance
fg=sns.pairplot(data=dfw, vars=cids_sel, palette=colors, hue='Period',  #  apply classification variable to hue
                height=2, aspect=1,  # height and aspect ratio of each facet in the plot
                corner=True,  # removes redundant bivariate plots in symmetric matrix
                kind='reg',  # choose type of bivariate plot
                plot_kws={'ci': None, 'scatter_kws':{'s': 20, 'alpha': 0.5}},  # set parameters for off-diagonal plots
                diag_kind='hist',  # choose type of univariate distribution plot
                diag_kws={'bins':20})  # set parameters for diagonal plots
fg.fig.suptitle('Relations and distributions of monthly FX returns', fontsize=14)  # set grid title

plt.show()

## Color maps

### Heatmaps

Heatmaps visualize tabular data by mapping numeric values to colors. They are managed by the `sns.heatmap` function. This is a particularly powerful method for condensing a lot of information into a single visualization.

In [None]:
cids_sel = ['AUD', 'BRL', 'COP', 'CLP', 'HUF', 'MXN', 'PLN', 'TRY', 'ZAR', 'INR', 'MYR', 'PHP']  # select cross sections
xcat_sel = 'FXXR_NSA' # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2000-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3].copy()  # filter out relevant data frame (copy avoids writing to a slice of dfd)
dfx['year'] = dfx['real_date'].dt.year  # add year category to frame
dfw = dfx.groupby(['cid', 'year'])['value'].sum().reset_index().pivot(index='year', columns='cid', values='value')  # annual sums
dfh = dfw.T  # transpose to appropriate format for heatmap function

colors = 'vlag_r' # choose appropriate diverging color palette
fg, ax = plt.subplots(figsize=(18, 8))  # prepare axis and grid
ax = sns.heatmap(dfh, cmap=colors, center=0,  # requires diverging color palette with white zero
                 square=True,  # perfect squares
                 annot=True, fmt='.1f', annot_kws={'fontsize':11},  # format annotation numbers inside color boxes
                 linewidth=1)  # set width of lines between color boxes

plt.title('Annual FX forward returns in EM: A 20-year history', fontsize=16, y=1.05)  # set heatmap title
plt.xlabel('')  # control x-axis label
plt.ylabel('')  # control y-axis label
plt.yticks(rotation=0)  # set direction of y-axis marks
plt.show()

### Clustermaps

The clustermap supports heatmap visualization of matrices with additional hierarchical clustering information.

The clustering lines are called **dendrograms** and display the statistical similarity of columns and rows. Similarity here is the inverse of multi-dimensional distance. The default is Euclidean or spatial distance. The dendrogram is created based on hierarchical agglomerative clustering, i.e. sequential clustering of the nearest points in multi-dimensional space. Note that the `sns.clustermap` function returns a `ClusterGrid` object.
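
The clustering step behind the dendrogram can be sketched directly with SciPy. The example assumes Euclidean distance with average linkage, which is believed to match the `sns.clustermap` defaults (`metric='euclidean'`, `method='average'`). The return figures and the labels `A` to `D` are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Invented annual returns (rows = cross sections, columns = years)
labels = ['A', 'B', 'C', 'D']
returns = np.array([
    [10.0, -5.0, 8.0],   # A
    [11.0, -4.0, 8.0],   # B: close to A
    [-3.0, 12.0, -6.0],  # C
    [-2.0, 13.0, -7.0],  # D: close to C
])

# Agglomerative clustering: repeatedly merge the two nearest clusters
Z = linkage(returns, method='average', metric='euclidean')

# Each row of Z records one merge: the two cluster indices, their distance and the new cluster size
print(Z)

# Leaf order of the dendrogram, computed without drawing it
leaf_order = dendrogram(Z, no_plot=True, labels=labels)['ivl']
print(leaf_order)
```

Rows that are close in Euclidean terms are merged first and therefore end up next to each other in the leaf order, which is what determines the row and column ordering of a clustered heatmap.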

In [None]:
cids_sel = ['EUR', 'USD', 'GBP', 'CHF', 'JPY', 'SEK', 'CAD',  'ZAR', 'INR', 'MYR']  # select cross sections
xcat_sel = 'EQXR_NSA' # select categories
filt1 = dfd['cid'].isin(cids_sel)  # filter for cross sections
filt2 = dfd['xcat'] == xcat_sel  #  filter for category
filt3 = dfd['real_date'] >= pd.to_datetime('2005-01-01')  # filter for start date
dfx = dfd[filt1 & filt2 & filt3].copy()  # filter out relevant data frame (copy avoids writing to a slice of dfd)
dfx['year'] = dfx['real_date'].dt.year  # add year category to frame
dfw = dfx.groupby(['cid', 'year'])['value'].sum().reset_index().pivot(index='year', columns='cid', values='value')  # annual sums
dfh = dfw.dropna().T  # transpose to appropriate format for heatmap function

colors = 'vlag_r' # choose appropriate diverging color palette
fg = sns.clustermap(dfh, cmap=colors, center=0,  # requires diverging color palette with white zero
                    figsize=(12, 7),  # set appropriate size
                    annot=True, fmt='.1f', annot_kws={'fontsize':11},  # format annotation numbers inside color boxes
                    linewidth=1)  # set width of lines between color boxes

fg.fig.suptitle('Similarities of countries and trading years, based on equity returns', y=1.02)  # setting title
fg.ax_heatmap.set_xlabel('')  # special way of controlling x-axis label
fg.ax_heatmap.set_ylabel('')  # special way of controlling y-axis label 
plt.show()