
# IMF_WEO_dash
Outlier interactive plot for IMF WEO hackathon

An example of using Plotly's Dash library to build a simple but, IMHO, very useful dashboard. Its purpose is to plot actual vs predicted values, and it can be used to assess any type of forecast (predictive) model. It is particularly useful for the IMF WEO (see below), where the model is a complex human-guided algorithm. Outliers are defined as points lying x sigma from a robust linear regression line. (Future work: clean up the code and annotations, replace the `*` imports with namespaced ones such as `np`, and refine the outlier specification.)

# Outlier (goodness of fit) dashboard

Displays the predicted vs actual index. Purpose: an interactive visualization that makes bad predictions easy to identify, so the user can use them to improve the model.

## Motivation 

IMF WEO Hackathon: generated for the International Monetary Fund's 2018 two-hour data visualization hackathon. The goal was to make an insightful product for the World Economic Outlook (WEO) report (more below). Notably, forecasts made by the IMF go through a complex process between country teams and aggregated adjustments. We focused on a tool that assesses the country-level and aggregate predictions. The product was designed by Michelle C Mandolia, Benjamin P Cohn, and Shashaank Vattikuti.

Our guideline:

- Choose one (or a few) useful metrics: something that is easy to visualize and gives insight.
- Use an open-source programmatic platform with large buy-in and support. We chose Plotly's Dash; there is no reason to limit clients with unnecessary license fees.

### Choropleth - the visualization we avoided

- No choropleths. We were set up for this, and pretty much all teams chose them. It is an obvious choice for data with geographic features. However, there was no strong reason to expect geographic continuity at this resolution (unlike a pandemic model, see our covid project).
- A different network structure may be useful, such as grouping countries by known shared economic ties (or by some cluster analysis). This was not clear at the outset.

### Why goodness-of-fit?

- An extremely simple but powerful tool: plot the actual vs predicted index.
- The visualization gives a quick view of how the model does overall and where (for which countries) it failed.
- It is agnostic to the model details and can be used for any kind of quantitative model.
- It is interactive: hover over (or lasso, click, etc.) single points or sets of points to get more information on them; adjust the data filters to see patterns (such as how prediction quality changes with the time between report and actual); and scan the data filters to check robustness (for example, if a country stays an outlier across time-outs, that may be more reason for further investigation).

Main features:

- Aggregate metrics: intercept and slope of the linear regression model.
- Outlier country estimates: fit a robust regression (which reduces outlier effects), then tag outliers based on a chi-squared analysis of each country's residual from the robust regression line.
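
The two features above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the app's code: it fits a Theil-Sen line as the median of pairwise slopes (the app itself uses scikit-learn's `TheilSenRegressor`), and it flags points whose residual exceeds `k` robust standard deviations estimated from the median absolute deviation, which stands in for the app's chi-squared test. The function names, the threshold `k`, and the toy numbers are all made up for the example.

```python
from itertools import combinations
from statistics import median


def theil_sen_fit(x, y):
    """Fit y = intercept + slope * x using the median of pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical x (undefined slope)
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def flag_outliers(x, y, k=3.0):
    """Return indices whose residual is more than k robust sigmas from the line."""
    intercept, slope = theil_sen_fit(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    sigma = 1.4826 * mad  # MAD-to-sigma factor for normally distributed errors
    if sigma == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - center) > k * sigma]


# Toy values (not WEO data): actual rate vs predicted rate, one bad prediction.
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.9, 12.0, 6.1]
outliers = flag_outliers(actual, predicted)
print(outliers)  # [4]
```

Because both the slope and the spread are medians, the single bad prediction barely moves the fitted line, which is what lets it stand out as an outlier.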

## About the World Economic Outlook (WEO)
### What is the International Monetary Fund?

(from Wikipedia: https://en.wikipedia.org/wiki/International_Monetary_Fund)

"Formed in 1944, started in 27 November 1945, at the Bretton Woods Conference primarily by the ideas of Harry Dexter White and John Maynard Keynes, it came into formal existence in 1945 with 29 member countries and the goal of reconstructing the international monetary system. It now plays a central role in the management of balance of payments difficulties and international financial crises. Countries contribute funds to a pool through a quota system from which countries experiencing balance of payments problems can borrow money. As of 2016, the fund had XDR 477 billion (about US$667 billion). ... consisting of 190 countries working to foster global monetary cooperation, secure financial stability, facilitate international trade, promote high employment and sustainable economic growth, and reduce poverty around the world while periodically depending on the World Bank for its resources."

### IMF-WEO forecasts (analyses of global economic developments during the near and medium term)

Data is sourced from the World Economic Outlook (WEO) database. (The data in ./data/ is from the hackathon and uses publicly available IMF data, dated to the 2018 competition year. Updated data needs to be reformatted for use, pending clarification from the IMF.)

This "is created during the biannual WEO (IMF) exercise, which begins in January and June of each year and results in the April and September/October WEO publication. A Survey by the IMF staff usually published twice a year. It presents IMF staff economists' analyses of global economic developments during the near and medium term."
(I forget whether near and medium term were defined. One objective of our tool was to make this adjustable in 6-month steps, i.e. 6, 12, 18, ... months out.)
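
As a rough sketch of the 6-month steps, the snippet below maps a horizon in months to the report that made the forecast. It assumes two reports a year (spring and fall), counts the fall report of the target year as 6 months out, and builds names like `fall2016` in the same semester-plus-year form the app code uses for its forecast columns. The function name `report_for` is hypothetical.

```python
def report_for(target_year, months_out):
    """Map a forecast horizon in months to the WEO report that made the forecast.

    Assumes the fall report of the target year is 6 months out, the spring
    report of the same year is 12 months out, and so on back in time.
    """
    if months_out <= 0 or months_out % 6 != 0:
        raise ValueError("months_out must be a positive multiple of 6")
    half_years = months_out // 6  # 1, 2, 3, ...
    season = "fall" if half_years % 2 == 1 else "spring"
    report_year = target_year - (half_years - 1) // 2
    return f"{season}{report_year}"


# Horizons of 6, 12, 18 and 24 months before the 2017 target year.
columns = [report_for(2017, m) for m in (6, 12, 18, 24)]
print(columns)  # ['fall2017', 'spring2017', 'fall2016', 'spring2016']
```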

"The IMF’s World Economic Outlook uses a “bottom-up” approach in producing its forecasts; that is, country teams within the IMF generate projections for individual countries. These are then aggregated, and through a series of iterations where the aggregates feed back into individual countries’ forecasts, forecasts converge to the projections reported in the WEO.

Because forecasts are made by the individual country teams, the methodology can vary from country to country and series to series depending on many factors."


# Compile Code

## Preliminary

In [1]:
import dash,time
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State

from jupyter_dash import JupyterDash

import pickle
from textwrap import wrap
import plotly.graph_objs as go
from numpy import *
from numpy.linalg import *
from sklearn.linear_model import TheilSenRegressor as TSR
from scipy.stats import *
import pandas as pd


## Load data

In [2]:
path = './data/'
df=pd.read_table(f'{path}WEOhack.tab', sep='\t', encoding="cp1252")
df = df.loc[~df.country.isin(['Advanced Economies','Emerging Market and Developing Economies','World'])]
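
For orientation, here is a minimal sketch of the table layout the app appears to expect, inferred from the columns the code reads (`country`, `ifscode`, `year`, `actual`, and one forecast column per report such as `fall2016`). The rows are placeholders, not WEO data, and the codes 1 to 3 are made up.

```python
import io
import pandas as pd

# Placeholder rows (not WEO data): one row per country and target year,
# one column per report vintage. An empty field stands for a missing forecast.
sample = (
    "country\tifscode\tyear\tactual\tspring2016\tfall2016\n"
    "Country A\t1\t2017\t2.0\t1.8\t1.9\n"
    "Country B\t2\t2017\t3.0\t\t3.4\n"
    "World\t3\t2017\t3.5\t3.4\t3.4\n"
)
df_sample = pd.read_csv(io.StringIO(sample), sep="\t")

# Drop aggregate rows so only individual countries remain.
aggregates = ["Advanced Economies", "Emerging Market and Developing Economies", "World"]
df_sample = df_sample.loc[~df_sample["country"].isin(aggregates)]

# Actual and forecast values for one country and target year.
row = df_sample.loc[(df_sample["ifscode"] == 1) & (df_sample["year"] == 2017)]
print(row[["actual", "fall2016"]].values[0])
```

Missing forecasts load as NaN, which is why the plotting code checks `isfinite` before using a point.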

## Main App

In [3]:
#make dash selection lists
#country list
#map IFS code to country name

rows = []
for cid in set(df['ifscode']): #iterate through country IMF ID
    rows.append({'label':df.loc[df['ifscode']==cid]['country'].values[0],'value':cid})
#DataFrame.append was removed in pandas 2.0, so build the frame in one step
df_country = pd.DataFrame(rows,columns=['label','value'])

df_country.sort_values('label',inplace=True) #use this for alphabetically sorted country list

c_dash=[]
for index, row in df_country.iterrows():
    c_dash.append({'label':row['label'],'value':row['value']})

#actual year list
yr_=arange(2007,2018).astype(str)
yr_dash=[]
for yr in yr_:
    yr_dash.append({'label':yr,'value':float(yr)})

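#report (vintage) in half-years before the target year
#value i     -> fall report of (target year - i), labelled i+0.5 years out
#value i+0.5 -> spring report of (target year - i), labelled i+1 years out
yrout_dash=[]
for i in range(6):
    yrout_dash.append({'label':(i+.5),'value':i})
    yrout_dash.append({'label':(i+1),'value':i+0.5})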

## use this if running outside Jupyter
# app = dash.Dash() 
## use this if running in Jupyter
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)

colors = {
    'background': '#91A3B0',
    'text': '#004953'
}

#define app layout
#populate children
app.layout = html.Div(style={'backgroundColor': colors['background']},children=[


#1. dcc = dash core components 

    dcc.Markdown('''
    # WEO model outlier analysis - GDP rate %
    ## To get started: select target (forecast) year and vintage (report) year
    ## Can edit country list by deselecting countries (can reselect to put them back)
    '''.replace('  ', ''), className='container',
    style={'maxWidth': '1200px'}),

#2. Error plot - plots actual (x) vs prediction from the selected vintage (y)
    html.Div([html.H1(children='Robust regression',style={'textAlign': 'center','color': colors['text'],'fontSize': 22}),
        dcc.Graph(id='errplot')],
        style={'width':'90%','marginLeft':'auto','marginRight':'auto','display':'block'} #Dash style keys are camelCase
        ),

    html.Div([html.H1(children='Target year:',style={'textAlign': 'left','color': colors['text'],'fontSize': 22}),
        dcc.Dropdown(
            id="yrslide",
            options=yr_dash,
            multi=False
            )],
            style={'width':'49%','display':'inline-block'}
        ),


    html.Div([html.H1(children='Vintage (years from target):',style={'textAlign': 'left','color': colors['text'],'fontSize': 22}),
        dcc.Dropdown(
            id="yearout",
            options=yrout_dash,
            multi=False
            )],
            style={'width':'49%','display':'inline-block'}
        ),

    html.Div([html.H1(children='Country search (some may not match plot criteria):',style={'textAlign': 'left','color': colors['text'],'fontSize': 22}),
        dcc.Dropdown(
            id="countryselect",
            options= c_dash,
            multi=True,
            value=df_country['value'].tolist() #all countries selected by default
            )],
            style={'width':'100%'}#,'display':'inline-block'}
        ),




])



def mapdec2semester(x):
    #map a vintage value to its report: whole numbers -> fall, halves -> spring
    sem_ = ['spring','fall']
    ind = int(x%1==0)
    return sem_[ind]

def getxy(yr_,yro_,c_):
    #collect actual (x) and predicted (y) rates plus hover labels (c2)
    x=[]
    y=[]
    c2=[]
    #dropdowns start empty, so the callback first fires with None
    if yr_ is not None and yro_ is not None and c_ is not None:
        yr = int(yr_)
        yro = yro_
        #vintage column name, e.g. 'fall2016': semester + report year
        yro2=mapdec2semester(yro)+str(yr-int(yro))
        if yro2 in list(df.columns):
            for cid in c_:
                rows=df.loc[(df['ifscode']==cid) & (df['year']==yr)]
                if len(rows)==0: #no record for this country and year
                    continue
                x2=rows['actual'].values[0]
                y2=rows[yro2].values[0]
                if isfinite(y2) and isfinite(x2):
                    x.append(x2)
                    y.append(y2)
                    c2.append(rows['country'].values[0]+'<br>Target year: '+str(yr)+'<br>Vintage: '+yro2)
    x = array(x)
    y = array(y)
    c2 = array(c2)
    ind = argsort(c2) #sort points alphabetically by label
    x = x[ind]
    y = y[ind]
    c2 = c2[ind]
    return x,y,c2
#
@app.callback(
Output(component_id='errplot', component_property='figure'),
[
Input('yrslide', 'value'),
Input('yearout','value'),
Input('countryselect','value')
]
)
def update_graph(yr_,yro_,c):
    x,y,c=getxy(yr_,yro_,c)
    x=reshape(x,[len(x),1])
    y=reshape(y,[len(y),1])
    if len(x)>0:
        xmin=min(x)
        xmax=max(x)
        model=TSR(random_state=42)
        model.fit(x,y[:,0])
        x2=reshape(linspace(xmin,xmax,100),[100,1])
        y2=model.predict(x2)
        dat=[]
        trace = go.Scatter(x=x2[:,0],y=x2[:,0],name='objective',line = dict(
            width = 2,
            dash='dot',
            color = 'gray'
        ))
        dat.append(trace)
        text_ = f'robust: y-intercept: {round(model.intercept_,2)}, slope: {round(model.coef_[0],2)}'
        trace = go.Scatter(x=x2[:,0],y=y2[:],name='robust regression',text=text_,hoverinfo='text',line = dict(
            width = 2,
            color = colors['text']
        ))
        dat.append(trace)
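        yhat=model.predict(x)
        err=(yhat-y[:,0])**2 #squared residual from the robust line; assume error with variance 1
        sig=(std(y)) #spread of the predicted rates
        n=len(c)
        #p-value of each scaled squared residual under chi-squared with n-2 degrees of freedom
        p_=1-chi2.cdf(n/((3*sig)**2)*err,n-2)
        #tag outliers at the 5% level, Bonferroni-corrected for the number of points
        indout=where(p_<5e-2/len(c))[0]
        indin=where(p_>=5e-2/len(c))[0]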
        for i in indin:
            trace = go.Scatter(x=x[i:i+1,0],y=y[i:i+1,0],name='',text=c[i],hoverinfo='text',marker=dict(color= colors['text'],symbol='square',size=8))
            dat.append(trace)
        for i in indout:
            trace = go.Scatter(x=x[i:i+1,0],y=y[i:i+1,0],name='',text=c[i],hoverinfo='text',marker=dict(color= 'red',symbol='square',size=8))
            dat.append(trace)
    else:
        trace=go.Scatter(x=[0,0],y=[0,0],name='',text='No data',marker=dict(color= colors['background'],symbol='square',size=24))
        dat=[trace]

    return{'data':
    dat,
    'layout': go.Layout(
        showlegend=False,
        title='',
        xaxis=dict(
            range=[-20,20],
            autorange=True,
            showgrid=True,
            zeroline=True,
            showline=False,
            showticklabels=True,
            #title.font is used in place of the deprecated titlefont attribute
            title=dict(
                text='actual rate',
                font=dict(
                    family='Old Standard TT, serif',
                    size=22,
                    color=colors['text']
                )
            ),
            tickfont=dict(
                family='Old Standard TT, serif',
                size=18,
                color='black'
            )
        ),
        yaxis=dict(
            range=[-20,20],
            autorange=True,
            showgrid=True,
            zeroline=True,
            showline=False,
            showticklabels=True,
            title=dict(
                text='predicted rate',
                font=dict(
                    family='Old Standard TT, serif',
                    size=22,
                    color=colors['text']
                )
            ),
            tickfont=dict(
                family='Old Standard TT, serif',
                size=18,
                color='black'
            )
        ),
        hovermode='closest'
    )
}




# Run app

In [4]:
app.run_server(mode='inline')