# Py-CIMDO-PoD

Code for estimating time-series of probabilities of distress (or default) using python [pandas](https://pandas.pydata.org/).    
The method is described in the Appendix I of [(Cortes, Lindner, Malik and Segoviano 2018)](https://www.imf.org/en/Publications/WP/Issues/2018/01/24/A-Comprehensive-Multi-Sector-Tool-for-Analysis-of-Systemic-Risk-and-Interconnectedness-SyRIN-45580)

### Code to display markdown files

This project sub-folder (`CIMDO/markdown`) has some common resources that can be automatically imported into notebooks.

In [1]:
from IPython.display import display, Markdown
def display_markdown(filename):
    f = open(filename, 'r')
    return(display(Markdown(f.read())))

## Overview

In [2]:
display_markdown('markdown/Overview.md')

The Consistent Information Multivariate Density Optimizing Methodology (CIMDO) (Segoviano 2006, Segoviano and Espinoza 2017) is a methodology to infer multivariate densities that describe the  a system of asset values, typically of financial institutions, and applies it to quantify systemic risk. The CIMDO density, is inferred from partial information but is consistent with the observed probabilities of distress of financial institutions. From the density various useful metrics of systemic risk can be calculated easily.   
In a more recent incarnation the CIMDO engine is used to build the Systemic Risk and Interconnectness (SyRIN) tool (Cortes, Lindner, Malik & Segoviano 2018), which measures systemic risk by looking at interconnectedness between institutions and sectors.

## Packages

In [4]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from scipy.stats import norm

## Estimate PoDs

### PoDs: Products and Investment Funds

### Generate random data

Some random returns data, for test purposes.

#### Generate `Series`

In [16]:
np.random.seed(seed=0)
s = pd.Series(np.random.randn(50))

#### Generate `DataFrame` time-series with dates

In [17]:
def createTimeSeries():
    date_today = datetime.now()
    dates = pd.date_range(date_today, date_today + timedelta(50), freq='D')
    np.random.seed(seed=0)
    data = np.random.randn(len(dates))
    df = pd.DataFrame({'Date': dates, 'Value': data})
    df = df.set_index('Date')
    return(df)

In [18]:
df=createTimeSeries()

### Analyze

Given a time-series (either as a `series` or `DataFrame`) of values (returns, not P&Ls) estimate the time series of probabilities of distress (PoDs).

#### Standardize using `pipe`

The `pipe` method to apply transformations to series elements.

In [12]:
standardize = lambda x: (x-x.mean()) / x.std()

In [None]:
# s.pipe(standardize).head(5)
# df.pipe(standardize).head(5)

### Using `rolling` window

The `rolling` method can be chained with `apply` to apply arbitrary aggregation functions to the window (sub-series) of values.

For `series`

In [19]:
# s.rolling(5).sum().head(10)
s.rolling(5).apply(sum).head(10)

0         NaN
1         NaN
2         NaN
3         NaN
4    7.251399
5    4.510069
6    5.060000
7    3.929905
8    1.585792
9    0.128833
dtype: float64

Or for `DataFrame`s

In [110]:
# df.rolling(5).std().rename(columns = {'Value':'Std'}).head(10)

#### Probability of distress by counting `PoD_count`

Simply counting the outcomes that are less than the threshold does not require any assumption of the distribution. However, for short horizons and small quantiles, the accuracy will be poor.

In [20]:
def PoD_count(s, distress_threshold):
    """Find probability of distress by counting outcomes below distress_threshold"""
    n = np.count_nonzero(s)
    m = np.count_nonzero(s < distress_threshold)
    return(m/n)

In [125]:
distress_threshold = -1.0
s.rolling(10).apply(lambda s: PoD_count(s,distress_threshold)).head(20)

0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
5     NaN
6     NaN
7     NaN
8     NaN
9     0.2
10    0.2
11    0.1
12    0.1
13    0.2
14    0.3
15    0.4
16    0.4
17    0.4
18    0.4
19    0.3
dtype: float64

#### Probability of distress by assuming normal distribution `PoD_norm`

Although this approach requires us to assume a Gaussian distribution, the sampling error will be reasonable, even for small quantiles & short horizons. This is the approach advocated by (Cortes et al 2018)

In [21]:
def PoD_norm(s, distress_threshold):
    """Find probability of distress - i.e. falling below distress_threshold - by assuming normal distribution"""
    loc = s.mean()
    scale = s.std()
    pod = norm.pdf(distress_threshold, loc, scale)
    #m = np.count_nonzero(s < distress_threshold)
    return(pod)

In [117]:
distress_threshold = -1.0
s.rolling(10).apply(lambda s:PoD_norm(s,distress_threshold)).head(20)

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7          NaN
8          NaN
9     0.227082
10    0.197554
11    0.181487
12    0.196752
13    0.221794
14    0.241253
15    0.245524
16    0.243667
17    0.311096
18    0.302463
19    0.301570
dtype: float64

### `PoD`

In [24]:
def PoD(s,alpha,horizon):
    """Find the probability of distress time-series for quantile alpha & rolling horizon"""
    distress_threshold = s.quantile(alpha)
    pod = s.rolling(10).apply(lambda s:PoD_norm(s,distress_threshold))
    return(pod)

In [29]:
# Works for series or DataFrame
PoD(df,0.05,20).head(15)

Unnamed: 0_level_0,Value
Date,Unnamed: 1_level_1
2018-08-23 20:04:31.912428,
2018-08-24 20:04:31.912428,
2018-08-25 20:04:31.912428,
2018-08-26 20:04:31.912428,
2018-08-27 20:04:31.912428,
2018-08-28 20:04:31.912428,
2018-08-29 20:04:31.912428,
2018-08-30 20:04:31.912428,
2018-08-31 20:04:31.912428,
2018-09-01 20:04:31.912428,0.019063


## CIMDO implementation

See [Py-CIMDO](https://notebooks.azure.com/ian-buckley/libraries/systemic-risk/html/CIMDO/Py-CIMDO.ipynb)

## Further reading

In [3]:
display_markdown('markdown/FurtherReading.md')

* Cortes, Fabio, Peter H. Lindert, Sheheryar Malik, and Miguel Segoviano Basurto. 2018. “A Comprehensive Multi-Sector Tool for Analysis of Systemic Risk and Interconnectedness (SyRIN).”    
* MathWorks. 2014. CIMDO Optimization in Matlab. https://www.mathworks.com/matlabcentral/answers/115886-optimization-problem-reducing-the-time-needed-to-solve.    
* Segoviano Basurto, Miguel A. 2006. “Portfolio Credit Risk and Macroeconomic Shocks; Applications to Stress Testing Under Data-Restricted Environments.” IMF Working Paper 06/283. International Monetary Fund. https://ideas.repec.org/p/imf/imfwpa/06-283.html.    
* Segoviano, Miguel. 2008. “The CIMDO Copula. Modeling of a Non-Parametric Copula.” International Monetary Fund, Forthcoming Working Paper.    
* Segoviano, Miguel, and Charles Goodhart. 2009. “Banking Stability Measures.” FMG Discussion Paper. Financial Markets Group. http://econpapers.repec.org/paper/fmgfmgdps/dp627.htm.     
* ———. 2016. “An Encompassing Framework to Estimate Systemic Risk Amplification Losses Based on Publicly Available Information.” https://www.imf.org/~/media/Files/News/Seminars/mcm-lse-segoviano_session4.ashx.    
* Segoviano, Miguel A., and Raphael A. Espinoza. 2011. “Financial Stability Measures.” http://www.lse.ac.uk/fmg/events/conferences/past-conferences/2011/systemicRisk_24-25Jan2011/MSegoviano_Presentation.pdf.