In [1]:
from IPython.core.display import HTML
HTML("<style>div.text_cell_render{font-family:'Times New Roman';font-size:1.4em;line-height:1.8em;}</style>")

## Isotonic Regression 
The problem of isotonic regression on an ordered set is as follows. Given real numbers $\{ y_1, y_2, \dots, y_n \}$, find $\{ m_1, m_2, \dots, m_n \}$ that minimize $\sum_{i=1}^n(y_i-m_i)^2$ subject to the restriction $m_1 \le m_2 \le \dots \le m_n$. A unique solution to this problem exists and can be obtained with the "pool adjacent violators" algorithm. The basic idea, as described by Friedman and Tibshirani [1], is the following.

Imagine a scatterplot of $y_i$ vs $i$. Starting with $y_1$, we move to the right and stop at the first place where $y_i > y_{i+1}$. Since $y_{i+1}$ violates the monotone assumption, we pool $y_i$ and $y_{i+1}$, replacing them both by their average $y_i^* = y_{i+1}^* = (y_i + y_{i+1})/2$. We then move to the left to make sure that $y_{i-1} \le y_i^*$; if not, we pool $y_{i-1}$ with $y_i^*$ and $y_{i+1}^*$, replacing all three with their average. We continue to the left until the monotone requirement is satisfied, then proceed again to the right. This process of pooling the first "violator" and back-averaging is continued until we reach the right-hand edge. The solution $m_i$ at each $i$ is then given by the last average assigned to the point at $i$.
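The pooling and back-averaging steps can be traced on a small example. What follows is a minimal pure-Python sketch, not the implementation from [1]; the function name `pava` and the sample values are chosen here purely for illustration. It keeps a stack of pooled blocks and merges the newest block into its left neighbour whenever their means are out of order, which is the back-averaging step.

```python
def pava(y):
    """Non-decreasing least-squares fit by pooling adjacent violators.

    Illustrative sketch: `pava` is a hypothetical helper name, not a
    function from the references or from any library.
    """
    blocks = []  # each block is [sum of its values, number of points]
    for value in y:
        blocks.append([float(value), 1])
        # Back-average: while the previous block's mean exceeds the
        # newest block's mean, pool the two blocks into one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand every block back to one fitted value per original point.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


example_fit = pava([1, 3, 2, 4, 3, 5])
print(example_fit)  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]

# Here the violation at the right end forces pooling back to the first point:
# all three values are replaced by their common mean 10/3.
back_averaged = pava([4, 5, 1])
print(back_averaged)
```

In the first call the pairs (3, 2) and (4, 3) are each pooled once and no back-averaging is needed. In the second call the pooled pair (5, 1) has mean 3, which is below the preceding 4, so the pooling continues to the left.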

It is not obvious that the pool adjacent violators algorithm solves the isotonic regression problem; a proof appears in Barlow et al. [2] (p. 12).
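The averaging step on its own is easier to justify than the full optimality result. If the points $i, \dots, j$ are forced to share a single value $c$, the least-squares choice of $c$ is their mean:

$$
\frac{d}{dc}\sum_{k=i}^{j}(y_k - c)^2 = -2\sum_{k=i}^{j}(y_k - c) = 0
\quad\Longrightarrow\quad
c = \frac{1}{j-i+1}\sum_{k=i}^{j} y_k .
$$

As a small illustration (the numbers are chosen here, not taken from the references), for $y = (1, 3, 2)$ the last two points violate the ordering and are pooled to $2.5$, giving $m = (1, 2.5, 2.5)$ with $\sum_{i=1}^{3}(y_i - m_i)^2 = 0 + 0.25 + 0.25 = 0.5$.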

## Lake Mendota Data
This example is given by Michael Newton. Consider historical data on the number of days each winter that Lake Mendota is frozen, taken from http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html. Evidently the data show a trend towards fewer frozen days.

In [1]:
import numpy as np
import pandas as pd

mendota = pd.read_csv("Mendota.csv")
mendota["CLOSED"] = pd.to_datetime(mendota["CLOSED"])
mendota["OPENED"] = pd.to_datetime(mendota["OPENED"])
mendota.head()

Unnamed: 0,WINTER,CLOSED,OPENED,DAYS
0,1855,1855-12-18 00:00:00,1856-04-14 00:00:00,118
1,1856,1856-12-06 00:00:00,1857-05-06 00:00:00,151
2,1857,1857-11-25 00:00:00,1858-03-26 00:00:00,121
3,1858,1858-12-08 00:00:00,1859-03-14 00:00:00,96
4,1859,1859-12-07 00:00:00,1860-03-26 00:00:00,110


In [2]:
import plotly
import plotly.graph_objs as go
plotly.offline.init_notebook_mode(connected=True)

frozen_days = go.Scatter(x=mendota['CLOSED'], y=mendota['DAYS'], name = 'Days',)

layout = go.Layout(title='Number of Days that Lake Mendota is Frozen', 
                   yaxis=dict(title='Days'), xaxis=dict(title='Year'))

fig = go.Figure(data=[frozen_days], layout=layout)
plotly.offline.iplot(fig)

In [3]:
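def isotonic(y):
    """
    Pool Adjacent Violators Algorithm (PAVA) for isotonic regression.
    Adapted from https://gist.github.com/fabianp/3081831

    y is a 1-D sequence of observations. Returns the non-decreasing
    least-squares fit, found by repeatedly pooling adjacent violators.
    """
    # Work in floats: with integer input the pooled means would be truncated
    y = np.asarray(y, dtype=float)
    assert y.ndim == 1
    v = y.copy()
    # lvlsets[i] = (start, end) of the pooled block that currently contains i
    lvlsets = np.transpose(np.tile(np.arange(len(y)), (2,1)))
    while True:
        deriv = np.diff(v)
        if np.all(deriv >= 0):
            break
        # first position where the current fit decreases
        viol = np.where(deriv < 0)[0][0]
        start = lvlsets[viol, 0]
        end = lvlsets[viol+1, 1]

        # pool the two neighbouring blocks and replace them by their mean
        val = np.mean(v[start:end+1])
        v[start:end+1] = val
        lvlsets[start:end+1, 0] = start
        lvlsets[start:end+1, 1] = end
    return v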

We now sample from the bootstrap distribution of the isotonic regression. That is, we treat the residuals from the fitted model as a random sample from an error distribution $F$, and build bootstrap samples $\{x ,Y^*\}$ by adding a bootstrap sample of residuals to the isotonic fit from the original data (the time points $x$ are left fixed). We then refit an isotonic regression to $\{x ,Y^*\}$ and repeat this $B$ times. The pointwise 95% bootstrap confidence band, formed by the 2.5th and 97.5th percentiles of the refitted curves at each time point, is plotted below.
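The resampling scheme described above can be sketched on synthetic data without any dependencies. The block below is an illustration under stated assumptions, not the notebook's analysis: the helper names `pava`, `percentile` and `bootstrap_band`, the synthetic series, the seeds, and the choice of 200 replicates are all made up for the example, and it fits a non-decreasing trend rather than the decreasing one in the lake data.

```python
import random


def pava(y):
    """Non-decreasing least-squares fit (hypothetical helper, stack-based PAVA)."""
    blocks = []  # each block is [sum of its values, number of points]
    for value in y:
        blocks.append([float(value), 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def percentile(sorted_vals, q):
    """Percentile (q in [0, 100]) of a sorted list, with linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_band(y, n_boot=200, level=95.0, seed=0):
    """Residual bootstrap: resample residuals, add to the fit, refit, repeat."""
    rng = random.Random(seed)
    fit = pava(y)
    residuals = [yi - mi for yi, mi in zip(y, fit)]
    curves = []
    for _ in range(n_boot):
        # The x positions stay fixed; only the residuals are resampled.
        y_star = [mi + rng.choice(residuals) for mi in fit]
        curves.append(pava(y_star))
    tail = (100.0 - level) / 2.0
    low, high = [], []
    for i in range(len(y)):
        # Pointwise percentiles across all refitted curves at position i.
        column = sorted(curve[i] for curve in curves)
        low.append(percentile(column, tail))
        high.append(percentile(column, 100.0 - tail))
    return fit, low, high


# Synthetic increasing trend plus Gaussian noise (assumed data, for illustration).
data_rng = random.Random(1)
y_demo = [0.5 * i + data_rng.gauss(0, 2) for i in range(30)]
fit_demo, low_demo, high_demo = bootstrap_band(y_demo)
```

Because every refitted curve is non-decreasing, the pointwise percentile curves are non-decreasing as well, so the band keeps the shape constraint of the fit.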

In [4]:
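# Fit a non-increasing trend: reverse the series, fit a non-decreasing
# isotonic regression, then reverse the fit back to chronological order
days = np.asarray(mendota["DAYS"], dtype=float)[::-1]
isot = isotonic(days)[::-1]
# indices where the fitted step function drops to a new level
stairs = np.where(np.diff(isot) < 0)[0]

# residual bootstrap (arrays in this block are in reversed time order)
residuals = days - isot[::-1]
B = 1000
fitted_star = np.empty((B, len(residuals)))
for i in range(B):
    residual_star = np.random.choice(residuals, size=len(residuals), replace=True)
    y_star = isot[::-1] + residual_star
    fitted_star[i, :] = isotonic(y_star)
# pointwise 2.5th and 97.5th percentiles across the B refitted curves
fitted_low = np.percentile(fitted_star, 2.5, axis=0)
fitted_high = np.percentile(fitted_star, 97.5, axis=0)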

In [11]:
iso_trace = go.Scatter(x=mendota['CLOSED'], y=isot, name="Isotonic Regression")
iso_stair = go.Scatter(x=mendota['CLOSED'].loc[stairs], 
                       y=isot[stairs], 
                       mode='markers', 
                       marker = dict(size = 8, color="Red", symbol=204), 
                       showlegend=False, hoverinfo="none")
trace_low = go.Scatter(x=mendota['CLOSED'], y=fitted_low[::-1], showlegend=False, hoverinfo="none", 
                       line=dict(dash="dash", color="grey"))
trace_high = go.Scatter(x=mendota['CLOSED'], y=fitted_high[::-1], showlegend=False, hoverinfo="none", 
                        line=dict(dash="dash", color="grey"))

data = [frozen_days, iso_trace, iso_stair, trace_low, trace_high]
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)

References:    
[1] Friedman, Jerome, and Robert Tibshirani. "The monotone smoothing of scatterplots." Technometrics 26.3 (1984): 243-250.    
[2] Barlow, Richard E., et al. Statistical inference under order restrictions: The theory and application of isotonic regression. New York: Wiley, 1972.