### Platt-Burges

This script solves the Platt-Burges model via regularized least squares (RLS) with an L2 norm penalty, i.e. ridge regression. The default solver in our repository, however, is the Expectation Maximization (EM) algorithm, and EM certainly has its merits:

1. A more detailed derivation of EM is available in Stanford CS229 Autumn 2016 Problem Set 4 Problem 2.
2. EM yields the standard deviation of both intrinsic value and bias level, whereas RLS only yields their means.
3. In RLS, the L2 penalty is applied to the bias level, and its coefficient $\lambda$ is more arbitrary than the tolerance level of EM. Since we are working with a complete matrix here, we cannot use cross validation to find the optimal $\lambda$.

Despite all the drawbacks listed above, why use RLS? It is fast and straightforward. Assume $P$ papers are submitted to a conference and each of the $R$ reviewers on the committee scores every paper, so each paper receives $R$ different scores. The score of a paper given by a reviewer, denoted as $x$, can be decomposed into the sum of three components: the underlying intrinsic value $y$, the reviewer bias $z$ and some random disturbance $\epsilon$. $y$ and $z$ follow independent Gaussian distributions, and $x$ is Gaussian conditional on $y$ and $z$.

$$ y^{(pr)} \sim \mathcal{N} (\mu_p,\sigma_p^2)$$

$$ z^{(pr)} \sim \mathcal{N} (\nu_r,\tau_r^2)$$

$$ x^{(pr)}|y^{(pr)},z^{(pr)} \sim \mathcal{N} (y^{(pr)}+z^{(pr)},\sigma^2)$$
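
To make the generative model concrete, here is a minimal simulation sketch. The function name `simulate_scores`, its parameter names and the example values are illustrative assumptions and not part of the original notebook. Scores are stored with one row per reviewer and one column per paper, the same layout the functions in this notebook expect.

```python
import numpy as np

def simulate_scores(mu, nu, sigma_p, tau_r, sigma, seed=0):
    """Draw one R x P score matrix from the three Gaussians above.

    mu, sigma_p: per-paper mean and sd of the intrinsic value (length P or scalar sd)
    nu, tau_r:   per-reviewer mean and sd of the bias (length R or scalar sd)
    sigma:       sd of the disturbance added to every score
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    R, P = len(nu), len(mu)
    sigma_p = np.broadcast_to(np.asarray(sigma_p, dtype=float), (P,))
    tau_r = np.broadcast_to(np.asarray(tau_r, dtype=float), (R,))

    # y^(pr) ~ N(mu_p, sigma_p^2): paper parameters vary along columns
    y = rng.normal(loc=mu[None, :], scale=sigma_p[None, :], size=(R, P))
    # z^(pr) ~ N(nu_r, tau_r^2): reviewer parameters vary along rows
    z = rng.normal(loc=nu[:, None], scale=tau_r[:, None], size=(R, P))
    # x^(pr) | y, z ~ N(y + z, sigma^2)
    return rng.normal(loc=y + z, scale=sigma)

# made-up example: 2 reviewers scoring 3 papers
scores = simulate_scores(mu=[5.0, 6.0, 7.0], nu=[-0.5, 0.5],
                         sigma_p=0.3, tau_r=0.2, sigma=0.1)
print(scores.shape)  # (2, 3)
```

Setting all three standard deviations to zero returns exactly $\mu_p+\nu_r$ in every cell, which is a quick way to check the row and column orientation.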

RLS solves the Platt-Burges model by minimizing the loss function $\mathcal{L}$, in which only the reviewer biases $\nu_r$ are penalized.

$$ \mathcal{L}=\frac {1}{2} \sum_{p=1}^P\sum_{r=1}^R (x^{(pr)}-\mu_p-\nu_r)^2+\frac {1}{2} \sum_{r=1}^R \lambda \nu_r^2$$
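
Because $\mathcal{L}$ is quadratic, its minimizer can be worked out by hand when every reviewer scores every paper, which gives a useful sanity check on the numerical solver. The following is a derivation sketch under that complete-matrix assumption and $\lambda>0$. The shorthand $\bar{x}_p=\frac{1}{R}\sum_{r} x^{(pr)}$, $\bar{x}_r=\frac{1}{P}\sum_{p} x^{(pr)}$, $\bar{x}=\frac{1}{PR}\sum_{p}\sum_{r} x^{(pr)}$ and $\bar{\nu}=\frac{1}{R}\sum_{r}\nu_r$ is introduced here for convenience and is not notation from the references. Setting the partial derivatives to zero gives

$$ \frac{\partial \mathcal{L}}{\partial \mu_p}=-\sum_{r=1}^R (x^{(pr)}-\mu_p-\nu_r)=0 \quad\Rightarrow\quad \mu_p=\bar{x}_p-\bar{\nu}$$

$$ \frac{\partial \mathcal{L}}{\partial \nu_r}=-\sum_{p=1}^P (x^{(pr)}-\mu_p-\nu_r)+\lambda\nu_r=0 \quad\Rightarrow\quad (P+\lambda)\nu_r=\sum_{p=1}^P (x^{(pr)}-\mu_p)$$

Summing the second condition over $r$ and substituting the first one leaves $\lambda R \bar{\nu}=0$, so $\bar{\nu}=0$ and

$$ \mu_p=\bar{x}_p, \qquad \nu_r=\frac{P}{P+\lambda}(\bar{x}_r-\bar{x})$$

In words, the intrinsic value of a paper is its mean score, and the bias of a reviewer is that reviewer's deviation from the grand mean, shrunk by the factor $P/(P+\lambda)$. The estimated biases therefore sum to zero, which appears consistent with the RLS bias levels printed in this notebook.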

For the EM algorithm, please see the notebook below.

https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/platt%20burges.ipynb

Reference to Hong Ge's paper

http://mlg.eng.cam.ac.uk/hong/unpublished/nips-review-model.pdf

Neil Lawrence's personal blog

https://inverseprobability.com/2014/08/02/reviewer-calibration-for-nips

Neil Lawrence's jupyter notebook

https://github.com/lawrennd/conference

Others' jupyter notebook

https://github.com/leonidk/reviewers

In [1]:
import matplotlib.pyplot as plt
import os
os.chdir('K:/ecole/github/televerser/wisdom of crowds')
import numpy as np
import pandas as pd
import scipy.optimize

In [2]:
#raise FloatingPointError on division by zero, e.g. when zero is encountered in logarithm
np.seterr(divide='raise')

{'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}

### Functions

In [3]:
#compute rls loss function
def loss_function(x0,data,lambda_):
    
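    #unpack: data has one row per reviewer and one column per paper,
    #so the first data.shape[1] entries of x0 are the intrinsic values
    #and the remaining data.shape[0] entries are the bias levels
    intrinsic_value=x0[:data.shape[1]]
    bias_level=x0[data.shape[1]:]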

    #convert intrinsic value and bias lvl into matrix
    miu_p=np.repeat(np.array(intrinsic_value).reshape(1,-1),data.shape[0],axis=0)
    nu_r=np.repeat(np.array(bias_level).reshape(-1,1),data.shape[1],axis=1)

    #compute loss function
    rls_loss=np.square(
        data-miu_p-nu_r).sum()/2+lambda_*np.square(bias_level).sum()/2
    
    return rls_loss
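
`scipy.optimize.minimize` approximates the gradient by finite differences unless a `jac` argument is supplied. The sketch below is an illustration under stated assumptions, not part of the original notebook: `loss_and_grad` is a hypothetical helper that re-implements the same loss with broadcasting and also returns its analytic gradient, and the small score matrix `demo` is made up.

```python
import numpy as np
import scipy.optimize

def loss_and_grad(x0, data, lambda_):
    """Return the RLS loss and its gradient with respect to x0.

    Hypothetical helper: x0 stacks the P intrinsic values followed by
    the R bias levels, the same packing as loss_function.
    """
    data = np.asarray(data, dtype=float)   # also accepts np.matrix
    x0 = np.asarray(x0, dtype=float)
    n_papers = data.shape[1]
    mu, nu = x0[:n_papers], x0[n_papers:]

    # residual[r, p] = x^(pr) - mu_p - nu_r via broadcasting
    residual = data - mu[None, :] - nu[:, None]
    loss = 0.5 * np.square(residual).sum() + 0.5 * lambda_ * np.square(nu).sum()

    grad_mu = -residual.sum(axis=0)                  # dL/dmu_p sums over reviewers
    grad_nu = -residual.sum(axis=1) + lambda_ * nu   # dL/dnu_r sums over papers, plus penalty
    return loss, np.concatenate([grad_mu, grad_nu])

# made-up scores: 3 reviewers (rows) x 4 papers (columns)
demo = np.array([[5.0, 6.0, 7.0, 4.0],
                 [5.5, 6.5, 7.5, 4.5],
                 [4.0, 5.0, 6.5, 3.5]])
start = np.zeros(demo.shape[1] + demo.shape[0])

# jac=True tells minimize that the objective returns (loss, gradient)
result = scipy.optimize.minimize(loss_and_grad, start, args=(demo, 0.5), jac=True)
print(result.x[demo.shape[1]:])   # estimated bias levels
```

An exact gradient mainly saves function evaluations; for a matrix as small as the ones used here the difference is unlikely to matter much.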

In [4]:
#using rls to solve platt burges
def regularized_least_square(X,lambda_=0.5,**kwargs):

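    #pack: X is expected to be np.matrix, so mean().ravel().tolist()
    #returns a nested list whose first element is the flat list of means
    miu_init=X.mean(axis=0).ravel().tolist()[0]
    nu_init=X.mean(axis=1).ravel().tolist()[0]
    x0=miu_init+nu_init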

    #rls
    result=scipy.optimize.minimize(loss_function,x0,
                                   args=(X,lambda_),
                                   **kwargs
                                  )

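    if result['success']:

        #unpack
        intrinsic_value=result['x'][:X.shape[1]]
        bias_level=result['x'][X.shape[1]:]

        return intrinsic_value,bias_level

    #fail loudly instead of silently returning None
    raise RuntimeError('RLS did not converge: '+str(result['message']))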
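
For a complete score matrix and $\lambda>0$, the first-order conditions of the quadratic loss can be solved directly: the intrinsic value of each paper is its mean score, and the bias of each reviewer is that reviewer's mean score minus the grand mean, shrunk by $P/(P+\lambda)$. The sketch below uses this as a sanity check on the numerical optimizer. It is not part of the original notebook; `closed_form_rls` and `compact_loss` are hypothetical names and the score matrix is made up.

```python
import numpy as np
import scipy.optimize

def closed_form_rls(X, lambda_=0.5):
    """Closed-form minimizer of the RLS loss for a complete R x P matrix."""
    X = np.asarray(X, dtype=float)
    n_papers = X.shape[1]
    intrinsic_value = X.mean(axis=0)                     # column means
    shrink = n_papers / (n_papers + lambda_)
    bias_level = shrink * (X.mean(axis=1) - X.mean())    # shrunk row deviations
    return intrinsic_value, bias_level

def compact_loss(x0, X, lambda_):
    """Same loss as loss_function, written with broadcasting."""
    mu, nu = x0[:X.shape[1]], x0[X.shape[1]:]
    residual = X - mu[None, :] - nu[:, None]
    return 0.5 * np.square(residual).sum() + 0.5 * lambda_ * np.square(nu).sum()

# made-up scores: 3 reviewers (rows) x 4 papers (columns)
X = np.array([[5.0, 6.0, 7.0, 4.0],
              [5.5, 6.5, 7.5, 4.5],
              [4.0, 5.0, 6.5, 3.5]])

mu_cf, nu_cf = closed_form_rls(X, lambda_=0.5)
numeric = scipy.optimize.minimize(compact_loss, np.zeros(7), args=(X, 0.5))

# largest absolute gap between the numerical and closed-form solutions
max_gap = np.abs(numeric.x - np.concatenate([mu_cf, nu_cf])).max()
print(max_gap)
```

The gap should be on the order of the optimizer's tolerance. The closed form only holds when no score is missing; with an incomplete matrix the numerical route is still needed.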

### ETL

In [5]:
#read data
y0matrix2019=pd.read_csv('y0matrix2019.csv')

y1matrix2020=pd.read_csv('y1matrix2020.csv')

monthly=pd.read_csv('monthly.csv')

annual=pd.read_csv('annual.csv')

In [6]:
#set index
y0matrix2019.set_index('Source Name',inplace=True)

y1matrix2020.set_index('Source Name',inplace=True)

monthly.set_index('Date',inplace=True)
monthly.index=pd.to_datetime(monthly.index)
monthly.columns=y0matrix2019.columns

annual=annual.pivot(index='Date',
                    columns='Name',values='Value')
annual.index=pd.to_datetime(annual.index)
annual.columns=y0matrix2019.columns

In [7]:
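#normalize forecast into pct return relative to the 2019-08-31 level
#np.asmatrix replaces np.mat, which was removed in NumPy 2.0
y0_mat_nor=np.asmatrix(
    np.divide(y0matrix2019,
              monthly['2019-08-31':'2019-08-31'])-1)
y1_mat_nor=np.asmatrix(
    np.divide(y1matrix2020,
              monthly['2019-08-31':'2019-08-31'])-1)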

### Run Model

In [8]:
#current year outlook
intrinsic_value,bias_level=regularized_least_square(y0_mat_nor,lambda_=0.5)

print(bias_level)

#comparison with result from em
print([0.02019064,0.02024495,0.00609068,-0.01649482,-0.00426001,
0.00847965,-0.00821187,0.01734934])

[ 0.01475545  0.01399294  0.00063756 -0.02127891 -0.00941928  0.0029321
 -0.01331331  0.01169339]
[0.02019064, 0.02024495, 0.00609068, -0.01649482, -0.00426001, 0.00847965, -0.00821187, 0.01734934]


In [9]:
#one year ahead outlook
intrinsic_value,bias_level=regularized_least_square(y1_mat_nor,lambda_=0.5)

print(bias_level)

#comparison with result from em
print([0.01352132,0.02981012,0.01582776,-0.00265233,-0.01151194,
-0.01075301,-0.01477036,0.12330401])

[-0.00299608  0.01043045 -0.00037993 -0.02092039 -0.02842805 -0.02943199
 -0.03099745  0.10272337]
[0.01352132, 0.02981012, 0.01582776, -0.00265233, -0.01151194, -0.01075301, -0.01477036, 0.12330401]
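
The two printouts can be compared more systematically. The sketch below copies the bias levels printed above for both horizons and computes the mean gap and the Pearson correlation between the EM and RLS estimates. The variable names are illustrative and not from the original notebook.

```python
import numpy as np

# bias levels copied from the printed outputs above
rls_current = np.array([0.01475545, 0.01399294, 0.00063756, -0.02127891,
                        -0.00941928, 0.0029321, -0.01331331, 0.01169339])
em_current = np.array([0.02019064, 0.02024495, 0.00609068, -0.01649482,
                       -0.00426001, 0.00847965, -0.00821187, 0.01734934])
rls_ahead = np.array([-0.00299608, 0.01043045, -0.00037993, -0.02092039,
                      -0.02842805, -0.02943199, -0.03099745, 0.10272337])
em_ahead = np.array([0.01352132, 0.02981012, 0.01582776, -0.00265233,
                     -0.01151194, -0.01075301, -0.01477036, 0.12330401])

for label, rls, em in [("current year", rls_current, em_current),
                       ("one year ahead", rls_ahead, em_ahead)]:
    gap = em - rls                       # level difference per source
    corr = np.corrcoef(rls, em)[0, 1]    # agreement in ranking and spread
    print(f"{label}: mean gap {gap.mean():.4f}, correlation {corr:.4f}")
```

A correlation close to one together with a roughly constant gap would suggest that the two solvers order the sources the same way and differ mainly by a level shift.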
