# Empirical IO PS 0
Maximilian Huber

This code is stored at: https://github.com/MaximilianJHuber/NYU/blob/master/EmpIO/PS0.ipynb

## Part 0: Logit Inclusive Value

### 1 
The derivative of $ -\log\Big(\sum_{i=1}^N \exp(x_i)\Big) $ is $\frac{\partial f}{\partial x_i} = -\frac{exp(x_i)}{\sum_{k=1}^N exp(x_k)}$, hence:
    $$\frac{\partial^2 f}{\partial x_i^2}=\frac{exp(x_i) \sum_{k\neq i}exp(x_k)}{\big(\sum_{k=1}^N exp(x_k)\big)^2} \quad \text{and} \quad \frac{\partial^2 f}{\partial x_j \partial x_i}=\frac{exp(x_i + x_j)}{\big(\sum_{k=1}^N exp(x_k)\big)^2} \forall j \neq i$$
I gather the partial derivatives into a matrix $H$ and factor the common terms, $h_i = \frac{exp(x_i)}{\sum_{j=1}^N exp(x_j)}$, then:
    $$H = h \, h' - \begin{bmatrix}
    h_1 & \cdots & 0\\ 
    \vdots & \ddots & \vdots\\ 
    0 & \cdots  & h_N
    \end{bmatrix}$$
   Let $y$ be any non-zero vector, then $y'Hy=y'hh'y - y' \begin{bmatrix}
    h_1 & \cdots & 0\\ 
    \vdots & \ddots & \vdots\\ 
    0 & \cdots  & h_N
    \end{bmatrix}$$ y=\big(y_1h_1+\cdots+y_Nh_N\big) - \big(y_1^2h_1+\cdots+y_N^2h_N\big)$
    
    
Interpreting $h_i$ as probability this is: $-\Big(\mathbb{E}\big[Y^2\big] - \mathbb{E}\big[Y\big]^2\Big)$, the negative of a positive-semidefinite variance matrix is negative-definite. Hence, the function itself is concave.

### 2

In [1]:
using Plots; pyplot();

In [2]:
function IV(x)
    - log(sum(exp.(x)))
end

IV (generic function with 1 method)

In [3]:
[IV(x) for x in linspace(690, 720, 10)]

10-element Array{Float64,1}:
 -690.0  
 -693.333
 -696.667
 -700.0  
 -703.333
 -706.667
 -Inf    
 -Inf    
 -Inf    
 -Inf    

For any $m\in\mathbb{R}$, also $m = max(x)$, the following holds:
$$ -\log\Big(\sum_{i=1}^N \exp(x_i)\Big) = -\log\Big(\sum_{i=1}^N \exp(x_i - m + m)\Big) = -\log\Big(\sum_{i=1}^N \exp(x_i - m) * \exp(m)\Big) = -\log\Big(\sum_{i=1}^N \exp(x_i - m) \Big) - m$$ 


In [4]:
function IV2(x)
    m = maximum(x)
    - log(sum(exp.(x .- m))) - m
end

IV2 (generic function with 1 method)

In [5]:
IV([1000, 2])

-Inf

In [6]:
IV2([1000, 2])

-1000.0

At least it is not infinity, but the loss of significance (https://en.wikipedia.org/wiki/Loss_of_significance) acts up.

## Part 1: Markov Chains

The stationary distribution, $\pi^*$ of an ergodic MC is uniquely given by:
$$\pi^* = \pi^* P \implies \pi^* * (I - P) = 0$$
I.e. the left eigenvector of $P'$ corresponding to the eigenvalue one.

In [7]:
P = [0.2 0.4 0.4; 0.1 0.3 0.6; 0.5 0.1 0.4]

3×3 Array{Float64,2}:
 0.2  0.4  0.4
 0.1  0.3  0.6
 0.5  0.1  0.4

In [8]:
function stationary(P)
    F = eig(P')    
    pos = F[1] .≈ Complex(1) #array of true or false
    
    if sum(pos) > 1 throw("The eigenvalue one has a algebraic multiplicity of higher than one!")
        elseif sum(pos) < 1 throw("The eigenvalue one has a algebraic multiplicity of zero!")
    end
            
    πstar = convert(Array{Float64}, F[2][:, pos])
    πstar = (πstar / sum(πstar))
end

stationary (generic function with 1 method)

In [9]:
πstar = stationary(P)

3×1 Array{Float64,2}:
 0.310345
 0.241379
 0.448276

In [10]:
πstar' * P - πstar'

1×3 Array{Float64,2}:
 1.66533e-16  2.77556e-17  -1.66533e-16

This is the stationary distribution.
## Part 2: Numerical Integration
I use the package "FastGaussQuadrature" for the Gauss-Hermite and "HCubature" for adaptive quadrature.

In [11]:
using FastGaussQuadrature
using Distributions
using HCubature
using BasisMatrices

### 1

In [12]:
function binomiallogit(β)
    exp(β * 0.5) / (1 + exp(β * 0.5))
end

binomiallogit (generic function with 1 method)

In [13]:
function binomiallogit_known(β)
    exp.(β * 0.5) ./ (1 + exp.(β * 0.5)) .* pdf(Normal(0.5, 2), β)
end

binomiallogit_known (generic function with 1 method)

### 2
Matlab uses an adapative quadrature, I mimic this by using the "HCubature" package. The function "binomiallogit_known" is numerically zero outside $[-1000,\,1000]$.

In [14]:
oneDtrue = hcubature(binomiallogit_known, [-1000.], [1000])[1]

1-element Array{Float64,1}:
 0.551493

I also use a very precise Gauss-Hermite quadrature with the appropriate transformation. I follow https://en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature#Example_with_change_of_variable for the non-standard normally distributed $\beta$.

In [15]:
1/sqrt(π) * (gausshermite(100000)[2]' * binomiallogit.(sqrt(2) * sqrt(2) * gausshermite(100000)[1] + 0.5))

### 3

In [16]:
oneDmonte = [mean(binomiallogit.(rand(Normal(0.5, 2), n))) for n in [20, 400, 1000]]

3-element Array{Float64,1}:
 0.518433
 0.550616
 0.538423

### 4

In [17]:
oneDgauss = [1/sqrt(π) * (gausshermite(n)[2]' * binomiallogit.(sqrt(2) * sqrt(2) * gausshermite(n)[1] + 0.5)) for n in [4, 5, 7, 8, 12]]

5-element Array{Float64,1}:
 0.555916
 0.555944
 0.555939
 0.555939
 0.555939

### 5
The results are similar, but the Gauss-Hermite quadrature works already with very few points.
### 6

In [18]:
function binomiallogit_multi(β)
    exp(β' * [0.5, 1]) / (1 + exp(β' * [0.5, 1]))
end

#the hcubature calls this function and hands over a row from a StaticArray (for increased performance),
#while vector product understands this, the Distributions packages does not.
function binomiallogit_multi_known(β)
    exp(β' * [0.5, 1]) / (1 + exp(β' * [0.5, 1])) * pdf(MvNormal([0.5, 1], [2. 0; 0 1]), convert(Vector{Float64}, β))
end

binomiallogit_multi_known (generic function with 1 method)

The adaptive cubature gives:

In [19]:
twoDtrue = hcubature(binomiallogit_multi_known, [-100., -100], [100., 100])[1]

The Monte-Carlo-simulation yields:

In [20]:
twoDmonte = [mean([binomiallogit_multi(rand(MvNormal([0.5, 1], [2. 0; 0 1]))) for i in 1:n]) for n in [20, 400, 1000]]

3-element Array{Float64,1}:
 0.739186
 0.740382
 0.719469

The Gauss-Hermite-cubature can be split into two successive integrations:
$$\int \frac{1}{\sqrt{\det{2\pi\Sigma}}} e^{-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu)} f(y) dy$$

A change of variables with $x = \frac{1}{\sqrt{2}}L^{-1}(y-\mu)$, where $\Sigma = LL^T$ the Cholesky decomposition, yields:

$$\int \frac{1}{\sqrt{\det{2\pi\Sigma}}} e^{-x' x} f\left(\sqrt{2}Lx + \mu\right)  \det \left(\sqrt{2} L\right) dx = 
\int \pi^{-\frac{N}{2}} e^{-x' x} f\left(\sqrt{2}Lx + \mu\right) dx = \frac{1}{\pi} \int e^{-x_2^2} \int e^{-x_1^2} f\left(\sqrt{2}Lx + \mu\right) dx_1 dx_2=$$
$$\frac{1}{\pi} \sum_{j=1}^n w_{2,j} e^{x_{2,j}^2} \sum_{i=1}^n w_{1,i} e^{x_{1,i}^2} f\left(\sqrt{2}L\begin{bmatrix}
x_{1,i}\\ 
x_{2,j}
\end{bmatrix} + \mu\right)$$

In [21]:
twoDgauss = [begin 
    (nodes, weights) = gausshermite(n)
    L = chol([2. 0; 0 1])
    X = gridmake(nodes, nodes)
    F = [binomiallogit_multi(sqrt(2) * L * X[i,:] + [0.5, 1]) for i in 1:n^2]
    Weights = prod(gridmake(weights, weights),2)
    1 / 2π * sum(Weights' * (prod(exp.(X), 2)[:,1] .* F))
end for n in [4, 5, 7, 8, 12]]

5-element Array{Float64,1}:
 0.722989
 0.723047
 0.723046
 0.723046
 0.723046

Looks good!

### 7

In [33]:
display("text/html", "<h2 style='padding: 10px'>Errors</h2><table class='table table-striped'> <thead> <tr> <th></th> <th>1D</th> <th>2D</th> </tr> </thead> <tbody>  <tr> <th scope='row'>Monte Carlo 20</th> <td>$((oneDmonte[1]-oneDtrue)[1])</td> <td>$((twoDmonte[1]-twoDtrue)[1])</td></tr>     <tr> <th scope='row'>Monte Carlo 400</th> <td> $((oneDmonte[2]-oneDtrue)[1]) </td> <td>$((twoDmonte[2]-twoDtrue)[1])</td>  </tr>     <tr> <th scope='row'>Monte Carlo 1000</th> <td>$((oneDmonte[3]-oneDtrue)[1])</td> <td>$((twoDmonte[3]-twoDtrue)[1]) </td>  </tr>     <tr> <th scope='row'>Gauss Hermite 4</th> <td>$((oneDgauss[1]-oneDtrue)[1])</td> <td>$((twoDgauss[1]-twoDtrue)[1]) </td>  </tr>     <tr> <th scope='row'>Gauss Hermite 5</th> <td>$((oneDgauss[2]-oneDtrue)[1])</td> <td>$((twoDgauss[2]-twoDtrue)[1]) </td>  </tr>     <tr> <th scope='row'>Gauss Hermite 7</th> <td>$((oneDgauss[3]-oneDtrue)[1])</td> <td>$((twoDgauss[3]-twoDtrue)[1]) </td>  </tr>     <tr> <th scope='row'>Gauss Hermite 8</th> <td>$((oneDgauss[4]-oneDtrue)[1])</td> <td>$((twoDgauss[4]-twoDtrue)[1]) </td>  </tr>     <tr> <th scope='row'>Gauss Hermite 12</th> <td>$((oneDgauss[5]-oneDtrue)[1])</td> <td>$((twoDgauss[5]-twoDtrue)[1]) </td>  </tr>     </tbody> </table>")

Unnamed: 0,1D,2D
Monte Carlo 20,-0.033059893841141,0.013299766452589
Monte Carlo 400,-0.000876947712199,0.0144959668549015
Monte Carlo 1000,-0.0130706158300052,-0.0064174495027462
Gauss Hermite 4,0.0044224028415192,-0.0028967801455583
Gauss Hermite 5,0.0044505984251825,-0.0028386034965228
Gauss Hermite 7,0.0044461348888799,-0.0028395280606751
Gauss Hermite 8,0.0044458283703399,-0.0028397412406763
Gauss Hermite 12,0.0044458897916574,-0.0028397015587515


### 8 

In [23]:
function p(β, x)
    exp.(β * x) / (1 + exp.(β * x))
end

function binomiallogitmixture(X)
    nodes, weights = gausshermite(12)
    1/sqrt(π) * (weights' * reshape(mapslices(x -> p(x[1], x[2]), 
                gridmake(sqrt(2) * sqrt(2) * nodes + 0.5, X), 2), length(nodes), length(X)))'
end

binomiallogitmixture (generic function with 1 method)

In [24]:
binomiallogitmixture([1, 2.])

2-element Array{Float64,1}:
 0.589948
 0.617203

## Part 3: Function Approximation
I use my favorite package for very fast generation of basis matrices, BasisMatrices.

In [25]:
N = 20
M = 40
basis = Basis(ChebParams(N, -4, 4))
f = binomiallogitmixture

binomiallogitmixture (generic function with 1 method)

It calcualtes the correct Chebyshev nodes:

### 1

In [26]:
maximum(nodes(basis)[1] - ((1 + (- cos.((2 * (1:N) - 1) / (2 * N) * π))) * (4 - -4) / 2 - 4))

### 2, 3, 4, 5
I write a function that takes all the parameters as input and returns the approximate function.

In [27]:
function chebfit(f, a, b, N=20, M=40)
    
    basis = Basis(ChebParams(N, a, b))
    grid = nodes(Basis(ChebParams(M, a, b)))[1]
    
    X = BasisMatrix(basis, Expanded(), grid).vals[1] # M * N matrix of Chebyshev polynomials evaluated at grid points
    y = f(grid)
    
    coeff = (X' * X) \ (X' * y) # regression using LU-factorization, very fast, not the most precise!
    
    return function(Xeval) 
        BasisMatrix(basis, Expanded(), Xeval).vals[1] * coeff
    end
end

chebfit (generic function with 3 methods)

In [28]:
fhat = chebfit(f, -4, 4)

(::#19) (generic function with 1 method)

In [29]:
Xtest = -4:0.01:4
plot(Xtest, f(Xtest), label = "f")
plot!(Xtest, fhat(Xtest), label = "fhat")

In [30]:
plot(Xtest, f(Xtest) - fhat(Xtest), label = "error")

The approximation is relatively good!

### 8 
The polynomials diverge outside of the domain:

In [31]:
Xtest = -8:0.01:8
plot(Xtest, f(Xtest), label = "f")
plot!(Xtest, fhat(Xtest), label = "fhat")

The function approximation gets better and better with an increased number of polynomials:

In [32]:
Xtest = -4:0.01:4
plot(20:40, [begin 
        fhat = chebfit(f, -4, 4, n); 
        maximum(f(Xtest) - fhat(Xtest))
            end for n in 20:40], label="maximal error within domain")