## Abstract Problem

Objective function:

> $\min_{\Phi} \quad L(\mathbf{y}, \mathbf{f}_{\Phi})$

Where,

> $\mathbf{y} \in \mathbb{R}^{N}\text{ is a vector containing N measurements}$

> $\mathbf{f}_{\Phi} \text{ is a model function which provides estimates}$

> $L$ is a scalar loss function, $L(\mathbf{y}, \mathbf{f}_{\Phi}) \in \mathbb{R}$

> $\Phi \in \mathbb{R}^{K}$ are the parameters of the model, chosen to minimize the loss function $L$ ($K$ is the total number of parameters)

Iterative solution for $\Phi$


> $\Phi_{t+1} = \Phi_t + update(\frac{\partial L}{\partial \Phi}) \qquad \Phi_t, \frac{\partial L}{\partial \Phi} \in \mathbb{R}^{K}$

Different iterative optimization techniques can be realized based on the choice of **update** function.

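For example, for stochastic gradient descent,

> $update(\frac{\partial L}{\partial \Phi}) = -\gamma \frac{\partial L}{\partial \Phi} \qquad \gamma \in \mathbb{R}, \ \gamma > 0 \text{ is the learning rate}$

The negative sign moves $\Phi$ against the gradient, so the loss decreases for a sufficiently small $\gamma$. In the stochastic variant, the gradient is estimated on a random subset of the measurements at each step.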
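The iteration above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the names `minimize` and `sgd_update`, the linear toy model $f_\Phi(x) = \Phi_0 + \Phi_1 x$, the squared loss, the learning rate, and the data are all assumptions made for the example. For simplicity it uses the full gradient at every step.

```python
import numpy as np

def sgd_update(grad, learning_rate=0.02):
    """Gradient-descent choice of the update function: step against the gradient."""
    return -learning_rate * grad

def minimize(grad_fn, phi0, update, num_steps=1000):
    """Iterate Phi_{t+1} = Phi_t + update(dL/dPhi)."""
    phi = np.asarray(phi0, dtype=float)
    for _ in range(num_steps):
        phi = phi + update(grad_fn(phi))
    return phi

# Hypothetical toy data generated from y = 1 + 2x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

def grad_loss(phi):
    """Gradient of L = sum_i (y_i - f_i)^2 for f_i = phi[0] + phi[1] * x_i."""
    residual = y - (phi[0] + phi[1] * x)
    return np.array([-2.0 * residual.sum(), -2.0 * (residual * x).sum()])

phi_hat = minimize(grad_loss, phi0=[0.0, 0.0], update=sgd_update)
print(phi_hat)  # close to [1, 2] for this toy data
```

Passing a different `update` function (for instance one that keeps a running momentum term) changes the optimizer without touching the loop.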

Based on the choice of **L** and **f**, we can realize solutions to a variety of problems. Some examples are listed below.




## Some choices of Loss function L

### 1. Joint Quantile Regression

For a set of quantiles, $Q = \{ q_1, q_2, ..., q_M \}$

$L(\mathbf{y}, \mathbf{f}) = \sum_{q \in Q} \left[ \sum_{i \,:\, y_i < \mathbf{f}_q} (1-q)|y_i - \mathbf{f}_q | + \sum_{i \,:\, y_i \ge \mathbf{f}_q} q\,|y_i - \mathbf{f}_q | \right] \qquad i \in \{1,2,..,N\}$


$  \mathbf{f}(x, \Phi ) \in \mathbb{R}^{M} $

Where, 

> M is the number of quantiles to be realized, and $\mathbf{f}_q$ is the component of $\mathbf{f}$ that estimates quantile $q$

> **x** is the feature given as input to the model
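As a rough illustration, the joint loss can be evaluated with NumPy as follows. The function name `joint_quantile_loss`, the array shapes, and the example numbers are assumptions made for this sketch; the predictions `f` may be one constant per quantile (shape `(M,)`) or one value per data point and quantile (shape `(N, M)`).

```python
import numpy as np

def joint_quantile_loss(y, f, quantiles):
    """Sum over all quantiles q and data points i of the asymmetric absolute error."""
    y = np.asarray(y, dtype=float)[:, None]           # shape (N, 1)
    f = np.asarray(f, dtype=float)                    # broadcasts to (N, M)
    q = np.asarray(quantiles, dtype=float)[None, :]   # shape (1, M)
    residual = y - f
    # Weight q where y_i >= f_q, weight (1 - q) where y_i < f_q.
    weights = np.where(residual >= 0, q, 1.0 - q)
    return float(np.sum(weights * np.abs(residual)))

# Hypothetical example: four measurements, three quantiles, constant predictions.
y = np.array([1.0, 2.0, 3.0, 4.0])
quantiles = np.array([0.25, 0.5, 0.75])
f = np.array([2.0, 2.5, 3.0])
loss = joint_quantile_loss(y, f, quantiles)
print(loss)  # 5.0 = 1.5 + 2.0 + 1.5, one term per quantile
```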


### 2.  Independent Quantile Regression

For a set of quantiles, $Q = \{ q_1, q_2, ..., q_M \}$

$L_q(\mathbf{y}, \mathbf{f}) = \sum_{i \,:\, y_i < \mathbf{f}_q} (1-q)|y_i - \mathbf{f}_q | + \sum_{i \,:\, y_i \ge \mathbf{f}_q} q\,|y_i - \mathbf{f}_q | \qquad i \in \{1,2,..,N\} $


$  \mathbf{f}_q(x, \Phi ) \in \mathbb{R} $

Solve **M** independent optimization problems, one for each pair $\{L_q, \mathbf{f}_q\}, \qquad q \in Q$
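A small sketch of solving the problems one quantile at a time is given below. It assumes the simplest possible model, a single constant per quantile, which is not prescribed by the text; the helper names `pinball_loss` and `fit_constant_quantile` and the data are hypothetical. For a constant model the loss is piecewise linear with kinks at the data points, so it is enough to search over the observed values.

```python
import numpy as np

def pinball_loss(y, f_q, q):
    """L_q for a single quantile q and a scalar (or per-point) prediction f_q."""
    residual = y - f_q
    return float(np.sum(np.where(residual >= 0, q, 1.0 - q) * np.abs(residual)))

def fit_constant_quantile(y, q):
    """Minimize L_q over constant predictions by trying every observed value."""
    candidates = np.unique(y)
    losses = [pinball_loss(y, c, q) for c in candidates]
    return float(candidates[int(np.argmin(losses))])

# Hypothetical data: the integers 1..9.
y = np.arange(1.0, 10.0)
quantiles = [0.25, 0.5, 0.75]

# One separate optimization problem per quantile.
fits = {q: fit_constant_quantile(y, q) for q in quantiles}
print(fits)  # {0.25: 3.0, 0.5: 5.0, 0.75: 7.0}
```

Because each quantile is fitted separately, nothing in this setup forces the fitted quantiles to stay ordered when a more flexible model is used.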


### 3. Least Squares Regression

**f** approximates the **mean** of the measurements given **x**

$L(\mathbf{y}, \mathbf{f}) = \sum_{i} (y_i - \mathbf{f})^{2} \qquad i \in \{1,2,..,N\}$


$  \mathbf{f}(x, \Phi ) \in \mathbb{R} $
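To see why this loss targets the mean, the sketch below evaluates it for a range of constant predictions and picks the best one. The constant model, the grid, and the data are assumptions made for illustration only.

```python
import numpy as np

def least_squares_loss(y, f):
    """L(y, f) = sum_i (y_i - f)^2 for a scalar or per-point prediction f."""
    return float(np.sum((np.asarray(y, dtype=float) - f) ** 2))

# Hypothetical measurements.
y = np.array([2.0, 4.0, 9.0])

# Try constant predictions on a grid and keep the one with the smallest loss.
candidates = np.linspace(0.0, 10.0, 1001)
losses = [least_squares_loss(y, c) for c in candidates]
best = float(candidates[int(np.argmin(losses))])

print(best, y.mean())  # both are (approximately) 5.0
```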



### 4. Weighted Least Squares

(A remedy for sampling bias, for example when data points at ridge points should receive larger weights.)

$L(\mathbf{y}, \mathbf{f}) = \sum_{i} w_i(y_i - \mathbf{f})^{2} \qquad i \in \{1,2,..,N\}$

where $w_i$ is the weight assigned to data point $i$.


$  \mathbf{f}(x, \Phi ) \in \mathbb{R} $
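The effect of the weights can be illustrated with a constant model, which is an assumption made for this sketch (as are the function names and numbers). Setting the derivative $-2\sum_i w_i (y_i - \mathbf{f})$ to zero gives the weighted mean, so heavily weighted points pull the estimate toward themselves.

```python
import numpy as np

def weighted_least_squares_loss(y, f, w):
    """L(y, f) = sum_i w_i * (y_i - f)^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * (y - f) ** 2))

def best_constant(y, w):
    """Closed-form minimizer for a constant prediction: the weighted mean."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical data: the last point is counted twice as heavily.
y = np.array([1.0, 2.0, 10.0])
w = np.array([1.0, 1.0, 2.0])

f_star = best_constant(y, w)
print(f_star)  # 5.75, compared with the unweighted mean 4.33...
```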



## Some choices of model function f

### 1. Multilayer Perceptron

To set up a multilayer perceptron, we fix the following hyperparameters:

> 1. **J** : the number of layers in the multilayer perceptron.

> 2. $L_j \text{ : the number of units in layer } j \text{ of the multilayer perceptron}$

Then our model function is

> $\mathbf{f}(x, \Phi)= \mathbf{W}_J \cdot a(\mathbf{z}_{J}) \qquad  \Phi = \{\mathbf{W}_1, ..., \mathbf{W}_J\} \qquad \mathbf{z}_1 = x$ 

Where,

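> $\mathbf{z}_j = \mathbf{W}_{j-1} \cdot a(\mathbf{z}_{j-1}) \qquad j \in \{2,...,J\}$

> $\mathbf{W}_j \in \mathbb{R}^{L_{j+1} \times L_j}$, so that the product $\mathbf{W}_j \cdot a(\mathbf{z}_j)$ is defined for $\mathbf{z}_j \in \mathbb{R}^{L_j}$

> Activation function $ a : \mathbb{R}^{L_j} \to \mathbb{R}^{L_j}$

> $\mathbf{f} \in \mathbb{R}^{L_{J+1}} \qquad x \in \mathbb{R}^{L_1}$, where $L_{J+1}$ is the output dimension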
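The recursion can be sketched as a short forward pass in NumPy. This follows the definition as written, so the activation is applied to the input as well; the function name `mlp_forward`, the choice of `tanh`, the layer widths, and the random weights are assumptions made for illustration. Each weight matrix maps the width of one layer to the width of the next so that every product is defined.

```python
import numpy as np

def mlp_forward(x, weights, activation=np.tanh):
    """Compute z_1 = x, z_j = W_{j-1} a(z_{j-1}), and return f = W_J a(z_J)."""
    z = np.asarray(x, dtype=float)
    for W in weights:
        # One step of the recursion: apply the activation, then the linear map.
        z = W @ activation(z)
    return z

# Hypothetical layer widths: input of size 3, two inner layers of size 4, output of size 2.
layer_sizes = [3, 4, 4, 2]
rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(n_out, n_in))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = rng.normal(size=layer_sizes[0])
f = mlp_forward(x, weights)
print(f.shape)  # (2,)
```

Here the parameter set $\Phi$ corresponds to the list `weights`; a gradient-based optimizer would update every matrix in that list.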

