# A Probabilistic Approach  

In [1]:
import os
import gzip
import pathlib
import numpy as np
import pandas as pd
import tensorflow as tf
import keras_tuner as kt
import tensorflow_probability as tfp

tfd = tfp.distributions
tfpl = tfp.layers

from trainer import *
# The experiment for tuning and searching can be found in the notebook:
# https://colab.research.google.com/drive/1CXthH4dujMu475C-9G9FyLul6Ritrts1?usp=sharing
# (the code has not been optimized for readability)

## 0 Motivation

In fields like finance and medicine, a single point estimate is often insufficient: an estimate should come with a probability or confidence level, so that we can answer questions such as:
- How confident are we in this prediction?
  
- What is the probability that the actual value is higher than our estimate?
  
- What is the probability that it is lower?

Therefore, it is natural to consider probabilistic machine learning techniques for tasks like these. Probabilistic methods allow us to take both **epistemic uncertainty** and **aleatoric uncertainty** into account. Epistemic uncertainty is due to things one could in principle know but does not in practice (*e.g.* a measurement that is not accurate enough, or a model fitted on too little data), whereas aleatoric uncertainty refers to the data's inherent randomness (*e.g.* the data are generated by a random process).

In [2]:
parent_path = str(pathlib.Path(os.getcwd()).parent)
viewData(parent_path).sample(5)

| Unnamed: 0 | optionid | securityid | strike | callput | date_traded | contract_price | market_price | underlyings_price | contract_volume | days_to_maturity | moneyness | rate | volatility |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 42234 | 173731132.0 | 702263.0 | 19.3 | C | 2020-11-27 | 0.31 | 0.3175 | 19.34225 | 227.0 | 21.0 | 1.002189 | 0.001333 | 0.157719 |
| 50766 | 155396804.0 | 702263.0 | 15.5 | C | 2015-01-29 | 0.3125 | 0.40375 | 15.6925 | 20.0 | 50.0 | 1.012419 | 0.002211 | 0.188274 |
| 17414 | 150256952.0 | 506534.0 | 5.4 | C | 2007-01-10 | 0.073 | 0.07225 | 4.9213 | 38.0 | 254.0 | 0.911352 | 0.053485 | 0.12903 |
| 78002 | 161255597.0 | 702263.0 | 16.0 | C | 2017-08-02 | 0.11 | 0.105 | 15.60225 | 556.0 | 44.0 | 0.975141 | 0.012943 | 0.126559 |
| 82738 | 161927336.0 | 702263.0 | 15.8 | C | 2017-07-20 | 0.3775 | 0.33375 | 15.96875 | 12.0 | 29.0 | 1.01068 | 0.012413 | 0.125298 |


In [3]:
train_ds, valid_ds, test_ds = pipeline(parent_path)

train: (85999, 6), val: (10750, 6), test: (10750, 6)


## 1 Bayesian Neural Networks

### 1.1 Introduction  

A Bayesian Neural Network (BNN) is a network with stochastic weights: the weights follow a probability distribution rather than taking fixed values. For example, we can place a normal prior $p(\theta) = N_{\theta}(0, \eta I)$ on the weights $\theta$ and define the observation model as

$$
p(t \mid \mathbf{x}, \theta) = N_t(f_\theta(\mathbf{x}), \sigma^2) 
$$  

We can then use Bayes' rule to obtain the posterior distribution over the parameters given the data $\mathcal{D}$:

$$
p(\theta \mid \mathcal{D}) \propto p(\theta) \prod_{i=1}^N p(t^{(i)} \mid \mathbf{x}^{(i)}, \theta)
$$
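
As a rough illustration of the two formulas above, the unnormalized log posterior is the log prior plus the summed log likelihoods. The sketch below assumes a toy linear model in place of $f_\theta$ and illustrative values for $\eta$ and $\sigma$; the names `f_theta`, `log_normal`, and `log_unnormalized_posterior` are hypothetical and are not taken from the project's scripts.

```python
import numpy as np


def f_theta(theta, x):
    """Toy stand-in for the network: a linear map (assumption, for illustration)."""
    w, b = theta[:-1], theta[-1]
    return x @ w + b


def log_normal(value, mean, var):
    """Elementwise log density of N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (value - mean) ** 2 / var)


def log_unnormalized_posterior(theta, x, t, eta=1.0, sigma=0.1):
    """log p(theta) + sum_i log p(t_i | x_i, theta), up to the constant log p(D)."""
    log_prior = log_normal(theta, 0.0, eta).sum()  # prior N(0, eta * I)
    log_lik = log_normal(t, f_theta(theta, x), sigma ** 2).sum()
    return log_prior + log_lik


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
true_theta = np.array([0.5, -1.0, 2.0, 0.3])
t = f_theta(true_theta, x) + 0.1 * rng.normal(size=50)

# Parameters close to the data-generating ones should score higher than zeros.
good = log_unnormalized_posterior(true_theta, x, t)
bad = log_unnormalized_posterior(np.zeros(4), x, t)
```

Working in log space turns the product over data points into a sum, which avoids numerical underflow when $N$ is large.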

The marginal likelihood $p(\mathcal{D})$ that normalizes this posterior is often intractable to compute, so we instead maximize a lower bound on the log marginal likelihood.
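
For reference, this lower bound is commonly called the evidence lower bound (ELBO). Assuming a variational approximation $q_\phi(\theta)$ to the posterior (the symbol $q_\phi$ is introduced here for illustration and does not appear in the text above), its standard form is

$$
\log p(\mathcal{D}) \geq \mathbb{E}_{q_\phi(\theta)}\left[\sum_{i=1}^N \log p(t^{(i)} \mid \mathbf{x}^{(i)}, \theta)\right] - \mathrm{KL}\left(q_\phi(\theta) \,\|\, p(\theta)\right)
$$

The first term rewards weights that explain the data, and the second penalizes approximate posteriors that move far from the prior $p(\theta)$.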

### 1.2 The Model  

We use a Bayesian Neural Network with hidden layers of sizes `[300, 100, 100]`. The code for creating, training, and evaluating the model is in the scripts.
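
To make the idea of stochastic weights concrete, here is a minimal NumPy sketch; it is not the project's actual model code. Each weight has its own Gaussian distribution, and every forward pass draws a fresh weight sample, so repeated passes give a distribution over predictions. The input dimension, the ReLU activation, the initial values, and the helper names (`init_layer`, `sample_forward`) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = [300, 100, 100]  # hidden layer sizes from the text
N_FEATURES = 5            # assumed input dimension (illustrative)


def init_layer(n_in, n_out):
    """Mean and standard deviation of a factorized Gaussian over one layer's weights."""
    return {
        "w_mu": rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)),
        "w_sigma": np.full((n_in, n_out), 0.05),
        "b_mu": np.zeros(n_out),
        "b_sigma": np.full(n_out, 0.05),
    }


sizes = [N_FEATURES] + HIDDEN + [1]
layers = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]


def sample_forward(x):
    """One stochastic forward pass: sample every weight, then apply the network."""
    h = x
    for i, p in enumerate(layers):
        w = p["w_mu"] + p["w_sigma"] * rng.normal(size=p["w_mu"].shape)
        b = p["b_mu"] + p["b_sigma"] * rng.normal(size=p["b_mu"].shape)
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h[:, 0]


x = rng.normal(size=(4, N_FEATURES))
draws = np.stack([sample_forward(x) for _ in range(200)])  # shape (200, 4)
pred_mean = draws.mean(axis=0)  # point estimate per input
pred_std = draws.std(axis=0)    # spread caused by weight uncertainty
```

The weights here are untrained, so the numbers carry no meaning; in a trained model the means and standard deviations would be learned, and `pred_std` would serve as an estimate of the uncertainty coming from the weights.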

### 1.3 Remarks

## 2 Mixture Density Networks

Mixture Density Networks (MDN) [Bishop 1994](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf) learn the mixing coefficients and the parameters of normal distributions, and output the prediction as a sample drawn from the resulting Gaussian mixture. They are particularly useful when the distribution of the data is multimodal, which is the case here.
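
The two operations described above, evaluating the Gaussian mixture likelihood and drawing a prediction from it, can be sketched in NumPy as follows. The function names and the example parameters are hypothetical; in an actual MDN the mixing logits, means, and scales would be produced by the network for each input.

```python
import numpy as np


def mixture_log_prob(t, logits, means, scales):
    """Log density of t under a 1-D Gaussian mixture, per row.

    t: (N,), logits/means/scales: (N, K). Mixing coefficients are softmax(logits).
    """
    log_pi = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    z = (t[:, None] - means) / scales
    log_comp = -0.5 * z ** 2 - np.log(scales) - 0.5 * np.log(2.0 * np.pi)
    return np.logaddexp.reduce(log_pi + log_comp, axis=1)  # log-sum-exp over K


def sample_mixture(logits, means, scales, rng):
    """Draw one sample per row: pick a component, then sample from its Gaussian."""
    log_pi = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    pi = np.exp(log_pi)
    ks = np.array([rng.choice(len(p), p=p) for p in pi])
    rows = np.arange(len(ks))
    return means[rows, ks] + scales[rows, ks] * rng.normal(size=len(ks))


rng = np.random.default_rng(0)
# Two inputs, each with a bimodal predictive distribution (illustrative values).
logits = np.zeros((2, 2))  # equal mixing coefficients
means = np.array([[-2.0, 2.0], [0.0, 5.0]])
scales = np.array([[0.5, 0.5], [1.0, 1.0]])
t = np.array([2.0, 0.0])

nll = -mixture_log_prob(t, logits, means, scales).mean()  # usable as a training loss
sample = sample_mixture(logits, means, scales, rng)
```

Minimizing the negative log-likelihood `nll` is the usual way to fit such a model; the log-sum-exp form keeps the computation stable when some components assign very small density to a target.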

More precisely, the data.

## 3 Ensemble of MDNs

This has not yet been implemented. The idea is that, through bagging, different MDNs can learn different modes in the data, so they can be used together to give a better estimate.
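
Since this part is not implemented, the following is only one possible way the idea could be sketched, not the planned design. It assumes each member is trained on a bootstrap resample and that the members' mixtures are combined by averaging their densities; the function names and all parameter values are hypothetical.

```python
import numpy as np


def bootstrap_indices(n_samples, n_members, rng):
    """One resampled index set (with replacement) per ensemble member."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_members)]


def combine_mixtures(weights, means, scales):
    """Average M member mixtures into one mixture with M * K components.

    Each argument has shape (M, K): member m's mixing coefficients, means, scales.
    Averaging the member densities equals dividing every mixing coefficient by M.
    """
    n_members = weights.shape[0]
    return (weights / n_members).ravel(), means.ravel(), scales.ravel()


def mixture_mean(weights, means):
    """Mean of a 1-D mixture: the weighted average of the component means."""
    return float(np.sum(weights * means))


rng = np.random.default_rng(0)
idx = bootstrap_indices(n_samples=1000, n_members=3, rng=rng)

# Hypothetical outputs of three trained MDNs (K = 2) for a single input.
weights = np.array([[0.7, 0.3], [0.5, 0.5], [0.6, 0.4]])
means = np.array([[0.1, 0.4], [0.12, 0.38], [0.09, 0.41]])
scales = np.array([[0.02, 0.05], [0.03, 0.05], [0.02, 0.06]])

w, mu, s = combine_mixtures(weights, means, scales)
estimate = mixture_mean(w, mu)
```

Each entry of `idx` would select the training rows for one member. The combined mixture is itself a valid Gaussian mixture, so the same likelihood and sampling routines used for a single MDN apply to the ensemble.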

## 4 Discussion

A common and not very surprising problem with BNNs is underfitting: isotropic normal distributions are not sufficient to model high-dimensional, highly complex true posteriors. Many approaches have been proposed to address this, for instance a new posterior (ref. ) and a heavy-tailed prior.

These topics will be explored further.